finally a bnode with a uri

Microdata, semantic markup for both RDFers and non-RDFers

RDF-in-HTML could have been so simple.
There's been a whole lot of discussion around Microdata, a new approach for embedding machine-readable information into forthcoming HTML5. What I find most attractive about Microdata is the fact that it was designed by HTMLers, not RDFers. It's refreshingly pragmatic and free of the legacy of other RDF specs, yet still capable of expressing most of RDF.

Unfortunately, RDFa lobbyists on the HTML WG mailing list forced the spec out of HTML5 core for the time being. This manoeuvre was understandable (a lot of energy went into RDFa, after all), but in my opinion very short-sighted. How many uphill battles did we have, trying to get RDF to the broader developer community? And how many were successful? Atom, microformats, OpenID, Portable Contacts, XRDS, Activity Streams (well, not really), these are examples where RDFers tried, but failed to promote some of their infrastructure into the respective solutions. Now: HTML5, where the initial RDF lobbying actually had an effect and led to a native mechanism for RDF-in-HTML. Yes, native, not in some separate spec. This would have become part of every HTML5 book, any HTML developer on this planet would have learned about it. Finally a battle won. And what a great one. HTML.

But no, Microdata wasn't developed by an RDF group, so they voted it out again. Now, the really sad thing is, there could have been a solution that would have served everybody sufficiently well, both HTMLers and RDFers. The RDFa group recently realized that RDFa needs to be revised anyway; there is going to be an RDFa 1.1, which will require new parsers. If they'd swallowed their pride, they would most probably have been able to define RDFa 1.1 as a proper superset of Microdata.

Here is a short overview of RDF features supported by Microdata:
  • Explicit resource containers, via @itemscope (in RDFa, the boundaries of a resource are often implicitly defined by @rel or @typeof)
  • Subject declaration, via @itemid (RDFa uses @about)
  • Main subject typing, via @itemtype (RDFa uses @typeof)
  • Predicate declaration, via @itemprop (RDFa uses @property, @rel, and @rev)
  • Literal objects, via node values (RDFa also allows hidden values via @content)
  • Non-literal objects, via @href, @src, etc. (RDFa also allows hidden values via @resource)
  • Object language, via @lang
  • Blank nodes
I won't go into details why hiding semantics in RDFa will be penalized by search engines as soon as spammers discover the possibilities, why reusing RDF/XML's attribute names was probably not a smart move with regard to attracting non-RDFers, why the new @vocab idea is impractical, or why namespace prefixes, as handy as they are in other RDF formats, are not too helpful in an HTML context. Let's simply state that there is a trade-off between extended features (RDFa) and simplicity (Microdata). So, what are the core features that an RDFer would really need beyond Microdata:
  • the possibility to preserve markup, but probably not necessarily as an explicit rdf:XMLLiteral
  • datatypes for literal objects (I personally never used them in practice in the last 6 years that I've been developing RDF apps, but I can see some use cases)
Markup preservation is currently turned on by default in RDFa and can be disabled through @datatype, so an RDFer-satisfying RDFa 1.1 spec could probably just be Microdata + @datatype + a few extended parsing rules to end up with the intended RDF. My experience with watching RDF spec creation tells me that the RDFa group won't pick this route (there simply is no "Kill a Feature" mentality in the RDF community), but hey, hope dies last.

I've been using Microdata in two of my recent RDF apps and the CMS module of (ahem, still not documented) Trice, and it's been a great experience. ARC is going to get a "microRDF" extractor that supports the RDF-in-Microdata markup below (Note: this output still requires a 2nd extraction process, as the current Microdata draft's RDF mechanism only produces intermediate RDF triples, which then still have to be post-processed. I hope my related suggestion will become official, but I seem to be the only pro-Microdata RDFer on the HTML list right now, so it may just stay as a convention):

Microdata:
<div itemscope itemtype="http://xmlns.com/foaf/0.1/Person">

  <!-- plain props are mapped to the itemtype's context -->
  <img itemprop="img" src="mypic.jpg" alt="a pic of me" />
  My name is <span itemprop="name"><span itemprop="nick">Alec</span> Tronnick</span>
  and I blog at <a itemprop="weblog" href="http://alec-tronni.ck/">alec-tronni.ck</a>.

  <!-- other RDF vocabs can be used via full itemprop URIs -->
  <span itemprop="http://purl.org/vocab/bio/0.1/olb">
    I'm a crash test dummy for semantic HTML.
  </span>
</div>
Extracted RDF:
@base <http://host/path/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
_:bn1 a foaf:Person ;
      foaf:img <mypic.jpg> ;
      foaf:name "Alec Tronnick" ;
      foaf:nick "Alec" ;
      foaf:weblog <http://alec-tronni.ck/> ;
      bio:olb "I'm a crash test dummy for semantic HTML." .
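
To make the convention above a bit more tangible, here is a minimal Python sketch of the mapping. It is not ARC's "microRDF" extractor and not the Microdata draft's official RDF algorithm; the names MicroRDFSketch, extract and SAMPLE are made up for illustration. It assumes well-formed nesting, treats an itemprop containing a colon as a full URI, and resolves plain props against the itemtype's namespace (everything up to the last "/" or "#"). Language tags, datatypes and markup preservation are left out.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Elements whose property value is taken from a URL attribute.
URL_ATTR = {"a": "href", "area": "href", "link": "href", "img": "src",
            "audio": "src", "video": "src", "embed": "src",
            "iframe": "src", "source": "src", "object": "data"}


class MicroRDFSketch(HTMLParser):
    """Hypothetical extractor: collects (subject, predicate, object) tuples."""

    def __init__(self, base):
        super().__init__(convert_charrefs=True)
        self.base = base
        self.triples = []
        self.open = []    # one frame per open non-void element
        self.items = []   # stack of (subject, namespace) for open items
        self.count = 0    # blank node counter

    def _new_subject(self, itemid):
        if itemid:
            return f"<{urljoin(self.base, itemid)}>"
        self.count += 1
        return f"_:bn{self.count}"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        frame = {"item": False, "subject": None, "preds": [], "text": []}
        preds, parent = [], None
        if attrs.get("itemprop") and self.items:
            parent, ns = self.items[-1]
            for name in attrs["itemprop"].split():
                if ":" in name:          # full URI, used as-is
                    preds.append(name)
                elif ns:                 # plain prop, mapped to the itemtype's context
                    preds.append(ns + name)
        if "itemscope" in attrs and tag not in VOID:
            subject = self._new_subject(attrs.get("itemid"))
            itemtype = attrs.get("itemtype") or ""
            ns = itemtype[:max(itemtype.rfind("#"), itemtype.rfind("/")) + 1]
            if itemtype:
                self.triples.append((subject, f"<{RDF_TYPE}>", f"<{itemtype}>"))
            for pred in preds:           # nested item becomes the object
                self.triples.append((parent, f"<{pred}>", subject))
            self.items.append((subject, ns))
            frame["item"] = True
        elif tag in URL_ATTR and attrs.get(URL_ATTR[tag]) is not None:
            uri = urljoin(self.base, attrs[URL_ATTR[tag]])
            for pred in preds:
                self.triples.append((parent, f"<{pred}>", f"<{uri}>"))
        elif preds:                      # literal: collect text until the end tag
            frame.update(subject=parent, preds=preds)
        if tag not in VOID:
            self.open.append(frame)

    def handle_data(self, data):
        for frame in self.open:
            if frame["preds"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        frame = self.open.pop()          # assumes well-formed nesting
        if frame["preds"]:
            text = " ".join("".join(frame["text"]).split())
            for pred in frame["preds"]:
                self.triples.append(
                    (frame["subject"], f"<{pred}>", json.dumps(text, ensure_ascii=False)))
        if frame["item"]:
            self.items.pop()


def extract(html, base):
    parser = MicroRDFSketch(base)
    parser.feed(html)
    parser.close()
    return parser.triples


SAMPLE = """
<div itemscope itemtype="http://xmlns.com/foaf/0.1/Person">
  <img itemprop="img" src="mypic.jpg" alt="a pic of me" />
  My name is <span itemprop="name"><span itemprop="nick">Alec</span> Tronnick</span>
  and I blog at <a itemprop="weblog" href="http://alec-tronni.ck/">alec-tronni.ck</a>.
  <span itemprop="http://purl.org/vocab/bio/0.1/olb">
    I'm a crash test dummy for semantic HTML.
  </span>
</div>
"""

for triple in extract(SAMPLE, "http://host/path/"):
    print(*triple, ".")
```

Keeping a stack of open items is what lets the nested name/nick spans both attach to the same blank node: every open literal property receives the text that passes through it, so "Alec" ends up in both values.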

Comments and Trackbacks

I perfectly understand your position and I think it is a very pragmatic one. Though, I'm not sure if it is the best way ahead to think and argue in terms of 'battles' etc.

Re "they'd swallowed their pride" - funny enough, I was thinking the same about Hixie and the WHATWG chaps. I didn't really understand why one would not just reuse what is there (RDFa) but had to come up with a totally new proposal which voids all the hours that went into tools, documents, etc. already. If you have any deeper insights, I'd be more than happy to learn. But, as I tell our children all the time: life was not meant to be fair.

So, let's see if and how the market decides :)
Comment by Michael Hausenblas on 2010-01-26 15:10:21 UTC
It's been separated into another specification, as RDFa is in a separate specification. So, what's the problem with that?
Comment by Shelley on 2010-01-26 15:30:22 UTC
Michael, Microdata is a solution that could work for both mainstream HTMLers and RDFers.

Shelley, the lost opportunity of a massive distribution channel.
Comment by Benjamin Nowack on 2010-01-26 15:41:22 UTC
Good article, and one I agree with for the most part.

I hope you haven't abandoned the public-html thread about your suggested processing changes. I agree that the monster URLs produced by the current RDF extraction algorithm are a bit impractical, but is it in fact the case that your suggested change is safe for *all* RDF vocabularies? And what about microdata types like http://n.whatwg.org/work?

Perhaps as a compromise we could maintain a whitelist of namespaces for which this works, together with the formal OWL equivalences (machine-generated from the whitelist). This way, microdata processors can either special-case the whitelisted itemtypes for your short-cut or just include a huge chunk of OWL triples if the output is intended for a triplestore.

I'm not an RDFer. What other ways than a triplestore and SPARQL are there actually for processing RDF data?
Comment by Philip Jägenstedt on 2010-01-26 15:53:25 UTC
But Benjamin, HTML5, the specification, not the hype, is not a distribution channel.
Comment by Shelley on 2010-01-26 16:03:01 UTC
Hi Bengee,

I really like your viewpoint on this and believe you bring up some really essential points. All of us ought to be happy to get structure where we can. Insistence that it is "The RDF way or the highway" will ensure low adoption.

I hope it is not too late to correct the shortsightedness regarding microdata.
Comment by Mike Bergman on 2010-01-26 16:11:45 UTC
Shelley, I tend to disagree ;)
Mike, thx for the support :)
Comment by Benjamin Nowack on 2010-01-26 18:17:12 UTC
I'd just like to echo the sentiments of others here: Microdata was a complete wheel re-invention on the part of an individual author who has too much control over the HTML5 specification.

I'm of the belief RDFa is the right tool for the job, adequate thought and documentation has been created for it, tools are out there for it, and industry support is growing (Yahoo, Google, etc) for it, validators exist for it, etc etc.

It irks me that all of this can be thrown away and an unproven method can be pushed into the spec at the drop of a hat.
Comment by Daniel O'Connor on 2010-01-26 23:40:02 UTC
Daniel, RDFa is absolutely not suitable for use in HTML for numerous reasons that have been articulated before, including its reliance on namespaces and the XML-only xmlns syntax, which needs to remain meaningless in HTML.

The RDF community should be more interested in the ability to express the semantics of RDF relationships, but you, like many, seem to have an unhealthy attachment to the RDFa syntax for no good reason, regardless of whether or not the syntax is actually appropriate for the medium.

I think developing a more suitable, Designed for HTML, syntax that is able to express RDF semantics, and which is easier to use and understand for the vast majority of authors that have largely rejected RDFa up to this point, should be a more important victory for the semantic web, than it is to continue pushing the overly complicated RDFa syntax. In other words, get over the Not Invented Here syndrome and accept Microdata as a much needed improvement.
Comment by Lachlan Hunt on 2010-01-27 14:50:44 UTC
@Lachlan I don't want to start the whole debate again here.

I do want to point out some of the problems I see at this point in the process, so... *puts on flame suit*

First: Tool support, and will it work how we think?

Microdata parsers: http://www.google.com.au/search?q=microdata+parser&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a

I can only find one popular result at this time, a perl CPAN package.

RDFa parsers: http://www.google.com.au/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-GB%3Aofficial&hs=RyG&q=rdfa+parser&btnG=Search&meta=&aq=f&oq=
Oodles of results.

What I draw from these extremely limited samplings is basically: (1) A fair few people are out there consuming RDFa. (2) There are far fewer people out there who have parsed microdata.

What are the challenges that will be faced by implementors of microdata? What are the deficiencies in microdata which aren't yet apparent to us? What are the hurdles publishers must learn about?

In my view, RDFa has answers for these questions; and is proven in the field to some degree.

Real world examples: Google / Yahoo indexing RDFa good relations / product data; publishers like Best Buy rendering their information as RDFa.

At the moment, there are few implementors of microdata on a large scale that I'm aware of.

Second: Community
With RDFa, it's got the bulk of all of the ontologies/vocabularies (ie foaf, dc, good relations) that exist from the RDF/Semantic Web world sitting right behind it.

The people who create these kinds of ontologies in the current semantic web field are researchers, academics, and pedantic people. They are generally fine with RDF/XML, triples, etc. Being the kind of people they are, they also love the linked data tool stack.

When these people make efforts to reach out, document and share their work; they are unlikely to choose microdata to publish their tutorials in. Particularly due to exactly this kind of discussion that has been taking place: a very adversarial one.

What this means is that web authors who want to publish structured data won't simply be able to google "microdata goodrelations" and get meaningful results at the drop of a hat.

Real world example: Microformats vs RDF, RDFa + Ontologies
http://microformats.org has a handful of microformats documented - hCal, hCard, XFN etc. Total: Less than 20 de facto standards.
These have gained widespread adoption, and that's great.

RDF, RDFa + Ontologies Swoogle has over 10,000 ontologies indexed.
The most popular ontologies/vocabularies, off the top of my head:
FOAF, Music Ontology, Geo, Dublin Core, RSS 1.0, Good Relations, SIOC, BioRDF, etc - each has millions of triples, and many implementors/publishers.

For me; this means I search for "gene data rdfa"; I get the answers I'm looking for, and the same answers everyone else who has an interest in this gets. I can't do that with microformats, because no one has written a gene data microformat.
I see it getting worse with microdata. Because of the interaction between the RDF(a) community and the HTML5 community; I don't see it being possible in the next 5 years to do that simple google search ("gene data microdata") and leverage the existing work of others.

The biggest problem I have with that: making standards is *hard* work. To throw away the linked data / semantic web folks work of the last few years and start over strikes me as a tremendous expenditure of energy for little gain.

Third: Target audience
Microdata, RDFa: Both want to put structured data into human readable documents.

Who are the main groups who want to do this:
* Everyone involved in the Linked Data movement. They want to publish... everything!
* Everyone involved with Microformats. They want to publish everything too: but starting off with the most common (contact, calendar, product, etc)
* Anyone who has ever put an infobox on a wikipedia article. They want to publish everything too.
* Companies: They want to publish product, contact information.
* Governments: They want to publish contact information, statistics, geo and other open data.
* Researchers: They want to publish research data (ie: bio data, chemistry data).
* Geonames/Open Street Mappers/etc: Geo data.
* Flickr users: CC photos, tags, etc.
* Indie Music stores: Playlists, licensing data, product data

From what I've seen, the linked data community/rdfa community has been shunned a lot during this process.
The microformats community has had a lot better luck.

What I'm concerned about is that the other groups aren't being given their chance - each is being proxied by the linked data / microformats crowds.

To push the science angles: RDFa has a lot better chance of modelling scientific data, government statistical data, geodata, and so on and so forth because the communities already exist, the tutorials already exist, the mailing lists already exist, the code already exists.

Microdata has a pretty good chance to re-do microformats (contact, calendar, etc) - but what's next? How do members of the target audiences start putting their specialized data out there? Where do they find the vocabularies?

Real world examples: XML: Meant to be the silver bullet of information exchange. Without vocabularies; it failed to achieve most of its promises beyond things like RSS 2.0, Atom; and corporate things like XBRL.

It turns out the best way to exchange information that means something (no matter where you find it) for XML is RDF/XML + Vocabularies. It also turns out that for 1-1 relationships where semantics don't matter, everyone else uses JSON.

If microdata can't become another word for "semantic web" (and so, when you search for microdata + your problem here, provide answers), it risks becoming something like XML. If that happens, we'll see communities forming, creating data which can't be easily combined ("Oh no, my RSS 2.0 feed can't be combined with a Creative Commons licencing snippet until someone writes a spec on how the two bits of microdata should sit together").

Well, that's it, I'm out of steam. Whew, time to get out of the flamesuit, it's itchy in here.
Comment by Daniel O'Connor on 2010-01-27 23:37:50 UTC