finally a bnode with a uri

Could Microdata work better for me than RDFa?

Just had a quick look at the Microdata proposal, wondering about its pros and cons.
I've always had my little issues with RDFa, mainly for personal reasons. I'm repeating them here (for the last time, promised, don't want to trigger another flame war):
  • I personally don't like the amount of new attributes and their names (about, resource, typeof, and property are at least as inconsistent as RDF/XML's tokens).
  • I've written an RDFa parser, but still don't really understand the processing model. RDFa does the job of course, and it's been specified by smart people I respect, but to me it just still feels a little too complicated. I often have to utilize an extraction service to verify the triples resulting from a snippet, and I've seen the creators of RDFa do the same.
    One reason for being less intuitive than hoped is the fact that adding an attribute to some existing snippet can easily change the entire meaning of nested information. This makes it tricky to incrementally add structure to already tested and approved RDFa (an unnoticed @rel or @typeof may add an unwanted blank intermediate node, for example, and you can have any combination of RDFa attributes on a single node).
  • I consider structured blogging a central use case for RDF in HTML, yet it's not fully supported by RDFa: RDFa does not allow sub-structures in XML Literals (for security/triple injection reasons, IIRC), so you can't extract a post body (including HTML markup) and also get the annotations encoded in the body (like reviews or events).
  • (Reliable) copy and paste is not possible when prefix definitions can be kept separate from annotations. This is relevant to some of the apps I'm working on, and it took me quite some time to admit that (intuitively desirable) URI abbreviations in HTML do have negative practical implications. It depends on the use case, but it also needs some experience to realize this, as the pro-prefix argument is practically motivated as well. (I started playing with RDF-ish copy & paste rather early, if that makes this conclusion more credible).
  • The xmlns:prefix mechanism doesn't work nicely with my development environment. This is perhaps a silly argument, but for me personally it is important to see that green little "0 errors" indicator in my browser while I'm creating sites. It was not hard to extend the Firefox validator extension with support for new attributes, but there was no clean way to make it accept xmlns:prefix. Spotting true errors in the dozens of RDFa-related complaints is annoying.
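
The copy-and-paste point can be made concrete with a small Python sketch. It assumes a heavily simplified CURIE expansion (a plain prefix-to-namespace lookup, not RDFa's full rules). The function name expand_curie and the sample foaf mapping are illustrative only.

```python
from typing import Dict, Optional


def expand_curie(curie: str, prefixes: Dict[str, str]) -> Optional[str]:
    """Expand 'prefix:local' using the prefix mappings currently in scope.

    Simplified assumption: a plain dictionary lookup, not RDFa's full rules.
    Returns None when the prefix is not declared in the given scope.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + local


# Prefix mapping declared on some ancestor element of the original page.
page_scope = {"foaf": "http://xmlns.com/foaf/0.1/"}
# A snippet pasted elsewhere carries the annotation but not the declaration.
pasted_scope: Dict[str, str] = {}

in_page = expand_curie("foaf:name", page_scope)
after_paste = expand_curie("foaf:name", pasted_scope)
```

In the original page the abbreviation resolves to a full URI. In the pasted snippet the same attribute value can no longer be expanded, because the declaration stayed behind.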

Having said that, if this little list is all I can come up with, then RDFa is probably a pretty solid and usable spec. I could easily write a list of things I find flawed in RDF/XML, or even SPARQL, my favorite RDF technology. And there is another good reason why I should tend towards using RDFa: Lack of proper alternatives. I still think it would be possible to create a cross-doctype solution. eRDF and my own poshRDF experiment show that it's possible, but so far these approaches are incomplete RDF-wise, and I wouldn't have the energy or funds to build a community to develop things further (and again, my arguments are motivated by personal use cases and habits, so there isn't a large overlap with other people's requirements anyway).

Nevertheless, the new "Microdata" proposal is currently being discussed, so it might be worth having a look and comparing it with my RDFa issue list above. I only had a quick scan, so I may have gotten some details wrong:
  • It only introduces two new (mandatory) attributes: "item" and "itemprop". "item" can be used to type resources. RDFa's "about" can be re-used for URI-identified items. That sounds compact and neat so far.
  • "item" is mandatory to indicate the boundary of a resource description. This makes accidental triples much less likely to happen than with RDFa. For any "itemprop", you just have to walk up the DOM tree to find the container item, which makes both human- and code-based parsing easy.
  • Structured blogging? Aww, not really. While you can at least choose between raw markup or structured values in RDFa, Microdata only supports flat key-value pairs where the value is a node's textContent and won't contain tags (if I read the draft correctly). I don't really need datatypes and languages, but I definitely want RDF triples where the object can contain HTML markup (wiki blobs with embedded annotations are another example).
  • Copy & paste of source code or from/to contenteditable sections is more reliable than with RDFa because there is no prefix mechanism.
  • It'd be possible to make the Firefox validator eat the new Microdata attributes without complaining, but I'm not sure how likely it is to have Microdata support in the official distribution anytime soon. Marc Gueury writes that validating HTML5 may require a new sort of validator, so switching to HTML5 may make things worse instead of better for me, development-wise.
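
The "walk up the DOM tree" rule from the list above can be sketched in a few lines of Python. This is only my reading of the draft as summarized here, not a conforming extractor: it assumes well-formed markup, ignores the "subject" attribute, and uses the node's text content as the value. The item type com.example.review and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Hypothetical snippet; the item type value is invented for this example.
SNIPPET = """
<div item="com.example.review">
  <h2 itemprop="title">Great book</h2>
  <p>Rated <span itemprop="rating">5</span> of 5.</p>
</div>
"""


def container_item(node: ET.Element,
                   parents: Dict[ET.Element, ET.Element]) -> Optional[ET.Element]:
    """Walk up from node until an ancestor with an 'item' attribute is found."""
    current = parents.get(node)
    while current is not None:
        if "item" in current.attrib:
            return current
        current = parents.get(current)
    return None


def extract_pairs(markup: str) -> List[Tuple[str, str, str]]:
    """Return (item type, property name, text value) for every itemprop."""
    root = ET.fromstring(markup.strip())
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    pairs = []
    for node in root.iter():
        name = node.get("itemprop")
        if name is None:
            continue
        owner = container_item(node, parents)
        if owner is None:
            continue  # itemprop outside of any item: no accidental triple
        value = "".join(node.itertext()).strip()  # text only, tags are dropped
        pairs.append((owner.get("item"), name, value))
    return pairs


pairs = extract_pairs(SNIPPET)
```

An itemprop without an enclosing item is simply skipped, which is the "fewer accidental triples" property. The textContent-style value also shows why markup inside a value is lost.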

I recently watched a short section of a TV fortune-teller show where desperate people could dial in to get their questions asked. The lady who called asked "Will I find a new love?", and the fortune-teller looked into her cards (very slowly, of course, given the 3 EUR/minute rate), then slowly lifted her head, looked straight into the camera and articulated her findings: "I see a definite Maybe."

I guess this awesome universal answer also works for my opening question. There simply is no ideal solution. I like the item/itemprop idea, but I'd need to add a hack for markup values (e.g. by adding an item="...XMLLiteral" container and then converting these items to XML nodes). But then I can just add a simpler hack to my RDFa extractor to deep-parse XMLLiterals, so this doesn't justify a whole new spec. The copy/paste problem is not too urgent any more, as Linked Data enables nifty copy-by-reference instead of copy-by-value.

It's generally a little surprising to see that Microdata proposal. For months, the HTML5 opinion makers argued against user-defined markup structures, and now they have created a completely new spec that not only extends RDFa's possibilities to identify resource types and relations, but also seems to introduce a redundant serialization for selected microformats.

Anyway, for the sake of convergence and less work, I think I still prefer (a subset of) RDFa, if only there was a way to get rid of CURIEs (who wants an abbreviation mechanism whose acronym can't even be properly expanded? ;). And an alternative for the validation pain could be a simple, locally installed validator, accessible through a Ubiquity script. When I think about it, I mainly just need well-formedness and some attribute checks. A Ubiquity script could directly show HTML errors and also extracted triples, and maybe even do some triple sanity checks, too. But then this setup would work just as well for Microdata. Ah well..
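
The "well-formedness and some attribute checks" idea could look roughly like the following Python sketch. It is not a real HTML validator and not a Ubiquity script: both attribute whitelists are deliberately tiny and hypothetical, and the function name check is invented for this example.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, deliberately tiny whitelists; a real check would need far more.
KNOWN_HTML_ATTRS = {"id", "class", "href", "title", "lang"}
ANNOTATION_ATTRS = {"about", "resource", "typeof", "property", "rel",
                    "item", "itemprop"}


def check(markup: str) -> List[str]:
    """Return a list of problems: a parse error or unexpected attributes."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        # Not well-formed: report the parser message and stop here.
        return [f"not well-formed: {err}"]
    allowed = KNOWN_HTML_ATTRS | ANNOTATION_ATTRS
    problems = []
    for node in root.iter():
        for attr in node.attrib:
            if attr not in allowed:
                problems.append(f"<{node.tag}>: unexpected attribute '{attr}'")
    return problems
```

An empty list plays the role of the green "0 errors" indicator. Since the annotation attributes sit in the same whitelist, the check treats RDFa and Microdata markup alike.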

Comments and Trackbacks

Could you elaborate on what you want XMLLiterals for? I didn't include them in microdata just because none of the use cases I had needed it, but if there's a need then we should have it (either now, or in a future version).

Also, in the interests of correctness, microdata also has a "subject" attribute that lets you put the properties somewhere other than as a descendant of the "item". Makes the parsing not quite as simple, but hopefully still simple enough.
Comment by Ian Hickson on 2009-05-15 20:56:30 UTC
I've been watching the HTML5 microdata saga unfold and this is the best description of problems in RDFa I've seen so far, and it's helped this developer understand what's happening.

I think these issues could be addressed for many users by follow-up innovators. I don't see the rush to solve every problem perfectly right away.

GRDDL seems like an overlooked solution to problems of repetition in syntax and could be a way to automatically repeat sections of the expressed model as well (post as XML and post as triples). The problems of copy and paste and the making of accidental changes can be addressed in IDEs, by constraint validation tools, by automated testing or just by breaking up the data across related pages in a web like way (Linked Data?) until the requirement is trivial. Your structured blogging requirement could also be solved for many blogs by breaking up the triples across index page and the post itself.

I think RDFa will be found to address a large portion of the use cases but it won't be perfect. At the same time I think having the W3C effectively ditch RDFa will do a lot of harm not just to RDFa but to RDF and to the credibility of the W3C as leaders of the community.

I've just started with this stuff and have more plans yet, so it would also be very very annoying for me!
Comment by Simon Gibbs on 2009-05-15 22:34:50 UTC
Remember that microformats achieve attribute compactness by building on misuse of little-used existing attributes. "October 5" is just not an abbreviation, even if http://microformats.org/wiki/hcalendar says that <abbr class="dtstart" title="2007-10-05">October 5</abbr> is a sensible way to mark up information.

Processing instructions (in the sense of http://www.w3.org/TR/2008/REC-xml-20081126/#sec-pi, a bit of SGML legacy markup) don't play any role in RDFa, which is a good thing. Could you expand on that comment a little?
Comment by Bob DuCharme on 2009-05-16 01:47:19 UTC
Ian, thanks for the "subject" note. It's a handy addition and still simple enough, IMO (I've added a "mandatory" to the text). With regard to XMLLiteral, two use cases that come to mind are blog posts and wiki pages with embedded microdata. It should be possible to losslessly extract the full post body (à la rss:description vs. content:encoded in RSS), but also the embedded structures. In a semantic wiki, (human-oriented, formatted) content items can be enriched with data elements. These HTML snippets should be extractable and re-distributable without losing HTML markup.

Simon, thanks for the GRDDL hint, it's indeed often overlooked. I think it would have more success if there was a processing context other than just XSLT. XSLT-only doesn't seem to attract enough developers.

Bob, I've talked and written a lot about microformats, trade-offs for an "ideal, clean solution", how MFs compare to eRDF and RDFa, etc, but this post is actually not about their shortcomings (the abbr issue is an acknowledged problem and HTML5/Microdata even provides a solution). Thanks for pointing out my incorrect use of the term "processing instructions" (ah, symbols and semantics ... ;), I meant RDFa's (normative) Processing Model and rules which are pretty complicated (I've tweaked the text). They don't really allow authors to fully understand how RDFa works (at least not me). I found the RDF/XML spec more helpful in this regard, for example. Maybe it's a disadvantage to be both implementor and author here, but it didn't use to be.
Comment by Benjamin Nowack on 2009-05-16 07:44:42 UTC
It may be enough to recognize that an approach similar to GRDDL has value for addressing some use cases, and - at another level of abstraction - that one technical approach alone may not be enough.

By "similar to" I mean one that also places some kind of transformation at another URL.
Comment by Simon Gibbs on 2009-05-18 19:25:56 UTC