An updated (and customizable) comparison of the different approaches for semantically enhancing HTML.
Update (2006-02-13):
In order to avoid further flame wars with RDFa folks, I've adjusted the form so that it no longer shows my personal priorities as default settings (here they are if you are interested, it's a 48-42-40 ranking for MFs, eRDF, and RDFa respectively). All features are set to "Nice to have" now. As you can see, for these settings, RDFa gets the highest ranking (I *said* the comparison is not biased against RDFa!). If you disable the features related to domain-independent resource descriptions, MFs shine; if you insist on HTML validity, eRDF moves up; and so on. It's all in the mix.
After a comment of mine on the Microformats IRC channel, SWD's Michael Hausenblas asks for the reason why I said that I personally don't like RDFa. Damn public logs ;) OK, now I have to justify that somehow without falling into rant mode again...
I already wrote a little comparison of Microformats, Structured Blogging, eRDF, and RDFa some time ago, so this sounds like a good opportunity to see how things have evolved during the last 8 months. Back then I concluded that both eRDF and RDFa were preferred candidates for SemSol, but that RDFa lacked the necessary deployment potential due to not being valid HTML (as far as any widespread HTML spec is concerned).
I excluded the Structured Blogging initiative from this comparison, as it seems to have died a silent death. (Their approach of redundantly embedding microcontent in script tags apparently didn't convince the developer community.) I also excluded features which are equally available in all approaches, such as visible metadata, general support for plain literals, being well-formed, no negative effect on browser behaviour, etc.
Pretending to be constructive, and in order to make things less biased, I embedded a dynamic page item that allows you to create your own, tailored comparison.
The default results reflect my personal requirements (and hopefully answer Michael's question). As your mileage most probably varies, you can just tweak the feature priorities. (The different results are not stored, but the custom comparisons can be bookmarked.) Feel free to leave a comment if you'd like me to add more criteria.
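For the curious, here is a rough sketch of how such a priority-weighted comparison can be computed. It is not the code behind the form. The labels "I don't care" and "Nice to have" come from the form, but the numeric weights, the "Important" level, and the tiny support matrix are assumptions made up for illustration.

```python
# Priority labels mapped to numeric weights. The numbers and the
# "Important" level are assumptions, not the form's real settings.
PRIORITY_WEIGHTS = {"I don't care": 0, "Nice to have": 1, "Important": 2}

# Hypothetical support matrix: 1 = supported, 0 = not supported.
# Illustrative values only, not the data used by the actual comparison.
SUPPORT = {
    "Microformats": {"HTML validity": 1, "Custom extensions": 0, "Blank nodes": 0},
    "eRDF": {"HTML validity": 1, "Custom extensions": 1, "Blank nodes": 0},
    "RDFa": {"HTML validity": 0, "Custom extensions": 1, "Blank nodes": 1},
}


def score(approach, priorities):
    """Sum the weights of all features the approach supports."""
    return sum(
        PRIORITY_WEIGHTS[priorities[feature]] * supported
        for feature, supported in SUPPORT[approach].items()
    )


def rank(priorities):
    """Return (approach, score) pairs, best score first."""
    scores = {approach: score(approach, priorities) for approach in SUPPORT}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    priorities = {
        "HTML validity": "Important",
        "Custom extensions": "Nice to have",
        "Blank nodes": "Nice to have",
    }
    for approach, total in rank(priorities):
        print(f"{approach}: {total}")
```

With these made-up values, raising the priority of "HTML validity" puts eRDF on top, while setting it to "I don't care" favours RDFa. A feature weighted zero simply drops out of the sum.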
Bottom line: For many requirement combinations a single solution alone is not enough. My tailored summary suggests, for example, that I should be fine with a combination of Microformats and eRDF. What does your preferred solution mix look like?
Comments and Trackbacks
it's quite biased too
I disagree. My preferences obviously differ from yours, but the comparison itself should be un-biased as you can disable any feature you don't consider relevant.
Putting things like "validation" on the same level as "extensible" is a bit confusing
That's exactly why I added these custom priorities. Just select "I don't care", and validation won't affect the calculated result. Different priorities also lead to different ratings, so the criteria are *not* on the same level, unless you specify them as such.
especially since RDFa can be implemented only with extra attributes
Which is exactly the point. The other approaches don't need extra attributes which are invalid in any deployed HTML spec. Again, if it's a non-issue for you, just set the priority to zero ("I don't care"). With regard to non-attribute features: If RDFa can be implemented with just attributes, why are the other serialization options still part of the syntax? If we have learned anything from the RDF/XML deployment issues, it's that we shouldn't put unnecessary "optimizations" into a spec.
(1) re-mixing different *existing* vocabularies
That's what I meant by "Custom extensions" (feature No. 3). Maybe I should rename it, but from an RDF POV, it's irrelevant whether a vocabulary exists or not.
(2) making deeper statements like "this page has an author whose name is Ben" (eRDF can't do that)
Sorry, that's not true. Please have a look at the very first example in the eRDF doc ;)
(3) making statements about other URLs (eRDF can't do it, MFs can do it only in a domain-specific way.)
True for MFs, but they are domain-specific by design. Not true for eRDF: Creating relations to other docs can be done via rel/rev in eRDF, describing arbitrary resources via a simple owl:sameAs statement.
Here's a practical example: which of these technologies can express Flickr's latest machine tags regarding an embedded photo? Only RDFa.
Sorry, that may be true for MFs, but it's wrong again for eRDF. Here is a flickr machine tag snippet:
[a href="/photos/baddie80/tags/geolon767979417/" class="Plain"]geo:lon=7.67979417[/a]
Re MFs, you could only add a rel-tag, which might still be usable for certain tags (e.g. camera models). In eRDF, you'd have to add a span around the 7.67979417. Plus an owl:sameAs. Plus a namespace definition. That's it. Could it be that you guys haven't really looked at eRDF yet?
What you forget to mention is that you lose self-containment in eRDF the moment you need a LINK rel/rev: now you have to modify the HEAD of the document as well as the BODY. And that's not doable when you want to write a simple blog entry, when you're using your average content management system, when you want copy-and-paste, etc...
You also can't do what I specifically mentioned (deep structure) without naming the intermediate node, which is really annoying and almost a deal breaker if you want to, for example, plop down your bibtex entries in the page.
Regarding the flickr machine tag: fair enough, that can be made to work in eRDF, although you're now stuck declaring a custom namespace in the HEAD, which is hardly self-contained by any definition.
Regarding the multiple serializations... we're working on simplifying, and you're right that there shouldn't be multiple ways to do the same thing. It takes time to simplify, just like it takes time to build up the right features.
That said, the pattern is very clear: every time you stretch to do something a bit more complex with eRDF or MF, you end up giving up something significant. RDFa is meant to not give up on anything, and for that we added a few attributes. It is interesting that you're willing to give up so much just so you can validate, when RDFa is already conformant (if your browser conforms to the spec, RDFa won't break anything.)
As for being biased: come on, claiming you're not is just asking for trouble. You put "custom extensions" as a single item on the list, when it really needs to be broken up into multiple items so people can express which parts they really want and which they care less about. They may not want their own vocab, but they may want to mix existing ones. You're right that in RDF that's the same thing, but in MF it's definitely not. What about "won't break browser rendering?" Why isn't that an important feature that, for just about anyone, outweighs validation?
I simply don't understand some of your other judgments. RDFa is clearly more self-contained than eRDF, given the declaration of namespaces. You also handwave that microformats can "support emerging standards like nofollow" without explaining how they would do such a thing and who decides, given that no MFs out there use a profile URL to indicate what should be parsed. I'm also still looking for how tidy breaks RDFa.
What it comes down to in the end is this: RDFa doesn't validate with current validators. But, given that it's just extra attributes, is there really a problem? Go back to fundamentals and ask yourself: is HTML with a few extra attributes really broken? Should it really be that, if you add attributes for purposes that the browser can ignore (look at the Dojo toolkit), suddenly you're broken? I don't think so. And if you believe in an extensible web, then neither should you.
What you forget to mention is that you lose self-containment in eRDF the moment you need a LINK rel/rev: now you have to modify the HEAD of the document as well as the BODY. And that's not doable when you want to write a simple blog entry, when you're using your average content management system, when you want copy-and-paste, etc...
Please look at the different feature values, they already reflect that eRDF does not support full self-containment. See below for a comment on the head/body non-issue.
You also can't do what I specifically mentioned (deep structure) without naming the intermediate node, which is really annoying and almost a deal breaker if you want to, for example, plop down your bibtex entries in the page.
Right, and covered by feature "Explicit support for blank nodes". If it's a requirement for you, just tweak the priority setting. After your previous comment I've also added two additional features: "Arbitrary resource descriptions", and "Explicit syntactic means for arbitrary resource descriptions" which will increase RDFa's score, depending on your priorities.
Regarding the flickr machine tag: fair enough, that can be made to work in eRDF, although you're now stuck declaring a custom namespace in the HEAD, which is hardly self-contained by any definition.
No, eRDF allows you to define namespaces in the body section. I'm not sure if Ian added that to the spec already, though. You simply use the a-tag to do so. Again, this comparison is not meant to be "either-xor" (quite the opposite, actually), so there is no point in poking artificial holes into one solution. Try to see things positively: both eRDF and RDFa allow you to describe flickr's machine tags by using custom namespaces, which should be covered by the feature matrix.
Regarding the multiple serializations... we're working on simplifying, and you're right that there shouldn't be multiple ways to do the same thing. It takes time to simplify, just like it takes time to build up the right features.
Great! I know that you are working hard to improve RDFa. Let me repeat that this comparison isn't meant to bash RDFa. It's just for evaluating *personal* priorities against the different approaches.
That said, the pattern is very clear: every time you stretch to do something a bit more complex with eRDF or MF, you end up giving up something significant. RDFa is meant to not give up on anything, and for that we added a few attributes. It is interesting that you're willing to give up so much just so you can validate, when RDFa is already conformant (if your browser conforms to the spec, RDFa won't break anything.)
Again, as the comparison shows, there are different offers on the table, each with its own advantages and disadvantages. Some people don't need more complex features, others, like you and me, do. Having to define namespaces may be a deal-breaker for one developer, Tidy alerts may be one for someone else. This can all be tailored in the form above.
As for being biased: come on, claiming you're not is just asking for trouble. You put "custom extensions" as a single item on the list, when it really needs to be broken up into multiple items so people can express which parts they really want and which they care less about. They may not want their own vocab, but they may want to mix existing ones. You're right that in RDF that's the same thing, but in MF it's definitely not.
MFs don't support custom extensions at all. eRDF and RDFa support them in the same way. Splitting feature #3 into three separate items wouldn't really have an effect on the overall score. And of course I'm biased. That's why I added the custom priorities, so that you can be biased, too. Just add another comment with a link to your customized comparison result.
What about "won't break browser rendering?" Why isn't that an important feature that, for just about anyone, outweighs validation?
As I said in the intro, this is not a feature that makes any difference with respect to the offered solutions, and is thus kept out of the feature list. Neither MFs, nor eRDF, nor RDFa break browser rendering. No matter what priority you'd pick for it, it wouldn't change the qualitative ranking.
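A quick way to convince yourself of this is a small sketch like the one below. The scores are the 48-42-40 ranking from my personal settings mentioned in the update. The helper names and the additive scoring model are assumptions for illustration, not the form's actual code.

```python
def rank(scores):
    """Return approach names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def with_shared_feature(scores, weight):
    """Add a feature that every approach supports equally.

    Assumes an additive model: each approach gains the same weight.
    """
    return {name: total + weight for name, total in scores.items()}


# Scores from the 48-42-40 ranking quoted in the update.
base = {"Microformats": 48, "eRDF": 42, "RDFa": 40}

if __name__ == "__main__":
    # Whatever priority is chosen for the shared feature, the order stays put.
    for weight in (0, 1, 2, 5):
        print(weight, rank(with_shared_feature(base, weight)))
```

Since every total is shifted by the same amount, the order of the three approaches cannot change, whatever weight is picked for "won't break browser rendering".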
I simply don't understand some of your other judgments. RDFa is clearly more self-contained than eRDF, given the declaration of namespaces.
The point is that either you *can* reliably copy chunks, or you can't. From the publisher's side, you either *can* produce self-contained snippets, or you can't. For the latter, the result would again be "yes" for all three approaches, so it's excluded from the feature matrix.
You also handwave that microformats can "support emerging standards like nofollow" without explaining how they would do such a thing and who decides, given that no MFs out there use a profile URL to indicate what should be parsed.
MFs are a set of hard-coded conventions. The MFs crowd uses mailing lists and a wiki to develop new formats. They are not restricting themselves to a fixed syntax (only at the lowest HTML level), so they can embrace new patterns, while eRDF and RDFa are bound to their syntactical constructs (class, rel, rev, about, property, etc.). But hey, there *is* another feature lurking in here: "Stable syntax specification" (now added to the feature list).
I'm also still looking for how tidy breaks RDFa.
Try putting a meta or link tag into the body section and then run the HTML through Tidy. They will be moved to the head. This is related to the point you made above: if RDFa is going to drop support for link and meta in the body, this is a non-issue and I'll happily remove it from the comparison.
What it comes down to in the end is this: RDFa doesn't validate with current validators. But, given that it's just extra attributes, is there really a problem? Go back to fundamentals and ask yourself: is HTML with a few extra attributes really broken? Should it really be that, if you add attributes for purposes that the browser can ignore (look at the Dojo toolkit), suddenly you're broken? I don't think so. And if you believe in an extensible web, then neither should you.
I never said that improving the HTML spec is wrong. The right question is whether it's necessary for all use cases. It may be helpful for some, at the cost of having to change running systems. I also didn't say that there is no value in RDFa (I still get (almost) the same ranking for MFs and RDFa after all). Just tweak the priorities above and you'll probably end up with a combination that suggests using RDFa alone. But it's simply not realistic to think there will ever be only one single solution. Even a perfect proposal for RDF-in-HTML probably wouldn't have kept others from creating a namespace-free approach. And that's basically all I tried to show with the comparison. Everyone has different priorities. For you, RDFa is clearly all that's needed, for others it's MFs, for me it's currently a combination of MFs and eRDF, in the future it might be MFs and RDFa, or maybe RDFa alone.
you can use the rev attribute to switch subject and object. For things beyond that, you need owl:sameAs, which carries the same semantics as RDFa's "about" (although RDFa's syntax is clearly more compact for that feature)
There could even be a (slight) advantage for using sameAs, as it allows you to refer to external things in terms of local ids (i.e. interlink different page sections that talk about external, but connected resources) which can lead to more compact markup. In general, however, RDFa's explicit "about" construct is surely more intuitive.
I put up a small demo page that shows you data from eRDF fragments that are linked to from the page. This is only really practical because of the close tie between the html page and the data.