finally a bnode with a uri

Posts tagged with: rdfa

Schema.org - Threat or Opportunity?

Some thoughts about the impact of Schema.org
I only wanted to track SemTech chatter but it seems all semantics-related tweet streams are discussing just one thing right now: Schema.org. So I apparently will have to build a #semtech filtering app, but I couldn't resist and had a close look at Schema.org, too. And just like everybody else, I'll join the fun of polluting the web with yet another opinion about its potential impact on the Semantic Web initiative and related efforts.

What exactly is Schema.org?

  • It is a list of instructions for adding structured data to HTML pages.
  • Webmasters can choose from a long, but finite list of types and properties.
  • Data-enhanced web pages trigger richer displays in Google/Bing/Yahoo search result pages.

Why the uproar?

  • Schema.org proposes the use of Microdata, a rather new RDF format that was not developed by the RDF community.
  • Schema.org introduces a new vocabulary which doesn't re-use terms from existing RDF schemas.

Who can benefit from it?

  • The web, because the simple template-like instructions on schema.org will boost the amount of structured data, similar to Facebook's Open Graph Protocol.
  • The semantic web market, by offering complementing as well as extending/competing solutions.
  • SEO people, because they can offer their service with less effort.
  • Website owners, who can more reliably customize their search engine displays and increase CTRs.
  • Possibly HTML5 (doctype) deployment, because the supported structures are based on HTML5's Microdata.
  • Verticals around popular topics (Music, Food, ...) because the format shakeout will make their parser writers' lives easier.
  • Verticals who manage to successfully establish a schema.org extension (e.g. Job Offers).
  • The search engine companies involved, because extracting (known) structures can be less expensive and more accurate than NLP and statistical analysis. Controlling the vocabulary also means being able to tailor it to semantic advertising needs; integrating the schema.org taxonomy with AdWords would make a lot of (business) sense. And finally, the search engines can more easily generate their own verticals now (as Google has already successfully done with shopping and recipe browsers), making it harder for specialized aggregators to gain market share.
  • Spammers, unless the search engines manage to integrate the structured markup with their existing stats-based anti-spam algorithms.

Who might be threatened and how could they respond?

  • Microformats and overlapping RDF vocabularies such as FOAF (unlikely) or GoodRelations, which Schema.org already calls "earlier work". Even if they continue to be supported for the time being, implementers will switch to schema.org vocabulary terms. One opportunity for RDF schema providers lies in grounding their terms in the schema.org taxonomy and highlighting use cases beyond the simple SEO/Ad objectives of Schema.org. RDF vocabs excel in the long tail, and there are many opportunities left (especially for non-motorcycle businesses ;-). This will work out best if applications finally emerge that utilize these advanced data structures. If the main consumers continue to be search engines, there is little incentive to invest in higher granularity.
  • The RDFa community. They think they are under attack here, and I wonder if Manu is overreacting perhaps? Hey, if they had listened to me they wouldn't have this problem now, but they had several reasons to stick to their approach and I don't think these arguments get simply wiped away by Schema.org. They may have to spend some energy now on keeping Facebook on board, but there are enough other RDFa adopters that they shouldn't be worried too much. And, like the RDF vocab providers, they should highlight use cases beyond SEO. The good news is that potential spam problems, which are more likely to occur in the SEO context, will now get associated with Microdata, not RDFa. And the Schema.org graph can be manipulated by any site owner while Facebook's interest graph is built by authenticated users. Maybe the RDFa community shouldn't have taken the SEO train in the first place anyway. Now Schema.org simply stole the steam. After all, one possible future of the semantic web was to creatively destroy centralized search engines, and not to suck up to them. So maybe Schema.org can be interpreted as a kick in the back to get back on track.
  • The general RDF community, but unnecessarily so. RDFers kicked off a global movement which they can be proud of, but they will have to accept that they no longer dictate what the semantic web is going to look like. Schema.org seems to be a syntax fight, but Microdata maps nicely to RDF, which RDFers often ignore (that's why schema.rdfs.org was so easy to set up). The real wakeup call is less obvious. I'm sure that until now, many RDFers didn't notice that a core RDF principle is dying. RDFers used to think that distinct identifiers for pages and their topics are needed. This assumption was already proved wrong when Facebook started their page-based OGP effort. Now, with Schema.org's canonical URLs, we have a second, independent group that is building a semantic web truly layered on top of the existing web, without identifier mirrors (and so far without causing any URI identity crisis). This evolving semantic web is closer to the existing web than the current linked data layer, and probably even more compatible with OWL, too. There is a lot we can learn. Instead of becoming protective, the RDF community should adapt and simplify their offerings if they want to keep their niches relevant. Luckily, this is already happening, as e.g. the Linked Data API demonstrates. And I'm very happy to see Ivan Herman increasingly speaking/writing about the need to finally connect web developers with the semantic web community.
  • Early adopters in the CMS market. Projects like Drupal and IKS have put non-trivial resources into integrating machine-readable markup, and most of them are using RDFa. Microdata, in my experience, is easier to tame in a CMS than RDFa, especially when it comes to JavaScript operations. But whether semantic CMSs should add support for (or switch to) Schema.org microdata and their vocabulary depends more on whether they want/need to utilize SEO as a (short-term) selling proposition. Again, this will also depend on application developer demands.

What about Facebook?

Probably the most interesting aspect of this story: what will Facebook do? Their interest graph combined with linked data has big potential, not only for semantic advertising. And Facebook is interested in getting as many of their hooks into websites as possible. Switching to Microdata and/or aligning their types with Schema.org's vocabulary could make sense. Webmasters would probably welcome such a consolidation step as well. On the other hand, Facebook is known for wanting to keep things under their own control, too, so the chance of them adopting Schema.org and Microdata is rather low. This could well turn into an RSS déjà vu with a small set of formats (OGP-RDFa, full RDFa, Schema.org-Microdata, full Microdata) fighting for publisher and developer attention.

Conclusion

I am glad that Microdata finally gets some deserved attention and that someone acknowledged the need for a format that is easy to write and to consume. Yes, we'll get another wave of "see, RDF is too complicated" discussions, but we should be used to them by now. I expect RDF toolkits to simply integrate Microdata parsers soon-ish (if we're good at one thing then it's writing parsers), and the Linked Data community gets just another taxonomy to link to. Schema.org owns the SEO use case now, but it's also a nice starting point for our more distributed vision. The semantic web vision is bigger than data formats and it's definitely bigger than SEO. The enterprise market which RDF has mainly been targeting recently is a whole different beast anyway. No kittens killed. Now go build some apps, please ;-)

Trice' Semantic Richtext Editor

A screencast demonstrating the structured RTE bundled with the Trice CMS
In my previous post I mentioned that I'm building a Linked Data CMS. One of its components is a rich-text editor that allows the creation (and embedding) of structured markup.

An earlier version supported limited Microdata annotations, but now I've switched the mechanism and use an intermediate, but even simpler approach based on HTML5's handy data-* attributes. This lets you build almost arbitrary markup with the editor, including Microformats, Microdata, or RDFa. I don't know yet when the CMS will be publicly available (3 sites are under development right now), but as mentioned, I'd be happy about another pilot project or two. Below is a video demonstrating the editor and its easy customization options.

Could having two RDF-in-HTMLs actually be handy?

A combination of RDFa and Microdata would allow for separate semantic layers.
Apart from grumpy rants about the complexity of W3C's RDF specs and semantic richtext editing excitement, I haven't blogged or tweeted a lot recently. That's partly because there finally is increased demand for the stuff I'm doing at semsol (agency-style SemWeb development), but also because I've been working hard on getting my tools in a state where they feel more like typical Web frameworks and apps. Talis' Fanhu.bz is an example where (I think) we found a good balance between powerful RDF capabilities (data re-purposing, remote models, data augmentation, a crazy army of inference bots) and a non-technical UI (simplistic visual browser, Twitter-based annotation interfaces).

Another example is something I've been working on over the last few months: I somehow managed to combine essential parts of Paggr (a drag&drop portal system based on RDF- and SPARQL-based widgets) with an RDF CMS (I'm currently looking for pilot projects). And although I decided to switch entirely to Microdata for semantic markup after exploring it during the FanHubz project, I wonder if there might be room for having two separate semantic layers in this sort of widget-based website. Here is why:

As mentioned, I've taken a widget-like approach for the CMS. Each page section is a resource on its own that can be defined and extended by the web developer, it can be styled by themers, and it can be re-arranged and configured by the webmaster. In the RDF CMS context, widgets can easily integrate remote data, and when the integrated information is exposed as machine-readable data in the front-end, we can get beyond the "just-visual" integration of current widget pages and bring truly connectable and reusable information to the user interface.

Ideally, both the widgets' structural data and the content can be re-purposed by other apps. Just like in the early days of the Web, we could re-introduce a copy & paste culture of things for people to include in their own sites. With the difference that RDF simplifies copy-by-reference and source attribution. And both developers and end-users could be part of the game this time.

Anyway, one technical issue I encountered arises when a page contains multiple page items, but describes a single resource. With a single markup layer (say Microdata), you get a single tree where the context of the hierarchy is constantly switching between structural elements and content items (page structure -> main content -> page layout -> widget structure -> widget content). If you want to describe a single resource, you have to repeatedly re-introduce the triple subject ("this is about the page structure", "this is about the main page topic"). The first screenshot below shows the different (grey) widget areas in the editing view of the CMS. In the second screenshot, you can see that the displayed information (the marked calendar date, the flyer image, and the description) in the main area and the sidebar is about a single resource (an event).

Trice CMS editing view

Trice CMS page view with inline widgets describing one resource

If I used two separate semantic layers, e.g. RDFa for the content (the event description) and Microdata for the structural elements (column widths, widget template URIs, widget instance URIs), I could describe the resource and the structure without repeating the event subject in each page item.

To be honest, I'm not sure yet if this is really a problem, but I thought writing it down could kick off some thought processes (which now tend towards "No"). Keeping triples as stand-alone-ish as possible may actually be an advantage (even if subject URIs have to be repeated). No semantic markup solution so far provides full containment for reliable copy & paste, but explicit subjects (or "itemid"s in Microdata-speak) could bring us a little closer.

Conclusions? Err.., none yet. But hey, did you see the cool CMS screenshots?

Microdata, semantic markup for both RDFers and non-RDFers

RDF-in-HTML could have been so simple.
There's been a whole lot of discussion around Microdata, a new approach for embedding machine-readable information into forthcoming HTML5. What I find most attractive about Microdata is the fact that it was designed by HTMLers, not RDFers. It's refreshingly pragmatic, free of other RDF spec legacy, but still capable of expressing most of RDF.

Unfortunately, RDFa lobbyists on the HTML WG mailing list forced the spec out of HTML5 core for the time being. This manoeuvre was understandable (a lot of energy went into RDFa, after all), but in my opinion very short-sighted. How many uphill battles did we have, trying to get RDF to the broader developer community? And how many were successful? Atom, microformats, OpenID, Portable Contacts, XRDS, Activity Streams (well, not really): these are examples where RDFers tried, but failed, to promote some of their infrastructure into the respective solutions. Now: HTML5, where the initial RDF lobbying actually had an effect and led to a native mechanism for RDF-in-HTML. Yes, native, not in some separate spec. This would have become part of every HTML5 book, any HTML developer on this planet would have learned about it. Finally a battle won. And what a great one. HTML.

But no, Microdata wasn't developed by an RDF group, so they voted it out again. Now, the really sad thing is, there could have been a solution that would have served everybody sufficiently well, both HTMLers and RDFers. The RDFa group recently realized that RDFa needs to be revised anyway; there is going to be an RDFa 1.1 which will require new parsers. If they'd swallowed their pride, they would most probably have been able to define RDFa 1.1 as a proper superset of Microdata.

Here is a short overview of RDF features supported by Microdata:
  • Explicit resource containers, via @itemscope (in RDFa, the boundaries of a resource are often implicitly defined by @rel or @typeof)
  • Subject declaration, via @itemid (RDFa uses @about)
  • Main subject typing, via @itemtype (RDFa uses @typeof)
  • Predicate declaration, via @itemprop (RDFa uses @property, @rel, and @rev)
  • Literal objects, via node values (RDFa also allows hidden values via @content)
  • Non-literal objects, via @href, @src, etc. (RDFa also allows hidden values via @resource)
  • Object language, via @lang
  • Blank nodes
I won't go into details why hiding semantics in RDFa will be penalized by search engines as soon as spammers discover the possibilities, why reusing RDF/XML's attribute names was probably not a smart move with regard to attracting non-RDFers, why the new @vocab idea is impractical, or why namespace prefixes, as handy as they are in other RDF formats, are not too helpful in an HTML context. Let's simply state that there is a trade-off between extended features (RDFa) and simplicity (Microdata). So, what are the core features that an RDFer would really need beyond Microdata:
  • the possibility to preserve markup, but probably not necessarily as an explicit rdf:XMLLiteral
  • datatypes for literal objects (I personally never used them in practice in the last 6 years that I've been developing RDF apps, but I can see some use cases)
Markup preservation is currently turned on by default in RDFa and can be disabled through @datatype, so an RDFer-satisfying RDFa 1.1 spec could probably just be Microdata + @datatype + a few extended parsing rules to end up with the intended RDF. My experience with watching RDF spec creation tells me that the RDFa group won't pick this route (there simply is no "Kill a Feature" mentality in the RDF community), but hey, hope dies last.

I've been using Microdata in two of my recent RDF apps and the CMS module of (ahem, still not documented) Trice, and it's been a great experience. ARC is going to get a "microRDF" extractor that supports the RDF-in-Microdata markup below (Note: this output still requires a 2nd extraction process, as the current Microdata draft's RDF mechanism only produces intermediate RDF triples, which then still have to be post-processed. I hope my related suggestion will become official, but I seem to be the only pro-Microdata RDFer on the HTML list right now, so it may just stay as a convention):

Microdata:
<div itemscope itemtype="http://xmlns.com/foaf/0.1/Person">

  <!-- plain props are mapped to the itemtype's context -->
  <img itemprop="img" src="mypic.jpg" alt="a pic of me" />
  My name is <span itemprop="name"><span itemprop="nick">Alec</span> Tronnick</span>
  and I blog at <a itemprop="weblog" href="http://alec-tronni.ck/">alec-tronni.ck</a>.

  <!-- other RDF vocabs can be used via full itemprop URIs -->
  <span itemprop="http://purl.org/vocab/bio/0.1/olb">
    I'm a crash test dummy for semantic HTML.
  </span>
</div>
Extracted RDF:
@base <http://host/path/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
_:bn1 a foaf:Person ;
      foaf:img <mypic.jpg> ;
      foaf:name "Alec Tronnick" ;
      foaf:nick "Alec" ;
      foaf:weblog <http://alec-tronni.ck/> ;
      bio:olb "I'm a crash test dummy for semantic HTML." .
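
The mapping above can be sketched in a few lines of Python. This is not ARC's "microRDF" extractor (which is not shown in this post), only a rough illustration of the convention: plain itemprop names are resolved against the namespace of the enclosing itemtype, full URIs are used as they are, and href/src values become resource objects. The class name MicroRDFSketch, the helper extract_triples, and the tuple-based triple format are assumptions made for this example. Nested items and space-separated itemprop lists are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
URL_ATTRS = {"a": "href", "link": "href", "img": "src"}   # elements whose value is a URL
VOID_TAGS = {"img", "link", "meta", "br", "hr", "input"}  # elements without an end tag


class MicroRDFSketch(HTMLParser):
    """Collects (subject, predicate, object) triples from flat Microdata items."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.triples = []
        self.items = []        # stack of (depth, subject, namespace)
        self.open_props = []   # stack of (depth, subject, predicate, text chunks)
        self.depth = 0
        self.bnode_count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_void = tag in VOID_TAGS
        if not is_void:
            self.depth += 1
        if "itemscope" in attrs:
            self._open_item(attrs)
        elif attrs.get("itemprop") and self.items:
            self._open_property(tag, attrs, is_void)

    def _open_item(self, attrs):
        if attrs.get("itemid"):
            subject = urljoin(self.base, attrs["itemid"])
        else:
            self.bnode_count += 1
            subject = "_:bn%d" % self.bnode_count
        itemtype = attrs.get("itemtype") or ""
        if itemtype:
            self.triples.append((subject, RDF_TYPE, ("uri", itemtype)))
        # The item's namespace is its type URI up to the last "/" or "#".
        cut = max(itemtype.rfind("/"), itemtype.rfind("#")) + 1
        self.items.append((self.depth, subject, itemtype[:cut]))

    def _open_property(self, tag, attrs, is_void):
        _, subject, namespace = self.items[-1]
        name = attrs["itemprop"]
        predicate = name if "://" in name else namespace + name
        url_attr = URL_ATTRS.get(tag)
        if url_attr and attrs.get(url_attr):
            obj = ("uri", urljoin(self.base, attrs[url_attr]))
            self.triples.append((subject, predicate, obj))
        elif not is_void:
            # Literal value: gather text until the matching end tag.
            self.open_props.append((self.depth, subject, predicate, []))

    def handle_data(self, data):
        # Text belongs to every open property (textContent includes nested text).
        for prop in self.open_props:
            prop[3].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_props and self.open_props[-1][0] == self.depth:
            _, subject, predicate, chunks = self.open_props.pop()
            text = " ".join("".join(chunks).split())  # normalize whitespace
            self.triples.append((subject, predicate, ("literal", text)))
        if self.items and self.items[-1][0] == self.depth:
            self.items.pop()
        self.depth -= 1


def extract_triples(html, base):
    parser = MicroRDFSketch(base)
    parser.feed(html)
    parser.close()
    return parser.triples
```

Fed the Microdata snippet above with http://host/path/ as base, extract_triples should return the same six statements as the Turtle listing, with the nested nick contributing to the text of name.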

Simple RDFication of SPARQL SELECT results with RDFa

How to use RDFa to make SELECT results locally available as RDF
A couple of weeks ago, I wrote about the self-enforcing value spiral that RDF data enables. Here is an example of how RDFa can be used to support this "Repurpose-Republish" loop.

While data exchange between different semantic web sources is usually RDF-based (i.e. the data always maintain their semantics), there is one major exception: SPARQL SELECT queries. This developer-oriented operation returns tabular data (similar to record sets in SQL). Once the query result is separated from the query, the associated structural data is lost. You can't directly feed SELECT results back into a triple store, even though querying based on linked resources means that you have just created knowledge. It's a pity to show this generated information to human consumers only.

One of the demos at my NYC talk was a dynamic wiki item that pulled in competitor information from Semantic CrunchBase and injected that into a page template as HTML. The existing RDF infrastructure does not let me cache the SELECT results locally as usable RDF. And a semantic web client or crawler that indexes the wiki page will not learn how the described resource (e.g. Twitter) is related to the remote, linked entities.

wiki with linked data

However, by simply adding a single RDFa hook to the wiki item template, the RDF relation (e.g. competitor) can be made available again to apps that process my site content. This is basically how Linked Data works. But here is the really nifty thing: My site can be a consumer of its own pages, too, recursively enriching its own data.
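
To make the "single RDFa hook" a bit more concrete, here is a minimal Python sketch of a template helper that renders SELECT rows as an HTML list while keeping the relation machine-readable. It is not the wiki's actual template code. The function name render_related, the ex: prefix, and the example.org namespace are hypothetical. The markup relies on RDFa's @about for the subject and @rel plus @href for the relation and its object.

```python
from html import escape


def render_related(subject_uri, predicate_curie, prefix, namespace, rows):
    """Render SELECT rows of (uri, label) as a list carrying one RDFa relation.

    Each link states: <subject_uri> <predicate> <uri>.
    """
    items = "\n".join(
        '  <li><a rel="%s" href="%s">%s</a></li>'
        % (escape(predicate_curie), escape(uri), escape(label))
        for uri, label in rows
    )
    # @about sets the subject for all links inside the list.
    return '<ul xmlns:%s="%s" about="%s">\n%s\n</ul>' % (
        prefix, escape(namespace), escape(subject_uri), items)


if __name__ == "__main__":
    rows = [("http://example.org/company/jaiku", "Jaiku")]  # made-up SELECT result
    print(render_related("http://example.org/company/twitter",
                         "ex:competitor", "ex", "http://example.org/vocab#", rows))
```

An RDFa-aware client that reads the resulting list can recover one competitor triple per row instead of seeing only a list of links.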

markup-to-SELECT-to-RDFa-to-RDF

I tweaked the wiki script which now works like this: When the page is saved, a first operation updates the wiki markup in the page's graph (i.e. the not-yet-populated template). In a second step, the page URL is retrieved via HTTP. This will return HTML with RDFa-encoded remote data, which is then parsed by ARC, and finally added to the same graph. We end up with a graph that does not only contain the wiki markup, but also the RDFized information that was integrated from remote sites. After adding this graph to the RDF store, we can use a local query to generate the page and occasionally reset the graph to enable copy-by-reference. And all this without any custom API code.
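
The save procedure described above could look roughly like the following Python sketch. The real script is built on ARC and is not shown here, so everything in this block is an assumption: the store is a plain dict mapping graph URIs to sets of triples, the WIKI_MARKUP predicate is invented, and fetch and extract_rdfa are stand-ins for an HTTP GET and an RDFa parser that the caller has to supply.

```python
WIKI_MARKUP = "http://example.org/vocab#wikiMarkup"  # hypothetical predicate


def save_page(store, page_url, wiki_markup, fetch, extract_rdfa):
    """Store a wiki page's markup plus the RDF extracted from its rendered HTML.

    store        -- dict: graph URI -> set of (s, p, o) triples (stand-in for a triple store)
    fetch        -- callable(url) -> HTML string (stand-in for an HTTP GET)
    extract_rdfa -- callable(html, base) -> iterable of triples (stand-in for an RDFa parser)
    """
    # Step 1: reset the page's graph so it holds only the unpopulated template.
    graph = {(page_url, WIKI_MARKUP, wiki_markup)}
    store[page_url] = graph

    # Step 2: request the rendered page; rendering pulls in the remote data
    # and emits it as RDFa-annotated HTML.
    html = fetch(page_url)

    # Step 3: parse the RDFa and add the triples to the same graph.
    graph.update(extract_rdfa(html, page_url))
    return graph
```

Passing the fetcher and the parser in as arguments keeps the sketch independent of any particular HTTP or RDFa library.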

rdfa-to-sparql

Could Microdata work better for me than RDFa?

Just had a quick look at the Microdata proposal, wondering about its pros and cons.
I've always had my little issues with RDFa, mainly for personal reasons. I'm repeating them here (for the last time, promised, don't want to trigger another flame war):
  • I personally don't like the number of new attributes and their names (about, resource, typeof, and property are at least as inconsistent as RDF/XML's tokens).
  • I've written an RDFa parser, but still don't really understand the processing model. RDFa does the job of course, and it's been specified by smart people I respect, but to me it just still feels a little too complicated. I often have to utilize an extraction service to verify the triples resulting from a snippet, and I've seen the creators of RDFa do the same.
    One reason for being less intuitive than hoped is the fact that adding an attribute to some existing snippet can easily change the entire meaning of nested information. This makes it tricky to incrementally add structure to already tested and approved RDFa (an unnoticed @rel or @typeof may add an unwanted blank intermediate node, for example, and you can have any combination of RDFa attributes on a single node).
  • I consider structured blogging a central use case for RDF in HTML, yet it's not fully supported by RDFa: RDFa does not allow sub-structures in XML Literals (for security/triple injection reasons, IIRC), so you can't extract a post body (including HTML markup) and also get the annotations encoded in the body (like reviews or events).
  • (Reliable) copy and paste is not possible when prefix definitions can be kept separate from annotations. This is relevant to some of the apps I'm working on, and it took me quite some time to admit that (intuitively desirable) URI abbreviations in HTML do have negative practical implications. It depends on the use case, but it also needs some experience to realize this, as the pro-prefix argument is practically motivated as well. (I started playing with RDF-ish copy & paste rather early, if that makes this conclusion more credible).
  • The xmlns:prefix mechanism doesn't work nicely with my development environment. This is perhaps a silly argument, but for me personally it is important to see that green little "0 errors" indicator in my browser while I'm creating sites. It was not hard to extend the Firefox validator extension with support for new attributes, but there was no clean way to make it accept xmlns:prefix. Spotting true errors in the dozens of RDFa-related complaints is annoying.

Having said that, if this little list is all I can come up with, then RDFa is probably a pretty solid and usable spec. I could easily write a list of things I find flawed in RDF/XML, or even SPARQL, my favorite RDF technology. And there is another good reason why I should tend towards using RDFa: Lack of proper alternatives. I still think it would be possible to create a cross-doctype solution. eRDF and my own poshRDF experiment show that it's possible, but so far these approaches are incomplete RDF-wise, and I wouldn't have the energy or funds to build a community to develop things further (and again, my arguments are motivated by personal use cases and habits, so there isn't a large overlap with other people's requirements anyway).

Nevertheless, the new "Microdata" proposal is currently being discussed, so it might be worth having a look and comparing it with my RDFa issue list above. I only had a quick scan, I may have gotten some details wrong:
  • It only introduces two new (mandatory) attributes: "item" and "itemprop". "item" can be used to type resources. RDFa's "about" can be re-used for URI-identified items. That sounds compact and neat so far.
  • "item" is mandatory to indicate the boundary of a resource description. This makes accidental triples much less likely to happen than with RDFa. For any "itemprop", you just have to walk up the DOM tree to find the container item, which makes both human- and code-based parsing easy.
  • Structured blogging? Aww, not really. While you can at least choose between raw markup or structured values in RDFa, Microdata only supports flat key-value pairs where the value is a node's textContent and won't contain tags (if I read the draft correctly). I don't really need datatypes and languages, but I definitely want RDF triples where the object can contain HTML markup (wiki blobs with embedded annotations are another example).
  • Copy & paste of source code or from/to contenteditable sections is more reliable than with RDFa because there is no prefix mechanism.
  • It'd be possible to make the Firefox validator eat the new Microdata attributes without complaining, but I'm not sure how likely it is to have Microdata support in the official distribution anytime soon. Marc Gueury writes that validating HTML5 may require a new sort of validator, so switching to HTML5 may make things worse instead of better for me, development-wise.
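
The "walk up the DOM tree" rule from the list above is simple enough to sketch in Python. This is an illustration only, based on the draft attribute names mentioned in this post ("item" and "itemprop"). The function name find_container_items is made up, the input has to be well-formed XML, and every "item" attribute is assumed to carry a value.

```python
import xml.etree.ElementTree as ET


def find_container_items(xhtml):
    """Return (itemprop name, type of the nearest enclosing item) pairs."""
    root = ET.fromstring(xhtml)
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    result = []
    for elem in root.iter():
        prop = elem.get("itemprop")
        if prop is None:
            continue
        # Walk up until an ancestor declares an item (or the root is passed).
        node = parents.get(elem)
        while node is not None and node.get("item") is None:
            node = parents.get(node)
        result.append((prop, node.get("item") if node is not None else None))
    return result
```

A property without any enclosing item is reported with None, which is the case where an extractor would produce no triple at all.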

I recently watched a short section of a TV fortune-teller show where desperate people could dial in to get their questions asked. The lady who called asked "Will I find a new love?", and the fortune-teller looked into her cards (very slowly, of course, given the 3 EUR/minute rate), then slowly lifted her head, looked straight into the camera and articulated her findings: "I see a definite Maybe."

I guess this awesome universal answer also works for my opening question. There simply is no ideal solution. I like the item/itemprop idea, but I'd need to add a hack for markup values (e.g. by adding an item="...XMLLiteral" container and then converting these items to XML nodes. But then I can just add a simpler hack to my RDFa extractor to deep-parse XMLLiterals). This doesn't justify a whole new spec. The copy/paste problem is not too urgent any more, as Linked Data enables nifty copy-by-reference instead of copy-by-value.

It's generally a little surprising to see that Microdata proposal. For months, the HTML5 opinion makers argued against user-defined markup structures, and now they have created a completely new spec that not only extends RDFa's possibilities to identify resource types and relations, but also seems to introduce a redundant serialization for selected microformats.

Anyway, for the sake of convergence and less work, I think I still prefer (a subset of) RDFa, if only there was a way to get rid of CURIEs (who wants an abbreviation mechanism whose acronym can't even be properly expanded? ;). And an alternative for the validation pain could be a simple, locally installed validator, accessible through a Ubiquity script. When I think about it, I mainly just need well-formedness and some attribute checks. A Ubiquity script could directly show HTML errors and also extracted triples, and maybe even do some triple sanity checks, too. But then this setup would work for Microdata just as well. Ah well..

RDFa button (unofficial)

An unofficial light-blue RDFa button
Update/Note: This is not an official RDFa button, those (in the known colours) will be provided by W3C's Communications Team once RDFa is a Rec or CRec.

A couple of days ago I created an RDFa technology button, and I was asked to share it, so here it is:

RDFa
(PNG, GIF, SVG source file)

Please see the W3C Semantic Web Logos and Policies page for license details. This button is derived from the original W3C ones.

Adding (partial) RDFa support to the Firefox HTML Validator extension

Improving QA issues caused by RDFa
Update (2008-04-24): I managed to get rid of the xmlns-related errors (.replace() to the rescue ;), so the extension now accepts markup that follows the latest RDFa DTD (including @typeof). And while at it, I created versions for win and mac.

One of the reasons I haven't been using RDFa in production is the problem of quality assurance (a.k.a. plain old HTML validation). Not because RDFa isn't valid markup as such, but because the main tool I'm using during development, Marc Gueury's excellent HTML Validator Extension for Firefox, doesn't know about it. RDFa is valid XHTML+RDFa, but XHTML+RDFa is not HTML, so the extension reports dozens of errors, starting with the unrecognized Doctype declaration. The W3C Markup Validator supports RDFa, but I often develop while I'm offline, or on a non-public Web server, and the little "0 errors / 0 warnings" message in the status bar is more convenient than having to send markup to an online service.

Yesterday, however, I started working on an RDFa generator for one of Intellidimension's projects (Very interesting to see them use RDF big time, while many of us are still experimenting and thinking about potential markets, BTW). So, now that the RDFa-caused messages made it almost impossible to spot real HTML errors, I wondered if the add-on could perhaps be hacked to accept RDFa as well. Long story short: It can, to a certain extent. I don't know if arbitrary XML namespace prefixes (xmlns:foo="...") can be supported by a pure DTD/SGML-based validator (the FF extension uses OpenSP). FWIW, I couldn't get it to work.

Apart from that, RDFa-enabling the extension was mainly copying the RDFa DTD and a set of modules to the plug-in's SGML library. It now happily accepts RDFa attributes (about, resource, property, datatype, content, etc) and makes my life a little bit easier. If anyone has an idea how I could make it accept (non-predefined) namespace prefixes as well, I'd appreciate hints.
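
The update at the top of this post mentions that a .replace() call eventually got rid of the xmlns-related errors. The extension's actual code is not shown here, so the following Python is only a guess at the general idea: strip prefixed namespace declarations from a copy of the markup before it is handed to the DTD-based validator. The function name strip_prefix_declarations and the regular expression are assumptions. The pattern works on raw text, so it would also touch declarations that merely appear inside comments or script blocks.

```python
import re

# Matches attributes such as xmlns:foaf="..." or xmlns:dc='...',
# but not the default namespace declaration xmlns="...".
XMLNS_PREFIX_ATTR = re.compile(
    r"""\s+xmlns:[A-Za-z_][\w.-]*\s*=\s*(?:"[^"]*"|'[^']*')"""
)


def strip_prefix_declarations(markup):
    """Return a copy of the markup without xmlns:prefix declarations."""
    return XMLNS_PREFIX_ATTR.sub("", markup)
```

Only the copy that goes to the validator is changed; the page itself keeps its prefix declarations, since RDFa parsers need them to expand CURIEs.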

The tweaked extension is so far just a hack. I didn't even ping Marc yet or change the internal ID, so any extension update will remove the RDFa functionality. You can try/download it if you like (windows version), but I may have to take it offline should Marc not be happy about the re-distribution.

Moving out of the shadow with RDFa

RDFa can help solve the "shadow semweb" problem
Ian Davis has written an interesting series of posts related to the problems arising from using fragment identifiers in resource URIs. Ian makes a lot of valid points, but I think he misses an essential one. (With this post I'm breaking with a long tradition, I'm saying positive things about RDFa ;)

So, what's the problem, and how can RDFa help? Ian is discussing a lot of architectural things, and I'm sure there are issues and inconsistencies. But the practical problem he describes is based on the following WebArch principle:
The fragment identifies a portion of a representation obtained from a URI,
and its meaning changes depending on the type of representaion. [sic]
That means that you can't use "http://example.com/ben#self" as an HTML section identifier and as a non-document identifier (e.g. the person ben). Ian concludes that
You can have a machine readable RDF version or a human readable HTML
version but not both at the same time
and that this forces the structured web into a disregarded shadow of the human-readable web.

I think that conclusion is not correct. eRDF re-uses HTML's @id to establish resource identifiers, so it mixes document identifiers with non-doc ones, and this is an ambiguity problem indeed. RDFa, however, is a layer on top of HTML that introduces a dedicated mechanism for resource identification, the @about attribute (and that's why it unfortunately needs its own DTD, but that's another story). From a WebArch POV, the design is clean, content-type-specific identifiers don't get mixed. I can unambiguously describe what "..ben#self" is meant to identify without the representation format playing a role. RDFa can re-purpose HTML's text nodes for RDF literals, and anchors for resource URIs, but apart from that, the HTML document is not much more than a (human-friendly) container.

So, you can serve HTML and machine-readable information in a single document, you just have to make sure that your resource URI fragments don't appear in HTML @ids. And now that we are back on the practical level: Any other ID generation mechanism can work, too. It's fairly easy to implement a URI generator for RDF extracted from a microformats-enabled HTML page without overloading resource IDs. I personally don't see a huge problem (again, practically), as all my applications work with triples, not with representations or encodings which are dealt with by the parsers and extractors.
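A minimal sketch of that rule, assuming nothing beyond Python's standard library: it collects the HTML @ids of a page, reports any local @about fragment that collides with one of them, and mints a resource URI whose fragment is guaranteed not to be an HTML @id. The function names and the "-r1" suffix scheme are made up for illustration; they are not taken from any existing extractor.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class IdCollector(HTMLParser):
    """Collects HTML @id values and raw RDFa @about values from a page."""

    def __init__(self):
        super().__init__()
        self.html_ids = set()
        self.about_values = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "id":
                self.html_ids.add(value)
            elif name == "about":
                self.about_values.add(value)


def find_collisions(html, page_uri):
    """Return fragments used both as an HTML @id and as a local resource ID."""
    collector = IdCollector()
    collector.feed(html)
    local_fragments = set()
    for value in collector.about_values:
        base, fragment = urldefrag(value)
        # Only fragments that point into this very page can clash with its @ids.
        if fragment and base in ("", page_uri):
            local_fragments.add(fragment)
    return collector.html_ids & local_fragments


def mint_resource_uri(page_uri, local_name, html_ids):
    """Build page_uri#fragment, avoiding fragments already taken by HTML @ids."""
    fragment = local_name
    counter = 1
    while fragment in html_ids:
        # Hypothetical suffix scheme, any collision-free naming would do.
        fragment = f"{local_name}-r{counter}"
        counter += 1
    return f"{urldefrag(page_uri).url}#{fragment}"


page = '<div id="self"><span about="#self" property="foaf:name">Ben</span></div>'
collisions = find_collisions(page, "http://example.com/ben")
safe_uri = mint_resource_uri("http://example.com/ben", "self", {"self"})
```

For the sample page the check flags "self" as overloaded, and the generator falls back to a different fragment for the person resource.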

One practical issue remains, though: Current browsers don't (natively) support navigating to RDF identifiers encoded in RDFa-, microformats-, or GRDDL-enabled HTML pages. You need an additional JavaScript lib to invoke appropriate scroll actions after a page URI with a (non-HTML) fragment identifier is loaded. That's a little annoying, but doable. I think fragment identifiers are valuable. They allow the description of multiple resources in a single document, and that's a handy feature. Whether that breaks Web architecture theory, dunno. Not for me, at least ;-)

A Comparison of Microformats, eRDF, and RDFa

An updated (and customizable) comparison of the different approaches for semantically enhancing HTML.
Update (2006-02-13): In order to avoid further flame wars with RDFa folks, I've adjusted the form to not show my personal priorities as default settings anymore (here they are if you are interested, it's a 48-42-40 ranking for MFs, eRDF, and RDFa respectively). All features are set to "Nice to have" now. As you can see, for these settings, RDFa gets the highest ranking (I *said* the comparison is not biased against RDFa!). If you disable the features related to domain-independent resource descriptions, MFs shine, if you insist on HTML validity, eRDF moves up, etc. It's all in the mix.

After a comment of mine on the Microformats IRC channel, SWD's Michael Hausenblas asks for the reason why I said that I personally don't like RDFa. Damn public logs ;) OK, now I have to justify that somehow without falling into rant mode again...

I already wrote a little comparison of Microformats, Structured Blogging, eRDF, and RDFa some time ago, sounds like a good opportunity to see how things evolved during the last 8 months. Back then I concluded that both eRDF and RDFa were preferred candidates for SemSol, but that RDFa lacked the necessary deployment potential due to not being valid HTML (as far as any widespread HTML spec is concerned).

I excluded the Structured Blogging initiative from this comparison, it seems to have died a silent death. (Their approach to redundantly embed microcontent in script tags apparently didn't convince the developer community.) I also excluded features which are equally available in all approaches, such as visible metadata, general support for plain literals, being well-formed, no negative effect on browser behaviour, etc.

Pretending to be constructive, and in order to make things less biased, I embedded a dynamic page item that allows you to create your own, tailored comparison. The default results reflect my personal requirements (and hopefully answer Michael's question). As your mileage does most probably vary, you can just tweak the feature priorities (The different results are not stored, but the custom comparisons can be bookmarked). Feel free to leave a comment if you'd like me to add more criteria.

No. | Feature or Requirement | MFs | eRDF | RDFa
1 | DRY (Don't Repeat Yourself) | yes | yes | mostly
2 | HTML4 / XHTML 1.0 validity | yes | yes | no
3 | Custom extensions / Vocabulary mixing | no | yes | yes
4 | Arbitrary resource descriptions | no | yes | yes
5 | Explicit syntactic means for arbitrary resource descriptions | no | no | yes
6 | Supported by the W3C | partly | partly | yes
7 | Follow DCMI guidelines | no | yes | no
8 | Stable/Uniform syntax specification | partly | yes | yes
9 | Predictable RDF mappings | mostly | yes | yes
10 | Live/Web Clipboard Compatibility | yes | mostly | mostly
11 | Reliable copying, aggregation, and re-publishing of source chunks. (Self-containment) | mostly | partly | partly
12 | Support for not just plain literals (e.g. typed dates, floats, or markup). | yes | no | yes
13 | Triple bloat prevention (only actively marked-up information leads to triples) | yes | yes | no
14 | Possible integration in namespaced (non-HTML) XML languages. | no | no | yes
15 | Mainstream Web developers are already adopting it. | yes | no | no
16 | Tidy-safety (Cleaning up the page will never alter the embedded semantics) | yes | yes | no
17 | Explicit support for blank nodes. | no | no | yes
18 | Compact syntax, based on existing HTML semantics like the address tag or rel/rev/class attributes. | yes | mostly | partly
19 | Inclusion of newly evolving publishing patterns (e.g. rel="nofollow"). | yes | no | partly
20 | Support for head section metadata such as OpenID or Feed hooks. | no | partly | partly

Results

Solution | Points | Missing Requirements
RDFa | 35 | -
eRDF | 34 | -
Microformats | 33 | -

Max. points for selected criteria: 60

Summary:

Your requirements are met by RDFa, or eRDF, or Microformats.
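The post does not spell out how the points are calculated. The sketch below is an inference, not the form's actual code: it assumes "yes" counts 3, "mostly" 2, "partly" 1, and "no" 0, with every feature weighted equally. Under that assumption the ratings from the table add up to exactly the totals listed in the results, and 20 features give the stated maximum of 60. How the other priority settings change the weights is not stated, so they are left out.

```python
# Assumed point values per rating; inferred because they reproduce the
# published totals, not taken from the original comparison form.
POINTS = {"yes": 3, "mostly": 2, "partly": 1, "no": 0}

# Ratings copied from the comparison table, in feature order 1..20.
RATINGS = {
    "Microformats": (
        "yes yes no no no partly no partly mostly yes "
        "mostly yes yes no yes yes no yes yes no"
    ),
    "eRDF": (
        "yes yes yes yes no partly yes yes yes mostly "
        "partly no yes no no yes no mostly no partly"
    ),
    "RDFa": (
        "mostly no yes yes yes yes no yes yes mostly "
        "partly yes no yes no no yes partly partly partly"
    ),
}


def score(ratings):
    """Sum the points for one solution, all features weighted equally."""
    return sum(POINTS[rating] for rating in ratings.split())


scores = {name: score(ratings) for name, ratings in RATINGS.items()}
max_points = POINTS["yes"] * 20
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]} of {max_points}")
```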

Feature notes/explanations:

DRY (Don't Repeat Yourself)
  • RDFa: Literals have to be redundantly put in "content" attributes in order to make them un-typed.
HTML4 / XHTML 1.0 validity
  • RDFa: Given the buzz around the WHATWG, it's uncertain when (if at all) XHTML 2 or XHTML 1.1 modules will be widely deployed enough.
Explicit syntactic means for arbitrary resource descriptions
  • eRDF: owl:sameAs statements (or other IFPs) have to be used to describe external resources.
Supported by the W3C
  • MFs, eRDF: Indirectly supported by W3C's GRDDL effort.
Stable/Uniform syntax specification
  • MFs: Although MFs reuse HTML structures, the format syntax layered on top differs, so that each MF needs separate (though stable) parsing rules.
Predictable RDF mappings
  • MFs: Microformats could be mapped to different RDF structures, but the GRDDL WG will probably recommend fixed mappings.
Live/Web Clipboard Compatibility
  • eRDF, RDFa: Tweaks are needed to make them Live-Clipboard compatible.
Reliable copying, aggregation, and re-publishing of source chunks. (Self-containment)
  • MFs: Some Microformats (e.g. XFN) lose their intended semantics when regarded out of context.
  • eRDF/RDFa: Only chunks with nearby/embedded namespace definitions can be reliably copied.
Support for head section metadata such as OpenID or Feed hooks.
  • eRDF: Can support OpenID hooks.
  • RDFa: Will probably interpret any rel attribute.


Bottom line: For many requirement combinations a single solution alone is not enough. My tailored summary suggests for example that I should be fine with a combination of Microformats and eRDF. What does your preferred solution mix look like?

SeenOn - Timestamp or State of Mind?

Fun stuff from #microformats, comments on e/RDF/a wrt Microformats
<tommorris> Every time I see a movie from now on,
  I'm adding the IMDB URL to my FOAF file.
<briansuda> with what predicate?
<tommorris> rdf.opiumfield.com/movie/0.1/seen
...
<briansuda> seenOn, is that a timestamp or a state-of-mind?
(microformats(!) irc channel)

Now, who said RDF was less real-world-ish than microformats?

Related link (wrt movies, not toxics): Microformats 80%, RDF 20% by Tom Morris about the longtail utility of (e)RDF(a). I've wanted to state something like this for some time. After implementing a Microcontent parser (part of the next ARC release) that creates a merged triple set from eRDF and Microformats, I can't say anymore that MFs don't scale (even though making the meaning of nested formats explicit is sometimes tricky). I was really impressed by the amount of practical use cases covered by them (Listings and qualified review ratings even go beyond the demos I've seen in RDFer circles). However, there is still a lot of room for custom RDF extensions that can be used to extend microformatted HTML. Skill levels are just one of many longtail examples: They are currently not covered by hResume, but available in Uldis' CV vocab.

The important thing IMO is that RDFers should not forget to acknowledge the amazing deployment work of the MF community and focus on what they can bring to the table (storage, querying, and mixing, as a start) instead of marketing RDF-in-HTML as an alternative, replacement, or otherwise "superior" (likewise the other way round, btw.). I think we also shouldn't overburden the big content re-publishers. When maintainers of sites like LinkedIn or Eventful get bombarded with requests to add different semantic serializations to their pages, they may hesitate to support any of them at all. For most of these mainstream sites, Microformats do the job just fine, and often better. Why should people for example have to specify namespaces when a simple, agreed-on rel-license does the trick already? (We could still use RDF to specify the license details, and even the license link is only a simple conversion away from RDF.)
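To illustrate how short that conversion can be, here is a rough sketch using only Python's standard library. It turns every rel="license" link on a page into a triple about the page itself. The choice of the XHTML vocabulary's license term as predicate is an assumption for this example (a Creative Commons term would work just as well), and the class and function names are hypothetical, not part of ARC.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed predicate for the example: the XHTML vocabulary's "license" term.
LICENSE_PREDICATE = "http://www.w3.org/1999/xhtml/vocab#license"


class RelLicenseExtractor(HTMLParser):
    """Turns rel="license" links into (page, license predicate, target) triples."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, e.g. "license nofollow".
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rels and href:
            target = urljoin(self.page_uri, href)
            self.triples.append((self.page_uri, LICENSE_PREDICATE, target))


def license_triples(html, page_uri):
    extractor = RelLicenseExtractor(page_uri)
    extractor.feed(html)
    return extractor.triples


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


html = (
    '<p>Licensed under <a rel="license" '
    'href="http://creativecommons.org/licenses/by/2.5/">CC BY</a>.</p>'
)
triples = license_triples(html, "http://example.com/post")
print(to_ntriples(triples))
```

The publisher keeps the plain rel-license markup, and anyone who wants RDF can derive it on the consuming side.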

ARC Embedded RDF (eRDF) Parser for PHP

Announcing eRDF support for ARC + an eRDF/RDFa comparison
Update: The current RDFa primer is *not* broken wrt WebArch, the examples were fixed two weeks ago. I've also removed the "no developer support" rant, just received personal support ;-)

While searching for a suitable output format for a new RDF framework, I've been looking at the various semantic hypertext approaches, namely microformats, Structured Blogging, RDFa, and Embedded RDF (eRDF). Each one has its pros and cons:

Microformats:
  • (+) widest deployment so far
  • (+) integrate nicely with current HTML and CSS
  • (-) centralized project, inventing custom microformats is discouraged
  • (-) don't scale, the number of MFs will either be very limited, or sooner or later there will be class name collisions

Structured Blogging:
  • (+) a large number of supporters (at least potentially, the supporters list is huge, although this doesn't represent the available tools)
  • (+) not a competitor, but a superset of microformats
  • (-) the metadata is embedded in a rather odd way
  • (-) the metadata is repeated
  • (-) the use cases are limited (e.g. reviews, events, etc)

RDFa:
  • (+) follows certain microformats principles (e.g. "Don't repeat yourself")
  • (+) freely extensible
  • (+) All resource descriptions (e.g. for events, profiles, products, etc.) can be extracted with a single transformation script
  • (+) RDF-focused
  • (+) W3C-supported
  • (-) Not XHTML 1.0 compliant, it will take some time before it can be used in commercial products or picky geek circles
  • (-) The default datatype of literals is rdf:XMLLiteral which is wrong for most deployed properties

eRDF:
  • (+) follows the microformats principles
  • (+) freely extensible
  • (+) All resource descriptions (e.g. for events, profiles, products, etc.) can be extracted with a single transformation script
  • (+) uses existing markup
  • (+) XHTML 1.0 compliant
  • (+) RDF-focused
  • (-) Covers only a subset of RDF
  • (-) Does not support XML literals

So, both RDFa and eRDF seem like good candidates for embedding resource descriptions in HTML. The two are not really compatible, though, it is not easily possible to create a superset which is both RDFa and eRDF. However, my publishing framework is using a Wiki-like markup language (M4SH) which is converted to HTML, so I can add support for both approaches and make the output a configuration option. Maybe it's even possible to create a merged serialization without confusing transformers.

I'll surely have another look at RDFa when there is better deployment potential. For now, I've created a M4SH-to-eRDF converter (which is going to be available as part of the forthcoming SemSol framework), and an eRDF parser that can generate RDF/XML from embedded RDF. I've also added some extensions to work around (plain) eRDF's limitations, the main one being on-the-fly rewriting of owl:sameAs assertions to allow full descriptions of remote resources, e.g.
<div id="arc">
  <a rel="owl-sameAs" href="http://example.com/r/001#001"></a>
  <a rel="doap-maintainer" href="#ben">Benjamin</a>
</div>
is automatically converted to
<http://example.com/r/001#001> doap:maintainer <#ben>
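The rewriting idea can be sketched in a few lines of Python. This is not ARC's actual (PHP) implementation, just an illustration under simple assumptions: triples are plain (subject, predicate, object) tuples, the function name is hypothetical, and only subjects are rewritten because that is all the example above shows.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DOAP_MAINTAINER = "http://usefulinc.com/ns/doap#maintainer"


def rewrite_same_as(triples):
    """Move statements from a local node to the resource it is owl:sameAs.

    Each owl:sameAs triple is consumed: its subject is treated as an alias
    for its object, and the alias is replaced in the remaining triples.
    """
    aliases = {s: o for s, p, o in triples if p == OWL_SAME_AS}
    return [
        (aliases.get(s, s), p, o)
        for s, p, o in triples
        if p != OWL_SAME_AS
    ]


# Triples as a plain eRDF extraction of the HTML snippet would yield them.
extracted = [
    ("#arc", OWL_SAME_AS, "http://example.com/r/001#001"),
    ("#arc", DOAP_MAINTAINER, "#ben"),
]
rewritten = rewrite_same_as(extracted)
```

After the rewrite the maintainer statement hangs off the remote resource instead of the local #arc node, and the owl:sameAs helper triple is gone.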

The parser can be downloaded at the ARC site (documentation).
I've also put up a little demo service if you want to test the parser.

YARDFIXHTML - Yet Another RDF-In-XHTML proposal

Ian Davis introduced eRDF
Ian Davis proposes "Embedded RDF", a microformats-inspired path to metadata-enriched HTML. Unlike microformats, his approach can utilize a single generic transformation script instead of one transformation for each format (or micromodel if you prefer Danny Ayers' terminology), which is closer to RDF's idea of freely mixable vocabularies.

I had some hopes for RDF/A but stopped following its progress several months ago as it didn't seem to provide an easy way to really bridge the gap between HTML and RDF. My use case was (and is) to be able to mark up HTML in a way which allows me to automatically (and without too much effort) generate context menus or tool-tips
