finally a bnode with a uri

Posts tagged with: erdf

Could Microdata work better for me than RDFa?

Just had a quick look at the Microdata proposal, wondering about its pros and cons.
I've always had my little issues with RDFa, mainly for personal reasons. I'm repeating them here (for the last time, promised, don't want to trigger another flame war):
  • I personally don't like the amount of new attributes and their names (about, resource, typeof, and property are at least as inconsistent as RDF/XML's tokens).
  • I've written an RDFa parser, but still don't really understand the processing model. RDFa does the job of course, and it's been specified by smart people I respect, but to me it just still feels a little too complicated. I often have to utilize an extraction service to verify the triples resulting from a snippet, and I've seen the creators of RDFa do the same.
    One reason for being less intuitive than hoped is the fact that adding an attribute to some existing snippet can easily change the entire meaning of nested information. This makes it tricky to incrementally add structure to already tested and approved RDFa (an unnoticed @rel or @typeof may add an unwanted blank intermediate node, for example, and you can have any combination of RDFa attributes on a single node).
  • I consider structured blogging a central use case for RDF in HTML, yet it's not fully supported by RDFa: RDFa does not allow sub-structures in XML Literals (for security/triple injection reasons, IIRC), so you can't extract a post body (including HTML markup) and also get the annotations encoded in the body (like reviews or events).
  • (Reliable) copy and paste is not possible when prefix definitions can be kept separate from annotations. This is relevant to some of the apps I'm working on, and it took me quite some time to admit that (intuitively desirable) URI abbreviations in HTML do have negative practical implications. It depends on the use case, but it also needs some experience to realize this, as the pro-prefix argument is practically motivated as well. (I started playing with RDF-ish copy & paste rather early, if that makes this conclusion more credible).
  • The xmlns:prefix mechanism doesn't work nicely with my development environment. This is perhaps a silly argument, but for me personally it is important to see that green little "0 errors" indicator in my browser while I'm creating sites. It was not hard to extend the Firefox validator extension with support for new attributes, but there was no clean way to make it accept xmlns:prefix. Spotting true errors in the dozens of RDFa-related complaints is annoying.

Having said that, if this little list is all I can come up with, then RDFa is probably a pretty solid and usable spec. I could easily write a list of things I find flawed in RDF/XML, or even SPARQL, my favorite RDF technology. And there is another good reason why I should tend towards using RDFa: Lack of proper alternatives. I still think it would be possible to create a cross-doctype solution. eRDF and my own poshRDF experiment show that it's possible, but so far these approaches are incomplete RDF-wise, and I wouldn't have the energy or funds to build a community to develop things further (and again, my arguments are motivated by personal use cases and habits, so there isn't a large overlap with other people's requirements anyway).

Nevertheless, the new "Microdata" proposal is currently being discussed, so it might be worth having a look and comparing it with my RDFa issue list above. I only had a quick scan, I may have gotten some details wrong:
  • It only introduces two new (mandatory) attributes: "item" and "itemprop". "item" can be used to type resources. RDFa's "about" can be re-used for URI-identified items. That sounds compact and neat so far.
  • "item" is mandatory to indicate the boundary of a resource description. This makes accidental triples much less likely to happen than with RDFa. For any "itemprop", you just have to walk up the DOM tree to find the container item, which makes both human- and code-based parsing easy.
  • Structured blogging?Aww, not really. While you can at least choose between raw markup or structured values in RDFa, Microdata only supports flat key-value pairs where the value is a node's textContent and won't contain tags (if I read the draft correctly). I don't really need datatypes and languages, but I definitely want RDF triples where the object can contain HTML markup (wiki blobs with embedded annotations are another example).
  • Copy & paste of source code or from/to contenteditable sections is more reliable than with RDFa because there is no prefix mechanism.
  • It'd be possible to make the Firefox validator eat the new Microdata attributes without complaining, but I'm not sure how likely it is to have Microdata support in the official distribution anytime soon. Marc Gueury writes that validating HTML5 may require a new sort of validator, switching to HTML5 may make things worse instead of better for me, development-wise.

I recently watched a short section of a TV fortune-teller show where desperate people could dial in to get their questions asked. The lady who called asked "Will I find a new love?", and the fortune-teller looked into her cards (very slowly, of course, given the 3 EUR/minute rate), then slowly lifted her head, looked straight into the camera and articulated her findings: "I see a definite Maybe."

I guess this awesome universal answer also works for my opening question. There simply is no ideal solution. I like the item/itemprop idea, but I'd need to add a hack for markup values (e.g. by adding a item="...XMLLiteral" container and then converting these items to XML nodes. But then I can just add a simpler hack to my RDFa extractor to deep-parse XMLLiterals). This doesn't justify a whole new spec. The copy/paste problem is not too urgent any more, as Linked Data enables nifty copy-by-reference instead of copy-by-value.

It's generally a little surprising to see that Microdata proposal. For months, the HTML5 opinion makers argued against user-defined markup structures, and now they created a completely new spec that not only extends RDFa's possibilities to identify resource types and relations, but also seems to introduce a redundant serialization for selected microformats.

Anyway, for the sake of convergence and less work, I think I still prefer (a subset of) RDFa, if only there was a way to get rid of CURIEs (who wants an abbreviation mechanism whose acronym can't even be properly expanded? ;). And an alternative for the validation pain could be a simple, locally installed validator, accessible through a Ubiquity script. When I think about it, I mainly just need well-formedness and some attribute checks. A Ubiquity script could directly show HTML errors and also extracted triples, and maybe even do some triple sanity checks, too. But then this setup would work for Microdata just as fine. Ah well..

Moving out of the shadow with RDFa

RDFa can help solve the "shadow semweb" problem
Ian Davis has written an interesting series of posts related to the problems arising from using fragment identifiers in resource URIs. Ian makes a lot of valid points, but I think misses an essential one. (With this post I'm breaking with a long tradition, I'm saying positive things about RDFa ;)

So, what's the problem, and how can RDFa help? Ian is discussing a lot of architectural things, and I'm sure there are issues and inconsistencies. But the practical problem he describes is based on the following WebArch principle:
The fragment identifies a portion of a representation obtained from a URI,
and its meaning changes depending on the type of representaion. [sic]
That means that you can't use "" as an HTML section identifier and as a non-document identifier (e.g. the person ben). Ian concludes that
You can have a machine readable RDF version or a human readable HTML
version but not both at the same time
and that this forces the structured web into a disregarded shadow of the human-readable web.

I think that conclusion is not correct. eRDF re-uses HTML's @id to establish resource identifiers, so it mixes document identifiers with non-doc ones, and this is an ambiguity problem indeed. RDFa, however, is a layer on top of HTML that introduces a dedicated mechanism for resource identification, the @about attribute (, and that's why it unfortunately needs an own DTD, but that's another story). From a WebArch POV, the design is clean, content-type-specific identifiers don't get mixed. I can unambiguously describe what "..ben#self" is meant to identify without the representation format playing a role. RDFa can re-purpose HTML's text nodes for RDF literals, and anchors for resource URIs, but apart from that, the HTML document is not much more than a (human-friendly) container.

So, you can serve HTML and machine-readable information in a single document, you just have to make sure that your resource URI fragments don't appear in HTML @ids. And now that we are back on the practical level: Any other ID generation mechanism can work, too. It's fairly easy to implement a URI generator for RDF extracted from a microformats-enabled HTML page without overloading resource IDs. I personally don't see a huge problem (again, practically), as all my applications work with triples, not with representations or encodings which are dealt with by the parsers and extractors.

One practical issue remains, though: Current browsers don't (natively) support navigating to RDF identifiers encoded in RDFa-, microformats-, or GRDDL-enabled HTML pages. You need an additional JavaScript lib to invoke appropriate scroll actions after a page URI with a (non-HTML) fragment identifier is loaded. That's a little annoying, but doable. I think fragment identifiers are valuable. They allow the description of multiple resources in a single document, and that's a handy feature. Whether that breaks Web architecture theory, dunno. Not for me, at least ;-)

A Comparison of Microformats, eRDF, and RDFa

An updated (and customizable) comparison of the different approaches for semantically enhancing HTML.
Update (2006-02-13): In order to avoid further flame wars with RDFa folks, I've adjusted the form to not show my personal priorities as default settings anymore (here they are if you are interested, it's a 48-42-40 ranking for MFs, eRDF, and RDFa respectively). All features are set to "Nice to have" now. As you can see, for these settings, RDFa gets the highest ranking (I *said* the comparison is not biased against RDFa!). If you disable the features related to domain-independent resource descriptions, MFs shine, if you insist on HTML validity, eRDF moves up, etc. It's all in the mix.

After a comment of mine on the Microformats IRC channel, SWD's Michael Hausenblas asks for the reason why I said that I personally don't like RDFa. Damn public logs ;) OK, now I have to justify that somehow without falling into rant mode again...

I already wrote a little comparison of Microformats, Structured Blogging, eRDF, and RDFa some time ago, sounds like a good opportunity to see how things evolved during the last 8 months. Back then I concluded that both eRDF and RDFa were preferred candidates for SemSol, but that RDFa lacked the necessary deployment potential due to not being valid HTML (as far as any widespread HTML spec is concerned).

I excluded the Structured Blogging initiative from this comparison, it seems to have died a silent death. (Their approach to redundantly embed microcontent in script tags apparently didn't convince the developer community.) I also excluded features which are equally available in all approaches, such as visible metadata, general support for plain literals, being well-formed, no negative effect on browser behaviour, etc.

Pretending to be constructive, and in order to make things less biased, I embedded a dynamic page item that allows you to create your own, tailored comparison. The default results reflect my personal requirements (and hopefully answer Michael's question). As your mileage does most probably vary, you can just tweak the feature priorities (The different results are not stored, but the custom comparisons can be bookmarked). Feel free to leave a comment if you'd like me to add more criteria.

No. Feature or Requirement Priority MFs eRDF RDFa
1 DRY (Don't Repeat Yourself) yes yes mostly
2 HTML4 / XHTML 1.0 validity yes yes no
3 Custom extensions / Vocabulary mixing no yes yes
4 Arbitrary resource descriptions no yes yes
5 Explicit syntactic means for arbitrary resource descriptions no no yes
6 Supported by the W3C partly partly yes
7 Follow DCMI guidelines no yes no
8 Stable/Uniform syntax specification partly yes yes
9 Predictable RDF mappings mostly yes yes
10 Live/Web Clipboard Compatibility yes mostly mostly
11 Reliable copying, aggregation, and re-publishing of source chunks. (Self-containment) mostly partly partly
12 Support for not just plain literals (e.g. typed dates, floats, or markup). yes no yes
13 Triple bloat prevention (only actively marked-up information leads to triples) yes yes no
14 Possible integration in namespaced (non-HTML) XML languages. no no yes
15 Mainstream Web developers are already adopting it. yes no no
16 Tidy-safety (Cleaning up the page will never alter the embedded semantics) yes yes no
17 Explicit support for blank nodes. no no yes
18 Compact syntax, based on existing HTML semantics like the address tag or rel/rev/class attributes. yes mostly partly
19 Inclusion of newly evolving publishing patterns (e.g. rel="nofollow"). yes no partly
20 Support for head section metadata such as OpenID or Feed hooks. no partly partly


Solution Points Missing Requirements
RDFa 35 -
eRDF 34 -
Microformats 33 -

Max. points for selected criteria: 60


Your requirements are met by RDFa, or eRDF, or Microformats.

Feature notes/explanations:

DRY (Don't Repeat Yourself)
  • RDFa: Literals have to be redundantly put in "content" attributes in order to make them un-typed.
HTML4 / XHTML 1.0 validity
  • RDFa: Given the buzz around the WHATWG, it's uncertain when (if at all) XHTML 2 or XHTML 1.1 modules will be widely deployed enough.
Explicit syntactic means for arbitrary resource descriptions
  • eRDF: owl:sameAs statements (or other IFPs) have to be used to describe external resources.
Supported by the W3C
  • MFs, eRDF: Indirectly supported by W3C's GRDDL effort.
Stable/Uniform syntax specification
  • MFs: Although MFs reuse HTML structures, the format syntax layered on top differs, so that each MF needs separate (though stable) parsing rules.
Predictable RDF mappings
  • MFs: Microformats could be mapped to different RDF structures, but the GRDDL WG will probably recommend fixed mappings.
Live/Web Clipboard Compatibility
  • eRDF, RDFa: Tweaks are needed to make them Live-Clipboard compatible.
Reliable copying, aggregation, and re-publishing of source chunks. (Self-containment)
  • MFs: Some Microformats (e.g. XFN) lose their intended semantics when regarded out of context.
  • eRDF/RDFa: Only chunks with nearby/embedded namespace definitions can be reliably copied.
Support for head section metadata such as OpenID or Feed hooks.
  • eRDF: Can support openID hooks.
  • RDFa: Will probably interpret any rel attribute.

Bottom line: For many requirement combinations a single solution alone is not enough. My tailored summary suggests for example that I should be fine with a combination of Microformats and eRDF. How does your preferred solution mix look like?

SeenOn - Timestamp or State of Mind?

fun stuff from #microformats, comments on e/RDF/a wrt to Microformats
<tommorris> Every time I see a movie from now on,
  I'm adding the IMDB URL to my FOAF file.
<briansuda> with what predicate?
<briansuda> seenOn, is that a timestamp or a state-of-mind?
(microformats(!) irc channel)

Now, who said RDF was less real-word-ish than microformats?

Related link (wrt to movies, not toxics): Microformats 80%, RDF 20% by Tom Morris about the longtail utility of (e)RDF(a). Wanted to state something like this for some time. After implementing a Microcontent parser (part of the next ARC release) that creates a merged triple set from eRDF and Microformats, I can't say anymore that MFs don't scale (even though making the meaning of nested formats explicit is sometimes tricky). I was really impressed by the amount of practical use cases covered by them (Listings and qualified review ratings even go beyond the demos I've seen in RDFer circles). However, there is still a lot of room for custom RDF extensions that can be used to extend microformatted HTML. Skill levels are just one of many longtail examples: They are currently not covered by hResume, but available in Uldis' CV vocab.

The important thing IMO is that RDFers should not forget to acknowledge the amazing deployment work of the MF community and focus on what they can add to the table (storage, querying, and mixing, as a start) instead of marketing RDF-in-HTML as an alternative, replacement, or otherwise "superior" (likewise the other way round, btw.). I think we also shouldn't overcharge the big content re-publishers. When maintainers of sites like LinkedIn or Eventful get bombed with requests to add different semantic serializations to their pages, they may hesitate to support any of them at all. For most of these mainstream sites, Microformats do the job just fine, and often better. Why should people for example have to specify namespaces when a simple, agreed-on rel-license does the trick already? (We could still use RDF to specify the license details, and even the license link is only a simple conversion away from RDF.)

Web Clipboard: Adding liveliness to "Live Clipboard" with eRDF, JSON, and SPARQL.

Combining Live Clipboard with eRDF and SPARQL
Some context: In 2004, Tim Berners-Lee mentioned a potential RDF Clipboard as a user model which allowed copying resource descriptions between applications. Depending on the type of the copied resource, the target app would trigger appropriate actions. (See also the ESW wiki and Danny's blog for related links and discussion.)

I had a go at an "RDF data cart" last year which allowed you to "1click"-shop resource descriptions while surfing a site. Before leaving, you could "check out" the collected resource descriptions. However, the functionality was limited to a single session, the resource pointers didn't use globally valid identifiers.

Then, a couple of months ago, Ray Ozzie announced Live Clipboard, which uses a neat trick to access the operating system's clipboard for Copy & Paste operations across web pages.

Last week, I finally found the time to combine the Live Clipboard trick with the stuff I'm currently working on: A Semantic Publishing Framework, Embeddable RDF, and SPARQL. If you haven't heard of the latter two: eRDF is a microformats-like way to embed RDF triples in HTML, SPARQL is the W3C's protocol and query language for RDF repositories.

What I came up with so far is a Web Clipboard that works similar to Live Clipboard (I'm actually thinking about making it fully compatible), with just a few differences:

  • Web Clipboard uses a hidden single-line text input instead of a textarea which seemed to be a little bit easier to insert into the document structure, and it makes it work in Opera 8.5. The downside is that input fields don't allow multi-line content to be pasted (which is not needed by Web Clipboard, but will be necessary if I want to add Live Clipboard compatibility)
  • Web Clipboard doesn't paste complete resource descriptions, but only pointers to those. This makes it possible to e.g. copy a resource from a simple list of person's names, and display full contact details after a paste operation. (See the demo for an example which does asynchronous calls to a SPARQL endpoint). This "pass by reference" enables things like distributed address books or calendars where changes at one place could be automatically updated in the other apps.
  • Instead of XML, Web Clipboard uses a small JSON object which can simply be evaluated by JavaScript applications, or split with a basic regular expression. The pasted object contains 1) a resource identifier, and 2) an endpoint where information about the identified resource is available. The endpoint information consists of a URL and a list of specifications supported by the endpoint.

Complete documentation is going to be up at the clipboard site, but I'll first see if I can make things Live Clipboard-compatible (and I'll be travelling for the rest of the week). Here is a simple explanation how the current SPARQL demo works:

Apart from adding a small javascript library and a CSS file to the page, I specified the clipboard namespace and a default endpoint to be used for any resource pointer embedded in the page (this is eRDF syntax):
<link rel="schema.webclip" href="" />
<link rel="webclip.endpoint" href="" />

Then I embedded a sparqlet that generates the list of Planet RDF bloggers (this is done server-side). The important thing is that the HTML contains eRDF hooks like this:
<div id="agent0" class="-webclip-Res">
  <span class="webclip-resID" title="_:bb1ed0e67fdb042619f2f20fdc479c3af_id2245787"></span>
  <span class="foaf-name">Bob DuCharme</span>
  <a rel="foaf-weblog" href=""> by Bob DuCharme</a>

Ideally, the resource ID (webclip:resID, here again in eRDF notation) is a URI or some other stable identifier. The queried endpoint, however, obviously couldn't find a URI for the rendered resource, so it only provided a bnode ID. This is ok for the SPARQL endpoint the clipboard uses, though. The "foaf:weblog" information could be used to further disambiguate the resource identifier, the demo doesn't use it, however.

(The nice thing about eRDF-encoded hooks is that the information can be read by any HTTP- and eRDF-enabled client, the clipboard functionality could be implemented without having to load the page in a browser.)

Now, when the page is displayed, an onload-handler instantiates a JavaScript Web Clipboard which automatically adds an icon for each resource identified by the "webclip:Res/webvlip:resID"-hooks.

When the icon is clicked, the resource pointer JSON object is created and can be copied to the system's clipboard. It currently looks like this (on a single line):
 resID : "_:bb1ed0e67fdb042619f2f20fdc479c3af_id2245787",
 endpoint: {
  url: "",
specs: [ "", "" ]
} }

We can see that the clipboard uses the default endpoint mentioned at the document level as the embedded hook didn't specify a resource-specific endpoint. We can also see that the endpoint supports two specs, namely the SPARQL protocol and JSONP.

When this JSON object is pasted to another clipboard section, the onpaste-handler can decide what to do. In the demo, any paste section will make an asynchronous On-Demand-JavaScript call to the resource's SPARQL endpoint to retrieve a custom resource representation. The "Latest blog post" section uses a pre-defined callback, but this can be overwritten (as e.g. done by the "Resource Description" section which uses a custom function to display results).

I've added a playground area to the clipboard site where you can create your own clipboard sections. Give it a try, it's not too complicated. You can even bookmark them.

Here is an example JavaScript snippet that adds a clipboard section to a clipboard-enabled page with an 'id="resultCountSection"' HTML element:
  id : "resultCountSection",
resIDVar : "myRes",
query : "SELECT ?knowee WHERE "+ "{"+ " ?myRes <> ?knowee . "+ "}"+ " LIMIT 50", callback : function(qr){ var rows=(qr.results["bindings"]) ? qr.results.bindings : []; var result="The pasted resource seems to know "+ rows.length+" persons."; /* update paste area */ this.item.innerHTML=result; /* refresh clipboard */ window.clipboard.activate(); } }); window.clipboard.activate();

Something like this is all that will be needed for the final clipboard. No microformats parsing or similar burdens (although you could use the Web Clipboard to process microformats). The Clipboard's definition of an endpoint is rather open, too. An RSS file could be considered an endpoint as well as any other Web-accessible document or API.

ARC Embedded RDF (eRDF) Parser for PHP

Announcing eRDF support for ARC + an eRDF/RDFa comparison
Update: The current RDFa primer is *not* broken wrt to WebArch, the examples were fixed two weeks ago. I've also removed the "no developer support" rant, just received personal support ;-)

While searching for a suitable output format for a new RDF framework, I've been looking at the various semantic hypertext approaches, namely microformats, Structured Blogging, RDFa, and Embedded RDF (eRDF). Each one has its pros and cons:

  • (+) widest deployment so far
  • (+) integrate nicely with current HTML and CSS
  • (-) centralized project, inventing custom microformats is discouraged
  • (-) don't scale, the number of MFs will either be very limited, or sooner or later there will be class name collisions

Structured Blogging:
  • (+) a large number of supporters (at least potentially, the supporters list is huge, although this doesn't represent the available tools)
  • (+) not a competitor, but a superset of microformats
  • (-) the metadata is embedded in a rather odd way
  • (-) the metadata is repeated
  • (-) the use cases are limited (e.g. reviews, events, etc)

  • (+) follows certain microformats principles (e.g. "Don't repeat yourself")
  • (+) freely extensible
  • (+) All resource descriptions (e.g. for events, profiles, products, etc.) can be extracted with a single transformation script
  • (+) RDF-focused
  • (+) W3C-supported
  • (-) Not XHMTL 1.0 compliant, it will take some time before it can be used in commercial products or picky geek circles
  • (-) The default datatype of literals is rdf:XMLLiteral which is wrong for most deployed properties

  • (+) follows the microformats principles
  • (+) freely extensible
  • (+) All resource descriptions (e.g. for events, profiles, products, etc.) can be extracted with a single transformation script
  • (+) uses existing markup
  • (+) XHTML 1.0 compliant
  • (+) RDF-focused
  • (-) Covers only a subset of RDF
  • (-) Does not support XML literals

So, both RDFa and eRDF seem like good candidates for embedding resource descriptions in HTML. The two are not really compatible, though, it is not easily possible to create a superset which is both RDFa and eRDF. However, my publishing framework is using a Wiki-like markup language (M4SH) which is converted to HTML, so I can add support for both approaches and make the output a configuration option. Maybe it's even possible to create a merged serialization without confusing transformers.

I'll surely have another look at RDFa when there is better deployment potential. For now, I've created a M4SH-to-eRDF converter (which is going to be available as part of the forthcoming SemSol framework), and an eRDF parser that can generate RDF/XML from embedded RDF. I've also added some extensions to work around (plain) eRDF's limitations, the main one being on-the-fly rewriting of owl:sameAs assertions to allow full descriptions of remote resources, e.g.
<div id="arc">
  <a rel="owl-sameAs" href=""></a>
  <a rel="doap-maintainer" href="#ben">Benjamin</a>
is automatically converted to
<> doap:maintainer <#ben>

The parser can be downloaded at the ARC site (documentation).
I've also put up a little demo service if you want to test the parser.

YARDFIXHTML - Yet Another RDF-In-XHTML proposal

Ian Davis introduced eRDF
Ian Davis proposes "Embedded RDF", a microformats-inspired path to metadata-enriched HTML. Unlike microformats, his approach can utilize a single generic transformation script instead of one transformation for each format (or micromodel if you prefer Danny Ayers' terminology), which is closer to RDF's idea of freely mixable vocabularies.

I had some hopes of RDF/A but stopped following its progress several months ago as it didn't seem to provide an easy way to really bridge the gap between HTML and RDF. My use case was (and is) to be able to markup html in a way which allows me to automatically (and without too much effort) generate context menus or tool-tips


No Posts found