finally a bnode with a uri

poshRDF - RDF extraction from microformats and ad-hoc markup

poshRDF is a new attempt to extract RDF from microformats and ad-hoc markup.
I've been thinking about this since Semantic Camp, where I had an inspiring dialogue with Keith Alexander about semantics in HTML. We were wondering about the feasibility of a true microformats superset, where existing microformats could be converted to RDF without the need to write a dedicated extractor for each format. This was also about the time when "scoping" and context issues around certain microformats started to be discussed. What happens, for example, with other people's XFN markup, aggregated in a widget on my homepage? Does it affect my social graph as seen by XFN crawlers? Can I reuse existing class names for new formats, or would that confuse parsers and authors? Stuff like that.

A couple of days ago I finally wrote up this "poshRDF" idea on the ESW wiki and started on an implementation for paggr widgets, which are meant to expose machine-readable data from RDFa, from microformats, and also from user-defined, ad-hoc formats, in an efficient way. poshRDF can enable single-pass RDF extraction for a whole set of formats. Previously, my code had to walk through the DOM multiple times, once for each format.
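
To illustrate the single-pass idea, here is a rough Python sketch. It is not the ARC code: the TERMS table, the SinglePassExtractor class and the extract helper are all made up for this example, and it assumes well-formed markup. The point is only that one merged term table lets a single walk over the markup collect values for several formats at once, instead of one walk per format.

```python
from html.parser import HTMLParser

# Hypothetical merged term table: one lookup covers several formats.
# Keys are @class/@rel tokens, values are (format, property) pairs.
TERMS = {
    "fn": ("hcard", "fn"),
    "summary": ("hcalendar", "summary"),
    "tag": ("rel-tag", "tag"),
}


class SinglePassExtractor(HTMLParser):
    """Collects (format, property, value) tuples in one pass over the markup."""

    def __init__(self):
        super().__init__()
        self.found = []          # extracted (format, property, value) tuples
        self.open_elements = []  # stack of (tag, terms, text chunks)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Look at both @class and @rel tokens of the element.
        tokens = (attributes.get("class") or "").split()
        tokens += (attributes.get("rel") or "").split()
        terms = [TERMS[token] for token in tokens if token in TERMS]
        self.open_elements.append((tag, terms, []))

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _tag, _terms, chunks in self.open_elements:
            chunks.append(data)

    def handle_endtag(self, tag):
        # Pop until the matching start tag; this also closes void
        # elements (e.g. <br>) that never get an end tag of their own.
        while self.open_elements:
            open_tag, terms, chunks = self.open_elements.pop()
            value = "".join(chunks).strip()
            self.found.extend((fmt, prop, value) for fmt, prop in terms)
            if open_tag == tag:
                break


def extract(html):
    """Run the single pass and return the collected tuples."""
    parser = SinglePassExtractor()
    parser.feed(html)
    parser.close()
    return parser.found
```

Supporting another format in this sketch means adding rows to TERMS, not adding another pass. It ignores subjects and scoping entirely, so it only shows the "one walk, many formats" part.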

A poshRDF parser is going to be part of one of the next ARC revisions. I've just put up a site at poshrdf.org to host the dynamic posh namespace. For now the site links to a possibly interesting by-product: a unified RDF/OWL schema for the most popular microformats: xfn, rel-tag, rel-bookmark, rel-nofollow, rel-directory, rel-license, hcard, hcalendar, hatom, hreview, xfolk, hresume, address, and geolocation. It's not 100% correct, because poshRDF is after all still a generic mechanism and doesn't cover format-specific interpretations. But it might be interesting for implementors. The schema could be used to generate dedicated parser configurations. It also describes the typical context of class names so that you can work around scoping issues (e.g. the XFN relations are usually scoped to the document or to embedded hAtom entries).

I hope to find some time to build a JSON exporter and microformats validator on top of poshRDF in the not too distant future. Got to move on for now, though. Dear Lazyweb, feel free to jump in ;)

Comments and Trackbacks

FWIW, I have thought about the same thing quite often :)

But I'm not sure I understand: how do we manage uF which do not use the class attribute (xfn, license), or those where the class value itself has to be considered (the listing type in hListing, for example)?

And: what about nesting?
Comment by gabriele renzi on 2008-11-12 11:22:20 UTC
My current parser supports both @class and @rel. There are very few conflicting values ("contact" is one, which can be an XFN relation or an hResume vcard) and I haven't gotten any wrong triples yet. I did have to add a "rel-only" flag to my term definitions, though, as e.g. "date" is often used as @class in the wild, but also as @rel in XFN, and I didn't want triples for each @class=date. The flag is not part of the RDF Schema, but could be added.

Not sure about the class values that mark neither a relation nor a node (like the hListing actions you mention). Maybe specify them as booleans? They could then generate

_:hlisting123 mf:rent true ; mf:offer true .

The nesting is working quite nicely so far. For each term, you can define the scope (i.e. the possible containers). The poshRDF parser then uses the closest matching parent node as subject. This way, an hCard nested in an hReview nested in an hAtom entry will still produce the correct mf:reviewer relation (instead of mf:author). There might be cases where you want to re-use relations in multiple nested microformats. Those are not supported by poshRDF (which might not be a bad thing ;).
Comment by Benjamin Nowack on 2008-11-12 12:00:30 UTC
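
The closest-matching-container rule and the "rel-only" flag described in the comment above can be sketched roughly as follows in Python. This is an illustration only, not the poshRDF parser: the TERMS table, its "scope" and "rel_only" keys, the node ids and the resolve_subject function are assumptions made up for the example.

```python
# Hypothetical term definitions: each term lists the container types it
# may attach to ("scope") and whether it only counts when used in @rel.
TERMS = {
    "fn":       {"scope": {"vcard"}, "rel_only": False},
    "reviewer": {"scope": {"hreview"}, "rel_only": False},
    "author":   {"scope": {"hentry"}, "rel_only": False},
    # XFN "date": scoped to the document or an embedded hAtom entry,
    # and ignored when it shows up as a plain @class value.
    "date":     {"scope": {"document", "hentry"}, "rel_only": True},
}


def resolve_subject(term, attribute, ancestors):
    """Return the id of the closest ancestor container the term may attach to.

    `ancestors` lists the enclosing containers from outermost to innermost
    as (container_type, node_id) pairs. Returns None if the term is unknown,
    is rel-only but was found in another attribute, or has no matching
    container among the ancestors.
    """
    definition = TERMS.get(term)
    if definition is None:
        return None
    if definition["rel_only"] and attribute != "rel":
        return None
    # Walk from the innermost container outwards; the first match wins.
    for container_type, node_id in reversed(ancestors):
        if container_type in definition["scope"]:
            return node_id
    return None


# An hCard nested in an hReview nested in an hAtom entry.
entry_and_review = [
    ("document", "doc"),
    ("hentry", "_:entry1"),
    ("hreview", "_:review1"),
]
inside_card = entry_and_review + [("vcard", "_:card1")]

print(resolve_subject("reviewer", "class", entry_and_review))  # _:review1
print(resolve_subject("fn", "class", inside_card))             # _:card1
print(resolve_subject("date", "class", inside_card))           # None
print(resolve_subject("date", "rel", inside_card))             # _:entry1
```

Because each term names its own allowed containers, the "reviewer" element attaches to the hReview even though an hAtom entry sits further out, and a stray @class=date produces no triple at all.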
Sorry I did not see the reference to @rel in the wiki :)

FWIW in my stuff I considered listing types as values of the listing (<> listingType: listing:offer) but I'm not sure it's sensible. In your framework it would maybe mean marking the element as rdf-s rdf-p rdf-o, which seems ugly.

Or maybe they should be managed as subproperties of a generic relation (as XFN)?

And it's interesting to see that we also agree on defining entity/property groups to ease nested parsing: it means I'm not being completely stupid :) (I do not have single-pass parsing though, because I put definitions in separate files so they can be used/extended independently)


Thanks for the mind food :)
Comment by gabriele renzi on 2008-11-13 17:49:38 UTC
Gabriele, likewise! (re mind food :)
Comment by Benjamin Nowack on 2008-11-14 12:25:46 UTC