finally a bnode with a uri

Posts tagged with: parser

poshRDF - RDF extraction from microformats and ad-hoc markup

poshRDF is a new attempt to extract RDF from microformats and ad-hoc markup
I've been thinking about this since Semantic Camp where I had an inspiring dialogue with Keith Alexander about semantics in HTML. We were wondering about the feasibility of a true microformats superset, where existing microformats could be converted to RDF without the need to write a dedicated extractor for each format. This was also about the time when "scoping" and context issues around certain microformats started to be discussed (What happens for example with other people's XFN markup, aggregated in a widget on my homepage? Does it affect my social graph as seen by XFN crawlers? Can I reuse existing class names for new formats, or do we confuse parsers and authors then? Stuff like that).

A couple of days ago I finally wrote up this "poshRDF" idea on the ESW wiki and started on an implementation for paggr widgets, which are meant to efficiently expose machine-readable data from RDFa and microformats, but also from user-defined, ad-hoc formats. poshRDF can enable single-pass RDF extraction for a set of formats. Previously, my code had to walk through the DOM multiple times, once for each format.
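To illustrate the single-pass idea, here is a minimal Python sketch (not the ARC implementation, which is PHP). It assumes a hypothetical lookup table, `CLASS_MAP`, that merges the class names of several formats, so one walk over the markup can serve all of them. The table entries, the `extract` helper, and the tuple output are all made up for illustration, and the sketch ignores nesting, scoping, and `rel` attributes.

```python
from html.parser import HTMLParser

# Hypothetical merged configuration: class name -> (format, property).
# The entries are illustrative only, not taken from the poshRDF schema.
CLASS_MAP = {
    "fn": ("hcard", "fn"),
    "url": ("hcard", "url"),
    "summary": ("hcalendar", "summary"),
}

# Elements that never get an end tag; they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class SinglePassExtractor(HTMLParser):
    """Collects (format, property, text) tuples in one walk over the markup."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._stack = []  # per open element: the (format, property) pairs it matched

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append([CLASS_MAP[c] for c in classes if c in CLASS_MAP])

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        # Attribute the text to every format that claims the innermost element.
        for fmt, prop in self._stack[-1]:
            self.found.append((fmt, prop, text))


def extract(html):
    """Run the extractor once over the given markup (assumes properly closed tags)."""
    parser = SinglePassExtractor()
    parser.feed(html)
    return parser.found
```

Calling `extract()` on a snippet that mixes hCard and hCalendar class names yields properties for both formats from the same walk, which is the point of merging the per-format configurations up front.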

A poshRDF parser is going to be part of one of the next ARC revisions. I've just put up a site at poshrdf.org to host the dynamic posh namespace. For now the site links to a possibly interesting by-product: a unified RDF/OWL schema for the most popular microformats: xfn, rel-tag, rel-bookmark, rel-nofollow, rel-directory, rel-license, hcard, hcalendar, hatom, hreview, xfolk, hresume, address, and geolocation. It's not 100% correct; poshRDF is, after all, still a generic mechanism and doesn't cover format-specific interpretations. But it might be interesting for implementors. The schema could be used to generate dedicated parser configurations. It also describes the typical context of class names so that you can work around scoping issues (e.g. the XFN relations are usually scoped to the document or to embedded hAtom entries).

I hope to find some time to build a JSON exporter and microformats validator on top of poshRDF in the not too distant future. Got to move on for now, though. Dear Lazyweb, feel free to jump in ;)

An RDF Parser for Google's Social Graph API JSON

ARC gets a parser for JSON returned by Google's SG API
First, the usual credits to Morten (and also Dan), who suggested extracting RDF from Google's SG API results some time ago.

Some work will be needed for a complete mapping of the detailed information coming out of the API. Not only because the data is not always fully accurate (the API still thinks that Ian Davis and I are the same person) but also because the claims are document-oriented while most SG-related RDF vocabs are person-centric.

However, for any given URL somehow associated with a person, the API returns a set of identifiers that are very likely to lead to related data. So, for an RDF toolkit, these pointers are often already sufficient to send out its RDF extractors and enrich the local dataset. The SG API Parser that has now been added to ARC (revision 2008-07-15) is still pretty basic, but it will generate rdfs:seeAlso triples with the canonical_mapping's value as subject and every mentioned HTTP identifier as object.
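The mapping can be sketched roughly as follows, in Python rather than ARC's PHP. This is not ARC's code: apart from `canonical_mapping`, the JSON layout (a `nodes` object keyed by URL) is an assumption, as are the function names and the choice to skip a seeAlso triple that would point a subject at itself.

```python
import json

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def http_identifiers(value):
    """Yield every http(s) string found as a key or value anywhere in the structure."""
    if isinstance(value, str):
        if value.startswith(("http://", "https://")):
            yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from http_identifiers(key)
            yield from http_identifiers(item)
    elif isinstance(value, list):
        for item in value:
            yield from http_identifiers(item)


def see_also_triples(sg_json):
    """Return (subject, predicate, object) tuples for an SG API style JSON string."""
    data = json.loads(sg_json)
    # Assumption: all mentioned identifiers live somewhere under a "nodes" object.
    mentioned = set(http_identifiers(data.get("nodes", {})))
    triples = []
    for subject in data.get("canonical_mapping", {}).values():
        # Sorted for stable output; the subject itself is left out (a choice made here).
        for obj in sorted(mentioned - {subject}):
            triples.append((subject, RDFS_SEE_ALSO, obj))
    return triples
```

An RDF toolkit could then feed each object URL to its own extractors, which is all the basic parser needs to provide.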

I'm working on more low-level/direct RDF mappings for POSH formats such as XFN; those could simplify detailed triple extraction (w/o too much of the current person --homepage-> document indirection) from the API results.

Using the new parser in ARC is identical to working with any other syntax. The format detector will auto-include the necessary components. Just call
$parser->parse("http://socialgraph.apis.google.com/lookup?q=example.com&...")
or
$store->query("LOAD <http://socialgraph.apis.google.com/lookup?q=example.com&...>")

ARC RDF/XML PHP parser v.0.2.0 passes positive parser tests

ARC passes all 128 W3C RDF/XML positive parser tests
The ARC RDF/XML Parser now passes each of W3C's positive parser tests.

I've also written a little test script using the ARC Simple Model class which helped me identify issues with the parser and also generated the test results.

As far as I know, ARC is currently the only PHP-based parser that passes all of the 128 positive tests, but in case you need a validating parser, RAP is still the only choice. The only tests RAP doesn't pass correctly are those using a non-ASCII rdf:ID, or XML literals. (Not passing the latter ones could also be my fault, as RAP's N-Triples serializer is using a version of ARC's Unicode NFC encoder. Shame on me then..)
