finally a bnode with a uri

SPARQL, Blank nodes, and the OWA

SPARQL and SemWeb bnode issues
I ran into an issue yesterday while working on a little "Events" portlet. My RDF store contains a couple of aggregated or locally created event descriptions (ical:Vevent, conf:Event, conf:Conference, ...) and I wanted to generate a list of links, each pointing to a detailed view of the selected event.

It seems like a simple task, even with my limited, home-grown SPARQL implementations, but it isn't, as I learned yesterday.

Generating the list was OK, but when I tried to display the detail view, my query engine returned a whole bunch of unwanted triples for any event that was referenced by a bnode identifier. I was just about to curse my crappy SPARQL2SQL converter once again when I realized that this time it actually (and surprisingly ;) worked the way it should. Here is a sample query snippet that is generated when someone clicks on a link to a bnode-identified event:
SELECT DISTINCT *
WHERE {
  _:s70_bn3 ?p ?o
}
LIMIT 50
But what I get in return is not a set of predicates and objects associated with bnode _:s70_bn3 (the 3rd blank node in source no. 70), because named bnodes in SPARQL are treated as placeholders, similar to unbound variables. The query above returns the same result as:
SELECT DISTINCT ?p ?o
WHERE {
  ?s ?p ?o
}
LIMIT 50
With the current spec, it is not possible to reference a resource by a given bnode identifier.
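
To make the placeholder behaviour concrete, here is a minimal Python sketch of a single-pattern matcher. It is not my actual SPARQL2SQL code; the triples and function names are made up for illustration, and it ignores the rule that a label used twice in a query must bind to the same node.

```python
# Toy triple store: (subject, predicate, object) tuples. All data is made up.
TRIPLES = [
    ("_:s70_bn3", "rdf:type", "ical:Vevent"),
    ("_:s70_bn3", "ical:summary", "Workshop"),
    ("http://example.org/conf1", "rdf:type", "conf:Conference"),
]

def is_placeholder(term):
    """In a query pattern, both ?variables and _:bnode labels match anything."""
    return term.startswith("?") or term.startswith("_:")

def match(pattern, triples):
    """Return every triple that matches a single triple pattern."""
    results = []
    for triple in triples:
        if all(is_placeholder(p) or p == t for p, t in zip(pattern, triple)):
            results.append(triple)
    return results

# The bnode label in the pattern does not select the stored node with that label.
bnode_query = match(("_:s70_bn3", "?p", "?o"), TRIPLES)
variable_query = match(("?s", "?p", "?o"), TRIPLES)
```

Both calls return every triple in the store, which is exactly the effect I saw in the portlet.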

A bug in the spec? Not really. I think it's once again related to the Semantic Web's open world assumption (OWA) and the way the RDF infrastructure works: in RDF's open world, you may not consider two (or more) resources to be identical unless identity can be deduced from identical URIs or via OWL mechanisms such as Functional/InverseFunctional Properties (FPs/IFPs) or owl:sameAs statements. Bnodes are labels with a limited scope; they are not meant to be used consistently as identifiers on a global scale.
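
As a rough illustration of IFP-based identity, the following Python sketch groups nodes that share a value for an inverse functional property. Using foaf:homepage for events, the sample triples, and the function name are assumptions made for this example only.

```python
from collections import defaultdict

# Properties treated as inverse functional in this sketch (an assumption).
IFPS = {"foaf:homepage"}

# Made-up sample data: two sources describe the same event with different bnodes.
TRIPLES = [
    ("_:s70_bn3", "foaf:homepage", "http://example.org/workshop"),
    ("_:s12_bn1", "foaf:homepage", "http://example.org/workshop"),
    ("_:s12_bn2", "foaf:homepage", "http://example.org/other"),
    ("_:s70_bn3", "ical:summary", "Workshop"),
]

def co_referent_nodes(triples, ifps=IFPS):
    """Return sets of subjects that share the same value for an IFP."""
    groups = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            groups[(p, o)].add(s)
    # Only groups with more than one node tell us anything about identity.
    return [nodes for nodes in groups.values() if len(nodes) > 1]

merged = co_referent_nodes(TRIPLES)
```

Here the two bnodes with the same homepage end up in one group, while the label strings themselves play no role in the decision.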

Consequently, not allowing SPARQL to treat named bnodes as identifiers makes a lot of sense. For a scalable Semantic Web we'll need distributed stores and federated queries. Stable bnode labels can't be guaranteed across multiple queries. (In my case, the source no. 70 may have been refreshed after generating the list, so _:s70_bn3 may now reference a completely different node.)

OK, so much for the theory. But my use case is a very basic and probably common one, so there should definitely be a way to implement it. There are two approaches to bnode referencing which work without changing the SPARQL spec: basic identity reasoning, and automatic generation of URIs (i.e. turning bnodes into URI-identified nodes). Generating an unambiguous "handle" for a resource (approach 1) can be done with OWL as mentioned above. Auto-creating URIs also needs an identifying mechanism. It won't suffice to simply turn each bnode id into a URI, because bnode ids change, which can lead to redundant URIs for different resources. As stupid as it sounds: in changing graphs, you can only auto-create stable resource identifiers if the resources could already be (unambiguously) identified before. But many of the resource descriptions in my little collection of events lack these identifying criteria, which leaves me hoping for the DAWG to tweak the SPARQL spec. I'm not sure if they'll find a solution that doesn't lead to problems w.r.t. the OWA/Semantic Web architecture, but they could add support for explicitly specified bnodes and leave it to the query engine implementors to provide stable bnodes for a certain context such as a session, a cached graph, or some other useful scope.
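
The URI auto-creation approach could be sketched like this in Python: a URI is minted from an identifying value and never from the bnode label, so it survives a refresh of the source. The list of identifying properties, the base URI, and the function name are hypothetical, and descriptions without an identifying value get no URI at all.

```python
import hashlib

# Properties assumed to identify an event in this sketch (an assumption).
IDENTIFYING_PROPERTIES = ("foaf:homepage", "ical:uid")

def mint_uri(description, base="http://example.org/id/"):
    """Mint a URI from the first identifying value found.

    `description` maps predicates to values for one resource.
    Returns None when nothing identifying is available.
    """
    for prop in IDENTIFYING_PROPERTIES:
        value = description.get(prop)
        if value is not None:
            digest = hashlib.sha1(f"{prop} {value}".encode("utf-8")).hexdigest()
            return base + digest
    return None

# The same event before and after a refresh: the bnode label is not involved.
uri_before = mint_uri({"ical:uid": "event-42@example.org", "ical:summary": "Workshop"})
uri_after = mint_uri({"ical:uid": "event-42@example.org"})
no_uri = mint_uri({"ical:summary": "Workshop"})
```

The last call returns None, which is the situation many of my event descriptions are in.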

For my personal task, I am luckily both user and developer of the RDF tools, so I added a little hack to my SPARQL interface which I'll replace when there is a proper W3C recommendation. My first idea was to pass a list of local_bnode_ids to the SPARQL2SQL rewriter, but as I'll have to replace the hack later anyway, I went for something that required less (actually no) coding: my SPARQL parser has an incomplete, tolerant URI syntax checker, so a bnode id put in angle brackets will end up as an absolute URI in the rewriter. So the snippet above becomes:
SELECT DISTINCT *
WHERE {
  <_:s70_bn3> ?p ?o
}
LIMIT 50
and as my RDF store uses the same column for URIs and bnodes, everything now works just fine. Of course, this does not solve the problem of unstable bnode ids, but that's something I can live with for the moment.
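
Why the bracket trick works can be shown with a small Python and SQLite sketch. The single-table schema and the function name are assumptions for illustration, not my real store layout: once the bracketed term reaches the SQL layer as a plain string, it is compared against the subject column like any URI.

```python
import sqlite3

# Hypothetical schema: URIs and bnode ids share the same subject column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("_:s70_bn3", "rdf:type", "ical:Vevent"),
        ("_:s70_bn3", "ical:summary", "Workshop"),
        ("http://example.org/conf1", "rdf:type", "conf:Conference"),
    ],
)

def describe(conn, term):
    """Fetch predicates and objects for `term`, the text between < and >."""
    rows = conn.execute(
        "SELECT DISTINCT p, o FROM triple WHERE s = ? ORDER BY p LIMIT 50",
        (term,),
    )
    return rows.fetchall()

# <_:s70_bn3> is handled exactly like <http://example.org/conf1>.
event = describe(conn, "_:s70_bn3")
```

The lookup only stays correct as long as the stored label still refers to the same node, which is the stability caveat mentioned above.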

(Thanks once again to Dave Beckett for helping me with a precise answer and a pointer to the relevant discussion thread.)
