finally a bnode with a uri


SPARQL, Blank nodes, and the OWA

SPARQL and SemWeb bnode issues
I ran into an issue yesterday while working on a little "Events" portlet. My RDF store contains a couple of aggregated or locally created event descriptions (ical:Vevent, conf:Event, conf:Conference, ...) and I wanted to generate a list of links, each pointing to a detailed view of the selected event.

It seemed to be a simple task, even with my limited, home-grown SPARQL implementation, but it isn't, as I learned yesterday.

Generating the list was OK, but when I tried to display the results, my query engine returned a whole bunch of unwanted triples for any event that was referenced by a bnode identifier. I was just about to curse my crappy SPARQL2SQL converter once again when I realized that this time it actually (and surprisingly ;) worked the way it should. Here is a sample query snippet that is generated when someone clicks on a link to a bnode-identified event:
SELECT DISTINCT *
WHERE {
  _:s70_bn3 ?p ?o
}
LIMIT 50
But what I get in return is not the set of predicates and objects associated with bnode _:s70_bn3 (the 3rd blank node in source no. 70): in a SPARQL query pattern, a bnode label acts as a placeholder, like a variable that is never returned in the result. The query above returns the same result as:
SELECT DISTINCT ?p ?o
WHERE {
  ?s ?p ?o
}
LIMIT 50
With the current spec, it is not possible to reference a resource by a given bnode identifier.
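To see why the two queries return the same rows, here is a toy model in Python. It is a simplification, not a real SPARQL engine (one triple pattern, no parser), and the function names and sample triples are made up for illustration. Following the SPARQL spec, a blank-node label in a query pattern behaves like a variable that is not projected, so `_:s70_bn3` matches every subject in the store.

```python
# Toy model of one SPARQL triple pattern: ?vars and _:labels are both placeholders.
def is_placeholder(term):
    """In a query pattern, both ?variables and _:labels match any term."""
    return term.startswith("?") or term.startswith("_:")


def match(pattern, triples):
    """Return one binding dict per data triple that matches the triple pattern."""
    bindings = []
    for triple in triples:
        binding = {}
        for p_term, d_term in zip(pattern, triple):
            if is_placeholder(p_term):
                # A placeholder used twice must bind to the same value both times.
                if binding.setdefault(p_term, d_term) != d_term:
                    break
            elif p_term != d_term:
                break
        else:
            bindings.append(binding)
    return bindings


def select_distinct(pattern, triples, projection):
    """Project the bindings onto the given variables and drop duplicate rows."""
    rows = []
    for binding in match(pattern, triples):
        row = tuple(binding[v] for v in projection)
        if row not in rows:
            rows.append(row)
    return rows


# Hypothetical store content: two events identified only by bnodes.
store = [
    ("_:s70_bn3", "rdf:type", "ical:Vevent"),
    ("_:s70_bn3", "dc:title", "Conference A"),
    ("_:s12_bn1", "dc:title", "Meetup B"),
]

with_bnode = select_distinct(("_:s70_bn3", "?p", "?o"), store, ["?p", "?o"])
with_var = select_distinct(("?s", "?p", "?o"), store, ["?p", "?o"])
print(with_bnode == with_var)  # True: the bnode label did not restrict anything
```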

A bug in the spec? Not really. Once again, it's related (I think) to the Semantic Web's open world assumption (OWA) and the way the RDF infrastructure works: in RDF's open world, you can't consider two (or more) resources to be identical unless identity can be deduced from identical URIs or via OWL mechanisms such as functional/inverse functional properties (FPs/IFPs) or owl:sameAs statements. Bnode labels have a limited scope; they are not meant to be used as consistent identifiers on a global scale.
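Identity deduction via IFPs (often called "smushing") can be sketched in a few lines of Python. This is a simplified illustration, not an OWL reasoner: it only looks at inverse functional properties (no FPs, no owl:sameAs), and the property list is an assumption. Subjects that share a value for an IFP are merged into one canonical node, preferring a URI over a bnode label as the representative.

```python
from collections import defaultdict

# Assumed set of inverse functional properties (FOAF declares both as IFPs).
IFPS = {"foaf:homepage", "foaf:mbox_sha1sum"}


def smush(triples, ifps=IFPS):
    """Map every subject to a canonical node; subjects sharing an IFP value are merged."""
    parent = {}

    def find(node):
        while parent.get(node, node) != node:
            node = parent[node]
        return node

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Prefer URIs over bnodes as representative, then the smaller label.
            keep, drop = sorted((ra, rb), key=lambda n: (n.startswith("_:"), n))
            parent[drop] = keep

    holders = defaultdict(list)  # (ifp, value) -> subjects that carry it
    for s, p, o in triples:
        if p in ifps:
            holders[(p, o)].append(s)
    for subjects in holders.values():
        for other in subjects[1:]:
            union(subjects[0], other)
    return {s: find(s) for s, _, _ in triples}


# Two bnodes from different sources describe the same person via foaf:homepage.
graph = [
    ("_:a1", "foaf:name", "Andreas Harth"),
    ("_:a1", "foaf:homepage", "http://www.harth.org/andreas/"),
    ("_:b7", "foaf:homepage", "http://www.harth.org/andreas/"),
    ("_:c3", "foaf:name", "Someone Else"),
]
canon = smush(graph)
print(canon["_:b7"])  # _:a1
```

Note that `_:c3` stays on its own: without an IFP value, nothing allows it to be identified with anything else.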

Consequently, not allowing SPARQL to treat named bnodes as identifiers makes a lot of sense. For a scalable Semantic Web we'll need distributed stores and federated queries. Stable bnode labels can't be guaranteed across multiple queries. (In my case, the source no. 70 may have been refreshed after generating the list, so _:s70_bn3 may now reference a completely different node.)

OK, so much for the theory. But my use case is a very basic and probably common one, so there should definitely be a way to implement it. There are two approaches to bnode referencing that work without changing the SPARQL spec: basic identity reasoning, and automatic generation of URIs (i.e. turning bnodes into URI-identified resources). Generating an unambiguous "handle" for a resource (approach 1) can be done with OWL, as mentioned above. Auto-creating URIs (approach 2) also needs an identifying mechanism. It won't suffice to simply turn each bnode id into a URI, as bnode ids change, which can lead to several URIs for the same resource, or to one URI that ends up pointing to different resources. As stupid as it sounds: in changing graphs, you can only auto-create stable resource identifiers if the resources could already be (unambiguously) identified before.

But many of the resource descriptions in my little collection of events lack these identifying criteria, which leaves me hoping for the DAWG to tweak the SPARQL spec. I'm not sure if they'll find a solution that doesn't lead to problems w.r.t. the OWA/Semantic Web architecture, but they could add support for explicitly specified bnodes and leave it to query engine implementors to provide stable bnodes for a certain context such as a session, a cached graph, or some other useful scope.
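The "you can only mint stable URIs for resources that are already identifiable" point can be made concrete with a small Python sketch. It assumes a hypothetical namespace (`http://example.org/id/`) and an assumed list of identifying properties; the URI is derived from an identifying property value rather than from the bnode label, so it survives a refresh that relabels the bnodes, while a node without such a value gets no URI at all.

```python
import hashlib

# Assumptions: which properties count as identifying, and the namespace for minted URIs.
IFPS = ("foaf:homepage", "foaf:mbox_sha1sum")
BASE = "http://example.org/id/"  # hypothetical


def mint_uri(node, triples, ifps=IFPS, base=BASE):
    """Derive a URI from an identifying property value of `node`, or return None.

    The URI depends only on the (property, value) pair, not on the bnode label,
    so it stays the same when a source is refreshed and its bnodes are relabeled.
    """
    keys = sorted((p, o) for s, p, o in triples if s == node and p in ifps)
    if not keys:
        return None  # no identifying criteria: a minted URI would not be stable
    prop, value = keys[0]  # deterministic pick if several identifying values exist
    digest = hashlib.sha1(f"{prop} {value}".encode("utf-8")).hexdigest()
    return base + digest


# The same description before and after a refresh that relabeled the bnode.
before = [("_:s70_bn3", "foaf:homepage", "http://www.harth.org/andreas/")]
after = [("_:s70_bn9", "foaf:homepage", "http://www.harth.org/andreas/")]
print(mint_uri("_:s70_bn3", before) == mint_uri("_:s70_bn9", after))  # True

# An event description without identifying properties cannot get a stable URI.
print(mint_uri("_:s70_bn4", [("_:s70_bn4", "rdf:type", "ical:Vevent")]))  # None
```

One caveat of this design: if a node later gains an identifying value that sorts before the current one, its minted URI changes, so a real implementation would need a fixed precedence among identifying properties.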

For my personal task, I am luckily both user and developer of the RDF tools, so I added a little hack to my SPARQL interface, which I'll replace when there is a proper W3C recommendation. My first idea was to pass a list of local_bnode_ids to the SPARQL2SQL rewriter, but as I'll have to replace the hack later anyway, I went for something that required less (actually no) coding: my SPARQL parser has an incomplete, tolerant URI syntax checker, so a bnode id put in angle brackets ends up as an absolute URI in the rewriter. The snippet above thus becomes:
SELECT DISTINCT *
WHERE {
  <_:s70_bn3> ?p ?o
}
LIMIT 50
and as my RDF store uses the same column for URIs and bnodes, everything now works just fine. Of course, this does not solve the problem of unstable bnode ids, but that's something I can live with for the moment.
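A rough Python/SQLite sketch of why the hack works. The table layout, the term handling, and the `describe()` helper are assumptions for illustration, not the actual SPARQL2SQL code. Because URIs and bnode ids share one column, a bracketed term becomes an equality condition that also matches a stored bnode id, while an unbracketed bnode label adds no condition at all.

```python
import re
import sqlite3

# Hypothetical schema: a single column `s` holds both URIs and bnode ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("_:s70_bn3", "rdf:type", "ical:Vevent"),
        ("_:s70_bn3", "dc:title", "Conference A"),
        ("_:s12_bn1", "dc:title", "Meetup B"),
    ],
)


def subject_filter(term):
    """Translate the subject of a pattern into an SQL condition.

    <...> is a fixed IRI, even when it wraps a bnode id (the hack);
    ?vars and _:labels are placeholders and add no condition.
    """
    m = re.fullmatch(r"<(.+)>", term)
    if m:
        return "WHERE s = ?", (m.group(1),)
    return "", ()


def describe(subject, limit=50):
    """Rough equivalent of SELECT DISTINCT ?p ?o WHERE { subject ?p ?o } LIMIT n."""
    where, params = subject_filter(subject)
    sql = f"SELECT DISTINCT p, o FROM triples {where} ORDER BY p, o LIMIT ?"
    return conn.execute(sql, params + (limit,)).fetchall()


print(describe("_:s70_bn3"))    # all three (p, o) pairs in the store
print(describe("<_:s70_bn3>"))  # only the two pairs of _:s70_bn3
```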

(Thanks once again to Dave Beckett for helping me with a precise answer and a pointer to the relevant discussion thread.)

Handling blank nodes in RDF editing forms

User-friendly RDF editing.
One of the bigger problems when designing editing/creation forms as a front-end to an RDF store is the support for blank nodes. Although it's easy to offer a "don't auto-assign a URI to this resource" checkbox when a new resource is created (i.e. allow blank subjects), things get more complicated with blank objects.

Let's assume someone is creating a FOAF description of himself. As mentioned above, it's straightforward to offer a simple form with a class drop-down (to select foaf:Person) and a checkbox to mark the new resource as "blank". After adding some basic attributes (name, etc.), the user may now want to add his date of birth. A common way to do this is to link the person resource via the bio:event property to a blank bio:Birth resource. The date of birth is then assigned to the birth resource (see example below).
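<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:bio="http://purl.org/vocab/bio/0.1/">
  <foaf:Person>
    <foaf:name>Benjamin Nowack</foaf:name>
    <bio:event>
      <bio:Birth>
        <bio:date>1973-08-14</bio:date>
      </bio:Birth>
    </bio:event>
  </foaf:Person>
</rdf:RDF>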
Another example is the description of foaf:knows relations:
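<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Benjamin Nowack</foaf:name>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Andreas Harth</foaf:name>
        <foaf:homepage rdf:resource="http://www.harth.org/andreas/"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>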

The first approach that comes to mind is to simply let users create additional blank nodes (a bio:Birth or foaf:Person in this case), and then to offer them a way to link the base resource to the newly created ones. However, this solution has two UI issues:

First, we can't use simple HTML drop-down lists to pick a resource, as the RDF store may contain hundreds or thousands of possibly fitting resources. We could restrict the choice to resources created by the current user, i.e. the user can only relate his/her "own" resources to each other. But this can still lead to scalability problems (imagine Marc Canter entering his foaf:knows relations ;), and furthermore, we end up with lots of redundant information in the RDF store.

The second problem is that for simple cases like the birthday one, the user has to do a lot of clicking and typing.

But how can we implement these use cases in a generic way without putting too much of the burden on the user?
I've now spent several weeks building an editing front-end for OWLchestra's RDF store. Unfortunately, I didn't manage to come up with a solution that is as easy to use as e.g. Leigh Dodds' foaf-a-matic, but that also allows creating forms completely from an OWL model, and that scales. However, here is my approach for the editing front-end of the new semanticweb.org portal (I'm finally almost there):

The editing form for relations (i.e. owl:ObjectProperties) allows users to create blank nodes by simply not providing a URI for the related resource. Additionally, it's possible to reference a resource by description. (This is a common pattern in FOAF where you say that you know someone who is a person with a homepage foo. FOAF offers a bunch of uniquely identifying properties which enable identification of blank resources afterwards.) So it is at least possible now to create the types of relations mentioned above in a single step (see screenshot below).
screenshot: add relation
After selecting the relation, the list of available properties for the related blank node is created from the underlying OWL model. The "Object type" is also pre-selected, but can be modified (in the screenshot above, it could be set to bio:Birth). To keep people from using the "reference by description" forms to fully describe resources (I'm still trying to keep redundancy low), the number of possible properties for the linked resource is limited to 4.
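The server side of such a form can be sketched as a function that turns one submission into triples. This is a minimal sketch, assuming the form delivers the subject, the chosen property and object type, an optional object URI, and up to 4 property/value pairs; the function name, parameters, and the `_:formN` labeling scheme are hypothetical. Leaving out the URI produces a fresh blank node that is described by the submitted pairs.

```python
import itertools

MAX_DESCRIPTION_PROPS = 4  # the limit on "reference by description" properties
_bnode_counter = itertools.count(1)


def relation_triples(subject, prop, object_type, object_uri=None, description=None):
    """Turn one submitted "add relation" form into triples.

    Without an object URI, a fresh blank node is created for the related
    resource and described by at most MAX_DESCRIPTION_PROPS property/value pairs.
    """
    description = description or {}
    if len(description) > MAX_DESCRIPTION_PROPS:
        raise ValueError(f"at most {MAX_DESCRIPTION_PROPS} properties allowed")
    obj = object_uri or f"_:form{next(_bnode_counter)}"
    triples = [(subject, prop, obj), (obj, "rdf:type", object_type)]
    triples.extend((obj, p, v) for p, v in description.items())
    return triples


# The two examples from above, each created in a single step.
knows = relation_triples(
    "_:me", "foaf:knows", "foaf:Person",
    description={"foaf:name": "Andreas Harth",
                 "foaf:homepage": "http://www.harth.org/andreas/"},
)
birth = relation_triples("_:me", "bio:event", "bio:Birth",
                         description={"bio:date": "1973-08-14"})
```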

I've added some JavaScript remote scripting stuff to reduce the response times, so when you are editing a relation, only the form is refreshed, not the whole page. It is also possible to edit only certain types of relations (by pre-setting a namespace or a property), and there are both single- and multi-edit modes for updating relations (e.g. "the first 10 foaf:knows relations sorted by modification time").

Although I was quite happy so far, entering all the identifying information of resources which are already described by lots of other people was not as usable as I'd have liked it to be. Why do I have to type in a person's email, weblog, name, etc., if the information is perhaps already in the RDF store and could be re-used?

I don't know if I had eaten too many chocolate bars today, but somehow I found myself working on a find-as-you-type feature to automate filling in the required fields of the form. Yay, I can imagine the funny conversation I'm going to have with the DERI guys:
them: Benjamin, are you MAD? That's going to kill any RDF store!
me: No worries, it worked fine on my test machine.
them: But how many triples are in your test store, dude?
me: More than 900, almost 1000! It scales just fine!
them: ...
me: Hey, where are you going? What's wrong? Hello..?
But if we forget the back-end for a second, the utility of such an inline popup to select from is really great. I've added a delay to the execution, which reduces the server hammering, and even if you had to wait 10 seconds for the result, that would still be faster than typing in the information by hand. Here is how it works:

The user selects the type of the related resource and then starts typing in the "Find object" field. As soon as he pauses, a JavaScript function is invoked which sends the selected class id and the typed letters to the server. The server script uses this information to query the RDF store for resources of the given type with object values matching these letters. In a second step, the RDF store is queried again to retrieve the labels, uniquely identifying properties, and seeAlso links for the resources found in step one.
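In the browser, this "wait until the user pauses" behaviour is typically a timer that is reset on every keystroke (done in JavaScript in the setup described here). To stay in one language, here is a Python model of which keystrokes actually end up triggering a request; the function name and the delay value are assumptions, since the post doesn't state the delay.

```python
def requests_sent(keystrokes, delay):
    """Return the field contents that would be sent to the server.

    keystrokes: (time_in_seconds, field_text) pairs in typing order.
    A keystroke triggers a request only if the next one comes at least
    `delay` seconds later (or never), i.e. when the user pauses.
    """
    sent = []
    for i, (t, text) in enumerate(keystrokes):
        is_last = i + 1 == len(keystrokes)
        if is_last or keystrokes[i + 1][0] - t >= delay:
            sent.append(text)
    return sent


typing = [(0.0, "A"), (0.2, "An"), (0.4, "And"), (1.5, "Andr"), (1.6, "Andre")]
print(requests_sent(typing, delay=0.5))  # ['And', 'Andre']
```

Five keystrokes cause only two server requests here, which is the point of the delay.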

In order to not kill the server, the result list is limited to 5 entries. Furthermore, the bits which require inferencing or model operations (finding possible label properties for a given class, getting a list of available inverse functional properties, including subclasses) are stored in the user's session object, so they slow down the machine only during the first request.

After retrieving the information from the store, a JavaScript-friendly result is generated and sent back to the browser, where another script updates the list of matching resources.
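A minimal server-side sketch of this flow in Python, working on an in-memory list of triples instead of a real store. Assumptions: case-insensitive substring matching on object values, no subclass handling, and hypothetical stand-in functions (`label_properties`, `identifying_properties`) for the model operations that, as described above, are cached in the user's session. The result list is capped at 5 entries and returned as JSON for the browser-side script.

```python
import json

LIMIT = 5  # maximum number of entries in the result list


# Hypothetical stand-ins for the expensive model operations (inferencing,
# subclass lookup); a real implementation would consult the OWL model.
def label_properties(cls):
    return ["foaf:name", "rdfs:label"]


def identifying_properties(cls):
    return ["foaf:homepage", "foaf:mbox_sha1sum"]


def class_info(session, cls):
    """Compute per-class model data once and keep it in the user's session."""
    cache = session.setdefault("class_info", {})
    if cls not in cache:
        cache[cls] = {"labels": label_properties(cls),
                      "ifps": identifying_properties(cls)}
    return cache[cls]


def find_as_you_type(session, triples, cls, letters):
    info = class_info(session, cls)
    needle = letters.lower()
    # Step 1: resources of the given type with an object value matching the letters.
    typed = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    found = []
    for s, p, o in triples:
        if s in typed and p != "rdf:type" and needle in o.lower() and s not in found:
            found.append(s)
            if len(found) == LIMIT:
                break
    # Step 2: labels, identifying properties, and seeAlso links for the hits.
    wanted = set(info["labels"]) | set(info["ifps"]) | {"rdfs:seeAlso"}
    results = []
    for resource in found:
        details = {}
        for s, p, o in triples:
            if s == resource and p in wanted:
                details.setdefault(p, []).append(o)
        results.append({"id": resource, "details": details})
    # A JavaScript-friendly (JSON) payload for the browser-side script.
    return json.dumps(results)


session = {}
store = [
    ("_:p1", "rdf:type", "foaf:Person"),
    ("_:p1", "foaf:name", "Andreas Harth"),
    ("_:p1", "foaf:homepage", "http://www.harth.org/andreas/"),
    ("_:p2", "rdf:type", "foaf:Person"),
    ("_:p2", "foaf:name", "Someone Else"),
]
print(find_as_you_type(session, store, "foaf:Person", "andr"))
```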

screenshot: find as you type
In the ideal case, the server sends sufficient information to help the user choose the right resource. Clicking on one of the entries auto-completes the form:

screenshot: auto-fill

That's as usable as I could get it, but still far away from what we are used to from "normal" HTML forms. Maybe it's just that RDF is too generic (subject-predicate-object) to be stripped down to simple forms (if we don't want them to look like a direct view on the triple store), I don't know.

Well, there is still the possibility to create custom forms and to do the RDF mapping in a separate step. But that's a different story and for now, I'll stick to the generic approach. User feedback may well change this, though ;)
