finally a bnode with a uri

Posts tagged with: inference

Writing Inference Rules with SPARQLScript

SPARQLScript can be used for forward chaining, including string manipulations on the run.
In order to keep data structures in Semantic CrunchBase close to the source API, I used a 1-to-1 mapping between CrunchBase JSON keys and RDF terms (with only a few exceptions). This was helpful for people knowing the JSON API, but it wasn't easy to interlink the converted information with existing SemWeb data such as FOAF, or the various LOD sources.

SPARQLScript is already heavily used by the Pimp-My-API tool or the TwitterBot, but yesterday I added a couple of new features and finally had a go at implementing a (forward chaining) rule evaluator (for the reasons mentioned some time ago).

A first version ("LOD Linker") is installed on Semantic CB, with initially 9 rules (feel free to leave a comment here if you need some additional mappings). With SPARQLScript being a superset of SPARQL+, most inference scripts are not much more than a single INSERT + CONSTRUCT query (you can click on the form's "Show inference scripts" button to see the source code):
$ins_count = INSERT INTO <${target_g}>
  CONSTRUCT {?res a foaf:Organization } WHERE {
    { ?res a cb:Company }
    UNION { ?res a cb:FinancialOrganization }
    UNION { ?res a cb:ServiceProvider }
    # prevent dupes
    OPTIONAL { GRAPH ?g { ?res a foaf:Organization } }
    FILTER(!bound(?g))
  }
  LIMIT 2000
But with the latest SPARQLScript processor (ARC release 2008-09-12) you can run more sophisticated scripts, such as the one below, which infers DBPedia links from wikipedia URLs:
$rows = SELECT ?res ?link WHERE {
    { ?res cb:web_presence ?link . }
    UNION { ?res cb:external_link ?link . }
    FILTER(REGEX(?link, "wikipedia.org/wiki"))
    # prevent dupes
    OPTIONAL { GRAPH ?g { ?res owl:sameAs ?v2 } . }
    FILTER(!bound(?g))
  }
  LIMIT 500

$triples = "";
FOR ($row in $rows) {
  # extract the wikipedia identifier
  $id = ${row.link.replace("/^.*\/([^\/\#]+)(\#.*)?$/", "\1")};
  # construct a dbpedia URI
  $res2 = "http://dbpedia.org/resource/${id}";
  # append to triples buffer
  $triples = "${triples} <${row.res}> owl:sameAs <${res2}> . "
}

#insert
if ($triples) {
  $ins_count = INSERT INTO <${target_g}> { ${triples} }
}

(I'm using a similar script to generate foaf:name triples by concatenating cb:first_name and cb:last_name.)

Inferred triples are added to a graph directly associated with the script. Apart from a destructive rule that removes all email addresses, the reasoning can easily be undone again by running a single DELETE query against the inferred graph.

I'm quite happy with the functionality so far. What's still missing is a way to rewrite bnodes, I don't think that's already possible. But INSERT + CONSTRUCT will leave bnode IDs unchanged, so the inference scripts don't necessarily require URI-denoted resources.

Another cool aspect of SPARQLScript-based inferencing is the possibility to use a federated set of endpoints, each processing only a part of a rule. The initial DBPedia mapper above, for example, uses locally available wikipedia links. However, CrunchBase only provides very few of those. So I created a second script which can retrieve DBPedia identifiers for local company homepages, using a combination of local queries and remote ones against the DBPedia SPARQL endpoint (in small iterations and only for companies with at least one employee, but it works).

"Online Social Graph Consolidation" webinale Slides

Slides from my 2nd webinale 08 talk are online
I gave another talk at webinale2008, this one was about how SemWeb technology (XFN, RDF, FOAF, SPARQL, Inference) can help with the aggregation, integration, and consolidation of online social graph fragments spread across Web 2.0 services. Again, I tried to keep things demo-ish (using grawiki for Linked Data editing, and knowee for the integration and consolidation), so the slides themselves (available on slideshare) aren't too spectacular (and in german).

Archives/Search

YYYY or YYYY/MM
No Posts found

Feeds