Posts tagged with: sparql

SPO(G) in ARC

The Urban Dictionary describes SPOG as "super pimped out gangsta" or as "a weapon that (...) had a fusion reactor as a power source". Sorry to disappoint you: neither has become part of ARC. Nevertheless, the SPOG I mean is quite powerful, too. It is a constrained form of the SPARQL XML result format for SELECT queries, proposed by Morten Frederiksen a few months ago. SPOG enables streamed store backups/dumps, and as it is just another RDF serialization, it can be used for streamed loading as well. Support for SPOG was added in the latest revision (2008-07-02) and extends the store and endpoint components:
  • The store got a dump() method that stream-outputs SPOG from all quads, and a createBackup($path, $alternative_query) method to write a SPOG dump (or custom SPO(G) query result) to a local file
  • The SPARQL endpoint feature list accepts "dump" as a new read operation
  • The SPARQL endpoint accepts "DUMP" as a query type now ("DUMP" also works via the internal query() method)
  • The format detector now accepts SPOG XML as an RDF format, so SPARQL+ queries will work fine with LOAD <some-spog-file.srx>. (There is now a dedicated SPOG parser for streaming LOADs.)

These additions should simplify graph exchange and store replication quite a bit.
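Because SPOG is a constrained SPARQL XML result (presumably one result row per quad), a dump can be written out incrementally. Here is a rough JavaScript sketch of that idea, not ARC's PHP code. The variable names `s`, `p`, `o`, `g`, the term object shape, and the function names are assumptions for illustration; ARC's dump() output may differ in detail.

```javascript
// Rough sketch: write quads as a SPARQL XML result ("SPOG"-style).
// Assumptions (not from ARC): variable names s/p/o/g and the term shape
// { type: "uri" | "bnode" | "literal", value, lang?, datatype? }.

function escapeXml(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize one RDF term as a SPARQL XML result binding value.
function termToXml(term) {
  const v = escapeXml(term.value);
  if (term.type === "uri") return `<uri>${v}</uri>`;
  if (term.type === "bnode") return `<bnode>${v}</bnode>`;
  if (term.lang) return `<literal xml:lang="${escapeXml(term.lang)}">${v}</literal>`;
  if (term.datatype) return `<literal datatype="${escapeXml(term.datatype)}">${v}</literal>`;
  return `<literal>${v}</literal>`;
}

const VARS = ["s", "p", "o", "g"];

// Yield the document piece by piece, so a large store never has to be
// held in memory as one big string.
function* spogChunks(quads) {
  yield '<?xml version="1.0"?>\n';
  yield '<sparql xmlns="http://www.w3.org/2005/sparql-results#">\n';
  yield "  <head>" + VARS.map((n) => `<variable name="${n}"/>`).join("") + "</head>\n";
  yield "  <results>\n";
  for (const quad of quads) {
    const bindings = VARS.map(
      (n) => `<binding name="${n}">${termToXml(quad[n])}</binding>`
    ).join("");
    yield `    <result>${bindings}</result>\n`;
  }
  yield "  </results>\n</sparql>\n";
}
```

A caller would write each chunk to a file or HTTP response as it arrives, which is the streaming property that makes SPOG handy for backups and replication.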

Morten++ for the idea and an initial implementation.


SPARQLScript Teaser

I just managed to trick my experimental SPARQLScript parser into accepting simple IF-branches and placeholders. Here is an example of what is going to be possible with ARC soon (and yes, I know this snippet most probably won't excite anyone but me ;)
BASE <http://sparqlbot.semsol.org/data/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# set the endpoint
ENDPOINT <endpoint.php>

# feed still fresh?
$current = ASK FROM <graph-updates> WHERE {
  <http://planetrdf.com/index.rdf> dc:date ?date .
  FILTER (?date > ${now-1h})
}

# refresh feed and update graph log
IF (!$current) {
  DELETE FROM <http://planetrdf.com/index.rdf>
  LOAD <http://planetrdf.com/index.rdf>
  INSERT INTO <graph-updates> { <http://planetrdf.com/index.rdf> dc:date "${now}" }
} 

The fun thing about the whole SPARQLScript experiment is that the parser (so far) is still below 200 LOC. A lot can be re-used from the official SPARQL Grammar, e.g. IF-blocks are really just:
Script ::= ( Query | PrefixDecl | EndpointDecl | Assignment | IFBlock )*
IFBlock ::= 'IF' BrackettedExpression '{' Script '}'
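To illustrate how little is needed for these two rules, here is a recursive-descent sketch in JavaScript. It is not ARC's parser. It treats every non-IF statement as an opaque chunk that ends at a line break outside brackets and quotes (an assumption that holds for the example script above, not for SPARQLScript in general), and it skips `#` comment lines. The function names are hypothetical.

```javascript
// Hypothetical sketch (not ARC's parser) covering only:
//   Script  ::= ( Statement | IFBlock )*
//   IFBlock ::= 'IF' BrackettedExpression '{' Script '}'
// A Statement is opaque text ending at a newline outside (), {} and
// double quotes. Escaped quotes and single-quoted strings are not handled.
function parseScript(src) {
  let pos = 0;

  // Skip whitespace and '#' comment lines between statements.
  function skipSpaceAndComments() {
    for (;;) {
      while (pos < src.length && /\s/.test(src[pos])) pos++;
      if (src[pos] !== "#") return;
      while (pos < src.length && src[pos] !== "\n") pos++;
    }
  }

  // Read a bracketed span such as "(!$current)", respecting nesting and quotes.
  function readBalanced(open, close) {
    if (src[pos] !== open) throw new Error("Expected '" + open + "' at " + pos);
    const start = pos;
    let depth = 0;
    let inString = false;
    for (; pos < src.length; pos++) {
      const ch = src[pos];
      if (inString) {
        if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
      } else if (ch === open) {
        depth++;
      } else if (ch === close && --depth === 0) {
        pos++;
        return src.slice(start, pos);
      }
    }
    throw new Error("Unclosed '" + open + "' at " + start);
  }

  // Read an opaque statement (query, prefix, assignment, ...).
  function readStatement() {
    const start = pos;
    let depth = 0;
    let inString = false;
    for (; pos < src.length; pos++) {
      const ch = src[pos];
      if (inString) {
        if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
      } else if (ch === "{" || ch === "(") {
        depth++;
      } else if (ch === "}" || ch === ")") {
        if (depth === 0) break; // belongs to an enclosing construct
        depth--;
      } else if (ch === "\n" && depth === 0) {
        break;
      }
    }
    const text = src.slice(start, pos).trim();
    if (!text) throw new Error("Unexpected '" + src[pos] + "' at " + pos);
    return text;
  }

  function script(nested) {
    const items = [];
    for (;;) {
      skipSpaceAndComments();
      if (pos >= src.length) {
        if (nested) throw new Error("Missing '}' for IF block");
        return items;
      }
      if (src[pos] === "}") {
        if (!nested) throw new Error("Unexpected '}' at " + pos);
        pos++; // consume the closing brace of this IF block
        return items;
      }
      if (/^IF\b/.test(src.slice(pos, pos + 3))) {
        pos += 2;
        skipSpaceAndComments();
        const condition = readBalanced("(", ")");
        skipSpaceAndComments();
        if (src[pos] !== "{") throw new Error("Expected '{' at " + pos);
        pos++;
        items.push({ type: "if", condition, body: script(true) });
      } else {
        items.push({ type: "statement", text: readStatement() });
      }
    }
  }

  return script(false);
}
```

Run against the feed-refresh script, this yields four top-level statements followed by one `if` node whose body holds the DELETE, LOAD, and INSERT statements.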

Implementing the actual SPARQLScript processing engine is of course more work than the parser, but I'm making progress there, too.

Major ARC revision: Talis platform-alignment, Remote Store, SPARQLScript

The latest ARC release comes with a couple of non-trivial (but also not necessarily obvious) changes. The most significant one (as it involves ARC's resource indexes) is the alignment with the structures used by the Talis platform: ARC's parser output and its PHP and JSON formats are now directly processable by Talis' platform tools. The documentation has already been updated; you may have to adjust your code in a few places (basically just "s/val/value/" and "s/dt/datatype/").
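The adjustment is mostly a matter of renaming keys. As a rough JavaScript sketch, assuming a resource index of the form subject → predicate → list of value objects, older structures using `val` and `dt` could be migrated like this. The function name and the example data are hypothetical; only the two renames come from the post.

```javascript
// Hypothetical migration helper: rename the old `val`/`dt` keys of a
// resource index (subject -> predicate -> [value objects]) to the
// Talis-style `value`/`datatype`. All other keys are copied unchanged.
const RENAMED_KEYS = new Map([
  ["val", "value"],
  ["dt", "datatype"],
]);

function migrateIndex(oldIndex) {
  const newIndex = {};
  for (const [subject, props] of Object.entries(oldIndex)) {
    newIndex[subject] = {};
    for (const [predicate, objects] of Object.entries(props)) {
      newIndex[subject][predicate] = objects.map((obj) => {
        const out = {};
        for (const [key, value] of Object.entries(obj)) {
          out[RENAMED_KEYS.get(key) ?? key] = value;
        }
        return out;
      });
    }
  }
  return newIndex;
}
```

A Map is used for the rename table so that keys such as `constructor` can't accidentally pick up inherited object properties.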

The second major addition is a Remote Store component (documentation still to come) that is inspired by and based on Morten Frederiksen's great RemoteEndpointPlugin. The Remote Store works like Morten's plugin, but also supports SPARQL+'s LOAD, INSERT, and DELETE (i.e. write/POST) operations.

The third addition is also the reason why the Remote Store (which can be used as a SPARQL endpoint proxy) became a core component. Over the last few months I've been working on a draft for a SPARQL-based scripting language, and the latest ARC revision includes an early SPARQLScript parser and a SPARQLScript processor that can run a set of routines against remote SPARQL endpoints. What's still missing before this stuff becomes more usable (apart from documentation ;) is output templating and some other essential features such as loops. I do have an early prototype running in a local SPARQLBot version, but I probably won't have it online in time for tomorrow's Semantic Scripting Workshop (which I'll at least try to attend remotely). This is really powerful (and fun) stuff that will be available soon-ish. I can't wait to replace my hard-coded inferencer with a set of easily pluggable SPARQLScript procedures.

Other tweaks and changes include a very early hCalendar extractor and a couple of bug fixes that were reported by (among others) the SMOB project maintainers.

As usual, thanks to all who sent in patches, bug reports, feature requests, and stress-tested ARC. I think we're pretty close to a release candidate now :-)

"Online Social Graph Consolidation" webinale Slides

I gave another talk at webinale2008, this one about how SemWeb technology (XFN, RDF, FOAF, SPARQL, inference) can help with the aggregation, integration, and consolidation of online social graph fragments spread across Web 2.0 services. Again, I tried to keep things demo-ish (using Grawiki for Linked Data editing, and knowee for the integration and consolidation), so the slides themselves (available on slideshare) aren't too spectacular (and they're in German).

SPARQLBot 101

While SPARQLBot was mostly a fun hack for last week's SemanticCamp, there is still a lot of activity on the #sparqlbot channel (it actually seems to be increasing). More than 30 SPARQL commands have been created so far. Michael Hausenblas has now kindly written an introduction that gives a nice overview of the stuff that has been added to the command collection: SPARQLBot 101. Have fun, and thanks, Michael!

New ARC features: Triggers and MySQL extensions

The latest ARC revision got two new features: SPARQL Triggers and MySQL function extensions for SPARQL.

SPARQL Triggers

Triggers in ARC were suggested by Dan Brickley, who is experimenting with dynamically populated/updated group definitions. What you can now do in ARC is associate custom trigger classes with SPARQL query types. The triggers are then called automatically after a query of a registered type has been run, for example to refresh inferred graphs:
$config = array(
  ...
  'store_triggers' => array(
    /* register LOAD triggers */
    'load' => array('updateFriendsList', 'crawlXFNLinks'),
  ),
);
$ep = ARC2::getStoreEndpoint($config);
$ep->go();
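The mechanism can be pictured as a small registry that maps query types to callbacks and runs them once a query of that type has completed. The JavaScript sketch below mirrors the `'load' => array(...)` configuration in spirit only: ARC itself uses PHP trigger classes, and the class name, the `execute` callback, and the event object passed to each trigger are hypothetical.

```javascript
// Hypothetical trigger registry: query type -> list of callbacks that run
// after a query of that type has been executed.
class TriggeredStore {
  constructor(triggers = {}) {
    // e.g. { load: [updateFriendsList, crawlXFNLinks] }
    this.triggers = triggers;
  }

  // `execute` stands in for the real query engine. If it throws, no
  // triggers run, because they are only called after it returns.
  query(type, execute) {
    const result = execute();
    const key = type.toLowerCase();
    const callbacks = Object.prototype.hasOwnProperty.call(this.triggers, key)
      ? this.triggers[key]
      : [];
    for (const trigger of callbacks) {
      trigger({ type: key, result });
    }
    return result;
  }
}
```

Running triggers after the query (rather than before) means an inferred graph is always rebuilt from data that has actually been loaded.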

MySQL Extension Functions

Morten Frederiksen did it again. He sent in about 10 lines of code which he suggested adding to ARC's SQL rewriter. The effect? ARC suddenly has access to dozens of MySQL functions: CONCAT, CURDATE, MD5, UNIX_TIMESTAMP, and many more. A namespace for MySQL function URIs is now online, and queries look like this:
PREFIX mysql: <http://web-semantics.org/ns/mysql/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person WHERE {
  ?person foaf:givenname ?n1 ;
          foaf:family_name ?n2 .
  FILTER (mysql:concat(?n1, " ", ?n2) = "Alec Tronnick") .
}
I talked a little bit more about these things with Danny Ayers in a recent podcast.
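The idea is that a function IRI in the `mysql:` namespace can be turned into a native MySQL call by the SQL rewriter. The JavaScript sketch below only illustrates that mapping; it is not Morten's patch. The function name, the identifier check, and the assumption that arguments arrive already translated into SQL expressions are all hypothetical.

```javascript
const MYSQL_NS = "http://web-semantics.org/ns/mysql/";

// Hypothetical rewrite step: map a SPARQL function IRI in the mysql:
// namespace to a SQL function call. `sqlArgs` are assumed to be SQL
// expressions the rewriter has already produced for the arguments.
function rewriteFunctionCall(functionIri, sqlArgs) {
  if (!functionIri.startsWith(MYSQL_NS)) return null; // not a MySQL extension
  const name = functionIri.slice(MYSQL_NS.length);
  // Accept plain identifiers only, so the IRI cannot inject arbitrary SQL.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`Unsupported function name: ${name}`);
  }
  return `${name.toUpperCase()}(${sqlArgs.join(", ")})`;
}
```

For the query above, `mysql:concat(?n1, " ", ?n2)` would become a `CONCAT(...)` call over whatever column expressions the rewriter uses for `?n1` and `?n2`.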

Looking for paid (Semantic Web) Projects

Update 2: Yay, I think I'm safe for the next couple of months; I should have blogged much earlier. Now I'm starting to think we could really use a job site for SemWeb people.

Update: Ah, the blogosphere. I already received some replies. One to share: Aduna is looking for a Java Engineer.

About a year ago, I received some funds which allowed me to re-write the ARC toolkit, and also to bring Trice (a semantic web application framework for PHP) to production-readiness. However, Semantic Web Development is generally still very new, especially in the Web Agency market where I'm coming from. It's not that easy yet to keep things self-sustaining.

It may well be that I should blog less about bleeding-edge experiments and more about how RDF and SPARQL allow me to deploy extensible websites in a fraction of the time it used to take: "Release Early", "Data First", "Evolve on the Fly", and all those other patterns that SemWeb technology enables in a web development context.

Anyway, to keep things short: I'm actively (read: urgently ;-) looking for more paid projects. I'm a Web development all-rounder with particular interest in scripting languages and quite some experience in delivering RDF and frontend solutions (more details on my profile page). While it would of course be great to work on stuff where I can use my tools, I'm available for more general web development as well. I'm most productive when I can work from my office, but temporary travel is basically fine, too. Düsseldorf Airport is just minutes away.

Cheers in advance for suggestions,

Grawiki - A Wiki (and aggregator) for graph-shaped data

In case you watched the "DriftR" screencast I created in December: there is now a live version online. (I dropped the initial name because my blog posts suddenly showed up in CrunchBase. ;-)

Grawiki is a SPARQL-based data wiki, a little bit inspired by Freebase: less impressive, less feature-rich, less scalable and all that, but, well, open source, SemWeb-enabled, and decentralized (each Grawiki installation can import selected graphs from other ones; back-POSTing is in the works). As it seems that I forgot to write-protect the instance mentioned above, you can play with it if you like. You'll most probably encounter bugs: the built-in inferencer is still at alpha stage, and editing of consolidated bnodes is quite tricky to implement. I'll tweak things in a day or two.

With Grawiki, I think I finally have (the start of) a tool that could work nicely for ad-hoc RDF editing and aggregation (it can import RDF and certain microformats). Oh, and a personal URI, and a FOAF file. At last ;-)

I'm now considering the addition of RDFa injections as a possible next step. The current editor uses a home-grown mechanism to activate the editing hooks and stuff, which was easier to implement and debug in my XHTML 1.0 development environment. Stay tuned; a download site probably won't be up before next week, as I have to focus on urrrgent SWEO/knowee todos first...

ARC Remote Endpoint Plugin

OK, you're probably already wondering if Morten and I have a link exchange contract, but anyway: He just announced a plugin for ARC that provides "access to remote SPARQL endpoints as if they were local stores." Cool stuff :-)

SPARQL is a W3C Recommendation

I guess I already pushed out enough ARC spam today, so I'll keep things short: SPARQL is now a W3C Recommendation!

What I'm personally very happy about is the Implementation Survey which features two pure-PHP implementations*. This really opens the door for mainstream Web Developers to start exploring RDF and SPARQL on off-the-shelf hosted web servers. Everything I create these days (e.g. the ARC site, including the bots and archive generators there, or this blog) is powered by SPARQL. It's an amazing productivity booster as you never have to worry about complicated JOINs or evolving database schemas again. You can just code away and it's great fun to work with. Want more Testimonials? The Data Access Working Group collected quite a number of them from W3C member organizations.

* Don't let yourself be fooled by RAP's low report scores, their SPARQL engine is quite mature, they just didn't run the whole test suite.

RDF Tools - An RDF Store for WordPress

Together with Morten Frederiksen and Dan Brickley (who is revisiting his SparqlPress idea), I've created a WordPress extension (called "RDF Tools") that adds an (ARC-based) RDF Store and SPARQL Endpoint to the blogging system. The store is kept separate from the WP tables (i.e. it's not a wrapper), but you can use WP's nice admin screens to configure it (screenshot), and given the amount of developer-friendly hooks that WP offers, I'm curious what can be done now, possibly in combination with other extensions such as those Alexandre Passant is working on. It could perhaps also be handy as a deployment accelerator for knowee.

ARC Data Wiki Plugin

I'm blessed with a small but first-class community around ARC that helps me with bug reports, patches, encouraging feedback, and nifty ideas. One example of the latter was Morten Frederiksen's idea to allow ARC to be extended with third-party plugins. He even demonstrated its utility by enhancing the toolkit with a remote SPARQL endpoint for his named graph exchange work. ARC plugins are not bundled with the core codebase (which is meant to stay compact), but can easily be integrated into any ARC installation (developer documentation is now online, too).

My first own plugin was triggered by Tim Berners-Lee's suggestion to write a lightweight request handler for an RDF-powered Data Wiki, as described in a recent Tech Report (PDF) and already implemented with Algae. I had to tweak the SPARQL+ spec and ARC's Query Parser to make it compatible with Eric Prud'hommeaux's SPARQL/Update flavor. This had the nice side-effect that all three SPARQL Write proposals (SPARUL, SPARQL/Update, SPARQL+) now (almost) share a common subset for basic INSERTs and DELETEs. After these updates, writing the plugin itself became almost trivial.

The code is still experimental and limited, but it's available for download, together with setup instructions. The Data Wiki plugin doesn't require a database (unlike the other SPARQL components in ARC) and supports update calls sent by RDF editors such as the Tabulator. I've set up a demo RDF wiki and will try to add remote update functionality to my own editor (to be renamed) now as well. Hmmm, would be cool to have a selection of generic tools to collaboratively read from and write to shared RDF spaces one day.


LOAD, INSERT, and DELETE in ARC2 via SPARQL+

The new ARC site is coming along quite nicely. Last week I implemented two (low-level) agents that log IRC conversations and mails to ARC-DEV. RDF and SPARQL make such things incredibly easy. Today, I started writing documentation for the preview release of ARC2. One of the core changes from ARC1 is the removal of the API class for inserts and deletes in favour of an extended SPARQL, called SPARQL+, which enables aggregates, LOAD, INSERT, and DELETE without the need for major query engine code additions.

LOAD is compatible with the LOAD operation introduced in the SPARUL proposal:
LOAD <http://example.com/> INTO <http://example.com/archive>
INSERT and DELETE are different, though. They re-use the LOAD and CONSTRUCT handlers which simplified the implementation and will hopefully make it easier for people who just learned SPARQL's standard syntax. INSERT and DELETE in SPARQL+ each support two different forms, one for explicit triples (with simple wildcards in DELETE queries), and one for dynamically CONSTRUCTed ones, e.g.
DELETE {
 <#foo> <bar> "baz" .
 <#foo2> <bar2> ?any .
}
or
INSERT INTO <http://example.com/inferred> CONSTRUCT {
  ?s foaf:knows ?o .
}
WHERE {
  ?s xfn:contact ?o .
}
More examples and detailed information about how exactly SPARQL+ extends the SPARQL grammar are available in the SPARQL+ section of ARC2's documentation.
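Conceptually, INSERT INTO ... CONSTRUCT evaluates the CONSTRUCT and loads the resulting triples into the target graph, while DELETE removes triples matching a template in which variables act as wildcards. The in-memory JavaScript sketch below shows those two behaviours for single-pattern cases only. It is not ARC2's implementation: the quad representation, the use of prefixed names as plain strings, the graph-agnostic DELETE, and all function names are simplifying assumptions.

```javascript
// Quads are plain objects with string terms; variables start with "?".
const isVar = (t) => t.startsWith("?");

// Match one triple pattern against all quads, returning variable bindings.
function matchPattern(quads, pattern) {
  const results = [];
  for (const q of quads) {
    const binding = {};
    let ok = true;
    for (const pos of ["s", "p", "o"]) {
      const t = pattern[pos];
      if (isVar(t)) {
        if (t in binding && binding[t] !== q[pos]) { ok = false; break; }
        binding[t] = q[pos];
      } else if (t !== q[pos]) { ok = false; break; }
    }
    if (ok) results.push(binding);
  }
  return results;
}

// INSERT INTO <graph> CONSTRUCT { template } WHERE { pattern }
// (duplicates are not filtered in this sketch)
function insertConstruct(quads, graph, template, pattern) {
  const fill = (b, t) => (isVar(t) ? b[t] : t);
  const added = matchPattern(quads, pattern).map((b) => ({
    s: fill(b, template.s), p: fill(b, template.p), o: fill(b, template.o), g: graph,
  }));
  quads.push(...added);
  return added.length;
}

// DELETE { template }: variables in the template act as wildcards.
function deleteMatching(quads, template) {
  const before = quads.length;
  const kept = quads.filter((q) => matchPattern([q], template).length === 0);
  quads.splice(0, quads.length, ...kept);
  return before - kept.length;
}
```

Reusing the pattern matcher for both operations reflects the post's point that INSERT and DELETE can piggyback on existing CONSTRUCT and LOAD machinery instead of needing a separate API.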

ARC2 Progress

OK, I met this week's 2nd deadline and finished ARC2's SPARQL test suite report. Pass/Fail results as of today: 317/67 (Sept. 22nd: 352/84). That's a huge step forward compared to ARC1, so I'm quite happy.

Next actions: Making the knowee prototype public (deadline missed, boo!), and relaunching the ARC site, together with proper community tools and the new release.

Web Clipboard: Adding liveliness to "Live Clipboard" with eRDF, JSON, and SPARQL.

Some context: In 2004, Tim Berners-Lee mentioned a potential RDF Clipboard as a user model that would allow copying resource descriptions between applications. Depending on the type of the copied resource, the target app would trigger appropriate actions. (See also the ESW wiki and Danny's blog for related links and discussion.)

I had a go at an "RDF data cart" last year, which allowed you to "1click"-shop resource descriptions while surfing a site. Before leaving, you could "check out" the collected resource descriptions. However, the functionality was limited to a single session, and the resource pointers didn't use globally valid identifiers.

Then, a couple of months ago, Ray Ozzie announced Live Clipboard, which uses a neat trick to access the operating system's clipboard for Copy & Paste operations across web pages.

Last week, I finally found the time to combine the Live Clipboard trick with the stuff I'm currently working on: A Semantic Publishing Framework, Embeddable RDF, and SPARQL. If you haven't heard of the latter two: eRDF is a microformats-like way to embed RDF triples in HTML, SPARQL is the W3C's protocol and query language for RDF repositories.

What I came up with so far is a Web Clipboard that works similar to Live Clipboard (I'm actually thinking about making it fully compatible), with just a few differences:

  • Web Clipboard uses a hidden single-line text input instead of a textarea which seemed to be a little bit easier to insert into the document structure, and it makes it work in Opera 8.5. The downside is that input fields don't allow multi-line content to be pasted (which is not needed by Web Clipboard, but will be necessary if I want to add Live Clipboard compatibility)
  • Web Clipboard doesn't paste complete resource descriptions, but only pointers to them. This makes it possible to e.g. copy a resource from a simple list of people's names, and display full contact details after a paste operation. (See the demo for an example which does asynchronous calls to a SPARQL endpoint.) This "pass by reference" enables things like distributed address books or calendars, where changes made in one place could automatically show up in the other apps.
  • Instead of XML, Web Clipboard uses a small JSON object which can simply be evaluated by JavaScript applications, or split with a basic regular expression. The pasted object contains 1) a resource identifier, and 2) an endpoint where information about the identified resource is available. The endpoint information consists of a URL and a list of specifications supported by the endpoint.
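On the receiving side, reading such a pointer can stay very small. The sketch below assumes the pointer is serialized as strict JSON with quoted keys (rather than the JavaScript object-literal notation that `eval` would also accept); the function name, the returned shape, and the SPARQL-protocol check are illustrative assumptions, not the clipboard library's API.

```javascript
const SPARQL_PROTOCOL = "http://www.w3.org/TR/rdf-sparql-protocol/";

// Hypothetical paste-side check: parse a pasted resource pointer and make
// sure it has a resource ID and an endpoint URL. JSON.parse avoids eval()ing
// clipboard content that may come from an untrusted page.
function parsePointer(text) {
  let ptr;
  try {
    ptr = JSON.parse(text);
  } catch (e) {
    return null; // not a pointer, e.g. plain text was pasted
  }
  if (!ptr || typeof ptr.resID !== "string" || !ptr.endpoint ||
      typeof ptr.endpoint.url !== "string") {
    return null;
  }
  const specs = Array.isArray(ptr.endpoint.specs) ? ptr.endpoint.specs : [];
  return {
    resID: ptr.resID,
    endpointUrl: ptr.endpoint.url,
    speaksSparql: specs.includes(SPARQL_PROTOCOL),
  };
}
```

Returning `null` for anything that isn't a pointer lets a paste handler fall back to treating the clipboard content as ordinary text.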

Complete documentation is going to be up at the clipboard site, but I'll first see if I can make things Live Clipboard-compatible (and I'll be travelling for the rest of the week). Here is a simple explanation of how the current SPARQL demo works:

Apart from adding a small javascript library and a CSS file to the page, I specified the clipboard namespace and a default endpoint to be used for any resource pointer embedded in the page (this is eRDF syntax):
<link rel="schema.webclip" href="http://webclip.web-semantics.org/ns/webclip#" />
<link rel="webclip.endpoint" href="http://www.sparqlets.org/clipboard/sparql" />

Then I embedded a sparqlet that generates the list of Planet RDF bloggers (this is done server-side). The important thing is that the HTML contains eRDF hooks like this:
<div id="agent0" class="-webclip-Res">
  <span class="webclip-resID" title="_:bb1ed0e67fdb042619f2f20fdc479c3af_id2245787"></span>
  <span class="foaf-name">Bob DuCharme</span>
  <a rel="foaf-weblog" href="http://www.snee.com/bobdc.blog/">bobdc.blog by Bob DuCharme</a>
</div>

Ideally, the resource ID (webclip:resID, here again in eRDF notation) is a URI or some other stable identifier. The queried endpoint, however, obviously couldn't find a URI for the rendered resource, so it only provided a bnode ID. This is OK for the SPARQL endpoint the clipboard uses, though. The "foaf:weblog" information could be used to further disambiguate the resource identifier, but the demo doesn't use it.

(The nice thing about eRDF-encoded hooks is that the information can be read by any HTTP- and eRDF-enabled client, the clipboard functionality could be implemented without having to load the page in a browser.)

Now, when the page is displayed, an onload handler instantiates a JavaScript Web Clipboard which automatically adds an icon for each resource identified by the "webclip:Res/webclip:resID" hooks.

When the icon is clicked, the resource pointer JSON object is created and can be copied to the system's clipboard. It currently looks like this (on a single line):
{
  resID : "_:bb1ed0e67fdb042619f2f20fdc479c3af_id2245787",
  endpoint : {
    url : "http://www.sparqlets.org/clipboard/sparql",
    specs : [ "http://www.w3.org/TR/rdf-sparql-protocol/", "http://bob.pythonmac.org/archives/2005/12/05/remote-json-jsonp/" ]
  }
}

We can see that the clipboard uses the default endpoint mentioned at the document level as the embedded hook didn't specify a resource-specific endpoint. We can also see that the endpoint supports two specs, namely the SPARQL protocol and JSONP.
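The fallback rule just described (a hook-specific endpoint wins, otherwise the document-level `webclip.endpoint` link is used) can be sketched as follows. The hook object shape and the function name are assumptions; the endpoint URL and spec URIs are the ones from the example above. `JSON.stringify` without spacing also produces the single-line form.

```javascript
// Hypothetical copy-side helper: build the pointer string for one eRDF hook.
// A hook-specific endpoint (if any) overrides the document default.
function buildPointer(hook, documentDefault) {
  const endpoint = hook.endpoint || documentDefault;
  if (!endpoint) throw new Error(`No endpoint known for ${hook.resID}`);
  return JSON.stringify({
    resID: hook.resID,
    endpoint: { url: endpoint.url, specs: (endpoint.specs || []).slice() },
  });
}

// Document-level default, as declared via <link rel="webclip.endpoint" ...>
const defaultEndpoint = {
  url: "http://www.sparqlets.org/clipboard/sparql",
  specs: [
    "http://www.w3.org/TR/rdf-sparql-protocol/",
    "http://bob.pythonmac.org/archives/2005/12/05/remote-json-jsonp/",
  ],
};
```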

When this JSON object is pasted into another clipboard section, the onpaste handler can decide what to do. In the demo, every paste section makes an asynchronous On-Demand-JavaScript call to the resource's SPARQL endpoint to retrieve a custom resource representation. The "Latest blog post" section uses a pre-defined callback, but this can be overridden (as done e.g. by the "Resource Description" section, which uses a custom function to display results).
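Such an On-Demand-JavaScript call boils down to inserting the pasted resource into the section's query and building a GET URL that is then loaded via a script tag. In the sketch below, the `query` parameter comes from the SPARQL protocol; the `output=json` and `jsonp` parameter names are assumptions, since such switches vary between endpoints. The textual substitution for `?myRes` is also just one possible approach, not necessarily what the demo does. Bnode IDs are passed through unchanged, relying on the endpoint accepting them as the text says; standard SPARQL treats blank nodes in a query like variables.

```javascript
// Hypothetical helper: put the pasted resource into the section's query,
// replacing ?myRes / $myRes but not e.g. ?myResource.
function bindResource(query, varName, resID) {
  const term = resID.startsWith("_:") ? resID : "<" + resID + ">";
  const pattern = new RegExp("[?$]" + varName + "(?![A-Za-z0-9_])", "g");
  return query.replace(pattern, () => term); // function avoids "$" expansion
}

// Build a SPARQL protocol GET URL that asks for a JSONP response.
function buildJsonpUrl(endpointUrl, query, callbackName) {
  const params = new URLSearchParams({
    query,               // defined by the SPARQL protocol
    output: "json",      // assumption: endpoint-specific format switch
    jsonp: callbackName, // assumption: endpoint-specific callback parameter
  });
  const sep = endpointUrl.includes("?") ? "&" : "?";
  return endpointUrl + sep + params.toString();
}
```

In a browser, the resulting URL would become the `src` of a dynamically added `<script>` element, whose response then calls the named callback with the query results.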

I've added a playground area to the clipboard site where you can create your own clipboard sections. Give it a try, it's not too complicated. You can even bookmark them.

Here is an example JavaScript snippet that adds a clipboard section to a clipboard-enabled page with an 'id="resultCountSection"' HTML element:
window.clipboard.addSection({
  id : "resultCountSection",
  resIDVar : "myRes",
  query : "SELECT ?knowee WHERE " +
          "{" +
          " ?myRes <http://xmlns.com/foaf/0.1/knows> ?knowee . " +
          "}" +
          " LIMIT 50",
  callback : function(qr) {
    var rows = (qr.results["bindings"]) ? qr.results.bindings : [];
    var result = "The pasted resource seems to know " + rows.length + " persons.";
    /* update paste area */
    this.item.innerHTML = result;
    /* refresh clipboard */
    window.clipboard.activate();
  }
});
window.clipboard.activate();

Something like this is all that will be needed for the final clipboard. No microformats parsing or similar burdens (although you could use the Web Clipboard to process microformats). The Clipboard's definition of an endpoint is rather open, too. An RSS file could be considered an endpoint as well as any other Web-accessible document or API.