<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Semantic Web Posts</title>
  <link rel="alternate" type="text/html" href="http://bnode.org/blog/sw_en" />
  <link rel="self" type="application/atom+xml" href="http://bnode.org/blog/atom1/sw_en.atom.atom" />
  <id>http://bnode.org/res/channel/sw_en</id>
  <updated>2010-07-28T09:50:00Z</updated>
  <author>
    <name>Benjamin Nowack</name>
  </author>
  <generator uri="http://semsol.com/" version="0.2.0">SemSol</generator>

  <entry>
    <title>Linked Data Entity Extraction with Zemanta and OpenCalais</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/07/28/linked-data-entity-extraction-with-zemanta-and-opencalais"/>
    <id>http://bnode.org/blog/2010/07/28/linked-data-entity-extraction-with-zemanta-and-opencalais</id>
    <published>2010-07-28T09:50:00Z</published>
    <updated>2010-07-29T07:27:15Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A comparison of the NER APIs by Zemanta and OpenCalais.</summary>
    <category term="blogdb"/>
    <category term="linkeddata"/>
    <category term="ner"/>
    <category term="nlp"/>
    <category term="opencalais"/>
    <category term="prospect"/>
    <category term="rdf"/>
    <category term="readwriteweb"/>
    <category term="rww"/>
    <category term="semanticweb"/>
    <category term="zemanta"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
I had another look at the Named Entity Extraction APIs by <a href="http://zemanta.com/">Zemanta</a> and <a href="http://www.opencalais.com/">OpenCalais</a> for some product launch demos. My <a href="http://bnode.org/blog/2009/01/16/connecting-the-lod-dots-with-calais-4-0-and-zemanta">first test from last year</a> concentrated more on the Zemanta API. This time I had a closer look at both services, trying to identify the "better one" for "BlogDB", a semi-automatic blog semantifier.<br />
<br />
My main need is a service that receives a cleaned-up plain-text version of a blog post and returns normalized tags and reusable entity identifiers. So the findings in this post are rather technical and specific to the BlogDB requirements. I ignored features that could well be essential for others, such as Zemanta's "related articles and photos" feature or OpenCalais' entity relations ("X hired Y" etc.).<br />
<br />

<h4>Terms and restrictions of the free API</h4>
<ul><li>The API terms are pretty similar (the wording is actually almost identical). You need an API key and both services can be used commercially as long as you give attribution and don't proxy/resell the service. </li>
<li>OpenCalais gives you more free API calls out of the box than Zemanta (50,000 vs. 1,000 per day). You can get a free upgrade to 10,000 Zemanta calls via a simple email, though (or through excessive API use: Andraž auto-upgraded my API limit when he noticed my <a href="http://bnode.org/blog/2009/01/16/connecting-the-lod-dots-with-calais-4-0-and-zemanta">crazy HDStreams test</a> back then ;-).</li>
<li>OpenCalais lets you process larger content chunks (up to 100K, vs. 8K at Zemanta).</li></ul>
<br />

<h4>Calling the API</h4>
<ul><li>Both interfaces are simple and well-documented. Calls to the OpenCalais API are a tiny bit more complicated as you have to encode certain parameters in an XML string. Zemanta uses simple query string arguments. I've added the respective PHP snippets below, the complexity difference is negligible.
<pre class="code">function getCalaisResult($id, $text) {
  // OpenCalais expects its processing options as an XML document
  $parms = '
    &lt;c:params xmlns:c="http://s.opencalais.com/1/pred/"
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"&gt;
      &lt;c:processingDirectives
        c:contentType="TEXT/RAW"
        c:outputFormat="XML/RDF"
        c:calculateRelevanceScore="true"
        c:enableMetadataType="SocialTags"
        c:docRDFaccessible="false"
        c:omitOutputtingOriginalText="true"
        &gt;&lt;/c:processingDirectives&gt;
      &lt;c:userDirectives
        c:allowDistribution="false"
        c:allowSearch="false"
        c:externalID="' . $id . '"
        c:submitter="http://semsol.com/"
        &gt;&lt;/c:userDirectives&gt;
      &lt;c:externalMetadata&gt;&lt;/c:externalMetadata&gt;
    &lt;/c:params&gt;
  ';
  $args = array(
    'licenseID' =&gt; $this-&gt;a['calais_key'],
    'content' =&gt; urlencode($text),
    'paramsXML' =&gt; urlencode(trim($parms))
  );
  // qs() (not shown) builds a query string; substr() drops its first character (presumably a leading '?')
  $qs = substr($this-&gt;qs($args), 1);
  $url = 'http://api.opencalais.com/enlighten/rest/';
  return $this-&gt;getAPIResult($url, $qs);
}
</pre>
<pre class="code">function getZemantaResult($id, $text) {
  $args = array(
    'method' =&gt; 'zemanta.suggest',
    'api_key' =&gt; $this-&gt;a['zemanta_key'],
    'text' =&gt; urlencode($text),
    'format' =&gt; 'rdfxml',
    'return_rdf_links' =&gt; '1',
    'return_articles' =&gt; '0',
    'return_categories' =&gt; '0',
    'return_images' =&gt; '0',
    'emphasis' =&gt; '0',
  );
  $qs = substr($this-&gt;qs($args), 1);
  $url = 'http://api.zemanta.com/services/rest/0.0/';
  return $this-&gt;getAPIResult($url, $qs);
}
</pre> </li>
<li>The actual API call is then a simple POST:<pre class="code">function getAPIResult($url, $qs) {
  ARC2::inc('Reader');
  $reader = new ARC2_Reader($this-&gt;a, $this);
  $reader-&gt;setHTTPMethod('POST');
  $reader-&gt;setCustomHeaders("Content-Type: application/x-www-form-urlencoded");
  $reader-&gt;setMessageBody($qs);
  $reader-&gt;activate($url);
  $r = '';
  while ($d = $reader-&gt;readStream()) {
    $r .= $d;
  }
  $reader-&gt;closeStream();
  return $r;
}
</pre></li>
<li>Both APIs are fast.
</li></ul>
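For comparison, here is a minimal Python sketch (standard library only) that builds the same two POST bodies and sends them the way getAPIResult does. It builds the Calais paramsXML with xml.etree.ElementTree instead of string concatenation, so values such as the external ID are escaped automatically, and it URL-encodes every value exactly once via urllib.parse.urlencode. Parameter names and endpoints are copied from the PHP snippets; the function names are made up, and the 2010 endpoints may well have changed since.<br />
<pre class="code">
```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

C_NS = "http://s.opencalais.com/1/pred/"  # namespace from the PHP snippet


def q(name):
    """Qualified {namespace}name, as expected by ElementTree."""
    return "{%s}%s" % (C_NS, name)


def calais_params_xml(external_id):
    """Build the c:params document that OpenCalais expects in paramsXML."""
    ET.register_namespace("c", C_NS)
    params = ET.Element(q("params"))
    ET.SubElement(params, q("processingDirectives"), {
        q("contentType"): "TEXT/RAW",
        q("outputFormat"): "XML/RDF",
        q("calculateRelevanceScore"): "true",
        q("enableMetadataType"): "SocialTags",
        q("docRDFaccessible"): "false",
        q("omitOutputtingOriginalText"): "true",
    })
    ET.SubElement(params, q("userDirectives"), {
        q("allowDistribution"): "false",
        q("allowSearch"): "false",
        q("externalID"): external_id,  # escaped by ElementTree on output
        q("submitter"): "http://semsol.com/",
    })
    ET.SubElement(params, q("externalMetadata"))
    return ET.tostring(params, encoding="unicode")


def calais_body(license_id, text, external_id):
    """Form-encoded POST body for the OpenCalais REST endpoint."""
    return urllib.parse.urlencode({
        "licenseID": license_id,
        "content": text,
        "paramsXML": calais_params_xml(external_id),
    })


def zemanta_body(api_key, text):
    """Form-encoded POST body for zemanta.suggest, RDF/XML output only."""
    return urllib.parse.urlencode({
        "method": "zemanta.suggest",
        "api_key": api_key,
        "text": text,
        "format": "rdfxml",
        "return_rdf_links": "1",
        "return_articles": "0",
        "return_categories": "0",
        "return_images": "0",
        "emphasis": "0",
    })


def post_form(url, body, timeout=30):
    """POST a form-encoded body (like getAPIResult) and return the response text."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```
</pre>
A call would then look like post_form("http://api.zemanta.com/services/rest/0.0/", zemanta_body(key, text)).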
<br />

<h4>API result processing</h4>
<ul><li>The APIs return rather verbose data, as they have to stuff in a lot of meta-data such as confidence scores, text positions, internal and external identifiers, etc. But they also offer RDF as one possible result format, so I could store the response data as a simple graph and then use SPARQL queries to extract the relevant information (tags and named entities). Below is the query code for Linked Data entity extraction from Zemanta's RDF. As you can see, the graph structure isn't trivial, but still understandable:
<pre class="code">SELECT DISTINCT ?id ?obj ?cnf ?name
FROM &lt;' . $g . '&gt; WHERE {
  ?rec a z:Recognition ;
       z:object ?obj ;
       z:confidence ?cnf .
  ?obj z:target ?id .
  ?id z:targetType &lt;http://s.zemanta.com/targets#rdf&gt; ;
      z:title ?name .
  FILTER(?cnf &gt;= 0.4)
} ORDER BY ?id
</pre>
</li></ul>
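To make the join in the query explicit, here is a small Python sketch that evaluates the same graph pattern over an in-memory list of triples: each z:Recognition is followed to its object and target, only targets of type http://s.zemanta.com/targets#rdf with a confidence of at least 0.4 are kept, and the distinct rows are ordered by target URI. The z: terms stay as prefixed strings because the query itself doesn't show the namespace declaration, and the sample triples are invented for illustration; in practice you would run the SPARQL query against an RDF store.<br />
<pre class="code">
```python
RDF_TYPE = "rdf:type"  # what the query's "a" keyword stands for
RDF_TARGET = "http://s.zemanta.com/targets#rdf"


def objects(triples, s, p):
    """All objects o for which (s, p, o) is in the graph."""
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]


def linked_entities(triples, min_confidence=0.4):
    """Rows (?id, ?obj, ?cnf, ?name), mirroring the SPARQL query."""
    rows = set()
    recognitions = [s for (s, p, o) in triples if p == RDF_TYPE and o == "z:Recognition"]
    for rec in recognitions:
        for obj in objects(triples, rec, "z:object"):
            for cnf in objects(triples, rec, "z:confidence"):
                if float(cnf) >= min_confidence:  # FILTER(?cnf >= 0.4)
                    for target in objects(triples, obj, "z:target"):
                        if RDF_TARGET in objects(triples, target, "z:targetType"):
                            for name in objects(triples, target, "z:title"):
                                rows.add((target, obj, float(cnf), name))
    return sorted(rows)  # DISTINCT via the set, ORDER BY ?id via the sort


# Invented sample data: one confident recognition, one below the threshold.
SAMPLE = [
    ("_:r1", RDF_TYPE, "z:Recognition"),
    ("_:r1", "z:object", "_:o1"),
    ("_:r1", "z:confidence", "0.87"),
    ("_:o1", "z:target", "http://dbpedia.org/resource/Starbucks"),
    ("http://dbpedia.org/resource/Starbucks", "z:targetType", RDF_TARGET),
    ("http://dbpedia.org/resource/Starbucks", "z:title", "Starbucks"),
    ("_:r2", RDF_TYPE, "z:Recognition"),
    ("_:r2", "z:object", "_:o2"),
    ("_:r2", "z:confidence", "0.21"),
    ("_:o2", "z:target", "http://dbpedia.org/resource/Coffee"),
    ("http://dbpedia.org/resource/Coffee", "z:targetType", RDF_TARGET),
    ("http://dbpedia.org/resource/Coffee", "z:title", "Coffee"),
]
```
</pre>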
<br />

<h4>Extracting normalized tags</h4>
<ul><li>OpenCalais results contain a section with so-called &amp;quot;SocialTags&amp;quot; which are directly usable as plain-text tags. </li>
<li>The tag structures in the Zemanta result are called "Keywords". In my tests they only contained a subset of the detected entities, so I decided to use the labels associated with detected entities instead. This worked well, but the respective query is more complex.</li></ul>
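A minimal sketch of the normalization step, assuming that "normalized" means trimmed, whitespace-collapsed, case-insensitively de-duplicated labels. The function name and the rules are illustrative, not BlogDB's actual code; the same function would work for OpenCalais SocialTags and for Zemanta entity labels (e.g. the ?name column of the query above).<br />
<pre class="code">
```python
import re


def normalize_tags(labels):
    """Turn raw labels into a de-duplicated, order-preserving tag list."""
    seen = set()
    tags = []
    for label in labels:
        tag = re.sub(r"\s+", " ", label).strip()  # collapse runs of whitespace
        key = tag.casefold()                      # case-insensitive duplicate check
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)  # keep the first spelling that was seen
    return tags
```
</pre>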
<br />

<h4>Extracting entities</h4>
<ul><li>In general, OpenCalais results can be directly utilized more easily. They contain stable identifiers and the identifiers come with type information and other attributes such as stock symbols. The API result directly tells you how many Persons, Companies, Products, etc. were detected. And the URIs of these entity types are all from a single (OpenCalais) namespace. If you are not a Linked Data pro, this simplifies things a lot. You only have to support a simple list of entity types to build a working semantic application. If you want to leverage the wider <a href="http://linkeddata.org/">Linked Open Data</a> cloud, however, the OpenCalais response is just a first entry point. It doesn't contain community URIs. You have to use the OpenCalais website to first retrieve disambiguation information, which may then (often involving another request) lead you to the decentralized Linked Data identifiers.</li>
<li>Zemanta responses, in contrast, do not (yet, Andraž told me they are working on it) contain entity types at all. You always need an additional request to retrieve type information (unless you are doing nasty URI inspection, which is what I did with detected URIs from <a href="http://cb.semsol.org/">Semantic CrunchBase</a>). The retrieval of type information is done via Open Data servers, so you have to be able to deal with the usual down-times of these non-commercial services.</li>
<li>Zemanta results are very &amp;quot;webby&amp;quot; and full of community URIs. They even include sameAs information. This can be a bit overwhelming if you are not an RDFer, e.g. looking up a <a href="http://dbpedia.org/">DBPedia</a> URI will often give you dozens of entity types, and you need some experience to match them with your internal type hierarchy. But for an open data developer, the hooks provided by Zemanta are a dream come true. </li>
<li>With Zemanta associating shared URIs with all detected entities, I noticed network effects kicking in a couple of times. I used <a href="http://readwriteweb.com/">RWW</a> articles for the test, and in one post, for example, OpenCalais could detect the company &amp;quot;Starbucks&amp;quot; and &amp;quot;Howard Schultz&amp;quot; as their &amp;quot;CEO&amp;quot;, but their public RDF (when I looked up the &amp;quot;Howard Schultz&amp;quot; URI) didn't persist this linkage. The detection scope was limited to the passed snippet. Zemanta, on the other hand, directly gave me Linked Data URIs for both &amp;quot;Starbucks&amp;quot; and &amp;quot;Howard Schultz&amp;quot;, and these identifiers make it possible to re-establish the relation between the two entities at any time. This is a very powerful feature.
</li></ul>
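The URI inspection mentioned above can be as simple as looking at the path. This sketch assumes Semantic CrunchBase identifiers of the form http://cb.semsol.org/company/starbucks#self, with the first path segment naming the entity type (an assumption for illustration). Anything it can't classify returns None and would need a real lookup of the resource's rdf:type values, ideally with a timeout, given the downtimes of public Linked Data servers.<br />
<pre class="code">
```python
from urllib.parse import urlsplit

# Assumed mapping from Semantic CrunchBase path segments to internal types.
CB_TYPES = {
    "company": "Company",
    "person": "Person",
    "product": "Product",
}


def type_from_uri(uri):
    """Guess an entity type from the URI alone, or return None if unknown."""
    parts = urlsplit(uri)
    if parts.netloc != "cb.semsol.org":
        return None  # not a Semantic CrunchBase URI: needs a real lookup
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) >= 2:
        return CB_TYPES.get(segments[0])
    return None
```
</pre>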
<br />

<h4>Summary</h4>
Both APIs are great. The quality of the entity extractors is awesome. For the RWW posts, which deal a lot with Web topics, Zemanta seemed to have a couple of extra detections (such as "ReadWriteWeb" as a company). As usual, some owl:sameAs information is wrong, and Zemanta uses incorrect Semantic CrunchBase URIs (".rdf#self" instead of "#self" // <em>Update: to be fixed in the next Zemanta API revision</em>), but I blame us (the RDF community), not the API providers, for not making these things easier to implement.<br />
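<br />
Until that fix lands, the affected identifiers can be rewritten on the client side. A minimal sketch that only handles the exact ".rdf#self" suffix described above and leaves every other URI untouched (the function name is hypothetical):<br />
<pre class="code">
```python
CB_PREFIX = "http://cb.semsol.org/"
BAD_SUFFIX = ".rdf#self"


def fix_cb_uri(uri):
    """Rewrite a Semantic CrunchBase '....rdf#self' URI to its '...#self' form."""
    if uri.startswith(CB_PREFIX) and uri.endswith(BAD_SUFFIX):
        return uri[: -len(BAD_SUFFIX)] + "#self"
    return uri  # anything else is passed through unchanged
```
</pre>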
<br />
In the end, I decided to use both APIs in combination, with an optional post-processing step that builds a consolidated, internal ontology from the detected entities (OpenCalais has two Company types which could be merged, for example). Maybe I can make a public <a href="http://semsol.com/prospect">Prospect</a> demo from the RWW data, though I'm not sure if they would allow this. It's really impressive how much value the entity extraction services can add to blog data (see the screenshot below, which shows a pivot operation on products mentioned in posts by Sarah Perez). I'll write a bit more about the possibilities in another post.<br />
<br />
<a href="http://bnode.org/media/2010/07/blogdb_rww.gif"><img src="http://bnode.org/media/2010/07/blogdb_rww_small.gif" title="RWW posts via BlogDB" alt="RWW posts via BlogDB" /></a>
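<br />
<br />
A rough sketch of the type-consolidation part of such a post-processing step: type URIs reported by either API are mapped onto a small internal ontology, so that, for example, two OpenCalais company types collapse into one internal Company class. The OpenCalais type URIs in the mapping are placeholders (the real ones would have to be looked up), the DBpedia ones are just examples, and types without a mapping are ignored rather than guessed.<br />
<pre class="code">
```python
from collections import defaultdict

# Placeholder OpenCalais type URIs; the DBpedia URIs are examples.
TYPE_MAP = {
    "http://example.org/calais/CompanyA": "Company",
    "http://example.org/calais/CompanyB": "Company",
    "http://example.org/calais/Person": "Person",
    "http://dbpedia.org/ontology/Company": "Company",
    "http://dbpedia.org/ontology/Person": "Person",
}


def consolidate(typed_entities):
    """Group (entity, external type URI) pairs into entity -> internal types.

    Types without a mapping are skipped, so an entity that only has
    unmapped types does not appear in the result.
    """
    result = defaultdict(set)
    for entity, type_uri in typed_entities:
        internal = TYPE_MAP.get(type_uri)
        if internal is not None:
            result[entity].add(internal)
    return {entity: sorted(types) for entity, types in result.items()}
```
</pre>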


      </div>
    </content>
  </entry>

  <entry>
    <title>Contextual configuration - Semantic Web development for visually minded webmasters</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/05/10/contextual-configuration-semantic-web-development-for-visually-minded-webmasters"/>
    <id>http://bnode.org/blog/2010/05/10/contextual-configuration-semantic-web-development-for-visually-minded-webmasters</id>
    <published>2010-05-21T12:40:00Z</published>
    <updated>2010-05-21T13:06:11Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A short screencast demonstrating contextual configuration via widgets in semsol's RDF CMS.</summary>
    <category term="cms"/>
    <category term="configuration"/>
    <category term="faceted browser"/>
    <category term="paggr"/>
    <category term="prospect"/>
    <category term="semanticweb"/>
    <category term="ux"/>
    <category term="widgets"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
Let's face it, building semantic web sites and apps is still far from easy. And to some extent, this is due to the configuration overhead. The RDF stack is built around declarative languages (for simplified integration at various levels), and as a consequence, configuration directives often end up in some form of declarative format, too. While fleshing out an RDF-powered website, you have to declare a ton of things. From namespace abbreviations to data sources and API endpoints, from vocabularies to identifier mappings, from queries to object templates, and what have you.<br />
<br />
Sadly, many of these configurations are needed to style the user interface, and because of RDF's open world context, designers have to know much more about the data model and possible variations than is usually necessary. Or webmasters have to deal with design work, which is not ideal either. If we want to bring RDF to mainstream web developers, we have to simplify the creation of user-optimized apps. The value proposition of semantics in the context of information overload is pretty clear, and some form of data integration is becoming mandatory for any modern website. But the entry barrier caused by large and complicated configuration files (Fresnel anyone?) is still too high. How can we get from our powerful, largely generic systems to end-user-optimized apps? Or the other way round: How can we support frontend-oriented web development with our flexible tools and freely mashable data sets? (Let me quickly mention Drupal here, which is doing a great job at near-seamlessly integrating RDF. OK, back to the post.)<br />
<br />
Enter RDF widgets. Widgets have obvious backend-related benefits like accessing, combining and re-purposing information from remote sources within a manageable code sandbox. But they can also greatly support frontend developers. They simplify page layout and incremental site building with instant visual feedback (add a widget, test, add another one, re-arrange, etc.). And, more importantly in the RDF case, they can offer a way to iteratively configure a system with very little technical overhead. Configuration options could be scoped not only to the widget at hand, but also to the <em>context</em> in which the widget is currently viewed. Let's say you are building an RDF browser and need resource templates for all kinds of items. With contextual configuration, you could simply browse the site, and at any position in the ontology or navigation hierarchy, you would just open a configuration dialog and define a custom template, if needed. Such an approach could enable systems that work out of the box (raw, but usable) and that could then be continually optimized, possibly even by site users.<br />
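<br />
To make the idea a bit more concrete, here is a minimal Python sketch of contextual configuration lookup. It assumes that every view has a context path (e.g. taken from the navigation or ontology hierarchy, most generic level first) and that settings are stored per widget and context; the lookup walks from the most specific context to the most generic one and finally falls back to the widget's default. All names are hypothetical; this is not how Paggr Prospect is implemented.<br />
<pre class="code">
```python
class ContextualConfig:
    """Widget settings that can be overridden at any level of a context path."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)  # widget -> default template
        self.overrides = {}             # (widget, context tuple) -> template

    def set(self, widget, context, template):
        """Store a template for one widget in one context (e.g. via a dialog)."""
        self.overrides[(widget, tuple(context))] = template

    def get(self, widget, context):
        """Return the template of the most specific matching context."""
        context = tuple(context)
        for depth in range(len(context), 0, -1):
            key = (widget, context[:depth])
            if key in self.overrides:
                return self.overrides[key]
        return self.defaults.get(widget)  # raw, but usable


cfg = ContextualConfig({"resource-view": "generic-table"})
cfg.set("resource-view", ["Company"], "company-card")
cfg.set("resource-view", ["Company", "Startup"], "startup-card")
```
</pre>
With this setup, a Person page still gets the generic table, any Company page gets the company card, and only pages under Company/Startup get the more specific template.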
<br />
A lot of "could" and "would" in the paragraphs above, and the idea may sound quite abstract without actually seeing it. To illustrate the point I'm trying to make, I've prepared a short two-part video (embedded below). It uses <a href="http://cb.semsol.org/">Semantic CrunchBase</a> and <a href="http://semsol.com/prospect">Paggr Prospect</a> (our new faceted browser builder) as an example use case for in-context configuration.<br />
<br />
And if you are interested in using one of our solutions for your own projects, <a href="http://semsol.com/contact">please get in touch</a>!<br />
<br />
<br />
<object width="550" height="385"><param name="movie" value="http://www.youtube.com/v/Sz8ohHDViL8&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/Sz8ohHDViL8&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="550" height="385"></embed></object>
<br />
Paggr Prospect (part 1) <br />
<br />
<object width="550" height="385"><param name="movie" value="http://www.youtube.com/v/_yO_dEn0g0g&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/_yO_dEn0g0g&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="550" height="385"></embed></object>
<br />
Paggr Prospect (part 2)<br />

      </div>
    </content>
  </entry>

  <entry>
    <title>Trice's Semantic Richtext Editor</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/05/01/trice-semantic-richtext-editor"/>
    <id>http://bnode.org/blog/2010/05/01/trice-semantic-richtext-editor</id>
    <published>2010-05-01T16:35:00Z</published>
    <updated>2010-05-03T09:44:47Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A screencast demonstrating the structured RTE bundled with the Trice CMS</summary>
    <category term="cms"/>
    <category term="editor"/>
    <category term="html5"/>
    <category term="linkeddata"/>
    <category term="markup"/>
    <category term="microdata"/>
    <category term="rdfa"/>
    <category term="rte"/>
    <category term="semanticweb"/>
    <category term="trice"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
In my <a href="http://bnode.org/blog/2010/04/15/could-having-two-rdf-in-htmls-actually-be-handy">previous post</a> I mentioned that I'm building a Linked Data CMS. One of its components is a rich-text editor that allows the creation (and embedding) of structured markup.<br />
<br />
An earlier version supported limited Microdata annotations, but I've now switched the mechanism to an intermediate, even simpler approach based on HTML5's handy data-* attributes. This lets you build almost arbitrary markup with the editor, including Microformats, Microdata, or RDFa. I don't know yet when the CMS will be publicly available (3 sites are under development right now), but as mentioned, I'd be happy to take on another pilot project or two. Below is a video demonstrating the editor and its easy customization options.<br />
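<br />
A hypothetical sketch of how such an intermediate layer could work: the editor stores neutral annotations (here called data-prop and data-type, both made-up names) and a small renderer maps them to Microdata or RDFa attributes when the page is generated. This is an assumption about the general approach, not the editor's actual convention.<br />
<pre class="code">
```python
# Hypothetical mapping from neutral data-* annotations to target syntaxes.
SYNTAXES = {
    "microdata": {"data-prop": "itemprop", "data-type": "itemtype"},
    "rdfa": {"data-prop": "property", "data-type": "typeof"},
}


def render_attributes(attrs, syntax):
    """Replace known data-* attributes with the target syntax's attributes.

    Unknown attributes (class, href, ...) are passed through unchanged.
    """
    mapping = SYNTAXES[syntax]
    out = {}
    for name, value in attrs.items():
        out[mapping.get(name, name)] = value
    if syntax == "microdata" and "itemtype" in out:
        out["itemscope"] = ""  # Microdata items also need an itemscope attribute
    return out
```
</pre>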
<br />
<object width="550" height="385"><param name="movie" value="http://www.youtube.com/v/bn8DmFGk9rA&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/bn8DmFGk9rA&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="550" height="385"></embed></object>


      </div>
    </content>
  </entry>

  <entry>
    <title>Could having two RDF-in-HTMLs actually be handy?</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/04/15/could-having-two-rdf-in-htmls-actually-be-handy"/>
    <id>http://bnode.org/blog/2010/04/15/could-having-two-rdf-in-htmls-actually-be-handy</id>
    <published>2010-04-15T10:30:00Z</published>
    <updated>2010-05-10T08:15:05Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A combination of RDFa and Microdata would allow for separate semantic layers.</summary>
    <category term="cms"/>
    <category term="microdata"/>
    <category term="paggr"/>
    <category term="rdf"/>
    <category term="rdfa"/>
    <category term="semanticweb"/>
    <category term="stepbystep"/>
    <category term="trice"/>
    <category term="widgets"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
Apart from grumpy rants about the complexity of W3C's RDF specs and <a href="http://twitter.com/bengee/status/11886421732">semantic richtext editing excitement</a>, I haven't blogged or tweeted a lot recently. That's partly because there finally is increased demand for the stuff I'm doing at <a href="http://semsol.com/">semsol</a> (agency-style SemWeb development), but also because I've been working hard on getting my tools in a state where they feel more like typical Web frameworks and apps. <a href="http://talis.com/">Talis</a>' <a href="http://fanhu.bz/">Fanhu.bz</a> is an example where (I think) we found a good balance between powerful RDF capabilities (data re-purposing, remote models, data augmentation, a crazy army of inference bots) and a non-technical UI (simplistic visual browser, Twitter-based annotation interfaces).<br />
<br />
Another example is something I've been working on during the last few months: I somehow managed to combine essential parts of <a href="http://paggr.com/">Paggr</a> (a drag&amp;drop portal system based on RDF- and SPARQL-based widgets) with an RDF CMS (I'm currently looking for pilot projects). And although I decided to switch entirely to <a href="http://www.w3.org/TR/microdata/">Microdata</a> for semantic markup after exploring it during the FanHubz project, I wonder if there might be room for two separate semantic layers in this sort of widget-based website. Here is why:<br />
<br />
As mentioned, I've taken a widget-like approach for the CMS. Each page section is a resource of its own that can be defined and extended by the web developer, styled by themers, and re-arranged and configured by the webmaster. In the RDF CMS context, widgets can easily integrate remote data, and when the integrated information is exposed as machine-readable data in the front-end, we can get beyond the "just-visual" integration of current widget pages and <a href="http://bnode.org/blog/2009/06/04/eswc-2009-linked-data-dashboards">bring truly connectable and reusable information to the user interface</a>.<br />
<br />
Ideally, both the widgets' structural data and the content can be re-purposed by other apps. Just like in the early days of the Web, we could re-introduce a copy &amp; paste culture, with page items that people can include in their own sites. The difference is that RDF simplifies copy-by-reference and source attribution, and both developers and end-users could be part of the game this time.<br />
<br />
Anyway, one technical issue I encountered arises when a page contains multiple page items but describes a single resource. With a single markup layer (say, Microdata), you get a single tree in which the context of the hierarchy constantly switches between structural elements and content items (page structure -&gt; main content -&gt; page layout -&gt; widget structure -&gt; widget content). If you want to describe a single resource, you have to repeatedly re-introduce the triple subject ("this is about the page structure", "this is about the main page topic"). The first screenshot below shows the different (grey) widget areas in the editing view of the CMS. In the second screenshot, you can see that the displayed information (the marked calendar date, the flyer image, and the description) in the main area and the sidebar is about a single resource (an event).<br />
<br />
<img src="http://bnode.org/media/2010/04/trice_cms_editing.gif" title="Trice CMS Editor" alt="Trice CMS Editor" /><br />
<small>Trice CMS editing view</small><br />
<br />
<img src="http://bnode.org/media/2010/04/trice_cms_view.gif" title="Trice CMS page view" alt="Trice CMS page view" /><br />
<small>Trice CMS page view with inline widgets describing one resource</small><br />
<br />
If I used two separate semantic layers, e.g. RDFa for the content (the event description) and Microdata for the structural elements (column widths, widget template URIs, widget instance URIs), I could describe the resource and the structure without repeating the event subject in each page item.<br />
<br />
To be honest, I'm not sure yet if this is really a problem, but I thought writing it down could kick off some thought processes (which now tend towards "No"). Keeping triples as stand-alone-ish as possible may actually be an advantage (even if subject URIs have to be repeated). No semantic markup solution so far provides full containment for reliable copy &amp; paste, but explicit subjects (or "itemid"s in Microdata-speak) could bring us a little closer.<br />
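<br />
A small Python illustration of that trade-off, using plain (subject, predicate, object) tuples: if every page item states its subject explicitly, the triples from separate widgets can be copied on their own and still merge into one resource, while copies that only use blank nodes turn into unrelated resources. The event URI and property names are made up.<br />
<pre class="code">
```python
from collections import defaultdict

EVENT = "http://example.org/events/42"  # hypothetical event URI

# Triples contributed by three separate widgets on the same page.
calendar_widget = [(EVENT, "startDate", "2010-05-20")]
flyer_widget = [(EVENT, "image", "flyer.jpg")]
main_widget = [(EVENT, "description", "Release party")]


def merge(*graphs):
    """Union several triple lists and group the property values by subject."""
    resources = defaultdict(dict)
    for graph in graphs:
        for s, p, o in graph:
            resources[s][p] = o
    return dict(resources)


def blank_copy(triples, label):
    """Simulate copying a widget whose subject was only a blank node."""
    return [("_:" + label, p, o) for (s, p, o) in triples]


merged = merge(calendar_widget, flyer_widget, main_widget)  # one resource
copies = merge(blank_copy(calendar_widget, "a"), blank_copy(flyer_widget, "b"))  # two
```
</pre>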
<br />
Conclusions? Err..., none yet. But hey, did you see the cool CMS screenshots?



      </div>
    </content>
  </entry>

  <entry>
    <title>Microdata, semantic markup for both RDFers and non-RDFers</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/01/26/microdata-semantic-markup-for-both-rdfers-and-non-rdfers"/>
    <id>http://bnode.org/blog/2010/01/26/microdata-semantic-markup-for-both-rdfers-and-non-rdfers</id>
    <published>2010-01-26T12:00:00Z</published>
    <updated>2010-01-26T18:20:03Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>RDF-in-HTML could have been so simple.</summary>
    <category term="arc"/>
    <category term="html5"/>
    <category term="microdata"/>
    <category term="rdf-in-html"/>
    <category term="rdfa"/>
    <category term="semanticweb"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
There's been a whole lot of discussion around <a href="http://dev.w3.org/html5/md/Overview.html">Microdata</a>, a new approach for embedding machine-readable information into the forthcoming HTML5. What I find most attractive about Microdata is the fact that it was designed by HTMLers, not RDFers. It's refreshingly pragmatic, free of other RDF spec legacy, but still capable of expressing most of RDF.<br />
<br />
Unfortunately, <a href="http://rdfa.info">RDFa</a> lobbyists on the HTML WG mailing list forced the spec out of HTML5 core for the time being. This manoeuvre was understandable (a lot of energy went into RDFa, after all), but in my opinion very short-sighted. How many uphill battles did we have, trying to get RDF to the broader developer community? And how many were successful? Atom, microformats, OpenID, Portable Contacts, XRDS, Activity Streams (well, not really): these are examples where RDFers tried, but failed, to promote some of their infrastructure into the respective solutions. Now: HTML5, where the initial RDF lobbying actually had an effect and led to a native mechanism for RDF-in-HTML. Yes, <strong>native</strong>, not in some separate spec. This would have become part of every HTML5 book; any HTML developer on this planet would have learned about it. Finally a battle won. And what a great one. HTML.<br />
<br />
But no, Microdata wasn't developed by an RDF group, so they voted it out again. Now, the really sad thing is that there could have been a solution that would have served everybody sufficiently well, both HTMLers and RDFers. The RDFa group recently realized that RDFa needs to be revised anyway: there is going to be an RDFa 1.1, which will require new parsers. If they'd swallowed their pride, they would most probably have been able to define RDFa 1.1 as a proper superset of Microdata.<br />
<br />
Here is a short overview of RDF features supported by Microdata:
<ul><li>Explicit resource containers, via @itemscope (in RDFa, the boundaries of a resource are often implicitly defined by @rel or @typeof)</li>
<li>Subject declaration, via @itemid (RDFa uses @about)</li>
<li>Main subject typing, via @itemtype (RDFa uses @typeof)</li>
<li>Predicate declaration, via @itemprop (RDFa uses @property, @rel, and @rev)</li>
<li>Literal objects, via node values (RDFa also allows hidden values via @content)</li>
<li>Non-literal objects, via @href, @src, etc. (RDFa also allows hidden values via @resource)</li>
<li>Object language, via @lang</li>
<li>Blank nodes</li></ul>

I won't go into detail about why hiding semantics in RDFa will be penalized by search engines as soon as spammers discover the possibilities, why reusing RDF/XML's attribute names was probably not a smart move with regard to attracting non-RDFers, why the new @vocab idea is impractical, or why namespace prefixes, as handy as they are in other RDF formats, are not too helpful in an HTML context. Let's simply state that there is a trade-off between extended features (RDFa) and simplicity (Microdata). So, what are the core features that an RDFer would really need beyond Microdata:
<ul><li>the possibility to preserve markup, but probably not necessarily as an explicit rdf:XMLLiteral</li>
<li>datatypes for literal objects (I personally haven't used them in practice in the 6 years I've been developing RDF apps, but I can see some use cases)</li></ul>

Markup preservation is currently turned on by default in RDFa and can be disabled through @datatype, so an RDFer-satisfying RDFa 1.1 spec could probably just be Microdata + @datatype + a few extended parsing rules to end up with the intended RDF. My experience with watching RDF spec creation tells me that the RDFa group won't pick this route (there simply is no "<a href="http://www.slideshare.net/dmc500hats/startup-metrics-for-pirates-fowa-london-oct-2009">Kill a Feature</a>" mentality in the RDF community), but hey, hope dies last.<br />
<br />
I've been using Microdata in two of my recent RDF apps and in the CMS module of (ahem, still undocumented) Trice, and it's been a great experience. <a href="http://arc.semsol.org/">ARC</a> is going to get a "microRDF" extractor that supports the RDF-in-Microdata markup below (Note: this output still requires a 2nd extraction process, as the current Microdata draft's RDF mechanism only produces intermediate RDF triples, which then still have to be post-processed. I hope <a href="http://lists.w3.org/Archives/Public/public-html/2010Jan/0912.html">my related suggestion</a> will become official, but I seem to be the only pro-Microdata RDFer on the HTML list right now, so it may just stay as a convention):
<br />
<br />
<strong>Microdata</strong>:
<pre class="code">&amp;lt;div itemscope itemtype=&amp;quot;<strong>http://xmlns.com/foaf/0.1/</strong>Person&amp;quot;&amp;gt;

  &amp;lt;!-- plain props are mapped to the itemtype's context --&amp;gt;
  &amp;lt;img itemprop=&amp;quot;<strong>img</strong>&amp;quot; src=&amp;quot;mypic.jpg&amp;quot; alt=&amp;quot;a pic of me&amp;quot; /&amp;gt;
  My name is &amp;lt;span itemprop=&amp;quot;<strong>name</strong>&amp;quot;&amp;gt;&amp;lt;span itemprop=&amp;quot;<strong>nick</strong>&amp;quot;&amp;gt;Alec&amp;lt;/span&amp;gt; Tronnick&amp;lt;/span&amp;gt;
  and I blog at &amp;lt;a itemprop=&amp;quot;<strong>weblog</strong>&amp;quot; href=&amp;quot;http://alec-tronni.ck/&amp;quot;&amp;gt;alec-tronni.ck&amp;lt;/a&amp;gt;.

  &amp;lt;!-- other RDF vocabs can be used via full itemprop URIs --&amp;gt;
  &amp;lt;span itemprop=&amp;quot;<strong>http://purl.org/vocab/bio/0.1/olb</strong>&amp;quot;&amp;gt;
    I'm a crash test dummy for semantic HTML.
  &amp;lt;/span&amp;gt;
&amp;lt;/div&amp;gt;
</pre>

<strong>Extracted RDF</strong>:
<pre class="code">@base &amp;lt;http://host/path/&amp;gt; .
@prefix foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; .
@prefix bio: &amp;lt;http://purl.org/vocab/bio/0.1/&amp;gt; .
_:bn1 a foaf:Person ;
      foaf:img &amp;lt;mypic.jpg&amp;gt; ;
      foaf:name &amp;quot;Alec Tronnick&amp;quot; ;
      foaf:nick &amp;quot;Alec&amp;quot; ;
      foaf:weblog &amp;lt;http://alec-tronni.ck/&amp;gt; ;
      bio:olb &amp;quot;I'm a crash test dummy for semantic HTML.&amp;quot; .
</pre>
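For readers who want to see the mapping rule spelled out, here is a minimal Python sketch (not ARC's PHP extractor) of how the itemprops above could be turned into triples. It assumes the items are already parsed into (itemprop, value) pairs, that an itemprop containing a colon is a full URI, and that the vocabulary namespace is everything in the itemtype up to its last '/' or '#'; all function names are made up for illustration.

```python
def vocab_namespace(itemtype):
    """Return the itemtype's namespace: everything up to the last '/' or '#'."""
    cut = max(itemtype.rfind("/"), itemtype.rfind("#"))
    return itemtype[: cut + 1]


def predicate_uri(itemprop, itemtype):
    """Map an itemprop to a predicate URI (assumed convention, see above)."""
    if ":" in itemprop:
        # Full URIs (e.g. from the bio vocabulary) are used unchanged.
        return itemprop
    # Plain names are resolved against the itemtype's namespace.
    return vocab_namespace(itemtype) + itemprop


def item_to_triples(subject, itemtype, props):
    """Turn one Microdata item into (subject, predicate, object) tuples."""
    rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
    triples = [(subject, rdf_type, itemtype)]
    for itemprop, value in props:
        triples.append((subject, predicate_uri(itemprop, itemtype), value))
    return triples


person = "http://xmlns.com/foaf/0.1/Person"
props = [
    ("name", "Alec Tronnick"),
    ("nick", "Alec"),
    ("http://purl.org/vocab/bio/0.1/olb", "I'm a crash test dummy for semantic HTML."),
]
for triple in item_to_triples("_:bn1", person, props):
    print(triple)
```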


      </div>
    </content>
  </entry>

  <entry>
    <title>Naming Properties and Relations (comment)</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/09/15/naming-properties-and-relations_comment"/>
    <id>http://bnode.org/blog/2009/09/15/naming-properties-and-relations_comment</id>
    <published>2009-09-15T17:45Z</published>
    <updated>2009-09-17T10:00:03Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A local comment to JeniT's post about predicate names</summary>
    <category term="graphnote"/>
    <category term="microblogging"/>
    <category term="rdf"/>
    <category term="semantic logging"/>
    <category term="semanticweb"/>
    <category term="ui"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
I wasn't able to add a comment to <a href="http://www.jenitennison.com/blog/node/128">Jeni's interesting post about RDF predicate names</a> (a markdown issue, my fault), so I'll quickly post it here, as I'm pondering similar things, too.<br />
<br />
In her post, Jeni explores the issues around naming RDF terms. The community has gathered a number of experiences and suggestions over the last few years; some entry points are:
<ul><li><a href="http://dig.csail.mit.edu/breadcrumbs/node/72">Backward and Forward links in RDF just as important</a></li>
<li><a href="http://esw.w3.org/topic/HasPropertyOf">HasPropertyOf (ESW Wiki)</a></li>
<li><a href="http://esw.w3.org/topic/RoleNoun">RoleNoun (ESW Wiki)</a>
</li></ul><br />

I personally find &amp;quot;role-noun&amp;quot; easier to support in RDF apps than the older hasPropertyOf pattern (now often considered an anti-pattern). And inverse properties are just painful, as they usually require some form of inference to streamline the user experience. <br />
<br />
Not sure if that's helpful information, but for a project around semantic note-taking/logging, I played with different notations that users might be comfortable with when entering factoids through an unstructured input form (à la Twitter). I identified the following patterns that still seemed acceptable as a shared, supported syntax. All of them can be implemented using role-noun predicates (assuming that predicate labels are similar to the predicate names):
<ul><li>SUBJECT'(s)? PREDICATE (:|is) OBJECT</li>
<li>OBJECT is SUBJECT'(s)? PREDICATE</li>
<li>OBJECT is (the)? PREDICATE of SUBJECT</li>
<li>SUBJECT has PREDICATE (:)? OBJECT</li>
<li>(the|a)? PREDICATE(s)? of SUBJECT (is|are) OBJECT ((,|and|&amp;) OBJECT)*</li></ul>
(There are more patterns, for things like tagging and typing, but the examples above are the predicate-related grammar rules).<br />
<br />
As soon as a PREDICATE name itself contains has, is, or of (e.g. 'hasNick'), the other notations become awkward ('Alec has hasNick: Alec'), so role-noun seems to be a good fit.<br />
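The first four patterns translate fairly directly into regular expressions. The Python sketch below is an illustration only: the pattern order, the group-based capture and the function name parse_factoid are assumptions, and the list pattern (several objects joined by commas or 'and') is left out. Mapping the returned predicate label to an actual role-noun predicate URI would still need a label lookup.

```python
import re

# Each entry: a compiled regex and the group numbers for (subject, predicate, object).
PATTERNS = [
    # SUBJECT'(s)? PREDICATE (:|is) OBJECT
    (re.compile(r"^(.+?)'s? (.+?)(?::| is) (.+)$"), (1, 2, 3)),
    # OBJECT is SUBJECT'(s)? PREDICATE
    (re.compile(r"^(.+?) is (.+?)'s? (.+)$"), (2, 3, 1)),
    # OBJECT is (the)? PREDICATE of SUBJECT
    (re.compile(r"^(.+?) is (?:the )?(.+?) of (.+)$"), (3, 2, 1)),
    # SUBJECT has PREDICATE (:)? OBJECT
    (re.compile(r"^(.+?) has (.+?):? (.+)$"), (1, 2, 3)),
]


def parse_factoid(text):
    """Return (subject, predicate label, object), or None if no pattern matches."""
    for regex, (s, p, o) in PATTERNS:
        m = regex.match(text.strip())
        if m:
            return m.group(s), m.group(p), m.group(o)
    return None


print(parse_factoid("Alec is the nick of Alec Tronnick"))  # ('Alec Tronnick', 'nick', 'Alec')
```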
<br />
Unfortunately, one (non-trivial) problem remains: People (and Web 2.0 apps) also like 'SUBJECT PREDICATE_VERB OBJECT' (e.g. &amp;quot;likes&amp;quot;, &amp;quot;bookmarked&amp;quot;, &amp;quot;said&amp;quot;, &amp;quot;posted&amp;quot;, &amp;quot;is listening to&amp;quot; ...), and I don't yet have a good idea how to handle those automatically, other than hard-coding support for the typical social media verbs. It might be possible to use WordNet to detect verbs and derive a canonicalized form, and then model those patterns as activities (activity = liking, bookmarking, saying, posting, listening, plus ACTIVITY_PERSON and ACTIVITY_TARGET or some such). If anyone has a suggestion, I'd be happy to hear it.
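The hard-coded fallback mentioned above could look roughly like the following Python sketch. The verb table and the activity/activity_person/activity_target field names are hypothetical stand-ins for ACTIVITY_PERSON and ACTIVITY_TARGET, and no WordNet lookup is attempted.

```python
# Hard-coded social media verbs mapped to a canonical activity name.
# Both the table and the output field names are illustrative assumptions.
ACTIVITY_VERBS = {
    "is listening to": "listening",
    "bookmarked": "bookmarking",
    "posted": "posting",
    "likes": "liking",
    "said": "saying",
}


def parse_activity(text):
    """Turn 'SUBJECT VERB OBJECT' into an activity record, or None."""
    # Try longer verb phrases first so multi-word verbs win.
    for verb in sorted(ACTIVITY_VERBS, key=len, reverse=True):
        marker = " " + verb + " "
        if marker in text:
            person, target = text.split(marker, 1)
            return {
                "activity": ACTIVITY_VERBS[verb],
                "activity_person": person.strip(),
                "activity_target": target.strip(),
            }
    return None
```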


      </div>
    </content>
  </entry>

  <entry>
    <title>New ARC2 release</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/08/21/new-arc2-release"/>
    <id>http://bnode.org/blog/2009/08/21/new-arc2-release</id>
    <published>2009-08-21T10:20Z</published>
    <updated>2009-08-21T10:40:22Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>Finally in sync with code.semsol.org and the BZR repository</summary>
    <category term="arc2"/>
    <category term="bzr"/>
    <category term="release"/>
    <category term="semanticweb"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
I moved <a href="http://code.semsol.org/source/arc/">ARC's codebase</a> to a BZR repository <a href="http://bnode.org/blog/2009/06/22/code-semsol-org-a-central-home-for-semsol-code">2 months ago</a> but didn't really find the time to synchronize it with the way I created bundles in the past. Today I finally linked the repository and its TGZ creation feature from the <a href="http://arc.semsol.org/download/">main download page</a>. This is the first bundle since March, so there are quite a number of fixes. Some tweaks were not logged, but from now on, the process should be more professional (thanks to the proper versioning system).<br />
<br />
Here is the raw list of changes; the most interesting ones are probably the improved RDFa extractor (cheers to Toby Inkster and Masahide Kanzaki for code) and the new auto-cleanup of unused values/hashes in the RDF store. I've received a couple more patches, which will be integrated over the coming weeks:
<ul><li>new component: Resource </li>
<li>new method: completeQuery (PREFIX-injection)</li>
<li>Reader: new method: getResponseHeaders</li>
<li>RDFa: fixes, +3 test case PASSes (thx to Toby Inkster &amp; Masahide Kanzaki)</li>
<li>Class: auto-populate POST (php5 bug)</li>
<li>Class: refactored *PName methods</li>
<li>new methods: toIndex, toTriples, checkRegex</li>
<li>Parsers: unsetting reader object to fix garbage collection</li>
<li>SelectQueryHandler: improved LIKE-check for REGEX-rewriting</li>
<li>Class: used prefixes were not logged, leading to serialization gaps</li>
<li>Class: fixed root calculation bug in calcURI</li>
<li>Class: new methods: toDataURI/fromDataURI</li>
<li>ARC2_SPARQLScriptProcessor: improved automatic PREFIX injection</li>
<li>ARC2_RemoteStore: added automatic PREFIX injection and getResourceLabel method</li>
<li>ARC2_StoreSelectQueryHandler: fixed missing brackets in getExpressionSQL.</li>
<li>Reader: Improved timeout handling</li>
<li>Reader: support for port in http header (thx to Roan O'Sullivan)</li>
<li>Slowly starting to switch to inline PHPDoc documentation</li>
<li>Atom_Parser: Addition: support for link types</li>
<li>DeleteQueryHandler: Addition: cleanValueTables method (auto-called every 500 DELETE queries)</li>
<li>Class: new method: resetErrors</li>
<li>Class: switch from getScriptURI to getRequestURI in init()</li></ul>
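The new toDataURI/fromDataURI methods are not documented here, so the following Python sketch only illustrates what such a conversion generally involves, following the data: URI syntax from RFC 2397. Whether ARC uses base64 or percent-encoding, and which media type it assumes, is not stated above; this decoder simply treats the payload as UTF-8.

```python
import base64
from urllib.parse import unquote_to_bytes


def to_data_uri(text, media_type="text/plain;charset=utf-8"):
    """Encode a string as a base64 data: URI (RFC 2397 syntax)."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "data:%s;base64,%s" % (media_type, payload)


def from_data_uri(uri):
    """Decode a data: URI (base64 or percent-encoded) back into a string."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, _, data = uri[5:].partition(",")
    if header.endswith(";base64"):
        raw = base64.b64decode(data)
    else:
        raw = unquote_to_bytes(data)
    # Assumption: the payload is UTF-8, whatever charset the header names.
    return raw.decode("utf-8")


print(to_data_uri("hi"))  # data:text/plain;charset=utf-8;base64,aGk=
```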
<br />
<strong>In related news:</strong>
<ul><li>Tuukka Hastrup created an <a href="http://tuukka.sioc-project.org/arc2-starter-pack/">ARC 2 Starter Pack</a> that simplifies the process of setting up an ARC store.</li>
<li>Andrew Ritz created a <a href="http://twilight-labs.co.cc/blog/?p=10">WordPress extension</a> that lets you embed results from remote SPARQL endpoints directly in your blog pages.
</li></ul>
      </div>
    </content>
  </entry>

  <entry>
    <title>SKOS + DC + Linked Data = Semantic Tagging?</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/08/19/skos-dc-linked-data-semantic-tagging"/>
    <id>http://bnode.org/blog/2009/08/19/skos-dc-linked-data-semantic-tagging</id>
    <published>2009-08-19T12:35Z</published>
    <updated>2009-08-19T13:04:42Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>Using Dublin Core terms to link SKOS concepts to Linked Data entities</summary>
    <category term="dc"/>
    <category term="dcmi"/>
    <category term="faviki"/>
    <category term="semanticweb"/>
    <category term="skos"/>
    <category term="tagging"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
Still looking for a simple way to tag concrete resources (to-do items, people, locations) with personal concepts (e.g. &amp;quot;non-profit&amp;quot;, &amp;quot;research&amp;quot;, &amp;quot;semweb&amp;quot;), and <strong>also</strong> with other non-conceptual resources (clients, projects), I skimmed through the fresh <a href="http://www.w3.org/TR/skos-reference/">SKOS Recommendation</a>. I'm still a fan of SKOS and frequently wonder about semweb apps where the internal models are grounded in pluggable, personal(!) SKOS schemes, instead of coordination-intensive RDF Schemas or OWL ontologies. I don't know if such an approach could really work; I suspect network effects benefit more from rather tightly defined relations and identifiers. Mainly just to have it written down somewhere (this is really not well thought out yet), here are some related entry points and considerations:<br />
<ul><li><strong>Tagging should be personal.</strong><br />
While I like the idea of grounding tags in existing dictionaries such as DBPedia, tags seem to work best when they are as user-defined and informal as possible. Last year, I experimented with a tool that allowed me to tag things with other people's delicious tags. It just felt wrong; I wanted my &amp;quot;own&amp;quot; tags. (I think the latest <a href="http://www.faviki.com">Faviki</a> release is a nice example of combining the best of both worlds.)</li>
<li><strong>SKOS supports personal tags</strong><br />
Concepts in SKOS are sort-of scoped (or &amp;quot;namespaced&amp;quot;). If I describe a &amp;quot;Fun&amp;quot; concept, it is defined as seen by the creator of the concept URI, i.e. I can annotate it with '<code>:Fun dct:creator &amp;lt;#me&amp;gt; ; dct:created &amp;quot;2009-08-19&amp;quot;</code>' etc., even though the general idea of Fun was clearly not invented by me, and certainly not today.</li>
<li><strong>Tags should be safely portable</strong><br />
Thanks to URIs, SKOS concepts can be ported to other applications, and they can be grouped and organized in so-called concept schemes, i.e. I could have a &amp;quot;Waving&amp;quot; concept in a &amp;quot;Dance&amp;quot; concept scheme, and also in a &amp;quot;Netiquette&amp;quot; scheme.</li>
<li><strong>There is a need to merge tag sets</strong><br />
If tags are used to organize all sorts of personal things, it should be possible to merge them into a unified model. Mainly for personal use (&amp;quot;personal world view&amp;quot;), but also for sharing with other people and linking to their views. This is again possible thanks to SKOS being based on RDF, URIs, and very loose semantics.</li>
<li><strong>There is a need to tag real-world objects with concepts</strong><br />
This is partly obvious. Tags are a means to an end. But while they are already widely used to annotate document-like resources (web pages, photos, etc), I'd also like to tag things like my projects, people in my address book, and similar non-documents. From the <a href="http://www.w3.org/TR/skos-primer/#secindexing">SKOS Primer</a>:
<cite class="cite">While the SKOS vocabulary itself does not include a mechanism for associating an arbitrary resource with a skos:Concept, implementors can turn to other vocabularies</cite>. So, whatever predicate URI we use, it won't be provided by SKOS directly. </li>
<li><strong>Maybe Dublin Core terms can link non-documents to concepts</strong><br />
This is a slightly controversial conclusion/assumption, given that DC terms are mainly associated with document metadata. But after exploring the <a href="http://dublincore.org/">DCMI website</a>, I can't find any clear evidence that their terms can't be used more generally. Both the <a href="http://dublincore.org/documents/usageguide/">Usage Guide</a> (thanks to <a href="http://twitter.com/_masaka/status/3403593488">Masahide</a> for the pointer) and the <a href="http://dublincore.org/documents/abstract-model/">Abstract Model</a> actually support this thought. The Usage guide mentions that &amp;quot;DC metadata can be applied to other resources as well&amp;quot; (but notes that the suitability may depend on the particular context at hand), and the Abstract Model states that the notion of a Dublin Core &amp;quot;resource&amp;quot; is equivalent to &amp;quot;Resource&amp;quot; defined in <a href="http://www.w3.org/2000/01/rdf-schema#">RDF Schema</a>, which can be anything, even including Literals. So, we can most probably use <code>dct:subject</code> or <code>dct:relation</code> to tag a project or person with a SKOS concept. </li>
<li><strong>There is a need to associate concepts with real-world objects</strong><br />
If we organize our personal concept space with SKOS, we may also want to more formally specify our personal concepts, so that other applications or people can merge them with their tags. Therefore, we need a predicate that can relate concepts to non-concepts such as <a href="http://dbpedia.org/">DBPedia</a> identifiers. Such a mechanism could maybe also help with RDF's general problem of URI aliases. I could have a personal, canonical concept URI for a resource and use it as a container for the resource's various aliases. Again, SKOS does not provide a predicate for this use case, so we've got to look elsewhere. </li>
<li><strong>Maybe Dublin Core terms can link concepts to real-world objects</strong><br />
Another possibly controversial conclusion, but again there is supporting text in the <a href="http://dublincore.org/documents/abstract-model/#sect-4">DCMI specs</a>: &amp;quot;<cite class="cite">A value associated with the Dublin Core Subject property is a concept (a conceptual entity) or a physical object or person (a physical entity)</cite>&amp;quot;. So, if the value of dc:subject can be a non-document, we can say things like <code>:Berlin a skos:Concept; dct:subject dbpedia:Berlin .</code>. This is very interesting because it could allow us to use dct:subject in both ways: for the tagging of things, and also for grounding tags. FOAF has a handy <a href="http://xmlns.com/foaf/spec/#term_primaryTopic">primaryTopic</a> term, which could work in this context, too, but unfortunately, its scope is (currently) set to foaf:Document. <a href="http://danbri.org/">DanBri</a> also suggested the creation of a dedicated <code>skos:it</code> (or similar) predicate which would be even better. </li>
<li><strong>Sometimes I'd like to &amp;quot;tag&amp;quot; real-world objects with real-world objects</strong><br />
Don't know if <em>tagging</em> is still the right word here, but what I mean is a generic relation between arbitrary things in a common application context. Often, we can do better by specifying the relation between two resources, but in other cases a simple, perhaps temporary link is better than leaving a resource completely unannotated out of laziness. Given the two DCMI-related findings above, we could maybe conclude that a predicate like dct:relation can also be used to relate a project to a person, or the other way round, without having to invent a new predicate.
</li></ul>
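To make the two-way use of dct:subject more concrete, here is a small Python sketch over plain (subject, predicate, object) tuples. The resources (:project42, :bob, :party) are hypothetical, prefixed names are kept as strings, and in a real app the lookup would rather be a SPARQL query against the store; the point is only that one predicate serves both for tagging things and for grounding the tags.

```python
DCT_SUBJECT = "dct:subject"
RDF_TYPE = "rdf:type"

TRIPLES = {
    # A personal concept, grounded in a DBpedia identifier.
    (":Berlin", RDF_TYPE, "skos:Concept"),
    (":Berlin", DCT_SUBJECT, "dbpedia:Berlin"),
    # Non-documents tagged with that concept.
    (":project42", DCT_SUBJECT, ":Berlin"),
    (":bob", DCT_SUBJECT, ":Berlin"),
    # An ungrounded personal concept.
    (":Fun", RDF_TYPE, "skos:Concept"),
    (":party", DCT_SUBJECT, ":Fun"),
}


def concepts(graph):
    """All resources typed as skos:Concept."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == "skos:Concept"}


def things_about(entity, graph):
    """Non-concept resources tagged with a concept that is grounded in entity."""
    concept_set = concepts(graph)
    grounded = {s for (s, p, o) in graph
                if p == DCT_SUBJECT and s in concept_set and o == entity}
    return {s for (s, p, o) in graph
            if p == DCT_SUBJECT and o in grounded and s not in concept_set}


print(sorted(things_about("dbpedia:Berlin", TRIPLES)))  # [':bob', ':project42']
```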

&amp;lt;/brain:dump&amp;gt;

      </div>
    </content>
  </entry>

  <entry>
    <title>SemWeb T-Shirt Shop closed</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/07/24/semweb-t-shirt-shop-closed"/>
    <id>http://bnode.org/blog/2009/07/24/semweb-t-shirt-shop-closed</id>
    <published>2009-07-24T08:50Z</published>
    <updated>2009-07-24T08:57:48Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>I've closed the Spreadshirt shop we set up a year ago, due to lack of interest.</summary>
    <category term="semanticweb"/>
    <category term="shop"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
Just a quick FYI: I've closed the <a href="http://bnode.org/blog/2008/06/23/semantic-web-community-shop-now-open">SemWeb Spreadshirt Shop</a> from last year. I never had a payout (you have to reach a certain amount of profit before you earn actual money), and as I plan/have to discontinue most of my many pet projects anyway (Simplify Your Life etc.), this one was rather easy to start with.<br />
<br />
I guess my <a href="http://twitter.com/bengee">red semweb cap</a> just became a rarity ;)
      </div>
    </content>
  </entry>

  <entry>
    <title>The Semantic Web - Not a piece of cake...</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/07/08/the-semantic-web-not-a-piece-of-cake"/>
    <id>http://bnode.org/blog/2009/07/08/the-semantic-web-not-a-piece-of-cake</id>
    <published>2009-07-08T14:55Z</published>
    <updated>2009-07-08T15:05:14Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>The SemWeb layercake diagram as an isometric infographic</summary>
    <category term="infographics"/>
    <category term="isometric"/>
    <category term="layer cake"/>
    <category term="semanticweb"/>
    <category term="stack"/>
    <category term="technologies"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
For a client project I've been looking at <a href="http://en.wikipedia.org/wiki/Isometric_projection">Isometric Projection</a>, which is not only nice for mapping 3D objects to a 2D environment, but even more so for adding a 3rd dimension to (previously) flat visual objects. The additional axis allows for much more information to be provided, without (necessarily ;) sacrificing compactness and simplicity.<br />
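For anyone wanting to try this, the core of an isometric layout is a simple linear mapping. The Python sketch below assumes the common 30° convention (x and y axes drawn 30° below the horizontal, z straight up, screen y growing downwards), which gives all three axes the same on-screen length, i.e. an isometric projection up to a uniform scale factor. The function names and the box helper are illustrative.

```python
import math

# x runs down-right at 30 degrees, y runs down-left at 30 degrees,
# z points straight up; screen y grows downwards as in most 2D APIs.
COS30 = math.cos(math.radians(30))
SIN30 = math.sin(math.radians(30))


def iso_project(x, y, z, scale=1.0):
    """Map a 3D point to 2D isometric screen coordinates."""
    screen_x = (x - y) * COS30 * scale
    screen_y = ((x + y) * SIN30 - z) * scale
    return screen_x, screen_y


def box_outline(x, y, z, w, d, h, scale=1.0):
    """Project the 8 corners of an axis-aligned box (e.g. one stack layer)."""
    return [
        iso_project(x + dx, y + dy, z + dz, scale)
        for dx in (0, w)
        for dy in (0, d)
        for dz in (0, h)
    ]
```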
<br />
While I was pushing small boxes around on a 30° grid, <a href="http://twitter.com/jahendler/statuses/2489423431">Jim Hendler tweeted</a> about his Layer Cake talk from the recent Dagstuhl meeting (which is awesome, BTW. <a href="http://www.cs.rpi.edu/~hendler/presentations/LayercakeDagstuhl-share.pdf">Read it</a>, if you haven't yet) and I started to wonder if an isometric version of the tech stack could help reduce the overload resulting from the current two-dimensional ones. Not really, I fear, but it was a fun experiment nonetheless. Might be worth exploring this a little further. At least the concepts can be separated from specific technologies, and the application layer has a different angle than before (which I personally think makes more sense). Anyway, just wanted to share the result. Enjoy.<br />
<br />
<a href="http://bnode.org/media/2009/07/08/semantic_web_technology_stack.png"><img src="http://bnode.org/media/2009/07/08/semantic_web_technology_stack_small.png" title="Semantic Web Technology Stack" alt="Semantic Web Technology Stack" /></a><br />
<br />
Feel free to <a href="http://creativecommons.org/licenses/by/3.0/">use and share</a>.

      </div>
    </content>
  </entry>

  <entry>
    <title>Code.semsol.org - A central home for semsol code</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/06/22/code-semsol-org-a-central-home-for-semsol-code"/>
    <id>http://bnode.org/blog/2009/06/22/code-semsol-org-a-central-home-for-semsol-code</id>
    <published>2009-06-22T14:00Z</published>
    <updated>2009-06-23T06:37:56Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>Semsol gets code repositories and browsers</summary>
    <category term="arc"/>
    <category term="bzr"/>
    <category term="repository"/>
    <category term="semanticweb"/>
    <category term="semsol"/>
    <category term="trice"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
The code bundles on the <a href="http://arc.semsol.org/">ARC website</a> are generated in an inefficient manual process, and each patch has to wait for the next to-be-generated zip file. The developer community is growing (there are now 600 ARC downloads each month), I'm increasingly receiving patches and requests for a proper repository, and the <a href="http://trice.semsol.org/">Trice framework</a> is about to go online as well. So I spent last week building a dedicated source code site for all <a href="http://semsol.com/">semsol</a> projects at <a href="http://code.semsol.org/">code.semsol.org</a>.<br />
<br />
So far, it's not much more than a directory browser with source preview and a little method navigator. But it will simplify code sharing and frequent updates for me, and hopefully also for ARC and Trice developers. You can check out various <a href="http://bazaar-vcs.org/">Bazaar</a> code branches and generate a bundle from any directory. The app can't display repository messages yet (the server doesn't have bzr installed; I'm just deploying branches using the handy FTP option), but I'll try to come up with a work-around or an alternative when time permits.<br />
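Since the server has no bzr installed, bundles have to be built from the deployed files themselves. A minimal Python sketch of generating a .tgz bundle from a directory might look like this; it is not the code.semsol.org implementation, and skipping .bzr metadata is an assumption about what a bundle should contain.

```python
import os
import tarfile


def skip_vcs(tarinfo):
    """tarfile filter: drop Bazaar metadata (.bzr) from the bundle."""
    if ".bzr" in tarinfo.name.split("/"):
        return None
    return tarinfo


def make_bundle(src_dir, out_path):
    """Pack src_dir into a gzip-compressed tarball (a .tgz bundle)."""
    root = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname makes member paths start at the bundle root, e.g. "arc/ARC2.php".
        tar.add(src_dir, arcname=root, filter=skip_vcs)
    return out_path
```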
<br />
<a href="http://code.semsol.org/"><img src="http://bnode.org/media/2009/06/22/code_browser.gif" title="Code Browser" alt="Code Browser" /></a>
      </div>
    </content>
  </entry>

  <entry>
    <title>CommonTag too complicated?</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/06/12/commontag-too-complicated"/>
    <id>http://bnode.org/blog/2009/06/12/commontag-too-complicated</id>
    <published>2009-06-12T11:25Z</published>
    <updated>2009-06-12T12:08:13Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>Not sure if the commontag effort sends the right message.</summary>
    <category term="commontag"/>
    <category term="microformats"/>
    <category term="modeling"/>
    <category term="semanticweb"/>
    <category term="tagging"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
<strong>Update: </strong> I just reread the spec: I can't tag non-content with the CommonTag vocabulary. Too bad; please ignore the last paragraph.
<div class="hr"><hr /></div>
Sorry for raising my voice here, but some of us are really working hard to show that SemWeb technologies <em>don't</em> have to be complicated, and unfortunately, the new <a href="http://commontag.org/">CommonTag</a> effort seems to send exactly the opposite message.<br />
<br />
Don't get me wrong, a widely used tagging ontology would be great. We do have 3 (or 4? 5?) tagging vocabularies already, but none really caught on, possibly because tagging is meant to be simple and the proposed solutions apparently weren't easy enough. CommonTag is promoted as being &amp;quot;simple&amp;quot; and &amp;quot;easy&amp;quot;, but after looking at the examples in the <a href="http://www.commontag.org/QuickStartGuide">QuickStart Guide</a>, I'm not so sure:<br />
<ul><li>The snippets are really off-putting (not only for Non-RDFers). Do I really need multiple nested HTML nodes to create something as simple as a tag? </li>
<li>Couldn't the term names be more intuitive? What could a ctag:Tag be? The actual tag or an intermediate resource that is then, err, tagged? A person ctag:tagged a resource, right? Ah, no.</li>
<li>Why aren't the term names at least consistent? &amp;quot;ctag:taggingDate&amp;quot; follows role-noun, &amp;quot;ctag:tagged&amp;quot; is a dunno, &amp;quot;ctag:means&amp;quot; is a present-form verb, &amp;quot;ctag:isAbout&amp;quot; sort-of follows the hasPropertyOf anti-pattern.</li>
<li>The vocabulary introduces aliases for well-deployed terms such as rdfs:label and dct:created, which makes its use in practical settings expensive (it'll ease things on the author side, though).</li></ul>
<br />
To be a little more constructive: Using the vocabulary doesn't have to lead to the complicated markup seen in the examples. I'm sure they'll soon get better snippets from someone in the RDFa community. And apart from that, there is also a handy term in the <a href="http://commontag.org/ns#">RDF Schema</a> which might just be what you are looking for: &amp;quot;ctag:isAbout&amp;quot;. It lets you directly point from a resource (default is the page) to a Linked Data identifier (e.g. from DBPedia), without the need for all those intermediate nodes (which lead to triple bloat and slow down SPARQL queries). CommonTag-consuming apps will have to implement some form of inferencing to handle &amp;quot;isAbout&amp;quot;, but as the term is in the spec, I assume they plan to.<br />
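The inferencing such a consumer would need can be sketched as a single rule: whenever a resource is ctag:tagged with a tag node that ctag:means some identifier, also treat the resource as ctag:isAbout that identifier. The Python below works on plain triples and assumes this reading of the spec; the data and helper names are hypothetical.

```python
TAGGED, MEANS, IS_ABOUT = "ctag:tagged", "ctag:means", "ctag:isAbout"


def infer_is_about(graph):
    """Return graph plus (resource, ctag:isAbout, X) for each tagged/means chain."""
    meanings = {}
    for s, p, o in graph:
        if p == MEANS:
            meanings.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in graph:
        if p == TAGGED:
            for meaning in meanings.get(o, ()):
                inferred.add((s, IS_ABOUT, meaning))
    return set(graph) | inferred


def about(resource, graph):
    """Everything a resource is about, stated directly or via a tag node."""
    return {o for (s, p, o) in infer_is_about(graph)
            if s == resource and p == IS_ABOUT}


GRAPH = {
    ("#page", TAGGED, "_:tag1"),                # nested form
    ("_:tag1", MEANS, "dbpedia:Semantic_Web"),
    ("_:tag1", "ctag:label", "semweb"),
    ("#page", IS_ABOUT, "dbpedia:RDFa"),        # shortcut form
}
print(sorted(about("#page", GRAPH)))  # ['dbpedia:RDFa', 'dbpedia:Semantic_Web']
```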
<br />
Granular modeling of tags is apparently tricky, but shouldn't there be some sweet spot? Something a little more expressive than rel-tag but less complex than a fully spec'd Tag ontology? <a href="http://microformats.org/wiki/xfolk">xFolk</a> looks promising, or maybe the CommonTag group members could have agreed on formalizing and supporting &amp;quot;scoped rel-tag&amp;quot; (rel-tags with an optional RDFa &amp;quot;about&amp;quot; container). Most rel-tag-to-RDF converters have some form of scoping already anyway (because tags can apply to reviews, pages, vcards, etc.). <em>That</em> would have been a cool outcome after 1 year of stealth work.<br />
<br />
Then again, I may be over-stressing the simplicity aspect here. Maybe CommonTag is &amp;quot;simple enough&amp;quot; for web publishers. There are some initial supporters, and for RDFers, the nested structures and bnodes will most probably be acceptable. So let's see how things evolve.<br />
<br />
<s>I personally think I'll have a closer look at ctag:isAbout. I'm still looking for an alternative to dc/dct:subject to tag arbitrary things with arbitrary identifiers, maybe CommonTag can provide it, although<br />
<pre class="code">&amp;lt;#me&amp;gt; ctag:isAbout dbpedia:Semantic_Web .</pre>
still doesn't sound right for a rich tag, and the domain is &amp;quot;ctag:TaggedContent&amp;quot; which sounds wrong for non-textual resources, too. (<a href="http://dublincore.org/documents/dcmi-terms/#terms-relation">dct:relation</a> is the best I could find so far for tagging things with things, but Dublin Core is coming from a publishing context and is therefore often recommended for describing publications only).<br />
</s>
      </div>
    </content>
  </entry>

  <entry>
    <title>ESWC 2009 Linked Data Dashboards</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/06/04/eswc-2009-linked-data-dashboards"/>
    <id>http://bnode.org/blog/2009/06/04/eswc-2009-linked-data-dashboards</id>
    <published>2009-06-04T13:20Z</published>
    <updated>2009-06-04T19:19:04Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A first Paggr application went live during ESWC2009.</summary>
    <category term="confx"/>
    <category term="dashboards"/>
    <category term="eswc2009"/>
    <category term="linked data"/>
    <category term="paggr"/>
    <category term="semanticweb"/>
    <category term="sparqlets"/>
    <category term="sparqlscript"/>
    <category term="widgets"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
In case you missed the tweets or a local announcement: The first <a href="http://paggr.com/about">Paggr</a> application went online a few days ago. This year's <a href="http://eswc2009.org/">ESWC</a> Technologies Team pushed things a little further, with <a href="http://social.eswc2009.org/">RFID tracking</a> during the event and extended <a href="http://data.semanticweb.org/conference/eswc/2009">conference data</a> that includes detailed session and date/time information (kudos to Michael Hausenblas for RDFizing even PDFs).<br />
<br />
Based on this dataset, we provided a <a href="http://personal.eswc2009.org/">conference explorer</a> and stress-tested the <a href="http://data.semanticweb.org/">&amp;quot;Dog Food&amp;quot;</a> server while at it. The system survived, but I also learned a lot. We used about 50 RDF stores for the different public and user-specific dashboards, which basically worked nicely. However, rendering non-ugly resource summaries requires a bit of endpoint hammering, and some of the more complex path queries resulted in timeouts. Yesterday, I had to create a mirror from the <a href="http://data.semanticweb.org/dumps/conferences/">data dump</a> to route a couple of widgets through a replicated (ARC :-) endpoint. But then this is also one of the powerful possibilities that come with semantic web technologies. You can often switch or double the back-end repository in no time, and without any code changes. (And as all the Sparqlets are created in a <a href="http://personal.eswc2009.org/widgets/">web-based tool</a>, I didn't even have to upload a changed configuration file. I simply tweaked a SPARQLScript parameter.)<br />
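Because the endpoint is just a parameter, routing a widget through a mirror does not touch the widget code. The Python sketch below goes one step further and assumes a list of endpoints (primary first, replicas after) plus a hypothetical send() transport; it illustrates the idea only and is not how the ESWC dashboards were implemented.

```python
def run_widget_query(query, endpoints, send):
    """Send query to the first endpoint that answers, falling back to mirrors.

    endpoints: primary endpoint first, then replicated mirrors.
    send(endpoint, query): hypothetical transport function that raises an
    OSError subclass (e.g. TimeoutError) when an endpoint is slow or down.
    """
    errors = []
    for endpoint in endpoints:
        try:
            return send(endpoint, query)
        except OSError as exc:
            errors.append("%s: %s" % (endpoint, exc))
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))
```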
<br />
Anyway, there are a couple of <a href="http://personal.eswc2009.org/live">public</a> <a href="http://personal.eswc2009.org/">dashboards</a>, in case you'd like to give it a try (it's still an early version). I also embedded a short screencast below. The system is going to be moved to a <a href="http://deri.ie/">DERI</a> server when the conference is over, but the URIs and data will probably stay stable. (And no, it won't really work with IE yet.) More to come!<br />
 <br />
<object width="550" height="385"><param name="movie" value="http://www.youtube.com/v/D7V4YNJHWwU&amp;hl=de&amp;fs=1&amp;color1=0x006699&amp;color2=0x54abd6"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/D7V4YNJHWwU&amp;hl=de&amp;fs=1&amp;color1=0x006699&amp;color2=0x54abd6" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="550" height="385"></embed></object>
<br />
<br />
<a href="http://paggr.com/media/2009/06/eswc2009.mov">HQ version (quicktime, 110MB)</a>


      </div>
    </content>
  </entry>

  <entry>
    <title>Simple RDFication of SPARQL SELECT results with RDFa</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/05/26/simple-rdfication-of-sparql-select-results-with-rdfa"/>
    <id>http://bnode.org/blog/2009/05/26/simple-rdfication-of-sparql-select-results-with-rdfa</id>
    <published>2009-05-26T07:50Z</published>
    <updated>2009-05-26T08:15:49Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>How to use RDFa to make SELECT results locally available as RDF</summary>
    <category term="linked data"/>
    <category term="nyc"/>
    <category term="rdfa"/>
    <category term="semanticweb"/>
    <category term="sparql"/>
    <category term="tricks"/>
    <category term="wiki"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
In February, I wrote about the <a href="http://bnode.org/blog/2009/02/18/linked-data-value-spiral">self-reinforcing value spiral that RDF data enables</a>. Here is an example of how <a href="http://rdfa.info">RDFa</a> can be used to support this &amp;quot;Repurpose-Republish&amp;quot; loop.<br />
<br />
While data exchange between different semantic web sources is usually RDF-based (i.e. the data always maintain their semantics), there is one major exception: SPARQL SELECT queries. This developer-oriented operation returns tabular data (similar to record sets in SQL). Once the query result is separated from the query, the associated structural data is lost. You can't directly feed SELECT results back into a triple store, even though querying based on linked resources means that you have just <a href="http://bnode.org/blog/2007/07/17/semweb-on-a-slide-at-duesseldorfs-1st-web-monday">created knowledge</a>. It's a pity to show this generated information to human consumers only.<br />
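If the application knows which relation a SELECT query expressed (e.g. competitor), the lost structure can be re-attached by instantiating a triple template per result row, much like a CONSTRUCT query does. A minimal Python sketch, with a simplified result format and hypothetical cb: identifiers:

```python
def bindings_to_triples(rows, template):
    """Instantiate a (s, p, o) template once per SELECT result row.

    rows: list of dicts from variable name to value (a simplified view of
    SPARQL results, which also carry the type of each value).
    template: terms starting with '?' are variables, all others constants.
    """
    triples = set()
    for row in rows:
        terms = []
        for term in template:
            if term.startswith("?"):
                if term[1:] not in row:
                    break  # unbound variable: no triple for this row
                terms.append(row[term[1:]])
            else:
                terms.append(term)
        else:
            # Only reached when the inner loop did not break.
            triples.add(tuple(terms))
    return triples


rows = [{"competitor": "cb:company/jaiku"}, {"competitor": "cb:company/plurk"}, {}]
template = ("cb:company/twitter", "cb:competitor", "?competitor")
for triple in sorted(bindings_to_triples(rows, template)):
    print(triple)
```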
<br />
One of the demos at my NYC talk was a dynamic wiki item that pulled in competitor information from <a href="http://cb.semsol.org/">Semantic CrunchBase</a> and injected it into a page template as HTML. The existing RDF infrastructure does not let me cache the SELECT results locally as usable RDF. And a semantic web client or crawler that indexes the wiki page will not learn how the described resource (e.g., Twitter) is related to the remote, linked entities.<br />
<br />
<img src="http://bnode.org/media/2009/05/26/wiki_1.gif" title="wiki with linked data" alt="wiki with linked data" /><br />
<br />
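The missing step can be illustrated with a small Python sketch (the setup described here used PHP and ARC, so this is not the wiki's actual code). It assumes results in the SPARQL JSON results layout and simply re-attaches the relation the query expressed to every result row. The subject, predicate, and example URIs are hypothetical placeholders, not Semantic CrunchBase identifiers.<br />
```python
def rdfize_bindings(results, subject, predicate, var):
    """Re-attach the relation implied by a SELECT query to its result rows.

    `results` follows the SPARQL JSON results layout
    (results -> bindings -> {var: {"type": ..., "value": ...}}).
    Only the term value is kept; the uri/literal distinction is dropped.
    """
    triples = []
    for row in results["results"]["bindings"]:
        term = row.get(var)
        if term is None:  # variable unbound in this row (e.g. OPTIONAL)
            continue
        triples.append((subject, predicate, term["value"]))
    return triples


# Hypothetical result set, shaped like the answer to "SELECT ?competitor WHERE ...".
results = {
    "head": {"vars": ["competitor"]},
    "results": {"bindings": [
        {"competitor": {"type": "uri", "value": "http://example.org/company/a"}},
        {"competitor": {"type": "uri", "value": "http://example.org/company/b"}},
    ]},
}

for t in rdfize_bindings(results, "http://example.org/company/twitter",
                         "http://example.org/vocab#competitor", "competitor"):
    print(t)
```
Note that the caller has to supply the subject and the predicate: that is exactly the information a plain SELECT response does not carry.<br />
<br />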
However, by adding just a single RDFa hook to the wiki item template, the RDF relation (e.g., competitor) can be made available again to apps that process my site's content. This is basically how <a href="http://linkeddata.org/">Linked Data</a> works. But here is the really nifty thing: My site can be a consumer of its own pages, too, recursively enriching its own data.<br />
<br />
<img src="http://bnode.org/media/2009/05/26/wiki_2.gif" title="markup-to-SELECT-to-RDFa-to-RDF" alt="markup-to-SELECT-to-RDFa-to-RDF" /><br />
<br />
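A rough idea of what such a hook could look like, sketched in Python with the standard library's ElementTree rather than the wiki's own template engine. An about attribute on a wrapper sets the subject, and each link's rel and href supply the relation and the object. The "cb:" prefix and all URIs are hypothetical; a real page would have to declare the prefix (e.g. via xmlns:cb) for RDFa processors to expand it.<br />
```python
import xml.etree.ElementTree as ET


def render_competitors(page_subject, rows, rel="cb:competitor"):
    """Render SELECT rows as links carrying a single RDFa hook.

    `about` on the wrapper sets the subject (the page's resource),
    `rel` on each link names the relation, and `href` is the object.
    The "cb:" prefix is a hypothetical CURIE that must be declared
    elsewhere in the page.
    """
    wrapper = ET.Element("span", {"about": page_subject})
    for uri, label in rows:
        link = ET.SubElement(wrapper, "a", {"rel": rel, "href": uri})
        link.text = label
        link.tail = " "  # keep the links visually separated
    return ET.tostring(wrapper, encoding="unicode")


html = render_competitors(
    "http://example.org/wiki/Twitter",
    [("http://example.org/company/a", "Example A"),
     ("http://example.org/company/b", "Example B")],
)
print(html)
```
Because the relation now lives in the markup, any RDFa-aware consumer (including the site itself) can turn the rendered list back into triples.<br />
<br />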
I tweaked the wiki script, which now works like this: When a page is saved, a first operation updates the wiki markup in the page's graph (i.e., the not-yet-populated template). In a second step, the page URL is retrieved via HTTP. This returns HTML with RDFa-encoded remote data, which is then parsed by ARC and finally added to the same graph. We end up with a graph that contains not only the wiki markup but also the RDFized information that was integrated from remote sites. After adding this graph to the RDF store, we can use a local query to generate the page, and occasionally reset the graph to enable copy-by-reference. And all this without any custom API code.<br />
<br />
<img src="http://bnode.org/media/2009/05/26/wiki_3.gif" title="rdfa-to-sparql" alt="rdfa-to-sparql" /><br />
<br />
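The save hook can be sketched in Python as follows. This is neither the actual wiki script nor ARC's RDFa parser: the in-memory store (graph URI mapped to a set of triples), the hypothetical "wiki:markup" predicate, and the injected fetch callable are assumptions, and the parser only understands the tiny RDFa subset used by the hook (about on an ancestor, rel plus href on a link).<br />
```python
from html.parser import HTMLParser

# Elements that never get an end tag in HTML, so they must not be pushed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "wbr"}


class MiniRDFaParser(HTMLParser):
    """Collect (subject, rel, href) triples from a tiny RDFa subset.

    @about sets the subject for an element and its descendants;
    @rel plus @href on an element yields one triple. No CURIE
    expansion, @property, @typeof, or chaining: not a real RDFa processor.
    """

    def __init__(self, base):
        super().__init__()
        self.subjects = [base]  # stack of current subjects
        self.triples = set()

    def _emit(self, attrs):
        a = dict(attrs)
        subject = a.get("about") or self.subjects[-1]
        if a.get("rel") and a.get("href"):
            self.triples.add((subject, a["rel"], a["href"]))
        return subject

    def handle_starttag(self, tag, attrs):
        subject = self._emit(attrs)
        if tag not in VOID:
            self.subjects.append(subject)

    def handle_startendtag(self, tag, attrs):
        self._emit(attrs)  # self-closing tag: nothing to push or pop

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.subjects) > 1:
            self.subjects.pop()


def save_page(store, page_url, markup, fetch):
    """Hypothetical save hook: markup first, then harvested RDFa."""
    # Step 1: reset the page graph to the not-yet-populated wiki markup.
    store[page_url] = {(page_url, "wiki:markup", markup)}
    # Step 2: retrieve the rendered page and parse its RDFa.
    parser = MiniRDFaParser(base=page_url)
    parser.feed(fetch(page_url))
    parser.close()
    # Step 3: add the extracted triples to the same graph.
    store[page_url] |= parser.triples
    return store[page_url]
```
In a real deployment, fetch would be an HTTP GET against the wiki's own page URL, so the site ends up consuming its own RDFa output. Resetting the graph in step 1 is what keeps repeated saves from accumulating stale remote data.<br />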

      </div>
    </content>
  </entry>

  <entry>
    <title>Back from New York &quot;Semantic Web for PHP Developers&quot; trip</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/05/25/back-from-new-york-semantic-web-for-php-developers-trip"/>
    <id>http://bnode.org/blog/2009/05/25/back-from-new-york-semantic-web-for-php-developers-trip</id>
    <published>2009-05-25T15:45Z</published>
    <updated>2010-01-26T16:22:28Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>Gave a talk and a workshop in NYC about SemWeb technologies for PHP developers</summary>
    <category term="arc"/>
    <category term="meetup"/>
    <category term="new york"/>
    <category term="nyc"/>
    <category term="php"/>
    <category term="rdf"/>
    <category term="semanticweb"/>
    <category term="slides"/>
    <category term="sparql"/>
    <category term="talk"/>
    <category term="trice"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
<img src="http://bnode.org/media/2009/05/19/benji_times_square.jpg" class="fr" title="http://bnode.org/me at times square" alt="http://bnode.org/me at times square" /> I'm back from New York, where I was given the great opportunity to talk about two of my favorite topics: <a href="http://arc.semsol.org/">Semantic Web Development with PHP</a>, and (not necessarily semantic) <a href="http://trice.semsol.org/">Software Development using RDF Technology</a>. I was especially looking forward to the second one, because that perspective is not only easier to understand for people with a software engineering background but also still a much-neglected marketing &quot;back-door&quot;: If RDF simplifies working with data in general (and it does), then we should not limit its use to semantic web apps. Broader data distribution and integration may naturally follow in a second or third step once people use the technology (so much for my contribution to <a href="http://webofdata.wordpress.com/2009/05/24/technology-malbestpracticing/">Michael Hausenblas' list of RDF MalBest Practices</a> ;)<br />
<br />
The talk on Thursday at the <a href="http://semweb.meetup.com/25/calendar/9614181/">NY Semantic Web Meetup</a> was great fun. But the most impressive part of the event was the people there. A lot to learn from on this side of the pond. Not only very practical and professional, but also extremely positive and open. Almost felt like being invited to a family party.<br />
<br />
The positive attitude carried over to the workshop, which I clearly could have made more effective. I didn't expect (but should have) that many people would come without a LAMP stack on their laptops, so we lost a lot of time setting up MAMP/LAMP/WAMP before we could start hacking on ARC, Trice, and SPARQL.<br />
<br />
<a href="http://www.konallc.com/">Marco</a> brought up a number of illustrative use cases. He maintains an <s>(unofficial, sorry, can't provide a pointer)</s> <a href="http://www.swnyc.org/sparql/?id=274991">RDF wrapper</a> for any group on meetup.com, so the workshop participants could directly work with real data. We explored overlaps between different Meetup groups, the order in which people joined selected groups, inferred new triples from combined datasets via CONSTRUCT, and played with not-yet-standard SPARQL features like COUNT and LOAD.<br />
<br />
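For readers who want to try the overlap idea without a triple store, here is a plain-Python stand-in for the kind of queries we ran, not the queries themselves. The membership predicate "memberOf", the derived predicate "sharesMembersWith", and the group names are hypothetical; the actual vocabulary of Marco's meetup.com wrapper is not shown here. The first function mirrors a join aggregated with COUNT, the second a CONSTRUCT that materializes new triples.<br />
```python
from collections import defaultdict
from itertools import combinations


def group_overlaps(triples, member_pred="memberOf"):
    """Count shared members for every pair of groups.

    Roughly what a SPARQL query joining two "?person memberOf ?group"
    patterns and aggregating with COUNT would return.
    """
    members = defaultdict(set)
    for person, pred, group in triples:
        if pred == member_pred:
            members[group].add(person)
    overlaps = {}
    for g1, g2 in combinations(sorted(members), 2):
        shared = members[g1].intersection(members[g2])
        if shared:
            overlaps[(g1, g2)] = len(shared)
    return overlaps


def construct_overlaps(overlaps, pred="sharesMembersWith"):
    """CONSTRUCT-style step: materialize one new triple per overlapping pair."""
    return {(g1, pred, g2) for g1, g2 in overlaps}


data = [
    ("alice", "memberOf", "sw-nyc"),
    ("bob", "memberOf", "sw-nyc"),
    ("alice", "memberOf", "php-nyc"),
    ("carol", "memberOf", "linux-nyc"),
]
overlaps = group_overlaps(data)
print(overlaps)
print(construct_overlaps(overlaps))
```
The derived triples can be fed back into the same store, which is the point of CONSTRUCT: query results stay RDF and remain queryable.<br />
<br />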
And having done the workshop should finally give me the last kick to launch the <a href="http://trice.semsol.org/">Trice site</a> now. The code is out, and it's apparently not too tricky to get started even though the documentation is still incomplete. Unfortunately, I have a strict &quot;no more non-profits&quot; directive, but I think Trice, despite being FOSS, will help me get some paid projects, so I'll squeeze an official launch in sometime soon-ish.<br />
<br />
Below are the slides from the meetup. I added some screenshots, but they are probably still a bit boring without the actual demos (I think a video will be put up in a couple of days, though).<br />
<br />
<div style="width:425px;text-align:left" id="__ss_1485828"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/bengee/rdf-and-sparql-for-php-developers-at-new-york-semantic-web-meetup?type=powerpoint" title="RDF and SPARQL for PHP Developers (at New York Semantic Web Meetup)">RDF and SPARQL for PHP Developers</a><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=slidesmeetupfinal-090525103230-phpapp01&amp;stripped_title=rdf-and-sparql-for-php-developers-at-new-york-semantic-web-meetup" /><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=slidesmeetupfinal-090525103230-phpapp01&amp;stripped_title=rdf-and-sparql-for-php-developers-at-new-york-semantic-web-meetup" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object></div>


      </div>
    </content>
  </entry>

</feed>
