finally a bnode with a uri

w3photo scutterplans, picture RDF/XML, and conference dumps

w3photo data export and conversion
Greg Elin did an amazing job for last year's WWW conference. Together with Bryce Benton, he implemented the w3photo site and also went to NYC to present and promote the project. Last year's focus was on getting the site up and collecting photos, and although we provided basic RDF export, w3photo data wasn't really accessible for re-use. We didn't manage to continue the project for WWW 2005, but I'm confident that there'll be a project update by the next conference. One of the first tasks is to make the w3photo data available so that the annotations can be integrated in new tools more easily. It would be great to see multiple annotators and viewers with data exchanged or cross-sparqled online. I'm also looking forward to combining w3photo data with conference information and FOAF files of attendees. I started to write some scripts and converters for the w3photo data yesterday, basically to get more pictures into CONFOTO, but of course also for other SemWebbers.

Scutterplans

w3photo provides news feeds for each conference (e.g. the WWW 2003 feed), but only in an RSS 2.0 format, with links pointing to HTML pages. So, the first task was to convert these feeds into scutterplans. It's straightforward to get the metadata location from an image URL (by simply replacing the file extension); the problem is that there are some bugs in the RDF/XML stored at w3photo.org. Therefore, the scutterplans generated by my script (e.g. the WWW 2003 scutterplan) don't point at the w3photo site but at a clean-up script which tries to return valid RDF/XML. I also tried to make sure that empty or broken feeds (e.g. the WWW 5 feed 404s) don't lead to broken scutterplans. If you omit the &conf=foo parameter, a scutterplan for all available conferences is returned.
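The feed-to-scutterplan step can be sketched roughly like this. Note that the ".rdf" extension swap and the exact scutterplan shape (a plain list of rdfs:seeAlso links) are assumptions for illustration, not the actual script:

```python
import re

def metadata_url(image_url):
    # Derive the RDF/XML metadata location from an image URL by
    # swapping the image file extension for .rdf (assumed convention).
    return re.sub(r'\.(jpe?g|png|gif)$', '.rdf', image_url, flags=re.IGNORECASE)

def scutterplan(image_urls):
    # Render a minimal scutterplan: an RDF/XML document whose
    # rdfs:seeAlso links tell a scutter which documents to fetch.
    items = "\n".join(
        '  <rdf:Description>\n'
        '    <rdfs:seeAlso rdf:resource="%s"/>\n'
        '  </rdf:Description>' % metadata_url(u)
        for u in image_urls
    )
    return (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
        '         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">\n'
        + items + '\n</rdf:RDF>'
    )

print(scutterplan(["http://w3photo.org/photos/www2003/pic-001.jpg"]))
```

In the real setup the seeAlso links would of course point at the clean-up script rather than directly at w3photo.org.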

Adjusted picture RDF/XML

The clean-up script does some basic XML cosmetics, tries to remove syntax bugs, and partly adjusts the RDF/XML for better re-purposing (example):
  • ' attribute delimiters are replaced with "
  • the w3photo stylesheet link is removed
  • the outdated mindswap conference ontology namespace is replaced with the stable one ("~golbeck/web" -> "2004")
  • relative URIs and rdf:IDs are adjusted to absolute URIs and rdf:abouts (so that photo details can be aggregated in a single dump file)
  • photo paths are normalized (e.g. "photos/ww2003/photos/" -> "photos/www2003/"), so that the scutterplans and photo details use the same picture URLs
  • cc:Agent is changed to foaf:Agent (not 100% sure about that, can undo that if wanted)
  • conf:Event is changed to conf:Conference
  • fotonotes:label in regions is changed to dc:description
  • CDATA delimiters are removed
  • foaf:homepages to identify annotators are removed (many people used shared company homepage addresses which would lead to unwanted resource consolidations)
  • the Creative Commons license URIs are updated to version 2.5
  • empty tags are removed
  • unescaped & characters in literals are escaped as &amp;
  • XML comments are removed
  • the annotation creation date is adjusted (e.g. "2004-Jun-01T00:02:06EDT" -> "2004-06-01T04:02:06Z")
  • imreg:regionDepicts information is removed as it contained redundant rdf:IDs
  • confoto:eventShortName, confoto:eventYear, foaf:name, and dc:date are added to conference nodes.
  • ical:url is adjusted to the conference homepage and replaced with foaf:homepage (now that the domain of foaf:homepage was broadened)
  • the person name in unqualified w3photo RDF region tags is extracted and re-added as dc:subject
  • the remaining unqualified tags and fotonotes tags are removed
  • broken unicode characters are adjusted
  • the year of the conference is added as dc:date to the foaf:Image node
  • the conference name is added as dc:subject to the foaf:Image
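To give an idea of what these adjustments look like in code, here is a sketch of the date normalization step ("2004-Jun-01T00:02:06EDT" -> "2004-06-01T04:02:06Z"). The assumption that only US Eastern zone abbreviations occur in the data is mine, not guaranteed by the actual clean-up script:

```python
from datetime import datetime, timedelta

# Offsets (hours from UTC) for the zone abbreviations seen in the
# w3photo dates; assumption: only US Eastern abbreviations occur.
TZ_OFFSETS = {"EDT": -4, "EST": -5}

def normalize_date(value):
    # Rewrite dates like '2004-Jun-01T00:02:06EDT' as ISO 8601 UTC,
    # e.g. '2004-06-01T04:02:06Z'.
    stamp, zone = value[:-3], value[-3:]
    dt = datetime.strptime(stamp, "%Y-%b-%dT%H:%M:%S")
    dt -= timedelta(hours=TZ_OFFSETS[zone])  # shift to UTC
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

print(normalize_date("2004-Jun-01T00:02:06EDT"))  # -> 2004-06-01T04:02:06Z
```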

w3photo dumps

I actually don't have a scutter running at CONFOTO; data is usually added online by users. In order not to have to import each photo's data one by one, I added a dump generator to the w3photo converter. Via the parameter ?view=dump (e.g. ?view=dump&conf=www2003), it's possible to get the complete photo data of a conference. The caching mechanism should keep response times low; however, it may take some time to generate large dumps (the script doesn't send more than one request per second to Greg's server). If you get timeouts, just wait a couple of seconds and try again. CONFOTO saves temporary results and can continue from there.
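The throttle-plus-resume behaviour could be sketched like this (the cache directory, file naming, and one-second interval are illustrative assumptions; the actual converter is server-side and unpublished):

```python
import hashlib
import os
import time
import urllib.request

CACHE_DIR = "cache"    # assumed local cache directory
MIN_INTERVAL = 1.0     # at most one request per second

_last_request = 0.0

def fetch_cached(url):
    # Fetch a URL at most once per second, keeping a copy on disk so
    # an interrupted dump run can continue where it left off.
    global _last_request
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, hashlib.md5(url.encode()).hexdigest())
    if os.path.exists(path):
        # already fetched in an earlier (possibly timed-out) run
        with open(path, "rb") as f:
            return f.read()
    wait = MIN_INTERVAL - (time.time() - _last_request)
    if wait > 0:
        time.sleep(wait)  # throttle: be polite to the remote server
    _last_request = time.time()
    data = urllib.request.urlopen(url).read()
    with open(path, "wb") as f:
        f.write(data)
    return data
```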

Hm, either my SPARQL2SQL rewriter or CONFOTO's query generator needs more work: retrieving photos after adding a conference filter takes ages, while the tag filter works without problems. However, it should now be possible to retrieve photo data via the server's SPARQL endpoint.
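A query against the endpoint might look like the following. The endpoint URL and the exact vocabulary (foaf:Image plus dc:subject for tags) are assumptions based on the adjustments described above, so treat this as a sketch rather than a recipe:

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://www.confoto.org/sparql"  # hypothetical endpoint URL

def photo_query(tag):
    # Build a SPARQL query for photos tagged with the given literal;
    # the property choices here mirror the clean-up script's output.
    return '''
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
SELECT ?photo WHERE {
  ?photo a foaf:Image ;
         dc:subject "%s" .
}''' % tag

def run_query(query):
    # Send the query via GET, asking for SPARQL XML results.
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+xml"})
    return urllib.request.urlopen(req).read()

print(photo_query("www2003"))
```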
