finally a bnode with a uri

Posts tagged with: microformats

CommonTag too complicated?

Not sure if the CommonTag effort sends the right message.
Update: Having just read the spec again, I see that I can't tag non-content with the CommonTag vocabulary. Too bad; please ignore the last paragraph.

Sorry for raising my voice here, but some of us are really working hard to show that SemWeb technologies don't have to be complicated, and unfortunately, the new CommonTag effort seems to send exactly the opposite message.

Don't get me wrong, a widely used tagging ontology would be great. We do have 3 (or 4? 5?) tagging vocabularies already, but none really caught on, possibly because tagging is meant to be simple and the proposed solutions apparently weren't easy enough. CommonTag is promoted as being "simple" and "easy", but after looking at the examples in the QuickStart Guide, I'm not so sure:
  • The snippets are really off-putting (not only for non-RDFers). Do I really need multiple nested HTML nodes to create something as simple as a tag?
  • Couldn't the term names be more intuitive? What could a ctag:Tag be? The actual tag or an intermediate resource that is then, err, tagged? A person ctag:tagged a resource, right? Ah, no.
  • Why aren't the term names at least consistent? "ctag:taggingDate" follows noun-role, "ctag:tagged" is a dunno, "ctag:means" is a present-form verb, "ctag:isAbout" sort-of follows the hasPropertyOf anti-pattern.
  • The vocabulary introduces aliases for well-deployed terms such as rdfs:label and dct:created, which makes its use in practical settings expensive (it'll ease things on the author side, though).

To be a little more constructive: Using the vocabulary doesn't have to lead to the complicated markup seen in the examples. I'm sure they'll soon get better snippets from someone in the RDFa community. And apart from that, there is also a handy term in the RDF Schema which might just be what you are looking for: "ctag:isAbout". It lets you directly point from a resource (default is the page) to a Linked Data identifier (e.g. from DBPedia), without the need for all those intermediate nodes (which lead to triple bloat and slow down SPARQL queries). CommonTag-consuming apps will have to implement some form of inferencing to handle "isAbout", but as the term is in the spec, I assume they plan to.
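To illustrate the difference, here is a minimal RDFa sketch of direct tagging via "ctag:isAbout". The markup layout is my own assumption, not an official CommonTag example:

```html
<!-- tag the current page directly with a DBPedia identifier;
     no intermediate tag nodes required (illustrative markup) -->
<div xmlns:ctag="http://commontag.org/ns#">
  <a rel="ctag:isAbout"
     href="http://dbpedia.org/resource/Semantic_Web">Semantic Web</a>
</div>
```

An RDFa parser would extract a single triple from this, pointing from the page straight to the DBPedia resource.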

Granular modeling of tags is apparently tricky, but shouldn't there be some sweet spot? Something a little more expressive than rel-tag but less complex than a fully spec'd Tag ontology? xFolk looks promising, or maybe the CommonTag group members could have agreed on formalizing and supporting "scoped rel-tag" (rel-tags with an optional RDFa "about" container). Most rel-tag-to-RDF converters have some form of scoping already anyway (because tags can apply to reviews, pages, vcards, etc.). That would have been a cool outcome after 1 year of stealth work.

I may well be over-stressing the simplicity aspect here. Maybe CommonTag is "simple enough" for web publishers. There are some initial supporters, and for RDFers, the nested structures and bnodes will most probably be acceptable. So let's see how things evolve.

I personally think I'll have a closer look at ctag:isAbout. I'm still looking for an alternative to dc/dct:subject to tag arbitrary things with arbitrary identifiers, maybe CommonTag can provide it, although
<#me> ctag:isAbout dbpedia:Semantic_Web .
still doesn't sound right for a rich tag, and the domain is "ctag:TaggedContent" which sounds wrong for non-textual resources, too. (dct:relation is the best I could find so far for tagging things with things, but Dublin Core is coming from a publishing context and is therefore often recommended for describing publications only).

RPointer - The resource described by this XPointer element

URIs for resources described in microformatted or poshRDF'd content
I'm often using/parsing/supporting a combination of different in-HTML annotations. I started with eRDF and microformats, more recently RDFa and poshRDF. Converting HTML to RDF usually leads to a large number of bnodes or local identifiers (RDFa is an exception. It allows the explicit specification of a triple's subject via an "about" attribute). Additionally, multi-step parsing a document (e.g. for microformats and then for eRDF) will produce different identifiers for the same objects.

I've searched for a way to create more stable, URI-based IDs, mainly for two use cases: technically, for improved RDF extraction, and practically, for being able to subscribe to certain resource fragments in HTML pages, like the main hCard on a person's Twitter profile. The latter is something I need for Knowee.

The closest I could find (and thanks to Leigh Dodds for pointing me at the relevant specs) is the XPointer Framework and its XPointer element() scheme, which is defined as: ...intended to be used with the XPointer Framework to allow basic addressing of XML elements.
Here is an example XPointer element and the associated URI for my Twitter hCard:
element(side/1/2/1)
http://twitter.com/bengee#element(side/1/2/1)
We can't, however, use this URI to refer to me as a person (unless I redefine myself as an HTML section ;-). It would work in this particular case as I could treat the hCard as a piece of document, and not as a person. But in most situations (for example events, places, or organizations), we may want to separate resources from their respective representations on the web (and RDFers can be very strict in this regard). This effectively means that we can't use element(), but given the established specification, something similar should work.

So, instead of element(), I tweaked ARC to generate resource() URIs from XPointers. In essence:
The RPointer resource() scheme allows basic addressing of resources described in XML elements. The hCard mentioned above as RPointer:
resource(side/1/2/1)
http://twitter.com/bengee#resource(side/1/2/1)
There is still a certain level of ambiguity, as we could argue about the exact resource being described. Also, as HTML templates change, RPointers are only as stable as their context. But practically, they have worked quite well for me so far.
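For the curious, the pointer generation itself can be sketched in a few lines. This is a simplified stand-alone version, not ARC's actual implementation (the function name and the id handling are mine): the first path step is the id of the containing element, followed by 1-based child element indices down to the target node.

```python
# Sketch of RPointer/XPointer-style path generation: "side/1/2/1"
# means "inside the element with id 'side', first child, second
# child, first child". Illustrative only; ARC's real code also
# walks up to the nearest id-carrying ancestor.
import xml.etree.ElementTree as ET

def rpointer(root, target):
    """Return an RPointer-style path like 'resource(side/1/2/1)'."""
    def walk(elem, steps):
        if elem is target:
            return steps
        for i, child in enumerate(elem, start=1):
            found = walk(child, steps + [str(i)])
            if found is not None:
                return found
        return None
    base = root.get("id") or ""   # assume root carries the @id
    steps = walk(root, [])
    if steps is None:
        return None
    return "resource(%s)" % "/".join([base] + steps)

html = '<div id="side"><ul><li>a</li><li><span class="vcard">Ben</span></li></ul></div>'
root = ET.fromstring(html)
vcard = root.find(".//span")
print(rpointer(root, vcard))  # resource(side/1/2/1)
```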

Note: The XPointer spec provides an extension mechanism, but it would have led to very long URIs including a namespace definition for each pointer. Introducing the non-namespace-qualified resource() scheme unfortunately means dropping out of the XPointer Framework ("This specification reserves all unqualified scheme names for definition in additional XPointer schemes"), so I had to give it a new name (hence "RPointer") and have to hope that the W3C doesn't create a resource() scheme for the XPointer framework.

RPointers are implemented in ARC's poshRDF and microformats extractors.

Knowee - (The beginning of) a semantic social web address book

Knowee is a web address book that lets you integrate distributed social graph fragments. A new version is online at knowee.net.
Heh, this was planned as a one-week hack but somehow turned into a full re-write that took all of December. Yesterday, I finally managed to tame the semantic bot army and today I've added a basic RDF editor. A sponsored version is now online at knowee.net, a code bundle for self-hosting will be made available at knowee.org tomorrow.

What is Knowee?

Knowee started as a SWEO project. Given the insane number of online social networks we all joined, together with the increasing amount of machine-readable "social data" sources, we dreamed of a distributed address book, where the owner doesn't have to manually maintain contact data, but instead simply subscribes to remote sources. The address book could then update itself automatically. And, in full SemWeb spirit, you'd get access to your consolidated social graph for re-purposing. There are several open-source projects in this area, most notably NoseRub and DiSo. Knowee is aiming at interoperability with these solutions.
knowee concept

Ingredients

For a webby address book, we need to pick some data formats, vocabularies, data exchange mechanisms, and the general app infrastructure:
  • PHP + MySQL: Knowee is based on the ubiquitous LAMP stack. It tries to keep things simple; you don't need system-level access for third-party components or cron jobs.
  • RDF: Knowee utilizes the Resource Description Framework. RDF gives us a very simple model (triples), lots of different formats (JSON, HTML, XML, ...), and free, low-cost extensibility.
  • FOAF, OpenSocial, microformats, Feeds: FOAF is the leading RDF vocabulary for social information. Feeds (RSS, Atom) are the lowest common denominator for exchanging non-static information. OpenSocial and microformats are more than just schemas, but the respective communities maintain very handy term sets, too. Knowee uses equivalent representations in RDF.
  • SPARQL: SPARQL is the W3C-recommended Query language and API for the Semantic Web.
  • OpenID: OpenID addresses Identity and Authentication requirements.
I'm still working on a solution for access control; the current Knowee version is limited to public data and simple, password-based access restrictions. OAuth is surely worth a look, although Knowee's use case is a little different and may be fine with just OpenID + sessions. Another option could be the impressive FOAF+SSL proposal, though I'm not sure if they'll manage to provide a pure-PHP implementation for non-SSL-enabled hosts.

Features / Getting Started

This is a quick walk-through to introduce the current version.
Login / Signup
Log in with your (ideally non-XRDS) OpenID and pick a user name.

knowee login

Account setup
Knowee only supports a few services so far. Adding new ones is not hard, though. You can enable the SG API to auto-discover additional accounts. Hit "Proceed" when you're done.

knowee accounts

Profile setup
You can specify whether to make (parts of) your consolidated profile public or not. During the initial setup process, this screen will be almost empty; you can check back later once the semantic bots have done their job. Hit "Proceed".

knowee profile

Dashboard
The Dashboard shows your personal activity stream (later versions may include your contacts' activities, too), system information and a couple of shortcuts.
knowee dashboard

Contacts
The contact editor is still a work in progress. So far, you can filter the list, add new entries, and edit existing contacts. The RDF editor is still pretty basic (changes will be saved to a separate RDF graph, but deleted/changed fields may re-appear after synchronization; this needs more work). The editor is schema-based and supports the vocabularies mentioned above. You'll be able to create your own fields at some later stage.

It's already possible to import FOAF profiles. Knowee will try to consolidate imported contacts so that you can add data from multiple sources, but then edit the information via a single form. The bot processor is extensible, so we'll be able to add additional consolidators at run-time; at the moment it only looks at "owl:sameAs".
knowee contacts
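The sameAs-based consolidation can be sketched as a simple union-find over identifier pairs. This is a minimal stand-alone illustration (the data and function names are mine, not Knowee's actual bot code):

```python
# Group contact identifiers connected by owl:sameAs statements into
# clusters, so data from several sources can be edited via one form.
def consolidate(same_as_pairs):
    """Union-find over (a, b) owl:sameAs pairs; returns clusters."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for a, b in same_as_pairs:
        union(a, b)
    clusters = {}
    for x in parent:
        clusters.setdefault(find(x), set()).add(x)
    return list(clusters.values())

# hypothetical identifiers for the same person from three sources
pairs = [
    ("http://twitter.com/bengee#resource(side/1/2/1)", "http://bnode.org/#self"),
    ("http://bnode.org/#self", "http://example.org/foaf#me"),
]
groups = consolidate(pairs)
print(groups)  # one cluster containing all three identifiers
```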

Enabling the SPARQL API
In the "Settings" section you'll find a form that lets you activate a personal SPARQL API. You can enable/protect read and/or write operations. The SPARQL endpoint provides low-level access to all your data, allows you to explore your social graph, or lets you create backups of your activity stream.
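As an illustration, a query like the following could be sent to such a personal endpoint to list all contacts with a homepage (a hypothetical example using the FOAF terms mentioned earlier; the actual graph layout depends on your imported sources):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?homepage WHERE {
  ?contact a foaf:Person ;
           foaf:name ?name ;
           foaf:homepage ?homepage .
}
ORDER BY ?name
```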

knowee api

That's more or less it for this version. You can always reset or delete your account, and manually delete incorrectly monitored graphs. The knowee.net system is running on the GoGrid cloud, but I'm still tuning things to let the underlying RDF CMS make better use of the multi-server setup. If things go wrong, blame me, not them. Caching is not fully in place yet, and I've limited the installation to 100 accounts. Give it a try, I'd be happy about feedback.

poshRDF - RDF extraction from microformats and ad-hoc markup

poshRDF is a new attempt to extract RDF from microformats and ad-hoc markup
I've been thinking about this since Semantic Camp where I had an inspiring dialogue with Keith Alexander about semantics in HTML. We were wondering about the feasibility of a true microformats superset, where existing microformats could be converted to RDF without the need to write a dedicated extractor for each format. This was also about the time when "scoping" and context issues around certain microformats started to be discussed (What happens for example with other people's XFN markup, aggregated in a widget on my homepage? Does it affect my social graph as seen by XFN crawlers? Can I reuse existing class names for new formats, or do we confuse parsers and authors then? Stuff like that).

A couple of days ago I finally wrote up this "poshRDF" idea on the ESW wiki and started with an implementation for paggr widgets, which are meant to expose machine-readable data from RDFa, microformats, but also from user-defined, ad-hoc formats, in an efficient way. PoshRDF can enable single-pass RDF extraction for a set of formats. Previously, my code had to walk through the DOM multiple times, once for each format.
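The core idea can be sketched as a single DOM walk with one shared class-name-to-property map. This is a toy illustration (the property URIs are made up; the real poshRDF mechanism additionally handles scoping, nesting, and namespaces):

```python
# A toy single-pass extractor: one walk over the DOM, one shared map
# from posh class names to property URIs, triples out.
import xml.etree.ElementTree as ET

POSH = {
    "fn": "http://poshrdf.org/ns/mf#fn",            # assumed URI
    "summary": "http://poshrdf.org/ns/mf#summary",  # assumed URI
}

def extract(html, base="http://example.org/doc"):
    triples = []
    for elem in ET.fromstring(html).iter():  # single pass over the DOM
        for cls in (elem.get("class") or "").split():
            if cls in POSH:
                triples.append((base, POSH[cls], (elem.text or "").strip()))
    return triples

doc = '<div class="vcard"><span class="fn">Benjamin Nowack</span></div>'
print(extract(doc))  # [('http://example.org/doc', 'http://poshrdf.org/ns/mf#fn', 'Benjamin Nowack')]
```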

A poshRDF parser is going to be part of one of the next ARC revisions. I've just put up a site at poshrdf.org to host the dynamic posh namespace. For now the site links to a possibly interesting by-product: a unified RDF/OWL schema for the most popular microformats: xfn, rel-tag, rel-bookmark, rel-nofollow, rel-directory, rel-license, hcard, hcalendar, hatom, hreview, xfolk, hresume, address, and geolocation. It's not 100% correct; poshRDF is, after all, still a generic mechanism and doesn't cover format-specific interpretations. But it might be interesting for implementors. The schema could be used to generate dedicated parser configurations. It also describes the typical context of class names so that you can work around scoping issues (e.g. the XFN relations are usually scoped to the document or embedded hAtom entries).

I hope to find some time to build a JSON exporter and microformats validator on top of poshRDF in the not too distant future. Got to move on for now, though. Dear Lazyweb, feel free to jump in ;)

Moving out of the shadow with RDFa

RDFa can help solve the "shadow semweb" problem
Ian Davis has written an interesting series of posts related to the problems arising from using fragment identifiers in resource URIs. Ian makes a lot of valid points, but I think he misses an essential one. (With this post I'm breaking with a long tradition: I'm saying positive things about RDFa ;)

So, what's the problem, and how can RDFa help? Ian is discussing a lot of architectural things, and I'm sure there are issues and inconsistencies. But the practical problem he describes is based on the following WebArch principle:
The fragment identifies a portion of a representation obtained from a URI,
and its meaning changes depending on the type of representaion. [sic]
That means that you can't use "http://example.com/ben#self" as an HTML section identifier and as a non-document identifier (e.g. the person ben). Ian concludes that
You can have a machine readable RDF version or a human readable HTML
version but not both at the same time
and that this forces the structured web into a disregarded shadow of the human-readable web.

I think that conclusion is not correct. eRDF re-uses HTML's @id to establish resource identifiers, so it mixes document identifiers with non-doc ones, and this is indeed an ambiguity problem. RDFa, however, is a layer on top of HTML that introduces a dedicated mechanism for resource identification, the @about attribute (which is also why it unfortunately needs its own DTD, but that's another story). From a WebArch POV, the design is clean: content-type-specific identifiers don't get mixed. I can unambiguously describe what "..ben#self" is meant to identify, without the representation format playing a role. RDFa can re-purpose HTML's text nodes for RDF literals, and anchors for resource URIs, but apart from that, the HTML document is not much more than a (human-friendly) container.

So, you can serve HTML and machine-readable information in a single document, you just have to make sure that your resource URI fragments don't appear in HTML @ids. And now that we are back on the practical level: Any other ID generation mechanism can work, too. It's fairly easy to implement a URI generator for RDF extracted from a microformats-enabled HTML page without overloading resource IDs. I personally don't see a huge problem (again, practically), as all my applications work with triples, not with representations or encodings which are dealt with by the parsers and extractors.
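A small sketch of what that separation looks like in practice (the names are illustrative): the RDFa @about fragment identifies the person, HTML @id values identify document sections, and the two sets simply must not overlap.

```html
<div id="bio" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <!-- "#bio" identifies this HTML section, "#self" the person -->
  <span about="#self" typeof="foaf:Person"
        property="foaf:name">Ben</span>
</div>
```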

One practical issue remains, though: Current browsers don't (natively) support navigating to RDF identifiers encoded in RDFa-, microformats-, or GRDDL-enabled HTML pages. You need an additional JavaScript lib to invoke appropriate scroll actions after a page URI with a (non-HTML) fragment identifier is loaded. That's a little annoying, but doable. I think fragment identifiers are valuable. They allow the description of multiple resources in a single document, and that's a handy feature. Whether that breaks Web architecture theory, dunno. Not for me, at least ;-)

SWEO project "knowee"

Call for participation
I finally sent out a call for participation for knowee, one of the projects supported by SWEO (just in time for the F2F reports tomorrow).

The project is about creating a semwebby address book thingy, but there actually is another dimension to the "outreach" aspect beyond running code. I'd really like to bring RDFers and microformateers closer together (from both directions). RDFers can learn a lot from the pragmatic microformats community, and adding data integration (+query) functionality to microformats can enable a whole new set of applications.

A Comparison of Microformats, eRDF, and RDFa

An updated (and customizable) comparison of the different approaches for semantically enhancing HTML.
Update (2006-02-13): In order to avoid further flame wars with RDFa folks, I've adjusted the form to not show my personal priorities as default settings anymore (here they are if you are interested, it's a 48-42-40 ranking for MFs, eRDF, and RDFa respectively). All features are set to "Nice to have" now. As you can see, for these settings, RDFa gets the highest ranking (I *said* the comparison is not biased against RDFa!). If you disable the features related to domain-independent resource descriptions, MFs shine, if you insist on HTML validity, eRDF moves up, etc. It's all in the mix.

After a comment of mine on the Microformats IRC channel, SWD's Michael Hausenblas asks for the reason why I said that I personally don't like RDFa. Damn public logs ;) OK, now I have to justify that somehow without falling into rant mode again...

I already wrote a little comparison of Microformats, Structured Blogging, eRDF, and RDFa some time ago, sounds like a good opportunity to see how things evolved during the last 8 months. Back then I concluded that both eRDF and RDFa were preferred candidates for SemSol, but that RDFa lacked the necessary deployment potential due to not being valid HTML (as far as any widespread HTML spec is concerned).

I excluded the Structured Blogging initiative from this comparison, it seems to have died a silent death. (Their approach to redundantly embed microcontent in script tags apparently didn't convince the developer community.) I also excluded features which are equally available in all approaches, such as visible metadata, general support for plain literals, being well-formed, no negative effect on browser behaviour, etc.

Pretending to be constructive, and in order to make things less biased, I embedded a dynamic page item that allows you to create your own, tailored comparison. The default results reflect my personal requirements (and hopefully answer Michael's question). As your mileage does most probably vary, you can just tweak the feature priorities (The different results are not stored, but the custom comparisons can be bookmarked). Feel free to leave a comment if you'd like me to add more criteria.

No. | Feature or Requirement | MFs | eRDF | RDFa
1 | DRY (Don't Repeat Yourself) | yes | yes | mostly
2 | HTML4 / XHTML 1.0 validity | yes | yes | no
3 | Custom extensions / Vocabulary mixing | no | yes | yes
4 | Arbitrary resource descriptions | no | yes | yes
5 | Explicit syntactic means for arbitrary resource descriptions | no | no | yes
6 | Supported by the W3C | partly | partly | yes
7 | Follow DCMI guidelines | no | yes | no
8 | Stable/Uniform syntax specification | partly | yes | yes
9 | Predictable RDF mappings | mostly | yes | yes
10 | Live/Web Clipboard Compatibility | yes | mostly | mostly
11 | Reliable copying, aggregation, and re-publishing of source chunks (self-containment) | mostly | partly | partly
12 | Support for more than just plain literals (e.g. typed dates, floats, or markup) | yes | no | yes
13 | Triple bloat prevention (only actively marked-up information leads to triples) | yes | yes | no
14 | Possible integration in namespaced (non-HTML) XML languages | no | no | yes
15 | Mainstream Web developers are already adopting it | yes | no | no
16 | Tidy-safety (cleaning up the page will never alter the embedded semantics) | yes | yes | no
17 | Explicit support for blank nodes | no | no | yes
18 | Compact syntax, based on existing HTML semantics like the address tag or rel/rev/class attributes | yes | mostly | partly
19 | Inclusion of newly evolving publishing patterns (e.g. rel="nofollow") | yes | no | partly
20 | Support for head section metadata such as OpenID or Feed hooks | no | partly | partly

Results

Solution | Points | Missing Requirements
RDFa | 35 | -
eRDF | 34 | -
Microformats | 33 | -

Max. points for selected criteria: 60
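For transparency: the exact scoring scheme isn't spelled out on the page, but the published totals can be reproduced by assuming yes = 3, mostly = 2, partly = 1, no = 0, with all 20 features weighted equally at "Nice to have" (hence the maximum of 20 * 3 = 60). A sketch of that reconstruction:

```python
# Reconstruction of the comparison scores. The weighting is my
# assumption, not stated on the page: yes=3, mostly=2, partly=1,
# no=0, all 20 features weighted equally (max 20 * 3 = 60).
SCORES = {"yes": 3, "mostly": 2, "partly": 1, "no": 0}

# (MFs, eRDF, RDFa) values per feature, in table order 1-20
TABLE = [
    ("yes", "yes", "mostly"), ("yes", "yes", "no"), ("no", "yes", "yes"),
    ("no", "yes", "yes"), ("no", "no", "yes"), ("partly", "partly", "yes"),
    ("no", "yes", "no"), ("partly", "yes", "yes"), ("mostly", "yes", "yes"),
    ("yes", "mostly", "mostly"), ("mostly", "partly", "partly"),
    ("yes", "no", "yes"), ("yes", "yes", "no"), ("no", "no", "yes"),
    ("yes", "no", "no"), ("yes", "yes", "no"), ("no", "no", "yes"),
    ("yes", "mostly", "partly"), ("yes", "no", "partly"),
    ("no", "partly", "partly"),
]

totals = [sum(SCORES[row[i]] for row in TABLE) for i in range(3)]
print(dict(zip(["MFs", "eRDF", "RDFa"], totals)))  # {'MFs': 33, 'eRDF': 34, 'RDFa': 35}
```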

Summary:

Your requirements are met by RDFa, or eRDF, or Microformats.

Feature notes/explanations:

DRY (Don't Repeat Yourself)
  • RDFa: Literals have to be redundantly put in "content" attributes in order to make them un-typed.
HTML4 / XHTML 1.0 validity
  • RDFa: Given the buzz around the WHATWG, it's uncertain when (if at all) XHTML 2 or XHTML 1.1 modules will be widely deployed enough.
Explicit syntactic means for arbitrary resource descriptions
  • eRDF: owl:sameAs statements (or other IFPs) have to be used to describe external resources.
Supported by the W3C
  • MFs, eRDF: Indirectly supported by W3C's GRDDL effort.
Stable/Uniform syntax specification
  • MFs: Although MFs reuse HTML structures, the format syntax layered on top differs, so that each MF needs separate (though stable) parsing rules.
Predictable RDF mappings
  • MFs: Microformats could be mapped to different RDF structures, but the GRDDL WG will probably recommend fixed mappings.
Live/Web Clipboard Compatibility
  • eRDF, RDFa: Tweaks are needed to make them Live-Clipboard compatible.
Reliable copying, aggregation, and re-publishing of source chunks. (Self-containment)
  • MFs: Some Microformats (e.g. XFN) lose their intended semantics when regarded out of context.
  • eRDF/RDFa: Only chunks with nearby/embedded namespace definitions can be reliably copied.
Support for head section metadata such as OpenID or Feed hooks.
  • eRDF: Can support OpenID hooks.
  • RDFa: Will probably interpret any rel attribute.


Bottom line: For many requirement combinations a single solution alone is not enough. My tailored summary suggests for example that I should be fine with a combination of Microformats and eRDF. What does your preferred solution mix look like?

SeenOn - Timestamp or State of Mind?

fun stuff from #microformats, comments on e/RDF/a wrt Microformats
<tommorris> Every time I see a movie from now on,
  I'm adding the IMDB URL to my FOAF file.
<briansuda> with what predicate?
<tommorris> rdf.opiumfield.com/movie/0.1/seen
...
<briansuda> seenOn, is that a timestamp or a state-of-mind?
(microformats(!) irc channel)

Now, who said RDF was less real-world-ish than microformats?

Related link (wrt movies, not toxics): Microformats 80%, RDF 20% by Tom Morris about the longtail utility of (e)RDF(a). Wanted to state something like this for some time. After implementing a Microcontent parser (part of the next ARC release) that creates a merged triple set from eRDF and Microformats, I can't say anymore that MFs don't scale (even though making the meaning of nested formats explicit is sometimes tricky). I was really impressed by the amount of practical use cases covered by them (Listings and qualified review ratings even go beyond the demos I've seen in RDFer circles). However, there is still a lot of room for custom RDF extensions that can be used to extend microformatted HTML. Skill levels are just one of many longtail examples: They are currently not covered by hResume, but available in Uldis' CV vocab.

The important thing IMO is that RDFers should not forget to acknowledge the amazing deployment work of the MF community and focus on what they can bring to the table (storage, querying, and mixing, as a start) instead of marketing RDF-in-HTML as an alternative, replacement, or otherwise "superior" (and likewise the other way round, btw.). I think we also shouldn't overcharge the big content re-publishers. When maintainers of sites like LinkedIn or Eventful get bombarded with requests to add different semantic serializations to their pages, they may hesitate to support any of them at all. For most of these mainstream sites, Microformats do the job just fine, and often better. Why should people for example have to specify namespaces when a simple, agreed-on rel-license does the trick already? (We could still use RDF to specify the license details, and even the license link is only a simple conversion away from RDF.)
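To back up that last point with code, here is a toy converter for rel-based links (the "tag" property URI below is a made-up placeholder; cc:license is the Creative Commons property):

```python
# Toy conversion of rel="tag" / rel="license" links to triples,
# scoped to the page URI. Not a real microformats parser.
from html.parser import HTMLParser

REL_PROPS = {
    "tag": "http://example.org/ns#tag",  # placeholder property URI
    "license": "http://creativecommons.org/ns#license",
}

class RelExtractor(HTMLParser):
    def __init__(self, page):
        super().__init__()
        self.page = page
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("rel") in REL_PROPS and "href" in a:
            self.triples.append((self.page, REL_PROPS[a["rel"]], a["href"]))

def extract_rels(html, page):
    parser = RelExtractor(page)
    parser.feed(html)
    return parser.triples

doc = '<p><a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC-BY</a></p>'
print(extract_rels(doc, "http://example.org/post"))  # one cc:license triple for the page
```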

SWEO Community Project Task Force

Trying to gather programmers already interested in semweb technology around a few projects.
Kjetil Kjernsmo has initiated a new Semantic Web Education and Outreach Interest Group Task Force called "Community Projects". A great idea.

This rally has the goal of using our collective input to generating real running code, that can help us to demonstrate the value of the Semantic Web to a wide user base. We want to encourage developers to work together to create something that will make a real difference to people's lives today

Just added a proposal for "knowee", a web-based contact organizer (a project similar to something Ivan mentioned some weeks ago, and I think also similar to the work Henry Story recently started).

Flaws in iX article about "Semantic Web versus Web 2.0"

The title says it all.
The iX article Denny and I mentioned recently stirs up some discussion. Patrick Danowski asks for some details about the flaws in the article. I wanted to comment on his blog, but it turned into a whole post which I'll just copy below:

Any SemWeb-critical article certainly deserves agreement on one point: marketing and the supply of information haven't been organized particularly well so far. It is therefore understandable when someone who isn't deeply involved, like Cai Ziegler, draws wrong conclusions. Hopefully the just-founded Semantic Web Education and Outreach Interest Group at the W3C will change that.

The main problem is that Cai Ziegler doesn't seem to have fully understood the Semantic Web's approach (or simply enjoys flame wars). The goal is not to develop a "successor" to the Web. Rather, the Semantic Web initiative tries to specify technologies that make the content of the existing Web easier to process further (in particular by making its semantics explicit). If the existing Web (let's call it "Web 2.0" for fun) produces information sources such as folksonomies or microformats, that has nothing to do with the SemWeb's core concern (explicit semantics). Rather, it enlarges the pool of data that SemWeb tools may later be able to access. A current example that nicely reduces this "versus" argument to absurdity is a draft by the GRDDL working group, in which the RDF community worked together with the microformats community to specify a mechanism for transforming microformats into RDF, so that they can then be integrated with the SPARQL query language. The Semantic Web vision (if you want to speak of one) spans a number of layers that are all based on "normal" Web techniques (IRIs, HTTP, etc.). Whether you want to bother with automated agents is something you can decide in ten years' time; always showing the complete layer stack when presenting the Semantic Web is certainly not particularly clever. The actual "semantics" layers may not be trivial, but they are not much more complicated than programming an Atom store or a universal microformats parser.

Ein paar konkrete Fehler im iX-Artikel:
  • im Kontext des oben genannten ist die Aussage "jeder auf seine gänzlich eigene Art" unsinnig. Entweder man ist vernünftig und versucht nicht "Maschinen-interpretierbare Daten im Web" mit "Der Anwender im Mittelpunkt" zu vergleichen, oder man führt den Vergleich auf technischer Ebene durch und erkennt, dass kein Widerspruch besteht.
  • die ganze Zeit wird im Artikel versucht, einen Interessenskonflikt zu konstruieren ("Erbfolgezwist", "Oberhand" etc.), gleichzeitig rudert der Autor wieder zurück, und behauptet, die Entwürfe könnten voneinander profitieren. Was denn nun?
  • RDF's ontology spectrum (and RDF isn't even synonymous with the Semantic Web) currently includes SKOS, RDFS, and OWL. SKOS can express folksonomies, RDFS hierarchies, and OWL fairly complex models. Cai Ziegler constructs a taxonomy-versus-folksonomy argument (and on top of that defines taxonomies incorrectly as pure "is-a" models) and concludes that taxonomies don't work, that folksonomies do, but that the latter aren't SemWeb. He also claims that an ontology has to comprehensively define and encapsulate a domain. But that is exactly what Web ontologies (SKOS, RDFS, or OWL) do not require.
  • Statements like "raise doubts about the usefulness" are purely political and raise doubts about their own usefulness ;) Projects like Queso (an RDF-based Atom store), my own experience combining Microsoft's LiveClipboard with SPARQL, RDF-based Web CMSs, and also the combination of Microformats+eRDF+SPARQL make a lot of sense in my view and show considerable potential. Unfortunately, Cai Ziegler hasn't noticed these more recent developments, but as I said, that's not his fault; we SemWebbers need to discreetly improve our marketing.
  • "The big breakthrough never came": another politically tinted statement. SPARQL, which is what makes the whole RDF world accessible to the average developer in the first place, together with SKOS, which picks up trends like folksonomies, are still right in the middle of the W3C process. eRDF and GRDDL for microformats are relatively new as well. The past tense is certainly not appropriate. Against the "they're taking forever" argument, one can note with a smile that "Web 2.0" didn't appear overnight either (as is often claimed). Only the name is still relatively young (and by now already two years old again). Shortly before the dot-com doom, myWhatever portals (the user at the center) were already the (supposedly) big thing, Amazon's "collective intelligence" has been around since 1999, eBay's long-tail exploitation and ratings since 1996. Blogs and wikis are ancient. I myself worked for a startup in 1999 that built something like Netvibes (the market leader back then was onepage.com). It always takes time for technical developments to catch on. The call for more open data has only recently grown louder. So the "big breakthrough" probably couldn't even have happened yet.
  • "Weblogs are Web 2.0". Right, and they use structured formats for syndication. Another example of the absurdity of the "versus" debate.
  • Semantic extensions for Wikipedia are described as "none in the implementation phase yet", which somehow suggests that the whole thing doesn't (or didn't) work. But all of this is brand new, too, and a nice example of how SemWeb approaches can be integrated in many places with relatively little effort.
  • Tagging vs. RDF (in the del.icio.us context): see SKOS; even the "rel-tag" microformat is only a few lines of code away from RDF.
  • "Folksonomies stand in stark contrast to [...] the very foundations of the Semantic Web": Unfortunately, that is completely wrong. Whether or not I run statistical analyses over collected tags is independent of Semantic Web technologies. SKOS folksonomies, however, would e.g. enable merging selected tags across service boundaries (e.g. del.icio.us and flickr), extending and complementing the existing Web, not replacing it! And anyone who has ever talked to a heavy del.icio.us user will find that better structuring options and tag portability are at the very top of the wish list. Oops.
  • "Web 2.0 beats the Semantic Web on its own turf". DMOZ is cited as the example here, with the incorrect taxonomy argument used as justification once again. Unfortunately, DMOZ isn't really a SemWeb project that integrates distributed information, but a centralized directory (which merely uses an outdated RDF version as an export format). The counter-example is missing, too. If del.icio.us is meant: it exports its lists as RSS and uses special markup to make tags explicit in the feeds. Wonderful input for a semantic web.
  • "shakes the very foundations of the ideas associated with the term Semantic Web". Apart from the odd phrasing, this rather reveals what Cai Ziegler himself associates with the Semantic Web, and unfortunately articles like this lead even less informed readers to adopt those associations.
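To back the "few lines of code" claim about rel-tag: a minimal, purely illustrative JavaScript sketch that turns rel-tag links into RDF triples. The naive regex (attribute order matters) and the choice of Richard Newman's tag vocabulary as target are my assumptions, not a normative mapping:

```javascript
// Minimal sketch: extract rel="tag" links from an HTML snippet and emit
// N-Triples-style statements. The regex is naive (attribute order matters)
// and the predicate is borrowed from Richard Newman's tag vocabulary --
// this is an illustration, not a normative rel-tag-to-RDF mapping.
function relTagsToTriples(html, pageUri) {
  var triples = [];
  var re = /<a[^>]*rel="tag"[^>]*href="([^"]+)"[^>]*>/g;
  var m;
  while ((m = re.exec(html)) !== null) {
    var tagUri = m[1];
    // per the rel-tag spec, the tag name is the last path segment
    var tagName = tagUri.replace(/\/+$/, '').split('/').pop();
    triples.push('<' + pageUri + '> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <' + tagUri + '> .');
    triples.push('<' + tagUri + '> <http://www.w3.org/2000/01/rdf-schema#label> "' + tagName + '" .');
  }
  return triples;
}
```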
</rant>

ZGDV Talk: Semantic Web and Web 2.0

Talk at ZGDV Darmstadt about Semantic Web and Web 2.0
pipe dream vs. piece of jargon
There is a lot of Web 2.0 media buzz at the moment; many people seem to feel a presence [of enthusiasm] they haven't felt since... well, Obi-Wan Dot Com, I guess.

However, there also seems to be a misconception about Web 2.0 (whatever that term may mean to you) "replacing" the Semantic Web effort, or that - as written in an article in the current iX issue - the Semantic Web "was a failure", and "lost against" Web 2.0.

Yesterday, I gave a talk (slides, mostly in German, I'm afraid) at a ZGDV Conference in Darmstadt and tried to demystify this SemWeb "versus" Web 2.0 perception a little bit. I tried to show that the concepts are not that easy to compare really, that the technologies behind them actually pursue common goals, and that the whole discussion shouldn't be taken too seriously. Of course there is a mind share (and developer market) contest, but that's more or less all it boils down to when you analyse the "competition". See for example the rather childish "we are the lowercase semantic web" claim of microformats. They are cool, pragmatic, and completely in line with the Semantic Web idea ("semantics for structured data on the web"). Hopefully we'll soon see some apps that demonstrate how the whole market could gain a lot if people worked together more actively (the GRDDL activity is a great example) instead of wasting their energy on politics (IMOSHO).

The talk itself went fine (I think), though it got too speedy towards the end as I ran out of time (as usual), and I surely lost a few people there. But feedback was positive (as opposed to last webmonday, where I introduced the idea behind paggr and felt like Marty McFly after his guitar solo in BTTF ;).

Minority Report starring Leo Sauermann
Leo blogged, too, including funny photos of me (in hacker camouflage). I took some of him in return (see below). He gave an entertaining talk - on Semantic Desktops, as you might've guessed - and started the whole thing with a "personal user interfaces in Hollywood movies" quiz game, successfully waking up everyone in the room with Mozartkugeln as incentive.
Leo presents Nepomuk

Web Clipboard: Adding liveliness to "Live Clipboard" with eRDF, JSON, and SPARQL.

Combining Live Clipboard with eRDF and SPARQL
Some context: In 2004, Tim Berners-Lee mentioned a potential RDF Clipboard as a user model which allowed copying resource descriptions between applications. Depending on the type of the copied resource, the target app would trigger appropriate actions. (See also the ESW wiki and Danny's blog for related links and discussion.)

I had a go at an "RDF data cart" last year which allowed you to "1-click"-shop resource descriptions while surfing a site. Before leaving, you could "check out" the collected resource descriptions. However, the functionality was limited to a single session, and the resource pointers didn't use globally valid identifiers.

Then, a couple of months ago, Ray Ozzie announced Live Clipboard, which uses a neat trick to access the operating system's clipboard for Copy & Paste operations across web pages.

Last week, I finally found the time to combine the Live Clipboard trick with the stuff I'm currently working on: A Semantic Publishing Framework, Embeddable RDF, and SPARQL. If you haven't heard of the latter two: eRDF is a microformats-like way to embed RDF triples in HTML, SPARQL is the W3C's protocol and query language for RDF repositories.

What I came up with so far is a Web Clipboard that works similar to Live Clipboard (I'm actually thinking about making it fully compatible), with just a few differences:

  • Web Clipboard uses a hidden single-line text input instead of a textarea, which seemed a little easier to insert into the document structure and also makes it work in Opera 8.5. The downside is that input fields don't allow multi-line content to be pasted (not needed by Web Clipboard itself, but necessary if I want to add Live Clipboard compatibility).
  • Web Clipboard doesn't paste complete resource descriptions, but only pointers to them. This makes it possible to e.g. copy a resource from a simple list of persons' names, and display full contact details after a paste operation. (See the demo for an example which does asynchronous calls to a SPARQL endpoint.) This "pass by reference" enables things like distributed address books or calendars where changes in one place could be automatically propagated to the other apps.
  • Instead of XML, Web Clipboard uses a small JSON object which can simply be evaluated by JavaScript applications, or split with a basic regular expression. The pasted object contains 1) a resource identifier, and 2) an endpoint where information about the identified resource is available. The endpoint information consists of a URL and a list of specifications supported by the endpoint.
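A consumer doesn't even need a JSON parser for such a pointer; as mentioned, a basic regular expression will do. Here is a sketch (the field names follow the pointer example further down; the parsing approach is just one obvious option, not part of any spec):

```javascript
// Sketch: pull the resource ID and endpoint URL out of a pasted
// Web Clipboard pointer with plain regular expressions instead of
// evaluating it as JavaScript. Illustrative only.
function parsePointer(pasted) {
  var id = pasted.match(/resID\s*:\s*"([^"]+)"/);
  var url = pasted.match(/url\s*:\s*"([^"]+)"/);
  if (!id || !url) { return null; }
  return { resID: id[1], endpointUrl: url[1] };
}
```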

Complete documentation is going to be up at the clipboard site, but I'll first see if I can make things Live Clipboard-compatible (and I'll be travelling for the rest of the week). Here is a simple explanation of how the current SPARQL demo works:

Apart from adding a small JavaScript library and a CSS file to the page, I specified the clipboard namespace and a default endpoint to be used for any resource pointer embedded in the page (this is eRDF syntax):
<link rel="schema.webclip" href="http://webclip.web-semantics.org/ns/webclip#" />
<link rel="webclip.endpoint" href="http://www.sparqlets.org/clipboard/sparql" />

Then I embedded a sparqlet that generates the list of Planet RDF bloggers (this is done server-side). The important thing is that the HTML contains eRDF hooks like this:
<div id="agent0" class="-webclip-Res">
  <span class="webclip-resID" title="_:bb1ed0e67fdb042619f2f20fdc479c3af_id2245787"></span>
  <span class="foaf-name">Bob DuCharme</span>
  <a rel="foaf-weblog" href="http://www.snee.com/bobdc.blog/">bobdc.blog by Bob DuCharme</a>
</div>

Ideally, the resource ID (webclip:resID, here again in eRDF notation) is a URI or some other stable identifier. The queried endpoint, however, obviously couldn't find a URI for the rendered resource, so it only provided a bnode ID. This is ok for the SPARQL endpoint the clipboard uses, though. The "foaf:weblog" information could be used to further disambiguate the resource identifier, though the demo doesn't use it.

(The nice thing about eRDF-encoded hooks is that the information can be read by any HTTP- and eRDF-enabled client, the clipboard functionality could be implemented without having to load the page in a browser.)

Now, when the page is displayed, an onload handler instantiates a JavaScript Web Clipboard which automatically adds an icon for each resource identified by the "webclip:Res/webclip:resID" hooks.

When the icon is clicked, the resource pointer JSON object is created and can be copied to the system's clipboard. It currently looks like this (shown here with line breaks; the actual pointer is a single line):
{
  resID: "_:bb1ed0e67fdb042619f2f20fdc479c3af_id2245787",
  endpoint: {
    url: "http://www.sparqlets.org/clipboard/sparql",
    specs: [
      "http://www.w3.org/TR/rdf-sparql-protocol/",
      "http://bob.pythonmac.org/archives/2005/12/05/remote-json-jsonp/"
    ]
  }
}

We can see that the clipboard uses the default endpoint mentioned at the document level as the embedded hook didn't specify a resource-specific endpoint. We can also see that the endpoint supports two specs, namely the SPARQL protocol and JSONP.
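Combining those two specs, a paste handler can build a SPARQL protocol request URL and load it as on-demand JavaScript. A sketch (the query parameter is defined by the SPARQL protocol; the callback parameter is a common JSONP convention, and endpoint support for it is an assumption here):

```javascript
// Sketch: build a SPARQL-protocol request URL with a JSONP callback.
// The "query" parameter is defined by the SPARQL protocol; "callback"
// is a common JSONP convention (endpoint support is an assumption).
function buildJsonpQueryUrl(endpointUrl, query, callbackName) {
  return endpointUrl +
    '?query=' + encodeURIComponent(query) +
    '&callback=' + encodeURIComponent(callbackName);
}

// In a browser, the on-demand call would then be a script injection:
// var s = document.createElement('script');
// s.src = buildJsonpQueryUrl(endpointUrl, query, 'handleResult');
// document.body.appendChild(s);
```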

When this JSON object is pasted to another clipboard section, the onpaste-handler can decide what to do. In the demo, any paste section will make an asynchronous On-Demand-JavaScript call to the resource's SPARQL endpoint to retrieve a custom resource representation. The "Latest blog post" section uses a pre-defined callback, but this can be overwritten (as e.g. done by the "Resource Description" section which uses a custom function to display results).

I've added a playground area to the clipboard site where you can create your own clipboard sections. Give it a try, it's not too complicated. You can even bookmark them.

Here is an example JavaScript snippet that adds a clipboard section to a clipboard-enabled page with an 'id="resultCountSection"' HTML element:
window.clipboard.addSection({
  id: "resultCountSection",
  resIDVar: "myRes",
  query: "SELECT ?knowee WHERE { ?myRes <http://xmlns.com/foaf/0.1/knows> ?knowee . } LIMIT 50",
  callback: function(qr) {
    var rows = (qr.results["bindings"]) ? qr.results.bindings : [];
    var result = "The pasted resource seems to know " + rows.length + " persons.";
    /* update paste area */
    this.item.innerHTML = result;
    /* refresh clipboard */
    window.clipboard.activate();
  }
});
window.clipboard.activate();

Something like this is all that will be needed for the final clipboard. No microformats parsing or similar burdens (although you could use the Web Clipboard to process microformats). The Clipboard's definition of an endpoint is rather open, too. An RSS file could be considered an endpoint as well as any other Web-accessible document or API.

ARC Embedded RDF (eRDF) Parser for PHP

Announcing eRDF support for ARC + an eRDF/RDFa comparison
Update: The current RDFa primer is *not* broken wrt WebArch; the examples were fixed two weeks ago. I've also removed the "no developer support" rant, having just received personal support ;-)

While searching for a suitable output format for a new RDF framework, I've been looking at the various semantic hypertext approaches, namely microformats, Structured Blogging, RDFa, and Embedded RDF (eRDF). Each one has its pros and cons:

Microformats:
  • (+) widest deployment so far
  • (+) integrate nicely with current HTML and CSS
  • (-) centralized project, inventing custom microformats is discouraged
  • (-) don't scale: the number of MFs will either stay very limited, or sooner or later there will be class name collisions

Structured Blogging:
  • (+) a large number of supporters (at least potentially, the supporters list is huge, although this doesn't represent the available tools)
  • (+) not a competitor, but a superset of microformats
  • (-) the metadata is embedded in a rather odd way
  • (-) the metadata is repeated
  • (-) the use cases are limited (e.g. reviews, events)

RDFa:
  • (+) follows certain microformats principles (e.g. "Don't repeat yourself")
  • (+) freely extensible
  • (+) All resource descriptions (e.g. for events, profiles, products, etc.) can be extracted with a single transformation script
  • (+) RDF-focused
  • (+) W3C-supported
  • (-) Not XHTML 1.0 compliant; it will take some time before it can be used in commercial products or picky geek circles
  • (-) The default datatype of literals is rdf:XMLLiteral which is wrong for most deployed properties

eRDF:
  • (+) follows the microformats principles
  • (+) freely extensible
  • (+) All resource descriptions (e.g. for events, profiles, products, etc.) can be extracted with a single transformation script
  • (+) uses existing markup
  • (+) XHTML 1.0 compliant
  • (+) RDF-focused
  • (-) Covers only a subset of RDF
  • (-) Does not support XML literals

So, both RDFa and eRDF seem like good candidates for embedding resource descriptions in HTML. The two are not really compatible, though; it is not easily possible to create a superset document that is both RDFa and eRDF. However, my publishing framework uses a Wiki-like markup language (M4SH) which is converted to HTML, so I can add support for both approaches and make the output a configuration option. Maybe it's even possible to create a merged serialization without confusing transformers.

I'll surely have another look at RDFa when there is better deployment potential. For now, I've created a M4SH-to-eRDF converter (which is going to be available as part of the forthcoming SemSol framework), and an eRDF parser that can generate RDF/XML from embedded RDF. I've also added some extensions to work around (plain) eRDF's limitations, the main one being on-the-fly rewriting of owl:sameAs assertions to allow full descriptions of remote resources, e.g.
<div id="arc">
  <a rel="owl-sameAs" href="http://example.com/r/001#001"></a>
  <a rel="doap-maintainer" href="#ben">Benjamin</a>
</div>
is automatically converted to
<http://example.com/r/001#001> doap:maintainer <#ben>
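The rewriting logic itself fits in a few lines. The parser is PHP, but the idea can be sketched in JavaScript (the {s, p, o} triple structure and the function name are my simplification, not ARC's actual API):

```javascript
// Sketch of the owl:sameAs rewriting: if a (bnode) subject has an
// owl:sameAs arc to a URI, replace that subject with the URI and drop
// the sameAs triple. The {s, p, o} structure is a simplification,
// not ARC's actual API.
var OWL_SAMEAS = 'http://www.w3.org/2002/07/owl#sameAs';

function rewriteSameAs(triples) {
  var map = {};
  triples.forEach(function (t) {
    if (t.p === OWL_SAMEAS) { map[t.s] = t.o; }
  });
  var result = [];
  triples.forEach(function (t) {
    if (t.p === OWL_SAMEAS) { return; } // drop the sameAs triple itself
    result.push({ s: map[t.s] || t.s, p: t.p, o: t.o });
  });
  return result;
}
```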

The parser can be downloaded at the ARC site (documentation).
I've also put up a little demo service if you want to test the parser.

YARDFIXHTML - Yet Another RDF-In-XHTML proposal

Ian Davis introduced eRDF
Ian Davis proposes "Embedded RDF", a microformats-inspired path to metadata-enriched HTML. Unlike microformats, his approach can utilize a single generic transformation script instead of one transformation for each format (or micromodel if you prefer Danny Ayers' terminology), which is closer to RDF's idea of freely mixable vocabularies.

I had some hopes for RDF/A but stopped following its progress several months ago as it didn't seem to provide an easy way to really bridge the gap between HTML and RDF. My use case was (and is) to be able to mark up HTML in a way that allows me to automatically (and without too much effort) generate context menus or tool-tips.
