In my previous post I mentioned that I'm building a Linked Data CMS. One of its components is a rich-text editor that allows the creation (and embedding) of structured markup.
An earlier version supported limited Microdata annotations, but now I've switched the mechanism and use an intermediate, but even simpler approach based on HTML5's handy data-* attributes. This lets you build almost arbitrary markup with the editor, including Microformats, Microdata, or RDFa. I don't know yet when the CMS will be publicly available (3 sites are under development right now), but as mentioned, I'd be happy about another pilot project or two. Below is a video demonstrating the editor and its easy customization options.
Posts tagged with: rdfa
Trice' Semantic Richtext Editor
A screencast demonstrating the structured RTE bundled with the Trice CMS
Posted on 2010-05-01 at 16:35 UTC
by
(trackback)
Could having two RDF-in-HTMLs actually be handy?
A combination of RDFa and Microdata would allow for separate semantic layers.
Apart from grumpy rants about the complexity of W3C's RDF specs and semantic richtext editing excitement, I haven't blogged or tweeted a lot recently. That's partly because there finally is increased demand for the stuff I'm doing at semsol (agency-style SemWeb development), but also because I've been working hard on getting my tools in a state where they feel more like typical Web frameworks and apps. Talis' Fanhu.bz is an example where (I think) we found a good balance between powerful RDF capabilities (data re-purposing, remote models, data augmentation, a crazy army of inference bots) and a non-technical UI (simplistic visual browser, Twitter-based annotation interfaces).
Another example is something I've been working on during the last months: I somehow managed to combine essential parts of Paggr (a drag&drop portal system based on RDF- and SPARQL-based widgets) with an RDF CMS (I'm currently looking for pilot projects). And although I decided to switch entirely to Microdata for semantic markup after exploring it during the FanHubz project, I wonder if there might be room for having two separate semantic layers in this sort of widget-based websites. Here is why:
As mentioned, I've taken a widget-like approach for the CMS. Each page section is a resource on its own that can be defined and extended by the web developer, it can be styled by themers, and it can be re-arranged and configured by the webmaster. In the RDF CMS context, widgets can easily integrate remote data, and when the integrated information is exposed as machine-readable data in the front-end, we can get beyond the "just-visual" integration of current widget pages and bring truly connectable and reusable information to the user interface.
Ideally, both the widgets' structural data and the content can be re-purposed by other apps. Just like in the early days of the Web, we could re-introduce a copy & paste culture of things for people to include in their own sites. With the difference that RDF simplifies copy-by-reference and source attribution. And both developers and end-users could be part of the game this time.
Anyway, one technical issue I encountered is when you have a page that contains multiple page items, but describes a single resource. With a single markup layer (say Microdata), you get a single tree where the context of the hierarchy is constantly switching between structural elements and content items (page structure -> main content -> page layout -> widget structure -> widget content). If you want to describe a single resource, you have to repeatedly re-introduce the triple subject ("this is about the page structure", "this is about the main page topic"). The first screenshot below shows the different (grey) widget areas in the editing view of the CMS. In the second screenshot, you can see that the displayed information (the marked calendar date, the flyer image, and the description) in the main area and the sidebar is about a single resource (an event).

Trice CMS editing view

Trice CMS page view with inline widgets describing one resource
If I used two separate semantic layers, e.g. RDFa for the content (the event description) and Microdata for the structural elements (column widths, widget template URIs, widget instance URIs), I could describe the resource and the structure without repeating the event subject in each page item.
To be honest, I'm not sure yet if this is really a problem, but I thought writing it down could kick off some thought processes (which now tend towards "No"). Keeping triples as stand-alone-ish as possible may actually be an advantage (even if subject URIs have to be repeated). No semantic markup solution so far provides full containment for reliable copy & paste, but explicit subjects (or "itemid"s in Microdata-speak) could bring us a little closer.
Conclusions? Err.., none yet. But hey, did you see the cool CMS screenshots?
Another example is something I've been working on during the last months: I somehow managed to combine essential parts of Paggr (a drag&drop portal system based on RDF- and SPARQL-based widgets) with an RDF CMS (I'm currently looking for pilot projects). And although I decided to switch entirely to Microdata for semantic markup after exploring it during the FanHubz project, I wonder if there might be room for having two separate semantic layers in this sort of widget-based websites. Here is why:
As mentioned, I've taken a widget-like approach for the CMS. Each page section is a resource on its own that can be defined and extended by the web developer, it can be styled by themers, and it can be re-arranged and configured by the webmaster. In the RDF CMS context, widgets can easily integrate remote data, and when the integrated information is exposed as machine-readable data in the front-end, we can get beyond the "just-visual" integration of current widget pages and bring truly connectable and reusable information to the user interface.
Ideally, both the widgets' structural data and the content can be re-purposed by other apps. Just like in the early days of the Web, we could re-introduce a copy & paste culture of things for people to include in their own sites. With the difference that RDF simplifies copy-by-reference and source attribution. And both developers and end-users could be part of the game this time.
Anyway, one technical issue I encountered is when you have a page that contains multiple page items, but describes a single resource. With a single markup layer (say Microdata), you get a single tree where the context of the hierarchy is constantly switching between structural elements and content items (page structure -> main content -> page layout -> widget structure -> widget content). If you want to describe a single resource, you have to repeatedly re-introduce the triple subject ("this is about the page structure", "this is about the main page topic"). The first screenshot below shows the different (grey) widget areas in the editing view of the CMS. In the second screenshot, you can see that the displayed information (the marked calendar date, the flyer image, and the description) in the main area and the sidebar is about a single resource (an event).

Trice CMS editing view

Trice CMS page view with inline widgets describing one resource
If I used two separate semantic layers, e.g. RDFa for the content (the event description) and Microdata for the structural elements (column widths, widget template URIs, widget instance URIs), I could describe the resource and the structure without repeating the event subject in each page item.
To be honest, I'm not sure yet if this is really a problem, but I thought writing it down could kick off some thought processes (which now tend towards "No"). Keeping triples as stand-alone-ish as possible may actually be an advantage (even if subject URIs have to be repeated). No semantic markup solution so far provides full containment for reliable copy & paste, but explicit subjects (or "itemid"s in Microdata-speak) could bring us a little closer.
Conclusions? Err.., none yet. But hey, did you see the cool CMS screenshots?
Posted on 2010-04-15 at 10:30 UTC
by
(trackback)
Microdata, semantic markup for both RDFers and non-RDFers
RDF-in-HTML could have been so simple.
There's been a whole lot of discussion around Microdata, a new approach for embedding machine-readable information into forthcoming HTML5. What I find most attractive about Microdata is the fact that it was designed by HTMLers, not RDFers. It's refreshingly pragmatic, free of other RDF spec legacy, but still capable of expressing most of RDF.
Unfortunately, RDFa lobbyists on the HTML WG mailing list forced the spec out of HTML5 core for the time being. This manoeuver was understandable (a lot of energy went into RDFa, after all), but in my opinion very short-sighted. How many uphill battles did we have, trying to get RDF to the broader developer community? And how many were successful? Atom, microformats, OpenID, Portable Contacts, XRDS, Activity Streams (well, not really), these are examples where RDFers tried, but failed to promote some of their infrastructure into the respective solutions. Now: HTML5, where the initial RDF lobbying actually had an effect and lead to a native mechanism for RDF-in-HTML. Yes, native, not in some separate spec. This would have become part of every HTML5 book, any HTML developer on this planet would have learned about it. Finally a battle won. And what a great one. HTML.
But no, Microdata wasn't developed by an RDF group, so they voted it out again. Now, the really sad thing is, there could have been a solution that would have served everybody sufficiently well, both HTMLers and RDFers. The RDFa group recently realized that RDFa needs to be revised anyway, there is going to be an RDFa 1.1 which will require new parsers. If they'd swallowed their pride, they would most probably have been able to define RDFa 1.1 as a proper superset of Microdata.
Here is a short overview of RDF features supported by Microdata:
I've been using Microdata in two of my recent RDF apps and the CMS module of (ahem, still not documented) Trice, and it's been a great experience. ARC is going to get a "microRDF" extractor that supports the RDF-in-Microdata markup below (Note: this output still requires a 2nd extraction process, as the current Microdata draft's RDF mechanism only produces intermediate RDF triples, which then still have to be post-processed. I hope my related suggestion will become official, but I seem to be the only pro-Microdata RDFer on the HTML list right now, so it may just stay as a convention):
Microdata:
Unfortunately, RDFa lobbyists on the HTML WG mailing list forced the spec out of HTML5 core for the time being. This manoeuver was understandable (a lot of energy went into RDFa, after all), but in my opinion very short-sighted. How many uphill battles did we have, trying to get RDF to the broader developer community? And how many were successful? Atom, microformats, OpenID, Portable Contacts, XRDS, Activity Streams (well, not really), these are examples where RDFers tried, but failed to promote some of their infrastructure into the respective solutions. Now: HTML5, where the initial RDF lobbying actually had an effect and lead to a native mechanism for RDF-in-HTML. Yes, native, not in some separate spec. This would have become part of every HTML5 book, any HTML developer on this planet would have learned about it. Finally a battle won. And what a great one. HTML.
But no, Microdata wasn't developed by an RDF group, so they voted it out again. Now, the really sad thing is, there could have been a solution that would have served everybody sufficiently well, both HTMLers and RDFers. The RDFa group recently realized that RDFa needs to be revised anyway, there is going to be an RDFa 1.1 which will require new parsers. If they'd swallowed their pride, they would most probably have been able to define RDFa 1.1 as a proper superset of Microdata.
Here is a short overview of RDF features supported by Microdata:
- Explicit resource containers, via @itemscope (in RDFa, the boundaries of a resource are often implicitly defined by @rel or @typeof)
- Subject declaration, via @itemid (RDFa uses @about)
- Main subject typing, via @itemtype (RDFa uses @typeof)
- Predicate declaration, via @itemprop (RDFa uses @property, @rel, and @rev)
- Literal objects, via node values (RDFa also allows hidden values via @content)
- Non-literal objects, via @href, @src, etc. (RDFa also allows hidden values via @resource)
- Object language, via @lang
- Blank nodes
- the possibility to preserve markup, but probably not necessarily as an explicit rdf:XMLLiteral
- datatypes for literal objects (I personally never used them in practice in the last 6 years that I've been developing RDF apps, but I can see some use cases)
I've been using Microdata in two of my recent RDF apps and the CMS module of (ahem, still not documented) Trice, and it's been a great experience. ARC is going to get a "microRDF" extractor that supports the RDF-in-Microdata markup below (Note: this output still requires a 2nd extraction process, as the current Microdata draft's RDF mechanism only produces intermediate RDF triples, which then still have to be post-processed. I hope my related suggestion will become official, but I seem to be the only pro-Microdata RDFer on the HTML list right now, so it may just stay as a convention):
Microdata:
<div itemscope itemtype="http://xmlns.com/foaf/0.1/Person">
<!-- plain props are mapped to the itemtype's context -->
<img itemprop="img" src="mypic.jpg" alt="a pic of me" />
My name is <span itemprop="name"><span itemprop="nick">Alec</span> Tronnick</span>
and I blog at <a itemprop="weblog" href="http://alec-tronni.ck/">alec-tronni.ck</a>.
<!-- other RDF vocabs can be used via full itemprop URIs -->
<span itemprop="http://purl.org/vocab/bio/0.1/olb">
I'm a crash test dummy for semantic HTML.
</span>
</div>
Extracted RDF:
@base <http://host/path/>
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
_:bn1 a foaf:Person ;
foaf:img <mypic.jpg> ;
foaf:name "Alec Tronnick" ;
foaf:nick "Alec" ;
foaf:weblog <http://alec-tronni.ck/> ;
bio:olb "I'm a crash test dummy for semantic HTML." .
Posted on 2010-01-26 at 12:00 UTC
by
(trackback)
Simple RDFication of SPARQL SELECT results with RDFa
How to use RDFa to make SELECT results locally available as RDF
A couple of weeks ago, I've written about the self-enforcing value spiral that RDF data enables. Here is an example about how RDFa can be used to support this "Repurpose-Republish" loop.
While data exchange between different semantic web sources is usually RDF-based (i.e. the data always maintain their semantics), there is one major exception: SPARQL SELECT queries. This developer-oriented operation returns tabular data (similar to record sets in SQL). Once the query result is separated from the query, the associated structural data is lost. You can't directly feed SELECT results back into a triple store, even though querying based on linked resources means that you have just created knowledge. It's a pity to show this generated information to human consumers only.
One of the demos at my NYC talk was a dynamic wiki item that pulled in competitor information from Semantic CrunchBase and injected that into a page template as HTML. The existing RDF infrastructure does not let me cache the SELECT results locally as usable RDF. And a semantic web client or crawler that indexes the wiki page will not learn how the described resource (e.g. Twitter) is related to the remote, linked entities.

However, by simply adding a single RDFa hook to the wiki item template, the RDF relation (e.g. competitor) can be made available again to apps that process my site content. This is basically how Linked Data works. But here is the really nifty thing: My site can be a consumer of its own pages, too, recursively enriching its own data.

I tweaked the wiki script which now works like this: When the page is saved, a first operation updates the wiki markup in the page's graph (i.e. the not-yet-populated template). In a second step, the page URL is retrieved via HTTP. This will return HTML with RDFa-encoded remote data, which is then parsed by ARC, and finally added to the same graph. We end up with a graph that does not only contain the wiki markup, but also the RDFized information that was integrated from remote sites. After adding this graph to the RDF store, we can use a local query to generate the page and occasionally reset the graph to enable copy-by-reference. And all this without any custom API code.

While data exchange between different semantic web sources is usually RDF-based (i.e. the data always maintain their semantics), there is one major exception: SPARQL SELECT queries. This developer-oriented operation returns tabular data (similar to record sets in SQL). Once the query result is separated from the query, the associated structural data is lost. You can't directly feed SELECT results back into a triple store, even though querying based on linked resources means that you have just created knowledge. It's a pity to show this generated information to human consumers only.
One of the demos at my NYC talk was a dynamic wiki item that pulled in competitor information from Semantic CrunchBase and injected that into a page template as HTML. The existing RDF infrastructure does not let me cache the SELECT results locally as usable RDF. And a semantic web client or crawler that indexes the wiki page will not learn how the described resource (e.g. Twitter) is related to the remote, linked entities.

However, by simply adding a single RDFa hook to the wiki item template, the RDF relation (e.g. competitor) can be made available again to apps that process my site content. This is basically how Linked Data works. But here is the really nifty thing: My site can be a consumer of its own pages, too, recursively enriching its own data.

I tweaked the wiki script which now works like this: When the page is saved, a first operation updates the wiki markup in the page's graph (i.e. the not-yet-populated template). In a second step, the page URL is retrieved via HTTP. This will return HTML with RDFa-encoded remote data, which is then parsed by ARC, and finally added to the same graph. We end up with a graph that does not only contain the wiki markup, but also the RDFized information that was integrated from remote sites. After adding this graph to the RDF store, we can use a local query to generate the page and occasionally reset the graph to enable copy-by-reference. And all this without any custom API code.

Posted on 2009-05-26 at 07:50 UTC
by
(trackback)
Could Microdata work better for me than RDFa?
Just had a quick look at the Microdata proposal, wondering about its pros and cons.
I've always had my little issues with RDFa, mainly for personal reasons. I'm repeating them here (for the last time, promised, don't want to trigger another flame war):
Having said that, if this little list is all I can come up with, then RDFa is probably a pretty solid and usable spec. I could easily write a list of things I find flawed in RDF/XML, or even SPARQL, my favorite RDF technology. And there is another good reason why I should tend towards using RDFa: Lack of proper alternatives. I still think it would be possible to create a cross-doctype solution. eRDF and my own poshRDF experiment show that it's possible, but so far these approaches are incomplete RDF-wise, and I wouldn't have the energy or funds to build a community to develop things further (and again, my arguments are motivated by personal use cases and habits, so there isn't a large overlap with other people's requirements anyway).
Nevertheless, the new "Microdata" proposal is currently being discussed, so it might be worth having a look and comparing it with my RDFa issue list above. I only had a quick scan, I may have gotten some details wrong:
I recently watched a short section of a TV fortune-teller show where desperate people could dial in to get their questions asked. The lady who called asked "Will I find a new love?", and the fortune-teller looked into her cards (very slowly, of course, given the 3 EUR/minute rate), then slowly lifted her head, looked straight into the camera and articulated her findings: "I see a definite Maybe."
I guess this awesome universal answer also works for my opening question. There simply is no ideal solution. I like the item/itemprop idea, but I'd need to add a hack for markup values (e.g. by adding a item="...XMLLiteral" container and then converting these items to XML nodes. But then I can just add a simpler hack to my RDFa extractor to deep-parse XMLLiterals). This doesn't justify a whole new spec. The copy/paste problem is not too urgent any more, as Linked Data enables nifty copy-by-reference instead of copy-by-value.
It's generally a little surprising to see that Microdata proposal. For months, the HTML5 opinion makers argued against user-defined markup structures, and now they created a completely new spec that not only extends RDFa's possibilities to identify resource types and relations, but also seems to introduce a redundant serialization for selected microformats.
Anyway, for the sake of convergence and less work, I think I still prefer (a subset of) RDFa, if only there was a way to get rid of CURIEs (who wants an abbreviation mechanism whose acronym can't even be properly expanded? ;). And an alternative for the validation pain could be a simple, locally installed validator, accessible through a Ubiquity script. When I think about it, I mainly just need well-formedness and some attribute checks. A Ubiquity script could directly show HTML errors and also extracted triples, and maybe even do some triple sanity checks, too. But then this setup would work for Microdata just as fine. Ah well..
- I personally don't like the amount of new attributes and their names (about, resource, typeof, and property are at least as inconsistent as RDF/XML's tokens).
- I've written an RDFa parser, but still don't really understand the processing model. RDFa does the job of course, and it's been specified by smart people I respect, but to me it just still feels a little too complicated. I often have to utilize an extraction service to verify the triples resulting from a snippet, and I've seen the creators of RDFa do the same.
One reason for being less intuitive than hoped is the fact that adding an attribute to some existing snippet can easily change the entire meaning of nested information. This makes it tricky to incrementally add structure to already tested and approved RDFa (an unnoticed @rel or @typeof may add an unwanted blank intermediate node, for example, and you can have any combination of RDFa attributes on a single node). - I consider structured blogging a central use case for RDF in HTML, yet it's not fully supported by RDFa: RDFa does not allow sub-structures in XML Literals (for security/triple injection reasons, IIRC), so you can't extract a post body (including HTML markup) and also get the annotations encoded in the body (like reviews or events).
- (Reliable) copy and paste is not possible when prefix definitions can be kept separate from annotations. This is relevant to some of the apps I'm working on, and it took me quite some time to admit that (intuitively desirable) URI abbreviations in HTML do have negative practical implications. It depends on the use case, but it also needs some experience to realize this, as the pro-prefix argument is practically motivated as well. (I started playing with RDF-ish copy & paste rather early, if that makes this conclusion more credible).
- The xmlns:prefix mechanism doesn't work nicely with my development environment. This is perhaps a silly argument, but for me personally it is important to see that green little "0 errors" indicator in my browser while I'm creating sites. It was not hard to extend the Firefox validator extension with support for new attributes, but there was no clean way to make it accept xmlns:prefix. Spotting true errors in the dozens of RDFa-related complaints is annoying.
Having said that, if this little list is all I can come up with, then RDFa is probably a pretty solid and usable spec. I could easily write a list of things I find flawed in RDF/XML, or even SPARQL, my favorite RDF technology. And there is another good reason why I should tend towards using RDFa: Lack of proper alternatives. I still think it would be possible to create a cross-doctype solution. eRDF and my own poshRDF experiment show that it's possible, but so far these approaches are incomplete RDF-wise, and I wouldn't have the energy or funds to build a community to develop things further (and again, my arguments are motivated by personal use cases and habits, so there isn't a large overlap with other people's requirements anyway).
Nevertheless, the new "Microdata" proposal is currently being discussed, so it might be worth having a look and comparing it with my RDFa issue list above. I only had a quick scan, I may have gotten some details wrong:
- It only introduces two new (mandatory) attributes: "item" and "itemprop". "item" can be used to type resources. RDFa's "about" can be re-used for URI-identified items. That sounds compact and neat so far.
- "item" is mandatory to indicate the boundary of a resource description. This makes accidental triples much less likely to happen than with RDFa. For any "itemprop", you just have to walk up the DOM tree to find the container item, which makes both human- and code-based parsing easy.
- Structured blogging?Aww, not really. While you can at least choose between raw markup or structured values in RDFa, Microdata only supports flat key-value pairs where the value is a node's textContent and won't contain tags (if I read the draft correctly). I don't really need datatypes and languages, but I definitely want RDF triples where the object can contain HTML markup (wiki blobs with embedded annotations are another example).
- Copy & paste of source code or from/to contenteditable sections is more reliable than with RDFa because there is no prefix mechanism.
- It'd be possible to make the Firefox validator eat the new Microdata attributes without complaining, but I'm not sure how likely it is to have Microdata support in the official distribution anytime soon. Marc Gueury writes that validating HTML5 may require a new sort of validator, switching to HTML5 may make things worse instead of better for me, development-wise.
I recently watched a short section of a TV fortune-teller show where desperate people could dial in to get their questions asked. The lady who called asked "Will I find a new love?", and the fortune-teller looked into her cards (very slowly, of course, given the 3 EUR/minute rate), then slowly lifted her head, looked straight into the camera and articulated her findings: "I see a definite Maybe."
I guess this awesome universal answer also works for my opening question. There simply is no ideal solution. I like the item/itemprop idea, but I'd need to add a hack for markup values (e.g. by adding a item="...XMLLiteral" container and then converting these items to XML nodes. But then I can just add a simpler hack to my RDFa extractor to deep-parse XMLLiterals). This doesn't justify a whole new spec. The copy/paste problem is not too urgent any more, as Linked Data enables nifty copy-by-reference instead of copy-by-value.
It's generally a little surprising to see that Microdata proposal. For months, the HTML5 opinion makers argued against user-defined markup structures, and now they created a completely new spec that not only extends RDFa's possibilities to identify resource types and relations, but also seems to introduce a redundant serialization for selected microformats.
Anyway, for the sake of convergence and less work, I think I still prefer (a subset of) RDFa, if only there was a way to get rid of CURIEs (who wants an abbreviation mechanism whose acronym can't even be properly expanded? ;). And an alternative for the validation pain could be a simple, locally installed validator, accessible through a Ubiquity script. When I think about it, I mainly just need well-formedness and some attribute checks. A Ubiquity script could directly show HTML errors and also extracted triples, and maybe even do some triple sanity checks, too. But then this setup would work for Microdata just as fine. Ah well..
Posted on 2009-05-15 at 19:50 UTC
by
(trackback)
RDFa button (inofficial)
An inofficial light-blue RDFa button
Update/Note: This is not an official RDFa button, those (in the known colours) will be provided by W3C's Communications Team once RDFa is a Rec or CRec.
A couple of days ago I created an RDFa technology button, and I was asked to share it, so here it is:

(PNG, GIF, SVG source file)
Please see the W3C Semantic Web Logos and Policies page for license details. This button is derived from the original W3C ones.

(PNG, GIF, SVG source file)
Please see the W3C Semantic Web Logos and Policies page for license details. This button is derived from the original W3C ones.
Posted on 2008-05-13 at 07:50 UTC
by
(trackback)
Adding (partial) RDFa support to the Firefox HTML Validator extension
Improving QA issues caused by RDFa
Update (2008-04-24): I managed to get rid of the xmlns-related errors (.replace() to the rescue ;), so the extension now accepts markup that follows the latest RDFa DTD (including @typeof). And while at it, I created versions for win and mac.
One of the reasons I haven't been using RDFa in production is the problem of quality assurance (a.k.a. plain old html validation). Not because RDFa isn't valid markup as such, but the main tool I'm using during development is Marc Gueury's excellent HTML Validator Extension for Firefox. RDFa is valid XHTML+RDFa, but XHTML+RDFa is not HTML, so the extension reports dozens of errors starting with the unrecognized Doctype declaration. The W3C Markup Validator supports RDFa, but I often develop while I'm offline, or on a non-public Web server, and the little "0 errors / 0 warnings" message in the status bar is more convenient than having to send markup to an online service.
Yesterday, however, I started working on an RDFa generator for one of Intellidimension's projects (Very interesting to see them use RDF big time, while many of us are still experimenting and thinking about potential markets, BTW). So, now that the RDFa-caused messages made it almost impossible to spot real HTML errors, I wondered if the add-on could perhaps be hacked to accept RDFa as well. Long story short: It can, to a certain extent. I don't know if arbitrary XML namespace prefixes (
Apart from that, RDFa-enabling the extension was mainly copying the RDFa DTD and a set of modules to the plug-in's SGML library. It now happily accepts RDFa attributes (about, resource, property, datatype, content, etc) and makes my life a little bit easier. If anyone has an idea how I could make it accept (non-predefined) namespace prefixes as well, I'd appreciate hints.
The tweaked extension is so far just a hack. I didn't even ping Marc yet or change the internal ID, so any extension update will remove the RDFa functionality. You can try/download it if you like (windows version), but I may have to take it offline should Marc not be happy about the re-distribution.
Yesterday, however, I started working on an RDFa generator for one of Intellidimension's projects (Very interesting to see them use RDF big time, while many of us are still experimenting and thinking about potential markets, BTW). So, now that the RDFa-caused messages made it almost impossible to spot real HTML errors, I wondered if the add-on could perhaps be hacked to accept RDFa as well. Long story short: It can, to a certain extent. I don't know if arbitrary XML namespace prefixes (
xmlns:foo="...") can be supported by a pure DTD/SGML-based validator (the FF extension uses openSP). FWIW, I couldn't get it to work.Apart from that, RDFa-enabling the extension was mainly copying the RDFa DTD and a set of modules to the plug-in's SGML library. It now happily accepts RDFa attributes (about, resource, property, datatype, content, etc) and makes my life a little bit easier. If anyone has an idea how I could make it accept (non-predefined) namespace prefixes as well, I'd appreciate hints.
The tweaked extension is so far just a hack. I didn't even ping Marc yet or change the internal ID, so any extension update will remove the RDFa functionality. You can try/download it if you like (windows version), but I may have to take it offline should Marc not be happy about the re-distribution.
Posted on 2008-04-23 at 19:45 UTC
by
(trackback)
Moving out of the shadow with RDFa
RDFa can help solve the "shadow semweb" problem
Ian Davis has written an interesting series of posts related to the problems arising from using fragment identifiers in resource URIs. Ian makes a lot of valid points, but I think misses an essential one. (With this post I'm breaking with a long tradition, I'm saying positive things about RDFa ;)
So, what's the problem, and how can RDFa help? Ian is discussing a lot of architectural things, and I'm sure there are issues and inconsistencies. But the practical problem he describes is based on the following WebArch principle:
I think that conclusion is not correct. eRDF re-uses HTML's @id to establish resource identifiers, so it mixes document identifiers with non-doc ones, and this is an ambiguity problem indeed. RDFa, however, is a layer on top of HTML that introduces a dedicated mechanism for resource identification, the @about attribute (, and that's why it unfortunately needs an own DTD, but that's another story). From a WebArch POV, the design is clean, content-type-specific identifiers don't get mixed. I can unambiguously describe what "..ben#self" is meant to identify without the representation format playing a role. RDFa can re-purpose HTML's text nodes for RDF literals, and anchors for resource URIs, but apart from that, the HTML document is not much more than a (human-friendly) container.
So, you can serve HTML and machine-readable information in a single document, you just have to make sure that your resource URI fragments don't appear in HTML @ids. And now that we are back on the practical level: Any other ID generation mechanism can work, too. It's fairly easy to implement a URI generator for RDF extracted from a microformats-enabled HTML page without overloading resource IDs. I personally don't see a huge problem (again, practically), as all my applications work with triples, not with representations or encodings which are dealt with by the parsers and extractors.
One practical issue remains, though: Current browsers don't (natively) support navigating to RDF identifiers encoded in RDFa-, microformats-, or GRDDL-enabled HTML pages. You need an additional JavaScript lib to invoke appropriate scroll actions after a page URI with a (non-HTML) fragment identifier is loaded. That's a little annoying, but doable. I think fragment identifiers are valuable. They allow the description of multiple resources in a single document, and that's a handy feature. Whether that breaks Web architecture theory, dunno. Not for me, at least ;-)
So, what's the problem, and how can RDFa help? Ian is discussing a lot of architectural things, and I'm sure there are issues and inconsistencies. But the practical problem he describes is based on the following WebArch principle:
The fragment identifies a portion of a representation obtained from a URI,That means that you can't use "http://example.com/ben#self" as an HTML section identifier and as a non-document identifier (e.g. the person ben). Ian concludes that
and its meaning changes depending on the type of representaion. [sic]
You can have a machine readable RDF version or a human readable HTMLand that this forces the structured web into a disregarded shadow of the human-readable web.
version but not both at the same time
I think that conclusion is not correct. eRDF re-uses HTML's @id to establish resource identifiers, so it mixes document identifiers with non-doc ones, and this is an ambiguity problem indeed. RDFa, however, is a layer on top of HTML that introduces a dedicated mechanism for resource identification, the @about attribute (, and that's why it unfortunately needs an own DTD, but that's another story). From a WebArch POV, the design is clean, content-type-specific identifiers don't get mixed. I can unambiguously describe what "..ben#self" is meant to identify without the representation format playing a role. RDFa can re-purpose HTML's text nodes for RDF literals, and anchors for resource URIs, but apart from that, the HTML document is not much more than a (human-friendly) container.
So, you can serve HTML and machine-readable information in a single document, you just have to make sure that your resource URI fragments don't appear in HTML @ids. And now that we are back on the practical level: Any other ID generation mechanism can work, too. It's fairly easy to implement a URI generator for RDF extracted from a microformats-enabled HTML page without overloading resource IDs. I personally don't see a huge problem (again, practically), as all my applications work with triples, not with representations or encodings which are dealt with by the parsers and extractors.
One practical issue remains, though: Current browsers don't (natively) support navigating to RDF identifiers encoded in RDFa-, microformats-, or GRDDL-enabled HTML pages. You need an additional JavaScript lib to invoke appropriate scroll actions after a page URI with a (non-HTML) fragment identifier is loaded. That's a little annoying, but doable. I think fragment identifiers are valuable. They allow the description of multiple resources in a single document, and that's a handy feature. Whether that breaks Web architecture theory, dunno. Not for me, at least ;-)
Posted on 2007-11-21 at 17:19:38 UTC
by
(trackback)
A Comparison of Microformats, eRDF, and RDFa
An updated (and customizable) comparison of the different approaches for semantically enhancing HTML.
Update (2006-02-13): In order to avoid further flame wars with RDFa folks, I've adjusted the form to not show my personal priorities as default settings anymore (here they are if you are interested, it's a 48-42-40 ranking for MFs, eRDF, and RDFa respectively). All features are set to "Nice to have" now. As you can see, for these settings, RDFa gets the highest ranking (I *said* the comparison is not biased against RDFa!). If you disable the features related to domain-independent resource descriptions, MFs shine, if you insist on HTML validity, eRDF moves up, etc. It's all in the mix.
After a comment of mine on the Microformats IRC channel, SWD's Michael Hausenblas asks for the reason why I said that I personally don't like RDFa. Damn public logs ;) OK, now I have to justify that somehow without falling into rant mode again...
I already wrote a little comparison of Microformats, Structured Blogging, eRDF, and RDFa some time ago, sounds like a good opportunity to see how things evolved during the last 8 months. Back then I concluded that both eRDF and RDFa were preferred candidates for SemSol, but that RDFa lacked the necessary deployment potential due to not being valid HTML (as far as any widespread HTML spec is concerned).
I excluded the Structured Blogging initiative from this comparison, it seems to have died a silent death. (Their approach to redundantly embed microcontent in script tags apparently didn't convince the developer community.) I also excluded features which are equally available in all approaches, such as visible metadata, general support for plain literals, being well-formed, no negative effect on browser behaviour, etc.
Pretending to be constructive, and in order to make things less biased, I embedded a dynamic page item that allows you to create your own, tailored comparison.The default results reflect my personal requirements (and hopefully answer Michael's question). As your mileage does most probably vary, you can just tweak the feature priorities (The different results are not stored, but the custom comparisons can be bookmarked). Feel free to leave a comment if you'd like me to add more criteria.
Bottom line: For many requirement combinations a single solution alone is not enough. My tailored summary suggests for example that I should be fine with a combination of Microformats and eRDF. How does your preferred solution mix look like?
I already wrote a little comparison of Microformats, Structured Blogging, eRDF, and RDFa some time ago, sounds like a good opportunity to see how things evolved during the last 8 months. Back then I concluded that both eRDF and RDFa were preferred candidates for SemSol, but that RDFa lacked the necessary deployment potential due to not being valid HTML (as far as any widespread HTML spec is concerned).
I excluded the Structured Blogging initiative from this comparison, it seems to have died a silent death. (Their approach to redundantly embed microcontent in script tags apparently didn't convince the developer community.) I also excluded features which are equally available in all approaches, such as visible metadata, general support for plain literals, being well-formed, no negative effect on browser behaviour, etc.
Pretending to be constructive, and in order to make things less biased, I embedded a dynamic page item that allows you to create your own, tailored comparison.
Bottom line: For many requirement combinations a single solution alone is not enough. My tailored summary suggests for example that I should be fine with a combination of Microformats and eRDF. How does your preferred solution mix look like?
Posted on 2007-02-12 at 14:10 UTC
by
(trackback)
SeenOn - Timestamp or State of Mind?
fun stuff from #microformats, comments on e/RDF/a wrt to Microformats
<tommorris> Every time I see a movie from now on, I'm adding the IMDB URL to my FOAF file. <briansuda> with what predicate? <tommorris> rdf.opiumfield.com/movie/0.1/seen ... <briansuda> seenOn, is that a timestamp or a state-of-mind?(microformats(!) irc channel)
Now, who said RDF was less real-word-ish than microformats?
Related link (wrt to movies, not toxics): Microformats 80%, RDF 20% by Tom Morris about the longtail utility of (e)RDF(a). Wanted to state something like this for some time. After implementing a Microcontent parser (part of the next ARC release) that creates a merged triple set from eRDF and Microformats, I can't say anymore that MFs don't scale (even though making the meaning of nested formats explicit is sometimes tricky). I was really impressed by the amount of practical use cases covered by them (Listings and qualified review ratings even go beyond the demos I've seen in RDFer circles). However, there is still a lot of room for custom RDF extensions that can be used to extend microformatted HTML. Skill levels are just one of many longtail examples: They are currently not covered by hResume, but available in Uldis' CV vocab.
The important thing IMO is that RDFers should not forget to acknowledge the amazing deployment work of the MF community and focus on what they can add to the table (storage, querying, and mixing, as a start) instead of marketing RDF-in-HTML as an alternative, replacement, or otherwise "superior" (likewise the other way round, btw.). I think we also shouldn't overcharge the big content re-publishers. When maintainers of sites like LinkedIn or Eventful get bombed with requests to add different semantic serializations to their pages, they may hesitate to support any of them at all. For most of these mainstream sites, Microformats do the job just fine, and often better. Why should people for example have to specify namespaces when a simple, agreed-on
rel-license does the trick already? (We could still use RDF to specify the license details, and even the license link is only a simple conversion away from RDF.)
Posted on 2007-02-10 at 14:55 UTC
by
(trackback)
ARC Embedded RDF (eRDF) Parser for PHP
Announcing eRDF support for ARC + an eRDF/RDFa comparison
Update: The current RDFa primer is *not* broken wrt to WebArch, the examples were fixed two weeks ago. I've also removed the "no developer support" rant, just received personal support ;-)
While searching for a suitable output format for a new RDF framework, I've been looking at the various semantic hypertext approaches, namely microformats, Structured Blogging, RDFa, and Embedded RDF (eRDF). Each one has its pros and cons:
Microformats:
Structured Blogging:
RDFa:
eRDF:
So, both RDFa and eRDF seem like good candidates for embedding resource descriptions in HTML. The two are not really compatible, though, it is not easily possible to create a superset which is both RDFa and eRDF. However, my publishing framework is using a Wiki-like markup language (M4SH) which is converted to HTML, so I can add support for both approaches and make the output a configuration option. Maybe it's even possible to create a merged serialization without confusing transformers.
I'll surely have another look at RDFa when there is better deployment potential. For now, I've created a M4SH-to-eRDF converter (which is going to be available as part of the forthcoming SemSol framework), and an eRDF parser that can generate RDF/XML from embedded RDF. I've also added some extensions to work around (plain) eRDF's limitations, the main one being on-the-fly rewriting of
The parser can be downloaded at the ARC site (documentation).
I've also put up a little demo service if you want to test the parser.
Microformats:
- (+) widest deployment so far
- (+) integrate nicely with current HTML and CSS
- (-) centralized project, inventing custom microformats is discouraged
- (-) don't scale, the number of MFs will either be very limited, or sooner or later there will be class name collisions
Structured Blogging:
- (+) a large number of supporters (at least potentially, the supporters list is huge, although this doesn't represent the available tools)
- (+) not a competitor, but a superset of microformats
- (-) the metadata is embedded in a rather odd way
- (-) the metadata is repeated
- (-) the use cases are limited (e.g. reviews, events, etc)
RDFa:
- (+) follows certain microformats principles (e.g. "Don't repeat yourself")
- (+) freely extensible
- (+) All resource descriptions (e.g. for events, profiles, products, etc.) can be extracted with a single transformation script
- (+) RDF-focused
- (+) W3C-supported
- (-) Not XHMTL 1.0 compliant, it will take some time before it can be used in commercial products or picky geek circles
- (-) The default datatype of literals is rdf:XMLLiteral which is wrong for most deployed properties
eRDF:
- (+) follows the microformats principles
- (+) freely extensible
- (+) All resource descriptions (e.g. for events, profiles, products, etc.) can be extracted with a single transformation script
- (+) uses existing markup
- (+) XHTML 1.0 compliant
- (+) RDF-focused
- (-) Covers only a subset of RDF
- (-) Does not support XML literals
So, both RDFa and eRDF seem like good candidates for embedding resource descriptions in HTML. The two are not really compatible, though, it is not easily possible to create a superset which is both RDFa and eRDF. However, my publishing framework is using a Wiki-like markup language (M4SH) which is converted to HTML, so I can add support for both approaches and make the output a configuration option. Maybe it's even possible to create a merged serialization without confusing transformers.
I'll surely have another look at RDFa when there is better deployment potential. For now, I've created a M4SH-to-eRDF converter (which is going to be available as part of the forthcoming SemSol framework), and an eRDF parser that can generate RDF/XML from embedded RDF. I've also added some extensions to work around (plain) eRDF's limitations, the main one being on-the-fly rewriting of
owl:sameAs assertions to allow full descriptions of remote resources, e.g.
<div id="arc"> <a rel="owl-sameAs" href="http://example.com/r/001#001"></a> <a rel="doap-maintainer" href="#ben">Benjamin</a> </div>is automatically converted to
<http://example.com/r/001#001> doap:maintainer <#ben>
The parser can be downloaded at the ARC site (documentation).
I've also put up a little demo service if you want to test the parser.
Posted on 2006-05-29 at 16:00 UTC
by
(trackback)
YARDFIXHTML - Yet Another RDF-In-XHTML proposal
Ian Davis introduced eRDF
Ian Davis proposes "Embedded RDF", a microformats-inspired path to metadata-enriched HTML. Unlike microformats, his approach can utilize a single generic transformation script instead of one transformation for each format (or micromodel if you prefer Danny Ayers' terminology), which is closer to RDF's idea of freely mixable vocabularies.
I had some hopes of RDF/A but stopped following its progress several months ago as it didn't seem to provide an easy way to really bridge the gap between HTML and RDF. My use case was (and is) to be able to markup html in a way which allows me to automatically (and without too much effort) generate context menus or tool-tips à la "Show me more info about this person" etc. I'm not sure if Ian's proposal provides a solution for this, but at least it seems to be much easier to grok than RDF/A, and not requiring XHTML2 is IMHO a huge plus.
It would perhaps have been helpful (or at least polite) to coordinate the effort with the RDF in XHTML Taskforce a bit. The W3C is working on an own approach for quite some time now (which may as well be the reason for Ian to go it alone..). However, the nice thing about GRDDLy approaches is that you don't really need a large user community for each format. For RDFers, it's another long-tail example, as long as the transformation process can be automated. Every triple counts ;)
Update: I had a closer look now at the embeddable RDF examples, and a simple
My publishing system uses a homegrown markup language internally which is sort of a trade-off between simplicity and flexibility, e.g. a Web link is generated by
The HTML for a blog-identified person as illustrated above could then perhaps be created from something as simple as
I had some hopes of RDF/A but stopped following its progress several months ago as it didn't seem to provide an easy way to really bridge the gap between HTML and RDF. My use case was (and is) to be able to markup html in a way which allows me to automatically (and without too much effort) generate context menus or tool-tips à la "Show me more info about this person" etc. I'm not sure if Ian's proposal provides a solution for this, but at least it seems to be much easier to grok than RDF/A, and not requiring XHTML2 is IMHO a huge plus.
It would perhaps have been helpful (or at least polite) to coordinate the effort with the RDF in XHTML Taskforce a bit. The W3C is working on an own approach for quite some time now (which may as well be the reason for Ian to go it alone..). However, the nice thing about GRDDLy approaches is that you don't really need a large user community for each format. For RDFers, it's another long-tail example, as long as the transformation process can be automated. Every triple counts ;)
<span id="p1">
<a rel="foaf-weblog" href="http://jd.com/blog" class="foaf-name">
John Doe
</a>
</span>
seems to be enough to implement the use case I mentioned. That's really cool as I'll now finally be able to hook an RDF store to my HTML publishing tools. It'll need some behaviour-like extensions, but it should be doable now without too many problems!My publishing system uses a homegrown markup language internally which is sort of a trade-off between simplicity and flexibility, e.g. a Web link is generated by
[[http://www.example.com/ Link label goes here]]
The HTML for a blog-identified person as illustrated above could then perhaps be created from something as simple as
[blog http://jd.com/blog John Doe] said ...Hm, generating the internal markup could be supported by some SPARQL-based "suggest persons as you type" feature, but that's of course a completely different story (and to be told another time).
Posted on 2005-10-19 at 13:50 UTC
by
(trackback)

