Posts tagged with: rdf

webinale 2008 starts today

Still a few hours left to finish my presentations, then I'll join Germany's WebDev crowd at the webinale 2008 in Karlsruhe (it's taking place at the same location as this year's ISWC). My talks are about "Semantic Web Tech 'n' Use" (mostly microformats, RDFa, SPARQL) and RDF-based "Online Social Graph Consolidation" (FOAF, XFN, SPARQLy inference, knowee etc.), and there will be more SemWeb-related talks.
A (personally) interesting thing about the webinale is its co-location with the International PHP Conference, and the (new) Dynamic Languages World Europe, and that registering for one conference includes free access to any of the others. It's the perfect audience to talk about practical SemWeb Scripting with ARC and PHP.

New ARC2 plugins

If there was a "most productive SemWeb coder" category in Danny's "This Week's Semantic Web", this week's winner would probably be Keith Alexander. Last week, he provided no fewer than three ARC2 plugins.
While at it, he also implemented a SPARQL+ wrapper for Talis Platform stores.

I think I blogged about Morten's RemoteEndpoint plugin a while back (this one should really become part of the core codebase), but did I mention Peter Krantz' File System Synchronizer? It keeps an RDF store in sync with a file system directory, which opens up a really nice way to implement larger RDF editing systems on top of ARC: by combining his plugin with editing tools that work on small RDF files (quick response times and everything), it becomes possible to provide rich query functionality over the whole dataset without the store getting in the way of the publishing tools. RDF index rebuilding can be slow, so de-coupling read from write operations and introducing an asynchronous update process is a nice solution.
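To make the de-coupling idea a bit more concrete, here is a minimal Python sketch of a directory-to-store sync pass. It is not how Peter's plugin works internally (I'm not reproducing his code here); the function name `sync_directory`, the dict-based "store", and the modification-time check are all assumptions made for illustration. A real setup would parse the changed files into triples instead of keeping raw text.

```python
import os


def sync_directory(directory, store, seen_mtimes):
    """Bring `store` (path -> file content) in line with `directory`.

    Only files whose modification time changed since the last pass are
    re-read, so editors can keep writing small files while the (slow)
    index update happens separately, e.g. from a cron job.
    Returns a tuple (updated_paths, removed_paths).
    """
    updated, removed = [], []

    # Snapshot the current state of the directory.
    current = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            current[path] = os.path.getmtime(path)

    # Re-load new or changed files (hypothetical stand-in for RDF parsing).
    for path, mtime in current.items():
        if seen_mtimes.get(path) != mtime:
            with open(path, encoding="utf-8") as handle:
                store[path] = handle.read()
            seen_mtimes[path] = mtime
            updated.append(path)

    # Drop entries for files that no longer exist.
    for path in list(seen_mtimes):
        if path not in current:
            store.pop(path, None)
            del seen_mtimes[path]
            removed.append(path)

    return updated, removed
```

Calling `sync_directory("data/", store, seen)` periodically is enough for this sketch; queries only ever read from `store`, and the editing tools only ever touch the files.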

Awesome stuff.

RDFAuth, with less Story-telling

Update: Dan Brickley suggested (in a private mail to Henry and me) that "RDFAuth" is most probably not a very smart name anyway, as something that contains official/generic technologies (RDF and oAuth in this case) may send wrong signals and cause misunderstanding. And that we shouldn't waste time fighting. He suggests more specific names (BeatnikAuth/knoweeAuth) for the time being, as this is all still premature stuff, and because no one should claim to have created an "RDFAuth", especially not if it isn't backed by the whole community. Well, what can I say, he's of course right. I apologize and will s/RDFAuth/knoweeAuth/ from now on.

You may have read Henry Story's recent post about RDFAuth, an RDF-oriented mechanism to access (partly) protected web resources. He's not describing the RDFAuth protocol, though. I've tried to clarify things a couple of times on the semantic-web list, but somehow he seems to prefer to hijack the name instead, together with parts of the idea and claim it as his invention (it's not mine either, to make things clear). Now, innovation is always based on a combination of prior work and improvements, but his "following my strict architectural guidelines, I came across what I am just calling RDFAuth" preening goes a tiny bit too far to not trigger a comment.

What he describes (a PGP-based authentication protocol) is clearly interesting, but it's simply not what RDFAuth, an idea that was developed in the knowee project, is about. For knowee (which just released the alpha version, btw), we needed something that can be implemented on basic, shared web servers. PGP is simply not an option (if considered mandatory). People won't upload their private keys to 3rd party servers, and PGP libs are not necessarily available in those environments either.

Final clarifications:
  • RDFAuth may support PGP; it's just not a requirement.
  • I'm pretty sure that Henry's PGP-only approach will attract more SemWeb geeks than RDFAuth; it just wouldn't necessarily work for knowee's target audience.
  • The RDFAuth idea is in no way special or new. It more or less predates oAuth, but long-term-ish I'll most probably have to replace it with oAuth, once there is a way to generate tokens without the browser redirect dance (fully server-side token generation is another knowee requirement).
  • I read about a token-based, decentralized identification mechanism on a very early OpenID FAQ page that described a non-browser-dependent way to log into web sites. I can't find the link anymore, but this is basically what RDFAuth is based on. So, this is not my idea either.
  • The possibility of combining 200 OK response headers with WWW-Authenticate was suggested by Etan Wexler on the FOAF mailing list.
  • Dan Brickley explored SPARQL-based group membership discovery a while back. I like this idea of de-coupling data exchange decisions from the identification/authorisation process very much (RDFers don't need things like sReg or Attribute Exchange).
  • The only thing that RDFAuth adds is light-weight, personal token services (as a replacement for OpenID's browser-based identification), and the re-use of straight HTTP BasicAuth, so that partly protected resources can more easily be discovered by both server-side and client-side tools (e.g. Tabulator), and also to allow widely deployed modules like mod_php to access the login token and client identifier using built-in functionality. And I doubt that layering a protocol on top of HTTP BasicAuth hasn't been done before, so, again, nothing special to brag about here.
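As a rough illustration of the BasicAuth re-use mentioned in the last point, the following Python sketch packs a client identifier and a login token into a standard `Authorization: Basic` header and unpacks them again on the server side. The mapping of identifier to user-id and token to password is my assumption for this example, not a specification of the protocol, and the function names are hypothetical.

```python
import base64


def build_basic_auth(client_id, token):
    """Return an `Authorization` header value carrying identifier and token.

    Assumption: the client identifier goes into the user-id slot and the
    login token into the password slot. The Basic scheme does not allow
    ":" in the user-id, so a URI-style identifier would need escaping.
    """
    if ":" in client_id:
        raise ValueError("client identifier must not contain ':'")
    raw = f"{client_id}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_auth(header_value):
    """Split a Basic `Authorization` header value into (client_id, token)."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(encoded.strip()).decode("utf-8")
    # Everything before the first ":" is the user-id, the rest the password.
    client_id, separator, token = decoded.partition(":")
    if not separator:
        raise ValueError("missing ':' separator")
    return client_id, token
```

Because this is plain BasicAuth on the wire, server environments that already expose the user-id and password of a request to scripts can hand both values to the application without extra libraries.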
OK, enough geek whining ;) I don't want to waste more of my precious weekend.

Project offer: Part-time RDF/OWL Modeling

Here is a nice project offer I received, but that I won't have enough time to work on myself. The project is about analyzing a set of statistical reports and creating an RDF Schema or OWL Ontology for them. With help from friendly #swig folks, a first selection of probably re-usable schemas could already be identified. The next task would be picking the right terms, and maybe some thoughts about additionally needed glue terms.

If you are interested, please send a short mail to fwd_1 at semsol dot com. It will be auto-forwarded to the offerer.

Looking for paid (Semantic Web) Projects

Update 2: Yay, I think I'm safe for the next couple of months, should have blogged much earlier. Now I'm starting to think we could really use a job site for SemWeb people.

Update: Ah, the blogosphere. I already received some replies. One to share: Aduna is looking for a Java Engineer.

About a year ago, I received some funds which allowed me to re-write the ARC toolkit, and also to bring Trice (a semantic web application framework for PHP) to production-readiness. However, Semantic Web Development is generally still very new, especially in the Web Agency market where I'm coming from. It's not that easy yet to keep things self-sustaining.

It may well be that I should blog less about bleeding-edge experiments and more about how RDF and SPARQL allow me to deploy extensible websites in a fraction of the time it used to take in the past. "Release Early", "Data First", "Evolve on the Fly", and all those patterns that SemWeb technology enables in a web development context.

Anyway, to keep things short: I'm actively (read: urgently ;-) looking for more paid projects. I'm a Web development all-rounder with particular interest in scripting languages and quite some experience in delivering RDF and frontend solutions (more details on my profile page). While it would of course be great to work on stuff where I can use my tools, I'm available for more general web development as well. I'm most productive when I can work from my office, but temporary travelling is basically fine, too. The Düsseldorf Airport is just minutes away.

Cheers in advance for suggestions,

ARC Data Wiki Plugin

I'm blessed with a small but first-class community around ARC that helps me with bug reports, patches, encouraging feedback, and nifty ideas. One example of the latter was Morten Frederiksen's idea to allow ARC to be extended with third-party plugins. He even demonstrated the utility by enhancing the toolkit with a remote SPARQL endpoint for his named graph exchange work. ARC plugins are not bundled with the core codebase (which is meant to stay compact), but can easily be integrated into any ARC installation (developer documentation is now online, too).

My first own plugin was triggered by Tim Berners-Lee's suggestion to write a lightweight request handler for an RDF-powered Data Wiki, as described in a recent Tech Report (PDF) and already implemented with Algae. I had to tweak the SPARQL+ spec and ARC's Query Parser to make it compatible with Eric Prud'hommeaux's SPARQL/Update flavor. This had the nice side-effect that all three SPARQL Write proposals (SPARUL, SPARQL/Update, SPARQL+) now (almost) share a common subset for basic INSERTs and DELETEs. After these updates, writing the plugin itself became almost trivial.

The code is still experimental and limited, but it's available for download, together with setup instructions. The Data Wiki plugin doesn't require a database (unlike the other SPARQL components in ARC) and supports update calls sent by RDF editors such as the Tabulator. I've set up a demo RDF wiki and will try to add remote update functionality to my own editor (to be renamed) now as well. Hmmm, would be cool to have a selection of generic tools to collaboratively read from and write to shared RDF spaces one day.


DriftR Linked Data Browser and Editor (Screencast)

While I'm unfortunately struggling to find paid projects these days, I had at least some time to work on core technology for my Trice framework and a new knowee release. The latest module is an in-browser RDF viewer and editor for Linked Data, heavily inspired by the freebase UI (hopefully with less screen flickering, though).

I'm clearly not there yet, but today I uploaded a screencast (quicktime 4MB), and I think I can start incorporating it into the knowee tools soon. Have fun watching it if you like, and Merry X-Mas!


ARC2 preview release ready for feedback

After writing a bunch of instructions for ARC2 this week, I think/hope it is now finally ready for experiments and feedback. The site will get more documentation and code snippets in the coming weeks, and some components are not even part of the release. However, I've been waiting long enough already. So, here goes:
Shout-outs to everyone who helped with bug reports, encouraging feedback, and suggestions for the new release. Special thanks to CivicActions and Jonathan Hendler for development support and various stress tests of earlier versions.

Slowly resurfacing for more SWEOing

After two months of spec implementation, I'm finally getting at the more interesting stuff again. I'm not fully on schedule, but I could at least meet the first of this week's three deadlines: I presented a first knowee proof of concept at yesterday's webmontag and feedback was positive. Deadline #3 is a working prototype by this Wednesday (promised to SWEO), but I'm not sure I'll be able to deliver. We are close, but there is also deadline #2 lurking: the DAWG implementation reports are due today, and I'm still working on mine for ARC2...

Nevertheless, webmontag was really great again. Had an interesting chat with mixxt's Oliver Ueberholz about the practical problems of adding social data export to SNSs. It seems that microformats are not always the obvious answer when the public export of machine-readable profile information is meant to be implemented as a user option, or when you want to be able to block certain bots from crawling your networks. They are thinking about external files now and wonder if RDF might be an option. Keeping the template code clean, and the ability to serve content for "online social graph aggregators" like knowee from separate machines are two potential benefits. At least the "hidden information is not maintained" argument is moot in their case, as the data is auto-generated anyway.

Last week I had lunch with Alexander Linden, the guy who used to position Semantic Web on the Gartner Hype Cycles. He left Gartner for his own venture (HumanGrid), a crowdsourcing platform. Surprisingly, they are not using SemWeb technology directly, but he said that their solution could be very helpful to generate and quality-improve RDF instance data.

We also talked a bit about SemWeb startup funding, and despite Gartner's latest Hype Cycle, which put SemWeb into the trough of disillusionment for the next 10(!) years, venture capital invested in semantic technology companies is apparently increasing. At least if you are in the US, that is. In Germany, a lot of money still seems to vanish in dodgy projects like smartweb. I hope that theseus is going to have more practical outcomes. They are going to run a competition for non-partners, that's a step in the right direction.

Related to startups and their technology choice is a concern about the lack of end-user semantic web applications that demonstrate the utility of RDF. A Semantic Web is going to be one of the Next Big Things, but that doesn't necessarily mean that it'll be built with W3C technologies. The only big-potential (US) startup with an RDF infrastructure, for example, is generating so much hype that they are doomed to disappoint, no matter what they are going to launch (if they'll ever do). Maybe RDFers should hurry up a little if they want to help avoid a possible backlash. I will, at least.

Alexander said the RDF stack has always been rather tough to sell (especially OWL), and identified some strategies that the SWEO group could focus on during the next couple of months:
  • Admit that the full technology framework is not trivial, it's web-scale information integration after all. If you present it to newbies, always present a consumable subset only, not the full thing (Uh, I'm guilty).
  • Organise more local meetings, BarCamp-style, open to people with related interests (i.e. not-yet-semweb developers)
  • Provide convincing solutions that clearly show how RDF saves money and/or time, or increases productivity in a way that no alternative technology can. CEOs are just one group; a new technology has to attract the developers, too, because they decide how much friction they are willing to accept before they get at the benefits of a new technology. (SWEO is already building a collection of success stories, the Community Projects address these points, too, I think.)
  • Something to download and play with for those with initial interest (that's basically Danny's Semantic Web in a box suggestion)
  • Public datasets (Yay LOD project)
An additional suggestion I heard yesterday was "Non-technical Marketing". And that's something SWEO is spending quite some time on, too. (The W3C comm team is actually coming up with a full SemWeb branding strategy soon.) And to cite Dan Brickley:
16:37:57 [danbri] best thing we ever did, was make those tshirts!

So, it seems the SWEO activities are moving in the right direction, but it'd be great to get more ideas. What do you think is still missing or should get a high priority?

knowee.org

Just a short update on knowee, one of the SWEO Community Projects. There is nice progress, although it took some time to get things moving. An early site is now online, and we have a first design for the app.

I still have to flesh out knowee's approach to "social graph portability" (or whatever it's called this week), but then I'll focus on the prototype which will hopefully be available by Mid/End-September.

Back from webinale 2007

The webinale slides are online now. The session went OK, I'd say. I always make the mistake of looking at the high conference prices and then end up trying to squeeze too much information into my talks to give people some value for their money. It was also a bit hard to predict what the audience of the newly introduced webinale would be like. I did receive some great feedback from PHP coders (sneaking in from the co-located IPC) who already had specific questions and asked about RAP and ARC. But I could see from many faces right after the session that a very basic talk might have been better. Leo suggested skipping the ontology stuff entirely; the number of different flavours (SKOS, RDF Schema, OWL Lite/DL/Full/+/-/1.1) is surely a whole mess marketing-wise. Next time I'll try to stick to the more intuitive stuff. At least I had a convincing demo of how (low-level) ontologies can greatly reduce custom application code.

I had a short chat with pageflakes' CEO Christoph Janz. Semantic Web technologies are not on their radar yet (maybe they are now ;), but we talked a bit about the possibility of adding some RDF functionality to their widgets (which they call "flakes"). They may let us try some things in the context of the knowee project, e.g. a flake that could store contact data retrieved via GRDDL or a SPARQL endpoint. Might be worth checking out their SDK.

So, next time: less OWL, more wild colours:
semweb web 2.0 layers

SWEO project "knowee"

I finally sent out a call for participation for knowee, one of the projects supported by SWEO (just in time for the F2F reports tomorrow).

The project is about creating a semwebby address book thingy, but there actually is another dimension to the "outreach" aspect beyond running code. I'd really like to bring RDFers and microformateers closer together (from both directions). RDFers can learn a lot from the pragmatic microformats community, and adding data integration (+query) functionality to microformats can enable a whole new set of applications.

Funded!

This is going to change everything. Well, almost. I will continue to work on my Semantic Web solutions, but there will be a major re-branding and finally a focused roadmap. My code experiments and projects are going to be critically reviewed and consolidated. (I can't tell yet what stuff is going to be continued, but I'll keep my SWEO commitments, esp. the knowee community project which is going to start in April).
Quite some orga action coming up, but I'm looking forward to a clean bengee.reboot()
  • I'll move from Essen to Düsseldorf, which is closer to Cologne, the DUS airport, and also a little away from the Web periphery here, with the Ruhr Valley still in reach, though.
  • The appmosphere wordplay is going to be discontinued. No German really managed to pronounce or remember it correctly, and the *-osphere naming is rather overused these days anyway.
  • The new brand will most probably be semsol.com which is going to be transformed to a Semantic Web Agency. (I've always been a frontend developer, combining this with an in-house RDF system will hopefully form a nice USP for the anticipated move towards info-driven Web apps.)
  • The open source RDF framework currently named semsol will get a new name (perhaps just "semsol suite", we'll see), and there will be more product-style solutions (a browser, an editor, a schema manager, etc.).
  • ARC will keep its name, but is going to be re-coded as ARC2 based on the experience and feedback obtained so far.
  • Less research-y slippery slopes.
  • More Germany-targeted activities.

A Comparison of Microformats, eRDF, and RDFa

Update (2006-02-13): In order to avoid further flame wars with RDFa folks, I've adjusted the form to not show my personal priorities as default settings anymore (here they are if you are interested, it's a 48-42-40 ranking for MFs, eRDF, and RDFa respectively). All features are set to "Nice to have" now. As you can see, for these settings, RDFa gets the highest ranking (I *said* the comparison is not biased against RDFa!). If you disable the features related to domain-independent resource descriptions, MFs shine, if you insist on HTML validity, eRDF moves up, etc. It's all in the mix.

After a comment of mine on the Microformats IRC channel, SWD's Michael Hausenblas asks for the reason why I said that I personally don't like RDFa. Damn public logs ;) OK, now I have to justify that somehow without falling into rant mode again...

I already wrote a little comparison of Microformats, Structured Blogging, eRDF, and RDFa some time ago, so this sounds like a good opportunity to see how things have evolved during the last 8 months. Back then I concluded that both eRDF and RDFa were preferred candidates for SemSol, but that RDFa lacked the necessary deployment potential due to not being valid HTML (as far as any widespread HTML spec is concerned).

I excluded the Structured Blogging initiative from this comparison, as it seems to have died a silent death. (Their approach to redundantly embed microcontent in script tags apparently didn't convince the developer community.) I also excluded features which are equally available in all approaches, such as visible metadata, general support for plain literals, being well-formed, no negative effect on browser behaviour, etc.

Pretending to be constructive, and in order to make things less biased, I embedded a dynamic page item that allows you to create your own, tailored comparison. The default results reflect my personal requirements (and hopefully answer Michael's question). As your mileage does most probably vary, you can just tweak the feature priorities (The different results are not stored, but the custom comparisons can be bookmarked). Feel free to leave a comment if you'd like me to add more criteria.

No. | Feature or Requirement | MFs | eRDF | RDFa
1 | DRY (Don't Repeat Yourself) | yes | yes | mostly
2 | HTML4 / XHTML 1.0 validity | yes | yes | no
3 | Custom extensions / Vocabulary mixing | no | yes | yes
4 | Arbitrary resource descriptions | no | yes | yes
5 | Explicit syntactic means for arbitrary resource descriptions | no | no | yes
6 | Supported by the W3C | partly | partly | yes
7 | Follow DCMI guidelines | no | yes | no
8 | Stable/Uniform syntax specification | partly | yes | yes
9 | Predictable RDF mappings | mostly | yes | yes
10 | Live/Web Clipboard Compatibility | yes | mostly | mostly
11 | Reliable copying, aggregation, and re-publishing of source chunks. (Self-containment) | mostly | partly | partly
12 | Support for not just plain literals (e.g. typed dates, floats, or markup). | yes | no | yes
13 | Triple bloat prevention (only actively marked-up information leads to triples) | yes | yes | no
14 | Possible integration in namespaced (non-HTML) XML languages. | no | no | yes
15 | Mainstream Web developers are already adopting it. | yes | no | no
16 | Tidy-safety (Cleaning up the page will never alter the embedded semantics) | yes | yes | no
17 | Explicit support for blank nodes. | no | no | yes
18 | Compact syntax, based on existing HTML semantics like the address tag or rel/rev/class attributes. | yes | mostly | partly
19 | Inclusion of newly evolving publishing patterns (e.g. rel="nofollow"). | yes | no | partly
20 | Support for head section metadata such as OpenID or Feed hooks. | no | partly | partly

Results

Solution | Points | Missing Requirements
RDFa | 35 | -
eRDF | 34 | -
Microformats | 33 | -

Max. points for selected criteria: 60

Summary:

Your requirements are met by RDFa, or eRDF, or Microformats.
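For those who want to replay the ranking offline, here is a small Python sketch of the scoring. The point values (yes = 3, mostly = 2, partly = 1, no = 0) and the weight of 1 for "Nice to have" are assumptions on my side; they are not stated anywhere on this page, but they do reproduce the totals and the maximum of 60 shown above. The names `LEVEL_POINTS`, `SUPPORT`, and `rank` are made up for the example.

```python
# Assumed points per support level (reproduces the published totals).
LEVEL_POINTS = {"yes": 3, "mostly": 2, "partly": 1, "no": 0}

SOLUTIONS = ("Microformats", "eRDF", "RDFa")

# Support levels per feature (rows 1-20 of the table), in SOLUTIONS order.
SUPPORT = [
    ("yes", "yes", "mostly"),        # 1  DRY
    ("yes", "yes", "no"),            # 2  HTML4 / XHTML 1.0 validity
    ("no", "yes", "yes"),            # 3  Custom extensions
    ("no", "yes", "yes"),            # 4  Arbitrary resource descriptions
    ("no", "no", "yes"),             # 5  Explicit syntactic means
    ("partly", "partly", "yes"),     # 6  Supported by the W3C
    ("no", "yes", "no"),             # 7  DCMI guidelines
    ("partly", "yes", "yes"),        # 8  Stable/uniform syntax
    ("mostly", "yes", "yes"),        # 9  Predictable RDF mappings
    ("yes", "mostly", "mostly"),     # 10 Live/Web Clipboard
    ("mostly", "partly", "partly"),  # 11 Self-containment
    ("yes", "no", "yes"),            # 12 Not just plain literals
    ("yes", "yes", "no"),            # 13 Triple bloat prevention
    ("no", "no", "yes"),             # 14 Namespaced XML languages
    ("yes", "no", "no"),             # 15 Mainstream adoption
    ("yes", "yes", "no"),            # 16 Tidy-safety
    ("no", "no", "yes"),             # 17 Blank nodes
    ("yes", "mostly", "partly"),     # 18 Compact syntax
    ("yes", "no", "partly"),         # 19 New publishing patterns
    ("no", "partly", "partly"),      # 20 Head section metadata
]


def rank(weights=None):
    """Return (ranking, max_points) for per-feature priority weights.

    `weights` holds one number per feature; the default of 1 everywhere
    stands for "Nice to have" (an assumption), 0 disables a feature.
    """
    if weights is None:
        weights = [1] * len(SUPPORT)
    totals = dict.fromkeys(SOLUTIONS, 0)
    for weight, row in zip(weights, SUPPORT):
        for solution, level in zip(SOLUTIONS, row):
            totals[solution] += weight * LEVEL_POINTS[level]
    max_points = 3 * sum(weights)
    ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranking, max_points


if __name__ == "__main__":
    for solution, points in rank()[0]:
        print(solution, points)
```

With the default weights this prints RDFa 35, eRDF 34, Microformats 33. Passing a different `weights` list (e.g. 0 for the rows you don't care about) gives a tailored comparison, which is all the page form does conceptually.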

Feature notes/explanations:

DRY (Don't Repeat Yourself)
  • RDFa: Literals have to be redundantly put in "content" attributes in order to make them un-typed.
HTML4 / XHTML 1.0 validity
  • RDFa: Given the buzz around the WHATWG, it's uncertain when (if at all) XHTML 2 or XHTML 1.1 modules will be widely deployed enough.
Explicit syntactic means for arbitrary resource descriptions
  • eRDF: owl:sameAs statements (or other IFPs) have to be used to describe external resources.
Supported by the W3C
  • MFs, eRDF: Indirectly supported by W3C's GRDDL effort.
Stable/Uniform syntax specification
  • MFs: Although MFs reuse HTML structures, the format syntax layered on top differs, so that each MF needs separate (though stable) parsing rules.
Predictable RDF mappings
  • MFs: Microformats could be mapped to different RDF structures, but the GRDDL WG will probably recommend fixed mappings.
Live/Web Clipboard Compatibility
  • eRDF, RDFa: Tweaks are needed to make them Live-Clipboard compatible.
Reliable copying, aggregation, and re-publishing of source chunks. (Self-containment)
  • MFs: Some Microformats (e.g. XFN) lose their intended semantics when regarded out of context.
  • eRDF/RDFa: Only chunks with nearby/embedded namespace definitions can be reliably copied.
Support for head section metadata such as OpenID or Feed hooks.
  • eRDF: Can support OpenID hooks.
  • RDFa: Will probably interpret any rel attribute.


Bottom line: For many requirement combinations a single solution alone is not enough. My tailored summary suggests for example that I should be fine with a combination of Microformats and eRDF. What does your preferred solution mix look like?

SeenOn - Timestamp or State of Mind?

<tommorris> Every time I see a movie from now on,
  I'm adding the IMDB URL to my FOAF file.
<briansuda> with what predicate?
<tommorris> rdf.opiumfield.com/movie/0.1/seen
...
<briansuda> seenOn, is that a timestamp or a state-of-mind?
(microformats(!) irc channel)

Now, who said RDF was less real-world-ish than microformats?

Related link (wrt movies, not toxics): Microformats 80%, RDF 20% by Tom Morris about the longtail utility of (e)RDF(a). I've wanted to state something like this for some time. After implementing a Microcontent parser (part of the next ARC release) that creates a merged triple set from eRDF and Microformats, I can't say anymore that MFs don't scale (even though making the meaning of nested formats explicit is sometimes tricky). I was really impressed by the number of practical use cases covered by them (Listings and qualified review ratings even go beyond the demos I've seen in RDFer circles). However, there is still a lot of room for custom RDF extensions that can be used to extend microformatted HTML. Skill levels are just one of many longtail examples: They are currently not covered by hResume, but available in Uldis' CV vocab.

The important thing IMO is that RDFers should not forget to acknowledge the amazing deployment work of the MF community and focus on what they can add to the table (storage, querying, and mixing, as a start) instead of marketing RDF-in-HTML as an alternative, replacement, or otherwise "superior" (likewise the other way round, btw.). I think we also shouldn't overcharge the big content re-publishers. When maintainers of sites like LinkedIn or Eventful get bombed with requests to add different semantic serializations to their pages, they may hesitate to support any of them at all. For most of these mainstream sites, Microformats do the job just fine, and often better. Why should people for example have to specify namespaces when a simple, agreed-on rel-license does the trick already? (We could still use RDF to specify the license details, and even the license link is only a simple conversion away from RDF.)
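To back up the "simple conversion away" remark, here is a minimal Python sketch that pulls rel="license" links out of a page and writes them as N-Triples. It only uses the standard library. The choice of the XHTML vocabulary's license term as predicate is an assumption for this example (other vocabularies offer license properties, too), and the class and function names are hypothetical; this is not the ARC parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed predicate for this sketch: the XHTML vocabulary's "license" term.
LICENSE_PREDICATE = "http://www.w3.org/1999/xhtml/vocab#license"


class RelLicenseParser(HTMLParser):
    """Collects (page, predicate, license URL) triples from rel-license links."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, e.g. "license nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_values and href:
            target = urljoin(self.page_url, href)
            self.triples.append((self.page_url, LICENSE_PREDICATE, target))


def license_triples(html, page_url):
    """Return the license triples found in `html`, published at `page_url`."""
    parser = RelLicenseParser(page_url)
    parser.feed(html)
    parser.close()
    return parser.triples


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```

The publisher keeps writing a plain rel-license link without any namespace declarations, and whoever wants triples can generate them on the consuming side.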