finally a bnode with a uri

Posts tagged with: foaf

Knowee - (The beginning of) a semantic social web address book

Knowee is a web address book that lets you integrate distributed social graph fragments. A new version is online at knowee.net.
Heh, this was planned as a one-week hack but somehow turned into a full rewrite that took all of December. Yesterday, I finally managed to tame the semantic bot army, and today I've added a basic RDF editor. A sponsored version is now online at knowee.net; a code bundle for self-hosting will be made available at knowee.org tomorrow.

What is Knowee?

Knowee started as a SWEO project. Given the insane number of online social networks we have all joined, together with the increasing number of machine-readable "social data" sources, we dreamed of a distributed address book where the owner doesn't have to maintain contact data manually, but instead simply subscribes to remote sources. The address book could then update itself automatically. And, in full SemWeb spirit, you'd get access to your consolidated social graph for re-purposing. There are several open-source projects in this area, most notably NoseRub and DiSo. Knowee is aiming at interoperability with these solutions.
knowee concept

Ingredients

For a webby address book, we need to pick some data formats, vocabularies, data exchange mechanisms, and the general app infrastructure:
  • PHP + MySQL: Knowee is based on the ubiquitous LAMP stack. It tries to keep things simple: you don't need system-level access for third-party components or cron jobs.
  • RDF: Knowee utilizes the Resource Description Framework. RDF gives us a very simple model (triples), lots of different formats (JSON, HTML, XML, ...), and free, low-cost extensibility.
  • FOAF, OpenSocial, microformats, Feeds: FOAF is the leading RDF vocabulary for social information. Feeds (RSS, Atom) are the lowest common denominator for exchanging non-static information. OpenSocial and microformats are more than just schemas, but the respective communities maintain very handy term sets, too. Knowee uses equivalent representations in RDF.
  • SPARQL: SPARQL is the W3C-recommended query language and API for the Semantic Web.
  • OpenID: OpenID addresses Identity and Authentication requirements.
I'm still working on a solution for access control; the current Knowee version is limited to public data and simple, password-based access restrictions. OAuth is surely worth a look, although Knowee's use case is a little different and may be fine with just OpenID + sessions. Another option could be the impressive FOAF+SSL proposal, though I'm not sure if they'll manage to provide a pure-PHP implementation for non-SSL-enabled hosts.

Features / Getting Started

This is a quick walk-through to introduce the current version.
Login / Signup
Log in with your (ideally non-XRDS) OpenID and pick a user name.

knowee login

Account setup
Knowee only supports a few services so far. Adding new ones is not hard, though. You can enable the SG API to auto-discover additional accounts. Hit "Proceed" when you're done.

knowee accounts

Profile setup
You can specify whether to make (parts of) your consolidated profile public or not. During the initial setup process, this screen will be almost empty; you can check back later when the semantic bots have done their job. Hit "Proceed".

knowee profile

Dashboard
The Dashboard shows your personal activity stream (later versions may include your contacts' activities, too), system information and a couple of shortcuts.
knowee dashboard

Contacts
The contact editor is still a work in progress. So far, you can filter the list, add new entries, and edit existing contacts. The RDF editor is still pretty basic: changes will be saved to a separate RDF graph, but deleted or changed fields may reappear after synchronization. This needs more work. The editor is schema-based and supports the vocabularies mentioned above. You'll be able to create your own fields at some later stage.

It's already possible to import FOAF profiles. Knowee will try to consolidate imported contacts so that you can add data from multiple sources but then edit the information via a single form. The bot processor is extensible, so we'll be able to add additional consolidators at run-time; at the moment, it only looks at "owl:sameAs".
knowee contacts
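
To make the "owl:sameAs" consolidation idea a bit more concrete, here is a minimal Python sketch. It is not Knowee's actual bot code (which is PHP); the triple layout, the function name, and the rule of picking the lexicographically smallest URI as the canonical contact are all assumptions made for illustration.

```python
from collections import defaultdict

OWL_SAME_AS = "owl:sameAs"


def consolidate(triples):
    """Merge the properties of all subjects connected via owl:sameAs.

    `triples` is a list of (subject, predicate, object) string tuples.
    Returns {canonical_subject: {predicate: sorted list of objects}}.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        # Assumption: the smaller URI (string order) becomes the canonical one.
        if root_b < root_a:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a

    # First pass: build the equivalence classes.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)

    # Second pass: attach every other statement to the canonical subject.
    merged = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged[find(s)][p].add(o)

    return {
        subj: {pred: sorted(objs) for pred, objs in props.items()}
        for subj, props in merged.items()
    }
```

Two imported descriptions of the same person end up as a single record that one form could edit, while unrelated contacts stay separate. Additional consolidators (for example, matching on a shared mailbox) would simply add more `union` calls in the first pass.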

Enabling the SPARQL API
In the "Settings" section you'll find a form that lets you activate a personal SPARQL API. You can enable/protect read and/or write operations. The SPARQL endpoint provides low-level access to all your data, allows you to explore your social graph, or lets you create backups of your activity stream.

knowee api knowee api

That's more or less it for this version. You can always reset or delete your account, and manually delete incorrectly monitored graphs. The knowee.net system is running on the GoGrid cloud, but I'm still tuning things to let the underlying RDF CMS make better use of the multi-server setup. If things go wrong, blame me, not them. Caching is not fully in place yet, and I've limited the installation to 100 accounts. Give it a try; I'd be happy to get feedback.

OpenSocial in RDF

I've created an RDF converter for the OpenSocial field definitions.
I'm currently working on a new release of Knowee. This is another (long-promised) item on my ToDo list before I can finally concentrate on paggr, although it has taken too long already and hopefully won't break my neck. (All the planned paid projects for bootstrapping paggr didn't happen, due to frozen budgets and politics. I hope the situation here improves soon.)

So, while I was trawling the vocabulary market, trying to gather terms for the stuff that Knowee works with (people, their profiles, contacts, accounts, and activities), I remembered OpenSocial, the effort to standardize basic interactions between social networking sites. I can use a good amount of FOAF, but OpenSocial has very handy things such as a generic "tags" field and a clean vCard mapping. And it's a super-set of Portable Contacts, too.

Today, I wrote a converter that extracts the field definitions from the JavaScript specification files, together with their labels, comments, domains, and value types. (A little too late, I found out that Dan Brickley had already done part of this a couple of months ago. That could have saved me some work, d'oh.)

I've just added the osoc spec to web-semantics.org/ns. I hope it might be of use to others as well. Funnily, the "relationship" term was not part of any of the source files, maybe I still have to invent a property (a foaf:knows equivalent that also works with organizations).

Semantic Web by Example: Semantic CrunchBase

CrunchBase is now available as Linked Data including a SPARQL endpoint and a custom API builder based on SPARQLScript.
Update: Wow, these guys are quick: there is now a full RSS feed for CrunchBoard jobs. I've tweaked the related examples.

This post is a bit late (I've even been TechCrunch'd already), but I wanted to add some features before I fully announce "Semantic CrunchBase", a Linked Data version of CrunchBase, the free directory of technology companies, people, and investors. CrunchBase recently activated an awesome API, with the invitation to build apps on top of it. This seemed like the ideal opportunity to test ARC and Trice, but also to demonstrate some of the things that become possible (or much easier) with SemWeb technology.

Turning CrunchBase into a Linked Dataset

The CB API is based on nicely structured JSON documents which can be retrieved through simple HTTP calls. The data is already interlinked, and each core resource (company, person, product, etc.) has a stable identifier, greatly simplifying the creation of RDF. Ideally, machine-readable representations would be served from crunchbase.com directly (maybe using the nicely evolving Rena toolkit), but the SemWeb community has a reputation of scaring away maintainers of potential target apps with complicated terminology and machinery before actually showing convincing benefits, so, at this stage (and given the nice API), it might make more sense to start with a separate site, and to present a selection of added values first.

For Semantic CrunchBase, I wrote a largely automated JSON2RDF converter, i.e. the initial RDF dataset is not using any known vocabs such as FOAF (or FOAFCorp). (We can INSERT mapping triples later, though.) Keeping most of the attribute names from the source docs (and mainly using just a single namespace) has another advantage besides simplified conversion: CrunchBase API users can more easily experiment with the SPARQL API (see twitter.json and twitter.rdf for a direct comparison).
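
As a rough illustration of such a "keep the attribute names" conversion, here is a small Python sketch. It is not the converter used for Semantic CrunchBase (that one is built on ARC/PHP); the namespace URI, the blank node naming, and the sample document in the usage note are invented for the example.

```python
import json

# Hypothetical namespace; the real "cb" namespace URI is not given in this post.
CB = "http://example.org/cb/ns#"


def json_to_triples(subject, doc, ns=CB):
    """Turn a nested JSON object into (subject, predicate, object) tuples.

    Every JSON key becomes a property in a single namespace, nested objects
    become blank nodes, list items become repeated properties, and null
    values are skipped. No distinction is made between URIs and literals.
    """
    triples = []
    counter = [0]

    def new_bnode():
        counter[0] += 1
        return f"_:b{counter[0]}"

    def walk(subj, obj):
        for key, value in obj.items():
            pred = ns + key
            items = value if isinstance(value, list) else [value]
            for item in items:
                if item is None:
                    continue
                if isinstance(item, dict):
                    node = new_bnode()
                    triples.append((subj, pred, node))
                    walk(node, item)
                else:
                    triples.append((subj, pred, item))

    walk(subject, doc)
    return triples


def convert(subject, json_text):
    """Convenience wrapper for raw JSON text as returned by an HTTP call."""
    return json_to_triples(subject, json.loads(json_text))
```

Calling `convert("/company/twitter#self", '{"name": "Twitter", "office": [{"state_code": "CA"}]}')` yields one triple for the name, one linking the company to a blank node for the office, and one for the office's state code. Because the property names mirror the JSON keys, someone who knows the JSON API can guess the graph patterns for a SPARQL query.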

An important principle in RDF land is the distinction between a resource and a page about a resource (it's very unlikely to hear an RDFer say "URLs are People" ;). This means that we need separate identifiers for e.g. Twitter and the Twitter description. There are different approaches; I decided to use (fake-)hash URIs, which make embedding machine-readable data directly into the HTML views a bit more intuitive (IMHO):
  • /company/twitter#self denotes the company.
  • GETing the identifier resolves to /company/twitter, which describes the company.
  • Direct RDF/XML or RDF/JSON can be retrieved by appending ".rdf" to the document URIs and/or via Content Negotiation.
This may sound a bit complicated (and for some reason RDFers love to endlessly discuss this stuff), but luckily, many RDF toolkits handle much of the needed functionality transparently.
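
For the curious, the identifier scheme boils down to very little code. The following Python sketch only illustrates the URI handling described above; the host name in the usage note is taken from the SPARQL endpoint address further down and is an assumption here, and the content negotiation check is deliberately naive (no q-values, RDF/XML only).

```python
from urllib.parse import urldefrag


def document_uri(resource_uri):
    """Drop the fragment: this is what an HTTP client actually requests."""
    return urldefrag(resource_uri).url


def rdf_uri(resource_uri):
    """Document URI of the RDF representation (".rdf" appended)."""
    return document_uri(resource_uri) + ".rdf"


def wants_rdf(accept_header):
    """Very rough content negotiation: does the client ask for RDF/XML?"""
    return "application/rdf+xml" in accept_header.lower()
```

With `http://cb.semsol.org/company/twitter#self` as the company identifier, `document_uri` returns `http://cb.semsol.org/company/twitter` and `rdf_uri` returns `http://cb.semsol.org/company/twitter.rdf`. The fragment never reaches the server, which is why the thing and its description can share one HTTP round trip.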

The instant benefit of having linked data views is the possibility to freely explore the complete CrunchBase graph (e.g. from a company to its investors to their organizations to their relations etc.). However, the CrunchBase team has already done a great job: their UI supports this functionality quite nicely, so the RDF infrastructure doesn't really add anything here, functionality-wise. There is one advantage, but it's not obvious: An RDF-powered app can be extended at any time. On the data-level. Without the need for model changes (because there is none specified). And without the need for table tweaks (the DB schema is generic). We could, for example, enhance the data with CrunchBoard Jobs, DBpedia information, or profiles retrieved from Google's Social Graph API, without having to change a single script or table. (I switched to RDF as a productivity booster some time ago and never looked back. The whole Semantic CrunchBase site took only a few person-days to build, and most of the time was spent on writing the importer.) But let's skip the backstage benefits for now.

SPARQL - SQL for the Web

Tim Berners-Lee recently said that the success of the Semantic Web should be measured by the "level of unexpected reuse". While the HTML-based viewers support a certain level of serendipitous discovery, they only enable resource-by-resource exploration. It is not possible to spot non-predefined patterns such as "serial co-founders" or "founders of companies recently acquired". As an API provider, it is rather tricky to anticipate all potential use cases. On the CB API mailing list, people are expressing their interest in API methods to retrieve recent investments and acquisitions, or social graph fragments. At the moment, those can only be coded and added by the API maintainers. Enter SPARQL. SPARQL, the protocol and query language for RDF graphs, provides just this: flexibility for developers, less work for API providers. Semantic CrunchBase has an open SPARQL endpoint, but it's also possible to restrict/control the API while still using an RDF interface internally to easily define and activate new API methods. (During the last months I've been working for Intellidimension; they were using an on-request approach for AJAX front-ends. Setting up new API methods was often just a matter of minutes.)

With SPARQL, it gets easy to retrieve (almost) any piece of information. Here is an example query that finds companies that were recently acquired:
SELECT DISTINCT ?permalink ?name ?year ?month ?code WHERE {
    ?comp cb:exit ?exit ;
          cb:name ?name ;
          cb:crunchbase_url ?permalink .

    ?exit cb:term_code ?code ;
          cb:acquired_year ?year ;
          cb:acquired_month ?month .
}
ORDER BY DESC (?year) DESC (?month)
LIMIT 20
(Query result as HTML)
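
If you'd rather call the endpoint from a script than from the HTML form, the SPARQL protocol is just an HTTP request with the query in a "query" parameter. Here is a minimal Python sketch that builds such a request for the query above. It assumes that the endpoint at cb.semsol.org already knows the "cb" prefix (as the examples in this post do) and that it can answer with the SPARQL JSON results format; neither is guaranteed, and the endpoint may not be reachable at all.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://cb.semsol.org/sparql"

QUERY = """SELECT DISTINCT ?permalink ?name ?year ?month ?code WHERE {
    ?comp cb:exit ?exit ;
          cb:name ?name ;
          cb:crunchbase_url ?permalink .

    ?exit cb:term_code ?code ;
          cb:acquired_year ?year ;
          cb:acquired_month ?month .
}
ORDER BY DESC (?year) DESC (?month)
LIMIT 20"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Assumption: the endpoint honours this Accept header.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would send it; the sketch stops short of that so it stays runnable offline.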

Or what about a comparison between acquisitions in California and New York:
SELECT DISTINCT COUNT(?link_ca) as ?CA COUNT(?link_ny) as ?NY WHERE {
    ?comp_ca cb:exit ?exit_ca ;
             cb:crunchbase_url ?link_ca ;
             cb:office ?office_ca .
    ?office_ca cb:state_code "CA" .

    ?comp_ny cb:exit ?exit_ny ;
             cb:crunchbase_url ?link_ny ;
             cb:office ?office_ny .
    ?office_ny cb:state_code "NY" .
}
(Results)

These are just some simple examples, but they (hopefully) illustrate how RDF and SPARQL can significantly improve Web app development and community support. But hey, there is more.

Semantic Mashups with SPARQLScript

SPARQL has only just become a W3C recommendation, and the team behind it was smart enough to not add too many features (even the COUNT I used above is not part of the core spec). The community is currently experimenting with SPARQL extensions, and one particular thing that I'm personally very interested in is the creation of SPARQL-driven mashups through something called SPARQLScript (full disclosure: I'm the only one playing with it so far, it's not a standard at all). SPARQLScript enables the federation of script block execution across multiple SPARQL endpoints. In other words, you can integrate data from different sources on the fly.

Imagine you are looking for a job in California at a company that is at a specific funding stage. CrunchBase knows everything about companies and investments, and has structured location data. CrunchBoard, on the other hand, has job descriptions, but only a single field for City and State, and not the filter options to match our needs. This is where Linked Data shines. If we find a way to link from CrunchBoard to CrunchBase, we can use Semantic Web technology to run queries that include both sources. And with SPARQLScript, we can construct and leverage these links. Below is a script that first loads the CrunchBoard feed of current job offers (only the last 15 entries, due to common RSS limitations and practices; the use of e.g. hAtom could allow more data to be pulled in). In a second step, it uses the company name to establish a pattern join between CrunchBoard and CrunchBase, which then allows us to retrieve the list of matching jobs at (at least) stage-A companies with offices in California.
PREFIX cboard: <http://www.crunchboard.com>
ENDPOINT <http://cb.semsol.org/sparql>
# refresh feed
if (${GET.refresh}) {
 # replaced <http://feeds.feedburner.com/CrunchboardJobs> with full feed
 LOAD <http://www.crunchboard.com/rss/affiliate/crunchboardrss_all.xml>
}
# let's query
$jobs = SELECT DISTINCT ?job_link ?comp_link ?job_title ?comp_name WHERE {
  # source: crunchboard, using full feed now
  GRAPH <http://www.crunchboard.com/rss/affiliate/crunchboardrss_all.xml> {
    ?job rss:link ?job_link ;
         rss:title ?job_title ;
         cboard:company ?comp_name .
  }
  # source: full graph
  ?comp a cb:Company ;
        cb:name ?comp_name ;
        cb:crunchbase_url ?comp_link ;
        cb:office ?office ;
        cb:funding_round ?round .
  ?office cb:state_code "CA" .
  ?round cb:round_code "a" .
}
(You can test it, this really works.)

Now that we are knee-deep in SemWeb geekery anyway, we can also add another layer to all of this and
  • allow parameterized queries so that the preferred state and investment stage can be freely defined,
  • add a browser-based tool for the collaborative creation of custom API calls
  • add a template mechanism for human-friendly results
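
The first item, parameterized queries, can be sketched in a few lines. This is plain Python rather than SPARQLScript and not the mechanism used on the site; the function name and the validation patterns for state and round codes are assumptions (the post only shows "CA"/"NY" and "a" as values). The point is that user-supplied values should be checked before they are placed into a query string.

```python
import re
from string import Template

# Company part of the job query from above, with two placeholders.
COMPANY_QUERY = Template("""SELECT DISTINCT ?comp_link ?comp_name WHERE {
  ?comp a cb:Company ;
        cb:name ?comp_name ;
        cb:crunchbase_url ?comp_link ;
        cb:office ?office ;
        cb:funding_round ?round .
  ?office cb:state_code "$state" .
  ?round cb:round_code "$round" .
}""")


def build_company_query(state, round_code):
    """Fill in the placeholders, rejecting anything that is not a plain code."""
    # Assumed formats: two uppercase letters for states, lowercase words for rounds.
    if not re.fullmatch(r"[A-Z]{2}", state):
        raise ValueError(f"unexpected state code: {state!r}")
    if not re.fullmatch(r"[a-z_]+", round_code):
        raise ValueError(f"unexpected round code: {round_code!r}")
    return COMPANY_QUERY.substitute(state=state, round=round_code)
```

`build_company_query("NY", "a")` returns the query with New York and round "a" filled in, while a value containing quotes or braces raises a `ValueError` instead of silently changing the query.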

I'll write about this "Pimp My API" app at Semantic CrunchBase in the next post. Here are some example API calls that were already created with it:
A lot of fun, more to come.

"Online Social Graph Consolidation" webinale Slides

Slides from my 2nd webinale 08 talk are online
I gave another talk at webinale2008; this one was about how SemWeb technology (XFN, RDF, FOAF, SPARQL, Inference) can help with the aggregation, integration, and consolidation of online social graph fragments spread across Web 2.0 services. Again, I tried to keep things demo-ish (using grawiki for Linked Data editing, and knowee for the integration and consolidation), so the slides themselves (available on slideshare) aren't too spectacular (and in German).

Got some SemWeb DOAP 'n' FOAF?

Starting to collect RDF descriptions of SemWeb projects at rdfer.com
All baby steps, but I've activated a DOAP editor, an RDF/XML loader, and a basic browser store dump at RDFer.com. Would be great to get some DOAP files describing SemWeb projects in there, and maybe some FOAF files as well. That'd make coding the browsers more fun and a bit more real-world-ish.
Thanks for your help!

Merry X-Mas

FOAF and Snow
FOAF in the snow
See you after the snow ;)

Term Shopping for Trackbacks and Projects

Describing trackbacks and projects with basic vocabularies.
I already mentioned the nice HTTP vocab I'm using to describe page views in RDF. I had to add some custom properties to cover things like visits and access hosts, but the main part of the statistics module is built on top of the W3C vocab. The more I work with RDF, the less I feel comfortable with homegrown terms (although they can be handy for prototyping) and thus spend quite some time on the vocabulary market. Here are two other use cases I was gladly able to model with existing vocabs.

Trackbacks

I wanted to add support for incoming trackbacks to SemSol's blog module. Trackbacks consist of 4 parameters:
  • title (title of the remote post)
  • excerpt (excerpt of the remote post)
  • url (permalink of the remote post)
  • blog_name (name of the remote blog)
Additional local information:
  • date/time of the trackback (i.e. now)
  • permalink of the local post (derived from the trackback URL)
After a fruitful IRC chat with John "SIOC" Breslin, I'm now using (something similar to) the following code:
<$url> a rss:item ;
       an:annotates <$permalink> ;
       dc:title "$title" ;
       dc:description "$excerpt" ;
       dc:date "$now" ;
       dc:source [ dc:title "$blog_name"] .
I could have used rss:description instead of Dublin Core's dc:description, but thought the structure could more easily be extended to local comments this way. Anyway, as you can see, trackbacks can nicely be described with DC, Annotea, and RSS 1.0.
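
Since the trackback parameters come from remote sites, the placeholders in the template above need escaping before they end up in a Turtle string. Here is a hedged Python sketch of how that could look. It is not SemSol's code; the function names are made up, the prefix declarations are left out as in the snippet above, and URLs are inserted as-is without validation.

```python
def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping characters that would break it."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def trackback_to_turtle(url, permalink, title, excerpt, blog_name, now):
    """Render the six trackback values in the structure shown above."""
    lines = [
        f"<{url}> a rss:item ;",
        f"    an:annotates <{permalink}> ;",
        f"    dc:title {turtle_literal(title)} ;",
        f"    dc:description {turtle_literal(excerpt)} ;",
        f"    dc:date {turtle_literal(now)} ;",
        f"    dc:source [ dc:title {turtle_literal(blog_name)} ] .",
    ]
    return "\n".join(lines)
```

A title such as `Say "hi"` then comes out as `"Say \"hi\""` instead of terminating the literal early.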

Projects, Tools, Applications

The second use case comes from RDFer.com where I'd like to make some of the project and tools data collected during 2005 available. Additionally, I want to provide easy editing forms to let members describe and annotate RDF software. For SemanticWeb.org, we invented an swo:Application class to separate (developer) tools from (end-user) apps. But while analyzing the dataset, I saw that there are additional resource types which fit under the generic "project" concept, e.g. lists or data dumps. I was already in the middle of making up a whole bunch of classes when I remembered an earlier DCMI discussion about the negligible difference between dc:type and rdf:type which referred to DCMI Type definitions. Long story short, DC Types (dctype) combined with FOAF (foaf), DOAP (doap), and the DAML Tool vocab (tool) can be used to describe a whole range of resources:
  • general projects (foaf:Project)
  • software projects (doap:Project, which covers non-OS software as well)
  • resource collections (dctype:Collection, dctype:Dataset)
  • software products (dctype:Software or dctype:InteractiveResource, these could be used to e.g. attach tool:price properties which would perhaps look a bit odd on projects)
  • tools (tool:Tool)
  • online services (dctype:Service)
Something like dct:isPartOf could perhaps even be used to model sub-projects, but I'm not 100% sure.

Bottom line, again: no need for new terms, it's (often) all there already.

Proposals: a new RDF collection and an aluminium edition for FOAF

Procrastination
Nothing special to report from my side; I just thought I should post something at least once a month. I'm still working on end-user-friendly RDF annotators and a SKOS editor, and have started generalizing my RDF store API in order to eventually turn ARC into a complete RDF toolkit. CONFOTO is going to be upgraded as well.

FOAF- alu edition

But, of course, no plan without attractive hooks for distraction: The new CONFOTO server came with a merchandise shop, so I re-activated the 3D tool I used for the SemanticWeb.org banner this weekend and tried to design a Geek-Shirt for the upcoming SemWeb events I'm going to attend (Semantic Web Days in Munich, and ISWC in Galway). I think my shop is only available in German; maybe I should have a look at cafepress as well. And there's still this foaflets scene; anyone interested in making a shirt out of it? (Hm, does the FOAF project have a foaf:tipjar we could use for stuff like that?) In any case, there's a free T-shirt for the first person to add David Hasselhoff or another ex-star to the FOAF aluminium (FOAF Lite, ya know) edition.

FOAFlets of the Caribbean

3D FOAFlets created with Bryce5.
While waiting for the US election results coming in last night, I couldn't really concentrate on programming. So I re-arranged my todo list for the semanticweb.org project a little bit and started playing around with Bryce5, a 3D renderer for non-3D-people. I'm thinking about using it for the generation of head graphics for some of the portal's editing tools, or for the site logo. Building the test-FOAFlets was really easy. A fun tool.

foaflets of the caribbean
