finally a bnode with a uri

Posts tagged with: knowee

Basic multi-server LAMP hosting in the GoGrid Cloud

How to set up a multi-server PHP/MySQL app using gogrid.com
When it comes to server administration, I'm one of those rather incompetent people who are used to running their apps on shared hosts, with FTP as the main deployment tool. This actually worked quite nicely for many years, but in recent months I've slowly moved into areas where I need more powerful setups. I develop Semantic Web applications in pure PHP, so I need support for long-running PHP background processes that pull in and process data from distributed sources. For improved performance (and scalability), I've built a CMS that lets me separate application servers from RDF stores and spread the RDF stores across multiple MySQL servers.

With cloud/utility computing increasingly marketed as ready for the masses, I started testing the various offerings, looking for a solution that would be (almost) as easy to manage as a shared host. After experiments with Amazon EC2 (way too complicated for me, at least back then), Mosso (great-looking UI, but unfortunately only available to US residents), Flexiscale (I found it unusable UI-wise), and Media Temple gs (great UI, but the grid was entirely unstable: every fifth request returned a 5xx error), I came across GoGrid. They offer a simple control panel for managing servers and load balancers, fair pricing, and various support channels (with Michael Sheehan doing a great job on Twitter). With a bit of help from The Google, I managed to get everything up and running "in the Cloud".

I have two apps running on GoGrid servers now and haven't noticed any downtime (apart from self-caused outages): paggr doesn't have much traffic yet as it's still in private alpha, but its server has been running for 3 months now without interruption. The second application is Knowee, a SemWeb system with about 500 bots running as PHP background processes and feeding data into about 200 RDF stores spread across a few servers.

This post describes how to set up a similar GoGrid system, including a Load Balancer, a main PHP App Server, and two MySQL Database Servers. I hope the hints are useful for others, too. (Note: This post is not about horizontal MySQL scaling or replication, the two DBs contain independent data).

Setting up the servers

Activating the Load Balancer, an App Server, and the 2 DB Servers is easy. Just click on "add" in the GoGrid control panel, specify RAM, OS, an IP (from the list of IPs assigned to your account) and an Image:
  • App Server: RAM: 0.5-2 GB / OS: 64-bit CentOS / Image: Apache + PHP 5
  • DB Servers: RAM: 0.5-2 GB / OS: 64-bit CentOS / Image: MySQL 5
  • Load Balancer: Type: Round Robin / Persistence: Source Address / Virtual IP: pick one of the available IPs / Real IP(s): the IP of your App Server
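GoGrid handles the balancing itself. To illustrate what the "Round Robin" type combined with "Source Address" persistence means conceptually, here is a minimal Python sketch; it is not GoGrid's implementation, and the class name and IP addresses are hypothetical placeholders.

```python
import itertools


class RoundRobinBalancer:
    """Conceptual sketch: round-robin with source-address persistence.

    Hypothetical class for illustration, not GoGrid's implementation.
    """

    def __init__(self, real_ips):
        if not real_ips:
            raise ValueError("at least one real IP is required")
        self._next_ip = itertools.cycle(real_ips)  # round-robin order
        self._sticky = {}  # client (source) address -> assigned real IP

    def pick(self, client_ip):
        # New clients get the next App Server in turn; returning clients
        # stick to the server they were assigned the first time.
        if client_ip not in self._sticky:
            self._sticky[client_ip] = next(self._next_ip)
        return self._sticky[client_ip]


# Usage with placeholder addresses:
lb = RoundRobinBalancer(["10.0.0.11", "10.0.0.12"])
print(lb.pick("203.0.113.5"))   # 10.0.0.11
print(lb.pick("198.51.100.7"))  # 10.0.0.12
print(lb.pick("203.0.113.5"))   # 10.0.0.11 (same client, same server)
```

Persistence presumably matters most once you add a second App Server: PHP stores sessions in local files by default, so a returning visitor should keep hitting the same machine.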
GoGrid server setup

Save the settings and start the servers (right-click + "start"). One thing that I find a little annoying, and that will hopefully be improved soon, is that you can't change the settings of a server once it is deployed, e.g. to increase RAM. A stopped server also continues to be billed (apart from traffic). You always have to delete and rebuild servers for changes or temporary down-scaling.

When you're done, your setup should look like this:
GoGrid setup
I'd suggest adding a Load Balancer even if you only have a single App Server. This way you can experiment with different App Servers without having to change the main public IP. And GoGrid's Load Balancing is free!

DNS

There is a page in the GoGrid Help pages about setting up DNS, but if you are maintaining your domains at an external provider, you can simply point the domain at the Load Balancer's IP and things will just work.

App Server: PHP/MySQL setup

For some reason, the server images (even the LAMP ones) don't come with the MySQL client libraries, so we have to install them first. Luckily, this is simple on CentOS. Get the root password by right-clicking on the server in the GoGrid control panel, then SSH into your App Server (using the Terminal on a Mac or a tool like PuTTY on Windows):
ssh root@server.ip.address.here

When you're logged in, install PHP's MySQL support via yum and follow the prompts:
yum install php-mysql

App Server: Optional php.ini tweaks

Should you want to change PHP settings, the php.ini is located at /etc/php.ini. I usually tweak max_execution_time and memory_limit.
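Editing /etc/php.ini by hand works fine. If you set up several App Servers, though, a small script can apply the same values each time. A minimal Python sketch, assuming simple `key = value` lines; the function name and the example values are hypothetical, so pick limits that suit your app.

```python
import re

# Matches an active (not ;-commented) directive such as "memory_limit = 128M".
_DIRECTIVE = re.compile(r"\s*([A-Za-z0-9_.]+)\s*=")


def set_ini_directives(ini_text, settings):
    """Return ini_text with each directive in `settings` set to the given value.

    Existing active lines are rewritten in place; directives that are not
    present yet are appended at the end. Commented-out lines are left alone.
    """
    found = set()
    out = []
    for line in ini_text.splitlines():
        match = _DIRECTIVE.match(line)
        if match and match.group(1) in settings:
            key = match.group(1)
            out.append(f"{key} = {settings[key]}")
            found.add(key)
        else:
            out.append(line)
    out.extend(f"{k} = {v}" for k, v in settings.items() if k not in found)
    return "\n".join(out) + "\n"


# Usage (run as root on the App Server, then restart Apache):
# path = "/etc/php.ini"
# with open(path) as f:
#     text = f.read()
# with open(path, "w") as f:
#     f.write(set_ini_directives(text, {"max_execution_time": "60",
#                                      "memory_limit": "256M"}))
```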

App Server: httpd.conf tweaks

You'll find the Apache configuration at /etc/httpd/conf/httpd.conf. Here we have to set at least the ServerName to the site's domain. You may also want to enable .htaccess files in certain directories or disable directory browsing. When you're done, restart Apache:
service httpd restart

DB Servers: MySQL user and database setup

The default MySQL setup provides unprotected access to the database server, so the first thing we have to define is a root password. SSH into the DB server and log into MySQL (This should work without a password, i.e. don't append "-p"):
mysql -u root
Now set the password for root:
UPDATE mysql.user SET Password = PASSWORD('your password here') WHERE User = 'root';
Create the database for your app:
CREATE DATABASE db_name_here;
Create a user account for your DB and make MySQL accept requests from the App Server:
GRANT ALL PRIVILEGES ON db_name_here.* TO "db_user_here"@'app.server.ip.here'
IDENTIFIED BY "db_user_password_here";
The username and password can be freely defined. If you have multiple App Servers that should be able to connect to MySQL, you can define an IP pattern instead, for example app.server.ip.%, or an IP range, or use a domain name (See the MySQL docs for access options).

(By the way, if we were using a local MySQL server, with PHP and MySQL running on the same machine, the command would have been almost identical, we'd just have used "localhost" instead of the App Server's IP.)

Flush the privileges or restart MySQL, then leave the MySQL interface:
FLUSH PRIVILEGES;
exit
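The host part of the GRANT statement ('app.server.ip.here', or a pattern like app.server.ip.%) uses the same wildcards as MySQL's LIKE operator: `%` stands for any sequence of characters, `_` for exactly one. The Python sketch below approximates that matching so you can check which App Server addresses a pattern would admit. It is a simplification (no escaping, no IP/netmask syntax, no host name resolution), and the function name is hypothetical.

```python
import re


def mysql_host_matches(pattern, client_host):
    """Approximate MySQL account-host matching ('%' = any run, '_' = one char).

    Simplified sketch: ignores escaping, IP/netmask syntax, and DNS lookups.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, client_host, flags=re.IGNORECASE) is not None


print(mysql_host_matches("192.168.1.%", "192.168.1.20"))          # True
print(mysql_host_matches("192.168.1.%", "192.168.2.20"))          # False
print(mysql_host_matches("%.yoursite.com", "app1.yoursite.com"))  # True
```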

DB Servers: Enabling remote access

We are using dedicated MySQL servers in our setup. In order to connect to them from the PHP App Server, we have to enable remote access. There is a detailed how-to in the GoGrid Knowledge Base, but you basically just have to comment out the socket=... line in /etc/my.cnf and restart MySQL:
service mysqld restart
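Before installing the app, it can help to check from the App Server that the DB server now accepts TCP connections on MySQL's default port 3306. A minimal Python sketch; it only shows that the port is reachable, not that your GRANT works, and the function name and host name are placeholders.

```python
import socket


def port_open(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False


# Usage on the App Server (placeholder host name):
# print(port_open("db1.yoursite.com"))
```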

Done

You can now install your app and connect to MySQL from your App Server using the usual PHP commands and the DB servers' IP as MySQL host (instead of the usual "localhost").

In case of "Lost Connection"s

If you notice frequently lost MySQL connections, this might be related to a bug in the MySQL 5.0.x versions. I found a couple of forum posts suggesting the use of hostnames instead of IPs to connect to the MySQL server, which indeed solved the problem for me (I also had some broken bots not closing connections, but that's another story ;). You first have to assign domain names to your DB servers (e.g. db1.yoursite.com), then you can set the host parameter of mysql_connect() to this domain.


Bottom Line

You do need some terminal hacking to run your sites in the GoGrid Cloud, but I don't think it's more work than configuring a dedicated host. After getting used to the few required shell commands, I'm now able to activate additional servers in a few minutes. Compared to other services, I found GoGrid to be a very efficient and cost-effective solution for setting up and deploying a multi-server app environment.

</lobhudelei>

Knowee - (The beginning of) a semantic social web address book

Knowee is a web address book that lets you integrate distributed social graph fragments. A new version is online at knowee.net.
Heh, this was planned as a one-week hack but somehow turned into a full rewrite that took all of December. Yesterday, I finally managed to tame the semantic bot army, and today I've added a basic RDF editor. A sponsored version is now online at knowee.net; a code bundle for self-hosting will be made available at knowee.org tomorrow.

What is Knowee?

Knowee started as a SWEO project. Given the insane number of online social networks we have all joined, together with the increasing amount of machine-readable "social data" sources, we dreamed of a distributed address book where the owner doesn't have to maintain contact data manually, but instead simply subscribes to remote sources. The address book could then update itself automatically. And, in full SemWeb spirit, you'd get access to your consolidated social graph for re-purposing. There are several open-source projects in this area, most notably NoseRub and DiSo. Knowee is aiming at interoperability with these solutions.
knowee concept

Ingredients

For a webby address book, we need to pick some data formats, vocabularies, data exchange mechanisms, and the general app infrastructure:
  • PHP + MySQL: Knowee is based on the ubiquitous LAMP stack. It tries to keep things simple, you don't need system-level access for third-party components or cron jobs.
  • RDF: Knowee utilizes the Resource Description Framework. RDF gives us a very simple model (triples), lots of different formats (JSON, HTML, XML, ...), and free, low-cost extensibility.
  • FOAF, OpenSocial, microformats, Feeds: FOAF is the leading RDF vocabulary for social information. Feeds (RSS, Atom) are the lowest common denominator for exchanging non-static information. OpenSocial and microformats are more than just schemas, but the respective communities maintain very handy term sets, too. Knowee uses equivalent representations in RDF.
  • SPARQL: SPARQL is the W3C-recommended Query language and API for the Semantic Web.
  • OpenID: OpenID addresses Identity and Authentication requirements.
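To make the "simple model (triples)" and SPARQL ingredients a bit more concrete, here is a toy Python sketch of what a SPARQL basic graph pattern does: match triple patterns containing ?variables against a set of triples and join the resulting bindings. This is not how Knowee's PHP/MySQL-based stores are implemented; the function names and the ex: identifiers are hypothetical.

```python
def match_pattern(triples, pattern):
    """Yield one binding dict per triple matching `pattern` ('?x' = variable)."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join several triple patterns, like a SPARQL WHERE { ... } block."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables already bound by earlier patterns.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match_pattern(triples, bound):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


triples = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
}
# Roughly: SELECT * WHERE { ex:alice foaf:knows ?friend . ?friend foaf:name ?name }
print(query(triples, [("ex:alice", "foaf:knows", "?friend"),
                      ("?friend", "foaf:name", "?name")]))
# [{'?friend': 'ex:bob', '?name': 'Bob'}]
```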
I'm still working on a solution for access control; the current Knowee version is limited to public data and simple, password-based access restrictions. OAuth is surely worth a look, although Knowee's use case is a little different and may be fine with just OpenID + sessions. Another option could be the impressive FOAF+SSL proposal, though I'm not sure whether they'll manage to provide a pure-PHP implementation for non-SSL-enabled hosts.

Features / Getting Started

This is a quick walk-through to introduce the current version.
Login / Signup
Log in with your (ideally non-XRDS) OpenID and pick a user name.

knowee login

Account setup
Knowee only supports a few services so far. Adding new ones is not hard, though. You can enable the SG API to auto-discover additional accounts. Hit "Proceed" when you're done.

knowee accounts

Profile setup
You can specify whether to make (parts of) your consolidated profile public. During the initial setup process, this screen will be almost empty; you can check back later when the semantic bots have done their job. Hit "Proceed".

knowee profile

Dashboard
The Dashboard shows your personal activity stream (later versions may include your contacts' activities, too), system information and a couple of shortcuts.
knowee dashboard

Contacts
The contact editor is still work in progress. So far, you can filter the list, add new entries, and edit existing contacts. The RDF editor is still pretty basic (Changes will be saved to a separate RDF graph, but deleted/changed fields may re-appear after synchronization. This needs more work.) The editor is schema-based and supports the vocabularies mentioned above. You'll be able to create your own fields at some later stage.

It's already possible to import FOAF profiles. Knowee will try to consolidate imported contacts so that you can add data from multiple sources, but then edit the information via a single form. The bot processor is extensible, so we'll be able to add further consolidators at run-time; at the moment, it only looks at "owl:sameAs".
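Since owl:sameAs is symmetric and transitive, consolidating means grouping every resource that is connected through a chain of sameAs links. A minimal Python sketch using a union-find structure; this is a generic technique chosen for illustration, not Knowee's bot processor, and the function name and URIs are hypothetical.

```python
def consolidate(same_as_pairs):
    """Group resource IDs linked (directly or transitively) by owl:sameAs."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in same_as_pairs:
        union(a, b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


pairs = [
    ("http://example.org/foaf.rdf#bob", "http://social.example/people/bob"),
    ("http://social.example/people/bob", "http://other.example/bob#me"),
    ("http://example.org/foaf.rdf#carol", "http://social.example/people/carol"),
]
for group in consolidate(pairs):
    print(sorted(group))
```

Each resulting group could then be presented as one contact in the single edit form, with one member picked as the canonical resource.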
knowee contacts

Enabling the SPARQL API
In the "Settings" section you'll find a form that lets you activate a personal SPARQL API. You can enable/protect read and/or write operations. The SPARQL endpoint provides low-level access to all your data, allows you to explore your social graph, or lets you create backups of your activity stream.
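Separate read and write switches imply that the endpoint has to tell queries from updates before executing them. A rough Python sketch that skips PREFIX/BASE declarations and looks at the first keyword. The write keywords are an assumption modeled on SPARQL Update-style operations, the function names are hypothetical, and a real endpoint should rely on its SPARQL parser rather than a keyword check (for example, leading `#` comments are not handled here).

```python
import re

READ_FORMS = {"SELECT", "CONSTRUCT", "ASK", "DESCRIBE"}
# Assumption: update-style keywords; adjust to what your store supports.
WRITE_FORMS = {"INSERT", "DELETE", "LOAD", "CLEAR", "CREATE", "DROP"}

# Skips leading whitespace plus any PREFIX/BASE declarations.
_PROLOGUE = re.compile(
    r"\s*(?:(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)\s*)*",
    re.IGNORECASE,
)


def query_form(query):
    """Return the upper-cased first keyword after the prologue."""
    rest = query[_PROLOGUE.match(query).end():]
    keyword = re.match(r"[A-Za-z]+", rest)
    return keyword.group(0).upper() if keyword else ""


def is_allowed(query, read_enabled, write_enabled):
    """Gate a request by the endpoint's read/write switches."""
    form = query_form(query)
    if form in READ_FORMS:
        return read_enabled
    if form in WRITE_FORMS:
        return write_enabled
    return False  # unknown operations are rejected


q = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\nSELECT ?n WHERE { ?p foaf:name ?n }"
print(query_form(q))  # SELECT
print(is_allowed("INSERT INTO <g> { <a> <b> <c> }", True, False))  # False
```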

knowee api

That's more or less it for this version. You can always reset or delete your account, and manually delete incorrectly monitored graphs. The knowee.net system is running on the GoGrid cloud, but I'm still tuning things to let the underlying RDF CMS make better use of the multi-server setup. If things go wrong, blame me, not them. Caching is not fully in place yet, and I've limited the installation to 100 accounts. Give it a try, I'd be happy about feedback.

webinale 2008 starts today

I'm giving 2 talks at webinale 2008 in Karlsruhe
Still a few hours left to finish my presentations, then I'll join Germany's WebDev crowd at webinale 2008 in Karlsruhe (it's taking place at the same location as this year's ISWC). My talks are about "Semantic Web Tech 'n' Use" (mostly microformats, RDFa, SPARQL) and RDF-based "Online Social Graph Consolidation" (FOAF, XFN, SPARQLy inference, knowee etc.), and there will be more SemWeb-related talks.
A (personally) interesting thing about the webinale is its co-location with the International PHP Conference and the (new) Dynamic Languages World Europe: registering for one conference includes free access to the others. It's the perfect audience to talk about practical SemWeb scripting with ARC and PHP.

Back from SemanticCamp London

SemanticCamp London was just great.
Like many other SemWeb weavers, I followed Tom's call for last weekend's SemanticCamp London. So much fun! I used the opportunity to discuss a number of ideas I've been pondering for quite some time, and it was great to get more direct insights from microformats community members. I had the impression that the event helped bring RDFers and microformateers a little closer together, at least in the conversations I had. There was no childish "your approach is flawed/too limited/doomed to fail", and I don't think I heard a single (serious) "fundamentally". A lot of "I prefer", "I don't like", and a number of tongue-in-cheek comments, but that's cool as part of a starting dialog. Much better than the progress-blocking arrogance we've seen for much too long (in both camps, btw).

I tried to substantiate this "common goal, complementary tech" notion with two little interactive demos and a tech pitch:
  • On Saturday we created SPARQLBot, an IRC bot based on ARC/Trice that aggregates XFN, hCard, and FOAF data and lets you explore your "online social graph" with simple IRC commands. SPARQLBot is a nice example of how the huge amount of high-quality microformats data can be combined with RDF technologies such as flexible storage and simple querying. (And the fact that it only took a few hours to implement a working demo also shows how SemWeb technologies can significantly improve Web app development.)
  • I pulled an all-nighter from Sat to Sun and managed to demo the knowee beta on Sunday. There are still a few bugs to fix, but an official announcement should come soon now. knowee allows you to consolidate portable social network data (XFN, hCard, FOAF, feeds) and to manage the collected information via a freebase-like hyperdata editor.
  • The third thing is what might be called "micrordf". I didn't run a session, but discussed the idea with a couple of people and think it's worthwhile pursuing. Although certain RDF solutions could be really handy for the µF community, there are a couple of things that are considered deal breakers. Among those are the namespace prefix mechanism (esp. in any of the current RDF-in-HTML proposals, where non-predictable prefixes break reliable self-containment and CSS styling) and the need to map HTML-encoded information to non-identical and unstable RDF Schemas. What I was trying to figure out during SemanticCamp was the possibility of creating a simplified, but still RDF-compatible mechanism that would be acceptable to microformateers. It's essentially a simple, intermediate structure to represent any microformat (no need for a different syntax), and possibly also POSH data. What that would bring to the microformats community is the ability to auto-create universal parsers, a unified mofo-style API, and a proper test suite, which still seems to be lacking. The RDF crowd would get a way to access microformats as resource descriptions, with the ability to map those to their RDF vocab of choice. It could perhaps even be possible to auto-generate GRDDL XSLTs from the micrordf definitions. More on this soon.

As Yves put it already: Yay for SemanticCamp!

RDF Tools - An RDF Store for WordPress

The ARC WordPress Extension adds an RDF Store to the WordPress Blogging System
Together with Morten Frederiksen and Dan Brickley (who is revisiting his SparqlPress idea), I've created a WordPress extension (called "RDF Tools") that adds an (ARC-based) RDF Store and SPARQL Endpoint to the blogging system. The store is kept separate from the WP tables (i.e. it's not a wrapper), but you can use WP's nice admin screens to configure it (screenshot), and given the amount of developer-friendly hooks that WP offers, I'm curious what can be done now, possibly in combination with other extensions such as those Alexandre Passant is working on. It could perhaps also be handy as a deployment accelerator for knowee.

DriftR Linked Data Browser and Editor (Screencast)

A screencast of DriftR, an RDF browser/editor for Trice
While I'm unfortunately struggling to find paid projects these days, I had at least some time to work on core technology for my Trice framework and a new knowee release. The latest module is an in-browser RDF viewer and editor for Linked Data, heavily inspired by the freebase UI (hopefully with less screen flickering, though).

I'm clearly not there yet, but today I uploaded a screencast (quicktime 4MB), and I think I can start incorporating it into the knowee tools soon. Have fun watching it if you like, and Merry X-Mas!

DriftR Screencast

knowee prototype v0.1.0

The first knowee bundle is available, and we've set up a demo system.
Details (well, sort of) are available at knowee.org.
knowee profile

ARC2 Progress

Getting closer to a release date for ARC2
OK, I met this week's 2nd deadline and finished ARC2's SPARQL test suite report. Pass/Fail results as of today: 317/67 (Sept. 22nd: 352/84). That's a huge step forward compared to ARC1, so I'm quite happy.

Next actions: Making the knowee prototype public (deadline missed, boo!), and relaunching the ARC site, together with proper community tools and the new release.

Slowly resurfacing for more SWEOing

Resurfacing from ARC2 and Trice coding for more SWEO work
After two months of spec implementation, I'm finally getting back to the more interesting stuff. I'm not fully on schedule, but I could at least meet the first of this week's three deadlines: I presented a first knowee proof of concept at yesterday's webmontag, and feedback was positive. Deadline #3 is a working prototype by this Wednesday (promised to SWEO), but I'm not sure I'll be able to deliver. We are close, but there is also deadline #2 lurking: the DAWG implementation reports are due today, and I'm still working on mine for ARC2...

Nevertheless, webmontag was really great again. Had an interesting chat with mixxt's Oliver Ueberholz about the practical problems of adding social data export to SNSs. It seems that microformats are not always the obvious answer when the public export of machine-readable profile information is meant to be implemented as a user option, or when you want to be able to block certain bots from crawling your networks. They are thinking about external files now and wonder if RDF might be an option. Keeping the template code clean, and the ability to serve content for "online social graph aggregators" like knowee from separate machines are two potential benefits. At least the "hidden information is not maintained" argument is moot in their case, as the data is auto-generated anyway.

Last week I had lunch with Alexander Linden, the guy who used to position Semantic Web on the Gartner Hype Cycles. He left Gartner for his own venture (HumanGrid), a crowdsourcing platform. Surprisingly, they are not using SemWeb technology directly, but he said that their solution could be very helpful to generate and quality-improve RDF instance data.

We also talked a bit about SemWeb startup funding, and despite Gartner's latest Hype Cycle, which put SemWeb into the trough of disillusionment for the next 10(!) years, venture capital invested in semantic technology companies is apparently increasing. At least if you are in the US, that is. In Germany, a lot of money still seems to vanish in dodgy projects like smartweb. I hope that theseus is going to have more practical outcomes. They are going to run a competition for non-partners, that's a step in the right direction.

Related to startups and their technology choices is a concern about the lack of end-user Semantic Web applications that demonstrate the utility of RDF. A Semantic Web is going to be one of the Next Big Things, but that doesn't necessarily mean that it'll be built with W3C technologies. The only big-potential (US) startup with an RDF infrastructure, for example, is generating so much hype that they are doomed to disappoint, no matter what they launch (if they ever do). Maybe RDFers should hurry up a little if they want to help avoid a possible backlash. I will, at least.

Alexander said the RDF stack has always been rather tough to sell (especially OWL), and identified some strategies that the SWEO group could focus on during the next couple of months:
  • Admit that the full technology framework is not trivial, it's web-scale information integration after all. If you present it to newbies, always present a consumable subset only, not the full thing (Uh, I'm guilty).
  • Organise more local meetings, BarCamp-style, open to people with related interests (i.e. not-yet-semweb developers)
  • Provide convincing solutions that clearly show how RDF saves money and/or time, or increases productivity in a way that no alternative technology can. CEOs are just one group; a new technology has to attract developers, because they decide how much friction they are willing to accept before they get at the benefits of a new technology. (SWEO is already building a collection of success stories, and the Community Projects address these points, too, I think.)
  • Something to download and play with for those with initial interest (that's basically Danny's Semantic Web in a box suggestion)
  • Public datasets (Yay LOD project)
An additional suggestion I heard yesterday was "Non-technical Marketing". And that's something SWEO is spending quite some time on, too. (The W3C comm team is actually coming up with a full SemWeb branding strategy soon.) And to cite Dan Brickley:
16:37:57 [danbri] best thing we ever did, was make those tshirts!

So, it seems the SWEO activities are moving in the right direction, but it'd be great to get more ideas. What do you think is still missing or should get a high priority?

knowee.org

the knowee.org site is online, next step will be a prototype
Just a short update on knowee, one of the SWEO Community Projects. There is nice progress, although it took some time to get things moving. An early site is now online, and we have a first design for the app.

I still have to flesh out knowee's approach to "social graph portability" (or whatever it's called this week), but then I'll focus on the prototype, which will hopefully be available by mid/end of September.

Back from webinale 2007

slides and some impressions
The webinale slides are online now. The session went OK, I'd say. I always make the mistake of looking at the high conference prices and then end up trying to squeeze too much information into my talks to give people some value for their money. It was also a bit hard to predict what the audience of the newly introduced webinale would be like. I did receive some great feedback from PHP coders (sneaking in from the co-located IPC) who already had specific questions and asked about RAP and ARC. But I could see from many faces right after the session that a very basic talk might have been better. Leo suggested skipping the ontology stuff entirely; the number of different flavours (SKOS, RDF Schema, OWL Lite/DL/Full/+/-/1.1) is surely a whole mess marketing-wise. Next time I'll try to stick to the more intuitive stuff. At least I had a convincing demo of how (low-level) ontologies can be useful to greatly reduce custom application code.

I had a short chat with pageflakes' CEO Christoph Janz. Semantic Web technologies are not on their radar yet (maybe they are now ;), but we talked a bit about the possibility of adding some RDF functionality to their widgets (which they call "flakes"). They may let us try some things in the context of the knowee project, e.g. a flake that could store contact data retrieved via GRDDL or a SPARQL endpoint. Might be worth checking out their SDK.

So, next time: less OWL, more wild colours:
semweb web 2.0 layers

SWEO project "knowee"

Call for participation
I finally sent out a call for participation for knowee, one of the projects supported by SWEO (just in time for the F2F reports tomorrow).

The project is about creating a semwebby address book thingy, but there actually is another dimension to the "outreach" aspect beyond running code. I'd really like to bring RDFers and microformateers closer together (from both directions). RDFers can learn a lot from the pragmatic microformats community, and adding data integration (+query) functionality to microformats can enable a whole new set of applications.

Funded!

semsol gets funding
This is going to change everything. Well, almost. I will continue to work on my Semantic Web solutions, but there will be a major re-branding and finally a focused roadmap. My code experiments and projects are going to be critically reviewed and consolidated. (I can't tell yet what stuff is going to be continued, but I'll keep my SWEO commitments, esp. the knowee community project which is going to start in April).
Quite some orga action coming up, but I'm looking forward to a clean bengee.reboot()
  • I'll move from Essen to Düsseldorf, which is closer to Cologne, the DUS airport, and also a little away from the Web periphery here, with the Ruhr Valley still in reach, though.
  • The appmosphere wordplay is going to be discontinued. No German really managed to pronounce or remember it correctly, and the *-osphere naming is rather overused these days anyway.
  • The new brand will most probably be semsol.com, which is going to be transformed into a Semantic Web Agency. (I've always been a frontend developer; combining this with an in-house RDF system will hopefully form a nice USP for the anticipated move towards info-driven Web apps.)
  • The open source RDF framework currently named semsol will get a new name (perhaps just "semsol suite", we'll see), and there will be more product-style solutions (a browser, an editor, a schema manager, etc.).
  • ARC will keep its name, but is going to be re-coded as ARC2 based on the experience and feedback obtained so far.
  • Less research-y slippery slopes.
  • More Germany-targeted activities.
semsol

SWEO Community Project Task Force

Trying to gather programmers already interested in semweb technology around a few projects.
Kjetil Kjernsmo has initiated a new Semantic Web Education and Outreach Interest Group Task Force called "Community Projects". A great idea.

This rally has the goal of using our collective input to generating real running code, that can help us to demonstrate the value of the Semantic Web to a wide user base. We want to encourage developers to work together to create something that will make a real difference to people's lives today

Just added a proposal for "knowee", a web-based contact organizer (a project similar to something Ivan mentioned some weeks ago, and I think also similar to the work Henry Story recently started).
