finally a bnode with a uri

Posts tagged with: mysql

New ARC version (DB changes!)

ARC revision 2009-03-04 introduces some low-level changes.
I've just uploaded a new ARC revision (rev 2009-03-04). In preparation of a Store Optimizer (which will improve the RDF store's scalability and performance), I slightly changed the underlying MySQL table structure. Please backup your data before you upgrade. Here's sample code, it's not too complicated:
/* old ARC version */
$store->createBackup($backup_path); // make sure the directory is write-enabled
// on success: $store->drop();
/* new ARC */
$store->query('LOAD <' . $backup_path . '>');
(Note: Store settings, if used, are not part of the dump and have to be manually copied over)

Main changes in this version:
  • the (crappy) inferencer was removed, I'll add a rule-based system at some later stage
  • the store indexes can be defined via a "store_indexes" config option now. Default: array('sp (s,p)', 'os (o,s)', 'po (p,o)'). The previous stores had an additional index 'spo (s,p,o)'.
  • The store got an "extendColumns" method which changes the column types from MEDIUMINT to INT. You don't have to call this method explicitly, the tables will be auto-upgraded should your store reach ARC's previous 16M triples limit.
  • There is a new "store_write_buffer" config option (default: 2500, changed from 5000). This option let's you set the batch size of triples written to the MySQL tables. In certain situations, esp.with shared hosts or large literal objects, the "5000" was too much and led to MySQL rejecting the queries.
  • the toTurtle and toRDFXML methods (and associated methods in the
    Serializers) accept a 2nd "raw" parameter now, in case you don't want a full RDF document, but just the triples. (thx to Claudia Wagner for the suggestion)

If you have questions, just send them to the mailing list and I'll try to help.

Basic multi-server LAMP hosting in the GoGrid Cloud

How to set up a multi-server PHP/MySQL app using gogrid.com
When it comes to server administration, I'm one of those rather incompetent persons who are used to running their apps on shared hosts, with FTP being the main deployment tool. This actually worked quite nicely for many years, but during the recent months I've slowly moved into areas where I need more powerful setups. I develop Semantic Web applications in pure PHP, so I need support for long-running PHP background processes to pull in and process data from distributed sources. For improved performance (and scalability), I've built a CMS that lets me separate application servers from RDF stores and allows me to spread the RDF stores across multiple MySQL servers.

With cloud/utility computing increasingly marketed as being ready for the masses, I started testing the various offerings, looking for a solution that would be (almost) as easy to manage as shared hosts. After experiments with Amazon EC2 (way too complicated for me, at least back then), Mosso (unfortunately only for US residents, the UI looks great), Flexiscale (found it unusable UI-wise), and Media Temple gs (great UI, but the grid was entirely unstable. Every 5th request was a 5xx), I came across GoGrid. They offer a simple control panel for managing servers and load balancers, fair pricing, various support channels (with Michael Sheehan doing a great job on Twitter), and with a bit of help from The Google, I managed to get everything up and running "in the Cloud".

I have two apps running on GoGrid servers now and didn't notice any downtimes (apart from self-caused ones): paggr doesn't have much traffic yet as it's still in private alpha, but the server has been running for 3 months now without interruption. The 2nd application is Knowee, a SemWeb system with about 500 bots running as PHP background processes and feeding data into about 200 RDF stores spread across a few servers.

This post describes how to set up a similar GoGrid system, including a Load Balancer, a main PHP App Server, and two MySQL Database Servers. I hope the hints are useful for others, too. (Note: This post is not about horizontal MySQL scaling or replication, the two DBs contain independent data).

Setting up the servers

Activating the Load Balancer, an App Server, and the 2 DB Servers is easy. Just click on "add" in the GoGrid control panel, specify RAM, OS, an IP (from the list of IPs assigned to your account) and an Image:
  • App Server: Ram: 0.5 - 2GB GB / OS: 64bit CentOS / Image: Apache + PHP 5
  • DB Servers: Ram: 0.5 - 2GB GB / OS: 64bit CentOS / Image: MySQL 5
  • Load Balancer: Type: Round Robin / Persistence: Source Address / Virtual IP: pick one of the available IPs / Real IP(s): the IP of your App Server
GoGrid server setup

Save the settings and start the servers (right-click + "start"). One thing that I find a little annoying and that is hopefully going to be improved soon is that you can't change the settings of a server once it is deployed, e.g. to increase RAM. Stopping a server also keeps it being billed (apart from traffic). You always have to delete and re-build servers for changes or temporary down-scaling.

When you're done, your setup should look like this:
GoGrid setup
I'd suggest adding a Load Balancer even if you only have a single App Server. This way you can experiment with different App Servers without having to change the main public IP. And GoGrid's Load Balancing is free!

DNS

There is a page in the GoGrid Help pages about setting up DNS, but if you are maintaining your domains at an external provider, you can simply point the domain at the Load Balancer's IP and things will just work.

App Server: PHP/MySQL setup

For some reason, the server images (even the LAMP ones) don't come with MySQL client libraries, so we have to install them first. Luckily, this is simple on CentOS. Get the necessary root password by right-clicking on the server in the GoGrid control panel, then SSH into your App Server (using the Terminal on a Mac or a tool like Putty on Windows).
ssh root@server.ip.address.here

When you're logged in, install PHP with MySQL support via yum and follow the instructions:
yum install php-mysql

App Server: Optional php.ini tweaks

Should you want to change PHP settings, the php.ini is located at /etc/php.ini. I usually tweak max_execution_time and memory_limit.

App Server: httpd.conf tweaks

You'll find the Apache configuration at /etc/httpd/conf/httpd.conf. Here we have to set at least the ServerName to the site's domain. You may also want to enable .htaccess files in certain directories or disable directory browsing. When you're done, restart Apache:
service httpd restart

DB Servers: MySQL user and database setup

The default MySQL setup provides unprotected access to the database server, so the first thing we have to define is a root password. SSH into the DB server and log into MySQL (This should work without a password, i.e. don't append "-p"):
mysql -u root
Now set the password for root:
UPDATE mysql.user SET Password = PASSWORD('your password here') WHERE User = 'root';
Create the database for your app:
CREATE DATABASE db_name_here;
Create a user account for your DB and make MySQL accept requests from the App Server:
GRANT ALL PRIVILEGES ON db_name_here.* TO "db_user_here"@'app.server.ip.here'
IDENTIFIED BY "db_user_password_here";
The username and password can be freely defined. If you have multiple App Servers that should be able to connect to MySQL, you can define an IP pattern instead, for example app.server.ip.%, or an IP range, or use a domain name (See the MySQL docs for access options).

(By the way, if we were using a local MySQL server, with PHP and MySQL running on the same machine, the command would have been almost identical, we'd just have used "localhost" instead of the App Server's IP.)

Flush the privileges or restart MySQL, then leave the MySQL interface:
FLUSH PRIVILEGES;
exit

DB Servers: Enabling remote access

We are using dedicated MySQL serves in our setup. In order to connect to them from the PHP App Server, we have to enable remote access. There is a detailed how-to in the GoGrid Knowledge Base, but you basically just have to comment out the socket=... line in /etc/my.cnf and restart MySQL:
service mysqld restart

Done

You can now install your app and connect to MySQL from your App Server using the usual PHP commands and the DB servers' IP as MySQL host (instead of the usual "localhost").

In case of "Lost Connection"s

If you notice frequently lost MySQL connection, this might be related to some bug in the MySQL 5.0.x versions. I found a couple of forum posts suggesting to use hostnames instead of IPs to connect to the MySQL server, which indeed solved the problem for me (I also had some broken bots not closing connections, but that's another story ;). You first have to assign domain names to your DB servers (e.g. db1.yoursite.com) and can then set the host parameter to this domain in mysql_connect().


Bottom Line

You do need some terminal hacking to run your sites in the GoGrid Cloud, but I don't think it's more work than configuring a dedicated host. After getting used to the few required shell commands, I'm now able to activate additional servers in a few minutes. Compared to other services, I found GoGrid to be a very efficient and cost-effective solution for setting up and deploying a multi-server app environment.

</lobhudelei>

New ARC features: Triggers and MySQL extensions

ARC gets Triggers and MySQL function extensions
The latest ARC revision got two new features: SPARQL Triggers and MySQL function extensions for SPARQL.

SPARQL Triggers

Triggers in ARC were suggested by Dan Brickley, who is experimenting with dynamically populated/updated Group definitions. What you can effectively do now in ARC is associating custom trigger classes with SPARQL query types, which will then be automatically called after registered query types, for example to refresh inferred Graphs:
$config = array(
  ...
  'store_triggers' => array(
    /* register LOAD triggers */
    'load' => array('updateFriendsList', 'crawlXFNLinks'),
  ),
);
$ep = ARC2::getStoreEndpoint($config);
$ep->go();

MySQL Extension Functions

Morten Frederiksen did it again. He sent in about 10 lines of code which he suggested to add to ARC's SQL rewriter. The effect? ARC suddenly has access to dozens of MySQL functions. That's CONCAT, CURDATE, MD5, UNIX_TIMESTAMP, and many more. A namespace for MySQL function URIs is now online, and queries look like this:
PREFIX mysql: <http://web-semantics.org/ns/mysql/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person WHERE { ?person foaf:givenname ?n1 ; foaf:family_name ?n2 . FILTER (mysql:concat(?n1, " ", ?n2) = "Alec Tronnick") . }
I talked a little bit more about these things with Danny Ayers in a recent podcast.

Archives/Search

YYYY or YYYY/MM
No Posts found

Feeds