authored by Frank Lynam at 08/11/2012 12:50:11
Resource Description Framework, or RDF, is a data model used to describe a particular ontology or system of knowledge. It is not the only data model out there (the relational database model, for example, has been used extensively and successfully throughout the history of digital data), but it has recently risen to prominence as one of the preferred data models for representing Linked Open Data datasets (though again not the only option; cf. OWL, which is itself typically expressed in RDF).
RDF is built upon a very simple basic structure known as the triple. The triple concept can find its roots in the tenets of structuralism and language theory in general where a sentence can be deconstructed to the simplest level of a subject, which has a relationship with an object. In RDF parlance a triple is composed of a ‘subject’, a ‘predicate’ and an ‘object’. An example would be as follows:
‘The dog’s colour is brown’
In this case the dog is the subject, the predicate is the colour attribute and the object is brown.
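To make the three-part structure concrete, here is a minimal Python sketch of the sentence above as a triple. The `Triple` type and its field names are my own illustration and not part of any RDF library; plain strings stand in for the parts at this stage.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid confusion with Python's built-in "object"


# 'The dog's colour is brown', broken into its three parts.
statement = Triple(subject="the dog", predicate="colour", obj="brown")

print(statement.subject, "->", statement.predicate, "->", statement.obj)
```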
So far then, so simple. The key contribution that RDF makes towards an ontological description, however, is its employment of Uniform Resource Identifiers or URIs as subjects, predicates or objects. These URIs can and should in most cases address resources that can be accessed from an online server. This simple idea belies a wealth of potential. By networking each of the component parts of all of your datastore’s nodes of information, your data becomes part of a much wider and interconnected web of data which we know as the Semantic Web. Here’s an example.
My hometown is Dublin.
Dublin is located in Ireland.
Ireland has a population of c. 4 million people.
People is a synonym for humans.
Humans are a type of animal.
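The chain of statements above can be sketched as a small in-memory graph. Everything under `http://example.org/` is a made-up URI for illustration (a real dataset would use resolvable ones), and the `follow` helper is hypothetical rather than part of any RDF toolkit. It simply hops from node to node along the predicates it is given.

```python
# Hypothetical URIs under example.org, used purely for illustration.
EX = "http://example.org/"

triples = [
    (EX + "me", EX + "hometown", EX + "Dublin"),
    (EX + "Dublin", EX + "locatedIn", EX + "Ireland"),
    (EX + "Ireland", EX + "population", "4000000"),  # a literal value, not a URI
    (EX + "People", EX + "synonymOf", EX + "Humans"),
    (EX + "Humans", EX + "typeOf", EX + "Animal"),
]


def follow(start, predicates, graph):
    """Walk from `start` along each predicate in turn.

    Returns the node reached at the end, or None if a hop has no match.
    """
    node = start
    for predicate in predicates:
        matches = [o for s, p, o in graph if s == node and p == predicate]
        if not matches:
            return None
        node = matches[0]
    return node


# Which country is my hometown in?
country = follow(EX + "me", [EX + "hometown", EX + "locatedIn"], triples)
print(country)
```

Because every node is a URI, nothing stops the second hop from being answered by a different server than the first; here the whole graph just happens to live in one list.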
The list of possible relationships is infinite and, interestingly, each of the parts of the triples described above might exist within different datastores on different servers. There are well-known sets of terms that are reused by many ontologies, such as Dublin Core (dc) or Friend of a Friend (foaf), and these are known as vocabularies. Most guides on the subject advise that, where possible, existing vocabulary elements should be used in order to reduce duplication. Where duplication does occur, two URIs can be declared equivalent (commonly with the owl:sameAs property). The Linked Open Vocabularies site allows ontology creators to search through a large set of existing vocabularies.
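As a rough illustration of equating two URIs, the sketch below uses the owl:sameAs property (from the OWL vocabulary) to say that a locally minted URI for Dublin refers to the same thing as the DBpedia one. The `example.org` URIs and the `aliases` and `describe` helpers are assumptions of mine; a real triple store or reasoner would handle this for you.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

triples = [
    # Our own (hypothetical) URI for Dublin is declared equal to DBpedia's.
    ("http://example.org/Dublin", OWL_SAME_AS, "http://dbpedia.org/resource/Dublin"),
    ("http://example.org/Dublin", "http://example.org/locatedIn", "http://example.org/Ireland"),
]


def aliases(uri, graph):
    """Return every URI linked to `uri` by a chain of owl:sameAs statements, itself included."""
    found = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p != OWL_SAME_AS:
                continue
            # sameAs is symmetric, so follow the link in both directions.
            for a, b in ((s, o), (o, s)):
                if a == current and b not in found:
                    found.add(b)
                    frontier.append(b)
    return found


def describe(uri, graph):
    """List (predicate, object) pairs stated about `uri` or any of its aliases."""
    names = aliases(uri, graph)
    return [(p, o) for s, p, o in graph if s in names and p != OWL_SAME_AS]


# Asking about the DBpedia URI also surfaces facts recorded against our own URI.
facts = describe("http://dbpedia.org/resource/Dublin", triples)
print(facts)
```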
How then does one go about creating an RDF store? Which are the best available RDF hosting solutions? By ‘best’ I mean servers that can host RDF data in a number of different formats securely while also providing access to the data through standardised and robust interfaces. This latter requirement introduces the SPARQL Protocol and RDF Query Language (or SPARQL, a humorously recursive acronym). As with RDF, SPARQL is a creation of the W3C, but while RDF is concerned with how data is structured, SPARQL’s function is to query RDF datasets. Version 1.0 of SPARQL is currently an official W3C Recommendation, although a number of SPARQL implementers now support later extensions to the specification (the SPARQL 1.1 drafts). Most notably, these extensions allow users to add data to an RDF datastore, something that was absent from the original specification.
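For a feel of what a SPARQL query looks like on the wire, here is a minimal sketch using only the Python standard library. Under the SPARQL protocol a query can be sent as an HTTP GET with the query text in a `query` parameter. The endpoint URL below is a placeholder of my own, and the code only builds the request rather than sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint; substitute the address of a real SPARQL server.
ENDPOINT = "http://localhost:8080/sparql/"

# Fetch any ten triples from the store.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_query_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the SPARQL JSON results format; servers may offer others.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_query_request(ENDPOINT, QUERY)
print(request.full_url)
# To actually run the query: urllib.request.urlopen(request).read()
```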
Having done some online research into RDF + SPARQL server providers, I came upon the 4store solution. I decided to evaluate this server in spite of a native dislike for Linux-based OSes, which is all that 4store supports. With a bit (read: a very large quantity) of help from those in the know, I now have the 4store server running on an Ubuntu instance hosted with Amazon’s EC2 service. I have been keen to try out EC2 for a while now, not least because I have been entirely incapable of finding out exactly how much they charge for the service. In the end I felt that the only reliable way of working this out was to run a few servers on the service for a month. So far I am being charged about $4/month, so here’s hoping that the charge stays at that level.
By this point in the investigation I had an EC2 instance running Ubuntu, with 4store running inside it. This gave me my RDF store and my way of getting access to the data. 4store even includes a primitive but usable web form GUI that lets you send in SPARQL commands to access and manipulate the data. In theory, this gave me a Linked Open Data datastore, as the server was running on the web and was accessible to any clients that were bothered. I could now, if I wanted, send a mail off to one of the vocabulary databases and promote the fact that I am hosting LOD data.
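Once a SPARQL server answers a SELECT query, the client still has to unpack the result. The sketch below assumes the server has been asked for, and has returned, the standard SPARQL JSON results format; whether a given 4store setup does so is an assumption on my part. The sample response and the `rows_from_results` helper are invented for illustration.

```python
import json

# Invented sample in the SPARQL JSON results format.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["s", "p", "o"]},
  "results": {
    "bindings": [
      {
        "s": {"type": "uri", "value": "http://example.org/Dublin"},
        "p": {"type": "uri", "value": "http://example.org/locatedIn"},
        "o": {"type": "uri", "value": "http://example.org/Ireland"}
      }
    ]
  }
}
"""


def rows_from_results(text):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable can be unbound in a given row, hence the .get() calls.
        rows.append({v: binding.get(v, {}).get("value") for v in variables})
    return rows


rows = rows_from_results(SAMPLE_RESPONSE)
for row in rows:
    print(row["s"], row["p"], row["o"])
```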
The next step on my RDF road was to download and try out the dotNetRDF SDK. This SDK runs on top of the .NET Framework and so will typically be run on a Windows server (I think that it can also be run with the Mono Framework and a few other runtimes, but I haven’t tried this out). So along with my Ubuntu instance, I fired up a Windows Server 2008 instance on Amazon EC2. I was able to get a test ASP.NET web service running on the server and interfacing with the dotNetRDF SDK, which in turn made calls to my 4store SPARQL server.
So as I write this post I am in a position where I can write web services that access data within my 4store RDF datastore. I’m not quite sure how exactly this RDF data is being stored on the 4store server, and no doubt I should fill this lacuna in my knowledge sooner rather than later. Having said that, I can now start designing a datastore ontology that will accommodate my application’s needs. Once this is done I can move on to writing a few web service functions that will allow the creation, update and retrieval of data in this backend datastore. With that web service interface in place I can move on to the planning and writing of frontend client applications. I will probably begin by creating a client-side web application and from there proceed to providing UIs for mobile devices, starting with the iPhone (because I know absolutely nothing about any of the other mobile device environments).
More updates soon...