On RDF, archaeological data schemas and OWL

authored by Frank Lynam at 12/11/2012 22:09:03

Having done some research into the ArchaeoML standard for archaeological data storage, I came upon an email sent by David Schloen of the University of Chicago to one Bartosz Dobrzelecki of the University of Edinburgh in January 2011. The email is published online and states that, at that point, there was no ArchaeoML ‘community’ and that the ontology was being used solely to provide a structure for the backend data housed by the University of Chicago’s OCHRE project and Berkeley’s Open Context.

This led me on to find a European Union-funded schema called the Europeana Data Model (EDM), which was written to provide an ontology describing the data of all the Europeana content providers. Europeana is a central datastore for cultural collections; content types include digitized books, paintings, etc. A project called SPQR, run by the University of Edinburgh, uses this schema to try to integrate disparate humanities (specifically antique) datasets. It seems as though everyone is trying to solve this problem of heterogeneous datasets, and many of them are using Linked Open Data (LOD) to do it.

When I searched the Linked Open Vocabularies (LOV) database of LOD vocabularies, I was unable to find a schema that specifically handled archaeological datasets. However, once I searched for ‘heritage’, lots of results emerged out of the ether. Interestingly, the ‘edm’ namespace, which identifies the Europeana Data Model, featured prominently here. When you click on the various edm schema elements, they do not dereference to a useful online resource; instead, they lead you off to the Europeana Data Model main page, which is not so helpful. Some of the other schemas listed (a few from w3.org, for instance) do link to RDF data, which can then be queried. Presumably, then, every schema should have a descriptive RDF file that can be queried by any user, or by any machine for that matter.
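Machine dereferencing of a vocabulary term usually relies on HTTP content negotiation: the client states in its Accept header that it wants an RDF serialization, and a well-configured server responds with RDF rather than an HTML page. The Python sketch below (standard library only) builds such a request without sending it. The RDFS term URI is used purely as an illustration, and the ranking of media types is an assumption, not something any particular vocabulary requires.

```python
import urllib.request

# Preferred RDF serializations; q-values rank the fallbacks (an assumed ordering).
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9, application/n-triples;q=0.8"


def rdf_request(term_uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for RDF instead of HTML."""
    return urllib.request.Request(term_uri, headers={"Accept": RDF_ACCEPT})


req = rdf_request("http://www.w3.org/2000/01/rdf-schema#Class")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))

# Sending it requires network access; the Content-Type of the reply shows
# whether the server actually honoured the request for RDF:
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers.get("Content-Type"))
```

A schema whose terms only ever return the same HTML landing page, whatever the Accept header says, is the unhelpful case described above for the edm elements.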

Pleiades is a service that comes up repeatedly when investigating RDF, and so it must be taken seriously. It is a datastore that associates ancient place names with texts and with modern georeferenced locations. It also has a decent description of how it goes about using LOD. All of its data (more than 34,000 ancient places) is available as a single large file.

All of this web searching soon gets a bit tiresome if you don’t put some of the findings into practice, which leads me back to the dotNetRDF SDK. I had been playing around with graphs, triples and nodes but couldn’t work out how to create class-type structures that would define the form of my data, something essentially analogous to classes in object-oriented programming. Luckily, the good people at dotNetRDF have provided the Ontology API, which professes to do just that: forget the triples and graphs, and abstract it all away using the class terminology of RDFS.
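Underneath any ontology API, an RDFS “class” is itself just a handful of triples. The minimal Python sketch below (plain string building, not dotNetRDF) emits the N-Triples that declare a small class hierarchy; the example.org namespace and the Find/Coin classes are hypothetical, chosen only to show how rdfs:subClassOf plays the role of inheritance.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical namespace for an archaeology vocabulary.
EX = "http://example.org/archaeology#"


def class_triples(cls_uri, label, parent_uri=None):
    """Return the N-Triples lines that declare an RDFS class."""
    # Escape backslashes and quotes so the label is a valid N-Triples literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    triples = [
        f"<{cls_uri}> <{RDF_TYPE}> <{RDFS_CLASS}> .",
        f'<{cls_uri}> <{RDFS_LABEL}> "{escaped}" .',
    ]
    if parent_uri is not None:
        # The RDFS counterpart of "class Coin extends Find" in OOP.
        triples.append(f"<{cls_uri}> <{RDFS_SUBCLASS_OF}> <{parent_uri}> .")
    return triples


for line in class_triples(EX + "Find", "Find") + class_triples(EX + "Coin", "Coin", EX + "Find"):
    print(line)
```

One difference from OOP is worth keeping in mind: RDFS classes describe and classify data rather than constrain it, so nothing stops a resource from belonging to several unrelated classes.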

So let’s roll back a bit and state some objectives. I want my RDF server to be able to do the following:

1) Provide a SPARQL endpoint

2) Provide real URI endpoints for all my data nodes

a) Serve up individual pages?

3) Provide a privacy option for certain nodes

4) Support multiple document types on the backend for the RDF

a) 4store claims to support many RDF data formats (but how many is “many”?)

b) It guesses which format is being used

5) Support the input and consumption of data ontologies.
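For the first objective, a SPARQL endpoint is queried over plain HTTP: under the SPARQL protocol, a GET request carries the query in a `query` parameter. The Python sketch below only builds the request. The endpoint address is an assumption for a local 4store HTTP server (4s-httpd exposes queries under /sparql/; use whatever host and port you started it with), and the example query, which lists every named graph in the store, is standard SPARQL.

```python
import urllib.parse
import urllib.request

# Assumed address of a local 4s-httpd instance; adjust host/port to your setup.
ENDPOINT = "http://localhost:8000/sparql/"

# Standard SPARQL: list the names of all graphs that contain at least one triple.
LIST_GRAPHS = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL protocol GET URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a query request asking for XML results."""
    return urllib.request.Request(
        sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+xml"},
    )


print(sparql_get_url(ENDPOINT, LIST_GRAPHS))
# With the server running:
# with urllib.request.urlopen(sparql_request(ENDPOINT, LIST_GRAPHS)) as resp:
#     print(resp.read().decode("utf-8"))
```

Keeping the URL-building separate from the network call makes it easy to test the encoding, and the same helper works for any query string.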

On OWL

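OWL (the Web Ontology Language) is a W3C specification for describing RDF data with richer semantic metadata: classes, properties and the relationships between them. Essentially, it allows you to create class hierarchies for your data. You need to choose a syntax in which to write your OWL (RDF/XML, Turtle, OWL/XML and Manchester syntax are among the options). I think that I’ll go for OWL RDF/XML, as it’s the one referenced in the LinkedDataTools.com tutorial.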
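To get a feel for what OWL RDF/XML looks like, the following Python sketch uses only the standard library’s ElementTree to write a tiny class hierarchy. It is a minimal illustration, not a full OWL serializer: the function name, the ontology URI and the example classes are hypothetical, and it only handles named classes with at most one superclass each.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Use the conventional prefixes in the output instead of ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def owl_classes_to_rdfxml(ontology_uri, classes):
    """classes maps each class URI to its parent class URI (or None)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # Ontology header node identifying the document.
    ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": ontology_uri})
    for cls_uri, parent in classes.items():
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": cls_uri})
        if parent is not None:
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": parent})
    return ET.tostring(root, encoding="unicode")


EX = "http://example.org/archaeology#"  # hypothetical namespace
print(owl_classes_to_rdfxml("http://example.org/archaeology", {
    EX + "Find": None,
    EX + "Coin": EX + "Find",
}))
```

The output is a single `rdf:RDF` element containing an `owl:Ontology` node and one `owl:Class` node per class, with `rdfs:subClassOf` pointing at the parent.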

It appears from this post that dotNetRDF does not support writing OWL RDF/XML to an RDF triple store, although it does support reading it. From this, 4store’s SPARQL endpoint appears to support loading OWL RDF/XML files with the LOAD command, so this looks like a potential strategy.
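A LOAD operation is sent as a SPARQL Update request. The Python sketch below builds one as a form-encoded POST without sending it. It assumes 4s-httpd is running with updates enabled and accepts SPARQL Update at /update/ (check the 4store documentation for your version); the document and graph URIs are hypothetical. The statement uses SPARQL 1.1 Update syntax; older servers may expect the earlier `LOAD <doc> INTO <graph>` form.

```python
import urllib.parse
import urllib.request

# Assumed SPARQL Update address of a local 4s-httpd instance.
UPDATE_ENDPOINT = "http://localhost:8000/update/"


def load_statement(document_uri: str, graph_uri: str) -> str:
    """SPARQL 1.1 Update: fetch an RDF document into a named graph."""
    return f"LOAD <{document_uri}> INTO GRAPH <{graph_uri}>"


def load_request(endpoint: str, document_uri: str, graph_uri: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST carrying the update."""
    body = urllib.parse.urlencode(
        {"update": load_statement(document_uri, graph_uri)}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = load_request(
    UPDATE_ENDPOINT,
    "http://example.org/ontologies/archaeology.owl",  # hypothetical document
    "http://example.org/graphs/archaeology",          # hypothetical graph name
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Choosing the graph URI yourself at load time has a side benefit: you then know exactly which graph name to ask for when reading the data back.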

So in order for my project to support the reading and writing of OWL RDF/XML data I need to do the following:

1) Implement my own module that converts semantic data into an OWL RDF/XML file

a) Every data class will need something like a WriteToOwlRdfXml function.

2) Check that the built-in dotNetRDF framework correctly imports and parses the data received from the 4store SPARQL endpoint.

a) At the moment I cannot download the graph because I don’t know which graph name I need to ask for.

3) If I can get the import working, I then just need to add a function such as ReadFromOwlRdfGraph to each data class to handle the import.
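The per-class write and read functions in steps 1 and 3 can be sketched in Python as a pair of methods on a data class. This is a minimal illustration, not dotNetRDF code: `Site`, its fields, `SITE_CLASS` and the snake_case method names are hypothetical stand-ins for WriteToOwlRdfXml and ReadFromOwlRdfGraph. The reader only understands the exact XML shape the writer produces; since RDF/XML allows many equivalent layouts, a robust reader would parse the document into triples first and query those instead.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
# Hypothetical class URI for archaeological sites.
SITE_CLASS = "http://example.org/archaeology#Site"

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


@dataclass
class Site:
    uri: str
    label: str

    def write_to_owl_rdf_xml(self) -> str:
        """Serialize this object as an OWL named individual in RDF/XML."""
        root = ET.Element(f"{{{RDF}}}RDF")
        ind = ET.SubElement(root, f"{{{OWL}}}NamedIndividual", {f"{{{RDF}}}about": self.uri})
        ET.SubElement(ind, f"{{{RDF}}}type", {f"{{{RDF}}}resource": SITE_CLASS})
        ET.SubElement(ind, f"{{{RDFS}}}label").text = self.label
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def read_from_owl_rdf_xml(cls, xml_text: str) -> List["Site"]:
        """Rebuild Site objects from RDF/XML in the exact shape written above."""
        root = ET.fromstring(xml_text)
        sites = []
        for ind in root.iter(f"{{{OWL}}}NamedIndividual"):
            types = {t.get(f"{{{RDF}}}resource") for t in ind.findall(f"{{{RDF}}}type")}
            if SITE_CLASS in types:
                label = ind.findtext(f"{{{RDFS}}}label", default="")
                sites.append(cls(ind.get(f"{{{RDF}}}about"), label))
        return sites


site = Site("http://example.org/site/1", "Example site")
xml_text = site.write_to_owl_rdf_xml()
print(xml_text)
print(Site.read_from_owl_rdf_xml(xml_text))
```

Keeping both directions on the class itself means each new data class only has to describe its own fields, which matches the per-class approach in the plan above.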
