I completed my (hopefully) final annual PhD review meeting a couple of weeks ago and one of the things that came out of the discussion was the need to do some more work on archaeological Linked Open Data datasets from the perspective of the data consumer. Up until now, I have largely focussed on the subject in the context of a data provider. In that context, I set about building a micro site that would host LOD-compliant data for the Atsipadhes Korakias Peak Sanctuary Project and use that LOD data to present a geospatial view of the project's material finds. The result can be seen at http://atsipadhes.linkedarc.net.
For my main PhD project website, http://linkedarc.net, I decided to build the RDF triplestore myself. I ended up with a hybrid triplestore: MySQL serves as the actual datastore at the backend, with interfaces that expose the MySQL data to LOD data consumers. For the Atsipadhes datastore I decided to have a look at some of the off-the-shelf RDF triplestore solutions and I was pleasantly surprised by how quickly and easily I managed to get Apache Jena with its Fuseki SPARQL interface up and running. In hindsight, I guess it pays to do a bit of background research before spending a few months building something from scratch. On the other hand, building a solution yourself always gives you a better appreciation of the substantive issues at play. Ultimately, the best approach probably lies somewhere in between.
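The hybrid approach can be sketched roughly as follows: relational rows at the back, RDF triples at the front. This is only an illustration, not the actual linkedarc.net code; the table layout, column names, base URI and the use of CIDOC CRM's E22 class are all my assumptions here.

```python
# Minimal sketch of a relational-to-RDF mapping layer, as in a hybrid
# MySQL-backed triplestore. All names below are hypothetical.

FIND_NS = "http://example.linkedarc.net/id/find/"  # invented base URI


def row_to_triples(row):
    """Map one relational 'find' row (a dict, as a DB cursor might yield)
    to a list of (subject, predicate, object) triples."""
    subject = FIND_NS + str(row["id"])
    return [
        # CIDOC CRM E22 covers physical human-made objects such as finds
        (subject, "rdf:type", "crm:E22_Man-Made_Object"),
        (subject, "rdfs:label", row["label"]),
    ]


# One row as it might come back from a SQL query:
row = {"id": 42, "label": "Figurine head, female"}
triples = row_to_triples(row)
```

A SPARQL interface such as Fuseki's then sits on top of the generated triples, so consumers never see the relational layer at all.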
The Atsipadhes site is pretty simple in terms of functionality and I like it for this reason. Single-function websites seem to be more the norm these days and this fits in nicely with the RDF data consumer model: establish an interesting way of accessing, crunching and displaying an LOD datastore and then build a dedicated site to achieve that objective. The Atsipadhes dataset includes about 4,500 figurine part finds. Each of these finds has been meticulously catalogued by the Atsipadhes project team, which is led by Dr Alan Peatfield and Dr Christine Morris. They were categorised in terms of their gender, form, gesture and fabric. My first job therefore was to create a standardised set of vocabularies that could then be used to construct the RDF data for the assemblage. The data was originally created as a FileMaker Pro database and my first step was to export this as an Excel spreadsheet. I then exported this in turn as a CSV file and, having heard talk of Google Refine (now OpenRefine) at CAA2014, I used this app to clean up the data. I would wholeheartedly recommend using OpenRefine to help with the often-tedious problem of cleaning up data. Up until now, I have generally done this in Excel but there’s no doubt that OpenRefine adds a lot to the process.
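To give a flavour of the kind of clean-up involved, the sketch below implements a key-collision fingerprint in the style of OpenRefine's clustering feature, which groups variant spellings of the same vocabulary term under one key. The fabric terms in the example are invented, not taken from the Atsipadhes catalogue.

```python
import re
import unicodedata


def fingerprint(value):
    """Key-collision fingerprint, in the style of OpenRefine clustering:
    trim, lowercase, strip accents and punctuation, then join the sorted
    unique tokens. Variant spellings collapse to the same key."""
    value = value.strip().lower()
    # Decompose accented characters and drop the combining marks
    value = unicodedata.normalize("NFKD", value)
    value = "".join(c for c in value if not unicodedata.combining(c))
    # Remove punctuation, keep word characters and whitespace
    value = re.sub(r"[^\w\s]", "", value)
    tokens = sorted(set(value.split()))
    return " ".join(tokens)


# Two (invented) variant spellings of one fabric term share a key:
key_a = fingerprint("Red coarse ware")      # -> "coarse red ware"
key_b = fingerprint("coarse ware,  Red")    # -> "coarse red ware"
```

Rows whose values share a fingerprint are candidates for merging into a single controlled vocabulary term, which is essentially what the standardised gender, form, gesture and fabric vocabularies require.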
I then needed a conceptual model with which to structure the data and I felt that it was about time that I got my teeth into CIDOC CRM, which is currently the most popular approach to providing a semantic model for your RDF data. I followed closely the implementation used by the excellent British Museum LOD service and for the archaeology-specific conceptual modelling I employed English Heritage’s CRM-EH extension to CIDOC CRM. Ultimately, this process resulted in my creating a four-level hierarchy of objects for the Atsipadhes data. A site object can contain trench objects, which can each contain level objects, which can each contain find objects.
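The four-level containment hierarchy can be sketched as plain triples. The URIs and identifiers below are invented for illustration; the one real term used is CIDOC CRM's P46_is_composed_of property, which links a whole to its parts.

```python
# Sketch of the site > trench > level > find hierarchy as
# (subject, predicate, object) tuples. All URIs are hypothetical.

BASE = "http://example.linkedarc.net/id/"   # invented namespace
P46 = "crm:P46_is_composed_of"              # CIDOC CRM: whole -> part


def contains(parent, child):
    """Express that one object physically contains another."""
    return (BASE + parent, P46, BASE + child)


hierarchy = [
    contains("site/atsipadhes", "trench/1"),     # site contains trench
    contains("trench/1", "level/1-3"),           # trench contains level
    contains("level/1-3", "find/F0001"),         # level contains find
]
```

Modelling containment with a single transitive-style property keeps SPARQL queries over the hierarchy uniform at every level.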