Data flow from Source to Destination

August 13, 2010

The CLARION project has achieved a major milestone: a full working prototype of the system. Data can now flow from its source through to its destination, an RDF repository. Along the way it is reviewed, assessed for when it can be released, marked up with contextual metadata, transformed into CML and then RDF, and finally stored in a semantic repository, where it can be queried with SPARQL.

So the stages involved now look like this:

  • Chemist creates crystallographic data and stores it in their file store
  • An Atom feed notifies the Embargo Manager (EmMa) of the new data
  • EmMa emails the scientist to tell her that new data needs reviewing
  • Scientist uses EmMa to review the data, add contextual metadata, and specify the conditions for releasing the data from embargo
  • Scientist is notified a week before the embargo-release condition is met
  • EmMa waits for embargo conditions to be met
  • Crystallographic data is transformed into CML and then RDF, and stored in the repository
  • Data is queried using SPARQL queries
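
The embargo stages in the list above can be sketched roughly as follows. This is only an illustration, not EmMa's actual code: the class, field, and action names are all hypothetical, and it assumes the simplest possible release condition, a fixed calendar date, checked once a day.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical names throughout; this is not EmMa's actual code or data model.
NOTICE_PERIOD = timedelta(days=7)  # "a week before" the release condition is met


@dataclass
class EmbargoedDataset:
    identifier: str
    release_on: date              # assumed: the release condition is a fixed date
    owner_notified: bool = False
    released: bool = False


def step(dataset: EmbargoedDataset, today: date) -> list[str]:
    """Run one daily check on a dataset and return the actions it triggers."""
    actions: list[str] = []
    if dataset.released:
        return actions  # nothing left to do once the data is out of embargo
    # Warn the scientist once, as soon as we are within a week of release.
    if not dataset.owner_notified and today >= dataset.release_on - NOTICE_PERIOD:
        dataset.owner_notified = True
        actions.append("notify-owner")
    # Release condition met: hand over to the CML/RDF transformation stage.
    if today >= dataset.release_on:
        dataset.released = True
        actions.append("transform-and-store")
    return actions
```

Calling `step` once a day for each dataset would give the "notify, wait, release" behaviour described in the list. Real release conditions may well be richer than a single date (for example, tied to a publication event), in which case the date comparison would be replaced by whatever check applies.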

Our next activities will be to consolidate some of the infrastructural code used, and to ensure that the security used to restrict data access is adequate. We will then work on adding chemical search indexing, and importing NMR data.