CLARION project – an overview

CLARION project – Cambridge Chemistry Department

The data challenge: Chemistry laboratories produce many types of information and data – raw data, processed data, observations, chemical structures, reaction schemes, experimental write-ups, conclusions, graphs, images, crystallographic and spectroscopic data, papers, references, and so on. It is challenging to store this variety of information so that it is accessible to, and usable by, a range of users. The challenges include:

  • Storing data in formats that allow its use by specialist data processing tools
  • Using data formats that are suitable for publication and long-term preservation
  • Allowing certain data to be used by people outside the department
  • Motivating researchers to open their data
  • Enhancing the meaning and context of the data to improve its usability
  • Making the data searchable and easily navigable
  • Ensuring that the system has minimal support overheads, yet continually evolves as required to meet changes in the IT environment.

Using an ELN: The Cambridge Chemistry Department has a basic repository which stores crystallographic data. Project CLARION (Cambridge Laboratory Repository In/Organic Notebooks) will create an enhanced repository that captures core types of chemistry data and ensures that they remain accessible and are preserved. The Chemistry Department is implementing a commercial Electronic Laboratory Notebook (ELN) system; CLARION will work closely with the ELN team to create a system for ingesting chemistry data directly into the repository with minimal effort from the researcher.

Enhancing and expanding data usage: CLARION will provide functionality that lets scientists make selected data available as Open Data for use by people outside the department. The project will use techniques for adding semantic definition to chemical data, including RDF (Resource Description Framework) and CML (Chemical Markup Language). Many of these techniques will be extensible to other disciplines. CLARION will address general issues such as ownership of data, and it will publicise its results to the chemistry and repositories communities. Effort will be put into developing a sustainable business model for operating the repository that the department can adopt after the project is complete.
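To give a feel for what "adding semantic definition" with CML and RDF can look like, here is a minimal sketch using only the Python standard library. It is an illustration, not CLARION's implementation: the function names, the example.org URIs and the choice of Dublin Core terms for the title and licence are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the CML schema.
CML_NS = "http://www.xml-cml.org/schema"


def build_cml_molecule(mol_id, atoms, bonds):
    """Build a CML <molecule> element (hypothetical helper).

    atoms: list of (atom_id, element_symbol)
    bonds: list of (atom_id_1, atom_id_2, order)
    """
    ET.register_namespace("cml", CML_NS)
    molecule = ET.Element(f"{{{CML_NS}}}molecule", {"id": mol_id})

    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for atom_id, element in atoms:
        ET.SubElement(
            atom_array,
            f"{{{CML_NS}}}atom",
            {"id": atom_id, "elementType": element},
        )

    bond_array = ET.SubElement(molecule, f"{{{CML_NS}}}bondArray")
    for first, second, order in bonds:
        ET.SubElement(
            bond_array,
            f"{{{CML_NS}}}bond",
            {"atomRefs2": f"{first} {second}", "order": order},
        )
    return molecule


def describe_as_ntriples(subject_uri, title, licence_uri):
    """Return RDF statements about a data item as N-Triples lines.

    The predicates are Dublin Core terms; using them here is an
    assumption for illustration, not a documented CLARION choice.
    """
    def literal(text):
        # Escape backslashes and double quotes for an N-Triples literal.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return [
        f"<{subject_uri}> <http://purl.org/dc/terms/title> {literal(title)} .",
        f"<{subject_uri}> <http://purl.org/dc/terms/license> <{licence_uri}> .",
    ]


if __name__ == "__main__":
    water = build_cml_molecule(
        "water",
        atoms=[("a1", "O"), ("a2", "H"), ("a3", "H")],
        bonds=[("a1", "a2", "1"), ("a1", "a3", "1")],
    )
    print(ET.tostring(water, encoding="unicode"))

    # Hypothetical URIs: a real repository would mint its own identifiers.
    for line in describe_as_ntriples(
        "https://example.org/clarion/molecule/water",
        "Water",
        "https://example.org/licences/open-data",
    ):
        print(line)
```

The CML part records what the molecule is, while the RDF statements record facts about the data item, such as its title and the licence under which it is released as Open Data.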

Timelines: The project runs for two years from April 2009. The initial pilot deployment of the ELN is scheduled for late 2009, and we hope to be publishing open data from it in early 2010.

Project blog:

Twitter: CLARIONproject

Contact: Brian Brooks


2 Responses to CLARION project – an overview

  1. Hi Brian,

    what will be the Open nature of this project? Will it be Open Source (what license)? Will the project itself be Open?

  2. jimdowning says:

    Hi Egon,
    The code written as part of the project will be open source (probably under an Artistic or BSD-derived licence). The ELN solution is a commercial offering.

    The project won’t (can’t) be fully “Open Notebook” since we will have close interaction with the purchase and roll-out of the ELN, which might be commercially sensitive. Nor will it be open in the sense of collaboration, unless there is a latent community of chemistry departments all installing the same ELN software and wishing to collaborate on an open data platform on top of it.

    It *will* be open in the sense that our lessons learned will be recorded publicly. It will also be open in that one of the key aims of the project is to promote the publication of Open Data from the department.
