The CLARION tech team (Sam Adams, Nick Day and myself) met with John Davies this afternoon to discuss the technical details of how we are going to bring the X-ray crystallography data into the CLARION infrastructure. In our parlance: how the X-ray Adapter will query the data and present it as an Atom feed for consumption by EmMa (the Embargo Manager).
John has two sources of data we could usefully import:
- Processed data in CIF format
- Hand-drawn 2D molecule diagrams for each sample, and author names, both in CCDC database format.
We’d love to get hold of those diagrams – they’re better than the ones we can generate automatically, especially for the complicated and organometallic molecules some of the chemists in the department specialize in. The names of the authors would also be really useful for the embargo manager, and would avoid duplicated human effort. The problem is that the CCDC database format is (as far as we know) binary and proprietary. So, we’re going to do some quick investigations to see whether it’s possible to use some of CCDC’s software to extract the data in more tractable formats.
Physically getting hold of the data itself looks to be pretty straightforward: we should be able to use cron to drive rsync and then our adapter (which is essentially a feed builder). As the meerkat says: “Simples!”