Productive meeting with John Davies, departmental crystallographer

The CLARION tech team (Sam Adams, Nick Day and myself) met with John Davies this afternoon to discuss the technical details of how we are going to bring the X-ray crystallography data into the CLARION infrastructure. In our parlance, that means working out how the X-ray Adapter will query the data and present it as an Atom feed for consumption by EmMa (the Embargo Manager).

John has two sources of data we could usefully import:

  1. Processed data in CIF format
  2. Hand-drawn 2D molecule diagrams for each sample, and author names in CCDC database format.

We’d love to get hold of those diagrams – they’re better than the ones we can generate automatically, especially for the complicated and organometallic molecules some of the chemists in the department specialize in. The names of the authors would also be really useful for the embargo manager, and would avoid duplicated human effort. The problem is that the CCDC database format is (as far as we know) binary and proprietary. So, we’re going to do some quick investigations to see whether it’s possible to use some of CCDC’s software to extract the data in more tractable formats.

Physically getting hold of the data itself looks to be pretty straightforward: we should be able to use cron to drive rsync and then run our adapter (which is essentially a feed builder). As the meerkat says: “Simples!”
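To make the "feed builder" idea a bit more concrete, here is a minimal sketch of what the adapter step might look like once rsync has dropped the CIF files into a local directory. It is not the actual X-ray Adapter: the function name `build_feed`, the `urn:example:` identifiers and the choice of one Atom entry per `.cif` file are all assumptions made for illustration, and the output is a bare-bones feed rather than one checked against the full Atom specification (it has no author element, for instance).

```python
from datetime import datetime, timezone
from pathlib import Path
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _rfc3339(timestamp: float) -> str:
    """Format a POSIX timestamp as the UTC date-time string Atom expects."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_feed(cif_dir, feed_id="urn:example:xray-adapter", title="X-ray CIF feed"):
    """Return an Atom feed (as a string) with one entry per .cif file.

    feed_id and title are placeholder values, not real CLARION identifiers.
    """
    ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = title

    cif_files = sorted(Path(cif_dir).glob("*.cif"))
    # The feed's "updated" time is the newest file's modification time.
    newest = max((p.stat().st_mtime for p in cif_files), default=0.0)
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = _rfc3339(newest)

    for path in cif_files:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = f"{feed_id}:{path.stem}"
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = path.stem
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = _rfc3339(path.stat().st_mtime)
        # Point at the synced file itself; a real adapter would use a public URL.
        ET.SubElement(entry, f"{{{ATOM_NS}}}link",
                      rel="enclosure", href=path.resolve().as_uri())

    return ET.tostring(feed, encoding="unicode")
```

A cron job could then run rsync followed by a small script that calls `build_feed` on the synced directory and writes the result wherever EmMa polls for it.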


6 Responses to Productive meeting with John Davies, departmental crystallographer

  1. Are diagrams such as EFEMUX01 an example of one of the less-good automatically generated ones? Is that generated from the CIF file?

  2. jimdowning says:

    I don’t know how that one was produced, but the diag doesn’t look too bad. The software we use (CDK) would probably make a nice job of that too. It’s things like that are more of a problem

  3. For your example, how was that produced; by a human?

    • jimdowning says:

      The example I gave was created by (a slightly aged version of) CDK; an Open Source Java toolkit.

      • A simple heuristic might actually be to make bonds to metallic centers (or just to atoms with coordination higher than 4) longer in the diagram; perhaps 1.75 times the normal distance…

  4. Agreed, the EFEMUX01 would not be a problem at all for the CDK. The CDK does have more trouble with larger rings (it does not use a honeycomb pattern) and doesn’t have the heuristics to deal with metallic centers, as Jim indicated. It does, however, have a templating system, though it does not contain many entries.

    Jim, in that respect, the list of hand-drawn images can be really interesting, *if* available as connection table… we could think of some script to generate templates for common organometallic groups…
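
The bond-lengthening heuristic suggested in comment 3 could be sketched roughly as below. This is plain Python rather than CDK code: the function name `target_bond_length`, the short `METALS` set and the unit base length are all illustrative assumptions, and only the 1.75 factor and the "coordination higher than 4" rule come from the comment itself.

```python
# Illustrative subset of metal symbols; a real implementation would use a
# complete list from a periodic-table source.
METALS = frozenset({"Fe", "Co", "Ni", "Cu", "Zn", "Ru", "Rh", "Pd", "Pt"})


def target_bond_length(symbol_a, symbol_b, coord_a, coord_b,
                       base_length=1.0, factor=1.75):
    """Return the bond length a 2D layout should aim for.

    A bond is lengthened by `factor` when either end is a metal or has a
    coordination number above 4; otherwise the base length is returned.
    """
    def needs_room(symbol, coordination):
        return symbol in METALS or coordination > 4

    if needs_room(symbol_a, coord_a) or needs_room(symbol_b, coord_b):
        return base_length * factor
    return base_length
```

For example, `target_bond_length("Fe", "C", 6, 4)` gives the stretched length, while an ordinary carbon-carbon bond keeps the base length.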

