How I failed to do
Open Notebook Cheminformatics
Egon Willighagen (@egonwillighagen)
14 July 2014, Jean-Claude Bradley Memorial Symposium
#jcbms
Department of Bioinformatics - BiGCaT
Jean-Claude Bradley
“Open Notebook Science”
First response: jealousy
How I failed
• I did Open Science
  – Strong focus on reproducibility
  – Open Source, Open Data, Open Standards (ODOSOS)
• I did share notes...
• I wrote up the stories (in a blog)...
Realization
• Scholars need notebooks
  – They need exact instructions
  – Just giving the outcome and tools is not enough
• This applies to cheminformatics too
First notes were during education
ODOSOS
• Software:
  – The Chemistry Development Kit
    • based on CompChem, Jmol and JChemPaint
  – Bioclipse, Jmol, ...
• Data
  – Blue Obelisk Data Repository
  – RDF translations of knowledge bases
• Standards
  – eNanoMapper ...
Scribbling...
Scribbling...
Scribbling...
Why important?
• Going back to the original (raw data).
• Pedagogical effect
• Education (howto's)
• Machines care about negative data
• If we want to progress, we need to understand not just global patterns, but the fine details too
Why cheminformatics too?
• Where is the latest RDF of solubility data? Of the melting point data?
• The trust problem applies to algorithms as much as data
• What if...
What I have in mind...
Wikipedia, CC-BY-SA, http://en.wikipedia.org/wiki/Curtin%E2%80%93Hammett_principle
Possible ONS of cheminformatics
• Is this set of atom types covering ChEBI?
• Can we map this metabolomics data to pathways?
• How many CAS registry numbers can I resolve for this data set?
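A minimal sketch of what one such notebook entry could look like for the CAS question, in Python. It covers only a first step: counting which identifiers are well-formed according to the standard CAS check digit rule. Actually resolving them to structures would need a lookup service, which is not shown. The names `is_valid_cas` and `notebook_entry` and the sample identifiers are illustrative assumptions, not taken from the talk.

```python
import re

# CAS Registry Numbers look like NNNNNNN-NN-N: 2 to 7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if the string has the CAS format and a correct check digit."""
    match = CAS_PATTERN.match(cas.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check


def notebook_entry(question: str, cas_numbers: list[str]) -> dict:
    """Hypothetical notebook record: keeps the raw input and the rejects too."""
    well_formed = [c for c in cas_numbers if is_valid_cas(c)]
    rejected = [c for c in cas_numbers if not is_valid_cas(c)]
    return {
        "question": question,
        "input": list(cas_numbers),
        "well_formed": well_formed,
        "rejected": rejected,  # the negative data is recorded, not dropped
    }


if __name__ == "__main__":
    # Illustrative input only: three real CAS numbers and two deliberately bad entries.
    sample = ["7732-18-5", "58-08-2", "50-00-0", "7732-18-4", "not-a-cas"]
    entry = notebook_entry("How many CAS numbers in this set are well-formed?", sample)
    print(len(entry["well_formed"]), "of", len(entry["input"]), "pass the check digit test")
    print("rejected:", entry["rejected"])
```

Keeping the rejected identifiers in the record, next to the raw input, is the point of the exercise: the failures are part of the notebook.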
Conclusion
Some patience is needed, but I will start Open Notebook Science.
(And I will push this concept with Bart and Cristian too.)
Benchmarking / metrics