Egon Willighagen (@Egonwillighagen)
How I failed to do Open Notebook Cheminformatics
Egon Willighagen (@egonwillighagen)
14 July 2014, Jean-Claude Bradley Memorial Symposium #jcbms
Department of Bioinformatics - BiGCaT

Jean-Claude Bradley

“Open Notebook Science”
First response: jealousy

How I failed
• I did Open Science
  – Strong focus on reproducibility
  – Open Source, Open Data, Open Standards (ODOSOS)
• I did share notes...
• I wrote up the stories (in a blog)...

Realization
• Scholars need notebooks
  – They need exact instructions
  – Just giving the outcome and tools is not enough
• This applies to cheminformatics too

First notes were during education

ODOSOS
• Software:
  – The Chemistry Development Kit
    • based on CompChem, Jmol and JChemPaint
  – Bioclipse, Jmol, ...
• Data
  – Blue Obelisk Data Repository
  – RDF translations of knowledge bases
• Standards
  – eNanoMapper ...

Scribbling...

Scribbling...

Scribbling...

Why important?
• Going back to the original (raw data).
• Pedagogical effect
• Education (howto's)
• Machines care about negative data
• If we want to progress, we need to understand not just global patterns, but the fine details too

Why cheminformatics too?
• Where is the latest RDF of solubility data? Of the melting point data?
• The trust problem applies to algorithms as much as data
• What if...

What I have in mind...
Wikipedia, CC-BY-SA, http://en.wikipedia.org/wiki/Curtin%E2%80%93Hammett_principle

Possible ONS of cheminformatics
• Is this set of atom types covering ChEBI?
• Can we map this metabolomics data to pathways?
• How many CAS registry numbers can I resolve for this data set?

Conclusion
Some patience is needed, but I will start Open Notebook Science.
(And I will push this concept with my Bart and Cristian too.)

Benchmarking / metrics