Linking explicit and implicit knowledge
Egon Willighagen
Bioclipse & Proteochemometric Group (Prof. Wikberg) Department of Pharmaceutical Biosciences Uppsala University
2010-05-30 Explicit or implicit? Names...
Problem | Building Blocks | Conclusion
benzene
3-[4-[3-(1-methyl-7-oxo-3-propyl-4H-pyrazolo[4,3-d]pyrimidin-5-yl)-4-propoxyphenyl]sulfonylpiperazin-1-yl]propanoic acid
InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)
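A trivial name such as benzene leaves the structure implicit, while an InChI spells out formula and connectivity in separate layers. The sketch below splits an InChI into those layers. The function name `split_inchi` is hypothetical and the parser is deliberately minimal: it assumes each layer prefix occurs at most once, which holds for the simple example used here but not for every InChI.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula and one-letter-prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {
        "version": parts[0],
        "formula": parts[1] if len(parts) > 1 else "",
    }
    for part in parts[2:]:
        if part:
            # The first character names the layer, e.g. c = connections, h = hydrogens.
            layers[part[0]] = part[1:]
    return layers


# Standard InChI of benzene.
benzene = split_inchi("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H")
print(benzene["formula"])  # the explicit molecular formula
print(benzene["c"])        # the explicit atom connectivity
```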
2010-05-30 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com
Knowledge...
We model our world, but ...
Knowledge is hidden in PDFs
Transformations are needed
Life is not uni- or bivariate, neither is knowledge
Information Loss!
Solanum lycopersicum...
Knowledge Representation: Information Loss
Linking Data?
http://rdf.openmolecules.net/
But what about similarity?!?
identity: owl:sameAs
stereochemistry: rdfs:seeAlso ?
similar molecules: rdfs:seeAlso, chem:hasHighTanimoto ?
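The three cases on this slide can be sketched as a small decision function. This is an illustration under stated assumptions, not the approach used by any particular service: identity is taken to mean equal InChI strings, a stereochemistry-only difference is detected by dropping the InChI stereo layers (b, t, m, s), and similarity is a Tanimoto coefficient over toy fingerprints given as sets of bit indices. The 0.85 threshold, the example URIs, the fingerprints, and the `chem:hasHighTanimoto` predicate (which the slide itself only poses as a question) are all hypothetical. `rdfs:seeAlso` is the RDF Schema property.

```python
from typing import FrozenSet, NamedTuple, Optional


class Molecule(NamedTuple):
    uri: str
    inchi: str
    fingerprint: FrozenSet[int]  # toy fingerprint: indices of set bits


STEREO_PREFIXES = ("b", "t", "m", "s")  # InChI stereo layers


def without_stereo(inchi: str) -> str:
    """Drop the stereo layers; version and formula (first two parts) are kept."""
    parts = inchi.split("/")
    kept = [p for p in parts[2:] if not p.startswith(STEREO_PREFIXES)]
    return "/".join(parts[:2] + kept)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def choose_link(a: Molecule, b: Molecule, threshold: float = 0.85) -> Optional[str]:
    """Pick a predicate for linking a to b, or None if no link is warranted."""
    if a.inchi == b.inchi:
        return "owl:sameAs"
    if without_stereo(a.inchi) == without_stereo(b.inchi):
        return "rdfs:seeAlso"
    if tanimoto(a.fingerprint, b.fingerprint) >= threshold:
        return "chem:hasHighTanimoto"  # hypothetical predicate
    return None


# Two enantiomers of alanine: the InChIs differ only in the /m stereo layer.
ala_a = Molecule("http://example.org/mol/1",
                 "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1",
                 frozenset({20, 21, 22}))
ala_b = Molecule("http://example.org/mol/2",
                 "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1",
                 frozenset({20, 21, 22}))
benzene = Molecule("http://example.org/mol/3",
                   "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
                   frozenset({1, 2, 3, 4, 5, 6, 7}))
toluene = Molecule("http://example.org/mol/4",
                   "InChI=1S/C7H8/c1-7-5-3-2-4-6-7/h2-6H,1H3",
                   frozenset({1, 2, 3, 4, 5, 6, 7, 8}))

for left, right in [(ala_a, ala_b), (benzene, toluene), (benzene, ala_a)]:
    predicate = choose_link(left, right)
    if predicate is not None:
        print(f"<{left.uri}> {predicate} <{right.uri}> .")
```

Keeping identity, stereo variation and similarity as separate predicates means a consumer can decide which kinds of links it trusts, rather than having every link collapse into `owl:sameAs`.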
... Molecular reality...
1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
... and that is just the chemical graphs ...
OpenMolecules RDF: dereferenceable URI
http://rdf.openmolecules.net/
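A dereferenceable URI is one that can be fetched over HTTP to obtain data about the thing it names. The sketch below shows the general pattern with HTTP content negotiation. Only the base URL comes from the slide; whether this service honours the `Accept` header, and which per-molecule URI patterns it serves, are not stated in the slides and are assumptions here. The helper names are hypothetical, and the network call is defined but not executed.

```python
import urllib.request


def rdf_request(uri: str, media_type: str = "application/rdf+xml") -> urllib.request.Request:
    """Build a GET request that asks the server for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def dereference(uri: str) -> str:
    """Fetch the URI and return the response body as text (performs a network call)."""
    with urllib.request.urlopen(rdf_request(uri), timeout=10) as response:
        return response.read().decode("utf-8")


request = rdf_request("http://rdf.openmolecules.net/")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```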
The Chemistry Development Kit
A Family of Projects
  CDK-Taverna (chemoinformatics workflows)
  JChemPaint (semantic 2D editor)
  ChemoJava (GPL-ed extension)
Goals
  library of cheminformatics algorithms
  educational
Usage
  CDK: 100+ times cited in scientific literature
  Bioclipse, KNIME, Jumbo (CML), AMBIT, ...
C. Steinbeck et al., J.Chem.Inf.Comput.Sci, 2003
C. Steinbeck et al., Curr.Pharm.Design, 2006
Bioclipse
O. Spjuth et al., BMC Bioinformatics 2007, 8:59
Integration
Services
  databases: PubChem
  web services
  Google Spreadsheets
  MyExperiment.org: Bioclipse Scripting Language
  Twitter, ...
  journals, ...
Techniques
  SOAP, REST, XMPP, ...
  Resource Description Framework
  dedicated APIs
Bioclipse-RDF
local RDF storage
read/write RDF/XML, N3
run SPARQL queries (local and remote)
extract RDF from XHTML/RDFa
Thanks to Jena and Pellet.
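To make "local RDF storage" and querying concrete, here is a toy in-memory triple store with single-pattern matching, the basic operation that a SPARQL engine builds on. It is not Jena, not SPARQL, and not how Bioclipse-RDF is implemented; the class name, the `ex:` identifiers and the data are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a variable."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple for triple in self._triples
            if all(want is None or want == have
                   for want, have in zip(pattern, triple))
        )


store = TripleStore()
store.add("ex:benzene", "rdfs:label", "benzene")
store.add("ex:benzene", "ex:inchi", "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H")
store.add("ex:toluene", "rdfs:label", "toluene")

# Roughly the pattern "?s rdfs:label ?o": every subject with its label.
for s, _, o in store.match(predicate="rdfs:label"):
    print(s, o)
```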
QSAR Wizards
MyExperiment: Bioclipse Scripting Language
Bioclipse + OpenTox
REST interaction (using RDF/SPARQL)
Needed
  create, manipulate data sets ...
  upload molecules
  calculate descriptors
listAlgorithms(String service);
listDescriptors(String service);
listDataSets(String service);
listCompounds(String service, Integer dataSet);
downloadCompoundAsMDLMolfile(String service, Integer dataSet, Integer compound);
downloadDataSetAsMDLSDfile(String service, Integer dataSet, String filename);
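As a rough illustration of what a call like `listDataSets(String service)` might do behind the scenes, the sketch below issues a GET against the service and parses the answer. It is written in Python rather than as Bioclipse code, and it rests on several assumptions that the slides do not confirm: that data sets are listed under a `dataset` path below the service root, and that the service can answer with a `text/uri-list` body (one URI per line, `#` lines being comments) instead of the RDF that the slide mentions. The HTTP call is injected as a function so that the example runs without a network; `fake_fetch` and the example.org URIs are made up.

```python
from typing import Callable, List


def parse_uri_list(body: str) -> List[str]:
    """Parse a text/uri-list body: one URI per line, lines starting with # are comments."""
    return [
        line.strip()
        for line in body.splitlines()
        if line.strip() and not line.startswith("#")
    ]


def list_data_sets(service: str, fetch: Callable[[str, str], str]) -> List[str]:
    """List data set URIs; fetch(url, accept) performs the actual HTTP GET."""
    url = service.rstrip("/") + "/dataset"  # assumed path, not taken from the slides
    return parse_uri_list(fetch(url, "text/uri-list"))


def fake_fetch(url: str, accept: str) -> str:
    """Stand-in for a real HTTP client, returning a canned response."""
    return ("# data sets at " + url + "\r\n"
            "http://example.org/dataset/1\r\n"
            "http://example.org/dataset/2\r\n")


data_sets = list_data_sets("http://example.org/", fake_fetch)
print(data_sets)
```

Injecting the fetch function keeps the listing logic testable and leaves the choice of HTTP client, authentication and error handling to the caller.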
Conclusions
Where did this take us?
  Platform to integrate the RDF with the computation world
  Bioclipse as glue
  Scripting, sharing of scripts with MyExperiment.org
What’s next?
  Triple generation on demand (XMPP, SADI, ...)
  Ontology alignments
  Semantic MediaWiki integration
The Details
http://www.citeulike.org/user/egonw/tag/papers
http://chem-bla-ics.blogspot.com
http://egonw.github.com
waveto: [email protected]