Linking explicit and implicit knowledge

Egon Willighagen

Bioclipse & Proteochemometric Group (Prof. Wikberg)
Department of Pharmaceutical Biosciences
Uppsala University

2010-05-30

Explicit or implicit? Names...

Problem, Building Blocks, Conclusion

benzene

3-[4-[3-(1-methyl-7-oxo-3-propyl-4H-pyrazolo[4,3-d]pyrimidin-5-yl)-4-propoxyphenyl]sulfonylpiperazin-1-yl]propanoic acid

InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)
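The contrast between a name and an InChI can be made concrete: an InChI spells out its content in slash-separated layers, so software can read the formula and connectivity without interpreting a name. A minimal Python sketch, assuming a simple standard InChI in which each layer prefix letter occurs at most once; the function name `split_inchi_layers` is hypothetical, and benzene's InChI is used as a short input:

```python
def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI string into its slash-separated layers.

    Assumption: each prefix letter (c, h, q, ...) occurs at most once,
    which holds for simple standard InChIs but not for every InChI.
    """
    prefix, sep, body = inchi.partition("=")
    if prefix != "InChI" or not sep:
        raise ValueError("not an InChI string: %r" % inchi)
    parts = body.split("/")
    if len(parts) < 2:
        raise ValueError("InChI has no formula layer: %r" % inchi)
    # The first two fields carry no prefix letter: version, then formula.
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        if part:  # skip empty fields such as a trailing slash
            layers[part[0]] = part[1:]
    return layers


benzene = split_inchi_layers("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H")
print(benzene["formula"])  # C6H6
print(benzene["c"])        # 1-2-4-6-5-3-1
```

The name "benzene" only carries this information implicitly; the `c` (connectivity) and `h` (hydrogen) layers state it explicitly.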

2010-05-30 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com

Knowledge...

We model our world, but ...
Knowledge is hidden in PDFs
Transformations are needed
Life is not uni- or bivariate, neither is knowledge
Information Loss!

Solanum lycopersicum...

Knowledge Representation: Information Loss

Linking Data?

http://rdf.openmolecules.net/

But what about similarity?!?

identity: owl:sameAs
stereochemistry: rdfs:seeAlso ?
similar molecules: rdfs:seeAlso, chem:hasHighTanimoto ?
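A predicate like chem:hasHighTanimoto presupposes a similarity measure and a cut-off. A minimal Python sketch of the Tanimoto coefficient on fingerprints represented as sets of "on" bit positions; the 0.85 threshold, the function names, and the example fingerprints are assumptions for illustration, not values from the talk:

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    union = bits_a | bits_b
    if not union:
        # Two empty fingerprints: returning 0.0 is a choice made here,
        # other tools may define this case differently.
        return 0.0
    return len(bits_a & bits_b) / len(union)


HIGH_TANIMOTO = 0.85  # hypothetical cut-off, not taken from the slides


def has_high_tanimoto(bits_a: set, bits_b: set,
                      threshold: float = HIGH_TANIMOTO) -> bool:
    """Decide whether a similarity link would be asserted at all."""
    return tanimoto(bits_a, bits_b) >= threshold


fp_a = {1, 4, 7, 9}    # made-up fingerprint bits
fp_b = {1, 4, 7, 12}
print(tanimoto(fp_a, fp_b))           # 3 shared / 5 total = 0.6
print(has_high_tanimoto(fp_a, fp_b))  # False
```

Unlike owl:sameAs, the outcome depends on the fingerprint and the threshold chosen, which is one reason the slide leaves the predicate as an open question.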

... Molecular reality...

1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000

... and that is just the chemical graphs ...

OpenMolecules RDF: dereferenceable URI

http://rdf.openmolecules.net/
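The idea of a dereferenceable molecule URI can be sketched in Python: derive a URI from the InChI, then state identity with another resource as an owl:sameAs triple. The `?<InChI>` query layout, the function names, and the example.org target are assumptions for illustration and may not match what the service actually serves:

```python
from urllib.parse import quote

BASE = "http://rdf.openmolecules.net/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def molecule_uri(inchi: str) -> str:
    """Build a URI for a molecule from its InChI.

    Assumption: the InChI is appended as the query part; characters
    common in InChIs are left unescaped, anything else is percent-encoded.
    """
    return BASE + "?" + quote(inchi, safe="=/,()+-")


def same_as_triple(subject_uri: str, object_uri: str) -> str:
    """Serialize one owl:sameAs statement as an N-Triples line."""
    return "<%s> <%s> <%s> ." % (subject_uri, OWL_SAME_AS, object_uri)


uri = molecule_uri("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H")
print(uri)
# example.org stands in for any other database's benzene resource.
print(same_as_triple(uri, "http://example.org/compound/benzene"))
```

Because the URI is computed from the structure itself, two parties can arrive at the same identifier without consulting a registry first.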

The Chemistry Development Kit

A Family of Projects
  CDK-Taverna (chemoinformatics workflows)
  JChemPaint (semantic 2D editor)
  ChemoJava (GPL-ed extension)
Goals
  library of algorithms
  educational
Usage
  CDK: 100+ times cited in scientific literature
  Bioclipse, KNIME, Jumbo (CML), AMBIT, ...

C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003
C. Steinbeck et al., Curr. Pharm. Design, 2006

Bioclipse

O. Spjuth et al., BMC Bioinformatics 2007, 8:59

Integration

Services
  databases: PubChem
  web services
  Google Spreadsheets
  MyExperiment.org: Bioclipse Scripting Language
  Twitter, ... journals, ...
Techniques
  SOAP, REST, XMPP, ...
  Resource Description Framework
  dedicated APIs

Bioclipse-RDF

local RDF storage
read/write RDF/XML, N3
run SPARQL queries (local and remote)
extract RDF from XHTML/RDFa

Thanx to Jena and Pellet.
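Running a SPARQL query against a remote store comes down to an HTTP request carrying the query text. A minimal Python sketch of the GET form of the SPARQL protocol, which passes the query in a URL-encoded `query` parameter; this is not Bioclipse's own API, and the endpoint and molecule URIs under example.org are placeholders. The request is only constructed here, not sent:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build the URL for a SPARQL query sent via HTTP GET."""
    # Append with "&" if the endpoint already has a query string.
    separator = "&" if urlsplit(endpoint).query else "?"
    return endpoint + separator + urlencode({"query": query})


QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?other WHERE {
  <http://example.org/molecule/1> owl:sameAs ?other
}
"""

url = sparql_get_url("http://example.org/sparql", QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

The same query text could be run against a local store unchanged; only the transport differs.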

QSAR Wizards

MyExperiment: Bioclipse Scripting Language

Bioclipse + OpenTox

REST interaction (using RDF/SPARQL)

listAlgorithms(String service);
listDescriptors(String service);
listDataSets(String service);
listCompounds(String service, Integer dataSet);
downloadCompoundAsMDLMolfile(String service, Integer dataSet, Integer compound);
downloadDataSetAsMDLSDfile(String service, Integer dataSet, String filename);

Needed
  create, manipulate data sets ...
  upload molecules
  calculate descriptors

Conclusions

Where did this take us?
  Platform to integrate the RDF with the computation world
  Bioclipse as glue
  Scripting, sharing of scripts with MyExperiment.org
What’s next?
  Triple generation on demand (XMPP, SADI, ...)
  Ontology alignments
  Semantic Mediawiki integration

The Details

http://www.citeulike.org/user/egonw/tag/papers
http://chem-bla-ics.blogspot.com
http://egonw.github.com
waveto: [email protected]
