The Chemistry Development Kit
An Open Source Java library for structural chemo- and bioinformatics
Egon Willighagen, Radboud University Nijmegen
Christoph Steinbeck, Cologne University BioInformatics Center
18th CIC-Workshop, 14-16 November 2004, Boppard
The Chemistry Development Kit?
Library of Standard Algorithms ● Reduce need to rewrite code ● ChemoInformatics education
Toolkit for prototyping ● 2D/3D rendering ● file IO
Java ● Object oriented ● Portability ● Applet (and Internet technologies in general) ● ... speed ?
Standard Algorithms
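To make the idea of a standard algorithm concrete, here is a minimal plain-Java sketch of one small building block related to ring searching. It computes only the number of rings in a smallest set of smallest rings (SSSR), using the identity rings = bonds - atoms + connected components. It is not CDK code: the names `RingCountSketch` and `sssrSize` are hypothetical, and the molecule is assumed to be given as a simple list of atom-index pairs with no duplicate bonds.

```java
/** Hypothetical helper for illustration only; not part of the CDK API. */
final class RingCountSketch {

    /** Finds the representative of x's connected component (union-find with path halving). */
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /**
     * Number of rings in an SSSR: bonds - atoms + connected components.
     * Each entry of bonds is {a, b} with 0-based atom indices.
     * Assumes no duplicate bonds and no bond from an atom to itself.
     */
    static int sssrSize(int atomCount, int[][] bonds) {
        int[] parent = new int[atomCount];
        for (int i = 0; i < atomCount; i++) {
            parent[i] = i;
        }
        int components = atomCount;
        for (int[] bond : bonds) {
            int a = find(parent, bond[0]);
            int b = find(parent, bond[1]);
            if (a != b) {
                parent[a] = b;   // merge the two components
                components--;
            }
        }
        return bonds.length - atomCount + components;
    }

    public static void main(String[] args) {
        // Naphthalene skeleton: two six-membered rings sharing the bond 4-5.
        int[][] naphthalene = {
            {0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0},
            {4, 6}, {6, 7}, {7, 8}, {8, 9}, {9, 5}
        };
        System.out.println(sssrSize(10, naphthalene)); // prints 2
    }
}
```

A full ring search also has to enumerate the rings themselves; the count above is only a cheap consistency check on such a result.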
Molinformatics ● IO (CML, MDL, PDB, INChI, ...) ● SMILES parsing and canonical generation ● Isomorphism checking ● Substructure search (and SMARTS) ● Maximal Common Subgraph Searches ● Gasteiger charges ● Ring searching (SSSR) ● Structure Diagram Generation ● 2D Rendering (and 3D via Jmol) ● Fingerprinting ● HOSE codes ● Atom typing
Simple example
3D Rendering: Jmol
Rendering Features ● wireframe/ball-sticks/etc ● protein ● cartoon ● backbone
Rasmol Scripting
Applet and Application
History of the Project
September 2000 the CDK emerged from the CompChem libraries used by Jmol, JChemPaint and Seneca.
February 2001 the CDK project registered at SourceForge.net
March 2003
Steinbeck, C. and Han, Y. and Kuhn, S. and Horlacher, O. and Luttmann, E. and Willighagen, E. J. Chem. Inf. Comput. Sci. 2003, 43:493-500
July 2004 first release of CDK News
CDK Community
Active development ● 10 active and 30 part-time developers ● highly international
Users ● 50+ users on user list ● many projects using the library
Communication ● Email: user list, developers list ● Internet Relay Chat ● Informal meetings ● CDK News
CDK News
Newsletter (ISSN 1614-7553) ● With articles on the use of CDK ● ChangeLog / Literature / FAQ ● Free, print copies available
Vol. 1 Issue 2 ● Customizing file IO ● First steps in the implementation of a force field ● Spok - The Spectrum Organisation Kit ● Predictor ● Konqueror web shortcuts to the CDK API
A few applications...
Chemistry ● NMRShiftDB ● 2D diagram editor (JChemPaint) ● Seneca (structure elucidation) ● CML Rich Site Summary ● Nomen (IUPAC name parser)
Bioinformatics ● Brenda (enzyme database) ● Pathway analysis ● Enzyme reaction mechanisms
Many more... ● A few commercial software programs ● Some projects in development
NMRShiftDB
C. Steinbeck, S. Kuhn et al. J. Chem. Inf. Comput. Sci., 2003, 43:1733-1739
Chemistry enhanced Rich Site Summary (CMLRSS)
P. Murray-Rust, H.S. Rzepa, M.J. Williamson, and E.L. Willighagen, J. Chem. Inf. Comput. Sci., 2004, 44:462-469
Enzyme reaction Mechanisms
Summary
➔ Large library with key algorithms
➔ Active developer and user community
➔ Has been used in several projects
New areas of interest ● Descriptor calculation (QSAR) ● Structure optimization (force field)
Acknowledgments
Code contributions from:
Ulrich Bauer, Fabian Dortu, Dan Gezelter, Rajarshi Guha, Yongquan Han, Kai Hartmann, Christian Hoppe, Oliver Horlacher, Miguel Howard, Geert Josten, Anatoli Krassavine, Stefan Kuhn, Daniel Leidert, Edgar Luttmann, Nathanaël Mazuir, Stephan Michels, Peter Murray-Rust, Chris Pudney, Jonathan Rienstra-Kiracofe, David Robinson, Bhupinder Sandhu, Jean-Sebastien Senecal, Sulev Sild, Bradley Smith, Christoph Steinbeck, Stephan Tomkinson, Joerg Wegner, Stephane Werner, Egon Willighagen, Yong Zhang.