
The Chemistry Development Kit
An Open Source Java library for structural chemo- and bioinformatics

Egon Willighagen, Radboud University Nijmegen
Christoph Steinbeck, Cologne University BioInformatics Center

18th CIC Workshop, 14-16 November 2004, Boppard

The Chemistry Development Kit?

Library of Standard Algorithms
● Reduce need to rewrite code
● ChemoInformatics education
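To make "standard algorithm" concrete, here is a minimal sketch of the kind of routine a shared library saves everyone from rewriting: counting the rings of a molecule from its bond list. It relies on the graph-theory identity rings = bonds - atoms + connected components, which gives the size of a smallest set of smallest rings (SSSR). The class `RingCountSketch` and its methods are hypothetical names invented for this illustration; they are not CDK classes, and this is not how the CDK implements ring searching.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy molecular graph. Hypothetical illustration, not part of the CDK API. */
final class RingCountSketch {
    private final int atomCount;
    private final List<int[]> bonds = new ArrayList<>();

    RingCountSketch(int atomCount) {
        this.atomCount = atomCount;
    }

    /** Adds a bond between atoms a and b (0-based). Assumes each bond is added only once. */
    void addBond(int a, int b) {
        if (a < 0 || b < 0 || a >= atomCount || b >= atomCount || a == b) {
            throw new IllegalArgumentException("invalid bond " + a + "-" + b);
        }
        bonds.add(new int[] {a, b});
    }

    /** Counts connected components (separate fragments) with a union-find structure. */
    int componentCount() {
        int[] parent = new int[atomCount];
        for (int i = 0; i < atomCount; i++) {
            parent[i] = i;
        }
        int components = atomCount;
        for (int[] bond : bonds) {
            int rootA = find(parent, bond[0]);
            int rootB = find(parent, bond[1]);
            if (rootA != rootB) {
                parent[rootA] = rootB; // merge the two fragments
                components--;
            }
        }
        return components;
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    /** Number of rings in an SSSR: bonds - atoms + components. */
    int ringCount() {
        return bonds.size() - atomCount + componentCount();
    }

    public static void main(String[] args) {
        // Benzene: six atoms in one cycle.
        RingCountSketch benzene = new RingCountSketch(6);
        for (int i = 0; i < 6; i++) {
            benzene.addBond(i, (i + 1) % 6);
        }
        System.out.println("benzene rings: " + benzene.ringCount()); // prints 1

        // Naphthalene: ten-atom perimeter plus one fusion bond.
        RingCountSketch naphthalene = new RingCountSketch(10);
        for (int i = 0; i < 10; i++) {
            naphthalene.addBond(i, (i + 1) % 10);
        }
        naphthalene.addBond(0, 5);
        System.out.println("naphthalene rings: " + naphthalene.ringCount()); // prints 2
    }
}
```

The sketch only counts rings; finding the actual ring atoms, as a real SSSR search does, takes considerably more work.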

Toolkit for prototyping
● 2D/3D rendering
● file IO

Java
● Object oriented
● Portability
● Applet (and Internet technologies in general)
● ... speed?

Standard Algorithms

Molinformatics
● IO (CML, MDL, PDB, INChI, ...)
● SMILES parsing and canonical generation
● Isomorphism checking
● Substructure search (and SMARTS)
● Maximal Common Subgraph Searches
● Gasteiger charges
● Ring searching (SSSR)
● Structure Diagram Generation
● 2D Rendering (and 3D via )
● Fingerprinting
● HOSE codes
● Atom typing

Simple example

3D Rendering: Jmol

Rendering Features
● wireframe/-sticks/etc
● protein
● cartoon
● backbone

Rasmol Scripting

Applet and Application

History of the Project

September 2000: the CDK emerged from the CompChem libraries used by Jmol, JChemPaint and Seneca.

February 2001: the CDK project was registered at SourceForge.net.

March 2003: Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E. and Willighagen, E. J. Chem. Inf. Comput. Sci. 2003, 43:493-500

July 2004: first release of CDK News

CDK Community

Active development
● 10 active and 30 part-time
● highly internal

Users
● 50+ users on user list
● many projects using the library

Communication
● Email: user list, developers list
● Internet Relay Chat
● Informal meetings
● CDK News

CDK News

Newsletter (ISSN 1614-7553)
● With articles on the use of CDK
● ChangeLog / Literature / FAQ
● Free, print copies available

Vol. 1 Issue 2
● Customizing file IO
● First steps in the implementation of a force field
● Spok - The Spectrum Organisation Kit
● Predictor
● Konqueror web shortcuts to the CDK API

A few applications...

Chemistry
● NMRShiftDB
● 2D diagram editor (JChemPaint)
● Seneca (structure elucidation)
● CML Rich Site Summary
● Nomen (IUPAC name parser)

Bioinformatics
● Brenda (enzyme database)
● Pathway analysis
● Enzyme reaction mechanisms

Many more...
● A few commercial programs
● Some projects in development

NMRShiftDB

C. Steinbeck, S. Kuhn et al. J. Chem. Inf. Comput. Sci., 2003, 43:1733-1739

Chemistry enhanced Rich Site Summary (CMLRSS)

P. Murray-Rust, H.S. Rzepa, M.J. Williamson, and E.L. Willighagen, J. Chem. Inf. Comput. Sci., 2004, 44:462-469

Enzyme reaction Mechanisms

Summary

➔ Large library with key algorithms

➔ Active developer and user community

➔ Has been used in several projects

New areas of interest
● Descriptor calculation (QSAR)
● Structure optimization (force field)

Acknowledgments

Code contributions from:

Ulrich Bauer, Fabian Dortu, Dan Gezelter, Rajarshi Guha, Yongquan Han, Kai Hartmann, Christian Hoppe, Oliver Horlacher, Miguel Howard, Geert Josten, Anatoli Krassavine, Stefan Kuhn, Daniel Leidert, Edgar Luttmann, Nathanaël Mazuir, Stephan Michels, Peter Murray-Rust, Chris Pudney, Jonathan Rienstra-Kiracofe, David Robinson, Bhupinder Sandhu, Jean-Sebastien Senecal, Sulev Sild, Bradley Smith, Christoph Steinbeck, Stephan Tomkinson, Joerg Wegner, Stephane Werner, Egon Willighagen, Yong Zhang.