Vol. 2/2, June 2005 54

Call for Registration: The CDK 5th Anniversary Workshop

The Chemistry Development Kit turns 5 at the end of September 2005. We are going to celebrate this event with a 4-day workshop.

The Chemistry Development Kit (CDK) was founded on 27 September 2000, when Christoph Steinbeck and Egon Willighagen visited Dan Gezelter at the University of Notre Dame near Chicago. The three had a brainstorming session on the general design of the core data classes, for which the skeleton was then implemented by Chris on his flight home to Germany. Today, almost five years later, the CDK has evolved into an international 30-member open source chemoinformatics project with growing public awareness.

We are going to celebrate CDK’s fifth birthday with a CDK workshop at Cologne University Bioinformatics Center (CUBIC) from October 10th to 15th. The most important goal of this workshop is to unite as many CDK developers and users in Cologne as possible, to socialize and have fun. Clearly, some work will also have to be done, so here are some items to work on:

• A five-year plan for the further development of the CDK. Here, we have to discuss questions like:
  – Does the current CDK design scale?
  – Are large refactoring efforts needed to make CDK more competitive?
  – Which functionality do we need to implement to attract new users?
• Blue Obelisk [1] issues: Identify other open source chemistry projects with which we need to interoperate. If interoperability is already working, document it.
• A debugging session should work on fixing known CDK bugs.
• An RFE implementation session, where we get as many requested enhancements implemented as possible.
• A hands-on intra-workshop workshop for things we always wanted to know but never dared to ask.
• There is also room for projects suggested by the participants. Suggestions will be collected until about a week before the workshop.

The event will take place at Cologne University Bioinformatics Center (CUBIC) and will be organized by the CUBIC junior research group. Participation is of course free of charge, but participants will have to arrange for their own accommodation and living costs. CUBIC staff will be happy to assist you with finding accommodation suited to your budget. Please register for the workshop by email to [email protected].

Christoph Steinbeck
Cologne University Bioinformatics Center (CUBIC)
Zülpicher Str. 47, 50674 Köln, Germany
[email protected]

Bibliography

[1] P. Murray-Rust. The Blue Obelisk. CDK News, 2(2):43–46, 2005.

Communication: An Information System for Proteochemometrics

Ola Spjuth, Martin Eklund, and Jarl E.S. Wikberg

Introduction

Proteochemometrics is a new QSAR-related bioinformatics/chemoinformatics technology with the capability to give general insights into molecular recognition processes. It has proven successful in rational drug design, protein engineering and functional genomics mappings [1, 2].

Proteochemometric models are multivariate regression or machine learning models that predict binding affinity for combinations of ligands and targets, based on interaction measurements. By calculating molecular descriptors of e.g. small molecules and proteins, proteochemometrics yields a single model able to predict binding affinity not only for new small molecules, but also for changes in protein sequence/structure (Figure 1).

Here we describe the development of a high-throughput integrated information and analysis system for proteochemometric modelling that can be used as an expert system aiding the study of protein/chemical interactions with the goal of drug design. By integrating bioinformatics with chemoinformatics, our system will also be able to model other types of interactions such as protein-protein and protein-DNA/RNA interactions.

Example of workflow

A typical workflow for a proteochemometric analysis is shown in Figure 1. Molecules are imported into the information system and stored as CML in a relational database (RDB). Interactions between molecules are defined, and a data set of interactions is set up. Sequences are aligned and descriptors for both molecules and proteins are calculated, after which machine learning methods are used to build a proteochemometric model. After validation, sequences can be annotated based on predictive influence and visualized.

Figure 1: Example of workflow.

Components

The following software components were used in the system:

Eclipse RCP [3]    GUI platform
CDK                Chemoinformatic framework
JChemPaint         2D molecular editor
Jmol               3D molecular visualization
JFreeChart [4]     Chart generation
Hibernate [5]      Middleware for OO-RDB mapping
ClustalW [6]       Multiple sequence alignment

We are investigating BioJAVA and BioSQL for sequence management and annotation, but are currently using our own implementation.

Why CDK

Our implementation relies on CDK for such tasks as I/O of molecules and CML generation, as well as descriptor and forcefield calculations. We chose CDK over Joelib [7] and JChem [8] because of its many features, active development, and close connection with Jmol and JChemPaint. We are very impressed with CDK’s rapid improvement and new features, and have benefited greatly from the developers’ willingness to help. Without too much trouble we could build a working information system for molecule I/O, 2D editing and 3D-visualization (Figure 2), leaving us more time to focus on the specific challenges of proteochemometric modelling.

Current Status

The system currently supports molecule I/O, database storage, interaction and data set management, as well as simple proteochemometric analysis in an integrated graphical interface. Since the project is in an early development phase, we have not yet decided on either a name or licensing terms. More information regarding these issues and code availability will be published in a suitable journal later and advertised here in CDK News.

Future directions

Some future directions include (but are not limited to):

• Scripting language for large, custom workflows
• Distributed computing and storage
• Massive virtual screenings on clusters/GRIDs
• Expert system to aid drug design
• Other types of interactions
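To make the modelling step of the workflow more concrete, here is a minimal sketch of one common way to set up a proteochemometric regression: ligand descriptors, protein descriptors and their pairwise cross-terms are concatenated into one feature vector per interaction, and a single ridge regression is fitted to the measured affinities. This is an illustration only, not the implementation used in the system described here. All names (`pcm_features`, `fit_ridge`, `toy_affinity`), the descriptor values and the choice of ridge regression are assumptions made for the example, and real descriptors would come from a toolkit such as CDK rather than from hand-written tuples.

```python
"""Toy proteochemometric model (illustrative assumptions throughout)."""
from itertools import product


def pcm_features(ligand, target):
    """Concatenate ligand descriptors, target descriptors and their cross-terms."""
    cross = [l * t for l, t in product(ligand, target)]
    return [1.0] + list(ligand) + list(target) + cross  # leading 1.0 = intercept


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(rows, y, lam=1e-6):
    """Weights minimising ||Xw - y||^2 + lam * ||w||^2 via the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(weights, ligand, target):
    """Predicted affinity for one ligand/target combination."""
    return sum(w * f for w, f in zip(weights, pcm_features(ligand, target)))


# Hypothetical descriptors: two per ligand, two per protein (invented values).
LIGANDS = {"lig_a": (0.0, 1.0), "lig_b": (1.0, 0.0),
           "lig_c": (1.0, 1.0), "lig_d": (2.0, -1.0)}
TARGETS = {"prot_x": (0.5, 1.5), "prot_y": (-1.0, 0.0), "prot_z": (1.0, -0.5)}


def toy_affinity(ligand, target):
    """Invented ground truth standing in for measured interaction data."""
    return 1.0 + 2.0 * ligand[0] - target[1] + 3.0 * ligand[1] * target[0]


# Build the interaction data set: every ligand measured against every target.
pairs = [(lig, tgt) for lig in LIGANDS.values() for tgt in TARGETS.values()]
X = [pcm_features(lig, tgt) for lig, tgt in pairs]
y = [toy_affinity(lig, tgt) for lig, tgt in pairs]
weights = fit_ridge(X, y)

# One model predicts for a ligand AND a protein that were both unseen in training.
new_ligand, new_target = (0.5, 0.5), (2.0, 1.0)
print("predicted:", round(predict(weights, new_ligand, new_target), 3))
print("toy truth:", toy_affinity(new_ligand, new_target))
```

The cross-terms are what let a single model respond to changes on the protein side as well as the ligand side: without them, the predicted effect of a ligand descriptor would be the same for every target.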


Figure 2: Screenshot from the prototype showing Viagra [9].

Ola Spjuth, Martin Eklund and Jarl E.S. Wikberg
Department of Pharmaceutical Biosciences
Uppsala University
SE 751 24 Uppsala, Sweden
[email protected]
[email protected]
[email protected]

Bibliography

[1] J.E.S. Wikberg, M. Lapinsh, and P. Prusis. Proteochemometrics: A tool for modelling the molecular interaction space. In Kubinyi, H. and Müller, G., editors, Chemogenomics in Drug Discovery - A Medicinal Chemistry Perspective, chapter 10. Wiley-VCH, Weinheim, 2004.

[2] M. Lapinsh, P. Prusis, A. Gutcaits, T. Lundstedt, and J.E.S. Wikberg. Development of proteo-chemometrics: A novel technology of use for analysis of drug-receptor interactions. Biochim. Biophys. Acta, pages 180–190, 2001.

[3] Eclipse rich client platform. http://www.eclipse.org/rcp, May 2005.

[4] JFreeChart. http://www.jfree.org/jfreechart, May 2005.

[5] Hibernate. http://www.hibernate.org, May 2005.

[6] ClustalW. http://www.ebi.ac.uk/clustalw, May 2005.

[7] JOElib. http://www-ra.informatik.uni-tuebingen.de/software/, May 2005.

[8] JChem. http://www.jchem.com/, May 2005.

[9] InChI=1/C22H30N6O4S/c1-5-7-17-19-20(27(4)25-17)22(29)24-21(23-19)16-14-15(8-9-18(16)32-6-2)33(30,31)28-12-10-26(3)11-13-28/h8-9,14H,5-7,10-13H2,1-4H3,(H,23,24,29)/f/h23H.

Editors-in-Chief:
Egon Willighagen [email protected] and Christoph Steinbeck [email protected]

Editorial Board:
Andreas Bender, Christoph Steinbeck, Egon Willighagen, Rajarshi Guha, Rich Apodaca and Uli Fechner.

CDK News is a publication of the Chemistry Development Kit (CDK) project. All articles are copyrighted by their respective authors and licenced under the GNU Free Documentation License (FDL). Submissions can be sent to the Editors-in-Chief.

CDK Project web pages:
http://cdk.sourceforge.net/
http://www.chemistry-development-kit.org/

CDK News ISSN 1614-7553