History and Challenges of Chemoinformatics

Johann Gasteiger Computer-Chemie-Centrum University of Erlangen-Nürnberg D-91052 Erlangen, Germany

www2.chemie.uni-erlangen.de/

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Overview

• the scope of chemoinformatics

• the beginnings

• a field of ist own

• scientific challenges

• political challenges

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Synthesis of Properties

The most fundamental and lasting objective of synthesis is not production of new compounds but production of properties

George S. Hammond Norris Award Lecture, 1968

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Fundamental Questions in

What structure do I need for a certain property? structure-activity relationships

How do I make this structure? synthesis design

What is the product of my reaction? reaction prediction structure elucidation

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture chemical physical property biological property property

chemical structure synthesis reaction planning structure prediction elucidation

starting materials

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemoinformatics - Why?

• complex relationships structure - biological activity chemical reactivity

• amount of information many millions of compounds and reactions many millions of publications

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Number of Compounds in Chemistry

compounds published in CAS

50

40

30

20

10 compounds (millions) 0 1965 1970 1975 1980 1985 1990 1995 2000 2005 year

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture From Data to Knowledge

deductive inductive learning learning know- generalization ledge

information context

measurement data calculation

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemoinformatics: Definition

„The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and organization.“

F. K. Brown, Annual Reports in Medicinal Chemistry 1998, 33, 375-384

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemoinformatics: Definition

The application of informatics methods to solve chemical problems

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture The Scope of Chemoinformatrics

• structure representation and searching • data analysis and chemometrics • molecular modeling • spectra analysis and structure elucidation • reaction representation and searching • reaction modeling and synthesis design

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Application Areas for Chemoinformatics

• drug design • analytical chemistry • chemical engineering • inorganic chemistry • medicinal chemistry • organic chemistry • physical chemistry • theoretical chemistry

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemoinformatics – An Old Discipline

• structure representation 1965, Morgan • structure elucidation 1965, Sasaki, Munk, DENDRAL • synthesis design 1970, Corey & Wipke, Ugi, Gelernter, Hendrickson • molecular modeling 1970, Langridge, Marshall • data analysis / chemometrics 1970, Kowalski, Wold

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Structure Representation

• European industry: BASF, Hoechst, ICI, Thomae, BASIC, IDC (1965 - ) • Wiswesser Line Notation (1969 - ) • Chemical Abstracts Service: Morgan Algorithm 1965 • Sheffield: M. Lynch, P. Willett (1970 - ) • Paris: J.E.Dubois, DARC (1970 - ) • Munich: I. Ugi, J. Gasteiger, C. Jochum (1972 -)

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Computer-Assisted Structure Elucidation

• DENDRAL: C.Djerassi, J. Lederberg, D.Feigenbaum (1965) • CHEMICS: S.Sasaki (1965) • M.Munk (1968) • C.Steinbeck (1998)

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Computer-Assisted Synthesis Design

1969 Corey + Wipke OCSS LHASA, SECS 1973 Ugi + Gasteiger CICLOPS WODCA, THERESA 1971 Hendrickson SYNGEN 1976 Bersohn SYNSUP 1977 Gelernter SYNCHEM 1985 Hanessian CHIRON 1988 Zefirov FLAMINGOES 1988 Sasaki + Funatsu AIPHOS

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Visualzation of Chemical Structures

LHASA 1970

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Data Analysis Methods

• Chemometrics: B.Kowalski (1970) • PLS: S. Wold (1978) • Self-organizing neural network: Kohonen (1983) • Backpropagation Algorithm: Rumelhart, Hinxton (1987)

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Databases

• Chemical Abstracts Service (1975) • DARC System (1980) • Cambridge CSD (1984) • Inorganic Structures Database (1985) • Beilstein (1990) • Gmelin (1990) • ChemInformRX (1991) • SpecInfo (1991)

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Common Topics: Structure Representation

• data storage and retrieval • property prediction • drug design • synthesis design • spectra analysis and prediction

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Common Topics: Data Analysis Methods

• property prediction • drug design • analytical chemistry • spectra analysis and prediction

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Common Topics

• representation of chemical structures • searching structures in databases • visualization of chemical structures • representation of chemical reactions • data analysis methods

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Handbook of Chemoinformatics From Data to Knowledge

J. Gasteiger (Editor)

65 authors 73 contributions 4 volumes 1900 pages Wiley-VCH, Weinheim (August 2003)

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemical Structures - Challenges

• representation of Markush structures (patents) • representation of polymers • conformational flexibility (bioactive conformation) • similarity searching (beyond fingerprints and Tanimoto coefficient)

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Proteins - Challenges

• scoring functions for docking • flexibility of proteins

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Bioinformatics Chemoinformatics

gene protein drug lead

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Information Acquisition - Challenges

• electroníc laboratory notebooks • publishing chemical information (3D structures, spectra) • publishing and searching on the internet • text mining • optical character recognition • input of chemical structure (hand writing, voice)

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Data Mining - Challenges

• descriptor elimination • model validation • automatic model building • definition of applicability domain

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemical Reactions - Challenges

• modeling of chemical reactivity • prediction of the course of chemical reactions • synthesis design • prediction of metabolism/degradation (abiotic and biotic) • analysis of biochemical pathways

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture What is a ? COOH COOH O C O EC - Nr.: 4.1.3.7 CH + CH3 C 2 CH2 SCoA HO C COOH

COOH CH2 the bioinformatician COOH an event influenced by a gene, a protein

the computer scientist a context sensitive graph rewriting rule

the an event breaking and making bonds

C3 © Gasteiger et al. /slides/Biochemical_Pathways/Folien/CCC/gcb00.pptIntroduction into CI; SS 03/1st lecture C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Biochemical Pathways

C3 © Gasteiger et al. /slides/Biochemical_Pathways/Folien/CCC/roche_2.pptIntroduction into CI; SS 03/1st lecture Pathway Searching

4 2 + 5 + 6 NADPH NADP H H2O 8 Ribulose 6-Phospho 6-Phospho 3 3 2 5-phosphate 7 -gluconate -gluconolactone 9 CO2 5 7 4 + 4 6 NADPH H 5

10 11 Xylulose Ribose 1 5-phosphate 5-phosphate 2 NADP+

8 1 Glucose 12 13 Glyceraldehyde Sedoheptulose 14 6-phosphate 3-phosphate 7-phosphate maximize NADPH production

10 5[r10] 5[r12] 10[r14] 10[r1] 10[r2] 10[r3] 8[r4] 3[r6] 3[r8] 14 Erythrose 4-phosphate 2[c13] 20[c2] 10[c6] 1[c8] ---> 20[c4] 20[c5] 10[c9] 3[c12] 12 15

12 Glyceraldehyde 15 Fructose 3-phosphate 6-phosphate

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Prediction of Properties - Challenges

• physical

– spectra (CASE) – color of dyes etc

• chemical

– chemical reactivity • biological

– toxicity

• risk assessment (chemical + biological) REACH

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Application Areas for Chemoinformatics - Challenges

• drug design • analytical chemistry • chemical engineering • inorganic chemistry • medicinal chemistry • organic chemistry • physical chemistry • theoretical chemistry

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Teaching

Sheffield UMIST Strasbourg Erlangen Indiana University

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Textbooks on Chemoinformatics

• V. Gillet, A. Leach • J. Gasteiger, T. Engel • J. Bajorath

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemoinformatics - A Textbook -

J. Gasteiger, T. Engel (Editors)

650 pages

Wiley-VCH, Weinheim (September 2003)

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Teaching

• define curriculum in chemoinformatics

• what contents of chemoinformatics have to go into regular chemistry curricula

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Cooperation Industry - Academia

• industry: generate data

• academia: develop methods

provide academia access to data

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Funding

• increase awareness for importance of Chemoinformatics

• go into committees

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Get Organized!

Chemometrics Society

QSAR Society

FECS Working Party: Computational Chemistry

Chemoinformatics Society

C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture