History and Challenges of Chemoinformatics
Johann Gasteiger Computer-Chemie-Centrum University of Erlangen-Nürnberg D-91052 Erlangen, Germany
www2.chemie.uni-erlangen.de/
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Overview
• the scope of chemoinformatics
• the beginnings
• a field of ist own
• scientific challenges
• political challenges
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Synthesis of Properties
The most fundamental and lasting objective of synthesis is not production of new compounds but production of properties
George S. Hammond Norris Award Lecture, 1968
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Fundamental Questions in Chemistry
What structure do I need for a certain property? structure-activity relationships
How do I make this structure? synthesis design
What is the product of my reaction? reaction prediction structure elucidation
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture chemical physical property biological property property
chemical structure synthesis reaction planning structure prediction elucidation
starting materials
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemoinformatics - Why?
• complex relationships structure - biological activity chemical reactivity
• amount of information many millions of compounds and reactions many millions of publications
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Number of Compounds in Chemistry
compounds published in CAS
50
40
30
20
10 compounds (millions) 0 1965 1970 1975 1980 1985 1990 1995 2000 2005 year
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture From Data to Knowledge
deductive inductive learning learning know- generalization ledge
information context
measurement data calculation
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemoinformatics: Definition
„The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and organization.“
F. K. Brown, Annual Reports in Medicinal Chemistry 1998, 33, 375-384
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemoinformatics: Definition
The application of informatics methods to solve chemical problems
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture The Scope of Chemoinformatrics
• structure representation and searching • data analysis and chemometrics • molecular modeling • spectra analysis and structure elucidation • reaction representation and searching • reaction modeling and synthesis design
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Application Areas for Chemoinformatics
• drug design • analytical chemistry • chemical engineering • inorganic chemistry • medicinal chemistry • organic chemistry • physical chemistry • theoretical chemistry
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemoinformatics – An Old Discipline
• structure representation 1965, Morgan • structure elucidation 1965, Sasaki, Munk, DENDRAL • synthesis design 1970, Corey & Wipke, Ugi, Gelernter, Hendrickson • molecular modeling 1970, Langridge, Marshall • data analysis / chemometrics 1970, Kowalski, Wold
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Structure Representation
• European industry: BASF, Hoechst, ICI, Thomae, BASIC, IDC (1965 - ) • Wiswesser Line Notation (1969 - ) • Chemical Abstracts Service: Morgan Algorithm 1965 • Sheffield: M. Lynch, P. Willett (1970 - ) • Paris: J.E.Dubois, DARC (1970 - ) • Munich: I. Ugi, J. Gasteiger, C. Jochum (1972 -)
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Computer-Assisted Structure Elucidation
• DENDRAL: C.Djerassi, J. Lederberg, D.Feigenbaum (1965) • CHEMICS: S.Sasaki (1965) • M.Munk (1968) • C.Steinbeck (1998)
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Computer-Assisted Synthesis Design
1969 Corey + Wipke OCSS LHASA, SECS 1973 Ugi + Gasteiger CICLOPS WODCA, THERESA 1971 Hendrickson SYNGEN 1976 Bersohn SYNSUP 1977 Gelernter SYNCHEM 1985 Hanessian CHIRON 1988 Zefirov FLAMINGOES 1988 Sasaki + Funatsu AIPHOS
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Visualzation of Chemical Structures
LHASA 1970
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Data Analysis Methods
• Chemometrics: B.Kowalski (1970) • PLS: S. Wold (1978) • Self-organizing neural network: Kohonen (1983) • Backpropagation Algorithm: Rumelhart, Hinxton (1987)
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Databases
• Chemical Abstracts Service (1975) • DARC System (1980) • Cambridge CSD (1984) • Inorganic Structures Database (1985) • Beilstein (1990) • Gmelin (1990) • ChemInformRX (1991) • SpecInfo (1991)
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Common Topics: Structure Representation
• data storage and retrieval • property prediction • drug design • synthesis design • spectra analysis and prediction
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Common Topics: Data Analysis Methods
• property prediction • drug design • analytical chemistry • spectra analysis and prediction
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Common Topics
• representation of chemical structures • searching structures in databases • visualization of chemical structures • representation of chemical reactions • data analysis methods
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Handbook of Chemoinformatics From Data to Knowledge
J. Gasteiger (Editor)
65 authors 73 contributions 4 volumes 1900 pages Wiley-VCH, Weinheim (August 2003)
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemical Structures - Challenges
• representation of Markush structures (patents) • representation of polymers • conformational flexibility (bioactive conformation) • similarity searching (beyond fingerprints and Tanimoto coefficient)
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Proteins - Challenges
• scoring functions for docking • flexibility of proteins
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Bioinformatics Chemoinformatics
gene protein drug lead
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Information Acquisition - Challenges
• electroníc laboratory notebooks • publishing chemical information (3D structures, spectra) • publishing and searching on the internet • text mining • optical character recognition • input of chemical structure (hand writing, voice)
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Data Mining - Challenges
• descriptor elimination • model validation • automatic model building • definition of applicability domain
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemical Reactions - Challenges
• modeling of chemical reactivity • prediction of the course of chemical reactions • synthesis design • prediction of metabolism/degradation (abiotic and biotic) • analysis of biochemical pathways
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture What is a Chemical Reaction? COOH COOH O C O EC - Nr.: 4.1.3.7 CH + CH3 C 2 CH2 SCoA HO C COOH
COOH CH2 the bioinformatician COOH an event influenced by a gene, a protein
the computer scientist a context sensitive graph rewriting rule
the chemist an event breaking and making bonds
C3 © Gasteiger et al. /slides/Biochemical_Pathways/Folien/CCC/gcb00.pptIntroduction into CI; SS 03/1st lecture C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Biochemical Pathways
C3 © Gasteiger et al. /slides/Biochemical_Pathways/Folien/CCC/roche_2.pptIntroduction into CI; SS 03/1st lecture Pathway Searching
4 2 + 5 + 6 NADPH NADP H H2O 8 Ribulose 6-Phospho 6-Phospho 3 3 2 5-phosphate 7 -gluconate -gluconolactone 9 CO2 5 7 4 + 4 6 NADPH H 5
10 11 Xylulose Ribose 1 5-phosphate 5-phosphate 2 NADP+
8 1 Glucose 12 13 Glyceraldehyde Sedoheptulose 14 6-phosphate 3-phosphate 7-phosphate maximize NADPH production
10 5[r10] 5[r12] 10[r14] 10[r1] 10[r2] 10[r3] 8[r4] 3[r6] 3[r8] 14 Erythrose 4-phosphate 2[c13] 20[c2] 10[c6] 1[c8] ---> 20[c4] 20[c5] 10[c9] 3[c12] 12 15
12 Glyceraldehyde 15 Fructose 3-phosphate 6-phosphate
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Prediction of Properties - Challenges
• physical
– spectra (CASE) – color of dyes etc
• chemical
– chemical reactivity • biological
– toxicity
• risk assessment (chemical + biological) REACH
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Application Areas for Chemoinformatics - Challenges
• drug design • analytical chemistry • chemical engineering • inorganic chemistry • medicinal chemistry • organic chemistry • physical chemistry • theoretical chemistry
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Teaching
Sheffield UMIST Strasbourg Erlangen Indiana University
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Textbooks on Chemoinformatics
• V. Gillet, A. Leach • J. Gasteiger, T. Engel • J. Bajorath
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Chemoinformatics - A Textbook -
J. Gasteiger, T. Engel (Editors)
650 pages
Wiley-VCH, Weinheim (September 2003)
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Teaching
• define curriculum in chemoinformatics
• what contents of chemoinformatics have to go into regular chemistry curricula
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Cooperation Industry - Academia
• industry: generate data
• academia: develop methods
provide academia access to data
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Funding
• increase awareness for importance of Chemoinformatics
• go into committees
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture Get Organized!
Chemometrics Society
QSAR Society
FECS Working Party: Computational Chemistry
Chemoinformatics Society
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture