Chemoinformatics
History and Challenges of Chemoinformatics
Johann Gasteiger
Computer-Chemie-Centrum
University of Erlangen-Nürnberg
D-91052 Erlangen, Germany
www2.chemie.uni-erlangen.de/
C3 © Gasteiger et al. Introduction into CI; SS 03/1st lecture

Overview
• the scope of chemoinformatics
• the beginnings
• a field of its own
• scientific challenges
• political challenges

Synthesis of Properties
The most fundamental and lasting objective of synthesis is not production of new compounds but production of properties.
George S. Hammond, Norris Award Lecture, 1968

Fundamental Questions in Chemistry
• What structure do I need for a certain property? (structure-activity relationships)
• How do I make this structure? (synthesis design)
• What is the product of my reaction? (reaction prediction, structure elucidation)

[Diagram labels: chemical property, physical property, biological property; chemical structure; synthesis planning; reaction prediction; structure elucidation; starting materials]

Chemoinformatics - Why?
• complex relationships
  structure - biological activity, chemical reactivity
• amount of information
  many millions of compounds and reactions
  many millions of publications

Number of Compounds in Chemistry
[Figure: compounds published in CAS; y-axis: compounds (millions), 0 to 50; x-axis: year, 1965 to 2005]

From Data to Knowledge
[Diagram labels: measurement, calculation; data; context; information; generalization; knowledge; inductive learning; deductive learning]

Chemoinformatics: Definition
„The use of information technology and management has become a critical part of the drug discovery process.
Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and organization.“
F. K. Brown, Annual Reports in Medicinal Chemistry 1998, 33, 375-384

Chemoinformatics: Definition
The application of informatics methods to solve chemical problems

The Scope of Chemoinformatics
• structure representation and searching
• data analysis and chemometrics
• molecular modeling
• spectra analysis and structure elucidation
• reaction representation and searching
• reaction modeling and synthesis design

Application Areas for Chemoinformatics
• drug design
• analytical chemistry
• chemical engineering
• inorganic chemistry
• medicinal chemistry
• organic chemistry
• physical chemistry
• theoretical chemistry

Chemoinformatics – An Old Discipline
• structure representation: 1965, Morgan
• structure elucidation: 1965, Sasaki, Munk, DENDRAL
• synthesis design: 1970, Corey & Wipke, Ugi, Gelernter, Hendrickson
• molecular modeling: 1970, Langridge, Marshall
• data analysis / chemometrics: 1970, Kowalski, Wold

Structure Representation
• European industry: BASF, Hoechst, ICI, Thomae, BASIC, IDC (1965 - )
• Wiswesser Line Notation (1969 - )
• Chemical Abstracts Service: Morgan Algorithm 1965
• Sheffield: M. Lynch, P. Willett (1970 - )
• Paris: J. E. Dubois, DARC (1970 - )
• Munich: I. Ugi, J. Gasteiger, C. Jochum (1972 - )

Computer-Assisted Structure Elucidation
• DENDRAL: C. Djerassi, J.
Lederberg, E. Feigenbaum (1965)
• CHEMICS: S. Sasaki (1965)
• M. Munk (1968)
• C. Steinbeck (1998)

Computer-Assisted Synthesis Design
1969  Corey + Wipke      OCSS        LHASA, SECS
1973  Ugi + Gasteiger    CICLOPS     WODCA, THERESA
1971  Hendrickson        SYNGEN
1976  Bersohn            SYNSUP
1977  Gelernter          SYNCHEM
1985  Hanessian          CHIRON
1988  Zefirov            FLAMINGOES
1988  Sasaki + Funatsu   AIPHOS

Visualization of Chemical Structures
LHASA 1970

Data Analysis Methods
• Chemometrics: B. Kowalski (1970)
• PLS: S. Wold (1978)
• Self-organizing neural network: Kohonen (1983)
• Backpropagation Algorithm: Rumelhart, Hinton (1987)

Databases
• Chemical Abstracts Service (1975)
• DARC System (1980)
• Cambridge CSD (1984)
• Inorganic Structures Database (1985)
• Beilstein (1990)
• Gmelin (1990)
• ChemInformRX (1991)
• SpecInfo (1991)

Common Topics: Structure Representation
• data storage and retrieval
• property prediction
• drug design
• synthesis design
• spectra analysis and prediction

Common Topics: Data Analysis Methods
• property prediction
• drug design
• analytical chemistry
• spectra analysis and prediction

Common Topics
• representation of chemical structures
• searching structures in databases
• visualization of chemical structures
• representation of chemical reactions
• data analysis methods

Handbook of Chemoinformatics
From Data to Knowledge
J. Gasteiger (Editor)
65 authors, 73 contributions, 4 volumes, 1900 pages
Wiley-VCH, Weinheim (August 2003)
Chemical Structures - Challenges
• representation of Markush structures (patents)
• representation of polymers
• conformational flexibility (bioactive conformation)
• similarity searching (beyond fingerprints and Tanimoto coefficient)

Proteins - Challenges
• scoring functions for docking
• flexibility of proteins

Bioinformatics / Chemoinformatics
[Diagram labels: gene, protein, drug, lead]

Information Acquisition - Challenges
• electronic laboratory notebooks
• publishing chemical information (3D structures, spectra)
• publishing and searching on the internet
• text mining
• optical character recognition
• input of chemical structure (handwriting, voice)

Data Mining - Challenges
• descriptor elimination
• model validation
• automatic model building
• definition of applicability domain

Chemical Reactions - Challenges
• modeling of chemical reactivity
• prediction of the course of chemical reactions
• synthesis design
• prediction of metabolism/degradation (abiotic and biotic)
• analysis of biochemical pathways

What is a Chemical Reaction?
[Figure: reaction scheme, EC No.: 4.1.3.7]
• the bioinformatician: an event influenced by a gene, a protein
• the computer scientist: a context-sensitive graph rewriting rule
• the chemist: an event breaking and making bonds

Biochemical Pathways
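As an illustration of the "What is a Chemical Reaction?" slide, the computer scientist's view (a graph rewriting rule) and the chemist's view (bonds broken and made) can be combined in a few lines. This is a minimal sketch and not taken from the lecture: the names `bond` and `apply_rule` and the H-Br addition example are hypothetical, atoms are plain labels, and the hydrogens on the carbon atoms are left implicit.

```python
from typing import Dict, FrozenSet, Tuple

Bond = FrozenSet[str]
Molecule = Dict[Bond, int]          # bond -> bond order (absent means no bond)
Rule = Dict[Bond, Tuple[int, int]]  # bond -> (order before, order after)


def bond(a: str, b: str) -> Bond:
    """Undirected bond between two atom labels."""
    return frozenset((a, b))


def apply_rule(bonds: Molecule, rule: Rule) -> Molecule:
    """Rewrite the bond graph; raise ValueError if the rule's left-hand side does not match."""
    # Context check: every bond named in the rule must have the expected order.
    for b, (before, _after) in rule.items():
        if bonds.get(b, 0) != before:
            raise ValueError(f"rule does not match at bond {sorted(b)}")
    # Rewriting step: break, make, or change the order of the listed bonds.
    product = dict(bonds)
    for b, (_before, after) in rule.items():
        if after == 0:
            product.pop(b, None)   # bond broken
        else:
            product[b] = after     # bond made or order changed
    return product


# Hypothetical example: addition of H-Br across a C=C double bond.
educts = {bond("C1", "C2"): 2, bond("H", "Br"): 1}
hbr_addition = {
    bond("C1", "C2"): (2, 1),  # double bond becomes single
    bond("H", "Br"): (1, 0),   # H-Br bond is broken
    bond("C1", "H"): (0, 1),   # C-H bond is made
    bond("C2", "Br"): (0, 1),  # C-Br bond is made
}
product = apply_rule(educts, hbr_addition)
```

Applying the same rule to `product` raises `ValueError`, because the required C=C and H-Br bonds are no longer present; that check is what makes the rule context-sensitive in this sketch.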
Pathway Searching
[Figure: pathway network connecting glucose 6-phosphate, 6-phosphogluconolactone, 6-phosphogluconate, ribulose 5-phosphate, xylulose 5-phosphate, ribose 5-phosphate, glyceraldehyde 3-phosphate, sedoheptulose 7-phosphate, erythrose 4-phosphate, and fructose 6-phosphate, with NADP+, NADPH, H+, H2O, and CO2]
maximize NADPH production
5[r10] 5[r12] 10[r14] 10[r1] 10[r2] 10[r3] 8[r4] 3[r6] 3[r8]
2[c13] 20[c2] 10[c6] 1[c8] ---> 20[c4] 20[c5] 10[c9] 3[c12]

Prediction of Properties - Challenges
• physical
  – spectra (CASE)
  – color of dyes etc.
• chemical
  – chemical reactivity
• biological
  – toxicity
• risk assessment (chemical + biological): REACH

Application Areas for Chemoinformatics - Challenges
• drug design
• analytical chemistry
• chemical engineering
• inorganic chemistry
• medicinal chemistry
• organic chemistry
• physical chemistry
• theoretical chemistry

Teaching
• Sheffield
• UMIST
• Strasbourg
• Erlangen
• Indiana University

Textbooks on Chemoinformatics
• V. Gillet, A. Leach
• J. Gasteiger, T. Engel
• J. Bajorath

Chemoinformatics - A Textbook
J. Gasteiger, T. Engel (Editors)
650 pages
Wiley-VCH, Weinheim (September 2003)

Teaching
• define curriculum in chemoinformatics
• what contents of chemoinformatics have to go into regular chemistry curricula
Cooperation Industry - Academia
• industry: generate data
• academia: develop methods
provide academia access to data

Funding
• increase awareness for importance of Chemoinformatics
• go into committees

Get Organized!
• Chemometrics Society
• QSAR Society
• FECS Working Party: Computational Chemistry
• Chemoinformatics Society