10/26/2016
Chemoinformatics as a theoretical chemistry discipline
Alexandre Varnek University of Strasbourg
BigChem lecture, 26 October 2016
Chemoinformatics: a new discipline …
Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization”
Frank Brown, 1998
2
1 10/26/2016
Chemoinformatics: definition
Chemoinformatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information G. Paris, 1998
Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization” F.K. Brown, 1998
Chemoinformatics is the application of informatics methods to solve chemical problems J. Gasteiger, 2004
Chemoinformatics is a field based on the representation of molecules as objects (graphs or vectors) in a chemical space A. Varnek & I. Baskin, 2011
Chemoinformatics: new disciline combining several „old“ fields
• Chemical databases Michael Lynch Peter Willett
• Structure‐Activity modeling (QSAR) Corwin Hansch Johann Gasteiger
• Structure‐based drug design Irwin D. Kuntz Hans‐Joachim Böhm
• Computer‐aided synthesis design
Elias Corey Ivar Ugi 4
2 10/26/2016
Selected books in chemoinformatics
Chemoinformatics: intersection of chemistry, computer science, mathematics, biology, material science, …
Is Chemoinformatics an individual scientific discipline or just a mixture of methods and concepts imported from different fields ?
3 10/26/2016
Chemoinformatics is defined as individual discipline characterized by its own molecular model, basic concepts, major applications and learning approach
7
OUTLOOK
• Needs in chemoinformatics
• 3 complementary modeling disciplines o Quantum Chemistry, FF modeling and Chemoinformatics –
• Fundamentals of Chemoinformatics o Chemical Space paradigm: graphs‐based and descriptors based CS o Modeling background: Machine learning methods.
• Chemoinformatics and "Sister" Disciplines o Machine Learning, Chemometrics and Bioinformatics 8
4 10/26/2016
Needs in Chemoinformatics
Big Data Challenge
. > 108 compounds are currently available
. 1033 drug‐like molecules could be synthesized (see P. Polischuk, T. Madzidov , A. Varnek., JCAMD, 2013)
Goal: to select few useful compounds from huge chemical database
5 10/26/2016
Screening: finding the needle in the haystack
CHEMICAL DATABASE
~106 –109 molecules
Chemoinformatics: pattern recognition in chemistry
CHEMICAL DATABASE model
~106 –109 molecules ‐ Specific structural motifs, ‐ Selected molecular properties (shape, fields, …), ‐ Interaction patterns, ‐ Mathematical equations Property= F (structure)
6 10/26/2016
Virtual screening “funnel”
Filters Similarity search CHEMICAL DATABASE (Q)SAR Pharmacophore Docking
VIRTUAL SCREENING
HITS ~106 –109 molecules ~101 –103 molecules
INACTIVES
Theoretical chemistry
Quantum Chemistry
Force Field Molecular Modelling
Chemoinformatics
7 10/26/2016
Theoretical chemistry
Quantum Chemistry ‐ Molecular model ‐ Basic concepts Force Field ‐ Major applications Molecular Modelling ‐ Learning approaches
Chemoinformatics
Molecular Model
Quantum Chemistry electrons and nuclei
Force Field atoms and bonds Molecular Modelling
• molecular graphs Chemoinformatics • descriptor vectors
8 10/26/2016
Basic concepts
Quantum Chemistry wave/particle dualism
Force Field classical mechanics Molecular Modelling
Chemoinformatics chemical space
Basic approaches
Quantum Chemistry Schrödinger equation, HF, DFT, …
Force Field implementation Force Field in molecular mechanics, Molecular Modelling and molecular dynamics
- graphs theory, - statistical learning Chemoinformatics - similarity/diversity
9 10/26/2016
Major applications
-interpretation of known phenomena Quantum Chemistry -property assessment in a very limited scale
-property assessment Force Field in a limited scale -interpretation of Molecular Modelling known phenomena
-storage, organisation and search of structures Chemoinformatics (chemical databases) - property / activity assessment
Direct link with a given property
Quantum Chemistry very limited number of properties
Force Field limited number Molecular Modelling of properties
Chemoinformatics any property
10 10/26/2016
Learning approach
• In chemoinformatics the logic of learning is not basedonexistingphysicaltheories. Chemoinformatics considers the world too complex to be aprioridescribed by any set of rules. Thus, the rules (models) in chemoinformatics are not explicitly taken from rigorous physical models, but learned inductively from the data.
Chemoinformatics: From Data to Knowledge
Deductive Inductive learning learning
know‐ ledge
Quantum Chemoinformatics Mecahnics information
data
11 10/26/2016
Organic chemistry: exercise of «intuitive» chemoinformatics
Chemical Space paradigm
Chemoinformatics is a field dealing with molecular objects (graphs, vectors) in chemical space
12 10/26/2016
Chemical Space paradigm
graphs-based descriptors -based
SPACE = objects + relations between them
Scaffolds and Frameworks
R‐groups
Bemis, G.W.; Murcko, M.A. J.Med.Chem 1996, 39, 2887‐2893
13 10/26/2016
The Scaffold Tree − Visualiza on of the Scaffold Universe by Hierarchical Scaffold Classification A. Schuffenhauer, P. Ertl, et al. J. Chem. Inf. Model., 2007, 47 (1), 47‐58
27
Scaffold tree for the results of pyruvate kinase assay. Color intensity represents the ratio of active and inactive molecules with these scaffolds.
A. Schuffenhauer, P. Ertl, S. Roggo, S. Wetzel, M. A. Koch, and H.Waldmann J. Chem. Inf. Model., 2007, 47 (1), 47‐58 28
14 10/26/2016
Maximal Common Substructure (MCS) similarity index