Lecture 1 Pharmacophores & Molecular Similarity

Invited Guest Professorship Université Louis Pasteur, Strasbourg

Prof. Dr. Gisbert Schneider Goethe-University, Frankfurt

18 November 2008, (c) G. Schneider

Transmutation of species

“[..] Thus between A & B immense gap of relation. C & B the finest gradation, B & D rather greater distinction. Thus genera would be formed. — bearing relation”

Charles R. Darwin (*1809), Notebook B: Transmutation of Species (1837-1838), p. 36. Source: www.darwin-online.org.uk

3 Representations of Aspirin®

[Figure: structural formula of Aspirin® with atoms numbered 1 to 13]

4 Discrete Approximation: Atom

Corey-Pauling-Koltun (CPK) model

• Van der Waals radius: “effective size” (in Ångström; 1 Å = 10^-10 m)

• Atom = “hard sphere”

Element   r / Å
H         1.2
F         1.35
O         1.4
N         1.55
C         1.7
S         1.85
Cl        1.8
P         1.9
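The hard-sphere picture can be turned into a quick size estimate. The sketch below uses the radii from this slide; the function names and the choice to simply sum isolated-sphere volumes (ignoring the overlap of bonded atoms, so the result is only an upper bound) are assumptions made for illustration.

```python
import math

# Van der Waals radii in Å, taken from the slide.
VDW_RADIUS = {"H": 1.2, "F": 1.35, "O": 1.4, "N": 1.55,
              "C": 1.7, "S": 1.85, "Cl": 1.8, "P": 1.9}


def sphere_volume(element):
    """Volume (in Å^3) of the hard sphere assigned to one atom."""
    r = VDW_RADIUS[element]
    return 4.0 / 3.0 * math.pi * r ** 3


def upper_bound_volume(formula):
    """Sum of isolated-sphere volumes; overlap between bonded atoms is ignored."""
    return sum(n * sphere_volume(el) for el, n in formula.items())


aspirin = {"C": 9, "H": 8, "O": 4}  # C9H8O4
print(round(upper_bound_volume(aspirin), 1))
```

A real CPK volume is considerably smaller than this sum, because bonded spheres interpenetrate.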

5 Discrete Approximation: Surface

Solvent accessible surface (SAS) Connolly surface vdW surface

Probe Sphere

H2O: r = 1.4 Å

Lee-Richards surface

6 Molecular Interaction: Shape Complementarity

[Figure: methotrexate (ligand) bound in the receptor pocket of dihydrofolate reductase (DHFR)]

Protein Data Bank (PDB) entry: 3dfr

www.pdb.org

7 Concept of Local Atom Environments

Atom properties are determined by their local environment.

topological environment (molecular graph)

spatial (3D) environment (3D model, conformation)

[Figure: oxygen atoms with different properties; carbon atoms with different properties]

8 What is a Pharmacophore?

[Figure: Aspirin® shown at increasing “fuzziness”]

• Pharmacophoric representation: scaffold-hopping on different levels of abstraction

9 Pharmacophore Definition

GLOSSARY OF TERMS USED IN MEDICINAL CHEMISTRY (IUPAC Recommendations 1998) http://www.chem.qmul.ac.uk/iupac/medchem/

Pharmacophore (pharmacophoric pattern): A pharmacophore is the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response. (A pharmacophore does not represent a real molecule or a real association of functional groups, but a purely abstract concept that accounts for the common molecular interaction capacities of a group of compounds towards their target structure.)

Pharmacophoric descriptors: Pharmacophoric descriptors are used to define a pharmacophore, including H-bonding, hydrophobic and electrostatic interaction sites, defined by atoms, ring centers and virtual points.

10 Pharmacophoric Features

[Figure: ligand and receptor with complementary interaction points]

Potential Pharmacophore Points (PPP)

• H-bond donors
• H-bond acceptors
• Lipophilic
• Positive
• Negative

• e.g., bitstring representation: 00010100010110011
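Once molecules are encoded as bitstrings, they can be compared bit by bit. The slide does not name a similarity index; the Tanimoto coefficient used below is a common choice and is an assumption here, as are the function name and the second example string.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two equally long '0'/'1' strings."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bitstrings must have the same length")
    pairs = list(zip(bits_a, bits_b))
    both = sum(1 for a, b in pairs if a == "1" and b == "1")
    either = sum(1 for a, b in pairs if a == "1" or b == "1")
    # Two empty fingerprints are treated as identical (a convention, assumed).
    return both / either if either else 1.0


query = "00010100010110011"      # bitstring from the slide
candidate = "00010100000110010"  # hypothetical second molecule
print(tanimoto(query, candidate))
```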

11 Pharmacophoric Types of Functional Groups

[Figure: example functional groups for each pharmacophoric type]

• Donor
• Acceptor
• Donor + Acceptor
• Acid (negative ionizable)
• Base (positive ionizable)
• Atoms excluded (non-pharmacophoric)

12 Extended List of Pharmacophoric Points

[Figure: substructure definitions for each class]

• Hydrogen bond donors: O-H and N-H groups (N sp3, sp2 or aromatic)
• Hydrogen bond acceptors: any O, except NO2; N (sp3, sp2 or aromatic)
• Positively charged or ionizable: sp3 N
• Negatively charged or ionizable: acidic OH groups on C, S, P
• Lipophilic: C, F, Cl, Br, I, -S-

13 Scaffold-hopping: From function to molecular structure

Exploit multiple binding behavior of targets

QSAR Comb. Sci. (2003) 22:713

[Figure: structurally different PPARγ ligands, e.g. GW9662 and pioglitazone]

14 Re-purposing: From molecular structure to function

Exploit multiple binding behavior of ligands

J. Med. Chem. (2002) 45:137

Genistein:
• Phosphodiesterase-III
• CF conductance regulator
• α-glucosidase
• L-type Ca2+ channel
• GABA-A
• Topoisomerase II
• Estrogen receptor beta

Olanzapine (Zyprexa®):
• H1 receptor
• Dopamine D1
• Dopamine D3
• Dopamine D4.2
• α-1 adrenoceptor
• α-2 adrenoceptor
• Serotonin 5-HT2A
• Serotonin 5-HT2C
• Muscarinergic receptor

15 How to Find More Chemotypes

1) Ask a medicinal chemist

2) “Fuzzify” the compound sampling procedure

• Fuzzy pharmacophores
• Shape-based / Sampling

3) Use natural product-derived building blocks

4) Employ “de novo” design

Schneider et al. (2006) QSAR Comb. Sci. 22:713.
Renner & Schneider (2004) J. Med. Chem. 47:4653.

16 Applications: Scaffold-hopping

Calcium T-channel blocker

Angew. Chem. Int. Ed. (1999) 38:2894

[Figure: mibefradil and a structurally different compound retrieved with CATS]

Kv1.5 potassium channel blocker

Angew. Chem. Int. Ed. (2000) 39:4130

[Figure: query structure and a structurally different compound retrieved with CATS]

17 CATS: Chemically Advanced Template Search

18 CATS: Topological Pharmacophores

[Figure: structure → PPP assignment (A, D, P, L labels on the molecular graph) → topological distances between PPP pairs → histogram over bins 1 … 150, x = {0.2, 0.1, 0.3, ...}]

• Count all atom pairs (15 pair types x 10 distances = 150 bins)
• Scale by number of non-H atoms
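The two counting steps can be sketched in a few lines. This is a simplified illustration, not the published CATS implementation: the five PPP types (D, A, P, N, L), the distance range of 0 to 9 bonds, the input format and the toy molecule are assumptions. Distances are passed in as a precomputed matrix.

```python
from itertools import combinations_with_replacement

PPP_TYPES = ["D", "A", "P", "N", "L"]                        # assumed type set
PAIRS = list(combinations_with_replacement(PPP_TYPES, 2))    # 15 pair types
N_DIST = 10                                                  # distances 0..9 bonds
ORDER = {t: k for k, t in enumerate(PPP_TYPES)}


def canon(a, b):
    """Order a type pair so that (A, D) and (D, A) fall into the same bin."""
    return (a, b) if ORDER[a] <= ORDER[b] else (b, a)


def cats_vector(atom_types, dist):
    """atom_types: one set of PPP types per non-H atom; dist[i][j] in bonds."""
    counts = {(pair, d): 0 for pair in PAIRS for d in range(N_DIST)}
    n = len(atom_types)
    for i in range(n):
        for j in range(i, n):
            d = dist[i][j]
            if d >= N_DIST:
                continue
            combos = [canon(a, b) for a in atom_types[i] for b in atom_types[j]]
            if i == j:
                combos = set(combos)  # an atom paired with itself: each pair once
            for pair in combos:
                counts[(pair, d)] += 1
    # Scale by the number of non-H atoms.
    return [counts[(pair, d)] / n for pair in PAIRS for d in range(N_DIST)]


# Toy molecule: a linear chain of four heavy atoms, so the distance is |i - j|.
types = [{"D", "A"}, {"L"}, {"L"}, {"A"}]
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
vec = cats_vector(types, dist)
print(len(vec))
```

Bin k of the result corresponds to pair type `PAIRS[k // 10]` at distance `k % 10`.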

19 Shortest Paths: Flood-fill

[Figure: molecular graph with atoms A and B marked]

d_AB = 8 bonds
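A flood-fill over the molecular graph is a breadth-first search: starting from one atom, the "wave front" reaches every other atom after exactly as many steps as there are bonds on the shortest path. The sketch below assumes the graph is given as an adjacency dictionary; the toy graph (a six-membered ring with a three-atom chain) is hypothetical and is not the molecule on the slide.

```python
from collections import deque


def bond_distances(adjacency, start):
    """Shortest path length (in bonds) from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:       # first visit is the shortest path
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


# Toy graph: ring atoms 0-5, chain 5-6-7-8.
ADJACENCY = {
    0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5],
    5: [4, 0, 6], 6: [5, 7], 7: [6, 8], 8: [7],
}
print(bond_distances(ADJACENCY, 2)[8])
```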

20 “Fuzzification” of the CATS2D Descriptor

[Figure: two histograms, “Original” and “Fuzzy”; x-axis: Distance / bonds (0 to 5); y-axis: Counts (0 to 4)]

Counts(bin + 1) = f ⋅ Counts(bin)
Counts(bin − 1) = f ⋅ Counts(bin)
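One possible reading of the fuzzification rule is that every bin passes a fraction f of its count to its two neighboring distance bins while keeping its own count. The sketch below implements exactly that reading; whether the original bin is kept unchanged, and the value of f, are assumptions.

```python
def fuzzify(counts, f):
    """Add a fraction f of each bin's count to the adjacent distance bins."""
    fuzzy = [float(c) for c in counts]
    for b, c in enumerate(counts):
        if b > 0:
            fuzzy[b - 1] += f * c
        if b + 1 < len(counts):
            fuzzy[b + 1] += f * c
    return fuzzy


original = [0, 0, 4, 0, 0]       # counts for distances 0..4 bonds (toy data)
print(fuzzify(original, 0.5))
```

With this smoothing, two molecules whose features are one bond further apart still share part of their descriptor, which is the intended tolerance for scaffold hopping.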

• no significant overall improvement of enrichment of actives in a focused library
• can be helpful for individual searches (scaffold hopping)

21 CATS2D: A Ranked List

Query: Haloperidol (D2 antagonist)

Ki (D1) = 270 nM
Ki (D3) = 21 nM
Ki (D4.2) = 11 nM
Ki (5-HT2A) = 25 nM
Ki (α1) = 19 nM
Ki (H1) = 730 nM

[Figure: structures of the ten top-ranked compounds (1 to 10). Annotations shown: 5-HT2C antagonist, H3 antagonist, GABA transporter type I, TNF-α inhibitor, PPAR-γ, Eliprodil (ion channel), and several D2 ligands]

22 Similarity Searching → Complementary Results

Target         Descriptors compared          Hits common to all three (∩)
COX-2          CATS2D, CATS3D, Charge3D      6
MMP            CATS2D, CATS3D, Charge3D      1
HIV-Protease   CATS2D, CATS3D, Charge3D      0

23 Ligand Flexibility & Receptor Shape

• COX-2 (1CX2): buried, narrow
• MMP3 (1D5J): shallow, solvent-exposed
• HIV-Prot (1HSG): buried “tunnel”

Red: crystal structure of L-735,524
Green: CORINA model

24 Identification of Natural Product-Derived 5-Lipoxygenase Inhibitors

[Figure: α-santonin and a derived scaffold with substituents R1, R2, R3; source plant: mugwort (Artemisia vulgaris)]

[Figure: structure of the identified inhibitor]

EC50 ≈ 0.8 µM (5-LOX)

25 3-Point Pharmacophores

26 Topological 3-Point Pharmacophores (3PP)

Greene et al. (1994) JCICS 34:1297.
Good & Kuntz (1995) JCAMD 9:373.
McGregor & Muskal (1999) JCICS 39:569.

[Figure: 2D structure → molecular graph → PPP assignment → distance assignment (in bonds; example triangle with d = 3, d = 4, d = 5) → molecular fingerprint (bitstring), x = 10010 ...]
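The pipeline from PPP assignment to fingerprint can be sketched as follows. Each triplet of PPP-typed atoms is described by its three types and three topological distances, written in a canonical order so that atom numbering does not matter. Returning the set of triplet keys (a sparse fingerprint) instead of a fixed-length bitstring, the `max_dist` cutoff and the toy molecule are assumptions made for illustration.

```python
from itertools import combinations, permutations


def three_point_keys(ppp, dist, max_dist=8):
    """ppp: atom index -> PPP type; dist[i][j]: topological distance in bonds."""
    keys = set()
    for trio in combinations(sorted(ppp), 3):
        if any(dist[a][b] > max_dist for a, b in combinations(trio, 2)):
            continue
        # All six vertex orders describe the same triangle; keep the smallest.
        variants = [
            (ppp[a], ppp[b], ppp[c], dist[a][b], dist[a][c], dist[b][c])
            for a, b, c in permutations(trio)
        ]
        keys.add(min(variants))
    return keys


# Toy molecule: linear chain of six atoms with three PPP-typed atoms.
dist = [[abs(i - j) for j in range(6)] for i in range(6)]
ppp = {0: "D", 3: "A", 5: "L"}
print(three_point_keys(ppp, dist))
```

Each distinct key would correspond to one bit position in a bitstring such as x = 10010 ...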

27 Calculation of Feature Relevance

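$$R_i = f\big(x(F_i = 1)\big) - f\big(x(F_i = 0)\big)$$

First term: bit set. Second term: bit not set.

x: molecular fingerprint with features F

[Figure: PPP assignment → feature weighting (example: R_i = 3, R_j = 2; atoms contributing to both features receive 2 + 3 = 5) → visualization]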
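The relevance of feature i is the change in the model output when bit i is switched on versus off, with all other bits unchanged. In the sketch below, f is a hypothetical linear scoring function chosen only to keep the example transparent; the slide does not specify the model behind f.

```python
def relevance(f, x, i):
    """R_i = f(x with bit i set) - f(x with bit i not set)."""
    x_set = list(x)
    x_set[i] = 1
    x_unset = list(x)
    x_unset[i] = 0
    return f(x_set) - f(x_unset)


WEIGHTS = [3.0, -1.0, 2.0, 0.0]  # hypothetical model parameters


def linear_model(x):
    """Stand-in for a trained classifier's decision function (assumption)."""
    return sum(w * b for w, b in zip(WEIGHTS, x))


fingerprint = [1, 0, 1, 1]
print([relevance(linear_model, fingerprint, i) for i in range(4)])
```

For a linear model the relevance simply recovers the weight of each bit; for a non-linear model it depends on the other bits of x as well.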

Franke et al. (2005) J. Med. Chem. 48:6997.
Byvatov et al. (2005) ChemBioChem 6:997.

28 3-Point Pharmacophore Screening: COX-2 Ligands (1)

[Figure: COX-2 pharmacophore shown on SC-558 and celecoxib; labels: sulfonyl group, planar, hydrophobic, ring A, ring B, H-bond acceptor; Palomer et al. (2003)]

29 3-Point Pharmacophore Screening: COX-2 Ligands (2)

Suggested compounds

[Figure: structures of compounds 1 to 16, including the positive controls diclofenac, celecoxib and rofecoxib. IC50 values shown next to individual structures: 8 ± 2 µM, 0.2 ± 0.3 µM, 5 ± 1 nM, 6 ± 1 µM, 15 ± 3 µM]

30 3-Point Pharmacophore Screening: COX-2 Ligands (3)

Automated docking of molecule 5 (1cx2, GOLD)

[Figure: docked pose of molecule 5 compared with SC-558; His90 labeled]

• new COX-2 inhibitor 5 with benzimidazole scaffold
• higher activity than coxibs in cellular assay
• predicted binding mode similar to SC-558
• but: no true scaffold-hop!

Franke et al. (2005) J. Med. Chem. 48:6997.

31 Why?

• Diversity of reference ligands: “Mickey Mouse” scaffold, “privileged motif”; only 14 scaffolds (in 94 compounds). Biased reference data?

• Shape of the COX-2 binding pocket: narrow, small, limited binding possibilities. “Bad” choice of target?

• Descriptor level of abstraction: too “atomistic”, too fine-grained?

32 LIQUID: Ligand-based Quantification of Interaction Distributions

33 “Fuzzy” Pharmacophores: LIQUID

• Ligand-based Quantification of Interaction Distributions
• Fuzzy pharmacophore models based on trivariate Gaussians

34 The problem: Spherical PPPs are not appropriate for planar structures

a solution: Trivariate Gaussians
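Why a trivariate Gaussian helps with planar groups can be shown numerically. The sketch below evaluates a trivariate Gaussian whose covariance matrix is diagonal, i.e. the coordinates are already expressed along the principal axes; that restriction, the function name and the standard deviations are assumptions chosen for illustration.

```python
import math


def triv_gauss(x, mu, sigma):
    """Trivariate Gaussian with diagonal covariance.

    x, mu: 3D points; sigma: standard deviations along the three principal axes.
    """
    norm = (2.0 * math.pi) ** 1.5 * sigma[0] * sigma[1] * sigma[2]
    expo = sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mu, sigma))
    return math.exp(-0.5 * expo) / norm


center = (0.0, 0.0, 0.0)
above = (0.0, 0.0, 1.0)          # 1 Å above the ring plane
flat = (1.2, 1.2, 0.3)           # planar PPP, e.g. an aromatic ring (assumed values)
ball = (1.2, 1.2, 1.2)           # spherical PPP of the same in-plane extent

ratio_flat = triv_gauss(above, center, flat) / triv_gauss(center, center, flat)
ratio_ball = triv_gauss(above, center, ball) / triv_gauss(center, center, ball)
print(ratio_flat, ratio_ball)
```

The flat Gaussian has almost vanished 1 Å above the plane, whereas the spherical one still reports most of its peak density there, which is the over-extension a spherical PPP produces for planar structures.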

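Univariate distribution:

$$\mathrm{univG}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Bivariate distribution:

$$\mathrm{bivG}(x_1, x_2) = \frac{1}{2\pi\,\sigma_1\sigma_2} \exp\left(-\frac{1}{2}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}\right]\right)$$

Trivariate distribution:

$$\mathrm{trivG}(\mathbf{x}) = \frac{1}{(2\pi)^{3/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right)$$

LIQUID step 1: Calculation of local feature density (LFD)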

Atom typing:

Lipophilic (L): {C![(N),(O)], S![(H),(N),(O)], Cl, I, Br}
Donor (D): {OH, NH}
Acceptor (A): {O, N![H]}

$$\mathrm{LFD}^{Type}(atom_k) = \sum_{i=1}^{N} \max\left(0,\; 1 - \frac{D_2(atom_k^{Type},\, atom_i^{Type})}{r_c}\right)$$

N: number of atoms of type “Type”
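The local feature density is a distance-weighted neighbor count: every atom of the same type within the cluster radius contributes a value between 0 and 1 that decreases linearly with distance. The sketch below assumes D2 is the Euclidean distance and that the sum includes the atom itself (which contributes 1); the coordinates are toy data.

```python
import math


def lfd(coords, k, r_c):
    """Local feature density of atom k among atoms of the same type.

    coords: 3D coordinates of all atoms of one type; r_c: cluster radius in Å.
    """
    return sum(max(0.0, 1.0 - math.dist(coords[k], p) / r_c) for p in coords)


lipophilic = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print([lfd(lipophilic, k, r_c=4.0) for k in range(3)])
```

Atoms in the middle of a dense patch of equally typed atoms obtain the highest LFD and are natural cluster representatives.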

r_c: cluster radius (L: 4 Å, D/A: 1.9 Å)

36 LIQUID step 2: Clustering of atom types

• Union-Find strategy

INIT: each atom is a singleton.

FOR each atom i of type T
    FOR each atom j of type T
        calculate Distance(i, j)
        IF Distance ≤ ClusterRadius rc THEN
            FIND maxLFD(Cluster i)
            FIND maxLFD(Cluster j)
            IF maxLFD(Cluster i) ≤ maxLFD(Cluster j) THEN
                UNION Cluster j with Cluster i

• number of final clusters depends on rc
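A runnable sketch of the Union-Find clustering is given below. It follows the pseudocode in spirit: atoms of one type closer than r_c are merged, and the atom with the highest LFD stays the representative (root) of the merged cluster. The handling of the case where cluster i has the larger LFD, the use of path halving, and the toy data are assumptions.

```python
import math


def cluster_atoms(coords, lfd, r_c):
    """Cluster atoms of one type; returns {representative atom: member atoms}."""
    parent = list(range(len(coords)))   # INIT: each atom is a singleton

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= r_c:
                root_i, root_j = find(i), find(j)
                if root_i == root_j:
                    continue
                # The root always carries the maximum LFD of its cluster.
                if lfd[root_i] <= lfd[root_j]:
                    parent[root_i] = root_j
                else:
                    parent[root_j] = root_i

    clusters = {}
    for i in range(len(coords)):
        clusters.setdefault(find(i), []).append(i)
    return clusters


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
densities = [1.0, 2.0, 1.5, 1.0]        # toy LFD values
print(cluster_atoms(coords, densities, r_c=1.9))
```

Atoms 0 and 2 are 2 Å apart, further than r_c, but end up in the same cluster because both are linked to atom 1; a larger r_c merges more atoms and yields fewer clusters.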

37 LIQUID step 3: Calculation of cluster centroids

• Cluster ≡ PPP

$$gc(PPP_k) = \left(\frac{1}{n}\sum_{j}^{n} x_j,\; \frac{1}{n}\sum_{j}^{n} y_j,\; \frac{1}{n}\sum_{j}^{n} z_j\right)^{T}$$

x, y, z: Cartesian atom coordinates

38 LIQUID step 4: Principal Component Analysis

• PCs computed with NIPALS
• PCs of unit length → scaling of PCs by standard deviation
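Steps 3 and 4 can be sketched in plain Python: the centroid is the coordinate mean of the cluster members, and a NIPALS-style iteration extracts the first principal component together with the standard deviation along it. Only the first component is computed here (further components would require deflating the data), and the iteration limit, tolerance and toy coordinates are assumptions. The cluster needs at least two distinct points.

```python
import math


def centroid(points):
    """Geometric center of a cluster of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def first_pc_nipals(points, n_iter=100, tol=1e-12):
    """First principal component (unit vector) and standard deviation along it."""
    c = centroid(points)
    X = [[p[k] - c[k] for k in range(3)] for p in points]   # mean-centered data
    # Start the score vector from the coordinate with the largest variance.
    col = max(range(3), key=lambda k: sum(row[k] ** 2 for row in X))
    t = [row[col] for row in X]
    p = [0.0, 0.0, 0.0]
    for _ in range(n_iter):
        tt = sum(v * v for v in t)
        p_new = [sum(row[k] * tv for row, tv in zip(X, t)) / tt for k in range(3)]
        norm = math.sqrt(sum(v * v for v in p_new))
        p_new = [v / norm for v in p_new]                   # PC of unit length
        t = [sum(row[k] * p_new[k] for k in range(3)) for row in X]
        converged = sum((a - b) ** 2 for a, b in zip(p, p_new)) < tol
        p = p_new
        if converged:
            break
    std = math.sqrt(sum(v * v for v in t) / (len(X) - 1))
    return p, std


cluster = [(-2.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
           (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pc, std = first_pc_nipals(cluster)
print(centroid(cluster), pc, std)
```

Multiplying the unit-length component by `std` gives the axis length used to shape the Gaussian along that direction.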

39 LIQUID step 5: Encoding as a correlation vector

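[Figure: two PPPs of type A and type B separated by distance d, contributing to distance bin d]

$$CV_d^{A,B} = \frac{1}{\#\,\mathrm{pairs}(A,B)} \sum_{i}^{A} \sum_{j}^{B} \frac{1}{2} \cdot \{\mathrm{trivG}_i \cdot \mathrm{trivG}_j\}$$

A, B: PPP types
i, j: PPP instances

alignment-free descriptor vector

d_max = 20 Å

40 A Tough Target: TAR RNA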

[Figure: TAR RNA secondary structure (nucleotides G16 to C46, with bulge) and 3D structure with bound acetylpromazine; PDB entry 1lvj, model 1 (Du et al., 2002)]

• very flexible target
• only poor ligands known
• “tough but typical”

41 RNA Ligand Pharmacophores (1)

[Figure: screening workflow with the labels: 20 x re-docking (ligand from 1LVJ), SPECS DB, docking of top 100, cherry picking, FRET assay]

42 RNA Ligand Pharmacophores (2)

[Figure: hit structures from the vendor library; assay labels: TAR, in vitro transcription/translation; N = 13]

Most prevalent scaffolds (top 2000 cpds.), occurrence counts: 95, 79, 52, 49, 49, 45, 39, 35, 34, 34

43 RNA Ligand Pharmacophores (3)

[Figure: chloramphenicol, clindamycin and tiamulin, with hit structures from the vendor library; assay labels: TAR, in vitro transcription/translation; N = 19]

44 Virtual Screening Strategy (PPARγ)

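[Figure: PPARγ binding site with superimposed ligands; labeled elements: AF2-Helix, Helix 3, His323, Ser289, Tyr473, His449]

Superimposed ligands:
• 1nyx: Ragaglitazar
• 1knu: YPA
• 1i7i: Tesaglitazar
• 1zgy: Rosiglitazone
• Farglitazar

[Figure: structure of the identified compound]

EC50 = 15 µM (PPARγ)

PhAST: The Pharmacophore Alignment Search Tool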

• PhAST: Pharmacophore Alignment Search Tool
  – Method for text-based comparison of molecules
  – Uses a 2D pharmacophore model

possible interactions                                            PPP symbol
hydrogen bond acceptor                                           A
hydrogen bond donor                                              D
charge positive                                                  P
charge negative                                                  N
lipophilic                                                       L
aromatic                                                         R
hydrogen bond acceptor, hydrogen bond donor                      E
hydrogen bond acceptor, charge positive                          Q
hydrogen bond acceptor, hydrogen bond donor, charge positive     U
no possible interactions                                         O

47 PhAST: The Concept

• Assign PPPs to atoms → 2D-PPP-graph
• Create sequence of PPP symbols
  – Canonical labeling of vertices
  – Combine vertex symbols following their indices
• Compare molecules → compare sequences

Similarity score between 0 (no similarity) and 1 (identical molecules)

[Figure: example molecule, its 2D-PPP-graph with canonical vertex indices 1 to 21, and the resulting sequence]

Example sequences:
LLQQOOLOORRRRRRRRRRRR
QORRLRRRROLRRRRQRROOL
QORRLRRRROLRRRRQLRROOL

48 PhAST: The Algorithms

• Algorithm for canonical labeling: Weininger et al. (1988)

• Similarity between sequences X and Y, S(X,Y), is computed by

– Pairwise global sequence alignment (Needleman & Wunsch, 1970)

$$S(X,Y) = \frac{M(A(X,Y))}{L(A(X,Y))}$$

A(X,Y): alignment of X and Y
M(A(X,Y)): number of matches in A(X,Y)
L(A(X,Y)): length of A(X,Y)

PhAST: PPP Frequencies in Drugs and Lead Structures

PPP type   count     frequency
R          98,786    40.4%
P          325       0.1%
N          2,569     1.1%
E          6,572     2.7%
A          10,473    4.3%
U          9,025     3.7%
Q          5,933     2.4%
L          49,853    20.4%
O          60,969    24.9%

Data from COBRA 8.2

50 PhAST: Scoring Matrix

      A    E    R    L    N    O    P    Q    U
A     8    2   -4   -2   -1   -2   -4    4   -2
E         12   -8   -4   -9   -4   -6   -4    0
R               3    1   -4   -4   -5   -9  -13
L                    2   -2   -2   -2   -4   -6
N                        10   -2   -6   -7  -10
O                              2   -2   -4   -6
P                                  10    6    4
Q                                       14    6
U                                            16

51 PhAST: Alignments & Alignment Scores

[Figure: structures of molecules A, B and C]

A = LQQOOLOORRRRRRRRRRRR
B = LLQQOOLOORRRRRRRRRRRR
C = EEULORRRRRR

Pair     Score   Similarity
A / A    76      1.00
A / B    71      0.95
A / C    3       0.40
B / B    78      1.00
B / C    2       0.38
C / C    62      1.00

Example alignment (A / B):

_LQQOOLOORRRRRRRRRRRR
 ||||||||||||||||||||
LLQQOOLOORRRRRRRRRRRR
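The alignment scores and similarities of identical and near-identical sequences can be reproduced with a short Needleman-Wunsch sketch that uses the PhAST scoring matrix. The gap handling is not stated on the slides: the linear gap penalty of 5 used below is an assumption, chosen because it reproduces the score of 71 for the A / B pair. Similarity is computed as the number of identical aligned symbols divided by the alignment length.

```python
SYMBOLS = "AERLNOPQU"
UPPER = [                                  # upper triangle of the scoring matrix
    [8, 2, -4, -2, -1, -2, -4, 4, -2],
    [12, -8, -4, -9, -4, -6, -4, 0],
    [3, 1, -4, -4, -5, -9, -13],
    [2, -2, -2, -2, -4, -6],
    [10, -2, -6, -7, -10],
    [2, -2, -4, -6],
    [10, 6, 4],
    [14, 6],
    [16],
]
SCORE = {}
for i, row in enumerate(UPPER):
    for k, value in enumerate(row):
        a, b = SYMBOLS[i], SYMBOLS[i + k]
        SCORE[a, b] = SCORE[b, a] = value


def align(x, y, gap=-5):
    """Global alignment; returns (alignment score, similarity M/L)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + SCORE[x[i - 1], y[j - 1]],
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back one optimal path, counting matches and alignment columns.
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + SCORE[x[i - 1], y[j - 1]]:
            matches += x[i - 1] == y[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return F[n][m], matches / length


SEQ_A = "LQQOOLOORRRRRRRRRRRR"
SEQ_B = "LLQQOOLOORRRRRRRRRRRR"
SEQ_C = "EEULORRRRRR"
print(align(SEQ_A, SEQ_B))
```

The symbol D does not appear in the scoring matrix, so sequences containing it are outside the scope of this sketch.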

52 Shapelets: Shape-based similarity searching

53 Surface Shape Index

[Figure: surface shape index scale from convex to concave: knob, ridge, saddle, cleft, bag]

54 “Shapelets”: Surface Decomposition

1. Approximate molecule as a sum of Gaussians

2. Extract an isosurface

3. Reduce the number of points on the isosurface

4. Fit paraboloids

5. Origins of paraboloids are characteristic points

Proschak et al. (2008) J. Comp. Chem. 29:108

55 Shapelets Decomposition & Alignment

Molecule 1 Molecule 2

Clique detection in association graph

56 Excursion: Bron-Kerbosch Algorithm for Maximal Clique Identification

C. Bron, J. Kerbosch (1973) Algorithm 457: Finding All Cliques of an Undirected Graph. Communications of the ACM Vol. 16 (9), ACM Press: New York.

Implemented in: CLIP (Willett et al. 2003), Zhang & Grigorov 2006, Shapelets (Schneider et al. 2008)

Step 1: Form the Correspondence Graph (“Association graph”, “Product graph”) from the two molecular graphs.

Step 2: Search for Cliques (= completely connected subgraph, which is not contained in any other completely connected subgraph) in the Correspondence Graph by “backtracking tree search”.

Cliques & Subgraphs

• complete subgraph of a graph : part of a graph in which all nodes are connected to each other

• cliques : maximal complete subgraphs (not subsumed by any other complete subgraph)

58 The Correspondence Graph, C

[Figure: two molecular graphs, A with nodes A1 to A4 and B with nodes B1 to B5]

Nodes of C (all pairs of one node from A and one node from B):

A1,B1   A2,B1   A3,B1   A4,B1
A1,B2   A2,B2   A3,B2   A4,B2
A1,B3   A2,B3   A3,B3   A4,B3
A1,B4   A2,B4   A3,B4   A4,B4
A1,B5   A2,B5   A3,B5   A4,B5

Connect two nodes (A_I, B_X) and (A_J, B_Y) in C, if D(A_I, A_J) = D(B_X, B_Y).

59 Bron-Kerbosch Algorithm

[Figure: backtracking tree search traced in a table with the columns Tree level, Already seen, List, Candidates, Action, on an example correspondence graph C with nodes C1 to C5 labeled with pairs such as (A1,B2), (A2,B1), (A2,B2), (A3,B3)]
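Both steps of the excursion can be sketched compactly: build the correspondence graph from two distance matrices, then enumerate its maximal cliques with the basic Bron-Kerbosch recursion (without pivoting). The exact-equality test on distances suits topological distances; for 3D distances a tolerance would be needed. The function names and the toy matrices are assumptions.

```python
from itertools import combinations


def correspondence_graph(dist_a, dist_b):
    """Nodes are pairs (i, x); edges join pairs with equal intra-graph distance."""
    nodes = [(i, x) for i in range(len(dist_a)) for x in range(len(dist_b))]
    adj = {node: set() for node in nodes}
    for (i, x), (j, y) in combinations(nodes, 2):
        # i != j and x != y keeps the mapping one-to-one.
        if i != j and x != y and dist_a[i][j] == dist_b[x][y]:
            adj[(i, x)].add((j, y))
            adj[(j, y)].add((i, x))
    return adj


def bron_kerbosch(adj, clique=frozenset(), candidates=None, seen=None):
    """Yield all maximal cliques by backtracking tree search."""
    if candidates is None:
        candidates, seen = set(adj), set()
    if not candidates and not seen:
        yield clique                     # nothing left to add: clique is maximal
        return
    for node in list(candidates):
        yield from bron_kerbosch(adj, clique | {node},
                                 candidates & adj[node], seen & adj[node])
        candidates.remove(node)          # node is now "already seen"
        seen.add(node)


# Toy data: graph B contains the three points of graph A plus one extra point.
DIST_A = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
DIST_B = [[0, 1, 2, 7], [1, 0, 3, 8], [2, 3, 0, 9], [7, 8, 9, 0]]
graph = correspondence_graph(DIST_A, DIST_B)
best = max(bron_kerbosch(graph), key=len)
print(sorted(best))
```

The largest clique is the largest set of mutually compatible point pairs, i.e. the common substructure used to superimpose the two molecules.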

60 Shapelets + LIQUID = SQUIRREL

Alignment with Shapelets, scoring with LIQUID

61 PPAR Agonists: Virtual Screening with SQUIRREL (1)

SPECS (~200,000 molecules) → SOM → ~3,000 PPAR-ligand-like molecules → 60 candidates → cherry picking → 21 molecules → functional assay

62 Shape + Pharmacophore Screening: SQUIRREL

new PPARα-selective scaffold

[Figure: structure of the identified compound]

EC50 (PPARα) = 0.044 ± 0.005 µM
EC50 (PPARγ) = 4.9 ± 0.4 µM

• Automated ligand docking with GOLD (Verdonk et al., 2003)
• Receptor structure 2p54, co-crystal with GW590735 (Sierra et al., 2007)

For further reading …

“This book provides a brilliant first access to the interdisciplinary field of molecular design. ... a ‘must have’.” Journal of Chemical Information and Modeling

“The authors have done an admirable job of simply explaining a complex and rapidly evolving field to a wide and varied audience.” Journal of Medicinal Chemistry

Exercise

1) Go to the CATS website: www.modlab.de → software → CATSlight
• Make yourself familiar with the software options.

2) Perform similarity searching in the provided data set (screeningCompounds.sdf) with this SMILES query structure:

CS(=O)(=O)c1ccc(cc1)n2nc(cc2c3ccc(F)cc3)C(F)(F)F

• What does the molecule look like in a 2D sketch? (http://daylight.com/daycgi_tutorials/depict.cgi)

3) Change the various CATS parameter settings.
• Do you see an influence on the resulting hit list?
• How do you explain the differences?
• Do you observe “scaffold hops”?

The screening compounds

[Figure: structures of screening compounds 1 to 10]