
: An Application of Expert Systems and Ontologies to Chemical (Named) Reaction Analysis

Roger Sayle, John Mayfield and Noel O’Boyle
NextMove Software, Cambridge, UK

CINF Reaction Analytics. 256th ACS National Meeting, Boston, MA. Thursday 23rd August 2018

analysis vs. prediction

• In “Applied Chemoinformatics” (2018), J. Goodman defines three main problems.

– Reaction Planning: R → ? → P [Database]

– Reaction Prediction: R1 + R2 → ? [Simulation]

– Synthesis Planning: R? + R? + R? → P [Design]

• A corollary is that there’s a distinction between reactions that have already been observed and those experiments yet to be performed.


intuition and counter-intuition

With apologies to Mark Twain: “Never let the facts get in the way of a good synthesis plan”.

There are reactions that chemists expect will happen but don’t & those they don’t expect but do.

But cheminformaticians are alchemists that can turn lead into gold, as easily as “[Pb]>>[Au]”.

Experimental validation

• Synthesis of a novel aromatic heterocycle previously unreported in the literature.
• William Pitt et al., “Heteroaromatic Rings of the Future”, Journal of Medicinal Chemistry, 52(9):2952-2963, 2009.

expectations: setting the scope

Goodman Challenges Carey et al. Challenges

Maitotoxin

Difficult to access substituted aromatic starting materials.

Eribulin

Org. Biomol. Chem. 2006, 4, 2337-2347.

Which reactions are important?

• NCI/LHASA SAVI (14+6 reactions)
– Suzuki Coupling 36262 out of 1.2M USPTO examples
– Sulfonamide Schotten-Baumann 14348 out of 1.2M USPTO examples
– Buchwald-Hartwig 6040 out of 1.2M USPTO examples
– 458 out of 1.2M USPTO examples
– Fukuyama coupling 2 out of 1.2M USPTO examples
– Liebeskind-Srogl coupling 0 out of 1.2M USPTO examples
• Hartenfeller/Schneider (58 reactions)
– #1 Pictet-Spengler reaction 7 out of 1.2M USPTO examples
– #10 + #11 Azide-nitrile Huisgen- 5 out of 1.2M USPTO examples
– #17 Pyridone synthesis 2 out of 1.2M USPTO examples
– #20 Phthalazinone synthesis 16 out of 1.2M USPTO examples
– #24 Friedlander quinoline synthesis 30 out of 1.2M USPTO examples
• Enamine REAL 2016 (43 reactions)
– Thiourea to guanidine (14 out of 160M examples).

myth #1: heterocycle formation

• Because almost all drug-like molecules contain a heterocycle, there’s a belief that heterocycle formation is important.
• Analysis of both patent data and pharmaceutical ELNs reveals that heterocycle forming reactions, even named heterocycle forming reactions, are relatively rare, with ring systems often being purchased as building blocks*.

* This analysis might not apply to process development and manufacturing.
https://nextmovesoftware.com/blog/2016/10/24/buying-a-ring-or-making-one-yourself/

Reactions that don’t happen!

• Paal-Knorr Pyrrole Synthesis vs. Aldehydes/

Of the 430 examples of Paal-Knorr pyrrole synthesis reported in US patent applications 2001-2012, exactly zero have more than the two reacting ketones/aldehydes.

Who let the dogs out?

• Big Data can determine the utility and scope of reactions.

Reactions that do happen!

• Chloro Sonogashira Couplings (@ NCI)
– The LHASA ‘CHMTRN’ rules for Transform 2267, Sonogashira couplings, (presented by Marc Nicklaus at Sheffield) state “ are usually more reactive than . do not react”.

Code    Name                           Count   Yield
3.3.2   Bromo                          3717    49.3%
3.3.3   Chloro Sonogashira coupling    429     44.2%
3.3.4   Iodo Sonogashira coupling      2721    64.9%

• Isotopically-labelled Compounds (@ Eli Lilly)
– The LARR reactions used to construct Lilly’s PLC (Nicolaou et al. 2016) forbid the presence of isotopic labels in reactants.

improving upon synthia™

• As presented earlier, Chematica/Synthia maintains a manually curated “black list” of interfering functional groups.
• A more labor-efficient data-driven strategy is to automatically maintain a “white list” of tolerated functional groups that can be derived from observed experiments.
• It’s much easier to track the things you know about and can see, than the things you can’t see and/or don’t know about.
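The white-list idea can be sketched in a few lines. This is a minimal illustration, assuming each observed reaction has already been reduced to a reaction class plus the set of functional groups that pass unchanged from reactant to product. The record layout, the group names and the `min_support` threshold are hypothetical and are not taken from Synthia or NameRxn.

```python
from collections import Counter, defaultdict

def build_white_list(observations, min_support=2):
    """Map each reaction class to the functional groups seen to survive it.

    observations: iterable of (reaction_class, surviving_groups) pairs, where
    surviving_groups is a set of group names present in both reactant and
    product. A group is white-listed once it has been observed at least
    min_support times (an assumed guard against one-off noisy records).
    """
    counts = defaultdict(Counter)
    for reaction_class, surviving_groups in observations:
        counts[reaction_class].update(set(surviving_groups))
    return {
        reaction_class: {g for g, n in group_counts.items() if n >= min_support}
        for reaction_class, group_counts in counts.items()
    }

# Hypothetical observations, for illustration only.
observed = [
    ("Suzuki coupling", {"ester", "nitrile"}),
    ("Suzuki coupling", {"ester"}),
    ("Suzuki coupling", {"aldehyde"}),
    ("Kumada coupling", {"ether"}),
    ("Kumada coupling", {"ether", "aldehyde"}),
]
white_list = build_white_list(observed)
```

Requiring more than one supporting observation is one simple way to keep a single unusual record from changing the list.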

handling noisy data (n>1 statistics)

• Kumada couplings incompatible with aldehydes...
• US20050107257A1 [p0196] Syngenta

A solution of 34.3 ml of ethylmagnesium in 50 ml of is added dropwise at −70° C. to 7.3 g of indium trichloride in 200 ml of tetrahydrofuran and, after stirring for 30 minutes, the reaction temperature is allowed to rise slowly to room temperature. That solution is added to a solution of 14.9 g of 3- bromo-4- carbaldehyde and 2.8 g of PdCl2 (PPh3)2 in 240 ml of tetrahydrofuran and the reaction mixture is heated under reflux for 20 hours. 5 ml of methanol are then added and the mixture is concentrated in vacuo, stirred thoroughly with diethyl ether, filtered off and concentrated in vacuo once more. The residue is chromatographed on silica gel using ethyl acetate/hexane (1:1), yielding 3-ethyl-4-pyridine carbaldehyde (I) in the form of a yellow oil.

prediction? about the future?

c.f. US20140296184A1 [C00008]

M. Segler, M. Preuss and M. Waller, “Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI”, Nature, 555:604-610, 2018.
A. Gini, M. Segler et al., “Dehydrogenative TEMPO-mediated formation of Unstable Nitrones: Easy Access to N-Carbamoyl Isoxazolines”, Chem. Eur. J. 21:12053-12060, 2015.

search: monte carlo v. proof num

• Appropriate choice of AI search technique.

[Figure: AND/OR tree with 0/1 values at the nodes]

Monte Carlo search is inappropriate for search problems such as mazes.

White to win in 173 ply Kf1-f2!
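For readers unfamiliar with proof-number search, the bookkeeping on an AND/OR tree can be sketched as below. This is a generic textbook-style illustration, not the search used in any particular product; the `Node` class and the example tree are hypothetical. One common reading in retrosynthesis is that an OR node is a compound (any one reaction suffices) and an AND node is a reaction (every precursor must be solved).

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # "OR" or "AND"
    children: list = field(default_factory=list)
    value: object = None           # leaves only: True (proved), False (disproved), None (unknown)

def proof_numbers(node):
    """Return (proof, disproof) numbers for a node of an AND/OR tree."""
    if not node.children:
        if node.value is True:
            return 0, math.inf
        if node.value is False:
            return math.inf, 0
        return 1, 1                # unknown leaf: one node left to prove or disprove
    pairs = [proof_numbers(child) for child in node.children]
    proofs = [p for p, _ in pairs]
    disproofs = [d for _, d in pairs]
    if node.kind == "OR":
        # Proving one child is enough; disproving needs every child.
        return min(proofs), sum(disproofs)
    # AND node: proving needs every child; disproving one child is enough.
    return sum(proofs), min(disproofs)

tree = Node("OR", [
    Node("AND", [Node("OR", value=True), Node("OR")]),
    Node("AND", [Node("OR"), Node("OR"), Node("OR", value=False)]),
])
root_proof, root_disproof = proof_numbers(tree)
```

The search then expands the leaf that most cheaply changes the root's numbers, which is what makes it directed rather than random.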

categorization of reactions

[Pie chart of reaction categories: Heteroatom alkylation and arylation; Acylation and related processes; C-C bond formations; Heterocycle formation; Protections; Deprotections; Reductions; Oxidations; conversion; Functional group addition; Resolution. Slice values as extracted, pairing with categories not recoverable: 2%, 5%, 15%, 34%, 1%, 10%, 6%, 3%, 17%, 2%, 5%.]

1. J. Carey, D. Laffan, C. Thomson, M. Williams, Org. Biomol. Chem. 4:2337-2347, 2006.
2. S. Roughley and A. Jordan, J. Med. Chem. 54:3451-3479, 2011.

reaction ontology

• Reactions are classified into a common subset of the Carey et al. classes and the RSC’s RXNO ontology.
• There are 12 super-classes
– e.g. 3 C-C bond formation (RXNO:0000002).
• These contain 84 class/categories.
– e.g. 3.5 Pd-catalyzed C-C bond formation (RXNO:0000316)
• These contain ~1050 named reactions/types.
– e.g. 3.5.3 (RXNO:0000088)
• These require ~2200 SMIRKS-like transformations.
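The three-level numbering used on these slides (super-class, class, named reaction) is easy to handle in code. The sketch below assumes codes are dot-separated integers as in the examples above; the function name is hypothetical and the RXNO identifiers are the three quoted on the slide.

```python
def split_reaction_code(code):
    """Split a code such as "3.5.3" into its super-class, class and type prefixes."""
    parts = code.split(".")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a reaction code: {code!r}")
    # Each prefix names one level of the hierarchy.
    return [".".join(parts[: i + 1]) for i in range(len(parts))]

# RXNO identifiers quoted on the slide; any other code maps to None here.
RXNO_IDS = {"3": "RXNO:0000002", "3.5": "RXNO:0000316", "3.5.3": "RXNO:0000088"}

levels = split_reaction_code("3.5.3")
ancestry = [(level, RXNO_IDS.get(level)) for level in levels]
```

Grouping counts by the first or second prefix gives the super-class and class totals used in the tables that follow.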

Complication: agents matter

From http://en.wikipedia.org/wiki/Diazonium_compound

Sandmeyer reaction
Benzenediazonium chloride heated with cuprous chloride dissolved in HCl to yield chlorobenzene.
C6H5N2⁺ + CuCl → C6H5Cl + N2 + Cu⁺

Gattermann reaction
Benzenediazonium chloride is warmed with copper powder and HCl to yield chlorobenzene.
C6H5N2⁺ + CuCl → C6H5Cl + N2 + Cu⁺

10 most popular reactions

ID      Name                            Count
2.1.2   Carboxylic + amine              26,040
1.3.1   Buchwald-Hartwig amination      22,048
3.1     Suzuki coupling                 16,508
1.7.6   Williamson ether synthesis      15,665
2.1.1   Amide Schotten-Baumann          11,016
7.1     Nitro to amino                  10,234
6.1.1   N-Boc deprotection              9,821
6.2.2   CO2H-Me deprotection            9,487
6.2.1   CO2H-Et deprotection            6,749
2.2.3   Sulfonamide Schotten-Baumann    6,223

Most/least successful reactions

ID       Name                                 Mean Yield   Count
1.7.2    Diazomethane esterification          91%          41
9.3.1    Carboxylic acid to acid chloride     88%          704
9.7.14   Bromo to azido                       85%          235
1.7.5    Methyl esterification                84%          2918
9.7.19   Bromo to iodo Finkelstein reaction   82%          116
6.1.3    N-Cbz deprotection                   81%          1359
…
4.1.11   Larock synthesis                     47%          55
3.11.3   Ullmann-type biaryl coupling         44%          407
1.7.1    Chan-Lam ether coupling              44%          154
4.1.4    Pinner synthesis                     39%          47

rare named reactions

• Adams decarboxylation
• Angeli-Rimini reaction
• Aza-Baylis-Hillman reaction
• Boyer reaction
• Buchwald-Fischer indole synthesis
• Castro-Stephens coupling
• Chugaev elimination
• Cook-Heilbron synthesis
• Fischer-Hepp rearrangement
• Fukuyama indole synthesis
• Gassman indole synthesis
• Imine Hosomi-Sakurai reaction
• Koch reaction
• Leuckart reaction
• Liebeskind-Srogl coupling
• Lossen rearrangement
• Ponzio reaction
• Prins reaction
• Reimer-Tiemann carboxylation

Trends in Reaction Types

[Figure: Suzuki couplings as a percentage of reactions in a year, 1976 to 2013; y-axis from 0.0% to 8.0%]

Leaving Group   Mean Yield   N Observations
Bromo           58.80%       10817
Chloro          57.96%       2752
Iodo            57.21%       2049
Triflyloxy      65.48%       717

Namerxn reaction naming

• Many reaction classification algorithms are dependent upon atom-atom mapping assignments.
• Alas, MCS-based atom mapping algorithms are often slow and/or inaccurate [Lowe & Sayle 2012 & 2013].
• NameRXN is a mechanism-based atom-mapper.
• All reactants and reagents are placed in a single pot (molecule) and sets of SMIRKS applied in turn.
• If the desired product is generated, the reaction (its mechanism and mapping) is identified.
– Rationalization is easier than prediction!
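The "apply each transform and see whether the product appears" loop can be sketched without any chemistry toolkit. In the toy below, molecules are plain strings and each rule is a Python function from the reaction pot to a set of candidate products; a real implementation would apply SMIRKS to molecular graphs and compare canonical forms. All names and both rules are hypothetical.

```python
def classify(pot, product, rules):
    """Return the name of the first rule that generates product from pot, else None.

    pot: frozenset of reactant and reagent identifiers (the "single pot").
    rules: ordered list of (name, transform) pairs; transform(pot) returns the
    set of products that rule could generate from that pot.
    """
    for name, transform in rules:
        if product in transform(pot):
            return name
    return None

# Toy rules keyed on string labels; stand-ins for SMIRKS transformations.
def amide_coupling(pot):
    if {"RCO2H", "R'NH2"} <= pot:
        return {"RC(=O)NHR'"}
    return set()

def esterification(pot):
    if {"RCO2H", "R'OH"} <= pot:
        return {"RC(=O)OR'"}
    return set()

RULES = [("esterification", esterification), ("amide coupling", amide_coupling)]

pot = frozenset({"RCO2H", "R'NH2", "Et3N"})
name = classify(pot, "RC(=O)NHR'", RULES)
```

Because the recorded product is known in advance, a rule only has to reproduce it, which is why rationalization is the easier problem.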

Example smarts/smirks

# NOZAKI_HIYAMA_KISHI_REACTION
[#6v4+0;X4,X3:1][BrD1h0+0:2].[Ni].[Cr].[OD1h0+0:3]=[CD2h1v4+0:4]>>[#6:1][C:4]-[Oh1:3]

# PAAL_KNORR_THIOPHENE_SYNTHESIS
[OD1h0+0:1]=[CX3v4+0:2][CX4v4+0:3]([H])[CX4v4+0:4]([H])[CX3v4+0:5]=[OD1h0+0:6]>>[S:1]1[C:2]=[C:3][C:4]=[C:5]1

• Writing SMIRKS is both an art and a science.

Smarts pattern compilation

bool atom_1(const RDKit::Atom *aptr)
{
  return aptr->getAtomicNum() == 6
      && ringinfo->numAtomRings(aptr->getIdx()) != 0
      && aptr->getDegree() == 3
      && aptr->getTotalNumHs(false) == 0
      && aptr->getExplicitValence()+aptr->getImplicitValence() == 4
      && aptr->getFormalCharge() == 0;
}

bool atom_28(const RDKit::Atom *aptr)
{
  if (aptr->getAtomicNum() != 6
      || aptr->getExplicitValence()+aptr->getImplicitValence() != 4
      || aptr->getFormalCharge() != 0)
    return false;
  return (aptr->getDegree() == 2 && aptr->getTotalNumHs(false) == 1)
      || (aptr->getDegree() == 2 && aptr->getTotalNumHs(false) == 2)
      || aptr->getDegree() == 3;
}

Smarts pattern compilation

RDKit::ROMol::OBOND_ITER_PAIR biter[25];
…
case_7:
  biter[0] = mol->getAtomBonds(atom[1]);
  goto case_9;
case_8:
  avisit[atom[0]->getIdx()] = 0;
  ++biter[0].first;
case_9:
  if (biter[0].first != biter[0].second) {
    bptr = (*mol)[*biter[0].first].get();
    if (bond_1(bptr)) {
      aptr = bptr->getOtherAtom(atom[1]);
      aidx = aptr->getIdx();
      if (avisit[aidx] == 0 && atom_1(aptr)) {
        avisit[aidx] = 1;
        atom[0] = aptr;
        goto case_10;
      } else ++biter[0].first;
    } else ++biter[0].first;
  } else goto case_5;
  goto case_9;

Recent improvements insights

• Profiling NameRxn in 2014 to diagnose performance problems with RDKit revealed that pattern matching and transformation accounted for <1% of runtime.
• The bottleneck is actually in canonicalization.
• The Ah-ha experience was to use hash filtering.

• Check molecular formula: C_i H_j Br_k Cl_l F_m I_n N_o O_p P_q S_r
• Additional cleverness allows pre-sanitization hashing.
– Triple bond count, but not single or double bond count.
– Perhaps there’s something in InChI-style hashing after all.
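One way to picture the hash filter: before paying for canonicalization, compare a cheap invariant of the candidate product with the same invariant of the recorded product, and skip candidates that cannot match. The sketch assumes atoms are available as a list of element symbols (hydrogens included) and uses element counts plus a triple bond count as the key. It illustrates the idea only and is not NameRxn's actual code.

```python
from collections import Counter

def cheap_hash(elements, triple_bonds=0):
    """Order-independent key: sorted element counts plus the triple bond count."""
    return tuple(sorted(Counter(elements).items())), triple_bonds

def could_match(candidate, target):
    """False means the two molecules cannot be identical; True means 'check properly'."""
    return cheap_hash(*candidate) == cheap_hash(*target)

# (element symbols, triple bond count); toy inputs for illustration.
target = (["C", "C", "O", "H", "H", "H", "H", "H", "H"], 0)   # C2H6O
same_formula = (["O", "C", "C"] + ["H"] * 6, 0)               # C2H6O, atoms reordered
wrong_formula = (["C", "C", "N"] + ["H"] * 3, 1)              # C2H3N with one triple bond
```

A passing filter still needs a full comparison, since isomers share a formula; the saving comes from the many candidates that fail it.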

Pistachio: Siri for chemists

Multi-step Synthetic Routes

[Bar chart: occurrences (y-axis, 0 to 700000) against number of steps (x-axis, 1 to 17), with one series for Intermediates and one for Terminal Products]

Intermediates: 197702, 103114, 56611, 31403, 17268, 9230, 5057, 2701, 1256, 639, 301, 136, 58, 15, 5, 2
Terminal Products: 385149, 149445, 81837, 47579, 27670, 16619, 9320, 5263, 2511, 1330, 678, 373, 111, 63, 8, 6, 5

Application to planning 1

• Cinnamic Acid (PhCHCHCO2)
1. Bromo (272)
2. Horner-Wadsworth-Emmons reaction (268)
3. Wittig olefination (129)
4. Bromo Heck-type reaction (62)
5. Iodo Heck reaction (49)
6. Triflyloxy Heck[-type] reaction (43)
7. Schotten-Baumann (10)
8. Bromo Suzuki coupling (5)
9. (2)
10. (1)

Application to planning 2

• p-Nitrobenzoic acid
1. Nitrile to carboxy (12)
2. CO2H-Me deprot (8)
3. CO2H-Et deprot (5)
4. Ester hydrolysis (1)
5. Nitration (1)

• p-Nitrotoluene
1. Nitration (96)
2. Bromo Suzuki-type (1)
3. Chloro Suzuki (1)
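Rankings like the two above fall out of a simple tally once every reaction in a database carries a name. A minimal sketch, assuming records of the form (product identifier, reaction name); the records shown are made up for illustration and are not Pistachio data.

```python
from collections import Counter

def rank_routes(records, target):
    """Return (reaction name, count) pairs for reactions that made target, most frequent first."""
    tally = Counter(name for product, name in records if product == target)
    # Sort by descending count, then by name so ties are ordered deterministically.
    return sorted(tally.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical records, for illustration only.
records = [
    ("p-nitrotoluene", "Nitration"),
    ("p-nitrotoluene", "Nitration"),
    ("p-nitrotoluene", "Chloro Suzuki"),
    ("p-nitrobenzoic acid", "Nitrile to carboxy"),
]
ranking = rank_routes(records, "p-nitrotoluene")
```

In practice the product identifier would be a canonical structure rather than a name, so that equivalent drawings of the same compound are counted together.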

tactical vs. strategic reactions

• Traditional Synthesis Planning has concentrated on the Strategic Application of Named Reactions.
• However, there’s much to be gained from the Tactical Application of Unnamed Reactions.
• Nitro reduction is the 6th most frequent reaction.

functional group interconversion

• Relative frequency of simple group conversion

Columns: Amino, Bromo, Chloro, Fluoro, Hydroxy, Iodo, Sulfanyl, Thioxo
Amino: 3951 2949 990 3976 3657 143
Bromo: 3121 589 637 2390 738 435
Chloro: 9424 717 1606 2156 744 798
Fluoro: 1549 180 484 826 28 103
Hydroxy: 2572 11441 31593 7641 3004 348
Iodo: 155 445 47
Nitro: 126606 138
Oxo: 8419

From a total of 7,252,419 reactions from USPTO & EPO patents.

nextmove’s strategy

• Why expose reaction planning to the end-user at all?
• During our collaboration with ChemSpace, it has become clear that what was required was not similarity nor superstructure search but a form of synthesis-aware search.
• This is similar to the challenge faced by traditional retrosynthesis tools in identifying a leaf/goal state.
• 937M purchasable compounds makes this non-trivial.
• The usual challenges of functional group, tautomer and protonation state lookups, now also protecting groups.
• But why not return the carboxylic acid when searching for the acid chloride, or when searching for bromo derivative.
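A synthesis-aware lookup can be pictured as query expansion: the query's functional group is swapped for groups that are one routine step away, and each variant is looked up in the catalogue. The sketch uses string labels instead of structures. The precursor table contains only the acid chloride example from the text, and everything else (names, catalogue contents) is hypothetical.

```python
# Functional group wanted -> groups that convert to it in one routine step.
PRECURSORS = {
    "acid chloride": ["carboxylic acid"],   # the example given on the slide
}

def synthesis_aware_search(scaffold, group, catalogue):
    """Return catalogue hits for the exact query first, then for one-step precursors."""
    hits = []
    for candidate in [group] + PRECURSORS.get(group, []):
        key = (scaffold, candidate)
        if key in catalogue:
            hits.append(key)
    return hits

# Hypothetical catalogue of purchasable (scaffold, functional group) pairs.
catalogue = {("benzoyl", "carboxylic acid"), ("benzoyl", "amide")}
hits = synthesis_aware_search("benzoyl", "acid chloride", catalogue)
```

A table of frequent functional group conversions, such as the one on the previous slide, is a natural source for the precursor entries.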

building block search

• Query: Results:

the challenge of regioselectivity

• A tricky benchmark is reactions of 2,4,5-trichloropyrimidine

• The nature of pyrimidine makes the chloro at the 4-position more reactive than that at the 2-position, which is more reactive than that at the 5-position.
• Simple quantum mechanical methods have difficulty discerning this order.

handy’s rule

• Scott Handy and Yanan Zhang, “A Simple Guide for Predicting Regioselectivity in the Coupling of Polyhaloheteroaromatics”, Chemical Communications, 3:299-301, Nov 2005.

Abstract
A simple guide for predicting the order and site of coupling (Suzuki, Stille, Negishi, Sonogashira, etc.) in polyhaloheteroaromatics based upon the 1H NMR chemical shift values of the parent non-halogenated heteroaromatics has been developed.

electrophilic substitution of heterocycles with qm methods

• J.C. Kromann et al., “Fast and Accurate Prediction of the Regioselectivity of Electrophilic Aromatic Substitution Reactions”, Chem. Sci., 9(3):660-665, Nov 2017.
• M. Kruszyk et al., “Computational Methods to Predict the Regioselectivity of Electrophilic Aromatic Substitution Reactions of Heteroaromatic Systems”, J. Org. Chem. 81(12):5128-5134, Jun 2016.

a data-driven strategy

• One possible approach to the challenge of regioselectivity is to derive preferences by large-scale (statistical) analysis of reaction data sets.
• Reaction classification can identify the subset of relevant examples, which can then be used to produce tables of heterocycle position preferences.

• Directing groups and their influence can also be identified and tabulated.
• See https://www.scripps.edu/baran/images/grpmtgpdf/Gutekunst_Apr_10.pdf

the semantics of atom mapping

Prof. Goodman poses an interesting question of atom mapping.

4.1.6 Cyclic Beckmann Rearrangement

1.2.9 Alcohol + Amine Condensation or 1.1.3 Iodo N-methylation

Acknowledgements

• AbbVie
• AstraZeneca
• Bristol-Myers Squibb
• ChemSpace
• Eli Lilly
• GlaxoSmithKline
• Hoffmann-La Roche
• IBM Research Zurich
• Merck
• Novartis
• Royal Society of Chemistry
• Vernalis
• Vertex Pharmaceuticals

• NextMove Software
– Noel O’Boyle
– John Mayfield
• NextMove Alumni
– Daniel Lowe

• Thank you for your time.
• Questions?
• Thoughts?

Transforms vs. reactions

Importance of Reaction Mechanism
Example: Ullmann-type Coupling Reactions
SMIRKS: [H][N:1].[Cl][c:2]>>[N:1][c:2]

The SMIRKS transform alone is insufficient to predict the products and by-products in this example. A measure of nucleophilicity is desirable for each atom in a molecule. Without this, software may be misled into believing that protecting groups are required.

Analysis of pharmaceutical elns

• NextMove Software’s HazELNut software is used to export and analyze ELN content at 6 of the top 10 large pharmaceutical companies.
• In-house analysis of this data, across the industry, reveals a surprisingly high rate of synthesis failure, not indicated in the published literature (journal articles, patent applications or reaction databases).
• Understanding the causes of these failures is perhaps more significant than attempting to access new chemistries.

Extracting mps and reactions

Example reaction mining INPUT

Methyl 4-[(pentafluorophenoxy)sulfonyl]benzoate

To a solution of methyl 4-(chlorosulfonyl)benzoate (606 mg, 2.1 mmol, 1 eq) in DCM (35 ml) was added pentafluorophenol (412 mg, 2.2 mmol, 1.1 eq) and Et3N (540 mg, 5.4 mmol, 2.5 eq) and the reaction mixture stirred at room temperature until all of the starting material was consumed. The was evaporated in vacuo and the residue redissolved in ethyl acetate (10 ml), washed with water (10 ml), saturated sodium hydrogen carbonate (10 ml), dried over sodium sulphate, filtered and evaporated to yield the title compound as a white solid (690 mg, 1.8 mmol, 85%).

Example reaction mining Output

CHEMICAL reactions for free

bond changes in indole synthesis

Columns: Synthesis, A, B, C, D
Baeyer-Emmerling: M
Bartoli: M M
Bischler-Möhlau: M C M
Fischer: M C M
Fukuyama: M
Hemetsberger: M
Larock: M C M
Madelung: M
Nenitzescu: M M
Reissert: M M

Suzuki coupling leaving groups

Leaving Group   Mean Yield   N Observations
Bromo           58.80%       10817
Chloro          57.96%       2752
Iodo            57.21%       2049
Triflyloxy      65.48%       717

Beyond drug guru

sildenafil (viagra) vardenafil (levitra)

Eli lilly’s automated synthesis lab

Alexander G. Godfrey, Thierry Masquelin and Horst Hemmerle, “A Remote-Control Adaptive MedChem Lab: An Innovative Approach to Enable Drug Discovery in the 21st Century”, Drug Discovery Today, Vol. 18, Nos. 17-18, September 2013.

Synthesis failures at lilly

• At the 2013 Sheffield Cheminformatics conference, Christos Nicolaou highlighted the technical challenge of predicting compounds potentially accessible by Lilly’s Advanced Synthesis Lab (ASL).
• In a proof-of-concept pilot project, only 25 of 90 compounds suggested by Lilly’s Annotated Reaction Repository (LARR) rule-set could be successfully synthesized in practice.

• http://cisrg.shef.ac.uk/shef2013/talks/14.pdf

Synthesis failures at gsk

• Pickett et al. 2011 describe the parallel synthesis of a 50x50 library of MMP-12 inhibitors by an iodo-Suzuki .

• Only 1704 of 2500 could be assayed [566 not made]

Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011

Learning from failure (logp shift)

• Nadin et al. 2012 [1] hypothesize that low LogP is a major cause of synthesis failure in parallel synthesis of combinatorial libraries.
• Analysis confirms that this is indeed a significant factor for the GSK MMP-12 library.
– 1704 compounds measured, mean logP = 3.56 (1.44)
– 566 compounds not made, mean logP = 2.83 (1.52)
– Student’s t-test for different distributions, p < 2×10^-22.

1. Nadin, Hattotuwagama and Churcher, “Lead-Oriented Synthesis: A New Opportunity for Synthetic Chemistry”, Angew. Chem. Int. Ed, 51:1114, 2012.
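The reported significance can be sanity-checked from the summary statistics alone. The sketch below computes a pooled two-sample Student's t statistic from the means and counts on the slide, assuming the bracketed values are standard deviations. It approximates the two-sided p-value with a normal tail, which is reasonable at these sample sizes but is an approximation and not the exact t distribution.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student's t statistic with a pooled variance estimate."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err

def normal_two_sided_p(t):
    """Two-sided tail probability of a standard normal at |t| (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))

# Summary statistics from the slide: measured vs. not-made compounds.
t = pooled_t(3.56, 1.44, 1704, 2.83, 1.52, 566)
p = normal_two_sided_p(t)
```

With these inputs the t statistic comes out at roughly 10, which is in line with the very small p-value quoted on the slide.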

Example (actionable) insight

[Stacked bar chart: Successful Reactions vs. Failed Reactions as a percentage (0% to 100%) for each Reaction Product Predicted cLogP bin: <1.0, 1.0-2.0, 2.0-3.0, 3.0-4.0, 4.0-5.0, 5.0-6.0, >6.0. Bar counts as extracted: 55 119 127 141 80 36 8 63 474 202 525 340 97 232]

There is a clear trend between Suzuki coupling success rate and predicted octanol-water partition coefficient.

Data: Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011

big data mining confirmation of Nadin-churcher hypothesis

On 16,335 Suzuki coupling reactions extracted from US patent applications between 2001 and 2012.

LogP        Mean Yield   N Obs
< 1.0       52.89%       196
1.0 – 2.0   56.02%       1155
2.0 – 3.0   56.72%       2881
3.0 – 4.0   58.14%       4071
4.0 – 5.0   57.26%       3186
5.0 – 6.0   59.25%       2126
> 6.0       63.83%       2720

Nadin, Hattotuwagama and Churcher, “Lead-Oriented Synthesis: A New Opportunity for Synthetic Chemistry”, Angew. Chem. Int. Ed, 51:1114, 2012.
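The binned table can be cross-checked in a few lines: the bin counts should add up to the stated 16,335 reactions, and a count-weighted average gives the overall mean yield. The numbers are copied from the table; the variable names are arbitrary.

```python
# (cLogP bin, mean yield in %, number of observations), copied from the table.
BINS = [
    ("< 1.0", 52.89, 196),
    ("1.0 - 2.0", 56.02, 1155),
    ("2.0 - 3.0", 56.72, 2881),
    ("3.0 - 4.0", 58.14, 4071),
    ("4.0 - 5.0", 57.26, 3186),
    ("5.0 - 6.0", 59.25, 2126),
    ("> 6.0", 63.83, 2720),
]

total = sum(n for _, _, n in BINS)
overall_mean = sum(mean * n for _, mean, n in BINS) / total
spread = BINS[-1][1] - BINS[0][1]   # highest-logP bin minus lowest-logP bin
```

The counts do sum to 16,335, and the difference between the highest and lowest logP bins is about 11 percentage points of yield.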

“big data” reaction yield analysis

AZ Data courtesy of Nick Tomkinson, AstraZeneca RDI, Alderley Park, UK.

Functional group changes analyzed

Do the results make sense at all?

Compare the average deltas for the >39K instances of Williamson ether synthesis

Functional Group                 Avg in Reaction   Overall Average
                                 -0.98             -0.3
alcohol                          -0.95             -0.12
halogen_notfluorine              -0.89             -0.27
alcohol_aromatic                 -0.67             -0.04
halogen_aliphatic                -0.62             -0.15
halogen_notfluorine_aliphatic    -0.62             -0.14
carboxylicacid                   -0.5              -0.23
halogen_bromine                  -0.42             -0.11
halogen_bromine_aliphatic        -0.39             -0.06
halogen_aromatic                 -0.36             -0.16
alcohol_aliphatic                -0.28             -0.08
halogen_notfluorine_aromatic     -0.27             -0.13
amine                            -0.04             -0.3
amine_aliphatic                  -0.04             -0.27
carboxylicacid_aliphatic         -0.04             -0.08
halogen_bromine_aromatic         -0.03             -0.05
amine_tertiary                   -0.02             -0.06
amine_tertiary_aliphatic         -0.02             -0.08
carboxylicacid_aromatic          -0.02             -0.03
amine_cyclic                     -0.01             -0.02
halogen_bromine_bromoketone      -0.01             0

Functional Group                 Avg in Reaction   Overall Average
acidchloride                     0                 -0.07
acidchloride_aliphatic           0                 -0.05
acidchloride_aromatic            0                 -0.02
aldehyde                         0                 -0.04
aldehyde_aliphatic               0                 -0.01
aldehyde_aromatic                0                 -0.03
amine_aromatic                   0                 -0.03
amine_primary                    0                 -0.15
amine_primary_aliphatic          0                 -0.07
amine_primary_aromatic           0                 -0.07
amine_secondary                  0                 -0.04
amine_secondary_aliphatic        0                 -0.07
amine_secondary_aromatic         0                 0.03
amine_tertiary_aromatic          0                 0
azide                            0                 0
azide_aliphatic                  0                 0
azide_aromatic                   0                 0
boronicacid                      0                 -0.03
boronicacid_aliphatic            0                 0
boronicacid_aromatic             0                 -0.03
carboxylicacid_alphaamino        0                 0
isocyanate                       0                 -0.01
isocyanate_aliphatic             0                 0
isocyanate_aromatic              0                 0
nitro                            0                 -0.03
nitro_aliphatic                  0                 0
nitro_aromatic                   0                 -0.03
sulfonylchloride                 0                 -0.02
sulfonylchloride_aliphatic       0                 -0.01

These look sensible