Regioselectivity Reaction Analytics
Regioselectivity: An Application of Expert Systems and Ontologies to Chemical (Named) Reaction Analysis
Roger Sayle, John Mayfield and Noel O’Boyle
NextMove Software, Cambridge, UK
CINF Reaction Analytics. 256th ACS National Meeting, Boston, MA. Thursday 23rd August 2018

analysis vs. prediction
• In “Applied Chemoinformatics” (2018), J. Goodman defines three main problems.
– Reaction Planning: R → ? → P [Database]
– Reaction Prediction: R1 + R2 → ? [Simulation]
– Synthesis Planning: R? + R? + R? → P [Design]
• A corollary is that there’s a distinction between reactions that have already been observed and those experiments yet to be performed.

intuition and counter-intuition
With apologies to Mark Twain: “Never let the facts get in the way of a good synthesis plan”.
There are reactions that chemists expect will happen but don’t, and those they don’t expect but do.
But cheminformaticians are alchemists that can turn lead into gold, as easily as “[Pb]>>[Au]”.

Experimental validation
• Synthesis of a novel aromatic heterocycle previously unreported in the literature.
• William Pitt et al., “Heteroaromatic Rings of the Future”, Journal of Medicinal Chemistry, 52(9):2952-2963, 2009.

expectations: setting the scope
Goodman Challenges: Maitotoxin, Eribulin
Carey et al. Challenges: Difficult to access substituted aromatic starting materials. Org. Biomol. Chem. 2006, 4, 2337-2347.

Which reactions are important?
• NCI/LHASA SAVI (14+6 reactions)
– Suzuki coupling 36262 out of 1.2M USPTO examples
– Sulfonamide Schotten-Baumann 14348 out of 1.2M USPTO examples
– Buchwald-Hartwig 6040 out of 1.2M USPTO examples
– Hiyama coupling 458 out of 1.2M USPTO examples
– Fukuyama coupling 2 out of 1.2M USPTO examples
– Liebeskind-Srogl coupling 0 out of 1.2M USPTO examples
• Hartenfeller/Schneider (58 reactions)
– #1 Pictet-Spengler reaction 7 out of 1.2M USPTO examples
– #10 + #11 Azide-nitrile Huisgen cycloaddition 5 out of 1.2M USPTO examples
– #17 Pyridone synthesis 2 out of 1.2M USPTO examples
– #20 Phthalazinone synthesis 16 out of 1.2M USPTO examples
– #24 Friedlander quinoline synthesis 30 out of 1.2M USPTO examples
• Enamine REAL 2016 (43 reactions)
– Thiourea to guanidine (14 out of 160M examples).

myth #1: heterocycle formation
• Because almost all drug-like molecules contain a heterocycle, there’s a belief that heterocycle formation is important.
• Analysis of both patent data and pharmaceutical ELNs reveals that heterocycle forming reactions, even named heterocycle forming reactions, are relatively rare, with ring systems often being purchased as building blocks*.
* This analysis might not apply to process development and manufacturing.
https://nextmovesoftware.com/blog/2016/10/24/buying-a-ring-or-making-one-yourself/

Reactions that don’t happen!
• Paal-Knorr Pyrrole Synthesis vs. Aldehydes/Ketones
Of the 430 examples of Paal-Knorr pyrrole synthesis reported in US patent applications 2001-2012, exactly zero have more than the two reacting ketones/aldehydes.

Who let the dogs out?
• Big Data can determine the utility and scope of reactions.
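
To make the USPTO counts quoted under “Which reactions are important?” easier to compare, the short Python sketch below converts a few of them into occurrences per 10,000 reactions and ranks them. It is only an illustration: the counts are copied from the slide, the denominator treats “1.2M” as exactly 1,200,000 (an assumption), and the names per_10k and rank_by_count are hypothetical helpers, not part of any NextMove tool.

```python
# Counts copied from the slide; the reaction names are just dictionary keys.
USPTO_TOTAL = 1_200_000  # assumption: "1.2M" taken as exactly 1,200,000

uspto_counts = {
    "Suzuki coupling": 36262,
    "Sulfonamide Schotten-Baumann": 14348,
    "Buchwald-Hartwig": 6040,
    "Hiyama coupling": 458,
    "Friedlander quinoline synthesis": 30,
    "Phthalazinone synthesis": 16,
    "Pictet-Spengler reaction": 7,
    "Azide-nitrile Huisgen cycloaddition": 5,
    "Fukuyama coupling": 2,
    "Pyridone synthesis": 2,
    "Liebeskind-Srogl coupling": 0,
}


def per_10k(count, total=USPTO_TOTAL):
    """Return how often a reaction occurs per 10,000 extracted reactions."""
    return 10_000 * count / total


def rank_by_count(counts):
    """Return (name, count) pairs sorted from most to least frequent."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, count in rank_by_count(uspto_counts):
        print(f"{name:38s} {count:6d} {per_10k(count):8.2f} per 10k")
```

On this scale the gap between the workhorse couplings and the named heterocycle syntheses spans several orders of magnitude, which is the point the slides make about relying on observed frequencies rather than intuition.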
Reactions that do happen!
• Chloro Sonogashira Couplings (@ NCI)
– The LHASA ‘CHMTRN’ rules for Transform 2267, Sonogashira couplings (presented by Marc Nicklaus at Sheffield), state “Iodides are usually more reactive than bromide. Chlorides do not react”.
Code    Name                          Count   Yield
3.3.2   Bromo Sonogashira coupling    3717    49.3%
3.3.3   Chloro Sonogashira coupling   429     44.2%
3.3.4   Iodo Sonogashira coupling     2721    64.9%
• Isotopically-labelled Compounds (@ Eli Lilly)
– The LAAR reactions used to construct Lilly’s PLC (Nicolaou et al. 2016) forbid the presence of isotopic labels in reactants.

improving upon synthia™
• As presented earlier, Chematica/Synthia maintains a manually curated “black list” of interfering functional groups.
• A more labor-efficient data-driven strategy is to automatically maintain a “white list” of tolerated functional groups that can be derived from observed experiments.
• It’s much easier to track the things you know about and can see, than the things you can’t see and/or don’t know about.

handling noisy data (n>1 statistics)
• Kumada couplings incompatible with aldehydes...
• US20050107257A1 [p0196] Syngenta
A solution of 34.3 ml of ethylmagnesium chloride in 50 ml of tetrahydrofuran is added dropwise at −70° C. to 7.3 g of indium trichloride in 200 ml of tetrahydrofuran and, after stirring for 30 minutes, the reaction temperature is allowed to rise slowly to room temperature. That solution is added to a solution of 14.9 g of 3-bromo-4-pyridine carbaldehyde and 2.8 g of PdCl2(PPh3)2 in 240 ml of tetrahydrofuran and the reaction mixture is heated under reflux for 20 hours. 5 ml of methanol are then added and the mixture is concentrated in vacuo, stirred thoroughly with diethyl ether, filtered off and concentrated in vacuo once more. The residue is chromatographed on silica gel using ethyl acetate/hexane (1:1), yielding 3-ethyl-4-pyridine carbaldehyde (I) in the form of a yellow oil.

prediction? about the future?
c.f. US20140296184A1 [C00008]
M. Segler, M. Preuss and M. Waller, “Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI”, Nature, 555:604-610, 2018.
A. Gini, M. Segler et al., “Dehydrogenative TEMPO-mediated formation of Unstable Nitrones: Easy Access to N-Carbamoyl Isoxazolines”, Chem. Eur. J. 21:12053-12060, 2015.

search: monte carlo v. proof num
• Appropriate choice of AI search technique.
[Figure: AND/OR search tree with 0/1 values at the nodes]
Monte Carlo search is inappropriate for search problems such as mazes.
[Figure: chess position] White to win in 173 ply: Kf1-f2!

categorization of reactions
[Pie chart] Categories: Heteroatom alkylation and arylation; Acylation and related processes; C-C bond formations; Heterocycle formation; Protections; Deprotections; Reductions; Oxidations; Functional group conversion; Functional group addition; Resolution.
Slice values as extracted (pairing with categories not preserved): 2%, 5%, 15%, 34%, 1%, 10%, 6%, 3%, 17%, 2%, 5%.
1. J. Carey, D. Laffan, C. Thomson, M. Williams, Org. Biomol. Chem. 4:2337-2347, 2006.
2. S. Roughley and A. Jordan, J. Med. Chem. 54:3451-3479, 2011.

reaction ontology
• Reactions are classified into a common subset of the Carey et al. classes and the RSC’s RXNO ontology.
• There are 12 super-classes.
– e.g. 3 C-C bond formation (RXNO:0000002)
• These contain 84 classes/categories.
– e.g. 3.5 Pd-catalyzed C-C bond formation (RXNO:0000316)
• These contain ~1050 named reactions/types.
– e.g. 3.5.3 Negishi coupling (RXNO:0000088)
• These require ~2200 SMIRKS-like transformations.

Complication: agents matter
From http://en.wikipedia.org/wiki/Diazonium_compound
Sandmeyer reaction
Benzenediazonium chloride heated with cuprous chloride dissolved in HCl to yield chlorobenzene.
C6H5N2⁺ + CuCl → C6H5Cl + N2 + Cu⁺
Gattermann reaction
Benzenediazonium chloride is warmed with copper powder and HCl to yield chlorobenzene.
C6H5N2⁺ + CuCl → C6H5Cl + N2 + Cu⁺

10 most popular reactions
ID      Name                            Count
2.1.2   Carboxylic acid + amine         26,040
1.3.1   Buchwald-Hartwig amination      22,048
3.1     Suzuki coupling                 16,508
1.7.6   Williamson ether synthesis      15,665
2.1.1   Amide Schotten-Baumann          11,016
7.1     Nitro to amino                  10,234
6.1.1   N-Boc deprotection              9,821
6.2.2   CO2H-Me deprotection            9,487
6.2.1   CO2H-Et deprotection            6,749
2.2.3   Sulfonamide Schotten-Baumann    6,223

Most/least successful reactions
ID       Name                                  Mean Yield   Count
1.7.2    Diazomethane esterification           91%          41
9.3.1    Carboxylic acid to acid chloride      88%          704
9.7.14   Bromo to azido                        85%          235
1.7.5    Methyl esterification                 84%          2918
9.7.19   Bromo to iodo Finkelstein reaction    82%          116
6.1.3    N-Cbz deprotection                    81%          1359
…
4.1.11   Larock indole synthesis               47%          55
3.11.3   Ullmann-type biaryl coupling          44%          407
1.7.1    Chan-Lam ether coupling               44%          154
4.1.4    Pinner pyrimidine synthesis           39%          47

rArE named reactions
• Adams decarboxylation
• Imine Hosomi-Sakurai reaction
• Angeli-Rimini