Value Through Innovation

Chemistry Infrastructure Migration in a Global Pharmaceutical Company: Concerns and Reality

Zhenbin (Benjamin) Li, Ph.D.
Research Data Integration & Logistics Service
Zhenbin Li, Boehringer Ingelheim, ChemAxon UGM 2012

Chemistry Infrastructure

• Chemistry infrastructure: computer systems, applications, or software that store, search, manipulate, calculate, and visualize chemical or biological entities and their properties. Chemistry infrastructure is indispensable computer support for the drug discovery and development processes of the pharmaceutical industry.
• Examples of chemistry infrastructure: chemistry cartridges, chemistry drawing tools, structure standardization, chemical reaction and molecule visualization, etc.
• Vendors of chemistry cartridges: MDL Direct (Accelrys), Accord (Accelrys), JChem (ChemAxon), ICCartridge (InfoChem), Daylight (Daylight), Bingo (GGA), etc.

History of Chemistry Infrastructure (1950 to 2012)

• 1955 CAS laid the groundwork for computer-based chemical information databases
• 1957 Ray and Kirsch: substructure searching algorithm (atom-by-atom matching), later modified by Sussenguth (1965)
• 1959 Opler and Baird: first graphical display of a chemical structure
• 1965 Gluck, Morgan: chemical storage and search system (Du Pont); canonical form of connection (bond-by-bond), later modified by Morgan to become the Gluck-Morgan algorithm
• 1967 Armitage and Lynch: structure similarity
• 1970 Crowe et al.: fragment-based screening
• 1971 Hamilton established the Protein Data Bank (PDB) at Brookhaven National Lab
• 1971 Gund et al.: 3D structure searching
• 1972 Wipke et al.: 3D model from 2D drawing with stereochemistry
• 1977 Marson, Peacock, Wipke: Molecular Design Limited; first database, MACCS
• 1979 Chevron Chemical Company became the first company to license MDL
• 1981 Lynch et al.: Markush structures; two patent databases, Markush DARC (Derwent) and MARPAT (CAS)
• 1985 First commercial sale of Rubenstein's ChemDraw to Stu Schreiber and Yale Univ.
• 1986 ChemDraw 1.0 was released
• 1987 Dolata et al.: 2D-3D converter; Hiller and Gasteiger: CORINA
• 1987 MDL listed on NASDAQ
• 1988 Weininger: SMILES notation
• 1988 Downs et al.: parallel computing system (Transputer)
• 1989 Gasteiger and Weiske: ChemInform, ChemoData, InfoChem reaction database, digitized Beilstein Handbook
• 1991 MDL ISIS client/server application
• 1992 Dalby et al. introduced MOLfile, SDfile, RDfile, RXNfile, and CTAB (V2000), the de facto standard formats
• 1996 MDL introduced the V3000 format
• 1997 MDL acquired by Reed Elsevier
• 1998 ChemAxon formed
• 2003 Elsevier MDL introduced XDfile
• 2004 Launch of ChemAxon's JChem Cartridge for Oracle and adoption by medium-sized CROs
• 2005 Neurogen completely migrated its chemistry infrastructure from MDL to ChemAxon
• 2007 Symyx acquired Elsevier MDL
• 2010 Accelrys merged with Symyx
• 2012 ChemAxon JChem Cartridge globally licensed to 5 of the top 10 pharma companies

Source: W. L. Chen, 2006, J. Chem. Inf. Model., and personal communications

When to Consider Chemistry Infrastructure Migration

Common Considerations of Chemistry Infrastructure Migration
• Certain legacy systems will not run in a new environment (hardware and operating systems), or require tremendous effort (or cost) to upgrade
• The current chemistry infrastructure technology lags behind the industry trend
• The current chemistry infrastructure cannot meet the increasing demand of in-house software development
• Dissatisfaction on the technology and business sides, and demand for support and consulting
• Long-term financial gain

Challenges in BI
• Historically, systems were built on the ISIS platform and needed to be migrated away from it
• Isentris-based alternative solutions did not offer performance advantages and were therefore temporarily shelved
• In-house chemistry systems demand robust APIs to integrate and manage global workflows

Chemistry Infrastructure Selection Criteria

• Reliability
• Extensibility
• User friendliness
• Clear path for migration
• Consulting
• Support
• Expertise
• Customization
• New upgrade according to customers' needs
• License model
• Negotiation power
• Short-term cost cut
• Long-term financial gain
• Company stability
• Size
• Culture/work ethics
• Familiarity with global pharmaceutical industry

ChemAxon as Chemistry Infrastructure

• JChemBase (Java and .NET)
• Marvin/MarvinSketch
• JChem Cartridge
• InstantJChem

Things to Consider for Chemistry Infrastructure Migration

• Legacy systems
• Timeline
• System interdependency
• On-going business demand
• Production interruption
• Re-training of developers
• User acceptance
• Re-training of end-users
• Financial commitment
• Resource
• Expertise
• Global alignment
• Reliability and flexibility of the new chemistry infrastructure

Roadmap of Chemistry Infrastructure Migration

[Flow diagram. Stages shown: business intention, market options, preliminary evaluation, consulting, in-depth evaluation, pilot implementation, negotiation, financial commitment, business approval, acquire licenses, in-depth system analysis, implementation planning, data migration, system migration, testing, completion.]

External Request Management System as a Pilot Project

[Diagram. Actors: BI chemist, CRO chemist. ERMS components: DB, UI. Request fields: initials, amount, request date, number of steps, difficulty, ordering dates, completion dates, quantity shipped. Functions: logistics calculation and reporting; inventory system management (transfer compound info into inventory automatically); local/international shipping via FedEx/DHL; shipping sheet (customs issues, shipment contents).]

ERMS as a Pilot Project

[Architecture diagram. Components: request management, ERMS DB, logistics calculations and reporting, user login authentication and authorization, reagent and compound DB, shipping and status, structure searching, commercial DB, E-Notebook DB. Legend: BI internal only; accessible by both BI and CROs; accessible by BI, but partially accessible by CROs.]

Reaction Scheme

Aldol condensation of acetone and ethyl acetoacetate gave β-keto-ester 3. A Grignard reaction involving methylmagnesium bromide provided alcohol 4, which was subjected to acid-catalyzed elimination to give diene 5. Reduction and acylation gave diene 7 (Scheme 3, compound 1).

* http://en.wikipedia.org/wiki/Nicolaou_Taxol_total_synthesis

Reaction Scheme: Iteration of Compounds

• JChem allows parsing the reaction scheme into individual compounds.
• This can only be achieved when a regular arrow, instead of a reaction arrow, is used in the scheme.
• A Mol file containing a mixture of all the compounds in the reaction scheme can be separated using the getFragments() method.
• However, the order of the compounds is not necessarily consistent with the reaction scheme.
• Ideally, the developer should have some control over the order, or at least the behavior should be understandable.

Steps of JChem Implementation

• JChem Oracle Cartridge installation
• Data migration using JChem Manager or pure SQL statements
• Create domain indices on the structures if the data are created via SQL.
• A ChemAxon domain index can coexist with an MDL Direct index on the same database instance.
  This allows us to plan the data migration better, with low impact on current production systems.
• Rebuild the relationships in the database
• Change application code to implement ChemAxon technology
• Replace the interface with ChemAxon user interfaces
• Testing and deployment

Common Cartridge Functions

• Search structure:
    SELECT COUNT(*) FROM JCHEM_STRUCTURE
    WHERE JC_COMPARE(CD_STRUCTURE, 'C1CCCCC1', 'T:S') = 1;

  Search types:
    s: substructure search (default)
    na: substructure search, fingerprint-only
    f: full structure search; query and target must have the same heavy atom network for matching.
    ff: full fragment search; the query must fully match a target fragment.
    d: duplicate search
    i: similarity search
    u: superstructure search

• Insertion:
    jchem_table_pkg.jc_insert('C1CCCCC1', 'JCHEM_Structure', null, null, null, null);

• Update:
    jchem_table_pkg.jc_update('c1ccccc1', 'JCHEM_STRUCTURE', cd_id, null);

• Deletion:
    jchem_table_pkg.jc_delete('JCHEM_STRUCTURE', 'where structure_id = 1001', null);

http://www.chemaxon.com/jchem/doc/dev/cartridge/cartapi.html

ERMS Fully ChemAxon-enabled

Query 1
  MDL:
    select structure_id, molfile(molecule) as mole, molwt(molecule) as mw, molfmla(molecule) as formula, smiles, ….. from structure where flexmatch(molecule, ?, 'match=all')=1
  ChemAxon:
    select structure_id, jc_molconvertb(cd_structure, 'mol') as mole, cd_molweight as mw, cd_formula as formula, cd_smiles as smiles, ….. from jchem_structure where jc_compare(cd_structure, ?, 't:ff') = 1

Query 2
  MDL:
    select structure_id, molfile(molecule) as mole, chime(molecule) as chime, molwt(molecule) as mw, molfmla(molecule) as formula, smiles, ….. from structure where similar(molecule, ?, ?)=1
  ChemAxon:
    select structure_id, jc_molconvertb(cd_structure, 'mol') as mole, cd_molweight as mw, cd_formula as formula, cd_smiles as smiles, ….. from jchem_structure where jc_compare(cd_structure, ?, 't:i simThreshold:?') = 1

Query 3
  MDL:
    select structure_id, molfile(molecule) as mole, chime(molecule) as chime, molwt(molecule) as mw,
  ChemAxon:
    select structure_id, jc_molconvertb(cd_structure, 'mol')
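
The ChemAxon queries on this slide differ from one another only in the option string passed to jc_compare, so application code can generate them from one template. The following is a minimal Python sketch of that idea, not the ERMS implementation: the function names `jc_compare_options` and `build_search_sql`, the dictionary keys, and the `:query` bind-variable name are hypothetical, while the search-type codes, the `simThreshold` option, and the table and column names are taken from the slides. Only the columns written out on the slide are selected; the elided ones are left out.

```python
# Search-type codes for the "t:" option of jc_compare, as listed on the
# "Common Cartridge Functions" slide. The dictionary keys are illustrative.
SEARCH_TYPES = {
    "substructure": "s",
    "substructure_fingerprint_only": "na",
    "full": "f",
    "full_fragment": "ff",
    "duplicate": "d",
    "similarity": "i",
    "superstructure": "u",
}


def jc_compare_options(search, sim_threshold=None):
    """Build the option string, e.g. 't:ff' or 't:i simThreshold:0.8'."""
    if search not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {search!r}")
    options = f"t:{SEARCH_TYPES[search]}"
    if sim_threshold is not None:
        if search != "similarity":
            raise ValueError("sim_threshold only applies to similarity search")
        # float() rejects anything that is not a number before it is
        # embedded in the SQL text.
        options += f" simThreshold:{float(sim_threshold)}"
    return options


def build_search_sql(search, sim_threshold=None):
    """Return SQL text with one bind variable (:query) for the query structure.

    The option string is embedded as a literal, as on the slide; it is built
    only from the whitelist above and a validated float.
    """
    options = jc_compare_options(search, sim_threshold)
    return (
        "select structure_id, jc_molconvertb(cd_structure, 'mol') as mole, "
        "cd_molweight as mw, cd_formula as formula, cd_smiles as smiles "
        "from jchem_structure "
        f"where jc_compare(cd_structure, :query, '{options}') = 1"
    )


if __name__ == "__main__":
    print(build_search_sql("full_fragment"))
    print(build_search_sql("similarity", sim_threshold=0.8))
```

Embedding the threshold in the option text, rather than writing a `?` placeholder inside the quoted string, is a deliberate choice in this sketch: database drivers generally do not treat a placeholder inside a string literal as a bind variable.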
