Chemistry Infrastructure Migration in a Global Pharmaceutical Company: Concerns and Reality

Zhenbin (Benjamin) Li, Ph.D.

Research Data Integration & Logistics Service

Chemistry Infrastructure

• Chemistry Infrastructure: Computer systems, applications, or software that store, search, manipulate, calculate, and visualize chemical or biological entities and their properties. Chemistry infrastructure is indispensable computer support in research and development processes.
• Examples of Chemistry Infrastructure: chemistry cartridges, chemistry drawing tools, structure standardization, chemical reaction and molecule visualization, etc.
• Vendors of Chemistry Cartridges: MDL Direct (Accelrys), Accord (Accelrys), JChem (ChemAxon), ICCartridge (InfoChem), Daylight (Daylight), Bingo (GGA), etc.

Zhenbin Li, Boehringer Ingelheim, ChemAxon UGM 2012

History of Chemistry Infrastructure

• 1955 CAS laid the groundwork for computer-based chemical information
• 1957 Ray and Kirsch: substructure searching algorithm (atom-by-atom matching), later modified by Sussenguth (1965)
• 1959 Opler and Baird: first graphical display of chemical structure
• 1965 Gluck, Morgan: chemical storage and search system (Du Pont); canonical form of connection (bond-by-bond), later modified by Morgan to become the Gluck-Morgan algorithm
• 1967 Armitage and Lynch: structure similarity
• 1970 Crowe et al.: fragment-based screening
• 1971 Hamilton established the Protein Data Bank (PDB) at Brookhaven National Lab
• 1971 Gund et al.: 3D structure searching
• 1972 Wipke et al.: 3D model from 2D drawing with stereochemistry
• 1977 Marson, Peacock, Wipke: Molecular Design Limited; first database, MACCS
• 1979 Chevron Chemical Company: first company to license MDL
• 1981 Lynch et al.: Markush structures; 2 patent Markush databases, Markush DARC (Derwent) and MARPAT (CAS)
• 1985 First commercial sale of Rubenstein’s ChemDraw, to Stu Schreiber and Yale Univ.
• 1986 ChemDraw 1.0 was released
• 1987 Dolata et al.: 2D-3D converter; Hiller and Gasteiger: CORINA
• 1987 MDL listed on NASDAQ
• 1988 Weininger: SMILES notation
• 1988 Downs et al.: parallel computing system (Transputer)
• 1989 Gasteiger and Weiske: ChemInform, ChemoData, InfoChem reaction database, digitized Beilstein Handbook
• 1991 MDL ISIS client/server application
• 1992 Dalby et al. introduced MOLfile, SDfile, RDfile, RXNfile, CTAB (V2000), the de facto standard
• 1996 MDL introduced the V3000 format
• 1997 MDL acquired by Reed Elsevier
• 1998 ChemAxon formed
• 2003 Elsevier MDL introduced XDfile
• 2004 Launch of ChemAxon's JChem Cartridge for Oracle and its adoption by a medium-sized CRO
• 2005 Neurogen completely migrated its chemistry infrastructure from MDL to ChemAxon
• 2007 Symyx acquired Elsevier MDL
• 2010 Accelrys merged with Symyx
• 2012 ChemAxon JChem Cartridge globally licensed to 5 of the top 10 pharma companies

Sources: W. L. Chen, 2006, J. Chem. Inf. Model., and personal communications

When to Consider Chemistry Infrastructure Migration

Common Considerations of Chemistry Infrastructure Migration

• Certain legacy systems will not run in the new environment (hardware and operating systems), or require tremendous effort (or cost) to upgrade
• The current chemistry infrastructure technology lags behind the industry trend
• The current chemistry infrastructure cannot meet increasing in-house demand
• Dissatisfaction from technology and business demand for support and consulting
• Long-term financial gain

Challenges in BI

• Historically, systems were built on the ISIS platform and needed to be migrated away from it
• Isentris-based alternative solutions did not offer performance advantages and were therefore temporarily shelved
• In-house chemistry systems demand robust APIs to integrate and manage global workflows


Chemistry Infrastructure Selection Criteria

• Reliability
• Extensibility
• User friendliness
• Clear path for migration
• New upgrade according to customers’ needs
• Consulting
• Support
• Expertise
• Customization
• Familiarity with global pharmaceutical industry
• License model
• Negotiation power
• Short-term cost cut
• Long-term financial gain
• Company stability
• Size
• Culture/work ethics

ChemAxon as Chemistry Infrastructure


• JChemBase (Java and .NET)
• Marvin/MarvinSketch
• JChem Cartridge
• InstantJChem

Things to Consider for Chemistry Infrastructure Migration

• Legacy systems
• Timeline
• System interdependency
• On-going business demand
• Production interruption
• Re-training of developers
• User acceptance
• Re-training of end-users
• Financial commitment
• Resource
• Expertise
• Global alignment
• Reliability and flexibility of the new chemistry infrastructure

Roadmap of Chemistry Infrastructure Migration

[Roadmap diagram. Stages shown, in the order recoverable from the slide: business intention, market options, preliminary evaluation, consulting, in-depth evaluation, pilot implementation, in-depth system analysis, implementation planning, negotiation, financial commitment, business approval, acquire licenses, data migration, system migration, testing, completion.]

External Request Management System as a Pilot Project

[Diagram: ERMS workflow. BI chemists and CRO chemists work through ERMS (database and user interface), which records initials, request date, ordering dates, amount, completion dates, quantity shipped, number of steps, and difficulty. Connected functions: logistics calculation and reporting; inventory system management (compound information is transferred into inventory automatically); local/international shipping via FedEx/DHL, with a shipping sheet covering customs issues and shipment contents.]

ERMS as a Pilot Project

[Diagram: ERMS components. Request Management; ERMS DB; Logistics Calculations and Reporting; User Login, Authentication and Authorization; Reagent DB; Shipping and Status; Compound DB; Structure Searching; Commercial DB; E-Notebook DB. Legend: BI internal only; accessible by both BI and CROs; accessible by BI, but partially accessible by CROs.]

Reaction Scheme

Aldol condensation of acetone and ethyl acetoacetate gave β-keto-ester 3. A Grignard reaction involving methylmagnesium bromide provided alcohol 4, which was subjected to acid-catalyzed elimination to give diene 5. Reduction and acylation gave diene 7 (Scheme 3, compound 1).


* http://en.wikipedia.org/wiki/Nicolaou_Taxol_total_synthesis

Reaction Scheme: Iteration of Compounds

• JChem allows parsing the reaction scheme into individual compounds.
• This can only be achieved when a regular arrow, instead of a reaction arrow, is used in the scheme.
• A Mol file containing a mixture of all the compounds in the reaction scheme can be separated using the getFragments() method.
• However, the order of the compounds is not necessarily consistent with the reaction scheme.
• Ideally, developers should have some control over the order, or the ordering behavior should at least be predictable.
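One way to regain a predictable compound order is to sort the separated fragments by their position in the drawing. The Python sketch below assumes that each fragment carries 2D atom coordinates and that the scheme is drawn on a single row, left to right. The `Fragment` type and `order_left_to_right` are hypothetical names for illustration, not part of the JChem API; in practice the fragments would come from getFragments().

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Fragment:
    """Hypothetical stand-in for one compound split out of a scheme."""
    label: str
    atoms: List[Tuple[float, float]]  # 2D (x, y) atom coordinates

    def centroid(self) -> Tuple[float, float]:
        xs = [x for x, _ in self.atoms]
        ys = [y for _, y in self.atoms]
        return (sum(xs) / len(xs), sum(ys) / len(ys))


def order_left_to_right(fragments: List[Fragment]) -> List[Fragment]:
    """Sort by centroid x; ties are broken top to bottom (higher y first)."""
    def key(fragment: Fragment) -> Tuple[float, float]:
        cx, cy = fragment.centroid()
        return (cx, -cy)
    return sorted(fragments, key=key)


# Fragments deliberately listed out of scheme order.
scheme = [
    Fragment("product", [(9.0, 0.0), (10.0, 0.5)]),
    Fragment("starting material", [(0.0, 0.0), (1.0, 0.5)]),
    Fragment("intermediate", [(4.5, 0.0), (5.5, 0.5)]),
]
ordered_labels = [f.label for f in order_left_to_right(scheme)]
print(ordered_labels)
```

A scheme that wraps onto several rows would need a row-aware key (for example, bucketing by y before sorting by x); that case is not handled here.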

Steps of JChem Implementation

• Install the JChem Oracle Cartridge
• Migrate data using JChem Manager or pure SQL statements
• Create domain indices on the structures if the data are created via SQL
• A ChemAxon domain index can coexist with an MDL Direct index on the same database instance. This lets us plan the data migration with low impact on current production systems
• Rebuild the relationships in the database
• Change application code to implement ChemAxon technology
• Replace the interface with ChemAxon user interfaces
• Testing and deployment
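Because both indices can coexist on one instance, one possible use of that window is to run the same queries against both backends and compare the hits before switching over. The Python sketch below illustrates the idea only; `compare_backends` and the canned result sets are hypothetical, and real search functions would query the MDL and ChemAxon indices respectively.

```python
from typing import Callable, Dict, Iterable, Set

SearchFn = Callable[[str], Set[int]]


def compare_backends(
    queries: Iterable[str], old: SearchFn, new: SearchFn
) -> Dict[str, Dict[str, Set[int]]]:
    """Return, per query, the structure IDs found by only one backend."""
    mismatches: Dict[str, Dict[str, Set[int]]] = {}
    for query in queries:
        old_hits, new_hits = old(query), new(query)
        if old_hits != new_hits:
            mismatches[query] = {
                "only_old": old_hits - new_hits,
                "only_new": new_hits - old_hits,
            }
    return mismatches


# Stub backends with made-up results; real ones would hit the database.
OLD_RESULTS = {"c1ccccc1": {1, 2, 3}, "C1CCCCC1": {4, 5}}
NEW_RESULTS = {"c1ccccc1": {1, 2, 3}, "C1CCCCC1": {4, 5, 6}}

report = compare_backends(
    ["c1ccccc1", "C1CCCCC1"],
    lambda q: OLD_RESULTS.get(q, set()),
    lambda q: NEW_RESULTS.get(q, set()),
)
print(report)
```

Queries with identical hit sets are left out of the report, so an empty result means the two backends agreed on every query tried.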

Common Cartridge Functions

• Search structure:
    SELECT COUNT(*) FROM JCHEM_STRUCTURE
    WHERE JC_COMPARE(CD_STRUCTURE, 'C1CCCCC1', 'T:S') = 1;
  Search types:
    s: substructure search (default)
    na: substructure search, fingerprint-only
    f: full structure search; query and target must have the same heavy atom network to match
    ff: full fragment search; the query must fully match a target fragment
    d: duplicate search
    i: similarity search
    u: superstructure search
• Insertion:
    jchem_table_pkg.jc_insert('C1CCCCC1', 'JCHEM_Structure', null, null, null, null);
• Update:
    jchem_table_pkg.jc_update('c1ccccc1', 'JCHEM_STRUCTURE', cd_id, null);
• Deletion:
    jchem_table_pkg.jc_delete('JCHEM_STRUCTURE', 'where structure_id = 1001', null);

http://www.chemaxon.com/jchem/doc/dev/cartridge/cartapi.html
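Application code that issues these searches can keep the search-type codes in one place. The Python sketch below builds the count query from the slide for a chosen search type. The function name `build_search_sql` and the dictionary keys are illustrative assumptions; the codes themselves are the ones listed above, and the `?` placeholder follows the JDBC-style parameter used elsewhere in this deck.

```python
# Search-type codes as listed on the slide.
SEARCH_TYPES = {
    "substructure": "s",
    "substructure_fingerprint_only": "na",
    "full": "f",
    "full_fragment": "ff",
    "duplicate": "d",
    "similarity": "i",
    "superstructure": "u",
}


def build_search_sql(table: str, column: str, search: str) -> str:
    """Build a count query; the query structure is passed as a bind parameter."""
    if search not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {search!r}")
    # Identifiers are interpolated, so accept only plain names.
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    options = f"t:{SEARCH_TYPES[search]}"
    return (
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE JC_COMPARE({column}, ?, '{options}') = 1"
    )


sql = build_search_sql("JCHEM_STRUCTURE", "CD_STRUCTURE", "substructure")
print(sql)
```

Binding the query structure as a parameter, rather than pasting it into the SQL text, avoids quoting problems with structure strings.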

ERMS Fully ChemAxon-enabled

Full fragment search
  MDL:
    select structure_id, molfile(molecule) as mole, molwt(molecule) as mw, molfmla(molecule) as formula, smiles, ….. from structure where flexmatch(molecule, ?, 'match=all')=1
  ChemAxon:
    select structure_id, jc_molconvertb(cd_structure, 'mol') as mole, cd_molweight as mw, cd_formula as formula, cd_smiles as smiles, ….. from jchem_structure where jc_compare(cd_structure, ?, 't:ff') = 1

Similarity search
  MDL:
    select structure_id, molfile(molecule) as mole, chime(molecule) as chime, molwt(molecule) as mw, molfmla(molecule) as formula, smiles, ….. from structure where similar(molecule, ?, ?)=1
  ChemAxon:
    select structure_id, jc_molconvertb(cd_structure, 'mol') as mole, cd_molweight as mw, cd_formula as formula, cd_smiles as smiles, ….. from jchem_structure where jc_compare(cd_structure, ?, 't:i simThreshold:?') = 1

Substructure search
  MDL:
    select structure_id, molfile(molecule) as mole, chime(molecule) as chime, molwt(molecule) as mw, molfmla(molecule) as formula, smiles, ….. from structure where sss(molecule, ?)=1
  ChemAxon:
    select structure_id, jc_molconvertb(cd_structure, 'mol') as mole, cd_molweight as mw, cd_formula as formula, cd_smiles as smiles, ….. from jchem_structure where jc_compare(cd_structure, ?, 't:s') = 1
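The MDL-to-ChemAxon pairs above are regular enough to capture in lookup tables, which can help when many queries have to be rewritten. The Python sketch below is illustrative only: the function names are hypothetical, the mappings are the ones shown on the slide, and the `simThreshold:?` on the slide is read here as a numeric value formatted into the option string (an assumption, since a `?` inside a quoted SQL literal is not a bind parameter).

```python
from typing import List, Optional

# Select-list expressions: MDL form -> ChemAxon form, as paired on the slide.
# chime(molecule) has no ChemAxon counterpart on the slide, so it is absent.
COLUMN_MAP = {
    "molfile(molecule)": "jc_molconvertb(cd_structure, 'mol')",
    "molwt(molecule)": "cd_molweight",
    "molfmla(molecule)": "cd_formula",
    "smiles": "cd_smiles",
}

# MDL search operator -> JChem search-type code, as paired on the slide.
MDL_TO_JCHEM = {
    "flexmatch": "ff",  # flexmatch(..., 'match=all') -> full fragment search
    "similar": "i",     # similar(...) -> similarity search
    "sss": "s",         # sss(...) -> substructure search
}


def jchem_select_list(mdl_items: List[str]) -> str:
    """Translate 'expr as alias' items; an unmapped expression raises KeyError."""
    translated = []
    for item in mdl_items:
        expr, _, alias = item.partition(" as ")
        translated.append(f"{COLUMN_MAP[expr]} as {alias or expr}")
    return ", ".join(translated)


def jchem_predicate(mdl_operator: str, sim_threshold: Optional[float] = None) -> str:
    """Return the jc_compare predicate that replaces an MDL search operator."""
    code = MDL_TO_JCHEM[mdl_operator]
    options = f"t:{code}"
    if code == "i":
        if sim_threshold is None:
            raise ValueError("similarity search needs a threshold")
        options += f" simThreshold:{sim_threshold}"
    return f"jc_compare(cd_structure, ?, '{options}') = 1"


columns = jchem_select_list(
    ["molfile(molecule) as mole", "molwt(molecule) as mw",
     "molfmla(molecule) as formula", "smiles"]
)
print(f"select structure_id, {columns} from jchem_structure where {jchem_predicate('sss')}")
```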

Summary of the Pilot Project

• ChemAxon JChem allows us to achieve reaction scheme capture, regeneration, and compound iteration programmatically, which are difficult, if possible at all, to achieve with other chemistry software.
• We installed the JChem cartridge and implemented it successfully in ERMS v2.
• Switching from MDL to ChemAxon is very straightforward, which suggests that migrating the chemistry infrastructure from MDL to ChemAxon is highly feasible.
• The total time needed (60 hrs) to change completely from MDL-based technology to ChemAxon-based technology is very reasonable.
• This project established the benchmark for other similar applications.
• ChemAxon technology gives BI opportunities for future system development and integration.
• The modular design of ERMS helped the migration go smoothly.

System Interdependency Analysis

Migration Planning

[Gantt chart: migration schedule for Systems A through H, by quarter from Q3 2012 to Q2 2014.]
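A schedule of this kind can be derived from the system interdependency analysis. The Python sketch below orders systems so that each one follows the systems it depends on, then estimates effort from the 60-hour pilot benchmark. The dependency graph, the choice to migrate dependencies first, the 0.9 learning rate, and the use of a Wright-style learning curve are all assumptions made for illustration; the talk does not give these details. It needs Python 3.9 or later for `graphlib`.

```python
import math
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependencies: each system maps to the systems it depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "System A": set(),
    "System B": {"System A"},
    "System C": {"System A"},
    "System D": {"System B", "System C"},
}

PILOT_HOURS = 60.0   # pilot benchmark reported in the talk
LEARNING_RATE = 0.9  # assumed: effort drops 10% each time the count doubles


def migration_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order systems so each one comes after the systems it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def estimated_hours(n: int, first: float = PILOT_HOURS,
                    rate: float = LEARNING_RATE) -> float:
    """Wright learning curve: effort of the n-th migration (n starts at 1)."""
    return first * n ** math.log2(rate)


order = migration_order(DEPENDS_ON)
# The pilot counts as migration 1, so the remaining systems start at 2.
plan = [(name, round(estimated_hours(i), 1)) for i, name in enumerate(order, start=2)]
for name, hours in plan:
    print(f"{name}: about {hours} h")
```

Systems differ in size, so a real estimate would scale the benchmark per system and be revised as each completed migration adds a new data point.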

• Migration planning should be based on the interdependency of systems and applications as well as on available resources.
• The pilot project should provide a benchmark for estimating the time needed for each system. The estimate should cover migration of the data, the application, and the interfaces.
• Each additional system migration can serve as a further benchmark for the remaining systems, so that the estimates can be refined.
• The learning-curve effect should be considered.

Conclusions

• This was never an easy decision, but we consider it an investment for the coming decades.
• Use a pilot project to prove the concept and address concerns, especially those of management.
• System interdependency analysis is an important step. It helps with migration planning regardless of which chemistry infrastructure vendor is chosen.
• The reality of chemistry infrastructure migration may not be as bad as originally feared.
• During migration, the data curation effort can be high.
• Strong user support, especially from key stakeholders, is needed because the user interface, and even the applications, will change.
