A ChemAxon/KNIME based tool for designing chemical libraries

Tim Parrott, Dart NeuroScience

Brock Luty, Dart NeuroScience

September 25, 2013

ChemAxon UGM

Dart NeuroScience

Small molecules to maintain cognitive vitality (LTM)

Currently about 200 FTEs with build-out expected at 260

Privately held LLC owned by a single individual

Scientific Computing

Scientific Computing collaborates with other DNS Departments to deliver solutions that simplify and accelerate the process.

We rely on our (non-traditional) knowledge and experience in both Science and Technology to develop novel and efficient systems to meet this goal

Scientific Computing Groups

Groups: Computational Chemistry; Project Support; Information Management; Methods Development

Focus areas (in slide order): Project Support; Modeling; Target ID; SBDD/Library Design; Expression Analysis / Pathways; Apply Methods; Novel Software algorithms; Pre-LO/LO/PCC; Enterprise Software (with Methods); Data / Biz Analysis; Informatics Software Development; Data Capture; Developing new methods; Analytics; Enterprise Scale Architecture; Data Access; RIA (MVC) with SOA; QA/Scientific Support; Extensions for ELN, Spotfire, IJC, etc.; Project Management

People (in slide order): Philip Cheung, *Tami Marrone, Doug Fenger, Meg McCarrick, + 1 FTE, James Na, Amy Shih, Bill Sinko, + 1 Group Lead, Ron Blanford, John Jaeger, Daniel Garden, Tim Parrott, Kevin Neal, James Harr, Hari Muddana, Eileen Tompkins, + 1 FTE, Heather Jones

Background

Dart NeuroScience (DNS) 200+ Scientists

50+ Chemists

Parallel Synthesis Group: about 20 chemists involved in the design and creation of chemical libraries

We need a chemical library design tool!

A Basic Chemical Library Design Tool

[Figure: design cycle (Design, Synthesize, Test, Analyze); the Design stage comprises Select Reactants, Enumerate Products, Calculate Properties, Analyze & Filter]

Goals, Constraints, Approach

Goals: Ease of Use + Support = Productivity

Constraint: Limited IT/IM support
Approach: Standardize calculations & reactions (services)

Constraint: Chemists already on software overload
Approach: Simplify: wrap processes and minimize import/export operations

Approach: Enhance capabilities and speed by doing calculations remotely

Platforms

Visualization / Analytics

Data Pipelining

Chemical Property Calculations, Reaction Enumeration

3D Scoring

Architecture

Heavily invested in service-oriented architecture (REST-style API) with standardized DNS patterns

Application Domain

CRUD (Create, Read, Update, Delete) GUIs written for specific service entities using the MVC pattern (relying on Backbone.js and standardized DNS patterns)

Database Traditional Stateless Computational Services (Property Calculation, Enumeration, etc)

Brock’s Geeky Slide

Services can be based on scripts that use command-line applications (the primary use case). Services can also be written in KNIME and run in this architecture.
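A script-based service of the kind described above can be pictured as a thin, stateless wrapper around a command-line program. The sketch below is only an illustration under stated assumptions: `run_cli_service` and `FAKE_CALCULATOR` are hypothetical names, and the "calculator" is a stand-in Python one-liner rather than any real ChemAxon or OpenEye tool.

```python
import json
import subprocess
import sys


def run_cli_service(command, payload, timeout=60):
    """Handle one stateless request: feed `payload` to a command-line tool
    on stdin and return whatever it writes to stdout."""
    completed = subprocess.run(
        command,
        input=payload,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Hypothetical stand-in for a real property calculator: reads one structure
# per line and reports the length of each line as its "value".
FAKE_CALCULATOR = [
    sys.executable,
    "-c",
    "import sys, json; "
    "print(json.dumps([{'input': l.strip(), 'value': len(l.strip())} "
    "for l in sys.stdin if l.strip()]))",
]

if __name__ == "__main__":
    rows = json.loads(run_cli_service(FAKE_CALCULATOR, "CCO\nc1ccccc1\n"))
    for row in rows:
        print(row["input"], row["value"])
```

Because no state survives between calls, any number of identical workers could serve requests side by side; exposing the function over HTTP is left out of the sketch.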

Move all the heavy lifting to the servers (automated parallelization).

KNIME as a Service Orchestration Layer
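One way to picture automated parallelization from an orchestration layer is to split the input into chunks, send the chunks to a service concurrently, and reassemble the results in input order. The sketch below assumes this chunk-and-gather pattern; the slides do not say how the DNS services parallelize. `call_service` is a hypothetical local placeholder for a remote call.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def call_service(chunk):
    """Hypothetical placeholder for a remote request; it computes a trivial
    'property' locally so the sketch runs without a server."""
    return [{"structure": s, "n_chars": len(s)} for s in chunk]


def run_in_parallel(items, chunk_size=2, max_workers=4, service=call_service):
    """Send chunks to the service concurrently and return one flat result
    list in the original input order."""
    chunks = chunked(items, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of which
        # chunk finishes first.
        per_chunk = list(pool.map(service, chunks))
    return [row for rows in per_chunk for row in rows]


if __name__ == "__main__":
    for row in run_in_parallel(["CCO", "CC", "C", "CCCC", "CCN"]):
        print(row)
```

Threads suit this case because the client mostly waits on the network while the servers do the computation.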

Tool Overview

Spotfire Selection & Export Configuration Panel

Custom Nodes

Reactant Selection

Import curated classes of reactants (CRUD Service)

Reactant Selection

Import list of Reagent Numbers (CRUD Service)

Reactant Deduplication

Input Output
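Deduplication can be sketched as grouping reactants by a normalized key and keeping one representative per group. The slides do not define "functionally equivalent", so the key used here is an assumption made purely for illustration: the largest dot-separated fragment of a SMILES string, which collapses simple salt forms onto their parent. It does no canonicalization, so a real service would rely on a cheminformatics toolkit instead. All names and reagent IDs are hypothetical.

```python
def parent_key(smiles):
    """Crude normalization (assumption): keep the longest dot-separated
    fragment, which drops simple counterions such as '.Cl'."""
    return max(smiles.split("."), key=len)


def deduplicate(reactants):
    """Keep the first reagent seen for each key.

    reactants: list of (reagent_id, smiles) pairs.
    Returns (kept, removed), where removed pairs each dropped reagent ID
    with the ID of the reagent that was kept in its place.
    """
    first_seen = {}
    kept, removed = [], []
    for reagent_id, smiles in reactants:
        key = parent_key(smiles)
        if key in first_seen:
            removed.append((reagent_id, first_seen[key]))
        else:
            first_seen[key] = reagent_id
            kept.append((reagent_id, smiles))
    return kept, removed


if __name__ == "__main__":
    kept, removed = deduplicate([
        ("R1", "NCCc1ccccc1"),
        ("R2", "NCCc1ccccc1.Cl"),  # hydrochloride salt of R1
        ("R3", "CCN"),
    ])
    print("kept:", kept)
    print("removed:", removed)
```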

Need to identify and remove functionally equivalent reactants (Computational Service)

Reaction Selection

Reactions: A Look under the Hood
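At its simplest, enumerating a library means pairing every selected reactant of one class with every reactant of the other classes. The sketch below shows only that combinatorial bookkeeping with hypothetical reagent IDs; the structure transformation itself is left to the server-side reaction step and is not modeled.

```python
from itertools import product


def enumerate_library(reactant_lists):
    """Yield one record per combination of one reactant from each list."""
    for combo in product(*reactant_lists):
        yield {"product_id": "_".join(combo), "reactants": combo}


# Hypothetical reagent IDs for a two-component reaction.
acids = ["A001", "A002"]
amines = ["B001", "B002", "B003"]
library = list(enumerate_library([acids, amines]))

if __name__ == "__main__":
    print(len(library), "products")
    for record in library:
        print(record["product_id"])
```

The library size is the product of the list lengths (here 2 x 3 = 6), which is why reactant selection and filtering happen before enumeration.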

Server-Side “Reactor” nodes can contain multi-step workflows. (Computational Services)

Calculations

Clustering

Server-Side Calculations: OpenEye ROCS
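Shape-overlay scoring is usually reported as an overlap-based Tanimoto, T = O_AB / (O_AA + O_BB - O_AB), where O_AB is the overlap volume between query and candidate and O_AA, O_BB are the self-overlaps. The sketch below illustrates that definition and the selection of the best-scoring pose. The overlap volumes are made up, the function names are hypothetical, and nothing here calls ROCS or reproduces its output format.

```python
def shape_tanimoto(overlap_ab, self_overlap_a, self_overlap_b):
    """Overlap-based Tanimoto: O_AB / (O_AA + O_BB - O_AB)."""
    denominator = self_overlap_a + self_overlap_b - overlap_ab
    if denominator <= 0:
        raise ValueError("overlap volumes must give a positive denominator")
    return overlap_ab / denominator


def best_pose(poses):
    """Return the (label, score) pair with the highest score."""
    return max(poses, key=lambda pose: pose[1])


# Made-up overlap volumes for three poses of one molecule against one query.
query_self_overlap = 100.0
molecule_self_overlap = 90.0
poses = [
    ("pose_1", shape_tanimoto(40.0, query_self_overlap, molecule_self_overlap)),
    ("pose_2", shape_tanimoto(60.0, query_self_overlap, molecule_self_overlap)),
    ("pose_3", shape_tanimoto(55.0, query_self_overlap, molecule_self_overlap)),
]

if __name__ == "__main__":
    label, score = best_pose(poses)
    print(label, round(score, 3))
```

A score of 1.0 means the two shapes overlap completely; values fall toward 0 as the overlap shrinks.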

ROCS output includes the Shape/Pose that scored best and the Tanimoto Score against that query. (Computational Service)

Pausing Local Execution

Export to Spotfire

Selections made in Spotfire

Spotfire Selections returned to KNIME

New nodes with selected products & reactants appear in KNIME

Final Steps

Stereochemical codes needed for registration are assigned based on structure. (Computational Service)

The library design plan contains separate SDF files for the products and each reactant, along with a .csv file listing how many times each reactant is used. The zipped file is parsed on import into a chemist’s electronic laboratory notebook.

Load Library Design Plan into the Agilent ELN
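The plan archive described above (SDF files plus a .csv of reactant usage counts, zipped and later parsed on import) can be sketched with the standard library. File names, column headers, and function names below are assumptions for illustration; the actual layout expected by the ELN import is not given in the slides.

```python
import csv
import io
import zipfile
from collections import Counter


def build_plan(products, sdf_by_name):
    """Return a zipped library design plan as bytes.

    products: list of (product_id, (reactant_id, ...)) tuples.
    sdf_by_name: mapping of file name to SDF text prepared elsewhere.
    """
    # Count how many products each reactant takes part in.
    usage = Counter(r for _, reactants in products for r in reactants)
    table = io.StringIO()
    writer = csv.writer(table, lineterminator="\n")
    writer.writerow(["reactant_id", "times_used"])  # assumed column names
    for reactant_id, count in sorted(usage.items()):
        writer.writerow([reactant_id, count])

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in sdf_by_name.items():
            archive.writestr(name, text)
        archive.writestr("reactant_usage.csv", table.getvalue())
    return buffer.getvalue()


def read_usage(plan_bytes):
    """Parse the usage table back out of a zipped plan."""
    with zipfile.ZipFile(io.BytesIO(plan_bytes)) as archive:
        text = archive.read("reactant_usage.csv").decode("utf-8")
    rows = csv.DictReader(io.StringIO(text))
    return {row["reactant_id"]: int(row["times_used"]) for row in rows}


if __name__ == "__main__":
    products = [("P1", ("A1", "B1")), ("P2", ("A1", "B2"))]
    files = {"products.sdf": "", "reactants_A.sdf": "", "reactants_B.sdf": ""}
    print(read_usage(build_plan(products, files)))
```

Building the archive in memory keeps the sketch free of temporary files; writing the same bytes to disk would give the zipped file that gets imported.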

Custom Forms for planning and products tables

Summary

• June 2011 Parallel Synthesis Group formed

• June 2012 First release of Library Design Tool (LDT)

• Sept 2012 Additional KNIME training

• November 2012 Second release (Clustering, ROCS)

• April 2013 Pausable Nodes, Deduplication

• August 2013 RN Lookup, Stereo Code Assigner

40 Total Reactions

Acknowledgments

Node Development: loki der quaeler

Services & Deployment / Testing and troubleshooting: Ron Blanford, Eileen Tompkins, Karen Do, Andrew Burritt, Kenny Leung, The SGC Team, Zach Young, Daniel Garden

Management & PM: Melanie Nelson, Heather Jones, Brock Luty