A ChemAxon/KNIME based tool for designing chemical libraries
Tim Parrott, Dart NeuroScience
Brock Luty, Dart NeuroScience
September 25, 2013
ChemAxon UGM
Dart NeuroScience
Small molecules to maintain cognitive vitality (LTM)
Currently about 200 FTEs with build-out expected at 260
Privately held LLC, owned by a single individual
Scientific Computing
Scientific Computing collaborates with other DNS Departments to deliver solutions that simplify and accelerate the drug discovery process.
We rely on our (non-traditional) knowledge and experience in both science and technology to develop novel and efficient systems that meet this goal.
Scientific Computing Groups
[Org chart: Computational Chemistry, Bioinformatics, Information Management, Methods Development]
• Computational Chemistry and Bioinformatics
Responsibilities listed: Project Support (both groups); Modeling; Target ID; SBDD/Library Design; Expression Analysis / Pathways; Apply Methods; Novel Software algorithms; Pre-LO/LO/PCC; Enterprise Software (with Methods)
Staff listed: Philip Cheung, *Tami Marrone, Doug Fenger, Meg McCarrick, + 1 FTE, James Na, Amy Shih, Bill Sinko
• Information Management and Methods Development
Responsibilities listed: Software Development; Data / Biz Analysis; Informatics Software Development; Data Capture; Developing new methods; Analytics; Enterprise Scale Architecture; Data Access; RIA (MVC) with SOA; QA/Scientific Support; Extensions for ELN, Spotfire, IJC, etc.; Project Management
Staff listed: + 1 Group Lead, Ron Blanford, John Jaeger, Daniel Garden, Tim Parrott, Kevin Neal, James Harr, Hari Muddana, Eileen Tompkins, + 1 FTE, Heather Jones
Background
Dart NeuroScience (DNS) 200+ Scientists
50+ Chemists
Parallel Synthesis Group
About 20 chemists involved in the design and creation of chemical libraries
We need a chemical library design tool!
A Basic Chemical Library Design Tool
[Diagram: library design cycle with the stages Design, Synthesize, Test, and Analyze; the design steps shown are Select Reactants, Enumerate Products, Calculate Properties, and Analyze & Filter]
Goals
Constraints and Approach
• Constraint: Limited IT/IM support. Approach: Standardize calculations & reactions (services)
• Constraint: Chemists already on software overload. Approach: Simplify by wrapping processes and minimizing import/export operations
• Approach: Enhance capabilities and speed by doing calculations remotely
Ease of Use Support = Productivity
Platforms
Visualization / Analytics
Data Pipelining
Chemical Property Calculations, Reaction Enumeration
3D Scoring
Architecture
Heavily invested in service-oriented architecture (REST-style API) with standardized DNS patterns
[Diagram labels: Application Domain, Database]
CRUD (Create, Read, Update, Delete) GUIs written for specific Service entities using the MVC pattern (relying on Backbone.js and standardized DNS patterns)
Traditional stateless computational services (Property Calculation, Enumeration, etc.)
(Brock's Geeky Slide)
Services can be based on scripts that use command-line applications (the primary use case). Services can also be written in KNIME and run in this architecture.
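A stateless, script-based computational service of this kind can be pictured as a thin wrapper around a command-line program. The sketch below is only an illustration under stated assumptions: the JSON-over-stdin/stdout contract, the function names `run_tool` and `calculate_in_chunks`, and the stand-in command (which simply counts characters) are all hypothetical and are not taken from the DNS services.

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_tool(command, payload):
    """Run one command-line calculation.

    The payload is sent as JSON on stdin and the result is parsed from
    stdout. Nothing is remembered between calls, so the call is stateless.
    """
    completed = subprocess.run(
        command,
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise if the tool exits with a non-zero status
    )
    return json.loads(completed.stdout)


# Hypothetical stand-in for a real property calculator: it returns the
# length of each input string. A real service would invoke a chemistry
# command-line application here instead.
DEMO_COMMAND = [
    sys.executable,
    "-c",
    "import json, sys; d = json.load(sys.stdin); "
    "print(json.dumps({'values': [len(s) for s in d['structures']]}))",
]


def calculate_in_chunks(structures, chunk_size=2, workers=4):
    """Split the input into chunks, run them in parallel, merge in order."""
    chunks = [
        structures[i:i + chunk_size]
        for i in range(0, len(structures), chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda chunk: run_tool(DEMO_COMMAND, {"structures": chunk}),
            chunks,
        )
        return [value for result in results for value in result["values"]]


values = calculate_in_chunks(["CCO", "c1ccccc1", "C"])
```

Because each call depends only on its input, chunks can be dispatched to as many workers as are available without any coordination between them.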
Move all the heavy lifting to the servers (automated parallelization).
KNIME as a Service Orchestration Layer
Tool Overview
[Screenshot labels: Spotfire Selection & Export; Configuration Panel; Custom Nodes]
Reactant Selection
Import curated classes of reactants (CRUD Service)
Reactant Selection
Import list of Reagent Numbers (CRUD Service)
Reactant Deduplication
Input Output
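Deduplication of this sort reduces to grouping reactants by an equivalence key and keeping one representative per group. The slides do not say how functional equivalence is decided, so the sketch below assumes each reactant record already carries a normalized key; the field names `reagent_number` and `reactive_form` and the example records are hypothetical.

```python
def deduplicate(reactants, key):
    """Keep the first reactant seen for each equivalence key, in input order."""
    kept = {}
    for reactant in reactants:
        # setdefault stores the reactant only if the key is new
        kept.setdefault(key(reactant), reactant)
    return list(kept.values())


# Hypothetical input: R-002 is assumed to be, for example, a salt form
# that reacts identically to R-001, so both share the same key.
reactants = [
    {"reagent_number": "R-001", "reactive_form": "CC(=O)O"},
    {"reagent_number": "R-002", "reactive_form": "CC(=O)O"},
    {"reagent_number": "R-003", "reactive_form": "OC(=O)c1ccccc1"},
]

unique = deduplicate(reactants, key=lambda r: r["reactive_form"])
```

In this form the hard part is computing the key; the grouping itself is a single pass over the input.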
Need to identify and remove functionally equivalent reactants (Comp Service)
Reaction Selection
Reactions: A Look under the Hood
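Conceptually, enumerating a library means applying a reaction to every combination of one reactant from each reactant set. The sketch below shows only that combinatorial step; `toy_coupling` is a hypothetical placeholder that joins labels and performs no chemistry, standing in for the server-side reaction step, whose implementation is not shown in the slides.

```python
from itertools import product


def enumerate_products(reactant_sets, react):
    """Apply `react` to every combination of one reactant per set."""
    return [react(*combo) for combo in product(*reactant_sets)]


def toy_coupling(first, second):
    # Placeholder only: returns a label for the pair, not a real product.
    return f"{first}+{second}"


acids = ["A1", "A2"]
amines = ["B1", "B2", "B3"]

library = enumerate_products([acids, amines], toy_coupling)
```

The library size is the product of the set sizes (here 2 x 3 = 6), which is why removing redundant reactants before enumeration pays off.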
Server-side “Reactor” nodes can contain multi-step workflows. (Comp Services)
Calculations
Clustering
Server-Side Calculations --- OpenEye ROCS
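For reference, a shape Tanimoto between a query A and a candidate molecule B is conventionally defined from volume overlaps. The slides do not state whether the score used in this workflow is the shape-only Tanimoto or a combined shape and color score, so the shape-only form below is an assumption.

```latex
T_{\text{shape}}(A, B) = \frac{O_{AB}}{O_{AA} + O_{BB} - O_{AB}}
```

Here O_{AB} is the overlap volume of A and B in the best alignment found, and O_{AA} and O_{BB} are the self-overlap volumes, so the score ranges from 0 (no overlap) to 1 (identical shapes).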
ROCS output includes the shape/pose that scored best and the Tanimoto score against that query. (Computational Service)
Pausing Local Execution
Export to Spotfire
Selections made in Spotfire
Spotfire Selections returned to KNIME
New nodes with selected products & reactants appear in KNIME
Final Steps
Stereochemical codes needed for registration are assigned based on structure. (Computational Service)
The library design plan contains separate SDF files for the products and each reactant, along with a .csv file listing how many times each reactant is used. The zipped file is parsed on import into a chemist’s electronic laboratory notebook.
Load Library Design Plan into the Agilent ELN
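A library design plan of the kind described (SDF files for the products and reactants, plus a .csv of reactant usage counts, all zipped) could be assembled along these lines. This is a sketch under assumptions: the file names, the CSV column names, and the function `build_design_plan` are hypothetical, and the SDF contents are empty placeholders rather than real structure data.

```python
import csv
import io
import zipfile
from collections import Counter


def build_design_plan(product_sdf, reactant_sdfs, products):
    """Return the bytes of a zip archive holding the design plan.

    product_sdf   -- SDF text for all products
    reactant_sdfs -- {name: SDF text}, one entry per reactant file
    products      -- {product_id: [reactant_id, ...]}
    """
    # Count how many products each reactant appears in.
    usage = Counter(
        reactant_id
        for reactant_ids in products.values()
        for reactant_id in reactant_ids
    )

    table = io.StringIO()
    writer = csv.writer(table)
    writer.writerow(["reactant_id", "times_used"])  # assumed column names
    for reactant_id, count in sorted(usage.items()):
        writer.writerow([reactant_id, count])

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("products.sdf", product_sdf)
        for name, sdf_text in reactant_sdfs.items():
            zf.writestr(f"{name}.sdf", sdf_text)
        zf.writestr("reactant_usage.csv", table.getvalue())
    return archive.getvalue()


# Placeholder inputs: empty strings stand in for real SDF text.
plan_bytes = build_design_plan(
    product_sdf="",
    reactant_sdfs={"acids": "", "amines": ""},
    products={"P1": ["A1", "B1"], "P2": ["A1", "B2"]},
)
```

Building the archive in memory keeps the step free of temporary files, which suits a server-side service that hands the zipped plan straight to the importer.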
Custom Forms for planning and products tables
Summary
• June 2011 Parallel Synthesis Group formed
• June 2012 First release of Library Design Tool (LDT)
• Sept 2012 Additional KNIME training
• November 2012 Second release (Clustering, ROCS)
• April 2013 Pausable Nodes, Deduplication
• August 2013 RN Lookup, Stereo Code Assigner
40 Total Reactions
Acknowledgments
Node Development: loki der quaeler
Services & Deployment; Testing and troubleshooting: Ron Blanford, Eileen Tompkins, Karen Do, Andrew Burritt, Kenny Leung, The SGC Team, Zach Young, Daniel Garden
Management & PM: Melanie Nelson, Heather Jones, Brock Luty