<<

Advancing Non-Targeted Analysis of Xenobiotics in Environmental & Biological Media

EPA’s DSSTox Chemical : A Resource for the Non-Targeted Testing Community

Ann Richard National Center for Computational Toxicology Office of Research & Development, US EPA

August 18-19, 2015 Research Triangle Park, NC 0 Outline

view of problem

 DSSTox chemical database

 ToxCast chemical library

 Tox21 analytical QC

 Challenges

1 Cheminformatics view of non-targeted testing problem

water upholstery clothes Chemical universe air blood serum Exposure? Toxic? toys Suspect testing, dust carpet physical standards

MI Chemical mass substance database {mono-isotopic 1:many mass}

{formulas} Inventory/Data Thousands of mass ion 1:many peaks & abundances {parent structures} {CAS-name} 1:1

What’s in the sample? Chemical {structure} & structures structure Should I be worried? How big is the problem? http://www.chemspider.com/blog/

(22 million ChemSpider IDs in 2008)

Highest frequency formula ChemSpider? Theoretically…

C H O has 11 12 3 C H N O 4,703,963,455 18 20 2 3 occurs 5110 possible times isomers!!! How big is the problem? http://www.chemspider.com/blog/

(22 million ChemSpider IDs in 2008)

ChemSpider 22 M

Problem too big to solve with formula matching & KEGG structure enumeration HMDB C6H12O6 (71 hits) 8590 unique formulas Cheminformatics view of non-targeted testing problem

Non-Targeted Screening Too many “hit possibilities

What’s in the sample?

Prioritization

 Enriched Targeted Screening chemical , Should I be worried? standards & tools

Number Number of chemicals Suspect Screening Too little “hit” coverage DSSTox_v1 (thru 3/2014) http://www.epa.gov/ncct/dsstox/  Original target audience: Structure-Activity Relationship (SAR) toxicity modeling community  Focus on EPA, HPV, environmental toxicity datasets  Emphasis on accurate CAS-name-structure annotations at substance level  Public resource for high-quality structure-data files

Approx. 25K CAS-substances, 16K structures 6 DSSTox Update

DSSTox_v1 DSSTox_v2  Convert DSSTox tables to MySQL  Develop curation interface  Implement cheminformatics workflow  Expand chemical content  Register ACToR data inventories  Web-services & Dashboard access

 Website to be retired 9/30/2015  DSSTox_v1 files & documentation will remain available on EPA ftp site  Original 25K substance records stored in MS ACCESS and Excel tables serve as the initial Level 1,2 input to DSSTox_v2

7 DSSTox_v2 CAS-Structure Sources: QC levels

DSSTox_v1 EPA SRS NLM ChemID EPA ACToR ~900K records ~100K records ~400K records ~500K records ~25K records MLSCN HTS TSCA, EPA ToxNet:HSDB, data, large, incl. EnvironTox focus Curated Regulatory Docs EPA, FDA ChemID, Quality issues, EnvironTox focus Valid CAS names Semi-curated ACToR public structures Too small No Structures Quality, Stereo Quality issues

DSSTox_v2

1. High

ChemID

w/ Curated 2. Low conflicts no no conflicts 3. High w/PubChem sole source, 4. Med

no conflicts Public sole source, no conflicts 5. Low conflicts 6. Untrusted conflicts 7. Incomplete DSSTox_v2 CAS-Structure Sources: QC levels

DSSTox_v1 EPA SRS NLM ChemID EPA ACToR ~500KData CAS Structure ~900K records records 1 1 ~100K1 records ~400K records ~500K records ~20K records~25K records MLSCN HTS Data2 CAS2 StructureTSCA,1 EPA ToxNet:HSDB, data, large, incl. Limitation: EnvironTox focus Limitation:Curated Regulatory Docs EPA, FDA ChemID, CAS-Name- DataQuality3 issues, CASTooEnvironTox Small;3 focusStructureValid1 CAS names Semi-curated ACToR Structure public structures AccessibilityToo small No Structures Quality, Stereo Quality issues Error Data4 CAS4 Structure1

DSSTox_v2

1. High

ChemID

w/ Curated 2. Low conflicts no no conflicts ? Data1 ? CAS1 Structure1 3. High w/PubChem sole source, 4. Med Data2 ? CAS2 ? Structure1 no conflicts Public sole source, no conflicts 5. Low conflicts Data CAS ? Structure 6. Untrusted 3 ? 3 1 conflicts 7. Incomplete Data4 ? CAS4 ? Structure1 DSSTox_v2 Construction

Data source load order: 1) DSSTox_v1 (~22K)  1:1 CAS-structure mappings  Assign NOCAS_GSID  Related CAS & structure mappings (e.g., NOCAS, mixtures) 2) EPA SRS (~77K)  systematic name structure conversion  internal CAS-structure conflicts (12.5%)  ChemID conflicts (24% of 30K overlaps)  DSSTox conflicts (8% of 6200 overlaps)  queue for curation 3) ChemID (~77K)  internal CAS-structure conflicts (4.5%)  PubChem conflicts (45% of 225K overlaps) … OUCH!!  DSSTox conflicts (11% of 2300 overlaps)  queue for curation 4) And so on …

10 Example problem

11 CAS-Structure “Sphere of Confusion”

Monoisotopic Mass CAS2 ?

CAS1 ? Formula Data1 • Deleted CAS Data2 • Invalid CAS Parent structure Data3 • Salt forms (no stereo, desalted) Data4 • Complex forms Data5 • Hydrate forms Data6 • Approx mappings to mixtures many:1 Valid CAS-substance? Data7 • Approx mappings to ill- Data8 defined substances  Resolve CAS-structure mappings Data9 • Stereoisomers CAS5 ? • Unresolved tautomers for accurate chemical-data CAS ? NOCAS? 4 mapping, i.e. what was tested? CAS3 ?  Collapse sphere to bring all DSSTox_v2 Database related data to NTS parent & Cheminformatics Layer structure-formula level DSSTox_v2 Totals

QC Level Totals (12Jun2015)

DSSTox_v2 DSSTox_High 4535  154.5K substances in top 4 1. High QC bins exceed public DSSTox_Low 16K

Curated 2. Low accuracy standards Public_High 33K 3. High  ~15K w/o structures Public_Medium 101K 4. Med

Public 5. Low Public_Low 584K 6. Untrusted Public_Untrusted ~ 310K pending 7. Incomplete ~ 150K pending

validated KNIME structure-“cleaning” workflow

https://www.knime.org/knime Objectives:  Combine community approaches to structure processing  Develop a flexible workflow to be used by EPA and shared publicly  Process DSSTox files to create “QSAR-ready” structures

 Parse SDF, remove fragments  Explicit hydrogen removed  Dearomatization  Removal of chirality info, isotopes and pseudo-  Aromatization + add explicit hydrogen atoms  Standardize Nitro groups  Other tautomerize/mesomerization Publicly available cheminformatics  Neutralize (when possible) toolkits in KNIME:

Indigo Slide courtesy of K. Mansouri

14 DSSTox Viewer (EPA Intranet)

http://rtpavaki1.epa.gov:8080/DSSToxViewer/

 35% DSSTox_High have deleted CAS  >20K total deleted CAS

validated DSSTox Batch Tool (EPA Intranet)

http://rtpavaki1.epa.gov:8080/DSSToxViewer/ CSS Chemical Explorer Dashboard (powered by DSSTox & ACToR)

MOA & Knowledge- Structure “cleaning” ToxQP: QSAR models; informed features & SAR-ready structure files phys-chem properties chemotypes Structure-based predictions Chemical Similarity searching Structures Chemical search by Fingerprints, feature sets CAS-names Phys-chem calc & CSS measured properties SMILES Dashboards Structures CASRN & Chemical Webservices, link- Names Structure file Chemical EPA SRS outs to EPA Inventories & downloads (TSCA, EcoTox) databases & webtools Data Sources ToxCast/Tox21 (e.g.,WebICE, PK/ADME model 1:1 mapping EnvironFateSimulator, CAS:Name:structure (ACTOR) ToxRef inputs & outputs EDSP, etc.) EDSP21 Quality scores CAS look-up CPCAT ToxCast/Tox21 New list error-checking >700K substances HTS activities

17 EPA’s ToxCast/Tox21 Projects

• Build a diverse, highly prioritized chemical library of interest to EPA regulatory programs (e.g., EDSP) and of relevance to environmental toxicology • Use high-throughput screening (HTS) to generate bioassay profiles and fill data gaps for thousands of chemicals • Use all of these data to improve ability to model adverse outcomes

ToxCast Chemicals Tox21

ToxRef HTS-In vitro

18 ToxCast & Tox21 Inventories

Set Chemicals Assays Endpoints Completion Available ToxCast Phase I 293 ~600 ~700 2011 Now ToxCast Phase II 767 ~600 ~700 03/2013 Now ToxCast E1K 800 ~50 ~120 03/2013 Now Tox21 ~8900 ~80 ~150 Ongoing Ongoing ToxCast Phase III ~2000 ~300 ~300 In process 2014-2015

Pesticides , antimicrobials, food additives, green alternatives, HPV, MPV, >600 endocrine reference cmpds, tox reference cmpds, NTP in vivo, FDA GRAS, FDA PAFA, EDSP, water contaminants, exposure data, industrial, failed drugs,

marketed drugs, fragrances, flame retardants, … Assays

0 Chemicals >9000 Construction of EPA’s Tox21/ToxCast Inventories

EPA ACToR EPA DSSTox EPA Program Offices OECD, EU Stakeholder Nominations

EPA’s Tox21/ToxCast Phase II Chemical Nominations (>100 lists)

Candidates for 38% Complex mixtures, polymers procurement Ill-defined substances No structure available 37% Insoluble (est. LogP) Able to procure Volatile (est. Vapor Pressure) Unable to procure Too reactive, explosive or cost prohibitive Inorganics, radioactive, etc.

DMSO insoluble EPA Tox21 15% Volatile

ph1v2 ph2 ph1_v2, E1K reference chemicals E1K Donated chemicals (incl. 135 failed drugs) 1860 3726 ~4400 ~7K ~19K NUMBER OF CHEMICALS

20 ToxCast Chemical Coverage: Use, Exposure, Toxicity

15 # lists /chemical 10 5

Consumer Use Colorant Fragrance Personal Care Inert FDA EAFUS FDA GRAS EDSP21 IRISTR Antimicrobial Pesticide Pharmaceutical FDAMDD drugs EPA_IUR NHANES Chem Industrial HPVCSI COSMOS NTPBSI CPDBAS TOXREF

Phase I: Phase II (ph2,E1K): Phase III (ph3): Pesticides In vivo-rich; donated failed drugs; Toxicity reference chemicals; EDSP21; In vivo-rich EDSP21; high diversity, coverage extend coverage of industrial chemicals of toxicity space ToxCast: Toxicity Structure-Alerts & Generic Feature Coverage

 High incidence of predicted toxicity alerting features

 High degree of structural & feature diversity

Ashby Alerts(15) Ashby

TTC Category Alerts(37) TTC Category Profiling Features (38)

ToxCast Chemicals (4214) ordered by Testing Phase 22 Tox21 Analytical QC

A copy of each parent Tox21 384 well plate is subjected to analytical QC for assessing purity, identity, concentration, stability

LC-MS Fail, inconclusive or analytical method inappropriate Confirm parent ion peak and >90% purity

GC-MS Retest at 4 mo. time point under conditions for stability

Work performed under NIH Contract with Publish QC summary results in OpAns, Durham, NC

association with assay data 23 Tox21 and ToxCast Chemical Library Analytical QC Results (8/2015)

Tox21_QC_Sum-GSID (8593 total) • 50% pass purity/ID/concentration checks • A third(34%) of library pose analytical QC Inconclusives challenges (LC-MS and GC-MS) Pass 34% (50%) • 2% degrade after 4 months under testing Problems conditions Degrade 14% 2% • 14% problems - purity (<75%), ID and/or low concentration (<30% of expected [C])

 Which chemicals have QC issues? (e.g. SVOCs?)  Which chemicals were not analyzed? (e.g., mixtures, inorganics, etc.)  How are the HTS activity profiles linked to QC?

Results as of 8/2015 Tox21 Analytical Chemical QC: Publicly available in PubChem Chemical “Universe” problem

Biodegradation products Metabolome Protein & DNA fragments Virtual screening libraries The other Combinatorial chemistry 98% Scientific literature Polymer fragments Toxicology studies Polyorganic acids Environment/Industry Adducts, surfactants Commercially available Exposure? Toxic?

 Where should DSSTox expand chemically?  What part of the universe should we store in databases? DSSTox_v2  How can the valuable ToxCast physical CAS-structures library be shared for greatest gain? CAS-no structures  What cheminformatics “plumbing” would be most useful to this community? Building EPA’s Chemical Informatics Infrastructure & Linkages Phys-chem properties Measured Exposure Models Properties Data Reactivity Biotransformation Fate & Transport ADME Toxicity Predictions Chemicals DSSTox_v2 Structure-alerts

water upholstery NTS clothes Biological air toys activities dust carpet Biological ToxCast Chemicals Data Tox21 ToxRef ACToR COSMOS Inventories CPCat, Use HTS-In vitro Exposure Acknowledgements:

Chris Grulke - Lockheed Martin Contractor to EPA DSSTox lead developer/ programmer Indira Thillainadarajah - EPA SEEP DSSTox lead curator Kamel Mansouri - ORISE Post Doctoral Fellow KNIME workflow Jayaram Kancherla - ORISE Pre-Doctoral Fellow Chemical dashboard, web-tools development Richard Judson - NCCT ACToR Project Lead Antony Williams - NCCT Cheminformatics Lead Mark Strynar, Jon Sobus, Julia Rager, John Wambaugh - EPA

This work was reviewed by EPA and approved for publication but does not necessarily reflect official Agency policy.