
Advancing Non-Targeted Analysis of Xenobiotics in Environmental & Biological Media

EPA’s DSSTox Chemical Database: A Resource for the Non-Targeted Testing Community
Ann Richard
National Center for Computational Toxicology, Office of Research & Development, US EPA
August 18-19, 2015, Research Triangle Park, NC

Outline
• Cheminformatics view of problem
• DSSTox chemical database
• ToxCast chemical library
• Tox21 analytical QC
• Challenges

Cheminformatics view of non-targeted testing problem
What’s in the sample? Should I be worried?
[Figure: sample media (water, upholstery, clothes, air, blood, serum, toys, dust, carpet) drawn from the “chemical universe”, with the questions “Exposure?” and “Toxic?”. Thousands of mass ion peaks & abundances lead from MI mass {mono-isotopic mass} 1:many to {formulas} and 1:many to {parent structures}. The chemical substance database (Inventory/Data) links {CAS-name} 1:1 to {structure}. Suspect testing relies on physical standards.]

How big is the problem?
http://www.chemspider.com/blog/ (22 million ChemSpider IDs in 2008)
• Highest frequency formula in ChemSpider? C18H20N2O3 occurs 5110 times
• Theoretically… C11H12O3 has 4,703,963,455 possible isomers!!!
• Problem too big to solve with formula matching & structure enumeration
[Figure labels: ChemSpider 22 M; KEGG; HMDB; C6H12O6 (71 hits); 8590 unique formulas]

Cheminformatics view of non-targeted testing problem
What’s in the sample? Should I be worried?
• Non-Targeted Screening: too many “hit” possibilities
• Targeted Screening
• Suspect Screening: too little “hit” coverage
• Prioritization
• Enriched chemical databases, standards & tools
[Figure axis: Number of chemicals]

DSSTox_v1 (thru 3/2014)
http://www.epa.gov/ncct/dsstox/
• Original target audience: Structure-Activity Relationship (SAR) toxicity modeling community
• Focus on EPA, HPV, environmental toxicity datasets
• Emphasis on accurate CAS-name-structure annotations at substance level
• Public resource for high-quality structure-data files
• Approx. 25K CAS-substances, 16K structures

DSSTox Update
DSSTox_v1 to DSSTox_v2:
• Convert DSSTox tables to MySQL
• Develop curation interface
• Implement cheminformatics workflow
• Expand chemical content
• Register ACToR data inventories
• Web-services & Dashboard access
Website to be retired 9/30/2015. DSSTox_v1 files & documentation will remain available on EPA ftp site.
Original 25K substance records stored in MS ACCESS and Excel tables serve as the initial Level 1,2 input to DSSTox_v2.

DSSTox_v2 CAS-Structure Sources: QC levels
[Figure: source databases feeding DSSTox_v2. Sources named: DSSTox_v1, EPA SRS, NLM ChemID, EPA ACToR. Record counts shown: ~900K, ~100K, ~400K, ~500K, ~25K. Source notes shown: MLSCN HTS data, large, incl. ChemID, ACToR public structures; TSCA, EPA Regulatory Docs; ToxNet:HSDB, EPA, FDA; EnvironTox focus; Curated; Semi-curated; Valid CAS names; No Structures; Too small; Quality issues; Quality, Stereo.]
DSSTox_v2 QC levels:
• Curated: 1. High, 2. Low
• Public: 3. High, 4. Med, 5. Low
• 6. Untrusted
• 7. Incomplete
Level notes shown: ChemID w/ Curated, no conflicts; w/PubChem, no conflicts; sole source, no conflicts; conflicts.
[Build overlay on the same slide: limitations labeled “Too Small; Accessibility” and “CAS-Name-Structure Error”; an additional “~20K records” label; a diagram in which Data1 to Data4 and CAS1 to CAS4 all map, through links marked “?”, to a single Structure1.]

DSSTox_v2 Construction
Data source load order:
1) DSSTox_v1 (~22K)
   • 1:1 CAS-structure mappings
   • Assign NOCAS_GSID
   • Related CAS & structure mappings (e.g., NOCAS, mixtures)
2) EPA SRS (~77K)
   • systematic name to structure conversion
   • internal CAS-structure conflicts (12.5%)
   • ChemID conflicts (24% of 30K overlaps)
   • DSSTox conflicts (8% of 6200 overlaps)
   • queue for curation
3) ChemID (~77K)
   • internal CAS-structure conflicts (4.5%)
   • PubChem conflicts (45% of 225K overlaps) … OUCH!!
   • DSSTox conflicts (11% of 2300 overlaps)
   • queue for curation
4) And so on …

Example problem

CAS-Structure “Sphere of Confusion”
Valid CAS-substance?
• Deleted CAS
• Invalid CAS
• Salt forms
• Complex forms
• Hydrate forms
• Approx mappings to mixtures
• Approx mappings to ill-defined substances
• Stereoisomers
• Unresolved tautomers
Resolve CAS-structure mappings for accurate chemical-data mapping, i.e. what was tested?
Collapse sphere to bring all related data to NTS parent structure-formula level.
[Figure: Data1 to Data9 attached to CAS1 to CAS5 (each marked “?”) and NOCAS?, mapping many:1 onto Parent structure (no stereo, desalted), Formula, and Monoisotopic Mass; labeled “DSSTox_v2 Database & Cheminformatics Layer”.]

DSSTox_v2 Totals
QC Level Totals (12Jun2015)

QC level               Group      Total
1. DSSTox_High         Curated    4535
2. DSSTox_Low          Curated    16K
3. Public_High         Public     33K
4. Public_Medium       Public     101K
5. Public_Low          Public     584K
6. Public_Untrusted               ~310K pending
7. Incomplete                     ~150K pending

• 154.5K substances in top 4 QC bins exceed public accuracy standards
• ~15K w/o structures
[Slide label: validated]

KNIME structure-“cleaning” workflow
https://www.knime.org/knime
Objectives:
• Combine community approaches to structure processing
• Develop a flexible workflow to be used by EPA and shared publicly
• Process DSSTox files to create “QSAR-ready” structures
Workflow steps:
• Parse SDF, remove fragments
• Explicit hydrogen removed
• Dearomatization
• Removal of chirality info, isotopes and pseudo-atoms
• Aromatization + add explicit hydrogen atoms
• Standardize Nitro groups
• Other tautomerize/mesomerization
• Neutralize (when possible)
Publicly available cheminformatics toolkits in KNIME: Indigo
Slide courtesy of K. Mansouri

DSSTox Viewer (EPA Intranet)
http://rtpavaki1.epa.gov:8080/DSSToxViewer/
• 35% DSSTox_High have deleted CAS
• >20K total deleted CAS validated

DSSTox Batch Tool (EPA Intranet)
http://rtpavaki1.epa.gov:8080/DSSToxViewer/

CSS Chemical Explorer Dashboard (powered by DSSTox & ACToR)
[Figure: dashboard components]
• MOA & Knowledge-informed features & chemotypes
• Structure “cleaning”: SAR-ready structure files
• ToxQP: QSAR models; phys-chem properties
• Structure-based predictions
• Chemical Similarity searching: Fingerprints, feature sets
• Chemical search by Structures, CAS-names, SMILES
• Phys-chem calc & measured properties
• CSS Dashboards
• Structures, CASRN & Names
• Chemical Structure file downloads
• Webservices, link-outs to EPA databases & webtools (e.g., WebICE, EnvironFateSimulator, EDSP, etc.)
• EPA SRS Chemical Inventories (TSCA, EcoTox)
• Data Sources: ToxCast/Tox21 (ACTOR), ToxRef, EDSP21, CPCAT
• PK/ADME model inputs & outputs
• 1:1 mapping CAS:Name:structure; Quality scores; CAS look-up; New list error-checking
• >700K substances
• ToxCast/Tox21 HTS activities

EPA’s ToxCast/Tox21 Projects
• Build a diverse, highly prioritized chemical library of interest to EPA regulatory programs (e.g., EDSP) and of relevance to environmental toxicology
• Use high-throughput screening (HTS) to generate bioassay profiles and fill data gaps for thousands of chemicals
• Use all of these data to improve ability to model adverse outcomes
[Figure labels: ToxCast Chemicals; Tox21; ToxRef; HTS-In vitro]

ToxCast & Tox21 Inventories

Set                  Chemicals   Assays   Endpoints   Completion   Available
ToxCast Phase I      293         ~600     ~700        2011         Now
ToxCast Phase II     767         ~600     ~700        03/2013      Now
ToxCast E1K          800         ~50      ~120        03/2013      Now
Tox21                ~8900       ~80      ~150        Ongoing      Ongoing
ToxCast Phase III    ~2000       ~300     ~300        In process   2014-2015

Pesticides, antimicrobials, food additives, green alternatives, HPV, MPV, >600 endocrine reference cmpds, tox reference cmpds, NTP in vivo, FDA GRAS, FDA PAFA, EDSP, water contaminants, exposure data, industrial, failed drugs, marketed drugs, fragrances, flame retardants, …
[Figure: assays vs. chemicals, with the chemicals axis running from 0 to >9000]

Construction of EPA’s Tox21/ToxCast Inventories
Nomination sources: EPA ACToR, EPA DSSTox, EPA Program Offices, OECD, EU, Stakeholder Nominations
EPA’s Tox21/ToxCast Phase II Chemical Nominations (>100 lists)
[Figure: procurement funnel. Axis: NUMBER OF CHEMICALS, with values 1860, 3726, ~4400, ~7K, ~19K. Stage labels: Candidates for procurement; Able to procure; Unable to procure; EPA Tox21; ph1_v2, ph2, E1K; reference chemicals; donated chemicals (incl. 135 failed drugs). Percentages shown: 38%, 37%, 15%. Exclusion reasons shown: complex mixtures, polymers; ill-defined substances; no structure available; insoluble (est. LogP); volatile (est. Vapor Pressure); too reactive, explosive or cost prohibitive; inorganics, radioactive, etc.; DMSO insoluble; volatile.]

ToxCast Chemical Coverage: Use, Exposure, Toxicity
[Figure: # lists/chemical (axis ticks 5, 10, 15) across the categories Consumer Use, Colorant, Fragrance, Personal Care, Inert, FDA EAFUS, FDA GRAS, EDSP21, IRISTR, Antimicrobial, Pesticide, Pharmaceutical, FDAMDD drugs, EPA_IUR, NHANES, Chem Industrial, HPVCSI, COSMOS, NTPBSI, CPDBAS, TOXREF]
• Phase I: Pesticides; In vivo-rich
• Phase II (ph2, E1K): In vivo-rich; donated failed drugs; EDSP21; high diversity, coverage of toxicity space
• Phase III (ph3): Toxicity reference chemicals; EDSP21; extend coverage of industrial chemicals

ToxCast: Toxicity Structure-Alerts & Generic Feature Coverage
• High incidence of predicted toxicity alerting features
• High degree of structural & feature diversity
[Figure: Ashby Alerts (15), TTC Category Alerts (37), and ProfilingFeatures (38) plotted against ToxCast Chemicals (4214) ordered by Testing Phase]
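
The mono-isotopic mass to formula step in the “Cheminformatics view” slides can be illustrated with a minimal sketch. It assumes the observed value is already a neutral mono-isotopic mass (no adduct or charge correction), that formulas are flat strings without parentheses, and that a relative tolerance in ppm is an acceptable match criterion. The function names, the 5 ppm default, and the three-entry suspect list (formulas quoted on the slides) are illustrative assumptions, not part of DSSTox.

```python
import re

# Mono-isotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737615,
    "S": 31.9720707,
    "Cl": 34.9688527,
}

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Parse a flat formula such as 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    consumed = 0
    for match in _TOKEN.finditer(formula):
        element, number = match.groups()
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
        consumed += len(match.group(0))
    # Reject anything the simple tokenizer could not fully consume
    # (parentheses, charges, hydrates, empty strings).
    if consumed != len(formula) or not counts:
        raise ValueError(f"unsupported formula: {formula!r}")
    return counts


def monoisotopic_mass(formula):
    """Sum the mono-isotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


def match_suspects(observed_mass, suspects, ppm=5.0):
    """Return the suspect formulas whose mass lies within +/- ppm of observed_mass."""
    hits = []
    for formula in suspects:
        mass = monoisotopic_mass(formula)
        if abs(mass - observed_mass) / mass * 1e6 <= ppm:
            hits.append(formula)
    return hits


# Hypothetical suspect list built from formulas quoted on the slides.
SUSPECTS = ["C6H12O6", "C11H12O3", "C18H20N2O3"]

if __name__ == "__main__":
    print(match_suspects(180.0634, SUSPECTS))
```

A formula hit is only the first step: as the slides note, one formula can correspond to many structures (C6H12O6 alone returns 71 hits in one of the databases shown), so a match like this narrows the candidates rather than identifying the chemical.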
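
The “Invalid CAS” entry in the “Sphere of Confusion” slide and the “New list error-checking” function of the dashboard suggest a format and check-digit test as a first filter on incoming CAS numbers. The sketch below uses the standard CAS Registry Number check-digit rule (each digit before the check digit is weighted by its position counted from the right, and the weighted sum modulo 10 must equal the check digit). The function names are illustrative and are not taken from any DSSTox tool.

```python
import re

# CAS RN layout: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit(body_digits):
    """Compute the check digit for the digits that precede it.

    Digits are weighted 1, 2, 3, ... starting from the rightmost one.
    """
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(body_digits), start=1)
    )
    return total % 10


def is_valid_cas(casrn):
    """Return True if casrn has the CAS RN layout and a correct check digit."""
    match = CAS_PATTERN.match(casrn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    return cas_check_digit(body) == int(match.group(3))


if __name__ == "__main__":
    for candidate in ["7732-18-5", "80-05-7", "80-05-8", "not-a-cas"]:
        print(candidate, is_valid_cas(candidate))
```

Passing this test only shows that a number is well formed. It cannot detect a deleted CAS, a CAS assigned to a different salt or hydrate form, or a wrong CAS-structure pairing, which are the cases the curation levels in the slides are meant to catch.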