EPA's DSSTox Chemical Database

Advancing Non-Targeted Analysis of Xenobiotics in Environmental & Biological Media

EPA’s DSSTox Chemical Database: A Resource for the Non-Targeted Testing Community

Ann Richard
National Center for Computational Toxicology
Office of Research & Development, US EPA
August 18-19, 2015, Research Triangle Park, NC

Outline

  • Cheminformatics view of problem
  • DSSTox chemical database
  • ToxCast chemical library
  • Tox21 analytical QC
  • Challenges

Cheminformatics view of non-targeted testing problem

[Diagram: samples from the chemical universe are analyzed and matched against a chemical substance database]
  • Sample media: water, upholstery, clothes, air, blood, serum, toys, dust, carpet
  • Questions: Exposure? Toxic? What’s in the sample? Should I be worried?
  • Inputs: thousands of mass ion peaks & abundances; suspect testing, physical standards
  • Chemical substance database (Inventory/Data): MI mass {mono-isotopic mass} 1:many {formulas} 1:many {parent structures}; {structure} 1:1 {CAS-name}

How big is the problem?

http://www.chemspider.com/blog/ (22 million ChemSpider IDs in 2008)

  • Highest frequency formula in ChemSpider? C18H20N2O3 occurs 5110 times
  • Theoretically… C11H12O3 has 4,703,963,455 possible isomers!!!
  • Databases shown: ChemSpider (22 M), KEGG, HMDB
  • C6H12O6 (71 hits)
  • 8590 unique formulas
  • Problem too big to solve with formula matching & structure enumeration

Cheminformatics view of non-targeted testing problem

[Diagram: screening approaches arranged along a "Number of chemicals" axis]
  • Approaches: Non-Targeted Screening, Suspect Screening, Targeted Screening
  • Annotations: Too many “hit” possibilities; Too little “hit” coverage; Prioritization; Enriched chemical databases, standards & tools
  • Questions: What’s in the sample? Should I be worried?

DSSTox_v1 (thru 3/2014)

http://www.epa.gov/ncct/dsstox/

  • Original target audience: Structure-Activity Relationship (SAR) toxicity modeling community
  • Focus on EPA, HPV, environmental toxicity datasets
  • Emphasis on accurate CAS-name-structure annotations at substance level
  • Public resource for high-quality structure-data files
  • Approx. 25K CAS-substances, 16K structures

DSSTox Update: DSSTox_v1 to DSSTox_v2

  • Convert DSSTox tables to MySQL
  • Develop curation interface
  • Implement cheminformatics workflow
  • Expand chemical content
  • Register ACToR data inventories
  • Web-services & Dashboard access

Website to be retired 9/30/2015. DSSTox_v1 files & documentation will remain available on EPA ftp site. Original 25K substance records stored in MS ACCESS and Excel tables serve as the initial Level 1,2 input to DSSTox_v2.

DSSTox_v2 CAS-Structure Sources: QC levels

[Diagram: CAS-structure sources feeding the DSSTox_v2 QC levels; the alignment between sources, record counts, and descriptions was lost in extraction]
  • Sources named: DSSTox_v1, EPA SRS, NLM ChemID, EPA ACToR
  • Record counts shown: ~900K, ~100K, ~400K, ~500K, ~25K
  • Source descriptions: MLSCN HTS data, large, incl. ChemID, ACToR public structures; TSCA, EPA Regulatory Docs; ToxNet:HSDB, EPA, FDA; EnvironTox focus; Curated; Semi-curated; Valid CAS names
  • Limitations: Too small; No structures; Quality, stereo issues; Quality issues; CAS-Name-Structure error; Accessibility
  • DSSTox_v2 QC levels: Curated (1. High, 2. Low); Public (3. High, 4. Med, 5. Low, 6. Untrusted); 7. Incomplete
  • Level criteria fragments: “ChemID w/”, “w/PubChem”, “no conflicts”, “sole source, no conflicts”, “conflicts”
  • Build overlay: Data1 to Data4, labeled CAS1 to CAS4, each linked with question marks to the same Structure1

DSSTox_v2 Construction

Data source load order:

  1) DSSTox_v1 (~22K)
       • 1:1 CAS-structure mappings
       • Assign NOCAS_GSID
       • Related CAS & structure mappings (e.g., NOCAS, mixtures)
  2) EPA SRS (~77K)
       • Systematic name to structure conversion
       • Internal CAS-structure conflicts (12.5%)
       • ChemID conflicts (24% of 30K overlaps)
       • DSSTox conflicts (8% of 6200 overlaps)
       • Queue for curation
  3) ChemID (~77K)
       • Internal CAS-structure conflicts (4.5%)
       • PubChem conflicts (45% of 225K overlaps) … OUCH!!
       • DSSTox conflicts (11% of 2300 overlaps)
       • Queue for curation
  4) And so on …

Example problem

[Figure: content not recoverable from the extracted text]

CAS-Structure “Sphere of Confusion”

Sources of confusion:

  • Deleted CAS
  • Invalid CAS
  • Salt forms
  • Complex forms
  • Hydrate forms
  • Approx mappings to mixtures
  • Approx mappings to ill-defined substances
  • Stereoisomers
  • Unresolved tautomers

[Diagram: Data1 to Data9, tagged with CAS1 to CAS5 or NOCAS (each marked “?”), map many:1 to a valid CAS-substance; levels shown are Monoisotopic Mass, Formula, and Parent structure (no stereo, desalted); the DSSTox_v2 Database & Cheminformatics Layer sits underneath]

  • Resolve CAS-structure mappings for accurate chemical-data mapping, i.e. what was tested?
  • Collapse sphere to bring all related data to NTS parent structure-formula level

DSSTox_v2 Totals

QC Level Totals (12Jun2015)

  Group     QC level       Bin                 Total
  Curated   1. High        DSSTox_High         4535
  Curated   2. Low         DSSTox_Low          16K
  Public    3. High        Public_High         33K
  Public    4. Med         Public_Medium       101K
  Public    5. Low         Public_Low          584K
  Public    6. Untrusted   Public_Untrusted    ~310K pending
            7. Incomplete                      ~150K pending

  • 154.5K substances in top 4 QC bins exceed public accuracy standards
  • ~15K w/o structures

KNIME structure-“cleaning” workflow

https://www.knime.org/knime

Objectives:

  • Combine community approaches to structure processing
  • Develop a flexible workflow to be used by EPA and shared publicly
  • Process DSSTox files to create “QSAR-ready” structures

Workflow steps:

  • Parse SDF, remove fragments
  • Explicit hydrogen removed
  • Dearomatization
  • Removal of chirality info, isotopes and pseudo-atoms
  • Aromatization + add explicit hydrogen atoms
  • Standardize Nitro groups
  • Other tautomerize/mesomerization
  • Neutralize (when possible)

Publicly available cheminformatics toolkits in KNIME: Indigo

Slide courtesy of K. Mansouri

DSSTox Viewer (EPA Intranet)

http://rtpavaki1.epa.gov:8080/DSSToxViewer/

  • 35% DSSTox_High have deleted CAS
  • >20K total deleted CAS
  • Label shown: validated

DSSTox Batch Tool (EPA Intranet)

http://rtpavaki1.epa.gov:8080/DSSToxViewer/

CSS Chemical Explorer Dashboard (powered by DSSTox & ACToR)

[Diagram: labels recovered from the dashboard architecture figure]
  • MOA & knowledge-informed features & chemotypes
  • Structure “cleaning” & SAR-ready structure files
  • ToxQP: QSAR models; phys-chem properties
  • Structure-based predictions
  • Chemical similarity searching: fingerprints, feature sets
  • Chemical search by CAS-names, SMILES, structures
  • Phys-chem calc & measured properties
  • CSS Dashboards
  • Webservices, link-outs to EPA databases & webtools (e.g., WebICE, EnvironFateSimulator, EDSP, etc.)
  • Structure file downloads
  • PK/ADME model inputs & outputs
  • Chemical inventories & data sources: EPA SRS (TSCA, EcoTox), ToxCast/Tox21, ACToR, ToxRef, EDSP21, CPCAT
  • Structures, CASRN & names: 1:1 mapping CAS:Name:structure; quality scores; CAS look-up; new list error-checking; >700K substances
  • ToxCast/Tox21 HTS activities

EPA’s ToxCast/Tox21 Projects

  • Build a diverse, highly prioritized chemical library of interest to EPA regulatory programs (e.g., EDSP) and of relevance to environmental toxicology
  • Use high-throughput screening (HTS) to generate bioassay profiles and fill data gaps for thousands of chemicals
  • Use all of these data to improve ability to model adverse outcomes

[Diagram labels: ToxCast Chemicals, Tox21, ToxRef, HTS-In vitro]

ToxCast & Tox21 Inventories

  Set                 Chemicals   Assays   Endpoints   Completion   Available
  ToxCast Phase I     293         ~600     ~700        2011         Now
  ToxCast Phase II    767         ~600     ~700        03/2013      Now
  ToxCast E1K         800         ~50      ~120        03/2013      Now
  Tox21               ~8900       ~80      ~150        Ongoing      Ongoing
  ToxCast Phase III   ~2000       ~300     ~300        In process   2014-2015

Pesticides, antimicrobials, food additives, green alternatives, HPV, MPV, >600 endocrine reference cmpds, tox reference cmpds, NTP in vivo, FDA GRAS, FDA PAFA, EDSP, water contaminants, exposure data, industrial, failed drugs, marketed drugs, fragrances, flame retardants, …

[Chart axes: Assays; Chemicals, 0 to >9000]

Construction of EPA’s Tox21/ToxCast Inventories

[Diagram: chemical nominations narrowed step by step along a "NUMBER OF CHEMICALS" axis; the pairing of counts and percentages with stages was lost in extraction]
  • Nomination sources: EPA ACToR, EPA DSSTox, EPA Program Offices, OECD, EU, Stakeholder Nominations
  • Stages: Chemical Nominations (>100 lists); Candidates for procurement; Able to procure / Unable to procure; EPA Tox21; EPA’s Tox21/ToxCast Phase II
  • Exclusion reasons: Complex mixtures, polymers; Ill-defined substances; No structure available; Insoluble (est. LogP); Volatile (est. Vapor Pressure); Too reactive, explosive or cost prohibitive; Inorganics, radioactive, etc.; DMSO insoluble; Volatile
  • Percentages shown: 38%, 37%, 15%
  • Subsets: ph1_v2, ph2, E1K; reference chemicals; donated chemicals (incl. 135 failed drugs)
  • Counts on the axis: 1860, 3726, ~4400, ~7K, ~19K

ToxCast Chemical Coverage: Use, Exposure, Toxicity

[Chart: # lists / chemical (axis ticks 5, 10, 15) across the categories Consumer Use, Colorant, Fragrance, Personal Care, Inert, FDA EAFUS, FDA GRAS, EDSP21, IRISTR, Antimicrobial, Pesticide, Pharmaceutical, FDAMDD drugs, EPA_IUR, NHANES Chem Industrial, HPVCSI, COSMOS, NTPBSI, CPDBAS, TOXREF]

  • Phase I: Pesticides; in vivo-rich
  • Phase II (ph2, E1K): In vivo-rich; donated failed drugs; EDSP21; high diversity, coverage of toxicity space
  • Phase III (ph3): Toxicity reference chemicals; EDSP21; extend coverage of industrial chemicals

ToxCast: Toxicity Structure-Alerts & Generic Feature Coverage

  • High incidence of predicted toxicity alerting features
  • High degree of structural & feature diversity

[Chart: Ashby Alerts (15), TTC Category Alerts (37), and Profiling Features (38) plotted against ToxCast Chemicals (4214) ordered by Testing Phase]
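
The "mono-isotopic mass 1:many formulas" step from the cheminformatics-view slides can be illustrated with a minimal Python sketch. The function names, the 5 ppm tolerance, and the three-entry inventory are assumptions made for illustration (the formulas themselves are the ones quoted in the slides); none of this is DSSTox code.

```python
import re

# Monoisotopic masses in daltons for the most abundant isotope of each element.
# Only the elements needed for the formulas quoted in the slides are included.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C6H12O6'."""
    counts = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(count) if count else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


def match_formulas(observed_mass, formulas, ppm=5.0):
    """Return the formulas whose mass lies within +/- ppm of the observed neutral mass."""
    hits = []
    for formula in formulas:
        mass = monoisotopic_mass(formula)
        if abs(mass - observed_mass) / mass * 1e6 <= ppm:
            hits.append(formula)
    return hits


# Hypothetical three-entry inventory built from formulas named in the slides.
inventory = ["C6H12O6", "C11H12O3", "C18H20N2O3"]
print(round(monoisotopic_mass("C6H12O6"), 4))
print(match_formulas(180.0634, inventory))
```

A mass match only narrows the search to a formula. As the slides stress, one formula can still correspond to many structures, so a real workflow would need a curated structure inventory behind each hit.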
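
The slides list "Invalid CAS" as one source of confusion and "new list error-checking" as a dashboard function. One check that can be done without any database is the CAS Registry Number check digit: the last digit must equal the weighted sum of the preceding digits (weights 1, 2, 3, … counted from the right) modulo 10. The sketch below assumes nothing about how DSSTox itself performs this check, and the function name is hypothetical.

```python
import re


def is_valid_casrn(casrn):
    """Check the format and check digit of a CAS Registry Number string."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight each digit by its position counted from the right, starting at 1.
    total = sum(int(d) * weight for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


for casrn in ["50-00-0", "7732-18-5", "7732-18-4", "not-a-cas"]:
    print(casrn, is_valid_casrn(casrn))
```

A passing checksum only shows that the number is well formed. It says nothing about whether the CAS has been deleted or whether it is mapped to the right structure, which are the curation problems the slides describe.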
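
The conflict percentages on the DSSTox_v2 construction slide (internal CAS-structure conflicts, and conflicts among overlapping records from two sources) can be sketched as simple set comparisons. This is an illustrative sketch only: the function names are hypothetical, the records are made-up toy data with one deliberately wrong entry, and structures are compared as plain strings on the assumption that each source has already been reduced to a canonical identifier.

```python
from collections import defaultdict


def internal_conflicts(records):
    """Return the CAS numbers that map to more than one structure within one source."""
    structures_by_cas = defaultdict(set)
    for cas, structure in records:
        structures_by_cas[cas].add(structure)
    return {cas for cas, structures in structures_by_cas.items() if len(structures) > 1}


def cross_source_conflicts(source_a, source_b):
    """Compare two {CAS: structure} maps; return (overlapping CAS, conflicting CAS)."""
    overlap = set(source_a) & set(source_b)
    conflicts = {cas for cas in overlap if source_a[cas] != source_b[cas]}
    return overlap, conflicts


# Toy data (hypothetical): source_b carries a wrong structure for 67-64-1.
source_a = {"50-00-0": "C=O", "64-17-5": "CCO", "67-64-1": "CC(C)=O"}
source_b = {"64-17-5": "CCO", "67-64-1": "CCC=O", "71-43-2": "c1ccccc1"}

overlap, conflicts = cross_source_conflicts(source_a, source_b)
rate = 100.0 * len(conflicts) / len(overlap)
print(f"{len(conflicts)} conflicts in {len(overlap)} overlaps ({rate:.0f}%)")

# One source listing the same CAS twice with different structures.
records = [("64-17-5", "CCO"), ("64-17-5", "CC"), ("50-00-0", "C=O")]
print(sorted(internal_conflicts(records)))
```

Records flagged this way would then go to a curation queue, as in the load order on the slide. Deciding which source is right is the manual step this sketch does not attempt.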
