http://www.orcid.org/0000-0002-2668-4821
Using the US EPA’s CompTox Chemistry Dashboard for structure identification and non-targeted analyses
Antony Williams1, Andrew D. McEachran3, Seth Newton2, Kristin Isaacs2, Katherine Phillips2, Nancy Baker1, Chris Grulke1 and Jon R. Sobus2 1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC 2) National Exposure Research Laboratory, U.S. Environmental Protection Agency, RTP, NC 3) Oak Ridge Institute of Science and Education (ORISE) Research Participant, Research Triangle Park, NC
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
March 2018 ACS Spring Meeting, New Orleans The CompTox Chemistry Dashboard
• A publicly accessible website delivering access: – ~760,000 chemicals with related property data – Experimental and predicted physicochemical property data – Experimental Human and Ecological hazard data – Integration to “biological assay data” for 1000s of chemicals – Information regarding consumer products containing chemicals – Links to other agency websites and public data resources – “Literature” searches for chemicals using public resources – “Batch searching” for thousands of chemicals
– DOWNLOADABLE Open Data for reuse and repurposing
1 CompTox Chemistry Dashboard https://comptox.epa.gov/dashboard
2 1 of ~761,000 Chemical Pages
3 Access to Chemical Hazard Data
4 Sources of Exposure to Chemicals
5 Dashboard for Structure ID
• Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time
6 Advanced Searches
7 Advanced Searches Mass Based Search
8 Advanced Searches
9 Formula Searches
10 Exact Formula Search: C12H17NO 298 Chemicals
11 Dashboard for Structure ID
• Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time – Distilling structures into “MS-Ready form”
12 Specific Data-Mappings “MS-Ready Structures”
13 Diphenhydramine 15 Total MS-Ready Mappings
14 “MS Ready” Formula Search C12H17NO 354 Chemicals
15 Dashboard for Structure ID
• Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time – Distilling structures into “MS-Ready form” – Ranking based on metadata
16 Identifying Known Unknowns by reference ranking
17 Data source ranking using the Dashboard
18 DOI: 10.1007/s00216-016-0139-z Additional Metadata Ranking
• US EPA CompTox Chemistry Dashboard Data Sources • “CPDat” Consumer Product Database • PubChem Data Source Count • PubMed Reference Count
19 Additional Metadata Ranking C12H17NO: 354 Chemicals
20 Additional Metadata Ranking C12H17NO: 354 Chemicals
21 Top Ranked Chemical
22 Additional data streams in development
• US EPA CompTox Chemistry Dashboard Data Sources • “CPDat” Consumer Product Database • PubChem Data Source Count • PubMed Reference Count • Retention Time Prediction = + + + + • Predicted Environmental Media Occurrence 𝑆𝑆𝑆𝑆𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 𝑆𝑆𝑆𝑆𝐷𝐷𝐷𝐷 𝑆𝑆𝑆𝑆𝑃𝑃𝑃𝑃 𝑆𝑆𝑆𝑆𝑅𝑅𝑅𝑅 𝑆𝑆𝑆𝑆𝑀𝑀𝑀𝑀 ⋯ • Presence in Lists
C7H7NO3 0 1 2 3 4 DTXSID5024506 DTXSID3020962 DTXSID0026961 DTXSID2022591 DTXSID9059208 DTXSID1052298 DTXSID5075365 DTXSID2062535 DTXSID0046066 DTXSID90197716
Data Sources Retention Time Media Occurrence Method Compatibility 23 “Chemicals Detected in Water”
24 Dashboard for Structure ID
• Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time – Distilling structures into “MS-Ready form” – Ranking based on metadata – Batch searching of formulae and masses
25 Batch Search
26 Batch Search
27 Excel Output
28 Batch Search Integration to MetFrag http://c-ruttkies.github.io/MetFrag/projects/metfragweb/
29 MetFrag Input File
30 Batch Search Integration to MetFrag http://c-ruttkies.github.io/MetFrag/projects/metfragweb/
31 The Dashboard to Support MS-Analysis
MS-Ready Structures Underpin Analysis
32 Downloadable Data
33 Future Work: Combined Substructure/Formula Searching
34 Future Work: Searching Against Predicted Spectra
35 Future Work: Searching Against Predicted Spectra • CFM-ID predicted spectra generated for 700,000 chemicals – Positive ion, Negative ion, Electron Impact – Three energies
36 Future Work Scoring scheme into results
= + + + +
𝑆𝑆𝑆𝑆𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 𝑆𝑆𝑆𝑆𝐷𝐷𝐷𝐷 𝑆𝑆𝑆𝑆𝑃𝑃𝑃𝑃 𝑆𝑆𝑆𝑆𝑅𝑅𝑅𝑅 𝑆𝑆𝑆𝑆𝑀𝑀𝑀𝑀 ⋯37 Conclusion
• The CompTox Chemistry Dashboard provides access to data for ~760,000 chemicals • High quality curated data and rich metadata facilitates mass spec analysis • “MS-Ready” processed data enables structure identification
38 Acknowledgments
• The CompTox Chemistry Dashboard team • NERL colleagues: – Jon Sobus, Elin Ulrich, Mark Strynar, Seth Newton (NTA Analysis) – Katherine Phillips, Kathie Dionisio, Kristin Isaacs (Consumer Products Database) • Emma Schymanski – Luxembourg Center for Systems Biomedicine (MS-ready/NTA)
39 Contact
Antony Williams US EPA Office of Research and Development National Center for Computational Toxicology (NCCT) [email protected] ORCID: https://orcid.org/0000-0002-2668-4821
40