<<

http://www.orcid.org/0000-0002-2668-4821

Using the US EPA’s CompTox Dashboard for structure identification and non-targeted analyses

Antony Williams1, Andrew D. McEachran3, Seth Newton2, Kristin Isaacs2, Katherine Phillips2, Nancy Baker1, Chris Grulke1 and Jon R. Sobus2 1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC 2) National Exposure Research Laboratory, U.S. Environmental Protection Agency, RTP, NC 3) Oak Ridge Institute of Science and Education (ORISE) Research Participant, Research Triangle Park, NC

The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA

March 2018 ACS Spring Meeting, New Orleans The CompTox Chemistry Dashboard

• A publicly accessible website delivering access: – ~760,000 chemicals with related property data – Experimental and predicted physicochemical property data – Experimental Human and Ecological hazard data – Integration to “biological data” for 1000s of chemicals – Information regarding consumer products containing chemicals – Links to other agency websites and public data resources – “Literature” searches for chemicals using public resources – “Batch searching” for thousands of chemicals

– DOWNLOADABLE Open Data for reuse and repurposing

1 CompTox Chemistry Dashboard https://comptox.epa.gov/dashboard

2 1 of ~761,000 Chemical Pages

3 Access to Chemical Hazard Data

4 Sources of Exposure to Chemicals

5 Dashboard for Structure ID

• Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time

6 Advanced Searches

7 Advanced Searches Mass Based Search

8 Advanced Searches

9 Formula Searches

10 Exact Formula Search: C12H17NO 298 Chemicals

11 Dashboard for Structure ID

• Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time – Distilling structures into “MS-Ready form”

12 Specific Data-Mappings “MS-Ready Structures”

13 15 Total MS-Ready Mappings

14 “MS Ready” Formula Search C12H17NO 354 Chemicals

15 Dashboard for Structure ID

• Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time – Distilling structures into “MS-Ready form” – Ranking based on metadata

16 Identifying Known Unknowns by reference ranking

17 Data source ranking using the Dashboard

18 DOI: 10.1007/s00216-016-0139-z Additional Metadata Ranking

• US EPA CompTox Chemistry Dashboard Data Sources • “CPDat” Consumer Product Database • PubChem Data Source Count • PubMed Reference Count

19 Additional Metadata Ranking C12H17NO: 354 Chemicals

20 Additional Metadata Ranking C12H17NO: 354 Chemicals

21 Top Ranked Chemical

22 Additional data streams in development

• US EPA CompTox Chemistry Dashboard Data Sources • “CPDat” Consumer Product Database • PubChem Data Source Count • PubMed Reference Count • Retention Time Prediction = + + + + • Predicted Environmental Media Occurrence 𝑆𝑆𝑆𝑆𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 𝑆𝑆𝑆𝑆𝐷𝐷𝐷𝐷 𝑆𝑆𝑆𝑆𝑃𝑃𝑃𝑃 𝑆𝑆𝑆𝑆𝑅𝑅𝑅𝑅 𝑆𝑆𝑆𝑆𝑀𝑀𝑀𝑀 ⋯ • Presence in Lists

C7H7NO3 0 1 2 3 4 DTXSID5024506 DTXSID3020962 DTXSID0026961 DTXSID2022591 DTXSID9059208 DTXSID1052298 DTXSID5075365 DTXSID2062535 DTXSID0046066 DTXSID90197716

Data Sources Retention Time Media Occurrence Method Compatibility 23 “Chemicals Detected in Water”

24 Dashboard for Structure ID

• Structure Identification using the dashboard – Formula/mass-based searching – 1 chemical at a time – Distilling structures into “MS-Ready form” – Ranking based on metadata – Batch searching of formulae and masses

25 Batch Search

26 Batch Search

27 Excel Output

28 Batch Search Integration to MetFrag http://c-ruttkies.github.io/MetFrag/projects/metfragweb/

29 MetFrag Input File

30 Batch Search Integration to MetFrag http://c-ruttkies.github.io/MetFrag/projects/metfragweb/

31 The Dashboard to Support MS-Analysis

MS-Ready Structures Underpin Analysis

32 Downloadable Data

33 Future Work: Combined Substructure/Formula Searching

34 Future Work: Searching Against Predicted Spectra

35 Future Work: Searching Against Predicted Spectra • CFM-ID predicted spectra generated for 700,000 chemicals – Positive ion, Negative ion, Electron Impact – Three energies

36 Future Work Scoring scheme into results

= + + + +

𝑆𝑆𝑆𝑆𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 𝑆𝑆𝑆𝑆𝐷𝐷𝐷𝐷 𝑆𝑆𝑆𝑆𝑃𝑃𝑃𝑃 𝑆𝑆𝑆𝑆𝑅𝑅𝑅𝑅 𝑆𝑆𝑆𝑆𝑀𝑀𝑀𝑀 ⋯37 Conclusion

• The CompTox Chemistry Dashboard provides access to data for ~760,000 chemicals • High quality curated data and rich metadata facilitates mass spec analysis • “MS-Ready” processed data enables structure identification

38 Acknowledgments

• The CompTox Chemistry Dashboard team • NERL colleagues: – Jon Sobus, Elin Ulrich, Mark Strynar, Seth Newton (NTA Analysis) – Katherine Phillips, Kathie Dionisio, Kristin Isaacs (Consumer Products Database) • Emma Schymanski – Luxembourg Center for Systems Biomedicine (MS-ready/NTA)

39 Contact

Antony Williams US EPA Office of Research and Development National Center for Computational Toxicology (NCCT) [email protected] ORCID: https://orcid.org/0000-0002-2668-4821

40