Interoperability in Toxicology: Connecting Chemical, Biological, and Complex Disease Data
Total Page:16
File Type:pdf, Size:1020Kb
INTEROPERABILITY IN TOXICOLOGY: CONNECTING CHEMICAL, BIOLOGICAL, AND COMPLEX DISEASE DATA Sean Mackey Watford A dissertation submitted to the faculty at the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Gillings School of Global Public Health (Environmental Sciences and Engineering). Chapel Hill 2019 Approved by: Rebecca Fry Matt Martin Avram Gold David Reif Ivan Rusyn © 2019 Sean Mackey Watford ALL RIGHTS RESERVED ii ABSTRACT Sean Mackey Watford: Interoperability in Toxicology: Connecting Chemical, Biological, and Complex Disease Data (Under the direction of Rebecca Fry) The current regulatory framework in toXicology is expanding beyond traditional animal toXicity testing to include new approach methodologies (NAMs) like computational models built using rapidly generated dose-response information like US Environmental Protection Agency’s ToXicity Forecaster (ToXCast) and the interagency collaborative ToX21 initiative. These programs have provided new opportunities for research but also introduced challenges in application of this information to current regulatory needs. One such challenge is linking in vitro chemical bioactivity to adverse outcomes like cancer or other complex diseases. To utilize NAMs in prediction of compleX disease, information from traditional and new sources must be interoperable for easy integration. The work presented here describes the development of a bioinformatic tool, a database of traditional toXicity information with improved interoperability, and efforts to use these new tools together to inform prediction of cancer and complex disease. First, a bioinformatic tool was developed to provide a ranked list of Medical Subject Heading (MeSH) to gene associations based on literature support, enabling connection of compleX diseases to genes potentially involved. Second, a seminal resource of traditional toXicity information, ToXicity Reference Database (ToXRefDB), was redeveloped, including a controlled vocabulary for adverse events used to map identifiers in the Unified Medical Language System (UMLS), thus enabling a connection to MeSH terms. Finally, gene to MeSH associations were used to evaluate the biological coverage of ToxCast for cancer to understand the capacity to use ToXCast to identify chemical hazard potential. ToXCast covers many gene targets putatively linked to cancer; however, more information on pathways in cancer progression is needed to identify robust associations between chemical eXposure and risk of compleX disease. The findings herein demonstrate that increased interoperability between data iii resources is necessary to leverage the large amount of data currently available to understand the role environmental exposures play in etiologies of complex diseases. iv ACKNOWLEDGEMENTS I must recognize and thank Katie Paul-Friedman for her mentorship and guidance through this program. Katie has been my supervisor and mentor at USEPA’s National Center for Computational Toxicology for the past two years. She has been an invaluable advocate supporting my interests as I develop expertise and skills to apply within public health and toxicology. v TABLE OF CONTENTS LIST OF TABLES ....................................................................................................................................... viii LIST OF FIGURES ...................................................................................................................................... ix LIST OF ABBREVIATIONS .......................................................................................................................... X CHAPTER 1: INTRODUCTION ................................................................................................................... 1 Current issues in data interoperability to support computational toXicology and chemical safety evaluation .................................................................................................................................... 1 Current state of the data landscape in toXicology .................................................................................. 3 Example Use Cases .............................................................................................................................. 9 Conclusion ........................................................................................................................................... 13 Scope of dissertation ........................................................................................................................... 13 CHAPTER 2: NOVEL APPLICATION OF NORMALIZED POINTWISE MUTUAL INFORMATION (NPMI) TO MINE BIOMEDICAL LITERATURE FOR GENE SETS ASSOCIATED WITH DISEASE ................................................................................................................. 15 Introduction .......................................................................................................................................... 15 Methods ............................................................................................................................................... 19 Results ................................................................................................................................................. 28 Discussion ........................................................................................................................................... 36 Summary ............................................................................................................................................. 42 CHAPTER 3:TOXREFDB VERSION 2.0: IMPROVED UTILITY FOR PREDICTIVE AND RETROSPECTIVE TOXICOLOGY ANALYSES ........................................................................................ 44 Introduction .......................................................................................................................................... 44 Meeting Challenges in ToXRefDB 2.0 ................................................................................................. 47 Conclusions ......................................................................................................................................... 72 Summary ............................................................................................................................................. 73 CHAPTER 4: EVALUATING THE CANCER- RELATED BIOLOGICAL COVERAGE OF TOXICITY FORECASTER (TOXCAST): IDENTIFICATION OF KNOWLEDGE GAPS AND IMPLICATIONS FOR CHEMICAL SCREENING ....................................................................................... 74 Introduction .......................................................................................................................................... 74 vi Methods ............................................................................................................................................... 78 Results ................................................................................................................................................. 86 Discussion ........................................................................................................................................... 91 Summary ............................................................................................................................................. 95 CHAPTER 5: CONCLUSIONS, PERSPECTIVES, AND FUTURE DIRECTIONS .................................... 96 APPENDIX 1: USEPA PRODUCTS SUPPORTING COMPUTATIONAL TOXICOLOGY ....................... 103 APPENDIX 2: METADATA FOR PUBMED ARTICLES CITING EITHER MARTIN ET AL. (2009) (28) OR MARTIN ET AL. (2009) (29) AS OF AUGUST 8, 2018 .................................................. 104 APPENDIX 3: SUBSET OF TOXREFDB ENDPOINTS AND EFFECTS CONSIDERED CANCER-RELATED FOR RESEARCH PURPOSES ............................................................................. 109 APPENDIX 4: CANCER -RELATED ENDPOINTS FROM TOXREFDB AND THE CORRESPONDING TOXCAST INTENDED GENE TARGETS AND ASSAY ENDPOINTS ................... 117 APPENDIX 5: RELEVANCE DECISIONS FOR TOP TEN TERMS FROM EACH REFERENCE LIBRARY .......................................................................................................................... 118 REFERENCES ........................................................................................................................................ 134 vii LIST OF TABLES Table 2.1: Equations for ranking GeneID-MeSH co-occurrences .............................................................. 24 Table 2.2: Manually curated resources used to construct EMCON .......................................................... 29 Table 2.3: Manual Precision for 17 selected MeSH terms ......................................................................... 34 Table 2.4: Top MeSH terms for genes BRCA1, BRCA2, ESR1, ESR2, and PGR from EMCON ......................................................................................................................................... 35 Table 3.1: Extraction progress as of ToXRefDB v2 release ....................................................................... 57 Table 3.2: OCSPP 870 series health effects guidelines in ToXRefDB ....................................................... 61 Table 3.3: Observations for guideline profiles ...........................................................................................