Available online at www.sciencedirect.com
ScienceDirect
Current Opinion in Toxicology

Public data sources to support systems toxicology applications

Allan Peter Davis1, Jolene Wiegers1, Thomas C. Wiegers1 and Carolyn J. Mattingly1,2

Abstract
Public databases provide a wealth of freely available information about chemicals, genes, proteins, biological networks, phenotypes, diseases, and exposure science that can be integrated to construct pathways for systems toxicology applications. Relating this disparate information from public repositories, however, can be challenging because databases use a variety of ways to represent, describe, and make available their content. The use of standard vocabularies to annotate key data concepts, however, allows the information to be more easily exchanged and combined for discovery of new findings. We explore some of the many public data sources currently available to support systems toxicology and demonstrate the value of standardizing data to help construct chemical information pathways.

Addresses
1 Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, United States
2 Center for Human Health and the Environment, North Carolina State University, Raleigh, NC 27695, United States

Corresponding author: Mattingly, Carolyn J ([email protected])

Current Opinion in Toxicology 2019, 16:17–24
This review comes from a themed issue on 16C Systems Toxicology
Available online 11 March 2019
For a complete overview see the Issue and the Editorial
https://doi.org/10.1016/j.cotox.2019.03.002
2468-2020/Published by Elsevier B.V.

Keywords
Public database, Systems toxicology, Data standard, Chemical, Gene, Disease.

Introduction
Systems toxicology assembles information into pathways to describe a xenobiotic toxicant interacting with a mediator in a living system to set into motion a series of biological events that ultimately result in an outcome [1]. The toxicant is typically a chemical; the biological mediator can be any type of macromolecule but is often studied as a gene-encoded product (e.g. receptor, transporter, transcription factor, enzyme); the series of biological events can be wide ranging and include alterations in transcripts, protein expression patterns, metabolic pathways, interaction networks, and/or biological processes; and the outcome can be the perturbed biological system trying to return to a normal state (resilience) or a new, and potentially adverse, state due to toxicity [2]. The toxicant may trigger multiple different pathways simultaneously by interacting with different biological mediators, complicating understanding about the specific connectivity between individual events, mediators, and outcomes. As well, in the real world, humans are typically exposed simultaneously to mixtures of chemicals and drugs, further complicating understanding of toxic mechanisms of action [3,4].

The goal of systems toxicology is to discern and organize exposure and toxicogenomic events to help formalize an understanding for drug therapy, risk assessment, exposure hazards, and toxicity to humans [5–9]. Creating, testing, and validating systems toxicology pathways de novo can be an onerous task because they require information from diverse disciplines and data types, including environmental science, exposure biology, chemistry, structural biology, pharmacokinetics, genomic influences, evolutionary biology, genetic and protein networks, high-throughput data sets (transcriptomics, proteomics, metabolomics, and so on), biological processes, cellular and tissue physiology, phenotypes, diseases, medicine, epidemiology, global health, statistics, and computational modeling, among others. The intricacies of such events make it difficult for any one laboratory on its own to investigate and resolve a complete pathway without leveraging external data resources.

Conveniently, a surfeit of public databases applicable to systems toxicology readily provides information about many of these key interactions and pathway steps. These public data, made freely available and unrestricted to all people in all places at any time, can be leveraged to help generate information frameworks. Combining diverse information from a variety of resources is made easier when public databases use standardized terms, stable accession identifiers (IDs), and cross-reference IDs. Here, we review and explore a variety of public repositories that can be used and applied in the construction of systems toxicology pathways, with comparisons to how this is done at Comparative Toxicogenomics Database (CTD; http://ctdbase.org/), a public toxicology resource that advances understanding about environmental chemical exposures and their effects on human health [10,11].

Finding public databases for systems toxicology
The primary literature can be perused to discover the most appropriate public databases for systems toxicology applications. Database: The Journal of Biological Databases and Curation (https://academic.oup.com/database) and the annual ‘Database issue’ of Nucleic Acids Research (https://academic.oup.com/nar) are popular journals where articles report new public databases or updates to existing ones and describe their resource, data standards, content, and features. As well, toxicology review articles are good sources for learning about public databases for systems toxicology applications [2,8,12–18]. In addition to print, researchers can use online catalogs to search for databases relevant to systems toxicology. The NAR Molecular Biology Database Collection (http://www.oxfordjournals.org/nar/database/a/), for example, features a categorized list of all the database resources featured in the Nucleic Acids Research annual reviews [19], and a search with ‘chemical’ retrieves over 65 available public repositories. One of the most extensive and comprehensive online catalogs is FAIRsharing (https://fairsharing.org/databases/), a manually curated resource of data standards and databases that implement the BioDBcore guidelines for core database descriptors [20,21]. Currently, over 1200 repositories are listed at FAIRsharing, and each entry is annotated with subject and knowledge domain tags, allowing users to filter their searches; for example, a basic query with ‘chemical’ retrieves over 430 public resources, which can be further refined to include domains such as ‘protein interaction,’ ‘phenotype,’ and ‘disease’ to filter for resources geared toward systems toxicology. Searches for taxonomy, related databases, and data standards discover similarly themed and constructed public repositories. The accuracy, reliability, and effectiveness of this community-driven catalog, however, are dependent upon database creators reviewing and updating their entry page at the portal.

We surveyed over a hundred databases from selected articles and websites to identify those that are relevant to systems toxicology. To focus on current and truly public databases, we eliminated repositories that had [...]

[...] (including proteins, mRNA, and so on), chemical–gene interactions, protein–protein interactions, pathway/network data, phenotypes/diseases, anatomy, and human population-level information.

Leveraging public data
Data standards, terms, and accession IDs
An important quality of any good public database is the practice of defining the data object with a primary term and a stable accession ID, and then supplementing the primary term with a list of synonyms, abbreviations, and cross-reference accession IDs for the same data object in other public databases [22]. Use of controlled vocabularies and ontologies to represent and annotate data is critical, and many vocabularies exist for a wide variety of biological topics and are freely available for use at the OBO Foundry [23]. Choosing the most appropriate vocabulary for a biocuration initiative can be challenging; considerations include whether a particular vocabulary or ontology adequately addresses the specific content with respect to both breadth and depth for the resource’s goal. The stable accession ID allows information to be easily stored, identified, and exchanged with other databases. Database cross-reference links collate the same terms from various controlled vocabularies and act as a Rosetta stone to translate and unambiguously resolve terms and accession IDs from different vocabularies to the same biological concept. Using shared data standards, terms, and accession IDs helps disparate databases speak the same language and enables their content to adhere to the FAIR principle, allowing information to be Findable, Accessible, Interoperable, and Reusable [24], which in turn increases the value of the data by making it a more readily shareable asset [25].

The scientific literature
One of the most useful and ubiquitous data standards in public databases is the PubMed identifier (PMID), a short numerical accession ID to represent a published scientific document. PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) is a user-friendly portal developed by the National Center for Biotechnology Information (NCBI) that assigns a unique PMID to over 28 million citations in life sciences, medicine, molecular biology, behavioral sciences, and chemistry [26]. The PMID simplifies the process of describing and communicating complex article citations and allows other public databases to associate their curated content to original source articles, providing both