Genomics and Risk Assessment | Mini-Monograph
Database Development in Toxicogenomics: Issues and Efforts

William B. Mattes,1 Syril D. Pettit,2 Susanna-Assunta Sansone,3 Pierre R. Bushel,4 and Michael D. Waters4

1Pfizer Inc, Groton, Connecticut, USA; 2ILSI Health and Environmental Sciences Institute, Washington, DC, USA; 3European Molecular Biology Laboratory–European Bioinformatics Institute, Hinxton, United Kingdom; 4National Center for Toxicogenomics, National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA

The marriage of toxicology and genomics has created not only opportunities but also novel informatics challenges. As with the larger field of gene expression analysis, toxicogenomics faces the problems of probe annotation and data comparison across different array platforms. Toxicogenomics studies are generally built on standard toxicology studies generating biological end point data, and as such, one goal of toxicogenomics is to detect relationships between changes in gene expression and in those biological parameters. These challenges are best addressed through data collection into a well-designed toxicogenomics database. A successful publicly accessible toxicogenomics database will serve as a repository for data sharing and as a resource for analysis, data mining, and discussion. It will offer a vehicle for harmonizing nomenclature and analytical approaches and serve as a reference for regulatory organizations to evaluate toxicogenomics data submitted as part of registrations. Such a database would capture the experimental context of in vivo studies with great fidelity such that the dynamics of the dose response could be probed statistically with confidence. This review presents the collaborative efforts between the European Molecular Biology Laboratory–European Bioinformatics Institute ArrayExpress, the International Life Sciences Institute Health and Environmental Sciences Institute, and the National Institute of Environmental Health Sciences National Center for Toxicogenomics Chemical Effects in Biological Systems knowledge base. The goal of this collaboration is to establish public infrastructure on an international scale and examine other developments aimed at establishing toxicogenomics databases. In this review we discuss several issues common to such databases: the requirement for identifying minimal descriptors to represent the experiment, the demand for standardizing data storage and exchange formats, the challenge of creating standardized nomenclature and ontologies to describe biological data, the technical problems involved in data upload, the necessity of defining parameters that assess and record data quality, and the development of standardized analytical approaches.

Key words: ArrayExpress, bioinformatics, CEBS, database, EBI, HESI, MIAME, NCT, toxicogenomics. Environ Health Perspect 112:495–505 (2004). doi:10.1289/txg.6697 available via http://dx.doi.org/ [Online 15 January 2004]

This article is part of the mini-monograph “Application of Genomics to Mechanism-Based Risk Assessment.”

Address correspondence to W.B. Mattes, GeneLogic, Inc., 610 Professional Dr., Gaithersburg, MD 20879 USA. Telephone: (240) 364-6238. Fax: (240) 364-6262. E-mail: [email protected]

We thank A. Brazma, Microarray Informatics (EMBL–EBI); C. Bradfield, McArdle Laboratory for Cancer Research, University of Wisconsin, Madison, WI; W. Tong, National Center for Toxicological Research, Jefferson, AR; and W. Eastin, National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC, for their review of this manuscript prior to submission. We also thank the microarray informatics team at EMBL–EBI, the expression profiler developers, and the ArrayExpress curation and development teams. We especially thank S. Contrino for his contribution to Tox-MIAMExpress.

The ArrayExpress project is funded by EMBL, the European Commission [TEMBLOR (The European Molecular Biology Linked Original Resources) grant], the EBI Industry Programme (Biostandards), the CAGE (Compendium of Arabidopsis Gene Expression) consortium, and the Health and Environmental Sciences Institute (HESI) Toxicogenomics Database grant.

The authors declare they have no competing financial interests.

Received 25 August 2003; accepted 12 January 2004.

Environmental Health Perspectives • VOLUME 112 | NUMBER 4 | March 2004

Toxicology, the study of poisons, focuses on substances and treatments that cause adverse effects in living things. A critical part of this study is the characterization of the adverse effects at the level of the organism, the tissue, the cell, and the molecular makeup of the cell. Thus, studies in toxicology measure effects on body weight and food consumption of an organism, on individual organ weights, on microscopic histopathology of tissues, and on cell viability, necrosis, and apoptosis. Recently added to the arsenal of end points that such toxicological studies can use is the measurement of levels of the thousands of proteins and mRNAs present in the cell. The former measurement was made possible with the advent of two-dimensional gel electrophoresis and forms the basis of the field of proteomics. The latter measurement was made possible with the advent of whole genomic sequencing and the subsequent development of microarrays capable of measuring thousands of transcripts at once and is best described as transcript profiling, although it has often been referred to as genomics or transcriptomics.

The application of these technologies to toxicology is based on the assumption that the sequelae of events leading to adverse events at the cellular and organism levels will include critical changes in certain mRNAs and proteins. Consequently, these changes may give insight into the molecular mechanisms of toxicity and/or may be diagnostic for a given mode of toxicity. Thus the number of toxicology studies incorporating either proteomics or transcript profiling has been exponentially increasing for several years.

Although both proteomics and transcript profiling measure molecular events at global and cellular levels, the two are dramatically different in both technology and readout. Proteomics relies on the physical separation of all the proteins of a sample, usually by means of two separate characteristics such as charge and molecular weight, followed by detection of the protein with a dye, and finally, identification by means of mass spectrometry. Transcript profiling with microarrays makes use of hundreds to thousands of defined probes, each of which is intended to detect a single mRNA molecule. The mRNA sample is labeled and hybridized to the microarray such that the signal at a given probe is related to the amount of that particular mRNA in the sample. This readout characteristic makes microarray-based transcript profiling particularly appealing because the identities of the signals are predetermined. In this sense, data generated in transcript profiling experiments are rather straightforward. However, because of the relatively poor annotation of expressed genes and sequence tags, particularly in the dog and rat, the interpretation of transcript profiling experiments is challenging.

The field of toxicogenomics integrates the data-rich science of transcript profiling with traditional toxicological end point evaluation. If successfully implemented, this integration has the potential to serve as a powerful synergistic tool for understanding the relationship between gross toxicology and genome-level effects. From its inception the field of transcript profiling using microarrays has, through the sheer volume of data involved, required incorporation of resources for bioinformatics, data management, and statistical analysis (Bassett et al. 1999; Eisen et al. 1998; Ermolaeva et al. 1998). The addition of toxicology information to these data poses additional and unique informatics challenges. A typical toxicogenomics study might involve an animal study with three dose groups (one vehicle group, one low-dose group, and one high-dose group), two to three sacrifice times, and four to five animals per group. Even if only one tissue is examined per animal, this represents 36–45 arrays per study, not including replicates.

of data into public databases has already been proposed as a requirement for journal publication of standard genomics experiments (Anonymous 2002; Ball et al. 2002), and public databases for microarray data have been established (Anonymous 2002; Brazma et al. 2003; Edgar et al. 2002). Another important function of some public repositories is the promotion of international standards in data organization and nomenclature (Anonymous 2002; Bassett et al. 1999; Brazma et al. 2001; Stoeckert et al. 2002). Particularly in the case of biological data, the establishment

Schena et al. 1995). The first of these is specificity: for example, the mRNA for cytochrome P450 (Cyp) 3A4 (GenBank accession no. NM_017460; http://www.ncbi.nih.gov/GenBank/) is 92% identical to the mRNA for Cyp3A7 (GenBank accession no. NM_000765), and thus a microarray element consisting of a cDNA sequence for Cyp3A7 would be expected to detect Cyp3A4 as well. Similarly, a microarray element may lack specificity because it corresponds to a sequence (e.g., a 3′ untranslated region) common to several alternatively spliced transcripts, for example, the UDP-