Toxicogenomics Analysis of Non-Model Transcriptomes Using Next-Generation Sequencing and Microarray
Total Page:16
File Type:pdf, Size:1020Kb
The University of Southern Mississippi The Aquila Digital Community Dissertations Fall 12-2010 Toxicogenomics Analysis of Non-Model Transcriptomes Using Next-Generation Sequencing and Microarray Arun Rawat University of Southern Mississippi Follow this and additional works at: https://aquila.usm.edu/dissertations Part of the Biology Commons, and the Computational Biology Commons Recommended Citation Rawat, Arun, "Toxicogenomics Analysis of Non-Model Transcriptomes Using Next-Generation Sequencing and Microarray" (2010). Dissertations. 474. https://aquila.usm.edu/dissertations/474 This Dissertation is brought to you for free and open access by The Aquila Digital Community. It has been accepted for inclusion in Dissertations by an authorized administrator of The Aquila Digital Community. For more information, please contact [email protected]. The University of Southern Mississippi TOXICOGENOMICS ANALYSIS OF NON-MODEL TRANSCRIPTOMES USING NEXT-GENERATION SEQUENCING AND MICROARRAY by Arun Rawat Abstract of a Dissertation Submitted to the Graduate School of The University of Southern Mississippi in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy December 2010 ABSTRACT TOXICOGENOMICS ANALYSIS OF NON-MODEL TRANSCRIPTOMES USING NEXT-GENERATION SEQUENCING AND MICROARRAY by Arun Rawat December 2010 With the advent of next generation technologies like Roche/454 Life Sciences that require low cost and less time for sequencing will help in providing a workable draft of non-model species genomes. Availability of high throughput microarray technologies for gene expression profiling provides low-cost tools for investigation of highly-integrated responses to various stimuli. These advancements along with bioinformatics processing have led to an increasing number of non-model species having well-annotated transcriptomes. The project focuses on the life cycle of development, functional annotation, and utilization of genomic tools for the avian wildlife species to determine the molecular impacts of exposure to munitions constituents (MCs). Massively parallel pyrosequencing is created from the normalized multi-tissue library of Northern bobwhite (Colinus virginianus) and Japanese quail (Coturnix coturnix) cDNAs. The assembly of next generation sequencing for transcriptomes of these organisms is challenging. High number of ESTs and longer read length require high computational memory and management between the sensitivity and accuracy to assemble correctly. The researcher developed a new pipeline “Contigs Assembly Pipeline using Reference Genome” (CAPRG) to assemble long reads for non-model organisms that have available reference genome. The results were benchmarked by employing parameter space for ii different available methods that utilize de novo strategies like overlap-layout-consensus (OLC) and graphs for long reads. It was observed that CAPRG performance was better or near equivalent in the two transcriptomic datasets based on different benchmarks but also completes the assembly in a fraction of the time as compared to assemblers that yield competitive results. The researcher performed statistical analysis to generate differentially expressed genes and utilized metabolic maps, biological networks, pathway analysis and GO enrichment to the differentially-expressed genes in the livers of birds exposed for 60 days (d) to 10 and 60 mg/kg/d 2,6-DNT. These revealed insights into the metabolic perturbations underlying several observed toxicological phenotypes. The impacts were validated by RT- qPCR including: a shift in energy metabolism toward protein catabolism via inhibition of control points for glucose and lipid metabolic pathways, PCK1 and PPARGC1, respectively. To greatly expand the information-base for Northern bobwhite that has little supporting information in Genbank, the researcher initiated and developed the web-based knowledgebase (www.quailgenomics.info). The Quail genomics share and develop functional genomic data for Northern bobwhite to allow researchers to perform analysis and curate genomic information for this non-model species. iii COPYRIGHT BY ARUN RAWAT 2010 The University of Southern Mississippi TOXICOGENOMICS ANALYSIS OF NON-MODEL TRANSCRIPTOMES USING NEXT-GENERATION SEQUENCING AND MICROARRAY by Arun Rawat A Dissertation Submitted to the Graduate School of The University of Southern Mississippi in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy Approved: Mohamed O. Elasri ____________________________________ Director Edward J. Perkins ____________________________________ Tim McLean ____________________________________ Jonathan Sun ____________________________________ Preetam Ghosh ____________________________________ Susan A. Siltanen ____________________________________ Dean of the Graduate School December 2010 ACKNOWLEDGMENTS I am grateful to the committee chair, Dr. Mohamed O. Elasri, and committee members, Dr. Edward J. Perkins, Dr. Tim McLean, Dr. Jonathan Sun and Dr. Preetam Ghosh, for their advice and support for the project. The laboratory experiments cDNA library, microarray experiments, and RT-qPCR were conducted at Engineer Research and Development Centre (ERDC), Vicksburg, and I am thankful to Dr. Kurt A. Gust and Dr. Edward J. Perkins for providing the data and useful inputs for the analysis for the project. I am also thankful to Glover George for providing access to parallel computing facilities at School of Computing Sciences and to my lab members, Dr. Tanwir Habib and Dr. Jagan Thodima, for their help in the project. Special thanks to Antony Swartz and other team members in the lab for useful feedback that helped me with the presentation. iv TABLE OF CONTENTS ABSTRACT….. ...................................................................................................................... ii ACKNOWLEDGMENTS ....................................................................................................... iv LIST OF TABLES .................................................................................................................. vii LIST OF ILLUSTRATIONS ................................................................................................ viii CHAPTER I. INTRODUCTION AND BACKGROUND……………………………………1 Motivation Background II. METHODS AND MATERIALS……………………………………………..20 Preparation of cDNA Library cDNA Library Normalization Sequencing EST Analysis Microarray Data Design and Analysis Life Cycle of Development for the Northern Bobwhite Knowledgebase Design and Development Sequence Alignment Strategies and Development of CAPRG Genomic Comparisons between Avian Species III. RESULTS AND DISCUSION………………………………………………..53 The Life Cycle of Toxicogenomics Analysis for Northern Bobwhite Knowledgebase Design and Development v Sequence Assembly and CAPRG Genomic Comparisons between Avian Species Conclusions Future Work APPENDIXES…………………………………………………………………………...106 REFERENCES…………………………………………………………………………..120 vi LIST OF TABLES Table 1. The Sequencing Data Generated with 454 Life Sciences Parallel Pyrosequencing………………………………………………………….……….26 2. The Experimental Design of Liver versus Feather for Northern Bobwhite Microarray for Control and Doses 10,60mg/kd/d for 2,6-DNT………...………..39 3. Composition of Data Available in Quail Genomics Knowledgebase.........…...…46 4. Overview of Blastx Matches Derived from Contig Assemblies Conducted Using CAP3 Assembly and Newbler Assembly..………………..….………………….55 5. Genomic Comparison of Northern Bobwhite Transcriptome against nr Database for Model Organisms……………………………………...…...........…...………58 6. Similar Genes Among Model Organisms Represented in Northern Bobwhite Transcriptome..…………………………………………………………………..59 7. The Distribution of Number of Reads Per Contigs for CAPRG, PAVE p=80, 90 for Northern Bobwhite……………………………..…………………………….99 8. The Distribution of Number of Reads Per Contigs for CAPRG, PAVE p=80, 90 for Japanese Quail...……………………………………………….......………..100 vii LIST OF ILLUSTRATIONS Figure 1. The Project Flowchart Representing Data Components..........................................4 2. The Initiation of Signal Transduction Activates Transcription...............................6 3. The Flow Chart of CAPRG Representing the Mapping of Reads to Generate Contigs...................................................................................................................10 4. The Flow of the Pyrosequencing...........................................................................25 5. The Flowchart for EST Analysis...........................................................................27 6. Directed Graph Representing K-mer with K=4 for Sequence...............................29 7. The Flowchart for Microarray Design and Analysis.............................................31 8. The Classification of Two Populations..................................................................34 9. The ROC Space and the Prediction Outcome........................................................34 10. The ROC Plots for the Liver and Feather at 10/60mg/kg/d for 2,6-DNT..............40 11. The Web Architecture of Quail Genomics............................................................43 12. The Web Interface of the Quail Genomics............................................................44 13. The Data-Flow Diagram and Interaction Among the Data Entities Stored in Database.................................................................................................................45 14. The Flow Chart Representing