The Pennsylvania State University
Total Page:16
File Type:pdf, Size:1020Kb
The Pennsylvania State University Eberly College of Science Biology Department FUNCTIONAL TRANSCRIPTOMICS: ECOLOGICALLY IMPORTANT GENETIC VARIATION IN NON-MODEL ORGANISMS A Dissertation in Biology by Juan Cristobal Vera 2012 Juan Cristobal Vera Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy May 2012 ii The thesis of Juan Cristobal Vera was reviewed and approved* by the following: James Marden Professor of Biology Dissertation Advisor Stephen Wade Schaeffer Associate Professor of Biology Chair of Committee Webb Miller Professor of Biology and Computer Science and Engineering Iliana B. Baums Assistant Professor of Biology Christina Grozinger Associate Professor of Entomology Charles R. Fisher Professor of Biology Assistant Department Head for Graduate Education *Signatures are on file in the Graduate School iii ABSTRACT With the advent of second generation, high-throughput sequencing, genomic molecular resources once limited to the study of molecular model organisms have become widely available. The ability to sequence a transcriptome (i.e. transcribed portion of the genome) without reference to a sequenced genome is a particularly useful, efficient, and cost-saving method for applying genome-wide analyses to ecologically interesting organisms with little or no existing genetic data. Here I present an introduction to the tools and methods I developed specifically for de novo transcriptomics, as well as the results of an analysis of the ecologically well-studied Glanville fritillary butterfly (Melitaea cinxia) and the two coral species P. damicornis and A. millepora. Results of theanalyses include sample preparation, assembly of reads into contiguous consensus sequences, annotation, and verification of the resulting de novo transcriptome quality and completeness. Additional tools and methods were developed for running simulations and other comparative quality analyses, as well as for SNP discovery in a de novo transcriptome. The de novo M. cinxia transcriptome was scanned for SNPs that were used in a polymorphism survey to detect candidate genes with the potential to be undergoing long-term balancing selection. One of the top candidates was juvenile hormone acid methyltransferase gene (jhamt), the gene for the final metabolic enzyme in the Juvenile Hormone synthesis pathway. After partial molecular characterization, jhamt was found to consist of at least two paralogs in M. cinxia, both of which contain extensive allelic polymorphism. In addition, these methods were verified in a transcriptome-wide environmental association study of adaptive genetic variation related to environmental variation in natural populations of the two coral species, where several candidate genes potentially involved in environmental stress response were identified. In conclusion, this work developes a set of novel tools and methods useful for initiating genome-wide genetic and evolutionary analysis in ecologically interesting organisms lacking in previous molecular model organsism resources and presents the results of two such studies as examples for future researchers. iv TABLE OF CONTENTS LIST OF FIGURES……………………………………………………………………….. vi LIST OF TABLES………………………………………………………………………... xiv PREFACE………………………………………………………………………………… xx ACKNOWLEDGEMENTS………………………………………………………………. Xxi Chapter One………… A brief history of transcriptomics………………………………… 1 References………………………………………………………………………………… 5 Chapter Two………... Rapid transcriptome characterization for a non-model organism using 454 pyrosequencing……………………………………………… 7 Introduction……………………………………………………………………………….. 7 Materials and Methods……………………………………………………………………. 8 Results…………………………………………………………………………………….. 10 Sequencing and assembly…………………………………………………………. 10 Quality and performance of the 454 assembly……………………………………. 11 Transcriptome coverage breadth…………………………………………………... 12 Functional annotation……………………………………………………………… 13 SNP discovery…………………………………………………………………….. 14 Alternative splicing effects on assembly…………………………………………. 14 Performance of microarray probes……………………………………………….. 15 Metatrascriptomics and detection of an intracellular parasite……………………. 16 Discussion………………………………………………………………………….. 16 References………………………………………………………………………….. 19 Chapter Three………. PipeMeta: a Toolset for de novo Transcriptome Assembly with an Additional New Method for Transcriptome Quality Analysis………… 42 Introduction…………………………………………………………………………. 42 Materials and Methods……………………………………………………………… 46 M. cinxia 454 Reads……………………………………………………………… 46 Additional Assembly and Annotation…………………………………………….. 46 PipeMeta Pipeline………………………………………………………………… 47 Modified in silico Predicted Gene Set…………………………………………… 47 Simulated Transcriptome Reads…………………………………………………. 48 Additional Simulated Reads……………………………………………………… 50 TranscriptSimulator………………………………………………………………. 50 Simulated Assembly……………………………………………………………… 51 Comparison of Assemblies: Metrics……………………………………………… 51 Comparison of Assemblies: by Reference………………………………………… 54 Results………………………………………………………………………………. 55 Assembly of M. cinxia Reads…………………………………………………….. 55 Assembly of Simulated Lepidoptera Reads………………………………………. 55 Transcript Abundance for M. cinxia and Simulated Reads………………………. 56 Comparison of Assemblies: Using Metrics to Assess Assembly Quality………… 58 Evaluation of a Real assembly by using a Reference Transcriptome and Simulation.. 62 Conclusions…………………………………………………………………………. 64 References…………………………………………………………………………… 67 Chapter Four……….. Scanning the Glanville frittilary Transcriptome for SNPs and Partial Characterization of a Polymorphic Life History Gene… 93 Introduction…………………………………………………………………………. 93 Methods……………………………………………………………………………... 97 SNPHunter………………………………………………………………………….. 97 M. cinxia 454 reads…………………………………………………………………. 99 Genome-wide scan of polymorphism in a population of M. cinxia………………… 100 Partial Characterization of JHaMT………………………………………………….. 102 Results………………………………………………………………………………… 102 v M. cinxia 454 read sequencing, assembly, annotation……………………………… 102 SNPs in M. cinxia………………………………………………………………….... 102 Genome-wide scan of polymorphism in a population of M. cinxia………………… 103 Partial Characterization of JHaMT in M. cinxia……………………………………. 104 Conclusions…………………………………………………………………………… 105 References…………………………………………………………………………….. 107 Chapter Five………... Environmentally Associated Genetic Variation in Two Coral Species through de novo Transcriptomics………………………………… 124 Introduction………………………………………………………………………… 124 Methods……………………………………………………………………………. 125 P. damicornis and A. millepora polymorphism in 454 reads and association with environmental variables…………………………………………………………. 125 Targeted SNP detection in P. damicornis and A. millepora……………………… 126 Validation of environmental SNP associations in independent samples………… 127 Statistical Analysis of P. damicornis and A. millepora………………………….. 128 Results……………………………………………………………………………… 129 P. damicornis 454 read sequencing, assembly, and annotation………………….. 129 Allele frequencies and environmental correlations in P. damicornis and A. millepora.. 130 Conclusions…………………………………………………………………………… 131 References……………………………………………………………………………. 133 vi LIST OF FIGURES Figure 2-1. Gel image of two cDNA collections (A is larval/pupal material; B is adult) made from poly(A+) mRNA, before and after cDNA normalization. Size markers are shown in the lane at the left; normalized cDNA lanes are labeled N-cDNA. Note the high relative abundance of about 10 eretranscripts (distinct bands) in the cDNA lanes and complete absence thereof after normalization. ................................................. 23 Figure 2-2. Characteristics of assembled M. cinxia 454 contigs, and blastx alignments against Bombyx mori. A,B; Length and coverage of contigs. Variable bar width in plot A is an artifact of the logarithmic horizontal axis; there is no information content in the area of these bars. Note the logarithmic vertical axes; the majority of contigs were small and had low coverage depth. C,D; Frequency distributions of percent identity and deduced amino acid alignment length for all blast hits (bitscore > 45) of M. cinxia contigs and singletons (N= 13,653) to 6,289 Bombyx mori predicted proteins. ............................................................................................................ 24 Figure 2-3. Examples of blast alignments of M. cinxia assembled Sanger ESTs against M. cinxia 454 contigs. A: An alignment that is representative of the average alignment length and percent identity (see Table 4). Note the variable presence of what appears to be an alternatively spliced microexon (single codon). B: The longest Sanger vs. 454 alignment. .................................................................................... 25 Figure 2-4. Upper; Schematic showing how local clusters were often not joined together due primarily to lack of critical sequence reads necessary for joining clusters into one full length, or nearly full length contig. Naturally-occurring variation in the form of polymorphism and alternative splicing, and bias toward increased coverage of transcript end regions (Bainbridge et al 2006; see Figure 4a) can hamper assembly, even where there is high coverage and low rate of sequencing errors.Lower; Within clusters of multiple reads, polymorphic sites, or SNPs, are readily identified. The right side of this alignment shows an example of how sequencing error is ignored in the consensus sequence (underlined sequence above the alignment). ............................. 26 Figure 2-5. A: