The Pennsylvania State University
The Graduate School
Intercollege Graduate Program in Molecular, Cellular, and Integrative Biosciences

MOLECULAR AND EVOLUTIONARY ANALYSES OF TRANSCRIPTIONALLY ACTIVE ENDOGENOUS RETROVIRUSES IN MULE DEER

A Dissertation in
Molecular, Cellular, and Integrative Biosciences
by
Theodora Alexis Kaiser

© 2018 Theodora Alexis Kaiser

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

August 2018

The dissertation of Theodora Alexis Kaiser was reviewed and approved∗ by the following:

Mary Poss
Professor of Biology and Veterinary and Biomedical Sciences
Dissertation Advisor
Chair of Committee

Cooduvalli S. Shashikant
Associate Professor of Molecular and Developmental Biology
Co-Director, Bioinformatics and Genomics Graduate Program

Michael Axtell
Professor of Biology

Le Bao
Associate Professor of Statistics

Melissa Rolls
Associate Professor of Biochemistry and Molecular Biology
Chair, Molecular, Cellular, and Integrative Biosciences Graduate Program

∗Signatures are on file in the Graduate School.

Abstract

Endogenous retroviruses (ERVs) are genetic elements originally acquired through retroviral infection of a germ cell and subsequently inherited from parent to offspring. ERVs contribute to essential host processes, but they can also negatively affect gene expression and are often silenced. Many species contain insertionally polymorphic ERVs, which are present in some individuals and absent in others. ERV insertional polymorphism could therefore generate diversity within a population or species, yet ERV expression in this context has been largely unexplored.
We investigated transcriptionally active Cervid Endogenous Retroviruses (CrERV) in mule deer, whose genomes contain insertionally polymorphic ERVs of different lineages acquired sequentially over the course of the species' evolution. We asked whether transcriptional activity was due to proximity to genes, recent integration into the genome, or a functional impact on gene expression. We first evaluated the sequence and insertion diversity of CrERV in a Montana mule deer population. Next, we evaluated CrERV expression in this population and identified four transcriptionally active CrERV. These CrERV were close to genes, but we showed that a CrERV can also be silenced when it lies close to an expressed host gene. Unexpectedly, transcriptionally active CrERV were phylogenetically older, except for one CrERV that has recently expanded within the genome. Transcriptionally active CrERV were widespread in the population and present in the provirus/solo-LTR configuration in all animals, and some affected host gene splicing. Finally, we demonstrated that CrERV expression levels differed among populations, with higher CrERV expression in animals from a prion disease-endemic region. We also showed that CrERV expression levels correlated with gene expression levels among populations, further supporting an effect of some CrERV on proximal gene expression.

In conclusion, transcriptionally active CrERV were close to genes, and some affect proximal host gene expression. All transcriptionally active CrERV have been maintained in the population in the provirus/solo-LTR configuration. These results suggest that the maintenance of transcriptionally active CrERV involves more than the co-option of CrERV LTRs as regulatory elements for host genes. We propose that CrERV RNA is beneficial to the host, and we suggest future investigations of CrERV transcripts as long non-coding RNAs.
Table of Contents

List of Figures
List of Tables
Acknowledgments

Chapter 1 Introduction
1.1 Retroviruses
1.1.1 Retroviral Genome
1.1.2 Reverse Transcription and Integration
1.2 Endogenous Retroviruses
1.2.1 Establishment in Host Genomes
1.2.2 ERV Distribution Within the Genome
1.2.3 ERV Classification and Distribution
1.2.4 ERV Expansion in Host Genome
1.2.5 Other Retrotransposons in Host Genomes
1.3 Impacts of ERVs
1.3.1 Genomic Structural Variations
1.3.2 Insertional Mutagenesis
1.3.3 Viral Proteins
1.3.4 Disease Associations
1.4 Host Control of ERVs
1.4.1 Epigenetic Regulation
1.4.2 Small RNAs
1.4.3 KAP1 and KRAB-ZFPs
1.4.4 Restriction Factors
1.4.5 TLR Signaling
1.5 Functional Contributions of ERVs
1.5.1 ERV LTRs as Regulatory Elements for Host Genes
1.5.2 ERV RNAs as Long Non-coding RNAs (lncRNAs)
1.5.3 ERV Proteins
1.6 Transcriptional Activation of ERVs
1.6.1 Position in Genome
1.6.2 Activation by Environmental Stimuli
1.7 ERV Identification in Genomes
1.7.1 Previous Methods
1.7.2 Problems Identifying ERVs from Short-read Sequence Data
1.7.3 Existing Solutions
1.8 Mule Deer and Cervid Endogenous Retrovirus (CrERV)
1.8.1 Mule Deer
1.8.2 Identification of Novel Gammaretrovirus, CrERV
1.9 Dissertation Objectives

Chapter 2 Characterizing polymorphic CrERV in the Montana mule deer population
2.1 Introduction
2.2 Materials and Methods
2.3 Results
2.4 Discussion

Chapter 3 Evolutionary implications of endogenous retrovirus expression on host genome evolution
3.1 Introduction
3.2 Materials and Methods
3.3 Results
3.4 Discussion

Chapter 4 Comparison of transcriptionally active CrERV in mule deer from Montana and Wyoming
4.1 Introduction
4.2 Materials and Methods
4.3 Results
4.4 Discussion

Chapter 5 Discussion and Future Directions

Bibliography

List of Figures

1.1 Retrovirus genome structure.
1.2 Effects of ERVs on genome function.
1.3 Summary of ERV silencing mechanisms.
2.1 CrERV integrations per animal in Montana.
2.2 CrERV prevalence among animals in Montana.
2.3 The number of singletons varies with number of total CrERV integrations.
2.4 PCA and map displaying geographical location based on latitude and longitude of kill-location of Montana (MT), Oregon (OR), and Wyoming (WY) animals.
2.5 Phylogeny of representative full-length CrERV from M273.
2.6 Distribution of CrERV identified in M273 throughout the Montana population.
2.7 Proportion of WY/OR animals that have each M273 CrERV.
2.8 Proportion of WY/OR animals that have each CrERV from M273 found in 6 or fewer Montana animals.
2.9 Prevalence in Montana of CrERV that are not found in M273.
3.1 Methylation patterns of CrERV loci.
3.2 Schematic of approach to amplify CrERV spliced env transcripts.
3.3 CrERV loci in M273 that produce spliced env transcripts.
3.4 RNAseq read coverage of transcriptionally active CrERV in M273.
3.5 Lineage-specific CrERV quantification.
3.6 Results of PCR for individual CrERV loci.
3.7 Orientation of transcriptionally active CrERV with respect to genes.
3.8 Schematic of KXD1 transcript isoforms.
3.9 KXD1 gene expression levels between animals with and without S26536.
3.10 Schematic of SIRT6 transcript isoforms.
3.11 Alignment of S386 LTRs and LTR-driven SIRT6 transcript.
3.12 SIRT6 gene expression levels between animals with and without S386.
3.13 ISY1 transcript structure.
3.14 Presence of S3442n integration alters FBXO42 Exon 1 usage patterns.
3.15 Splicing patterns of S3442 and FBXO42.
4.1 PCA of shared CrERV.
4.2 Map showing geographic locations of animals from Montana, Oregon, and Wyoming based on latitude and longitude of kill-location.
4.3 Distribution of CrERV per animal in Montana and Wyoming.
4.4 CrERV prevalence among animals in Montana and Wyoming.
4.5 The number of singletons varies with number of total CrERV.
4.6 The proportion of Montana vs Wyoming animals that contain each M273 CrERV.
4.7 Lineage-specific CrERV expression in Montana and Wyoming animals.
4.8 Overall CrERV expression levels differ between Montana and Wyoming.
4.9 Overall CrERV expression levels differ between Montana and two Wyoming populations.
4.10 Lineage-specific CrERV expression levels differ between Montana and Wyoming.
4.11 Lineage-specific CrERV expression levels differ between Montana and two Wyoming populations.
4.12 KXD1 gene expression levels between Wyoming animals with and without S26536.
4.13 Total KXD1 and canonical KXD1 gene expression between populations.
4.14 LTR-driven KXD1 gene expression between populations.
4.15 SIRT6 gene expression levels between Wyoming animals with and without S386.
4.16 LTR-driven SIRT6 gene expression between populations.
4.17 Total SIRT6 gene expression between populations.
4.18 Presence of S3442n integration alters FBXO42 Exon 1 usage patterns in Wyoming.
4.19 FBXO42 Exon 1B-exon 2 expression between populations.
4.20 FBXO42 Exon 1C-exon 2 expression between populations.

List of Tables

3.1 Primers used for spliced env amplification and cloning.
3.2 Primers used for CrERV locus-specific PCR.
3.3 Primers used for CrERV qPCR.
3.4 Primers used for gene qPCR.
3.5 Summary of Bisulfite Analysis of 50 LTRs in