Computational Studies of the Genome Dynamics of Mammalian Transposable Elements and Their Relationships to Genes

by

Ying Zhang

M.Sc., Katholieke Universiteit Leuven (BELGIUM), 2004
B.E., Harbin Institute of Technology (CHINA), 1993

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

May 2012

© Ying Zhang, 2012

Abstract

Sequences derived from transposable elements (TEs) comprise approximately 40-50% of the genomic DNA of most mammalian species, including mouse and human. However, what impact they may exert on their hosts is an intriguing question. Originally considered merely genomic parasites or “selfish DNA”, these mobile elements show their detrimental effects through a variety of mechanisms, from physical DNA disruption to epigenetic regulation. On the other hand, evidence has been mounting to suggest that TEs sometimes may also play important roles by participating in essential biological processes in the host cell. The dual nature of TE-host interactions makes it critical to understand the relationship between TEs and the host, which may ultimately help us to better understand both normal cellular functions and disease.

This thesis encompasses my three genome-wide computational studies of TE-gene dynamics in mammals. In the first, I identified high levels of TE insertional polymorphisms among inbred mouse strains, and systematically analyzed their distributional features and biological effects, through mining tens of millions of mouse genomic DNA sequences. In the second, I examined the properties of TEs located in introns, and identified key factors, such as the distance to the intron-exon boundary, insertional orientation, and proximity to splice sites, that influence the probability that TEs will be retained in genes. In the third, a study specifically focused on genes with extremely high or low TE content in three mammalian species, I showed associations between TE density and the function/conservation of genes, as well as the relevance of chromatin state to TE accumulation in genes.

While most of my results clearly support the idea that today’s TE distribution pattern is an outcome of natural selection or genetic drift during evolution, the final part of my work, which compares TE density to chromatin state in embryonic stem cells, suggests that traces of the initial integration preference of TEs still exist. Taken together, these results demonstrate the effects of both initial TE integration and natural selection in shaping the landscape of today’s mammalian genomes and, most importantly, shed light on the roles of mobile elements in evolution.

Preface

A version of Chapter 2 has been published: Ying Zhang, Irina A. Maksakova, Liane Gagnier, Louie N. van de Lagemaat, Dixie L. Mager (2008). “Genome-wide assessments reveal extremely high levels of polymorphism of two active families of mouse endogenous retroviral elements.” PLoS Genet 4(2): e1000007. I designed all the computational methods for identifying mouse ERV polymorphisms, performed most computational data analyses, and wrote sections of the paper. I.A.M. and L.G. performed all the biological experiments for Figures 2.5 and 2.7, L.N.v.d.L. analyzed the data in Figure 2.6, and D.L.M. and I.A.M. wrote sections of the paper.

A version of Chapter 3 has been published: Ying Zhang, Mark T. Romanish, Dixie L. Mager (2011). “Distributions of transposable elements reveal hazardous zones in Mammalian introns.” PLoS Comput Biol 7(5): e1002046. I designed and performed all the computational data analyses, and wrote sections of the paper. M.T.R. performed the biological experiments for Figure 2.8. D.L.M. and M.T.R. wrote sections of the paper.

A version of Chapter 4 has been published: Ying Zhang and Dixie L. Mager (2012). “Gene properties and chromatin state influence the accumulation of transposable elements in genes.” PLoS One 7(1): e30158. I designed and performed all the computational data analyses, and wrote the paper. D.L.M. corrected and revised the paper.

The mouse work was covered by Animal Care Certificate A09-0372 issued by the UBC Animal Care Committee.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter 1: Introduction
  1.1 Mammalian Transposable Elements
    1.1.1 Long interspersed nucleotide elements (LINEs)
    1.1.2 Short interspersed nucleotide elements (SINEs)
    1.1.3 Long terminal repeat (LTR) retrotransposons
    1.1.4 DNA transposons
  1.2 Distribution of TEs in Mammalian Genomes
    1.2.1 TE distribution and the local G/C content
    1.2.2 TE orientation bias in genes
    1.2.3 TE density within genes
  1.3 Initial Integration Site Preference of TEs
    1.3.1 Integration site preference of retroviruses and ERVs
    1.3.2 Integration site preference of other TEs
  1.4 Activity and Polymorphism of TEs in Mammals
    1.4.1 TE activity and polymorphism in humans
    1.4.2 TE activity and polymorphism in mice
    1.4.3 TE activity and polymorphism in other mammals
  1.5 Effects of TE Integration in the Host Genome
    1.5.1 TE-mediated physical damage at the DNA level
    1.5.2 Transcriptional influence of TE sequences on host genes
    1.5.3 Epigenetic Effects of TEs
  1.6 Transposable Elements and Host Evolution
    1.6.1 TEs vs. the host genome: an everlasting battle
    1.6.2 TE exaptation: turning “junk” into “gold”
  1.7 Thesis Objectives
Chapter 2: Identification and Investigation of ERV Polymorphisms in Mice
  2.1 Background
  2.2 Results and Discussion
    2.2.1 Prevalence of ETn/MusDs and IAPs in different strains
    2.2.2 Identification and frequency of polymorphic ERVs
    2.2.3 Genic distribution patterns of the youngest ERVs are distinct from older elements
    2.2.4 Confirmation of polymorphic ERVs in gene introns
    2.2.5 Potential gene expression effects mediated by polymorphic ERVs
  2.3 Concluding Remarks
  2.4 Materials and Methods
    2.4.1 Source data
    2.4.2 Design of ERV probes and detection of ERVs in the assembled B6 genome
    2.4.3 Detection of ERV insertions in test strains
    2.4.4 Determining the polymorphism status of ERVs present in B6 or in the test strains
    2.4.5 B6 trace sampling
