Transposable Element Polymorphisms and Human Genome Regulation
TRANSPOSABLE ELEMENT POLYMORPHISMS AND HUMAN GENOME REGULATION

A Dissertation
Presented to
The Academic Faculty

by

Lu Wang

In Partial Fulfillment
of the Requirements for the Degree
Doctor of Philosophy in Bioinformatics in the
School of Biological Sciences

Georgia Institute of Technology
December 2017

COPYRIGHT © 2017 BY LU WANG

TRANSPOSABLE ELEMENT POLYMORPHISMS AND HUMAN GENOME REGULATION

Approved by:

Dr. I. King Jordan, Advisor
School of Biological Sciences
Georgia Institute of Technology

Dr. John F. McDonald
School of Biological Sciences
Georgia Institute of Technology

Dr. Fredrik O. Vannberg
School of Biological Sciences
Georgia Institute of Technology

Dr. Victoria V. Lunyak
Aelan Cell Technologies
San Francisco, CA

Dr. Greg G. Gibson
School of Biological Sciences
Georgia Institute of Technology

Date Approved: November 6, 2017

To my family and friends

ACKNOWLEDGEMENTS

I am truly grateful to my advisor, Dr. I. King Jordan, for his guidance and support throughout my time working with him as a graduate student. I am fortunate to have had him as my mentor: he started me on very basic, well-defined research tasks and guided me step by step into the exciting world of scientific research. Throughout my PhD training, I have always been impressed by his ability to explain complex ideas, sometimes brilliant ideas of his own, in short and succinct sentences that his students could easily understand. I am also very impressed and inspired by his diligence and passion for his work.

It is my great honor to have Dr. Greg Gibson, Dr. Victoria Lunyak, Dr. John McDonald, and Dr. Fredrik Vannberg as my committee members. I really appreciate the guidance they provided throughout my PhD study and the insightful thoughts they generously shared with me during our discussions. Dr. Victoria Lunyak is also a long-time collaborator of the Jordan lab. I have always been impressed and inspired by her brilliant ideas for scientific research and her great passion for science.
I am very grateful to my friends and colleagues from the Jordan lab: Dr. Andrew Conley, Dr. Ashan Huda, Dr. Jianrong Wang, Dr. Daudi Jjingo, Dr. Lee Katz, Eishita Tyagi, Dr. Lavanya Rishishwar, Emily Norris, Ying Sha, Evan Clayton, Aroon Chande, and Junke Wang. I also appreciate the helpful discussions with Dr. Urko Martinez Marigorta and Biao Zeng from the department.

I am also grateful to my friends from within and outside Georgia Tech for being here with me in Atlanta, exploring graduate school and life in the city piece by piece. I feel very lucky to have spent time with you during school days and on weekends. Thank you for making my adventures in Atlanta so exciting and colorful.

Last but not least, I am truly grateful to my family for all their love, understanding, and support for my research and career.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS AND ABBREVIATIONS
SUMMARY

CHAPTER 1. INTRODUCTION
  1.1 Transposable elements (TEs) – definitions and concepts
    1.1.1 Retrotransposons
    1.1.2 TE compositions in the host genomes
  1.2 Functional Roles of TEs
    1.2.1 TE-derived cis-regulatory elements
    1.2.2 Trans-regulatory activities of TEs
  1.3 Human genome regulation of TE activities
    1.3.1 Transcriptional suppression of TE activities
    1.3.2 Post-transcriptional suppression of TE activities
  1.4 Active TEs in the human genome
  1.5 Human TE Polymorphisms and Genome Regulation

CHAPTER 2. GENOME-WIDE SCREEN FOR MODIFIERS OF HUMAN LINE-1 EXPRESSION
  2.1 Abstract
  2.2 Introduction
  2.3 Materials and Methods
    2.3.1 Genome-wide SNP genotypes
    2.3.2 Gene expression quantification and normalization
    2.3.3 L1 expression quantification and normalization
    2.3.4 Controlling sample covariates for L1 expression levels
    2.3.5 eQTL association analysis
    2.3.6 Genetic modifiers of L1 expression
  2.4 Results and Discussion
    2.4.1 Genome-wide screening with shared eQTL associations
    2.4.2 Known modifiers of L1 expression
    2.4.3 Novel transcription factor L1 modifiers
    2.4.4 Novel chromatin L1 modifiers
  2.5 Conclusions

CHAPTER 3. HUMAN POPULATION-SPECIFIC GENE EXPRESSION AND TRANSCRIPTIONAL NETWORK MODIFICATION WITH POLYMORPHIC TRANSPOSABLE ELEMENTS
  3.1 Abstract
  3.2 Introduction
  3.3 Materials and Methods
    3.3.1 Polymorphic transposable element (polyTE) analysis
    3.3.2 RNA sequencing (RNA-seq) analysis
    3.3.3 Expression quantitative trait loci (eQTL) analysis
    3.3.4 Functional enrichment analysis
    3.3.5 Transcription factor (TF) target identification
  3.4 Results
    3.4.1 The landscape of human TE polymorphisms
    3.4.2 TE expression quantitative trait loci (TE-eQTL)
    3.4.3 Population-specific TE-eQTL
    3.4.4 Transcriptional network TE-eQTLs
  3.5 Discussion

CHAPTER 4. HUMAN RETROTRANSPOSON INSERTION POLYMORPHISMS ARE ASSOCIATED WITH HEALTH AND DISEASE VIA GENE REGULATORY PHENOTYPES
  4.1 Abstract
  4.2 Introduction
  4.3 Materials and Methods
    4.3.1 Polymorphic transposable element (polyTE) and SNP genotypes
    4.3.2 PolyTE-SNP linkage analysis
    4.3.3 Genome-wide association studies (GWAS) for disease
    4.3.4 Evaluating polyTE regulatory potential
    4.3.5 Expression quantitative trait locus (eQTL) analysis
    4.3.6 Interrogation of disease-associated gene function and association consistency
  4.4 Results
    4.4.1 Linkage disequilibrium for polyTEs and disease-associated SNPs
    4.4.2 Co-location of disease-linked polyTEs with tissue-specific enhancers
    4.4.3 Expression associations for disease-linked and enhancer co-located polyTEs
    4.4.4 Effects of polyTE insertions on immune- and blood-related conditions
  4.5 Discussion

CHAPTER 5. TRANSPOSABLE ELEMENT ACTIVITY, GENOME REGULATION AND HUMAN HEALTH
  5.1 Abstract
  5.2 Introduction
  5.3 Genome-enabled approaches for characterizing TE insertion variants
  5.4 TE polymorphisms and human genome regulation
  5.5 TE polymorphisms and complex common disease
  5.6 Conclusions

APPENDIX A. SUPPLEMENTARY INFORMATION FOR CHAPTER 2
APPENDIX B. SUPPLEMENTARY INFORMATION FOR CHAPTER 3
APPENDIX C. SUPPLEMENTARY INFORMATION FOR CHAPTER 4
PUBLICATIONS
REFERENCES

LIST OF TABLES

Table 1 Top modifier genes identified via joint eQTL analysis
Table 2 Summary of TE-disease associations
Table 3 Best TE eQTLs identified in the analysis
Table 4 Functional enrichment of genes that are associated with TE-eQTL
Table 5 Results for conditional association controls
Table 6 Results for regional association controls
Table 7 eQTL results for known Pax5 target genes that are associated with Alu-7481
Table 8 Top LD results for polyTE for African population
Table 9 Top LD results for polyTE for European population
Table 10 Genome-wide significant TE-eQTL for African population
Table 11 Genome-wide significant TE-eQTL for European population

LIST OF FIGURES

Figure 1 Schematic showing the structure of different TEs in the human genome
Figure 2 Genome-wide approach to screen for genetic modifiers of L1 expression
Figure 3 Results of the eQTL association analyses for L1 and gene expression
Figure 4 Results of the joint L1-gene eQTL association analysis
Figure 5 Known L1 suppressor genes implicated by the joint eQTL association analysis
Figure 6 Known L1 retrotransposition promoting genes implicated by the joint eQTL association analysis
Figure 7 Putative L1 modifier transcription factors uncovered by the joint eQTL association analysis
Figure 8 Putative chromatin L1 modifiers uncovered by the joint eQTL association analysis
Figure 9 Scheme for the polymorphic transposable element (polyTE) expression quantitative trait loci (eQTL) analysis conducted
Figure 10 Distribution of polyTEs among the African and European population groups analyzed
Figure 11 Gene expression profiles within and between populations analyzed
Figure 12 Polymorphic transposable element expression quantitative trait loci (TE-eQTL) detected
Figure 13 Functional enrichment of polyTE loci associated genes
Figure 14 Examples of population-specific TE-eQTL detected