Population Genomics of Human Polymorphic Transposable Elements

POPULATION GENOMICS OF HUMAN POLYMORPHIC TRANSPOSABLE ELEMENTS A Dissertation Presented to The Academic Faculty by Lavanya Rishishwar In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in Bioinformatics in the School of Biological Sciences Georgia Institute of Technology December 2016 COPYRIGHT © 2016 BY LAVANYA RISHISHWAR POPULATION GENOMICS OF HUMAN POLYMORPHIC TRANSPOSABLE ELEMENTS Approved by: Dr. I. King Jordan, Advisor Dr. John F. McDonald School of Biological Sciences School of Biological Sciences Georgia Institute of Technology Georgia Institute of Technology Dr. Greg Gibson Dr. Soojin V. Yi School of Biological Sciences School of Biological Sciences Georgia Institute of Technology Georgia Institute of Technology Dr. Leonardo Mariño-Ramírez National Center for Biotechnology Information National Institutes of Health Date Approved: November 7, 2016 To my family and friends ACKNOWLEDGEMENTS I am deeply grateful to my advisor Dr. I. King Jordan for his constant support and guidance throughout my time here as a student, working with him and his lab. I will always be appreciative of his patience and the time he took teach me many fundamental things of science and scientific communication. His passion of science is contagious and has left a large impression on me. He is the finest mentor I’ve ever had, a great friend and a father- like figure to me. Under his tutelage, I’ve become a better researcher, better teacher and, most of all, a better person. I had the pleasure of having Dr. Greg Gibson, Dr. John McDonald, Dr. Soojin Yi and Dr. Leonardo Mariño-Ramírez as the most supportive committee members I could’ve ever had. I would also like to thank Dr. Joseph Lachance for all the helpful discussions and insights. Their constant encouragement throughout this program was crucial for my success and development. I am grateful to my friends and colleagues from the Jordan Lab, Dr. Andrew Conley, Dr. Jianrong Wang, Dr. Daudi Jjingo, Dr. Lee Katz, Eishita Tyagi and Lu Wang, who motivated and inspired me to do amazing things. I want to thank Emily Norris for being a great friend and giving me her unconditional support over the past few years. I am extremely thankful of her time with all of the helpful discussions over my numerous papers. I am also obliged to Dr. Vinay Mittal, Piyush Ranjan and Siddharth Choudhary for the fun and crazy moments that helped me get through grad school in a stress free manner. iv I am very appreciative of Dr. Bala Swaminathan for trusting me and giving me the opportunity to lead the Applied Bioinformatics Laboratory (ABiL) and broaden my horizons. I also thank Michael Astwood for trusting me and constantly pushing me to believe in myself. You have been a great friend and colleague. I must also thank Troy Hilley, for being patient and helping me out with all the IT related problems, and Lisa Redding, for always being responsive and helping me out with all the program related requirements. Their support saved me a lot of time and frustration over the years. I am very much appreciative of my friends in Atlanta: Suzanna Kim, Angela Peña, Juliana Soto, Kizee Etienne, Linh Chau, Chinar Patil, Laurel Jenkins, Kawther Abdilleh, Camila Medrano and Aroon Chande for the memorable times we have had. Your company helped me to feel that Atlanta was my home. Last, but not the least, I couldn’t have done all of this without the constant love and support from my family, my brother Kshitij Rishishwar, my father Sanjay Rishishwar, my mother Dr. Poonam Rishishwar and my grandfather Dr. Krishna Mohan. This was their dream and my sole reason for working hard all of these years. v TABLE OF CONTENTS ACKNOWLEDGEMENTS iv LIST OF TABLES ix LIST OF FIGURES x LIST OF SYMBOLS AND ABBREVIATIONS xii SUMMARY xiii CHAPTER 1. INTRODUCTION 1 1.1 Transposable elements defined 1 1.2 Transposable elements (TEs) in the human genome 3 1.3 Active human transposable elements (TEs) 5 1.4 Computational detection of polymorphic transposable element (polyTE) insertions 7 1.5 Evolutionary implications of human polyTE insertions 9 1.5.1 Ancestry informative markers 9 1.5.2 Selection 10 1.6 Population genomics of human polyTEs 12 CHAPTER 2. BENCHMARKING COMPUTATIONAL TOOLS FOR POLYMORPHIC TRANSPOSABLE ELEMENT DETECTION 15 2.1 Abstract 15 2.2 Introduction 16 2.2.1 Polymorphic TEs in the human genome 16 2.2.2 Polymorphic TE detection tools 18 2.3 Materials and Methods 23 2.3.1 Benchmarking sequence data sets 23 2.3.2 Benchmarking and validation parameters 25 2.4 Results and Discussions 28 2.4.1 PolyTE detection performance 28 2.4.2 Sequence coverage and tool performance 32 2.4.3 Runtime parameters 34 2.5 Additional notes for users and developers 36 2.6 Conclusions and future prospects 37 CHAPTER 3. TRANSPOSABLE ELEMENT POLYMORPHISMS RECAPITULATE HUMAN EVOLUTION 41 3.1 Abstract 41 3.2 Background 42 3.3 Results 45 3.3.1 Human population genomics of polyTEs 45 3.3.2 Human evolutionary relationships based on polyTEs 51 vi 3.3.3 Ancestry prediction with polyTEs 54 3.3.4 Admixture prediction with polyTEs 58 3.4 Discussion 59 3.4.1 Human ancestry and admixture from polyTEs 59 3.4.2 Deleteriousness and selection on polyTE insertions 61 3.5 Conclusions 61 3.6 Materials and Methods 62 3.6.1 Transposable element polymorphisms 62 3.6.2 Ancestry analysis 63 3.6.3 Admixture analysis 63 3.6.4 Ancestry and admixture prediction analyses 64 CHAPTER 4. Population-specific positive selection on human transposable element insertions 66 4.1 Abstract 66 4.2 Introduction 66 4.3 Results and Discussion 70 4.3.1 Characterization of human polymorphic transposable elements (polyTEs) 70 4.3.2 Negative selection on human polyTEs 72 4.3.3 Detecting positive selection on human polyTEs 75 4.3.4 Examples of positively selected human polyTEs 82 4.4 Materials and Methods 85 4.4.1 Polymorphic transposable element (polyTE) analysis 85 4.4.2 Population branch statistic (PBS) calculation 86 4.4.3 Detection of positively selected polyTEs using PBS values and coalescent modelling 88 4.4.4 Gene regulatory potential of selected polyTEs 89 4.5 Conclusions 89 CHAPTER 5. Population and clinical genetics of human transposable elements in the (post) genomic era 91 5.1 Abstract 91 5.2 Human transposable element research in the (post) genomic era 92 5.2.1 Technology driven research and discovery on human transposable elements 92 5.3 Active families of human TEs 94 5.4 Genome-scale characterization of TE insertions 96 5.4.1 Human genome sequencing initiatives 96 5.4.2 High-throughput techniques for TE insertion detection 100 5.5 Evolutionary genetics of active human TEs 109 5.5.1 Human genetic variation from TE activity 109 5.5.2 Polymorphic TE insertions as ancestry informative markers 112 5.5.3 Effects of natural selection on polymorphic TE insertions 114 5.6 Clinical genetics of polymorphic TE insertions 117 5.6.1 TE insertions in Mendelian disease 117 5.6.2 TE activity and cancer 118 5.6.3 Polymorphic TE insertion associations with common diseases 119 5.6.4 TE insertion associations with quantitative traits 121 vii 5.7 Conclusions and prospects 123 APPENDIX A. SUPPLEMENTARY INFORMATION FOR CHAPTER 2 124 A.1 Notes on the general structure of the commands 124 A.2 Commands used for generating simulated BAM files 124 A.2 Commands used for calling polyTE in different tools 125 A.2.1 MELT (Version: 1.2.20) 126 A.2.2 Mobster (Version: 0.1.7c) 127 A.2.3 RetroSeq (Version: 1.41) 128 A.2.4 TEMP (Version: 1.04) 129 A.2.5 Tangram (Version: 0.3.1) 130 A.2.6 ITIS (Download date: 1st March 2015) 131 A.2.7 T-lex2 (Version: 2.2.2) 132 APPENDIX B. SUPPLEMENTARY INFORMATION FOR CHAPTER 3 134 APPENDIX C. SUPPLEMENTARY INFORMATION FOR CHAPTER 4 161 PUBLICATIONS 169 REFERENCES 173 viii LIST OF TABLES Table 1 Abundance of different TE types in the human genome. ...................................... 4 Table 2 List of polyTE detection tools benchmarked in this study. ................................. 20 Table 3 Actual and simulated data sets used for benchmarking polyTE detection tools. 25 Table 4 Benchmarking and validation results for seven polyTE detection tools. ............ 26 Table 5 Human populations analyzed in this study. ......................................................... 46 Table 6 Human populations analyzed in this study. ......................................................... 71 Table 7 List of high confidence positively selected polyTEs. .......................................... 84 Table 8 Large scale genome sequencing initiatives. ......................................................... 98 Table 9 Computational approaches for genome-wide detection of TE insertions. ......... 102 Table 10 High-throughput experimental approaches for TE insertion detection. .......... 107 Table 11 List of human polyTE loci with allele frequencies and FST values. ................ 143 ix LIST OF FIGURES Figure 1 Broad classification of human transposable elements (TEs). ............................... 3 Figure 2. Genomic structures of the three main active TE families in the human genome (not to scale). ....................................................................................................................... 6 Figure 3 Read mapping types frequently analyzed for computational TE detection from whole genome sequencing data. ......................................................................................... 8 Figure 4 Human populations characterized in the phase III release of the 1000 Genomes Project. .............................................................................................................................

Population Genomics of Human Polymorphic Transposable Elements

Chuanxiong Rhizoma Compound on HIF-VEGF Pathway and Cerebral Ischemia-Reperfusion Injury’S Biological Network Based on Systematic Pharmacology

Proteome Profiling of Developing Murine Lens Through Mass

Whole Exome Sequencing in Families at High Risk for Hodgkin Lymphoma: Identification of a Predisposing Mutation in the KDR Gene

Region Based Gene Expression Via Reanalysis of Publicly Available Microarray Data Sets

CRYZ Conjugated Antibody

Rabbit Anti-CRYZ Antibody-SL14080R

Mouse Cryz Conditional Knockout Project (CRISPR/Cas9)

Vast Human-Specific Delay in Cortical Ontogenesis Associated With

Primepcr™Assay Validation Report

[KO Validated] CRYZ Rabbit Pab

Network Pharmacology-Based Approach Uncovers the Mechanism of Guanxinning Tablet for Treating Thrombus by Mapks Signal Pathway

Genome-Wide Association Analysis Identifies TYW3