Retrotransposon-Mediated Instability in the Human Genome Shurjo Kumar Sen Louisiana State University and Agricultural and Mechanical College, [email protected]
Total Page:16
File Type:pdf, Size:1020Kb
Louisiana State University LSU Digital Commons LSU Doctoral Dissertations Graduate School 2008 Retrotransposon-mediated instability in the human genome Shurjo Kumar Sen Louisiana State University and Agricultural and Mechanical College, [email protected] Follow this and additional works at: https://digitalcommons.lsu.edu/gradschool_dissertations Recommended Citation Sen, Shurjo Kumar, "Retrotransposon-mediated instability in the human genome" (2008). LSU Doctoral Dissertations. 1306. https://digitalcommons.lsu.edu/gradschool_dissertations/1306 This Dissertation is brought to you for free and open access by the Graduate School at LSU Digital Commons. It has been accepted for inclusion in LSU Doctoral Dissertations by an authorized graduate school editor of LSU Digital Commons. For more information, please [email protected]. RETROTRANSPOSON-MEDIATED INSTABILITY IN THE HUMAN GENOME A Dissertation Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Department of Biological Sciences by Shurjo Kumar Sen B.Sc. (Hons.), University of Calcutta, 2001 M.Sc., University of Calcutta, 2003 May 2008 ACKNOWLEDGEMENTS I owe a debt of gratitude to several people for their assistance during the course of my dissertation research. My graduate advisor, Dr. Mark A. Batzer, has been no less than a father to me ever since I joined LSU, and has frequently helped me in his own special way to get over periodic attacks of laziness. Drs. Michael Hellberg, Joomyeong Kim and Stephania Cormier have been simply wonderful as members of my graduate committee. Dr. John Battista graciously spent large amounts of time answering my questions about DNA repair. I have been lucky to have outstanding collaborators, both within and outside the Batzer laboratory whom I would like to thank for enriching my graduate school experience. For his contribution to chapters two, three and four: Dr. Kyudong Han. For their contribution to chapters two and three: Drs. Ping Liang and Richard Cordaux. For his contribution to chapter four: Charles Huang. In addition, I remain grateful to all other members of the Batzer laboratory (especially Jerilyn Walker, our many undergraduate workers, and my good buddies Jason Gray and Chad Jarreau) for being the best set of colleagues possible. My parents, brother and extended family were a constant source of support throughout the last four years, especially my maternal grandfather who remains my only hero. Lastly, I would like to express my gratitude to my wife Soma for her love, devotion and the many delicious meals she supplied at odd hours. ii TABLE OF CONTENTS ACKNOWLEDGEMENTS................................................................................. ii LIST OF TABLES............................................................................................... iv LIST OF FIGURES ............................................................................................. v ABSTRACT......................................................................................................... vi CHAPTER ONE: BACKGROUND.................................................................... 1 CHAPTER TWO: GENOMIC REARRANGEMENTS BY LINE-1 INSERTION- MEDIATED DELETION IN THE HUMAN AND CHIMPANZEE LINEAGES ............................................................................................. 13 CHAPTER THREE: HUMAN GENOMIC DELETIONS MEDIATED BY RECOMBINATION BETWEEN ALU ELEMENTS............................. 47 CHAPTER FOUR: ENDONUCLEASE-INDEPENDENT INSERTION PROVIDES AN ALTERNATIVE PATHWAY FOR L1 RETROTRANSPOSITION IN THE HUMAN GENOME ......................................................................... 80 CHAPTER FIVE: SUMMARY........................................................................... 117 APPENDIX: LETTERS OF PERMISSION....................................................... 122 VITA .................................................................................................................... 125 iii LIST OF TABLES 2.1 Structural summary of L1 insertion-mediated deletions ................................................ 22 2.2 L1 insertion-mediated deletion frequency and polymorphism levels within the human and chimpanzee lineages ................................................................... 25 3.1 Summary of human-specific ARMD events .................................................................. 49 3.2 Genomic DNA sequences deleted by ARMD ............................................................... 62 4.1 Human NCLI loci and insertion site characteristics ....................................................... 90 iv LIST OF FIGURES 2.1 L1 insertion-mediated deletion in the human genome ................................................... 17 2.2 Endonuclease cleavage site preferences for the L1IMDs .............................................. 19 2.3 Median-joining network of the L1 elements associated with L1IMD ........................... 21 2.4 Size distribution of the L1IMDs .................................................................................... 24 2.5 Models for the creation of L1IMDs ............................................................................... 33 2.6 Models for the formation of deletions associated with atypical L1 elements ................ 37 3.1 ARMD in the human genome ........................................................................................ 50 3.2 Density of ARMD events (red lines) and all Alu insertions (blue lines) on individual human chromosomes ............................................................................... 52 3.3 Alu subfamily composition in ARMD events ................................................................ 53 3.4 Size distribution of human-specific ARMD events, displayed in 100-bp bin sizes ...... 55 3.5 Four different types of the recombination between Alu elements ................................. 57 3.6 Recombination window between Alu elements and percentage frequencies of breakage (during recombination) at different positions along an Alu consensus sequence ......................................................................................................................... 58 3.7 Density of ARMD events (red lines) and RefSeq genes (blue lines) on individual human chromosomes ............................................................................... 60 3.8 Computational data mining for human lineage-specific ARMD loci ............................ 71 4.1 Comparison of TPRT and NCLI L1 insertions............................................................... 82 4.2 Analysis of NCLI elements............................................................................................. 87 4.3 Schematic diagram of NCLI L1 element length............................................................. 92 4.4 L1 cleavage site analysis................................................................................................. 93 4.5 NCLI microhomology analysis....................................................................................... 98 v ABSTRACT LINE-1 (Long Interspersed Element-1/L1) and Alu are two active retrotransposon families in the human genome that have the potential to create genomic instability either during the insertion of new elements or through ectopic recombination. However, recent in vitro analyses have demonstrated that these elements also repair DNA double-strand breaks, hence contributing to the maintenance of genomic integrity. As such, the comprehensive role of mobile elements in either creating or mitigating instability in primate genomes remains unclear. The recent sequencing of the chimpanzee and rhesus macaque genomes uniquely facilitates the accurate resolution of this question, as three- way computational alignment of the human genome with two other hominoid genomes allows human lineage-specific changes (i.e., those younger than 5-6 million years) to be accurately dissected out. Here, using a combined computational and experimental approach, we have attempted to provide an unbiased picture of the contribution of the Alu and L1 families to human genomic stability. In the first analysis described herein, we assessed levels of genomic deletion associated with L1 retrotransposition and reported 50 deletions resulting in the loss of ~18 kb of human genomic sequence and ~15 kb of chimpanzee genomic sequence. We developed models to explain the observed bimodality of the deletion size distribution, and showed that overall, in vivo deletions are smaller than those observed in cell culture analyses. Next, we quantified Alu recombination- mediated deletion in the human genome subsequent to the human-chimpanzee divergence and described 492 deletions (totaling ~400 kb of human genomic sequence) attributable to this process. Interestingly, the majority of these deletions are located within known or predicted genes, opening the possibility that a portion of the phenotypic differences vi between humans and chimpanzees may be attributed to this mechanism. In the third analysis, we reported the in vivo existence of an endonuclease-independent insertion pathway for L1 elements and characterized twenty-one loci where L1 elements appear to have bridged genomic lesions. We show that these insertions are structurally distinguishable from classical L1 elements and suggest that this pathway may escape the purifying selection thought to be