An Alu Element-Based Model of Human Genome Instability George Wyndham Cook, Jr
Total Page:16
File Type:pdf, Size:1020Kb
Louisiana State University LSU Digital Commons LSU Doctoral Dissertations Graduate School 2013 An Alu element-based model of human genome instability George Wyndham Cook, Jr. Louisiana State University and Agricultural and Mechanical College, [email protected] Follow this and additional works at: https://digitalcommons.lsu.edu/gradschool_dissertations Recommended Citation Cook, Jr., George Wyndham, "An Alu element-based model of human genome instability" (2013). LSU Doctoral Dissertations. 2090. https://digitalcommons.lsu.edu/gradschool_dissertations/2090 This Dissertation is brought to you for free and open access by the Graduate School at LSU Digital Commons. It has been accepted for inclusion in LSU Doctoral Dissertations by an authorized graduate school editor of LSU Digital Commons. For more information, please [email protected]. AN ALU ELEMENT-BASED MODEL OF HUMAN GENOME INSTABILITY A Dissertation Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Department of Biological Sciences by George Wyndham Cook, Jr. B.S., University of Arkansas, 1975 May 2013 TABLE OF CONTENTS LIST OF TABLES ...................................................................................................... iii LIST OF FIGURES .................................................................................................... iv LIST OF ABBREVIATIONS ....................................................................................... vi ABSTRACT .............................................................................................................. viii CHAPTER ONE: BACKGROUND .............................................................................. 1 CHAPTER TWO: ALU PAIR EXCLUSIONS IN THE HUMAN GENOME ................... 6 CHAPTER THREE: A COMPARISION OF 100 HUMAN GENES USING AN ALU ELEMENT-BASED INSTABILITY MODEL ................................................. 58 CHAPTER FOUR: CONCLUSIONS ....................................................................... 100 APPENDIX A: SUPPLEMENTAL INFORMATION ................................................. 101 APPENDIX B: LETTERS OF REQUEST AND PERMISSION ............................... 150 VITA ....................................................................................................................... 152 ii LIST OF TABLES 2.1 CLIQUE adjusted FAP sample sizes and I:D ratios, hg18............................ 24 2.2 Chimpanzee specific APEs characterized by PCR ...................................... 25 2.3 Primers for selected APE loci listed in Table 2.2 .......................................... 28 2.4 Comparison of orthologous direct and inverted FAP loci ............................. 30 A3.1 Studies linking Alu-related deletions to human disease phenotypes .......... 101 A3.2 Actual and fitted Alu pair I:D ratios across ten spacer percentiles, APSNs 1-115 ............................................................................................. 115 A3.3 Characteristics of 50 deletion-prone cancer genes .................................... 117 A3.4 Characteristics of 50 randomly chosen human genes ................................ 118 A3.5 Spacer sample sizes and groupings used in determination of I:D ratios for Type 1 Alu pairs .................................................................................... 119 A3.6 Coefficients for equations describing the I:D ratio versus spacer size for Type 1 Alu pairs ..................................................................................... 124 iii LIST OF FIGURES 2.1 Full-length Alu element .................................................................................. 8 2.2 Size distribution of Alu elements in the human genome ............................... 10 2.3 Four types of Alu pairs ................................................................................. 12 2.4 Orientational clustering of Alu elements in human chromosome 1 ............... 13 2.5 Frequency of closely-spaced, full-length Alu pairs, FAPs............................. 16 2.6 CLIQUE adjusted adjacent FAP I:D ratios versus spacer size ..................... 19 2.7 CLIQUE density across the human genome ................................................ 20 2.8 Naming convention for FAPs ........................................................................ 21 2.9 FAP I:D ratio versus Alu pair sequence number with and without adjusting for CLIQUEs ................................................................................. 23 2.10 Chimpanzee specific APE deletions ............................................................. 25 2.11 ARMDs in proximity to inverted Alu pairs ..................................................... 33 2.12 Estimated ranges for four potential mechanisms for generating APEs ........ 35 2.13 Possible pathways for formation of G and S phase DDJs ............................ 37 2.14 Possible S phase dual replication bubble DDJ formation pathway ............... 39 2.15 Possible deletion patterns resulting from resolution of DDJs ....................... 42 3.1 Proposed mechanism for formation and resolution of doomsday junction formed by the ectopic invasion and annealing of complementary DNA breathing bubbles ................................................................................ 62 3.2 Proposed mechanism for the formation of a doomsday junction formed by the ectopic invasion and annealing of complementary replication forks ............................................................................................................. 63 3.3 Alu Pair I:D ratio versus spacer size for Type 1 Alu pairs for APSNs 1-10 ... 65 3.4 Alu pair I:D ratio versus Alu pair type, spacer size, and APSN .................... 66 iv 3.5 Alu landscapes for BRCA1 and VHL ............................................................ 69 3.6 Estimated human deletion size frequency distribution ................................. 72 3.7 Distributions of estimated relative stabilities for 50 deletion-prone cancer genes and 50 randomly chosen genes ........................................................ 74 3.8 Estimated relative exon stability distributions for the 50 deletion-prone cancer genes and 50 randomly chosen genes ............................................. 77 A3.1 Alu landscapes for selected genes ............................................................. 138 A3.2 Regression fits for 2.5th spacer size percentiles for Type 1, 2 and 3 Alu pairs for APSNs 1-115 ......................................................................... 146 A3.3 Sensitivity of the shape of the human deletion size frequency distribution on the relative stabilities of the 50 deletion-prone cancer genes ................ 149 v LIST OF ABBREVIATIONS AAI Alu-Alu Insertion APE Alu Pair Exclusion APSN Alu Pair Sequence Number ARMD Alu Recombination Mediated Deletion BLAST Basic Local Alignment Search Tool BLAT BLAST-like Alignment Tool CAC Catenated Alu Cluster CLIQUE Catenated LINE1 Endonuclease Induced Queues of Uninterrupted Alu, LINE1 and SVA Elements DDJ Doomsday Junction DNA Deoxyribonucleic Acid DSB Double-Strand Break FAP Full Length Alu pair hg18 Human Genome Assembly 18 hg19 Human Genome Assembly 19 I:D Ratio of Inverted to Direct Oriented Alu Pairs L1 LINE1 Element L1EN LINE1 Endonuclease L1RT LINE1 Reverse Transcriptase NAHR Non-Allelic Homologous Recombination ORF2p Protein from the Second Open Reading Frame in LINE1 Elements PanTro2 Second Genome Assembly for the Common Chimpanzee, Pan troglodytes vi PCR Polymerase Chain Reaction Poly(A) Poly-Adenine RNA Ribonucleic Acid SINE Short Interspersed Element SSA Single Strand Annealing Repair of a Double-Strand DNA Break SVA SINE-r; VNTR; HERV-like Region TPRT Target Primed Reverse Transcription TSD Target Site Duplication UCSC University of California, Santa Cruz vii ABSTRACT The human genome is strewn with repetitive sequence. An early estimate derived from the draft human genome sequence placed this repetitive content at ~45%. More detailed recent analyses have advanced the idea that the human repetitive and repeat derived contribution to the genome may be closer to 66-69%. The most commonly repeated sequence in the human genome is the Alu element. Alus make up 10.6 percent of all human DNA and have expanded to over one million copies in the human genome reproducing through a copy and paste mechanism. New Alu germline insertions are estimated to occur at a rate of 1 in 20 human births. In addition to their insertional impact, Alus have also been associated with various forms of genomic sequence disruptions including inversions, rearrangements, translocations and deletions. Chimeric Alus are frequently located at the breakpoints of these various forms of structural variations. This observation has led to the putative conclusion that chimeric Alus primarily result from the non-allelic homologous recombination between Alu elements. However, little proof is available regarding the actual mechanism(s) that catalyze this activity. This dissertation reveals a newly recognized pattern among human Alu pairs that may provide additional insight into the mechanism(s) driving chimeric Alu formation. After adjusting for directional biases associated with clustering, Alu pairs in the same orientation (direct) outnumber Alu pairs in the opposite orientation (inverted pairs) by over two percent (p<0.05). If this imbalance was generated by deletions