Download (8MB)

DETECTION OF REARRANGEMENT HOTSPOTS AND THEIR IMPLICATION IN COMPLEX HUMAN DISEASES By Mohammed Uddin A thesis submitted to the School of Graduate Studies in partial fulfillment of the requirements for the degree of Doctor of Philosophy Discipline of Genetics Faculty of Medicine Memorial University of Newfoundland St. John's, Newfoundland and Labrador January 7th, 2013 ABSTRACT A segment of DNA can vary in the number of copies between two or more genomes known a copy number variation (CNV). CNVs are a common genomic variant that is now shown to be associated with numerous diseases. A large portion of the genome (approximately 12%) has shown to be vulnerable in producing common or rare CNVs. These genomic regions are often prone to rearrangement due to the underlying molecular mechanisms (i.e. non-allefic homologous recombination (NAHR), non homologous end joining (NHEJ), fork stalling and template switching (FoSTeS) and microhomology-mediated break-induced recombination (MMBIR)) that gives rise to inter-individual genetic differences through copy number changes. Segmental Duplication (SO) is another type of genomic variant and is at least 1 kb in length with >90% sequence identity with other genomic regions, constitutes a significant portion of the human genome. A pair of SO block that are homologous to each other influences the rate of NAHR events, often resulting in CNVs. Large portions of SDs overlap with CNVs, some of which are associated with disease. Although investigating SO is important to the study of evolution, and for inferring the underlying mechanism of nearby CNVs, the structural relationship of SDs and CNVs in complex disease is yet to be elucidated. Detecting SO is notoriously complex and genotyping single nucleotide polymorphism (SNP) within these regions is near impossible with traditional array based approaches. ii The primary objectives of this thesis are - i) to detect CNVs arising from complex genomic rearrangements mediated through SDs; ii) to design a custom microarray, targeting rearrangement hotspot regions and iii) to identify novel CNVs associated with complex disease (i .e. Ankylosing Spondylitis (AS) and Tourette Syndrome (TS)) using the custom microarray. Until the advent of high throughput whole genome sequencing, the primary CNV detection methods included SNP genotyping arrays, comparative genomic hybridization (CGH) arrays, clone-PCR product arrays, and fluorescent in situ hybridization (FISH). Each method is reported to have pros and cons show no single method have the capacity to detect the entire CNV content in a genome. In this thesis, the use of SNP microarrays (primarily designed for SNP genotyping) to detect CNVs in a complex disease cohort was explored. This analysis shows the limited capacity of existing SNP arrays for the detection of CNVs. In light of this apparent complexity on the detection of CNVs, a hybrid model was described in the second chapter to detect SDs and CNVs using both whole genome sequencing and microarray technologies. The whole genome sequence analysis was performed to detect approximately 2000 rearrangement hotspots within SDs for an African genome (18x coverage). A high density microarray consisting of 2 X 1 million probes was custom designed targeting the hotspots. The application of the microarray leads to the detection of a large number of CNVs and was applied on two complex (i.e. Ankylosing Spondylitis (AS), and Tourette Syndrome (TS)) diseases to identify novel disease iii associated CNVS. The third chapter shows the detection of a highly stratified gene UGT2B17 in copy number and its association with Ankylosing Spondylitis. The fourth chapter of this thesis show the application of the custom array that leads to the detection of a novel locus at 2q21.1-q21.2 for Tourette syndrome. CNVs breakpoint within the locus show atypical micro duplications includes C2orf27A gene that segregates within multiple generations correlates with severe TS phenotypes. iv ACKNOWLEDGEMENTS I would like to thank Dr. Proton Rahman and Dr. Michael Woods for their guidance during my research project and the subsequent writing of our publications and this thesis. My thanks are also extended to my supervisory committee member Dr. Roger Green. I extend my appreciation to Dr. Darren O'Rielly for his kind guidance and help. I would like to recognize the Atlantic Innovation Fund (AIF) and Canadian Institutes of Health Research (CIHR), for salary support I received for this project. The financial contribution for my project through operating grants awarded to Dr. Proton Rahman . I would also like to give thanks to my colleagues at the Population Therapeutic Research Group (PTRG), particularly Mitch Sturge and Catherine Street for their kind support. Finally, I would like to thank my parents for their constant support. Thanks to my wife Syeda Nusrat Ferdaus for her kind support and, my son Aahiyan Ferdaus for being the joy of my life. v ABSTRACT ........................................................ ......... .. .................. .................. ... ................ ... .. ..... ii ACKNOWLEDGEMENTS .... .. .............................. ............................. .. ............ .............. .. ..... ....... v LIST OF FIGURES ... ................................................. ................................................................ .. ix LIST OF TABLES ......... .. .. ..... ........ .. ............. ........................ .. .. ............ .. .. ............. .................... xix LIST OF ABBREVIATIONS ..................................................................................... ....... ..... ...... 1 Chapter 1 .... .. .... .... .... .. .. ........ .. .. ... .. ... ....... ... ....... .... ............ ................ .. ... .. ... .. ... ...... .. ........... .... .... 3 INTRODUCTION .................. ................... .. .... ................... .. .................... ........ ............. ............. 3 1.1 COPY NUMBER VARIATION ... ................................................ .... .. ................ ................. 3 1.2 REARRANGEMENT HOTSPOTS AND CNVS .... .......... ... ... .. ...... ...... ................ .. ......... 7 1.3 DETECTION OF COPY NUMBER VARIATIONS (CNVs) ......... ...... .. ...... ... .............. 12 1.4 NEUROPSYCHIATRIC DISEASES AND COPY NUMBER VARIATIONS ............ 29 1.5 COMMON COMPLEX DISEASES AND COPY NUMBER VARIATIONS .. ... ......... 33 1.6 RESEARCH HYPOTHESIS AND SPECIFIC OBJECTIVES .. .. .. .................. ............ 38 Chapter 2 ................... .... .......................... .... ................ .......... .. .................... ............................... 40 Genome-wide Signatures of 'Rearrangement Hotspots' Within Segmental Duplications in Humans ..................................... .. ...... ................... .. ....................................... 40 PREFACE ... ............ ...... .... .... ..... .... .. .............. .... .... ................................. ... .............. ...... .......... 40 ABSTRACT ........................ ..... .................. ........... ............... .. .................................................. 41 2.1 INTRODUCTION ........ ................................. .......................................... .. ......... ..... ........... 43 vi 2.2 RESULTS AND DISCUSSION ..... ............................................................ ................. .... 46 2.3 MATERIALS AND METHODS .......... ................. .. .............. .... ... .. .. ................................. 70 Chapter 3 .... ... ... .. ......... .. ... .... ........ .. ...... .. .. .... ...... ............................... .. .. .... ............ .. ... ...... ..... ..... 81 Atypical Micro-duplications at 2q21.1-21.2 Co-segregates with Tourette Syndrome in a Three Generation Family ....................................... .... ................................ 81 PREFACE ................... ................................. .. ... .. ............................. .... ....... ............................. 81 ABSTRACT .............. .......................................................................................... ................. ..... 83 3.1 INTRODUCTION ........................ ...... ....................... .... ............................................ ......... 85 3.2 RESULTS .................................................................................. ................................... ..... 92 3.3 DISCUSSION ............................... .... .... ............... ............ ......... ...................................... 100 3.4 MATERIALS AND METHODS ... .. .. .... .... ....... ............... .... ................ .. .... ................ .. .. .. 104 Chapter 4 ........... ... .................. ................................................................. .. ... ....... .. ........ ..... .. .. .. 110 Autosome-wide Copy Number Variation Association Analysis for Rheumatoid Arthritis Using the WTCCC High-density SNP Genotype Data ...... 110 PREFACE ...................... ......................... ..................... ............................ ...... ..................... ... 110 ABSTRACT ............. ................................. .. .. ................ ........ ........... ................. .. .................... 112 4.1 INTRODUCTION ................. ............ ......... .... ....... ..................................... ........ .............. 114 4.2 RESULTS .................................................. .......

Download (8MB)

Supplementary Information

Supplemental Table 3 Site ID Intron Poly(A) Site Type NM/KG Inum

Chromosomal Microarray Analysis of Patients with Duane Retraction Syndrome

Partial Gene Duplication and the Formation of Novel Genes

SUPPLEMENTAL METHODS Flow Cytometry Analysis Phycoerythrin

There Are 1393 Genes Significant by SAM with 90Th Percentile Confidence, the False Discovery Rate Among the 1393 Significant Genes Is 0.09968

Table of Contents

Method for a Genome Wide Identification of Expression

12.2% 122000 135M Top 1% 154 4800

Supplementary File 1B Revised-Förslag

GLRX5 Deficiency Causes Sideroblastic Anemia by Specifically

The Endothelial Genomic Response to Chronic Hypoxia