Population and family based studies of consanguinity: Genetic and computational approaches

Abdullah Mesut Erzurumluoğlu

A dissertation submitted to the University of Bristol in accordance with the requirements for award of the degree of Doctor of Philosophy (PhD) in the Faculty of Medicine and Dentistry

October 2015

Word Count = ~65,000*
*Excluding preface, tables, footnotes, references and appendices

Thesis Abstract

Consanguinity is the union of closely related individuals, which can have genetic implications for the health of offspring. Consanguineous families with disorders have been extensively analysed by geneticists, and this has led to the identification of many causal variants and genes for autosomal recessive disorders. Two copies of the ‘inactivating’ or loss of function (LoF) allele are required to cause an autosomal recessive disorder, one inherited from the mother and the other from the father. In outbreeding populations these LoF alleles very rarely meet their counterpart (as this requires both parents to possess the allele), and are thus passed down the generations silently, sometimes for millennia. However, consanguineous and/or endogamous offspring have elevated levels of homozygosity, which dramatically increases the probability of any allele being in a homozygous (or, more correctly, autozygous) state. This increase in probability also applies to LoF mutations, and this elevated homozygosity is the main reason why extremely rare autosomal recessive disorders are usually only seen in populations where levels of consanguinity (and/or endogamy) are high.

With the ever decreasing prices of DNA sequencing, whole-genome sequencing is becoming a reality for many laboratories. However, for now, whole-exome sequencing (WES) is the most feasible sequencing technique, mostly due to cost factors.
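The homozygosity argument above can be made concrete with a short calculation. The following is a minimal sketch and is not taken from the thesis: it assumes the standard population-genetics expression for the probability of carrying two copies of an allele, P = Fq + (1 - F)q², where q is the allele frequency and F is the inbreeding coefficient of the offspring. The function names and the allele frequency used are illustrative assumptions; F = 1/16 is the textbook value for the offspring of first cousins.

```python
def homozygote_probability(q: float, f: float) -> float:
    """Probability that an individual carries two copies of an allele.

    q: population frequency of the allele (assumed value, for illustration)
    f: inbreeding coefficient of the individual (0 for unrelated parents)
    """
    # With probability f both copies are identical by descent (autozygous),
    # so the allele only has to be drawn once; otherwise it is drawn twice.
    return f * q + (1.0 - f) * q * q


def fold_increase(q: float, f: float) -> float:
    """How many times more likely homozygosity is compared with f = 0."""
    return homozygote_probability(q, f) / homozygote_probability(q, 0.0)


if __name__ == "__main__":
    rare_allele_frequency = 0.001  # hypothetical rare LoF allele
    for label, f in [("unrelated parents", 0.0), ("first cousins", 1 / 16)]:
        p = homozygote_probability(rare_allele_frequency, f)
        print(f"{label}: P(homozygous) = {p:.3e}")
    print(f"fold increase: {fold_increase(rare_allele_frequency, 1 / 16):.1f}")
```

Under these assumptions the fold increase grows as the allele becomes rarer, which is consistent with the observation that extremely rare recessive disorders are seen mostly where consanguinity is common.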
Combining the concepts of consanguinity and WES, the aim of this thesis was to identify ‘causal’ variants by analysing whole-exome data obtained from consanguineous families/individuals affected by autosomal recessive disorders such as Primary Ciliary Dyskinesia (PCD) and Autosomal Recessive Intellectual Disability (ARID). Using autozygosity mapping, a novel region located on chromosome 19 (p13.3) was identified as associated with ARID (the region was later sequenced by another research group and ADAT3 was identified as the causal gene). Using WES, the rare homozygous nonsense mutations p.E309* in CCDC151 and p.R136* in DNAAF3 were found to be causal for PCD. Other variants that may be causal for PCD, such as p.M263T in MNS1, p.R263* in DNALI1, p.G734fs in HEATR2 and p.E328* in LRRC48, were also identified, but the studies in this thesis remained inconclusive for various reasons. Additionally, a rare missense mutation, p.G300D in CTSC, was found to be causal for Papillon-Lefèvre syndrome (PLS). This latter finding illustrates the additional information that can be gained from WES data, which is discussed in the thesis.

Finding novel causal variants and gene functions can improve genetic counselling and lead to the identification of targets for preventive and/or curative medicine. In this respect, analysing consanguineous populations as a whole, rather than ‘cherry picking’ families with disorders, will have additional benefits and facilitate our understanding of the human genome. This subject is also discussed in this thesis.

Author’s declaration

I declare that the work in this dissertation was carried out in accordance with the requirements of the University’s Regulations and Code of Practice for Research Degree Programmes and that it has not been submitted for any other academic award. Except where indicated by specific reference in text, the work is the candidate’s own work. Work done in collaboration with, or with the assistance of others, is indicated as such.
Any views expressed in the dissertation are those of the author.

SIGNED: ....................................................... DATE: 01/10/2015

Acknowledgements

I would like to begin by thanking every single one of my colleagues at the Bristol Genetic Epidemiology Laboratories (BGEL) for their academic, social and emotional help over the four years I have been there. Special mentions should go to my supervisors Dr. Santiago Rodriguez, Dr. Tom R. Gaunt and Prof. Ian Day, my desk mates (and colleagues) Dr. Hashem A. Shihab, Denis A. Baird, Tom G. Richardson, Dr. Jie Zheng and Dr. Chris Boustred, and colleagues Dr. John Kemp, Dr. Philip Guthrie and Dr. Osama Al-Ghamdi for their extra effort and time in transforming the academically ‘naïve and inexperienced’ Mesut who joined the BGEL in January 2012 into the Dr. Erzurumluoğlu of today.

I am indebted to my parents Ayla and (Dr.) Bayram Erzurumluoğlu, and siblings (Esat, Hasna and Nuran) for the sacrifices they have made; my friends and housemates for putting up with me; and most of all, God Almighty (The Most Gracious, The Most Merciful) for giving me the ability, strength and will power to overcome obstacles and get through hard times in academic, economic and social life.

Finally, I am grateful to the Medical Research Council (MRC) for the scholarship they provided, as it enabled me to concentrate solely on my research and therefore directly contributed to the findings published in this thesis.

“Kulları içinde ancak âlimler, Allah’ı gerektiği tarzda tazim ederler.”
Kuran-ı Kerîm, Fâtir, 28

“Among His servants/creation, only the scholars (those who have knowledge) truly fear and honour God.”
Holy Qur’an, Fatir, 28

Table of Contents

Chapter 1. Introduction and Literature Review
  1.1. Organism to Genome to Genes
    1.1.1. What is a gene?
    1.1.2. Variation in a genome
  1.2. Epidemiology and Genetics
    1.2.1. Genetic Epidemiology
    1.2.2. Terminology
    1.2.3. Mendelian disorders
    1.2.4. Past and present hypotheses on Mendelian disorders
    1.2.5. Complex disorders
    1.2.6. Past and present hypotheses on Complex disorders
  1.3. Consanguinity and Genetic research
    1.3.1. Consanguineous societies and genetic disease
    1.3.2. Inbreeding depression in humans?
    1.3.3. Historical perspective
    1.3.4. Autozygosity
    1.3.5. World-wide Consanguinity
  1.4. Identifying the Genetic basis of human diseases
    1.4.1. Traditional methods
    1.4.2. Current methods
  1.5. Detecting DNA sequence variation
    1.5.1. Whole genome sequencing
    1.5.2. Whole exome sequencing
    1.5.3. Other methods
  1.6. DNA Sequencing technologies
    1.6.1. Historical background
    1.6.2. Next-generation sequencing
  1.7. Population-based genetic variation datasets: why collect them?
    1.7.1. Projects for mapping human genetic variation
    1.7.2. Clinical uses
    1.7.3. Bioinformatics uses
  1.8. Summary of Aims and Objectives
Chapter 2. Overview of methods
  2.1. Materials/samples
    Participants
    Blood samples