Genetic and Epigenetic Analysis of Type 2 Diabetes Among Qatari Families
Wadha Al Muftah, MD
Department of Genomics of Common Disease
School of Public Health
Faculty of Medicine
Imperial College London

Thesis submitted for the degree of Doctor of Philosophy

Copyright Declaration

‘The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives licence. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the licence terms of this work’.

Abstract

Type 2 diabetes (T2D) is a complex multifactorial disorder driven by both genetic and environmental factors. The rapid increase of T2D in Qatar (a prevalence of 16.3% in 2014 according to the International Diabetes Federation, IDF) motivated the introduction of genetic studies in this under-represented population. Major progress in understanding the genetic basis of T2D has come from studying common variants; however, these variants have modest effect sizes. Research has therefore shifted from the common-variant hypothesis towards the investigation of other genetic factors, including rare variants, copy number variants (CNVs) and epigenetic mechanisms. This PhD project focuses on identifying genomic risk factors for T2D in the Qatari population. Eight Qatari families were analysed using recent advances in genotyping and sequencing technologies. Three analyses were carried out: identification of known or novel rare variants within candidate T2D genes, identification of large CNVs potentially related to T2D, and detection of methylation associations with T2D.
Three data sets were generated for this PhD project: genome-wide genotyping arrays (Illumina HumanOmni2.5M BeadChip), whole genome sequencing (WGS; Illumina HiSeq 2500 platform) and genome-wide methylation arrays (Illumina 450K BeadChip, capturing more than 485,000 probes). I performed three analyses to address the aims of the project. First, linkage analysis and runs of homozygosity were combined with WGS to map rare variants. Second, the PennCNV prediction algorithm was applied for CNV calling, aiming to identify rare T2D-related CNVs. Third, an association test based on a linear mixed model was used to identify methylation associations with T2D.

The findings included three rare variants detected through WGS: one within a known monogenic T2D gene (GCKR) and two within known T2D genes (IGFBP2 and TGM2). These genes are functionally related to T2D, and replication analysis will be required to assess their contribution to T2D among Qataris. A number of large rare CNVs were detected in the CNV analysis, but no potential T2D genes were found within the candidate CNVs. Finally, the methylation analysis replicated a significant association with T2D at cg19693031 in TXNIP (p = 5.28 × 10⁻⁶) among Qataris. The identified candidate genes have been implicated in the pathogenesis of T2D through insulin resistance and beta-cell dysfunction.

Table of Contents

Copyright Declaration
Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Declaration of Originality
List of Publications
Abbreviation
1 Introduction
  1.1 Prevalence of Type 2 Diabetes
  1.2 Prevalence of Type 2 Diabetes in Qatar
  1.3 Gene-Environment Interaction in T2D
  1.4 Obesity risk in Developing Type 2 Diabetes
  1.5 Heritability of Type 2 Diabetes
  1.6 Genetic Basis of Type 2 Diabetes
    1.6.1 Monogenic Form of T2D
    1.6.2 Polygenic Form of T2D
  1.7 Gene Mapping Approaches to Identify T2D Genes
    1.7.1 Linkage Analysis
    1.7.2 Candidate Gene Studies
    1.7.3 Genome Wide Association Studies
  1.8 Missing Heritability of Complex Diseases
    1.8.1 Rare Variants
    1.8.2 Genomic Structural Variation (GSV)
    1.8.3 Epigenetic Modifications
    1.8.4 Parent-of-Origin
  1.9 Next Generation Sequencing (NGS)
  1.10 Copy Number Variation (CNV)
    1.10.1 Copy Number Variants: Mechanisms of formation
    1.10.2 Copy Number Variants prediction and Validation
    1.10.3 Role of Copy Number Variants in T2D Development
  1.11 Epigenetic of Type 2 Diabetes
    1.11.1 Epigenetic Mechanisms
      1.11.1.1 DNA methylation
      1.11.1.2 Histone Modifications
      1.11.1.3 RNA interference
      1.11.1.4 Environmental factors
    1.11.2 Heritability Estimates of DNA methylation
    1.11.3 Role of Epigenetics in Developing T2D
    1.11.4 Characterization of T2D Epigenetic Risk
  1.12 Genetic of Type 2 Diabetes in Qatar
  1.13 Aims and Objectives
    1.13.1 Overall Aim
    1.13.2 Objectives
2 Material and Methods
  2.1 Ethical Approval
  2.2 Study Subjects
    2.2.1 Qatari Families with Early Onset of Type 2 Diabetes
    2.2.2 Qatari Families with Type 2 Diabetes and Obesity
    2.2.3 Replication Samples
  2.3 DNA extraction
  2.4 Sample Preparation
  2.5 Whole Genome Genotyping Arrays
  2.6 Prediction of Copy Number Variants associated with T2D
    2.6.1 Generation of B allele frequency and Log R ratio using signal intensity files
    2.6.2 PennCNV algorithm for CNV prediction
    2.6.3 Filtration of Samples with low Quality Calls
    2.6.4 Merging adjacent CNV calls
    2.6.5 Filtering Large and Rare CNVs
    2.6.6 Exclude false CNV calls
  2.7 Assessment of Candidate CNVs
    2.7.1 Identify rare CNVs
    2.7.2 Visualization of candidate CNVs
    2.7.3 Replicate CNVs in other dataset
  2.8 Identification of Rare T2D Variants
    2.8.1 Linkage Analysis
      2.8.1.1 GenomeStudio
      2.8.1.2 SNP array Quality Control
        2.8.1.2.1 PLINK
        2.8.1.2.2 MERLIN
      2.8.1.3 Marker Selection
        2.8.1.3.1 Informativeness
        2.8.1.3.2 LD Pruning
      2.8.1.4 Running Linkage Analysis using MERLIN
      2.8.1.5 Runs of Homozygosity (ROH)
    2.8.2 Whole Genome Sequencing (WGS)
      2.8.2.1 Quality control measures for WGS
        2.8.2.1.1 QC of raw data using FASTQC
        2.8.2.1.2 QC of aligned reads using QPLOT
      2.8.2.2 Generation of variant calls using GATK best practices pipeline
      2.8.2.3 Variants filtration and identification of rare variants using Qiagen Ingenuity Variant Analysis (IVA)
      2.8.2.4 IVA filtration options used at different WGS analysis stages
        2.8.2.4.1 Confidence
        2.8.2.4.2 Frequency
        2.8.2.4.3 Predicted Deleterious
        2.8.2.4.4 Biological Context
        2.8.2.4.5 Genetic Analysis
      2.8.2.5 Validation of Candidate genes using PCR-based method