Next-Generation Sequencing of Centenarians to Identify Genetic Variants
Next-generation sequencing of centenarians to identify genetic variants predisposing to human longevity

Dissertation
In fulfillment of the requirements for the degree “Dr. rer. nat.” of the Faculty of Mathematics and Natural Sciences at Christian-Albrechts-Universität zu Kiel

Submitted by Nandini Badarinarayan
Research Group for Healthy Ageing
Institute of Clinical Molecular Biology
Kiel, July 2014

First referee: Prof. Dr. Tal Dagan
Second referee: Prof. Dr. Almut Nebel
Date of oral examination: 24th October 2014
Signed: Prof. Dr. Wolfgang Duschl (Dean)

“Always keep your smile. That's how I explain my long life.” - Jeanne Calment (who had the longest confirmed human lifespan in history, living to the age of 122 years, 164 days)

For my family

Table of Contents

List of figures
List of tables
List of abbreviations
1 Introduction
1.1 Longevity phenotype
1.2 Genetic epidemiology of human longevity
1.3 Genetic influences on longevity
1.3.1 Findings in model organisms
1.3.2 Findings in humans
1.4 Next-generation sequencing to detect variants associated with human longevity
1.5 Research objectives
2 Materials
2.1 Enzymes, kits and instruments
2.2 Online databases and software
3 Methods
3.1 Sequencing
3.1.1 Study participants
3.1.2 SOLiD technology
3.1.3 Illumina technology
3.2 Mapping and variant calling
3.3 Selection of variants
3.3.1 Method 1: SNVs that may have functional impact
3.3.2 Method 2: Low-frequency variants with functional impact
3.4 Genotyping and replication
3.4.1 Sequenom technology
3.4.2 TaqMan technology
3.4.3 Study population
3.4.4 Statistical analysis
4 Results
4.1 Mapping, coverage and variant calling
4.1.1 SOLiD technology
4.1.2 Illumina technology
4.2 Method 1: SNVs that may have a functional impact
4.2.1 Selection of variants
4.2.2 Analysis of selected SNVs
4.3 Method 2: Low-frequency variants with functional impact
4.3.1 Selection of variants
4.3.2 Analysis of selected SNVs
5 Discussion
5.1 Coverage and variant calling performance for SOLiD and Illumina sequencing data
5.2 Challenges of next-generation sequencing
5.3 Methods implemented for selection of variants
5.4 Potential influencing factors for association studies
5.4.1 Population stratification
5.4.2 Population-specificity
5.4.3 Phenotype heterogeneity
5.4.4 Sample sizes and statistical power
5.5 Summary of findings
5.5.1 Study findings
5.5.2 Genetic profiles of centenarians
5.6 Conclusion and outlook
6 Summary
7 Zusammenfassung
8 References
9 Declaration
10 Curriculum vitae
11 Acknowledgements
12 Supplementary material

List of figures

1-1: Number of living supercentenarians
1-2: Disease and longevity variants
1-3: Lifespan regulation in C. elegans
3-1: Emulsion PCR for SOLiD™ sequencing
3-2: Colour-space reference for SOLiD™ sequencing
3-3: Bridge amplification for Illumina sequencing
3-4: Individuals sequenced on SOLiD and Illumina technologies for Method 1
3-5: Individuals sequenced on SOLiD and Illumina technologies for Method 2
4-1: Genotype concordance for SOLiD sequencing data
4-2: Genotype concordance for Illumina sequencing data
4-3: Genotype concordance for Illumina exome sequencing data
4-4: Variant calling metrics for four exomes generated by CRG, Spain
4-5: Selection of variants for Method 1
4-6: Intersection of exonic variants between SOLiD and Illumina technology
4-7: Selection of variants for Method 2
4-8: Power and sample size calculation
5-1: Coverage plot for three samples sequenced with SOLiD technology
5-2: Coverage plot for four samples sequenced with Illumina technology
5-3: Coverage plot for six exomes sequenced with Illumina technology
12-1: SOLiD™ technology sequencing schema
12-2: Illumina technology sequencing schema