Dissecting the Genetic Architecture of Cardiac Disorders Through the Use Of
Total Page:16
File Type:pdf, Size:1020Kb
Dissecting the genetic architecture of cardiac disorders through the use of High Throughput Sequencing. Author: Cian Murphy Supervisors: Dr. Vincent Plagnol Dr. Pier Lambiase UCL Genetics Institute Department of Genetics, Evolution and Environment October 9, 2016 I, Cian Murphy confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis. i Abstract The overriding goal of this thesis was to further refine our understanding of the genetic architecture of car- diomyopathies, Arrhythmogenic Right Ventricular Cardiomyopathy (ARVC) and Hypertrophic Cardiomy- opathy (HCM). 407 patients with ARVC and 957 with HCM had 41 cardiomyopathy and other putative candidate genes sequenced. By comparing these cohorts against each other and against ethnicity and phe- notype matched controls, insights were gained into the role of different types of genetic variants in these conditions. This in part involved utilising 4500 Whole Exome Sequences (WES) that are part of the UCL- exomes consortium, an in-house dataset that aggregates a diverse set of studies. High throughput DNA sequencing technologies, WES or Whole Genome Sequencing (WGS) are revolutionizing the diagnosis and novel gene discovery for rare disorders. As the field transitions from the early discovery for Mendelian and near Mendelian diseases to more complex and oligo-genic diseases, there is substantial benefit in being able to combine data across studies, performing the type of meta-analysis for cases and controls that have proven to be so successful for Genome-Wide Association Studies (GWAS). However, WGS and WES are substantially more affected by sequencing errors and technical artefacts than genome-wide genotyping arrays. As a consequence, meta-analysis of sequence based association studies are often dominated by spurious associations, which result in technical limitations. Here, we show that it is possible to take advantage of the type of mixed models developed initially to control for population structure in GWAS studies, and apply these ideas to control for technical artefacts. In an attempt to ascertain the role of CNVs in HCM, these data were examined for the presence i of rare causative CNVs. 12 CNVs were identified from an initial Read Depth approach. 4 of these were subsequently validated by CoNIFER, a bioinformatics method, and Array Comparative Genomic Hybridis- ation (aCGH): one large deletion in MYBPC3, one large deletion in PDLIM3, one duplication of the entire TNNT2 gene and one large duplication in LMNA. These results show that the role of CNVs in HCM is small and highlight the efficiency of this two step-strategy. ii Acknowledgements Firstly, I would like to thank my primary supervisor, Dr. Vincent Plagnol. He has provided support from before day one and throughout on statistics to programming and how to do everything else. Apart from technical knowledge, the most lasting thing I feel I have learned is a never ending scepticism of the data, regardless of the pvalue. My secondary supervisor, Dr. Pier Lambiase offered excellent help on the clinical interpretation of the data alongside general life advice. Dr. Doug Speed of LDAK fame was an immense help in the latter stages of the PhD. I learned a lot more about statistics than I thought I could because of you. Warren, Chris, Lucy, Kitty, Shush, Elvira, Jon, Gareth, Valentina and Julie were some of the best group members I could ask for. From bug catching to the 3.30 coffee routine, you all made it fun to come into UGI. I'll miss you. I want to thank Daniel, my housemate and fellow PhD student. It was always useful to be able to compare PhDs and brainstorm over dinner. And a hefty amount of tennis and bomberman helped keep me somewhat sane. The tail end of my PhD was one of the hardest periods of my life. Writing up made me realise how much more I had to and combined with starting medical school resulted in a pretty serious coffee addiction. Noor, thank you so much for putting up with me. Too much help to list, but your humour, patience and food definitely deserve mentions. Best proof reader ever. I don't need a PhD in genetics to know that I couldn't have gotten here without my parents. Tirelessly supportive, I love you and thank you for giving me the freedom and encouragement to do what iii I wanted. Cillian, thank you for helping me with photoshop; there are somethings that even R cannot do it would seem and you were always willing to help me tweak figures until I realised what I wanted. Oisin, you're one of the smartest people I know and as my older brother you were always good at teaching me. Whether or not I wanted to learn. It was an honour to get to work somewhere the calibre of UCL. Being surrounded by the best researchers I have ever met is the most stimulating environment I could ask for. Lastly, thank you to the British Heart Foundation for being such a good supporter. iv Publications arising from this thesis 1. Panagiotis I Sergouniotis, Christina Chakarova, Cian Murphy, Mirjana Becker, Eva Lenassi, Gavin Arno, Monkol Lek, Daniel G Macarthur, Shomi S Bhattacharya, Anthony T Moore, Graham E Holder, An- thony G Robson, Uwe Wolfrum, Andrew R Webster, and Vincent Plagnol. AJHG The American Journal of Human Genetics Biallelic variants in TTLL5 , encoding a tubulin glutamylase , cause retinal dystrophy. American journal of human genetics, 94(5):760769, 2014. 2. Laurence M. Nunn, Luis R. Lopes, Petros Syrris, Cian Murphy, Vincent Plagnol, Eileen Firman, Chrysoula Dalageorgou, Esther Zorio, Diana Domingo, Victoria Murday, Iain Findlay, Alexis Duncan, Gerry Carr-White, Leema Robert, Tela Bueser, Caroline Langman, Simon P Fynn, Martin Goddard, Anne White, Henning Bundgaard, Laura Ferrero-Miliani, Nigel Wheeldon, Simon K. Suvarna, Aliceson O'Beirne, Martin D. Lowe, William J. McKenna, Perry M. Elliott, and Pier D. Lambiase. Diagnostic yield of molecular autopsy in patients with sudden arrhythmic death syndrome using targeted exome sequencing. Europace, page euv285, 2015. ISSN 1099-5129. doi: 10.1093/europace/euv285. U 3. L.R. Lopes, C. Murphy, P. Syrris, C. Dalageorgou, W.J. McKenna, P.M. Elliott, and V. Plagnol. Use of High-throughput Targeted Exome-Sequencing to screen for Copy Number Variation in Hypertrophic Cardiomyopathy. European Journal of Medical Genetics, pages 16, 2015. ISSN 17697212. doi: 10.1016/j.ejmg.2015.10.001. v Acronyms MYBPC3 Myosin Binding Protein Cardiac 3. 6, 49, 51 RPGR Retinitis Pigmentosa GTPase Regulator. 66 TTLL5 Tubulin Tyrosine Ligase-Like family member 5. 66 1KG 1000 Genome Project. 73 aCGH Array Comparative Genome Hybridisation. 1, 40, 41, 43, 45, 46, 51{62 ArtQ Artefact. 14, 15 ARVC Arrhythmogenic Right Ventricular Cardiomyopathy. 1{3, 6, 7, 19, 21, 24, 28{31, 34, 35, 37, 40, 41, 66, 105, 109, 110, 115 CA Cochran Armitage. 12 CBS Circular Binary Segmentation. 43, 51{62 CGH Comparative Genome Hybridisation. 43 CNV Copy Number Variant. 1, 3, 19, 40, 42, 45, 49, 63, 77 CR Cryptic Relatedness. 65, 77 CV Corrected Variant. 95 vi DCM Dilated Cardiomyopathy. 41, 64 ECG Electrocardiogram. 41 EMMA Efficient Mixed Model Association. 17 ExAC Exome Aggregation Consortium. 23 FFPE Formalin Fixed Paraffin Embedded. 36 FPL2 Familial Partial Lipodystrophy 2. 64 GATK Genome Analysis Tool Kit https://www.broadinstitute.org/gatk. 17, 18 GC Genomic Control. 8, 12, 13 GIF Genomic Inflation Factor. 98 GWAS Genome Wide Association Study. 2, 16, 81, 126 HCM Hypertrophic Cardiomyopathy. 1, 6, 7, 19, 21, 24, 28{31, 34, 35, 37, 40{42, 63, 64, 66, 109, 110 HLA Human Leukocyte Antigen. 8 HMM Hidden Markov Model. 45 HTS High Throughput Sequencing. 1, 10, 11, 14, 19, 65, 66, 71, 77, 92 HWE Hardy-Weinberg Equilibrium. 81 IBD Inflammatory Bowel Disease. 99, 100 INDEL Insertions-Deletion. 11, 18, 66, 108, 110 LMM Linear Mixed Model. 15, 16, 66, 78, 79, 82, 112 LOF Loss of Function - frameshift and stop-gain or stop loss. 29, 40, 66, 68 vii MAF Minor Alelle Frequency. 2, 3, 22, 23, 29, 38, 40, 81, 85, 86, 98, 105 MHC Myosin Heavy Chain. 6 MLPA Multiplex Ligation-Dependent Probe Amplification. 63 PC Principal Component. 10, 65, 67, 73, 77{79, 82, 88, 91, 92, 95, 102, 107, 110 PCA Principal Component Analysis. 9, 10, 43, 65{69, 73, 88, 91, 92, 102, 107, 108, 110{112 PCR Polymerase Chain Reaction. 77 PID Primary ImmunoDeficiency. 89 PS Population Stratification. 8, 10, 15, 65, 73, 77, 82, 85, 99, 117, 121, 126 QTL Quantitative Trait Loci. 82 RD Read Depth. 1, 4, 37, 40, 42, 49, 63, 66, 77, 78, 81, 82, 92{94, 111, 112 REML Restricted Maximum Likelihood. 82 RPKM Reads per kilobase per milllion. 43 RRM Realized Relationship Matrix. 16 SADS Sudden Arrhythmic Death Syndrome. 6, 22, 127 SCD Sudden Cardiac Death. 2, 5{7, 19, 21, 28, 36, 37, 84, 114 SKAT Sequence Kernel Association Test. 27, 29 SKAT-O Sequence Kernel Association Test Optimised. 27 SNP Single Nucleotide Polymorphism. 3, 4, 16, 18, 19, 41, 63, 66, 72, 73, 79, 98, 108, 110, 111, 115, 120 SVD Singular Value Decomposition. 43 viii TK Technical Kinship. 66, 81 UCL-ex UCL Exome Consortium. 1, 17{21, 36, 37, 66, 67, 69, 71{75, 79, 81, 83, 84, 87, 91{95, 98{100, 103, 104, 106, 107, 109, 111{114, 116, 118, 125 VCE Variance Component Estimation. 82 VQSR Variant Quality Score Recalibration. 18 WES Whole Exome Sequencing. 2, 4, 12, 125, 126 WGS Whole Genome Sequencing. 2, 4 ix List of Figures 1.1 Overview of various DNA capture methods . .5 1.2 Comparison of a normal heart to one with Hypertrophic Cardiomyopathy .