Whole-Exome Capture and Next-Generation Sequencing to Discover Rare Variants Predisposing to Congenital Heart Disease

Whole-Exome Capture and Next-Generation Sequencing to discover rare variants predisposing to congenital heart disease Matthieu J. Miossec For the degree of Doctor of Philosophy Newcastle University Faculty of Medical Sciences Institute of Genetic Medicine December 2015 Abstract Congenital heart disease (CHD) is the most common congenital malformation, affecting 8 out of 1000 lives births, yet its aetiology remains largely unresolved. The rapidly growing number of point mutations implicated in isolated CHD suggests that single mutations may contribute significantly to CHD risk. This thesis presents an investigation of the genetic underpinnings of various types of CHD following different study designs. First, I designed a new approach to variant calling which I implemented as the variant caller BAMily. My aim was to develop a method of uncovering putative variants in next- generation sequencing data, shared by a subset of individuals and absent in another subset. I tested the variant caller’s performance against other known variant callers and demonstrated that it provides comparable; and often better, results. This novel variant caller was applied to a study of 8 families in which a disease trait was segregating; along with the variant caller SAMtools, leading to the discovery of likely disease-causing variants in 5 families. Second, I studied de novo mutation in 32 sporadic cases of transposition of the great arteries (TGA) in an attempt to identify genes that, when mutated, lead to TGA. The 32 patients with TGA were sequenced with their parents; as well as one unaffected sibling. To achieve this aim, three variant callers were used: SAMtools, GATK Unified Genotyper and BAMily, the latter acting as a filter. Potential de novo variants were found in GREB1, RBP5, SNX13. Results suggested a complex genetic etiology underlying TGA. Finally, I studied a large series of cases of tetralogy of Fallot (ToF). The study involved 824 patients which ToF and a comparator set of 490 patients with neurodevelopmental disorders lifted from the UK10K project. The aim of the study was to identify genes that, when mutated, play a role in the manifestation of ToF might cluster. For this, I first categorised variants according to their potential to disrupt protein function. I then compared genes in which potentially disease-causing rare variants occurred to lists of genes previously implicated in CHD in the literature. Following this, I identified the clustering of potentially deleterious rare variants across the coding region of genes and exons in ToF patients, hypothesising that variants influencing ToF would cluster in ToF patients. This study led to the discovery of candidate variants in FLT4 and NOTCH1 for non-syndromic ToF. As with TGA, the results I have obtained suggested a complex etiology for ToF. i Acknowledgments First and foremost, I would like to thank my supervisors Prof. Bernard Keavney, Dr. Mauro Santibáñez-Koref and Prof. Judith Goodship, for their excellent supervision. I consider myself fortunate to have had such dedicated supervisors. I would also like to thank Prof. Helen Arthur and Dr. Joana Elson for diligently reviewing my progress over the course of four years. A special thanks goes to all those with whom I have had the pleasure of sharing an office, Prof. Heather Cordell, Dr. Ian Wilson, Holly Ainsworth, Kristin Ayers, Rebecca Darlay, Marla Endriga, Jakris ‘Nat’ Eu-Ahsunthornwattna, Osagie Ginikachukwu Izuogu, Helen Griffin, Darren Houniet, Richard Howey, Mikyung Jang, Michael Keogh, Valentina Mamasoula, Ali Qatan, Rachel Queen, So-Youn Shin, Wei Wei, Yaobo Xu. Thank you Darren for taking the time to show me the ropes in my first year. I would also like to thank the IT team; Arron Scott and Bryan Hepworth, for their timely responses to my many queries. Over the course of four years, I have worked with a lot of talented people: Dr. Danielle Brown, Dr. Elise Glen, Dr. Darroch Hall, Mr. Rafiqul ‘Raf’ Hussain, Dr. Ruairidh Martin, Mr. Mzwandile Mbele, Thahira Rhaman, Dr. Louise Sutcliffe, Dr. Gennadiy Tenin, Dr. Ana Topf and Mrs. Helen Weatherstone. I have enjoyed working with each of you. Elise, Ana and Thahira have helped me tremendously over the years. Raf has always been quick to respond to questions and is an absolute pleasure to talk to. I have also had the privilege to supervise a great Masters student, Wirginia Bada, for which I am grateful. I would also like to acknowledge my dear friend Laurence Dawson for encouraging me to pursue a doctorate. Thank you to my family, my parents Trisha and Michel, without whom none of this would have been possible. Finally, thank you to my beautiful and talented partner, Sandra Morales, for her endless support and encouragement. You have been by my side, through the good days and the bad days, and for this I am eternally grateful. I love you. ii Statement of contribution Unless specified otherwise, the work presented in this thesis is my own. My focus in this thesis was the development of a novel bioinformatic tool for the analysis of next-generation sequencing data, and the analysis and interpretation of familial, trio and case-control datasets in congenital heart disease. Other aspects of the work were performed as outlined below. Prof. Judith Goodship, Prof. Bernard Keavney and their collaborators; Prof. Bongani Mayosi, Prof. Paul Brennan and Dr. Andrew Harper, provided the samples and clinical information for the 8 familial cases studied in Chapter 3 and for the 32 families studied in Chapter 4. The ToF samples analysed in Chapter 5 were collected from 9 different locations across Europe and Australia as part of collaborative study coordinated by Prof. Bernard Keavney, Prof. Judith Goodship and Prof. Mark Lathrop; based at McGill University and Génome Québec Innovation Centre. Sequencing performed at the Institute of Genetic Medicine in Newcastle was done under the supervision of Mr. Rafiqul Hussein, with sequencing data delivered by Bryan Hepworth. Sequencing for Chapter 4 and Chapter 5 was also performed by collaborators at Glasgow Polyomics, the McGill University and Génome Québec Innovation Centre, and The University of Manchester. Microarray and sequencing data for 21 parent- offspring trios used to test my variant caller in Chapter 3 were provided by Dr. Matthew Hurles. Variant validations were performed by Dr. Elise Glen, Dr. Thahira Rahman, Dr. Ana Topf, Dr. Gennadiy Tenin and Dr. Louise Sutcliffe. The method used to measure variant calling sensitivity and specificity in Chapter 5 was developed by Dr. Darren Houniet. iii Table of Contents Abstracts ............................................................................................................................... i Acknowledgments................................................................................................................ ii Statement of contribution ....................................................................................................iii List of figures ..................................................................................................................... vii List of tables ........................................................................................................................ ix Abbreviations ..................................................................................................................... xii Chapter 1. Introduction ............................................................................................................... 1 1.1. Summary ...................................................................................................................... 2 1.2. Congenital Heart Disease Overview ............................................................................. 2 1.2.1. Definition and incidence .......................................................................................... 2 1.2.2. Evolution of treatment ............................................................................................. 9 1.2.3. Genetic contribution to CHD .................................................................................. 10 1.3. Whole-Exome Sequencing (WES) Studies Of Disease ................................................ 17 1.3.1. Sequencing technologies ....................................................................................... 17 1.3.2. Exome capture via target-enrichment ................................................................... 19 1.3.3. Exome-sequencing study tools .............................................................................. 21 1.3.4. Exome-sequencing study designs .......................................................................... 27 1.4. Conclusion .................................................................................................................. 30 Chapter 2. Methods ................................................................................................................... 31 2.1. Overview .................................................................................................................... 32 2.1.1. Genome sequencing and exome capture .............................................................. 32 2.1.2. Sequencing and capture ........................................................................................ 32 2.1.3. Quality control ....................................................................................................... 33 2.2. Data analysis .............................................................................................................

Whole-Exome Capture and Next-Generation Sequencing to Discover Rare Variants Predisposing to Congenital Heart Disease

A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus

Identification of Transcriptomic Differences Between Lower

Mechanisms Underlying Phenotypic Heterogeneity in Simplex Autism Spectrum Disorders

Nº Ref Uniprot Proteína Péptidos Identificados Por MS/MS 1 P01024

Based on Network Pharmacology and RNA Sequencing Techniques to Explore the Molecular Mechanism of Huatan Jiangzhuo Decoction for Treating Hyperlipidemia

Longitudinal Exome-Wide Association Study to Identify Genetic Susceptibility Loci for Hypertension in a Japanese Population

Homozygous Deletions Implicate Non-Coding Epigenetic Marks In

Supplementary Table 1 Double Treatment Vs Single Treatment

Single Nucleotide Polymorphisms Associated with Alcohol- Induced Flushing Syndrome in Korean Population

Expression Profile and Bioinformatics Analysis of Circular Rnas in Acute

Discovering Pathways in Benign Prostate Hyperplasia: a Functional Genomics Pilot Study

The Differential Expression of Long Noncoding Rnas in Type 2 Diabetes Mellitus and Latent Autoimmune Diabetes in Adults