University of Groningen

Matters of the heart: genetic and molecular characterisation of cardiomyopathies Posafalvi, Anna

Document version: Publisher's PDF, also known as Version of record

Publication date: 2015


Citation for published version (APA): Posafalvi, A. (2015). Matters of the heart: genetic and molecular characterisation of cardiomyopathies. University of Groningen.




Matters of the heart: genetic and molecular characterisation of cardiomyopathies

Pósafalvi Anna

The work described in this thesis was supported by the University Medical Center Groningen, the Jan Kornelis de Cock Foundation, the NutsOhra Foundation and the Netherlands Heart Foundation. Printing of this thesis was supported by the Graduate School of Medical Sciences and the University Library, University of Groningen, Groningen, the Netherlands.

Copyright: © 2015 by Anna Posafalvi. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without prior written permission of the author and the publishers holding the copyrights of the published articles.

Cover photography: © Anna Posafalvi

Design: dreamed of by Anna Posafalvi, dreams made come true by Joanna Smolonska

Layout and printing: Lovebird Design & Printing Solutions

ISBN: 978-90-367-7767-4 (printed)
ISBN: 978-90-367-7766-7 (electronic)

Matters of the heart: genetic and molecular characterisation of cardiomyopathies

PhD thesis

to obtain the degree of PhD at the University of Groningen on the authority of the Rector Magnificus Prof. E. Sterken and in accordance with the decision by the College of Deans.

This thesis will be defended in public on

Monday 20 April 2015 at 16:15 hours

by

Anna Posafalvi

born on 17 June 1986 in Debrecen, Hungary

Supervisor
Prof. RJ Sinke

Co-supervisor
Dr. JDH Jongbloed

Assessment committee
Prof. MH Breuning
Prof. RA de Boer
Prof. RMW Hofstra

Drága Nagyapámnak...
To my dearest grandfather...

“Wheresoever you go, go with all your heart.” (Confucius)

Paranymphs
Ena Sokol
Eva Teuling

TABLE OF CONTENTS

Preface: …about cardiomyopathies in a nutshell
Outline of this thesis
Appendix 1
List of cardiomyopathy genes
Frequently used abbreviations

Chapter 1: Introduction
New clinical molecular diagnostic methods for congenital and inherited heart disease (Expert Opin Med Diagn 2011, review)

Chapter 2: Candidate screening
2.1: Mutational characterisation of RBM20 in dilated cardiomyopathy and other cardiomyopathy subtypes
2.2: Missense variants in the rod domain of plectin increase susceptibility to arrhythmogenic right ventricular cardiomyopathy

Chapter 3: Exome sequencing
3.1: Hunting for novel disease genes in autosomal dominant cardiomyopathies: elucidating a role for the sarcomeric pathway
3.2: Homozygous SOD2 mutation as a cause of lethal neonatal dilated cardiomyopathy
3.3: One family, two cardiomyopathy subtypes, three disease genes: an intriguing case

Chapter 4: Targeted sequencing
4.1: Gene-panel based Next Generation Sequencing (NGS) substantially improves clinical genetic diagnostics in inherited cardiomyopathies
4.2: Titin gene mutations are common in families with both peripartum cardiomyopathy and dilated cardiomyopathy (Eur Heart J 2014)

Chapter 5: Discussion and future perspectives

Summary
Appendix 2
List of authors and affiliations
About the author
Acknowledgements

PREFACE

…about cardiomyopathies in a nutshell

The disease

Cardiomyopathy is an insidious disease of the heart muscle (myocardium) leading to decreased pumping capacity, and resulting in a wide range of symptoms. These range from mild (dizziness, fatigue, chest pain or oedema) to severe (heart failure, arrhythmia, embolism, or even sudden death).

Figure 1. Schematic cross-section of a healthy heart (a) and hearts with DCM (b), HCM (c) and ARVC (d)
In dilated cardiomyopathy (DCM), the left ventricle becomes enlarged with a thin, weakened muscle wall, and is unable to generate enough pumping force during contractions; in hypertrophic cardiomyopathy (HCM), the myocardium is thickened; in arrhythmogenic right ventricular cardiomyopathy (ARVC), fibrofatty infiltration of the myocardium leads to arrhythmia. Figure published by Wilde & Behr, Nature Reviews Cardiology, 2013; used with permission. For more information on cardiomyopathy types, see box 1.

Clinical diagnosis and treatment guidelines

When the symptoms of cardiomyopathy appear, the diagnosis is most frequently made by electrocardiogram (ECG) and non-invasive imaging techniques such as an X-ray of the chest, echocardiography (imaging of the heart with ultrasound), or MRI. In addition, patients receive a general medical examination combined with a simple blood test (measuring, for instance, molecular markers of heart failure or kidney function). Less regularly, cardiac catheterization or coronary angiography is used. Both these methods are “minimally invasive”: instead of open surgery, only a thin tube is inserted in one of the biggest veins of the body and threaded to the heart. They help physicians acquire either a myocardial biopsy for further experimental analysis or enough information to exclude potential blockage (stenosis) of the heart and the coronary blood vessels.

The therapy for cardiomyopathy largely depends on the disease type and the severity of the symptoms. Therapy aims at slowing down the progression of the disease or at disease prevention in susceptible individuals through life style changes and medical treatment using different antihypertensive, antiarrhythmic, diuretic or anticoagulant drugs (e.g., ACE inhibitors or calcium antagonists). In serious cases of arrhythmia, the implantation of an ICD (a small, implantable defibrillator) or a pacemaker may be the solution. Heart transplantation is only considered as a last resort in patients with end-stage heart failure.

The genetic causes

Even though there are several environmental factors that may trigger the onset of cardiomyopathy (viral infections, the use of certain drugs, alcoholism, and other cardiovascular conditions, as well as certain systemic disorders), we often see the disease running in families (30-50% of ARVC and DCM cases; see box 1 for definitions). Most of these familial cardiomyopathy cases can be explained by an autosomal dominant (AD) inheritance pattern. To date, about 76 genes are known to be involved in different types of cardiomyopathy, which often show considerable genetic overlap (figure 2), and the majority of these 76 genes show AD inheritance. Additionally, a few genes, such as DMD, EMD, GLA, LAMP2, or TAZ, are involved in the X-linked form of the disease. Exceptionally, autosomal recessive inheritance is also observed. These patients usually exhibit more severe symptoms, and the disease generally begins in infancy or early childhood (paediatric cardiomyopathies; the genes involved include ANO5, MYL2, PKP2, TNNI3).

Although it is also known that abnormalities in mitochondrial DNA can contribute to the pathogenesis of different cardiomyopathies (e.g., mutations of MTTL1), this has not yet been extensively studied. The possible complex, oligogenic or multifactorial causes for cardiomyopathies have also not been investigated in detail, nor have the potential roles of risk alleles of lower effect size, copy number variations (such as those including the BAG3 or PRDM16 genes), or microRNAs. To date, a significant proportion of familial cardiomyopathies (about 30-40% of HCM, 40-50% of ARVC, and around 50% of DCM cases; see box 1 for definitions) remain genetically unexplained.

Figure 2. Cardiomyopathy disease genes and the genetic overlap between subtypes of the disease (updated from Jongbloed et al, EOMD 2011)
Not only is there considerable phenotypic overlap between the subtypes of cardiomyopathy, but many genes are also involved in multiple forms of the disease. The official full names of the abbreviated genes, according to OMIM, are listed in appendix 1.

Types of cardiomyopathy

There are various forms of cardiomyopathy, each with different underlying causes for the insufficient circulation. The cardiomyopathies investigated in this thesis include:

1. dilated cardiomyopathy (DCM): one or both of the ventricles (in most cases only the left one) become enlarged with a thin, weakened muscle wall unable to generate enough pumping force during contractions (figure 1)
2. arrhythmogenic right ventricular cardiomyopathy (ARVC): the replacement of the degenerating myocardium with scar (fibrofatty) tissue results in disturbed electrical signals and conduction in the heart (arrhythmia)
3. hypertrophic cardiomyopathy (HCM): a thickened myocardium due to abnormal growth and arrangement (hypertrophy and disarray) of muscle fibres results in smaller chamber volume and sometimes blocks the blood flow (obstruction)
4. restrictive cardiomyopathy (RCM): due to their stiffness, the ventricles do not get refilled with enough blood during relaxation, hence the heart cannot supply the organs with sufficient circulation during contraction
5. left-ventricular non-compaction cardiomyopathy (LVNC): the wall of the left ventricle is spongiform, characterized by a meshwork of muscle fibres
6. peripartum cardiomyopathy (PPCM): a special form of dilated cardiomyopathy that becomes manifest towards the end of pregnancy or within a few months following delivery
7. paediatric cardiomyopathy: this type of cardiomyopathy becomes manifest in infancy or early childhood, and is usually characterized by more severe symptoms and worse outcomes than when the disease manifests in adulthood (from a structural-functional point of view, most frequently it is DCM>HCM>RCM>ARVC)

Our methods

Candidate gene screening
• Sanger-sequencing: This method of DNA-sequencing allows us to detect single nucleotide changes and small indels of DNA fragments with an average size of 400-500 base pairs. It can be used for screening candidate genes in a large cohort of patients, as well as for segregation analysis of a variant within a family, or for confirmation of DNA-variations detected by high-throughput sequencing.

Disease gene mapping
• haplotype sharing test (HST): An ideal, SNP-genotyping-based method for small cardiomyopathy families, who are usually not suitable for classical linkage analysis. With this method, we aim to identify chromosomal regions shared among affected family members, hypothesizing that the highest chance of finding the mutation is in the largest shared region of the family. We use this method as a filtering step in exome sequencing data analysis – if a variant is located in the 2nd largest shared haplotype of 10 cM, it is more likely to be causative than a variant located in the 57th largest shared haplotype of only 0.1 cM.

High-throughput sequencing
• exome sequencing: Sequencing all coding parts (exons) of all genes (about 1% of the genome). Though costly and requiring intensive data analysis, this method is suitable for identifying private coding mutations of novel disease genes in families with an unknown genetic cause of cardiomyopathy.
• gene-panel based (targeted) sequencing: High-throughput sequencing of a DNA sample previously enriched for the small set of genes we are interested in. Since this method results in very high coverage across the regions of interest and high data quality, it has recently been implemented in routine diagnostics.
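The use of the haplotype sharing test as a filtering step can be illustrated with a minimal Python sketch. It is not the pipeline used in this thesis: the record types `SharedHaplotype` and `Variant`, the gene labels and all coordinates are hypothetical, and the decision to discard variants outside every shared region is an assumption made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedHaplotype:
    """A segment shared by all affected relatives (hypothetical record)."""
    chrom: str
    start: int        # base-pair position, inclusive
    end: int          # base-pair position, inclusive
    length_cm: float  # genetic length in centimorgans


@dataclass(frozen=True)
class Variant:
    """A candidate variant from exome sequencing (hypothetical record)."""
    chrom: str
    pos: int
    gene: str


def rank_haplotypes(haplotypes):
    """Sort shared haplotypes from longest to shortest genetic length."""
    return sorted(haplotypes, key=lambda h: h.length_cm, reverse=True)


def haplotype_rank(variant, ranked):
    """Return the 1-based rank of the shared haplotype holding the variant, or None."""
    for rank, hap in enumerate(ranked, start=1):
        if hap.chrom == variant.chrom and hap.start <= variant.pos <= hap.end:
            return rank
    return None


def prioritise(variants, haplotypes):
    """Order variants by haplotype rank; variants outside all shared regions are dropped."""
    ranked = rank_haplotypes(haplotypes)
    scored = [(haplotype_rank(v, ranked), v) for v in variants]
    inside = [(r, v) for r, v in scored if r is not None]
    return [v for _, v in sorted(inside, key=lambda rv: rv[0])]


# Toy data: three shared regions and three candidate variants.
haplotypes = [
    SharedHaplotype("chr2", 1_000_000, 9_000_000, 10.0),
    SharedHaplotype("chr7", 5_000_000, 5_100_000, 0.1),
    SharedHaplotype("chr10", 2_000_000, 20_000_000, 18.5),
]
variants = [
    Variant("chr7", 5_050_000, "GENE_A"),
    Variant("chr2", 4_200_000, "GENE_B"),
    Variant("chr1", 123_456, "GENE_C"),
]
top = prioritise(variants, haplotypes)
```

In this toy family the variant in the 10 cM region is listed before the one in the 0.1 cM region, and the variant outside every shared region is removed. A real analysis would more likely keep the rank as one line of evidence among several rather than as a hard cut-off.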

The challenges we face

Identifying a novel disease gene carrying the heterozygous causal variant (heterozygous because of the dominant inheritance) is usually more challenging than working on a recessive disease, but there are also other complications to be considered in our research. Cardiomyopathy is, in general, a late onset disease. For example, DCM usually begins between 20 and 50 years of age, while most ARVC patients are diagnosed before 40 years of age. Thus, low penetrance of the disease at a young age makes it difficult to make the genetic diagnosis in a family, as the disease status of young relatives is uncertain (partly due to the variety in the nature and severity of the symptoms). Furthermore, phenocopies also occur, with family members having comparable symptoms due to an independent cause (e.g. developing disease on the basis of another, often non-genetic, cardiovascular event such as coronary artery disease). In consequence, the medical diagnosis of cardiomyopathy is based on exclusion criteria, and performing segregation analysis for a putative pathogenic variant in families without being absolutely sure of the healthy/affected status of the screened individuals can be complicated.

Since cardiomyopathy can be so difficult to diagnose, and because the chances of a successful treatment rapidly decline with time, our aims are (1) to obtain an early (molecular) diagnosis of the inherited form of the disease before severe symptoms become manifest, and (2) to enable preventive treatment (including life-style changes as well as medical treatment if necessary) of the endangered individuals, combined with regular, thorough cardiological check-ups.
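Why age-dependent penetrance and phenocopies complicate segregation analysis can be made concrete with a small Python sketch. The `Relative` record, the category labels and the age threshold of 50 years are assumptions chosen for illustration (the threshold loosely follows the DCM age of onset quoted above); this is not a validated clinical rule.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relative:
    """One screened family member (hypothetical record)."""
    name: str
    age: int
    carrier: bool              # carries the putative pathogenic variant
    affected: Optional[bool]   # None means the clinical status is unknown


def classify(rel, informative_age=50):
    """Label the evidence one relative contributes to segregation analysis.

    informative_age is an assumed cut-off: unaffected carriers younger
    than this may still develop the disease later in life.
    """
    if rel.affected is None:
        return "uninformative"
    if rel.affected:
        # An affected non-carrier may be a phenocopy with an independent cause.
        return "supports" if rel.carrier else "possible phenocopy"
    if not rel.carrier:
        return "consistent"
    # Unaffected carrier: the interpretation depends on age.
    return "uninformative" if rel.age < informative_age else "against"


family = [
    Relative("I-1", 68, True, True),
    Relative("II-1", 45, True, True),
    Relative("II-2", 47, False, True),
    Relative("III-1", 19, True, False),
    Relative("III-2", 22, False, False),
]
summary = Counter(classify(r) for r in family)
```

In this toy pedigree two relatives support segregation, one affected non-carrier is flagged as a possible phenocopy, and the 19-year-old unaffected carrier adds no evidence in either direction.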

Recommended literature

website of the National Heart, Lung, and Blood Institute, health topic on cardiomyopathies: http://www.nhlbi.nih.gov/health/health-topics/topics/cm/
website of the Children’s Cardiomyopathy Foundation: http://www.childrenscardiomyopathy.org/site/main_brochure.htm
Hershberger RE, Lindenfeld J, Mestroni L et al: Genetic evaluation of cardiomyopathy – a Heart Failure Society of America guideline. J Cardiac Fail 2009;15:83-97
Wilde AA & Behr ER: Genetic testing for inherited cardiac disease. Nat Rev Cardiol 2013;10:571-83
Teekakirikul P, Kelly MA, Rehm HL et al: Inherited cardiomyopathies – Molecular genetics and clinical genetic testing in the postgenomic era. J Mol Diagn 2013;15:158-70
Jongbloed JD, Pósafalvi A, Kerstjens-Frederikse WS et al: New clinical molecular diagnostic methods for congenital and inherited heart disease. Expert Opin Med Diagn 2011;5:9-24
Posafalvi A, Herkert JC, Sinke RJ et al: Clinical utility gene card for: dilated cardiomyopathy (CMD). Eur J Hum Genet 2013;21. doi: 10.1038/ejhg.2012.276
Te Rijdt WP, Jongbloed JD, de Boer RA et al: Clinical utility gene card for: arrhythmogenic right ventricular cardiomyopathy (ARVC). Eur J Hum Genet 2014;22. doi: 10.1038/ejhg.2013.124
Udeoji DU, Philip KJ, Morrissey RP et al: Left ventricular noncompaction cardiomyopathy: updated review. Ther Adv Cardiovasc Dis 2013;7:260-73
Caleshu C, Sakhuja R, Nussbaum RL et al: Furthering the link between the sarcomere and primary cardiomyopathies: restrictive cardiomyopathy associated with multiple mutations in genes previously associated with hypertrophic or dilated cardiomyopathy. Am J Med Genet A 2011;155A:2229-35
Peled Y, Gramlich M, Yoskovitz G et al: Titin mutation in familial restrictive cardiomyopathy. Int J Cardiol 2014;171:24-30
Wooten EC, Hebl VB, Wolf MJ et al: Formin homology 2 domain containing 3 variants associated with hypertrophic cardiomyopathy. Circ Cardiovasc Genet 2013;6:10-8
Chang B, Nishizawa T, Furutani M et al: Identification of a novel TPM1 mutation in a family with left ventricular noncompaction and sudden death. Mol Genet Metab 2011;102:200-6
Luxán G, Casanova JC, Martínez-Poveda B et al: Mutations in the NOTCH pathway regulator MIB1 cause left ventricular noncompaction cardiomyopathy. Nat Med 2013;19:193-201
Purevjav E, Varela J, Morgado M et al: Nebulette mutations are associated with dilated cardiomyopathy and endocardial fibroelastosis. J Am Coll Cardiol 2010;56:1493-502
Arndt AK, Schafer S, Drenckhahn JD et al: Fine mapping of the 1p36 deletion syndrome identifies mutation of PRDM16 as a cause of cardiomyopathy. Am J Hum Genet 2013;93:67-77
Ohno S, Omura M, Kawamura M et al: Exon 3 deletion of RYR2 encoding cardiac ryanodine receptor is associated with left ventricular non-compaction. Europace 2014;16:1646-54
Pinto JR, Yang SW, Hitz MP et al: Fetal cardiac troponin isoforms rescue the increased Ca2+ sensitivity produced by a novel double deletion in cardiac troponin T linked to restrictive cardiomyopathy: a clinical, genetic, and functional approach. J Biol Chem 2011;286:20901-12
van Hengel J, Calore M, Bauce B et al: Mutations in the area composita protein αT-catenin are associated with arrhythmogenic right ventricular cardiomyopathy. Eur Heart J 2013;34:201-10
Pruszczyk P, Kostera-Pruszczyk A, Shatunov A et al: Restrictive cardiomyopathy with atrioventricular conduction block resulting from a desmin mutation. Int J Cardiol 2007;117:244-53

OUTLINE OF THIS THESIS

The aims of this thesis are (1) to provide a better understanding of the genetic background and the molecular pathomechanism of familial cardiomyopathies, (2) to identify novel disease genes in unsolved families, and (3) to improve the existing methods of molecular diagnostic testing.

Chapter 1 is a detailed introduction to the field of cardiogenetics. This chapter reviews congenital and late onset heart diseases (the latter referring to cardiomyopathies and arrhythmia syndromes), categorizes the genes involved in the different types of heritable heart diseases, and thoroughly describes the research methods with special attention paid to their potential future diagnostic applications in cardiovascular diseases.

The subsequent chapters contain experimental data and are subdivided based on the research methods used. In chapter 2, we applied the classical candidate gene screening approach using Sanger sequencing. We were interested in whether (and to what extent) the known DCM gene RBM20 contributes to the genetic background of the disease in Dutch patients (2.1). In addition, we hypothesized that the desmosomal PLEC gene may play a role in the development of ARVC. To test this hypothesis, we compared the clustering of sequence variations in patients with that in a healthy control population (2.2).

High-throughput sequencing is a recent technological development that is revolutionizing the science of genetics. We applied two different experimental designs of this method to elucidate genetic causes for cardiomyopathies. In chapter 3, we describe families in which mutations in known cardiomyopathy genes had been excluded, and in which we successfully applied exome sequencing to identify novel disease genes in both autosomal dominant (3.1) and recessive (3.2) cardiomyopathies, while 3.3 is an interesting case report on a family suffering from both forms of the disease.

In chapter 4, we applied targeted enrichment of DNA samples for a set of well-defined candidate disease genes. We address the applicability and the quantitative advantages of targeted sequencing in routine diagnostics for a cohort of 252 unselected cardiomyopathy patients in 4.1, while we report our findings on targeted sequencing of PPCM/DCM families in 4.2.

The work described in this thesis is then discussed in a broader context in chapter 5, which also presents future perspectives for the use of high-throughput sequencing in research and diagnostic settings, as well as potential research directions in the field of cardiogenetics.

APPENDIX 1

List of cardiomyopathy genes:

(official abbreviations and names of genes included in figure 2 of the preface)

ABCC9 ATP-binding cassette, subfamily C (CFTR/MRP), member 9
ACTC1 actin, alpha, cardiac muscle 1
ACTN2 actinin, alpha 2
ANKRD1 ankyrin repeat domain 1 (cardiac muscle)
ANO5 anoctamin 5
BAG3 BCL2-associated athanogene 3
CALR3 calreticulin 3
CAV3 caveolin 3
CRYAB crystallin, alpha B
CSRP3 cysteine and glycine-rich protein 3 (cardiac LIM protein)
CTNNA3 catenin (cadherin-associated protein), alpha 3
DES desmin
DMD dystrophin
DOLK dolichol kinase
DSC2 desmocollin 2
DSG2 desmoglein 2
DSP desmoplakin
DTNA dystrobrevin, alpha
EMD emerin
EYA4 EYA transcriptional coactivator and phosphatase 4
FHL1 four and a half LIM domains 1
FHL2 four and a half LIM domains 2
FHOD3 formin homology 2 domain containing 3
FKRP fukutin related protein
FKTN fukutin
FXN frataxin
GATAD1 GATA zinc finger domain containing protein 1
GLA galactosidase, alpha
ILK integrin-linked kinase
JPH2 junctophilin 2
JUP junction plakoglobin
LAMA4 laminin, alpha 4
LAMP2 lysosomal-associated membrane protein 2
LDB3 LIM domain binding 3
LMNA lamin A/C
MIB1 mindbomb E3 ubiquitin protein ligase 1
MT-TL1 mitochondrially encoded tRNA leucine 1 (UUA/G)
MYBPC3 myosin-binding protein C, cardiac
MYH6 myosin, heavy chain 6, cardiac muscle, alpha
MYH7 myosin, heavy chain 7, cardiac muscle, beta
MYL2 myosin, light chain 2, regulatory, cardiac, slow
MYL3 myosin, light chain 3, alkali; ventricular, skeletal, slow
MYLK2 myosin light chain kinase 2
MYOZ2 myozenin 2
MYPN myopalladin
NEBL nebulette
NEXN nexilin (F actin binding protein)
NKX2-5 NK2 homeobox 5
mtDNA mitochondrial DNA
PDLIM3 PDZ and LIM domain 3
PKP2 plakophilin 2
PLN phospholamban
PRDM16 PR domain containing 16
PRKAG2 protein kinase, AMP-activated, gamma 2 noncatalytic subunit
PSEN1 presenilin 1
PSEN2 presenilin 2
PTPN11 protein tyrosine phosphatase, non-receptor type 11
RAF1 Raf-1 proto-oncogene, serine/threonine kinase
RBM20 RNA binding motif protein 20
RYR2 ryanodine receptor 2 (cardiac)
SCN5A sodium channel, voltage-gated, type V, alpha subunit
SDHA succinate dehydrogenase complex, subunit A, flavoprotein (Fp)
SGCD sarcoglycan, delta (35kDa dystrophin-associated glycoprotein)
TAZ tafazzin
TBX20 T-box 20
TCAP titin-cap
TGFB3 transforming growth factor, beta 3
TMEM43 transmembrane protein 43
TMPO thymopoietin
TNNC1 troponin C type 1 (slow)
TNNI3 troponin I type 3 (cardiac)
TNNT2 troponin T type 2 (cardiac)
TPM1 tropomyosin 1 (alpha)
TTN titin
TTR transthyretin
TXNRD2 thioredoxin reductase 2
VCL vinculin

APPENDIX 1

Frequently used abbreviations:

ACE angiotensin-converting enzyme
AD autosomal dominant inheritance pattern
AGVGD align Grantham variation Grantham distance (pathogenicity prediction software for missense variants)
AR autosomal recessive
ARVC arrhythmogenic right ventricular cardiomyopathy
bp base pair
CGH comparative genomic hybridization
CHD congenital heart disorders
cM centimorgan
CNV copy number variation
DCM dilated cardiomyopathy
dbSNP NCBI’s SNP database
DMEM Dulbecco’s Modified Eagle Medium
DNA deoxyribonucleic acid
EBS epidermolysis bullosa simplex
ECG electrocardiogram
ES exome sequencing
ESP exome sequencing project (variant database of the NHLBI)
E. coli Escherichia coli
FBS fetal bovine serum
GERP genomic evolutionary rate profiling (a score indicating the evolutionary conservation of a nucleotide)
GoNL Genome of the Netherlands (database of the genomes of 500 individuals, used as a frequency database of “the Dutch wild type”)
GWAS genome-wide association study
HCM hypertrophic cardiomyopathy
HEK human embryonic kidney 293T cells
HF heart failure
HiSeq Illumina’s Next Generation Sequencer system
HLA major histocompatibility complex genes
HST haplotype-sharing test
H2O hydrogen oxide (water)
H2O2 hydrogen peroxide
ICD implantable cardioverter-defibrillator
LDB3 LIM domain binding 3 gene
LSH longest shared haplotype
LVNC left ventricular non-compaction cardiomyopathy
MD muscular dystrophy
MiSeq Illumina’s “personal sequencer”, the “little sister” of the HiSeq system in benchtop size, with faster workflow, allowing the assembly of small genomes or target regions
MRI magnetic resonance imaging
mRNA messenger RNA
NCBI National Center for Biotechnology Information
NGS next generation sequencing
NHLBI National Heart, Lung, and Blood Institute, a division of the National Institutes of Health in the USA
OMIM “Online Mendelian Inheritance in Man” – a comprehensive database of human genes and genetic phenotypes authored and edited by the Johns Hopkins University
PBS phosphate buffered saline
PCR polymerase chain reaction
PLEC plectin
PolyPhen Polymorphism phenotype (pathogenicity prediction software for missense variants)
PPCM peripartum cardiomyopathy
RBM20 RNA binding motif protein 20
RCM restrictive cardiomyopathy
ROS reactive oxygen species
RNA ribonucleic acid
RT reverse transcription
SCD sudden cardiac death
SIFT sorting intolerant from tolerant (pathogenicity prediction software for missense variants)
SNP single nucleotide polymorphism
SOD2 superoxide dismutase 2
TFC task force criteria (diagnostic criteria of ARVC)
tRNA transfer RNA
TTN titin, the longest gene of the human genome
VOUS variant of unknown significance
VUS variant of unknown significance
1000G 1000 Genomes catalog of human genetic variation


CHAPTER 1 INTRODUCTION


Novel clinical molecular diagnostic methods for congenital and inherited heart disease

Jan DH Jongbloed, Anna Posafalvi, Wilhelmina S Kerstjens-Frederikse, Richard J Sinke, J Peter van Tintelen

Published in Expert Opinion on Medical Diagnostics, 2011

Importance of the field: For patients with inherited and congenital heart disorders, causative mutations are often not identified due to limitations of current screening techniques. Identifying the mutation is of major importance for genetic counseling of patients and families, facilitating the diagnosis in persons at risk and directing clinical management. Next generation sequencing (NGS) provides unprecedented opportunities to maximize mutation yields and improve clinical management, genetic counseling and monitoring of patients.

Areas covered in this review: We review recent NGS applications, focusing on methods relevant for molecular diagnostics in cardiogenetics. We discuss requirements for reliable implementation into clinical practice and challenges that clinicians, bioinformaticians and molecular diagnosticians must deal with in analyzing the resulting data.

What the reader will gain: Readers will be introduced to recent developments, techniques and applications in NGS. They will learn about possibilities of using it in clinical diagnostics. They will become acquainted with difficulties and challenges in interpreting the data and considerations around communicating these issues to patients and the community.

Take home message: Although several obstacles remain to be overcome and there is much still to learn, NGS will revolutionize clinical molecular diagnostics of inherited and congenital cardiac diseases, maximizing mutation yields and leading to optimized diagnostic and clinical care.

Keywords: cardiogenetics, molecular clinical diagnostics, next-generation sequencing, targeted enrichment, exome sequencing, inherited and congenital heart disease

Article highlights:

1. Novel clinical molecular diagnostic methods in cardiogenetic diagnostics are to be found in the field of next generation sequencing (NGS), and novel applications that have recently become available with the launching of this technology will become part of daily diagnostic practice.
2. The main challenges of the implementation of NGS in daily diagnostic work are the assurance of good quality control and reliable data analysis and interpretation.
3. The most important consideration for clinical counseling will be the ascertainment of variants with uncertain clinical significance, and the only reasonable way to deal with this problem is to pursue maximum data dissemination in the scientific community.
4. NGS provides unique solutions and will bring shorter reporting times, maximize mutation detection rates, and decrease costs if all the disease-related genes can be tested in parallel in a single experiment.
5. Despite the technological, bioinformatical and ethical problems, the use of NGS technology will lead to much improved and more effective diagnostic and preventive care for patients suffering from inherited and congenital heart disorders (CHD) and their relatives.

INTRODUCTION

What started in the 1950s with observations of cardiac diseases segregating in families and suggesting heritable disease [1][2], has led in the last 15 years to the identification of many disease-associated genes and mutations. Advances in cardiogenetics have exceeded the level of being scientifically interesting phenomena and have major implications in genetic counseling and in directing clinical therapy [3][4][5].
Not only the expanding possibilities in DNA analyses, but also the increased awareness among cardiologists, pediatric cardiologists, and general practitioners of the potential heritability of cardiac disease has led to growing numbers of patients being referred to departments of genetics and/or cardiogenetic outpatient clinics for genetic counseling and DNA diagnostics. Diseases for which patients attend the cardiogenetics outpatient clinic are primary arrhythmia syndromes [4], cardiomyopathies [6] or familial congenital heart disorders (CHD) [7][8]. Examples of arrhythmia syndromes are the congenital long QT syndrome (LQTS), Brugada syndrome or catecholaminergic polymorphic ventricular tachycardia (CPVT), which are all associated with sudden cardiac death (SCD) at relatively young age. Most patients with a cardiomyopathy present with hypertrophic (HCM) or dilated (DCM) cardiomyopathy. Arrhythmogenic right ventricular cardiomyopathy (ARVC), restrictive (RCM) and left ventricular non-compaction cardiomyopathies (LVNC) are encountered less frequently. Cardiomyopathies frequently present with output failure leading to fatigue; however, arrhythmias and SCD may also occur. Finally, examples of CHD that may be heritable include valvular abnormalities, such as bicuspid aortic valve/aortic valve stenosis (BAV/AVS) or pulmonary valve stenosis (PVS), septal defects (like atrial or ventricular septal defects; ASD/VSD), endocardial cushion defects (atrio-ventricular septal defect: AVSD), vascular abnormalities, such as coarctation of the aorta (CoA) or persistent ductus arteriosus (PDA), and more complex abnormalities like hypoplastic left heart syndrome (HLHS), tetralogy of Fallot (TOF), or heterotaxy-related cardiac abnormalities like transposition of the great arteries.
Notably, the genetics of CHDs is becoming increasingly important because, owing to the enormous development in surgical and cardiological care, many of the 1 in 100 people born with a CHD survive to have offspring [9][10]. Interestingly, the boundaries between the different clinical entities are disappearing as overlapping clinical phenotypes are being recognized more frequently. For example, patients suffering from arrhythmia syndromes have been reported to also develop a cardiomyopathy [4]. In addition, patients diagnosed with inherited CHDs have been reported in whom a cardiomyopathy was also identified [11][12]. As expected from this, both clinical and genetic heterogeneity are often observed within these disorders (see also below). This review describes new developments in clinical molecular diagnostic methods for inherited heart disease and CHDs, with a focus on the novel applications that have recently become available with the launching of next generation sequencing (NGS) technologies.

CURRENT CARDIOGENETIC DIAGNOSTICS

In recent years, the relevance of genetic analyses in the genetic counseling and monitoring of patients and their family members having a cardiac disease with a proven, or at least suspected, familial nature has been increasingly recognized. Genetic analyses have therefore become an important part of the diagnostic activities to reach a clinical diagnosis in such patients. Since the first discovery of the MYH7 gene underlying HCM [13], a growing number of tests for heart-related disease have been introduced in DNA diagnostic laboratories worldwide, including array-CGH technology for the diagnostics of CHDs. This is exemplified by the fact that, according to the Orphanet (the European database for rare diseases and orphan drugs [14]) and/or GeneTests [15] websites, mutation analyses for the majority of known genes related to inherited cardiomyopathies, arrhythmia syndromes and congenital structural cardiac disorders are being offered in at least one of the European laboratories (see also Table 1). It is important to note, however, that in most inherited cardiac diseases the genetic cause has not been identified yet. For example, the underlying disease gene was found in at most 50% of ARVC, 25% of DCM, 60% of HCM, 25% of LVNC and ~10% of RCM patients. The large number of genes related to these different groups of inherited cardiac diseases underscores the fact that the genetic causes of these disorders show a high level of heterogeneity. Moreover, some of the genes have proved to be mutated in different cardiac diseases. This concept was first recognized within specific disease entities. As a result, the fact that both HCM and DCM can be caused by mutations in genes encoding components of the sarcomere, the contractile machinery of cardiomyocytes, has been known since about the year 2000 [17].
However, the boundaries between the different cardiac diseases are also fading: there is no longer a strict separation between cardiomyopathies and channelopathies, due to recent observations that mutations in ion-channel and related genes can also play a role in the pathogenesis of DCM [18][19]. Moreover, a genetic overlap between cardiomyopathies/channelopathies and inherited structural cardiac disorders has also been suggested. For example, several mutations in genes encoding sarcomeric proteins have been described that resulted in congenital heart malformations (Table 1) [8]. In addition, mutations in the cardiac T-box factor gene TBX20 were shown to result in cardiomyopathies in both mice and humans, among other cardiovascular abnormalities [11]. The phenomenon discussed above is exemplified in Figure 1, which shows the genetic heterogeneity and overlap in genes that underlie different types of cardiomyopathies (DCM, HCM, LVNC, ARVC, and RCM), including a few genes that are also known to be involved in channelopathies (RYR2, SCN5A and PRKAG2) or CHDs (MYH7, MYBPC3).

Finally, in addition to the significant heterogeneity of monogenic cardiac diseases, there is an emerging recognition that a significant proportion of patients carry two or more independent disease-causing gene mutations, which lead to more severe forms of clinical disease [20]. These mutations might occur in the same gene (compound heterozygotes) or in different genes (bi- or multigenic). There may also be genetic modifiers present that are associated with a poorer prognosis. This concept, and the fact that many genes might underlie a disease, support the idea that large numbers of genes should be analyzed in parallel, preferably within the same experiment, in patients with inherited cardiac disorders to improve risk assessment. Together, these observations imply that at least 110 genes are putative candidate disease genes in patients presenting at cardiogenetic outpatient clinics for a genetic diagnosis, since there are now ~60 cardiomyopathy [21], ~20 channelopathy [22], and ~30 CHD disease genes [8] known to be involved in the respective diseases (Table 1).

To date, these genes have mainly been analyzed at the nucleotide level on a gene-by-gene basis. For this purpose, various pre-screening techniques, such as denaturing gradient gel electrophoresis (DGGE), denaturing high-performance liquid chromatography (dHPLC), single-strand conformation polymorphism analysis (SSCP), conformation-sensitive capillary electrophoresis (CSCE), or high-resolution melting analysis (HRM), are generally used to screen for aberrant PCR-amplified DNA sequences. The abnormal PCR fragments are subsequently analyzed by Sanger sequencing to identify the exact nucleotide substitutions [23]. However, in a considerable number of genetic laboratories, the preferred screening approach is direct dideoxy sequencing of all exonic and adjacent intronic sequences of the genes of interest, without using pre-screening methods.

Figure 1. Genetic heterogeneity and overlap in genes causing cardiomyopathies. Shown are genes underlying DCM, HCM, LVNC, ARVC, and RCM. Notably, some of these genes are also known to be involved in channelopathies and/or congenital heart malformation (based upon [16]). Genes also involved in congenital cardiac disease are indicated in bold. Genes also involved in channelopathies are underlined. The genes incorporated are: ABCC9 (ATP-sensitive potassium channel), ACTC1 (cardiac α-actin), ACTN2 (α-actinin-2), CALR3 (calreticulin 3), CAV3 (caveolin 3), CSRP3 (muscle LIM protein), CRYAB (alpha-B crystallin), DES (desmin), DSG2 (desmoglein-2), DSC2 (desmocollin-2), DSP (desmoplakin), DTNA (dystrobrevin), DMD (dystrophin), EMD (emerin), EYA4 (eyes absent 4), GLA (α-galactosidase), ILK (integrin-linked kinase), JPH2 (junctophilin), JUP (junctional plakoglobin), LAMA4 (laminin α4), LAMP2 (lysosome-associated membrane protein 2), LDB3 (cypher/ZASP), LMNA (lamin A/C), mtDNA (mitochondrial DNA), MYBPC3 (myosin-binding protein C), MYH6 (α-myosin heavy chain), MYH7 (β-myosin heavy chain), MYL2 (regulatory myosin light chain), MYL3 (essential myosin light chain), MYPN (myopalladin), NEXN (nexilin), PDLIM3 (PDZ and LIM domain protein 3), PKP2 (plakophilin-2), PLN (phospholamban), PSEN1 (presenilin-1), PSEN2 (presenilin-2), PRKAG2 (AMPK-γ2 subunit), RBM20 (RNA binding motif protein 20), RYR2 (ryanodine receptor 2), SCN5A (cardiac sodium channel), TAZ (tafazzin), TCAP (titin-cap/telethonin), TGFB3 (transforming growth factor β3), TMPO (thymopoietin), TNNC1 (cardiac troponin C), TNNI3 (cardiac troponin I), TNNT2 (cardiac troponin T), TPM1 (α-tropomyosin), TTN (titin), VCL (metavinculin).

If available for the respective genes, multiplex ligation-dependent probe amplification (MLPA) is used to screen for the deletion and/or duplication of one or more exons, as these are not identified using PCR-based techniques [24]. In cardiogenetics, such deletions and duplications have been found in arrhythmia syndromes and cardiomyopathies [25][26]. However, since these approaches are laborious, relatively expensive and time-consuming, DNA diagnostics is often limited to a maximum of ~10 putative disease genes, as health insurance companies are not prepared to reimburse many more gene tests, if at all. It is therefore often difficult to decide which genes should be screened in a specific patient. In general, the genes analyzed are those for which considerable mutation yields are reported in the literature. If a genotype-phenotype relationship has been identified, gene selection will of course be guided by the phenotypes identified in the respective patients and their affected family members. For example, in a patient presenting with DCM and conduction disease, the LMNA, DES and SCN5A genes are among the first genes to analyze, while patients presenting with an inherited arrhythmia syndrome should first be screened for genes encoding the respective ion channel proteins.

As already mentioned above, genetic testing in current cardiogenetic diagnostics is often limited to, and guided by, knowledge of the most common causative genes (for an overview of genes, see Table 1). Among the cardiomyopathies, the chances of reaching a genetic diagnosis are best in HCM, as mutations in the MYH7 and MYBPC3 genes account for ~80% of the cases in which a genetic cause is identified [21]. In HCM, genetic testing therefore often starts with these genes and, in addition, the TNNT2 gene. When no mutation is identified in these three genes, the most logical option would be to analyze the other sarcomeric genes (TNNI3, TNNC1, ACTC1, TPM1, MYL2, MYL3 and TTN; the latter is very rarely screened since it is the largest human gene known). Other genetic analyses, like that of genes encoding Z-disk proteins, are often not performed because reported mutation yields are <1%, with the exception of the CSRP3 gene (1-5%) [5][21]. In DCM, up to 40 genes are known to cause disease, all with relatively low frequencies. Due to their higher frequencies, testing often starts with LMNA, MYH7 and TNNT2 [21]. For LVNC, choices in genetic testing are often comparable to those in DCM and HCM, since most mutations found so far are in sarcomeric genes. In ARVC, the group of candidate genes is relatively small, providing the opportunity to test them all. However, given the significantly higher yields reported for PKP2, mutation screening in that gene should be considered before continuing with other (desmosomal) genes, of which the DSC2 and DSG2 genes are the most logical next choices [27]. Whereas the success rate of genetic testing in the majority of cardiomyopathies can still be improved, that in channelopathies is high, with mutations identified in most patients. In long QT syndrome in particular, a genetic diagnosis is made in over 80% of cases.

For cases of CHD, genetic diagnostics is currently mainly driven by the phenotypic characteristics of the respective patient, because of the high diversity in non-syndromic CHDs. Moreover, these patients are often also screened by array-CGH analysis. Array-CGH is an assay in which DNA samples from a patient and a healthy control are labeled with different fluorescent dyes and cohybridized to an array containing known DNA sequences. Differences in the relative fluorescence intensities of hybridized DNA on the microarray then reflect differences in copy number between the genome of the patient and that of the healthy control. When this method identifies a microdeletion or duplication that is not known as a common copy number variant (CNV), this might represent the disease-causing genomic imbalance [28][29]. Mutation screening of the most promising candidate gene or genes in such a deletion or duplication in a cohort of CHD patients might result in the identification of new disease genes. For example, mutation analysis in 402 patients of the top-ranking candidate gene TAB2, which was one of the five genes in a critical 850 kb deleted region on 6q that was shared by 12 CHD patients, resulted in the finding of two conserved missense mutations [30]. Finally, with respect to current testing regimes, it is important to note that gene-chip re-sequencing technologies were recently implemented, providing the opportunity to analyze larger numbers of genes within one test.
Although its use is still limited, since it is often only commercially available, this technology will be of importance in cardiogenetic diagnostics in the coming years (see also section 3.1). Even when predictors such as known mutation yield frequencies, phenotypes, or family history are used in deciding on gene analyses, current DNA diagnostic practice leaves significant numbers of patients without a genetic diagnosis. Moreover, as mentioned previously, a significant proportion of patients carry more than one mutation. Thus, other approaches are needed to maximize mutation yields and minimize investigation times. In the next section, highly promising possibilities that have recently become available to optimize cardiogenetic diagnostics will be presented and discussed. Notably, some of these techniques, such as re-sequencing arrays (CardioChips), have already been implemented, but most are not yet being used in regular DNA diagnostics.
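The array-CGH comparison described above can be illustrated with a minimal sketch. It assumes that each probe yields one fluorescence intensity for the patient and one for the control, and that copy number differences are read from the log2 ratio of the two. The thresholds, the minimum run length and all function names are hypothetical choices made for illustration and are not taken from the cited studies.

```python
import math

# Hypothetical thresholds; real pipelines calibrate these per platform.
GAIN_THRESHOLD = 0.3    # log2 ratio at or above this suggests a duplication
LOSS_THRESHOLD = -0.3   # log2 ratio at or below this suggests a deletion


def log2_ratio(patient_intensity, control_intensity):
    """Return log2(patient / control) for one array probe."""
    return math.log2(patient_intensity / control_intensity)


def call_probe(ratio):
    """Classify a single probe as 'gain', 'loss' or 'normal'."""
    if ratio >= GAIN_THRESHOLD:
        return "gain"
    if ratio <= LOSS_THRESHOLD:
        return "loss"
    return "normal"


def call_segments(probes, min_probes=3):
    """Group consecutive probes that share the same non-normal call.

    probes: list of (position, patient_intensity, control_intensity),
            sorted by genomic position.
    Returns a list of (start, end, call) for runs of at least min_probes.
    """
    segments = []
    run = []  # (position, call) pairs of the current run
    for position, patient, control in probes:
        call = call_probe(log2_ratio(patient, control))
        if run and run[-1][1] != call:
            # The call changed: close the current run and keep it if it
            # is abnormal and long enough.
            if run[0][1] != "normal" and len(run) >= min_probes:
                segments.append((run[0][0], run[-1][0], run[0][1]))
            run = []
        run.append((position, call))
    if run and run[0][1] != "normal" and len(run) >= min_probes:
        segments.append((run[0][0], run[-1][0], run[0][1]))
    return segments
```

Requiring several consecutive probes before a region is reported is one simple way to reduce calls that are driven by a single noisy probe. A reported region would still need to be compared with known common CNVs before it is considered a candidate disease-causing imbalance.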

Table 1. Tentative summary of genes* involved in inherited and congenital heart disease#.

Columns: gene-group and gene; cardiomyopathies; arrhythmias; structural heart disease; skeletal muscle disease; remarks. For each gene, the scores are given in column order, followed by any remark in parentheses.

Genes mainly involved in cardiomyopathies
Sarcomeric proteins: CALR3 +; DTNA + +; MYBPC3 ++ +; MYH6§ ++ +; MYH7 ++ + +; MYL2 +; MYL3 +; NEXN +; TNNC1 ++; TNNI3 ++; TPM1 ++
Nuclear envelope: EMD + ++; LMNA ++ + +; LAP2/TMPO ++
Cyto-architecture: ACTC1 ++; ACTN2 ++; CRYAB ++ (Cataract); CSRP3/MLP ++; DES ++ + ++; DMD + ++; FHL2 ++; FKRP + ++; FKTN + ++; ILK ++; LAMA4 ++; MYPN ++; PDLIM3/ALP ++; SCGD + ++; TCAP ++ +; TTN + + ++; VCL ++; ZASP(LDB3) ++ +
Ion channels/Calcium handling: ABCC9 ++; PLN ++; SCN5A + ++
Desmosomal proteins: DSC2 ++; DSG2 ++; DSP ++ (syndromal); JUP + (Naxos disease); PKP2 ++
Miscellaneous: EYA4 + (Hearing loss); GLA + (Storage disorder); JPH2 +; LAMP2 ++ + (Storage disease); PRKAG2 ++ + (Storage disease); mtDNA + + +; PSEN1 and 2 + + ((possible) Alzheimer disease); RBM20 +; TAZ ++ (Syndromal); TGFB3 +; TMEM43 ++

Genes mainly involved in arrhythmias
Sodium channel related: SCN1B ++; SCN4B ++ (Epilepsy/seizures); SCN5A + ++; SNTA1 ++
Potassium channel related: AKAP9 ++; KCNA5 ++; KCNE1 ++; KCNE2 ++; KCNE3 ++; KCNH2 ++; KCNJ2 ++ (syndromal); KCNJ8 ++; KCNQ1 ++
Calcium metabolism: CACNA1C ++ (syndromal); CASQ2 ++; RYR2 + ++; TRPM4 ++
Others: CAV3 + ++ ++; GJA5 ++; GPD1L ++; HCN4 ++

Genes mainly involved in structural heart disease
Transcription factors: ANKRD1 ++; CITED2 ++; FOXH1 ++; GATA4 ++; GATA6 ++; NKX2.5 + ++; NKX2.6 ++; TBX1 ++; TBX5 + ++; TBX20 + ++; ZIC3 ++ (Heterotaxy); ZFPM2 ++
Ligands/receptors: AVCR1 ++ (Heterotaxy); ALK2 ++ (Syndromal); CFC1 ++ (Heterotaxy); GDF1 ++; JAG1 ++ (Syndromal); LEFTY2 ++ (Heterotaxy); NODAL ++ (Heterotaxy); NOTCH1 ++; TDGF1 ++
Sarcomeric proteins: ACTC1 ++ ++; MYH11 ++
Miscellaneous: BRAF + ++ (syndromal); CRELD1 ++; ELN ++; KRAS ++ (syndromal); MAP2K1 and 2 ++ (syndromal); MED13L ++; NRAS ++ (syndromal); PTPN11 ++ (syndromal); RAF1 + ++ (syndromal); SHOC2 ++ (syndromal); SOS1 ++ (syndromal); TLL1 ++

*The gene names correspond to the following proteins (in alphabetical order): ABCC9, ATP-binding cassette, subfamily C, member 9; ACTC1, alpha actin; ACTN2, actinin, alpha 2; AVCR1, activin A receptor, type I; AKAP9, A kinase (PRKA) anchor protein (yotiao) 9; ANKRD1, ankyrin repeat domain 1 (cardiac muscle); BRAF, v-raf murine sarcoma viral oncogene homolog B1; CACNA1C, calcium channel, voltage-dependent, L type, alpha 1C subunit; CALR3, calreticulin 3; CASQ2, calsequestrin 2 (cardiac); CAV3, caveolin 3; CFC1, cripto, FRL-1, cryptic family 1; CITED2, Cbp/p300-interacting transactivator, with Glu/Asp-rich carboxy-terminal domain, 2; CRELD1, cysteine-rich with EGF-like domains 1; CRYAB, crystallin, alpha-B; CSRP3/MLP, cysteine- and glycine-rich protein 3 / cardiac LIM protein; DES, desmin; DMD, dystrophin; DSC2, desmocollin 2; DSG2, desmoglein 2; DSP, desmoplakin; DTNA, dystrobrevin, alpha; ELN, elastin; EMD, emerin; EYA4, eyes absent 4; FHL2, four-and-a-half LIM domains 2; FKRP, fukutin-related protein; FKTN, fukutin; FOXH1, forkhead box H1; GATA4, GATA binding protein 4; GATA6, GATA binding protein 6; GDF1, growth differentiation factor 1; GJA5, gap junction protein, alpha 5, 40kDa; GLA, galactosidase, alpha; GPD1L, glycerol-3-phosphate dehydrogenase 1-like; HCN4, hyperpolarization activated cyclic nucleotide-gated potassium channel 4; ILK, integrin-linked kinase; JAG1, jagged 1; JPH2, junctophilin 2; JUP, junction plakoglobin; KCNA5, potassium voltage-gated channel, shaker-related subfamily, member 5; KCNE1, potassium voltage-gated channel, Isk-related family, member 1; KCNE2, potassium voltage-gated channel, Isk-related family, member 2; KCNE3, potassium voltage-gated channel, Isk-related family, member 3; KCNH2, potassium voltage-gated channel, subfamily H (eag-related), member 2; KCNJ2, potassium inwardly-rectifying channel, subfamily J, member 2; KCNJ8, potassium inwardly-rectifying channel, subfamily J, member 8; KCNQ1, potassium voltage-gated channel, KQT-like subfamily, member 1; KRAS, v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog; LAMA4, laminin alpha-4; LAMP2, lysosome-associated membrane protein 2; LAP2/TMPO, lamina-associated polypeptide 2 / thymopoietin; LEFTY2, left-right determination factor 2; LMNA, lamin A/C; MAP2K1 and 2, mitogen-activated protein kinase kinase 1 and 2; MED13L, mediator complex subunit 13-like; mtDNA, mitochondrial DNA; MYBPC3, myosin-binding protein C, cardiac; MYH11, myosin, heavy chain 11, smooth muscle; MYH6, myosin, heavy chain 6, cardiac muscle, alpha; MYH7, myosin, heavy chain 7, cardiac muscle, beta; MYL2, myosin, light chain 2, regulatory, cardiac, slow; MYL3, myosin, light chain 3, alkali, ventricular, skeletal, slow; MYPN, myopalladin; NEXN, nexilin (F actin binding protein); NKX2.5, NK2 transcription factor related, 5 (Drosophila); NKX2.6, NK2 transcription factor related, locus 6 (Drosophila); NODAL, nodal homolog (mouse); NOTCH1, notch 1; NRAS, neuroblastoma RAS viral (v-ras) oncogene homolog; PDLIM3/ALP, PDZ and LIM domain protein 3; PKP2, plakophilin 2; PLN, phospholamban; PRKAG2, protein kinase, AMP-activated, gamma 2 non-catalytic subunit; PSEN1 and 2, presenilin-1 and -2; PTPN11, protein tyrosine phosphatase, non-receptor type 11; RAF1, v-raf-1 murine leukemia viral oncogene homolog 1; RBM20, RNA binding motif protein 20; RYR2, ryanodine receptor 2 (cardiac); SCGD, delta sarcoglycan; SCN1B, sodium channel, voltage-gated, type I, beta; SCN4B, sodium channel, voltage-gated, type IV, beta; SCN5A, sodium channel, voltage-gated, type V, alpha subunit; SHOC2, soc-2 suppressor of clear homolog (C. elegans); SNTA1, syntrophin, alpha 1 (dystrophin-associated protein A1, 59 kDa, acidic component); SOS1, son of sevenless homolog 1 (Drosophila); TAZ, tafazzin; TBX1, T-box 1; TBX20, T-box 20; TBX5, T-box 5; TCAP, titin-cap (telethonin); TDGF1, teratocarcinoma-derived growth factor 1; TGFB3, transforming growth factor, beta 3; TLL1, tolloid-like 1; TMEM43, transmembrane protein 43; TNNC1, troponin C (type 1: slow); TNNI3, troponin I type 3 (cardiac); TNNT2, troponin T type 2 (cardiac); TPM1, tropomyosin 1 (alpha); TRPM4, transient receptor potential cation channel, subfamily M, member 4; TTN, titin; VCL, vinculin; ZASP(LDB3), Z-band alternatively spliced PDZ motif-containing protein; ZFPM2, zinc finger protein, multitype 2; ZIC3, Zic family member 3 (odd-paired homolog, Drosophila).
#Indicated are the diseases in which a particular gene is involved: ++: generally accepted to be involved in; +: incidentally found to be involved in.
§For genes indicated in bold and in italics, a search of the Orphanet and GeneTests websites did not identify at least one international laboratory offering the test.

FUTURE DIAGNOSTIC APPROACHES AND FIRST APPLICATIONS

Since the early 1990s, enormous progress has been made in identifying the genetic causes of inherited or congenital heart diseases. So far, the hunt for causal genes has been performed using linkage and association techniques or array-CGH analysis, followed by mutational analysis of the genes in the candidate region(s). These methods resulted in the identification of causal genes encoding proteins that are parts of various cellular structures or pathways. The discovery that these structures or pathways are involved in these diseases led to candidate gene approaches that screen genes encoding other components of the same structures or pathways [31]. However, to date, only a small proportion of the allelic variants underlying disease have been discovered. For example, as mentioned previously, between 40% and 80% of patients/families with a cardiomyopathy are still without a genetic diagnosis. One reason is that these investigations are hampered by factors such as the relatively small number of affected individuals within families, which limits linkage-based approaches. In addition, it is not feasible to screen the genes encoding the proteins of these cellular structures or pathways on a gene-by-gene basis, since they contain hundreds of proteins whose encoding genes would need to be tested. Furthermore, proteins that are not involved in these structures or pathways may also play a role in disease development. To maximize the yield of genetic testing for patients with inherited cardiac disorders, we therefore need approaches that enable mutational screening of cardiac disease genes in one experiment and on a large scale. The novel genomic techniques, and some adaptations, that will permit this are discussed below.

1. Cardiochips and arrays

In recent years, several platforms were launched to facilitate the parallel processing of larger numbers of genes. Low-density DNA hybridization assays were already being used in the early 2000s to identify known mutations (including small deletions/insertions) in HCM [32]. Customized re-sequencing assays were developed to identify mutations in cardiomyopathy genes, exploiting the Affymetrix gene-chip re-sequencing array technology. Waldmüller et al. [33] reported the use of array-based re-sequencing for testing the three most commonly affected genes in HCM (MYH7, MYBPC3 and TNNT2), while Fokstuen et al. [34] demonstrated the use of a DNA re-sequencing array for detecting mutations in 16 HCM genes. In addition, Zimmerman et al. [35] demonstrated the efficient analysis of 19 genes implicated in DCM using a CardioChip. Nowadays, these cardiochips are being used in a significant number of molecular diagnostic laboratories and, until the applicability of NGS technology for molecular diagnostics has been convincingly proven, re-sequencing array methods provide the best approach to analyze multiple genes within one test. For example, Partners Healthcare, in cooperation with Harvard Medical School, offers genetic testing with their DCM CardioChip™ Test, the design of which is based on the CardioChip described by Zimmerman et al. [35]. However, this type of technology is still not very widespread in daily diagnostic practice, and the NGS applications that are currently being developed will most likely replace such array techniques in the near future.

2. Next generation sequencing

NGS techniques have recently become available that provide the opportunity to identify every unique variant in an individual genome via whole-genome re-sequencing [36][37]. The molecular basis of each type of technology is a DNA library preparation (including shearing the DNA, adapter ligation, and gel purification of DNA fragments of the desired size), the amplification of the resulting single strands, and the performance of sequencing reactions on the amplified strands. Using reaction chambers that contain huge amounts of such oligonucleotides, a large number of these arbitrary nucleotide strands can be analyzed in parallel in a single run. As a result, the nucleotide sequences (the so-called “reads”) of millions of different DNA fragments can be determined within a relatively short time. Depending on the question to be answered, subsequent bioinformatic analyses can translate these nucleotide sequences into useful information, e.g. reports on variants/mutations found in genes of interest, de novo assembly of genomic regions (up to a full genome), or copy number variations in parts of the genome or in the full genome. Various companies offer machines and solutions that use this highly promising technique (for extensive overviews see [38][39]). Although whole-genome approaches are still too expensive to be introduced at a diagnostic level, a personalized genome can now be produced. This was recently shown by Ashley and co-workers [40], who reported on the full genome of a patient with a family history of vascular disease and early SCD. However, in most cases targeted approaches will need to be applied to identify disease-causing mutations.
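How reads are turned into a report on variants can be illustrated with a strongly simplified sketch. It assumes that the reads have already been aligned to a reference sequence without gaps, counts the bases observed at each reference position, and reports positions where a non-reference base is supported by enough reads. The function names, the depth cut-off and the allele fraction cut-off are hypothetical and chosen for illustration only; production pipelines use dedicated aligners and variant callers that also model base quality and alignment uncertainty.

```python
from collections import Counter, defaultdict


def pileup(reads):
    """Count the observed bases per reference position.

    reads: iterable of (start, sequence) pairs, where start is the
           0-based reference position of the first base of an
           already-aligned, gap-free read.
    """
    counts = defaultdict(Counter)
    for start, sequence in reads:
        for offset, base in enumerate(sequence):
            counts[start + offset][base] += 1
    return counts


def call_variants(reference, reads, min_depth=4, min_fraction=0.25):
    """Report positions where a non-reference base is well supported.

    Returns a list of (position, ref_base, alt_base, depth, alt_fraction).
    """
    variants = []
    counts = pileup(reads)
    for position in sorted(counts):
        if position >= len(reference):
            continue  # read extends beyond the reference sequence
        depth = sum(counts[position].values())
        if depth < min_depth:
            continue  # too few reads to make a call
        ref_base = reference[position]
        for alt_base, n in counts[position].most_common():
            if alt_base == ref_base:
                continue
            fraction = n / depth
            if fraction >= min_fraction:
                variants.append((position, ref_base, alt_base, depth, fraction))
    return variants
```

In this sketch a heterozygous substitution is expected to appear in roughly half of the reads covering the position, which is why positions with low depth are skipped rather than called.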

3. Disease-specific targeted enrichment and re-sequencing

The human genome contains about 27,000 putative genes [41]. Logically, not all of these are associated with a given disease, and to apply NGS without having to sequence a full genome, we need methods for the targeted enrichment of DNA fragments encoding the known or suspected disease genes. As mentioned above, at least ~110 genes have been implicated in monogenic cardiac disorders, and mutation analysis on a gene-by-gene basis is not feasible. Several methods that may help to enrich these genes have become available in recent years, making use of either hybridization-based or PCR-based capturing. Hybridization-based enrichment generally utilizes probes complementary to the sequences of interest that are presented either in a solid phase, such as oligonucleotide microarrays, or in a solution phase, applying molecular inversion probe (MIP)-based or biotinylated RNA-based approaches [42][43]. To enrich for the sequences of interest, the total DNA is applied to the probes and the desired fragments hybridize. The non-targeted fragments are subsequently washed away, and the enriched DNA is eluted for re-sequencing. A recent proof-of-principle study convincingly demonstrated the applicability of solid-phase enrichment and subsequent NGS in a diagnostic context, using autosomal recessive ataxia as a prototypical heterogeneous monogenic disorder [44]. In this study, the complete genomic sequences (coding and noncoding regions) of seven genes known to cause autosomal recessive ataxia were presented on a NimbleGen sequence capture array. By hybridizing diagnostic samples onto this array, these were enriched for DNA fragments encoding parts of the seven genes. Subsequent re-sequencing using Roche 454 Titanium shotgun sequencing was used to determine the sensitivity and specificity of NGS of enriched samples for the identification of pathogenic mutations.
The enrichment showed high specificity: 80% of the sequences obtained were on target, meaning that they could be mapped back to the targeted gene regions. High sensitivity was also demonstrated: pathogenic mutations for 6/7 studied mutant alleles and more than 99% of known SNP variants were identified. Mutation and SNP detection accuracy was shown to be limited by sequence coverage and misalignment rather than by sequencing errors [44].

Methods have also been developed that enable the specific PCR-driven amplification of DNA fragments of interest. Of these, the microdroplet-based PCR enrichment technique of RainDance Technologies has been shown to be effective in the simultaneous amplification of almost 4,000 products [45]. In addition to this commercially available technology, several laboratories have developed their own approaches. For example, in a case study, long-range PCR enrichment was used to amplify 16 HCM genes for subsequent NGS [46]. For this purpose, primers and reactions were used that PCR-amplified DNA fragments of ~5100 nucleotides, with overlaps averaging 550 nucleotides, which together encompassed the full gene for 14 of the 16 genes. The resulting PCR fragments were gel purified, an equimolar pool of fragments was generated, and Roche 454 and Illumina DNA libraries were prepared. Subsequent sequencing on the respective machines showed that 95% and 90%, respectively, of the sequencing reads were on target, but with a pattern of variable coverage. The latter emphasizes the need for sufficient sequencing depth (see also sections 4 and 6). The variants identified could be confirmed by Sanger sequencing [46].

When using selected enrichment, it is important to realize that, for the efficient use of NGS machine capacity, the parallel sequencing of multiple patient samples in a single run is preferred. In such cases, barcoding the patient-specific samples prior to sequencing will aid in distinguishing the data of the different patient samples after their joint sequencing run. Barcoding is the simple technique of adding a unique nucleotide sequence to the adapter sequences that are ligated to the DNA fragments during the preparation of each patient’s library [47][48].
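The way barcodes separate pooled patient samples after a joint run can be sketched as follows. The sketch assumes that the barcode is read as the first bases of every read, that all barcodes have the same length, and that only exact matches are accepted. The function name and the patient identifiers are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a patient by its leading barcode.

    reads:    iterable of read sequences that start with the barcode.
    barcodes: dict mapping barcode sequence -> patient identifier;
              all barcodes are assumed to have the same length.
    Returns (per_patient, unassigned); the barcode is trimmed from
    every assigned read.
    """
    lengths = {len(barcode) for barcode in barcodes}
    if len(lengths) != 1:
        raise ValueError("barcodes must share one length")
    size = lengths.pop()

    per_patient = defaultdict(list)
    unassigned = []
    for read in reads:
        patient = barcodes.get(read[:size])
        if patient is None:
            unassigned.append(read)  # no exact barcode match
        else:
            per_patient[patient].append(read[size:])
    return dict(per_patient), unassigned
```

Reads without an exact barcode match are kept separately in this sketch rather than discarded, so that the fraction of unassigned reads can be inspected as a simple quality check of the run.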

4. Exome sequencing

Although selectively enriching a panel of genes will lead to the identification of the disease-causing gene or genes in significantly more cases, such a panel remains a selection of the genes encoded in the genome, so the causal gene may still not be identified in every patient using this approach. A more comprehensive alternative would therefore be to enrich an individual’s DNA for all the protein-encoding regions of the genome (“the exome”, which encompasses ~1% of the whole genome) and then perform NGS (exome sequencing; [42]). The method is the same as for targeted re-sequencing (see section 3.3); however, instead of using probes complementary to the coding sequences of a subset of genes, all known coding DNA fragments are presented as probes. Moreover, in contrast to sequencing a full genome, this approach is currently feasible for patients seeking a genetic diagnosis at cardiogenetic outpatient clinics.

Exome sequencing was recently shown to be a powerful tool for identifying candidate genes in a proof-of-concept experiment by Ng et al. that used four unrelated, affected individuals with the rare, autosomal dominant Freeman-Sheldon syndrome, which is known to be caused by mutations in the MYH3 gene [49]. Exome sequencing of these four patients and subsequent data analyses indeed led to the identification of causative mutations in MYH3. To evaluate the effectiveness of the exome sequencing method, Ng et al. then applied the same approach to find the gene responsible for Miller syndrome, a rare disorder characterized by facial dysmorphia and abnormalities of the extremities. As a result, a single candidate gene was identified. The subsequent screening of this gene by conventional Sanger sequencing in other, unrelated kindreds led to the identification of additional disease-causing mutations [50]. Other recent reports have demonstrated the successful application of exome sequencing in making a genetic diagnosis for various disorders [51][52][53]. Together, these studies show that exome sequencing is a powerful tool for identifying the causative genes in monogenic disorders. Thus, exome sequencing, rather than custom-designed enrichment techniques, might soon be the method of choice for DNA diagnostic purposes.

5. Combined approaches

The above examples demonstrate the potential power of exome sequencing. However, these cases describe the hunt for the causal disease gene in patients and families with very rare syndromes for which a thorough phenotypic classification was possible and had been performed. More importantly, finding the gene was simplified by the fact that the disease showed recessive inheritance, was caused by mutations in the same gene, or was caused by de novo mutations (although this was not certain for every disease at the time the analysis was started). Hunting disease genes in disorders that are known to be genetically very heterogeneous, like the cardiac disorders for which the development of dedicated diagnostics is the subject of this review, will probably be more challenging, and more sophisticated bioinformatic analysis techniques might be needed (see the section on “Challenges of future diagnostics”). Therefore, methods to narrow down the genomic regions of interest in specific patient cohorts or families might support the identification of causal genes in these disorders. For example, in families showing an X-linked inheritance pattern, the mutation might be identified by applying X-exome capture and sequencing, as demonstrated for terminal osseous dysplasia by Sun et al. [54]. When array-CGH analysis in a patient with a certain type of CHD results in the identification of a small genomic deletion or duplication, targeted re-sequencing of the generally limited number of genes in this region could be performed in a cohort of patients having the same disease. Finding mutations in one of these genes would then confirm the role of this candidate gene in CHDs.

The availability of linkage data or association peaks will provide the possibility of focusing on the specific part of the exome that originates from those regions, instead of analyzing the complete exome. As an example, exome sequencing in conjunction with homozygosity mapping led to the rapid identification of the causative allele for non-syndromic hearing loss in a consanguineous Palestinian family [55]. No such example has yet been reported for a monogenic cardiac disorder. Interestingly, however, using a haplotype sharing test, we were recently able to identify the causal MYH7 and PKP2 mutations in the shared regions of a single DCM family and multiple ARVC families, respectively [56]. In families whose shared regions do not contain already known cardiac disease genes, combining this haplotype approach with exome sequencing and data analysis of the genes in those regions will most likely lead to the identification of the causal allele. As exemplified in the Insulin Resistance Atherosclerosis Family Study (IRASFS), exome sequencing has also been useful in finding rare variants that may be a common explanation for linkage peaks observed in complex trait genetics [57]. It is important to note, however, that this could only be achieved because only a few families in the sample contributed significantly to a linkage signal and these families all carried the same rare variant. Thus, exploiting such a combined approach will often be limited to families with sufficient affected individuals to enable haplotype sharing analyses or the application of other linkage techniques.
Nevertheless, exome sequencing of larger groups of likely unrelated patients and subsequent data analysis and comparison will undoubtedly result in the identification of causative genes that are shared by two or more of these patients. Success will either be based on the presence of founder mutations in specific populations and, as a result, the presence of two or more patients within a cohort carrying the same gene mutation, or on the increasing chance of encountering two or more patients who carry different mutations, but lying in the same gene, when more exomes of patients with the same disease are sequenced.
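The cohort comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a published pipeline: the function name `recurrently_hit_genes`, the patient identifiers, the gene labels and the variant strings are all hypothetical, and the input is assumed to already contain only rare candidate variants per patient.

```python
from collections import defaultdict


def recurrently_hit_genes(variants_by_patient, min_patients=2):
    """Return genes that carry a candidate variant in at least `min_patients` patients.

    `variants_by_patient` maps a patient ID to a list of (gene, variant) pairs.
    Both founder mutations (same variant) and different mutations in the same
    gene count, because patients are grouped per gene, not per variant.
    """
    patients_per_gene = defaultdict(set)
    for patient, variants in variants_by_patient.items():
        for gene, _variant in variants:
            patients_per_gene[gene].add(patient)
    return {
        gene: sorted(patients)
        for gene, patients in patients_per_gene.items()
        if len(patients) >= min_patients
    }


# Hypothetical toy cohort (all names are placeholders).
cohort = {
    "P1": [("GENE_A", "c.100A>G"), ("GENE_B", "c.5C>T")],
    "P2": [("GENE_A", "c.100A>G")],                        # same variant as P1
    "P3": [("GENE_C", "c.7del"), ("GENE_B", "c.88G>A")],   # same gene as P1, other variant
}
shared = recurrently_hit_genes(cohort)
```

In this toy cohort, `GENE_A` is shared through an identical (founder-like) variant and `GENE_B` through two different variants, which mirrors the two routes to success named in the text.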

6. Other applications
In addition to DNA re-sequencing, other NGS applications will become available for clinical diagnostic purposes. Four of these are described below. (1) Coverage information (the number of reads covering a specific DNA sequence) of DNA re-sequencing runs can be used to identify copy number variations. Higher or lower coverage numbers of successive DNA sequence reads indicate duplications or deletions, respectively, of chromosomal regions, isolated genes or smaller parts thereof. This information is not often used at the moment, although it is automatically incorporated in the results of “ordinary” DNA re-sequencing [58][59]. (2) The NGS technique can support studies on biological interactions between DNA and proteins, like transcription factors, chromatin, or other DNA-binding proteins. By using chromatin immunoprecipitation (ChIP) and subsequent NGS of the DNA fragments, ChIP-derived sequences can be determined (ChIP-seq). ChIP-seq entails a series of steps: i, chemical cross-linking of DNA and associated proteins; ii, isolation and lysis of nuclei and subsequent DNA fragmentation; iii, use of an antibody against the DNA-binding protein of interest to specifically immunoprecipitate the associated protein:DNA complex; iv, reversal of the chemical cross-link and isolation of the DNA; v, sequencing of the resulting DNA fragments by NGS. Although a direct role of this technique in genetic diagnostics might as yet not be envisaged, this method will, for example, be important in identifying new candidate disease genes or in finding genes that are regulated by known disease genes and that may form targets for disease treatment. For example, this approach was used with the enhancer-associated protein p300 from mouse heart tissue (embryonic day 11.5) to identify over 3,000 candidate heart enhancers genome-wide [60]. (3) Instead of sequencing parts of the genome, the direct sequencing of mRNA molecules on a large scale can be performed using NGS platforms (RNA-seq; [61]). Since these molecules represent the nucleotide sequences that are translated into proteins, the probability that mutations identified at this level (the transcriptome) are truly expressed is higher than for those identified at the DNA level.
Moreover, using this approach, sequencing of tissue-specific RNA molecules can be performed, for example, enabling the identification of mutations specifically expressed in the heart or even in pre-determined cardiac cell types. Likewise, the nucleotide sequences of non-coding RNAs and/or microRNAs can be determined. (4) Comparable to copy number determination using coverage statistics of DNA re-sequencing results, figures on RNA expression levels can be discovered by using RNA sequencing data [62][63]. However, to identify mutations as well as to determine RNA expression levels in patients suffering from cardiac disease, myocardial sampling would be required. As this would require the use of invasive interventions, it is unlikely to become a regular application in standardized diagnostic work.
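The coverage-based copy number idea from application (1) can be illustrated with a small Python sketch. It is a simplified illustration only, not the method of [58] or [59]: the function name, the region labels and the ratio thresholds (1.3 and 0.7) are assumptions chosen for the example, and real data would need correction for GC content and capture efficiency.

```python
from statistics import median


def call_copy_number(sample_depth, reference_depth, gain=1.3, loss=0.7):
    """Label each region by its normalised read-depth ratio against a reference.

    Each profile is divided by its own median first, so that differences in
    total sequencing yield between the two runs cancel out. In this idealised
    model a heterozygous duplication gives a ratio near 1.5 and a heterozygous
    deletion a ratio near 0.5; the thresholds are hypothetical.
    """
    sample_median = median(sample_depth.values())
    reference_median = median(reference_depth.values())
    calls = {}
    for region, depth in sample_depth.items():
        ratio = (depth / sample_median) / (reference_depth[region] / reference_median)
        if ratio >= gain:
            calls[region] = "duplication"
        elif ratio <= loss:
            calls[region] = "deletion"
        else:
            calls[region] = "normal"
    return calls


# Hypothetical mean read depths per exon; the sample was sequenced twice as deep.
reference = {"exon1": 100, "exon2": 100, "exon3": 100, "exon4": 100, "exon5": 100}
sample = {"exon1": 200, "exon2": 300, "exon3": 100, "exon4": 200, "exon5": 200}
calls = call_copy_number(sample, reference)
```

Normalising by the median rather than the mean keeps a single large duplication or deletion from shifting the baseline of the whole profile.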

CHALLENGES OF FUTURE DIAGNOSTICS
Several NGS applications are now available to broaden the DNA diagnostic possibilities in cardiogenetics. These will certainly lead to maximized identification of the disease-causing mutations. However, the challenge in applying NGS is not so much producing the data, but its subsequent quality control, analysis and interpretation. In NGS experiments large amounts of data (up to gigabases of nucleotide sequences in a single run) are being produced. Therefore, where data quality is concerned, it is of the utmost importance for diagnostic purposes to have absolute confidence that every exon of interest, together with the flanking intronic sequence containing consensus splice site sequences, is being analyzed, and that fully reliable data is being produced. This probably implies that more stringent quality control criteria will be needed to fulfill clinical diagnostic requirements than those needed for purely research projects. Logically, although of utmost importance, this does not apply to the proper distinction between true- and false-positives, since these are inherent in both research and diagnostic applications. However, particularly when hybridization-based capturing approaches are being applied, the minimum exon coverage needed to obtain nearly complete certainty that a heterozygous mutation will be detected has to be carefully established. This is particularly challenging when GC-rich regions are concerned, as was demonstrated when the performance of NimbleGen 385K custom arrays for the re-sequencing of 22 genes, most of which are associated with hereditary colorectal cancer, was evaluated [64]. Since this might be of importance in certain disease entities, the minimum coverage has to be determined even more carefully if a mosaic situation is suspected, in which mutations or variants might be present at percentages below the 50% expected at heterozygosity.
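To make the coverage requirement concrete, the following Python sketch estimates the minimum read depth under a simple binomial sampling model. This is an assumption-laden illustration rather than an established diagnostic criterion: it ignores sequencing errors, capture bias and mapping artefacts, and the function names, the requirement of five variant reads and the 99% target are hypothetical choices.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_variant_reads):
    """P(at least `min_variant_reads` variant reads at a given depth), binomial model."""
    p_too_few = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(min_variant_reads)
    )
    return 1.0 - p_too_few


def minimum_depth(allele_fraction, min_variant_reads, target=0.99, max_depth=10000):
    """Smallest depth at which the detection probability reaches `target`."""
    for depth in range(min_variant_reads, max_depth + 1):
        if detection_probability(depth, allele_fraction, min_variant_reads) >= target:
            return depth
    raise ValueError("target not reached within max_depth")


# Heterozygous variant (expected allele fraction 0.5) versus a hypothetical
# mosaic variant present in 10% of reads, both requiring five variant reads.
heterozygous = minimum_depth(0.5, min_variant_reads=5)
mosaic = minimum_depth(0.1, min_variant_reads=5)
```

Under this model the mosaic case always needs a larger depth than the heterozygous case, which is the quantitative reason why the minimum coverage has to be set more carefully when mosaicism is suspected.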
In addition to taking care to obtain enough coverage of every DNA sequence of interest, there has to be absolute certainty that deletion/insertion mutations will be identified, as the difficulty in tracing these mutations is intrinsic to their characteristics. When custom-designed enrichment techniques are applied, the pooling of patient samples is needed to ensure the efficient use of sequencing flow cells. This implies the use of patient-specific barcoding and thus comprehensive tracking of each patient’s material. In addition, the efficient processing of larger numbers of patient samples, as is daily practice in diagnostic laboratories, will benefit from automated library preparation (for an example, see [65]). Moving these laboratory procedures to a robotic workstation will be an important next step in getting NGS into clinical molecular diagnostics. Finally, since the careful archiving of patient-related experimental results over longer periods of time is a prerequisite for good quality diagnostic care, methods to handle the storage of the huge datasets produced by NGS, as well as the minimal requirements for their storage, must be discussed. Undoubtedly, these quality control issues will be solved satisfactorily and reliable results from NGS analyses can then be communicated to the respective patients. More challenging, however, will be the interpretation of these results. Even when the number of analyzed genes is limited because of the use of targeted capturing or amplification, large numbers of variants will be identified for which putative pathogenicity has to be determined. When these variants concern nonsense or frameshift mutations, the origin of the affected sequences will need to be verified. Do these originate from pseudo-genes or pseudo-exons and can the respective variants therefore be discarded, or are these true truncating mutations? If the latter is the case, the disease-causing mutation might have been found. However, the largest number of variants identified will be missense mutations, i.e. the substitution of only one amino acid residue in the protein sequence concerned. The first step in analyzing the list of variants will be to use the correct methods and tools to reliably separate the possibly disease-related from the benign variants. The view now emerging in the field is that the most important reference database needed to perform such analysis is the one compiling all the results of NGS and/or Sanger sequencing experiments already performed. This might be an in-house collection (probably the preferred starting point), but ideally will be a database with sequencing data from several laboratories performing this type of work.
In addition, NCBI’s SNP database might be used as a reference database, although this also incorporates variants for which a pathogenic nature is not excluded. Moreover, NCBI’s SNP database might contain variants that are harmless for carriers in the heterozygous state, but disease-causing in a homozygous or compound heterozygous carrier. Taken together, the big challenge in distinguishing disease-causing mutations from rare but benign variants is to get to know the frequencies at which these rare variants are present in patient cohorts compared to their presence in the general population. Re-sequencing studies of disease genes have typically not subjected control populations to the same level of scrutiny as patient cohorts, and therefore these rare variants go undetected in controls. Therefore, to value variants identified in re-sequencing studies it is of major importance to know how many new variants can be expected when a new set of individuals of a given size is being sequenced. Interesting in this respect is that a recent study calculated that 350 individuals have to be sequenced to find all common variants (frequency at least 1%), whereas >3,000 individuals have to be analyzed to identify all variants with a frequency of at least 0.1% [66]. This underscores the importance of compiling open source databases containing large datasets of variants identified in re-sequencing experiments. After omitting all the variants identified more often in other sequenced individuals, the putative pathogenicity of the remaining variants needs to be determined. Several software tools can be applied, of which those that calculate the level of conservation of the affected nucleotide (e.g. GERP; [67]) and/or amino acid residue (e.g. phyloP; [68]) so far seem to provide the most important discriminating factor [53][69]. In addition, other aspects can be taken into account, like differences in the physico-chemical properties of the amino acids involved, the presence of the affected amino acid in a known functional domain, the known involvement of the affected gene in a comparable disease or other diseases in general, and/or the expression of the respective gene in the tissue or tissues of interest. Prediction programs like SIFT, Polyphen or MutPred combine knowledge on aspects useful in predicting the putative pathogenicity of variants [70][71]. Such programs might be incorporated into the pipelines that are being compiled for analyzing NGS data. Several data analysis pipelines have recently been published (e.g. [72][73][74]). Together, this information will result in a ranking of variants, in which the highest ranked variants will represent the most likely disease-causing ones. Next, additional experiments will have to be performed to verify the pathogenicity of the variants. First, carriership in the patient has to be confirmed by Sanger sequencing.
Second, if needed, the absence of the variant(s) in a large number of healthy controls has to be verified. Third, when possible and appropriate, co-segregation analysis within the relevant family has to be performed. And finally, certainly where exome sequencing approaches are concerned, the presence of the variants in other patients suffering from the same disease should be determined. Ideally, these combined approaches should facilitate the identification of either the main suspect or of a few suspects. However, in a lot of cases this will most likely not be the situation and further analyses, e.g. at the functional level, will be needed to identify the causal gene conclusively. It is also possible that several of the ranked variants may contribute to disease development, since this concept has now been well-established by studies in a considerable proportion of patients suffering from inherited cardiac diseases [18].
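The filter-then-rank strategy outlined in this section can be sketched as follows. This is a toy illustration under stated assumptions, not an existing pipeline and not the scoring used by GERP, phyloP, SIFT, Polyphen or MutPred: the `Variant` fields, the frequency cut-off of 0.1% and the additive weights are all hypothetical, and the conservation value is assumed to be pre-scaled to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str                    # placeholder gene label
    change: str                  # protein-level description
    reference_frequency: float   # frequency in the reference database
    conservation: float          # hypothetical conservation score, 0..1
    truncating: bool             # nonsense or frameshift
    in_functional_domain: bool   # lies in a known functional domain


def prioritise(variants, max_frequency=0.001):
    """Drop variants seen too often in reference data, then rank the rest."""
    rare = [v for v in variants if v.reference_frequency <= max_frequency]

    def score(v):
        # Arbitrary illustrative weights; a real pipeline would calibrate these.
        return (
            v.conservation
            + (1.0 if v.truncating else 0.0)
            + (0.5 if v.in_functional_domain else 0.0)
        )

    return sorted(rare, key=score, reverse=True)


candidates = [
    Variant("GENE_D", "p.Gly40Ala", 0.0, 0.2, False, False),
    Variant("GENE_B", "p.Ala20Val", 0.05, 0.95, False, True),   # too common
    Variant("GENE_C", "p.Ser30Asn", 0.0002, 0.8, False, True),
    Variant("GENE_A", "p.Arg10Ter", 0.0, 0.9, True, False),
]
ranked = prioritise(candidates)
ranked_genes = [v.gene for v in ranked]
```

The ranking only orders candidates for follow-up; the verification steps listed in the text (Sanger confirmation, controls, co-segregation, other patients) still decide whether a variant is accepted as causal.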

CONCLUSION
Recent developments in genome-wide screening techniques have created exciting possibilities for taking genetic diagnostics and research to a higher level. The availability of SNP arrays is enabling not only the hunt for associations in larger groups of patients with multifactorial or polygenic diseases, but also the identification of disease-causing genomic regions in affected families. More importantly, using NGS techniques, the content of an individual’s complete genome, or larger parts of the genome, can now be determined at the single nucleotide level, certainly with capturing techniques that allow one to zoom in on all protein-encoding DNA fragments (the exome) or specific subsets of the exome. Applying NGS technology will greatly enhance the possibilities to identify new disease genes and should provide a unique way to reduce screening times and maximize mutation detection rates in clinical molecular diagnostics. But perhaps even more importantly, it may help decrease the costs of genetic testing per individual if all the relevant disease genes can be tested in parallel. Since a large number of putative disease genes may underlie disease in the field of cardiogenetics and multiple genes might contribute to disease development, the exploitation of NGS techniques will provide the field with the optimal diagnostic genetic toolbox now available. As discussed here and in the next section, it is important to realize that although NGS is a highly promising technique, the results must be treated with great care and there is still much to learn.

EXPERT OPINION
Current clinical care and molecular diagnostics of inherited heart diseases and CHDs suffer from laborious, time-consuming, costly procedures and the limited possibilities to screen all the known genes involved in the respective disease. This results in incomplete, expensive diagnostic work and long reporting times. NGS technology provides unique solutions and will bring shorter reporting times, maximize mutation detection rates, and decrease costs if all the disease-related genes can be tested in parallel. Although the re-sequencing of a whole genome could technically already be applied, this is not yet financially feasible. However, we expect the genetic diagnostics of cardiac diseases to gradually grow to a point at which a personalized cardio-related genome is produced for every patient visiting a cardiogenetics outpatient clinic. Laboratories involved in the diagnostics of cardiogenetics will probably first focus on developing procedures for targeted enrichment of cardiac disease genes. Depending on the design, these procedures will facilitate the parallel analysis of a minimum of ten (in ARVD/C ~10 causative genes are now known) and up to several hundred genes. Notably, cost analysis of re-sequencing many of the common genes has shown that this is much more cost effective than other current methods [27][28]. However, since the mutation detection rate will still be limited when targeted capturing is applied, we expect the genetic diagnostics of cardiac disease to quickly move towards the exome sequencing of patient samples. As soon as exome sequencing and its data analysis and interpretation become daily practice and the $1000 genome comes within reach [75][76], the re-sequencing of a complete genome will enter the field of clinical diagnostics. However, there are ethical and practical considerations around re-sequencing an entire exome or genome that should not be ignored (see also: [77]) and these are elaborated below. Although such investigations will be implemented to test patients referred for a particular disease (in our case, a cardiac disorder), exome or whole genome sequencing may also reveal mutations related to completely different diseases, or variants of unknown significance that might cause unnecessary anxiety. Thus, methods to either mask results or filter only those results relevant to the diagnostic request should be considered, and the analysis of data outside the known disease genes should only be performed after informed consent has been given by the patient. Of course, we could choose to simply discard the irrelevant data, but the danger is that variants that appear unimportant today may be shown to be disease-related in the future. Together, these new technical advances demand that patients be comprehensively counseled. On the practical side, the most important consideration concerns the ascertainment of variants with uncertain clinical significance [78]. Undoubtedly, the only reasonable way to deal with this problem is to pursue maximum data dissemination in the scientific community. This will require the construction of databases for all or much of the data from exome and whole genome sequencing projects, like the recent 1000 genomes project [77][78]. This would serve as a reference database for benign variants, although the difficulty remains of how to decide whether a variant is benign or disease-causing.
This means we also need databases to compile mutations and variants that are identified in patients and can be related to disease. These should preferably be disease-specific databases, rather than locus-specific databases, as the information should not be limited to information on the specific variant. In addition, information about the context in which the variant was identified should be documented, e.g. the phenotype/characteristics of the patient carrying the variant, the co-existence of variants in the same or other disease genes, the co-segregation of the variant with disease in the patient’s family, phenotypic details on other affected family members, the carriership frequency of the variant in a larger patient cohort, etc. Early initiatives in this direction have been reported in the last decade in the field of cardiogenetics, e.g. the database on genetic mutations in inherited arrhythmias [79]; the human intermediate filament database [80][81]; and the ARVD/C database [82][83]. In addition, data analysis pipelines should be developed and/or improved to combine all the available data on a specific variant, including the results of in silico prediction programs, to support the deciphering of a variant’s clinical significance. To accomplish this, clinical and molecular geneticists and bioinformaticians should collaborate closely. In conclusion, although there are still several hurdles to overcome before NGS can be implemented in the clinical molecular diagnostics of genetic cardiac diseases, this new technology will soon be making an impact on molecular cardiogenetics. Molecular diagnosticians, clinical geneticists and genetic counselors, cardiologists and pediatric cardiologists, and other physicians involved in the diagnosis and treatment of inherited cardiac diseases therefore need to start learning about NGS and get comfortable with this new technique. Finally, despite the technological, bioinformatical and ethical problems discussed here, the use of NGS technology will certainly lead to much improved and more effective diagnostic and preventive care for patients suffering from inherited heart disease and CHD, and for their relatives.
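The contextual information proposed in this section for a disease-specific variant database could be captured in a record type such as the one sketched below. This is only an illustration of the idea: the class name, field names and example values are hypothetical and do not describe the schema of any of the databases cited in [79] to [83].

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariantObservation:
    """One observation of a variant in one patient, with its clinical context."""
    gene: str
    variant: str
    patient_phenotype: str
    co_occurring_variants: List[str] = field(default_factory=list)
    segregates_with_disease: Optional[bool] = None   # None means not tested
    affected_relatives_phenotypes: List[str] = field(default_factory=list)


def carrier_frequency(observations, gene, variant, cohort_size):
    """Fraction of a patient cohort in which the given variant was observed."""
    carriers = sum(
        1 for o in observations if o.gene == gene and o.variant == variant
    )
    return carriers / cohort_size


# Hypothetical entries (placeholder gene and variant labels).
records = [
    VariantObservation("GENE_A", "p.Arg10Ter", "DCM", segregates_with_disease=True),
    VariantObservation("GENE_A", "p.Arg10Ter", "DCM", co_occurring_variants=["GENE_B:p.Ala20Val"]),
    VariantObservation("GENE_C", "p.Ser30Asn", "ARVD/C"),
]
frequency = carrier_frequency(records, "GENE_A", "p.Arg10Ter", cohort_size=200)
```

Storing one record per observation, rather than one per variant, keeps the phenotype and family context attached to each carrier, which is the argument made in the text for disease-specific over locus-specific databases.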

ACKNOWLEDGEMENTS
We thank Jackie Senior for editorial assistance to the authors during the preparation of this manuscript.

REFERENCES
1. Gaunt RT, Lecutier MA. Familial cardiomegaly. Br Heart J 1956;18:251-8
2. Paley DH. Familial cardiac arrhythmia. Trans Am Coll Cardiol 1952;2:216-26
3. Schwartz PJ. The congenital long QT syndromes from genotype to phenotype: clinical implications. J Intern Med 2006;259:39-47
4. Lehnart SE, Ackerman MJ, Benson DW Jr et al. Inherited arrhythmias: a National Heart, Lung, and Blood Institute and Office of Rare Diseases workshop consensus report about the diagnosis, phenotyping, molecular mechanisms, and therapeutic approaches for primary cardiomyopathies of gene mutations affecting ion channel function. Circulation 2007;116:2325-45
5. Bos JM, Towbin JA, Ackerman MJ. Diagnostic, prognostic, and therapeutic implications of genetic testing for hypertrophic cardiomyopathy. J Am Coll Cardiol 2009;54:201-11
6. Ashrafian H, Watkins H. Reviews of translational medicine and genomics in cardiovascular disease: new disease taxonomy and therapeutic implications cardiomyopathies: therapeutics based on molecular phenotype. J Am Coll Cardiol 2007;49:1251-64
7. Pierpont ME, Basson CT, Benson DW Jr et al. Genetic basis for congenital heart defects: current knowledge: a scientific statement from the American Heart Association Congenital Cardiac Defects Committee, Council on Cardiovascular Disease in the Young: endorsed by the American Academy of Pediatrics. Circulation 2007;115:3015-38
8. Wessels M, Willems P. Genetic factors in non-syndromic congenital heart malformations. Clin Genet 2010;78:103-23
9. Drenthen W, Boersma E, Balci A et al. Predictors of pregnancy complications in women with congenital heart disease. Eur Heart J 2010;31:2124-32
10. Botto LD, Lin AE, Riehle-Colarusso T et al. Seeking causes: Classifying and evaluating congenital heart defects in etiologic studies. Birth Defects Res A Clin Mol Teratol 2007;79:714-27
11. Kirk EP, Sunde M, Costa MW et al. Mutations in cardiac T-box factor gene TBX20 are associated with diverse cardiac pathologies, including defects of septation and valvulogenesis and cardiomyopathy. Am J Hum Genet 2007;81:280-91
12. Monserrat L, Hermida-Prieto M, Fernandez X et al. Mutation in the alpha-cardiac actin gene associated with apical hypertrophic cardiomyopathy, left ventricular non-compaction, and septal defects. Eur Heart J 2007;28:1953-61
13. Geisterfer-Lowrance AA, Kass S, Tanigawa G et al. A molecular basis for familial hypertrophic cardiomyopathy: a beta cardiac myosin heavy chain gene missense mutation. Cell 1990;62:999-1006
14. www.orpha.net
15. www.ncbi.nlm.nih.gov/sites/GeneTests
16. van Spaendonck-Zwarts KY, van den Berg MP, van Tintelen JP. DNA analysis in inherited cardiomyopathies: current status and clinical relevance. Pacing Clin Electrophysiol 2008;31 Suppl 1:S46-9
17. Kamisago M, Sharma SD, DePalma SR et al. Mutations in sarcomere protein genes as a cause of dilated cardiomyopathy. N Engl J Med 2000;343:1688-96
18. Bienengraeber M, Olson TM, Selivanov VA et al. ABCC9 mutations identified in human dilated cardiomyopathy disrupt catalytic KATP channel gating. Nat Genet 2004;36:382-7
19. McNair WP, Ku L, Taylor MR et al. SCN5A mutation associated with dilated cardiomyopathy, conduction disorder, and arrhythmia. Circulation 2004;110:2163-7
20. Kelly M, Semsarian C. Multiple mutations in genetic cardiovascular disease: a marker of disease severity? Circ Cardiovasc Genet 2009;2:182-90
21. Hershberger RE, Cowan J, Morales A et al. Progress with genetic cardiomyopathies: screening, counseling, and testing in dilated, hypertrophic, and arrhythmogenic right ventricular dysplasia/cardiomyopathy. Circ Heart Fail 2009;2:253-61
22. Campuzano O, Beltrán-Alvarez P, Iglesias A et al. Genetics and cardiac channelopathies. Genet Med 2010;12:260-7
23. Hutchison CA 3rd. DNA sequencing: bench to bedside and beyond. Nucleic Acids Res 2007;35:6227-37
24. Schouten JP, McElgunn CJ, Waaijer R et al. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification. Nucleic Acids Res 2002;30:e57
25. Koopmann TT, Alders M, Jongbloed RJ et al. Long QT syndrome caused by a large duplication in the KCNH2 (HERG) gene undetectable by current polymerase chain reaction-based exon-scanning methodologies. Heart Rhythm 2006;3:52-5
26. Bhuiyan ZA, van den Berg MP, van Tintelen JP et al. Expanding spectrum of human RYR2-related disease: new electrocardiographic, structural, and genetic features. Circulation 2007;116:1569-76
27. Sen-Chowdhry S, Morgan RD, Chambers JC et al. Arrhythmogenic cardiomyopathy: etiology, diagnosis, and treatment. Annu Rev Med 2010;61:233-53
28. Thienpont B, Mertens L, de Ravel T et al. Submicroscopic chromosomal imbalances detected by array-CGH are a frequent cause of congenital heart defects in selected patients. Eur Heart J 2007;28:2778-84
29. Li F, Lisi EC, Wohler ES et al. 3q29 interstitial microdeletion syndrome: an inherited case associated with cardiac defect and normal cognition. Eur J Med Genet 2009;52:349-52
30. Thienpont B, Zhang L, Postma AV et al. Haploinsufficiency of TAB2 causes congenital heart defects in humans.
31. Ahmad F, Seidman JG, Seidman CE. The genetic basis for cardiac remodeling. Annu Rev Genomics Hum Genet 2005;6:185-216
32. Waldmüller S, Freund P, Mauch S et al. Low-density DNA microarrays are versatile tools to screen for known mutations in hypertrophic cardiomyopathy. Hum Mutat 2002;19:560-9
33. Waldmüller S, Müller M, Rackebrandt K et al. Array-based resequencing assay for mutations causing hypertrophic cardiomyopathy. Clin Chem 2008;54:682-7
34. Fokstuen S, Lyle R, Munoz A et al. A DNA resequencing array for pathogenic mutation detection in hypertrophic cardiomyopathy. Hum Mutat 2008;29:879-85
35. Zimmerman RS, Cox S, Lakdawala NK et al. A novel custom resequencing array for dilated cardiomyopathy. Genet Med 2010;12:268-78
36. Wheeler DA, Srinivasan M, Egholm M et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 2008;452:872-6
37. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol 2008;26:1135-45
38. Mardis ER. Next generation DNA sequencing methods. Annu Rev Genomics Hum Genet 2008;9:387-402
39. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet 2010;11:31-46
40. Ashley EA, Butte AJ, Wheeler MT et al. Clinical assessment incorporating a personal genome. Lancet 2010;375:1525-35
41. Venter JC, Adams MD, Myers EW et al. The sequence of the human genome. Science 2001;291:1304-51
42. Hodges E, Xuan Z, Balija V et al. Genome-wide in situ exon capture for selective resequencing. Nat Genet 2007;39:1522-7
43. Gnirke A, Melnikov A, Maguire J et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol 2009;27:182-9
44. Hoischen A, Gilissen C, Arts P et al. Massively parallel sequencing of ataxia genes after array-based enrichment. Hum Mutat 2010;31:494-9
45. Tewhey R, Warner JB, Nakano M et al. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol 2009;27:1025-31
46. Voelkerding KV, Dames S, Durtschi JD. Next generation sequencing for clinical diagnostics-principles and application to targeted resequencing for hypertrophic cardiomyopathy: a paper from the 2009 William Beaumont Hospital Symposium on Molecular Pathology. J Mol Diagn 2010;12:539-51
47. Meyer M, Stenzel U, Myles S et al. Targeted high-throughput sequencing of tagged nucleic acid samples. Nucleic Acids Res 2007;35:e97
48. Parameswaran P, Jalili R, Tao L et al. A pyrosequencing-tailored nucleotide barcode design unveils opportunities for large-scale sample multiplexing. Nucleic Acids Res 2007;35:e130
49. Ng SB, Turner EH, Robertson PD et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 2009;461:272-6
50. Ng SB, Buckingham KJ, Lee C et al. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 2010;42:30-5
51. Choi M, Scholl UI, Ji W et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci USA 2009;106:19096-101
52. Lalonde E, Albrecht S, Ha KC et al. Unexpected allelic heterogeneity and spectrum of mutations in Fowler syndrome revealed by next-generation exome sequencing. Hum Mutat 2010;31:918-23
53. Hoischen A, van Bon BW, Gilissen C et al. De novo mutations of SETBP1 cause Schinzel-Giedion syndrome. Nat Genet 2010;42:483-5
54. Sun Y, Almomani R, Aten E et al. Terminal osseous dysplasia is caused by a single recurrent mutation in the FLNA gene. Am J Hum Genet 2010;87:146-53
55. Walsh T, Shahin H, Elkan-Miller T et al. Whole exome sequencing and homozygosity mapping identify mutation in the cell polarity protein GPSM2 as the cause of nonsyndromic hearing loss DFNB82. Am J Hum Genet 2010;87:90-4
56. van der Zwaag PA, van Tintelen JP, Gerbens F et al. Haplotype sharing test maps genes for familial cardiomyopathies. Clin Genet 2010, in press
57. Bowden DW, An SS, Palmer ND et al. Molecular basis of a linkage peak: exome sequencing and family-based analysis identify a rare genetic variant in the ADIPOQ gene in the IRAS Family Study. Hum Mol Genet 2010, in press
58. Yoon S, Xuan Z, Makarov V et al. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res 2009;19:1586-92
59. Alkan C, Kidd JM, Marques-Bonet T et al. Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 2009;41:1061-7
60. Blow MJ, McCulley DJ, Li Z et al. ChIP-Seq identification of weakly conserved heart enhancers. Nat Genet 2010;42:806-10
61. Cirulli ET, Singh A, Shianna KV et al. Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. Genome Biol 2010;11:R57
62. Marioni JC, Mason CE, Mane SM et al. RNA-seq: an assessment of technical reproducibility and comparison with arrays. Genome Res 2008;18:1509-17
63. Torres TT, Metta M, Ottenwälder B et al. Gene expression profiling by massively parallel sequencing. Genome Res 2008;18:172-7
64. Hoppman-Chaney N, Peterson LM, Klee EW et al. Evaluation of oligonucleotide sequence capture arrays and comparison of next-generation sequencing platforms for use in molecular diagnostics. Clin Chem 2010;56:1297-306
65. Farias-Hesson E, Erikson J, Atkins A et al. Semi-automated library preparation for high-throughput DNA sequencing platforms. J Biomed Biotechnol 2010;2010:617469
66. Ionita-Laza I, Lange C, Laird NM. Estimating the number of unseen variants in the human genome. Proc Natl Acad Sci USA 2009;106:5008-13
67. Cooper GM, Goode DL, Ng SB et al. Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nat Methods 2010;7:250-1
68. Pollard KS, Hubisz MJ, Rosenbloom KR et al. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 2010;20:110-21
69. Ng SB, Bigham AW, Buckingham KJ et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat Genet 2010;42:790-3
70. Xi T, Jones IM, Mohrenweiser HW. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function. Genomics 2004;83:970-9
71. Li B, Krishnan VG, Mort ME et al. Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics 2009;25:2744-50
72. De Schrijver JM, De Leeneer K, Lefever S et al. Analysing 454 amplicon resequencing experiments using the modular and database oriented Variant Identification Pipeline. BMC Bioinformatics 2010;11:269
73. Blankenberg D, Gordon A, Von Kuster G et al. Manipulation of FASTQ data with Galaxy. Bioinformatics 2010;26:1783-5
74. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010;38:e164
75. Qiu J, Hayden EC. Genomics sizes up. Nature 2008;451:234
76. Hayden EC. International genome project launched. Nature 2008;451:378-9
77. ten Bosch JR, Grody WW. Keeping up with the next generation: massively parallel sequencing in clinical diagnostics. J Mol Diagn 2008;10:484-92
78. Ho CY, MacRae CA. Defining the pathogenicity of DNA sequence variation. Circ Cardiovasc Genet 2009;2:95-7
79. www.fsm.it
80. www.interfil.org
81. Szeverenyi I, Cassidy AJ, Chung CW et al. The Human Intermediate Filament Database: comprehensive information on a gene family involved in many human diseases. Hum Mutat 2008;29:351-60
82. www.arvcdatabase.info
83. van der Zwaag PA, Jongbloed JD, van den Berg MP et al. A genetic variants database for arrhythmogenic right ventricular dysplasia/cardiomyopathy. Hum Mutat 2009;30:1278-83

NOVEL CLINICAL MOLECULAR DIAGNOSTIC METHODS 49

CHAPTER 2 CANDIDATE GENE SCREENING

Chapter 2.1

Mutational characterisation of RBM20 in dilated cardiomyopathy and other cardiomyopathy subtypes

Anna Posafalvi, Ludolf G Boven, Cindy Weidijk, Paul A van der Zwaag, Jan G Post, Karin Y van Spaendonck-Zwarts, Imke Christiaans, Maarten P van den Berg, Robert MW Hofstra, Gerard J te Meerman, Richard J Sinke, J Peter van Tintelen, Jan DH Jongbloed

Manuscript in preparation

ABSTRACT

Introduction: Dilated cardiomyopathy (DCM) is an insidious disease of the myocardium, leading to impaired heart function. RBM20, a recently discovered gene associated with DCM, encodes the RNA-binding motif protein 20. This protein is involved in the tissue-specific splicing of titin and several other proteins with an essential function in the heart, and the gene is known to have a five amino acid mutation hotspot in exon 9.

Objectives: Our aim was to perform mutation screening of RBM20 in a Dutch cardiomyopathy patient cohort, and to design a splicing assay for evaluation of the pathogenicity of the identified variants.

Methods and results: We performed mutation screening of RBM20 by Sanger sequencing (DCM patients only; n=436) and by gene-panel-based (targeted) next generation sequencing (all cardiomyopathy subtypes included; n=1311 patients). This resulted in the identification of 18 novel and 5 known, rare missense variants in 35 probands. In total, 10 likely pathogenic or pathogenic missense mutations were identified: 7 missense variants in the RS-rich domain, 1 novel missense variant in the RNA-recognition motif and 2 outside of these important domains. Peripartum cardiomyopathy (PPCM) was observed in two DCM families carrying RBM20 mutations (p.R636H and p.S637N). A differential splicing assay of the known RBM20 target LDB3 was developed to evaluate the predicted pathogenicity of 11 missense variants, but the results were inconclusive.

Conclusion: Our study identified a significant number of novel and potentially pathogenic RBM20 mutations, particularly in the RS domain. In addition, known hotspot mutations were identified. Analysis of all cardiomyopathy subtypes by next generation sequencing revealed that likely pathogenic or pathogenic mutations were found almost exclusively in DCM patients. The RBM20 mutations that we found in patients with PPCM were not unexpected, as previous studies had shown that TTN mutations are a frequent cause of PPCM. RBM20 mutations lead to a shifted isoform composition of titin, and TTN is the best characterized splicing target of RBM20.

Keywords: dilated cardiomyopathy, RBM20, genetic screening, differential splicing assay

INTRODUCTION

The RNA-binding motif protein 20 gene, RBM20, is a known disease gene in dilated cardiomyopathy (DCM) (Brauch et al, Li et al 2010, Millat et al, Refaat et al, Guo et al 2012, Wells et al). The initial reports described missense mutations clustering in a hot-spot between amino acids 634-638, within an RS-rich domain of unknown function. It has also been observed that RBM20 mutation carriers sometimes exhibit an unusually severe phenotype, or early onset of the disease (Brauch et al).

Recent studies showed that RBM20 is involved in the heart-specific splicing of the mRNA molecules of several target genes. Some of these targets, such as the sarcomeric giant titin gene (TTN), the cytoskeletal LIM domain binding 3 gene (LDB3), the calcium/calmodulin-dependent protein kinase II delta gene (CAMK2D), and the gene encoding one of the voltage-dependent calcium channel proteins involved in membrane polarisation (CACNA1C) (Guo et al 2012), have been linked to cardiomyopathy but also to other cardiac phenotypes. Li et al (2013) demonstrated complex exon skipping and shuffling patterns of titin in Rbm20-/- rats. These animals were shown to suffer from ultrastructural changes in the heart, including Z line streaming, abnormal myofibril width and orientation, and aggregation of mitochondria (Guo et al 2013). A more recent study, in which RBM20 molecules were crosslinked to their RNA targets, followed by immunoprecipitation and RNA sequencing (CLIP-Seq), suggests a role for RBM20 in differential splicing of PDLIM3, LMO7, RTN4 and RYR2 as well, while confirming its previously observed role in splicing of CAMK2D, LDB3 and TTN (Maatz et al). Finally, the phenotypic influence of RBM20 knock down was studied in a mouse stem cell model, which led to the identification of intracellular enrichment of actin stress fibres and thin, elongated sarcomeric ultrastructures during cardiac differentiation, an observation that agrees with the abnormal sarcomere geometry in the thin, weakened myocardium of DCM patients (Beraldi et al). This study also showed disturbed Ca2+ handling in shRBM20 cardiomyocytes and abnormal splicing of TTN and CAMK2D.

To date, the extent to which RBM20 contributes to DCM in the Netherlands has not been studied, nor has the putative role of the gene in other cardiomyopathy subtypes. The aim of this study was to determine whether RBM20 mutations are responsible for DCM and other types of cardiomyopathy in the Dutch population and to gain functional evidence for the pathogenicity of some of the identified variants.

RBM20 IN DILATED CARDIOMYOPATHY

METHODS

Mutational screening

Patients: Written informed consent was obtained from index patients and their relatives involved in the first phase of the study, with approval of the medical ethics committees of the participating hospitals. Patients included in this phase fulfilled the formal diagnostic criteria for DCM (Mestroni et al). Cardiomyopathy patients in the second phase of this study were evaluated in a standard diagnostic setting. The clinical diagnosis of these patients was inferred from the referral diagnosis to our laboratory, and included a variety of cardiomyopathy subtypes.

DNA sequencing: Peripheral blood was collected from cardiomyopathy patients and, when applicable, their relatives. DNA was isolated according to standard protocols. In the first phase of this study, Sanger sequencing of the whole or part of the RBM20 gene was performed in 436 DCM patients. For this purpose, 19 amplicons covering all exons and the flanking intronic regions of the RBM20 gene were amplified by AmpliTaq Gold (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA) using a standardized PCR protocol (for primer sequences see table S1). After purification, the PCR products were sequenced on an ABI 3730xl sequencer (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA). Resulting sequences were visualized for analysis by Mutation Surveyor software (SoftGenetics LLC, State College, PA, USA). In the second phase of this study, patients’ DNA (n=1311) was analyzed using our gene-panel-based targeted next generation sequencing (NGS) method, which includes RBM20, as described previously (Sikkema-Raddatz et al, chapters 4.1 and 4.2).

Variant classification: The identified genetic variants were classified as ‘benign’ (B), ‘likely benign’ (LB), ‘variant of unknown significance’ (VOUS), ‘likely pathogenic’ (LP), or ‘pathogenic’ (P) as described in chapter 4.1. In short, our classification was based on the changes in the physicochemical nature of the affected amino acid residues, the evolutionary conservation of the respective residue and of the region harbouring the variant, the pathogenicity predicted by various software tools (such as AGVGD, PolyPhen, SIFT, and MutationTaster), and data from the literature and online databases when available. Variant frequency information from additional databases (dbSNP, ExAC (including 1000 Genomes and ESP6500), and GoNL) was taken into consideration.

Marker analysis: Several genetic markers (and their respective primers) were selected from the deCODE high-resolution genetic map in the region 5 cM both upstream and downstream of the R634W mutation, as shown in table S2a. Lengths of PCR products were determined on the ABI 3730xl DNA Analyzer (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA) and the results were analyzed using the GeneMapper v 4.1 software (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA).

Functional analysis

Plasmids: Flag-tagged, kanamycin-resistant pCMV6-Entry vectors containing the human RBM20 cDNA sequences (wild type, and two mutants: R634P and Y681C) were obtained from OriGene Technologies Inc (Rockville, MD, USA). Site-directed mutagenesis was applied to introduce additional variants as follows:
(1) primer pairs were designed that could anneal back to back to the plasmid and for this purpose were phosphorylated at the 5’ end; the desired mutation was introduced in the middle of the forward or reverse primer, with about 10-15 perfectly matched nucleotides upstream and downstream of the mutated nucleotide;
(2) the mutation was introduced with a PCR reaction, using 2.5 ng wild-type vector, the mutation-specific primers and the Phusion Site-Directed Mutagenesis Kit (Thermo Fisher Scientific, Waltham, MA, USA);
(3) PCR products were visualized on a 1% agarose gel and, when produced in sufficient amounts, circularized by Quick T4 DNA ligase (New England Biolabs, Ipswich, MA, USA);
(4) plasmids were transformed into competent DH5α E. coli cells, and colonies selected for kanamycin resistance were mini-cultured and the plasmids isolated using the GeneJET Plasmid Miniprep Kit (Thermo Fisher Scientific, Waltham, MA, USA);
(5) plasmids were Sanger-sequenced and the presence of the introduced mutations was confirmed.
Tables S3 and S4 show the primers used for site-directed mutagenesis and for prior and subsequent plasmid sequencing.

Transfection, cell lines: HEK293 cells were cultured in a 6-well culture plate in DMEM containing 10% foetal bovine serum and 1% penicillin-streptomycin. Transfection was performed using 150 µl serum-free DMEM including 1 µg plasmid DNA and 5 µl polyethylenimine per sample. Cells were cultured at 37°C. RNA was isolated with the standard Trizol method 48 hours after transfection, and subsequently reverse transcribed into cDNA using oligo(dT)18 primers and the RevertAid H Minus First Strand cDNA Synthesis Kit (Fermentas, Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer’s instructions.

Differential splicing assay: First, PCR and gel electrophoresis of cDNA samples were performed to enable several quality checks, including analysis of the expression of house-keeping genes and RBM20 mRNA, and sequencing of RBM20 to confirm the presence of the introduced mutations (PCR and sequencing reactions were performed as described above; the primers used for the analysis of RBM20 expression are shown in table S5). As several putative target mRNA molecules were available for analysis of potential effects of RBM20 mutations on differential splicing, the splicing patterns of these molecules in non-transfected (endogenous), wild type and hotspot mutation (R634P) transfected HEK293 cells were evaluated (primers designed for amplification of target mRNA molecules are shown in table S6). Based on this evaluation, we selected LDB3 for further follow-up. As a start, we extracted the separated PCR products of various lengths from agarose gel and sequenced them. Next, PCR conditions for amplification of LDB3 were optimised to terminate amplification before reaching saturating conditions (36 cycles; 62°C annealing), while allowing parallel quantification of several PCR products of different fragment sizes. Then, 16 samples (2 cDNA samples of non-transfected HEK293 cells, 2 of cells transfected with plasmid containing wild type RBM20, 11 of cells transfected with plasmids carrying a single nucleotide variant each, and 1 non-template control) were placed on a 96-well PCR plate in triplicate (PCR experimental replicates). To avoid possible plate-position-related effects, samples were distributed according to computer-generated random positions. All LDB3 mRNA products were PCR-amplified. Resulting PCR products were then purified using the High Pure PCR Product Purification Kit (Roche Applied Science, Penzberg, Germany).
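The randomised plate layout described above can be illustrated with a short sketch. It is not the procedure used in the study: the function name, the sample labels and the fixed seed are hypothetical and serve only to show how 16 samples in triplicate (48 reactions) could be assigned to random wells of a 96-well plate.

```python
import random


def randomized_plate_layout(samples, replicates=3, rows="ABCDEFGH", cols=12, seed=None):
    """Assign every sample replicate to a distinct, randomly chosen well.

    Hypothetical helper for illustration only; returns {well: (sample, replicate)}.
    """
    wells = [f"{row}{col}" for row in rows for col in range(1, cols + 1)]
    reactions = [(s, rep) for s in samples for rep in range(1, replicates + 1)]
    if len(reactions) > len(wells):
        raise ValueError("more reactions than wells on the plate")
    rng = random.Random(seed)                     # seeded so the layout can be reproduced
    chosen = rng.sample(wells, len(reactions))    # sampling without replacement
    return dict(zip(chosen, reactions))


# Placeholder labels for the 16 samples (the real sample names are not given here).
samples = [f"sample_{i:02d}" for i in range(1, 17)]
layout = randomized_plate_layout(samples, replicates=3, seed=42)

for well in sorted(layout)[:5]:
    print(well, layout[well])
```

Recording the seed together with the layout makes it possible to regenerate the same well assignment later.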
Subsequently, aliquots of these samples were transferred to a 384-well plate and loaded on the Caliper GX capillary electrophoresis system (Perkin Elmer, Waltham, MA, USA), which separates PCR products by size and quantifies and visualizes the separated products. Each sample was quantified three times (capillary electrophoresis experimental replicates).

Data analysis: We first performed peak detection for the three dominant product sizes that correspond to the previously sequenced fragments (the peaks were detected at approximate fragment lengths corresponding to the 245, 472 and 613 bp long PCR products). For each peak, the position was determined and an area under the curve (AUC) was calculated by summing the intensity values within ±6 bases of the peak position. Thus 9 data points were available for each peak of each wild type/mutant/control transfected sample (3x PCR and 3x capillary electrophoresis experimental replicates). Normalization was performed by setting the value of the AUC to 100 and assigning proportional values to the other peaks. We used the group of certainly benign (wild-type) and the group of certainly pathogenic (R634P, R634W, R636H) transfected cells as controls during the data analysis, and compared the splicing pattern of cells transfected with the remaining variants of uncertain significance to these two control groups. Analysis of variance was used to evaluate differences between cells.
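The peak quantification described above can be sketched as follows. This is a minimal illustration, not the analysis code of the study: the trace is synthetic, the function names are hypothetical, and the choice to use the largest peak as the reference that is set to 100 is an assumption, since the text does not state which peak served as reference.

```python
def find_peak(trace, expected_pos, search=10):
    """Return the position of the highest intensity within +/- search of expected_pos."""
    lo = max(0, expected_pos - search)
    hi = min(len(trace), expected_pos + search + 1)
    return max(range(lo, hi), key=lambda i: trace[i])


def peak_auc(trace, peak_pos, half_window=6):
    """Sum the intensity values within +/- half_window positions of the peak."""
    lo = max(0, peak_pos - half_window)
    hi = min(len(trace), peak_pos + half_window + 1)
    return sum(trace[lo:hi])


def normalize(aucs):
    """Scale AUCs so that the largest peak equals 100 (assumed reference peak)."""
    reference = max(aucs.values())
    return {size: 100.0 * value / reference for size, value in aucs.items()}


# Synthetic trace indexed by fragment length (bp), with triangular peaks
# of different heights at the three product sizes named in the text.
trace = [0.0] * 700
for centre, height in ((245, 10.0), (472, 4.0), (613, 2.0)):
    for offset in range(-6, 7):
        trace[centre + offset] = height * (7 - abs(offset))

aucs = {}
for expected in (245, 472, 613):
    position = find_peak(trace, expected)
    aucs[expected] = peak_auc(trace, position)

normalized = normalize(aucs)
print(normalized)
```

With three PCR replicates and three electrophoresis runs per sample, the same calculation would be repeated nine times per peak before the groups are compared.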

RESULTS

Mutational screening

To study the contribution of RBM20 mutations to the development of cardiomyopathy, specifically DCM, in the Dutch population, we analyzed this gene by Sanger sequencing and targeted NGS (as part of a panel of 55 cardiomyopathy genes). Initially, we studied RBM20 in 62 DCM patients by Sanger-sequencing all exons and the flanking intronic sequence. In an additional cohort of 374 patients, we sequenced only those amplicons in which potentially interesting genetic variants had previously been identified (in the literature or in our 62 patients). Exon 9 was chosen since it encodes the RS-rich (arginine- and serine-rich) domain and carries the known mutation hot-spot of five neighbouring amino acids between positions 634-638 (Brauch et al). Exons 6 and 7 were chosen because they encode the RNA-recognition motif (RRM) domain and a putative nuclear localization signal between c.1770-1842 (Filippello et al). The RRM domain, in particular, is highly homologous to RRMs of other spliceosomal and SR-proteins, is expected to play an essential role in the protein function and was reported by Li et al (2010) to carry one missense variant. Finally, exon 11 was chosen because it harbours the missense variant D888N reported by Refaat et al as a causative mutation. When our targeted NGS method was subsequently implemented in our routine diagnostics (Sikkema-Raddatz et al; chapter 4.1), we also evaluated the contribution of RBM20 mutations to both DCM and other cardiomyopathy subtypes in 1311 patients. In this study, we focus only on those patients who had interesting RBM20 variants.

In 35 patients, we identified 23 missense variants (18 novel) that were either absent or present at very low frequency (MAF<0.0005) in the ExAC (http://exac.broadinstitute.org) and/or the GoNL (http://www.nlgenome.nl) databases. Table 1 gives clinical information on the RBM20 patients and their family members who were identified within the initial phase (Sanger sequencing of DCM patients), while table 2 summarizes information on all the potentially interesting variants identified in this study, including their classification. As expected from their low population frequency, all variants were classified, at minimum, as VOUS (table 2). Together, 13 variants were classified as VOUS, 7 as LP and 3 as P (hot-spot mutations). Six variants were identified in two or more patients, including the pathogenic R634W mutation identified in 3 different patients and the likely pathogenic mutations Y681C and Y1193C identified in 5 and 3 patients, respectively.

The variants found in the known mutation hot-spot (R634P, R634W, and R636H) of the RS-rich domain (residues R632-S654) were classified as “pathogenic” not only as a consequence of their high evolutionary conservation, complete absence in control populations, and pathogenic predictions by multiple in silico programmes, but also due to several previous reports on hotspot mutations leading to an unusually severe phenotype and cosegregation with disease in several families, including the R634P mutation in our studies (table 1). In addition to the mutations mentioned above, we identified four other novel, likely pathogenic mutations in residues of the RS domain: R623Q, R632K, P633L, and S637N. We also identified likely pathogenic mutations outside the RS domain: V535L, Y681C, and Y1193C. The novel missense variant V535L resides in the RRM, a region in which only one genetic variant, affecting the same amino acid position (V535I), was previously reported. We considered this variant as “likely pathogenic”, because it is located in an area of very high evolutionary conservation, is not found in any population database, and is predicted to be pathogenic by three of the four prediction programmes we used (PolyPhen, SIFT, MutationTaster). The Y681C mutation is located just outside the RS-rich domain, affects a very conserved residue residing in a significantly conserved region and is predominantly predicted to be a harmful variant. It has been identified in one individual in the GoNL population and twice (MAF=0.0001028) in the ExAC population. Finally, Y1193C resides at the end of a Zinc-finger(like) domain (residues 1158-1193), affects a conserved residue and is surrounded by conserved residues, is predicted to be pathogenic by three out of the four prediction programmes we used, and has never been identified in the control populations of GoNL and ExAC.

In addition to the likely pathogenic and pathogenic mutations described above, 13 missense variants were detected outside the RBM20 domains or regions of known importance and were classified as VOUS (table 2). It is important to note that a considerable subset of these 13 variants was identified in non-DCM patients, mainly those with hypertrophic cardiomyopathy (HCM). This is in contrast to our likely pathogenic and pathogenic mutations, which were almost exclusively identified in DCM patients, with the only exceptions being the Y681C and Y1193C mutations, which were identified in both DCM and HCM patients. In addition, two mutations, the likely pathogenic S637N and the pathogenic R636H, were identified in DCM families in which peripartum cardiomyopathy (PPCM) was also diagnosed.

Table 1. Clinical information of RBM20 families identified during phase I; Sanger sequencing. The mutations as well as the result of segregation analysis, when applicable, are indicated.

Family | Year of birth | Gender | RBM20 mutation | Age at diagnosis | Clinical status | Clinical details
family 1 | 1935 | M | unknown | 66 | affected | DCM
family 1 | 1961 | M | p.L100F | 40 | affected | DCM
family 2 | 1942 | F | p.R634P | 33 | affected | LV-dysfunction; LVEF 11% (63yr); NSVT; ICD (64yr)
family 2 | 1946 | M | no sample | 58 | affected | DCM; died at age 62yr
family 2 | 1938 | M | no sample | na | affected? | SCD 41yr
family 2 | 1967 | M | p.R634P | 39 | affected | DCM; NSVT; PVCs; ICD (39yr)
family 2 | 1969 | M | p.R634P | 31 | affected? | NSVT; LVEF 53%; PVCs
family 2 | 1910 | F | no sample | na | affected? | SCD 46yr
family 3 | 1934 | M | no sample | 50 | affected | DCM; HTX (53yr); hypertrophic cardiomyocytes on histology
family 3 | 1929 | M | none | 72 | unclear | severe atherosclerosis; LVEF 35% (72yr); VT, MI (72yr), ICD (72yr); AF (79yr)
family 3 | 1959 | F | p.S637N | 35 | affected | PPCM (35yr); VT; HTX (37yr)
family 3 | 1992 | F | p.S637N | 16 | affected | DCM (16yr); NSVT; PVCs; ICD and MI (17yr); HTX (19yr)
family 4 | 1945 | M | p.Y681C | 56 | affected | DCM; NSVT; PVCs; AF (59yr)
family 4 | 1974 | M | non-carrier | 21 | affected | DCM; AF (20); ICD (30)
family 4 | 1942 | M | no sample | 44 | affected | DCM; AF; died at age 45yr; interstitial fibrosis on histology
family 4 | 1907 | M | no sample | | healthy | Died at age 59yr
family 5 | 1947 | | p.V535L | 57 | affected | LV dilatation; AVB; sporadic case
family 6 | 1938 | | p.R634W | 55 | affected | LV dysfunction; LVEF 35% (67yr); died at age 70yr
family 6 | | | no sample | | affected | LV dilatation; died at age 67yr
family 6 | | | no sample | | affected | LV dilatation; died at age 40yr
family 6 | | | unknown | | affected | DCM
family 7 | 1978 | | p.R634W and p.G672S | 20 | affected | DCM
family 8 | 1944 | F | p.R636H | 47 | affected | DCM leading to HTX
family 8 | 1968 | F | p.R636H | 23 | affected | Died at age 23yr (PPCM); cardiomegaly (post mortem)

Abbreviations: AF – atrial fibrillation, AVB – atrioventricular block, DCM – dilated cardiomyopathy, HTX – heart transplantation, ICD – implantable cardioverter-defibrillator, LV – left ventricle, LVEF – left ventricular ejection fraction, MI – myocardial infarction, NSVT – non-sustained ventricular tachycardia, PPCM – peripartum cardiomyopathy, PVC – premature ventricular contraction, SCD – sudden cardiac death, VT – ventricular tachycardia

Table 2. RBM20 variants identified by Sanger sequencing (S) and targeted sequencing (T). The table includes information on each mutation, the method by which the mutation was identified, the number of patients carrying the mutation and their particular cardiomyopathy subtype, putative carriership of other likely pathogenic or pathogenic mutations, the frequency in the ExAC population and the respective classification of the mutations. Evolutionary conservation, predicted pathogenicity (by AGVGD, SIFT, PolyPhen2 and MutationTaster softwares) and allele frequencies (in dbSNP, ExAC, and GoNL) were taken into consideration for classification.

Protein change | cDNA coord | Genomic coord (37) | Classification
L100F | c.298C>T | 10:112540665 | VOUS
G284R | c.850G>A | 10:112541217 | VOUS
G309E | c.926G>A | 10:112541293 | VOUS
V487M | c.1459G>A | 10:112544579 | VOUS
V535L | c.1603G>C | 10:112557341 | LP
V545I | c.1633G>A | 10:112557371 | VOUS
R623Q | c.1868G>A | 10:112570208 | LP
R632K | c.1895G>A | 10:112572050 | LP
P633L | c.1898C>T | 10:112572053 | LP
R634W | c.1900C>T | 10:112572055 | P
R634P | c.1901G>C | 10:112572056 | P
R636H | c.1907G>A | 10:112572062 | P
S637N | c.1910G>A | 10:112572065 | LP
G672S | c.2014G>A | 10:112572169 | VOUS
R673Q | c.2018G>A | 10:112572173 | VOUS
S675T | c.2023T>A | 10:112572178 | VOUS
Y681C | c.2042A>G | 10:112572197 | LP
G758S | c.2272G>A | 10:112572427 | VOUS
D786G | c.2357A>G | 10:112572512 | VOUS
I921F | c.2761A>T | 10:112581138 | VOUS
V1102A | c.3305T>C | 10:112581682 | VOUS
Y1193C | c.3578A>G | 10:112595630 | LP
A1208V | c.3623C>T | 10:112595675 | VOUS

[Columns not reproduced: method, number of patients (CM subtype), other mutations, MAF in ExAC database, HGMD.]

Abbreviations: CM – cardiomyopathy, DCM – dilated cardiomyopathy, HCM – hypertrophic cardiomyopathy, HGMD – Human Gene Mutation Database, LP – likely pathogenic, MAF – minor allele frequency, P – pathogenic, PPCM – peripartum cardiomyopathy, VOUS – variant of unknown significance. Gene symbols: MYH6 – Myosin, heavy chain 6, cardiac muscle, alpha; MYH7 – Myosin, heavy chain 7, cardiac muscle, beta; NEXN – Nexilin (F actin binding protein); PLN – Phospholamban; SCN5A – sodium channel, voltage gated, type V alpha subunit; TNNT2 – Troponin T type 2 (cardiac).

We also identified the missense variant D888N, which was formerly considered to be pathogenic (Refaat et al). However, we anticipate that D888N is a rare polymorphism, as it was identified in only 6 of our 374 Sanger-sequenced DCM patients (1.6%), in 13 of our 1200 NGS-analyzed patients (1.1%), and at comparably low frequencies in control populations: 1.8% of Dutch controls (GoNL; 9 in 996 alleles, i.e. 9 of 498 healthy individuals) and 0.28% in the ExAC database. D888N is therefore not included in summary table 2.
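As a worked check, the D888N frequencies can be recomputed directly from the counts quoted in the text. The sketch below uses only those counts; the helper name is hypothetical, and the ExAC figure is not recomputed because no underlying counts are given.

```python
def percent(count, total):
    """Return count/total expressed as a percentage."""
    return 100.0 * count / total


# Counts taken from the text; each carrier is counted once.
sanger_carriers = percent(6, 374)      # Sanger-sequenced DCM patients
ngs_carriers = percent(13, 1200)       # NGS-analyzed patients
gonl_individuals = percent(9, 498)     # GoNL, per individual
gonl_alleles = percent(9, 996)         # GoNL, per allele

for label, value in [
    ("Sanger-sequenced DCM patients", sanger_carriers),
    ("NGS-analyzed patients", ngs_carriers),
    ("GoNL individuals", gonl_individuals),
    ("GoNL alleles", gonl_alleles),
]:
    print(f"{label}: {value:.1f}%")
```

The per-individual and per-allele figures for GoNL differ by a factor of two because each individual contributes two alleles, so it matters which denominator is compared with the patient cohorts.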

Cosegregation and haplotype analysis

In cases where DNA of affected family members was available, we studied the carriership status of those individuals (for details, see table 1). We were able to show co-segregation of mutations R634P and R636H with the disease phenotype in the respective families. We did not find co-segregation of Y681C in the one small family that was available for testing.

We also identified unrelated index patients carrying the same variants: three patients carrying the pathogenic mutation R634W and five patients carrying the likely pathogenic mutation Y681C. Although no family relation between these patients was known, we hypothesized that these mutations were inherited from common ancestors (founders). Therefore, we performed haplotype analysis using markers within the approximately ±5 cM region surrounding the respective mutations. These studies revealed that the patients carrying the R634W mutation share a relatively large haplotype (see table S2b; results shown for two of the three patients), suggesting that the mutation originated from a common founder. In contrast, no shared haplotype could be identified for the Y681C mutation (data not shown). Together with the lack of cosegregation of this variant in the small family studied, the fact that it was also found in 1/996 alleles in the GoNL database, and the fact that one of the Y681C patients also carries a certainly pathogenic PLN deletion, these results suggest that Y681C is less likely to be pathogenic. However, more data are needed to verify this conclusion.
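The idea behind the haplotype comparison can be sketched in a few lines. Everything in the example is hypothetical: the marker names, map positions and allele sizes are invented, and the function only tests whether two unphased genotypes share at least one allele per marker (identity by state), which is a necessary condition for a shared founder haplotype rather than proof of one.

```python
def shares_allele(genotype_a, genotype_b):
    """True if two unphased genotypes have at least one allele in common."""
    return bool(set(genotype_a) & set(genotype_b))


def shared_segment(markers, geno_a, geno_b, mutation_cm):
    """Return the contiguous run of allele-sharing markers flanking the mutation.

    markers: list of (name, position in cM); geno_*: {name: (allele1, allele2)}.
    """
    ordered = sorted(markers, key=lambda m: m[1])
    flags = [shares_allele(geno_a[name], geno_b[name]) for name, _ in ordered]
    # Index of the first marker located at or beyond the mutation.
    right = next((i for i, (_, cm) in enumerate(ordered) if cm >= mutation_cm), len(ordered))
    segment = []
    i = right - 1
    while i >= 0 and flags[i]:                 # walk towards the 5' side
        segment.insert(0, ordered[i][0])
        i -= 1
    j = right
    while j < len(ordered) and flags[j]:       # walk towards the 3' side
        segment.append(ordered[j][0])
        j += 1
    return segment


# Invented markers within +/- 5 cM of a mutation placed at 0 cM.
markers = [("M1", -4.0), ("M2", -2.0), ("M3", -0.5), ("M4", 0.8), ("M5", 2.5), ("M6", 4.5)]
patient_a = {"M1": (180, 184), "M2": (201, 205), "M3": (150, 150),
             "M4": (222, 226), "M5": (310, 314), "M6": (140, 144)}
patient_b = {"M1": (182, 186), "M2": (205, 207), "M3": (150, 154),
             "M4": (226, 230), "M5": (314, 318), "M6": (146, 148)}

print(shared_segment(markers, patient_a, patient_b, mutation_cm=0.0))
```

In this invented example the two patients share alleles at the four markers closest to the mutation, while sharing breaks down at the outermost marker on each side.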

Functional evaluation

In order to evaluate our classification of variants L100F, V535L, R634P, R634W, R636H, G672S and Y681C using a functional approach, we designed a differential splicing assay. In addition, we included the likely benign variants W768S and W768L, and the D888N variant, which was formerly reported as pathogenic but which we now designate as benign. For this purpose, we transfected HEK293 cells with vectors expressing wild type or “mutated” human RBM20 cDNA. The RNA isolated from these cells was reverse transcribed, and we then studied potential differences in the composition of isoforms of putative splicing targets of RBM20 (Guo et al 2012, Maatz et al) or of other spliceosomal proteins. Primers were designed for the CAMK2D, CAMK2G, LDB3, SH3KBP1, SORBS1(1), SORBS1(2), TNNT2, TPM1 and TRDN targets, and differences in splicing patterns in the presence of wild type, R634P, or endogenous RBM20 production were analyzed. Out of the nine potential targets, only LDB3 showed clear effects in this assay. We therefore decided to continue evaluating differential splicing of the known cardiomyopathy gene (Vatta et al) and RBM20 target (Guo et al 2012, Maatz et al) LDB3.

We first sequenced the three different length PCR products of LDB3 detected on agarose gel, using the primers corresponding to 3’ sequences of exon 3 and 5’ sequences of exon 7, respectively. The two longer products we identified were shown to correspond to transcripts NM_001080114.1 and NM_001080116.1 (product of 472 nucleotides, including exon 3, 5, 6 and 7), and transcripts NM_007078.2 and NM_001080115.1 (product of 613 nucleotides, including exon 3, 4 and 7). However, the shorter 245 nucleotide product did not correspond to the remaining isoforms NM_001171610.1 and NM_001171611.1 and only contained sequences of exon 3 and 7 (see also figure 1).

Figure 1. LDB3 splicing products identified in transfected HEK293 cells. PCR amplified and sequenced LDB3 transcripts expressed by the HEK293 cells are shown. Two fragments correspond to known LDB3 transcripts/isoforms (472bp and 613bp), while the exon composition of the shortest one (245bp) does not correspond to known transcripts. The known transcripts of LDB3 have the following NCBI IDs: transcript/isoform 1 – NM_007078.2, 2 – NM_001080114.1, 3 – NM_001080115.1, 4 – NM_001080116.1

Next, we analyzed the resulting LDB3 products in cDNA derived from HEK293 cells expressing the different RBM20 mutations/variants. A preliminary screening (a PCR under saturating conditions followed by gel electrophoresis) did not show an obvious presence or absence of certain LDB3 products when comparing wild type cells with cells expressing one of the hotspot mutations. However, as we observed subtle differences in LDB3 product intensities between the different samples (data not shown), subsequent analyses were aimed at quantifying the respective products under non-saturating conditions. Unfortunately, this did not lead to the identification of significant differences in product intensities between cells carrying the vector expressing wild type or certainly pathogenic mutations of RBM20 (figure 2). As shown in figure 2, we did observe some differences in the product intensity patterns between non-transfected, wild type transfected and mutation (R634P(1), R634P(2), R634W and R636H) transfected cells, although this was less apparent for the R636H transfected cells. The most prominent effect we saw was relatively high amounts of the 613bp product in non-transfected cells, while we observed high proportions (90-100%) of the 245 bp product in wild type transfected cells and slightly smaller proportions (<80%) of this short product in mutation transfected cells (with the exception of R636H) (figure 2). In most of the other cases (G672S, W768S, W768L and D888N transfected cells) the product intensity patterns resembled that of wild type transfected cells. The only exception to this was the V535L transfected cells, whose pattern looked more like that of the cells expressing certainly pathogenic mutations, suggesting that this variant may be pathogenic as well.

For the analysis of variance, we grouped the samples as certainly pathogenic, wild type, non-transfected and unknown variants. A significant effect of the base position was present, but there was no significant effect of the base position x mutation type interaction. As an illustration, the results for the interaction are shown in figure 3. In summary, we were not able to show the presence of different splicing products or a shift in isoform composition among the samples using our in vitro splicing assay.
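To make the statistical comparison more concrete, the sketch below computes a one-way analysis of variance F statistic for the normalised AUC of a single peak across groups. It is a deliberate simplification: the analysis described above tested a base position x mutation type interaction, which requires a two-way model, and all numbers in the example are invented for illustration rather than taken from the study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented normalised AUC values of the 245 bp product (three replicates per group).
wild_type = [95.0, 98.0, 100.0]
pathogenic = [70.0, 75.0, 80.0]
unknown = [90.0, 93.0, 96.0]

f_stat, df_b, df_w = one_way_anova_f([wild_type, pathogenic, unknown])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A p-value would follow from comparing the statistic with the F distribution for the given degrees of freedom, which is left to a statistics package here.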

RBM20 IN DILATED CARDIOMYOPATHY 65 Figure 2. Individual LDB3 splicing plots of transfected and non-transfected HEK293 cells. On each plot the X axis shows the basepositions (corresponding to the 245, 473 and 613 bp long fragments), while the associated area under the curve (AUC) values are indicated on the Y axis. Samples marked with * are biological replicates.

Figure 3. Grouped LDB3 splicing plots of transfected HEK293 cells.

DISCUSSION

In this study, we aimed to identify and functionally evaluate mutations in the RBM20 gene in DCM patients, as well as in patients suffering from other subtypes of cardiomyopathy. This led to the identification of 23 different rare variants that might be involved in disease development. Importantly, the 10 mutations in 17 patients that were classified as likely pathogenic or pathogenic were almost exclusively found in DCM patients, underscoring the idea that RBM20 mutations are associated with the development of this cardiomyopathy subtype. Notably, a few non-hotspot variants of unknown significance or likely pathogenicity were found in HCM, or in both HCM and DCM patients. Moreover, the identification of two likely pathogenic or pathogenic mutations in PPCM cases suggests that RBM20 mutations could be specifically involved in the development of this particular manifestation of familial DCM as previously reported for the TTN gene (van Spaendonck-Zwarts et al).

Of the 23 variants we identified, 3 were classified as pathogenic mutations, 7 as likely pathogenic mutations and 13 as variants with unknown clinical significance. Based on its frequency in controls, we classified variant D888N, which was formerly reported to be pathogenic, as a rare polymorphism. To gain more insight into the putative pathogenicity of the promising variants we found, we performed additional tests such as segregation and haplotype analyses. When DNA samples from family members were available, we performed segregation analysis for the mutations found in the index patients. We found that the mutations R634P and R636H co-segregated with the disease phenotype, while the Y681C variant did not co-segregate in the one family we could further investigate. One of the most interesting findings was the identification of the S637N mutation in a family with PPCM and DCM. However, the segregation analysis in this family could not prove that this novel RBM20 hot spot mutation (found in the PPCM patient and her DCM-affected daughter, but not in the grandfather who had possible DCM; van Spaendonck-Zwarts et al) played an exclusive disease-causing role, leading to our classification of it as a VOUS. It is possible that the RBM20 mutation together with another not-yet-identified genetic mutation could explain the disease in this family, with the RBM20 mutation contributing to the PPCM phenotype. In a few other cases, the RBM20 variant was shown to be inherited in combination with other likely pathogenic mutations, e.g. in MYH7 or SCN5A (table 2). Such digenic and oligogenic inheritance is increasingly observed in various types of cardiomyopathy (Bauce et al, Xu et al 2010, Nakajima et al, Roncarati et al, Bao et al, Rigato et al, Pugh et al, Haas et al, chapter 2.2 and chapter 4.1). Likewise, the RBM20 mutation carriers published by Li et al (2010) carry other variants in LDB3 and LMNA as well.

For unrelated patients who carried the same mutation, we performed marker analysis in order to check if the mutation is part of the same relatively large haplotype. This resulted in linking three index patients carrying the same R634W mutation, which suggests that they have inherited it from a common ancestor.

Finally, in order to be able to further evaluate the pathogenicity level of the mutations found in our patients, we performed functional experiments. Several recent papers indicated that RBM20 has an important role in splicing dynamics of RNA molecules encoding cardiomyopathy-related proteins, mainly by studying these processes under in vivo conditions in the presence of either wild type or mutated RBM20 (Guo et al 2012, Guo et al 2013, Maatz et al). However, there were no myocardial biopsies available from the patients with RBM20 mutations in our study that could be used to evaluate the effect of these mutations on differential splicing in vivo. Therefore, we aimed to design an in vitro assay in which this could be easily tested without requiring an invasive procedure.
Unfortunately, although our first experiments on transcripts of the validated RBM20-target LDB3 in HEK293 cells indicated minor differences in isoform intensities between samples, no significant differences were apparent upon quantification of the data and the results were not conclusive. One explanation for this could be that the cell line used in this assay has an embryonic kidney origin rather than being derived from embryonic or adult cardiomyocytes. If this were the case, splicing products resulting from compensating mechanisms and/or splicing processes diverging from regular cardiac splicing may have concealed the true RBM20-related splicing effects. For example, one of the LDB3 transcripts we identified both in non-transfected and in wild type and mutated RBM20-transfected HEK293 cells does not correspond to the known isoforms, and it lacks the alternative Zasp-like motifs (exons 4, 5 or 6) which play a role in actin-binding (Huang et al, Klaavuniemi et al). Hence, HEK293 cells might not be the ideal system for modelling cardiomyopathy. On the other hand, it is likely that RBM20 overexpression conditions in our assay influence the outcome more than the effects of the RBM20 mutations, as Maatz et al have recently shown dramatic differences in the splicing of TTN, RYR2, CAMK2D and LDB3 between heart failure patients with relatively high and low expression levels of endogenous RBM20. Moreover, the same study also showed that patients exhibiting high expression of wild type RBM20 had similar LDB3 splicing patterns to a DCM patient who carried the RBM20 mutation S635A. The fact that differences could be observed between LDB3 splicing patterns of the non-transfected (endogenous RBM20 expressing) and the transfected (wild type or mutant RBM20 overexpressing) HEK293 cells during our functional in vitro assay suggests that RBM20 overexpression, regardless of its wild type or mutant sequence, affects differential LDB3 splicing to an extent that conceals mutation effects, and thus hampers the evaluation of pathogenicity levels of the different RBM20 variants.

An intriguing finding is that in two RBM20-mutation-carrying families in our study, those carrying the S637N and the R636H mutations, family members were diagnosed with PPCM. Previously, no association had been reported between PPCM and RBM20 mutations. However, we recently showed that PPCM/DCM can be frequently caused by truncating TTN mutations (van Spaendonck-Zwarts et al). In fact, our functional studies on explanted tissue of a TTN truncating mutation carrier demonstrated that the passive force generation of the single cardiomyocytes is drastically decreased, but also showed that the TTN isoform composition (which provides the molecular basis for this passive tension) is shifted towards more long, compliant N2BA isoform production in place of short N2B isoform production (van Spaendonck-Zwarts et al). The splicing of TTN is known to be regulated by RBM20-mediated exon skipping, a process which is disrupted by mutations of RBM20 (Guo et al 2012). Moreover, Rbm20-/- rats were found to exclusively express the N2BA isoform at all ages, while this isoform was shown to only account for about 15% of expressed titin 20 days after birth in healthy animals (Guo et al 2013). Together, these findings suggest that titin isoform shift may be a common molecular pathway in PPCM/DCM patients with either RBM20 or TTN mutations.

RBM20 mutation carriers have been reported to have unusually severe symptoms with very early onset (Brauch et al, Li et al 2010, Wells et al). We have only observed this in two cases within the two families with PPCM/DCM: a patient carrying R636H passed away at the age of 23, while another with the S637N mutation had to undergo heart transplantation at the age of 19. It is likely that the more severe nature of the disease in the earliest reported RBM20 families was the result of patient selection bias as more recent papers reported contradictory results, although including patients carrying RBM20 “mutations” of which the pathogenic nature is questionable (like the S455L and D888N variants) may have affected the outcomes of these studies (Haas et al; Refaat et al).

Aberrant splicing, which is the key pathomechanism leading to cardiomyopathy and DCM in RBM20-mutation-carrying patients, has been associated with various diseases including, increasingly, cardiac diseases. For example, it was recently shown that heart failure (due to aortic stenosis, ischemic cardiomyopathy or DCM) is associated with altered splicing of sarcomeric genes, such as TNNT2, TNNI3, MYH7 or FLNC (Kong et al). Indeed, it is not only RBM20 that is known to contribute to the splicing machinery and splicing events in the physiological processes of the heart; several other splicing factors have also been connected to DCM in various animal models: SC35, ASF/SF2, and FXR1 are involved in the processing of calcium homeostasis or desmosomal proteins (Ding et al, Xu et al 2005, Whitman et al). There are also structural homologues of RBM20 that are known to play a role in the heart. One of these homologues is MATR3, which is involved in adult onset myopathy (Senderek et al), and which has been shown to interact with wild type RBM20, but to lose this ability in the presence of the S635A mutant (Maatz et al). RBM4 and PTB are antagonists of each other competing for the recognition element of TPM1 mRNA and regulating its exon selection (Lin & Tarn), while RBM24 is involved in cardiac differentiation and its targets are important for sarcomere assembly (Poon et al).
According to the co-translational assembly model, the muscular RNA-binding proteins play an extremely important role in filament assembly, because they bind and stabilize their targets in the nucleus, and mediate their transport to close proximity of the sarcomere for immediate use (Zarnescu & Gregorio). It would therefore be worthwhile to study whether some of these spliceosomal proteins/genes could also contribute to human cardiac diseases, e.g. cardiomyopathy or heart failure in general. Remarkable in this respect is that, apart from RBM20, no clearly Mendelian inherited mutations in other spliceosomal proteins causing DCM have yet been reported. It may be that these mutations are rare and private familial mutations that still have to be discovered. On the other hand, as the splicing machinery is generally quite complex, involving a number of analogous proteins, it may be able to accept mutations in individual components of the machine without very drastic effect on its function. If this is the case, it will be even more intriguing to find out why RBM20 mutations specifically result in the development of DCM. Since the expression of RBM20 is not strictly limited to the heart (Filippello et al), and the RBM20 RNA-recognition element containing a UCUU core sequence (Maatz et al) is quite short and not very specific, neither of these can sufficiently explain the purely cardiac phenotype of the patients. Further research on how the interconnections of RBM20 with other spliceosomal proteins ultimately mediate the splicing process in the heart in a tissue- and life-stage specific manner is needed.

ACKNOWLEDGEMENTS

The authors would like to thank Jackie Senior and Kate Mc Intyre for editing the manuscript. Anna Pósafalvi was supported by grants from the Jan Kornelis de Cock Foundation.

REFERENCES

Bao JR, Wang JZ, Yao Y et al. Screening of pathogenic genes in Chinese patients with arrhythmogenic right ventricular cardiomyopathy. Chin Med J (Engl) 2013;126:4238-41
Bauce B, Nava A, Beffagna G et al. Multiple mutations in desmosomal proteins encoding genes in arrhythmogenic right ventricular cardiomyopathy/dysplasia. Heart Rhythm 2010;7:22-9
Beraldi R, Li X, Martinez Fernandez A et al. Rbm20-deficient cardiogenesis reveals early disruption of RNA processing and sarcomere remodeling establishing a developmental etiology for dilated cardiomyopathy. Hum Mol Genet 2014;23:3779-91
Brauch KM, Karst ML, Herron KJ et al. Mutations in ribonucleic acid binding protein gene cause familial dilated cardiomyopathy. J Am Coll Cardiol 2009;54:930-41
Ding JH, Xu X, Yang D et al. Dilated cardiomyopathy caused by tissue-specific ablation of SC35 in the heart. EMBO J 2004;23:885-96
Filippello A, Lorenzi P, Bergamo E et al. Identification of nuclear retention domains in the RBM20 protein. FEBS Lett 2013;587:2989-95
Guo W, Pleitner JM, Saupe KW et al. Pathophysiological defects and transcriptional profiling in the RBM20-/- rat model. PLoS One 2013;8:e84281
Guo W, Schafer S, Greaser ML et al. RBM20, a gene for hereditary cardiomyopathy, regulates titin splicing. Nat Med 2012;18:766-73
Haas J, Frese KS, Peil B et al. Atlas of the clinical genetics of human dilated cardiomyopathy. Eur Heart J 2014; pii: ehu301 [epub ahead of print]
Huang C, Zhou Q, Liang P et al. Characterization and in vivo functional analysis of splice variants of cypher. J Biol Chem 2003;278:7360-5
Klaavuniemi T, Kelloniemi A, Ylannet J. The ZASP-like motif in actinin-associated LIM protein is required for interaction with the α-actinin rod and for targeting to the muscle Z-line. J Biol Chem 2004;279:26402-10
Kong SW, Hu YW, Ho JW et al. Heart failure-associated changes in RNA splicing of sarcomere genes. Circ Cardiovasc Genet 2010;3:138-46
Li D, Morales A, Gonzalez-Quintina J et al. Identification of novel mutations in RBM20 in patients with dilated cardiomyopathy. Clin Trans Sci 2010;3:90-97
Li S, Guo W, Dewey CN et al. RBM20 regulates titin alternative splicing as a splicing repressor. Nucleic Acids Res 2013;41:2659-72
Lin JC & Tarn WY. Exon selection in alpha-tropomyosin mRNA is regulated by the antagonistic action of RBM4 and PTB. Mol Cell Biol 2005;25:10111-21
Maatz H, Jens M, Liss M et al. RNA-binding protein RBM20 represses splicing to orchestrate cardiac pre-mRNA processing. J Clin Invest 2014 Jun 24. pii: 74523
Mestroni L, Rocco C, Gregori D et al. Familial dilated cardiomyopathy: evidence for genetic and phenotypic heterogeneity. Heart Muscle Disease Study Group. J Am Coll Cardiol 1999;34:181-90
Millat G, Bouvagnet P, Chevalier P et al. Clinical and mutational spectrum in a cohort of 105 unrelated patients with dilated cardiomyopathy. Eur J Med Genet 2011;54:e570-5
Nakajima T, Kaneko Y, Irie T et al. Compound and digenic heterozygosity in desmosome genes as a cause of arrhythmogenic right ventricular cardiomyopathy in Japanese patients. Circ J 2012;76:737-43
Poon KL, Tan KT, Wei YY et al. RNA-binding protein RBM24 is required for sarcomere assembly and heart contractility. Cardiovasc Res 2012;94:418-27
Pugh TJ, Kelly MA, Gowrisankar S et al. The landscape of genetic variation in dilated cardiomyopathy as surveyed by clinical DNA sequencing. Genet Med 2014;16(8):601-8
Refaat MM, Lubitz SA, Makino S et al. Genetic variation in the alternative splicing regulator RBM20 is associated with dilated cardiomyopathy. Heart Rhythm 2012;9:390-6
Rigato I, Bauce B, Rampazzo A et al. Compound and digenic heterozygosity predicts life-time arrhythmic outcome and sudden cardiac death in desmosomal gene-related arrhythmogenic right ventricular cardiomyopathy. Circ Cardiovasc Genet 2013;6:533-42
Roncarati P, Viviani Anselmi C, Krawitz P et al. Doubly heterozygous LMNA and TTN mutations revealed by exome sequencing in a severe form of dilated cardiomyopathy. Eur J Hum Genet 2013;21(10):1105-11
Senderek J, Garvey SM, Krieger M et al. Autosomal-dominant distal myopathy associated with a recurrent missense mutation in the gene encoding the nuclear matrix protein, matrin 3. Am J Hum Genet 2009;84:511-8
van Spaendonck-Zwarts KY, Posafalvi A, van den Berg MP et al. Titin gene mutations are common in families with both peripartum cardiomyopathy and dilated cardiomyopathy. Eur Heart J 2014;35:2165-73
Vatta M, Mohapatra B, Jimenez S et al. Mutations in Cypher/ZASP in patients with dilated cardiomyopathy and left ventricular non-compaction. J Am Coll Cardiol 2003;42:2014-27
Wells QS, Becker JR, Su YR et al. Whole exome sequencing identifies a causal RBM20 mutation in a large pedigree with familial dilated cardiomyopathy. Circ Cardiovasc Genet 2013;6:317-26
Whitman SA, Cover C, Yu L et al. Desmoplakin and Talin2 are novel mRNA targets of Fragile X-related protein-1 in cardiac muscle. Circ Res 2011;109:262-71
Xu X, Yang D, Ding JH et al. ASF/SF2-regulated CaMKIIdelta alternative splicing temporally reprograms excitation-contraction coupling in cardiac muscle. Cell 2005;120:59-72
Xu T, Yang Z, Vatta M et al. Compound and digenic heterozygosity contributes to arrhythmogenic right ventricular cardiomyopathy. J Am Coll Cardiol 2010;55:587-97
Zarnescu DC & Gregorio CC. Fragile hearts: New insights into translational control in cardiac muscle. Trends Cardiovasc Med 2013;23:275-81

Supplementary table 1. RBM20 sequencing primers. PT-tails were attached to all primers that facilitated sequencing using PT-primers and resulted in better quality sequencing data.

amplicon    direction  primer sequence (5'-> 3')
exon 1      F          GCCACCGGGAAGGACAAGGG
            R          TCAGCAGGGACGGGAAATGAAGC
exon 2/1    F          GACCAGTGTGGGAAGGTCTTG
            R          CATTAGAGGGAAACCGGGTACTG
exon 2/2    F          AATCAACTGAGGCATCCGTCTG
            R          AAGGCAGCTTGACCATCCTG
exon 2/3    F          TATGGCCCTGAAACAGATGG
            R          GATAGCAACGCCTGGTCCTC
exon 2/4    F          CCCGAGGAACCAACCTCAGAC
            R          CTTGCCCATTGGCAGCGTGAG
exon 3      F          CAGCCTCTGGGCGCTCTGTG
            R          ACTTTGCCTGTTCTGGTCTCTG
exon 4      F          GGGCTGCTAGGAAGGTTTGG
            R          TGCTTTCTACATCCGTGAGAAGG
exon 5      F          CAGCATGTCCAGAGGTACAATC
            R          GCATTCCAGCCTGGCTGTCTC
exon 6      F          CCTGGGTGATGGGAGTGAGAC
            R          GTGGTCTGTGGCATATACACTG
exon 7      F          GACGTGGAATCATGCCTTGTG
            R          GCAATGGTTTGCCTCGAGATCC
exon 8      F          TGGTGGACCAGGCAATGAATG
            R          GAACAGGGCACAGCATGACTC
exon 9/1    F          TGCACAGTATATCTAAGACAGAGAC
            R          AGACCCAGATCTCGGGTACTTC
exon 9/2    F          CCGGCAACTGGACAAGGCTG
            R          CTCCTTCAGGGCCTGCCTCG
exon 10     F          GCTGGGACCTGCATTCAATATC
            R          GGGTCTCAGCCATATTCCATCC
exon 11/1   F          ATGGCCAAGTCTTGTGCCTTCC
            R          CCGCTCAGCATCCAGATTTAGG
exon 11/2   F          AGCCTCAAGTCACCCAGAGAAC
            R          GGTGAGCAGGAGTCCAATCAAC
exon 12     F          GCTCCTAATGACAGTGCTTTGG
            R          CAAGCTCTTGAGGTTGCTATGG
exon 13     F          TGGAGCTCGTGGCTCCCATTTC
            R          AAACAGCCTGGTGTGCTTGG
exon 14     F          GCACAGATGCCAGGAGAGGGATG
            R          TGGGTGACTTGCTCCTGGAGAC

Supplementary table 2a. Marker analysis primers. The markers used to screen for a potentially shared haplotype inherited from a common ancestor were selected from the deCODE high resolution genetic map. The names and primers can be found in the first two columns. The RBM20 gene can be found at 112.5 M (NC_000010.10) and markers in a region of 5 cM at each side were selected (see Location and Distance columns).

Marker     Primer sequences 5'- 3'      Location (NC_000010.10)  Distance to RBM20 gene (cM)
D10S530    TCTAGCAGTAAGAGTTGTGTCTCC     107.5M / 123.53 cM       4.4
           TTGACAAGGCCATCAAAAC
D10S1778   CTTGGTTATGATCTCACATGGTCT     108.1M / 124.24 cM       3.7
           CTGCTCTGGATTGAATGTTT
D10S521    CTCCAGAGAAAACAGACCAA         109.3M / 125.85 cM       2.1
           CCTACCATCAATCAACTGAG
D10S597    GAATGAAGACATCCAGAGG          111.2M / 126.7 cM        1.3
           GCAAGTATCAGAAACCCAA
D10S543    AAAGATGTTCAGGTAGATAACACAC    111.8M / 127.43 cM       0.6
           ATCCCTCAGCCCCACT
MUTATION (R634W)
D10S1760   GCGAGACTCCATCTCCATAG         113.7M / 128.63 cM       0.6
           CCATATAGTGGGTGGCTTAAA
D10S1429   GCTCGTAATAGCTTTGTCCA         114.1M / 129.4 cM        1.4
           ATGAAACCATATATGTGACTTTTTG
D10S168    CATGGCACTAATAGAGTTAAC        114.7M / 130.05 cM       2
           TTCACTTGGGATGGAGGCA
D10S554    GGAGGACTCATGTCAGACTT         116.0M / 132.47 cM       4.4
           CCTACCTTTAATTCAGCCCT
D10S468    CAGGCATGTCCATGTAGGTA         117.2M / 134.83 cM       6.7
           TCTGTAAATAACTCATTTGTCCG

Supplementary table 2b. Results of marker analysis for the R634W mutation carriers

Markers  Expected product range (length in bp)  Patient 1 (allele 1, allele 2)  Patient 2 (allele 1, allele 2)  Control 1 (allele 1, allele 2)  Control 2 (allele 1, allele 2)
D10S530 182-208 196 202 196 198 198 200 190 192
D10S1778 216-226 220 220 218 220
D10S521 155-189 173 177 173 181 177 181 173 177
D10S597 206-222 214 216 214 215 214 218 215
D10S543 129-149 129 138 138 137 140 138
MUTATION (R634W)
D10S1760 112-155 107 113 107 113 109 113 109 137
D10S168 159-165 160 160 162 162
D10S554 150-162 148 150 150 150
D10S468 82-96 87 83 87 89 81 87
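The haplotype comparison behind this table can be illustrated with a short sketch. It assumes that a marker is compatible with a shared haplotype when two samples have at least one allele length in common; this is a necessary condition only, since phase is not known from these calls. Only the three markers for which all eight allele calls are listed are used, read in the order Patient 1, Patient 2, Control 1, Control 2. The function names are hypothetical and this is not presented as the analysis procedure used in the study.

```python
# Allele lengths (bp) for the three fully genotyped markers in
# Supplementary table 2b, assuming the column order
# Patient 1, Patient 2, Control 1, Control 2 (two alleles each).
genotypes = {
    "D10S530": {"Patient 1": (196, 202), "Patient 2": (196, 198),
                "Control 1": (198, 200), "Control 2": (190, 192)},
    "D10S521": {"Patient 1": (173, 177), "Patient 2": (173, 181),
                "Control 1": (177, 181), "Control 2": (173, 177)},
    "D10S1760": {"Patient 1": (107, 113), "Patient 2": (107, 113),
                 "Control 1": (109, 113), "Control 2": (109, 137)},
}


def shared_alleles(marker_calls, sample_a, sample_b):
    """Allele lengths present in both samples at one marker."""
    return set(marker_calls[sample_a]) & set(marker_calls[sample_b])


def markers_compatible_with_sharing(calls, sample_a, sample_b):
    """Markers at which the two samples have at least one allele in common."""
    return [marker for marker, marker_calls in calls.items()
            if shared_alleles(marker_calls, sample_a, sample_b)]


patients = markers_compatible_with_sharing(genotypes, "Patient 1", "Patient 2")
patient_vs_control = markers_compatible_with_sharing(genotypes, "Patient 1", "Control 1")
print("Patient 1 vs Patient 2:", patients)
print("Patient 1 vs Control 1:", patient_vs_control)
```

Under these assumptions the two patients share an allele at all three markers, whereas Patient 1 and Control 1 have no allele in common at D10S530.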

Supplementary table 3. RBM20 vector sequencing primers

amplicon    primer sequence (5'-> 3')
VP1.5_F     GGACTTTCCAAAATGTCG
02A_R_PT2   CATTAGAGGGAAACCGGGTACTG
02B_F_PT1   AATCAACTGAGGCATCCGTCTG
02C_R_PT2   AGATAGCAACGCCTGGTCCTC
02D_F_PT1   CCCGAGGAACCAACCTCAGAC
exon5_R     CCACAGAAGCCAAAGGAAATG
exon3_F     CTGGGAGCTGCATGTGAAAG
09A_R_PT2   AGACCCAGATCTCGGGTACTTC
09B_F_PT1   CCGGCAACTGGACAAGGCTG
11A_R_PT2   CCGCTCAGCATCCAGATTTAGG
11B_F_PT1   AGCCTCAAGTCACCCAGAGAAC
XL39_R      ATTAGGACAAGGCTGGTGGG

Supplementary table 4. Site-directed mutagenesis primers. The 5’ primer ends were phosphorylated.

Variant   Primer sequences 5'- 3'      TM Calculator (°C)  Annealing (°C)
L100F_F   CAGCTCACCTTCCACCGGC          70.84               71
L100F_R   GGCCTGCAGTTGAGCCAGC          71.09               71
V535L_F   GGCTGCCCTTTGGAAAGGT          67.73               68
V535L_R   CCAGGTTAATGAGGTCATTCTCAGTG   67.73               68
R634W_F   AGAAAGGCCGTGGTCTCGTAG        66.85               67
R634W_R   GGGCCATATCTGTCTGCTTCC        67.13               67
R636H_F   CGCGGTCTCATAGTCCGGT          67.79               68
R636H_R   GCCTTTCTGGGCCATATCTGTC       67.99               68
G672S_F   TCCTGGGAGCACTCTCCCTATGC      71.74               71.5
G672S_R   GTCCCGGCTATTGCCCCAGT         71.45               71.5
W768S_F   CAAAGCCAAGTCGGACAAGTATCTG    68.72               69
W768S_R   GGCTCTTTCCGGTAGTAGCCGT       68.54               69
W768L_F   CAAAGCCAAGTTGGACAAGTATCTGA   68.14               68
W768L_R   GGCTCTTTCCGGTAGTAGCCGT       68.54               68
S637N_F   CGGTCTCGTAATCCGGTGAGC        70.16               70
S637N_R   CGGCCTTTCTGGGCCATATC         69.82               70
D888N_F   AAAGTGAGGCAGAGGGGGAG         67.08               67
D888N_R   CACTCTCCCAATTTTGTTCCTTCTT    66.44               67

Supplementary table 5. RBM20 cDNA primers

amplicon   direction  primer sequence (5'-> 3')
exon 3     F          CTGGGAGCTGCATGTGAAAG
exon 5     R          CCCACAGAAGCCAAAGGAAATG
exon 9     F          GGAGAAGTACCCGAGATCTG
exon 9     R          TGGCCTCGTCTTTCCTCCTG
exon 10    F          AACAGGAGGGCATGGAAGAAAG
exon 11    R          CATTCCCTACGGCCTTGACTC

Supplementary table 6. Target cDNA primers. The following primer pairs were used for differential splicing analysis of the target molecules of RBM20. Abbreviations: CAMK2D – calcium/calmodulin-dependent protein kinase II delta; CAMK2G – calcium/calmodulin-dependent protein kinase II gamma; LDB3 – LIM domain binding 3; SH3KBP1 – SH3-domain kinase binding protein 1; SORBS1 – sorbin and SH3 domain containing 1; TNNT2 – troponin T type 2 (cardiac); TPM1 – tropomyosin 1 (alpha); TRDN – triadin.

target gene  direction  primer sequence (5'-> 3')
CAMK2D       F          AAGGGTGCCATCTTGACAAC
             R          TCAAAGTCCCCATTGTTGAT
CAMK2G       F          CAACGATCCACGGTGGCATCC
             R          GTGTAGGCCTCAAAGTCCCCA
LDB3         F          TCAAAGCGTCCCATTCCCATC
             R          TGAATTCTGTCCCCGTCATCTG
SH3KBP1      F          GTAGAGGAAGGATGGTGGGAA
             R          CTACTTCAATTGACCTTGGTC
SORBS1 /1    F          AGAGCACTCAGGACTTAAGC
             R          AGATGCAGGAAACTGGTAGG
SORBS1 /2    F          CGCTCTTCCTCACTGAAGTC
             R          GAGTAGGGCTGATGGCTGAG
TNNT2        F          CATAGAAGAGGTGGTGGAAGAG
             R          TCCTTCTCCCGCTCATTCC
TPM1         F          TGGACGCCATCAAGAAGAAG
             R          TCATATTTGCGGTCGGCATC
TRDN         F          CAAAGACACTGGCGAAAG
             R          GCTTGTTCTGTCGGTAAGG


Chapter 2.2

Missense variants in the rod domain of plectin increase susceptibility to arrhythmogenic right ventricular cardiomyopathy

Anna Posafalvi, Petros Syrris, Vincent Plagnol, Ludolf G Boven, Marieke C Bolling, Judith A Groeneweg, MP van den Berg, Marcel F Jonkman, Arthur AM Wilde, Richard NW Hauer, Richard J Sinke, William McKenna, J Peter van Tintelen, Jan DH Jongbloed

Manuscript in preparation

ABSTRACT

Aims: Although mutations causing arrhythmogenic right ventricular cardiomyopathy (ARVC) have been identified in several genes, they do not explain the disease in all patients. Since the majority of these genes encode desmosomal components, we hypothesized that proteins known to be physically linked to the desmosome might contain ARVC-causing genetic variations. Thus, we screened patients for mutations in the PLEC gene encoding the cytolinker protein plectin that anchors intermediate filaments to the desmosomes.

Methods & Results: We sequenced 107 ARVC patients from the Netherlands and 358 patients from the UK with either Sanger or high-throughput sequencing for the coding regions of PLEC, and identified 96 novel or low frequency (<2%) variants scattered across the gene. A comparison with the genetic variation of PLEC seen in the GoNL control population revealed an area of the rod domain harbouring multiple missense variants that are only present in ARVC patients. Careful classification of the variants based on their conservation, frequency, and predicted pathogenicity level, combined with the lack of genetic variability of this region in healthy controls, suggests that these variants, which are located in the domain responsible for homodimerization, may affect normal plectin function.

Conclusions: Although PLEC has been hypothesised as a promising candidate gene for ARVC, our current studies do not support mutations in this gene as the primary cause of ARVC. Our data do, however, suggest that PLEC missense mutations, in particular those in regions of the protein lacking variation in healthy controls, could play a risk factor role in an oligogenic inheritance model of ARVC.

Keywords: ARVC, oligogenic inheritance, plectin, rod domain

INTRODUCTION

Arrhythmogenic right ventricular cardiomyopathy (ARVC) is a progressive heart disease characterized histologically by fibrofatty infiltration of the ventricular myocardium. Clinically, ARVC patients may suffer from ventricular arrhythmias, syncope, and sudden cardiac death as early as young adulthood (Sen-Chowdhry et al), with the majority of cases diagnosed before the age of 40 (Teekakirikul et al). Familial ARVC is a genetically heterogeneous disease and is most commonly transmitted as an autosomal dominant trait with variable expression and incomplete penetrance (te Rijdt et al). However, oligogenic inheritance is increasingly observed (Bauce et al, Xu et al, Nakajima et al, Bao et al), and carriership of multiple rare variants was recently suggested to be related to a more severe disease outcome (Marcus et al 2013, Rigato et al).

Currently, ARVC is considered a disease of the desmosome, an important cell-cell adhesion structure. Genes encoding desmosomal proteins such as plakophilin-2 (PKP2), desmoglein-2 (DSG2), desmocollin-2 (DSC2), junctional plakoglobin (JUP), and desmoplakin (DSP) have been reported to carry ARVC-related mutations (te Rijdt et al). Most mutations causing familial ARVC have been found in the PKP2 gene (found in up to 70% of the patients) (van Tintelen et al). Mutations have also been identified in the DSC2 and DSG2 genes although in lower proportions (in up to 15% of patients). Mutations in the JUP and DSP genes are less frequent in ARVC patients, and are more often associated with the cardiocutaneous syndromes Naxos disease (McKoy et al) and Carvajal syndrome (Norgett et al), respectively. Finally, desmin (DES), the intermediate filament protein that builds up the of cardiomyocytes and is anchored to the membrane via desmosomes, has been implicated in ARVC as well (te Rijdt et al).

Recently, more ARVC-associated genes have been discovered: the cell-cell adhesion molecule α-3 catenin (CTNNA3); the nuclear envelope protein lamin A/C (LMNA); titin (TTN), which is involved in sarcomere assembly and passive elasticity; regulators of intracellular calcium levels, namely ryanodine receptor 2 (RYR2) and phospholamban (PLN); transforming growth-factor β-3 (TGFB3) and transmembrane protein 43 (TMEM43), of which the function is unknown (te Rijdt et al). These new genes still only contribute to a small proportion of this predominantly desmosomal disease, and roughly half of ARVC cases remain unexplained (Jacob et al, Marcus et al 2013). It was therefore the purpose of this study to explore whether the putative candidate gene PLEC, which encodes a protein (plectin) physically linked to the cardiac desmosome, is also involved in ARVC.

Plectin is a large cytolinker protein and belongs to the plakin family of proteins. Among other functions, plectin is believed to connect the desmosomes to the cytoskeleton by binding the DSP protein of the desmosomes and the desmin/keratin filaments in the myocardium/skin epithelium, respectively. In cardiac tissue, plectin is mainly localized at the intercalated disk and the sarcomeric Z-line, whereas in skin it is located in desmosomes and hemi-desmosomes. This means that plectin potentially has a general and fundamental function in junctional complexes (Wiche et al, Zernig et al). When fully knocked out in mice, the lack of plectin causes severe skin blistering, a phenotype similar to the symptoms of epidermolysis bullosa patients. Although this blistering condition in mice seemed lethal in itself a few days after birth, the animals also exhibited more phenotypic changes, such as generalized myopathies of the skeletal muscle, while ultrastructural differences (e.g. aberrant myofibril bundles and focal loss of Z-lines) were observed in the heart (Andrä et al).

Over the past two decades, PLEC has been shown to harbour mutations in patients suffering from various forms of inherited epidermolysis bullosa. Clinical phenotypes associated with mostly homozygous nonsense/frameshift mutations in the gene encompass autosomal recessive epidermolysis bullosa simplex (EBS) with muscular dystrophy (MD) (MIM#226670), EBS-MD with myasthenic symptoms (Winter & Wiche; not in OMIM), EBS-Ogna (autosomal dominant, heterozygous; MIM#131950), EBS with pyloric atresia (MIM#612138), and limb-girdle muscular dystrophy (MIM#613723) (reviewed by Winter & Wiche, Sonnenberg & Liem). Heterozygous carriership of missense mutations was recently reported in mild cases of EBS without cardio-muscular manifestation (Bolling et al 2013).
Moving beyond these skin and muscle abnormalities, and based on the symptoms of the plectin KO mouse, PLEC was also suggested to be involved in further disease processes such as neurodegeneration or cardiomyopathy due to increased fragility of the desmosomes (Andrä et al). Nonetheless, only two incidental cases of cardiac involvement have been reported: a plectin mutation-carrying EBS-MD patient with ventricular hypertrophy (Schröder et al) and an EBS-MD patient who, by the age of 30, was discovered to have asymptomatic dilated cardiomyopathy (DCM) which later progressed to right ventricle involvement including (septal) fibrosis as well as features associated with arrhythmogenic cardiomyopathy (Bolling et al 2010). Very recently, left ventricular non-compaction cardiomyopathy developed in an EBS-MD patient with a homozygous plectin truncation (Villa et al). An intriguing related phenomenon was also observed in a striated-muscle-specific conditional knock-out mouse model of plectin: the mice showed progressively declining endurance performance and, by the age of 16 months, a remarkable increase in connective tissue formation in the heart, indicating cardiomyocyte degeneration (Konieczny et al). In fact, the fibrofatty replacement of cardiomyocytes is the key underlying factor for cardiac conductivity problems in arrhythmogenic cardiomyopathy, and is one of the major task force criteria for the diagnosis of the disease (Marcus et al 2010, Elliott et al). Unfortunately, no electrophysiological studies addressing potential arrhythmias had been carried out in these knock-out mice.

Based on these observations, we hypothesized that plectin might play a role in cardiac pathophysiology, and analysed a large cohort of ARVC patients for carriership of PLEC variants. In many of these patients, novel or low-frequency variants with a potential pathogenic nature were identified. Comparing the genetic variation of PLEC in the Genome of the Netherlands (GoNL) database with that in our patients suggests that novel variants in a region of the rod domain may underlie, or at least increase the susceptibility to, ARVC. Although our data do not prove a major causal role for PLEC in ARVC development, they do suggest that variants in the PLEC gene may contribute to the development of an ARVC phenotype in a disease model in which a combination of several factors, both genetic and non-genetic, is needed to reach the threshold level above which disease development is initiated.

MATERIALS AND METHODS

Patients

Patients were clinically evaluated and the diagnosis of ARVC was based on the consensus-based international Task Force Criteria (TFC) (Marcus et al 2010). Our Dutch Sanger-sequenced cohort included 88 patients who fulfilled these clinical criteria for ARVC (TFC+) and 19 patients who did not completely fulfil them. The British gene-panel sequenced cohort included 123 TFC+ and 235 TFC- index patients. All patients were evaluated for genetic variations in PLEC. Many of the patients had also been screened for mutations in other ARVC-related genes.

Genetic analyses
Genomic DNA was isolated from blood samples using standardized procedures. Written informed consent was obtained from the index patients and their relatives according to the participating hospitals’ medical ethics committee guidelines. For the Dutch cohort, primers for PCR amplification of the coding regions of the PLEC gene (a sequence of about 14 kb) were designed to encompass the coding exons as well as adjacent intronic sequences as described previously (Bolling et al 2013), using the sequence obtained from the GenBank database. Variant annotation is according to NM_201380.3 unless otherwise indicated. Amplifications were conducted following a standard PCR protocol and PCR products were analysed by direct Sanger sequencing. For the UK cohort, a next-generation sequencing (NGS) protocol was designed to screen 2.1 Mbp of genomic DNA sequence per patient, covering the coding regions of genes known to be associated with inherited cardiomyopathies (including ARVC) and arrhythmia syndromes. The gene panel also included PLEC as a candidate gene for ARVC. The sequencing methodology has been reported in detail (Lopes et al).

Data analysis: variant classification
In this study, the pathogenic nature of the identified missense mutations was judged based on: (1) the differences in the physico-chemical properties of the affected amino acids, (2) the evolutionary conservation of the affected amino acids across orthologues, (3) the frequency of the variant in control populations and databases (such as dbSNP (http://www.ncbi.nlm.nih.gov/SNP/), 1000 Genomes (http://www.1000genomes.org/), and ESP (http://evs.gs.washington.edu/EVS/)), and (4) the predicted pathogenic or benign nature of the variant (identified using the Alamut software; Interactive Biosoftware, Rouen). Every variant was then classified as either ‘benign’, ‘likely benign’, ‘variant of unknown significance’, ‘likely pathogenic’, or ‘pathogenic’ (see chapter 4.1 for a detailed description).

Data analysis: frequency-based variant clustering
Chromosomal positions of single nucleotide variants of the PLEC gene identified in our Sanger- and gene-panel-sequenced patient cohorts were annotated with information from the 1000 Genomes (1000G), Exome Sequencing Project (ESP) and GoNL (http://www.nlgenome.nl/) control cohort databases using an in-house developed script (to be published elsewhere), and frequency information on these variants was collected. PLEC variants with an allele frequency <2% in 1000G, GoNL, and ESP (considering the European ancestry population only) were uploaded into SeattleSeq and only non-synonymous coding variants were analysed further (variants with an allele frequency ≥2% were considered as definite polymorphisms). The resulting list of low frequency and novel ‘ARVC variants’ was compared with a list of low frequency and novel coding variations extracted from the GoNL database, which was filtered for allele frequencies as described above. Regions consisting of consecutive novel or rare (<0.5%) genetic variants of the PLEC gene identified exclusively in our ARVC TFC+ or TFC- patients, but lacking novel and/or rare variants in the GoNL control population, were subsequently checked for the presence or absence of any variation with an allele frequency of >2% in GoNL.
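The frequency filter described above can be illustrated with a minimal Python sketch. This is not the in-house script mentioned in the text, which is not described here. The `Variant` record, its field names, the consequence labels and the toy records are all assumptions made for illustration; only the 2% cut-off and the restriction to non-synonymous coding variants come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_AF = 0.02  # the 2% cut-off stated in the text, as a fraction

# Consequence labels are hypothetical; annotation tools use their own vocabularies.
NON_SYNONYMOUS = {"missense", "nonsense", "frameshift", "inframe_indel"}


@dataclass(frozen=True)
class Variant:
    name: str
    consequence: str
    af_1000g: Optional[float]   # allele frequency as a fraction; None = not reported
    af_gonl: Optional[float]
    af_esp_ea: Optional[float]  # ESP, European ancestry population only


def is_rare_or_novel(v: Variant, max_af: float = MAX_AF) -> bool:
    """True if the variant is unreported or below max_af in every control database."""
    freqs = (v.af_1000g, v.af_gonl, v.af_esp_ea)
    return all(af is None or af < max_af for af in freqs)


def select_candidates(variants: List[Variant]) -> List[Variant]:
    """Keep rare or novel variants that alter the protein sequence."""
    return [v for v in variants
            if is_rare_or_novel(v) and v.consequence in NON_SYNONYMOUS]


# Toy records, invented for illustration (not taken from the study).
toy = [
    Variant("toy_novel_missense", "missense", None, None, None),
    Variant("toy_rare_missense", "missense", 0.001, 0.004, None),
    Variant("toy_common_missense", "missense", 0.010, 0.035, 0.020),
    Variant("toy_rare_synonymous", "synonymous", None, 0.002, None),
]
kept = [v.name for v in select_candidates(toy)]
print(kept)
```

In this sketch a variant at exactly 2% in any database is excluded, matching the statement that variants with an allele frequency ≥2% were treated as definite polymorphisms.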

RESULTS

Genetic screening of plectin
Genetic analysis of the coding sequences of PLEC in 465 patients (211 TFC+ and 254 TFC-) identified 96 variants that were either novel (not known from control populations) or low frequency (<2% in control populations), the majority of which were missense mutations (tables 1 and 2). Next, all variants were classified, assessing their potential pathogenic nature with the help of in silico prediction tools, as described elsewhere (for the classification criteria: see chapter 4.1; for the classification outcome: see supplementary table 1). For some of the PLEC variants we studied their possible cosegregation with the disease phenotype in the families, but this yielded no conclusive results (data not shown). To further assess the potential involvement of the variants in the disease, we compared their localization with that of genetic variants of the PLEC gene in a healthy population, extracting data from the GoNL whole genome sequencing database of 500 individuals (1000 alleles/genomes). This revealed clusters of consecutive novel and low frequency variants in probable and known ARVC patients that were absent from healthy controls.

Table 1. PLEC variants identified in ARVC patients from the Netherlands. Overview of all novel and low frequency (<2%) coding variants identified in the PLEC gene in Dutch ARVC patients. Frequencies in the healthy control population of the GoNL database are indicated. The variants have been classified on the basis of differences in physicochemical nature, conservation, frequency, and predicted pathogenic effect. Frequencies of variants found in patients are indicated in red, while those of variants found in controls are in green. Frequency-based clustering of variants in the rod domain is shown in light yellow.

variant position                                Sanger  NGS data       classification
genomic              cDNA          protein      ARVC    GoNL freq (%)
145024454            421G>A        R141C                0.1071         VOUS
145016664            20C>T         A7V          1x      -              LB
145016560            124G>C        D42H         1x      -              VOUS
145013602            28C>T         Q10*         1x      -              VOUS
145013561_145013560  c69_70insTAC  D23_N24insY  1x      -              VOUS
145010082            947C>T        R316Q                0.1010         VOUS
145009211            1204C>T       V402M                0.2020         VOUS
145009069            1265G>A       R422Q        1x      -              LB
145009036            1298C>T       R433Q        3x      1.3655         LB
145008532            1534C>T       V512M                0.2045         VOUS
145008497            1569G>C       S523R        1x      0.2079         VOUS
145007449            1745G>A       A582V                0.1006         LP
145007273            1836C>G       Y612*        1x      -              LP
145007235            1874A>G       Y625C        1x      -              LP
145007192            1917C>G       Q639H                0.1027         LB
145004124            3130G>A       R1044C               0.2114         VOUS
145001652            4093G>A       R1365W       2x      1.0664         LB
145001221            4280T>C       K1427R       1x      -              LB
145001007            4400G>C       T1467R               0.2049         VOUS
145000968            4439C>T       R1480H               0.1014         LP
145000022            4486G>A       R1496C               0.4373         LP
144999731            4777C>T       V1593M               0.1420         LB
144999571            4937G>A       R1646H       1x      -              VOUS
144999565            4943C>T       R1648Q               0.2833         VOUS
144999499            5009C>T       R1670Q       1x      1.1152         LB
144999454            5054G>A       A1685V               0.2924         LP
144998932            5576C>T       T1859M       1x      -              VOUS
144998885            5623C>T       R1875W       1x      -              LP
144998707            5801G>A       R1934H       1x      -              LP
144998648            5860C>G       R1954G       2x      -              LP
144998621            5887G>A       R1963W       1x      -              LP
144998620            5888G>A       R1963Q       3x      -              VOUS
144998594            5914C>G       L1972V       1x      -              LP
144998588            5920G>A       E1974K       1x      -              LP
144998576            5932G>A       A1978T       1x      -              LP
144998399            6109A>G       K2037E       1x      -              LP
144998179            6329G>A       R2110Q       1x      -              LP
144998035            6473G>A       A2158V       1x      0.1761         LP
144997786            6722G>A       A2241V               0.6383         VOUS
144997765            6743A>C       V2248G               0.7143         VOUS
144997681            6827G>A       R2276H       1x      -              LP
144997577            6931G>A       A2311T       1x      -              VOUS
144997252            7256G>A       R2419Q       1x      -              LP
144996830            7678C>T       A2560T       2x      0.5071         LB
144996320            8080G>A       R2694W               0.4464         LB
144996169            8231C>G       A2744G       2x      -              LB
144995977            8423C>T       R2808Q               0.8989         LB
144995942            8458C>T       A2820T               0.1096         LB
144995938            8462C>T       R2821Q               0.3275         LB
144995788            8612G>A       A2871V       1x      0.2232         VOUS
144995519            8881C>T       V2961M               0.2105         VOUS
144995500            8900T>C       Y2967C               0.8368         LB
144995480            8920C>T       E2974K       3x      0.5133         VOUS
144995477            8923C>T       E2975K       1x      -              VOUS
144995173            9227C>T       R3076Q               0.1048         VOUS
144995169            9231G>C       D3077E               0.9395         B
144995012            9388C>T       D3130N               0.1042         LB
144994936            9464C>T       R3155Q               0.1025         LB
144994442            9958C>T       D3320N               0.2053         VOUS
144994064            10336G>A      R3446C       1x      1.4085         VOUS
144994028            10372C>T      G3458R               1.2195         LB
144993931            10469G>C      G3490A               1.8634         LB
144993491            10909G>A      R3637C               0.1008         VOUS
144993344            11056C>A      A3686S               1.3131         LB
144993119            11281C>T      E3761K               0.1008         LP
144993076            11324G>A      A3775V               0.1008         LB
144992660            11740C>T      E3914K               0.3067         LB
144992378            12022C>T      G4008S               0.3268         VOUS
144992269            12131G>A      T4044M               1.0121         LB
144991963            12437C>T      R4146H               0.1006         LP
144991958            12442C>T      V4148I               0.1004         LP
144991802            12598C>T      V4200M               0.2020         LP
144991799            12601C>T      E4201K               0.2016         LP
144991205            13195G>A      V4399I       1x      -              LB

Abbreviations B: benign; LB: likely benign; LP: likely pathogenic; P: pathogenic; VOUS: variant of unknown significance

Clustering of novel, likely pathogenic variants in the rod domain
We have identified one large, potentially disease-associated cluster of novel missense variants in the rod domain of the PLEC gene in the Dutch patient cohort (variants T1859M-R2110Q, see table 1). Interestingly, another, significantly overlapping ARVC-associated region was found in the same domain in the UK cohort (variants R1688C-E2157A, see table 2). This cluster of variants was promising not only because of its presence in ARVC patients and absence in control populations, but also because all variants within the cluster were classified as ‘likely pathogenic’ or ‘variant of unknown significance’ (VOUS) (for details of the variant classification, see supplementary table 1). Therefore, on the basis of the clustering of variants in patients and their predicted pathogenicity, we considered this region as ‘potentially pathogenic’.
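The idea of a run of consecutive patient-exclusive variants can be sketched in a few lines of Python. This is an illustrative reconstruction and not the in-house script used in the study. The rows are transcribed from the rod-domain part of Table 1 (Dutch cohort). The function name and the `min_len` parameter are assumptions.

```python
from itertools import groupby

# (protein position, variant, seen in patients, seen in GoNL controls),
# transcribed from the rod-domain rows of Table 1.
ROWS = [
    (1685, "A1685V", False, True),
    (1859, "T1859M", True, False),
    (1875, "R1875W", True, False),
    (1934, "R1934H", True, False),
    (1954, "R1954G", True, False),
    (1963, "R1963W", True, False),
    (1963, "R1963Q", True, False),
    (1972, "L1972V", True, False),
    (1974, "E1974K", True, False),
    (1978, "A1978T", True, False),
    (2037, "K2037E", True, False),
    (2110, "R2110Q", True, False),
    (2158, "A2158V", True, True),
]


def patient_only_runs(rows, min_len=2):
    """Return runs of consecutive variants seen in patients but not in controls.

    min_len is a hypothetical minimum run length; the text does not state one.
    """
    ordered = sorted(rows, key=lambda r: r[0])  # stable sort by protein position
    runs = []
    for exclusive, group in groupby(ordered, key=lambda r: r[2] and not r[3]):
        members = [r[1] for r in group]
        if exclusive and len(members) >= min_len:
            runs.append(members)
    return runs


runs = patient_only_runs(ROWS)
print(runs[0][0], "-", runs[0][-1], f"({len(runs[0])} variants)")
```

Any variant also present in controls (here A1685V and A2158V) ends a run, which is why the Dutch cluster is bounded by T1859M and R2110Q.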

Table 2. PLEC variants identified in ARVC patients from the United Kingdom. Overview of all novel and low frequency (<2%) coding variants identified in the PLEC gene in British ARVC patients. Frequencies in the healthy control population of the GoNL database are indicated. The variants have been classified on the basis of differences in physicochemical nature, conservation, frequency, and predicted pathogenic effect. Frequencies of variants found in patients are indicated in red, while those of variants found in controls are in green. Frequency-based clustering of variants in the rod domain is shown in light yellow.

variant position NGS data classification
genomic cDNA protein TFC+ TFC- GoNL freq (%)
145024845 30G>C D10E 1x - LB
145024609 266G>A P89L 1x - VOUS
145024570 305C>T R102H 1x - VOUS
145024454 421G>A R141C 0.1071 VOUS
145024372 503G>A P168L 1x - LB
145011343 743C>T R248Q 1x - VOUS
145010082 947C>T R316Q 0.1010 VOUS
145009211 1204C>T V402M 0.2020 VOUS
145009036 1298C>T R433Q 5x 8x 1.3655 LB
145008532 1534C>T V512M 0.2045 VOUS
145008497 1569G>C S523R 1x 0.2079 VOUS
145008206 1634C>T C545Y 1x - LP
145007449 1745G>A A582V 0.1006 LP
145007192 1917C>G Q639H 0.1027 LB
145006317 2474G>T P825Q 1x - LP
145006145 2549C>G C850S 1x - LP
145004376 2959C>T A987T 1x 1x - B
145004373 2962C>T V988M 1x - VOUS
145004124 3130G>A R1044C 0.2114 VOUS
145003722 3352G>A R1118C 1x - LB
145001873 3872C>T R1291Q 1x - LB
145001652 4093G>A R1365W 1.0664 LB
145001482 4189G>C R1397G 1x - LB
145001221 4280T>C K1427R 1x - LB
145001007 4400G>C T1467R 0.2049 VOUS
145000968 4439C>T R1480H 0.1014 LP
145000022 4486G>A R1496C 0.4373 LP
144999871 4637A>C V1546G 1x - LP
144999731 4777C>T V1593M 1x 0.1420 LB
144999565 4943C>T R1648Q 0.2833 VOUS
144999541 4967C>T S1656L - LB
144999499 5009C>T R1670Q 4x 1.1152 LB
144999454 5054G>A A1685V 0.2924 LP
144999446 5062G>A R1688C 2x - VOUS
144999268 5240G>A A1747V 1x - LP
144999224 5284C>T E1762K 2x - LP
144998857 5651G>A T1884M 1x - VOUS
144998782 5726G>A A1909V 1x - VOUS
144998626 5882G>A A1961V 1x - LP
144998621 5887G>A R1963W 2x - LP
144998620 5888G>A R1963Q - VOUS
144998077 6431G>A A2144V 1x - LP
144998038 6470T>G E2157A 1x - VOUS
144998035 6473G>A A2158V 0.1761 LP
144997899 6609C>A Q2203H 1x - LB
144997786 6722G>A A2241V 0.6383 VOUS
144997772 6736G>A R2246W 1x - VOUS
144997765 6743A>C V2248G 0.7143 VOUS
144997651 6857C>T R2286Q 1x - LP
144997561 6947C>T R2316Q 1x - VOUS
144996830 7678C>T A2560T 1x 2x 0.5071 LB
144996320 8080G>A R2694W 1x 0.4464 LB
144995977 8423C>T R2808Q 2x 0.8989 LB
144995948 8452C>T E2818K 1x - LP
144995942 8458C>T A2820T 0.1096 LB
144995938 8462C>T R2821Q 1x 0.3275 LB
144995807 8593T>C I2865V 1x - LB
144995788 8612G>A A2871V 2x 0.2232 VOUS
144995656 8744T>C K2915R 1x - VOUS
144995519 8881C>T V2961M 0.2105 VOUS
144995500 8900T>C Y2967C 2x 13x 0.8368 LB
144995483 8917C>T D2973N 2x - VOUS
144995480 8920C>T E2974K 1x 2x 0.5133 VOUS
144995477 8923C>T E2975K 1x - VOUS
144995459 8941C>T A2981T 1x - LB
144995173 9227C>T R3076Q 0.1048 VOUS
144995169 9231G>C D3077E 2x 3x 0.9395 B
144995012 9388C>T D3130N 0.1042 LB
144994955 9445C>T E3149K 1x - VOUS
144994946 9454T>C T3152A 1x - LB
144994936 9464C>T R3155Q 0.1025 LB
144994442 9958C>T D3320N 0.2053 VOUS
144994396 10004C>T R3335Q 1x - LB
144994346 10054T>C K3352E 1x - LB
144994298 10102G>A R3368C 1x - VOUS
144994175 10225G>A R3409C 1x - VOUS
144994064 10336G>A R3446C 1x 3x 1.4085 VOUS
144994028 10372C>T G3458R 1x 6x 1.2195 LB
144993946 10454C>T R3485Q 1x - LB
144993931 10469G>C G3490A - LB
144993859 10541C>T R3514Q 1x - LB
144993653 10747A>G S3583P 1x - LP
144993491 10909G>A R3637C 1x 0.1008 VOUS
144993344 11056C>A A3686S 1.3131 LB
144993242 11158A>G S3720P 1x - LB
144993119 11281C>T E3761K 0.1008 LP
144993076 11324G>A A3775V 0.1008 LB
144992962 11438G>A A3813V 1x - LB
144992953 11447G>A A3816V 1x - LB
144992660 11740C>T E3914K 0.3067 LB
144992638 11762T>A Q3921L 1x - LP
144992390 12010C>G D4004H 2x 1x - B
144992378 12022C>T G4008S 0.3268 VOUS
144992269 12131G>A T4044M 2x 5x 1.0121 LB
144991963 12437C>T R4146H 0.1006 LP
144991958 12442C>T V4148I 0.1004 LP
144991802 12598C>T V4200M 0.2020 LP
144991799 12601C>T E4201K 1x 1x 0.2016 LP
144991784 12616G>A R4206C 1x - LP
144991745 12655C>T D4219N 1x - LP
144991271 13129C>T A4377T 1x - LB
144991172 13228G>A P4410S 1x - LP
144990515 13885C>T G4629S 1x - LP
144990401 13999G>A R4667C 1x - LP

Abbreviations B: benign; LB: likely benign; LP: likely pathogenic; P: pathogenic; VOUS: variant of unknown significance

Absence of frequent non-synonymous variations in the rod domain region in controls
Next, we investigated whether frequent (MAF >2%) SNPs reside in the ‘potentially pathogenic’ region of the rod domain in the GoNL control population. For this purpose, all variants of PLEC between chromosomal positions 144998038 and 144999446 were extracted from the GoNL database and then uploaded to SeattleSeq for annotation with protein coding features (table 3). Only a few synonymous variants were identified, with the exception of one additional missense variant (K2047E) reported with a frequency of 8%. This variant, however, had a very low quality score, which indicated that it was a sequencing artefact rather than a true variant. Hence, it seems that missense mutations in this region are not “tolerated” without phenotypic consequences.
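The check described above can be sketched in Python using the values reported in table 3. The region bounds and the 2% threshold come from the text. The quality cut-off `MIN_QUALITY` is a hypothetical value chosen only to separate the low-quality calls in this table, since the thesis does not state a numeric threshold. The residue comparison is a deliberately crude stand-in for a proper annotation step such as SeattleSeq.

```python
REGION = (144998038, 144999446)  # chromosomal bounds quoted in the text
MIN_FREQ_PCT = 2.0               # "frequent" means a GoNL frequency above 2%
MIN_QUALITY = 100.0              # hypothetical cut-off, not taken from the thesis

# (genomic position, protein change, call quality, GoNL frequency in %), from table 3
TABLE3 = [
    (144999417, "A1697A", 9320.24, 47.1429),
    (144998868, "A1880A", 222.26, 8.1633),
    (144998514, "A1998A", 43.31, 4.1096),
    (144998369, "K2047E", 41.57, 8.0645),
    (144998190, "A2106A", 7241.12, 31.1037),
    (144998169, "A2113A", 10024.93, 35.9177),
]


def changes_residue(change):
    """Crude check: the reference (first) and new (last) residue letters differ."""
    return change[0] != change[-1]


def frequent_nonsynonymous(rows, region=REGION, min_freq=MIN_FREQ_PCT,
                           min_quality=None):
    """List frequent residue-changing variants inside the region.

    If min_quality is given, calls below that quality are dropped first.
    """
    lo, hi = region
    return [change for pos, change, quality, freq in rows
            if lo <= pos <= hi
            and freq > min_freq
            and changes_residue(change)
            and (min_quality is None or quality >= min_quality)]


print(frequent_nonsynonymous(TABLE3))
print(frequent_nonsynonymous(TABLE3, min_quality=MIN_QUALITY))
```

Without a quality filter the only frequent residue-changing variant in the region is K2047E. Once low-quality calls are excluded, none remain, which is the observation the paragraph above relies on.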

Further interesting regions of plectin
Based on the presence of low frequency variants in patients and their absence in controls, we identified additional interesting, potentially ARVC-associated regions of PLEC. Notably, these were only found in the UK cohort (probably due to the larger number of patients involved in the study) and spanned much shorter regions of the gene than the one in the rod domain. One of these ARVC-associated regions was the repeat region (P825Q-V988M): though this region only contained two ‘likely pathogenic’ variants in our patients, it did not contain any non-synonymous variant in the GoNL control population (data not shown). The other two variants, despite being classified as VOUS, might also be more damaging, since this segment of the encoded plectin protein is known to be responsible for interactions with, for example, actin, nesprin, and costameric proteins. Likewise, the region of variants R3335Q-R3409C, partially residing in the intermediate filament-binding plectin-repeats of the protein, may also contribute to the development of ARVC. Though this latter region was also free of coding non-synonymous genetic variants in Dutch controls (GoNL), two of the four variants found in patients were classified as ‘likely benign’ (primarily because they were predicted to be harmless) (see classification in supplementary table 1). Additionally, the C-terminus of PLEC (R4206C-R4667C) could be potentially interesting, but one relatively frequent missense variant (T4539M, 2.381% in GoNL), which was also found in 2/107 Dutch and 18/358 British patients, resides in this otherwise likely ARVC-related plectin-repeat region.

The potential role of other desmosomal mutations and/or external factors: is ARVC a multifactorial disease?
Of the 30 patients who were carriers of PLEC missense variants in the potentially disease-associated region of the rod domain, the vast majority had a TFC+ cardiac phenotype (true ARVC). Of these, 14 patients were found to carry other ARVC-related potentially pathogenic mutations or VOUS in addition to their PLEC cluster variants (table 4, only likely pathogenic and pathogenic variants included), mostly in the PKP2 gene but also in DSC2, DSP, JUP, SCN5A or TMEM43 in some cases. Moreover, five patients had multiple low frequency or novel genetic variants in PLEC (four of which were additional variants in the same rod cluster). While exercise (Perrin et al, Saberniak et al) and certain viral infections (Grumbach et al) remain important contributors to the onset of an ARVC phenotype, our study indicates that a potential oligogenic inheritance might complicate the seemingly multifactorial disease background and cause variable penetrance of ARVC.

Table 3. High frequency (>2%) PLEC control variants localized in the ARVC-associated ‘potentially pathogenic’ region of the rod domain. Variants of PLEC localized in the potentially pathogenic cluster of missense variants associated with ARVC. This region was found to be enriched for missense variants in both the Dutch and UK ARVC patient cohorts and lacking low frequency (<2%) variation in the healthy population represented by GoNL. No frequent coding non-synonymous PLEC variants, except for the c.6319T>C; p.K2047E variant (highlighted in black), which most likely is an artefact, were identified in the GoNL control population. Synonymous variants are indicated in gray.

variant position                  dbSNP        GoNL data                  remark
genomic     cDNA      protein     rs number    quality    frequency (%)
144999417   5091C>T   A1697A      rs55836855   9320.24    47.1429
144998868   5640C>T   A1880A      -            222.26     8.1633
144998514   5994C>T   A1998A      -            43.31      4.1096          most likely artefact
144998369   6319T>C   K2047E      -            41.57      8.0645          most likely artefact
144998190   2106A>G   A2106A      rs2857829    7241.12    31.1037
144998169   6339C>T   A2113A      rs1140522    10024.93   35.9177

Table 4. Carriership of additional potentially pathogenic variants in patients with PLEC variants in the ARVC-associated cluster of the rod domain. Of the 43 patients carrying PLEC variants in the identified putatively ARVC-associated plectin region of the rod domain, 17 were found to be carriers of additional genetic mutations in desmosomal genes or other known ARVC genes. Only genetic variants classified as pathogenic or likely pathogenic are included.

Column contents (in extracted order; the assignment of entries to individual patients is not recoverable here):

patient ID: 1 to 30

TFC status: TFC- TFC- TFC- TFC- TFC- TFC- TFC+ TFC+ TFC+ TFC- TFC+ TFC+ TFC- TFC- TFC- TFC- TFC+ TFC- TFC- TFC- TFC- TFC- TFC+ TFC+ TFC- TFC+ TFC+ TFC+ TFC- TFC-

PLEC “cluster” variants:
R1688C (VOUS)
R1688C (VOUS)
R1688C (VOUS)
E1762K (LP)
E1762K (LP)
R1875W (LP)
R1934H (LP)
R1954G (LP)
R1963W (LP)
R1963W (LP)
R1963W (LP)
R1963W (LP)
R1963Q (VOUS)
R1963Q (VOUS)
R1963Q (VOUS)
R1963Q (VOUS)
R1963Q (VOUS)
R1963Q (VOUS)
R1963Q (VOUS), K2099R (LP)
L1972V (LP)
E1974K (LP)
R2110Q (LP)
E2157A (VOUS)
T1859M (LB), K2037E (LP)
T1884M (VOUS)
A1747V (LP)
A1909V (VOUS)
A1961V (LP)
A1978T (LP)
A2144V (LP)

other PLEC variant:
R422Q (LB), A2744G (LB)
D42H (VOUS)
R433Q (LB)
Q10* (VOUS)
A2242V (B)

other ARVC-related mutation 1:
DSP c.7994C>T; p.T2665M (VOUS)
PKP2 c.2062T>C; S688P (LP)
PKP2 c.1848C>A; Y616X (P)
PKP2 c.1132C>T; p.Q378X (P)
PKP2 del exon 1-4 (P)
PKP2 c.397C>T; p.Q133X (P)
DSP c.4775A>G; p.K1992R (VOUS)
PKP2 c.2386T>C; C796R (P)
PKP2 c.1211dup; L404fs (P)
PKP2 c.235C>T; R79X (P)
PKP2 c.1597_1600delATCC; p.P533fsX561 (P)
SCN5A c.665G>A; p.R222Q (P)
SCN5A c.3206C>T; p.T1069M (LP)
JUP c.902A>G; p.E301G (VOUS)

other ARVC-related mutation 2:
DSP c.269G>A; p.Q90R (VOUS)
TMEM c.934C>T; p.R312W (VOUS)

Abbreviations B: benign; LB: likely benign; LP: likely pathogenic; P: pathogenic; TFC: task force criteria (diagnostic criteria of ARVC); VOUS: variant of unknown significance

DISCUSSION
We hypothesized that the cytolinker protein plectin, which supports the binding of intermediate filaments to the desmosomes, might carry genetic variants that contribute to the development of, or at least the susceptibility to, ARVC, which is known as a ‘disease of the cardiac desmosome’. Our reasons were that (1) plectin is highly expressed in the myocardium and is physically connected to the cell junctions which are known to be involved in the pathomechanism of ARVC, (2) its knock-down in various mouse models leads to cardiac pathology, and (3) late-onset cardiac symptoms have recently been reported for a few EBS patients carrying plectin mutations.

Our Sanger sequencing and NGS-based analysis of the PLEC gene resulted in the identification of 96 novel or low frequency (<2%), mostly heterozygous variants in patients with ARVC. Previously, multiple homozygous or compound heterozygous truncating nonsense and frameshift variants of PLEC had been shown to lead to different manifestations of epidermolysis bullosa simplex; only a couple of missense variants were identified in unusually mild cases of EBS (Bolling et al 2013). The majority of the variants now identified are missense by nature, except for a small deletion and two heterozygous truncating variants. The Q10* variant is located in one of the multiple 5’ exon 1 sequences of the gene, and upon RNA splicing, it ends up in only one isoform (isoform 1a, transcript NM_201384.1) that was previously not found to be of importance in the heart (Fuchs et al). The same exon was shown to have a small, in-frame deletion in another patient. The pathogenic nature of these two variants is unclear, but seems unlikely. Moreover, in a third patient, we identified another truncating variant, p.Y612* in exon 14, which is part of all currently known plectin isoforms. The respective patient did not show any skin phenotype, i.e. blistering. This is consistent with heterozygous carriers of truncating PLEC mutations generally being reported as healthy, while a homozygous or compound heterozygous form causes EBS. Therefore, it seems that heterozygous truncating PLEC mutations are not disease-related, although we cannot exclude that mild phenotypes might have been overlooked.

All remaining variants (n=93) detected in our patient cohorts are missense variants, of which a substantial number were classified as VOUS or likely pathogenic based on conservation, predicted effects on protein function and differences in physico-chemical properties of the respective amino acid residues. When analyzing the distribution of these missense variations, we noted that the variants are scattered across the entire gene and do not show an obvious, exclusive clustering in any smaller area or domain. Moreover, when studying the presence of novel or rare variants in the control population of the Genome of the Netherlands, various missense mutations were also identified (n=48), and of these a subset were also classified as VOUS or likely pathogenic. This raised the question whether such missense mutations are all ‘harmless’ variants or whether a subset might have disease-associated effects, as was previously shown for a couple of heterozygous missense mutations, such as the p.R2110W mutation, that led to mild forms of EBS (Bolling et al 2013). For this reason we compared the distribution of PLEC variations in affected and healthy individuals, searching for regions that are rich in genetic variation in patients but show “variant deserts” in the control group. We subsequently identified one such region in the coiled-coil rod domain (between amino acids 1688-2157). This region exhibits almost exclusively synonymous (low and high frequency) genetic variation in the GoNL control population, but contains many missense variants in our ARVC patients. According to their conservation, frequency in SNP databases and predicted pathogenicity, the majority of these variants were classified as ‘likely pathogenic’, or, in a few cases, as VOUS. Notably, due to coverage and/or mapping difficulties for some parts of this region, variant calling was hampered and in a few cases based on fewer than 1000 alleles. We identified an additional short region of the plectin-repeat domains which had ARVC-associated variants clustering in the UK patient cohort but did not see these variations in the GoNL cohort. In addition, several ‘likely pathogenic’ variants were identified at positions outside the cluster in the rod domain.
Unfortunately, it is not possible to interpret the functional consequences of these missense variants and draw conclusions about their potential causative nature without performing further follow-up experiments. However, a modifier role in ARVC can still be anticipated. What our findings do suggest is that the homo-dimerizing rod domain, which seems to contain a well-defined region of disease-associated missense variants in both the British and Dutch patient cohorts (see also figure 1), may be of structural importance for plectin molecules. Indeed, it has been observed that the dimerized rod domains are able to form remarkably stable polymers via further lateral connections with each other (Walko et al). Moreover, in EBS-Ogna mouse and patient keratinocytes, the missense variants of the rod domain were found to increase the proteolysis of plectin in the hemidesmosomes, an effect which was rescued by treatment with inhibitors (Walko et al). Interestingly, there are a few exceptional yet mild EBS cases recently reported to be due to heterozygous missense variants of PLEC (Bolling et al 2013). In the skin, PLEC homozygous truncations (and as a consequence the lack of plectin protein expression) seem to predominantly cause a dysfunction of the hemidesmosomes, leading to insufficient attachment of keratinocytes to the dermis at the basement membrane zone and manifesting as the basal blistering of the skin (McLean et al, Smith et al). The missense heterozygous variants of the rod domain also cause basal blistering by increasing the vulnerability of plectin to calpain-mediated proteolysis (Walko et al), yet their effect is not as drastic and the phenotype is limited to the hands and feet, areas subject to more mechanical stress.

Figure 1. Schematic illustration of regions of interest in PLEC. The marked areas were found to contain a number of missense variants in ARVC patients, yet appear as “variant deserts” in the GoNL control population. Orange-yellow colour indicates the region found in the Dutch patient cohort, while regions coloured in red were found in the British patient cohort. Exon 31 encodes the homodimerizing rod domain of plectin.

The exact mechanism by which heterozygous missense variants of PLEC could lead to cardiomyopathy is unknown. However, we know that plectin contains a number of plakin repeat domains typical of desmosomal proteins and responsible for binding intermediate filaments, as well as a highly variable N-terminal actin-binding domain. Thus we anticipate that, by binding both intermediate filaments and actin, PLEC may play a role in attracting the structurally robust desmosomes around the more fragile adherens junctions at the cardiac intercalated disks. Adherens junctions anchor the more dynamic and fragile thin (actin) filaments of the neighbouring cardiomyocytes together and provide the continuity between myofibrils of these neighbouring cells. We hypothesize that the presence of missense variants leads to increased proteolysis of plectin in the heart (similar to what occurs in the skin), which could in turn decrease its interjunctional linking capacity, leaving the adherens junctions less supported by desmosomes and thus more sensitive to stress and damage. This could explain why mutations in PLEC do not seem to be involved in a monogenic, classical Mendelian form of ARVC, since the intact, strong desmosomes could still keep the cardiomyocytes aligned together. Thus, an additional desmosomal mutation or external stressor may be necessary to pass the thresholds for disease development. This would also fit the concept of an oligogenic/multifactorial disease mechanism of ARVC that has been suggested by Marcus et al (2013) and Perrin et al, and would better explain the low penetrance observed in ARVC. The pathogenicity of previously identified genetic variants is increasingly being questioned because they are found in healthy individuals in a much larger percentage than would be expected based on the estimated 1:2000 incidence and late-onset nature of the disease. Even ‘radical mutations’ can be found in 0.5% of ostensibly healthy controls, while ‘missense mutations’ were identified in nearly 16% of controls that were screened for just the five prominent ARVC genes (Kapplinger et al). These observations would also fit in an oligogenic disease model. In this study, we have shown that about half of the patients carrying a PLEC missense variant in the ‘pathogenic region’ of the rod domain are also carriers of other definitive mutations in known ARVC genes. Certain novel missense variants identified in our ARVC patient cohort have been identified in EBS patients as well, but without any obvious overlap in clinical phenotypes.
For instance, variants p.R1963W and p.V4399I were identified in both EBS and ARVC patients, but were absent from GoNL (EBS patients; unpublished results). Moreover, the mutation p.R2110W (in dermatological context reported as p.R2000W) that was detected in seven EBS index patients (Bolling et al 2013, Kiritsi et al) affects the same amino acid as the likely pathogenic p.R2110Q variant in one of our ARVC patients. Surprisingly, however, our ARVC patients exhibited no striking skin blistering disorder or muscular dystrophy, while the respective EBS patients did not show ARVC features. It is currently unclear what could cause some patients with missense variants to develop either mild EBS or ARVC, although one might expect that the stochastic distribution of wt/wt, wt/VOUS and VOUS/VOUS plectin dimers, as well as the tissue-specific factors influencing the calpain-mediated proteolysis, could to some extent explain this phenotypic variability. It is also possible that some EBS patients are suffering from an undiscovered cardiomyopathy. Another explanation is that genetic modifiers or a multifactorial background might also influence the phenotype of EBS (Padalon-Brauch et al), in which case PLEC missense mutations are not the sole cause of the disease, although they still contribute substantially to the phenotype. Importantly, EBS has a much younger age of onset (and is more easily diagnosed) than ARVC, and EBS patients might still develop cardiomyopathy later in life. For this reason, EBS patients (at least those with an identified genetic variant of PLEC) might benefit from cardiological evaluation and/or regular follow-up (Bolling et al 2010). The referral of ARVC patients to dermatologists, by contrast, does not seem to be essential based on our current knowledge. The exact mechanical role of plectin in different tissues, as well as the mechanism via which truncating and missense mutations cause different disease phenotypes, however, requires further in-depth research in the future.
There are a few examples of patients and families suffering from a combination of skin blistering and cardiac phenotypes due to PLEC mutations (Schröder et al, Bolling et al 2010). One patient had a homozygous frameshift mutation in PLEC and exhibited EBS-MD with cardiac hypertrophy, while another was compound heterozygous for a truncating mutation (p.E1724X) and a missense variant (p.R433Q). Though this missense variant was detected in 3 Dutch and 13 UK patients in this study, it was also present at a relatively high frequency in GoNL controls (1.36%) and thus classified as likely benign.
However, in the patient carrying both PLEC variants, the p.R433Q variant is most likely the predominant form produced, due to diminished protein expression from the allele carrying the nonsense mutation, and it might therefore still contribute to the cardiac pathology in this patient. In addition to these two previous case reports, we recently also observed cardiac involvement in two brothers affected by severe EBS-MD who carried two homozygous PLEC variants (p.E1914X and p.Y2967C) (Koss-Harnes et al). Both patients have since died of hypertrophic dilated cardiomyopathy. Additionally, one of their non-EBS brothers suffered sudden cardiac death, a well-known feature of ARVC, and his son was affected by DCM without any signs of skin blistering or muscular dystrophy. Unfortunately, the PLEC carrier status of these family members is unknown. Villa et al have also recently reported a case of EBS-MD developing left ventricular non-compaction cardiomyopathy.

Taken together, these findings suggest that it might be advisable to perform genetic screening of PLEC in other types of cardiomyopathy as well. The exact mechanical role of plectin in different tissues, as well as the mechanism via which truncating and missense mutations cause different disease phenotypes, requires in-depth research in the future.

CONCLUSIONS

We identified 96 novel or low-frequency (allele frequency below 2%), mostly heterozygous, missense variants of PLEC in ARVC patients. Three observations led us to conclude that PLEC cannot be considered the primary genetic cause of inherited ARVC: these variants were identified in addition to pathogenic desmosomal mutations in a subset of these patients; co-segregation with disease could not be established for those variants that were analysed for this study; and novel or rare variants with a predicted pathogenic nature were also identified in the healthy GoNL cohort. By comparing the natural genetic variation of the gene (all variants with an allele frequency below 2% identified in the GoNL population) with that found in patients, we were able to identify a region of probably ARVC-associated missense variants in the rod domain, which is thought to mediate homodimerization/dimerization of the protein and might become more vulnerable to proteolysis. We hypothesize that genetic variations in this domain of PLEC may make the cellular junctions more fragile, thus increasing the susceptibility to ARVC. This result underscores the previously suggested multifactorial nature of ARVC and suggests that PLEC variations, at least when present in specific regions of the protein, add to the risk factors that together must reach a threshold level to initiate the development of this disease.
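The comparison summarised above (restricting both cohorts to variants with a control allele frequency below 2% and asking whether patient variants cluster in one region of the protein) can be illustrated with a minimal Python sketch. The variant records, the region boundaries (`REGION_START`, `REGION_END`) and the helper names are hypothetical placeholders, not the data or the annotation script used in this study, and the one-sided Fisher's exact test is only one reasonable way to compare the two distributions.

```python
from math import comb

MAX_AF = 0.02          # 2% allele-frequency cut-off mentioned in the text
REGION_START = 1500    # hypothetical placeholder, NOT the real rod-domain boundary
REGION_END = 2700      # hypothetical placeholder, NOT the real rod-domain boundary

# Toy records (cohort, residue position, control allele frequency); all values invented.
VARIANTS = [
    ("patient", 1600, 0.000),
    ("patient", 1800, 0.000),
    ("patient", 2000, 0.001),
    ("patient", 300, 0.010),
    ("patient", 3500, 0.030),   # too common, removed by the filter
    ("control", 2200, 0.002),
    ("control", 100, 0.001),
    ("control", 3000, 0.005),
    ("control", 4000, 0.010),
    ("control", 2500, 0.050),   # too common, removed by the filter
]


def rare(variants, max_af=MAX_AF):
    """Keep variants whose control allele frequency is strictly below max_af."""
    return [v for v in variants if max_af > v[2]]


def count_in_out(variants, cohort):
    """Return (inside region, outside region) counts for one cohort."""
    region = range(REGION_START, REGION_END + 1)
    positions = [pos for c, pos, _ in variants if c == cohort]
    inside = sum(1 for pos in positions if pos in region)
    return inside, len(positions) - inside


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(total - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(total, col1)


rare_variants = rare(VARIANTS)
a, b = count_in_out(rare_variants, "patient")   # patient variants inside / outside
c, d = count_in_out(rare_variants, "control")   # control variants inside / outside
p_value = fisher_one_sided(a, b, c, d)
print(f"patients {a}/{a + b} in region, controls {c}/{c + d} in region, p = {p_value:.3f}")
```

With so few toy records the test has no power; the point is only the order of operations: filter on control frequency first, then compare the regional distribution between cohorts.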

ACKNOWLEDGEMENTS The authors would like to acknowledge Jackie Senior and Kate Mc Intyre for editing this manuscript, as well as Rudi Alberts for designing the script for the GoNL, 1000 Genomes and ESP allele frequency annotation of the genetic variants. Part of this work was undertaken at University College London Hospital and University College London (UCLH/UCL), which received some funding from the UK Department of Health’s NIHR Biomedical Research Centres funding scheme. This study made use of data generated by the Genome of the Netherlands Project.

REFERENCES

Andrä K, Lassmann H, Bittner R et al. Targeted inactivation of plectin reveals essential function in maintaining the integrity of skin, muscle, and heart cytoarchitecture. Genes Dev 1997;11:3143-56
Bao JR, Wang JZ, Yao Y et al. Screening of pathogenic genes in Chinese patients with arrhythmogenic right ventricular cardiomyopathy. Chin Med J (Engl) 2013;126:4238-41
Bauce B, Nava A, Beffagna G et al. Multiple mutations in desmosomal proteins encoding genes in arrhythmogenic right ventricular cardiomyopathy/dysplasia. Heart Rhythm 2010;7:22-9
Bolling MC, Jongbloed JDH, Boven LG et al. Plectin mutations underlie epidermolysis bullosa simplex in 8% of patients. J Invest Dermatol 2013; doi: 10.1038/jid.2013.277
Bolling MC, Pas HH, de Visser M et al. PLEC1 mutations underlie adult-onset dilated cardiomyopathy in epidermolysis bullosa simplex with muscular dystrophy. J Invest Dermatol 2010;130:1178-81
Elliott P, Andersson B, Arbustini E et al. Classification of the cardiomyopathies: a position statement from the European Society of Cardiology Working Group on Myocardial and Pericardial Diseases. Eur Heart J 2008;29:270-6
Fuchs P, Zörer M, Rezniczek GA et al. Unusual 5' transcript complexity of plectin isoforms: novel tissue-specific exons modulate actin binding activity. Hum Mol Genet 1999;8:2461-72
Grumbach IM, Heim A, Vonhof S et al. Coxsackievirus genome in myocardium of patients with arrhythmogenic right ventricular dysplasia/cardiomyopathy. Cardiology 1998;89:241-5
Jacob KA, Noorman M, Cox MGPJ et al. Geographical distribution of plakophilin-2 mutation prevalence in patients with arrhythmogenic cardiomyopathy. Neth Heart J 2012;20:234-9
Kapplinger JD, Landstrom AP, Salisbury BA et al. Distinguishing arrhythmogenic right ventricular cardiomyopathy/dysplasia-associated mutations from background genetic noise. J Am Coll Cardiol 2011;57:2317-27
Kiritsi D, Pigors M, Tantcheva-Poor I et al. Epidermolysis bullosa simplex Ogna revisited. J Invest Dermatol 2013;133:270-3
Konieczny P, Fuchs P, Reipert S et al. Myofiber integrity depends on desmin network targeting to Z-disks and costameres via distinct plectin isoforms. J Cell Biol 2008;181:667-81
Koss-Harnes D, Hoyheim B, Jonkman MF et al. Life-long course and molecular characterization of the original Dutch family with epidermolysis bullosa simplex with muscular dystrophy due to a homozygous novel plectin point mutation. Acta Derm Venereol 2004;84:124-31
Lopes LR, Zekavati A, Syrris P et al. Genetic complexity in hypertrophic cardiomyopathy revealed by high-throughput sequencing. J Med Genet 2013;50(4):228-39
Marcus FI, McKenna WJ, Sherrill D et al. Diagnosis of arrhythmogenic right ventricular cardiomyopathy/dysplasia: proposed modification of the task force criteria. Eur Heart J 2010;31:806-14
Marcus FI, Edson S, Towbin JA. Genetics of arrhythmogenic right ventricular cardiomyopathy: a practical guide for physicians. J Am Coll Cardiol 2013;61:1945-8
McKoy G, Protonotarios N, Crosby A et al. Identification of a deletion in plakoglobin in arrhythmogenic right ventricular cardiomyopathy with palmoplantar keratoderma and woolly hair (Naxos disease). Lancet 2000;355:2119-24
McLean WH, Pulkkinen L, Smith FJ et al. Loss of plectin causes epidermolysis bullosa with muscular dystrophy: cDNA cloning and genomic organization. Genes Dev 1996;10:1724-35
Nakajima T, Kaneko Y, Irie T et al. Compound and digenic heterozygosity in desmosome genes as a cause of arrhythmogenic right ventricular cardiomyopathy in Japanese patients. Circ J 2012;76:737-43
Norgett EE, Hatsell SJ, Carvajal-Huerta L et al. Recessive mutation in desmoplakin disrupts desmoplakin–intermediate filament interactions and causes dilated cardiomyopathy, woolly hair and keratoderma. Hum Mol Genet 2000;9(18):2761-6
Padalon-Brauch G, Ben Amitai D, Vodo D et al. Digenic inheritance in epidermolysis bullosa simplex. J Invest Dermatol 2012;132(12):2852-4
Perrin MJ, Angaran P, Laksman Z et al. Exercise testing in asymptomatic gene carriers exposes a latent electrical substrate of arrhythmogenic right ventricular cardiomyopathy. J Am Coll Cardiol 2013;62:1772-9
Rigato I, Bauce B, Rampazzo A et al. Compound and digenic heterozygosity predicts life-time arrhythmic outcome and sudden cardiac death in desmosomal gene-related arrhythmogenic right ventricular cardiomyopathy. Circ Cardiovasc Genet 2013;6:533-42
Saberniak J, Hasselberg NE, Borgquist R et al. Vigorous physical activity impairs myocardial function in patients with arrhythmogenic right ventricular cardiomyopathy and in mutation positive family members. Eur J Heart Fail 2014;16:1337-44
Schröder R, Kunz WS, Rouan F et al. Disorganization of the desmin cytoskeleton and mitochondrial dysfunction in plectin-related epidermolysis bullosa simplex with muscular dystrophy. J Neuropathol Exp Neurol 2002;61(6):520-30
Sen-Chowdhry S, Morgan RD, Chambers JC et al. Arrhythmogenic cardiomyopathy: etiology, diagnosis, and treatment. Annu Rev Med 2010;61:233-53
Smith FJ, Eady RA, Leigh IM et al. Plectin deficiency results in muscular dystrophy with epidermolysis bullosa. Nat Genet 1996;13:450-7
Sonnenberg A & Liem RKH. Plakins in development and disease. Exp Cell Res 2007;313:2189-203
Teekakirikul P, Kelly MA, Rehm HL et al. Inherited cardiomyopathies: molecular genetics and clinical genetic testing in the postgenomic era. J Mol Diagn 2013;15:158-70
te Rijdt WP, Jongbloed JDH, de Boer RA et al. Clinical utility gene card for: arrhythmogenic right ventricular cardiomyopathy (ARVC). Eur J Hum Genet 2013; doi: 10.1038/ejhg.2013.124
Villa CR, Ryan TD, Collins JJ et al. Left ventricular non-compaction cardiomyopathy associated with epidermolysis bullosa simplex with muscular dystrophy and PLEC1 mutation. Neuromuscular Disord 2015;25:165-8
Walko G, Vukasinovic N, Gross K et al. Targeted proteolysis of plectin isoform 1a accounts for hemidesmosome dysfunction in mice mimicking the dominant skin blistering disease EBS-Ogna. PLoS Genet 2011;7(12):e1002396
Wiche G, Krepler R, Artlieb U et al. Occurrence and immunolocalization of plectin in tissues. J Cell Biol 1983;97:887-901
Winter L & Wiche G. The many faces of plectin and plectinopathies; pathology and mechanisms. Acta Neuropathol 2013;125(1):77-93
Zernig G & Wiche G. Morphological integrity of single adult cardiac myocytes isolated by collagenase treatment: immunolocalization of tubulin, microtubule-associated proteins 1 and 2, plectin, vimentin, and vinculin. Eur J Cell Biol 1985;38(1):113-22
van Tintelen JP, Entius MM, Bhuiyan ZA et al. Plakophilin-2 mutations are the major determinant of familial arrhythmogenic right ventricular dysplasia/cardiomyopathy. Circulation 2006;113:1650-8
Xu T, Yang Z, Vatta M et al. Compound and digenic heterozygosity contributes to arrhythmogenic right ventricular cardiomyopathy. J Am Coll Cardiol 2010;55:587-97

Supplementary table 1: Classification of PLEC variants

Columns: variant (cDNA coordinates and protein change); predicted effect (Grantham distance, conservation, PhyloP [-14.1;6.4], AGVGD, PolyPhen, SIFT, MutationTaster, splicing); control database frequencies (1000 Genomes, ESP (EA; AA), GoNL); classification (likely benign, VOUS or likely pathogenic).

Variants listed (protein change, cDNA coordinates as given in the table; four of the N-terminal variants are annotated against the isoform transcripts NM_201383.1 or NM_201384.1): D10E (c.30G>C); P89L (c.266G>A); R102H (c.305C>T); R141C (c.421G>A); P168L (c.503G>A); A7V (c.20C>T); D42H (c.124G>C); Q10* (c.28C>T); D23_N24insY (c.69_70insTAC); R248Q (c.743C>T); R316Q (c.947C>T); V402M (c.1204C>T); R422Q (c.1265G>A); R433Q (c.1298C>T); V512M (c.1534C>T); S523R (c.1569G>C); C545Y (c.1634C>T); A582V (c.1745G>A); Y612* (c.1836C>G); Y625C (c.1874A>G); Q639H (c.1917C>G); P825Q (c.2474G>T); C850S (c.2549C>G); A987T (c.2959C>T); V988M (c.2962C>T); R1044C (c.3130G>A); R1118C (c.3352G>A); R1291Q (c.3872C>T); R1365W (c.4093G>A); R1397G (c.4189G>C); K1427R (c.4280T>C); T1467R (c.4400G>C); R1480H (c.4439C>T); R1496C (c.4486G>A); V1546G (c.4637T>G); V1593M (c.4777C>T); R1646H (c.4937G>A); R1648Q (c.4943C>T); S1656L (c.4967C>T); R1670Q (c.5009C>T); A1685V (c.5054G>A); R1688C (c.5062G>A); A1747V (c.5240G>A); E1762K (c.5284C>T); T1859M (c.5576C>T); R1875W (c.5623C>T); T1884M (c.5651G>A); A1909V (c.5726G>A); R1934H (c.5801G>A); R1954G (c.5860C>G); A1961V (c.5882G>A); R1963W (c.5887G>A); R1963Q (c.5888G>A); L1972V (c.5914C>G); E1974K (c.5920G>A); A1978T (c.5932G>A); K2037E (c.6109A>G); R2110Q (c.6329G>A); A2144V (c.6431G>A); E2157A (c.6470T>G); A2158V (c.6473G>A); Q2203H (c.6609C>A); A2241V (c.6722G>A); R2246W (c.6736G>A); V2248G (c.6743A>C); R2276H (c.6827G>A); R2286Q (c.6857C>T); A2311T (c.6931G>A); R2316Q (c.6947C>T); R2419Q (c.7256G>A); A2560T (c.7678C>T); R2694W (c.8080G>A); A2744G (c.8231C>G); R2808Q (c.8423C>T); E2818K (c.8452C>T); A2820T (c.8458C>T); R2821Q (c.8462C>T); I2865V (c.8539T>C); A2871V (c.8612G>A); K2915R (c.8744T>C); V2961M (c.8881C>T); Y2967C (c.8900T>C); D2973N (c.8917C>T); E2974K (c.8920C>T); E2975K (c.8923C>T); A2981T (c.8941C>T); R3076Q (c.9227C>T); D3077E (c.9231G>C); D3130N (c.9388C>T); E3149K (c.9445C>T); T3152A (c.9454T>C); R3155Q (c.9464C>T); D3320N (c.9958C>T); R3335Q (c.10004C>T); K3352E (c.10054T>C); R3368C (c.10102G>A); R3409C (c.10225G>A); R3446C (c.10336G>A); G3458R (c.10372C>T); R3485Q (c.10454C>T); G3490A (c.10469G>C); R3514Q (c.10541C>T); S3583P (c.10747A>G); R3637C (c.10909G>A); A3686S (c.11056C>A); S3720P (c.11158A>G); E3761K (c.11281C>T); A3775V (c.11324G>A); A3813V (c.11438G>A); A3816V (c.11447G>A); E3914K (c.11740C>T); Q3921L (c.11762T>A); D4004H (c.12010C>G); G4008S (c.12022C>T); T4044M (c.12131G>A); R4146H (c.12437C>T); V4148I (c.12442C>T); V4200M (c.12598C>T); E4201K (c.12601C>T); R4206C (c.12616G>A); D4219N (c.12655C>T); A4377T (c.13129C>T); V4399I (c.13195G>A); P4410S (c.13228G>A); G4629S (c.13885C>T); R4667C (c.13999G>A).

Abbreviations: ESP (EA; AA): allele frequency in European Ancestry and African Ancestry individuals in the Exome Sequencing Project database; GoNL: allele counts in the GoNL database; n.a.: not applicable/available.

CHAPTER 3 EXOME SEQUENCING

Chapter 3.1

Hunting for novel disease genes in autosomal dominant cardiomyopathies: elucidating a role for the sarcomeric pathway

Rowida Almomani*, Anna Posafalvi*, Paul A van der Zwaag, Carlo L Marcelis, Bert Baars, Johanna C Herkert, Rudolf A de Boer, Karin Y van Spaendonck-Zwarts, Maarten P van den Berg, Richard J Sinke, J Peter van Tintelen§, Jan DH Jongbloed§

* The first two authors contributed equally
§ The last two authors contributed equally

ABSTRACT

We performed exome sequencing and a haplotype sharing test on a group of twelve families with autosomal dominant cardiomyopathy and no previous genetic diagnosis in order to identify potentially novel disease genes. Our approach resulted in the identification of the genetic cause of disease in 6/12 families. We found truncating variants in TTN in two dilated cardiomyopathy families, a frame-shift mutation in FLNC and a double missense mutation in FHL2 in two arrhythmogenic cardiomyopathy families, and missense variants in the COBL and STARD13 genes in two dilated cardiomyopathy families, respectively; neither COBL nor STARD13 has been related to cardiac pathology before. Thorough data-mining suggests a possible role for all of these genes in the disease mechanism of late onset cardiomyopathies. By creating a co-expression network of the five genes using an expression-array-based bioinformatics database and software created in the department, we show that 100 of the 166 proteins included in our network have been annotated with a potential function in cardiac development and physiology. Of these 100 proteins, 28 are known as disease genes in various types of cardiomyopathy, and a role in sarcomere assembly seems to be the common molecular pathway connecting a large proportion of these genes.

INTRODUCTION

Dilated cardiomyopathy (DCM) is a progressive heart disease mainly characterised by left ventricular dilatation and impaired cardiac contraction, while arrhythmogenic right ventricular cardiomyopathy (ARVC) is a common cause of sudden cardiac death because of its association with ventricular arrhythmias (Hershberger et al, Basso et al). Currently, there are more than 50 genes linked to the pathogenesis of familial DCM.
In the pre-NGS era these genes could only explain up to 20% of Dutch DCM cases (25% in familial cases and 8% in sporadic cases) (van Spaendonck et al 2013), while including screening of the titin (TTN) gene improved this to 45-50% (Wilde & Behr, Posafalvi et al, van Spaendonck et al 2014). Our gene-panel-based Next Generation Sequencing (NGS) method, which was recently implemented into routine DNA diagnostics, resulted in the identification of mutations and likely pathogenic variants in up to 55% of DCM index patients (see chapter 4.1). On the other hand, to date there are still ‘only’ 13 ARVC genes known (te Rijdt et al). One of these is the desmosomal plakophilin 2 gene (PKP2), and mutations in this gene are the most frequent cause of familial ARVC, occurring in up to 70% of the patients (van Tintelen et al). Until recently, the yield of attempts to identify genetic mutations in ARVC patient cohorts via traditional sequencing was only ~50% (Cox et al, Quarta et al).

In this study our aim was to identify the disease gene in families currently considered “unsolved” (without a known genetic factor potentially explaining the phenotype). For this purpose, we used exome sequencing (ES), i.e. sequencing of all protein-coding regions of the genome, to identify (potentially novel) disease genes in inherited cardiomyopathy patients/families. Since the inheritance pattern in the families studied was most likely to be autosomal dominant, and ES is well known to result in a huge number of heterozygous variants (potential mutations as well as benign variants), the data analysis was much more challenging than it is for a recessive form of disease (such as in the rare cases of consanguinity). Hence, it was of special importance to narrow down the search for causal variants to chromosomal regions of particular interest. For this purpose, we combined ES with a haplotype sharing test (HST).
HST has previously been shown to be a crucial step for successfully identifying regions carrying causative genes in cardiomyopathy families that are too small for classical linkage analysis (van der Zwaag et al). We applied HST as a filtering method during data analysis, and this helped us to prioritize the long list of genes containing heterozygous variants.
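The filtering role of HST can be sketched as follows: heterozygous exome variants are kept only if they lie inside a chromosomal region shared by the affected relatives. This is a minimal illustration under stated assumptions; the `Region` and `Variant` types, the coordinates and the gene labels are hypothetical, and the HST and filtering software actually used in this study is not shown here.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """A shared haplotype region (hypothetical representation)."""
    chrom: str
    start: int
    end: int


class Variant(NamedTuple):
    """A called exome variant (hypothetical representation)."""
    chrom: str
    pos: int
    gene: str
    genotype: str  # "het" or "hom"


def in_any_region(variant: Variant, regions: List[Region]) -> bool:
    """True if the variant position falls inside one of the shared regions."""
    return any(
        r.chrom == variant.chrom and variant.pos in range(r.start, r.end + 1)
        for r in regions
    )


def filter_by_shared_haplotypes(variants: List[Variant], regions: List[Region]) -> List[Variant]:
    """Keep heterozygous variants located in a shared haplotype region."""
    return [v for v in variants if v.genotype == "het" and in_any_region(v, regions)]


# Toy input; every coordinate and gene label is invented for illustration.
shared_regions = [Region("chr2", 1_000_000, 5_000_000)]
exome_variants = [
    Variant("chr2", 2_500_000, "GENE_A", "het"),   # kept: het and inside the region
    Variant("chr2", 9_000_000, "GENE_B", "het"),   # dropped: outside the region
    Variant("chr7", 2_500_000, "GENE_C", "het"),   # dropped: different chromosome
    Variant("chr2", 3_000_000, "GENE_D", "hom"),   # dropped: not heterozygous
]

candidates = filter_by_shared_haplotypes(exome_variants, shared_regions)
candidate_genes = [v.gene for v in candidates]
print(candidate_genes)
```

Restricting the list to heterozygous calls reflects the autosomal dominant inheritance assumed for these families; for a recessive model the genotype condition would differ.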

Using this combined approach of ES and HST, we succeeded in identifying the disease gene or putative disease gene in six out of our twelve families with autosomal dominant cardiomyopathies. We identified five potential disease genes, of which three were novel, one had occasionally been associated with cardiomyopathies, and one was the known cardiomyopathy gene TTN, which was not routinely screened for at that time due to its enormous size.

MATERIALS AND METHODS

Patients

Families were selected because multiple affected members were available for HST analysis and because, in all cases, previous Sanger sequencing approaches and, in most cases, gene-panel-based NGS had not resulted in the identification of a pathogenic mutation or likely pathogenic variant. Eleven families were recruited from the cardiomyopathy cohort of the University Medical Center Groningen, the Netherlands, and one family was recruited from the University Medical Center Nijmegen, the Netherlands. The DCM patients were diagnosed according to established clinical criteria (Mestroni et al). One family had ARVC fulfilling the task force criteria (TFC) (Marcus et al), and one family had five family members with suspected ARVC, but not yet fulfilling all of those criteria. Our approach included (1) for most families, pre-screening of patients using gene-panel-based NGS targeting 55 known cardiomyopathy genes, and subsequent selection of candidate patients/families (some families were analysed using our gene-panel-based approach during the course of this study); (2) HST of all available affected family members and subsequent data analysis; (3) ES of at least two family members who are as distantly related as possible; (4) identification of probable disease regions and genes; (5) confirmation and co-segregation analysis; (6) mutational screening of probable disease genes in additional patients; and (7) co-expression network analysis to obtain supportive evidence of pathogenicity.

Targeted sequencing

DNA samples isolated from peripheral blood of patients were sequenced for 55 known cardiomyopathy disease genes as formerly described by Sikkema-Raddatz et al and Posafalvi et al (manuscript in preparation, see also chapter 4.1). Data analysis was performed using the MiSeq reporter program (Illumina, San Diego, CA, USA), NextGENe software (v2.2.1, Softgenetics, State College, PA, USA) and Cartagenia software (Cartagenia, Leuven, Belgium), as described (Sikkema-Raddatz et al, chapter 4.1).

Haplotype sharing test

To establish haplotypes and to identify possible shared haplotypes, single nucleotide polymorphism (SNP) genotyping of the DNA samples was performed using the Human 610-quad beadchip® 610K SNP array (Illumina) according to the manufacturer’s protocols. The data was analysed using Microsoft® Office Excel 2007 (Microsoft, Redmond, WA, USA) software as previously described by van der Zwaag et al. The longest shared haplotypes (LSH) identified were used for “ranking” candidate variants in the last step of the exome sequencing data analysis. In this step we assume that the longer a shared region is between affected family members, the higher the chance that it contains the mutual causative mutation. In the cases in which the mutation identified was not localised in the 1st LSH, we checked if those chromosomal regions which ranked better than the one carrying the mutation contained any cardiomyopathy candidate genes, and mutations in those genes were excluded. Additionally, the array data was also used for quality control purposes: we performed a concordance check between the genotyping and exome sequencing datasets to exclude potential sample-swaps during the experimental procedures.
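The ranking idea behind the haplotype sharing test can be illustrated with a short Python sketch. It is a simplified stand-in, not the spreadsheet procedure of van der Zwaag et al: it looks for runs of consecutive SNPs at which all affected relatives could carry one shared allele (no opposite homozygotes) and sorts these runs by genetic length. The genotype coding (0, 1 or 2 copies of the B allele), the function names and the toy data are assumptions made for this example only.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Optional[int]  # 0 = AA, 1 = AB, 2 = BB, None = no call (assumed coding)


def compatible(calls: List[Genotype]) -> bool:
    """True if the called genotypes could all carry one shared allele."""
    called = {g for g in calls if g is not None}  # missing calls are ignored
    # Opposite homozygotes (AA in one relative, BB in another) rule out sharing.
    return not ({0, 2} <= called)


def shared_segments(
    positions_cm: List[float], affected: Dict[str, List[Genotype]]
) -> List[Tuple[float, float, float]]:
    """Return (length_cM, start_cM, end_cM) per compatible run, longest first."""
    segments = []
    start = None  # index of the first SNP of the current run
    for i in range(len(positions_cm)):
        ok = compatible([calls[i] for calls in affected.values()])
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            segments.append((positions_cm[i - 1] - positions_cm[start],
                             positions_cm[start], positions_cm[i - 1]))
            start = None
    if start is not None:  # close a run that extends to the last SNP
        segments.append((positions_cm[-1] - positions_cm[start],
                         positions_cm[start], positions_cm[-1]))
    return sorted(segments, reverse=True)


# Toy data: seven SNPs on one chromosome, three affected relatives (hypothetical).
positions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
family = {
    "II:1": [0, 1, 2, 0, 1, 1, 2],
    "II:3": [0, 2, 0, 1, 1, 0, 2],
    "III:2": [1, 1, 1, 0, 0, 1, 2],
}
ranked = shared_segments(positions, family)
print(ranked)
```

In this toy family the third SNP shows opposite homozygotes, so the chromosome splits into two candidate regions, and the longer one would be ranked as the 1st LSH. A real analysis would additionally have to tolerate genotyping errors and work per chromosome.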

Exome sequencing

Exome sequencing was performed on Illumina HiSeq2000 sequencers in paired-end mode with 100 bp read lengths, following sample preparation and enrichment with the SureSelect All Exon V4 or V5 exome capture kit (Agilent Technologies, Inc., Santa Clara, CA, USA) according to the manufacturer’s protocol. The raw Fastq files were aligned to the human reference genome (hg19, NCBI build 37) using bwa-0.5.9 (Li et al, 2009a), and the SAM/BAM files were manipulated with Samtools-0.1.10 and Picard-1.57 (Li et al, 2009b). The Genome Analysis Toolkit was then used to perform base quality score recalibration, duplicate removal and INDEL realignment (McKenna et al). The output vcf files were annotated by our in-house bioinformatics pipeline and SeattleSeq (http://gvs.gs.washington.edu/).

Data analysis, filtering and prioritization

After quality filtering of the data and checking the concordance of SNP calls from the genotyping and sequencing platforms, we used various, generally accepted, data filters in our analysis. These included filtering data for a minimal read depth, checking the allele balance (and only keeping heterozygous variant calls), and using a population frequency filter. From the remaining list of variants, we focused on those novel or rare coding variants that were shared among affected family members. At this step, we implemented both negative and positive filters. For instance, it is well known that olfactory receptor genes exhibit unusually high genetic variability between individuals (Waszak et al), hence variants in these genes are unlikely to be relevant in DCM (negative filter). On the other hand, we looked carefully at variants in genes which had been previously associated with cardiomyopathy or heart-specific phenotypes. For this purpose, we not only focussed on known cardiomyopathy genes, but also included genes known to show cardiac expression or found to be important for abnormal cardiomyocyte proliferation, or associated with a thin myocardial wall or other cardiac phenotypes in a heart-specific protein network built purely upon functional data (such as mouse models, yeast-two-hybrid screening or other sources of experimental proteomics data, Lage et al, 2010) (positive filter). In addition, we performed thorough data-mining taking into consideration everything known about those genes that remained at the end of the analysis: their known function, their potential cardiac expression, and the existence of any pseudogenes. In parallel with this last step, the remaining variants were ranked according to their localization in one of the shared haplotypes of considerable size within the family and their putative pathogenicity (for details on variant classification, see also chapter 4.1; for a decision tree during our exome sequencing data analysis, see figure 1).
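The filtering and ranking logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the in-house pipeline: only the 20x coverage cut-off is taken from figure 1, whereas the allele-balance window, the population frequency cut-off, the pattern used to recognise olfactory receptor genes, the `Variant` record and all toy variants are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set

MIN_DEPTH = 20                # coverage cut-off shown in figure 1
HET_RANGE = (0.3, 0.7)        # assumed allele-balance window for heterozygous calls
MAX_POP_FREQ = 0.01           # assumed population frequency cut-off
CODING = {"frameshift", "nonsense", "missense", "splice"}
OLFACTORY = re.compile(r"^OR\d+[A-Z]+\d+")  # e.g. OR4F5 (assumed negative filter)


@dataclass(frozen=True)
class Variant:
    gene: str
    depth: int                # read depth at the variant position
    alt_fraction: float       # fraction of reads supporting the variant allele
    pop_freq: float           # population allele frequency (0.0 if novel)
    effect: str               # predicted consequence
    carriers: FrozenSet[str]  # sequenced relatives carrying the variant
    lsh_rank: Optional[int]   # rank of the shared haplotype, None if outside


def prioritise(variants: List[Variant], affected: Set[str],
               unaffected: Set[str]) -> List[Variant]:
    """Apply the filters, then rank survivors by shared-haplotype rank."""
    kept = []
    for v in variants:
        passes = (
            v.depth >= MIN_DEPTH
            and HET_RANGE[0] <= v.alt_fraction <= HET_RANGE[1]
            and v.pop_freq <= MAX_POP_FREQ
            and v.effect in CODING
            and affected <= v.carriers          # shared by all affected relatives
            and not (v.carriers & unaffected)   # absent from unaffected relatives
            and not OLFACTORY.match(v.gene)
        )
        if passes:
            kept.append(v)
    # Variants outside any shared haplotype are placed last.
    return sorted(kept, key=lambda v: (v.lsh_rank is None, v.lsh_rank or 0))


affected, unaffected = {"II:2", "II:5"}, {"II:3"}
both = frozenset(affected)
toy = [
    Variant("GENE_A", 60, 0.48, 0.0, "nonsense", both, 3),
    Variant("OR4F5", 80, 0.50, 0.0, "missense", both, 1),      # olfactory receptor
    Variant("GENE_B", 45, 0.52, 0.0012, "missense", both, 2),
    Variant("GENE_C", 12, 0.50, 0.0, "missense", both, 1),     # coverage too low
    Variant("GENE_D", 50, 0.50, 0.0, "missense", both | {"II:3"}, 1),
    Variant("GENE_E", 70, 0.45, 0.0, "synonymous", both, 1),   # not a coding change
]
ranked = prioritise(toy, affected, unaffected)
print([v.gene for v in ranked])
```

Excluding variants seen in unaffected relatives is only safe when penetrance is high; in families with reduced penetrance this filter would need to be relaxed.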

Mutation screening

Sanger sequencing was performed for validation of the ES results in the DNA of the index patients, for segregation analysis of the identified genetic variants (potential mutations) within families, as well as for screening in larger patient cohorts (where appropriate). Primer sequences are available upon request. In order to screen for additional carriers of the COBL mutation c.998G>A; p.(Arg333Gln) identified in family 5, restriction digestion analysis was performed on COBL PCR products using BseXI according to the protocols provided by the manufacturer (Thermo Scientific, Waltham, MA, USA), and the results were visualized by 1.5% agarose gel electrophoresis.

Figure 1. Decision tree of exome sequencing data analysis. Filtering steps: (1) sequencing data: read depth (exclude nucleotides of coverage <20x) and allele balance (homozygous or heterozygous); (2) frequency: in-house sample pool filter (exclude polymorphisms & artefacts) and population frequency filter (dbSNP, 1000 Genomes); (3) family: create a list of variants shared between affected family members and exclude variants from non-affected relatives; (4) functionality: keep only coding variants (frameshift, nonsense, missense) and splice site variants. Prioritization steps: (1) gene selection (does the gene “fit”?): blacklisted genes to be neglected, special attention given to known disease genes, literature to determine expression, function, pseudogenes and pathways; (2) classification (does the variant “fit”?): physicochemical changes, conservation, affected domain, frequency, predicted pathogenicity; (3) genetic support (does the family background “fit”?): re-check the mode of inheritance, combine with HST or homozygosity mapping results, check co-segregation within the family. Abbreviation: HST, haplotype sharing test.

Co-expression network

An online software tool using publicly available data from expression array depositories has been created (see Results section). Approximately 80,000 human, mouse and rat microarrays archived in the Gene Expression Omnibus were subjected to principal component analysis. The resulting components are hypothesised to reflect transcriptional programmes, which are often well conserved across species and are enriched for known biological phenomena. The software applies statistical methods to the combined, multi-species gene network built upon these components in order to predict biological functions of candidate genes. Furthermore, it is able to visualise the co-expression network of a set of genes of interest in Cytoscape. A detailed description of the method can be found elsewhere (Fehrmann et al, in press, Nat Genet).
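The last step of this idea, connecting genes whose profiles across components are similar, can be illustrated as follows. This is a toy sketch and not the published method of Fehrmann et al: the per-gene component profiles are invented numbers, the use of Pearson correlation and the 0.9 threshold are assumptions, and the principal component analysis itself is not reproduced here.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:  # a flat profile carries no co-expression signal
        return 0.0
    return cov / (sx * sy)


def coexpression_edges(
    profiles: Dict[str, List[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return gene pairs whose component profiles correlate above the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical scores of three genes on four components.
toy_profiles = {
    "GENE_A": [2.0, -1.0, 0.5, 1.5],
    "GENE_B": [1.9, -0.8, 0.4, 1.6],
    "GENE_C": [-1.0, 2.0, -0.5, 0.1],
}
edges = coexpression_edges(toy_profiles)
print([(g1, g2) for g1, g2, _ in edges])
```

An edge list of this kind is the sort of input that a network viewer such as Cytoscape can display.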

RESULTS & DISCUSSION

Our approach of applying ES combined with HST as a filtering step to facilitate more focused variant prioritization during data analysis resulted in the identification of the potentially causative mutation in six out of twelve “unsolved” cardiomyopathy families, affecting five different genes in total. We describe our findings for these six families in detail below, with the pedigrees and haplotype sharing results of the respective families shown in figure 2.

Six families in which the cause of the disease was identified

Family 1, DCM: TTN c.82117C>T; p.(Arg27373*) and Family 2, DCM: TTN c.75607delA; p.(Ser25203Valfs*29), (NM_001256850.1)

We identified a disease-causing variant, c.82117C>T; p.(Arg27373*), in TTN in family 1. The variant was a novel nonsense mutation located in the 3rd longest shared haplotype (2q14.1q31.1), and was shown to co-segregate with the disease phenotype in all six affected family members for whom a DNA sample was available (this family is also described in chapter 4.2). Mutations of TPM1 (candidate gene localised in the 2nd LSH) were excluded by Sanger sequencing. Likewise, in family 2 another novel TTN mutation, c.75607delA; p.(Ser25203Valfs*29), was identified in the two family members who were analysed by ES, and in this case the TTN gene was located in the second longest shared haplotype (2p11.2q33.1). No additional affected family members were available for this analysis. TTN is known to play a role in sarcomere assembly and stabilization (Granzier & Wang) and has been associated with heart failure (Hein et al) and cardiomyopathy (Gerull et al) for decades, but has not been extensively sequenced in patients due to its huge size (TTN encodes the largest known human protein and spans ~0.3 Mbp of the genome). Currently, TTN is suggested to be involved in dilated, restrictive, hypertrophic, and arrhythmogenic right ventricular cardiomyopathies (Gerull et al, Peled et al, Satoh et al, Taylor et al) and was recently reported to carry truncating mutations in up to 25% of familial DCM cases (Herman et al). Furthermore, our group has performed functional analysis of the TTN isoform composition combined with single cardiomyocyte passive force measurements on another truncating TTN variant (p.Lys15664Valfs*13) recently identified in a peripartum cardiomyopathy patient (van Spaendonck-Zwarts et al, 2014). What we showed based on these analyses is that the physiological function of the was affected by the presence of the TTN variant. Due to the technical advances made in parallel with our initial exome sequencing studies, we implemented a gene-panel-based NGS approach to analyse 55 known cardiomyopathy genes in the genome diagnostics laboratory of our department during the course of this project. This method is currently used as a routine screening step, and exome sequencing is applied to gene-panel negative patients only (see also Sikkema-Raddatz et al, chapter 4.1 of this thesis, and figure 1 in chapter 5). A remarkable advantage of this technical improvement is that all 363 exons of the TTN gene have now also been included in our targeted diagnostic approach. This resulted in the identification of TTN truncating mutations in up to 15% of criteria-positive DCM cases (see also chapter 4.1).

Family 3, ARVC: FHL2 c.698_699delinsAA; p.(Gly233Glu) (NM_201555.1)

In this family, we identified a putative mutation in the four and a half LIM domains 2 gene (FHL2), which is known to be much more prominently expressed in the heart than in other organs (Chan et al). Even though FHL2 seems not to be required for the embryonic development of the heart and its full knock out in mice does not cause any cardiac phenotype up to 15 months of age (Chu et al), the stress of sustained β-adrenergic stimulation by isoproterenol treatment led to cardiac hypertrophy in these animals (Kong et al, 2001).

Figure 2. Pedigrees including results of co-segregation analyses (A) and HST (haplotype sharing test) results (B) of the six solved families. Panels: family 1 (A, B); family 2 (A, B); family 3 (A); family 4 (A, B); family 5 (A, B); family 6 (A, B).

Recent studies have also shown that FHL2 is able to prevent pathological growth of the heart via the suppression of calcineurin activation that is induced by stress (Hojayev et al). Also, the overexpression of FHL2 might be the reason why ROCK2 conditional knock out mice were rescued from cardiac hypertrophy (Okamoto et al). Most importantly, a missense variant of FHL2 (p.Gly48Ser) found in a DCM patient has been reported to affect the binding of titin to the encoded protein (Arimura et al). Our putative FHL2 mutation was found in all three affected (and exome sequenced) siblings in this family. Unfortunately, the unaffected parents were not available for carriership analysis, nor were further affected family members available for co-segregation analysis, and HST was not performed in this family. Nonetheless, we classified this mutation as likely pathogenic because it is novel (i.e. not present in any control populations), the affected residue is localised in an evolutionarily highly conserved region of the 4th LIM zinc-binding domain, and the mutation is suggested to be deleterious by most protein effect prediction programs.

Family 4, ARVC-like: FLNC c.6864_6867dup; p.(Val2290Argfs*23) (NM_001458.4)

In a gene-panel negative family with several family members suspected of ARVC, yet none fulfilling the TFC, we identified a potentially causative mutation in the filamin C gene (FLNC), which encodes an actin-crosslinking phosphoprotein (van der Flier & Sonnenberg). FLNC is highly expressed in murine cardiac and skeletal muscle during embryonic development and regeneration (Goetsch et al) and localizes to the Z-disk of striated muscle and to the intercalated disks in the heart (van der Ven et al, 2000). It is expected to have an essential role in the maintenance of the structural integrity of the cell and to protect it against mechanical stress, as was observed in mutant zebrafish that suffered from enlarged hearts (Fujita et al). Moreover, FLNC was also shown to have interactions with delta- and gamma-sarcoglycan, in particular in the muscles (Thompson et al). FLNC mutations are known to cause distal and myofibrillar myopathy and might also affect the heart (reviewed by Selcen & Carpén), but had not thus far been associated with cardiomyopathy (or ARVC in particular), although a FLNC mutation associated with arrhythmia and late onset myofibrillar myopathy has been reported (Avila-Smirnow et al). Moreover, it is known that FLNC, along with other sarcomere genes (MYH7, TNNI3, TNNT2), shows differential splicing in failing heart, DCM and aortic stenosis (Kong et al, 2010). The duplication identified in this family causes a frameshift in exon 41 (which encodes filamin repeat 20 and mediates interaction with XIRP1 according to the Uniprot database, www.uniprot.org), leading to a premature stop 23 codons after the affected codon and, as a consequence, the loss of filamin repeats 21-24 (unless the transcript is subject to nonsense-mediated decay). The mutation was absent from control populations. Moreover, in the 6500 exomes of the ESP project, no truncating mutations were identified except for two truncations in the last but one exon of the gene, which probably do not have a large effect on the protein level and might escape nonsense-mediated decay. All affected family members for whom material was available were shown to carry the mutation. The mutation is located in the 26th longest shared haplotype (7q31.32q35), which is still a shared haplotype of considerable size (29.73 cM), although in this particular case HST was not used for variant prioritization. The shared mutation was identified after exome sequencing and data analysis of four affected siblings (II:2, II:5, II:6, II:8) and filtering using exome data of one unaffected sibling (II:3).
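The remark that truncations near the 3' end of a gene might escape nonsense-mediated decay (NMD) can be made concrete with a commonly cited rule of thumb: a premature stop codon in the last exon, or within roughly 50 nucleotides upstream of the last exon-exon junction, is generally not expected to trigger NMD. The sketch below encodes this heuristic only; the function name, the 50-nucleotide boundary as an exact cut-off and the exon coordinates are assumptions for illustration and do not describe the real FLNC transcript.

```python
from typing import List


def likely_escapes_nmd(ptc_pos: int, exon_ends: List[int],
                       boundary_nt: int = 50) -> bool:
    """Heuristic NMD-escape prediction for a premature termination codon (PTC).

    ptc_pos   -- transcript coordinate of the premature stop codon
    exon_ends -- transcript coordinate of the last base of each exon, in order
    """
    if len(exon_ends) < 2:
        return True  # single-exon transcript: no exon-exon junction at all
    last_junction = exon_ends[-2]  # junction between penultimate and last exon
    return ptc_pos > last_junction - boundary_nt


# Hypothetical four-exon transcript with the final junction at position 900.
toy_exon_ends = [300, 650, 900, 1000]
print(likely_escapes_nmd(400, toy_exon_ends))  # early PTC
print(likely_escapes_nmd(870, toy_exon_ends))  # late in the penultimate exon
print(likely_escapes_nmd(950, toy_exon_ends))  # in the last exon
```

Under this heuristic a stop codon in the penultimate exon escapes decay only when it lies close to the final junction, so such predictions always need experimental confirmation at the RNA or protein level.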

Family 5, DCM: COBL c.998G>A; p.(Arg333Gln) (NM_015198.3)

In this family suffering from an unusually mild and low penetrant form of DCM, we identified a putative missense mutation in the cordon bleu WH2 repeat protein gene (COBL) localized in the second longest shared haplotype (7p14.1q11.22). At the same time, no mutations were found in the FKTN gene located in the 1st LSH, nor in the cardiomyopathy gene-panel. Even though this mutation affects a highly conserved region of the protein, we classified it as a variant of unknown significance (VOUS) due to the contradictory pathogenicity predictions and the fact that the variant was found with an allele frequency of 0.12% in the ESP database and was present in only one individual within the Genome of the Netherlands project. This VOUS co-segregated with the mild, low penetrance, late-onset DCM phenotype in the family. The paediatric patient (V:1 in the pedigree, see figure 2) was not carrying the same VOUS, but her severe symptoms and early onset of disease might indicate an independent cause of disease, perhaps according to a recessive inheritance pattern. The COBL protein is known to be of key importance in cytoskeletal dynamics as a very potent actin nucleator promoting the construction of long, unbranched filaments by elongation at the barbed ends (Ahuja et al). The knock out of the COBL homologue in zebrafish was previously found to cause developmental problems of the nervous system due to the inhibition of motile cilia causing insufficient determination of the left-right asymmetry axis. Interestingly, these zebrafish also exhibited problems in the embryonic development of the heart (the direction of heart looping was disturbed), which is not unexpected given that the heart, just like the nervous system, develops from a ciliated cell layer called Kupffer’s vesicle. Unfortunately, there is no information available about whether there were any microscopic changes in the ultrastructure of muscle filaments in the hearts of these knock out animals (Ravanelli & Klingensmith). Curiously, another actin nucleator that, similar to COBL, promotes the growth of actin filaments at the barbed ends (though with somewhat weaker activity; Ahuja et al) was shown to play a role in sarcomere assembly in cardiomyocytes (Taniguchi et al). Also, a recent study showed significant association between hypertrophic cardiomyopathy and a missense variant of this gene, FHOD3 (formin homology 2 domain containing 3), and demonstrated the importance of its Drosophila homologue in normal systolic contractions of the adult heart in a knock down model (Wooten et al). To date, COBL has not been connected to heart diseases, yet it is known to be highly expressed in the heart according to the GeneCards database (www.genecards.org), and its interaction with actin filaments makes it an interesting candidate disease gene for DCM. Though a recent study investigated the functional consequences of mutating certain amino acids of the first two actin monomer binding WH2 domains of COBL by electron microscopy (Jiao et al), the potential role of the evolutionarily highly conserved KRAP motifs of the protein has yet to be discovered; one such motif is affected by the missense variant identified in our patient. In addition to the identification of this missense VOUS in affected members of family 5, we have screened a further 183 DCM index patients for carriership of this variant, and have identified one more, unrelated, paediatric patient carrying the same putative mutation. Due to the severity of the symptoms in this child, and the very early onset of the disease (at age 1 year), we anticipated that compound heterozygosity could explain her phenotype, yet no additional coding COBL variant was identified by Sanger sequencing. However, gene-panel-based NGS for 55 cardiomyopathy-related genes and the subsequent stringent variant classification in this patient resulted in the identification of a likely pathogenic missense variant c.263A>C; p.(Glu88Ala) of the myosin light chain 2 gene (MYL2, NM_000432.3). The patient was confirmed to carry both mutations and we expected to identify their paternal and maternal origin, respectively. However, co-segregation analysis proved the maternal origin of both the COBL and MYL2 mutations, raising the question of whether further genetic or other external factors may be behind the early manifestation of symptoms in the child.

Family 6, DCM: STARD13 c.3017C>T; p.(Pro1006Leu) (NM_178006.3)

A genetic variant in the START domain containing 13 (STARD13) gene was found in this family (START is the abbreviation of StAR-related lipid transfer; StAR stands for steroidogenic acute regulatory protein). The encoded protein is expected to be responsible for the binding of negatively charged small lipids such as phosphatidylcholine and fatty acids (Thorsell et al). STARD13 has been previously linked to several phenotypes including intracranial aneurysm (Yasuno et al) and insulin resistance related to metabolic syndrome (Nock et al). Combined with the facts that (1) irregular myocardial lipid turnover is a known phenomenon in dilated cardiomyopathy (Feinendegen et al) and (2) perturbed lipid metabolism, myocardial lipid accumulation, and a shift to the use of fatty acids instead of glucose as the predominant source of energy are observed in (and prior to the onset of) cardiomyopathy in diabetic patients and model animals (reviewed by Bayeva et al), these associations suggest that the genetic variant in STARD13 reported in this study could be related to disease in this family. The variant was identified in the 2nd longest shared haplotype of the family (13p13q13.3) and was classified as likely pathogenic due to its novelty, the high evolutionary conservation of the affected amino acid and the respective lipid-binding START domain, and predicted pathogenicity according to all available software. Mutations of known candidate genes were excluded by gene-panel-based sequencing, and all affected family members tested were found to be carriers of the STARD13 mutation. Upon the identification of this novel candidate gene, the medical records of the family were re-checked for possible signs of diabetes mellitus or metabolic syndrome potentially associated with this mutation, but no such symptoms have been observed thus far in the patients (aged 80, 74, 67 and 53 years).

Network of the identified genes

In order to gain insight into possible cardiac functions of the five genes identified in our exome sequenced families (COBL, FHL2, FLNC, STARD13, TTN), their HGNC approved gene symbols were uploaded to the Gene Network website (http://genenetwork.nl:8080/GeneNetwork), which, using data from the Kyoto Encyclopedia of Genes and Genomes pathways, predicted that all of them except STARD13 might be involved in different cardiomyopathies (data not shown). Subsequently, the co-expression network of these five genes was visualized in Cytoscape (http://genenetwork.nl:8080/GeneNetwork cytoscape.html) (figure 3). The resulting network consists of 166 genes, of which 27 were already well known to be involved in the pathogenesis of various types of cardiomyopathy: ACTC1, ACTN2, ANKRD1, CAV3, CRYAB, CSRP3, FHL1, FHL2, LDB3, MYBPC3, MYH6, MYH7, MYL2, MYL3, MYOZ2, MYPN, NEXN, PDLIM3, PLN, SCN5A, TCAP, TNNC1, TNNI3, TNNT2, TPM1, TTN, and VCL. In addition to these well characterised disease genes, the novel neonatal DCM-associated ALPK3 (alpha-kinase 3) was also present in the network. The knock out model for the mouse homologue of this gene, which encodes a nuclear protein kinase, is known to suffer from cardiomyopathy (van Sligtenhorst et al), and we have recently discovered a homozygous mutation of the gene in the DCM-affected child of a consanguineous family (manuscript submitted). More importantly, we applied an unprecedented approach to putting the genes in a functionally meaningful perspective. While searching for this list of co-expressed genes in the database of the Cardiovascular Gene Ontology Annotation Initiative (http://www.ebi.ac.uk/QuickGO/GProteinSet?id=BHF-UCL), we discovered that about 60% of the genes (100/166) have previously been manually annotated with a potential role in the physiological and/or pathological mechanisms of the cardiovascular system (table 1) based on the literature, and this is underscored by previous functional studies, as will be discussed below. For instance, triadin (TRDN) and xin actin-binding repeat containing 1 (XIRP1) are both known to be subject to tissue-specific splicing in the heart via RNA-binding motif protein 20 (RBM20), a known dilated cardiomyopathy protein that is part of the spliceosomal complex in the heart (Guo et al). TRDN is known to stimulate the ryanodine receptor-2 (RYR2), which functions as a sarcoplasmic Ca2+ release channel with the help of calsequestrin (CASQ2, also featured in the co-expression network), and in this way plays a role in excitation-contraction coupling in the heart (Morad et al, Terentyev et al, 2005; Terentyev et al, 2007). Mutations of TRDN have been identified in patients with catecholaminergic polymorphic ventricular tachycardia (Roux-Buisson et al). Furthermore, the XIRP1 gene, which was formerly known as “cardiomyopathy associated 1” (CMYA1), is connected in the expression network to FLNC, and the respective protein was also shown to bind the FLNC protein and participate in the process of sarcomere assembly and actin dynamics in cardiomyocytes (van der Ven et al, 2006).
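The proportions quoted above follow directly from the counts, and a simple over-representation calculation shows how one could ask whether 27 known cardiomyopathy genes among 166 is more than expected by chance. The sketch below is illustrative only: the counts 166, 100 and 27 come from the text, whereas the background of 20,000 genes and the assumption that 55 of them are known cardiomyopathy genes (borrowed from the size of the diagnostic gene panel) are hypothetical, so the resulting probability is not a result of this study.

```python
from math import comb


def hypergeom_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when `draws` genes are sampled without replacement."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, draws)


NETWORK_SIZE = 166        # genes in the co-expression network (from the text)
ANNOTATED = 100           # genes with a cardiovascular annotation (from the text)
KNOWN_CM = 27             # known cardiomyopathy genes in the network (from the text)
BACKGROUND_GENES = 20000  # hypothetical number of genes in the background
BACKGROUND_CM = 55        # hypothetical number of known cardiomyopathy genes

frac_annotated = ANNOTATED / NETWORK_SIZE
frac_known = KNOWN_CM / NETWORK_SIZE
expected_known = NETWORK_SIZE * BACKGROUND_CM / BACKGROUND_GENES
p_value = hypergeom_tail(KNOWN_CM, BACKGROUND_GENES, BACKGROUND_CM, NETWORK_SIZE)

print(f"annotated: {frac_annotated:.1%}, known cardiomyopathy genes: {frac_known:.1%}")
print(f"expected by chance under the assumed background: {expected_known:.2f}")
print(f"hypergeometric tail probability: {p_value:.2e}")
```

Because the co-expression data are dominated by muscle-expressed genes, a muscle-specific background would be a fairer comparison than the whole genome, and would give a less extreme probability.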

Figure 3. Co-expression network built upon the five genes (FHL2, FLNC, COBL, TTN, STARD13) identified in the exome sequenced cardiomyopathy families. Red circles indicate the five genes identified by exome sequencing. Green circles indicate the genes co-expressed with those five identified genes. Only those genes having multiple connections within the network are indicated: the lighter green ones are co-expressed with two of the five candidates, while the darker green ones are co-expressed with three of them.

Curiously, the FLNC frameshift mutation identified in family 4 affects filamin repeat 20, which is known to mediate the binding of XIRP1. Moreover, proline-rich regions of XIRP1 were recently discovered to bind the SH3 domains of nebulin (NEB) and nebulette (NEBL), the myofibrillar proteins involved in the pathomechanism of nemaline myopathy and cardiomyopathy, respectively (Eulitz et al, Lehtokari et al, Purevjav et al). It is rather remarkable that several genes in the network are shown to be important in sarcomere assembly, and this also applies to the proteins encoded by three of the five genes we identified: COBL functions as an actin nucleator (Ahuja et al), TTN is a known structural component of the sarcomere (Horowits et al), and FLNC is also expected to play a role in the assembly process (van der Ven et al, 2000; Bönnemann et al, Fujita et al). Comparably, tropomodulin-1 (TMOD1) and leiomodin (LMOD) were shown to be involved in sarcomere assembly, as they have a role in fine-tuning the length of thin filaments in cardiomyocytes. TMOD1 caps the pointed end of actin filaments in the M-line of sarcomeres, while the competing LMOD2 is an actin nucleation factor that promotes sarcomere assembly in a tropomyosin-dependent way (Chereau et al, Skwarek-Maruszewska et al; Tsukada et al). In line with this, the gene encoding the cardiomyopathy-related tropomyosin (TPM1) also appears in the co-expression network of the five genes. This suggests that these genes are part of a putative common molecular pathway. The fact that the disease genes we identified by exome sequencing are connected within such a functionally meaningful co-expression network, and that it is enriched for known cardiomyopathy genes as well as genes expected to play an essential role in the heart, underscores the usefulness of such databases in interpreting high-throughput genetic findings. Furthermore, our finding that this network is enriched for sarcomeric components is in line with the recent observation in a cohort of 639 DCM patients that 14% of the known pathogenic mutations were related to the sarcomeric structure, making this the most frequently mutated cellular compartment in the disease (Haas et al). One of the limitations of our study is that we have not yet found additional patients with the same mutations, or with other relevant genetic variants in some of the candidate genes. Although our approach of combining HST and ES did help deal with the relatively small size of families, the limited number of affected individuals available in this study might have influenced our findings. Segregation analysis supported putative pathogenicity of the identified variants in most families, yet it is always challenging to perform

AUTOSOMAL DOMINANT CARDIOMYOPATHIES 129 Table 1. List of the five genes identified in the exome sequenced cardiomyopathy families and the genes of the additional 161 co-expressed proteins Genes in black have been previously studied in the context of cardiomyopathies and connected to the disease, or have been included in the Cardiovascular Gene Annotation Ontology Initiative database on the basis of data-mining and functional studies suggesting a putative role in cardiovascular physiology or pathophysiology. Genes in grey have not been previously connected to cardiomyopathies or annotated with a putative role in cardiovascular physiology or pathophysiology.

Columns: gene; name of gene; found by exome sequencing; known in cardiomyopathy; listed in Cardiovascular Gene Ontology Annotation Initiative

ABRA
ACTA1: Actin, alpha skeletal muscle
ACTC1: Actin, alpha cardiac muscle 1
ACTN2: Alpha-actinin-2
ADPRHL1
ADSSL1
ALPK3
AMOTL2
AMPD1: AMP deaminase 1
ANKRD1
ANKRD2: Ankyrin repeat domain-containing protein 2
ANXA3
APOBEC2
ASB2
ASB5
ATP1A2: Sodium/potassium-transporting ATPase subunit alpha-2
ATP2A1: Sarcoplasmic/endoplasmic reticulum calcium ATPase 1
ATP2A2: Sarcoplasmic/endoplasmic reticulum calcium ATPase 2
AXL
BAG2
BDNF: Brain-derived neurotrophic factor
C10orf7
CA3
CACNA1S: Voltage-dependent L-type calcium channel subunit alpha-1S
CACNB1: Voltage-dependent L-type calcium channel subunit beta-1
CACNG1: Voltage-dependent calcium channel gamma-1 subunit
CALD1: Caldesmon
CAND2
CAP2
CASQ1: Calsequestrin-1
CASQ2: Calsequestrin-2
CAV3: Caveolin-3
CFL2
CHRNA1: Acetylcholine receptor subunit alpha
CHRNB1: Acetylcholine receptor subunit beta
CHRND: Acetylcholine receptor subunit delta
CKB: Creatine kinase B-type
CKM: Creatine kinase M-type
CMYA5
CNN1: Calponin-1
COBL: Protein cordon-bleu
CORO6
COX6A2: Cytochrome c oxidase subunit 6A2, mitochondrial
CRYAB: Alpha-crystallin B chain
CSRP3: Cysteine and glycine-rich protein 3
DMPK: Myotonin-protein kinase
DUSP13 [many other DUSPs]
DUSP27 [many other DUSPs]
EEF1A2
ENO3: Beta-enolase
FABP3: Fatty acid-binding protein, heart
FHL1: Four and a half LIM domains protein 1
FHL2: Four and a half LIM domains protein 2
FLNC: Filamin-C
HFE2
HRC: Sarcoplasmic reticulum histidine-rich calcium-binding protein
HSPB3
HSPB6
HSPB7: Heat shock protein beta-7
HSPB8
IP6K3: Inositol hexakisphosphate kinase 3
ITGB1BP2: Integrin beta-1-binding protein 2
ITGB1BP3
KBTBD5
KERA: Keratocan
LDB3
LMOD1: Leiomodin-1
LMOD3
LRRC2
MB: Myoglobin
MLIP
MURC
MUSK: Muscle, skeletal receptor tyrosine-protein kinase
MYBPC1: Myosin-binding protein C, slow-type
MYBPC2: Myosin-binding protein C, fast-type
MYBPC3: Myosin-binding protein C, cardiac-type
MYBPH: Myosin-binding protein H
MYF6: Myogenic factor 6
MYH1: Myosin-1
MYH11: Myosin-11
MYH2: Myosin-2
MYH3: Myosin-3
MYH6: Myosin-6
MYH7: Myosin-7
MYH8: Myosin-8
MYL2: Myosin regulatory light chain 2, ventricular/cardiac muscle isoform
MYL3: Myosin light chain 3
MYL4: Myosin light chain 4
MYL7: Myosin regulatory light chain 2, atrial isoform
MYLK3: Myosin light chain kinase 3
MYLPF: Myosin regulatory light chain 2, skeletal muscle isoform
MYO18B
MYOD1: Myoblast determination protein 1
MYOF: Myoferlin
MYOG: Myogenin
MYOM1: Myomesin-1
MYOM2: Myomesin-2
MYOT: Myotilin
MYOZ1: Myozenin-1
MYOZ2: Myozenin-2
MYPN
NEXN: Nexilin
NPHS2: Podocin
NPPA: Natriuretic peptides A
NPPB: Natriuretic peptides B
NRAP
OBSCN
PACSIN3
PDLIM3
PDLIM5
PFKM
PKIA
PLN: Cardiac phospholamban
POPDC2
PPP1R27 [many other PPP1Rs]
PPP2R3A [many other PPP2Rs]
PRKAA2: 5’-AMP-activated protein kinase catalytic subunit alpha-2
PYGM
RAPSN
RBFOX
RBM24
RP11-59J5.1
RP11-766F14.2
RRAD
RTN2
RYR1: Ryanodine receptor 1
SCN4A: Sodium channel protein type 4 subunit alpha
SCN5A: Sodium channel protein type 5 subunit alpha
SGCA: Alpha-sarcoglycan
SGCG: Gamma-sarcoglycan
SH3BGR
SLN: Sarcolipin
SMPX: Small muscular protein
SMTNL1
SMTNL2
SOX10
SRL
SRPK3: SRSF protein kinase 3
STAC3
STARD13: StAR-related lipid transfer protein 13
SYNPO2
SYNPO2L
TAGLN: Transgelin
TCAP: Telethonin
TECRL
TGFB1I1
TMOD1: Tropomodulin-1
TNFRSF12A
TNNC1: Troponin C, slow skeletal and cardiac muscles
TNNC2: Troponin C, skeletal muscle
TNNI1: Troponin I, slow skeletal muscle
TNNI2: Troponin I, fast skeletal muscle
TNNI3: Troponin I, cardiac muscle
TNNT1: Troponin T, slow skeletal muscle
TNNT2: Troponin T, cardiac muscle
TNNT3: Troponin T, fast skeletal muscle
TPM1: Tropomyosin alpha-1 chain
TPM2: Tropomyosin beta chain
TRDN: Triadin
TRIM63
TTN: Titin
UNC45B: Protein unc-45 homolog B
VCL: Vinculin
VGLL2: Transcription vestigial-like protein 2
XIRP1: Xin actin-binding repeat-containing protein 1
ZFP106

IN TOTAL: 166 genes
KNOWN IN CARDIOMYOPATHY: 28 genes (17.47%)
ANNOTATED WITH PUTATIVE CARDIOVASCULAR ROLE: 100 genes (60.24%)

this for a late-onset disease such as cardiomyopathy because the healthy or affected status of family members is sometimes questionable. This fact, combined with the occasional presence of phenocopies, sometimes makes the accurate phenotyping of relatives difficult and might hamper the accurate analysis of ES and HST data. We cannot fully exclude the possibility that these two issues might have affected the outcome (especially the lack of any mutation being identified) in some of our families. On the other hand, having no genetic cause of disease identified in half of our families might also be due to other, technical problems, such as a lack of sufficient coverage of the respective mutation/disease gene in the available ES data.

It has been anticipated that the revolutionary development of new genetic methods necessitates the application of appropriate bioinformatic tools and functional follow-up to better interpret the respective results (Singleton, 2014). A very appealing recent example of combining exome sequencing with the creation of networks in neurodegeneration was published by Novarino et al. This group identified mutations of novel candidate disease genes in consanguineous families with hereditary spastic paraplegias (HSP), and validated their findings by discovering additional novel genes (and their mutations) selected from the protein interaction network of these novel candidate genes and already known HSP disease genes. Although protein-protein interaction networks have been created for certain cardiac phenotypic traits (but not for cardiomyopathies) (Lage et al, 2010; Lage et al, 2012), this is the first example of combining exome sequencing and the use of a co-expression based network for the interpretation of the role of potential cardiovascular disease genes and pathways in inherited cardiomyopathy.
Admittedly, the network of genes created in this study is based on shared mRNA expression patterns instead of interactions at the protein level. However, in comparison with protein interaction networks, it has the advantage of not creating a bias through exclusion of those genes from the network analysis that have not yet been functionally studied or otherwise shown to interact with heart-specific proteins.

CONCLUSIONS
We have performed haplotype sharing tests and exome sequencing in twelve families suffering from DCM or ARVC with no identified genetic cause of the disorder. This resulted in the identification of potentially causative, heterozygous variants in six of the twelve families sequenced. Their involvement in disease was supported by the fact that the mutations identified co-segregated with the disease; most genes were located in one of the longest shared haplotypes, and the variants were absent or present at very low frequency in control populations. Moreover, the fact that 2/3 of the genes co-expressed with these five genes (TTN, FHL2, FLNC, COBL and STARD13) are annotated with a potential function in the heart, and many are related to the process of sarcomere assembly and reorganization of the cytoskeleton, suggests that a potential common molecular pathway may connect them in cardiomyopathy. Since one of the genes discovered in two families is the well-known DCM gene TTN, it has become part of the routine in our department to first perform targeted sequencing for 55 cardiomyopathy genes (including TTN; chapter 4.1) and then to only perform exome sequencing after excluding mutations in all these known disease genes. In the future, it will be of great importance to investigate the cellular function of the COBL and STARD13 genes, as well as the molecular pathways they play a role in, and the potential involvement of the six identified mutations in the pathomechanism of DCM and ARVC. Moreover, we will try to identify underlying disease genes in the other six families by (1) re-analysing the data, (2) incorporating exome sequence data of additional affected and unaffected family members, (3) analysing the data for putative large deletions/duplications, and/or (4) applying other genomic techniques, such as RNA sequencing or whole genome sequencing.

ACKNOWLEDGEMENTS
The authors would like to thank the families for participating in this study; Ludolf Boven and Elisabetta Lazzarini for technical assistance; members of the Genomics Coordination Centre, UMCG, for assistance in data analysis; Ellen Otten, Gerdien Bosman, Sandra Hermers, Rina Keupink, Jolien Klein-Wassink-Ruiter, Karin Nieuwhof, Wilma van der Roest and Marijke Wasielewski for counselling of families; and Jackie Senior and Kate Mc Intyre for editing this manuscript. Rowida Almomani was supported by the Netherlands Heart Foundation (grant 2010B164) and Anna Pósafalvi was supported by grants from the Jan Kornelis de Cock Foundation.

REFERENCES
Ahuja R, Pinyol R, Reichenbach N et al. Cordon-bleu is an actin nucleation factor and controls neuronal morphology. Cell 2007;131(2):337-50
Arimura T, Hayashi T, Matsumoto Y et al. Structural analysis of four and half LIM protein-2 in dilated cardiomyopathy. Biochem Biophys Res Commun 2007;357(1):162-7
Avila-Smirnow D, Béhin A, Gueneau L et al. A novel missense FLNC mutation causes arrhythmia and late onset myofibrillar myopathy with particular histopathology features. Abstract for poster P2.18 presented at the 15th International Congress of The World Muscle Society, 2010 (http://www.sciencedirect.com/science/article/pii/S0960896610003615)
Basso C, Corrado D, Marcus FI et al. Arrhythmogenic right ventricular cardiomyopathy. Lancet 2009;373:1289-300
Bayeva M, Sawicki KT and Ardehali H. Taking diabetes to heart – deregulation of myocardial lipid metabolism in diabetic cardiomyopathy. J Am Heart Assoc 2013;2:e000433
Bönnemann CG, Thompson TG, van der Ven PF et al. Filamin C accumulation is a strong but nonspecific immunohistochemical marker of core formation in muscle. J Neurol Sci 2003;206:71-8
Chan KK, Tsui SK, Lee SM et al. Molecular cloning and characterization of FHL2, a novel LIM domain protein preferentially expressed in human heart. Gene 1998;210(2):345-50
Chereau D, Boczkowska M, Skwarek-Maruszewska A et al. Leiomodin is an actin filament nucleator in muscle cells. Science 2008;320(5873):239-43
Chu PH, Bardwell WM, Gu Y et al. FHL2 (SLIM3) is not essential for cardiac development and function. Mol Cell Biol 2000;20(20):7460-2
Cox MG, van der Zwaag PA, van der Werf C et al. Arrhythmogenic right ventricular dysplasia/cardiomyopathy: pathogenic desmosome mutations in index-patients predict outcome of family screening: Dutch arrhythmogenic right ventricular dysplasia/cardiomyopathy genotype-phenotype follow-up study. Circulation 2011;123(23):2690-700
Eulitz S, Sauer F, Pelissier MC et al. Identification of Xin-repeat proteins as novel ligands of the SH3 domains of nebulin and nebulette and analysis of their interaction during myofibril formation and remodeling. Mol Biol Cell 2013;24(20):3215-26
Feinendegen LE, Henrich MM, Kuikka JT et al. Myocardial lipid turnover in dilated cardiomyopathy: a dual in vivo tracer approach. J Nucl Cardiol 1995;2:42-52
Fujita M, Mitsuhashi H, Isogai S et al. Filamin C plays an essential role in the maintenance of the structural integrity of cardiac and skeletal muscles, revealed by the medaka mutant zacro. Dev Biol 2012;361:79-89
Gerull B, Gramlich M, Atherton J et al. Mutations of TTN, encoding the giant muscle filament titin, cause familial dilated cardiomyopathy. Nat Genet 2002;30(2):201-4
Goetsch SC, Martin CM, Embree LJ et al. Myogenic progenitor cells express filamin C in developing and regenerating skeletal muscle. Stem Cell Dev 2005;14:181-7
Granzier HL and Wang K. Gel electrophoresis of giant proteins: solubilization and silver-staining of titin and nebulin from single muscle fiber segments. Electrophoresis 1993;14:56-64
Guo W, Schafer S, Greaser ML et al. RBM20, a gene for hereditary cardiomyopathy, regulates titin splicing. Nat Med 2012;18(5):766-73
Haas J, Frese KS, Peil B et al. Atlas of the clinical genetics of human dilated cardiomyopathy. Eur Heart J 2014; pii: ehu301
Hein S, Scholz D, Fujitani N et al. Altered expression of titin and contractile proteins in failing human myocardium. J Mol Cell Cardiol 1994;26(10):1291-306
Herman DS, Lam L, Taylor MR et al. Truncations of titin causing dilated cardiomyopathy. N Engl J Med 2012;366(7):619-28
Hershberger RE, Hedges DJ and Morales A. Dilated cardiomyopathy: the complexity of a diverse genetic architecture. Nat Rev Cardiol 2013;10:531-47
Hojayev B, Rothermel BA, Gillette TG et al. FHL2 binds calcineurin and represses pathological cardiac growth. Mol Cell Biol 2012;32(19):4025-34
Horowits R, Kempner ES, Bisher ME et al. A physiological role for titin and nebulin in skeletal muscle. Nature 1986;323(6084):160-4
Jiao Y, Walker M, Trinick J et al. Mutagenetic and electron microscopy analysis of actin filament severing by Cordon-Bleu, a WH2 domain protein. Cytoskeleton 2014;71(3):170-83
Kong SW, Hu YW, Ho JW et al. Heart failure-associated changes in RNA splicing of sarcomere genes. Circ Cardiovasc Genet 2010;3:138-46
Kong Y, Shelton JM, Rothermel B et al. Cardiac-specific LIM protein FHL2 modifies the hypertrophic response to beta-adrenergic stimulation. Circulation 2001;103(22):2731-8
Lage K, Møllgård K, Greenway S et al. Dissecting spatio-temporal protein networks driving human heart development and related disorders. Mol Syst Biol 2010;6:381
Lage K, Greenway SC, Rosenfeld JA et al. Genetic and environmental risk factors in congenital heart disease functionally converge in protein networks driving heart development. Proc Natl Acad Sci USA 2012;109(35):14035-40
Lehtokari VL, Kiiski K, Sandaradura SA et al. Mutation update: The spectra of nebulin variants and associated myopathies. Hum Mutat 2014; doi:10.1002/humu.22693
Li H & Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009;25:1754-60
Li H, Handsaker B, Wysoker A et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009;25:2078-9
Marcus FI, McKenna WJ, Sherrill D et al. Diagnosis of arrhythmogenic right ventricular cardiomyopathy/dysplasia: proposed modification of the task force criteria. Circulation 2010;121:1533-41
McKenna A, Hanna M, Banks E et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010;20:1297-303
Mestroni L, Maisch B, McKenna WJ et al. Guidelines for the study of familial dilated cardiomyopathy. Eur Heart J 1999;20:93-102
Morad M, Cleemann L, Knollmann BC. Triadin: the new player on excitation-contraction coupling block. Circ Res 2005;96(6):607-9
Nock NL, Wang X, Thompson CL et al. Defining genetic determinants of the Metabolic Syndrome in the Framingham Heart Study using association and structural equation modeling methods. BMC Proc 2009;3(Suppl 7):S50
Novarino G, Fenstermaker AG, Zaki MS et al. Exome sequencing links corticospinal motor neuron disease to common neurodegenerative disorders. Science 2014;343(6170):506-11
Okamoto R, Li Y, Noma K et al. FHL2 prevents cardiac hypertrophy in mice with cardiac-specific deletion of ROCK2. FASEB J 2013;27:1439-49
Peled Y, Gramlich M, Yoskovitz G et al. Titin mutation in familial restrictive cardiomyopathy. Int J Cardiol 2014;171(1):24-30
Posafalvi A, Herkert JC, Sinke RJ et al. Clinical utility gene card for: dilated cardiomyopathy (CMD). Eur J Hum Genet 2013;21. doi: 10.1038/ejhg.2012.276
Purevjav E, Varela J, Morgado M et al. Nebulette mutations are associated with dilated cardiomyopathy and endocardial fibroelastosis. J Am Coll Cardiol 2010;56(18):1493-502
Quarta G, Muir A, Pantazis A et al. Familial evaluation in arrhythmogenic right ventricular cardiomyopathy: impact of genetics and revised task force criteria. Circulation 2011;123:2701-19
Ravanelli AM & Klingensmith J. The actin nucleator Cordon-bleu is required for development of motile cilia in zebrafish. Dev Biol 2011;350(1):101-11
Roux-Buisson N, Cacheux M, Fourest-Lieuvin A et al. Absence of triadin, a protein of the calcium release complex, is responsible for cardiac arrhythmia with sudden death in human. Hum Mol Genet 2012;21(12):2759-67
Satoh M, Takahashi M, Sakamoto T et al. Structural analysis of the titin gene in hypertrophic cardiomyopathy: identification of a novel disease gene. Biochem Biophys Res Commun 1999;262(2):411-7
Selcen D and Carpén O. The Z-disk diseases. Adv Exp Med Biol 2008;642:116-30
Sikkema-Raddatz B, Johansson LF, de Boer EN et al. Targeted next-generation sequencing can replace Sanger sequencing in clinical diagnostics. Hum Mutat 2013;34:1035-42
Singleton AB. Genetics. A unified process for neurological disease. Science 2014;343(6170):497-8
Skwarek-Maruszewska A, Boczkowska M, Zajac AL et al. Different localizations and cellular behaviors of leiomodin and tropomodulin in mature cardiomyocyte sarcomeres. Mol Biol Cell 2010;21(19):3352-61
Taniguchi K, Takeya R, Suetsugu S et al. Mammalian formin fhod3 regulates actin assembly and sarcomere organization in striated muscles. J Biol Chem 2009;284:29873-881
Taylor M, Graw S, Sinagra G et al. Genetic variation in titin in arrhythmogenic right ventricular cardiomyopathy-overlap syndromes. Circulation 2011;124(8):876-85
te Rijdt WP, Jongbloed JD, de Boer RA et al. Clinical utility gene card for: arrhythmogenic right ventricular cardiomyopathy (ARVC). Eur J Hum Genet 2014;22(2). doi: 10.1038/ejhg.2013.124
Terentyev D, Cala SE, Houle TD et al. Triadin overexpression stimulates excitation-contraction coupling and increases predisposition to cellular arrhythmia in cardiac myocytes. Circ Res 2005;96(6):651-8
Terentyev D, Viatchenko-Karpinski S, Vedamoorthyrao S et al. Protein protein interactions between triadin and calsequestrin are involved in modulation of sarcoplasmic reticulum calcium release in cardiac myocytes. J Physiol 2007;583(Pt 1):71-80
Thompson TG, Chan YM, Hack AA et al. Filamin 2 (FLN2): A muscle-specific sarcoglycan interacting protein. J Cell Biol 2000;148:115-26
Thorsell AG, Lee WH, Persson C et al. Comparative structural analysis of lipid binding START domains. PLoS One 2011;6(6):e19521
Tsukada T, Pappas CT, Moroz N et al. Leiomodin-2 is an antagonist of tropomodulin-1 at the pointed end of the thin filaments in cardiac muscle. J Cell Sci 2010;123(Pt 18):3136-45
van der Flier A & Sonnenberg A. Structural and functional aspects of filamins. Biochim Biophys Acta 2001;1538:99-117
van der Ven PF, Obermann WM, Lemke B et al. Characterization of muscle filamin isoforms suggests a possible role of gamma-filamin/ABP-L in sarcomeric Z-disc formation. Cell Motil Cytoskeleton 2000;45:149-62
van der Ven PF, Ehler E, Vakeel P et al. Unusual splicing events result in distinct Xin isoforms that associate differentially with filamin c and Mena/VASP. Exp Cell Res 2006;312(11):2154-67
van der Zwaag PA, van Tintelen JP, Gerbens F et al. Haplotype sharing test maps genes for familial cardiomyopathies. Clin Genet 2011;79:459-67
van Sligtenhorst I, Ding ZM, Shi ZZ et al. Cardiomyopathy in α-kinase 3 (ALPK3)-deficient mice. Vet Pathol 2012;49:131-41
van Spaendonck-Zwarts KY, van Rijsingen IA, van den Berg MP et al. Genetic analysis in 418 index patients with idiopathic dilated cardiomyopathy: overview of 10 years’ experience. Eur J Heart Fail 2013;15:628-36
van Spaendonck-Zwarts KY, Posafalvi A, van den Berg MP et al. Titin gene mutations are common in families with both peripartum cardiomyopathy and dilated cardiomyopathy. Eur Heart J 2014; doi: 10.1093/eurheartj/ehu050
van Tintelen JP, Entius MM, Bhuiyan ZA et al. Plakophilin-2 mutations are the major determinant of familial arrhythmogenic right ventricular dysplasia/cardiomyopathy. Circulation 2006;113:1650-8
Waszak SM, Hasin Y, Zichner T et al. Systematic inference of copy-number genotypes from personal genome sequencing data reveals extensive olfactory receptor gene content diversity. PLoS Comput Biol 2010;6(11):e1000988
Wilde AA & Behr ER. Genetic testing for inherited cardiac disease. Nat Rev Cardiol 2013;10:571-83
Wooten EC, Hebl VB, Wolf MJ et al. Formin homology 2 domain containing 3 variants associated with hypertrophic cardiomyopathy. Circ Cardiovasc Genet 2013;6:10-18
Yasuno K, Bilguvar K, Bijlenga P et al. Genome-wide association study of intracranial aneurysm identifies three new risk loci. Nat Genet 2010;42:420-5


Chapter 3.2

Homozygous SOD2 mutation as a cause of lethal neonatal dilated cardiomyopathy

Rowida Almomani1,*, Anna Posafalvi1,*, Johanna C Herkert1, Jan G Post2, Paul A van der Zwaag1, Peter Willems3, Cindy Weidijk1, Peter GJ Nikkels4, Richard J Rodenburg5, Richard J Sinke1, J Peter van Tintelen1, Jan DH Jongbloed1

*These authors contributed equally to these studies.

Manuscript in preparation

ABSTRACT
Although cases are rare, neonatal and paediatric dilated cardiomyopathy (DCM) is a severe and often lethal disease, in which a genetic factor plays an important role in disease development. Identifying this genetic component is of major importance for parents as it enables prenatal diagnostics to be performed in their future pregnancies. Here, we report the results of homozygosity mapping followed by exome sequencing in a DCM-affected neonate in whom autosomal recessive inheritance was anticipated. This approach revealed a potentially pathogenic, homozygous missense mutation, c.542G>T, p.(Gly181Val), in the gene encoding Superoxide dismutase 2 (SOD2). SOD2 is a mitochondrial matrix protein that converts the reactive oxygen species (ROS) superoxide anion (O2−•) into H2O2, and is therefore important for preventing cellular damage due to oxidative stress. We measured the oxidation of hydroethidine and detected a significant increase in O2−• levels in the fibroblasts of the patient compared with controls. This indicates that the mutation affects the catalytic activity of SOD2, which could lead to a drastic increase in damaging oxygen radical levels in the neonatal heart and result in rapidly developing heart failure and death. In conclusion, we have identified a novel mitochondrial gene involved in severe neonatal cardiomyopathy, thus expanding the wide range of genetic factors involved in paediatric cardiomyopathies.

INTRODUCTION
Dilated cardiomyopathy (DCM) is characterized by left ventricular enlargement and systolic dysfunction, which can lead to heart failure and sudden cardiac death (Fatkin et al). It is the most common type of cardiomyopathy and the major reason for heart transplantations in children. The incidence of DCM in children is estimated to be 0.57/100,000 per year, and is even higher in children below the age of one year (8.34/100,000) (Towbin et al). Approximately 25-50% of DCM cases are familial, and mutations in more than 50 genes have been reported to be associated with adult-onset familial DCM, some of which are observed in paediatric DCM as well (Somsen et al, Dellefave & McNally, Posafalvi et al). DCM-associated genes encode diverse groups of proteins including cytoskeletal, sarcomeric, ion transport, nuclear membrane and mitochondrial proteins (Somsen et al, Dellefave & McNally, Posafalvi et al).

In contrast to adult DCM, knowledge about the underlying genetic causes of paediatric cases is still limited. In familial cases, mutations are regularly found in the known DCM genes (Rampersaud et al). However, these explain neither the majority of paediatric cases in which rare mutations in autosomal recessively inherited genes underlie disease, nor the cases of children whose DCM is part of a syndromic or metabolic disease (Kindel et al). Therefore, Burns et al recently concluded that approaches using gene-panel based applications targeting ‘adult’ DCM disease genes are less appropriate for the severe infantile forms of the disease, and they suggested that gene discovery is likely to proceed more rapidly when exome sequencing (ES) or genome sequencing is applied. Successful application of ES to identify the causal mutations in paediatric DCM has recently been demonstrated (Theis et al 2011, 2014; Louw et al). Here we have used homozygosity mapping followed by ES to identify the genetic cause of lethal DCM in a three-day-old Dutch girl. The homozygous mutation, c.542G>T, p.(Gly181Val), that we found in the SOD2 gene (NM_000636.2) most likely affects the catalytic activity of the protein, leading to excess oxygen radical levels with strongly damaging effects in the neonatal heart.

METHODS
Case report
The female patient was born at 39+2 weeks gestation after a caesarean delivery due to breech presentation and meconium staining of the amniotic fluid. The pregnancy was complicated by maternal nephrotic syndrome at 19 weeks gestation and treated with prednisone. Her Apgar scores were 2-3 and 9, her birth weight was 2240 g (

Homozygosity mapping
Genome-wide genotyping with the HumanCytoSNP-12 BeadChip® 300K SNP array (Illumina, San Diego, CA, USA) was performed according to the manufacturer’s protocols. Data from the arrays were converted to genotypes using the GenomeStudio® data analysis software (Illumina). The genotype data were subjected to homozygosity mapping using Microsoft® Office Excel 2010 (Version 14.0; Microsoft, Redmond, WA, USA) software by searching for homozygous regions in the patient’s DNA, allowing for a 1% genotyping error margin. The size of the homozygous regions was calculated in megabases (Mb) and in centiMorgans (cM), based on the deCODE genetic map (Kong et al).
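The search for homozygous regions described above can be illustrated with a short script. This is only a minimal sketch, not the spreadsheet procedure used in the study: the function names `is_homozygous` and `homozygous_runs`, the `min_snps` cut-off, the two-letter genotype strings and the demo data are all assumptions made for illustration. It scans position-sorted SNP calls and extends each run of homozygous calls for as long as the fraction of heterozygous calls stays within the allowed error margin.

```python
def is_homozygous(genotype: str) -> bool:
    """True for calls such as 'AA' or 'BB'; False for 'AB' (hypothetical encoding)."""
    return len(set(genotype)) == 1


def homozygous_runs(calls, max_het_fraction=0.01, min_snps=3):
    """Find runs of homozygosity in position-sorted (position_bp, genotype) calls.

    A run starts at a homozygous SNP and is extended while the fraction of
    heterozygous calls inside it stays at or below max_het_fraction.
    Returns (start_bp, end_bp, n_snps) tuples; n_snps counts every SNP in the
    run, including any tolerated heterozygous calls.
    """
    runs = []
    i, n = 0, len(calls)
    while i < n:
        if not is_homozygous(calls[i][1]):
            i += 1
            continue
        hets = 0
        last_hom = i  # runs always end on a homozygous SNP
        for j in range(i, n):
            if is_homozygous(calls[j][1]):
                last_hom = j
            else:
                hets += 1
                if hets / (j - i + 1) > max_het_fraction:
                    break  # too many heterozygous calls: close the run
        n_snps = last_hom - i + 1
        if n_snps >= min_snps:
            runs.append((calls[i][0], calls[last_hom][0], n_snps))
        i = last_hom + 1
    return runs


# Hypothetical toy data: five homozygous SNPs, one heterozygous SNP, three homozygous SNPs.
demo = (
    [(pos, "AA") for pos in range(1, 6)]
    + [(6, "AB")]
    + [(pos, "BB") for pos in range(7, 10)]
)

strict_runs = homozygous_runs(demo, max_het_fraction=0.0)
tolerant_runs = homozygous_runs(demo, max_het_fraction=0.2)
print(strict_runs)
print(tolerant_runs)
```

With no tolerance the single heterozygous call splits the toy data into two runs, whereas a generous tolerance merges them into one. At the 1% margin mentioned in the text, a heterozygous call would only be tolerated inside a run of at least 100 SNPs.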

Exome sequencing
ES on the patient’s DNA was performed using the SureSelect 50Mb exome capture kit (Agilent, Santa Clara, CA, USA) following the manufacturer’s protocol. The enriched fragments captured were sequenced using the Illumina HiSeq platform in paired-end mode, with a read length of 100 bp, following the manufacturer’s protocol. The raw Fastq files were aligned to the human reference genome (hg19, NCBI build 37) using bwa-0.5.9 (Li et al, 2009a); SAM/BAM files were manipulated with Samtools-0.1.10 and Picard-1.57 (Li et al, 2009b). The Genome Analysis Toolkit (GATK) was then used to perform base quality score recalibration, duplicate removal and INDEL realignment (McKenna et al). The output vcf files were annotated by our in-house bioinformatics pipeline and Seattleseq (http://gvs.gs.washington.edu/).

Subsequent mutation analysis
Sanger sequencing was used to confirm the presence/absence of the SOD2 mutation in the patient and her family members. In addition, screening of all exons and exon/intron junctions of the SOD2 gene was performed in other patients. PCR was performed using AmpliTaq Gold PCR Master Mix (Invitrogen Life Science Technologies, Carlsbad, CA, USA) following the official protocol, and the resulting fragments were sequenced on Applied Biosystems’ 96-capillary 3730XL system (Carlsbad, CA, USA).

RNA extraction and Reverse Transcriptase-PCR (RT-PCR) product analysis
RNA was isolated from cultured fibroblasts from the patient. Cells were cultured in standard medium for human fibroblasts (Dulbecco’s modified Eagle’s medium with 10% FBS, 1% penicillin/streptomycin, 1% glucose, 1% glutamax) with 5% CO2 at 37°C. RNA was extracted with the RNeasy Mini Kit (QIAGEN, Venlo, the Netherlands) following the manufacturer’s protocol. cDNA was synthesized from 500 ng of total RNA by RevertAid RNaseH- M-MuLV reverse transcriptase in a total volume of 20 μl according to the protocol provided by the supplier (MBI-Fermentas, St Leon-Rot, Germany). To investigate whether the c.542G>T mutation could have an effect on mRNA splicing, we performed RT-PCR with primers specific for SOD2, designed to amplify the exon that was expected to be affected by the mutation together with its flanking sequences (primers are available upon request). Target regions were amplified by PCR, and the PCR products were examined on a 2% agarose gel and analysed by Sanger sequencing. To test for effects of nonsense-mediated decay, fibroblasts were treated with cycloheximide for 4.5 hr, followed by RNA analysis using the same procedures as those for RNA from untreated cells.

Measurement of superoxide substrate levels
Fibroblasts, cultured to 70% confluence, were incubated in HEPES-Tris medium containing 10 μM hydroethidine (HEt) for 10 min at 37°C. Within the cell, HEt reacts with O2−• to form the fluorescent and positively charged product ethidium (Et) or oxyethidium. The reaction was stopped by thoroughly washing the cells with PBS to remove excess HEt. For quantitative analysis of Et emission signals, coverslips were mounted in an incubation chamber placed on the stage of an inverted microscope (Axiovert 200 M; Carl Zeiss, Jena, Germany) equipped with a Zeiss ×40/1.3 NA fluor lens objective. Et was excited at 490 nm using a monochromator (Polychrome IV; TILL Photonics, Gräfelfing, Germany). Fluorescence emission was directed using a 525DRLP dichroic mirror (Omega Optical, Brattleboro, VT) through a 565ALP emission filter (Omega Optical) onto a CoolSNAP HQ monochrome charge-coupled device camera (Roper Scientific, Vianen, the Netherlands). The image-capturing time was 100 ms. Routinely, 10 fields of view per coverslip were analysed.
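One plausible way to summarise such per-field measurements is to average the ethidium signal over the fields of view of each coverslip and express the patient signal relative to the control. The sketch below assumes exactly that. The text does not state how the fields were combined or which statistical test was used, so the function name `relative_signal` and every intensity value here are hypothetical placeholders, not data from the study.

```python
from statistics import mean, stdev

# Hypothetical mean Et fluorescence per field of view (arbitrary units),
# 10 fields per coverslip as in the protocol. These are NOT measured values.
control_fields = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
patient_fields = [150, 148, 152, 151, 149, 150, 153, 147, 150, 150]


def relative_signal(sample_fields, reference_fields):
    """Mean sample signal expressed as a percentage of the mean reference signal."""
    return mean(sample_fields) / mean(reference_fields) * 100


patient_vs_control = relative_signal(patient_fields, control_fields)
print(f"control: {mean(control_fields):.1f} +/- {stdev(control_fields):.1f} (SD)")
print(f"patient: {mean(patient_fields):.1f} +/- {stdev(patient_fields):.1f} (SD)")
print(f"patient signal = {patient_vs_control:.0f}% of control")
```

Normalising to the control coverslip imaged in the same session removes day-to-day differences in lamp intensity and dye loading, which is why a relative value is often easier to compare across experiments than raw intensities.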

SOD2 protein’s 3D structure
As the 3D structure of the SOD2 protein is known, HOPE software was applied to predict the potential effect of the p.(Gly181Val) missense mutation on the 3D structure of the protein (Venselaar et al). Additionally, the Uniprot protein database (www.uniprot.org) was used to search for known functional features within the mitochondrial Superoxide dismutase [Mn] protein (accession number: P04179) in the region affected by the genetic variation.

RESULTS
Case report
Genealogical analysis found a distant relationship between the parents 6 to 8 generations previously, suggesting an autosomal recessive inheritance (figure 1). Array-CGH showed no pathogenic copy number variations. Diagnostic Sanger sequencing results of mitochondrial DNA, isolated from fibroblasts, and of the POLG, MYL2, MYH7, LMNA, DES, SUCLA2 and RYR2 genes were normal. Respiratory chain complexes were found to function normally. Echocardiography revealed no abnormalities in the mother or father (aged 27 and 29, respectively) or in the patient’s younger brother (cardiologically evaluated aged 1 week).

Figure 1. Pedigree of a Dutch family with a child with severe, lethal DCM, in whom autosomal recessive inheritance was expected due to the pedigree composition. The patient is marked with a black symbol.

Figure 2. Homozygosity mapping results show the second longest homozygous region (the longest autosomal homozygous region) on chromosome 6, where the SOD2 gene is located.

Homozygosity mapping
Homozygosity mapping in the patient (figure 1; X:1) revealed that the longest homozygous region was on the X chromosome (figure 2). The longest autosomal region of homozygosity was located on chromosome 6, between rs378512 and rs9458499 (159,949,340-162,713,427 bp; UCSC Genome Browser, build hg19), spanning 268 SNPs and 4.26 cM. This homozygous region contains 26 genes, including the SOD2 gene.
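The physical size of the chromosome 6 region follows directly from the reported hg19 coordinates. A minimal worked calculation is given below; the helper name `region_size_mb` is an assumption introduced for illustration, while the coordinates are taken from the text.

```python
# Boundaries of the chromosome 6 region as reported in the text (hg19 coordinates).
START_BP = 159_949_340
END_BP = 162_713_427


def region_size_mb(start_bp: int, end_bp: int) -> float:
    """Physical length of a region in megabases (1 Mb = 1,000,000 bp)."""
    return (end_bp - start_bp) / 1_000_000


size_mb = region_size_mb(START_BP, END_BP)
print(f"{size_mb:.2f} Mb")  # prints "2.76 Mb"
```

The region therefore spans roughly 2.76 Mb, alongside the genetic length of 4.26 cM reported above.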

Exome sequencing
ES was performed to target all exons and exon/intron junction sequences of the known genes in the human genome to identify potentially pathogenic, disease-causing mutations. Using the sequence analysis pipeline from GATK, we identified 41,621 different variants in the patient’s exome data. Data filtering was performed to exclude all known variants with a high frequency (> 1%) in the dbSNP129, the 1000 Genomes Project, GoNL and ESP6500 databases and in our in-house database. We then selected for coding variants among the remaining 325 variants and subsequently for nonsense, missense, splice site, and frameshift variants in concordance with autosomal recessive inheritance (i.e. homozygous or compound heterozygous variants in one gene). This resulted in the identification of a homozygous mutation, c.542G>T; p.(Gly181Val) (NM_000636.2), in the SOD2 gene located in the second longest homozygous region, on chromosome 6 (figure 2). This mutation was absent from known control populations (ESP6500, GoNL, and 1000 Genomes). Our ES data were also analysed for potential causal mutations in known cardiomyopathy genes, relevant metabolic and syndromic genes, and nuclear encoded mitochondrial genes, but no putative pathogenic mutations were identified.
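The filtering logic described above (rare, protein-altering, and compatible with recessive inheritance) can be sketched in a few lines. This is an illustrative sketch only and not the in-house pipeline: the function name `recessive_candidates`, the dictionary layout, the effect labels and all demo variants other than the SOD2 entry are hypothetical. Two heterozygous variants in one gene are treated here as a possible compound heterozygote, although real data would need phasing or parental genotypes to confirm that they lie on different alleles.

```python
from collections import defaultdict

# Effect classes retained in the text: nonsense, missense, splice site, frameshift.
DAMAGING_EFFECTS = {"nonsense", "missense", "splice_site", "frameshift"}


def recessive_candidates(variants, max_freq=0.01):
    """Keep rare, protein-altering variants that fit autosomal recessive inheritance.

    variants: list of dicts with keys 'gene', 'effect', 'zygosity' ('hom' or 'het')
    and 'freqs' (database name -> allele frequency; empty if absent everywhere).
    Returns {gene: [variants]} for genes with at least one homozygous variant or
    at least two heterozygous variants (possible compound heterozygosity).
    """
    by_gene = defaultdict(list)
    for v in variants:
        is_rare = all(freq <= max_freq for freq in v["freqs"].values())
        if is_rare and v["effect"] in DAMAGING_EFFECTS:
            by_gene[v["gene"]].append(v)

    candidates = {}
    for gene, hits in by_gene.items():
        n_hom = sum(v["zygosity"] == "hom" for v in hits)
        n_het = sum(v["zygosity"] == "het" for v in hits)
        if n_hom >= 1 or n_het >= 2:
            candidates[gene] = hits
    return candidates


# Demo input: the SOD2 entry mirrors the text; GENE_A..GENE_D are made-up placeholders.
demo_variants = [
    {"gene": "SOD2", "effect": "missense", "zygosity": "hom", "freqs": {}},
    {"gene": "GENE_A", "effect": "missense", "zygosity": "het", "freqs": {"1000G": 0.001}},
    {"gene": "GENE_B", "effect": "missense", "zygosity": "hom", "freqs": {"GoNL": 0.20}},
    {"gene": "GENE_C", "effect": "nonsense", "zygosity": "het", "freqs": {}},
    {"gene": "GENE_C", "effect": "frameshift", "zygosity": "het", "freqs": {"ESP6500": 0.002}},
    {"gene": "GENE_D", "effect": "synonymous", "zygosity": "hom", "freqs": {}},
]

candidates = recessive_candidates(demo_variants)
print(sorted(candidates))
```

In the toy input, the single heterozygous hit (GENE_A), the common variant (GENE_B) and the synonymous change (GENE_D) are discarded, leaving the homozygous SOD2 variant and the two-hit placeholder gene.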

Sanger sequencing, gene-panel-based resequencing and RT-PCR product analysis
Using Sanger sequencing, the homozygous mutation c.542G>T; p.(Gly181Val) was confirmed in the affected child (figure 3) and in heterozygous form in her parents, but it was absent in her brother (data not shown). Furthermore, Sanger sequencing of the SOD2 gene in an additional DCM cohort of 27 different paediatric patients and 161 adult patients, and gene-panel-based resequencing of the gene in more than 1,000 adult cardiomyopathy patients revealed no pathogenic SOD2 mutations. RT-PCR product analysis of RNA isolated from patient fibroblasts cultured both with and without cycloheximide showed only a transcript of wild type size, indicating that this mutation has no effect on splicing.
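The relationship between the cDNA change c.542G>T and the protein change p.(Gly181Val) can be checked with the standard genetic code. The sketch below assumes only that coding position 1 is the first base of the start codon; the third base of the reference codon is not given in the text, so all four glycine codons (GGN) are tried. The helper name `codon_position` is introduced here for illustration.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_position(cdna_pos: int):
    """Map a 1-based coding position to (codon number, 0-based offset within the codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3


codon_number, offset = codon_position(542)
print(codon_number, offset)  # prints "181 1": second base of codon 181

# Apply G>T at that offset to every glycine codon (GGT, GGC, GGA, GGG).
outcomes = set()
for codon, amino_acid in CODON_TABLE.items():
    if amino_acid == "G" and codon[offset] == "G":
        mutated = codon[:offset] + "T" + codon[offset + 1:]
        outcomes.add(CODON_TABLE[mutated])
print(outcomes)  # prints "{'V'}": every GGN codon becomes GTN, which encodes valine
```

Whatever the third base of codon 181 is, a G>T change at its second position turns glycine into valine, in agreement with the annotated consequence.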

Superoxide (O2−•) substrate levels
For superoxide substrate level measurements, hydroethidine was used as an intracellular probe to measure the levels of superoxide (O2−•) in the patient fibroblasts. Notably, hydroethidine is not sensitive to H2O2. Hydroethidine is a cell-permeable compound that interacts with O2−• to form ethidium or oxyethidium. The oxidation levels of hydroethidine measured in our in vitro assay indicated a significant increase of superoxide (O2−•) levels in the fibroblasts of the patient, comparable to the order of magnitude seen in complex I deficient fibroblasts (figure 4). What we could not directly determine from these data was whether the significant increase of O2−• levels resulted from a complex I deficiency or from abnormal SOD2 enzyme activity. However, mitochondrial respiratory chain enzyme activities (complexes I, II, III, IV, and V) were also measured and revealed no differences in activity, suggesting abnormal SOD2 activity as the likely mechanism.

Figure 3. Sanger sequencing confirmed the presence of the homozygous SOD2 variant c.542G>T, p.(Gly181Val) in the affected patient (bottom) compared to control (top) and in heterozygous form in her parents (not shown).

Figure 4. The hydroethidine oxidation analysis shows a significant increase of ROS (O2•−) level as measured in both the nuclear and mitochondrial fractions in the fibroblasts of the patient compared to control fibroblasts.

SOD2 3D structure: predicting the effect of the p.(Gly181Val) mutation

Using the HOPE software we retrieved the 3D structure information of the SOD2 protein through the WHAT IF Web services, the Uniprot database and a series of DAS-servers, in order to predict the effect of the p.(Gly181Val) mutation on the protein structure. The Gly181 residue is part of a manganese/iron superoxide dismutase domain, which is important for the main activity of the protein. The domain has a function in superoxide dismutase activity and metal ion binding. According to the Uniprot database, four important amino acid residues are involved in the formation of the Mn-binding pocket that binds the manganese co-factor of the enzyme (accession number: P04179). These residues are His50, His98, Asp183 and His187. Interestingly, the aspartic acid residue of key importance (Asp183) is only two amino acids away from the Gly181 residue that was mutated in our patient. The increased size of the mutant residue is predicted to disturb the core structure of the manganese/iron superoxide dismutase domain and, as a consequence, the catalytic activity of the enzyme (figure 5).
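The relation between the cDNA position, the affected codon and the Mn-binding residues listed above can be verified with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the residue numbers are those quoted in the text, and the codon sets are the standard glycine (GGN) and valine (GTN) codons.

```python
# Mn-binding residues of SOD2 as listed in the text (UniProt P04179 numbering).
MN_BINDING_RESIDUES = (50, 98, 183, 187)

# Standard genetic code, restricted to the two codon families needed here.
GLY_CODONS = {"GGA", "GGC", "GGG", "GGT"}
VAL_CODONS = {"GTA", "GTC", "GTG", "GTT"}


def codon_of(cdna_pos):
    """Map a 1-based coding position to (codon number, position within codon)."""
    codon_number = (cdna_pos - 1) // 3 + 1
    position_in_codon = (cdna_pos - 1) % 3 + 1
    return codon_number, position_in_codon


def nearest_binding_residue(residue):
    """Return the Mn-binding residue closest in sequence to the given residue."""
    return min(MN_BINDING_RESIDUES, key=lambda r: abs(r - residue))


codon, position = codon_of(542)
nearest = nearest_binding_residue(codon)
# A G>T change at the second codon position turns every GGN codon into GTN.
second_base_swap = {c[0] + "T" + c[2] for c in GLY_CODONS}

print(codon, position)                 # 181 2
print(nearest, abs(nearest - codon))   # 183 2
print(second_base_swap == VAL_CODONS)  # True
```

This confirms that c.542 falls on the second base of codon 181, that a G>T change there converts any glycine codon into a valine codon, and that the nearest Mn-binding residue in the linear sequence is Asp183. Distance in sequence is only a proxy; the structural argument in the text rests on the 3D model.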

DISCUSSION

Using a combination of homozygosity mapping and ES in the patient, we detected a novel homozygous missense mutation, c.542G>T; p.(Gly181Val), in an evolutionarily highly conserved domain of the SOD2 gene located in the second longest homozygous region on chromosome 6. To our knowledge, this is the first report of a major role for mutated SOD2 in human disease. Two facts support the potential pathogenicity of this mutation. The first is that the mutation is located in the functionally important C-terminal manganese/iron superoxide dismutase region of the respective protein. The second is that the drastic differences in size and physicochemical characteristics between the wild-type glycine (which is the smallest of all residues and is known to often provide flexibility to protein structures) and the mutant valine residue are predicted to disturb the core structure in this crucially important domain. Furthermore, according to the Uniprot database, the mutation is localized only two amino acids away from one of the four Histidine/Aspartic acid residues that are involved in the binding of the manganese co-factor.

Figure 5. 3D structure of the SOD2 protein: (A) Overview of the SOD2 protein in ribbon presentation. (B) Magnification of the part of the manganese/iron superoxide dismutase domain where the mutated residue is located. The protein backbone (grey) and the side chains of both the wild-type (green) and the mutant residue (red) are shown. The mutant residue is bigger than the wild-type residue, which may disturb the core structure of this domain and affect the catalytic activity of the enzyme.

The role of the mutation

Hydroethidine oxidation measurements indicated a significant increase in the levels of O2•− (one of the major ROS and the physiological substrate of the SOD2 enzyme) in the fibroblasts of the patient; this substrate level was comparable in order of magnitude to the levels seen in complex-I-deficient fibroblasts. Since no deficiency in any of the mitochondrial respiratory chain complexes I-V was seen, this significant increase in O2•− could probably be explained by the pathogenic effect of the c.542G>T; p.(Gly181Val) SOD2 mutation on the function of the encoded enzyme, leading to malfunctioning of the enzyme, accumulation of damaging oxygen radicals in the cells and increased oxidative stress.

The role of superoxide dismutase in disease

SOD2 belongs to the manganese/iron superoxide dismutase family, which is one of the primary families of antioxidant enzymes in mammalian cells. These antioxidant enzymes protect cells from the damage caused by ROS. In eukaryotic cells, there are three SOD homologs: Cu/ZnSOD (SOD1), Mn/FeSOD (manganese superoxide dismutase 2; SOD2) and extracellular SOD3. SOD2 is a mitochondrial matrix protein which converts superoxide anion (O2•−) to H2O2, which is then metabolized by glutathione peroxidase into H2O (Alscher et al). Oxidative stress is a deleterious process mediated by ROS, and it can lead to severe damage of cellular structures and their building blocks, including proteins, DNA and lipids (Valko et al). ROS are naturally formed during mitochondrial metabolism, and cells self-regulate their ROS levels by producing antioxidant enzymes (Starkov, 2008). Deficiency of one of the antioxidant enzymes, such as SOD2, may affect any organ at any age, but most often affects organs with a high energy demand, such as the heart and brain, as is commonly observed in mitochondrial disorders (Meyers et al). Furthermore, it has been reported that oxidative stress and mutations in the SOD2 gene are involved in the pathogenesis of several diseases such as mitochondrial dysfunction, cancer, neurological disorders, diabetes, and many cardiovascular diseases including hypertension, atherosclerosis, and restenosis (Hedskog et al, Jenner, 2003, Louzao & van Hest, Cai & Harrison, Griendling & FitzGerald). There have also been reports of the involvement of other nuclear genes, such as TAZ (D’Adamo et al), TXNRD2 (Conrad et al, Sibbing et al), DNAJC19 (Davey et al, Ojala et al), and SDHA (Levitas et al), in mitochondrial cardiomyopathy, and this also seems applicable to the current case.

Superoxide dismutase in cardiomyopathy

Oxidative stress and disturbed mitochondrial respiratory function are known to play a substantial role in the development of heart failure (Huss & Kelly), and the role of the SOD2 protein in cardiomyopathy has previously been demonstrated in mice. Homozygous Sod2 knockout mice showed neonatal lethality due to neurodegeneration and cardiomyopathy (Li et al 1995). In addition, the intake of antioxidants improved their phenotypes of dilated cardiomyopathy and muscle fatigue and had beneficial effects on electrophysiological disturbances in heart and muscle (Koyama et al, Sunagawa et al). Interestingly, heterozygous SOD2+/- mice showed reduced SOD2 enzyme activity, yet did not exhibit any disease phenotype at 9 months of age (Li et al 1995). Likewise, the parents of the severely affected child described here, who are heterozygous carriers of the SOD2 mutation, did not show any cardiac abnormalities. Finally, chemotherapeutic (anthracycline-induced) cardiomyopathy and heart failure are believed to be a side effect of superoxide radical accumulation leading to the induction of mitochondrial dysfunction in the heart (Thayer, 1988). In fact, this phenotype was successfully rescued in transgenic mice by the overexpression of SOD2 (Yen et al), underscoring the cardioprotective role of this enzyme in healthy individuals.

CONCLUSIONS

Here we have reported the successful use of a combined approach using homozygosity mapping and exome sequencing to identify the causal mutation in the mitochondrial protein, SOD2, in a child with severe neonatal cardiomyopathy. Protein conformation predictions and functional evaluation support the role of SOD2 deficiency in the abnormally elevated levels of oxidative stress found in our patient. Oxidative stress itself is known to be involved in the development of various diseases, including cardiomyopathies. The result from our patient adds a novel, nuclear-encoded disease gene to the list of genes involved in severe mitochondrial cardiomyopathies.

ACKNOWLEDGEMENTS

We thank the parents and sibling of the patient for participating in this study; Ludolf Boven and Sander Grefte for technical assistance; members of the Genomics Coordination Centre, UMCG, for assistance in data analysis; and Jackie Senior and Kate Mc Intyre for editing this manuscript. Rowida Almomani was supported by the Netherlands Heart Foundation (grant 2010B164).

REFERENCES

Alscher RG, Erturk N, Heath LS: Role of superoxide dismutases (SODs) in controlling oxidative stress in plants. J Exp Bot. 2002;53:1331–41
Burns KM, Byrne BJ, Gelb BD et al. New mechanistic and therapeutic targets for pediatric heart failure: report from a National Heart, Lung, and Blood Institute working group. Circulation. 2014;130:79-86
Cai H & Harrison DG: Endothelial dysfunction in cardiovascular diseases: the role of oxidant stress, Circ. Res. 2000;87:840–4
Conrad M, Jakupoglu C, Moreno SG et al. Essential role for mitochondrial thioredoxin reductase in hematopoiesis, heart development, and heart function. Mol Cell Biol. 2004;24:9414-23
D’Adamo P, Fassone L, Gedeon A et al. The X-linked gene G4.5 is responsible for different infantile dilated cardiomyopathies. Am J Hum Genet. 1997;61:862-7
Davey KM, Parboosingh JS, McLeod DR et al. Mutation of DNAJC19, a human homologue of yeast inner mitochondrial membrane co-chaperones, causes DCMA syndrome, a novel autosomal recessive Barth syndrome-like condition. J Med Genet. 2006;43:385-93
Dellefave L & McNally EM: The genetics of dilated cardiomyopathy. Curr Opin Cardiol. 2010;25:198-204
Fatkin D, Otway R, Richmond Z: Genetics of dilated cardiomyopathy. Heart Fail Clin. 2010;6:129–40
Griendling KK & FitzGerald GA: Oxidative stress and cardiovascular injury: part I: basic mechanisms and in vivo monitoring of ROS, Circulation 2003;108:1912–6
Hedskog L, Zhang S, Ankarcrona M: Strategic role for mitochondria in Alzheimer’s disease and cancer. Antioxid Redox Signal. 2012;16:1476-91
Huss JM & Kelly DP: Mitochondrial energy metabolism in heart failure: a question of balance. J Clin Invest. 2005;115:547-55
Jenner P: Oxidative stress in Parkinson’s disease. Ann. Neurol. 2003;53:S26−S38
Kindel SJ, Miller EM, Gupta R et al. Pediatric cardiomyopathy: importance of genetic and metabolic evaluation. J Card Fail. 2012 18(5):396-403
Kong A, Gudbjartsson DF, Sainz J et al. A high-resolution recombination map of the human genome. Nat Genet. 2002;31:241-7
Koyama H, Nojiri H, Kawakami S et al. Antioxidants improve the phenotypes of dilated cardiomyopathy and muscle fatigue in mitochondrial superoxide dismutase-deficient mice. Molecules. 2013 18:1383-93
Levitas A, Muhammad E, Harel G et al. Familial neonatal isolated cardiomyopathy caused by a mutation in the flavoprotein subunit of succinate dehydrogenase. Eur J Hum Genet. 2010;18:1160-5
Li H & Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 2009;25:1754–60
Li H, Handsaker B, Wysoker A et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9
Li Y, Huang TT, Carlson EJ et al. Dilated cardiomyopathy and neonatal lethality in mutant mice lacking manganese superoxide dismutase. Nat Genet. 1995;11:376–81
Louw JJ, Corveleyn A, Jia Y et al. Homozygous loss-of-function mutation in ALMS1 causes the lethal disorder mitogenic cardiomyopathy in two siblings. Eur J Med Genet 2014; pii: S1769-7212(14)00136-0. doi: 10.1016/j.ejmg.2014.06.004
Louzao I & van Hest JC: Permeability effects on the efficiency of antioxidant nanoreactors. Biomacromolecules. 2013;14:2364-72
McKenna A, Hanna M, Banks E et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–303
Meyers DE, Basha HI, Koenig MK: Mitochondrial cardiomyopathy: pathophysiology, diagnosis, and management. Tex Heart Inst J. 2013;40:385-94
Ojala T, Polinati P, Manninen T et al. New mutation of mitochondrial DNAJC19 causing dilated and noncompaction cardiomyopathy, anemia, ataxia, and male genital anomalies. Pediatr Res 2012;72:432-7
Posafalvi A, Herkert JC, Sinke RJ et al. Clinical utility gene card for: dilated cardiomyopathy (CMD). Eur J Hum Genet. 2013;21. doi: 10.1038/ejhg.2012.276
Rampersaud E, Siegfried JD, Norton N et al. Rare variant mutations identified in pediatric patients with dilated cardiomyopathy. Prog Pediatr Cardiol. 2011;31(1):39-47
Sibbing D, Pfeufer A, Perisic T et al. Mutations in the mitochondrial thioredoxin reductase gene TXNRD2 cause dilated cardiomyopathy. Eur Heart J 2011;32:1121-33
Somsen G, Hovingh G, Tulevski I et al. Familial dilated cardiomyopathy. In: Clinical Cardiogenetics. Baars H, Doevendans P, Smagt J, eds. Springer 2011, 63–77
Starkov AA: The role of mitochondria in reactive oxygen species metabolism and signaling. Ann N Y Acad Sci 2008;1147:37-52
Sunagawa T, Shimizu T, Matsumoto A et al. Cardiac electrophysiological alterations in heart/muscle-specific manganese-superoxide dismutase-deficient mice: prevention by a dietary antioxidant polyphenol. Biomed Res Int 2014:704291. doi:10.1155/2014/704291
Thayer WS: Evaluation of tissue indicators of oxidative stress in rats treated chronically with adriamycin. Biochem Pharmacol 1988;37:2189-94
Theis JL, Sharpe KM, Matsumoto ME et al. Homozygosity mapping and exome sequencing reveal GATAD1 mutation in autosomal recessive dilated cardiomyopathy. Circ Cardiovasc Genet. 2011;4(6):585-94
Theis JL, Zimmermann MT, Larsen BT et al. TNNI3K mutation in familial syndrome of conduction system disease, atrial tachyarrhythmia and dilated cardiomyopathy. Hum Mol Genet 2014; pii:ddu297
Towbin JA, Lowe AM, Colan SD et al. Incidence, causes, and outcomes of dilated cardiomyopathy in children. JAMA 2006;296:1867-76
Valko M, Leibfritz D, Moncol J et al. Free radicals and antioxidants in normal physiological functions and human disease. Int J Biochem Cell Biol 2007;39:44-84
Venselaar H, Te Beek TA, Kuipers RK et al. Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces. BMC Bioinformatics 2010;11:548
Yen HC, Oberley TD, Gairola CG et al. Manganese superoxide dismutase protects mitochondrial complex I against adriamycin-induced cardiomyopathy in transgenic mice. Arch Biochem Biophys 1999;362:59-66


Chapter 3.3

One family, two cardiomyopathy subtypes, three disease genes: an intriguing case

Anna Posafalvi, Nicole Corsten-Janssen, Paul A van der Zwaag, Jan G Post, Richard J Sinke, J Peter van Tintelen, Jan DH Jongbloed

ABSTRACT

Pedigree information is often crucial in making decisions in clinical genetic counselling and diagnostics. Here we report on how pedigree information guided genetic analysis in a large, complex family with three affected individuals suffering from neonatal or late-onset dilated cardiomyopathy. Exome sequencing in combination with haplotype sharing tests led to causal mutations in the MYL2 (myosin light chain 2) and SOD2 (superoxide dismutase 2) genes in two deceased babies, respectively. Now targeted next generation sequencing based on a cardiomyopathy gene panel has revealed the possible role of another gene, JUP (junction plakoglobin), in one of the grandmothers affected with adult DCM. We present the 10-generation family pedigree that was constructed during the course of continuing genetic analyses and discuss aspects that directed diagnostic routing. We show the benefit of using pedigree data for the clinical genetic work on an intriguing familial cardiomyopathy case.

INTRODUCTION

Idiopathic dilated cardiomyopathy (DCM) is a rare, progressive disease of the myocardium, usually exhibiting an autosomal dominant inheritance pattern and late onset of symptoms of heart failure (such as dyspnoea, syncope, and oedema), arrhythmias and thromboembolism. In some cases, however, cardiomyopathy may start at a very young age or just after birth, when it often proves to be lethal. This form of the disease (called neonatal or paediatric cardiomyopathy) is believed to be caused by autosomal recessive mutations. There are more than 50 genes known to be involved in cardiomyopathies, but since they only explain the disease in a relatively small proportion of patients, there must be novel genes to be discovered in many unsolved families (Teekakirikul et al, Almomani et al, see also chapter 3.2; Posafalvi et al).
Pedigree information is often very important for genetic screening decisions in cardiomyopathy families. Here we report on how the family’s pedigree guided our genetic analyses in an unusual case of an extended consanguineous family that is affected by two types of dilated cardiomyopathy. The family shows a regular adult-onset disease (putatively autosomal dominant (AD)) and a severe neonatal form (putatively autosomal recessive (AR)), with three possible disease-causing genes underlying the condition.

MATERIALS AND METHODS

Patients

The family pedigree is shown in figure 1. Patient X:1 died at the age of 6 months from a severe neonatal form of dilated cardiomyopathy, and was later found to carry a homozygous mutation (c.403-1G>C) in an acceptor splice site of intron 6 of the known DCM gene, myosin regulatory light chain 2 (MYL2). This mutation leads to the activation of a cryptic splice site, causing a frameshift in the C-terminal EF-hand motif of the encoded protein. Functional follow-up experiments showed that the calcium-binding properties of the mutant molecule were perturbed (Weterman et al). Parents IX:1 and IX:2 have since had a second affected baby who died from the same disease at age 4 months. This child was also homozygous for the MYL2 mutation. The loss of a second child caused huge emotional distress to the family. There is one healthy older sibling (not shown in pedigree) and the mother (IX:2) had three miscarriages before patient X:1 was born. The grandmother of patient X:1, VIII:2, was diagnosed with heart failure due to dilated cardiomyopathy at the age of 54 years. Patient X:2 also suffered from a lethal, neonatal form of DCM and died three days after birth (see also chapter 3.1). Her parents, IX:3 and IX:4, and her brother (not shown in pedigree) were found to be unaffected.

Figure 1. The 10-generation pedigree of a family with both neonatal and late-onset cardiomyopathy. Square symbols (men), circles (women); black symbol (child affected by neonatal dilated cardiomyopathy), grey symbol (person affected by adult-onset dilated cardiomyopathy), diagonal line through symbol (deceased). The pedigree is incomplete; it only indicates the degree of relationship between patients VIII:2, X:1 and X:2. The genealogical cross-links within the family were discovered by Eric Hennekam.

Homozygosity mapping

SNP genotyping on a HumanCytoSNP-12 BeadChip® 300K SNP array (Illumina, San Diego, CA, USA) and data analysis by GenomeStudio® (Illumina) and Microsoft® Office Excel 2010 (Version 14.0; Microsoft, Redmond, WA, USA) software were performed as described earlier by van der Zwaag et al. We aimed to identify chromosomal regions which are homozygous in the patients or shared by the patients.
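The idea behind homozygosity mapping, finding long stretches of consecutive homozygous SNP calls and ranking them by genetic length, can be sketched as follows. This is a simplified illustration, not the GenomeStudio and Excel workflow used here: the function names, the `min_snps` threshold and the toy genotypes are assumptions, and a practical analysis would also tolerate no-calls and occasional genotyping errors.

```python
def homozygous_runs(snps, min_snps=3):
    """Find runs of consecutive homozygous SNPs on one chromosome.

    snps: list of (position_cM, genotype) sorted by position,
          with genotypes written as two-letter strings such as "AA" or "AB".
    Returns a list of (start_cM, end_cM, number_of_snps).
    """
    runs = []
    current = []  # positions of the homozygous SNPs in the open run
    for pos_cm, genotype in snps:
        if genotype[0] == genotype[1]:
            current.append(pos_cm)
            continue
        # A heterozygous call closes the open run.
        if len(current) >= min_snps:
            runs.append((current[0], current[-1], len(current)))
        current = []
    if len(current) >= min_snps:
        runs.append((current[0], current[-1], len(current)))
    return runs


def longest_run(runs):
    """Return the run spanning the largest genetic distance, or None."""
    return max(runs, key=lambda run: run[1] - run[0], default=None)


# Toy data for a single chromosome (positions in cM are invented).
toy = [(0.0, "AA"), (1.0, "BB"), (2.5, "AA"), (3.0, "AB"),
       (4.0, "BB"), (4.5, "BB"), (5.0, "AB"),
       (6.0, "AA"), (7.0, "AA"), (9.0, "BB"), (10.5, "AA")]
runs = homozygous_runs(toy)
print(runs)
print(longest_run(runs))
```

Regions shared between two patients could then be approximated by intersecting the runs found in each sample and requiring identical genotypes within the overlap.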

Targeted NGS

Sample preparation and targeted enrichment of a panel of 55 cardiomyopathy-related genes were performed according to the manufacturer’s instructions (SureSelect XT Custom library, SureSelect Library prep kit, Agilent Technologies, Inc., Santa Clara, CA, USA), and as recently described in more detail by Sikkema-Raddatz et al. Sequencing was performed on a MiSeq sequencer (Illumina, San Diego, CA, USA) using 151 bp paired-end sequencing. Subsequent data analysis and variant filtering were performed with Next Gene (v2.2.1, Softgenetics, State College, PA) and Cartagenia (Cartagenia, Leuven, Belgium) software, as described in chapter 4.1.

Variant classification

To classify the variants identified, we performed a comprehensive analysis using information on the type of variation, the evolutionary conservation of the affected residue and of the protein region in which it resides, the frequency of the variant in numerous control populations (such as 1000 Genomes, GoNL, and ESP6500), and the pathogenicity predicted by Alamut software (version 2.3.6), PolyPhen2, AGVGD, SIFT and MutationTaster. In addition, literature and database searches were performed to obtain further information. Finally, we uploaded the list of variants to the Combined Annotation Dependent Depletion online variant prioritizer tool (CADD, http://cadd.gs.washington.edu/info) to obtain a list of top candidate variants that could be considered to be likely pathogenic.

RESULTS

The role of the pedigree in making genetic analysis decisions

The initial three-generation pedigree of this family (not shown) indicated several possible modes of inheritance for the disease, including autosomal recessive or di-/oligogenic inheritance in particular. Initially, we considered the involvement of the same mutation in homozygous form in the neonatal cases and in heterozygous form in the late-onset DCM that affected the grandmother. Since there were two cousins affected by the same, very rare, lethal (supposedly recessive) disease in the same family, we wondered whether there was a more complex relation between the family members. After successfully extending the pedigree to 10 generations (figure 1), and discovering the multiple genealogical cross-links and distant consanguinity between the four parents of the affected babies, the inheritance model shifted towards a combination of the two forms mentioned above, i.e. the same gene, carrying an autosomal recessively inherited mutation, was anticipated to underlie disease in the neonatal cases X:1 and X:2, while the genetic cause of the disease would be independent and autosomal dominant in the grandmother (VIII:2). To find shared homozygous regions, homozygosity mapping was performed on DNA samples of the two neonatal patients. However, there was also still a possibility of finding homozygosity in X:2 and compound heterozygosity in X:1, carrying the same mutation as X:2 but in heterozygous form and combined with another, independent mutation, inherited from the grandmother (VIII:2) and causing her late-onset phenotype. Surprisingly, this mapping approach did not result in the identification of particularly large, shared homozygous regions (the longest such region was only 3.59 cM). Moreover, our analyses with the goal of identifying a homozygous region in X:2 that was heterozygously present in X:1 did not reveal any putative candidate gene regions either. These negative results were later supported by the identification of the homozygous MYL2 mutation in X:1 and the exclusion of this mutation in patient X:2. Subsequently, homozygosity mapping on the individual samples was performed to identify independent homozygous regions in the genome of X:2 that were not shared with X:1.
When this analysis was conducted for X:1, the MYL2 splice site mutation was found to be located in the 3rd largest homozygous chromosomal region of 5.89 cM (12q24.11-q24.13), which happened to be the second longest autosomal homozygous fragment in the patient. The search for independent homozygous regions in the genome of X:2 that were not shared with X:1 was combined with exome sequencing. This resulted in the discovery of a causative, recessively inherited variant in the nuclear encoded mitochondrial enzyme superoxide dismutase (SOD2). This variant is located in the longest autosomal homozygous region (6q25.3-q26; the 2nd longest such chromosomal region), spanning 4.26 cM. Functional studies performed on the fibroblast samples of patient X:2 confirmed elevated levels of the substrate (oxygen free radicals), while the possible defect of any of the mitochondrial complexes was excluded. Together, these findings strongly support the expected pathogenic role of the recessive mutation, c.542G>T; p.(Gly181Val), in this novel DCM gene (Almomani et al, see also chapter 3.2). Finally, as the 10-generation pedigree indicated that the DCM in the grandmother (VIII:2) could not be genetically related to the disease in X:1 or X:2, targeted NGS was performed on her DNA. Nevertheless, we excluded her as a carrier of either the MYL2 or SOD2 variants.

Targeted sequencing identifies the third gene in VIII:2

The grandmother of the MYL2 patient, VIII:2, was shown not to carry the MYL2 mutation, nor could she potentially carry the SOD2 mutation. Targeted sequencing using our cardiomyopathy gene-panel revealed 152 variants in total in the 55 genes covered by the panel, of which four variants remained after filtering the data with the standard parameters of our analysis pipeline. Our routine classification method pointed to RYR2 (ryanodine receptor 2), c.3517A>G, p.(Met1173Val) and/or JUP (junction plakoglobin), c.746C>T, p.(Thr249Met) as likely pathogenic variants (see details in table 1). In both cases, the respective amino acid residues are highly conserved (the Met1173 of RYR2 at least up to chicken, and the Thr249 residue of JUP up to Drosophila), and all the protein effect prediction programs we used supported the likely pathogenicity. The RYR2 variant was absent from well-known population databases, while the JUP variant was reported only once in 8,600 European American alleles in the ESP database. Subsequent application of the CADD tool strongly suggested a primary pathogenic role for the JUP variant, as its scaled C-score is an extremely high 29.8. Since CADD uses a logarithmic scale, this means a much higher predicted deleteriousness than that of the RYR2 missense variant, which scored only 15.15. Additionally, a score of ~30 indicates that the variant belongs to the top 0.1% most deleterious of all substitutions that might theoretically occur in the human genome (CADD website and Kircher et al). Hence, without any further functional confirmation of the effects of the four variants, it seems most probable, from the currently available data, that JUP is the cause of the DCM in patient VIII:2 (although a digenic background for the disease development and causative roles for both variants cannot be excluded).
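The logarithmic scale mentioned above can be made concrete. CADD scaled C-scores are PHRED-like, i.e. a score s corresponds to the top 10^(-s/10) fraction of all possible substitutions, so that 10, 20 and 30 mark the top 10%, 1% and 0.1%. The short sketch below applies this relation to the two scores reported here; the function name is hypothetical.

```python
def top_fraction(scaled_c_score):
    """Fraction of all possible substitutions ranked at least as deleterious."""
    return 10 ** (-scaled_c_score / 10)


jup = top_fraction(29.8)     # JUP p.(Thr249Met)
ryr2 = top_fraction(15.15)   # RYR2 p.(Met1173Val)

print(f"JUP:  top {jup:.2%} of substitutions")
print(f"RYR2: top {ryr2:.2%} of substitutions")
print(f"rank ratio RYR2/JUP: {ryr2 / jup:.1f}")
```

Under this relation the JUP variant falls in roughly the top 0.1% and the RYR2 variant in roughly the top 3%, a difference of more than one order of magnitude in rank. This is a statement about predicted deleteriousness rank only and does not by itself quantify the probability that either variant causes disease.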

Table 1 (A-D). Interpretation of putative pathogenicity of variants found in the adult DCM patient VIII:2 by gene-panel-based NGS. Only four variants remained after filtering for the region of interest, read depth, known polymorphisms and artefacts, and variant frequency in various databases. According to our standardized classification system based on evolutionary conservation (table A), prediction software (table B) and allele frequencies (table C), the RYR2/JUP missense variants were classified as the most likely to be causative. According to the CADD tool, JUP p.(Thr249Met) has the highest pathogenicity rank. Table D shows our final classification of the four variants.

TABLE A (variants and evolutionary conservation)

gene    genomic coordinate    cDNA         protein           transcript ID    conserved domain
RYR2    chr1:237732538A>G     c.3517A>G    p.(Met1173Val)    NM_001035.2      SPRY (SPIa/Ryanodine receptor)
SCN5A   chr3:38620911C>T      c.3304G>A    p.(Ala1102Thr)    NM_198056.2      sodium transport-associated
PKP2    chr12:33031395G>A     c.419C>T     p.(Ser140Phe)     NM_004572.3      -
JUP     chr17:39923794G>A     c.746C>T     p.(Thr249Met)     NM_002230.2      armadillo

Per-column values for the four variants:
conservation (nucleotide / amino acid): high /; weak / high; weak / moderate; not conserved / weak
PhyloP [-14.1;6.4]: 0.77; 5.86; 1.42; -2.14
GERP [-12.36;6.18]: 6.17; 5.41; 2.38; 4.30

TABLE B (prediction software)

Grantham distance: RYR2 21; SCN5A 58; PKP2 155; JUP 81
AGVGD: C15 (one variant); C0 (three variants)
PolyPhen2: benign (0.000); benign (0.007); possibly damaging (0.920); probably damaging (1.000)
SIFT: deleterious (two variants); tolerated (two variants)
Mutation Taster: disease causing (two variants); polymorphism (two variants); p-values: 0.998; 0.999; 0.996; 0.93
splicing: no effect (all four variants)

TABLE C (control database frequencies)

rs number: rs150821281; rs377612199; -; -
dbSNP: db134, validated, clinical significance: probable non-pathogenic allele, MAF: 3/2184=0.001; db138, no validation, no frequency data (0/2184 in 1000genomes); not present; -
GoNL: 6/996=0.006 (one variant); not present (three variants)
ESP: JUP EA: A=1/G=8599 (ALL: 1/6503=0); EA: A=26/G=8575 (ALL: 26/6503=0.004) (one variant); not present (in about 13000) (two variants)
HGMD mutation database: present (as disease causing) (one variant); not present (three variants)

TABLE D (CADD results and final classification)

CADD PHRED (RawScore): JUP 29.8 (5.033031); RYR2 15.15 (2.746820); remaining two variants 11.56 (1.673660) and 1.385 (-0.582387)
our classification: LIKELY PATHOGENIC (JUP, RYR2); Likely Benign (one variant); VOUS (one variant)
final conclusion: CAUSATIVE VARIANT (JUP)

DISCUSSION

Here we report on the unusual and rare example of a multigeneration, triple-consanguineous family that is affected by two distinct types of cardiomyopathy (namely, an adult onset form and a lethal neonatal form). Three different genes have been associated with the respective phenotypes in the three patients. The family’s large, ten-generation pedigree played an important role in guiding the genetic analyses to uncover the mutations causing the cardiomyopathy disease types observed in the family. Although it was at first unexpected that the two affected babies would have different genetic causes of their disease, the genealogical reconstruction of the pedigree clearly indicated that they could have independent causes. This observation was further substantiated by the homozygosity mapping on the neonates’ DNA samples, which did not result in the identification of any obvious candidate regions. Based on the pedigree composition, one can quickly appreciate that the founder of the SOD2 mutation and the founder of the MYL2 mutation are likely to be ancestors from different branches of the family. The MYL2 mutation was most likely inherited from I:2, while potentially I:2, I:3, III:5 or III:6 could have been the individuals with the initial SOD2 mutation. Although the cardiac symptoms of both recessive patients seemed to be comparable at first glance, a systematic evaluation revealed clear differences in their phenotypes. Both were suffering from lethal neonatal cardiomyopathy, but the baby with the homozygous MYL2 mutation had myopathy with fibre type disproportion type 1, while the baby with the homozygous SOD2 mutation suffered from subependymal cysts (which is a possible manifestation of mitochondrial disease affecting the central nervous system), and she did not have typical skeletal muscle problems.
For both genes, homozygosity mapping on the individual DNA samples (as shown in figure 2) supported their involvement in disease: MYL2 and SOD2 are localized in one of the longest autosomal homozygous regions of the patients (SOD2 in the 1st, MYL2 in the 3rd longest region). In fact, this was a major determinant in the completion of the exome sequencing data analysis for X:2. The fact that both patients carried relatively small homozygous regions (including those “relatively large” ones harbouring the causal mutations) supports the idea that both mutations are quite old and have been inherited from a founder many generations ago. We have identified the variant p.(Thr249Met) of JUP as the most likely cause of disease in VIII:2. Upon stringent filtering of the respective gene-


Figure 2. SNP genotyping results for patients X:1 and X:2. Homozygous regions identified in patient X:1 (A) and patient X:2 (B) are shown. The genes identified as causative are marked in the 2nd and 3rd longest homozygous regions, respectively.

panel-based targeted NGS data, two of the remaining four variants (the above-mentioned JUP and a missense variant of RYR2) were predicted and classified as “likely pathogenic”. The same JUP variant was previously reported as an incidental finding in 1/1236 alleles of exome-sequenced individuals (not selected for cardiomyopathy, arrhythmia, or sudden death, though some of them had an increased risk or a history of coronary artery disease), and was also classified as a variant of unknown clinical significance (Ng et al). This finding neither supports nor excludes the putative role of the variant in cardiomyopathy development. However, the use of the new CADD online variant prioritizer tool pinpointed this JUP p.(Thr249Met) variant as an order of magnitude more likely to be disease-causing than the second best candidate, an RYR2 missense variant. According to the Uniprot database (accession number: P14923), the JUP variant is located in the third ARM repeat of the encoded junction plakoglobin protein, which spans amino acids 216-255. It is involved in the interaction with desmocollin and desmoglein (Witcher et al), the cadherins known to play a role in cell adhesion and desmosome formation (Garrod et al). It is important to note that heterozygous missense mutations of neither JUP nor RYR2 have been associated with DCM so far, although both have been associated with arrhythmogenic right ventricular cardiomyopathy (ARVC). At this stage, we cannot exclude the possibility that the combination of both variants was the trigger for the development of DCM, or that an unidentified gene was also involved in the disease. However, according to our current data, the role of the JUP variant seems the most probable, and this could be easily followed up by functional experiments investigating the potentially impaired binding of desmocollin and desmoglein in the presence of the variant.
Additionally, a recent study on 639 DCM patients suggested that the genetic overlap between various types of cardiomyopathy is much more extensive than previously estimated; it reported that 31% of the truly pathogenic mutations of DCM patients are mutations of typical ARVC-related genes and have been previously associated with ARVC (Haas et al). This intriguing family nicely exemplifies the importance of extensive analysis of the family history by pedigree reconstruction in genetic counselling. These genealogical studies led to an easier interpretation of why the MYL2 mutation was not found in patients VIII:2 and X:2, as well as of the rare recessive phenotypes caused by the two different genes. The common ancestor carrying the founder MYL2 mutation can be identified nine generations ago, and the one carrying the SOD2 mutation either nine or seven generations ago.

Our homozygosity mapping data support the idea that both mutations are old and were transmitted through multiple generations. The improved understanding of the genetic background of the family has essential practical implications too. The parents of both X:1 and X:2 have been counselled that they have a 25% recurrence risk due to their carriership of an autosomal recessive disease. With the identification of the causative gene, they can now consider the reproductive options available (e.g. prenatal screening). This is of utmost importance, especially given that IX:1 and IX:2 have, in the meantime, had a second baby who was also affected by the same lethal disorder. Fortunately, prenatal diagnosis in a very recent pregnancy of IX:4 indicated that the foetus was not a homozygous carrier of the SOD2 mutation. Thus, with good genetic counselling and prenatal screening, this family should be able to avoid having any more seriously affected children.

ACKNOWLEDGEMENTS
We would like to acknowledge all those involved in counselling the distinct branches of this family. We thank Eric Hennekam, UMCU, for the pedigree construction; Jos Dijkhuis and his team in the Genome Diagnostics laboratory, Department of Genetics, UMCG, for technical support and performing the molecular genetic tests; and Jackie Senior and Kate Mc Intyre for editing this manuscript.

REFERENCES
Almomani R, Posafalvi A, Herkert JC et al. Homozygous SOD2 mutation as a cause of severe neonatal dilated cardiomyopathy (manuscript in preparation, see also chapter 3.2)
Garrod DR, Merritt AJ, Nie Z. Desmosomal cadherins. Curr Opin Cell Biol 2002;14:537-45
Haas J, Frese KS, Peil B et al. Atlas of the clinical genetics of human dilated cardiomyopathy. Eur Heart J 2014; pii: ehu301
Kircher M, Witten DM, Jain P et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014;46:310-5
Ng D, Johnston JJ, Teer JK et al. Interpreting secondary cardiac disease variants in an exome cohort. Circ Cardiovasc Genet 2013;6:337-46
Posafalvi A, Herkert JC, Sinke RJ et al. Clinical utility gene card for: dilated cardiomyopathy (CMD). Eur J Hum Genet 2013;21(10). doi: 10.1038/ejhg.2012.276
Sikkema-Raddatz B, Johansson LF, de Boer EN et al. Targeted next-generation sequencing can replace Sanger sequencing in clinical diagnostics. Hum Mutat 2013;34:1035-42
Teekakirikul P, Kelly MA, Rehm HL et al. Inherited cardiomyopathies: molecular genetics and clinical genetic testing in the postgenomic era. J Mol Diagn 2013;15:158-70
van der Zwaag PA, van Tintelen JP, Gerbens F et al. Haplotype sharing test maps genes for familial cardiomyopathies. Clin Genet 2011;79:459-67
Weterman MA, Barth PG, van Spaendonck-Zwarts KY et al. Recessive MYL2 mutations cause infantile type I muscle fibre disease and cardiomyopathy. Brain 2013;136:282-93
Witcher LL, Collins R, Puttagunta S et al. Desmosomal cadherin binding domains of plakoglobin. J Biol Chem 1996;271:10904-9


CHAPTER 4 TARGETED SEQUENCING

Chapter 4.1

Gene-panel-based Next Generation Sequencing (NGS) substantially improves clinical genetic diagnostics in inherited cardiomyopathies

Anna Posafalvi*, Jan DH Jongbloed*, Renee C Niessen, Paul A van der Zwaag, Yvonne Hoedemaekers, Birgit Sikkema-Raddatz, Jos Dijkhuis, Sebastiaan RD Piers, Katja Zeppenfeld, Rudolf A de Boer, Paul L van Haelst, Daniela QCM Barge-Schaapveld, Folkert W Asselbergs, Jasper J van der Smagt, Maarten P van den Berg, J Peter van Tintelen§, Richard J Sinke§

*The first two authors contributed equally §The last two authors contributed equally

Manuscript submitted

ABSTRACT
Background: Targeted next generation sequencing (NGS) is an attractive approach for the screening of multiple genes underlying genetically heterogeneous diseases, such as cardiomyopathies. We implemented an enrichment kit targeting 55 cardiomyopathy-related genes in our routine diagnostics work. The aim of this study was to determine the diagnostic yield, to evaluate the contribution of mutations in genes that were previously only infrequently or never screened for, and to obtain more insight into the suggested bigenic or multigenic inheritance patterns in a subset of patients.
Methods and Results: DNA samples of 252 cardiomyopathy patients were analysed and their clinical characteristics collected. Patients with one or more variants labelled as ‘likely pathogenic’ or ‘pathogenic’ were considered to be ‘resolved’. Retrospective phenotype evaluation showed that of these 252 patients, 125 fulfilled the formal clinical criteria for a cardiomyopathy disease, 44 were suspected of having cardiomyopathy, and 37 had an unconfirmed diagnosis. We excluded 46 from further analysis. We identified pathogenic or likely pathogenic mutations in 107/206 (52%) patients: in 56% (40/72) of dilated cardiomyopathy (DCM) patients fulfilling the clinical criteria, and in 52% (12/23) of DCM-like patients. Truncating mutations in TTN were found in 14% of DCM patients. The yield in hypertrophic cardiomyopathy (HCM) and HCM-like patients was 46% (21/46) and 36% (4/11), respectively. In >50% of all our cardiomyopathy cases, we identified mutations in genes that were previously rarely analysed, and in 15% of cases, we found two or more pathogenic or likely pathogenic mutations.
Conclusions: Targeted sequencing of cardiomyopathy genes results in a diagnostic yield of over 50%. In particular, our yield for genetic testing of DCM patients was substantially increased (approx. 55% vs. 20-25% earlier).
As this NGS method enables a large set of genes to be screened, including some infrequently studied genes, it opens up new avenues for exploring the role of ‘rare’ genes and/or multiple mutations underlying inherited cardiomyopathies.

Key Words: Next Generation Sequencing, targeted enrichment, clinical diagnostics, diagnostic yield, cardiomyopathy, genetics

INTRODUCTION
Next Generation Sequencing (NGS) is one of the most promising developments in clinical genetics, including cardiogenetics, of the past few years (Jongbloed et al). This technique enables clinicians to make a genetic diagnosis – within a short time frame – for diseases which potentially have multiple genes underlying the phenotype. To apply NGS in a clinical diagnostics setting, the currently preferred method appears to be dedicated and reliable targeted enrichment, which provides sufficient specificity and sensitivity to replace the gold standard of Sanger sequencing (Sikkema-Raddatz et al). The use of several targeted enrichment methods (putatively) applicable for clinical diagnostics has been reported recently, with most of them using array-based enrichment and targeting a relatively small subset of genes (Harakalova et al, Almomani et al, Mook et al). However, approaches applying in-solution enrichment methods are also becoming increasingly popular (Sikkema-Raddatz et al, Lopes et al): these require smaller amounts of input DNA, while providing higher efficiency and better reproducibility, and being easier to handle (Querfurth et al, Shearer et al).
Cardiomyopathies are a group of genetically and sometimes phenotypically overlapping heterogeneous disorders. The major subforms, in which over 50 disease genes have been identified, include arrhythmogenic right ventricular (ARVC), dilated (DCM), hypertrophic (HCM), left-ventricular non-compaction (LVNC), and restrictive (RCM) cardiomyopathies. Many of these genes are involved in different types of the disease (Teekakirikul et al, van Tintelen et al).
In the pre-NGS era, the yield of diagnostic screening in well-defined patient cohorts varied widely: 35-70% in HCM (Christiaans et al, Pinto et al, Wilde & Behr), 20-25% in DCM (Wilde & Behr, Posafalvi et al, van Spaendonck-Zwarts et al, 2013), approximately 50% in ARVC (Cox et al, Quarta et al, te Rijdt et al), 25-40% in LVNC (Teekakirikul et al, Hoedemaekers et al), and approximately 35% in RCM (Teekakirikul et al). Since the number of genes associated with cardiomyopathies is large and still growing, this disease is an ideal candidate for the implementation of the rapidly developing NGS-based diagnostic tools. Several studies have already reported the screening of multiple cardiomyopathy genes (range 5 to 84 genes) within one experiment using NGS (Voelkerding et al, Gowrisankar et al, Zimmerman et al, Meder et al, Mook et al, Lopes et al, Pugh et al, Haas et al). Some of these studies only focused on cohorts of one type of cardiomyopathy.

We have recently demonstrated that the sensitivity, specificity and robustness of targeted NGS for cardiomyopathies are equal to those of Sanger sequencing (SS) (Sikkema-Raddatz et al). Subsequently, we constructed an improved enrichment kit targeting 55 cardiomyopathy genes and implemented this into our routine clinical diagnostic work. Here we report on the outcome and yield when we used this gene panel in a large cohort of cardiomyopathy patients. Patients were diagnosed with various types of cardiomyopathies including DCM, ARVC, HCM, LVNC, RCM, or with phenotypic characteristics related to cardiomyopathy, but not yet classified as a specific subtype. Their DNA was screened for variants using our 55 gene-panel-based method and, after data analysis, variant filtering and prioritization, we classified the variants found with the help of a strategy developed in-house. Our hypothesis was that implementing this test into routine diagnostics would lead to: (1) a higher diagnostic yield, (2) identification of mutations in genes that were previously infrequently or never screened, and (3) more insight into the suggested bigenic or multigenic inheritance in a subset of cardiomyopathy patients.

METHODS
Patient material
DNA was isolated according to standard operating procedures from peripheral blood samples obtained from 252 cardiomyopathy patients, who were referred to our laboratory for gene-panel-based genetic analysis. Informed consent to perform the diagnostic screening was obtained from all patients. They were referred to our department by four Dutch clinical genetics centres: Groningen, Leiden, Nijmegen and Utrecht.

Targeted sequencing
DNA fragment libraries were prepared according to the manufacturer’s instructions (SureSelect XT Custom library, SureSelect Library prep kit, Agilent Technologies Inc., Santa Clara, CA, USA). The following experimental steps were performed: fragmentation of genomic DNA samples, end-repair, adapter ligation, size selection, and amplification of the purified product. Targeted enrichment was performed according to the manufacturer’s instructions (SureSelect XT Custom library, Agilent Target Enrichment kit & Agilent SureSelect MP Capture Library kit, Agilent Technologies Inc.). Hybridization of the DNA fragment libraries with the capture probes for 55 selected genes was performed, followed by purification and barcoding of the captured fragments. Finally, equimolar pools of 12 samples were prepared. Sequencing was performed on a MiSeq sequencer (Illumina, San Diego, CA, USA) using 151 bp paired-end reads according to the manufacturer’s instructions. The sample preparation, targeted enrichment and sequencing method has been described in detail by Sikkema-Raddatz et al.
Capture probes of the following 55 cardiomyopathy-related genes were included in the custom designed, targeted enrichment kit. Their respective OMIM IDs are given in brackets, and genes marked by # were recently added to the improved version of the 48-gene enrichment kit described by Sikkema-Raddatz et al: ABCC9 (*601439), ACTC1 (*102540), ACTN2 (*102573), ANKRD1 (*609599), BAG3 (*603883), CALR3 (*611414), CAV3# (*601253), CRYAB (*123590), CSRP3 (*600824), DES (*125660), DMD (*300377), DSC2 (*125645), DSG2 (*125671), DSP (*125647), DTNA# (*601239), EMD (*300384), EYA4# (*603550), GATAD1# (*614518), GLA (*300644), JPH2 (*605267), JUP (*173325), LAMA4 (*600133), LAMP2 (*309060), LDB3 (*605906), LMNA (*150330), MYBPC3 (*600958), MYH6 (*160710), MYH7 (*160760), MYL2 (*160781), MYL3 (*160790), MYPN (*608517), MYOZ1 (*605603), MYOZ2 (*605602), NEXN# (*613121), PKP2 (*602861), PLN (*172405), PRKAG2 (*602743), PSEN1 (*104311), PSEN2 (*600759), RBM20 (*613171), RYR2 (*180902), SCN5A (*600163), SGCD (*601411), SOD2# (*147460), TAZ (*300394), TBX20 (*606061), TCAP (*604488), TMEM43 (*612048), TNNC1 (*191040), TNNI3 (*191044), TNNT2 (*191045), TPM1 (*191010), TTN (*188840), TXNRD2# (*606448), VCL (*193065).

Sequence annotation and variant calling
Data analysis was performed using the MiSeq reporter program (Illumina, San Diego, CA, USA) to generate fastq.gz output files. These were uploaded to the NextGene software (v2.2.1, Softgenetics, State College, PA, USA) and upon quality filtering, aligned to the reference genome (Human_v37.2). SNPs and indels were called, and the respective variant list was converted into the *.vcf file format for further analysis.

Variant filtering, interpretation and prioritization
The *.vcf files obtained from NextGene were uploaded into the Cartagenia software (Cartagenia, Leuven, Belgium), with which variant filtering and classification were performed (as summarized in figure 1 and described in the supplementary methods). Remaining variants were evaluated for their potential pathogenicity using in silico prediction tools and data, available via the Cartagenia and Alamut programs (versions 2.3.2 and 2.3.6, respectively; Interactive Biosoftware, Rouen, France) and/or other resources (see table 1). We took various factors into account, such as the nature and location of the variants, the conservation of this area, the frequency of the variant in the general population (when it was available in any of the healthy or patient population databases), and the predicted pathogenicity of the variant according to multiple prediction programmes. Moreover, data on variants available from the scientific literature and from disease and variant databases (such as the Leiden Open Variant Databases (LOVDs) and the ARVD/C genetic variants

Figure 1: Flowchart of the Cartagenia filtering tree used to determine our final variant list for analysis. Variant filtering strategy as used by the Cartagenia software. The input variant list contained on average 168 (± 24) variants per patient. Details of the filtering steps and strategy are described in the Supplementary methods. After performing the filtering steps, an average of 8 (± 6) variants per patient remained on the final variant list. These were classified after data-mining using Cartagenia and Alamut. Variants which were classified as ‘benign’ or ‘likely benign’ are regularly being added to our in-house database of managed variants (grey feed-back loop). The respective population control cohorts were: GoNL Genome of the Netherlands; 1000G 1000 Genomes project; ESP6500 6500 exomes from the NHLBI Exome Sequencing Project (ESP); and dbSNP the dbSNP database of NCBI.

Table 1. Criteria for variant classification

Classification | Mutation type | Criteria
Benign (B) | any | MAF* >0.02
Likely benign (LB) | intronic or synonymous | No predicted# significant changes in RNA splicing
Likely benign (LB) | missense | No predicted# significant changes in RNA splicing AND no, or only ¼ of, prediction programs^ used suggest pathogenicity AND residue and surrounding residues not evolutionarily conserved
Variant of uncertain significance (VOUS) | any | Variants which do not fit into any of the other categories, or for which the available information is contradictory
Likely pathogenic (LP) | intronic | Large effect on recognition of consensus splice site (±1 and 2) predicted# in gene for which association of such mutation with phenotype has not yet been established AND MAF* <0.001 or novel
Likely pathogenic (LP) | missense or synonymous | Prediction program# suggests that mutation creates cryptic splice site with large effect AND MAF* <0.001 or novel
Likely pathogenic (LP) | missense | 3/4 or all 4 prediction programs^ used suggest pathogenicity AND residue and surrounding residues are evolutionarily conserved (at least up to chicken) AND MAF* <0.001 or novel; OR variant does not completely fulfil the above criteria but there is other evidence available, such as functional proof or co-segregation data
Likely pathogenic (LP) | nonsense or frame-shift | Truncating mutation in gene for which association of such mutation with phenotype has not yet been established AND MAF* <0.001 or novel
Pathogenic (P) | intronic | Large effect on recognition of consensus splice site (±1 and 2) predicted# in gene for which association of such mutation with phenotype has been established AND MAF* <0.001 or novel
Pathogenic (P) | missense | 3/4 or all 4 prediction programs^ used suggest pathogenicity AND residue and surrounding residues are evolutionarily conserved (at least up to chicken) AND MAF* <0.001 or novel AND additional evidence like functional proof or co-segregation data
Pathogenic (P) | nonsense or frame-shift | Truncating mutation in gene for which association of such mutation with phenotype has been established AND MAF* <0.001 or novel

Abbreviations: MAF: Minor Allele Frequency. *In population control cohorts (with at least 200 allele counts): 1000 Genomes, GoNL (Genome of the Netherlands), NHLBI Exome Sequencing Project (ESP); #RNA splicing prediction programs as provided by the Alamut software; ^Protein effect prediction programs as provided by the Cartagenia and Alamut software: SIFT, Polyphen, AGVGD, Mutation Taster

database (http://www.arvcdatabase.info)) were also taken into consideration for the classification. Based on all the available data, a final classification was performed. Our classification criteria are summarized in table 1: variants were classified as ‘benign’ (B), ‘likely benign’ (LB), ‘variant of uncertain significance’ (VOUS), ‘likely pathogenic’ (LP), or ‘pathogenic’ (P). Finally, for the purpose of this study, we considered patients who were shown to carry one or more ‘likely pathogenic’ and/or ‘pathogenic’ variants as ‘resolved cases’.
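The missense rows of table 1 can be read as a small decision procedure, and the following Python sketch illustrates one such reading. It is not the in-house strategy or the Cartagenia filtering tree itself: the names `MissenseVariant`, `classify_missense` and `is_resolved` are hypothetical, the cryptic-splice-site row is left out, and the ‘OR other evidence’ clause of the LP row is interpreted here as ‘additional evidence alone is enough for LP’, which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MissenseVariant:
    # All field names are hypothetical; they mirror the criteria of table 1.
    maf: Optional[float]              # minor allele frequency in controls, None = novel
    n_pathogenic_predictions: int     # how many of the 4 prediction programs say pathogenic
    conserved: bool                   # residue and surroundings evolutionarily conserved
    splicing_change_predicted: bool = False
    extra_evidence: bool = False      # functional proof or co-segregation data


def classify_missense(v: MissenseVariant) -> str:
    """Return B, LB, VOUS, LP or P for a missense variant (simplified reading)."""
    if v.maf is not None and v.maf > 0.02:
        return "B"
    rare = v.maf is None or v.maf < 0.001
    in_silico = v.n_pathogenic_predictions >= 3 and v.conserved and rare
    if in_silico and v.extra_evidence:
        return "P"
    if in_silico or v.extra_evidence:  # assumption: evidence alone suffices for LP
        return "LP"
    if (not v.splicing_change_predicted
            and v.n_pathogenic_predictions <= 1
            and not v.conserved):
        return "LB"
    return "VOUS"


def is_resolved(classes: List[str]) -> bool:
    """A patient counts as 'resolved' with at least one LP or P variant."""
    return any(c in ("LP", "P") for c in classes)


if __name__ == "__main__":
    novel = MissenseVariant(maf=None, n_pathogenic_predictions=4, conserved=True)
    print(classify_missense(novel), is_resolved(["VOUS", classify_missense(novel)]))
```

Anything that matches none of the explicit branches falls through to VOUS, which mirrors the catch-all definition of that category in table 1.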

Patient inclusion
We collected clinical data on 252 patients to retrospectively evaluate whether they fulfilled the formal diagnostic criteria for the respective subtypes of cardiomyopathy, as published for ARVC (Marcus et al), DCM (Mestroni et al), HCM (Gersh et al), and LVNC (Jenni et al). Patients were categorized as ‘fulfilling criteria’, ‘suspected’ (not fulfilling criteria, but showing features of the respective subtype), or ‘unconfirmed diagnosis’ (retrospective phenotype confirmation was not possible because no clinical data were available; patient categorized on the basis of the referral diagnosis to our laboratory), or excluded from the analysis if the clinical data could not confirm the suspected diagnosis of cardiomyopathy.

Statistics
Statistical calculations were performed using the Statistical Package for Social Sciences software, version 22.0 (IBM SPSS Statistics, Inc., Chicago, Illinois, USA). Descriptive statistics are reported as mean ± SD or number (percentage). Continuous variables were compared using the unpaired Student’s t test or One-way ANOVA. Discrete variables were compared using Fisher’s Exact test. Values of P<0.05 were considered statistically significant.

RESULTS
Summary of sequencing data
We conducted targeted resequencing of 55 cardiomyopathy-related genes and the subsequent data analysis, variant interpretation and prioritization (see Methods and supplementary methods) in 252 patients. We were able to analyse, on average, 99.2% of all targeted nucleotides with a coverage of at least 20x. The average coverage per nucleotide was 433 (± 161) and varied between 134 (± 52) and 1126 (± 451). After MiSeq reporter quality filtering, vcf files were uploaded into the Cartagenia software and the regions of interest were selected, after which, on average, 168 (± 24) variants per patient remained. Upon using our filtering strategy, on average, 8 (± 6) variants per patient remained (range 1 - 49 variants), which were subjected to further interpretation and prioritization.
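As an illustration of the two coverage metrics quoted above (the share of targeted nucleotides covered at least 20x and the mean coverage per nucleotide), the following Python sketch computes both from a list of per-base read depths. The function names and the toy depth values are hypothetical; in practice the depths would come from the alignment software, and this is not the pipeline used in the study.

```python
from typing import Sequence


def fraction_covered(depths: Sequence[int], min_depth: int = 20) -> float:
    """Fraction of targeted positions with read depth >= min_depth."""
    if not depths:
        raise ValueError("no target positions given")
    return sum(d >= min_depth for d in depths) / len(depths)


def mean_depth(depths: Sequence[int]) -> float:
    """Average read depth over all targeted positions."""
    if not depths:
        raise ValueError("no target positions given")
    return sum(depths) / len(depths)


if __name__ == "__main__":
    toy_depths = [0, 19, 20, 400]  # hypothetical per-base depths
    print(fraction_covered(toy_depths), mean_depth(toy_depths))
```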

Phenotype evaluation, patient inclusion and categorization
The clinical data of the 252 patients were retrospectively evaluated (see figure 2 for inclusion/exclusion criteria, phenotype evaluation, cardiomyopathy subtype categorization, and genetic diagnostic outcome). We excluded 46 patients because they had: (1) a primary arrhythmia or conduction disorder evaluated with the purpose of identifying a potentially related, but late developing, cardiomyopathy (n=19), (2) a family history of cardiomyopathy, but they did not fulfil the criteria or were not suspected of having familial cardiomyopathy (n=12), (3) vascular disease (n=4), (4) syndromal cardiomyopathy (n=2), (5) congenital heart disease (n=4), or (6) metabolic cardiomyopathy (n=2), or because they had been published elsewhere (n=3). Of the remaining 206 patients, 125 fulfilled the clinical criteria of the respective cardiomyopathy subtype (‘fulfilling criteria’), 44 did not fulfil the criteria, but showed features of the disease and were thus ‘suspected’, and 37 patients had no detailed clinical records available, hence we analysed these 37 on the basis of the referral diagnosis to our laboratory (‘unconfirmed diagnosis’) (figure 2).
The group of 125 patients who fulfilled the criteria consisted of 71 DCM, 47 HCM, 3 ARVC, 3 LVNC, and 1 RCM cases (figure 2). The group of 44 suspected patients consisted of 23 DCM, 11 HCM, 6 ARVC, 1 LVNC, and 3 unspecified CM cases (figure 2). And the group of 37 patients with an unconfirmed diagnosis consisted of 16 DCM, 17 HCM, 1 ARVC, 1 LVNC, and 2 unspecified CM cases (figure 2). For an overview of all patients, including the mutations they carry, see supplementary table 2.

Mutation spectrum
We identified 142 pathogenic or likely pathogenic mutations (113 different mutations in total) in 34 genes. Of the 113 different mutations, 13 were classified as ‘pathogenic’, while the remaining 100 were classified as ‘likely pathogenic’ (supplementary table 2).

Figure 2: Flow chart of patient inclusion, phenotype evaluation and categorization, and genetic diagnosis. After referral to our laboratory, patients were retrospectively phenotyped and categorized into cardiomyopathy subtypes (fulfilling criteria, suspected disease, or unconfirmed) and were either genetically resolved or remained without a genetic diagnosis. The numbers of patients in the different groups/categories are indicated.

Among those mutations identified in multiple patients, we saw several well-known, Dutch founder mutations: c.2373dupG (n=5), c.2864_2865delCT (n=2) and c.2827C>T; p.R943* (n=1) in MYBPC3 (Christiaans et al) and c.40_42delAGA (n=4) in PLN (van der Zwaag et al, 2012). We identified putatively truncating TTN mutations (i.e. mutations leading to a premature stop codon or nonsense, frame shift, or consensus splice site mutations) in 21/206 (10%) patients. Of these, 11/125 (9%) were identified in patients who fulfilled our criteria, with 10/71 (14%) in DCM patients and 1/47 (2%) in HCM patients. In addition, 4/44 (9%) were identified in suspected patients, with 3/23 (13%) in DCM-like patients and 1/11 (9%) in HCM-like patients.

Table 2: Diagnostic yield in patients who fulfilled the clinical criteria for a cardiomyopathy subtype

CM* subtype | negative | P | LP | positive (P + LP) | total
ARVC | 2 (67%) | 1 (33%) | 0 | 1 (33%) | 3
DCM | 32 (45%) | 4 (6%) | 36 (50%) | 40 (56%) | 72
HCM | 25 (53%) | 6 (13%) | 15 (33%) | 21 (46%) | 46
LVNC | 2 (67%) | 1 (33%) | 0 | 1 (33%) | 3
RCM | 0 | 1 (100%) | 0 | 1 (100%) | 1
CM | 0 | 0 | 0 | 0 | 0
total | 61 (49%) | 13 (10%) | 51 (41%) | 64 (51%) | 125

Table 3: Diagnostic yield in patients with a suspected cardiomyopathy subtype

CM* subtype | negative | P | LP | positive (P + LP) | total
ARVC | 3 (50%) | 2 (33%) | 1 (17%) | 3 (50%) | 6
DCM | 11 (48%) | 2 (9%) | 10 (43%) | 12 (52%) | 23
HCM | 7 (64%) | 0 | 4 (36%) | 4 (36%) | 11
LVNC | 0 | 0 | 1 (100%) | 1 (100%) | 1
RCM | 0 | 0 | 0 | 0 | 0
CM | 1 (33%) | 0 | 2 (67%) | 2 (67%) | 3
total | 22 (50%) | 4 (9%) | 18 (41%) | 22 (50%) | 44

Table 4: Diagnostic yield in patients with an unconfirmed diagnosis

CM* subtype | negative | P | LP | positive (P + LP) | total
ARVC | 1 (100%) | 0 | 0 | 0 | 1
DCM | 7 (41%) | 3 (18%) | 7 (41%) | 10 (59%) | 17
HCM | 7 (41%) | 2 (12%) | 8 (47%) | 10 (59%) | 17
LVNC | 1 (100%) | 0 | 0 | 0 | 1
RCM | 0 | 0 | 0 | 0 | 0
CM | 0 | 0 | 1 (100%) | 1 (100%) | 1
total | 16 (43%) | 5 (14%) | 16 (43%) | 21 (57%) | 37

*Abbreviations: ARVC: arrhythmogenic right ventricular cardiomyopathy, CM: cardiomyopathy unspecified, DCM: dilated cardiomyopathy, HCM: hypertrophic cardiomyopathy, LVNC: left ventricular non-compaction, LP: likely pathogenic (sometimes together with one or more LPs), P: pathogenic (sometimes together with one or more LPs), RCM: restrictive cardiomyopathy.

Finally, we found TTN mutations in 6/37 (16%) patients with an unconfirmed diagnosis, with 5/16 (31%) in DCM patients and 1/17 (6%) in HCM patients (see supplementary table 2). When categorized according to mutation type, we had 69 different missense mutations, 27 different truncating mutations (frame shift and nonsense mutations) and 17 different splice site mutations. In the 206 patients described here, we did not identify pathogenic or likely pathogenic mutations

in the ACTC1, BAG3, CAV3, CRYAB, DSG2, EYA4, GATAD1, LAMP2, MYL2, MYOZ1, MYOZ2, PRKAG2, PSEN1, PSEN2, SGCD, SOD2, TAZ, TBX20, TCAP, TMEM43, and TXNRD2 genes. Out of the 113 different disease-associated mutations that we have identified, only 32 (28%) were found in the Human Gene Mutation Database (HGMD; see supplementary table 2); in addition, six of the mutated nucleotides were known in HGMD, but reported as mutated into a different nucleotide from the one seen in our patients.

Diagnostic yield
The overall diagnostic yield for our patient cohort was 52% (107/206) (supplementary table 3). Of these, 77/206 (37%) patients carried one pathogenic or likely pathogenic variant, and 30/206 (15%) patients carried more than one mutation. At least one pathogenic mutation was identified in 22/206 (11%) patients. In the group of patients fulfilling the clinical criteria for cardiomyopathy, the diagnostic yield was 64/125 (51%) (table 2; figure 2). In patients suspected of having cardiomyopathy, this was 22/44 (50%) (table 3; figure 2). Interestingly, the diagnostic yield was the highest (although this was not significant) in our 37 patients with an unconfirmed diagnosis: 21/37 (57%) (table 4; figure 2). This could be attributed mainly to the higher yield for HCM patients (10/17; 59%) compared to the yield for patients who fulfilled the clinical criteria for HCM and the yield for suspected HCM cases. Pathogenic mutations were identified in 13/125 (10%) patients who fulfilled the criteria, in 4/44 (9%) suspected cases, and in 5/37 (14%) patients with an unconfirmed diagnosis. When cardiomyopathy subtypes were taken into consideration, differences between the ‘fulfilling criteria’ and ‘suspected’ subcategories were observed for both the DCM and DCM-like, and the HCM and HCM-like patients: for both subtypes, we observed a higher, although not statistically significant, diagnostic yield in the group of patients fulfilling the clinical criteria than in those who were only suspected of having the subtype. In 40/72 (56%) DCM patients who fulfilled the criteria and in 12/23 (52%) suspected DCM patients, we identified pathogenic or likely pathogenic mutations (figure 2; tables 2 and 3). Likewise, the diagnostic yield was 21/46 (46%) for HCM patients who fulfilled the criteria and 4/11 (36%) for HCM-like patients (figure 2; tables 2 and 3). Unfortunately, we could not make similar comparisons for the other cardiomyopathy subtypes because the numbers of patients were too low.
It is important to note, however, that we found an underlying genetic cause of disease in 3/4 (75%) of the patients with an unspecified cardiomyopathy (3 from the ‘suspected’ group and 1 from the ‘unconfirmed diagnosis’ group).
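The comparison of yields between patient groups can be reproduced in outline with a two-sided Fisher's exact test on a 2×2 table. The Python sketch below uses only the standard library and the counts from tables 2 and 4 (64 resolved and 61 unresolved patients fulfilling the criteria; 21 resolved and 16 unresolved patients with an unconfirmed diagnosis). It is an illustration under these assumptions, not the SPSS analysis performed in the study, and the function names are hypothetical.

```python
from math import comb


def diagnostic_yield(resolved: int, total: int) -> float:
    """Share of patients carrying at least one LP or P variant."""
    return resolved / total


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))  # tolerance for float ties
    return min(1.0, total)


if __name__ == "__main__":
    # Fulfilling criteria: 64 resolved, 61 not; unconfirmed: 21 resolved, 16 not.
    p = fisher_exact_two_sided(64, 61, 21, 16)
    print(f"yield {diagnostic_yield(64, 125):.0%} vs {diagnostic_yield(21, 37):.0%}, "
          f"P = {p:.2f}")
```

With these counts the P value is well above 0.05, in line with the statement that the higher yield in the unconfirmed group was not significant.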

Effect of including genes that were previously infrequently or never screened for

To gain insight into the effect on our diagnostic yield of including less prevalent genes in the panel, we investigated how many patients were found to carry mutations in genes that were routinely screened in the pre-NGS era (the ‘pre-NGS genes’), and compared this with the number of patients with mutations in the less prevalent genes (i.e. those genes previously not included in routine diagnostic screening). The following genes were considered as the ‘pre-NGS genes’ in the DCM, HCM and ARVC subtypes: LMNA, MYBPC3, MYH6, MYH7, PLN, SCN5A and TNNT2 in DCM (Wilde & Behr, Posafalvi et al, Hershberger & Siegfried); ACTC1, MYBPC3, MYH7, MYL2, MYL3, TNNI3, TNNT2 and TPM1 in HCM (Wilde & Behr, and Pinto et al); and DSC2, DSG2, DSP, JUP, PKP2 and PLN in ARVC (Quarta et al, Wilde & Behr, te Rijdt et al). Mutations in the ‘pre-NGS genes’ were identified in 34% (21/62) of DCM and DCM-like cases (38% in criteria-positive cases), 40% (14/35) of HCM and HCM-like cases (48% in criteria-positive cases), and 75% (3/4) of ARVC and ARVC-like cases, which means that 25-66% of cases are explained by mutations in genes previously rarely investigated in the particular type of cardiomyopathy. Notably, when multiple mutations were identified in a patient and these included a ‘pre-NGS gene’, the patient was counted as resolved by a ‘pre-NGS gene’.

We also evaluated how many patients were now found to carry mutations (i.e. at least one if multiple mutations were found) in genes that were previously not reported as being involved in that specific subtype (based upon van Tintelen et al). As expected given the large number of genes reported to be involved in DCM, only two patients (2/62; 3%) were found to carry a mutation in a ‘non-DCM’ gene (CALR3 and MYL3, both in criteria-positive cases). In HCM and HCM-like cases, 6/35 (17%) patients carried mutations in genes not previously reported in HCM (in 3 criteria-positive cases: LAMA4, ABCC9, and DSP; one ‘suspected’ case: DES & DTNA (both in one patient); and 2 ‘unconfirmed’ cases: DSP & RYR2 (both in one patient) and RYR2). In the other subtypes, this was 1/4 (25%) in ARVC and ARVC-like patients (DMD in the ‘ARVC-like’ patient), and 1/2 (50%) in LVNC and LVNC-like cases (JUP in the ‘LVNC-like’ case).
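The counting rule used for the ‘pre-NGS genes’ (a patient counts as resolved by a ‘pre-NGS gene’ as soon as any of his or her mutations lies in such a gene) can be written down compactly. The following is a minimal sketch under that reading of the text; the gene sets are copied from the paragraph above, while the function names and the example patients are hypothetical and do not correspond to patients in this study.

```python
# 'Pre-NGS' gene sets per subtype, as listed in the text.
PRE_NGS_GENES = {
    "DCM": {"LMNA", "MYBPC3", "MYH6", "MYH7", "PLN", "SCN5A", "TNNT2"},
    "HCM": {"ACTC1", "MYBPC3", "MYH7", "MYL2", "MYL3", "TNNI3", "TNNT2", "TPM1"},
    "ARVC": {"DSC2", "DSG2", "DSP", "JUP", "PKP2", "PLN"},
}

def resolved_by_pre_ngs(subtype, mutated_genes):
    """True if at least one mutated gene is in the subtype's pre-NGS set."""
    return bool(PRE_NGS_GENES[subtype] & set(mutated_genes))

def pre_ngs_fraction(patients):
    """Return (resolved, total, percent) for (subtype, genes) tuples."""
    resolved = sum(resolved_by_pre_ngs(s, g) for s, g in patients)
    total = len(patients)
    return resolved, total, round(100 * resolved / total)

# Hypothetical mutation-positive HCM patients, for illustration only.
example = [
    ("HCM", ["MYBPC3"]),        # single mutation in a pre-NGS gene
    ("HCM", ["LAMA4"]),         # single mutation in a newly screened gene
    ("HCM", ["MYH7", "RYR2"]),  # multiple mutations, one in a pre-NGS gene
    ("HCM", ["DSP", "RYR2"]),   # multiple mutations, none in a pre-NGS gene
]
print(pre_ngs_fraction(example))

# Percentages reported in the text, recomputed from the counts.
reported = {"DCM": (21, 62), "HCM": (14, 35), "ARVC": (3, 4)}
for subtype, (hits, total) in reported.items():
    print(subtype, f"{round(100 * hits / total)}%")
```

Note that the same gene can be a ‘pre-NGS gene’ for one subtype and not for another (DSP for ARVC but not for HCM), which is why the lookup is keyed by subtype.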

Bigenic or multigenic inheritance

Finally, a group of 30/206 patients (15%) carried multiple pathogenic or likely pathogenic variants: 15/125 (12%) of patients who fulfilled the criteria, 7/44 (16%) of the ‘suspected’ patients, and 8/37 (22%) of patients with an unconfirmed diagnosis. In the category of criteria-positive cases, 12 patients carried two pathogenic/likely pathogenic mutations (2 patients had one pathogenic and one likely pathogenic mutation) and 3 patients carried three pathogenic/likely pathogenic mutations (1 with one pathogenic and two likely pathogenic mutations; 2 with three likely pathogenic mutations). The seven patients in the suspected category all carried two likely pathogenic mutations. In the unconfirmed category, six patients carried two pathogenic/likely pathogenic mutations (3 patients with one pathogenic and one likely pathogenic mutation, and 3 with two likely pathogenic mutations) and two patients carried three likely pathogenic mutations.

Patient characteristics

Of the 206 patients we included, 126 (61%) were male and 80 (39%) female. There were no significant differences in sex distribution between the three patient categories (see supplementary table 4). The mean age at diagnosis did not differ significantly between patients without a mutation, those carrying one mutation, and those with multiple mutations: 52 (± 13) years, 48 (± 18) years and 48 (± 19) years, respectively (for age at diagnosis see supplementary table 2). In addition, no significant differences in the age at diagnosis were noted among patients who fulfilled the clinical criteria when comparing the subgroups without a mutation, carrying a single mutation, or carrying multiple mutations. This was also true for the suspected and unconfirmed patient groups.

In the subcategory of patients who fulfilled the clinical criteria, we evaluated the left ventricular ejection fraction (LVEF) in DCM patients and the interventricular septum thickness (IVS) in HCM patients, and investigated potential differences between patients carrying no, one or multiple mutations (see supplementary tables 5 and 6). Comparable analyses within the other subcategories were not relevant as patient numbers were too low. In the DCM subcategory there was no significant difference between the mean LVEF for patients carrying no, one or multiple mutations (supplementary table 5). We also observed no difference in the mean IVS between HCM patients who carried no mutation and those who carried one mutation: 19.5 (± 4.4) mm versus 18.2 (± 2.9) mm (p=0.29) (supplementary table 6). The IVS was available for only 2/4 HCM cases with multiple mutations, so a comparison was not appropriate.

DISCUSSION

The diagnostic yield of our gene-panel-based NGS method was more than half (52%) across all the cases of hereditary cardiomyopathy: 51% in patients who fulfilled the clinical criteria, 50% in patients suspected of having a cardiomyopathy, and 57% in patients with an unconfirmed diagnosis. A substantial increase in diagnostic yield was observed particularly for DCM and DCM-like patients: 56% for criteria-fulfilling, 52% for suspected and 56% for unconfirmed cases, in comparison to the yield of 20% in a cohort of 418 Sanger-sequenced patients (van Spaendonck et al, 2013). Our yield lies between those reported by two recent studies which used NGS in large DCM cohorts: 37% (121 patients; Pugh et al) and 73% (639 patients; Haas et al). The difference between these two studies may, in part, be explained by the number of genes that they screened for: 46 and 84 genes, respectively. In addition, it might reflect differences in the pathogenicity classification criteria used. Pugh et al included variants in an additional class, the ‘VOUS-favour pathogenic’ class, which we would mostly have included in our ‘likely pathogenic’ class. Moreover, Haas et al included disease-causing mutations according to the HGMD database, which means they may also have included variants that were later proven not to be causal, like the c.419C>T, p.Ser140Phe variant in PKP2, which was recently shown not to co-segregate with disease in a Dutch ARVC family (Groeneweg et al). Compared to previous results obtained by Sanger sequencing of the most prevalent genes, our approach leads to a significantly higher diagnostic yield, which can be largely attributed to truncating mutations identified in the TTN gene. In cases which fulfilled the criteria, the percentage of TTN mutations was 14%, comparable to the cohorts studied by Pugh et al and Haas et al, but slightly lower than previously reported (18% in sporadic cases and 25% in familial DCM cases; Herman et al).

The diagnostic yield for HCM and HCM-like cases was 46% for those that fulfilled the clinical criteria, 36% for suspected cases, and 59% for unconfirmed cases. The yield from criteria-positive cases is comparable to that seen in Dutch HCM patients in the pre-NGS era (approx. 50%; Christiaans et al). Notably, Lopes et al have also reported a slightly higher diagnostic yield of 57% after high-throughput sequencing of 41 genes in 223 HCM patients. The fact that the diagnostic yield in criteria-positive cases is not significantly higher after gene-panel-based sequencing in our study is rather surprising, certainly given that mutations were identified in the most prevalent HCM genes (i.e. the genes which were also screened in the pre-NGS era) in only 48% of these patients. This lack of increase in yield can probably be explained by the fact that HCM screening has been available in the Netherlands since 1996, and many of the more severe HCM cases have already undergone genetic diagnostic screening. In line with this observation, Hofman et al reported that the yield of DNA testing in arrhythmia syndromes has dropped over a 15-year period. They attributed this to more patients with unclear diagnoses, often from relatively small families, being referred for DNA testing over time.

Slightly higher diagnostic yields were observed in patients who fulfilled the clinical criteria for DCM and HCM (see tables 2 and 3) compared to the yield in the suspected patients: 56% versus 52% for DCM and DCM-like cases, and 46% versus 36% for HCM and HCM-like cases. However, as these differences are not statistically significant, larger cohorts need to be studied to reveal whether diagnostic yields are indeed higher in patients who fulfil the clinical criteria. Importantly, a pathogenic or likely pathogenic mutation was also identified in a relatively large number of patients suspected of having cardiomyopathy. In a 2010 position paper, the European Society of Cardiology working group on Myocardial and Pericardial Diseases stated that genetic testing is not indicated for the diagnosis of a borderline or doubtful cardiomyopathy except for selected cases in the setting of expert teams (Charron et al). However, the identification of a mutation in criteria-negative cases may be helpful in directing further diagnostic work and family screening.
In ARVC, for example, the identification of a pathogenic mutation is one of the task force’s major criteria (Marcus et al).

One remarkable observation made in this study is that a high diagnostic yield was found within the subcategory of unconfirmed diagnoses, mainly in HCM and DCM patients. This might be explained by the fact that these patients were referred by other Dutch clinical genetics centres or by our own cardiogenetics team, who offer consultations to patients in other (regional) hospitals who have been selected by the local cardiologists. It is possible that the local physicians used higher phenotypic thresholds before referring patients for genetic testing.

A substantial number of pathogenic or likely pathogenic mutations were identified in genes that would rarely or never have been analysed in the pre-NGS era: only 40-50% of our mutations were identified in ‘pre-NGS genes’. A comparable finding was reported by Haas et al, who frequently found mutations in desmosomal, channelopathy and HCM genes in their DCM cohort; this is a subset of genes that would not have been screened for in this patient group, or only irregularly, in the pre-NGS era. A considerable proportion of these mutations were identified in the titin (TTN) gene: in 10% of our total cohort and in up to 15% of DCM patients who fulfilled the clinical criteria. Because of its large size, the TTN gene was hardly ever examined in detail in the pre-NGS era, but new technologies now enable its routine screening. As reported before by Herman et al, we also identified truncating TTN variants in HCM and HCM-like cases, but in a percentage that did not differ from healthy controls. This underscores the importance of co-segregation analyses in larger patient cohorts to find further support for the pathogenicity of these mutations, as recently published for some families with truncating TTN mutations (van Spaendonck et al, 2014). Similarly, the role of likely pathogenic mutations in other rarely studied genes should be further investigated, including through co-segregation analyses. As expected, we also identified known Dutch founder mutations: for instance, the c.40_42delAGA; p.(Arg14del) mutation in the PLN gene (van der Zwaag et al, 2012; 2013) was identified in four cases and the c.2373dupG mutation in MYBPC3 (Alders et al, Michels et al, Christiaans et al) was identified in five cases. In contrast, only 30% of the mutations we identified were already known from the HGMD database, which underscores the importance of thorough data-mining and interpretation, and of the sharing of data for the careful classification of variants.
Another interesting observation was that 30/206 index patients (15%) were found to carry two or more pathogenic or likely pathogenic variants. Since earlier DNA diagnostic work was stopped once a pathogenic or likely pathogenic variant was found in one of the candidate genes, this is a highly interesting finding, and it provides support for the multigenic background of cardiomyopathies, which has recently been addressed by Roncarati et al, Bauce et al, Xu et al, Bao et al, and Rigato et al. The phenotype in multiple mutation carriers is generally believed to be more severe and/or to manifest at a younger age. However, we did not observe either of these. On the one hand, this might be because our group of patients was too small, or because several severe cases in which we identified only one mutation using our approach were actually bi- or multigenic, carrying additional mutations in genes not included in our gene panel. We also cannot exclude that some of these variants were wrongly classified and are not truly pathogenic. Future co-segregation and functional analysis should offer more insight into this. On the other hand, we cannot rule out that in some patients with multiple likely pathogenic mutations, their cardiomyopathy only developed because of the presence of more than one mutation, and that these mutations individually would not have resulted in disease development, or only at very old age (bi-/oligogenic inheritance).

Finally, a genetic diagnosis is still lacking for approximately 48% of our patients. There are several possible reasons for this: (1) the major cause of disease in some patients is not genetic; (2) mutations lie in regulatory regions of currently screened genes or in regulatory RNAs; (3) patients might carry deletions/duplications of one or more exons and these cannot be detected with the current approach; (4) patients may have causal mutations in other unknown or rare cardiomyopathy genes; and (5) the underlying cause is truly oligogenic and therefore difficult or impossible to deduce from gene-panel-based NGS analyses. Other strategies are needed to identify the missing inheritance in those patients, including exome or genome sequencing strategies, and/or RNA sequencing, and where applicable these strategies could be combined with linkage or linkage-like methods (van der Zwaag et al, 2011).

Together, our gene-panel-based approach allows a more complete identification of disease-causing mutations in cardiomyopathy patients. We show that this is a valuable tool for routine diagnostics, and that it will facilitate more accurate and/or personalized counselling of patients and their families. Our approach results in a substantial increase in the diagnostic yield for DCM patients compared to the results from Sanger sequencing of the most prevalent genes (>50% vs. 20-25%).
In addition, slightly higher diagnostic yields were achieved for patients fulfilling the DCM and HCM clinical criteria compared to patients with suspected disease; however, this must be confirmed in larger cohorts. Finally, our gene-panel-based approach enables the large-scale exploration of rare genes and multiple mutations underlying the inherited cardiomyopathies, for which the clinical relevance now has to be validated.

ACKNOWLEDGEMENTS

We would like to thank the clinical geneticists, genetic counsellors and cardiologists for counselling and referring their patients to the Department of Genetics, UMCG, for routine diagnostic screening; the molecular genetics team of the Genome Diagnostics section for technical assistance; staff members of the Genome Diagnostics section for help in variant interpretation and classification; and Jackie Senior for editing this manuscript.

SOURCES OF FUNDING

This study was supported by a grant from the “Doelmatigheidsfonds” of the University Medical Center Groningen (to JDH Jongbloed, JP van Tintelen and RJ Sinke); a grant from the NutsOhra Foundation (project 0903-41 to JDH Jongbloed, MP van den Berg, JP van Tintelen and RJ Sinke); and grants 2007B132 and 2010B164 from the Netherlands Heart Foundation (to JDH Jongbloed, PA van der Zwaag and JP van Tintelen).

Disclosures: The authors declare no conflicts of interest.

REFERENCES

Alders M, Jongbloed R, Deelen W et al. The 2373insG mutation in the MYBPC3 gene is a founder mutation, which accounts for nearly one-fourth of the HCM cases in the Netherlands. Eur Heart J 2003;24:1848-53
Almomani R, van der Heijden J, Ariyurek Y et al. Experiences with array-based sequence capture; toward clinical applications. Eur J Hum Genet 2011;19:50-55
Bao JR, Wang JZ, Yao Y et al. Screening of pathogenic genes in Chinese patients with arrhythmogenic right ventricular cardiomyopathy. Chin Med J (Engl) 2013;126:4238-41
Bauce B, Nava A, Beffagna G et al. Multiple mutations in desmosomal proteins encoding genes in arrhythmogenic right ventricular cardiomyopathy/dysplasia. Heart Rhythm 2010;7:22-29
Charron P, Arad M, Arbustini E et al. European Society of Cardiology Working Group on Myocardial and Pericardial Diseases Genetic counselling and testing in cardiomyopathies: a position statement of the European Society of Cardiology Working Group on Myocardial and Pericardial Diseases. Eur Heart J 2010;31(22):2715-26
Christiaans I, Nannenberg EA, Dooijes D et al. Founder mutations in hypertrophic cardiomyopathy patients in the Netherlands. Neth Heart J 2010;18:248-54
Cox MG, van der Zwaag PA, van der Werf C et al. Arrhythmogenic right ventricular dysplasia/cardiomyopathy: pathogenic desmosome mutations in index-patients predict outcome of family screening: Dutch arrhythmogenic right ventricular dysplasia/cardiomyopathy genotype-phenotype follow-up study. Circulation 2011;123(23):2690-700
Gersh BJ, Maron BJ, Bonow RO et al. ACCF/AHA guideline for the diagnosis and treatment of hypertrophic cardiomyopathy: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Practice Guidelines. Circulation 2011;124:e783-e831
Gowrisankar S, Lerner-Ellis JP, Cox S et al. Evaluation of second-generation sequencing of 19 dilated cardiomyopathy genes for clinical applications. J Mol Diagn 2010;12:818-27
Groeneweg JA, van der Zwaag PA, Jongbloed JD et al. Left-dominant arrhythmogenic cardiomyopathy in a large family: associated desmosomal or nondesmosomal genotype? Heart Rhythm 2013;10:548-59
Haas J, Frese KS, Peil B et al. Atlas of the clinical genetics of human dilated cardiomyopathy. Eur Heart J 2014; pii: ehu301. [Epub ahead of print]
Harakalova M, Mokry M, Hrdlickova B et al. Multiplexed array-based and in-solution genomic enrichment for flexible and cost-effective targeted next-generation sequencing. Nat Protoc 2011;6(12):1870-86
Herman DS, Lam L, Taylor MRG et al. Truncations of titin causing dilated cardiomyopathy. N Engl J Med 2012;366:619-28
Hershberger RE & Siegfried DE. Update 2011: Clinical and genetic issues in familial cardiomyopathy. J Am Coll Cardiol 2011;57:1641-9
Hoedemaekers YM, Caliskan K, Michels M et al. The importance of genetic counseling, DNA diagnostics, and cardiologic family screening in left ventricular noncompaction cardiomyopathy. Circ Cardiovasc Genet 2010;3(3):232-9
Hofman N, Tan HL, Alders M et al. Yield of molecular and clinical testing for arrhythmia syndromes: report of 15 years’ experience. Circulation 2013;128:1513-21
Jenni R, Oechslin E, Schneider J et al. Echocardiographic and pathoanatomical characteristics of isolated left ventricular non-compaction: a step towards classification as a distinct cardiomyopathy. Heart 2001;86(6):666-71
Jongbloed JDH, Pósafalvi A, Kerstjens-Frederikse WS et al. New clinical molecular diagnostic methods for congenital and inherited heart disease. Expert Opin Med Diagn 2011;5(1):9-24
Lopes LR, Zekavati A, Syrris P et al. Genetic complexity in hypertrophic cardiomyopathy revealed by high-throughput sequencing. J Med Genet 2013;50(4):228-39
Marcus FI, McKenna WJ, Sherrill D et al. Diagnosis of arrhythmogenic right ventricular cardiomyopathy/dysplasia: proposed modification of the task force criteria. Circulation 2010;121:1533-41
Meder B, Haas J, Keller A et al. Targeted next-generation sequencing for the molecular genetic diagnostics of cardiomyopathies. Circ Cardiovasc Genet 2011;4:110-22
Mestroni L, Maisch B, McKenna WJ et al. Guidelines for the study of familial dilated cardiomyopathy. Eur Heart J 1999;20:93-102
Michels M, Soliman OII, Kofflard MJ et al. Diastolic abnormalities as the first feature of hypertrophic cardiomyopathy in Dutch myosin-binding protein C founder mutations. JACC Cardiovasc Imaging 2009;2:58-64
Mook ORF, Haagmans MA, Soucy JF et al. Targeted sequence capture and GS-FLX Titanium sequencing of 23 hypertrophic and dilated cardiomyopathy genes: implementation into diagnostics. J Med Genet 2013; doi: 10.1136/jmedgenet-2012-101231
Pinto YM, Wilde AA, van Rijsingen IA et al. Clinical utility gene card for: hypertrophic cardiomyopathy (type 1-14). Eur J Hum Genet 2011;19(8). doi: 10.1038/ejhg.2010.243
Posafalvi A, Herkert JC, Sinke RJ et al. Clinical utility gene card for: dilated cardiomyopathy (CMD). Eur J Hum Genet 2013;21(10). doi: 10.1038/ejhg.2012.276
Pugh TJ, Kelly MA, Gowrisankar S et al. The landscape of genetic variation in dilated cardiomyopathy as surveyed by clinical DNA sequencing. Genet Med 2014;16(8):601-8
Quarta G, Muir A, Pantazis A et al. Familial evaluation in arrhythmogenic right ventricular cardiomyopathy: impact of genetics and revised task force criteria. Circulation 2011;123:2701-19
Querfurth R, Fischer A, Schweiger MR et al. Creation and application of immortalized bait libraries for targeted enrichment and next-generation sequencing. Biotechniques 2012;52(6):375-80
Rigato I, Bauce B, Rampazzo A et al. Compound and digenic heterozygosity predicts life-time arrhythmic outcome and sudden cardiac death in desmosomal gene-related arrhythmogenic right ventricular cardiomyopathy. Circ Cardiovasc Genet 2013;6:533-42
Roncarati P, Viviani Anselmi C, Krawitz P et al. Doubly heterozygous LMNA and TTN mutations revealed by exome sequencing in a severe form of dilated cardiomyopathy. Eur J Hum Genet 2013;21(10):1105-11
Shearer AE, Hildebrand MS, Smith RJ. Solution-based targeted genomic enrichment for precious DNA samples. BMC Biotechnol 2012;4;12:20
Sikkema-Raddatz B, Johansson LF, de Boer EN et al. Targeted next generation sequencing can replace Sanger sequencing in clinical diagnostics. Hum Mut 2013;34(7):1035-42
Teekakirikul P, Kelly MA, Rehm HL et al. Inherited cardiomyopathies – Molecular genetics and clinical genetic testing in the postgenomic era. J Mol Diagn 2013;15:158-70
te Rijdt WP, Jongbloed JD, de Boer RA et al. Clinical utility gene card for: arrhythmogenic right ventricular cardiomyopathy (ARVC). Eur J Hum Genet 2014;22(2). doi: 10.1038/ejhg.2013.124
van Tintelen JP, Pieper PG, van Spaendonck-Zwarts KY. Pregnancy, cardiomyopathies, and genetics. Cardiovasc Res 2014;101(4):571-8. doi: 10.1093/cvr/cvu014
van der Zwaag PA, van Tintelen JP, Gerbens F et al. Haplotype sharing test maps genes for familial cardiomyopathies. Clin Genet 2011;79:459-67
van der Zwaag PA, van Rijsingen IAW, Asimaki A et al. Phospholamban R14del mutation in patients diagnosed with dilated cardiomyopathy or arrhythmogenic right ventricular cardiomyopathy: evidence supporting the concept of arrhythmogenic cardiomyopathy. Eur J Heart Fail 2012;14:1199-207
van der Zwaag PA, van Rijsingen IAW, de Ruiter R et al. Recurrent and founder mutations in the Netherlands – Phospholamban p.Arg14del mutation causes arrhythmogenic cardiomyopathy. Neth Heart J 2013;21:286-293
van Spaendonck-Zwarts KY, Posafalvi A, van den Berg MP et al. Titin gene mutations are common in families with both peripartum cardiomyopathy and dilated cardiomyopathy. European Heart Journal 2014;35(32):2165-73
van Spaendonck-Zwarts KY, van Tintelen JP, van Veldhuisen DJ et al. Peripartum cardiomyopathy as a part of familial dilated cardiomyopathy. Circulation 2010;121(20):2169-75
van Spaendonck-Zwarts KY, van Rijsingen IA, van den Berg MP et al. Genetic analysis in 418 index patients with idiopathic dilated cardiomyopathy: overview of 10 years’ experience. Eur J Heart Fail 2013;15:628-36
Voelkerding KV, Dames S, Durtschi JD. Next generation sequencing for clinical diagnostics-principles and application to targeted resequencing for hypertrophic cardiomyopathy: a paper from the 2009 William Beaumont Hospital Symposium on Molecular Pathology. J Mol Diagn 2010;12:539-51
Wilde AA & Behr ER. Genetic testing for inherited cardiac disease. Nat Rev Cardiol 2013;10:571-83
Xu T, Yang Z, Vatta M et al. Compound and digenic heterozygosity contributes to arrhythmogenic right ventricular cardiomyopathy. J Am Coll Cardiol 2010;55:587-97
Zimmerman RS, Cox S, Lakdawala NK et al. A novel custom resequencing array for dilated cardiomyopathy. Genet Med 2010;12:268-78

SUPPLEMENTARY MATERIAL

Supplementary methods: Variant filtering

A classification tree was developed using software from Cartagenia (Leuven, Belgium), in which successive filtering steps were applied as described below. First, variants were filtered for the regions of interest, which included all exons of the 55 targeted cardiomyopathy genes with their respective +/- 20 bp flanking intronic sequences. Next, quality filtering of the called variants was performed, excluding all those which were identified with a read depth <20x. In the next step, variants were filtered against our in-house list of ‘managed variants’, which is regularly updated and contains previously identified and validated variants, including polymorphisms (‘benign’ variants) and sequencing artefacts. This was followed by excluding any variants that were present with an allele frequency ≥2%, with a minimum of 200 alleles screened, in cohorts of ostensibly healthy controls: (1) the Genome of the Netherlands, the ‘1000 genome database’ of healthy Dutch individuals (http://www.genoomvannederland.nl); (2) the 1000 Genomes project (www.1000genomes.org), i.e. variants identified in the 2184 genomes from the 1000 Genomes project; and (3) the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/; status: validated). In addition, those variants present with an allele frequency ≥5% (again with a minimum of 200 alleles screened) in the ESP6500 database (NHLBI Exome Sequencing Project (ESP); http://evs.gs.washington.edu/EVS/; variants identified during exome sequencing of 6500 individuals) were excluded. In the latter case, a higher frequency cut-off was selected, as this database contains the exomes of patients with cardiovascular diseases. As the final filter, we used an additional ‘managed variant list’ containing ‘likely benign’ variants. These had either been previously identified as such in our validation series of targeted enrichment sequencing (Sikkema-Raddatz et al), or were frequently seen in our patient samples (but in fewer than 20% of those samples) and predicted in silico to be likely benign. Variants identified in ≥20% of patient samples were incorporated into our in-house database of polymorphisms, ‘benign’ variants, or artefacts, depending on the nature of the variant.
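The order of the filtering steps described in these supplementary methods can be illustrated with a short sketch. This is not the Cartagenia classification tree itself and does not use its interface; the data structure, the function names, the database labels used as dictionary keys and the example variants are all hypothetical. Only the thresholds (read depth <20x, allele frequency ≥2% or ≥5%, at least 200 alleles screened) are taken from the text.

```python
from dataclasses import dataclass, field

MIN_DEPTH = 20      # variants with a read depth < 20x are excluded
MIN_ALLELES = 200   # a frequency only counts if >= 200 alleles were screened
AF_CUTOFFS = {      # database label (assumed) -> allele-frequency cut-off
    "GoNL": 0.02,
    "1000G": 0.02,
    "dbSNP": 0.02,
    "ESP6500": 0.05,  # higher cut-off, as described in the text
}

@dataclass
class Variant:
    gene: str
    change: str                    # e.g. a cDNA-level description
    in_region_of_interest: bool    # exon or +/- 20 bp flanking intron
    read_depth: int
    # database label -> (allele frequency, number of alleles screened)
    frequencies: dict = field(default_factory=dict)

def first_failed_filter(variant, managed, likely_benign):
    """Return the name of the first filter that removes the variant, else None."""
    key = (variant.gene, variant.change)
    if not variant.in_region_of_interest:
        return "outside region of interest"
    if variant.read_depth < MIN_DEPTH:
        return "low read depth"
    if key in managed:
        return "managed variant"
    for db, cutoff in AF_CUTOFFS.items():
        af, alleles = variant.frequencies.get(db, (0.0, 0))
        if alleles >= MIN_ALLELES and af >= cutoff:
            return f"common in {db}"
    if key in likely_benign:
        return "likely benign list"
    return None

def filter_variants(variants, managed=frozenset(), likely_benign=frozenset()):
    """Keep only the variants that pass every filtering step."""
    return [v for v in variants
            if first_failed_filter(v, managed, likely_benign) is None]

# Hypothetical variants, for illustration only.
variants = [
    Variant("GENE_A", "c.1A>G", True, 85),
    Variant("GENE_A", "c.2C>T", True, 12),
    Variant("GENE_B", "c.3G>A", True, 60, {"GoNL": (0.03, 996)}),
    Variant("GENE_B", "c.4T>C", True, 60, {"ESP6500": (0.03, 13000)}),
    Variant("GENE_C", "c.5A>T", True, 60, {"1000G": (0.50, 100)}),
    Variant("GENE_C", "c.6G>C", False, 90),
]
kept = filter_variants(variants)
print([f"{v.gene} {v.change}" for v in kept])
```

Returning the name of the first failed filter, rather than a plain yes/no, makes it easy to report why a variant was dropped, which is useful when the managed variant lists are updated and earlier decisions need to be reviewed.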

Supplementary Table 1: Targeted cardiomyopathy genes

Gene      Chromosome   Basepair position* (start - end)
NEXN**    1            78354198 - 78409580
LMNA      1            156084670 - 156108971
TNNT2     1            201328298 - 201346845
PSEN2     1            227058923 - 227083365
ACTN2     1            236849934 - 236925959
RYR2      1            237205782 - 237996012
TTN       2            179391699 - 179672188
DES       2            220283145 - 220290507
CAV3      3            8733800 - 8841808
TMEM43    3            14166654 - 14183335
SCN5A     3            38595730 - 38674890
MYL3      3            46899317 - 46899317
TNNC1     3            52485251 - 52488071
MYOZ2     4            120056899 - 120107411
SGCD      5            155753727 - 156186441
DSP       6            7542109 - 7586986
LAMA4     6            112430565 - 112575868
PLN       6            118879948 - 118879948
EYA4      6            133561736 - 133853258
SOD2      6            160090089 - 160183561
TBX20     7            35242002 - 35293271
GATAD1    7            92076767 - 92088150
PRKAG2    7            151254178 - 151573745
MYPN      10           69881155 - 69970283
MYOZ1     10           75391372 - 75401555
VCL       10           75757926 - 75878001
LDB3      10           88428388 - 88492804
ANKRD1    10           92672493 - 92681072
RBM20     10           112404173 - 112595790
BAG3      10           121411148 - 121437369
CSRP3     11           19204110 - 19223629
MYBPC3    11           47352917 - 47374293
CRYAB     11           111779310 - 111782513
ABCC9     12           21953938 - 22089668
PKP2      12           32945260 - 33049705
MYL2      12           111348584 - 111358444
MYH6      14           23851159 - 23877526
MYH7      14           23881907 - 23904910
PSEN1     14           73614463 - 73686082
ACTC1     15           3508225 - 35087049
TPM1      15           63334989 - 63363411
TCAP      17           37821573 - 37822407
JUP       17           39911956 - 39928146
DSC2      18           28647949 - 28682428
DSG2      18           29078175 - 29126804
DTNA      18           32073254 - 31471808
CALR3     19           16589835 - 16606980
TNNI3     19           55663096 - 55668997
JPH2      20           42743396 - 42789087
TXNRD2    22           19863040 - 19929515
DMD       X            31139907 - 33357766
GLA       X            100652739 - 100663041
LAMP2     X            119565097 - 119603064
EMD       X            153607805 - 153609597
TAZ       X            153640141 - 153649402

List of genes included in the targeted Sure Select enrichment kit.
*Base pair position according to NCBI build 37 (UCSC hg19).
**Gene names: NEXN Nexilin, rat, homolog of; LMNA Lamin A/C; TNNT2 Troponin T2, cardiac; PSEN2 Presenilin 2; ACTN2 Actinin, alpha-2; RYR2 Ryanodine receptor 2; TTN Titin; DES Desmin; CAV3 Caveolin 3; TMEM43 Transmembrane protein 43; SCN5A Sodium channel, voltage gated, type V, alpha subunit; MYL3 Myosin, light chain 3, alkali, ventricular, skeletal, slow; TNNC1 Troponin C, slow; MYOZ2 Myozenin 2; SGCD Sarcoglycan, delta; DSP Desmoplakin; LAMA4 Laminin alpha-4; PLN Phospholamban; EYA4 eye absent 4; SOD2 superoxide dismutase 2, mitochondrial; TBX20 T-box 20; GATAD1 GATA zinc finger domain-containing protein 1; PRKAG2 Protein kinase, AMP-activated, noncatalytic, gamma-2; MYPN Myopalladin; MYOZ1 Myozenin 1; VCL Vinculin; LDB3 LIM domain-binding 3; ANKRD1 Ankyrin repeat domain-containing protein 1; RBM20 RNA-binding protein 20; BAG3 BCL2-associated athanogene 3; CSRP3 Cysteine- and glycine-rich protein 3; MYBPC3 Myosin-binding protein C, cardiac; CRYAB Crystallin, alpha-B; ABCC9 ATP-binding cassette, subfamily C, member 9; PKP2 Plakophilin 2; MYL2 Myosin, light chain 2, regulatory, cardiac, slow; MYH6 Myosin, heavy chain 6, cardiac muscle, alpha; MYH7 Myosin, heavy chain 7, cardiac muscle, beta; PSEN1 Presenilin 1; ACTC1 Actin, alpha, cardiac muscle; TPM1 Tropomyosin 1; TCAP Titin-cap; JUP Junction plakoglobin; DSC2 Desmocollin 2; DSG2 Desmoglein 2; DTNA Dystrobrevin, alpha; CALR3 Calreticulin 3; TNNI3 Troponin I, cardiac; JPH2 Junctophilin 2; TXNRD2 Thioredoxin reductase 2; DMD Dystrophin; GLA Galactosidase, alpha; LAMP2 Lysosome-associated membrane protein 2; EMD Emerin; TAZ Tafazzin.

Supplementary Table 2: Patients and mutations

Summary of clinical data and the mutations identified in 206 patients, including their diagnosis (Dx) at referral, gender, age of onset, family history, diagnosis (Dx) after phenotype evaluation, and the pathogenic or likely pathogenic mutations identified and the number of mutations per patient.

Columns: Patient; Dx at referral; Gender; Age at diagnosis; Family history; Dx after phenotype evaluation; Patient categorisation; Pathogenic mutation(s): gene, cDNA; protein; Likely pathogenic mutation(s): gene, cDNA; protein; Number of mutations.

[Rows for patients 1-62: cell contents scrambled in the extracted text; row alignment not recoverable.]

Abbreviations: ARVC arrhythmogenic right ventricular cardiomyopathy, CM (unspecified) cardiomyopathy, DCM dilated cardiomyopathy, Dx diagnosis, F female, HCM hypertrophic cardiomyopathy, LVNC left ventricular non-compaction, M male, n.a. not available, neg negative, pos positive, RCM restrictive cardiomyopathy, WPW Wolff-Parkinson-White syndrome.

Symbols used: * stop codon, ^ mutation known in HGMD, × mutated nucleotide (also) known in HGMD, but substituted for another nucleotide.

The following transcripts were used for the nomenclature of the mutations: ABCC9 (NM_005691.2), ACTN2 (NM_001103.2), ANKRD1 (NM_014391.2), CALR3 (NM_145046.3), CAV3 (NM_033337.2), CSRP3 (NM_003476.3), DES (NM_001927.3), DMD (NM_004006.2), DSC2 (NM_024422.3), DSP (NM_004415.2), DTNA (NM_001390.4), EMD (NM_000117.2), GLA (NM_000169.2), JPH2 (NM_020433.4), JUP (NM_002230.2), LAMA4 (NM_002290.4), LDB3 (NM_001171610.1), LMNA (NM_170707.3), MYBPC3 (NM_000256.3), MYH6 (NM_002471.3), MYH7 (NM_000257.2), MYL3 (NM_000258.2), MYPN (NM_032578.2), NEXN (NM_144573.3), PKP2 (NM_004572.3), PLN (NM_002667.3), RBM20 (NM_001134363.1), RYR2 (NM_001035.2), SCN5A (NM_198056.2), TNNC1 (NM_003280.2), TNNI3 (NM_000363.4), TNNT2 (NM_001001430.1), TPM1 (NM_001018005.1), TTN (NM_001267550.1) and VCL (NM_014000.2).

DIAGNOSTIC YIELD OF CARDIOMYOPATHIES 199 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 mutations number of *16 fs likely pathogenic mutation(s): mutation(s): pathogenic likely LAMA4, c.4624A>T; p.N1542Y c.4624A>T; LAMA4, c.273+5G>A^ DSP, p.N593S c.1778A>G; DSP, no no no p.M25113K no no no no no no no no no no no no no no no MYL3, c.517A>G; p.M173V^ c.517A>G; MYL3, p.V964L^ c.2890G>C; MYH7, gene, cDNA; protein gene, CSRP3, c.131T>C; p.L44P^ CSRP3, c.131T>C; VCL, c.2467C>T; p.R823W VCL, c.2467C>T; TTN, c.75332_75335dupTAAG; TTN, c.75332_75335dupTAAG; *95^ fs pathogenic mutation(s): mutation(s): pathogenic no no no no no p.P955R no no no no no no no no no no no no no no no no no no no no MYBPC3, c.2864_2865delCT; c.2864_2865delCT; MYBPC3, gene, cDNA; protein gene, TNNI3 c.292C>T; p.R98*^ TNNI3 c.292C>T; fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria patient patient unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed suspected suspected suspected suspected suspected suspected categorisation Dx after Dx HCMlike DCM DCM DCM DCM HCM DCMlike DCMlike DCMlike HCMlike DCM DCM HCM DCM HCM DCM DCM DCMlike HCM DCM DCM HCM DCM HCM DCM DCM HCM phenotype evaluation Family Family History Neg neg n.a. pos neg neg n.a. neg neg n.a. n.a. pos pos pos pos n.a. n.a. n.a. n.a. n.a. n.a. n.a. neg pos pos? neg neg n.a. n.a. n.a. n.a. n.a. n.a. diagnosis 66 65 66 69 56 50 55 54 50 59 56 54 34 Age Age 45 41 42 43 4 44 41 70 F F F F F F F F F F M M M M M M M M M M M M M M M M M Gender Dx at at Dx HCM DCM DCM DCM DCM HCM DCM DCM HCM DCM DCM HCM DCM HCM DCM DCM DCM/HCM HCM DCM DCM HCM DCM HCM DCM DCM HCM referal CM Patient 80 81 82 83 84 85 86 87 88 89 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79

200 TARGETED SEQUENCING 1 1 1 1 1 1 1 1 1 2 2 2 2 3 0 0 0 0 0 0 0 0 0 0 0 *10 *3 fs fs *8 fs PKP2, c.1288A>G; p.K430E & DSP, & DSP, p.K430E PKP2, c.1288A>G; LMNA, c.437C>A; p.A146D & p.A146D LMNA, c.437C>A; & MYPN, p.R943C c.2827C>T; DMD, p.V1639M c.4915G>A; DSP, no no no no no no no no no no no p.D200G no no MYH7, c.4377G>T; p.K1459N^ c.4377G>T; MYH7, c.25922-6T>G c.939+1G>A^ c.59A>G; p.Y20C^ c.59A>G; & ANKRD1, c.599_600delAT; & ANKRD1, c.599_600delAT; SCN5A, c.2423G>A; p.R808H× SCN5A, c.2423G>A; ANKRD1, c.368C>T; p.T123M^ & TTN, & p.T123M^ ANKRD1, c.368C>T; ACTN2, p.D230E c.690T>A; p.P775H c.2324C>A; ABCC9, TTN, p.S28958K c.86872dupA; TTN, p.E15206* c.45616G>T; TTN, c.32887+1G>C TTN, p.G4300R c.12897dupA *90^ fs PLN, c.40_42delAGA; PLN, c.40_42delAGA; no no no no no no no no no p.R14del^ no p.F861W no no no no no no no no no no no no MYBPC3, c.654+1G>A MYBPC3, SCN5A, c.2582_2583delTT; fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria unconfirmed unconfirmed suspected suspected suspected suspected suspected suspected suspected CHAPTER 4.1 HCM LVNC DCMlike HCM DCM DCM HCM DCM HCM DCM HCM DCM DCM DCM HCM DCM DCMlike DCMlike DCM HCM DCM CM ARVClike ARVClike ARVClike neg pos n.a. pos n.a. pos pos neg? n.a. pos pos n.a. neg pos pos n.a. pos pos neg neg pos pos pos pos pos? 8 n.a. n.a. n.a. n.a. n.a. 61 63 68 60 57 55 59 38 36 35 35 28 23 22 42 40 46 49 72 F F F F F F F F F F F M M M M M M M M M M M M M M HCM LVNC DCM HCM DCM HCM DCM HCM DCM HCM DCM DCM DCM HCM DCM DCM DCM HCM DCM CM (D)CM ARVC ARVC/CM ARVC ARVC/DCM 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114

DIAGNOSTIC YIELD OF CARDIOMYOPATHIES 201 1 1 1 1 1 1 1 1 1 1 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 mutations number of *10 *8 fs fs likely pathogenic mutation(s): mutation(s): pathogenic likely RYR2, c.3152G>A; p.R1051H× c.3152G>A; RYR2, p.L3140V c.9418T>G; RYR2, DSP, c.944G>A; p.R315H c.944G>A; DSP, no no no no no no no no no no no no no no no MYH7, c.5773C>G: p.R1925G^ & p.R1925G^ c.5773C>G: MYH7, p.I1131T^ c.3392T>C; MYBPC3, p.R785H c.2354G>A; MYH6, gene, cDNA; protein gene, c.427G>A; p.A143T^× & RYR2, & RYR2, p.A143T^× c.427G>A; p.I2721T c.8162T>C; ANKRD1, c.222dupA; p.L75T ANKRD1, c.222dupA; TTN, c.98990-1G>T TTN, c.41329+1G>T & GLA, TNNT2, c.821+2dupT TTN, p.E18113D c.54339delA; JPH2, c.723C>G; p.S241R JPH2, c.723C>G; PLN, c.40_42delAGA; PLN, c.40_42delAGA; pathogenic mutation(s): mutation(s): pathogenic no p.R14del^ no no no no no no no no no no no no no no no no no no no no no no no gene, cDNA; protein gene, TNNT2, c.814C>T; p.Q272* TNNT2, c.814C>T; fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria patient patient unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed unconfirmed suspected categorisation Dx after Dx HCM DCM DCM HCM HCM DCM DCM HCM HCM DCM DCM HCM HCM DCM HCM HCM HCM DCM DCM HCM HCM HCMlike HCM HCM DCM phenotype evaluation ARVC Family Family History n.a. n.a. n.a. n.a. pos n.a. neg n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. pos neg n.a. pos n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. 
diagnosis 60 62 50 56 57 56 Age Age 42 17 F F F F F F F F F M M M M M M M M M M M M M M M M M Gender Dx at at Dx HCM DCM DCM HCM HCM DCM DCM HCM HCM DCM DCM HCM HCM DCM HCM HCM HCM DCM DCM HCM HCM HCM HCM HCM DCM referal ARVC Patient 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140

202 TARGETED SEQUENCING 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 *6 *8 fs fs DSP, c.1778A>G p.N593S c.1778A>G DSP, p.R2834C c.8500C>T; DSP, p.T2663R c.7988C>G; DMD, DSP, c.1714C>T; p.R572W & RYR2, & RYR2, p.R572W c.1714C>T; DSP, p.S637N RBM20, c.1910G>A; p.Y2752C c.8255A>G; DMD, no no no no no no no no no no no no MYH7, c.2156G>A; p.R719Q^× c.2156G>A; MYH7, p.V606M^ c.1816G>A; MYH7, c.343A>G; p.I115V c.343A>G; c.8162T>C; p.I2721T c.8162T>C; ANKRD1, c.222dupA; p.L75T ANKRD1, c.222dupA; ACTN2, c.1426G>T; p.A476S & DMD, & DMD, p.A476S ACTN2, c.1426G>T; TTN, c.59926+1G>A TTN, p.V26772* c.80314_80315del; p.R147C^ TNNC1, c.439C>T; TTN, & DSC2, c.943- c.31514-3A>G TTN, p.K28603N c.85809delA; 1G>A^ *41^ fs no no no no no no no no no no no no no no no no no no no no no no no no p.W792V no MYBPC3, c.2827C>T; p.R943*^ c.2827C>T; MYBPC3, MYBPC3, c.2373dupG; c.2373dupG; MYBPC3, fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria unconfirmed unconfirmed unconfirmed unconfirmed suspected suspected CHAPTER 4.1 DCM DCM DCM HCM HCM DCM DCM DCM HCM DCM DCM DCM HCM DCM DCM HCM HCM DCMlike DCM HCM DCM LVNC DCM LVNC HCM ARVC ARVClike n.a. neg pos neg n.a. pos pos pos pos neg pos n.a. pos pos neg neg pos n.a. pos n.a. n.a. n.a. pos n.a. pos pos neg n.a. n.a. n.a. n.a. n.a. n.a. 61 63 66 57 56 50 56 59 57 38 37 34 20 25 49 46 43 48 7 76 1 F F F F F F F F F F F F M M M M M M M M M M M M M M M DCM DCM DCM HCM HCM DCM DCM DCM HCM DCM DCM DCM HCM DCM DCM HCM HCM DCM DCM HCM DCM LVNC DCM LVNC HCM ARVC ARVC 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167

DIAGNOSTIC YIELD OF CARDIOMYOPATHIES 203 1 1 1 1 2 2 2 2 3 3 0 0 0 0 0 0 0 0 0 0 0 mutations number of *5 & MYPN, c.59A>G; c.59A>G; *5 & MYPN, fs likely pathogenic mutation(s): mutation(s): pathogenic likely DMD, c.2827C>T; p.R943C c.2827C>T; DMD, & MYBPC3 p.E485K NEXN c.1453G>A; c.4608_4612delACGCC; DSP, p.R647C c.1939C>T; RYR2, & DTNA, p.S57L DES, c.170C>T; & p.R1096C c.3286C>T; LAMA4, & p.R331Q^× LMNA, c.992G>A; no no no no no no no no p.R1537E c.5773C>G; & MYH7, p.Y20C^ p.R1925G^ no no no p.A1004S^ MYH6, c.4264C>T; p.R1422W c.4264C>T; MYH6, p.R984Q c.2951G>A; MYPN, p.R1270H c.3809G>A; MYH6, gene, cDNA; protein gene, CSRP3, c.208G>T; p.G70W CSRP3, c.208G>T; c.649A>G; p.S217G^ c.649A>G; p.S593Y c.1778C>A; c.3010G>T; & MYH6, c.59926+1G>A ACTN2, c.690T>A; p.D230E & TTN, & ACTN2, p.D230E c.690T>A; pathogenic mutation(s): mutation(s): pathogenic no no no no no no no no no no no no no no no no no no no no no gene, cDNA; protein gene, fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria patient patient unconfirmed unconfirmed suspected suspected suspected suspected suspected suspected suspected categorisation Dx after Dx HCM HCM DCM DCM DCMlike DCMlike LVNC DCM DCM DCM HCM DCM HCM DCMlike DCMlike DCMlike HCMlike DCM HCM DCM phenotype evaluation ARVClike Family Family History n.a. pos pos pos neg pos? neg pos pos pos n.a. pos pos? pos? n.a. pos pos pos n.a. pos? neg? n.a. n.a. diagnosis 61 68 64 69 68 50 51 58 53 54 36 31 Age Age 49 49 48 70 77 70 14 F F F F F F F F F M M M M M M M M M M M M Gender Dx at at Dx HCM HCM DCM DCM DCM LVNC DCM DCM DCM HCM DCM HCM DCM DCM DCM HCM DCM HCM DCM referal CM ARVC Patient 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188

204 TARGETED SEQUENCING 1 1 1 1 1 1 2 2 2 2 0 0 0 0 0 0 0 0 142 *8 & *8 & fs fs *8 fs LAMA4, c.3335C>A; p.P1112H c.3335C>A; LAMA4, DSP, c.3294C>G p.D1098E c.3294C>G DSP, p.I2721T c.8162T>C; RYR2 p.L398P DES, c.1193T>C; no no no no no p.Q18136M no no no no no CALR3, c.147dupT; p.R50* CALR3, c.147dupT; ANKRD1, c.222dupA; p.L75T ANKRD1, c.222dupA; TPM1, c.853T>C; p.*285Glnext*20 & p.*285Glnext*20 c.853T>C; TPM1, TTN, p.I5941M c.17823delA; TTN, (splice)^ p.V1034M c.3100G>A; TTN, c.54406_54409delCAGT; JPH2, c.8G>A; p.G3E JPH2, c.8G>A; *30 *41^ fs fs PLN, c.40_42delAGA; PLN, c.40_42delAGA; no no no no no p.W792V no no no no no no no p.A1203G no no p.R14del^ no MYBPC3, c.2373dupG; c.2373dupG; MYBPC3, c.3607dupG; MYH6, fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria fulfilling criteria unconfirmed unconfirmed suspected suspected suspected suspected suspected CHAPTER 4.1 HCM DCM HCMlike HCMlike DCM HCM DCM DCM DCM DCM DCM DCM DCM DCM DCM DCM like DCM like CM neg? n.a. neg neg? n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. n.a. 69 61 51 59 55 57 53 56 22 23 49 49 45 42 40 77 F M M M M M M M M M M M M M M M M M HCM + DCM HCM HCM DCM HCM DCM DCM DCM DCM DCM DCM DCM DCM DCM HCM/DCM DCM DCM WPW 200 201 202 203 204 205 206 189 190 191 192 193 194 195 196 197 198 199

Supplementary Table 3: Total diagnostic yield

CM subtype   neg        P          LP         pos (P + LP)   total
ARVC         6 (60%)    3 (30%)    1 (10%)    4 (40%)        10
DCM          50 (45%)   9 (8%)     53 (47%)   62 (55%)       112
HCM          39 (52%)   8 (11%)    27 (36%)   35 (47%)       74
LVNC         3 (60%)    1 (20%)    1 (20%)    2 (40%)        5
RCM          0          1 (100%)   0          1 (100%)       1
CM           1 (25%)    0          3 (75%)    3 (75%)        4
total        99 (48%)   22 (11%)   85 (41%)   107 (52%)      206

*Abbreviations: ARVC: arrhythmogenic right ventricular cardiomyopathy, CM: cardiomyopathy, DCM: dilated cardiomyopathy, HCM: hypertrophic cardiomyopathy, LVNC: left ventricular non-compaction, LP: likely pathogenic (sometimes together with one or more LPs), neg: negative, P: pathogenic (sometimes together with one or more LPs), pos: positive, RCM: restrictive cardiomyopathy.
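As a minimal sketch of the arithmetic behind Supplementary Table 3, the yield per subtype can be recomputed from the raw counts (negative, pathogenic, likely pathogenic). The variable and function names below are illustrative only and are not taken from the thesis; recomputed percentages may differ by a point from the printed ones because of rounding.

```python
# Counts per cardiomyopathy subtype as printed in Supplementary Table 3:
# (negative, pathogenic, likely pathogenic)
counts = {
    "ARVC": (6, 3, 1),
    "DCM": (50, 9, 53),
    "HCM": (39, 8, 27),
    "LVNC": (3, 1, 1),
    "RCM": (0, 1, 0),
    "CM": (1, 0, 3),
}


def diagnostic_yield(neg, p, lp):
    """Return (positives, total, yield as a fraction) for one group."""
    pos = p + lp          # a patient counts as positive with a P or LP mutation
    total = neg + pos
    return pos, total, pos / total


for subtype, (neg, p, lp) in counts.items():
    pos, total, frac = diagnostic_yield(neg, p, lp)
    print(f"{subtype}: {pos}/{total} positive ({frac:.0%})")

# Column totals across all subtypes
total_neg = sum(c[0] for c in counts.values())
total_p = sum(c[1] for c in counts.values())
total_lp = sum(c[2] for c in counts.values())
overall_pos, overall_total, overall_frac = diagnostic_yield(total_neg, total_p, total_lp)
print(f"overall: {overall_pos}/{overall_total} positive ({overall_frac:.0%})")
```

The column totals computed this way agree with the "total" row of the table.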

Supplementary Table 4: Gender and genetic diagnosis.

           Fulfilling                    Suspected                     Unconfirmed                   Total
           M*        F         p-value   M         F         p-value   M         F         p-value   M         F         p-value
Total      76        48                  25        19                  25        13                  126       80
≥1 mut     39 (51%)  25 (52%)  1         15 (60%)  8 (41%)   0.36      12 (48%)  9 (69%)   0.31      66 (52%)  42 (53%)  1
no mut     37        23                  10        11                  13        4                   40        38

Comparison of the sex distribution in the three patient categories related to their mutation carrier status (no mut = no mutation identified; ≥1 mut = one or multiple mutations identified). *M = male; F = female.
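The table does not name the statistical test behind its p-values. As a hedged illustration, the sketch below applies a two-sided Fisher's exact test (an assumption, implemented here with the standard library only) to the male/female carrier counts of the three patient categories; the function and variable names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# (male with mutation, male without, female with mutation, female without)
tables = {
    "fulfilling": (39, 37, 25, 23),
    "suspected": (15, 10, 8, 11),
    "unconfirmed": (12, 13, 9, 4),
}

p_values = {name: fisher_exact_two_sided(*cells) for name, cells in tables.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.2f}")
```

Under this assumption the results are close to the printed values (1, 0.36 and 0.31), which suggests, but does not prove, that a comparable exact test was used.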

Supplementary Table 5: LVEF in DCM patients fulfilling criteria related to presence or absence of single and/or multiple mutations.

                 Mutations                           P-value
                 0          1           >1           0 vs ≥1   0 vs 1   0 vs >1   1 vs >1
Data available   32/32      27/30       9/10
LVEF [%] ± SD    29.3±9.0   27.1±10.8   34.6±8.5     0.11      1.00     0.47      0.15

* Groups were compared using one-way ANOVA

Supplementary Table 6: IVS in HCM patients fulfilling criteria related to presence or absence of single and/or multiple mutations.

                 Mutations                           P-value
                 0          1           >1           0 vs ≥1   0 vs 1   0 vs >1   1 vs >1
Data available   23/25      16/17       2/4
IVS [mm] ± SD    19.5±4.4   18.1±3.0    19.5±2.1     0.29      0.78     1.00      1.00

* Groups were compared using one-way ANOVA


Chapter 4.2

Titin gene mutations are common in families with both peripartum cardiomyopathy and dilated cardiomyopathy

Karin Y van Spaendonck-Zwarts, Anna Posafalvi, Maarten P van den Berg, Denise Hilfiker-Kleiner, Ilse AE Bollen, Karen Sliwa, Mariëlle Alders, Rowida Almomani, Irene M van Langen, Peter van der Meer, Richard J Sinke, Jolanda van der Velden, Dirk J Van Veldhuisen, J Peter van Tintelen*, Jan DH Jongbloed*

*The last two authors contributed equally

Published in European Heart Journal, 2014

ABSTRACT

Aims: Peripartum cardiomyopathy (PPCM) can be an initial manifestation of familial dilated cardiomyopathy (DCM). We aimed to identify mutations in families that could underlie their PPCM and DCM.

Methods and Results: We collected 18 families with PPCM and DCM cases from various countries. We studied the clinical characteristics of the PPCM patients and affected relatives, and applied a targeted next-generation sequencing (NGS) approach to detect mutations in 48 genes known to be involved in inherited cardiomyopathies. We identified 4 pathogenic mutations in 4/18 families (22%): 3 in TTN and 1 in BAG3. In addition, we identified 6 variants of unknown clinical significance that are likely to be pathogenic in 6 other families (33%): 4 in TTN, 1 in TNNC1, and 1 in MYH7. Measurements of passive force in single cardiomyocytes and titin isoform composition potentially support an upgrade of one of the variants of unknown clinical significance in TTN to a pathogenic mutation. Only 2/20 PPCM cases in these families showed recovery of left ventricular function.

Conclusion: Targeted NGS shows that potentially causal mutations in cardiomyopathy-related genes are common in families with both PPCM and DCM. This supports the earlier finding that PPCM can be part of familial DCM. Our cohort is particularly characterised by a high proportion of TTN mutations and a low recovery rate in PPCM cases.

Keywords: cardiomyopathy, peripartum cardiomyopathy, genetics, pregnancy, titin

INTRODUCTION

Peripartum cardiomyopathy (PPCM) is an idiopathic cardiomyopathy presenting with heart failure secondary to left ventricular systolic dysfunction towards the end of pregnancy or in the first months following delivery, where no other cause of heart failure is found. The left ventricle may not be dilated but the ejection fraction is nearly always reduced below 45%.1 According to this recent definition, the time frame is not strictly defined, in contrast to previous definitions.2-4 The severity of PPCM is highly variable, ranging from complete recovery to rapid progression to end-stage heart failure. PPCM affects 1:300 to approximately 1:3000 pregnancies, with geographic hot spots of high incidence such as in Haiti and Nigeria.4,5

The precise mechanisms that lead to PPCM are not fully known. Several risk factors and possible underlying pathological processes have received attention, such as abnormal autoimmune responses, apoptosis, and impaired cardiovascular microvasculature.5,6 Recent work into the pathogenesis of PPCM has shown involvement of a cascade with oxidative stress, the prolactin-cleaving protease cathepsin D, and the nursing hormone prolactin, which may lead to a target for a disease-specific therapy, namely pharmacological blockade of prolactin by bromocriptine.7-9 In addition, involvement of cardiac angiogenic imbalance may explain why PPCM is a disease seen in late pregnancy and why pre-eclampsia and multiple gestation are important risk factors.10 PPCM is probably caused by a complex interaction of more than one pathogenic mechanism. The large variation in incidence and clinical characteristics may reflect the involvement of specific mechanisms, or combinations thereof, in certain subgroups of PPCM.

We and others recently reported that PPCM can be an initial manifestation of familial dilated cardiomyopathy (DCM),11,12 indicating that, at least in a subset of cases, genetic predisposition plays a role in the pathophysiology of pregnancy-associated heart failure. Accordingly, Haghikia et al. reported a positive family history for cardiomyopathy in 16.5% (19/115) of PPCM cases from a German PPCM cohort.13 So far, eight cases with underlying mutations in DCM-related genes have been published11,12,14,15 and several other cases with familial occurrences of PPCM and DCM, as well as familial clustering of PPCM, have been reported.16-24 Here, we describe our extensive genetic analysis using next-generation sequencing (NGS) technology to identify potentially causal mutations in families with both PPCM and DCM from various parts of the world.

METHODS

Subjects and Clinical Evaluation

We collected a cohort of families with cases of both PPCM and DCM from various parts of the world (the Netherlands, Germany, and South Africa) and studied their clinical characteristics by reviewing medical reports. The local institutional review committees approved the study, and all participants gave their informed consent. PPCM was diagnosed when a patient had an idiopathic cardiomyopathy presenting with heart failure secondary to left ventricular systolic dysfunction towards the end of pregnancy or in the first months following delivery, where no other cause of heart failure was found.1 DCM was diagnosed when a patient had both a reduced systolic function of the left ventricle (left ventricular systolic ejection fraction <0.45) and dilation of the left ventricle (left ventricular end-diastolic dimension >117% of the predicted value corrected for body surface area and age) and only after other identifiable causes like severe hypertension, coronary artery disease, and systemic disease had been excluded.25 If only one of the two criteria was fulfilled, the patient was labelled as having “mild DCM”. If the family history suggested DCM in a relative but there were no medical reports to confirm this, the relative was labelled as having “possible DCM”. Familial PPCM/DCM was diagnosed when there were ≥2 affected family members, at least one with PPCM and one with DCM or sudden cardiac death (SCD) ≤35 years.
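The DCM and familial PPCM/DCM definitions above can be written down as simple decision rules. The following is only a sketch under stated assumptions: the function names and the input representation are hypothetical, and it assumes that only a confirmed "DCM" label (not "mild" or "possible" DCM) counts towards the familial definition, which the text does not spell out.

```python
def classify_dcm(lvef, lvedd_percent_predicted, other_causes_excluded=True):
    """Apply the two DCM criteria described in the text.

    lvef: left ventricular ejection fraction as a fraction (e.g. 0.30)
    lvedd_percent_predicted: LV end-diastolic dimension as % of predicted value
    """
    if not other_causes_excluded:
        return None  # the diagnosis requires exclusion of other identifiable causes
    reduced_function = lvef < 0.45
    dilated = lvedd_percent_predicted > 117
    if reduced_function and dilated:
        return "DCM"
    if reduced_function or dilated:
        return "mild DCM"  # only one of the two criteria fulfilled
    return None


def is_familial_ppcm_dcm(affected_relatives):
    """affected_relatives: list of (diagnosis, age_in_years) tuples."""
    has_ppcm = any(dx == "PPCM" for dx, _ in affected_relatives)
    has_dcm_or_early_scd = any(
        dx == "DCM" or (dx == "SCD" and age is not None and age <= 35)
        for dx, age in affected_relatives
    )
    return len(affected_relatives) >= 2 and has_ppcm and has_dcm_or_early_scd


print(classify_dcm(0.30, 125))                             # both criteria met
print(is_familial_ppcm_dcm([("PPCM", 29), ("DCM", 51)]))   # one PPCM and one DCM case
```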

Targeted Next-Generation Sequencing of 48 Cardiomyopathy-Related Genes

Genomic deoxyribonucleic acid (DNA) was extracted from blood samples obtained from all the available PPCM patients and their affected relatives. Targeted NGS was performed in one or two affected relatives in the selected families (these individuals are marked with an arrow in Figures 1 and 2). We developed a kit based on Agilent SureSelect Target Enrichment for mutation detection in 48 genes (all exonic and ± 20 bp of exon-flanking intronic sequences) known to be involved in inherited cardiomyopathies (ABCC9, ACTC1, ACTN2, ANKRD1, BAG3, CALR3, CRYAB, CSRP3/MLP, DES, DMD, DSC2, DSG2, DSP, EMD, GLA, JPH2, JUP, LAMA4, LAMP2, LMNA, MYBPC3, MYH6, MYH7, MYL2, MYL3, MYPN, MYOZ1, MYOZ2, PKP2, PLN, PRKAG2, PSEN1, PSEN2, RBM20, RYR2, SCN5A, SGCD, TAZ, TBX20, TCAP, TMEM43, TNNC1, TNNI3, TNNT2, TPM1, TTN, VCL, ZASP/LDB3).26 Samples were prepared according to the manufacturer’s protocols and multiplexed to a level that still permitted a theoretical coverage of 100 reads per targeted sequence per patient. All samples were sequenced using 151 bp paired-end reads on an Illumina MiSeq sequencer and analysed using the MiSeq Reporter pipeline and Nextgene software.27 Eleven amplicons with low coverage were also analysed by Sanger sequencing. Identified mutations were confirmed by Sanger sequencing. To study co-segregation, affected relatives were screened for carriership of the identified mutations by Sanger sequencing.
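Two pieces of arithmetic are implicit in this design: extending each exon by 20 bp of flanking intronic sequence, and choosing how many samples can share a run while keeping a theoretical coverage of 100 reads. The sketch below illustrates both; the function names, the example coordinates, the run yield, and the on-target fraction are all hypothetical and are not values from the study.

```python
def pad_and_merge(exons, flank=20):
    """Extend (start, end) exon intervals by `flank` bp and merge overlaps."""
    padded = sorted((max(0, start - flank), end + flank) for start, end in exons)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def max_samples_per_run(run_yield_bases, target_bases, coverage=100,
                        on_target_fraction=1.0):
    """Largest number of samples that still reach the theoretical coverage."""
    usable_bases = run_yield_bases * on_target_fraction
    return int(usable_bases // (target_bases * coverage))


# Hypothetical exon coordinates on one chromosome (half-open intervals)
targets = pad_and_merge([(100, 200), (230, 300), (1000, 1100)])
target_size = sum(end - start for start, end in targets)
print(targets, target_size)

# Hypothetical run: 1 Mb of usable sequence over a 1 kb target
print(max_samples_per_run(1_000_000, 1_000, coverage=100))
```

In practice the achievable coverage is lower than this theoretical figure because reads are not distributed evenly across targets, which is consistent with the low-coverage amplicons that were additionally analysed by Sanger sequencing.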

Sanger Sequencing of the STAT3 Gene

The STAT3 gene (all coding exons and flanking intronic sequences) was analysed by Sanger sequencing in PPCM patients of the collected families.

Classification of Identified Mutations

The criteria used to classify mutations were published recently.28 Briefly, we used a list of mutation-specific features based on in silico analysis using the mutation interpretation software Alamut (version 2.2.1). A score was given depending on the outcome of a prediction test for each feature (e.g. the PolyPhen-2 prediction tool). Then, depending on the total score and the presence/absence of the mutation in at least 300 ethnically matched control alleles (data obtained from the literature and/or available databases, e.g. http://evs.gs.washington.edu/EVS and http://www.nlgenome.nl, or from our own control alleles), we classified mutations as: pathogenic, not pathogenic, or as a variant of unknown clinical significance (VUS; VUS1, unlikely to be pathogenic; VUS2, uncertain; VUS3, likely to be pathogenic). Co-segregation data and/or functional analysis were needed to classify a mutation as pathogenic.
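The general shape of such a scheme can be sketched in code. This is an illustration only: the actual scoring features and cut-offs are described in the cited publication and are not reproduced here, so the thresholds, the downgrade rule for variants seen in controls, and all names below are hypothetical. The one rule taken directly from the text is that a variant is called pathogenic only when co-segregation and/or functional evidence is available.

```python
# Hypothetical cut-offs on the fraction of the maximum in silico score;
# these are NOT the thresholds used in the study.
LOW_CUTOFF, HIGH_CUTOFF = 0.33, 0.66


def classify_variant(score_fraction, present_in_controls,
                     has_supporting_evidence=False):
    """Return 'pathogenic', 'not pathogenic', or 'VUS1'/'VUS2'/'VUS3'.

    score_fraction: total in silico score divided by the maximum score (0..1)
    present_in_controls: variant seen in the matched control alleles
    has_supporting_evidence: co-segregation and/or functional data available
    """
    if score_fraction < LOW_CUTOFF:
        tier = 1
    elif score_fraction < HIGH_CUTOFF:
        tier = 2
    else:
        tier = 3

    if present_in_controls:
        # Hypothetical rule: presence in controls lowers the tier by one step
        if tier == 1:
            return "not pathogenic"
        tier -= 1

    # From the text: 'pathogenic' requires co-segregation and/or functional data
    if tier == 3 and has_supporting_evidence:
        return "pathogenic"
    return f"VUS{tier}"


print(classify_variant(0.9, present_in_controls=False))
print(classify_variant(0.9, present_in_controls=False, has_supporting_evidence=True))
```

Keeping the evidence requirement as a separate final step mirrors how a VUS3 can later be upgraded once segregation or functional data become available.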

Functional Analysis of TTN Mutation

Passive force was measured in single membrane-permeabilized cardiomyocytes mechanically isolated from the heart tissue.29,30 Titin isoform composition was analysed as described previously.30

Figure 1. Pedigrees of the Dutch families (NL1-11). Square symbols indicate men; circles, women; diamonds, unknown sex; and triangles, miscarriage. Blue symbols indicate a clinical diagnosis of PPCM; black symbols, (mild) DCM; grey symbols, possible DCM; orange symbols, sudden cardiac death (SCD). Diagonal lines through symbols indicate deceased; arrows indicate patients selected for targeted next-generation sequencing; and the number in a symbol indicates the number of individuals with this symbol (question mark if unknown).

Figure 2. Pedigrees of the South African (SA1) and German families (GER1-6). Square symbols indicate men; circles, women; diamonds, unknown sex. Blue symbols indicate clinical diagnosis of PPCM; black symbols, (mild) DCM; orange symbol, sudden cardiac death (SCD). Diagonal lines through symbols indicate deceased; arrows indicate patients selected for targeted next-generation sequencing; the number in a symbol indicates the number of individuals with this symbol (question mark if unknown); and SB indicates still birth.

RESULTS

Clinical Characteristics: Low Rate of Full Recovery in PPCM Cases of Familial PPCM/DCM

We collected 18 families with familial PPCM/DCM. These families originated from the Netherlands (n=11), Germany (n=6), and South Africa (n=1; black). Clinical data of the PPCM cases in these families are summarised in Table 1 and of all (likely) affected relatives in Supplemental Table S1. The pedigrees of all the families are shown in Figures 1 (NL1-11) and 2 (SA1 and GER1-6). In two families there were two cases of PPCM (NL1 and SA1). Eight families (NL1-7 and SA1) have been described previously.11,31 The median age at diagnosis in PPCM patients was 29 years (n=15; range 20-36 years), with mean parity 2 (n=13; range 1-4). PPCM diagnosis was postpartum in 12/14 patients. Only 2/20 PPCM patients showed a full recovery of left ventricular function; one of them even had an uneventful next

TTN IN PERIPARTUM CARDIOMYOPATHY 215 hypertrophy, fibrosis fibrosis hypertrophy, Pathology and other remarks Pathology heart, Dilated myocyte terminated New pregnancy, Myocyte hypertrophy Signs of acute myocarditis myocarditis of acute Signs Suspicion of neurodermitis (EMB), suspicion of vasculitis (35)

D (31) D MOF (27) D asthma cardiale LBBB, HF (51) (51) LVEF No recovery No recovery with recovery Full ICD/CRT (31), LVAD, VF, D VF, ICD/CRT (31), LVAD, HTX ICD, tachycardia uneventful 2nd pregnancy uneventful and outcome (age in yrs) and outcome Cardiological remarks remarks Cardiological cardiogenic shock (34) cardiogenic (26) HTX(35), PM, (37), normal AF (30), PVCs, VTs (46), D VTs AF (30), PVCs, 2 years later 2 years Thrombus LV apex, apex, LV Thrombus VT TIA, apex, LV Thrombus Tachycardia

LVEF LVEF 8 years 10% 8 years recovery years 50-55% years 6 months 44%, 6 months 23% 6 months 55%, 6 months 55%, 6 months 37%, 6 months at Follow-Up Follow-Up at 9 months no 9 months 3 months 33% 3 months normal 3 years 30-35% 2 years 45%, 3 2 years 24% normal 2 years 4 months 4 months 7 years 42% 7 years 30% 20% 25% 21% 23% 23% 25% 22% 20% 25% 43% Poor Poor 18% LVEF at at LVEF 20-30% Diagnosis Pregnancy P4 P1 P4 P2 CS P1 AI CS P3 P1 CS P2 P1 CS, twin P2 CS 29th P2 P1 week, HELLP pregnancy eclampsia SB 27 weeks SB 27 weeks 29th week, Few days after delivery days Few 3 days after delivery3 days of pregnancy 37th week after3 months delivery of pregnancy 35th week after delivery3 weeks 2 months after2 months delivery after delivery2 weeks Timing at Diagnosis at Timing Just after delivery Just after delivery Just after delivery Just after delivery 1 month after1 month delivery PPCM PPCM Diagnosis Diagnosis PPCM (29) PPCM (27) PPCM (26) PPCM (33) PPCM (30) PPCM (33) PPCM (29) PPCM (23) PPCM (35) PPCM (30) PPCM (36) PPCM (20) PPCM (23) PPCM (22) (age in yrs) Referred for Referred HF HF HF HF HF HF HF HF respiratory HF, Dyspnea, insufficiency tachycardia asymptomatic Cardiogenic Cardiogenic Chest pain, coughing shock Screening, Screening, Patient Patient II:6 III:4 III:3 III:1 III:2 III:1 III:2 III:1 III:3 III:1 III:6 III:1 II:5 II:6 II:1 II:1 y Famil NL1 NL1 NL2 NL3 NL4 NL5 NL6 NL7 NL8 NL9 NL10 NL11 GER1 GER2 SA1 SA1 Table 1. Clinical characteristics of confirmed PPCM cases 1. Clinical characteristics of confirmed Table

216 TARGETED SEQUENCING TNNC1 VUS2 p.Arg279Trp VUS2 p.Arg279Trp † (NM_004281.3), No samples available No samples available II:1, II:3, II:4, III:4, III:5, III:6 II:2, III:1 II:5, III:2, III:5, IV:1 II:1, II:3 III:1, III:5 II:1, II:2, II:6, III:2, III:5, III:6 II:1, III:1 I:1, II:1

Affected relatives carrier relatives Affected BAG3 BAG3 Graves’disease, nicotin and nicotin Graves’disease, drug abuse

Unknown Unknown Co-segregation Yes/Unknown Yes Yes Yes Yes Yes Yes Yes Yes BiVAD, no recovery, no recovery, BiVAD, D after 2 years no recovery VAD after 2nd pregnancy, after 2nd pregnancy, VAD entered with 30% LVEF , with 30% LVEF entered Subsequent pregnancy Subsequent

(NM_001256850.1; Q8WZ42-1), TTN Pathogenic Pathogenic Pathogenic Pathogenic Classification VUS3 VUS3 VUS3 VUS3 VUS3 VUS3

recovery 6 months 30% 6 months 25% 6 months 36%, 6 months no 6 months >1 year 47% >1 year † 25% 25% 25% <30% P1

CHAPTER 4.2 Nucleotide change c.82117C>T c.1018C>T c.149T>C c.86171_86174dupAAAG c.52795C>T c.71867_71876delGAGTTCTGGA c.81949dupA c.55070G>A c.46990_46993delAAGG c.3907C>G

† 9 *3 *10 *13 fs fs* fs fs 3 months after3 months delivery PPCM PPCM PPCM PPCM (33) p.Arg27373* p.Gln340* p.Gln50Arg p.Asn28726Lys p.Arg17599* p.Arg23956Thr p.Ser27317Lys p.Trp18357* p.Lys15664Val p.Arg1303Gly

Amino acid change Amino BAG3 Gene MYH7 (NM_000257.2). VUS indicates variant of unknown clinical significance (VUS3, likely to be pathogenic, VUS2, uncertain). of unknown variant clinical significance VUS indicates to be pathogenic, (VUS3, likely (NM_000257.2).

TTN TNNC1 TTN TTN TTN TTN TTN TTN

MYH7 II:3 II:2 III:2 II:3 III:1 II:6 III:1 II:1 II:1 II:1 patient

Tested Tested II:1 II:1 II:1 II:1 Family NL1 NL3 NL4 NL6 NL9 NL10 NL11 GER1 GER4 GER5

GER3 GER4 GER5 GER6

AF indicates atrial fibrillation; AI, artificial insemination; AT, atrial tachycardia; (Bi)(L)VAD, (bi)(left) ventricular assist device; CRT, cardiac resynchronization therapy; CS, caesarean section; D, death; EMB, endomyocardial biopsy; HELLP, hemolysis, elevated liver enzymes, low platelet count; HF, heart failure; HTX, heart transplantation; ICD, implantable cardiac defibrillator; LBBB, left bundle branch block; LV, left ventricle; LVEF, left ventricular ejection fraction; MOF, multiple organ failure; P, pregnancy; PM, pacemaker; PPCM, peripartum cardiomyopathy; PVC, premature ventricular contraction; RV, right ventricle; SB, still birth; TIA, transient ischemic attack; VF, ventricular fibrillation; VT, ventricular tachycardia.

Table 2. Potentially causal mutations identified in 10/18 families

Nomenclature according to HGVS (Human Genome Variation Society) using the reference sequences: TTN (NM_001256850.1; Q8WZ42-1), BAG3 (NM_004281.3), TNNC1 (NM_003280.2), MYH7 (NM_000257.2). † VUS2 p.Arg279Trp (c.835C>T) on same allele.

pregnancy (NL9 III:1, LVEF still normal 3 years after diagnosis; and GER1 II:1, full recovery with uneventful second pregnancy two years later). Another PPCM patient showed recovery of left ventricular function, but only under treatment with a beta-blocker and ACE inhibitor (NL10 III:6). In addition to the 20 confirmed PPCM patients in these families, five relatives showed clinical characteristics suggestive of PPCM (NL4 II:2, GER1 I:1, GER3 I:1, GER4 I:1, GER5 I:1; Table S1). PPCM could not be confirmed in these relatives because their clinical data were lacking. In addition, two relatives with DCM showed a decline of left ventricular function after delivery (NL2 IV:8 and SA1 II:3; Table S1).

Targeted Next-Generation Sequencing: Potential Causal Mutations in Cardiomyopathy-Related Genes, in particular TTN, are Common in Familial PPCM/DCM

Using our validated NGS approach,27 a mean coverage of 220x per individual patient was reached and, on average, 98.5% of all targeted nucleotides were covered at least 20x. In 4/18 families (22%) pathogenic mutations in cardiomyopathy-related genes were identified (3 in TTN and 1 in BAG3). In addition, in 6 other families (33%) VUS3s were identified (4 in TTN, 1 in TNNC1, and 1 in MYH7). An overview of these mutations and VUS3s and the respective co-segregation analyses are shown in Table 2. All 7 TTN mutations/VUS3s were located in the titin A-band, for which over-representation of mutations in DCM patients was reported previously.32 No potential mutations were identified in 8 families (NL2, NL5, NL7, NL8, SA1, GER2, GER3, and GER6). An overview of the 26 mutations that were not classified as potentially disease-causing (VUS1s and VUS2s) identified in the 18 families is shown in Supplemental Table S2.
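The yield figures in this section follow directly from the per-gene counts. As a minimal sketch (the data layout and the helper name `percent` are illustrative assumptions, not part of the study's analysis pipeline), the percentages can be recomputed as follows:

```python
# Recompute the reported diagnostic yield from the per-gene family counts.
# The dictionaries and helper below are illustrative, not the study's own code.

TOTAL_FAMILIES = 18

# Number of families per gene, as stated in the text.
pathogenic = {"TTN": 3, "BAG3": 1}
vus3 = {"TTN": 4, "TNNC1": 1, "MYH7": 1}


def percent(count: int, total: int = TOTAL_FAMILIES) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


n_pathogenic = sum(pathogenic.values())
n_vus3 = sum(vus3.values())
n_ttn = pathogenic["TTN"] + vus3["TTN"]
n_unsolved = TOTAL_FAMILIES - n_pathogenic - n_vus3

print(f"pathogenic: {n_pathogenic}/{TOTAL_FAMILIES} ({percent(n_pathogenic)}%)")
print(f"VUS3:       {n_vus3}/{TOTAL_FAMILIES} ({percent(n_vus3)}%)")
print(f"TTN total:  {n_ttn}/{TOTAL_FAMILIES} ({percent(n_ttn)}%)")
print(f"no finding: {n_unsolved}/{TOTAL_FAMILIES}")
```

The remainder of 8 families without a pathogenic mutation or VUS3 matches the eight families listed in the text.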

No STAT3 Mutations in PPCM Cases

No STAT3 mutations were identified in 15 PPCM cases (DNA was available from 15/20 cases).

Functional and Protein Analyses Support the Pathogenicity of a Likely Pathogenic TTN Mutation

Heart tissue from PPCM patient GER4 II:1 with a VUS3 in TTN was available for functional and protein analyses. Passive force was measured in

single cardiomyocytes (n=4) at sarcomere lengths of 1.8 to 2.2 μm (see Figure 3). Our functional measurements of passive stiffness, which is largely based on titin composition in the heart, revealed a very low passive force development (1.0±0.3 kN/m2) at a sarcomere length of 2.2 μm in the PPCM sample compared to previously reported values in control hearts (~2.5 kN/m2).29,30 Analysis of titin isoform composition showed a shift towards the more compliant N2BA isoform, evident from a higher N2BA/N2B ratio (0.72±0.02; mean of triplicate measurements) in the PPCM heart compared to the previously reported ratio (0.39±0.05) in control hearts.30
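To make the size of these differences explicit, the reported means can be compared directly. The sketch below is a small worked calculation, not a statistical test: it ignores the reported spreads, and the conversion from an N2BA/N2B ratio to an N2BA share of total titin assumes that N2BA and N2B are the only isoforms present, which is a simplification introduced here for illustration.

```python
# Compare the reported PPCM values with previously reported control values.
# Inputs are the means quoted in the text; the +/- spreads are ignored.

passive_force_ppcm = 1.0      # kN/m2 at a sarcomere length of 2.2 um
passive_force_control = 2.5   # kN/m2, approximate value from earlier reports

ratio_ppcm = 0.72             # N2BA/N2B ratio in the PPCM heart
ratio_control = 0.39          # N2BA/N2B ratio in control hearts


def n2ba_fraction(n2ba_to_n2b: float) -> float:
    """Convert an N2BA/N2B ratio to the N2BA share of total titin.

    Assumes N2BA and N2B are the only isoforms (a simplification).
    """
    return n2ba_to_n2b / (1.0 + n2ba_to_n2b)


relative_force = passive_force_ppcm / passive_force_control
ratio_fold_change = ratio_ppcm / ratio_control

print(f"passive force relative to control: {relative_force:.2f}")
print(f"N2BA/N2B fold change:              {ratio_fold_change:.2f}")
print(f"N2BA fraction, PPCM heart:         {n2ba_fraction(ratio_ppcm):.1%}")
print(f"N2BA fraction, control hearts:     {n2ba_fraction(ratio_control):.1%}")
```

Under these assumptions the passive force in the PPCM sample is about 0.4 times the control value, and the N2BA/N2B ratio is a little under twice the control ratio.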

DISCUSSION

This is the first report of a comprehensive genetic analysis in a large series of cases with familial occurrences of PPCM and DCM. We identified pathogenic mutations in cardiomyopathy-related genes in 4/18 families (22%) and VUSs that are likely to be pathogenic in 6 other families (33%). These data support the earlier finding that PPCM can be part of familial DCM.11,12 Cascade genetic screening can identify relatives at risk in those families in which an underlying mutation has been identified. Our data also specifically show a low recovery rate in our cohort (only 10%) compared to reports in other groups not selected for familial cases (recovery rates of around 25 to 50%),33-36 indicating that the presence of an underlying mutation or positive family history for cardiomyopathy in a patient with PPCM may be a prognostic factor for a low recovery rate.

The targeted NGS approach that we have developed provides high-throughput, rapid and affordable molecular analysis for cardiomyopathies.27 As accurate annotation of mutations in cardiomyopathies is of the utmost importance,37 we were extremely careful in classifying these.28 Our study has

Figure 3. Force measurements in heart tissue of GER4 II:1. Single cardiomyocyte of the PPCM heart sample (A). Passive force development was measured at sarcomere lengths of 1.8, 2.0 and 2.2 μm. (B)

several advantages. One is the inclusion of some large families, where co-segregation analysis added value to the classification of mutations. Another is the large number of genes we tested, including the large TTN gene, for which large-scale mutation analysis was impossible before NGS became available. Exclusion of pathogenic mutations in 47 other candidate genes makes it more likely that the identified VUS3s are pathogenic. Accordingly, the previously reported TNNC1 mutation is still the only potential genetic cause in family NL4.11 And although the pathogenicity of truncating TTN mutations is still under debate, because these types of mutations are also found in apparently healthy controls (up to 3%) and the general population,32,38 the pathogenicity of the TTN VUS3s identified in our families also becomes more likely after excluding pathogenic mutations in 47 other cardiomyopathy-related genes. Possible exclusion of mutations in other genes in patients carrying truncating TTN mutations was not explicitly addressed by Herman et al.32 As expected, we identified several mutations in the majority of patients; however, we focused on the pathogenic mutations and VUS3s. Other identified mutations (VUS1s and VUS2s; see Table S2) might be benign genetic variations, but some may also contribute to the development of disease in these families. Some of these VUSs might even be independently pathogenic, but additional testing is needed to confirm this; this might be the case for two VUS2s in TTN: p.Arg1408Cys (c.4222C>T) in GER2 and p.Glu2076Gly (c.6227A>G) in GER6. Other possibilities are that these VUSs act as modifiers, or that they are risk factors with a low penetrance. The great majority of pathogenic mutations and VUS3s (7/10) were in the TTN gene, which encodes the giant sarcomeric protein titin. 
It was recently reported that truncating mutations in TTN account for a significant portion (approximately 25%) of the genetic etiology in familial DCM.32 The high yield of pathogenic mutations and VUS3s in TTN in our cohort of familial PPCM/DCM cases (39%; 7/18) suggests that TTN mutations are specifically related to PPCM. Changes in isoform expression and phosphorylation status of titin have been reported in acquired forms of heart failure (reviewed by Hidalgo and Granzier).39 We were able to measure functional properties and titin isoform composition in heart tissue from one of the PPCM patients with a VUS3 in TTN. The passive force was less than half the value previously reported in control groups, and was associated with a shift towards the more compliant N2BA titin isoform. The shift towards more compliant N2BA has been reported in human heart failure.30,40,41 Overall, our data from functional and protein analyses support the pathogenicity of this particular TTN mutation.

We still classify this mutation as VUS3; however, extended experience with these functional analyses might lead us to re-classify this VUS3 as a pathogenic mutation. Recent studies indicated that titin phosphorylation is indirectly altered by increased oxidative stress42 and, as such, may represent a likely pathomechanism in PPCM. Future studies will need to reveal the functional deficits induced by mutations in the TTN gene in relation to high oxidative stress, as present in PPCM.

There may be genetic factors specific for PPCM development, for example a factor tentatively underlying the geographical hotspot of incidence in Haiti, and a locus near the PTHLH gene reported by Horne et al.43 We only focused on the STAT3 gene as a possible specific genetic factor for PPCM. Because mice with cardiomyocyte-specific deletion of STAT3 develop PPCM,7 STAT3 might also be involved in human PPCM, but there are no human genetic data supporting this yet. STAT3 mutations are so far only known to cause hyper-IgE syndrome.44 In contrast to the PPCM cases, some women in our PPCM/DCM families went through several pregnancies without developing PPCM. We therefore hypothesized that STAT3 mutations in the PPCM cases of these families contributed to the development of PPCM, in addition to an underlying cardiomyopathy-related mutation. However, we found no STAT3 pathogenic mutations or VUSs in these PPCM cases, which was consistent with previous findings.7 Exome sequencing of rare familial PPCM cases could lead to identifying novel genetic factors specific for PPCM. However, this approach is limited by the fact that familial PPCM cases with more than two affected relatives or with affected distant relatives are lacking. 
An alternative strategy could be to compare the data from exome sequencing on different PPCM cases in order to identify a shared genetic cause, but this might not lead to a result because the causal genetic factor may well be unique to each family.

Limitations

One limitation of our study is that it does not provide data on the frequency of familial disease in PPCM. Currently, we only have data from a German cohort reporting a positive family history for cardiomyopathy in 16.5% of PPCM cases,13 but we hope to gain more information via the Peripartum Cardiomyopathy Registry of the EURObservational Research Programme (www.eorp.org; unpublished data, 2013, manuscript submitted to European Journal of Heart Failure). Another limitation is that retrieving information on larger deletions/duplications from NGS data is not yet possible, although software to enable such analysis is being developed. We may therefore

have missed that type of mutation in our analyses. A further limitation is the difficulty of judging which TTN mutations are pathogenic, given the presence of truncating TTN mutations in the general population and reported truncating mutations that do not segregate with disease in DCM families.32,38,45 In contrast to the latter observation, we were able to show co-segregation of truncating TTN mutations/VUS3s in five of our families (NL1, NL6, NL9, NL10 and NL11; Table 2), and we have data from functional and protein analyses supporting the pathogenicity of one likely pathogenic TTN mutation (GER4 II:1). Additional functional studies on TTN mutations and collection of large families carrying these mutations are needed. Moreover, although our findings suggest a specific role for TTN mutations in families with PPCM and DCM, we do realise that the number of families studied is currently too small to conclude this definitively. Finally, we were lacking some clinical data, especially for relatives who showed clinical characteristics suggestive of PPCM.

Conclusions and Practical Implications

Potentially causal mutations in cardiomyopathy-related genes are common in families with both PPCM and DCM, in particular TTN mutations. The targeted next-generation sequencing approach we applied has been shown to be suitable for identifying such mutations. Functional studies as performed in the present study may provide a future tool to confirm pathogenicity of TTN mutations. Our results provide more support for the earlier finding that PPCM can be a manifestation of familial DCM. Cascade genetic screening can identify relatives at risk in those families in which an underlying mutation has been identified. Moreover, the presence of an underlying mutation or a positive family history for cardiomyopathy in a PPCM patient may be a prognostic factor for a low recovery rate.

ACKNOWLEDGEMENTS

We thank all the patients who participated in this study; the Study Group on PPCM of the Heart Failure Association of the European Society of Cardiology; Birgit Sikkema-Raddatz for her help in validating and implementing the targeted enrichment kit; Ludolf Boven, Eddy de Boer and Lennart Johansson for technical assistance; Wies Lommen for assistance with functional analyses; Nicolaas de Jonge, cardiologist, for cardiac evaluation of family NL10; Wilma van der Roest, genetic counselor, for counseling some of the Dutch families; and Jackie Senior for editing this manuscript. Rowida Almomani was supported by the Netherlands Heart Foundation (grant 2010B164).

REFERENCES

1 Sliwa K, Hilfiker-Kleiner D, Petrie MC et al. Current state of knowledge on aetiology, diagnosis, management, and therapy of peripartum cardiomyopathy: a position statement from the Heart Failure Association of the European Society of Cardiology Working Group on peripartum cardiomyopathy. Eur J Heart Fail. 2010;12:767-778.
2 Demakis JG, Rahimtoola SH. Peripartum cardiomyopathy. Circulation. 1971;44:964-968.
3 Hibbard JU, Lindheimer M, Lang RM. A modified definition for peripartum cardiomyopathy and prognosis based on echocardiography. J Obstet Gynecol. 1999;94:311-316.
4 Pearson GD, Veille JC, Rahimtoola S et al. Peripartum cardiomyopathy. National Heart, Lung, and Blood Institute and Office of Rare Diseases (National Institutes of Health) workshop recommendations and review. JAMA. 2000;283:1183-1188.
5 Sliwa K, Fett J, Elkayam U. Peripartum cardiomyopathy. Lancet. 2006;368:687-693.
6 Hilfiker-Kleiner D, Sliwa K, Drexler H. Peripartum cardiomyopathy: recent insights in its pathophysiology. Trends Cardiovasc Med. 2008;18:173-179.
7 Hilfiker-Kleiner D, Kaminski K, Podewski E et al. A cathepsin D-cleaved 16 kDa form of prolactin mediates postpartum cardiomyopathy. Cell. 2007;128:589-600.
8 Hilfiker-Kleiner D, Meyer GP, Schieffer E et al. Recovery from postpartum cardiomyopathy in 2 patients by blocking prolactin release with bromocriptine. J Am Coll Cardiol. 2007;50:2354-2355.
9 Sliwa K, Blauwet L, Tibazarwa K et al. Evaluation of bromocriptine in the treatment of acute severe peripartum cardiomyopathy: a proof-of-concept pilot study. Circulation. 2010;121:1465-1473.
10 Patten IS, Rana S, Shahul S et al. Cardiac angiogenic imbalance leads to peripartum cardiomyopathy. Nature. 2012;485:333-338.
11 van Spaendonck-Zwarts KY, van Tintelen JP, van Veldhuisen DJ et al. Peripartum cardiomyopathy as a part of familial dilated cardiomyopathy. Circulation. 2010;121:2169-2175.
12 Morales A, Painter T, Li R et al. Rare variant mutations in pregnancy-associated or peripartum cardiomyopathy. Circulation. 2010;121:2176-2182.
13 Haghikia A, Podewski E, Libhaber E et al. Phenotyping and outcome on contemporary management in a German cohort of patients with peripartum cardiomyopathy. Basic Res Cardiol. 2013;108:366.
14 Toib A, Grange DK, Kozel BA et al. Distinct clinical and histopathological presentations of Danon cardiomyopathy in young women. J Am Coll Cardiol. 2010;55:408-410.
15 Møller DV, Andersen PS, Hedley P et al. The role of sarcomere gene mutations in patients with idiopathic dilated cardiomyopathy. Eur J Hum Genet. 2009;17:1241-1249.
16 Ntusi NB, Wonkam A, Shaboodien G et al. Frequency and clinical genetics of familial dilated cardiomyopathy in Cape Town: implications for the evaluation of patients with unexplained cardiomyopathy. S Afr Med J. 2011;101:394-398.
17 Baruteau AE, Leurent G, Schleich JM et al. Can peripartum cardiomyopathy be familial? Int J Cardiol. 2009;137:183-185.
18 Pierce JA, Price BO, Joyce JW. Familial occurrence of postpartal heart failure. Arch Intern Med. 1963;111:151-155.
19 Voss EG, Reddy CVR, Detrano R et al. Familial dilated cardiomyopathy. Am J Cardiol. 1984;54:456-457.
20 Massad LS, Reiss CK, Mutch DG et al. Familial peripartum cardiomyopathy after molar pregnancy. Obstet Gynecol. 1993;81:886-888.
21 Pearl W. Familial occurrence of peripartum cardiomyopathy. Am Heart J. 1995;129:421-422.
22 Fett JD, Sundstrom BJ, King ME et al. Mother-daughter peripartum cardiomyopathy. Int J Cardiol. 2002;86:331-332.
23 Ferguson JE, Harney KS, Bachicha JA. Peripartum maternal cardiomyopathy with idiopathic cardiomyopathy in the offspring. A case report. J Reprod Med. 1986;31:1109-1112.
24 Strung P. Familial cardiomyopathy. Peripartum and primary congestive cardiomyopathy in a sister and brother. Ugeskr Laeger. 1976;138:2567-2569.
25 Mestroni L, Maisch B, McKenna WJ et al. Guidelines for the study of familial dilated cardiomyopathy. Eur Heart J. 1999;20:93-102.
26 Posafalvi A, Herkert JC, Sinke RJ et al. Clinical utility gene card for: dilated cardiomyopathy (CMD). Eur J Hum Genet. 2013;21:doi:10.1038.
27 Sikkema-Raddatz B, Johansson LF, de Boer EN et al. Targeted next generation sequencing can replace Sanger sequencing in clinical diagnostics. Human Mutation. 2013;34:1035-1042.
28 van Spaendonck-Zwarts KY, van Rijsingen IA, van den Berg MP et al. Genetic analysis in 418 index patients with idiopathic dilated cardiomyopathy: overview of 10 years' experience. Eur J Heart Fail. 2013;15:628-636.
29 van Dijk SJ, Paalberends ER, Najafi A et al. Contractile dysfunction irrespective of the mutant protein in human hypertrophic cardiomyopathy with normal systolic function. Circ Heart Fail. 2012;5:36-46.
30 Borbely A, Falcao-Pires I, van Heerebeek L et al. Hypophosphorylation of the stiff N2B titin isoform raises cardiomyocyte resting tension in failing human myocardium. Circ Res. 2009;104:780-786.
31 Tibazarwa K, Sliwa K, Wonkam A et al. Peripartum cardiomyopathy and familial dilated cardiomyopathy: a tale of two cases. Cardiovasc J of Afr. 2013;24:e4-e7.
32 Herman DS, Lam L, Taylor MR et al. Truncations of titin causing dilated cardiomyopathy. N Engl J Med. 2012;366:619-628.
33 Blauwet LA, Libhaber E, Forster O et al. Predictors of outcome in 176 South African patients with peripartum cardiomyopathy. Heart. 2013;99:308-313.
34 Goland S, Bitar F, Modi K et al. Evaluation of the clinical relevance of baseline left ventricular ejection fraction as a predictor of recovery or persistence of severe dysfunction in women in the United States with peripartum cardiomyopathy. J Card Fail. 2011;17:426-430.
35 Fett JD, Christie LG, Carraway RD et al. Five-year prospective study of the incidence and prognosis of peripartum cardiomyopathy at a single institution. Mayo Clin Proc. 2005;80:1602-1606.
36 Duran N, Günes H, Duran I et al. Predictors of prognosis in patients with peripartum cardiomyopathy. Int J Gynaecol Obstet. 2008;101:137-140.
37 Norton N, Robertson PD, Rieder MJ et al. Evaluating pathogenicity of rare variants from dilated cardiomyopathy in the exome era. Circ Cardiovasc Genet. 2012;5:167-174.
38 Golbus JR, Puckelwartz MJ, Fahrenbach JP et al. Population-based variation in cardiomyopathy genes. Circ Cardiovasc Genet. 2012;5:391-399.
39 Hidalgo C, Granzier H. Tuning the molecular giant titin through phosphorylation: Role in health and disease. Trends Cardiovasc Med. 2013;23:165-171.
40 Makarenko I, Opitz CA, Leake MC et al. Passive stiffness changes caused by upregulation of compliant titin isoforms in human dilated cardiomyopathy hearts. Circ Res. 2004;95:708-716.
41 Nagueh SF, Shah G, Wu Y et al. Altered titin expression, myocardial stiffness, and left ventricular function in patients with dilated cardiomyopathy. Circulation. 2004;110:155-162.
42 van Heerebeek L, Hamdani N, Falcão-Pires I et al. Low myocardial protein kinase G activity in heart failure with preserved ejection fraction. Circulation. 2012;126:830-839.
43 Horne BD, Rasmusson KD, Alharethi R et al. Genome-wide significance and replication of the chromosome 12p11.22 locus near the PTHLH gene for peripartum cardiomyopathy. Circ Cardiovasc Genet. 2011;4:359-366.
44 Minegishi Y, Saito M, Tsuchiya S et al. Dominant-negative mutations in the DNA-binding domain of STAT3 cause hyper-IgE syndrome. Nature. 2007;448:1058-1062.
45 Norton N, Li D, Rampersaud E et al. Exome sequencing and genome-wide linkage in 17 families illustrates the complex contribution of TTN truncating variants to dilated cardiomyopathy. Circ Cardiovasc Genet. 2013;6:144-153.

224 TARGETED SEQUENCING hypertrophy, fibrosis fibrosis hypertrophy, Dilated heart, myocyte heart,Dilated myocyte Myocyte hypertrophy Myocyte Myocyte hypertrophy, Myocyte hypertrophy, Dilated heart, hyper- Dilated myocyte Pathology and other remarks Pathology interstitial fibrosis interstitial trophy, hyperchromatic nuclei hyperchromatic trophy, Rheumatic disease

LBBB, D asthma cardiale D asthma cardiale LBBB, D (31) D MOF (27) D HF (60) D (63) (41), ICD (51), intramyo- VTs LBBB, (25) VTs PVCs (15), PVCs D (54) PVCs, VTs VTs PVCs, (70) (61), AF (70) VTs PVCs, (48), ICD (58), D HF (59) VTs PVCs, (48), ICD (50), VTs PVCs, (26) Cardiological remarks and remarks Cardiological (53), HTX, D DIC (54) appropriate ICD shock (53) appropriate cardial stem cell implantation cell stem cardial outcome (age in yrs) outcome AVB1, PVCs, VTs, ICD (61) VTs, PVCs, AVB1, LVEF at Follow-Up Follow-Up at LVEF 8 years 45% 8 years 3 months 33% 3 months 6 years 53% 6 years 9 years 18% 9 years 5 years 35-40% 5 years 20-25% 4 years 40% 4 years 7 years 40-45% 7 years 1 year 41%, 12 years 41%, 12 years 1 year LVEF at at LVEF Diagnosis 37-49% 30-40% 25% 32% 20% 25% 23% 24% 45-50% 40% 43% 44% 40% P4 P1 P4 P2 CS Pregnancy P2 CHAPTER 4.2 Few days after days Few LV with preserved LVEF LV 3 days after delivery 3 days of pregnancy37th week delivery abnormal contraction Just after delivery 4 years earlier already earlier already 4 years Timing at diagnosis diagnosis at Timing Just after delivery 10 weeks after P5, but 10 weeks SIDS DCM SCD (27) SCD (25) SCD (57) SCD (54) SCD (26) SCD DCM (83) DCM (61) DCM (61) DCM (48) DCM (48) DCM (41) DCM (22) DCM (20) DCM (28) DCM (35) DCM (57) DCM (54) Diagnosis Diagnosis PPCM (26) PPCM (33) PPCM (29) PPCM (27) (age in yrs) Possible DCM Possible Mild DCM (25) Mild DCM (63) Possible DCM (21) Possible HF HF HF Died HF Died Died Died Referred for Referred Dyspnea Died Dyspnea, chest pain Died Died Screening Screening, Screening Screening Screening Screening fatique, Screening, Screening Screening Screening Screening palpitations palpitations Cardiogenic shock Cardiogenic F F F F M/F M M M M M M M M M M F F F F F F F F F F F F F II:6 III:4 III:3 III:1 Patient II:1 II:3 II:4 II:5 III:5 III:6 III:8 I:2 II:2 II:3 III:2 IV:2 IV:4 IV:5 IV:6 IV:8 IV:9 I:2 I:3 II:2 II:1 II:4 II:2 NL2 NL3 NL1 NL1 NL2 NL2 NL2 NL2 NL2 NL2 NL2 NL2 NL2 NL3 
NL3 NL3 NL4 NL4 Family NL1 NL1 NL1 NL1 NL1 NL1 NL1 NL2 NL4 Table S1. Clinical characteristics of PPCM cases and all affected (or likely affected) relatives relatives affected) S1. Clinical characteristics of PPCM cases and all affected (or likely Table

TTN IN PERIPARTUM CARDIOMYOPATHY 225 New pregnancy, pregnancy, New terminated (35) terminated heart, PAD (50) heart, PAD Pathology and other remarks Pathology Raynaud's phenomenon (17) Signs of acute myocarditis myocarditis Signs of acute (EMB), suspicion of vasculitis Chemotherapy (5FU/LV), (5FU/LV), Chemotherapy (59) colon carcinoma (50) carcinoma colon TIA and LE due to thrombus thrombus TIA and LE due to Aortic bifurcation prothesis prothesis Aortic bifurcation HF (51) (51) HTXPM, (37), normal LVEF ICD/CRT (31), LVAD, VF, D VF, ICD/CRT (31), LVAD, tachycardia MI (40), PVCs, LBBB, severe severe LBBB, MI (40), PVCs, LBBB PVCs (54), AF (56) HTX, D (58) (18), ICD (17), MI, PTCA PVCs, HTX (19) D (44) HTX D (52) list, waiting HTX (48), PVCs (62), normal (73) LVEF (47), LA hemiblock (53), PACs HTX, D (66) IV conduction delay, PM (50), PM IV conduction delay, VF, VT, ICD (25) VT, VF, Cardiological remarks and remarks Cardiological MI, coronary VT, Collapse, Thrombus LV apex, apex, LV Thrombus VT (35),TIA, apex, LV Thrombus AF (30), PVCs, VTs (46), D VTs AF (30), PVCs, outcome (age in yrs) outcome sclerosis, PTCA, ICD (72), AF (79) PTCA, sclerosis, cardiogenic shockcardiogenic (34) Thrombus heartThrombus (50) AF AF RVAlso dysfunction AF (62), D (78)

LVEF at Follow-Up Follow-Up at LVEF 9 months no recovery9 months 8 years 10% 8 years 8 years 45% 8 years 3 years normal 3 years 6 months 44%, 6 months 23% 6 months 55%, 6 months 7 years 42% 7 years 2 years 40% 2 years 4 years 50% 4 years 7 years stable 7 years LVEF at at LVEF Diagnosis 30% 21% 50% 23% 23% 25% 35% 20% 23% 21% 40% 40-45% 18% <30% P1 AI CS P3 P1 CS P2 P1 CS, twin Pregnancy pregnancy 29th week, 29th week, eclampsia 3 months after3 months delivery of 35th week pregnancy 2 months after2 months after delivery 2 weeks delivery Just after delivery Just after delivery Timing at diagnosis diagnosis at Timing DCM DCM months) DCM (50) DCM (74) DCM (70) DCM (61) DCM (50) DCM (16) DCM (42) DCM (42) DCM (25) DCM (58) DCM (47) Diagnosis Diagnosis PPCM (30) PPCM (33) PPCM (29) PPCM (23) PPCM (35) PPCM (30) (age in yrs) Mild DCM (16 Mild DCM (62) Possible DCM (46) Possible DCM (53) Possible DCM (72) Possible insufficiency HF HF HF HF HF respiratory HF, Referred for Referred Heart murmur HF Dyspnea Dyspnea Dyspnea VF Screening Screening, Screening Screening palpitations Collapse AF F F F F F F M/F M M M M M M M M F F F F F F F F F F III:2 III:1 III:2 III:1 III:3 III:1 Patient II:5 III:5 IV:1 II:1 II:4 II:1 II:3 II:2 II:3 II:4 IV:1 II:3 II:5 III:5 II:1 II:2 II:6 III:2 NL4 NL5 NL6 NL7 NL8 NL9 Family NL4 NL4 NL4 NL5 NL5 NL6 NL6 NL7 NL8 NL8 NL8 NL9 NL9 NL9 NL10 NL10 NL10 NL10

226 TARGETED SEQUENCING No signs of myocarditis (EMB) of myocarditis No signs Suspicion of neurodermitis Graves’disease, nicotin and nicotin Graves’disease, CVA (29) CVA drug abuse drug

No recovery No recovery No with uneventful recovery Full D after no recovery, BiVAD, ICD, HTX ICD, tered with 30% LVEF , VAD afterVAD , with 30% LVEF tered PVCs (46), AT, VT, ICD (51) VT, PVCs (46), AT, advised Echocardiogram Worsening HF after 2nd Worsening D Subsequent pregnancySubsequent en- 2nd pregnancy 2 years later 2nd pregnancy 2 years no recovery 2nd pregnancy, 2 years Tachycardia delivery, D (30) delivery,

50-55% year 47% year 6 months 55%, 6 months 37%, 6 months 6 months 30% 6 months 25% 6 months 36%, >1 6 months no recovery 6 months 2 years 45%, 3 years 45%, 3 years 2 years 24% normal 2 years 4 months 30-35% 4 months 10 years 56% 10 years 20-30% 31% 26% 22% 20% 25% 25% 25% 25% 43% 40% 45% <30% P2 CS HELLP P2 P1 P1 weeks P2 SB 27 29th week, 29th week,

CHAPTER 4.2 3 weeks after delivery 3 weeks 3 months after3 months delivery Died soon after delivery Died after delivery Just after delivery 1 month after1 month delivery DCM PPCM PPCM PPCM PPCM PPCM SCD (28) PPCM or DCM (46) DCM (39) DCM (26) PPCM (36) PPCM (20) PPCM (23) PPCM (22) PPCM (33) myocarditis Arrhythmias Arrhythmias (unspecified) Mild DCM (28) PPCM or DCM PPCM or DCM SCD or PPCMSCD (<35) Dyspnea, Dyspnea, fainting tachycardia HF near Palpitations, HF Screening, Screening, asymptomatic Chest pain, Chest coughing

F F F F F F F F F F M M M F F F F F F F F III:6 III:1 II:5 II:6 II:1 II:1 II:1 II:1 II:1 II:1 III:5 II:1 II:1 II:3 I:1 II:2 I:1 I:1 I:1 I:1 I:1 NL10 NL11 NL10 NL11 PAD, peripheral arterial disease; PM, pacemaker; PPCM, peripartum cardiomyopathy; PTCA, percutaneous transluminal coronary transluminal angioplasty; ventricular percutaneous arterial peripheral pacemaker; PVC, premature PPCM, peripartum PTCA, disease; PM, cardiomyopathy; PAD, SA1 SA1 SA1 SA1 platelet count; HF, heart failure; HTX, heart transplantation; ICD, implantable cardiac defibrillator; IV, intraventricular; LA, left left block; anterior; bundle branch lung LBBB, LE, intraventricular; defibrillator; heart IV, cardiac implantable HTX, failure; heart ICD, transplantation; HF, count; platelet GER1 GER2 GER3 GER4 GER5 GER6 GER2 GER5 GER6 artificial insemination; AVB, atrioventricular block; (Bi)(L)VAD, (bi)(left) ventricular assist device; CRT, cardiac resynchronization therapy; CS, caesarean section; CVA, cerebral vascular cerebral therapy; section; CS, caesarean resynchronization CVA, cardiac CRT, (bi)(left)ventricular assist device; block; (Bi)(L)VAD, atrioventricular AVB, artificial insemination; enzymes, liver low elevated hemolysis, HELPP, female; F, biopsy; endomyocardial EMB, coagulation; intravascular diffuse DIC, cardiomyopathy; dilated DCM, death; D, accident; Confirmed PPCM cases are displayed in bold; affected cases with clinical characteristics suggestive for PPCM are displayed in bold and italic. AF indicates atrial fibrillation; AF indicates in bold and italic. 
AI, are displayed for PPCM cases with clinical characteristics in bold; affected suggestive displayed PPCM cases are Confirmed embolism; LV, left ventricle; LVEF, left ventricular ejection fraction; M, male; MI, myocardial infarction; MOF, multiple organ failure; P, pregnancy; PAC, premature atrial contraction; premature pregnancy; PAC, P, failure; multiple organ infarction; MOF, left ejection ventricular fraction;MI, myocardial M, male; left LVEF, ventricle; embolism; LV, ventricular tachycardia. VT, fibrillation; ventricular VF, ischemic attack; TIA, transient syndrome; death SIDS, sudden infant death; sudden cardiac still birth; SB, contraction; SCD, GER1 GER3 GER4 GER5

Table S2. Overview of mutations classified as VUS1 or VUS2

Family  Tested patient  Gene    Amino acid change  Nucleotide change     Classification
NL2     III:2           TTN     p.Glu10855dup      c.32562_32564dupAGA   VUS1
NL4     III:2           LAMA4   p.Met1202Val       c.3604T>C             VUS1
NL4     III:2           PRKAG2  3'UTR              c.*2C>T               VUS1
NL4     III:2           TTN     p.Ala9135Pro       c.27403G>C            VUS1
NL5     II:1            PKP2    p.Asp26Asn         c.76C>T               VUS1
NL6     II:3            DMD     p.Asn2713Ser       c.8138T>C             VUS2
NL6     II:3            RYR2    p.Ile2721Thr       c.8162T>C             VUS2
NL6     II:3            TTN     p.Glu21080Lys      c.63238G>A            VUS1
NL7     II:2, III:1‡    MYBPC3  p.Ala833Thr        c.2497G>A             VUS2
NL7     II:2, III:1‡    TMEM43  p.Arg312Trp        c.934C>T              VUS1
NL7     II:2, III:1‡    TTN     p.Glu10855dup      c.32562_32564dupAGA   VUS1
NL8     III:3           RBM20   p.Ser637Asn        c.1910G>A             VUS2
NL10    II:6            TTN     p.Arg279Trp†       c.835C>T†             VUS2
NL10    II:6            TTN     p.Pro17045Ala      c.51133C>G            VUS2
NL11    III:1           TTN     p.Lys4401Glu       c.13201A>G            VUS1
SA1     II:5            MYPN    p.Ser774Tyr        c.2321C>A             VUS2
SA1     II:5            PKP2    p.Val842Ile        c.2524C>T             VUS1
SA1     II:5            TTN     p.Ser1400Thr       c.4199G>C             VUS1
SA1     II:5            TTN     p.Glu18378Lys      c.55132G>A            VUS1
SA1     II:5            TTN     p.Val32108Met      c.96322G>A            VUS2
SA1     II:5            TTN     p.Arg33402Cys      c.100204C>T           VUS2
GER2    II:1            PKP2    p.Ile531Ser        c.1592T>G             VUS1
GER2    II:1            TTN     p.Arg1408Cys       c.4222C>T             VUS2
GER5    II:1            TTN     p.Glu15076Asp      c.45228G>C            VUS1
GER5    II:1            TTN     p.Ile17461Thr      c.52382T>C            VUS2
GER6    II:1            TTN     p.Glu2076Gly       c.6227A>G             VUS2

Nomenclature according to HGVS (Human Genome Variation Society) using the reference sequences: TTN (NM_001256850.1; Q8WZ42-1), LAMA4 (NM_001105206.1), PRKAG2 (NM_016203.3), PKP2 (NM_004572.3), DMD (NM_004006.2), RYR2 (NM_001035.2), MYBPC3 (NM_000256.3), TMEM43 (NM_024334.2), RBM20 (NM_001134363.1), MYPN (NM_032578.2). VUS indicates variant of unknown clinical significance (VUS1, unlikely to be pathogenic; VUS2, uncertain).
‡ II:2 and III:1 were both analyzed; only shared mutations were investigated further (analyzed in silico).
† Pathogenic mutation on the same allele: p.Arg23956Thrfs*9 (c.71867_71876delGAGTTCTGGA).
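As a small illustration of the HGVS notation used in Table S2, the following Python sketch checks that the codon implied by a coding-DNA position agrees with the residue number in the protein-level description. It relies on the HGVS convention that c. numbering starts at the A of the ATG start codon, so that coding position n lies in codon (n - 1) // 3 + 1. The three rows are copied from the table; the function names and regular expressions are illustrative assumptions, and only simple single-nucleotide substitutions are handled (duplications, deletions and UTR positions are deliberately left out).

```python
import re

# Three rows copied from Table S2: (gene, amino acid change, nucleotide change).
ROWS = [
    ("TTN", "p.Ala9135Pro", "c.27403G>C"),
    ("PKP2", "p.Asp26Asn", "c.76C>T"),
    ("RBM20", "p.Ser637Asn", "c.1910G>A"),
]

# Illustrative patterns for simple missense substitutions only.
PROTEIN_RE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")
CODING_RE = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def codon_of(coding_position):
    """Return the 1-based codon number that contains a 1-based coding position."""
    return (coding_position - 1) // 3 + 1


def is_consistent(protein_change, nucleotide_change):
    """True if the c. position falls inside the codon named in the p. description."""
    protein = PROTEIN_RE.match(protein_change)
    coding = CODING_RE.match(nucleotide_change)
    if protein is None or coding is None:
        raise ValueError("only simple substitutions are supported in this sketch")
    return codon_of(int(coding.group(1))) == int(protein.group(2))


if __name__ == "__main__":
    for gene, protein_change, nucleotide_change in ROWS:
        print(gene, protein_change, nucleotide_change,
              is_consistent(protein_change, nucleotide_change))
```

A check of this kind only catches transcription errors between the two columns; it says nothing about whether the variant itself is correctly called or pathogenic.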


CHAPTER 5 DISCUSSION


Discussion and future perspectives

Anna Posafalvi

Cardiomyopathy is an insidious disease of the myocardium, which can manifest with a wide range of symptoms at various ages, but which usually presents in adulthood. Currently, there are 76 genes known to be involved in the familial form of this disease (for an overview of known disease genes, see the preface of this thesis). Cardiomyopathy has several subtypes in which impairment of various molecular pathways leads to insufficient circulation (as reviewed by Teekakirikul et al). Hypertrophic cardiomyopathy (HCM) was initially thought to be primarily a disease of sarcomeric proteins, while arrhythmogenic right ventricular cardiomyopathy (ARVC) was considered mostly a disease of the desmosomal complex. Restrictive cardiomyopathy has been frequently shown to be caused by desmin (and sometimes sarcomeric) mutations. In addition to these molecules, a large number of proteins responsible for the construction of the cytoskeleton and the nuclear envelope or having a role in calcium/sodium handling have been shown to be involved in dilated cardiomyopathy (DCM) (see review by Posafalvi et al). However, there is increasing evidence that it is not only the phenotypic characteristics of these cardiomyopathy subtypes that are entwined and overlapping, but that the same overlapping pattern is present in their genetic background, as mutations of known genes are increasingly discovered to underlie other subtypes of the disease (Teekakirikul et al). An example of this overlap is described in chapter 4.1 of this thesis: our diagnostic screening of 55 genes implicated in different types of cardiomyopathy led to the discovery of potentially pathogenic variants in genes that would not have been chosen for sequencing in the earlier Sanger-sequencing era. At that time, decisions about which genes to sequence were made based on the clinical phenotype of the patients, and screening was limited to a small number of genes per patient.
In addition to the examples of genes now shown to be involved in previously unexpected cardiomyopathy subtypes (reported in chapter 4.1), the complicated genetic overlap among the types of disorder that were already known is visualized in figure 2 of the preface. Traditionally, Sanger-sequencing of a few disease genes was the standard method used in genetic diagnostics of cardiomyopathies, and the same method was also applied to the screening of novel candidate genes in a research setting. The recent development of whole genome, exome, and gene panel-based high-throughput sequencing technologies created revolutionary possibilities for both diagnostic and research-related applications. The work described in this thesis shows the recent impact of these technical developments on diagnostics and research in cardiogenetics.

Due to the difficulty of defining a small and distinct set of candidate genes to be screened based purely on the specific phenotypic features that a familial cardiomyopathy patient exhibits, we currently apply the gene panel-based targeted next-generation sequencing described in chapter 4 in the routine DNA diagnostics of cardiomyopathies. Using this technique we are able to sequence 55 well-established disease genes in one experiment, and are in some cases able to identify the genetic explanation of the disease in a gene which would not have been chosen for screening by classical Sanger-sequencing based on the patient’s phenotype (for examples, see chapter 4.1). In those individuals whose cardiac health problems could not be explained by genetic variation in the 55 known genes, exome sequencing combined with a haplotype sharing test (when appropriate and depending on the size of the family) seemed to be an effective way of searching for novel candidate disease genes (as shown in chapter 3) that can later be searched for in screens of larger cardiomyopathy cohorts by Sanger-sequencing (as shown in chapter 2). This approach requires phenotypically well-characterized, multi-generational families in which the affected/healthy disease status of individuals has been clearly determined. A flowchart of the cardiogenetics workflow as currently applied in our department is shown in figure 1.

How can we know that we have found the true causative variant?

When sequencing a set of 55 disease genes, there is a fair chance that we will identify likely causal genetic variants in at least one of them. For this reason, we need to take further steps to verify that the variant we are looking at is truly the cause of the patient’s disease. After checking the predicted pathogenicity of the variant using multiple software packages, the presence/absence and (if applicable) frequency of the variant in different population frequency databases (such as GoNL, and the slightly more critically handled dbSNP, 1000G or ESP, which may contain causative variants as well) and performing segregation analysis in the family (checking if all affected family members carry the putative causative variant), we might be able to finally classify the variant as ‘benign’, ‘likely benign’, ‘variant of unknown significance’, ‘likely pathogenic’ or ‘pathogenic’. In this thesis, we have used strict and robust criteria for variant classification (see chapters 2 and 4 for description and examples). Additionally, we may easily screen a patient cohort to search for an additional carrier of the same variant, and if we identify further (unrelated) patients carrying the same mutation, we would apply haplotype analysis in the hope of discovering a potential founder effect.
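The three lines of evidence named above (in silico prediction, population frequency and segregation) can be combined in a simple triage routine. The Python sketch below is purely illustrative and is not the classification scheme used in chapters 2 and 4: the field names, the 0.1% frequency cut-off, the majority rule for prediction tools and the reduction to three output labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class VariantEvidence:
    """Evidence collected for one candidate variant (hypothetical fields)."""
    deleterious_predictions: int  # tools predicting a damaging effect
    total_predictions: int        # tools consulted
    population_frequency: float   # highest allele frequency in reference databases, 0.0 if absent
    affected_carriers: int        # affected relatives who carry the variant
    affected_tested: int          # affected relatives who were genotyped


def segregates(evidence):
    """True if every tested affected relative carries the variant."""
    return evidence.affected_tested > 0 and evidence.affected_carriers == evidence.affected_tested


def triage(evidence, max_frequency=0.001):
    """Return a provisional label; the threshold and rules are assumptions for this sketch."""
    if evidence.population_frequency > max_frequency:
        return "likely benign"
    predicted_damaging = (
        evidence.total_predictions > 0
        and evidence.deleterious_predictions / evidence.total_predictions >= 0.5
    )
    if predicted_damaging and segregates(evidence):
        return "likely pathogenic"
    return "variant of unknown significance"


if __name__ == "__main__":
    example = VariantEvidence(3, 4, 0.0, 3, 3)
    print(triage(example))
```

In practice the final label would also depend on functional data, the type of variant and the gene involved, none of which this sketch models.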

[Figure 1, flowchart content]

DNA sample of cardiomyopathy index patient

DIAGNOSTICS: gene-panel based targeted NGS (currently for 55 cardiomyopathy genes)
  "solved cases": in about 50% of patients we identify the (likely) pathogenic mutation(s) in one or more known cardiomyopathy gene(s)
  "unsolved cases": hunting for a novel disease gene

RESEARCH (HST and/or exome sequencing)
  "solved cases": mutation identified in a novel cardiomyopathy gene
    candidate gene screening: searching for additional patients with mutations of the same gene in a large cohort
    functional follow-up on the novel gene
  "unsolved cases": missing heritability

Figure 1: Current cardiogenetics workflow. Since gene panel-based targeted-sequencing is a straightforward approach that sequences all cardiomyopathy disease genes in one experiment, we now implement this as a routine diagnostic screening test. Unsolved cases might later be subject to haplotype sharing analysis, whole exome or genome sequencing, or other disease-gene hunting methods. Novel disease genes identified in these ways are then Sanger-sequenced in large patient cohorts (although we might expect some of these to be private mutations/genes in the families examined), and may be further investigated functionally. In order to ensure up-to-date DNA diagnostics, newly discovered and well-established disease genes can periodically be added to the targeted enrichment kit used for gene panel-based sequencing.

In order to have a more precise idea of the potential pathogenicity level of genetic variants, and to be able to better prioritize variants in large datasets (e.g. as a result of exome/genome sequencing), it is crucial that more reliable and standardized prediction programs and software become available. For example, the novel Combined Annotation-Dependent Depletion tool seems to outperform existing software and sources in predicting deleteriousness via incorporating known databases and tools as well as results of the ENCODE project (Kircher et al). Other bioinformatics tools such as well-established annotation databases (a good example is the Cardiovascular Gene Ontology Annotation Initiative) and network tools (for instance the co-expression

network Cytoscape, or protein interaction networks which contain functional information on the genes supported by the literature) have also proven to be of great utility. Chapter 3.1 is a straightforward demonstration of how to use such sources in interpreting high-throughput sequencing data. Ultimately, the best way to prove pathogenicity is to perform functional studies on the identified variants themselves, and examples of functional analyses via in vitro experiments are also described in this thesis. In chapter 2.1 we show how we tried to experimentally evaluate the expected pathogenicity of RBM20 variants and mutations using a splicing assay. In chapter 3.2 we measured the enzyme activity of superoxide dismutase in patient-derived fibroblasts in order to prove the pathogenicity of a missense variant of the Mn-binding pocket. Finally, in chapter 4.2 we analysed the titin isoform composition and passive force generation of single cardiomyocytes isolated from explanted tissue of a TTN frameshift mutation carrier. There are other ways of acquiring further evidence on the pathogenic nature of the detected genetic variant via functional analysis. A popular but time-intensive method is to set up a knock out/knock in gene in an animal model.
Examples of the use of such models to gain more knowledge about the general function of a gene or protein related to the content of this thesis are:
• the RBM20 knock out rat, which has been used for the identification of target RNA molecules of the spliceosomal RBM20 via sequencing of RNA isolated from heart biopsies of mutant and wild type animals (Guo et al)
• the null-mutant, tissue- and isoform-specific knock out PLEC mice, from which much has been learned about the function of plectin in the past decades (Winter et al)
• the lethal mice and Drosophila SOD2 knock outs suggesting the essential role of this enzyme in the heart (Li et al, Kirby et al)
• the zebrafish model showing the effect of COBL knock out on embryonal development of the neural tube and heart (Ravanelli et al).

However, creating an animal model carrying the homolog of the investigated gene with an identical mutation to the one our patient carries is usually a complicated job, which only the recent development of novel gene targeting technologies (TALEN and CRISPR/Cas) makes more feasible (Menke 2013). Alternatively, fibroblasts may be acquired from the patient (via a ’simple’ skin biopsy), reprogrammed into iPS cells, and then differentiated into specific cell types such as cardiomyocytes. These cells will be genetically (and also theoretically phenotypically) identical to those in the heart of the

patient, yet are acquired in a less invasive way than a cardiac biopsy sample obtained via catheterization. The derived cells can be used to examine arrhythmogenic cardiac phenotypes and the underlying molecular pathways, as well as to investigate potential opportunities for personalized and/or regenerative therapy. However, this novel technique has been criticized for the low yield of cardiomyocytes produced, their tendency to dedifferentiate and the immature electrophysiological character of the derived cells. It is also crucial to keep in mind that the derived cardiomyocytes carry many other genetic variants (mutations and polymorphisms alike) present in the patient. Therefore, the combination of this technique with a rescue experiment is necessary to exclude other variants from the disease pathomechanism and to prove the sole pathogenicity of the candidate variant under investigation (Sinnecker et al, Knollmann et al).

Where is the “missing heritability” and what indications do we have of the mechanisms in cardiomyopathies?

To date, a significant proportion of familial cardiomyopathies (about 30-40% of HCM, 40-50% of ARVC, and around 50% of DCM cases) remain genetically unexplained. Below we describe some of the possible underappreciated mechanisms that might be behind the “missing heritability” for cardiomyopathies on the DNA, RNA and protein levels, respectively.

1. On the DNA level

In past decades, cardiomyopathy was primarily considered a monogenic disorder, most often exhibiting an autosomal dominant pattern. In the families that do not carry mutations of the 76 disease genes identified so far, we can

search for novel candidate genes by exome or whole genome sequencing, when appropriate, in combination with the haplotype sharing test. It is important to keep in mind that some of these families might carry private mutations and no additional affected carriers will be identified in follow-up screening of large patient cohorts. Chapter 3 shows some examples of how we tried to identify novel disease genes in one autosomal recessive family and several autosomal dominant families, with the latter group being naturally much more challenging. The potential oligogenic background of late onset heart diseases is an increasingly popular concept, with a growing number of publications in HCM and ARVC supporting this idea. We also discuss it in chapters 2.2 (in which

[Figure 2, diagram content. Vertical axis, variant effect size: large (Mendelian/monogenic; di-/trigenic), medium (oligogenic), small (complex cases, series of risk factors). Horizontal axis, variant frequency: very rare variant (below 0.1%), rare variant (0.1-1%), uncommon variant (1-10%), common variant (above 10%). Along the frequency axis the cases shift from familial to solitary, with increasing environmental influence.]

Figure 2: Different disease models in cardiomyopathies. Rare genetic variants of large effect size cause Mendelian, monogenic diseases, while variants of relatively high frequency, but small effect size, are the ones classically identified by GWAS, and associated with certain complex phenotypes. Familial cardiomyopathies are traditionally considered and investigated as monogenic disorders, while a smaller number of studies have tried to establish genetic associations in relatively “large” cohorts of not necessarily familial forms of the disease (between HLA genotypes and cardiac phenotype, for example). Recently there were a few reports of di- and oligogenic cardiomyopathy cases, which suggest the possibility that variants of relatively low frequency and medium effect size may increase an individual’s susceptibility to the disease and may also mediate the environmental influence on the disease onset and phenotypic variability. The degree of darkness of the clouds indicates how well studied those disease models are in cardiomyopathies.
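The frequency classes on the horizontal axis of figure 2 can be written down as a small lookup function. The Python sketch below uses the four class boundaries given in the figure (0.1%, 1% and 10%); how values falling exactly on a boundary are assigned is not specified in the figure and is an assumption here, as is the function name.

```python
def frequency_class(allele_frequency):
    """Map an allele frequency (fraction between 0 and 1) to a figure 2 class.

    Boundary handling (0.1% and 1% go to the higher class, 10% to 'uncommon')
    is an assumption made for this sketch.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must be a fraction between 0 and 1")
    if allele_frequency < 0.001:
        return "very rare"
    if allele_frequency < 0.01:
        return "rare"
    if allele_frequency <= 0.10:
        return "uncommon"
    return "common"


if __name__ == "__main__":
    for frequency in (0.0005, 0.005, 0.05, 0.25):
        print(frequency, frequency_class(frequency))
```

Such a function only labels the frequency axis; the effect size of a variant, the other axis of the figure, cannot be derived from frequency alone.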

we argue that genetic variants of PLEC are expected to be involved in passing the threshold needed for the manifestation of ARVC) and 4.1 (in which we show that 15% of our diagnostically screened patients carry more than one potentially pathogenic variant). Our results suggest that it is not only novel or very rare variants with large effect size that may be implicated in the disease,

but that there are also several low frequency variants with slightly lower effect size that may increase the genetic susceptibility for cardiomyopathy in a non-monogenic disease model (see figure 2). Additionally, the association of cardiomyopathies with complex diseases such as diabetes or coeliac disease has been observed, supporting the idea that alleles of relatively high frequency and low effect size may be involved in a multifactorial background of the disease. For example, the increased risk for DCM in patients with coeliac disease (a disease known for its complex genetics) was apparent, but statistically not significant in a large population-based study (Emilsson et al), while “diabetic cardiomyopathy” is a well-known disease entity that can be treated with targeted antioxidant therapy (reviewed by Huynh et al). There are also a couple of studies that support the idea of the complex genetics of cardiomyopathy by marking the connection between certain HLA genotypes (for example, the HLA-DQB1 0309 allele) and DCM (Pankuweit et al). Yet, due to the relatively low incidence of the disease and the low number of affected individuals, it is not easy to perform classical genome-wide association studies (GWAS) in cardiomyopathy while looking for risk or protective factors. It may be possible to compare frequencies of genetic variants of cardiomyopathy genes between patient and control cohorts upon DNA sequencing, but this will require the collection of material from patient cohorts from many different laboratories, while also taking into account the ethnic background of these patients. The role of mitochondrial processes in cardiomyopathies is evident, yet only a few genes related to these processes are shown to be the potential cause of the disease.
In this thesis, we have described a mutation of the chromosomally encoded mitochondrial enzyme SOD2, which led to lethal cardiomyopathy with additional mitochondrial symptoms in a homozygous newborn (chapter 3.2). There are also mitochondrially encoded tRNA genes that have been reported to be causative, such as the mutation of the gene encoding tRNA glutamic acid that in nearly homoplasmic state proved fatal in an infant (Van Hove et al). In the case of inherited cardiomyopathies, there has not yet been enough attention paid to possible large indels and copy number variations (CNVs). There are only a few examples of CNVs identified to date: those of the BAG3 gene via array CGH (Norton et al), single large deletions of LMNA in DCM (Gupta et al) and PKP2 in ARVC (Li Mura et al), and a large duplication observed in MYBPC3 in HCM (Meyer T et al). The remaining unsolved affected families may also have duplications, large insertions, or deletions underlying their

phenotypes. Hopefully, in the future, exome and whole genome sequencing methods will provide us with sufficient information on these types of genetic variants. Epigenetic regulations (the potential mediators of gene-environment interaction via chromatin modification) have never been associated with familial cardiomyopathies. Instead, the influence of the environment is expected to trigger the onset of the phenotype in other ways. For example, pregnancy in individuals who are genetically susceptible for DCM causes earlier onset of the disease (see chapter 4.2), while stress and over-exercising probably contribute to individuals passing the thresholds for the development of an ARVC phenotype (Perrin et al).
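The comparison of variant frequencies between patient and control cohorts mentioned in this section could, for a single gene, take the form of a carrier-count comparison. The Python sketch below implements a two-sided Fisher's exact test on a 2x2 table using only the standard library. The test itself is a standard technique and is not taken from this thesis; the cohort sizes and carrier counts in the usage example are invented for illustration, and a real analysis would also need to account for ancestry and for multiple testing across genes.

```python
from math import comb


def fisher_exact_two_sided(case_carriers, case_noncarriers,
                           control_carriers, control_noncarriers):
    """Two-sided Fisher's exact test p-value for a 2x2 carrier table."""
    cases = case_carriers + case_noncarriers
    controls = control_carriers + control_noncarriers
    carriers = case_carriers + control_carriers
    total = cases + controls

    def table_probability(x):
        # Hypergeometric probability of x carriers among the cases,
        # with all row and column totals held fixed.
        return comb(cases, x) * comb(controls, carriers - x) / comb(total, carriers)

    observed = table_probability(case_carriers)
    lowest = max(0, carriers - controls)
    highest = min(cases, carriers)
    # Sum over all tables that are at most as probable as the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: 12 carriers among 200 patients, 5 among 400 controls.
    print(fisher_exact_two_sided(12, 188, 5, 395))
```

An exact test is chosen here because carrier counts for rare variants are typically too small for a chi-square approximation to be reliable.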

2. On the RNA level

The discovery of RBM20 mutations and the multiple RBM20-target molecules and their heart-specific splicing pattern marked the beginning of a new era in cardiogenetics, and this RNA-based pathway is also closely examined in this thesis (chapter 2.1). Yet we still do not know much about the potential role of miRNAs in cardiomyopathy, and the potential differential splicing effect of variants of known cardiomyopathy genes is also usually underestimated. These can easily be investigated by RNA sequencing. Perhaps the most exciting problem related to the role of RNA molecules in cardiomyopathy is that of the titin gene (TTN). Encoding the largest known human protein, TTN has been connected to heart failure (Hein et al) and DCM (Gerull et al) for about two decades, but was never extensively screened due to its enormous size (~0.3Mb). Making things more complicated, TTN is not only large, but also has a highly complex pattern of post-translational modifications on the protein level. TTN also undergoes random changes on the RNA-level before translation: during its age-dependent splicing it randomly loses a gradually increasing part of the gene between exons 50 and 219 (Guo et al). TTN has been recently reported to harbour truncating variants in familial and sporadic DCM (Herman et al), and is nowadays often screened for due to the availability of easy-to-perform gene panel-based sequencing platforms (also shown in chapters 4.1 and 4.2). The inclusion of this gene in DNA-diagnostics resulted in the identification of truncating mutations in ~15% of DCM cases (chapter 4.1). Despite these advances in screening, a problem we continue to face is that we might be underestimating the importance of missense variants. It is possible that the transcribed mRNA

molecules carrying truncating variants are subject to nonsense mediated decay leading to decreased protein production that only becomes a serious issue in homozygous state, while the right missense variant could disrupt a domain of key importance in the encoded protein and perturb its function in a heterozygous form. Yet despite the existence of some limited literature on functional evaluation of TTN missense variants (e.g. a missense mutation of the N2B domain specifically expressed in cardiac isoforms of titin caused a cardiomyopathy-like phenotype in zebrafish (Xu et al), and further N2B mutations were shown to affect the binding of various interacting proteins via yeast-two-hybrid assays by Matsumoto et al), we are biased towards the truncating mutations due to the recent finding of Herman et al that up to 25% of familial DCM is caused by them. If a truncating variant is located in one of the exons that might get spliced out in some individuals (hence preventing the onset of any sort of heart symptoms), it is quite difficult to correctly determine the pathogenicity level even for TTN truncations, let alone missense variants. In chapter 4.2, we have performed a functional experiment measuring passive force in single isolated patient cardiomyocytes, and our result supported the “pathogenic” labelling of that studied frameshift variant.

3. On the protein level

Though it has not received much attention thus far, protein aggregation, a current focus in the field of neurodegeneration, may also be related to the pathomechanism of cardiomyopathies. There are examples in the literature showing that certain proteins do form aggregates and are therefore expected to lead to cardiovascular abnormalities. It has, for instance, been previously shown that PLEC knock out mouse models as well as skin biopsies of PLEC mutant patients with EBS-MD have large, desmin-positive protein aggregates accumulating in their cells (reviewed in Winter & Wiche and also mentioned in chapter 2.2). An exciting, translational potential of this mechanism was demonstrated by the recent discovery that protein aggregation could be inhibited and the phenotype improved in the muscles of plectin deficient conditional knock out mice by the chemical chaperone 4-phenylbutyrate (Winter et al 2014). Desmin aggregation is also a known phenomenon in heart failure (Sanbe et al) and in desminopathies (myopathies and cardiomyopathies related to abnormal desmin) caused by mutations of DES (desmin), CRYAB (alpha-B-crystallin or small heat shock protein), MYOT (myotilin), BAG3 (BCL2-associated athanogene 3), LDB3 (LIM domain-binding 3), or FLNC (filamin C) (reviewed by Goldfarb et al). Interestingly, the expression of the BAG3 gene (for which a large deletion of about 8 kbp and point mutations have been reported in DCM, Norton et al) was shown to suppress the aggregation and cytotoxic effect of mutant CRYAB in cultured cells (Hishiya et al), a discovery that links the two genes to a mutual pathway. PSEN1 and PSEN2, the genes connected to Alzheimer disease as well as cardiomyopathy, are also known to be involved in the formation of amyloid plaques in the myocardium of DCM patients (Gianni et al). Deletions of the PLN gene were also recently shown to lead to perinuclear aggregates of the encoded protein in the hearts of deceased DCM and ARVC patients (manuscript submitted). Hopefully, a better understanding of protein aggregation in cardiomyopathies will open up novel possibilities of targeted therapy using various molecules with chaperone activity.

Further aspects and mechanisms

While not yet extensively studied, an interesting observation is that there are some gender differences observed in the epidemiology, genetics, and clinical course of autosomal inherited cardiomyopathies (reviewed by Meyer S et al and Fairweather et al). Beyond the environmental influence of the cardiovascular challenges occurring during pregnancy that trigger peripartum cardiomyopathy (PPCM) or DCM at an earlier age in genetically susceptible women (see chapter 4.2), there are also hormone-related pathways involved in the pathomechanism. For example, male LMNA carriers are more severely affected than females, and this observation was associated with the nuclear accumulation of androgen receptors in LMNA mutant mice (Arimura et al). In contrast, a recent retrospective study found no worsening of symptoms in LMNA mutation-carrying women during pregnancy (Palojoki et al). The practical implications of gender differences for the diagnosis, management, and pharmacotherapy of cardiomyopathies were discussed in detail by Fairweather et al. Another question is how certain genes can be involved in the pathomechanisms of several diseases affecting multiple organs leading to a combined phenotype, while in other cases the same genes only cause the disease of one organ. There is a well-known correlation between ARVC, generalized myopathy and various skin diseases. For instance, truncating mutations of PLEC cause epidermolysis bullosa, yet missense mutations are observed in

cardiomyopathy without the involvement of blistered skin (chapter 2.2). But there are many other desmosome-related genes also involved in dermatological diseases: for example, mutations in JUP cause palmoplantar keratoderma with woolly hair, while in DSP they may result in lethal acantholytic epidermolysis bullosa, or skin fragility with woolly hair. Systemic muscular involvement occurs quite frequently in cardiomyopathies: e.g. LMNA mutations were found in limb-girdle dystrophy and lipodystrophy besides DCM. PSEN1 and PSEN2 genes, when mutated, lead to neurodegeneration (Alzheimer’s disease), just as mutations in the potassium channel KCND3 are associated with another disease of the central nervous system, spinocerebellar ataxia, and with Brugada syndrome (characterized by lethal arrhythmia) (Duarri et al). Mutations in cardiomyopathy genes may affect the health of the sensory organs as well, for instance, in the case of EYA4 causing hearing loss with DCM and CRYAB causing cataract, and/or myofibrillar myopathy with DCM.

…about pharmacogenetics in a nutshell

The basic principle of personalized medicine and pharmacogenetics was created some fifty years ago with the idea that serious side-effects could be prevented and the therapeutic response optimized, if only we were able to give the right medicine in the right dose to the right patient, making the right decision based on his/her individual genetic make-up. Even though genetic research has gone through unprecedented development in the past few decades, and pharmacovigilance databases provide excellent research material for such studies, the number of truly practical implications in patient stratification is still limited. We have some examples showing the efforts to stratify cardiomyopathy patients, yet these mostly resulted in treatment protocols based not on the genetic background but rather on the symptoms of the patients. For instance, patients suffering from DCM with asymptomatic systolic dysfunction are thought to benefit from pharmacological treatment (Colucci et al). Further examples include the recent observation that PPCM patients may have improved left ventricular ejection fraction when under bromocriptine treatment (Sliwa et al), or that ARVC patients carrying a PLN p.Arg14del mutation need the implantation of an ICD earlier than other ARVC patients (van der Zwaag et al). Based on the genetic background of a patient, classical pharmacodynamic or pharmacokinetic pharmacogenetics could be implemented. Pharmacodynamic pharmacogenetics of cardiomyopathies is not yet a rewarding research field, because cardiomyopathy is usually treated with widely used cardiovascular drugs (such as beta blockers, ACE inhibitors, or calcium channel blockers). These are not known to cause devastating adverse drug reactions (bizarre or type B ADRs) and, if not well tolerated by the patient, are easily replaced by a comparable drug targeting a different pathway. In contrast, pharmacokinetic pharmacogenetics may be much more promising, because it is of utmost importance that these drugs are administered in the right dose, taking into account the patients’ metabolic abilities to achieve optimal blood concentrations of the drug. Genes and the SNPs observed to have an influence on the blood concentration of certain cardiomyopathy medicines could be included in the cardiomyopathy gene panel tests in the future. This would mean that, in parallel with the molecular diagnosis of a cardiomyopathy patient, we could also obtain sequence information to help in immediately adjusting the dose of the drug, and this would facilitate complex counselling (as attempted following personal genome sequencing by Ashley et al). Yet, at this moment, alleles of known SNPs of genes associated with slower/faster drug metabolism can be identified much faster, more cheaply and more easily using a genotyping array of limited size. Also, the complete lack of knowledge about the truly functional genetic variants means it is currently not worthwhile to apply sequencing for patient stratification. In the past decades, another very exciting research area of stratified medicine related to cardiomyopathies has been the struggle to find out why certain drugs used for the treatment of other diseases lead to cardiomyopathy as a result of cardiotoxic side-effects (reviewed by Ky et al).
An example is the dilated cardiomyopathy frequently observed after anticancer treatment using anthracycline molecules (briefly touched upon in chapter 3.2). Even though some patients are in danger of being sensitive to the cardiotoxicity of, for example, doxorubicin, they might not have any alternative treatment option available. Different methods of drug formulation, and hopefully better preventive combinations will soon be available to alleviate the toxic side-effects (reviewed by Octavia et al and Carvalho et al).

CONCLUSIONS

Cardiomyopathy is both a clinically and genetically complex disorder. Even though currently 76 genes are known to be involved in the heritable forms of the disease, we cannot explain the familial accumulation of the phenotype

in many cases. This thesis provides an overview of the development of molecular genetic methods implemented during recent years in the research and diagnostics of cardiomyopathies. It contributes to the field through the discovery of novel disease genes as well as through the establishment of new and highly effective methods for molecular diagnostics. In spite of the recent technological advances, the genetic cause of the disease often remains unknown in affected families, as do the complex interactions of environmental and genetic factors. Hopefully the molecular pathways underlying the disease will be extensively studied in the future, ultimately leading to novel translational solutions and practical implications for patients.



SUMMARY SAMENVATTING MAGYAR NYELVŰ ÖSSZEFOGLALÓ

SUMMARY

In my doctoral thesis, several aspects of the genetic background of cardiomyopathy, an insidious, complex group of hereditary heart diseases, were examined. This disorder usually develops in adulthood and manifests with diverse symptoms. While some patients have shortness of breath, chest pain or oedema, others may suffer from arrhythmia, embolism, and other severe symptoms. A relatively rare but extreme sign of the disease is sudden cardiac death, which most often occurs in athletes and football players. Although many environmental factors and/or other diseases (including muscular abnormalities, hormonal changes, some types of chemotherapeutic drugs, pregnancy-related cardiovascular challenges, alcoholism and drug abuse) are known to cause or trigger cardiomyopathy, certain genetic factors also increase susceptibility to the disorder.

Presently, we know of about 75 genes which, when mutated, play a role in the molecular pathomechanism and onset of cardiomyopathy. However, a significant subset of these genes has only been studied in limited numbers of patients and mostly in a research setting. The pathogenicity of the respective mutations is often based on in silico predictions but not yet supported by functional proof. Despite the known heterogeneity of the disease, some genes were only studied within the context of specific cardiomyopathy subtypes, and there are still a considerable number of patients or families whose phenotypes cannot be explained by mutations in these genes.

Therefore, the studies in this thesis aimed to (1) provide a better understanding of the genetic background and the molecular pathomechanism of familial cardiomyopathies, (2) identify novel disease genes in unsolved families, and (3) improve existing methods of molecular diagnostic testing.

The preface provides an easy-to-understand general introduction to cardiomyopathies and the challenges of the related genetic research.
Chapter 1 provides a more detailed, scientific introduction to the field of cardiogenetics. It reviews congenital and late-onset inherited heart diseases, categorizes the genes involved in different types of heritable heart diseases, and thoroughly describes those research methods with a great potential for future diagnostic application in cardiovascular diseases.

In the studies reported in chapter 2, the classical candidate gene screening approach via the traditional method of Sanger sequencing was applied to study the involvement of candidate genes in disease development of two cardiomyopathy subtypes: dilated cardiomyopathy (DCM) and arrhythmogenic right ventricular cardiomyopathy (ARVC).

In chapter 2.1, we report on our studies of the role of the DCM-related gene RNA-binding motif protein 20 (RBM20) in Dutch patients. We identified five known mutations of the arginine-serine (RS)-rich domain, and 18 novel missense variants. In total, 10 variants were classified as likely pathogenic or pathogenic. We then performed a functional follow-up of ten ‘interesting’ variants by using an in-house-developed splicing assay to study the transcripts produced by one of the recently identified heart-specific RBM20 targets, LDB3. Unfortunately, our results could not confirm differential splicing of LDB3 in HEK293 cells transfected with either wild-type or mutant RBM20-encoding plasmids, nor could these studies support evaluation of the potential pathogenicity of variants identified outside the RS-rich domain. Interestingly, two of our RBM20 mutation-carrying families manifest DCM in combination with peripartum cardiomyopathy (PPCM). This novel observation is not completely unexpected, since RBM20 is known to control splicing of the titin gene (TTN), and, in another study reported in this thesis, we have shown that mutations in this gene are a frequent underlying cause of familial PPCM/DCM (see details in chapter 4.2). Our findings suggest that abnormal titin isoform composition could be an essential shared pathway leading to PPCM in both TTN and RBM20 mutation carriers.

In chapter 2.2, the screening of the plectin gene (PLEC) as a novel desmosome-related candidate gene for ARVC is reported. Though plectin was formerly shown to be an essential component of the desmosomes and hemidesmosomes in skin, muscle and heart, and was known for a decade to carry homozygous truncating/frameshift mutations in blistering skin diseases with muscular involvement, we were the first to investigate its potential disease-causing role in cardiomyopathy.
We identified numerous missense variants in patients, and compared the patient-related variation with the general variation in PLEC in the Genome of the Netherlands control cohort. In ARVC patients from both the Dutch and the British cohort, we identified one region of PLEC that is rich in mostly novel variants with a high predicted pathogenicity level, whereas the same region shows a “variant desert” in controls. This region is localized in the homo-dimerizing ROD domain, underscoring the particular importance of the mechanical resistance of this cytolinker protein and suggesting a role for mutations in this domain in disease progression. In conclusion, missense variants (in particular the ones located in this region) might act as risk factors in the oligogenic background of the disease, adding to the various genetic and non-genetic factors that together can exceed the threshold needed for the manifestation of ARVC.
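The patient-versus-control comparison described above can be illustrated with a small sketch. It is not the analysis used in the thesis: the function names, the fixed window size, the threshold and all residue positions below are hypothetical, chosen only to show how a region dense in patient variants but empty in controls (a "variant desert") could be flagged.

```python
from collections import Counter


def variants_per_window(positions, window_size):
    """Count variants per fixed-size window; window k covers residues k*w+1 .. (k+1)*w."""
    return Counter((pos - 1) // window_size for pos in positions)


def candidate_regions(patient_positions, control_positions, window_size, min_patient_variants):
    """Return (start, end, patient_count) for windows that contain at least
    min_patient_variants patient variants and no control variants at all."""
    patients = variants_per_window(patient_positions, window_size)
    controls = variants_per_window(control_positions, window_size)
    hits = []
    for window, count in sorted(patients.items()):
        if count >= min_patient_variants and controls.get(window, 0) == 0:
            start = window * window_size + 1
            hits.append((start, start + window_size - 1, count))
    return hits


# Hypothetical residue positions, invented for illustration only (not thesis data).
patient_positions = [120, 1510, 1525, 1560, 1590, 3900]
control_positions = [95, 130, 480, 2210, 2950, 3880, 4100]

regions = candidate_regions(patient_positions, control_positions,
                            window_size=100, min_patient_variants=3)
print(regions)
```

With these invented positions, only the window covering residues 1501 to 1600 is reported. A real comparison would also have to account for cohort size, variant frequency and predicted pathogenicity, which this sketch deliberately ignores.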

In chapter 3, we describe our studies utilizing the novel exome sequencing (ES) technique in order to identify novel cardiomyopathy genes (mostly private mutations of unsolved families, who were formerly screened for known disease genes). We performed ES in 12 families suffering from autosomal dominant cardiomyopathy and report on the results in chapter 3.1. We identified potentially causal genetic variants in 6/12 families, in the TTN (two families), FHL2, FLNC, COBL, and STARD13 genes. Importantly, earlier functional studies of the encoded proteins support their putative involvement in heart disease (though to different extents). Additionally, by evaluating the potential co-expression of these genes using a large expression array database, we found that all these genes are interconnected by a complex network of co-expressed genes. This network of 166 genes contains 28 that are already known to be associated with cardiomyopathy. Furthermore, 100 of them are listed in the Cardiovascular Gene Ontology Annotation database, suggesting a potential cardiac function and providing an excellent basis for genetic and functional follow-up studies. It is interesting to note that many of the encoded proteins are involved in the sarcomeric pathway, which is known to be the most prominent one in the molecular pathomechanism of several subtypes of cardiomyopathy.

We have investigated a consanguineous family with a child who passed away due to severe DCM a few days after birth and show the results in chapter 3.2. The nature and severity of the symptoms suggested a mitochondrial disease. Applying ES in combination with homozygosity mapping, we identified a homozygous missense mutation affecting the Mn-binding pocket of a mitochondrial protein encoded by the autosomal superoxide dismutase gene SOD2, which is located in the patient's longest autosomal homozygous region, on chromosome 6.
The absence of the gene was previously shown to lead to DCM in knock-out mice, but had not yet been found to underlie the disease in humans. Here, we confirmed the accumulation of oxygen radical substrates via functional experiments performed on fibroblast samples of the deceased patient, while excluding dysfunction of the mitochondrial respiratory chain complexes. Excitingly, the same pathway of SOD2-dependent accumulation of oxygen radicals has been known for about 20 years to be involved in the pathomechanism of cardiomyopathy that arises as a complication of anthracycline chemotherapy in cancer patients.

In chapter 3.3, we present a short report on the unusual case of a distantly consanguineous family with several patients affected by two distinct cardiomyopathy subtypes (late-onset DCM and neonatal DCM) and with three different genes (MYL2, SOD2, and JUP) underlying their diseases. The case presented is an excellent example of how essential genealogical linking and pedigree construction are for the proper interpretation of genetic findings, subsequent counselling and individual genetic follow-up studies in families affected by different forms of cardiomyopathies.

In chapter 4, we demonstrate the application of next generation sequencing (NGS) in routine diagnostic screening. To do so, we applied a different technical approach from that described in chapter 3. Instead of capturing and enriching for the whole exome, we performed targeted enrichment for a set of already known, previously published cardiomyopathy disease genes during sample preparation. This minimized both the costs of the NGS experiments and the time needed for the interpretation of the results, while the coverage is highly optimized for standardized screening. An additional advantage of this targeted enrichment method is that the longest gene of the entire human genome, TTN, which has been known for a decade or two to be involved in DCM but which was, in the past, too large to be routinely screened using Sanger sequencing, could now also be included in the enrichment kit. In chapter 4.1, we report on the quantitative advantages in diagnostics of this method, complemented with a rigorous variant classification system, over ‘old-fashioned’ Sanger sequencing: we have managed to solve 107/206 (52%) index cases of different types of cardiomyopathies with the help of this ‘cardiomyopathy panel’-based approach, and the yield was especially high for patients with DCM and DCM-like phenotypes.
In our sample, 30/206 (15%) patients had multiple mutations detected, underlining the importance of broadening the view of inherited cardiomyopathies from a classical Mendelian disease towards a more oligogenic disorder. Finally, in at least half of the cases the mutations were identified in genes that would not have been selected for candidate screening in the earlier era, when such decisions were based on the phenotype of the patient and the low or unknown frequencies of mutations in those genes. In chapter 4.2, we report on solving the genetic cause of 10/18 PPCM families by identifying (likely) pathogenic variants in one or more of the 48 sequenced known (mostly dilated) cardiomyopathy genes. Our results support the former, phenotype-based hypothesis that PPCM is not an independent subtype of cardiomyopathy, but rather a pregnancy-related manifestation of DCM, since these two ‘types’ of the disease also exhibit considerable overlap in their genetic background. Interestingly, seven of the ten families in which a (likely) pathogenic variant was identified carry the mutation in the TTN gene. Drastically decreased passive force development in single cardiomyocytes, as well as the switch in titin isoform composition measured in explanted heart tissue of one of these patients, supports upgrading the classification of (at least) the p.K15664Vfs*13 truncating variant to the pathogenic mutation category. Taken together, our results suggest a prominent role for TTN mutations in the development of PPCM.

In chapter 5, the tremendous technical advances influencing cardiogenetics during the past few years are discussed, the current workflow of routine diagnostics and research of cardiomyopathies in our department is illustrated, and many unsolved questions and potential future directions in this field of research are addressed.
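As a quick arithmetic check, the proportions quoted in this summary can be recomputed from the reported counts. The sketch below is only a convenience; the labels and the helper name `percent` are our own, and the counts are copied from the text above.

```python
from fractions import Fraction

# Counts as reported in this summary: (solved or affected, total).
reported = {
    "ES families with a candidate variant (chapter 3.1)": (6, 12),
    "panel index cases solved (chapter 4.1)": (107, 206),
    "panel index cases with multiple mutations (chapter 4.1)": (30, 206),
    "PPCM families solved (chapter 4.2)": (10, 18),
    "solved PPCM families with a TTN variant (chapter 4.2)": (7, 10),
}


def percent(numerator, denominator):
    """Return the proportion as a percentage rounded to the nearest integer."""
    return round(100 * Fraction(numerator, denominator))


yields = {label: percent(n, d) for label, (n, d) in reported.items()}
for label, value in yields.items():
    print(f"{label}: {value}%")
```

The rounded values for the panel (52% and 15%) match the percentages given in the text; the remaining ratios, which the summary states only as counts, correspond to 50%, 56% and 70%.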

LAY SUMMARY

Cardiomyopathy is an insidious disease of the heart that can cause mild symptoms like dizziness and chest pain, but might also result in irregular heart rhythm, and sometimes even heart failure or sudden cardiac death without any previous warning sign. This disease is largely influenced by heritable factors, and our aim was to gain more insight into the genetic causes of the disease. We applied various DNA sequencing methods (determining the genetic code of certain genes or regions) to identify novel mutations in known genes as well as novel disease genes in currently unexplained families with a repeated history of the disease. We have shown that the production of incorrect forms of an important building block of the heart muscle machinery leads to pregnancy-related cardiomyopathy, while variants in the rod domain of a novel gene may increase the fragility of the protein complex that “glues” together neighbouring cardiac muscle cells, leading to arrhythmogenic cardiomyopathy. Moreover, in a newborn patient we identified increased oxygen radical levels due to a mutation as the likely cause of the disease. From a diagnostic point of view, our panel of 55 known disease genes can be easily and reliably sequenced using our targeted sequencing method, which results in a much improved diagnostic yield and facilitates the screening of healthy-looking family members of patients to identify those who are also at risk of developing the disease.

NEDERLANDSE SAMENVATTING (DUTCH SUMMARY)

In this thesis I examined several aspects of the genetic background of cardiomyopathy, a group of hereditary heart disorders. Cardiomyopathy usually develops in adulthood and can present with a variety of symptoms. Some patients have shortness of breath, chest pain or oedema; others suffer from arrhythmias, embolisms or other severe symptoms. A rare but extreme manifestation is sudden cardiac death, which mostly occurs in athletes or football players.

Various factors can cause cardiomyopathy. Besides environmental factors and other diseases (muscle disorders, hormonal changes, chemotherapy, pregnancy-related cardiovascular problems, alcoholism and drug use) that can cause the disease, genetic (hereditary) factors also play an important role. At present, about 75 genes are known in which mutations (small errors) can clearly play a role in the onset and disease mechanism of cardiomyopathy. The effect of many of these genes has often been studied only in a small group of patients, and mostly only within scientific research projects. How harmful the mutations are to the function of the genes is often based only on computer predictions and not yet supported by evidence from functional studies. Although the disease is known to be genetically highly heterogeneous (a large number of genes can potentially be involved), the role of some genes has so far been studied only in the context of a single type of cardiomyopathy. Moreover, there are still many patients and families in whom the disease is not explained by mutations in the currently known cardiomyopathy genes.

The research in this thesis therefore aimed to: (1) increase our insight into the genetic background and the causes of hereditary cardiomyopathies, (2) discover new genes in families in which it was not yet known which mutations cause their disease, and (3) improve the existing methods of molecular diagnostics (making or confirming the correct diagnosis on the basis of DNA testing).

The preface is a general introduction to cardiomyopathies and the challenges of genetic research into these diseases. Chapter 1 is a more detailed, scientific introduction to the field of cardiogenetics. It gives an overview of congenital and acquired hereditary heart disorders and groups the genes involved in different kinds of hereditary heart disease. It also gives an extensive description of recently available and promising research methods that can be used for (future) diagnostics in hereditary heart conditions.

Chapter 2 describes studies in which candidate genes (genes suspected of playing a role in cardiomyopathies) were examined for possible mutations in the “classical” way: using a technique called Sanger sequencing. The role of these candidate genes was investigated in two types of cardiomyopathy: dilated cardiomyopathy (DCM) and arrhythmogenic right ventricular cardiomyopathy (ARVC).

In chapter 2.1 the role of the RNA-binding motif protein 20 (RBM20) gene, whose involvement in DCM had been described earlier, was studied in Dutch DCM patients. We found five previously described mutations in the arginine-serine (RS)-rich domain and 18 not previously reported missense mutations, both inside and outside this RS-rich domain. Of these 23 mutations, a total of 10 were assessed as pathogenic or likely pathogenic (harmful). We then studied 10 variants/mutations further using a method developed in-house (called a splicing assay). We examined whether these variants/mutations affect the composition of transcripts from the LDB3 gene, one of the genes whose alternative RNA splicing is regulated by RBM20. Unfortunately, with this method we were unable to demonstrate differences between the presence of wild-type and mutated RBM20. As a result, it was also not possible to find evidence for a possible role of the novel variants found outside the RS-rich domain. In two families with mutations in the RBM20 gene, DCM was found in combination with peripartum cardiomyopathy (PPCM). Because RBM20 is known to be involved in the alternative splicing of the titin gene (TTN) as well, and we know that mutations in TTN are often the underlying cause of familial PPCM/DCM (see chapter 4.2), this is not unexpected. Our findings suggest that an abnormal composition of titin isoforms is the underlying cause of the development of PPCM in carriers of mutations in both TTN and RBM20.

In chapter 2.2 we describe our study of the plectin gene (PLEC) as a possible novel candidate gene for ARVC. Although it had been shown earlier that plectin plays an essential role in desmosomes and hemidesmosomes in the skin, muscle and heart, and it had long been known that homozygous and compound heterozygous truncating mutations in this gene cause blistering diseases with muscle problems, we were the first to investigate a possible disease-causing role of this gene in cardiomyopathy. We found a large number of missense variants in this gene in Dutch and British patients, and compared the variation in patients with the variation present in healthy people from the Genome of the Netherlands cohort. This revealed an enrichment, in one region of the PLEC gene, of variants classified as likely pathogenic on the basis of in silico predictions, compared with the variants in this region in the healthy controls. In the latter there is virtually no variation in this region. The region lies in the so-called ROD domain, which is important for homodimerisation of the PLEC protein. We think that mutations in this domain can play a role in ARVC and impair the mechanical resistance of the protein. In summary, our results suggest that mutations in PLEC may be a risk factor that plays a role in the oligogenic basis of ARVC, and may contribute to the various genetic and non-genetic factors that cause someone to develop the disease.

In chapter 3 we describe the use of the new technique of exome sequencing (ES) to identify novel genes involved in cardiomyopathy; this mainly yielded mutations unique to the families studied, which had previously been screened for known disease genes without result. We applied this technique to 12 families with a hereditary form of cardiomyopathy; the results are described in chapter 3.1. In 6 of these 12 families we found genetic variants that probably cause the disease, namely in the genes TTN (in 2 families), FHL2, FLNC, COBL and STARD13. Earlier functional research had already indicated that these genes could be involved in hereditary heart disease. In addition, we used data from a large gene expression database to show that these genes are part of a complex network of genes that are co-expressed. This suggests that they may have a shared function. The network, consisting of 166 genes, contains 28 genes already known to be involved in cardiomyopathies. Also, 100 of these genes are described in the Cardiovascular Gene Ontology Annotation database. This indicates that the genes in this network potentially play a role in normal heart function. This network of genes therefore forms a good basis for further genetic and functional follow-up studies. Moreover, we noticed that a considerable proportion of these genes are involved in the functioning of the sarcomere, the structure in which mutations play a prominent role in the molecular disease mechanism of several cardiomyopathy subtypes.

We also studied a family in which a child of consanguineous parents died a few days after birth from a severe form of DCM. The results of this study are described in chapter 3.2. The severity and type of symptoms suggested a mitochondrial disease. Using exome sequencing in combination with homozygosity mapping, we identified a homozygous missense mutation located in the longest autosomal homozygous region in the patient's DNA, on chromosome 6. This mutation affects the Mn-binding domain of the mitochondrial protein encoded by the superoxide dismutase gene SOD2. It was already known that absence of this gene in knock-out mice leads to DCM; that this gene could also cause the human disease was not yet known. By studying cells from the deceased patient for accumulation of oxygen radicals and for correct functioning of the mitochondrial respiratory chain, we confirmed that the function of the gene was indeed disturbed in this patient. This is also an interesting observation in view of the role of SOD2-dependent accumulation of oxygen radicals, described years ago, in the development of cardiomyopathy as a side effect of anthracycline-based chemotherapy in cancer patients.

In chapter 3.3 we describe the unusual case of a consanguineous family with several patients with two different forms of cardiomyopathy (adult DCM and neonatal DCM). We show that three different genes are involved in their disease (MYL2, SOD2 and JUP). This case is a good example of how genealogical linking and pedigree reconstruction can be used for the correct interpretation of genetic results, and how this knowledge can help in proper counselling of, and genetic follow-up studies in, families in which different forms of cardiomyopathy occur.

In chapter 4 we demonstrate how the next generation sequencing (NGS) technique can be used in routine diagnostics. For this we used a different approach from that described in chapter 3. Instead of enriching and studying all genes, we used targeted enrichment of a set of genes known to be involved in cardiomyopathy. This lowered the costs considerably and allowed the results to be interpreted faster, while the high horizontal and good vertical coverage yielded high-quality results. An additional advantage of using NGS was that it became possible to include the TTN gene, the longest gene in the entire human genome and too large to screen routinely with Sanger sequencing. As a result this gene, which is known to be involved in DCM, can also be included in routine diagnostics.

The advantages of this method over the ‘old-fashioned’ technique (Sanger sequencing) are described in chapter 4.1. With this gene-panel-based approach we succeeded in solving (that is, finding the responsible gene for) 107/206 (52%) index patients with cardiomyopathy. The diagnostic yield was particularly high in patients with clinically proven or suspected DCM. In addition, more than one (likely) disease-causing mutation was detected in 30/206 (15%) patients. This observation supports earlier suggestions of a spectrum ranging from classically Mendelian to oligogenically inherited cardiomyopathies. In at least half of the patients, mutations were found in genes that would not have been examined by Sanger sequencing in the past, when decisions on genetic screening were based on the patient's symptoms and the reported frequency of mutations in genes.

In chapter 4.2 we describe how we found the genetic cause of the disease in 10/18 PPCM families by searching for harmful variants in one or more of the 48 known genes involved in (mainly dilated) cardiomyopathy. Our results confirm the idea that PPCM is not a subtype of cardiomyopathy but a pregnancy-related form of DCM, since these two ‘types’ of the disease also show significant overlap in their genetic background. Furthermore, in seven of the ten families in which a harmful variant was identified, a mutation was found in the TTN gene. In heart tissue from one of these patients a different composition of the titin isoforms was indeed found, and the functioning of this protein proved to be disturbed. Our results suggest that mutations in TTN play an important role in the development of PPCM.

In chapter 5 an overview is given of the enormous technical progress of recent years, and its influence on cardiogenetics is described. We also illustrate the use of NGS methods in our department in routine diagnostics and in scientific research into the genetic background of cardiomyopathies, and many unanswered questions and possible future lines of research are discussed.

(Dutch text translated by Dr Eva Teuling)

MAGYAR NYELVŰ ÖSSZEFOGLALÓ (HUNGARIAN-LANGUAGE SUMMARY)

In my doctoral thesis I studied the genetic background of hereditary cardiomyopathies. This disease (a term that, taken literally, means nothing more than a disorder of the heart muscle) most often presents in adulthood with varied symptoms of differing severity: in some patients it is signalled by dizziness, shortness of breath, oedema and chest pain, while others may also suffer from severe arrhythmias and thromboembolism. A rare and extreme manifestation of the disease is sudden cardiac death, the dread of football and ice hockey players. Although numerous environmental factors and other conditions (such as muscle diseases and hormonal changes, a certain type of chemotherapy, circulatory strain arising during pregnancy, alcoholism and the use of certain drugs) can trigger the disease or accelerate its progression, genetic factors can also predispose to cardiomyopathy. At present we know of some 75 genes in which certain variants play a role in the molecular mechanism and development of the disease. Unfortunately, a considerable proportion of these genes have so far been studied only in small numbers of cases. The true disease-causing effect of the mutations discovered is often not confirmed by functional experiments and can only be assumed on the basis of computational predictions. Despite the extraordinary complexity and diversity of the disease, the role of some genes has so far been examined only in certain types of cardiomyopathy; moreover, in a considerable proportion of patients and families the phenotype is not explained by a potential alteration in any of the genes known so far. The aims of my research were therefore (1) to explore more thoroughly the genetic and molecular background of the hereditary form of the disease, (2) to identify mutations in previously unknown genes in unsolved cases showing familial clustering, and (3) to further develop the molecular genetic diagnostic methods currently in use and improve their efficiency.
Following a preface that introduces the disease in broad terms, the first chapter gives a detailed literature review of cardiogenetics, groups the genes according to the type of the disease in which their mutations are known, and describes in detail the research methods that could be used successfully in the diagnostics of cardiovascular diseases, possibly even in the near future. In the second chapter we used the traditional Sanger sequencing method in large patient groups to examine whether certain genes, which appeared potentially interesting on the basis of earlier knowledge, might play a role in the development of cardiomyopathy.

In article 2.1 we sequenced RBM20, a dilated cardiomyopathy (DCM) gene discovered a few years ago that encodes an RNA-binding protein, in Dutch patients, and found numerous gene variants that are not observed in healthy controls: 5 known mutations in the arginine-serine (RS) rich domain and a further 18 previously unknown missense variants (which replace a single amino acid with another). In total we classified 10 variants as pathogenic (disease-causing) or possibly pathogenic. We then followed up the effect of 10 ‘promising’ variants with a splicing assay of our own design, a method based on the presence and ratio of transcripts of different lengths of LDB3, one of the recently identified heart-specific RBM20 targets, in transfected HEK293 cells. Unfortunately, we found no clear difference between cells carrying wild-type and mutant RBM20 plasmids, nor were we able to confirm the possible pathogenicity of variants located outside the RS-rich domain. Interestingly, however, in two of the families carrying an RBM20 mutation, DCM occurs in combination with peripartum cardiomyopathy (PPCM). This observation was not entirely unexpected, since the protein encoded by RBM20 is involved in binding and splicing RNA molecules, and its main molecular target is titin (TTN), mutations of which we frequently found to be of key importance in the development of hereditary PPCM/DCM in chapter 4.2. It is therefore likely that disruption of the TTN isoform ratio is the common molecular pathway that can lead to PPCM in patients carrying either TTN or RBM20 mutations. In chapter 2.2 we sequenced the plectin (PLEC) gene in DNA samples extracted from the blood of patients with arrhythmogenic cardiomyopathy (ACM).
Although the function of plectin in linking neighbouring cells to one another in skin, muscle and heart has been known for decades, and its frameshift and nonsense (truncating) mutations are known in epidermolysis bullosa, a severe disease involving blistering of the skin and wasting of the skeletal muscles, we were the first to attempt to link genetic variants of PLEC to heart disease as well. We found a large number of missense variants in the gene in ACM patients and then compared the locations of these variants with those of the common variants of the gene identified in healthy individuals in the Genome of the Netherlands cohort. We found a segment of PLEC in which several previously unknown, presumably pathogenic variants accumulated in both the British and the Dutch patient cohorts, but which proved to be a ‘variant desert’ in controls.

It is therefore likely that genetic variants in this segment may play a role in the disease. The segment lies in the rod domain, which is involved in homodimerisation, and suggests a special function for this linker protein in mechanical resistance. In summary, missense variants of the PLEC gene (mainly those located in the region mentioned) may predispose to ACM, adding to the many hereditary and environmental factors that, by accumulating and amplifying one another, can lead to the oligogenic development of the disease.

In the third chapter we applied the modern technique of exome sequencing. The exome is the totality of the exons of all genes, that is, all the scattered, protein-coding pieces of DNA in the (in this case human) genome. Using this very promising new method, our aim was to discover new cardiomyopathy genes, primarily by sequencing families in which earlier screening of the already known genes had failed to identify a mutation. First (3.1), we searched autosomal dominant cardiomyopathy families for mutations in genes not previously known in the disease. In 6 of the 12 families examined we found the probably pathogenic genetic variant, in the genes TTN (2x), FHL2, FLNC, COBL and STARD13. From earlier studies, differing amounts of information were available on these genes that could potentially explain the role of the encoded proteins in normal heart function (and the disease-causing effect of their mutations). We then built a co-expression network using a microarray database, that is, we linked to the 5 genes named above those genes that tend to be expressed at the same time in particular cells and tissues. Of the 166 genes in the network, 28 were already known in cardiomyopathies, while a total of 100 are listed in a gene ontology database as potential cardiovascular disease genes. This high degree of heart specificity, together with the strong representation of the cytoskeletal molecular pathway, supports the presumed association of variants in the 5 genes with cardiomyopathy.

In chapter 3.2 we studied the disease of a child who died of very severe DCM a few days after birth and whose parents were presumably consanguineous, related some 8 generations back. Based on the type and severity of the symptoms, it was conceivable that the child had suffered from a mitochondrial disease. By exome sequencing we found a homozygous missense mutation in the SOD2 gene, which encodes an enzyme with a mitochondrial function that neutralises oxygen free radicals. The gene lies on chromosome 6, in the longest autosomal homozygous region found in the family, and the mutation may distort the structure of the functionally important “manganese-binding pocket” of the enzyme. SOD2 had previously been identified as a cause of DCM in a mouse model (animals generated by “knocking out” the SOD2 gene showed symptoms of cardiomyopathy), but it had not yet been found to be significant in human disease. In fibroblasts taken from the patient and cultured in a Petri dish, the accumulation of oxygen free radicals and an indirect measurement of SOD2 enzyme activity confirmed that the enzyme does not function normally in the sample, while dysfunction of other mitochondrial enzyme complexes was excluded. Interestingly, the accumulation of reactive oxygen species associated with a SOD2 defect is a molecular pathological pathway that has been known for decades in the cardiomyopathy that arises as a long-term complication in patients treated with anthracycline chemotherapy.

Chapter 3.3 is a short case report: we present a family in which two distinguishable forms of cardiomyopathy (late-onset and neonatal DCM) are both present, and in which we found three genes (MYL2, SOD2 and RYR2) that play a role in the development of the disease. This case is a good example of how pedigree construction can support the correct interpretation of genetic information and personalised counselling in families with cardiomyopathy.

In the fourth chapter we present the possible use of next-generation sequencing in routine diagnostics. This requires a technical modification: instead of enriching and sequencing the DNA fragments of the whole genome or whole exome, sample preparation is targeted to a set of genes (known genes that play a role in heart disease). In this way both the cost of sequencing and the time needed for the complex analysis can be reduced considerably, while the coverage of the sequenced regions is amply sufficient for standardised measurements. A further advantage of the targeted sequencing method is that titin (TTN), the longest gene in the human genome, which has been known for decades in various types of cardiomyopathy but was very difficult to sequence with earlier methods because of its length, could also be included among the selected genes. In chapter 4.1 we discuss the advantages of this method and of the associated strict variant classification system, and compare them with traditional Sanger sequencing: in the genetic screening of 206 patients we found the genetic cause of the cardiomyopathy in one of the 55 genes examined in 107 cases (52%), and this proportion was particularly high for DCM and DCM-like phenotypes. In 30/206 patients (15%) we identified two or more mutations, which supports the view that the spectrum of hereditary cardiomyopathies can increasingly be shifted from the classical Mendelian towards oligogenic disease. Finally, in at least half of the patients the mutation was found in a gene that ‘would not have been considered’ in the era of the earlier methods, when such decisions were based solely on the patient's symptoms and on the low or unknown mutation frequency of the gene in question.

In chapter 4.2, by contrast, we were able to solve 10/18 cases of peripartum cardiomyopathy (PPCM) by examining 48 genes known mainly in DCM. These results confirm the earlier hypothesis that the extremely rare PPCM is not an independent subtype of the disease but rather DCM that manifests at an earlier age than usual in hereditarily predisposed individuals, under the influence of the haemodynamic (circulatory) changes of pregnancy acting as unfavourable environmental factors. These two forms of heart disease, PPCM and DCM, are highly similar in their symptoms as well as in their overlapping genetic background. Notably, in 7 of the 10 successfully analysed families a mutation in the TTN gene is responsible for the development of the disease. A functional experiment on a tissue sample from a patient who had undergone heart transplantation, examining the TTN isoform ratio and, in connection with it, the passive force generation of the cardiomyocytes, supports the pathogenic classification of the frameshift variant p.K15664Vfs*13.

In the fifth and final chapter I summarise the contents of this book, use a flowchart to present the steps of the cardiogenetic diagnostics and research currently used in our department, describe the impact of the technical developments of the past few years, above all next-generation DNA sequencing, on the progress of the field, and mention various possible directions for further research.

(Proof-read by Edit Posta and Dr Péter Mészáros)

APPENDIX 2

List of authors and affiliations:

UNIVERSITY OF GRONINGEN, UNIVERSITY MEDICAL CENTER GRONINGEN, GRONINGEN, THE NETHERLANDS
Department of Genetics: Ludolf G Boven, Nicole Corsten-Janssen, Jos Dijkhuis, Johanna C Herkert, Yvonne Hoedemaekers, Robert MW Hofstra, Jan DH Jongbloed, Wilhelmina S Kerstjens-Frederikse, Irene M van Langen, Gerard J te Meerman, Rowida Al Momani, Renee C Niessen, Anna Posafalvi, Birgit Sikkema-Raddatz, Richard J Sinke, Karin Y van Spaendonck-Zwarts, J Peter van Tintelen, Cindy Weidijk, Paul A van der Zwaag
Department of Cardiology: Rudolf A de Boer, Peter van der Meer, Maarten P van den Berg, Dirk J van Veldhuisen
Department of Dermatology: Marieke C Bolling, Marcel F Jonkman

UNIVERSITY MEDICAL CENTER UTRECHT, UTRECHT, THE NETHERLANDS
Department of Medical Genetics: Jan G Post, Jasper J van der Smagt
Department of Pathology: Peter GJ Nikkels
Division of Heart and Lungs, Department of Cardiology: Folkert W Asselbergs, Judith A Groeneweg, Richard NW Hauer

DURRER CENTER FOR CARDIOGENETIC RESEARCH, UTRECHT, THE NETHERLANDS
J Peter van Tintelen

RADBOUD UNIVERSITY MEDICAL CENTRE, NIJMEGEN, THE NETHERLANDS
Department of Cardiology: Bert Baars
Department of Human Genetics: Carlo L Marcelis

NIJMEGEN CENTRE FOR MITOCHONDRIAL DISORDERS
Department of Biochemistry: Peter Willems
Department of Paediatrics: Richard J Rodenburg

LEIDEN UNIVERSITY MEDICAL CENTER, LEIDEN, THE NETHERLANDS
Department of Cardiology: Sebastiaan RD Piers, Katja Zeppenfeld
Department of Clinical Genetics: Daniela QCM Barge-Schaapveld

ANTONIUS HOSPITAL, SNEEK, THE NETHERLANDS
Department of Cardiology: Paul L van Haelst

ACADEMIC MEDICAL CENTER, UNIVERSITY OF AMSTERDAM, AMSTERDAM, THE NETHERLANDS
Heart Center, Department of Cardiology: Arthur AM Wilde
Department of Genetics: Mariëlle Alders, Imke Christiaans, Karin Y van Spaendonck-Zwarts

VU UNIVERSITY MEDICAL CENTER, AMSTERDAM, THE NETHERLANDS
Department of Physiology: Ilse AE Bollen, Jolanda van der Velden

UNIVERSITY COLLEGE LONDON, LONDON, UNITED KINGDOM
Institute of Cardiovascular Science: William McKenna, Petros Syrris
Department of Genetics: Vincent Plagnol

MEDICAL SCHOOL HANNOVER, HANNOVER, GERMANY
Department of Cardiology and Angiology: Denise Hilfiker-Kleiner

UNIVERSITY OF SOUTHERN DENMARK, ODENSE, DENMARK
Department of Cardiology: Jens Mogensen

UNIVERSITY OF CAPE TOWN, SOUTH AFRICA
Hatter Institute for Cardiovascular Research in Africa, Department of Medicine & IIDMM: Karen Sliwa

ABOUT THE AUTHOR

Anna Pósafalvi graduated as a Doctor of Pharmacy (PharmD) from the University of Debrecen Medical and Health Science Centre, Hungary, in 2009 with praise. Subsequently, she started her PhD in cardiogenetics at the Department of Genetics, University Medical Centre Groningen, the Netherlands, where she has learned the ins & outs of Sanger, exome, and gene-panel-based next generation sequencing. She applied these methods in the research and diagnostics of the complex disease group of cardiomyopathies, and performed functional follow-up experiments on some of the genetic findings. She is currently working in the group of Professor David Kelsell at the Blizard Institute, Queen Mary University of London, where she is investigating desmosomal biology in the context of skin and heart diseases. Besides her immediate field, she also has a broad scientific interest in personalised/stratified medicine and pharmacogenetics, clinical pharmacy, herbal medicine, and the history of pharmacy. In her undergraduate years, Anna was active in organizing various student activities and events (e.g. the World Diabetes and AIDS Days), while during her PhD period, she has also given trainings and workshops to pharmacy students on topics such as creativity, emotional intelligence, cultural awareness, or academic life. In her free time, Anna plays the piano, and likes reading, dancing and hiking. She is also a passionate photographer.

PUBLICATIONS

Jongbloed JDH, Pósafalvi A, Kerstjens-Frederikse WS, Sinke RJ, van Tintelen JP: New clinical molecular diagnostic methods for congenital and inherited heart disease. Expert Opin Med Diagn. (2011) 5(1):9-24 review

Posafalvi A*, Herkert JC*, Sinke RJ, van den Berg MP, Mogensen J, Jongbloed JDH, van Tintelen JP: Clinical utility gene card for: dilated cardiomyopathy (CMD). Eur J Hum Genet. (2012 Dec 19.) doi: 10.1038/ejhg.2012.276. #

van Spaendonck-Zwarts KY, Posafalvi A, van den Berg MP, Hilfiker-Kleiner D, Sliwa K, Alders M, Almomani R, van Langen IM, van der Meer P, Sinke RJ, van der Velden J, van Veldhuisen DJ, van Tintelen JP§, Jongbloed JDH§: Titin gene mutations are common in families with both peripartum cardiomyopathy and dilated cardiomyopathy. Eur Heart J. (2014) 35:2165-73

Posafalvi A*, Jongbloed JDH*, Niessen RC, van der Zwaag PA, Hoedemaekers Y, Sikkema-Raddatz B, Dijkhuis J, Piers SRD, Zeppenfeld K, de Boer RA, van Haelst PL, Barge-Schaapveld DQCM, Asselbergs FW, van der Smagt JJ, van den Berg MP, van Tintelen JP§, Sinke RJ§: Gene-panel based Next Generation Sequencing (NGS) substantially improves clinical genetic diagnostics in inherited cardiomyopathies. Manuscript submitted

* the first two authors contributed equally, § the last two authors contributed equally, # not included in this thesis

ACKNOWLEDGEMENTS

“...ekkor a megtapasztalások hihetetlen lavinája temette maga alá” (... and then she was buried under the incredible avalanche of experiences)

The science of genetics has completed a long journey from crossing garden pea plants to sequencing personal genomes… and yet, the most demanding part, the proper and ethical interpretation of genomic information, is still ahead of us. Even though my contributions are tiny and insignificant, it feels like I have been on a long and crazy adventure, and I would like to thank all the people who guided me on the way. First and foremost, Jan and Richard, thank you for providing me with the great environment and opportunities to achieve all the things which I can present today in this book. Jan, I can only imagine what a tough job it must have been to supervise me, not something people would like to pick up as a new hobby... Thanks for all your efforts keeping me on track and not letting my brain fly around too much. And looking back after so many years, perhaps the results of our work together look just great like this, even if we had our different ways of doing things sometimes. Richard, thank you for the great brainstorming sessions, and for cheering me up with a new “dog-sitter” story or picture whenever I looked like I really needed one (actually, I might have looked like that every day?). Robert, during the first half of this 4-year marathon, you were, with the greatest respect, my “professor next door”. It was wonderful to meet such an inspiring person. Funnily enough, even though you left a few years ago, there is always a place called the “Oude Kamer van Robert” somewhere in the department. Ludolf, thank you for all your experimental help, the support in my fight against the evil labgoblin®. And of course, you also got some help from the students, especially Bastiaan and Elisabetta. Thank you guys for a great job! Cindy, you were the only student I supervised during my PhD, but you would anyway be my absolute favourite and dearest… it was amazing to work with you. :) Thank you for all your valuable contributions! 
Working in biological sciences always means teamwork, and this is certainly true if one is working with Next Generation Sequencing. Members of the Genome Analysis Facility who performed the exome sequencing; the genome diagnostic teams responsible for the targeted sequencing (Renee, Birgit, Eddy, Lennart, Krista and others); and colleagues of the Genomics Coordination Centre involved in my data analysis: thank you all for your wonderful work in setting up and running the method! Rowida, my “partner in crime”, eating carrots during every coffee break of the American Heart Scientific Sessions in LA, thank you for the collaboration on the exome sequenced families. Kristin and Pieter Neerincx, thanks for the pleasant and helpful conversations. I would like to acknowledge all the medical doctors, cardiologists and clinical geneticists in Groningen who were sedulously collecting the blood samples and medical information of cardiomyopathy patients for the cardio group research projects: Maarten, Peter, Paul, Yvonne, Anne, Nicole, Ellen, and everyone else responsible for counselling these families; as well as the doctors from Utrecht, Leiden, Nijmegen, Amsterdam, Sneek, and further afield, from Hannover and Cape Town. Peter, thank you for your feedback on the contents of my thesis. I would also like to express my gratitude to all our external collaborators: William McKenna and Petros Syrris (the Heart Hospital and University College London, UK) for collaborating on ARVC; the Nijmegen group for the mitochondrial measurements; and Marieke Bolling and Marcel Jonkman (Department of Dermatology, UMCG) for their valuable insights for the plectin manuscript. 
I need to mention many more colleagues who were not working with me directly, but who have all been head over heels (yes, this must be some kind of love) involved in what it took to create the Department of Genetics: Cisca, Gerard, Lude, Sebo, Klaas, Ellen, Irene, Connie, Rolf, Morris, Dineke, Sasha, Cleo, and Jingyuan, for influencing my view on scientific questions and professional matters; Jackie and Kate for correcting my sometimes rather crooked English; Bote, Mentje, Hayo, Marina, Héléne, and Joke for all their support. I enjoyed nice chats, coffee breaks, crazy and cosy moments around the department with so many: Mats the Chess master, Supertrynka, Agata, Jihane, Yunia (I miss you singing and humming around the lab and in the corridors), Bahram, Mahdi, Maria, Kaushal, Asia (the godmother of Glamorous Thursdays, my occasional Túró Rudi provider, happy family of the skateboarding lovebirds), Rodrigo, Javier, Juha (I so admire your special talent of ordering a glass of wine in fluent Hungarian, let alone your gene network magic), Isis, Harm-Jan, the always very kind and smiling Omid, Rajendra (thanks for the amazing curry! I made frozen aliquots of it and used them for cooking :) ),

Marijke, Suzanne, Ania, Vinod; Michiel and Rutger who ensured the never-boring atmosphere in the lab, Jan, Mathieu, Astrid, and everyone else. My dear roomies, many of you were my Dutch teachers, and I remember our few fun pub quiz tours: Helga, Peter, Annemieke, Karin; Gerben and Anna, proud residents of the M&M’s Addicts’ Desk; Ettje and her amazing cakes; Jun, the expert on linguistics, Monica (lunch?), Itty and Gineke. Thanks to my mentor for her advice, to the Graduate School of Medical Sciences office for the range of courses and their general helpfulness, and to my Dutch teachers, Jenny and Joke. Becoming a pharmacist is one of the best things that has happened in my life, and no matter how I look at it, I still carry with me some of those strange personality traits and the way of thinking... A warm thanks to all those people at the University of Debrecen who have shaped me into a concerned health care professional. One of the most memorable events from my undergrad studies is my first visit to a congress of the European Pharmaceutical Students’ Association, where I got completely infected by what the insiders simply call the EPSA Spirit. I met countless amazing people (Louise, João, Uros, Nikos, Dave and Giulia, to name just a few) and got my annual dose of inspiration from those sleepless conferences and alumni weekends between 2007 and 2014. I am particularly glad for the opportunity of growing from my experiences as a member of the Trainer Team, as well as for witnessing the Science Day and Project unfold throughout the years – so, all in all, for having seen all an EPSA dinosaur could hope to have seen. But life is luckily not all about science and studying, and I have met many wonderful people outside work, such as the organic chemistry-related, culturally interested, temporary inhabitants of Groningen: Miri my dear, no reindeers can hide from you... 
and you hosted the best VAl NTiNe’S party ever :), Julia and Felix, Mathieu, Céline, Nop and Tizi; people from “the Cell Biology table” of the restaurant; Indian people at and not at GISA events: Harsh, Shiva, Ashoka, Bhushan and Prachi, Milind (we had great conversations at the final countdown), Harshad, Lalitha (who draped me in a Beautifully Blue Real Sari <3), Tushar, Ankita, Rama, Bala, Suresh van der V, Gaurav (my favourite Bollywood dance instructor), Vineet, Milon, and of course the most special of them all: Harshal, Sena and Sanket. It was fun to be a member of the Genetics Activity Committee 2010, and meet the participants and organizers of GRASP/ GOPHER and Foreign Guest Club/WIRE events.

My dance teachers (with Roy Tweeboom in the starring role), and people from my salsa courses and parties (Gimon, Ivan and his Anne and Sanne, Faris, Victor, Peter, Susan, Andrius, Maurizio, and many others, who were pretty good at “one-and-a-half-enchufla” and “shoulder check”): you all helped me survive the difficult times. After all, I strongly believe that dancing is a wonder drug; at least as effective and universal as paracetamol seems to be in the Dutch huisartsenpraktijk (general practice). And if salsa made me feel like I was not a zombie, thanks to the amazing Maybelline concealers, I did not look like one either at times. Dear painter guy in the city centre, though you might actually never read these lines: it was always refreshing to visit your peaceful little universe of lively coloured canvases, thanks for your hospitality. Since it is not that easy to meet other Hungarians around Groningen, I really appreciated every moment I could spend with people who speak the same exotic language, and happen to know that the real gulyás is a soup, not a stew (that is called pörkölt!). Thanks to Miki for the jazz concert and to Judit for the crafting sessions. Pisti, I promise I will visit the Literary Museum! :) Péter and Erika: heartfelt thanks for all your help and pleasant company over the past years, and I wish you much happiness with little Emma. Réka and Peter, thanks for those lovely weekends we spent together, here, there, and somewhere halfway. Dear Justyna and Tonia, be it an Eastern European Easter Breakfast or the playful Andrzejki name days, discovering the funny pedestrian crossings painted as piano keys on the streets of Warsaw or our inner princesses in charming Bavaria, it was a pleasure to have both of you around, as the poem says: drinking and fighting together. We seem to have done both. :) Tonia, I am very glad that on one cold winter evening, we met up to drink a few polar bears and penguins. 
Justyna, see you again at some random airport I presume...? Dear Peng, you were such a wonderful neighbour. Thanks for the dinners together and the dimsum for breakfast, for all your help when I was moving out, and especially for co-inventing the game “what do you think this Chinese character I have drawn might mean?”; it was fun. Just as much as giving a hand to Vanesa for the Goldfish Manoeuver, or babysitting the little darlings. Ladies from the Department of Dermatology: I am very glad for all the good times we spent together! My sweetest Laura, I can’t wait to visit you, Masaro, and your “little bean” one day soon, and I’m so gonna bring a big hug and some mandarijnen for you. Angelique, I remember those evenings, practising Beethoven while only some lonely ghosts (or PhD students?)

passed by in the dark and empty building... until one of them discovered us! It was a wonderful experience to play together with you and Ming-San. Maybe we should give it a go and perform the world premiere of our on YouTube. Ena, my favourite out-for-some-salsa-or-a-slightly-less-chic-mudwalking company in town, such an honour that you are my paranymph in your super busy 4th year! I honestly hope we are going to find the opportunity to work together on something in the future, but even if not... I will make sure to bring my dancing shoes with me to dermatology congresses, just in case. :) Barbara, I always smile thinking of those caring moments when I could enter the office in the morning only to find another postcard of Krteček, or a giant piece of ”makova cake” waiting on my desk. Not to mention the secret kaleidoscope… I miss you so much! Dear Eva, they say the best thing that may ever happen to an expat in “the strange land” is the friendship of a true native person. Thanks for all your help over the years and for being my paranymph, for the amazing number of movies, events, and science- and art festivals we went to. I am very happy to see you finding your way in the world of science communication; your strength, creativity, and willingness to jump in is truly inspiring. I wish you many, many wonderful years ahead with Mark and your little pianist. Dear Adi, I think there is no other part of my entire thesis (including chapter 3.3) which has been as dramatically re-written as many times, as these few sentences. I am very grateful for the nice moments we had, for all your support and advice during the past few years, and I can’t tell you how proud I am of you for having made friends with the redwoods in the end. Wish you a lot of success finishing your postdoc at Berkeley, and all the best for your future. 
Dearhsd, I have been wondering a million times what would have happened to me if we hadn’t met on day minus 4, and, after we bought Dutch sim cards together but forgot to ask for each other’s number, if we hadn’t come across each other again on day 15 by pure chance. Most probably my PhD years (gosh, some 1,500 days) would have been lethally boring without those philosophical brunches, lunches, not coffees, and of course, a pinch of salt. Of misschien niet? Hope all izz well with you. They say home is where the heart is… I rather think home is somewhat like a batyu full of hamuban sült pogácsa: it is whoever and whatever we carry around the world in our hearts, where ever the Yellow (oranje?) Brick Road may take us. I am deeply indebted to all of those who have supported me throughout this crazy adventure from afar-yet-always-close, and who warmly

welcomed me back to home sweet home. Dad, Mum, Tücsök and Tapicsek; my dear grandmothers, Uncle Robi, Árpi and Peti and their families: thank you for always waiting for me to come home, and for believing that I could succeed even when I myself would have given up. Thank you to Erika, Edit, Nucy and Trajánusz for supporting me when I had to make the hardest decision of my life so far; and to Zsuzsi for the corn-husk dolls and the trip to Veszprém. Vali, what will become of the sick seals now? :) Finally, I would like to say a warm thank you to the Netherlands... because there are so many things I would have never been able to learn without leaving my beloved, soft and cosy comfort zone in Debrecen behind for a while. Although when I arrived in Groningen, the reasons might not have been completely clear to me either, during these years I have grown so much: I have become sufficiently waterproof and wind resistant; have admired the tulip fields of the Keukenhof and the sunflowers of Van Gogh; felt like I so wanna be a bad child to be taken to Spain for some gratis holidays each December; got addicted to stroopwafel, karnemelk and delicious Indonesian food, and discovered that kruidnoten taste the best when dipped into milk (well, sorry). Luckily, meanwhile, I also found a bit of time: (1) to become an expert on failing hearts, broken hearts, and “having no hearts” (a devastating, but not always hopeless condition I herewith suggest naming tinman syndrome®), (2) to discover that, just like familial cardiomyopathies, cheek dimples are classically considered to be inherited in an autosomal dominant fashion, yet there is some room for environmental influences as well (unpublished observation), and (3) to gather a rich personal collection of the cutest Dutch words, which I would like to share with you below in the secret appendix. 
Writing these final thoughts while sitting at a sun-soaked desk, surrounded by a few Western blot images and sipping hot chocolate from a mug decorated with the pictures of five tiny islands and lighthouses, it is hard to believe that this Never-Ending Story of mine is finally rushing toward a not-much-expected happy end on the next page. Amazing things have happened since I arrived in the Blizard of Oz, and I am very glad for getting so much support in the most critical weeks, while the last missing LEGO bricks just snapped into place – but this is perhaps a story to keep for another, happier book.

Yours, Anna Whitechapel, February 2015

SECRET APPENDIX – LIST OF MY FAVOURITE DUTCH WORDS:
(a strictly personal, linguistic and artistic collection of the 30 weirdest and cutest Dutch words to be spoken and tasted, not necessarily translated)

blikvanger, boemboe*, eeneeiige tweelingen, flessenschepjesmuseum, gepocheerde heekfilet, hoogtevrees, kakkerlakken, kapitein, lieveheersbeestje, meerkoet, molenaar, moltrein**, monstertjes, nietmachine, opzuiveringsvloeistof, paashaaskaas, pijnboompitten, pimpelmees, pompoensoep, puntjeschrijper, sneeuwklokje, suikerklontje, terpdorp, Tingtangstraatje, tjonge jonge, tuinkabouter, verrekijker, welterusten, wetenschap, wondertje

* Indonesian origin ** Afrikaans origin
