Understanding the genetic basis of rare inherited bleeding and platelet disorders: the utility of next-generation sequencing

Claire Lentaigne

A thesis submitted for the degree of Doctor of Philosophy

Centre for Haematology, Department of Medicine, Imperial College London

Declaration of originality

I hereby declare that the work presented in this thesis is my own. All collaborations and work performed by others are made clear and appropriately referenced.

Copyright declaration

The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives licence. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the licence terms of this work.


Statement of collaboration

The work presented in this thesis was all conducted as part of the BRIDGE-Bleeding and Platelet Disorders international consortium study, which was created by Prof Willem Ouwehand at the University of Cambridge in 2010-2011. During the course of my PhD, the BRIDGE study was incorporated into the NIHR BioResource for Rare Diseases. The study involved the enrolment of over one thousand patients with rare bleeding and platelet disorders, followed by analysis and interpretation of their sequencing data. Collaboration between PhD fellows, clinicians, geneticists and bioinformaticians has been essential to the success of the study and the outcomes presented here. All the work presented herein was either performed by me or is work to which I made a major contribution. Work performed by others is clearly stated in the text. Some of my specific roles within the consortium and key collaborators are outlined below:

• Methodology design for recruitment and data collection alongside Clinical PhD fellow Dr Tadbir Bariana (Royal Free Hospital), Prof Kathleen Freson (Geneticist, University of Leuven, Belgium) and Dr Ernest Turro (Bioinformatician, University of Cambridge).
• Optimisation and application of the Human Phenotype Ontology with Clinical PhD fellows Dr Tadbir Bariana, Dr Anne Kelly (University of Cambridge) and Dr Sarah Westbury (University of Bristol), as well as Prof Kathleen Freson, Dr Ernest Turro and Dr Daniel Greene (Computational Scientist and PhD fellow, University of Cambridge).
• Recruitment and phenotyping of all cases and relatives enrolled at Hammersmith Hospital. From 2014 to 2017, most recruitment was also performed by Dr Carolyn Millar and Prof Mike Laffan (Consultant Haematologists, Imperial College Healthcare NHS Trust) and research assistants Alice Glaser and Nicola Window at Imperial College Healthcare NHS Trust.
• Analysis of whole exome and whole genome sequencing data from all BRIDGE-BPD cases as part of the core analysis team. This team comprised principally the study PIs Prof Willem Ouwehand (University of Cambridge) and Prof Mike Laffan (Imperial College London), clinicians Prof Andrew Mumford (Bristol) and Dr Keith Gomez (Royal Free Hospital), Geneticist Prof Kathleen Freson, Chief Bioinformatician Dr Ernest Turro and PhD fellow Daniel Greene, study coordinator Dr Sofia Papadia and clinical PhD fellows Dr Sarah Westbury, Dr Tadbir Bariana and, more recently, Dr Suthesh Sivapalaratnam and Dr Janine Collins. Regular additional input was also provided by scientists from the University of Cambridge involved in co-segregation analysis, functional studies on genes under investigation and Thrombogenomics, particularly Dr Kate Downes and Dr Karyn Megy. The core analysis team met weekly to discuss novel analytical methods proposed by the bioinformaticians, candidate novel genes and strategies for follow-up co-segregation and functional studies. My specific roles included providing a clinical perspective to the analysis process, testing novel algorithms, ensuring that the methods applied were clinically relevant, and coordinating follow-up studies on genes of interest.
• Local analysis of Hammersmith cases and investigation of the candidate novel genes described in this thesis were performed by me.

Acknowledgements

I would like to personally thank and acknowledge my supervisors, Dr Carolyn Millar and Prof Mike Laffan at Imperial College and Prof Willem Ouwehand at the University of Cambridge, for their vision, support and guidance throughout my PhD. Being an integral member of the BRIDGE consortium has been an incredible and valuable experience and I thank all my supervisors for the opportunities this has provided. I also owe great thanks to the other core BRIDGE analysis team members, in particular Dr Ernest Turro, Prof Kathleen Freson, Dr Sofia Papadia, Dr Daniel Greene, Dr Suthesh Sivapalaratnam and clinical PhD fellows Dr Sarah Westbury and Dr Tadbir Bariana, from whom I have learnt a great deal and who have also been wonderful colleagues. I would also like to acknowledge the help and support received from other colleagues in the Ouwehand group at the University of Cambridge, particularly Dr Kate Downes, Dr Karyn Megy, Dr Jonathan Stephens and Dr Stephanie Maiwald. I would also like to thank all my colleagues in the haemostasis lab at Hammersmith, particularly Dr Tom McKinnon for his practical support and guidance on the bench. I am grateful to Prof Maddy Parsons at King's College London for her knowledge and advice regarding ROCK1 experiments. I would also like to thank Haemophilia nurses Wendy Hutchinson and Sharon Alavian in the Haemophilia Centre for their help in patient recruitment, and the research and admin teams, in particular Lisa Pape, for all their administrative assistance with patient recruitment and recall. The work presented in this thesis has only been possible thanks to the support and dedication of all the patients and their families who are enrolled in the NIHR BioResource. I hope that all the hard work goes some way towards improving the diagnosis and management of patients who suffer from bleeding and platelet disorders.
Finally, I owe a huge debt of gratitude to my husband Julian who has supported me all the way through and my children who are my constant inspiration.

Abstract

Inherited bleeding and platelet disorders (BPD) are rare and heterogeneous, and patients seldom receive a specific genetic diagnosis. The BRIDGE-Bleeding and Platelet Disorders consortium was set up to address this using high-throughput sequencing (HTS). More than one thousand probands with inherited BPD of unknown aetiology were recruited to an international consortium study with two over-arching aims: to identify novel genetic loci involved in BPD and to provide a better diagnosis for patients. By combining comprehensive, standardised phenotyping with novel analytical methods in this large dataset, the consortium has identified several novel candidate genes for BPD. HTS has also identified variants in known BPD genes, many of them pathogenic, thereby providing a diagnosis for many patients with BPD. In this thesis I present the study design and the methodology for recruitment and data collection for the BRIDGE-BPD study. I describe the optimisation and application of the Human Phenotype Ontology to phenotype rare BPD and show how this has facilitated gene discovery and improved patient diagnosis. In chapters 4 and 5 I show how HTS has identified many variants in known BPD genes and illustrate the challenges faced, and the methods required, in interpreting these variants. Many novel candidate BPD genes have also been identified, and chapters 6 and 7 highlight some examples together with methods for their characterisation. Overall, this thesis demonstrates the utility of HTS in the diagnosis of rare BPD, as well as many of the challenges faced in interpreting data from large whole genome sequencing projects and in applying this technology to rare diseases.

Table of Contents

Declaration of originality...... 3

Copyright declaration ...... 3

Statement of collaboration ...... 4

Acknowledgements ...... 6

Abstract ...... 7

Table of Contents ...... 8

List of Figures ...... 16

List of Tables ...... 19

Introduction ...... 21

1.1 Overview of haemostasis and bleeding ...... 21

1.2 Megakaryopoiesis and platelet production ...... 22
1.2.1 Megakaryopoiesis ...... 22
1.2.1.1 Regulation of megakaryopoiesis...... 22
1.2.1.2 Platelet production ...... 24
1.2.2 Platelets ...... 26

1.3 Haemostasis ...... 30

1.4 Rare inherited bleeding and platelet disorders ...... 32
1.4.1 Diagnosis of bleeding and platelet disorders ...... 34
1.4.1.1 Assessment of bleeding ...... 34
1.4.1.2 Laboratory investigations ...... 34
1.4.1.3 Platelet function testing ...... 35
1.4.2 Limitations of BPD diagnosis in 2012 ...... 36

1.5 Genetic diagnosis of BPD ...... 37

1.6 Next Generation Sequencing ...... 40
1.6.1 NGS in the diagnosis of Mendelian disorders ...... 42
1.6.1.1 Whole exome sequencing ...... 43
1.6.1.2 Targeted panel sequencing ...... 43
1.6.1.3 Whole genome sequencing ...... 44

1.7 Proposed benefits of NGS in BPD ...... 44

1.8 Thesis Aims ...... 45

Methods and materials ...... 46

2.1 BRIDGE-Bleeding and Platelet Disorders: Study design and recruitment ...... 46
2.1.1 Ethics ...... 46
2.1.2 Eligibility criteria ...... 47
2.1.2.1 Identification of eligible cases with BPD ...... 48
2.1.3 Local recruitment ...... 48
2.1.3.1 Recruitment of relatives ...... 49
2.1.4 Sample collection ...... 49
2.1.5 Data collection ...... 50
2.1.5.1 Design of data collection tools ...... 50
2.1.5.1.1 Case report form ...... 51
2.1.5.1.2 Bleeding assessment tool ...... 51
2.1.5.1.3 Online data collection tool ...... 52

2.2 Development and use of the Human Phenotype Ontology ...... 59

2.3 Methods for genotyping ...... 59
2.3.1 DNA extraction ...... 59
2.3.2 Sequencing ...... 59
2.3.3 Variant calling ...... 60

2.4 Methods for analysing sequencing data ...... 61
2.4.1 Identifying novel genes ...... 62
2.4.2 Identification of variants in Tier-1 BPD genes ...... 62

2.4.3 Classification of pathogenicity and research reporting ...... 66
2.4.4 Recall and Co-segregation studies ...... 67
2.4.5 Sanger sequencing ...... 67

2.5 Feedback and Data-sharing ...... 68
2.5.1 Pertinent findings ...... 68
2.5.2 Data-sharing ...... 68

2.6 Methods for ROCK1 in vitro studies ...... 69
2.6.1 General molecular biology techniques ...... 69
2.6.1.1 Agarose gel electrophoresis ...... 69
2.6.1.2 Gel extraction and purification ...... 69
2.6.1.3 Transformation into competent E. coli ...... 70
2.6.1.4 Isolation of plasmid DNA ...... 70
2.6.2 Construction of pc-DNA3.1 minus B-ROCK1 plasmid ...... 71
2.6.2.1 Isolation of ROCK1 cDNA ...... 71
2.6.2.2 pcDNA3.1(-)B vector ...... 72
2.6.2.3 Restriction digest of pcDNA3.1B- ...... 74
2.6.2.4 Ligation of ROCK1 insert and pcDNA3.1B- vector ...... 74
2.6.3 Site-directed mutagenesis to create ROCK1 mutants ...... 74
2.6.3.1 Confirmatory Sanger sequencing of ROCK1 plasmid and mutants ...... 77
2.6.4 HEK293T cell culture ...... 77
2.6.4.1 HEK293T cell cryopreservation and revival ...... 78
2.6.5 Transient transfection of HEK293T cells ...... 78
2.6.6 techniques ...... 79
2.6.6.1 Cell lysate collection ...... 79
2.6.6.2 Determination of protein concentration using the BCA assay ...... 79
2.6.6.3 Western blotting ...... 80
2.6.6.3.1 SDS-Polyacrylamide gel electrophoresis (SDS-PAGE) ...... 80
2.6.6.3.2 Probing and developing ...... 80
2.6.6.3.3 Stripping and re-probing ...... 81
2.6.7 Scratch wound healing assay ...... 81
2.6.8 Statistical analysis ...... 82

The BRIDGE-Bleeding and platelet disorders (BRIDGE-BPD) study: recruitment and phenotyping ...... 83

3.1 BRIDGE-BPD study collection ...... 83

3.2 Collection of data ...... 85

3.3 Description of the study collection ...... 86

3.4 Application of the Human Phenotype Ontology ...... 90
3.4.1 Introduction to the Human Phenotype Ontology (HPO) ...... 90
3.4.2 Rationale for using HPO ...... 95
3.4.3 Development of HPO terminology for BPD...... 95
3.4.4 Automated HPO suggestions...... 102
3.4.5 HPO-based clustering ...... 102
3.4.6 The utility of HPO phenotyping in patients with bleeding and platelet disorders ...... 103
3.4.6.1 HPO demonstrates the phenotypic complexity of BPDs ...... 103
3.4.6.2 HPO identifies clusters of cases who have similar phenotypes ...... 105
3.4.6.3 HPO-based similarity clustering identified cases with MYH9-related disorder from cases with non-pathogenic variants in MYH9...... 106
3.4.6.4 HPO phenotyping helps identify pathogenic variants in known or suspected disorders...... 107

3.5 Discussion...... 108

Analysis of genotyping results from the BRIDGE-BPD study ...... 112

4.1 Genotyping of local samples ...... 112

4.2 Identification of candidate variants in tier-1 BPD genes in cases enrolled locally ...... 113

4.3 Assessment of tier-1 BPD gene variants identified in cases enrolled locally...... 123
4.3.1 Assessment of tier-1 gene variants listed in HGMD ...... 124
4.3.1.1 Clearly pathogenic variant which fully explains the phenotype ...... 125
4.3.1.2 Likely pathogenic 5’UTR variants ...... 125
4.3.1.3 Likely pathogenic variants where the pathogenicity has changed ...... 126

4.3.1.4 Pathogenic, HGMD-listed variants do not necessarily explain the BPD phenotype ...... 127
4.3.1.5 HGMD variants without a BPD phenotype match ...... 128
4.3.2 Assessment of tier-1 BPD gene variants with a phenotype match and not listed in HGMD ...... 129
4.3.2.1 Likely pathogenic variants ...... 129
4.3.2.1.1 Likely pathogenic structural variants ...... 129
4.3.2.1.2 Likely pathogenic novel missense variants ...... 133
4.3.2.2 Variants of unknown significance ...... 136
4.3.2.2.1 Interpretation of multiple variants in the same pathway...... 136
4.3.2.2.2 Phenotypic heterogeneity in MYH9 variants ...... 137

4.4 Genotyping of the BRIDGE-BPD collection ...... 139
4.4.1 Variants identified in Tier-1 genes in the BRIDGE-BPD collection ...... 139
4.4.2 Replication studies in the BRIDGE-BPD cohort validate CYCS as a BPD gene ...... 141
4.4.2.1 Introduction to CYCS ...... 141
4.4.2.2 Identification of CYCS variant in Hammersmith proband ...... 142
4.4.2.3 CYCS variants in BRIDGE-BPD and Thrombogenomics collections ...... 143
4.4.2.4 Statistical association of CYCS with thrombocytopenia (BeviMed)...... 149

4.5 Discussion...... 150
4.5.1 Assigning pathogenicity to variants in tier-1 BPD genes...... 150
4.5.2 The utility of high-throughput sequencing in bleeding and platelet disorders ...... 153
4.5.2.1 Diagnosis ...... 153
4.5.2.2 Validation through replications ...... 154

4.6 Further work ...... 154

Assigning pathogenicity to variants in GFI1B ...... 156

5.1 Structure and function of GFI1B ...... 156

5.2 Previously reported pathogenic variants in GFI1B ...... 157

5.3 GFI1B variant identified in case enrolled at Hammersmith hospital ...... 161

5.4 Variant identified ...... 162

5.5 Co-segregation studies and extended phenotyping ...... 163
5.5.1 CD34+ expression on platelets is increased in cases with GFI1B His181Tyr ...... 171
5.5.2 Alternative candidate variants ...... 171

5.6 Identification of GFI1B variants in other BRIDGE-BPD cases ...... 172

5.7 GFI1B variants in the general population ...... 178

5.8 Discussion...... 182

5.9 Further work ...... 185

Novel gene discovery ...... 186

6.1 Novel gene discovery in BRIDGE-BPD...... 186

6.2 Identification of candidate genes in local pedigree with platelet function defect using gene prioritisation methods ...... 187
6.2.1 Hammersmith family 428 characteristics ...... 187
6.2.2 Variant identification ...... 190
6.2.3 BRCA-1 associated protein ...... 195
6.2.3.1 BRAP in BRIDGE-BPD cases...... 196
6.2.3.2 BRAP in non-BPD cases in the NIHR BioResource ...... 197
6.2.3.3 BRAP in the general population...... 198
6.2.3.4 BRAP in Genome wide association studies ...... 198
6.2.3.5 Conclusion ...... 199

6.3 Identification of a potentially novel candidate gene for thrombocytopenia ...... 199
6.3.1 Ikaros zinc-finger 5 ...... 200
6.3.2 Variants identified in IKZF5 in BPD cases ...... 200
6.3.3 Conclusion and further work ...... 206

6.4 Investigation of a novel candidate BPD gene in three families with a similar bleeding phenotype...... 206

6.5 Discussion and further work ...... 206

Identification and characterisation of ROCK1 as a candidate gene for undefined bleeding disorder...... 210

7.1 Rho-associated coiled-coil containing kinase 1: structure and function ...... 210

7.2 Identification of ROCK1 variants in interim BRIDGE-BPD analysis ...... 213

7.3 Co-segregation ...... 217

7.4 INTERVAL donor search ...... 219

7.5 Identification of ROCK1 variants in entire BRIDGE-BPD collection ...... 219
7.5.1 Further investigation of pedigree A012362 ...... 222

7.6 ROCK1 variants in control and reference populations ...... 223
7.6.1 ROCK1 variants in non-BPD cases ...... 223
7.6.2 ROCK1 variants in wider populations ...... 225
7.6.3 Statistical comparison of BPD ROCK1 variants with other populations ...... 227

7.7 Investigation of the functional effects of ROCK1 variants...... 230
7.7.1 Expression of ROCK1 in HEK293T cells...... 230
7.7.1.1 Creation of ROCK1 mutants ...... 231
7.7.1.2 ROCK1 mutants Y405* and M156L expression in transfected HEK293T cells is time-dependent ...... 232
7.7.2 ROCK1 effects on wound closure ...... 233
7.7.2.1 Over-expression of wild-type ROCK1 in HEK293T cells reduces wound closure ...... 233
7.7.2.2 Overexpression of wild-type and mutant ROCK1 reduces wound closure and effects are time-dependent ...... 234

7.8 Discussion...... 235

7.9 Future work ...... 239

Chapter 8: Discussion ...... 240

8.1. Study design and recruitment ...... 240

8.2. Phenotyping ...... 242

8.3. Utility of large-scale High-throughput sequencing in bleeding and platelet disorders ...... 242

8.4. Interpretation of variants ...... 245

8.5. The future ...... 248

Publications related to this thesis ...... 250

References...... 252

Appendix ...... 273

1. Case Report Form 1 ...... 273

2. Case report form 2 ...... 282

3. The phenotypic abnormalities associated with defects in 63 BPD genes coded as HPO terms...... 297

Permissions ...... 300

List of Figures

Figure 1.1: Regulation of megakaryopoiesis...... 24
Figure 1.2: Pro-platelet formation...... 26
Figure 1.3: Platelet structure...... 28
Figure 1.4: Summary of platelet granule contents and the physiological (green) and pathological (red) platelet roles mediated by secretion...... 29
Figure 1.5: Haemostasis...... 32
Figure 1.6: Prevalence of bleeding and platelet disorders in the UK in 2012...... 33
Figure 1.7: Genomic landscape of BPD in 2012: inherited coagulation disorders...... 39
Figure 1.8: Genomic landscape of BPD in 2012: 39 genes associated with inherited platelet disorders...... 40
Figure 1.9: Illustration of Illumina sequencing...... 42
Figure 2.1: Electronic data entry pages for the BRIDGE-BPD study...... 54
Figure 2.2: Electronic data entry pages for results of platelet function testing...... 56
Figure 2.3: Online documentation of bleeding symptoms with (top) and without (bottom) a bleeding score...... 57
Figure 2.4: HPO data entry...... 58
Figure 2.5: pcDNA3.1(-)B vector map (top) and multiple cloning site (MCS) (bottom) ...... 73
Figure 3.1: Flow-chart illustrating recruitment of patients at Hammersmith Hospital...... 85
Figure 3.2: Phenotypes of probands recruited to the BRIDGE-BPD study...... 87
Figure 3.3: Comparison of thrombocytopenic and bleeding cases between the locally-enrolled BPD cases and the entire BRIDGE-BPD cohort...... 89
Figure 3.4: Directed acyclic graph (DAG) representation of HPO terms...... 93
Figure 3.5: Directed acyclic graph representation of HPO terms in 7 patients with MYH9-related disorder ...... 94
Figure 3.6: Distribution of HPO terms shared with the blood and blood-forming tissues leading class...... 101
Figure 3.7: Human Phenotype Ontology terms in BRIDGE-BPD cases...... 104
Figure 3.8: BRIDGE-BPD cases grouped by HPO phenotypes outside of the blood...... 105

Figure 3.9: HPO-based clustering of cases within pedigrees and syndromic cases...... 106
Figure 3.10: HPO-based gene prioritisation...... 108
Figure 4.1: Flow diagram indicating the number of DNA samples from patients enrolled at Hammersmith hospital who had whole exome sequencing (WES), whole genome sequencing (WGS) or both...... 113
Figure 4.2: Categories of tier-1 BPD genes in which rare coding variants were identified in Hammersmith cases...... 115
Figure 4.3: Pedigree of Hammersmith case with Leu132Gln heterozygous variant in GP1BB...... 134
Figure 4.4: 210 pathogenic tier-1 BPD gene variants (CPV and LPV) identified in 1602 cases sequenced in the BRIDGE-BPD study...... 140
Figure 4.5: CYCS expression in haemopoietic progenitor cells...... 142
Figure 4.6: Pedigree HH23E carrying the p.Leu99Val variant...... 143
Figure 4.7: Four pedigrees with autosomal dominant thrombocytopenia who carry CYCS variants...... 146
Figure 5.1: Schematic representation of GFI1B protein structure...... 157
Figure 5.2: Pedigree #17 from BRIDGE-BPD collection...... 162
Figure 5.3: Conservation of the Histidine residue 181 in GFI1B...... 163
Figure 5.4: Phenotypes of affected members of pedigree #17...... 167
Figure 5.5: Co-segregation of pedigree #17...... 170
Figure 5.6: CD34+ expression on platelets in GFI1B cases...... 171
Figure 5.7: Expression of candidate genes in which cases II/3 and III/1 share missense or high-impact variants with a MAF <0.001 ...... 172
Figure 5.8: GFI1B variants identified in BRIDGE-BPD cases...... 175
Figure 5.9: GnomAD and NIHR BioResource GFI1B variants...... 180
Figure 6.1: Hammersmith family 428...... 189
Figure 6.2: Heatmap showing the expression of 29 genes harbouring rare coding variants listed in table 6.3 in haematopoietic progenitor cells...... 193
Figure 6.3: Diagram of the BRAP protein showing the major domains ...... 195
Figure 6.4: BRAP is highly expressed in platelets...... 196
Figure 6.5: BRAP variants identified in the NIHR BioResource...... 197

Figure 6.6: Alignment of the three N-terminal zinc finger domains of IKZF5 and position of variants associated with thrombocytopenia...... 202
Figure 6.7: IKZF5 variants...... 203
Figure 6.8: BRIDGE-BPD pedigrees with IKZF5 missense variants associated with thrombocytopenia...... 204
Figure 7.1: ROCK1 expression in haematopoietic cells...... 211
Figure 7.2: Human ROCK1 protein showing the major functional domains...... 212
Figure 7.3: ROCK1 protein annotated with BPD variants...... 216
Figure 7.4: Predicted position of Met156 and Arg403 in ROCK1 kinase domain...... 217
Figure 7.5: Co-segregation of ROCK1 variants in three BPD pedigrees...... 218
Figure 7.6: Pedigree of proband A012362...... 223
Figure 7.7: Distribution of rare variants in the ROCK1 protein...... 227
Figure 7.8: ROCK1 variants in BPD, non-BPD and GnomAD cases...... 229
Figure 7.9: Overexpression of His-tagged wild-type ROCK1 in HEK293T cells...... 231
Figure 7.10: Expression of transfected ROCK1 increases over time...... 233
Figure 7.11: HEK293T cells transfected with wild-type ROCK1 have reduced wound closure compared to non-transfected cells...... 234
Figure 7.12: Effects of wild-type and mutant ROCK1 on wound closure after 24 and 48 hours...... 235

List of Tables

Table 2.1: Eligibility criteria for the BRIDGE-BPD study...... 47
Table 2.2: Samples collected and tests performed on locally enrolled cases...... 50
Table 2.3: Reference cohorts used to estimate the allele frequency of variants in the general population...... 60
Table 2.4: Variant annotation tools used in local and central analysis ...... 61
Table 2.5: ISTH Tier-1 BPD gene list, last updated July 2017 ...... 63
Table 2.6: Primers used to extract ROCK1 cDNA insert from MEG01 cDNA library...... 71
Table 2.7: Primers designed for site-directed mutagenesis ...... 76
Table 2.8: Primers used for Sanger sequencing of ROCK1-His plasmid ...... 77
Table 2.9: Antibodies used in Western blotting ...... 81
Table 3.1: Total number of cases enrolled into BRIDGE and NIHR BioResource – Bleeding and platelet disorders (BPD) studies...... 84
Table 3.2: The 23 leading classes which make up the phenotypic abnormality sub-ontology of the HPO ...... 92
Table 3.3: New terms added to the Human Phenotype Ontology version 887...... 96
Table 4.1: Rare high-impact or missense variants in tier-1 BPD genes identified in 64 pedigrees enrolled at Hammersmith hospital...... 116
Table 4.2: Tier-1 BPD gene variants identified in Hammersmith cases and listed in HGMD...... 124
Table 4.3: Hammersmith tier-1 variants with a phenotype match not listed in HGMD...... 131
Table 4.4: MYH9 variants identified in Hammersmith cases...... 138
Table 4.5: CYCS variants identified in BRIDGE-BPD cases and their respective phenotypes...... 145
Table 4.6: CYCS variants in non-BPD cases in the NIHR BioResource...... 148
Table 4.7: Top-ranking genes associated with thrombocytopenia using BeviMed ...... 150
Table 5.1: Previously reported variants in GFI1B. Phenotypes are given as described in the respective publications...... 160

Table 5.2: Bleeding and platelet characteristics of all family members tested from Hammersmith pedigree #17...... 165
Table 5.3: Phenotypes of BRIDGE-BPD cases with rare coding variants in GFI1B...... 176
Table 5.4: Types of GFI1B variants seen in the general population...... 179
Table 6.1: Novel BPD genes discovered by the BRIDGE consortium since 2012...... 187
Table 6.2: Platelet function results of Hammersmith family 428...... 190
Table 6.3: Thirty-one candidate variants shared by all three affected relatives...... 191
Table 6.4: Summary prioritisation of candidate genes...... 194
Table 6.5: BRAP variants identified in BRIDGE-BPD cases ...... 197
Table 6.6: IKZF5 variants identified in BRIDGE-BPD cases...... 201
Table 7.1: ROCK1 variants identified in early analysis of BRIDGE-BPD cases...... 215
Table 7.2: ROCK1 variants in BRIDGE-BPD cases...... 221
Table 7.3: ROCK1 kinase domain variants in non-BPD cases...... 225
Table 7.4: ROCK1 kinase tail variants in non-BPD collections ...... 225
Table 7.5: ROCK1 mutants created by site-directed mutagenesis of ROCK1-His plasmid...... 231

Introduction

1.1 Overview of haemostasis and bleeding

Haemostasis is the normal physiological response to vessel injury that controls bleeding and leads to vessel repair. It is a complex process relying on tightly regulated interactions between the blood vessel wall, platelets and coagulation proteins. The process of haemostasis begins with vessel constriction at the site of injury, mediated by the endothelium and smooth muscle. Haemostasis then proceeds through exposure of subendothelial matrix components such as collagen, elastin and glycosaminoglycans, and of tissue factor on extravascular cells. Recruitment of platelets and their subsequent activation and aggregation lead to the formation of an unstable platelet plug, which is then stabilised by fibrin generated through the serial activation of coagulation proteins. Fibrinolysis then leads to clot dissolution, followed by vessel repair. Under normal physiological conditions, blood is maintained in a liquid state by the integrity of the endothelium, which is coated with anticoagulant factors and also releases factors that inhibit platelet and coagulation factor activation. Clot formation is also critically regulated by anticoagulant plasma proteins such as tissue factor pathway inhibitor (TFPI), antithrombin and proteins C and S, which prevent inappropriate thrombus formation after minor stimuli and limit haemostatic clot formation to the site of injury. Any dysregulation of the haemostatic process can lead to either thrombosis (pathological clot formation) or abnormal bleeding. Defects of the vessel wall, platelets, coagulation proteins and the fibrinolytic pathway can be inherited or acquired and can lead to pathological bleeding. Inherited defects in haemostasis that lead to abnormal bleeding are the subject of this thesis, with a particular focus on platelets, which are discussed first, followed by a broader description of the secondary coagulation mechanisms.

1.2 Megakaryopoiesis and platelet production

Platelets are critical to the haemostatic process and their production from megakaryocytes is described here.

1.2.1 Megakaryopoiesis

Megakaryopoiesis describes the process of megakaryocyte (MK) development from haematopoietic stem cells (HSC), during which MKs grow, develop alpha and dense granules and expand their cytoplasm, forming an extensive invaginated membrane system 1. Megakaryocytes are the largest cells in the bone marrow (50-100 µm) and account for 0.01% of nucleated bone marrow cells2. They develop by differentiation of multipotent HSC through discrete, highly regulated steps via increasingly committed progenitor cells (fig 1.1). Within the adult bone marrow, MK precursors develop from a common myeloid progenitor (CMP) in a highly specialised bone marrow niche3. The CMP differentiates into the megakaryocyte-erythroid progenitor (MEP) and granulocyte/monocyte colony-forming units (figure 1.1). In early megakaryopoiesis, differentiation occurs through mitosis, as in other haematopoietic cells, and is driven by thrombopoietin (TPO). After the promegakaryoblast (PMKB) stage, endomitosis (DNA replication without cell division) begins and increases MK ploidy from 2N up to 128N before final maturation4,5. Polyploidisation is thought to allow functional gene amplification, facilitating the production of the large amounts of mRNA and protein needed for granule packaging and platelet production6. Indeed, polyploidisation may make platelet production more efficient, since a 16N MK can produce around 2000 platelets compared with one or two produced by a 2N megakaryocyte7. Terminal maturation and proliferation occur in the vascular niche of the bone marrow, adjacent to endothelial cells, and culminate in platelet production.

1.2.1.1 Regulation of megakaryopoiesis

Thrombopoietin (TPO) is the most important driver of megakaryopoiesis. By binding to its receptor, c-Mpl, it activates a series of downstream signalling pathways involving PI3K, Akt, MAPK and ERK1/ERK2 8,9. TPO-MPL signalling induces transcription factors that drive MK development and thrombopoiesis10. TPO stimulates MK maturation and proplatelet formation and also mediates the compensatory responses of bone marrow MKs to the level of platelets in the peripheral blood5. The important role of TPO is evidenced by the thrombocytopenia and thrombocytosis that can be seen in humans with defects in TPO signalling pathways11-13. The location of the megakaryocyte and its interaction with the bone marrow microenvironment are also crucial for megakaryopoiesis and platelet formation. Gradients of chemokines and growth factors, together with adhesive interactions through MK integrins, regulate MK migration through the bone marrow. Some of the chemokines and growth factors critical for the proliferation and maturation of MKs are illustrated in figure 1.1. These cytokines and growth factors also enable megakaryopoiesis and platelet production to occur through TPO-independent means14. Lineage-specific differentiation in haematopoiesis is regulated by complexes of multiple cell-type-specific transcription factors, including RUNX1, GATA-1 and its co-factor FOG1, FLI1, TAL1, NFE2 and EVI1 15. The importance of haematopoietic transcription factors in megakaryopoiesis is highlighted by the occurrence of thrombocytopenia and platelet function defects in patients with germline mutations in key transcription factors such as RUNX1, GATA-1 and FLI1 16-18.


Figure 1.1: Regulation of megakaryopoiesis. The roles of some of the major transcription factors, growth factors and chemokines in megakaryopoiesis. HSC: haematopoietic stem cell, CMP: common myeloid progenitor, MEP: MK-erythroid progenitor, PMKB: pro-megakaryoblast, MKB: megakaryoblast, proMK: pro-megakaryocyte, MK: megakaryocyte. (Adapted from Mazzi et al 7, Experimental Hematology 2018, and Deutsch & Tomer 10, British Journal of Haematology 2013.)

1.2.1.2 Platelet production

The interaction of the MK with its local environment is crucial for platelet formation. MKs interact directly with endothelial cells and extracellular matrix proteins. For example, fibrinogen binds to integrin αIIbβ3 and VWF binds to the GP1B/V/IX complex15 (see fig 1.3, p27). Defects in these two receptors cause Glanzmann's thrombasthenia and Bernard-Soulier syndrome respectively. Both disorders cause platelet dysfunction, and Bernard-Soulier syndrome is also associated with thrombocytopenia. The precise mechanism of platelet production is not fully understood. The current, widely accepted theory is that mature MKs extend long, branching cytoplasmic protrusions called proplatelets into sinusoidal blood vessels in the bone marrow19. Proplatelet formation is enabled by reorganisation of the actin cytoskeleton20. The importance of the cytoskeleton in platelet formation is evidenced by abnormal platelet production and thrombocytopenia in patients with inherited defects in two genes encoding key cytoskeletal components: MYH9, which encodes non-muscle myosin heavy chain IIA, and TUBB1, which encodes β1-tubulin21-23. Platelet granules, including alpha granules, dense granules and lysosomes, are synthesised in the megakaryocyte. They contain hundreds of diverse proteins that enable platelet function in a variety of biological and pathological processes1. Organelles are formed either from small vesicles budded off from the Golgi complex of the MK or from MK multivesicular bodies. They contain proteins that are both produced endogenously and captured by endocytosis24. Organelles and granules are passed along the proplatelet extensions to the proplatelet tips. This transport is facilitated by microtubules, which reorganise and traverse the length of the proplatelet extensions2. There are alternative theories of platelet production. These include cytoplasmic fragmentation of megakaryocytes25 and platelet formation outside the bone marrow, with the lung providing an alternative major site of platelet biogenesis26,27.


Figure 1.2: Pro-platelet formation. The widely accepted model of platelet formation. The top panels show mouse megakaryocytes forming pro-platelets in vitro over time in video-enhanced light microscopy. The area where the MK cytoplasm initially begins to unravel is marked with an asterisk. A white arrow marks the beginning of pseudopodia formation. Pseudopodia elongate, narrow and frequently bend over time until the entire MK cytoplasm is converted into proplatelet extensions. Scale bar = 20 µm. The bottom diagrams mirror the processes occurring in the top panels. They illustrate how MKs remodel their cytoplasm into thick pseudopodia, which subsequently elongate into proplatelets containing microtubule bundles, with platelet content packaged into the ends. Finally, proplatelets are released and the MK body retracts. Adapted from Italiano et al, J Cell Biology 1999 19.

1.2.2 Platelets

Humans produce on average 1 × 10¹¹ platelets per day from megakaryocytes. This maintains a normal physiological range of around 150-400 × 10⁹ platelets per litre, with an average platelet lifespan of 7-10 days. Platelets are small, anucleate cells, usually 1-3 µm in diameter. They do not contain nuclear DNA, but they do contain messenger RNA (mRNA), which enables protein synthesis. Platelets circulate in a discoid shape, enclosed in a plasma membrane with an external glycocalyx. The glycocalyx contains many surface glycoproteins required for platelet adhesion and activation. The most important and well-studied receptors are the GP1B-V-IX complex, GPVI and integrin αIIbβ3, which mediate platelet adhesion, activation and aggregation28. Inherited defects in GP1B-V-IX and αIIbβ3 result in two of the most common and well-characterised inherited platelet function defects: Bernard-Soulier syndrome (OMIM#231200) and Glanzmann's thrombasthenia (OMIM#273800). Patients with these disorders usually present with severe bleeding. The plasma membrane is a phospholipid bilayer which forms tiny folds and also invaginates into an open canalicular system (OCS), allowing the platelet to change shape during activation29. The OCS can also take up plasma molecules for storage in platelet granules (e.g. fibrinogen). It is also one of the routes for secretion of granule contents during platelet activation28. Beneath the membrane lies the membrane actin cytoskeleton, a system of thin actin filaments. This system enables platelet shape change and links the transmembrane receptors to cytoplasmic proteins28,30. The actin cytoskeleton forms a network of actin filaments throughout the platelet cytoplasm. At rest, this network keeps organelles suspended and apart from one another and from the platelet membrane; upon activation, it enables movement of platelet contents30. The actin cytoskeleton is also linked to a system of microtubules, arranged in coils around the edge of the platelet, which are crucial for maintaining its discoid shape28. Platelet activation initiates a series of intracellular signalling events via integrins and G-protein-coupled receptors. These events regulate the actin cytoskeleton, mostly through the small GTPases RhoA, Rac1 and CDC42 31.
Signalling through RhoA and its downstream effector Rho-associated coiled-coil containing kinase (ROCK) regulates actin contractility, platelet shape change and thrombus stability32.
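The production and lifespan figures above can be checked against the normal platelet count using a simple steady-state (mass-balance) estimate. In this estimate, the circulating pool N is approximately the production rate P multiplied by the lifespan T, and the count C is that pool divided by the blood volume V. This is a rough sketch with three simplifications. It assumes a steady state. It assumes an adult blood volume of about 5 L, which is not stated in the text. It also ignores the fraction of platelets pooled in the spleen.

```latex
\begin{aligned}
N_{\text{circ}} &\approx P \times T
  = (1 \times 10^{11}\ \text{day}^{-1}) \times (7 \text{ to } 10\ \text{days})
  \approx (0.7 \text{ to } 1.0) \times 10^{12}\ \text{platelets} \\
C &\approx \frac{N_{\text{circ}}}{V}
  = \frac{(0.7 \text{ to } 1.0) \times 10^{12}}{5\ \text{L}}
  \approx (140 \text{ to } 200) \times 10^{9}\ \text{L}^{-1}
\end{aligned}
```

The result is of the same order as the lower part of the quoted 150-400 × 10⁹/L range. The estimate is sensitive to the assumed blood volume and production rate, so it should be read only as an order-of-magnitude consistency check.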


Figure 1.3: Platelet structure. Illustration of the platelet structure including the major internal features, surface receptors and their ligands.

The platelet cytoplasm contains three major types of secretory organelle, α-granules, dense granules and lysosomes, together with mitochondria, which provide the platelet's energy requirements28. Alpha granules are the most numerous (50-80 per platelet), followed by dense granules (3-8 per platelet), with only one or two lysosomes. Between them, these organelles contain more than 300 different molecules which can be secreted upon platelet activation. These include substances crucial for amplifying the platelet activation response and for mediating the diverse functions of platelets33. Alpha granules contain a complex array of factors including VWF, fibrinogen, P-selectin, growth factors, complement proteins and chemokines. Dense granule contents include ADP, ATP, serotonin, histamine, calcium and polyphosphates, whereas lysosomes mainly contain enzymes. The mechanisms of platelet granule secretion are not fully understood. Secretion is likely to be a tightly regulated and coordinated process that balances the release of multiple factors with often opposing effects, as illustrated in figure 1.4 33. Rare germline variants in genes involved in platelet granule biogenesis and trafficking can lead to severe platelet dysfunction and bleeding. Examples include Hermansky-Pudlak syndrome (caused by variants in HPS genes), Chediak-Higashi syndrome (LYST) and grey platelet syndrome (NBEAL2)34-44.

The major and most well-established role of platelets is in haemostasis, which is described in detail below. Platelets also have other important physiological and pathological roles, including inflammation, immunity, tumour metastasis, atherogenesis and wound healing33,45,46 (figure 1.4). The involvement of platelets in many non-haematological processes may contribute to the wider phenotype often seen in the inherited platelet granule disorders.

Figure 1.4: Summary of platelet granule contents and the physiological (green) and pathological (red) platelet roles mediated by secretion. α-granule contents often have opposing actions (e.g. angiogenesis-related and coagulation-related factors). Some of the physiological (green) and pathological (red) processes known to be affected by platelet granule secretion are also listed. Although functions are assigned to each cargo here, many cargoes have multiple roles, and the roles of others have not yet been fully elucidated. α: alpha granules; δ: dense granules; L: lysosomes. Adapted from Golebiewska & Poole, British Journal of Haematology 2013 47.

1.3 Haemostasis

Haemostasis is the combination of many complex and tightly regulated processes. These processes maintain vascular integrity, keep the blood in a fluid state, and seal and repair damaged blood vessels after injury. The healthy endothelium keeps platelets in a resting state in two ways. It releases factors such as nitric oxide and prostacyclin (PGI2), which inhibit platelet activation and promote smooth muscle cell relaxation. It also expresses an ADPase, which metabolises the platelet agonist ADP. In addition, the endothelium expresses the anticoagulant thrombomodulin and releases anticoagulant factors such as tissue factor pathway inhibitor (TFPI) and protein S. In response to vessel injury, the blood vessel constricts, and this is enhanced by the local release of vasoconstrictors from the endothelium. Subendothelial collagen is exposed and binds von Willebrand factor (VWF) released from the Weibel-Palade bodies of the endothelium. Under high shear, collagen-bound VWF unravels, exposing its GP1B binding sites. Platelets can then adhere to VWF via their surface GP1B-V-IX complex. Once they have been slowed down, they also bind directly to subendothelial collagen through platelet GPVI. This interaction is enhanced by co-adhesion through integrin α2β1 on the platelet surface. Signalling through the glycoprotein receptors, including GP1B, initiates restructuring of the actin cytoskeleton, leading to platelet shape change. Recruitment of the ITAM-containing receptor FcγRIIa and the FcR γ-chain leads to signalling via SRC family tyrosine kinases, PI3K, phospholipase C (PLC) and small GTPases. Signalling through the glycoprotein receptors has three net effects. Intracellular calcium increases. Inside-out signalling activates integrin αIIbβ3 on the platelet surface. The actin cytoskeleton is remodelled, leading to platelet shape change, spreading and clot stabilisation.
The same signalling pathways lead to release of granule contents, which instigates a positive feedback loop of platelet activation. Substances such as ADP and thromboxane A2 activate G-protein-coupled receptors on platelets, including P2Y1, P2Y12 and the thromboxane receptor (TP). This leads to more intracellular signalling, integrin activation, further granule release and further platelet activation. Activation of integrin αIIbβ3 allows fibrinogen to bind and cross-link platelets, resulting in platelet aggregation. Fibrinogen binding reinforces platelet activation by outside-in signalling through αIIbβ3. Chemokines are also released, which recruit further platelets, resulting in the formation of the primary platelet plug (fig 1.5). Alongside the adhesion, activation and aggregation of platelets, damage to the endothelium exposes tissue factor (TF) on extravascular cells such as fibroblasts. TF is a key initiator of the coagulation cascade. TF binds coagulation factor VII (FVII), leading to its activation (FVIIa). The TF/FVIIa complex in turn activates FIX and FX. FXa converts prothrombin into thrombin, initially generating only trace amounts of thrombin. Even these trace amounts matter, because thrombin is a potent platelet agonist and accelerates platelet aggregation. Thrombin also directly activates FV, FVIII and FXI, leading to a large amplification of thrombin generation. FXa associates with FVa to form the prothrombinase complex. FXIa converts FIX into additional FIXa, which, with FVIIIa as its co-factor, forms the intrinsic tenase (Xase) complex that generates further FXa. Platelets provide a phospholipid surface for the assembly of the intrinsic Xase and prothrombinase complexes, further amplifying thrombin generation. Thrombin activates platelets through protease-activated receptors (PARs). It also converts fibrinogen to fibrin, leading to fibrin deposition and stabilisation of the platelet-rich thrombus (reviewed in 48). Recent research has improved understanding of the important role of leucocytes in thrombus formation. Leucocytes produce enzymes such as cathepsin G and elastase, and cytokines such as TNFα and IL-1β. These can activate platelets and also modulate the anticoagulant activity of endothelial cells. Leucocytes can express thrombomodulin, TFPI and tissue factor, becoming either anti- or pro-coagulant depending on the circumstances. They can also increase or attenuate fibrinolysis and regulate clot resolution49.
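The cascade above can be written as a set of activation rules and stepped forward until nothing new forms, which makes the order of events easier to follow. The sketch below is a hypothetical, qualitative model. The species names, the simplified rule set and the function `activation_rounds` are illustrative and are not taken from the cited review. The rule set also assumes that FXa alone yields the first trace thrombin. The model records only the order in which species first appear, not rates or amounts of thrombin.

```python
"""Qualitative sketch of the coagulation steps described in the text (hypothetical model).

Each activated species maps to alternative sets of required inputs; a species
forms once every member of at least one set is present. Only the round in which
each species first appears is recorded, not reaction rates or amounts.
"""

# Zymogens and substrates assumed to be present in plasma (simplified list).
PLASMA = frozenset({"FVII", "FIX", "FX", "prothrombin", "FV", "FVIII", "FXI", "fibrinogen"})

RULES = {
    "FVIIa": [{"TF", "FVII"}],
    "FIXa": [{"TF", "FVIIa", "FIX"}, {"FXIa", "FIX"}],
    "FXa": [{"TF", "FVIIa", "FX"}, {"intrinsic_tenase", "FX"}],
    # Simplification: FXa alone yields the initial trace thrombin.
    "thrombin": [{"FXa", "prothrombin"}],
    "FVa": [{"thrombin", "FV"}],
    "FVIIIa": [{"thrombin", "FVIII"}],
    "FXIa": [{"thrombin", "FXI"}],
    "prothrombinase": [{"FXa", "FVa"}],
    "intrinsic_tenase": [{"FIXa", "FVIIIa"}],
    "fibrin": [{"thrombin", "fibrinogen"}],
}


def activation_rounds(present, rules=RULES):
    """Return {species: round in which it first forms}, iterating to a fixed point."""
    active = set(present)
    first_round = {}
    round_no = 0
    while True:
        round_no += 1
        # Products whose requirements are all met by species active at the start of this round.
        new = {
            product
            for product, options in rules.items()
            if product not in active and any(req <= active for req in options)
        }
        if not new:
            return first_round
        for product in new:
            first_round[product] = round_no
        active |= new


if __name__ == "__main__":
    normal = activation_rounds(PLASMA | {"TF"})
    for species, rnd in sorted(normal.items(), key=lambda item: (item[1], item[0])):
        print(rnd, species)
    no_fviii = activation_rounds((PLASMA - {"FVIII"}) | {"TF"})
    print("intrinsic tenase without FVIII:", "intrinsic_tenase" in no_fviii)
```

With TF present, the model forms species in this order:

1. FVIIa.
2. FIXa and FXa.
3. Trace thrombin.
4. FVa, FVIIIa, FXIa and fibrin.
5. The prothrombinase and intrinsic tenase complexes.

Without TF nothing forms. Removing FVIII still allows fibrin to form through the TF/FVIIa route, but the intrinsic tenase never assembles. In reality, FVIII deficiency causes bleeding because the amplification step is reduced, which a presence/absence model of this kind cannot capture.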
Traditionally, haemostasis has been described as two processes. Primary haemostasis is the interaction of the vessel wall with platelets and formation of the platelet plug. Secondary haemostasis is activation of the coagulation cascade, leading to fibrin stabilisation of the thrombus. However, current opinion favours a more dynamic interplay between all these processes. The platelet and coagulation activation pathways described act as a series of networks rather than as individual pathways, which gives some pathways an element of redundancy. Defects or deficiencies in molecules or pathways involved in haemostasis result in bleeding. This ranges from severe, spontaneous bleeding to mild bleeding only after a haemostatic challenge, depending on the degree of deficiency, the nature of the defect and the redundancy of the molecule affected. Inherited quantitative and qualitative defects in coagulation proteins, platelets and the endothelium can all lead to bleeding.

Figure 1.5: Haemostasis. Summary of the interplay between platelets and coagulation proteins leading to the formation of a stable clot. (Image from Versteeg et al, 2013 48)

1.4 Rare inherited bleeding and platelet disorders

Inherited bleeding and platelet disorders (BPD) are a heterogeneous group of diseases that can be caused by defects in the vessel wall, platelets or coagulation factors. Rare diseases are defined as those which affect fewer than 5 in 10,000 of the general population. One in 17 people (approximately 6% of the population) will be affected by a rare disease at some point in their lives, presenting a significant challenge to healthcare systems50. The majority of diagnosed BPD are coagulation defects (fig 1.6). This is partly because the two most common BPD are coagulation disorders. Haemophilia A (inherited deficiency of factor VIII) occurs in approximately 1 in 5000 live male births. Von Willebrand disease (an inherited qualitative or quantitative defect in VWF) is estimated to occur in up to 1 in 100 people in some populations51. Coagulation defects are also easier to identify accurately using specific assays than defects in platelets or the vessel wall. Individual inherited platelet defects are usually extremely rare, with some reported in only one to five cases worldwide. In 2012, platelet defects accounted for only 6-7% of the 23,749 patients registered with bleeding disorders in the UK. The majority of those were not listed with a specific pathway defect52 (fig 1.6). Furthermore, approximately 1% of those registered were categorised as miscellaneous or unclassified. This indicates that these patients had bleeding symptoms which were not associated with any specific laboratory abnormality. These numbers are likely to be an underestimate because inherited BPD are recognised to have been frequently misdiagnosed in the past. A common example is confusion between inherited thrombocytopenias and immune thrombocytopenia (ITP)53-55. It is also recognised that, historically, many patients with bleeding disorders have remained undiagnosed and/or have not been registered56,57.
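The prevalence figures above can be put on a common scale with a little arithmetic. The sketch below applies the rare-disease threshold quoted in the text (fewer than 5 in 10,000). It also converts registry percentages into approximate patient numbers. The helper names are illustrative. The haemophilia A figure is per male birth, so that comparison applies to males only.

```python
RARE_PER_10000 = 5  # rarity threshold quoted in the text: fewer than 5 in 10,000


def per_10000(cases: float, population: float) -> float:
    """Prevalence expressed as cases per 10,000 people."""
    return 10_000 * cases / population


def is_rare(cases: float, population: float) -> bool:
    """True if prevalence falls below the rare-disease threshold."""
    return per_10000(cases, population) < RARE_PER_10000


def as_percent(cases: float, population: float) -> float:
    """Prevalence expressed as a percentage."""
    return 100 * cases / population


def count_from_percent(percent: float, total: int) -> int:
    """Approximate head count for a percentage of a registry total."""
    return round(total * percent / 100)


if __name__ == "__main__":
    # Haemophilia A, about 1 in 5000 live male births (males only).
    print(per_10000(1, 5000), is_rare(1, 5000))
    # "1 in 17 people" expressed as a percentage.
    print(round(as_percent(1, 17), 1))
    # UKHCDO 2012: 6-7% platelet defects and about 1% unclassified, out of 23,749.
    print([count_from_percent(p, 23_749) for p in (6, 7, 1)])
```

Applied to the figures above, the helpers give the following values:

- Haemophilia A: about 2 per 10,000 males.
- "1 in 17": about 5.9% of the population.
- Platelet defects: 6-7% of the 23,749 registered patients is roughly 1,425 to 1,662 people.
- Miscellaneous or unclassified group: about 237 people.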

Figure 1.6: Prevalence of bleeding and platelet disorders in the UK in 2012. Graph showing the breakdown of all patients registered in the UK with all types of bleeding disorder as of 31st March 2012 (total number of registered patients = 23,749). Rare coagulation disorders include deficiencies of factors V, VII, X, XI, XIII, fibrinogen, prothrombin and combinations of any of these. Data from UK Haemophilia Centre Doctors' Organisation (UKHCDO) annual report April 2011 to March 2012 52

1.4.1 Diagnosis of bleeding and platelet disorders

1.4.1.1 Assessment of bleeding

The assessment and diagnosis of a patient presenting with abnormal bleeding symptoms (or a family history of abnormal bleeding) begins with a detailed bleeding history and a full medical and family history. The objective assessment of bleeding is challenging. Bleeding symptoms are common in the general population, with a prevalence of 25% to 75% reported even amongst ‘healthy’ subjects58-61. The reporting of symptoms may vary depending on the subject’s own experiences (for example, coming from a family where epistaxis or menorrhagia is common). It also depends on exposure to bleeding risk through pastimes or occupation. Bleeding severity and frequency change with age and, in many cases, bleeding only manifests after a haemostatic challenge62. Milder BPD frequently present only following a haemostatic challenge such as surgery or childbirth. They are therefore often diagnosed in later life, despite being inherited conditions. There is also considerable redundancy in platelet signalling pathways. As a result, patients may be asymptomatic or have very mild symptoms except when exposed to a specific challenge such as an anti-platelet agent or major surgery. Consequently, formal bleeding assessment tools (BATs) are increasingly used to classify, document and quantify bleeding symptoms. A variety of BATs were available in 2012, including the condensed MCMDM-1VWD bleeding questionnaire, the ISTH-BAT and the Rockefeller-BAT. However, no studies had directly compared the different BATs in unselected BPD63-66. Finally, a full history is taken to exclude acquired causes of bleeding, such as drugs and local pathologies.
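Bleeding assessment tools turn a structured history into a number. Each bleeding symptom is graded for severity, the grades are summed into a bleeding score, and the score is compared with a cut-off. The sketch below shows only that general idea. The grading range (0 to 4 by default), the symptom names, the example grades and the cut-off are placeholders. They are not the items or thresholds of the MCMDM-1VWD questionnaire, ISTH-BAT or Rockefeller-BAT. Each of those tools defines its own scale and, in some cases, sex- and age-specific normal ranges.

```python
from typing import Mapping


def bleeding_score(grades: Mapping[str, int], min_grade: int = 0, max_grade: int = 4) -> int:
    """Sum per-symptom severity grades after checking each lies on the assumed scale."""
    for symptom, grade in grades.items():
        if not min_grade <= grade <= max_grade:
            raise ValueError(f"grade {grade} for {symptom!r} is outside {min_grade}..{max_grade}")
    return sum(grades.values())


def is_abnormal(score: int, cutoff: int) -> bool:
    """Flag a score at or above a tool-specific cut-off supplied by the caller."""
    return score >= cutoff


if __name__ == "__main__":
    # Illustrative grades only; not taken from any patient or validated tool.
    example = {"epistaxis": 2, "menorrhagia": 3, "post-surgical bleeding": 0}
    score = bleeding_score(example)
    print(score, is_abnormal(score, cutoff=6))
```

Keeping the grading range and cut-off as parameters reflects the point in the text that different BATs had not been directly compared. The same symptom history can be scored differently depending on the tool chosen.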

1.4.1.2 Laboratory investigations

Laboratory testing begins with the measurement of platelet count and mean platelet volume and examination of a peripheral blood smear. Basic assessment of coagulation is also performed. This includes prothrombin time (PT), activated partial thromboplastin time (APTT), thrombin time (TT) and fibrinogen assays, together with measurement of von Willebrand factor level and function. Specific coagulation factor assays follow if indicated. If the bleeding symptoms suggest a primary haemostatic defect, further investigations are predominantly based on platelet function analysis.

1.4.1.3 Platelet function testing

A variety of laboratory tests are available to investigate how well an individual’s platelets are functioning. The PFA-100® is performed on anticoagulated (citrated) whole blood. The blood is aspirated at high shear rates through an aperture in a membrane coated with either collagen and epinephrine or collagen and ADP. VWF and platelets are captured on the collagen surface, the agonists promote platelet activation, and subsequent aggregation closes the aperture. The time taken to do this is the closure time. Some laboratories use the PFA-100® for its negative predictive value, to exclude a primary haemostatic disorder in patients with a low clinical suspicion. However, its false-negative rate and lack of specificity for any particular disorder limit its utility in the assessment of BPD67. Light transmission aggregometry (LTA) measures the change in light transmission that occurs when platelets aggregate in response to the addition of various agonists to platelet-rich plasma. ADP, epinephrine, collagen, arachidonic acid and ristocetin form the traditional baseline panel of agonists used in LTA, with extended panels used for more detailed investigation57. Laboratories vary widely in the combination and concentration of agonists used and in the interpretation of results, although attempts have been made to standardise practice57,68. Specific platelet surface proteins can be assessed using flow cytometry. These tests are only performed in specialist laboratories and are most commonly used to confirm a clinically suspected glycoprotein receptor deficiency in Glanzmann's thrombasthenia (GT) or Bernard-Soulier syndrome (BSS). Platelet granules are most commonly assessed using lumi-aggregometry and platelet nucleotide assays, since platelet granule defects can be missed using platelet aggregometry alone69.
Lumi-aggregometry is a variation of LTA in which a luciferin-luciferase reagent is added to measure the ATP released from platelet granules. An abnormal measurement of ATP secretion cannot, however, distinguish between three possible causes:

- a deficiency in granule number;
- a defect in granule content (referred to as storage pool disease);
- a defect in granule content release or signal transduction (referred to as a release defect).

The total platelet ADP and ATP content can also be measured in platelet lysate by high-performance liquid chromatography, ELISA or luciferin-luciferase assays, but these tests are less commonly performed in clinical laboratories57,70. Platelet electron microscopy is useful to confirm the reduction or absence of dense granules in storage pool disease but is not routinely available.
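The aggregation read-out described above is usually expressed as a percentage between two reference samples. By convention, platelet-rich plasma (PRP) defines 0% light transmission and platelet-poor plasma (PPP, not otherwise mentioned in the text) defines 100%. The sketch below converts a transmission reading onto that scale. For lumi-aggregometry, it also estimates released ATP from luminescence relative to a known ATP standard. The function names are hypothetical. The single-point standard assumes that luminescence is proportional to ATP over the measured range, so this is not a validated laboratory protocol.

```python
def percent_aggregation(transmission: float, prp_baseline: float, ppp_reference: float) -> float:
    """Aggregation as % of the range between PRP (0%) and PPP (100%) light transmission."""
    if ppp_reference == prp_baseline:
        raise ValueError("PRP baseline and PPP reference must differ")
    return 100 * (transmission - prp_baseline) / (ppp_reference - prp_baseline)


def atp_released(sample_signal: float, standard_signal: float, standard_amount: float) -> float:
    """ATP estimate from luminescence, assuming signal is proportional to ATP.

    The result is in the same units as `standard_amount` (e.g. nmol of the added ATP standard).
    """
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return standard_amount * sample_signal / standard_signal


if __name__ == "__main__":
    # Illustrative instrument readings in arbitrary units.
    print(percent_aggregation(55, prp_baseline=10, ppp_reference=90))
    print(atp_released(sample_signal=300, standard_signal=600, standard_amount=2.0))
```

For example, a reading of 55 on a scale where PRP reads 10 and PPP reads 90 corresponds to 56.25% aggregation. As the text notes, a low ATP value on its own cannot show whether the cause is a storage pool or a release defect.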

1.4.2 Limitations of BPD diagnosis in 2012

BPD diagnosis is most straightforward in coagulation defects, in thrombocytopenias and in the more common, well-defined platelet function disorders such as BSS and GT. The majority of coagulation defects can be detected by an assay available in most clinical laboratories, and thrombocytopenia is detectable on a routine full blood count. BSS and GT cause moderate to severe bleeding symptoms and are identified early in life. They can be easily recognised by their typical pattern of platelet aggregation defects with a standard panel of agonists. Some inherited platelet disorders are part of a well-defined syndrome, e.g. Chediak-Higashi syndrome and Hermansky-Pudlak syndrome (HPS). In these syndromes, the platelet dense granule defect is typically associated with immune deficiency or oculocutaneous albinism respectively. Diagnosis remains challenging, however, for the majority of BPD, which often have milder platelet function defects and are known to be clinically highly heterogeneous. Many assays for specific platelet defects are only available in specialist laboratories. These assays also require repeated fresh sampling, which is labour intensive. Furthermore, platelet function testing is poorly standardised and poorly reproducible. Platelet aggregometry is highly sensitive to pre-analytical variables such as prescription and non-prescription medication and dietary factors that affect platelet function. LTA may also be unreliable or impractical if the test subject is thrombocytopenic. Methods exist for normalising the platelet count before performing LTA, but this practice is poorly standardised and can alter platelet responsiveness. There is also a lack of standardisation between laboratories in the choice and concentration of agonists used in LTA71,72. Interpretation of aggregation traces and diagnostic criteria for most mild platelet function defects also vary widely73.

Finally, coagulation factor measurement and platelet function testing do not assess defects of the vessel wall, fibrinolysis or clot stability. Results of these assays also often correlate poorly with in vivo haemostasis. Consequently, standard methods identify a specific pathway defect in only a minority of BPD cases. The proportion is even lower when only patients with hereditary bleeding symptoms are considered60,74.

1.5 Genetic diagnosis of BPD

Genetic testing has traditionally formed the final tier of investigation in BPD. It was performed primarily to confirm an already suspected clinical diagnosis (e.g. Glanzmann's thrombasthenia or a coagulation factor deficiency). Genetic testing therefore depended on the interpretation of laboratory coagulation and platelet function results and on the extent to which these tests were performed. This is problematic for BPD, which are often both clinically and genetically heterogeneous. Genetic testing was usually performed in individual specialist laboratories by Sanger sequencing of single genes75. In modern Sanger sequencing, the region of interest is first amplified from genomic DNA by polymerase chain reaction (PCR) using specific primers. A sequencing reaction is then performed in which DNA polymerase extends a primer using normal deoxynucleotides together with a small proportion of chain-terminating dideoxynucleotides (ddNTPs: A, T, C or G). Each ddNTP is labelled with a different fluorophore. Incorporation of a ddNTP terminates the extending chain, producing a mixture of fragments ending at every position. These fragments are separated by size using capillary electrophoresis, and the fluorophore on the terminal base of each fragment is detected to read the sequence. In this way, stretches of target DNA of up to approximately 1000 base pairs can be sequenced. Sanger sequencing has been the gold standard in molecular diagnostics for 30-40 years75,76. However, it is time-consuming, expensive and restricts sequencing to a known gene or a small group of known genes. Results from one test would be awaited before samples were sent for a different test, so diagnosis often took months. Furthermore, the management of most BPD has been based on their clinical rather than genetic diagnosis, and it was uncertain how a genetic diagnosis would change management. Consequently, genetic testing may not be performed at all for BPD, and very few patients receive a genetic diagnosis57,77,78. At the start of this project, 59 genes had been identified over several decades in which rare protein-altering variants caused inherited BPD in humans. These variants mostly caused highly penetrant, more severe and more common disorders.
Twenty of these genes encode coagulation factors or inhibitors of coagulation or fibrinolysis and are shown in figure 1.7. Thirty-nine genes encode proteins important for megakaryopoiesis and platelet function and are associated with inherited platelet disorders (shown in figure 1.8). In 2012, genetic diagnostic testing was available in the UK for a minority of BPD in a handful of NHS molecular diagnostic laboratories79 (shown in fig 1.7 & 1.8). Gene discovery, i.e. the identification of the genetic basis of disorders whose cause was previously unknown, has been achieved primarily through linkage mapping and sequencing of candidate genes. These approaches have discovered the genetic basis of one-third to one-half of all known Mendelian disorders, but they have several limitations80,81. Many Mendelian disorders are extremely rare, with only a small number of pedigrees available to study worldwide. Reduced or variable penetrance and genetic heterogeneity make identification of a genetic locus difficult. Conversely, if the causative mutation produces a very severe phenotype, it may be under negative selection and be seen only in very small pedigrees or arise de novo. Additionally, mapping may identify a list of candidate genes that is too large to analyse entirely using Sanger sequencing. Consequently, the majority of Mendelian disorders remain without a genetic explanation82.
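The chain-termination principle behind Sanger sequencing, described earlier in this section, can be illustrated with a toy simulation. Every possible terminated product is generated, and the mixture is sorted by length as in capillary electrophoresis. The label on the last base of each product is then read in order. This is a hypothetical sketch, not a model of a real instrument, and the helper names are illustrative. Primer sequences, PCR, dye-mobility corrections and 5'-3' strand orientation are deliberately ignored. Real reads also extend to around 1000 bases rather than a handful.

```python
import random

# Base pairing used to build the newly synthesised strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def termination_products(template: str) -> list[str]:
    """All chain-terminated products from extending a primer along `template`.

    Simplification: position i of the new strand pairs with position i of the
    template as written, and every possible stopping point is represented once.
    The final base of each product stands for the fluorescently labelled ddNTP.
    """
    synthesised = "".join(COMPLEMENT[base] for base in template.upper())
    return [synthesised[: i + 1] for i in range(len(synthesised))]


def read_by_size(products: list[str]) -> str:
    """Mimic capillary electrophoresis: order by length, read each terminal label."""
    return "".join(product[-1] for product in sorted(products, key=len))


if __name__ == "__main__":
    products = termination_products("ATGCCT")
    random.Random(0).shuffle(products)  # the reaction yields an unordered mixture
    print(read_by_size(products))
```

The products are read strictly in order of size, so shuffling the mixture does not change the called sequence. This is why one reaction containing all four differently labelled ddNTPs is sufficient.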


Figure 1.7: Genomic landscape of BPD in 2012: inherited coagulation disorders. The genes with established associations with human inherited coagulation disorders are listed on the right, and their role in coagulation is indicated in the diagram of the coagulation cascade on the left. Alpha-2-antiplasmin and PAI-1 are shown in red because they are inhibitors of the pathway shown. Asterisks mark those disorders for which genetic testing was available in NHS molecular diagnostic laboratories in 2012. Figure designed by me and Dr S Sivapalaratnam and prepared for publication by Jo Westmoreland in the Visual Aids department at the MRC Laboratory of Molecular Biology, Cambridge.

Figure 1.8: Genomic landscape of BPD in 2012: 39 genes associated with inherited platelet disorders. Genes known to be associated with human inherited platelet disorders are categorised according to their role in megakaryopoiesis and platelet function. Genes involved in glycoprotein signalling appear twice to reflect their roles in both late megakaryopoiesis/pro-platelet formation and platelet signalling. Asterisks mark the six disorders for which genetic testing was available in NHS molecular diagnostic laboratories in 2012 according to the UKHCDO genetics network 79. Image designed by me and prepared for publication by Jo Westmoreland in the Visual Aids department at the MRC Laboratory of Molecular Biology, Cambridge. Adapted from Lentaigne et al, Blood 2016 83.

1.6 Next Generation Sequencing

Next-generation sequencing (NGS), also referred to as high-throughput sequencing (HTS), is an umbrella term for several technologies which sequence a large number of DNA molecules in parallel. These include Roche 454 84, SOLiD85, Illumina86 and Ion Torrent87. All NGS technologies share some basic processes. Genomic DNA or cDNA is fragmented and denatured to produce a single-stranded DNA library, which is then amplified by PCR, and multiple DNA strands are sequenced in parallel. The technologies differ in how the DNA templates are generated, their cost, read length, coverage bias and error frequency (reviewed in 88,89). One of the most widely used NGS technologies, and the one used in the BRIDGE study presented in this thesis, is Illumina sequencing (www.illumina.com). In this method, DNA is cleaved into short fragments, from which reads of 100-150bp are sequenced. Specific adaptors are attached to each fragment, and the adaptor-ligated fragments are bound to a specialised flow cell. Each fragment is amplified in situ by PCR, creating a cluster of hundreds of identical copies. DNA polymerase and fluorescently labelled nucleotides carrying a reversible terminator are then added, so that only one base is incorporated into each strand per cycle. The fluorescence from each cluster is detected, and the terminators and fluorescent labels are then cleaved off, allowing the next base to be added. The cycle is repeated many times, and in this way the sequence of each read is constructed (fig 1.9). The advent of NGS in 2005 enabled DNA sequencing that was cheaper, faster and required less DNA than Sanger sequencing, and led to a revolution in clinical human genetics82.
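Each sequencing cycle described above yields one fluorescence measurement per dye for every cluster. The base for that cycle is the one whose dye is brightest. The sketch below assumes four-colour chemistry with one channel per base; some later Illumina instruments use two-colour chemistry instead. The channel order, the function name `call_bases` and the "share of total signal" confidence measure are illustrative assumptions. Real base callers also correct for dye cross-talk, phasing and pre-phasing before assigning quality scores.

```python
from typing import Sequence

# Hypothetical channel order: one fluorescence channel per base (four-colour chemistry).
CHANNELS = "ACGT"


def call_bases(cycles: Sequence[Sequence[float]]) -> tuple[str, list[float]]:
    """Call one base per cycle from a single cluster's four-channel intensities.

    The brightest channel gives the base; its share of the total signal is
    returned as a crude confidence (1.0 means all signal is in one channel).
    """
    bases, confidence = [], []
    for intensities in cycles:
        if len(intensities) != len(CHANNELS):
            raise ValueError("expected one intensity per channel")
        total = sum(intensities)
        best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        bases.append(CHANNELS[best])
        confidence.append(intensities[best] / total if total > 0 else 0.0)
    return "".join(bases), confidence


if __name__ == "__main__":
    # Illustrative intensities for three cycles of one cluster.
    read, conf = call_bases([[90, 5, 3, 2], [10, 10, 70, 10], [5, 80, 10, 5]])
    print(read, conf)
```

For example, `call_bases([[90, 5, 3, 2], [10, 10, 70, 10], [5, 80, 10, 5]])` returns the read `AGC` with confidences of 0.9, 0.7 and 0.8. A cluster with mixed signal in a cycle would receive a lower confidence for that base.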

Figure 1.9: Illustration of Illumina sequencing. Adaptor-ligated DNA fragments are amplified in situ by PCR and sequenced in parallel by the cyclical addition of nucleotides. The process is similar to Sanger sequencing but is performed on many templates simultaneously. (Adapted from Johnsen et al, Blood 2013 89)

1.6.1 NGS in the diagnosis of Mendelian disorders

The use of NGS in research has increased rapidly in recent years in line with exponentially falling costs. By the start of this project, NGS was being used in clinical practice for both the diagnosis of Mendelian disorders and cancer diagnostics90-94. NGS approaches fall into two main types: sequencing of the entire genomic DNA (whole genome sequencing, WGS) and targeted sequencing. In targeted sequencing, either all the coding portions of the genome are sequenced (whole exome sequencing, WES) or smaller subsets of specific regions are sequenced in the form of disease-specific panels. With all NGS methods, Sanger sequencing is still the gold standard for confirming variants detected by NGS. It is also used to complement NGS by sequencing poorly covered regions if necessary.

1.6.1.1 Whole exome sequencing

Sequencing of the entire coding portion of the genome became technically possible in 2009 95. By the time this project started in 2012, multiple studies had demonstrated the utility of WES for identifying the cause of Mendelian disorders where conventional approaches had failed41,92,93,95,96. WES platforms typically use specific probes to capture all the exons in the genome and their flanking regions, but they can be designed to target additional regions of interest80. In 2012, sequencing the whole exome was an attractive strategy for identifying the genetic basis of rare BPD. Although the exome accounts for only 1-2% of the genome, it was estimated to contain 85% of known Mendelian disease-causing variants93. Furthermore, a large proportion of rare, protein-altering variants are predicted to be deleterious, suggesting that the exome is likely to be enriched for disease-causing variants80,97. At that time, whole genome sequencing was technically possible but not widespread and was still more expensive than WES.

1.6.1.2 Targeted panel sequencing

Panel sequencing involves the targeted sequencing of a set of genes associated with a particular disease, and in 2012 panels were already being integrated into patient care as diagnostic tests for a variety of diseases, including haematological disorders, cancers, cardiomyopathies, and neurological and retinal disorders90,94,98,99. The rationale for targeted panels is that restricting sequencing to regions that have been widely studied and implicated in a particular disorder or group of disorders should be both cheaper and give a higher diagnostic yield than WES or WGS, making NGS a practical diagnostic tool. Gene panels can be designed to target the exons of selected genes plus additional regions, including splice sites and other regions of particular relevance to disease. They can cover regions often missed by WES, often achieve higher coverage, and can use copy number detection approaches to identify copy number variants and large deletions76.

1.6.1.3 Whole genome sequencing

The obvious advantage of WGS is that all regions of the genome are included, so variant detection is limited neither to the coding regions nor to previously known disease-associated loci. After the completion of the Human Genome Project in 2003, at a cost of approximately $3 billion, the cost of whole genome sequencing fell rapidly100: to $14 million in 2006 and $5000-10,000 in 2012, with a predicted fall to less than $1000 for a single human genome by 2016. WGS has other advantages over targeted sequencing, since it avoids errors introduced by target capture and is more reliable at detecting structural variants101. In 2012, the use of whole genome sequencing in research was increasing, but the majority of groups were using WES and targeted panel sequencing because these were faster and cheaper than WGS.

1.7 Proposed benefits of NGS in BPD

The BRIDGE bleeding and platelet disorders study (BRIDGE-BPD) was initiated in 2010-2011 by Prof Willem Ouwehand at the University of Cambridge. The primary aim was to identify the genetic basis of unresolved BPD by combining deep multi-system phenotyping with NGS. As outlined above, BPD diagnosis in the NHS was inadequate and laborious, and the genetic basis of most BPD was unknown. NGS in the form of WES or WGS would avoid the biases of GWAS, linkage and candidate-gene studies, and would not be limited by existing knowledge of platelet function pathways or by small pedigrees. NGS can also shorten the delay currently experienced by patients with BPD in reaching a diagnosis, and it is increasingly efficient and affordable. The design and set-up of the study are described in chapters 2 to 4. A genetic diagnosis is important for patients with BPD because it avoids misdiagnosis, informs optimal treatment and can provide clarity about disease progression. There is increasing recognition that a number of IPDs are associated with severe pathologies, including an increased risk of malignancy, and a definitive diagnosis can inform prognosis and care. Moreover, genetic counselling can be offered to families once a genetic diagnosis is made. In addition, enabling a genetic diagnosis in more patients with BPD will increase the number of known cases of extremely rare disorders. This in turn is likely to improve our ability to correctly identify disease-causing variants and disease-associated genes.

Furthermore, an increase in case numbers will improve our understanding of genotype-phenotype correlations and mechanisms of disease. The discovery of novel BPD genes will also increase our understanding of the biological pathways important for haemostasis and could identify novel therapeutic targets for the treatment of bleeding and thrombosis.

1.8 Thesis Aims

The overall aim of this PhD has been to apply high-throughput sequencing to rare BPD in order to identify novel causal variants and provide a diagnosis to patients with BPD. This has been conducted as part of the BRIDGE-BPD international consortium study and the NIHR BioResource for Rare Diseases.

My specific aims were to:
1. Design and develop methodology for the study to enable the recruitment, phenotyping and accurate data collection of BPD cases.
2. Optimise the Human Phenotype Ontology (HPO) for use in BPD and apply HPO to the phenotyping of recruited cases.
3. Recruit probands with likely inherited BPD to the study from Imperial College Healthcare NHS Trust at Hammersmith Hospital.
4. Analyse sequencing data from cases enrolled at Hammersmith Hospital and identify causal variants in known BPD genes.
5. Analyse phenotypic data from all BPD cases enrolled in the BRIDGE and NIHR BioResource studies to identify phenotype clusters that assist in the interpretation of sequence variants.
6. Identify and characterise variants in novel candidate BPD genes.

Methods and materials

2.1 BRIDGE-Bleeding and Platelet Disorders: Study design and recruitment

The BRIDGE-Bleeding and Platelet Disorders pilot study began in 2011, before the start of this thesis. In 2012, when I began work on the study, recruitment was expanded to three UK haemophilia centres, including the haemophilia centre of Imperial College Healthcare NHS Trust at Hammersmith Hospital, and one European centre. Between 2013 and 2017, recruitment was coordinated from 37 international centres. Incorporation of BRIDGE into the NIHR BioResource-Rare Disease study in 2014 meant that samples from all rare disease studies in the NIHR BioResource-RD were processed on the same sequencing and analysis pipelines, enabling coordinated analysis across all 13,000 cases. The NIHR BioResource-RD also became a pilot study in 2014 for the Genomics England 100,000 Genomes Project102. The primary aim of BRIDGE-BPD was to identify the genetic basis of hitherto unresolved rare, inherited bleeding and platelet disorders.

2.1.1 Ethics

The study was approved by a UK research ethics committee in 2011 (UK REC 04/Q0108/44), and approval for non-UK enrolment centres was sought from the appropriate national ethics authorities. In 2014 the BRIDGE-BPD study was incorporated into the NIHR BioResource-Rare Disease (RD) study (UK REC 13/EE/0325) alongside 14 other rare disease categories. Many substantial amendments to the study protocols, patient information leaflets and consent forms were required as the study progressed, and all were submitted for approval by the appropriate research ethics committees.

2.1.2 Eligibility criteria

Eligibility criteria were necessarily broad, to include as many children and adults as possible with any abnormality of platelet number, morphology or function and/or any abnormality of bleeding for which a specific molecular explanation could not be found using conventional laboratory testing. It was essential that there was a high likelihood of the disorder being genetic (for example, other affected family members or childhood onset) and that acquired causes had been excluded.

Table 2.1: Eligibility criteria for the BRIDGE-BPD study.

Inclusion criteria
• All ages
• Platelet count (PLT) <100 ×10⁹/L or >400 ×10⁹/L
• Mean platelet volume (MPV) <6 fL or >12 fL
• Abnormal platelet function test
• Abnormal platelet morphology
• Any combination of the above
• Any bleeding disorder of unknown molecular aetiology
• Any of the above, or a combination of the above, in the context of a wider syndrome (e.g. neurodevelopmental, immunological, skeletal)
• High likelihood of being genetic, such as:
  - Early onset
  - Other pedigree members affected
  - Sporadic cases most likely caused by de novo mutations

Exclusion criteria
• Use of prescription or over-the-counter drugs known to be associated with abnormal platelet phenotypes or bleeding
• High likelihood of ITP or other autoimmune disorders associated with low platelet count (including HIV positivity)
• Other medical conditions known to be associated with acquired abnormal platelet phenotypes:
  - Malignancies, particularly those compromising haematopoiesis
  - Bone marrow aplasia
  - TTP and HUS
  - Acute viral infection
  - Splenomegaly
  - Uraemia or hepatic failure
  - Others
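The two quantitative inclusion thresholds in Table 2.1 can be written as a simple screening check. The sketch below is illustrative only and was not part of the study's tooling: `FullBloodCount` and `quantitative_flags` are hypothetical names, and the check covers only platelet count and MPV. Eligibility through platelet function, morphology, bleeding history or syndromic features, and all of the exclusion criteria, required clinical judgement and are not modelled here.

```python
from dataclasses import dataclass
from typing import Optional

# Quantitative inclusion thresholds from Table 2.1
PLT_LOW, PLT_HIGH = 100.0, 400.0  # platelet count, x10^9/L
MPV_LOW, MPV_HIGH = 6.0, 12.0     # mean platelet volume, fL


@dataclass
class FullBloodCount:
    """Hypothetical record of the two screening parameters; None means not measured."""
    platelet_count: Optional[float] = None  # x10^9/L
    mpv: Optional[float] = None             # fL


def quantitative_flags(fbc: FullBloodCount) -> list[str]:
    """Return the quantitative inclusion criteria met by a full blood count (may be empty)."""
    flags = []
    if fbc.platelet_count is not None:
        if fbc.platelet_count < PLT_LOW:
            flags.append("low platelet count")
        elif fbc.platelet_count > PLT_HIGH:
            flags.append("high platelet count")
    if fbc.mpv is not None:
        if fbc.mpv < MPV_LOW:
            flags.append("low MPV")
        elif fbc.mpv > MPV_HIGH:
            flags.append("high MPV")
    return flags


# A thrombocytopenic case with large platelets meets two criteria
print(quantitative_flags(FullBloodCount(platelet_count=45, mpv=13.1)))
# ['low platelet count', 'high MPV']
```

An empty list does not make a case ineligible, because the other inclusion routes in Table 2.1 still apply.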

2.1.2.1 Identification of eligible cases with BPD

Eligible cases were identified at each recruiting centre by local recruiting clinicians. At Hammersmith Hospital, patients meeting the eligibility criteria were identified in the following ways:
1. Paper and electronic clinic letter archives.
2. Clinical laboratory records of extended platelet function testing (e.g. platelet nucleotide assays).
3. New and existing patients attending haemostasis clinics.
4. Patients identified as potentially eligible for the study by clinicians in other local hospitals and referred to Hammersmith Hospital for assessment.

2.1.3 Local recruitment

Patients identified from clinic letter and laboratory archives were initially sent a standard letter of invitation to participate in the study, accompanied by a comprehensive patient information leaflet. After a suitable interval, postal invitations were followed up by a phone call in order to maximise participation. Patients were recruited via a face-to-face interview, and written consent was obtained either by the recruiting clinician or by another member of the research team. Patient demographics, full medical history and pertinent clinical and laboratory findings were recorded on a case report form (CRF) (see section 2.1.5.1 for details). Clinical and laboratory data were also recorded directly in the secure online study database (see section 2.1.5 for more details).

2.1.3.1 Recruitment of relatives

Relatives who were present at the time of proband enrolment were also invited to participate in the study. Specific information leaflets and consent forms were provided for relatives and for the parents of any children under 16. A detailed family history was taken from all probands at the time of enrolment, and probands could be asked to invite their relatives to participate. Members of the study team were not able to contact relatives directly until the relative had initiated communication. Recruitment, consent and sample collection were the same as for probands. Saliva sampling kits (Oragene OG-500, DNA Genotek, Ottawa, Canada) could be posted to relatives who were unable to attend the Hammersmith haemophilia centre to provide a blood sample, but their phenotype could not be confirmed in this way.

2.1.4 Sample collection

A venous blood sample (6-10 mL in EDTA) was obtained from all probands at enrolment for a full blood count, blood smear analysis and genomic DNA (gDNA) extraction. Additional tests to determine phenotype (e.g. platelet function testing) were performed at enrolment if they were not available from patient records and were clinically indicated, for example in a new patient (table 2.2). There was no minimum set of investigations required for enrolment to the study, and investigations were determined by local practice at individual recruiting centres. Further testing could be performed at a later date if necessary to establish a broader phenotype or phenotype-genotype relationships, or as part of co-segregation studies (section 2.4.4). gDNA could be extracted from saliva instead of blood when relatives were unable to attend a recruitment centre; saliva samples were collected by the individuals themselves following the instructions provided with the kit (Oragene OG-500, DNA Genotek Inc, Ottawa, Canada). Stored DNA samples containing at least 6 micrograms of DNA could be used if the clinical care team considered this more appropriate than obtaining a fresh sample.

All cases were allocated a unique 10-digit BRIDGE ID. All patient samples were labelled with the BRIDGE ID to ensure patient anonymity, and a link table between patient-identifiable information and BRIDGE ID was kept both on secure local NHS computers and on a central secure server at the University of Cambridge. A unique sequencing identifier was allocated at the time of sequencing and linked to the unique BRIDGE ID. Samples were sent by post to the CATGO laboratory at the University of Cambridge, where genomic DNA was isolated from venous blood or saliva obtained from the cases at enrolment or from archived samples. DNA library capture was performed using the ROCHE NimbleGen SeqCap EZ 64Mb Human Exome Library version 3.0 (ROCHE NimbleGen, Inc. Madison, WI, USA).
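The pseudonymisation scheme described above, in which samples and data carry only a study identifier and a separate link table connects that identifier to the patient, can be sketched as follows. This is a minimal illustration under stated assumptions, not the system used by BRIDGE-BPD: the `LinkTable` class, the use of random zero-padded 10-digit numbers and the `patient_key` argument (standing in for whatever patient-identifiable information a centre holds) are all hypothetical.

```python
import secrets
from typing import Optional


class LinkTable:
    """Hypothetical link table between patient-identifiable keys and study IDs.

    Only this object can map a study ID back to a patient; samples and
    clinical records would carry the study ID alone.
    """

    def __init__(self) -> None:
        self._study_id_by_patient: dict[str, str] = {}
        self._issued: set[str] = set()
        self._study_id_by_sequencing_id: dict[str, str] = {}

    def _new_study_id(self) -> str:
        # Draw random zero-padded 10-digit numbers until an unused one is found
        while True:
            candidate = f"{secrets.randbelow(10**10):010d}"
            if candidate not in self._issued:
                self._issued.add(candidate)
                return candidate

    def enrol(self, patient_key: str) -> str:
        """Return the patient's study ID, issuing a new one at first enrolment."""
        if patient_key not in self._study_id_by_patient:
            self._study_id_by_patient[patient_key] = self._new_study_id()
        return self._study_id_by_patient[patient_key]

    def link_sequencing_id(self, study_id: str, sequencing_id: str) -> None:
        """Record the identifier allocated at sequencing against an existing study ID."""
        if study_id not in self._issued:
            raise KeyError(f"unknown study ID: {study_id}")
        self._study_id_by_sequencing_id[sequencing_id] = study_id

    def study_id_for_sequencing(self, sequencing_id: str) -> Optional[str]:
        """Map a sequencing identifier back to its study ID (None if unknown)."""
        return self._study_id_by_sequencing_id.get(sequencing_id)


table = LinkTable()
sid = table.enrol("hypothetical-patient-001")
table.link_sequencing_id(sid, "SEQ-0001")
```

In this design only the link table, which would be held on secure servers, can re-identify a participant, and sequencing identifiers resolve to the study ID rather than directly to the patient.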

Table 2.2: Samples collected and tests performed on locally enrolled cases.

Samples collected from all enrolled cases:
• Full blood count
• Blood smear
• 3-6 mL venous blood in EDTA for gDNA extraction

Tests performed if not available from patient records OR if clinically indicated:
• Coagulation factors
• PFA-100
• Platelet light transmission aggregometry
• Platelet lumi-aggregometry
• Platelet nucleotide assay

Results recorded only if available from patient records:
• Bone marrow biopsy
• Platelet surface glycoprotein measurement
• Platelet electron microscopy

2.1.5 Data collection

Clinical and laboratory data from each case was recorded on a paper case report form (section 2.1.5.1.1 and Appendix) by the recruiting clinician and transferred to an electronic database (section 2.1.5.1.3). All data collected on BRIDGE-BPD cases was anonymised with the unique BRIDGE identifier allocated at enrolment and linked to the samples (see previous section). A link table between patient identifiable information and BRIDGE ID was stored locally on an NHS secure server.

2.1.5.1 Design of data collection tools

Specific data collection tools were designed for the BRIDGE-BPD study to meet the following criteria:

• Clinical relevance: able to capture the information required to record the BPD phenotype of each individual accurately and comprehensively, without enforcing the collection of data that are less relevant and possibly unavailable.
• Capture the richness and diversity of clinical information: allow for clinical interpretation of results as well as raw data.
• Bioinformatically useful: collect data in a computable format, converting non-categorical data into categorical data where possible.
• Encourage collection of pertinent clinical features that may not be routinely sought: for example, specifically ask about disorders in other organ systems.
• Practical: easy and quick to use.

2.1.5.1.1 Case report form

A paper case report form (CRF) was designed to be completed by the recruiting clinician during the patient recruitment interview. Its order broadly followed the sequence of a standard medical consultation to ensure efficient completion in real time, and its structure mirrored the online database to facilitate data transfer. The CRF contained sections for clinical and laboratory data, including all laboratory data that might be pertinent to a patient with a bleeding or platelet disorder. Clinicians were specifically asked to record any significant medical problems not directly related to the BPD that were either unusual or likely to be inherited, such as ischaemic heart disease in a younger patient, severe neurological disease or any disorder with a strong family history. This was intended to detect potential syndromic disorders and associations between BPD and wider phenotypes. Pedigree relationships were also recorded. Examples of the CRF are given in the Appendix.

2.1.5.1.2 Bleeding assessment tool

Bleeding symptoms of locally enrolled cases were recorded at enrolment using the condensed MCMDM-1VWD bleeding questionnaire63,64. This bleeding assessment tool (BAT) was developed and validated for the diagnosis of type 1 VWD and had not been validated for all BPD, but in 2012 it was widely used in the UK to document bleeding symptoms. It had also been shown to be useful for the assessment of mild bleeding disorders in an unselected population64 and in women with menorrhagia103. The condensed MCMDM-1VWD BAT was therefore chosen over alternatives such as the ISTH-BAT65 and the Rockefeller-BAT66, which had been developed specifically to evaluate the bleeding symptoms of patients referred with any possible BPD but were not yet widely used in the UK and had not been validated in any setting. No studies had directly compared different BATs in unselected BPD. In BRIDGE-BPD, the BAT was used predominantly to document bleeding symptoms at enrolment rather than to assess bleeding severity or make a diagnosis.

2.1.5.1.3 Online data collection tool

In order to provide a portal where individual recruiting clinicians could directly enter data on their patients, a dedicated online data collection tool was developed for the BRIDGE-BPD study which met the criteria outlined above. This was necessary to accommodate international recruiting centres and to provide an element of standardisation to the documentation of bleeding and platelet phenotypes. As stated, it was designed in conjunction with the paper CRF so the two were comparable and data entered into the paper CRF by the recruiting clinician could be easily transferred by another member of the research team into the online database. Patient data was recorded online using the anonymised, unique BRIDGE ID. Most clinical data were recorded as categorical variables in the form of HPO terms (see chapter 3) but in order to keep the richness of clinical information it was also necessary to record some free text information (for example ‘this case has a severe bleeding history with a bleeding score of 15 but normal platelet morphology and platelet function testing. Mother and grandfather also affected’). Most laboratory data could be recorded as quantitative variables, for example the full blood count parameters. Some quantitative data such as platelet light transmission aggregometry responses, measurement of ATP secretion and platelet nucleotide levels are less standardised and interpretation is highly dependent on local controls and reference ranges. Clinicians were asked to interpret the results from these tests as per their local protocols and record them as either normal or abnormal.

Blood film or bone marrow light microscopy and platelet electron microscopy reports could not be recorded as categorical variables, so free-text sections were provided for these.

To avoid sequencing DNA samples from patients for whom no data were available, minimum requirements were set: the type of disorder (i.e. primarily a bleeding disorder, a platelet number or morphology disorder, or both), age of onset and platelet count were essential. If these were not provided, the DNA samples were not processed. Data-entry pages were updated after an initial trial run by the first few recruiting centres. Help tabs were added to clarify exactly what was required for each field (fig 2.1). Drop-down menus were added next to many of the numerical laboratory data fields to allow data to be entered in whatever units were used locally (e.g. haemoglobin measured in g/dL or mg/mL) (fig 2.1). Automated checks were later added to detect laboratory parameters recorded with incompatible units. Fields were also included to account for different methods of recording the same parameter. For example, platelet size can be assessed by automated measurement of the mean platelet volume (MPV) or by light microscopy of a blood film, and patients may have one, both or neither recorded. As platelet size was an important piece of data for every patient, especially when thrombocytopenic, space was provided to enter data for both MPV and blood film microscopy, and a mandatory tick-box was added to record whether or not the patient had macrothrombocytopenia (fig 2.1). Data entry could not proceed without completion of this section.
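The unit drop-down menus and automated unit checks can be illustrated with haemoglobin, together with the minimum fields required before a sample was sequenced. The sketch below is hypothetical: the field names, the subset of supported units and the plausibility window of 30-250 g/L are illustrative assumptions and were not taken from the BRIDGE-BPD database. It relies only on the unit identities 1 g/dL = 10 g/L and 1 mg/mL = 1 g/L.

```python
# Hypothetical conversion factors from units a centre might select to g/L
HB_TO_G_PER_L = {"g/dL": 10.0, "g/L": 1.0, "mg/mL": 1.0}

# Hypothetical plausibility window (g/L), used only to catch unit mix-ups
HB_PLAUSIBLE_G_PER_L = (30.0, 250.0)

# Minimum data required before a DNA sample would be processed
REQUIRED_FOR_SEQUENCING = ("disorder_type", "age_of_onset", "platelet_count")


def haemoglobin_g_per_l(value: float, unit: str) -> float:
    """Convert haemoglobin to g/L, rejecting unknown units and implausible results."""
    if unit not in HB_TO_G_PER_L:
        raise ValueError(f"unsupported haemoglobin unit: {unit!r}")
    converted = value * HB_TO_G_PER_L[unit]
    low, high = HB_PLAUSIBLE_G_PER_L
    if not low <= converted <= high:
        raise ValueError(
            f"{value} {unit} gives {converted} g/L, which is implausible; check the unit"
        )
    return converted


def blocking_problems(record: dict) -> list[str]:
    """List the problems that would stop a record from proceeding."""
    problems = [f"missing {field}" for field in REQUIRED_FOR_SEQUENCING
                if record.get(field) is None]
    # The macrothrombocytopenia tick-box must be answered explicitly (yes or no)
    if not isinstance(record.get("macrothrombocytopenia"), bool):
        problems.append("macrothrombocytopenia tick-box not answered")
    return problems
```

For example, a value of 135 entered with the unit g/dL converts to 1350 g/L and is rejected, which is the kind of unit mismatch such a check is meant to catch.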


Figure 2.1: Electronic data entry pages for the BRIDGE-BPD study. This figure shows some examples of the electronic data capture pages. (A) Example of the General Information entry page, which was designed to capture basic patient demographics, the type of disorder and whether there was a family history. The dialog box is an example of a 'help' feature; it appears if the user queries what is expected in the 'type of disorder' entry box. (B) Example of the Full Blood Count entry page for collecting numerical laboratory data, including the macrothrombocytopenia tick-box; answering this question was mandatory to proceed with data entry. (C) Examples of the unit drop-down menus available for numerical laboratory data, enabling accurate data entry from different recruiting centres.

The platelet function data entry pages were designed to capture these data as categorical variables, as outlined above. PFA-100 measurements were recorded both as raw data and in categorical form (normal, prolonged or shortened). Platelet aggregation responses to various agonists, flow cytometry results and platelet nucleotide assays were recorded as normal or abnormal (fig 2.2).
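Recording a PFA-100 result both as raw data and as a category amounts to storing the closure time alongside its interpretation against the recruiting centre's own reference range. A minimal sketch, assuming a closure time in seconds and a local reference interval supplied by the centre; `categorise_closure_time` and `pfa_entry` are hypothetical names, and the numbers in the example are illustrative rather than real reference values.

```python
from enum import Enum


class ClosureCategory(Enum):
    NORMAL = "normal"
    PROLONGED = "prolonged"
    SHORTENED = "shortened"


def categorise_closure_time(seconds: float, ref_low: float, ref_high: float) -> ClosureCategory:
    """Interpret a PFA-100 closure time against a centre's own reference range."""
    if ref_low > ref_high:
        raise ValueError("lower reference limit exceeds upper limit")
    if seconds > ref_high:
        return ClosureCategory.PROLONGED
    if seconds < ref_low:
        return ClosureCategory.SHORTENED
    return ClosureCategory.NORMAL


def pfa_entry(seconds: float, ref_low: float, ref_high: float) -> dict:
    """Keep the raw closure time together with its categorical interpretation."""
    category = categorise_closure_time(seconds, ref_low, ref_high)
    return {"raw_seconds": seconds, "category": category.value}


# Illustrative reference range only; real ranges are set locally
print(pfa_entry(250, ref_low=100, ref_high=200))
```

Keeping the reference limits as inputs, rather than fixing them in the code, mirrors the study's approach of leaving interpretation to each centre's local ranges.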

Figure 2.2: Electronic data entry pages for results of platelet function testing. (A) Screenshot of the main platelet function entry page, including results entered for an example patient. (B) The entry box for PFA-100 analyser results, including the drop-down menu to record a result as normal, prolonged or shortened. (C) 'Help' dialog box, which appears if a user requires assistance to interpret what is required in the Aggregometry ADP data entry section.

Bleeding scores were widely used in the UK, but their use varied between recruiting centres, particularly internationally, so a data entry page was added to accommodate centres that did not routinely use a bleeding score. Bleeding symptoms were recorded either as numerical data (where clinicians had used the condensed MCMDM-1VWD bleeding questionnaire) or as binary categorical data, with clinicians recording yes or no for the presence of each bleeding symptom if they had not performed the MCMDM-1VWD score.

Figure 2.3: Online documentation of bleeding symptoms with (top) and without (bottom) a bleeding score.

Human Phenotype Ontology (HPO) terms were also recorded via the online data collection tool (fig 2.4). Use of HPO terms is discussed in more detail in chapter 3.

Figure 2.4: HPO data entry. (A) Screenshot of the HPO term menu for recording phenotypes in any organ system (left) and some of the options under abnormal platelet function (right). (B) Screenshot of the HPO terms entered for a representative case with thrombocytopenia and storage pool disorder.

2.2 Development and use of the Human Phenotype Ontology

The Human Phenotype Ontology (HPO)104 was used to annotate clinical and laboratory phenotypes of all cases enrolled in BRIDGE-BPD and other rare disease studies in the NIHR BioResource. It was necessary to develop and optimise HPO terms pertinent to BPD prior to their use. This is detailed in chapter 3.

2.3 Methods for genotyping

2.3.1 DNA extraction

Genomic DNA from locally enrolled cases was isolated centrally in the Cambridge Translational Genomics (CATGO) laboratory in Cambridge (University of Cambridge and NHS Blood & Transplant) from either venous blood collected in EDTA or saliva collected in Oragene-DNA OG-500 or OG-575 sampling kits (DNA Genotek, Ottawa, Canada). Other recruiting centres also sent most venous blood samples to Cambridge for gDNA extraction, but a minority of cases had gDNA extracted at the recruiting centre, either historically or at enrolment. Extracted DNA was quality controlled by gel electrophoresis, and DNA concentration was checked using three independent measurements: PicoGreen (Life Technologies Ltd, Paisley, UK), Qubit (Life Technologies) and GloMax (Promega, Madison, USA). For each case, 1 μg of DNA was fragmented by ultrasound using a Covaris E220 (Covaris Inc., Woburn, MA, USA) to obtain DNA fragments with an average size of 200 base pairs (bp).

2.3.2 Sequencing

DNA libraries were prepared for whole exome sequencing following the manufacturers' protocols using the ROCHE NimbleGen SeqCap EZ 64Mb Human Exome Library version 3.0 (ROCHE NimbleGen, Inc. Madison, WI, USA) and the Illumina TruSeq DNA LT sample prep kit (Illumina Inc., San Diego, CA, USA). Libraries were sequenced as 100 bp paired-end reads on an Illumina HiSeq 2000 instrument at the Wellcome Trust Sanger Institute (WTSI, Hinxton, UK). DNA libraries were prepared for whole genome sequencing at the WTSI using an Illumina TruSeq DNA PCR-free sample preparation kit (Illumina Inc), and libraries were sequenced as 125 bp paired-end reads on a HiSeq 2000 or HiSeq X by Illumina Cambridge Inc. (Great Chesterford, UK). A minimum coverage of 15X was achieved across 95% of the genome.
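A coverage target such as 15X across 95% of the genome is usually checked from a summary of per-base read depth. The sketch below assumes that summary is a histogram mapping each depth to the number of genomic positions at that depth; the function names and input format are hypothetical, and in practice the result also depends on which positions are counted in the denominator (for example, whether gaps in the reference are excluded).

```python
def fraction_at_or_above(depth_histogram: dict[int, int], min_depth: int) -> float:
    """Fraction of positions whose read depth is at least min_depth.

    depth_histogram maps a read depth to the number of positions at that depth.
    """
    total = sum(depth_histogram.values())
    if total == 0:
        raise ValueError("histogram contains no positions")
    covered = sum(n for depth, n in depth_histogram.items() if depth >= min_depth)
    return covered / total


def meets_coverage_target(depth_histogram: dict[int, int],
                          min_depth: int = 15, min_fraction: float = 0.95) -> bool:
    """True if at least min_fraction of positions reach min_depth (default 15X in 95%)."""
    return fraction_at_or_above(depth_histogram, min_depth) >= min_fraction


# Toy histogram: 100 positions, 95 of them at 20X
toy = {0: 2, 10: 3, 20: 95}
print(fraction_at_or_above(toy, 15), meets_coverage_target(toy))  # 0.95 True
```

The same function can be called with `min_depth=10` and `min_fraction=0.70` to express the sample exclusion threshold described in section 2.3.3.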

2.3.3 Variant calling

Reads were aligned to the Genome Reference Consortium human genome build 37 (GRCh37) using the Isaac genome alignment software105. Realignment around indels and base quality score recalibration were performed using GATK 2.3_924. Samples were excluded from analysis if the coverage was below 10X in 70% of the genome or if the estimated contamination rate computed by VerifyBamId 1.0.025 was >5%. Variants relative to the reference genome were called and selected for further analysis based on their rarity in the general population (estimated from the allele frequency in the reference cohorts listed in table 2.3) and their predicted effect on protein function. Variants meeting any of the following criteria were excluded as unlikely to cause rare disease: (a) variant allele frequency >0.1% in any of the reference cohorts listed in table 2.3; (b) variant not predicted to alter the protein by snpEff 3.4 or the Ensembl Variant Effect Predictor106,107; (c) variant not present in other affected family members; (d) <3 reads supporting the alternate allele; (e) overall allele count >20 across other BRIDGE projects in the NIHR BioResource.
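The sample exclusion thresholds and variant exclusion criteria (a) to (e) above can be written as explicit filters. This is a sketch under stated assumptions rather than the pipeline used in the study: the `Variant` fields and function names are hypothetical, the coverage rule is read as "exclude if fewer than 70% of positions reach 10X", and criterion (c) is applied only when affected relatives were sequenced.

```python
from dataclasses import dataclass
from typing import Optional

MAX_REFERENCE_AF = 0.001        # (a) 0.1% in any reference cohort
MIN_ALT_READS = 3               # (d)
MAX_INTERNAL_ALLELE_COUNT = 20  # (e)


def sample_passes_qc(fraction_at_10x: float, contamination: float) -> bool:
    """Assumed reading of the sample QC rule: >=70% of positions at 10X and <=5% contamination."""
    return fraction_at_10x >= 0.70 and contamination <= 0.05


@dataclass
class Variant:
    reference_afs: dict[str, float]        # allele frequency per reference cohort
    alters_protein: bool                   # from snpEff or VEP consequence annotation
    in_affected_relatives: Optional[bool]  # None if no affected relatives were sequenced
    alt_reads: int                         # reads supporting the alternate allele
    internal_allele_count: int             # allele count across other BRIDGE projects


def exclusion_reasons(v: Variant) -> list[str]:
    """Return the criteria (a)-(e) that exclude a variant; an empty list means it is retained."""
    reasons = []
    if any(af > MAX_REFERENCE_AF for af in v.reference_afs.values()):
        reasons.append("a: allele frequency >0.1% in a reference cohort")
    if not v.alters_protein:
        reasons.append("b: not predicted to alter the protein")
    if v.in_affected_relatives is False:
        reasons.append("c: absent from affected relatives")
    if v.alt_reads < MIN_ALT_READS:
        reasons.append("d: fewer than 3 alternate-allele reads")
    if v.internal_allele_count > MAX_INTERNAL_ALLELE_COUNT:
        reasons.append("e: allele count >20 across other BRIDGE projects")
    return reasons
```

Returning the list of reasons, rather than a single yes/no, keeps a record of why each variant was discarded, which is useful when revisiting borderline cases.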

Table 2.3: Reference cohorts used to estimate the allele frequency of variants in the general population.
• 1000 Genomes Project: 1092 individuals of African, European and Asian ancestry from the HapMap project108
• UK10K: 10,000 individuals, including twin cohorts, healthy individuals and patients with rare diseases, obesity and neurodevelopmental disorders109
• NHLBI Exome Sequencing Project (ESP): 6500 unrelated individuals, including >2200 African-Americans; healthy controls and cases with cardiovascular disease110
• Exome Aggregation Consortium (ExAC) and Genome Aggregation Database (ExAC/GnomAD): cohort of initially 60,706 unrelated individuals (ExAC), expanded in 2017 to include 123,136 (WES) and 15,496 (WGS) individuals in GnomAD; cases with severe paediatric disease removed111

Structural Variants (SVs) were identified in the individuals who had WGS using two independent algorithms: Isaac Copy Number Variant Caller (Canvas, Illumina), which identifies copy number gains and deletions based on read depth, and Isaac Structural Variant Caller (Manta, Illumina), which identifies translocations, deletions, tandem duplications, insertions, and inversions based on both paired read fragment spanning and split read evidence. Gene and variant nomenclature follows the HGNC and HGVS guidelines112,113.

2.4 Methods for analysing sequencing data

The variants called as described in section 2.3.3 were then analysed through two complementary pathways. Local analysis, performed by me, focussed on the identification of pathogenic variants in locally enrolled cases. Central analysis focussed on strategies to identify novel genes and pathogenic variants across the entire BPD cohort. To facilitate these processes, variants were annotated using the tools listed in table 2.4.

Table 2.4: Variant annotation tools used in local and central analysis.
• Variant Effect Predictor (VEP): online tool from Ensembl used to predict the effect of called genomic variants on the protein and to provide SIFT and PolyPhen-2 scores (http://grch37.ensembl.org/Homo_sapiens/Tools/VEP?db=core)107
• Combined Annotation Dependent Depletion (CADD): integrated scoring system combining many individual variant annotations (e.g. SIFT, PolyPhen, GERP and other conservation metrics, and regulatory information such as transcription factor binding) into a score that correlates with deleteriousness. The CADD phred score was used to prioritise variants more likely to be deleterious. Variants scoring >10 are in the highest 10% of all variant scores; >20, the highest 1%; >30, the highest 0.1%114
• SIFT: in silico predictor of missense variant deleteriousness. Used as an additional assessment of the likely deleteriousness of missense variants115
• PolyPhen-2: in silico predictor of missense variant deleteriousness. Used as an additional assessment of the likely deleteriousness of missense variants116
• ExAC/GnomAD: cohort of initially 60,706 unrelated individuals (ExAC), expanded in 2017 to include 123,136 (WES) and 15,496 (WGS) individuals in GnomAD. Used to estimate the frequency of a particular variant in human populations117
• Human Gene Mutation Database (HGMD): a curated database of germline variants associated with human disease in the peer-reviewed literature. Used to highlight variants reported as disease-causing118
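The CADD thresholds in table 2.4 follow from the definition of the phred-scaled score, C = -10·log10(r/N), where r is the rank of a variant's raw score among all N scored variants (rank 1 being the most deleterious). A fraction 10^(-C/10) of variants therefore scores at least C, giving 10%, 1% and 0.1% for C = 10, 20 and 30. The helper functions below are a hypothetical illustration of this arithmetic, not part of the CADD software.

```python
import math


def phred_from_rank(rank: int, total: int) -> float:
    """Phred-scaled score C = -10*log10(rank/total); rank 1 is the most deleterious."""
    if not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and total")
    return -10 * math.log10(rank / total)


def top_fraction(phred: float) -> float:
    """Fraction of all scored variants with a phred score of at least `phred`."""
    return 10 ** (-phred / 10)


# Prints 10.0%, 1.0% and 0.1% for the three thresholds used in table 2.4
for threshold in (10, 20, 30):
    print(threshold, f"{top_fraction(threshold):.1%}")
```

Because the scale is logarithmic, each additional 10 points corresponds to a tenfold narrower slice of all possible variants.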

2.4.1 Identifying novel genes

The main aim of the BRIDGE-BPD project was to identify novel candidate genes causing rare BPD, and the cohort was analysed alongside the ~11,000 other rare disease cases enrolled in the NIHR BioResource for Rare Diseases (NIHRBR-RD). The NIHRBR-RD cases provided a set of 'true controls' who were unlikely to have severe BPD and had been processed and sequenced on the same platform. A central core analysis team comprising study principal investigators, computational scientists, clinical PhD fellows and a study coordinator was responsible for analysing sequencing and phenotyping data and identifying novel candidate genes. Novel statistical methods developed by the computational scientists were applied to combine the phenotype data with the sequencing data. Candidate genes identified in specific pedigrees or phenotypic subgroups were assessed by the core analysis team and prioritised using literature and database searches summarising prior knowledge of each gene, together with the variant annotation tools listed in table 2.4. Co-segregation studies were conducted in informative pedigrees, and laboratory studies were planned to confirm the functional effects of variants. Specific methods are discussed in relation to individual genes in later chapters.

2.4.2 Identification of variants in Tier-1 BPD genes

In order to identify variants which could potentially cause (previously unsuspected) cases of known BPD in cases enrolled into the BRIDGE study, the list of tier-1 BPD genes curated by the ISTH SSC for genomics in thrombosis and hemostasis119 was used. These genes have established associations with coagulation, platelet and thrombotic disorders.

The list is regularly updated to include the latest discoveries in the peer-reviewed literature, and the current version as of January 2018 contains 95 genes associated with 75 disorders of coagulation, platelets or thrombosis (table 4.2). After exclusion of variants as described in section 2.3.3, variants in tier-1 BPD genes were considered potential candidates and selected for further analysis if they met the following criteria:
• Allele frequency <0.1% in ExAC/GnomAD
• CADD score >10
• Missense or predicted high-impact consequence (splice-site, frameshift, stop-lost or stop-gained); 5'UTR variants in ANKRD26 were also included, as this is a well-established location of pathogenic variants in this gene120
If a variant was listed in the Human Gene Mutation Database (HGMD)118, the allele frequency threshold was increased to <2.5%. Variants meeting the above criteria in tier-1 BPD genes in locally enrolled cases underwent detailed local assessment, discussed in chapter 4.
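The tier-1 selection rules can be combined into a single predicate. The sketch assumes that consequences are annotated with Sequence Ontology terms as reported by VEP, treats "splice-site" as the splice acceptor and donor consequences, applies the allele frequency and CADD criteria to ANKRD26 5'UTR variants as well, and uses a small illustrative subset of the genes in table 2.5; `AnnotatedVariant` and `is_tier1_candidate` are hypothetical names.

```python
from dataclasses import dataclass

# Illustrative subset of tier-1 genes (see table 2.5 for the full list)
TIER1_GENES_EXAMPLE = frozenset({"ANKRD26", "GP1BA", "ITGA2B", "ITGB3", "RUNX1", "VWF"})

# Assumed mapping of "missense or predicted high-impact" to Sequence Ontology terms
QUALIFYING_CONSEQUENCES = frozenset({
    "missense_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "frameshift_variant",
    "stop_lost",
    "stop_gained",
})


@dataclass
class AnnotatedVariant:
    gene: str
    consequence: str   # Sequence Ontology term
    gnomad_af: float   # ExAC/GnomAD allele frequency (0 if absent)
    cadd_phred: float
    in_hgmd: bool      # reported as disease-causing in HGMD


def is_tier1_candidate(v: AnnotatedVariant, tier1_genes=TIER1_GENES_EXAMPLE) -> bool:
    """Apply the tier-1 allele frequency, CADD and consequence criteria."""
    if v.gene not in tier1_genes:
        return False
    # HGMD-listed variants are allowed a higher allele frequency threshold
    af_threshold = 0.025 if v.in_hgmd else 0.001
    if not v.gnomad_af < af_threshold:
        return False
    if not v.cadd_phred > 10:
        return False
    if v.consequence in QUALIFYING_CONSEQUENCES:
        return True
    # 5'UTR variants are retained only in ANKRD26
    return v.gene == "ANKRD26" and v.consequence == "5_prime_UTR_variant"
```

Ordering the checks from cheapest (gene membership) to most specific (consequence) keeps the predicate easy to read, and the gene list is passed as a parameter so the full tier-1 list can be substituted.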

Table 2.5: ISTH Tier-1 BPD gene list, last updated July 2017

Coagulation disorders | Gene symbol
Alpha 2 antiplasmin deficiency | SERPINF2
Angioedema, hereditary, type III & Factor XII deficiency | F12
Combined factor V and VIII deficiency | LMAN1; MCFD2
Factor V deficiency | F5
Factor VII deficiency | F7
Factor X deficiency | F10
Factor XI deficiency | F11
Factor XIII deficiency | F13A1; F13B
Fibrinogen deficiency | FGA; FGB; FGG
Haemophilia A | F8
Haemophilia B | F9
Kininogen deficiency | KNG1
Multiple coagulation factor deficiency type 2 | VKORC1
Multiple coagulation factor deficiency type 3 | GGCX
Plasminogen activator inhibitor 1 deficiency | SERPINE1
Prothrombin deficiency | F2
von Willebrand disease | VWF

Thrombotic disorders | Gene symbol
Antithrombin deficiency | SERPINC1
Familial thrombotic thrombocytopenic purpura | ADAMTS13
Heparin cofactor 2 deficiency | SERPIND1
Histidine-rich glycoprotein deficiency | HRG
Plasminogen deficiency | PLG
Protein C deficiency | PROC
Protein S deficiency | PROS1
Thrombomodulin deficiency | THBD
Tissue plasminogen activator deficiency | PLAT

Platelet and other bleeding disorders | Gene symbol
ADP receptor defect | P2RY12
ARC syndrome | VIPAS39; VPS33B
Autosomal dominant thrombocytopenia 2 | ANKRD26
Autosomal dominant thrombocytopenia 4 | CYCS
Autosomal dominant macrothrombocytopenia | GP1BA; GP1BB
Bernard-Soulier syndrome | GP1BA; GP1BB; GP9
Bleeding diathesis due to glycoprotein VI deficiency | GP6
Bleeding disorder, platelet-type 20 | SLFN14
Congenital amegakaryocytic thrombocytopenia (CAMT) | MPL
Deficiency of phospholipase A2, group IV A | PLA2G4A
Dense granule abnormality | NBEA
Ehlers-Danlos syndrome, classic type | COL1A1; COL5A1; COL5A2
Ehlers-Danlos syndrome, musculocontractural type | CHST14
Ehlers-Danlos syndrome, vascular type | COL3A1
Familial hemophagocytic lymphohistiocytosis type 5 | STXBP2
Familial platelet disorder with predisposition to AML | RUNX1
Ghosal syndrome | TBXAS1
Glanzmann thrombasthenia | ITGA2B; ITGB3
Gray platelet syndrome | NBEAL2
Gray platelet-like syndrome | GFI1B
Hermansky-Pudlak syndrome | AP3B1; AP3D1; BLOC1S3; BLOC1S6; DTNBP1; HPS1; HPS3; HPS4; HPS5; HPS6
Leukocyte integrin adhesion deficiency, type III | FERMT3
Macrothrombocytopenia | ACTN1; TPM4; FLNA
Macrothrombocytopenia and sensorineural hearing loss | DIAPH1
Macrothrombocytopenia, beta-tubulin 1 related | TUBB1
MYH9-related disorders | MYH9
Myopathy associated with thrombocytopenia | GNE
Noonan syndrome | PTPN11
Paris-Trousseau thrombocytopenia and Jacobsen syndrome | FLI1
Platelet-type von Willebrand disease | GP1BA
Platelet-type bleeding disorder 18 | RASGRP2
Quebec platelet disorder | PLAU
Radioulnar synostosis with amegakaryocytic thrombocytopenia 1 | HOXA11
Radioulnar synostosis with amegakaryocytic thrombocytopenia 2 | MECOM
Roifman syndrome | RNU4ATAC
Scott syndrome | ANO6
Sitosterolemia & macrothrombocytopenia | ABCG5; ABCG8
Storage pool disorder | GFI1B
Stormorken syndrome | ORAI1; STIM1
Thrombocytopenia & immune deficiency | ARPC1B
Thrombocytopenia 3 | FYB1
Thrombocytopenia and erythrokeratoderma | KDSR
Thrombocytopenia and susceptibility to cancer | ETV6
Thrombocytopenia and thrombocythemia 1 | THPO
Thrombocytopenia-absent radius syndrome (TAR) | RBM8A
Thrombocytopenia, anaemia and myelofibrosis | MPIG6B
Thromboxane A2 receptor defect | TBXA2R
Wiskott-Aldrich syndrome | WAS
X-linked thrombocytopenia with dyserythropoiesis | GATA1

2.4.3 Classification of pathogenicity and research reporting

Formal classification of variants was performed by the Thrombogenomics multidisciplinary team (MDT), which was set up primarily to assign pathogenicity to variants identified in tier-1 genes on the Thrombogenomics diagnostic panel78,121. The MDT comprises a study coordinator, a geneticist and a haematologist, plus additional scientists and haematologists as required for the assessment of individual variants. This established and validated pathway was also used to assign pathogenicity to tier-1 gene variants identified in the BRIDGE-BPD sequencing project. The MDT assessed all the evidence of pathogenicity in accordance with recently published guidelines122-124: integrating data from HGMD, disease-specific and gene-specific databases and population reference databases (table 2.3); comparing the HPO phenotypes of each case with the HPO gene profile (see chapter 3) and with the phenotypes of reported cases; considering variants in the context of known disease mechanisms and established modes of inheritance; and taking into account information from co-segregation studies. For the results presented in this thesis, the following classification system for tier-1 variants has been used, based on international guidelines122:
• Clearly pathogenic variant (CPV): variant previously reported in at least 4 unrelated pedigrees with a similar phenotype OR a loss-of-function variant in a gene where loss of function is an established mechanism of disease
• Likely pathogenic variant (LPV): variant previously reported in fewer than 4 unrelated pedigrees with a similar phenotype
• Variant of unknown significance (VUS): any variant not meeting the criteria for CPV or LPV.
Research reports were generated for clearly and likely pathogenic variants and sent directly to recruiting clinicians for feedback of pertinent findings.
Variants could be upgraded or downgraded between categories depending on co-segregation results within a pedigree, on functional studies providing evidence of novel mechanisms, or on corroboration or inconsistency with other cases in the NIHR BioResource carrying the same variant. Pathogenicity status could also change if more cases were reported with the same variant.
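The three-tier rules can be sketched as a function. This is a simplified, hypothetical encoding: it assumes that "previously reported" means at least one unrelated pedigree with a similar phenotype, and it does not model the MDT's overall judgement or the up- and downgrading of variants described above.

```python
def classify_tier1_variant(unrelated_pedigrees: int,
                           is_loss_of_function: bool,
                           lof_is_established_mechanism: bool) -> str:
    """Return 'CPV', 'LPV' or 'VUS' using the simplified rules of section 2.4.3.

    unrelated_pedigrees: number of previously reported unrelated pedigrees
    with a similar phenotype that carry the variant.
    """
    if unrelated_pedigrees >= 4:
        return "CPV"
    # Loss of function in a gene where loss of function causes disease.
    if is_loss_of_function and lof_is_established_mechanism:
        return "CPV"
    if 1 <= unrelated_pedigrees < 4:
        return "LPV"
    return "VUS"


def needs_research_report(classification: str) -> bool:
    """Research reports were generated for CPV and LPV only."""
    return classification in {"CPV", "LPV"}


if __name__ == "__main__":
    label = classify_tier1_variant(2, False, False)
    print(label, needs_research_report(label))  # LPV True
```

Ordering the checks from the strongest evidence downwards means a loss-of-function variant reported in only one pedigree is still classified CPV when loss of function is an established mechanism.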

2.4.4 Recall and Co-segregation studies

Co-segregation studies were performed when novel candidate variants were identified in either tier-1 or novel BPD genes. The presence of a variant in all affected relatives (and its absence in unaffected relatives) provided supporting evidence of pathogenicity. Probands were asked to invite their relatives to participate, and relatives were then contacted by post or phone by the study team. Dedicated information leaflets and consent forms were provided for relatives; otherwise the recruitment process was the same as already described for probands. Probands could be recalled if further samples were required for functional studies to confirm or investigate pathogenicity. Local cases were invited to the Hammersmith Haemophilia Centre. Relatives and probands enrolled at other centres were invited to attend their local recruitment centre or to see the central study team in Cambridge. Home visits by a member of the study team were also arranged when a relative could not attend clinic. An individual blood sampling protocol was developed for each participant, specific to the gene and phenotype under investigation. Where the participant could not attend clinic, a home visit was not feasible and the BPD phenotype was already documented, saliva sampling kits could be posted to participants, and genomic DNA (gDNA) was then extracted from the saliva samples.
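The co-segregation criterion can be illustrated with a short check over a single pedigree. This sketch assumes a fully penetrant model with no phenocopies; under incomplete penetrance a discordant relative does not by itself exclude pathogenicity. The function name and pedigree identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def check_cosegregation(carrier: Dict[str, bool],
                        affected: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Compare variant carrier status with affection status in one pedigree.

    Returns (True, []) if every affected relative carries the variant and no
    unaffected relative does; otherwise (False, discordant relatives).
    Only individuals with both a genotype and a phenotype are compared.
    """
    tested = [person for person in carrier if person in affected]
    discordant = [person for person in tested if carrier[person] != affected[person]]
    return len(discordant) == 0, discordant


if __name__ == "__main__":
    carriers = {"II-1": True, "II-2": False, "II-3": True}
    status = {"II-1": True, "II-2": False, "II-3": False}
    print(check_cosegregation(carriers, status))  # (False, ['II-3'])
```

Returning the discordant individuals, rather than only a yes/no answer, reflects how such results are reviewed: a single unaffected carrier may prompt re-phenotyping rather than immediate rejection of the variant.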

2.4.5 Sanger sequencing

Confirmatory Sanger sequencing of all pathogenic variants detected by whole exome and whole genome sequencing was performed on a fresh venous blood sample from the proband, and on samples from relatives as part of co-segregation studies. gDNA was extracted from either blood or saliva samples as previously described. Primer design, PCR amplification, quantification and purification were performed by Dr Jonathan Stephens at the University of Cambridge. Sanger sequencing was done by Source Bioscience (Cambridge).

2.5 Feedback and Data-sharing

2.5.1 Pertinent findings

Consent was obtained from all participants for the option of receiving feedback of results from the study. Results are fed back to the recruiting clinical care team, who are responsible for feedback to patients and families. Only variants classified as clearly pathogenic or likely pathogenic in genes known to be associated with BPD (termed pertinent findings), and confirmed in a clinically accredited laboratory, are fed back to patients.

2.5.2 Data-sharing

All confirmed pathogenic variants will be deposited in ClinVar125, annotated with HPO phenotypes. All sequencing results are released to the European Genome-phenome Archive (EGA)126 and linked to clinical data in the NIHR BioResource database by anonymised study identifiers.

2.6 Methods for ROCK1 in vitro studies

2.6.1 General molecular biology techniques

2.6.1.1 Agarose gel electrophoresis

Agarose gel electrophoresis was used to separate DNA fragments by size for visualisation and purification. Gels were prepared by mixing agarose powder (Sigma) with TBE buffer (Tris 0.89 M, boric acid 0.89 M, EDTA-Na2·2H2O 0.02 M) at the desired concentration (e.g. 1 g agarose in 100 mL TBE buffer for a 1% gel). The mixture was heated in a microwave until the agarose had completely dissolved. Once cooled, 5 µL SYBR Safe DNA gel stain (Invitrogen) was added per 100 mL gel to enable DNA visualisation under UV light. Gels were poured into a gel casting tray, a well comb was inserted and the gel was left at room temperature or at 4°C until solidified. The combs were then removed, and gels were placed in the electrophoresis unit and covered with TBE buffer. Samples were mixed with 6X gel loading dye (NEB) before loading into the wells, and a 1 kb DNA molecular weight marker (NEB) was loaded in the first lane. Gels were run at 100 V for 45 minutes and then examined under UV light to visualise DNA fragments.
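The gel recipe scales linearly with gel volume. A minimal sketch of the arithmetic, assuming w/v percentages (grams of agarose per 100 mL buffer) and the 5 µL per 100 mL stain ratio stated above; the function name is hypothetical.

```python
def agarose_gel_recipe(percent_w_v: float, volume_ml: float,
                       stain_ul_per_100ml: float = 5.0) -> dict:
    """Amounts for a w/v agarose gel in TBE with SYBR Safe stain.

    percent_w_v: gel percentage (grams of agarose per 100 mL buffer).
    stain_ul_per_100ml: stain volume per 100 mL gel (5 µL in section 2.6.1.1).
    """
    agarose_g = percent_w_v * volume_ml / 100
    stain_ul = stain_ul_per_100ml * volume_ml / 100
    return {"agarose_g": agarose_g, "tbe_ml": volume_ml, "stain_ul": stain_ul}


if __name__ == "__main__":
    print(agarose_gel_recipe(1.0, 100))  # {'agarose_g': 1.0, 'tbe_ml': 100, 'stain_ul': 5.0}
```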

2.6.1.2 Gel extraction and purification

After agarose gel electrophoresis, DNA bands were manually excised and weighed before purification using the QIAquick gel extraction kit (Qiagen) following the manufacturer’s protocol. Briefly, 3 volumes of Buffer QG were added to the gel slice, which was incubated at 50°C for 10 minutes with occasional vortexing until completely dissolved. One gel volume of isopropanol was added before the whole sample was applied to a spin column and centrifuged at 13,000 rpm for 1 minute to adsorb the DNA onto the silica membrane. The flow-through was discarded, followed by two rounds of buffer addition and centrifugation as per the manufacturer’s instructions. DNA was eluted by centrifugation in 30 µL Buffer EB (10 mM Tris-HCl, pH 8.5) and stored at -20°C. The presence of DNA was checked by running an aliquot on a 1% agarose gel for 10 minutes prior to use.
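Buffer volumes in this protocol are proportional to the mass of the excised gel slice. The sketch below assumes the usual QIAquick approximation that 100 mg of gel corresponds to about 100 µL; the function name is hypothetical and the kit handbook remains the authoritative source.

```python
def qiaquick_volumes(gel_mass_mg: float) -> dict:
    """Buffer QG and isopropanol volumes (µL) for a QIAquick gel extraction.

    Assumes 100 mg of gel ≈ 100 µL, so gel volume in µL equals mass in mg.
    """
    if gel_mass_mg <= 0:
        raise ValueError("gel mass must be positive")
    gel_volume_ul = gel_mass_mg
    return {
        "buffer_qg_ul": 3 * gel_volume_ul,  # 3 volumes of Buffer QG
        "isopropanol_ul": gel_volume_ul,    # 1 gel volume of isopropanol
    }


if __name__ == "__main__":
    print(qiaquick_volumes(150))  # {'buffer_qg_ul': 450, 'isopropanol_ul': 150}
```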

2.6.1.3 Transformation into competent E. coli

Plasmid DNA was transformed into high-efficiency, chemically competent E. coli by heat-shock transformation. DH10B (Invitrogen) and Top10 (ThermoFisher) E. coli strains were used in the work presented in this thesis. Cells were stored at -80°C and thawed on ice before use. 1-5 µL ligation or plasmid DNA was added to 50 µL Top10 cells (or 100 µL DH10B cells) and incubated on ice for 30 minutes. Heat shock was performed at 42°C for 30 seconds, followed by incubation on ice for a further 2-5 minutes. 900 µL SOC outgrowth medium (NEB or Invitrogen) was added to the transformation reaction and cells were incubated on a shaker at 37°C for 1 hour. 20-100 µL of the transformation reaction was plated onto fresh, pre-warmed LB-agar plates containing 100 µg/mL ampicillin and incubated overnight at 37°C. Single colonies were picked the following day and used to inoculate starter cultures of 5 mL LB medium plus 100 µg/mL ampicillin, which were incubated overnight at 37°C with shaking at 220 rpm. Cells were then pelleted by centrifugation at 4000 rpm for 5 minutes, the supernatant was discarded and cells were either frozen at -20°C or used directly for DNA isolation by mini-prep. If larger amounts of DNA were required, 100-1000 µL of starter culture was transferred into 100-150 mL LB medium plus 100 µg/mL ampicillin and incubated at either 37°C for 8-12 hours or 30°C for 24 hours on a shaker at 220 rpm. Cells were then pelleted by centrifugation at 4000 rpm for 5 minutes and the pellet frozen at -20°C.

2.6.1.4 Isolation of plasmid DNA

Plasmid DNA was isolated using the NEB Monarch plasmid mini-prep kit (New England Biolabs) for small-scale isolation and for confirmatory restriction digests or Sanger sequencing (see section 2.6.3.1). Once correct DNA sequences were confirmed, midi- or maxi-prep kits were used to isolate larger amounts of DNA from large-scale cultures. Three kits were used in the work presented in this thesis: the PureYield maxiprep system (Promega, UK) and the ZymoPURE midi- and maxi-prep kits (Zymo Research, CA, USA). Mini-, midi- and maxi-preps were all performed according to the manufacturers’ protocols using either centrifugation or vacuum-manifold methods. DNA was eluted in the manufacturer-supplied elution buffers, and DNA concentration was measured on a NanoDrop™ spectrophotometer (Thermo). All DNA samples were stored at -20°C.
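Spectrophotometric quantification relies on the conventional conversion that an A260 of 1.0 (1 cm path length) corresponds to approximately 50 µg/mL double-stranded DNA. A minimal sketch of the calculation, with the A260/A280 ratio as a purity check; the function names are hypothetical, and the instrument performs this conversion internally.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # 1 A260 unit (1 cm path) ≈ 50 µg/mL dsDNA


def dsdna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration in ng/µL (numerically equal to µg/mL)."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def a260_a280(a260: float, a280: float) -> float:
    """Purity ratio; a value of ~1.8 is generally taken to indicate pure DNA."""
    return a260 / a280


if __name__ == "__main__":
    print(dsdna_ng_per_ul(0.5, dilution_factor=10))  # 250.0
```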

2.6.2 Construction of the pcDNA3.1(-)B-ROCK1 plasmid

2.6.2.1 Isolation of ROCK1 cDNA

ROCK1 cDNA was amplified from a megakaryocyte cDNA library donated by Dr Angela Doerr, Imperial College London. Primers (table 2.6) were designed to produce ROCK1 cDNA with overhangs complementary to the destination vector (pcDNA3.1(-)B) after digestion with the EcoRI and KpnI restriction enzymes (see section 2.6.2.3). Primers were designed using the NEBuilder online primer design tool127.

Table 2.6: Primers used to amplify the ROCK1 cDNA insert from the MEG01 cDNA library.

Primer | Sequence 5’-3’ | Annealing temperature
Forward | ACTGTGCTGGATATCTGCAGATGTCGACTGGGGACAGTTTTG | 63.8°C
Reverse | TGTTCTAGAAAGCTTGGTACACTAGTTTTTCCAGATGTATTTTTGACC | 63.8°C
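The structure of the forward primer in table 2.6 can be checked programmatically: the 5’ portion provides the overlap with the linearised vector and the 3’ portion anneals to ROCK1 starting at the ATG start codon. The sketch below assumes the first ATG marks this boundary, which holds for this primer but not in general, and shows the reverse primer in the ORF sense orientation via its reverse complement. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

FORWARD = "ACTGTGCTGGATATCTGCAGATGTCGACTGGGGACAGTTTTG"
REVERSE = "TGTTCTAGAAAGCTTGGTACACTAGTTTTTCCAGATGTATTTTTGACC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_at_start_codon(primer: str) -> tuple:
    """Split a forward primer into (5' vector overlap, 3' ORF-annealing part).

    Assumes the first ATG in the primer is the ORF start codon, which holds
    for FORWARD because its 20-nt overlap contains no ATG.
    """
    i = primer.find("ATG")
    if i < 0:
        raise ValueError("no ATG in primer")
    return primer[:i], primer[i:]


if __name__ == "__main__":
    overlap, anneal = split_at_start_codon(FORWARD)
    print(overlap, anneal)  # ACTGTGCTGGATATCTGCAG ATGTCGACTGGGGACAGTTTTG
    print(reverse_complement(REVERSE))  # reverse primer in ORF sense orientation
```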

The ROCK1 cDNA was amplified by polymerase chain reaction (PCR) using the HiFi App polymerase kit (Appleton Woods). The 50 µL PCR reaction mixture contained 10 µL 5X HiFi reaction buffer, 2 µL each of forward and reverse primer, 1 µL megakaryocyte cDNA template, 34 µL distilled H2O and 1 µL HiFi App polymerase. The thermal cycling parameters are given below.

Step | Temperature | Time
Initial denaturation (1 cycle) | 95°C | 1 min
Denaturation (30 cycles) | 95°C | 15 s
Annealing (30 cycles) | 63.8°C | 15 s
Extension (30 cycles) | 72°C | 2 min
Final extension (1 cycle) | 72°C | 4 min
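As a check on the reaction set-up, the components listed above should sum to 50 µL, with the 5X buffer diluted to 1X. The sketch below encodes the per-reaction volumes (assuming 2 µL of each primer) and scales them into a master mix with a hypothetical 10% pipetting excess; template is left out of the mix so that it can be added to each tube separately.

```python
PER_REACTION_UL = {
    "5X HiFi reaction buffer": 10.0,
    "forward primer": 2.0,
    "reverse primer": 2.0,
    "cDNA template": 1.0,
    "distilled water": 34.0,
    "HiFi App polymerase": 1.0,
}
REACTION_VOLUME_UL = 50.0


def check_reaction() -> None:
    """Raise if the components do not add up to the stated reaction volume."""
    total = sum(PER_REACTION_UL.values())
    if abs(total - REACTION_VOLUME_UL) > 1e-9:
        raise ValueError(f"components sum to {total} µL, not {REACTION_VOLUME_UL} µL")


def master_mix(n_reactions: int, excess: float = 0.1) -> dict:
    """Scale all components except template for n reactions plus excess."""
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 2)
            for name, vol in PER_REACTION_UL.items()
            if name != "cDNA template"}


if __name__ == "__main__":
    check_reaction()
    print(master_mix(4))
```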

The PCR product was run on an agarose gel, which confirmed a band at approximately 4 kb as expected. The amplified ROCK1 sequence was extracted using the QIAquick gel extraction kit, and the correct sequence was confirmed by Sanger sequencing.

2.6.2.2 pcDNA3.1(-)B vector

The pcDNA3.1/myc-His(-)B vector (Invitrogen) is a mammalian expression vector with a CMV promoter, allowing efficient, high-level expression in mammalian cells. It also contains an SV40 promoter and origin, which allows episomal replication in cells expressing the SV40 large T antigen such as HEK293T cells, and an ampicillin resistance gene to allow selection in E. coli. The multiple cloning site is in frame with a C-terminal myc epitope and a polyhistidine tag, both for detection and purification (fig. 2.5).

Figure 2.5: pcDNA3.1(-)B vector map (top) and multiple cloning site (MCS) (bottom), illustrating the major features of the vector. The section of the MCS that is absent from the cloned construct (after restriction digest with the EcoRI and KpnI restriction endonucleases) is highlighted in red.

2.6.2.3 Restriction digest of pcDNA3.1B-

The pcDNA3.1(-)B vector was linearised by restriction digest with the EcoRI and KpnI restriction enzymes (New England Biolabs) in CutSmart® buffer. The reaction mixture contained 5 µL plasmid DNA, 3 µL 10X CutSmart buffer, 1 µL of each enzyme and 20 µL water, and was incubated at 37°C for 30 minutes. The digestion products were run on a 1% agarose gel, extracted and purified using a gel extraction kit (as described in sections 2.6.1.1 and 2.6.1.2).

2.6.2.4 Ligation of ROCK1 insert and pcDNA3.1B- vector

The NEBuilder HiFi DNA Assembly cloning kit (New England Biolabs) was used to assemble the prepared ROCK1 cDNA fragment into the pcDNA3.1(-)B vector following the manufacturer’s instructions. The reaction mixture contained 0.5 µL linearised vector (section 2.6.2.3), 3 µL ROCK1 insert (section 2.6.2.1), 10 µL NEBuilder HiFi 2X master mix and 6.5 µL H2O, and was incubated at 50°C for 15 minutes. The assembled product was transformed into DH10B cells as described in section 2.6.1.3. DNA was isolated from a starter culture by mini-prep and sent for Sanger sequencing using T7 and BGHR primers (see section 2.6.3.1) to confirm the correct start and end sequences of the ROCK1 open reading frame (ORF) and that it was in frame with the myc-His tag. Once this had been confirmed, the assembled ROCK1 plasmid DNA (ROCK1-His) was re-transformed into DH10B cells for large-scale culture and DNA isolation. The entire ROCK1 ORF sequence was confirmed by Sanger sequencing (primers listed in table 2.8) prior to use in transfections or site-directed mutagenesis.
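HiFi assembly efficiency depends on the molar, rather than mass, ratio of insert to vector. The sketch below converts mass to picomoles using the common approximation of 650 g/mol per base pair; the masses and lengths in the example are hypothetical, since the DNA concentrations used here are not recorded in the text.

```python
BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one dsDNA base pair


def pmol_from_ng(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass (ng) to picomoles."""
    return mass_ng * 1000 / (length_bp * BP_MASS_G_PER_MOL)


def insert_vector_molar_ratio(vector_ng: float, vector_bp: int,
                              insert_ng: float, insert_bp: int) -> float:
    """Molar ratio of insert to vector in an assembly reaction."""
    return pmol_from_ng(insert_ng, insert_bp) / pmol_from_ng(vector_ng, vector_bp)


if __name__ == "__main__":
    # Hypothetical inputs: vector 50 ng of ~5,500 bp, insert 120 ng of ~4,000 bp.
    ratio = insert_vector_molar_ratio(50, 5500, 120, 4000)
    print(f"insert:vector = {ratio:.1f}:1")
```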

2.6.3 Site-directed mutagenesis to create ROCK1 mutants

Variants were introduced into the ROCK1 cDNA using the Q5 site-directed mutagenesis kit, containing Q5 high-fidelity DNA polymerase (New England Biolabs), as per the manufacturer’s instructions. The PCR reaction mixture contained 12.5 µL Q5 HiFi 2X master mix, 1.25 µL each of forward and reverse primer at 10 µM, 1 µL template DNA (ROCK1-His, section 2.6.2.4) and 9 µL water. The thermal cycling parameters are listed below. Primers used to create each variant were designed using the NEBaseChanger™ online tool (http://nebasechanger.neb.com/) and are listed in table 2.7.

Step | Temperature | Time
Initial denaturation (1 cycle) | 98°C | 30 s
Denaturation (25 cycles) | 98°C | 10 s
Annealing (25 cycles) | Specific to each primer pair (table 2.7) | 20 s
Extension (25 cycles) | 72°C | 4.5 min
Final extension (1 cycle) | 72°C | 2 min

PCR products were incubated with the Q5 kit KLD enzyme mix for 20 minutes at room temperature in the following reaction mixture: 1 µL PCR product, 5 µL 2X KLD buffer, 1 µL 10X KLD enzyme mix and 3 µL water. This step allowed phosphorylation and ligation (re-circularisation) of the amplified plasmid DNA, together with digestion of any residual methylated template DNA by DpnI. Mutagenesis reactions were transformed into DH10B or Top10 cells, and DNA was isolated as described.

Table 2.7: Primers designed for site-directed mutagenesis

ROCK1 variant | Nucleotide change in ROCK1 open reading frame | Forward primer 5’-3’ | Reverse primer 5’-3’ | Annealing temperature/°C
M156L | 466 A/C | GATGGAATACCTGCCTGGTGG | ACCATGTAGAGATAACGATC | 59
R403C | 1207 C/T | TTATAGCAATTGTAGATACTTATCTTC | TATGTAAATCCTACAAAAGGTAG | 58
R403L | 1208 G/T | TATAGCAATCTTAGATACTTATCTTCAG | ATATGTAAATCCTACAAAAGG | 56
L312F | 934 C/T | AGCAAAAAACTTTATTTGTGCC | TCTTTTGATATGTCATTATCATC | 56
Y405* | 1214-1215 AA insertion | AACTTATCTTCAGCAAATCCTAATG | TATCTACGATTGCTATAATATG | 56
K105G | 313-314 AA/GG | ATATGCTATGGGGCTTCTCAGCAAATTTG | ACCTTCCTGGTGGATTTATG | 60
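The nucleotide positions in table 2.7 can be related to the protein changes by mapping each ORF position to its codon. A minimal sketch using the standard genetic code; the helper names are hypothetical, and the K105 reference codon (AAA rather than AAG) is an assumption for illustration, whereas the M156 codon is necessarily ATG.

```python
# Standard genetic code in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_position(orf_pos: int) -> tuple:
    """Map a 1-based ORF nucleotide position to (codon number, base 1-3 in codon)."""
    return (orf_pos - 1) // 3 + 1, (orf_pos - 1) % 3 + 1


def mutate_codon(codon: str, base_in_codon: int, new_base: str) -> str:
    """Substitute one base (numbered 1-3) within a codon."""
    i = base_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


if __name__ == "__main__":
    # M156L: position 466 A>C; Met is encoded only by ATG.
    codon_no, base = codon_position(466)
    print(codon_no, base, CODON_TABLE[mutate_codon("ATG", base, "C")])  # 156 1 L
    # K105G: positions 313-314 AA>GG; reference codon AAA assumed (hypothetical).
    print(CODON_TABLE[mutate_codon(mutate_codon("AAA", 1, "G"), 2, "G")])  # G
```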

2.6.3.1 Confirmatory Sanger sequencing of ROCK1 plasmid and mutants

The entire ROCK1 ORF sequence was confirmed by Sanger sequencing in the ROCK1-His wild-type plasmid and in all mutants, to check for the presence of the intended variants and the absence of any unintended mutations prior to use in transfections. Sequencing primers were designed using SnapGene® software (GSL Biotech; available at snapgene.com), checked for off-target alignments using the NCBI BLAST online tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi) and ordered from either Sigma, UK or Thermo Scientific, UK. Sanger sequencing was performed by GENEWIZ (Essex, UK) using primers provided at 5 µM diluted in water (listed in table 2.8) and the company-supplied T7 and BGHR universal primers, which anneal to the pcDNA3.1(-)B vector. Sanger sequencing results were aligned to the ROCK1-His sequence using SnapGene® software.

Table 2.8: Primers used for Sanger sequencing of the ROCK1-His plasmid

Primer name | Sequence 5’-3’
T7 | TAATACGACTCACTATAGGG
ROCK1_156_F | CAGTCCTTGGGTTGTTCAGC
ROCK1_403_F | AGCACCAGTTGTACCCGATT
ROCK1_b_F | GTGTCTCAGATTGAGAAGGAG
ROCK1_c_F | CGTTTAACTGACAAACATCA
ROCK1_c_R | CGATTCATTATTTCTGCCAA
ROCK1_d_F | GAATATCAACACTGAACGAACCC
BGHR | TAGAAGGCACAGTCGAGG

2.6.4 HEK293T cell culture

HEK293T cells were cultured in humidified incubators at 37°C, 5% CO2 in complete growth medium [Minimum Essential Medium (MEM, Sigma Cat# M2279) supplemented with 10% fetal bovine serum (FBS, Biosera), 1 U/mL penicillin/0.1 mg/mL streptomycin (Invitrogen) and 2 mM L-glutamine (Invitrogen)]. Cells were cultured in 20-25 mL growth medium in T175 culture flasks and passaged when they reached 70-80% confluency: medium was removed, cells were washed twice in 10 mL sterile phosphate buffered saline (PBS; Sigma) and then incubated with 2 mL TrypLE Express (Life Technologies cat# 12605-010) at 37°C for two minutes. Once cells were detached, the TrypLE was quenched by adding 10 mL fresh growth medium, and cells were split 1:3 into fresh culture flasks.

2.6.4.1 HEK293T cell cryopreservation and revival

Cells were cryopreserved at early passage in liquid nitrogen. When cells were 70-80% confluent, they were washed with PBS, trypsinised and quenched in 10 mL complete growth medium as above. Cells were centrifuged at 1200 rpm for 5 minutes at room temperature, the supernatant was discarded and the cell pellet was resuspended in 3 mL freezing medium (90% FBS, 10% dimethyl sulfoxide (DMSO)). Cells were then immediately transferred into 3 × 1 mL cryovials and placed in a cryo-freezing container (Nalgene) at -80°C until transfer to liquid nitrogen for long-term storage at approximately -180°C. Cells were revived from liquid nitrogen storage by thawing one cryovial at 37°C and transferring the 1 mL of cells directly into 10 mL pre-warmed growth medium in a T25 culture flask. Medium was changed every 24 hours until cells reached 70-80% confluency, when they were passaged into larger flasks.

2.6.5 Transient transfection of HEK293T cells

HEK293T cells were seeded into 6-well plates in 2 mL complete growth medium per well, 24-48 hours prior to transfection. Cells were transfected at 70% confluency using Lipofectamine 3000 (Invitrogen). Thirty minutes prior to transfection, the medium was replaced with 1.5 mL fresh complete growth medium pre-warmed to 37°C. 3.75 µL Lipofectamine 3000 reagent was incubated with 2.5 µL P3000 reagent (Invitrogen) and 1250 ng plasmid DNA in 250 µL Opti-MEM reduced serum medium (Gibco) according to the manufacturer’s instructions. The 250 µL Lipofectamine/DNA/Opti-MEM mixture was then added dropwise to each well of a 6-well plate and incubated overnight at 37°C. Medium was replaced with fresh complete growth medium (2 mL per well) after 12-24 hours.

2.6.6 Protein techniques

2.6.6.1 Cell lysate collection

Cell lysates were collected from HEK293T cells cultured in 6-well plates at the indicated time-points. Medium was removed and cells were collected in 1 mL ice-cold RIPA lysis buffer (Merck Millipore cat# 20-188) containing protease inhibitors (Roche cOmplete Mini EDTA-free protease inhibitor cocktail tablets: 1 tablet per 10 mL lysis buffer). Cells were transferred to a 1.5 mL microcentrifuge tube on ice and incubated for 20 minutes. Cell debris was pelleted by centrifugation at 13,000 rpm for 10 minutes, and the supernatant was transferred to a fresh 1.5 mL microcentrifuge tube on ice and stored at -20°C.

2.6.6.2 Determination of protein concentration using the BCA assay

Total protein concentration in HEK293T cell lysates was measured using the Pierce BCA protein assay kit (Thermo Fisher cat# 23225). This assay is based on the biuret reaction, in which polypeptides reduce copper from Cu2+ to Cu+ in an alkaline environment. In the BCA assay, bicinchoninic acid (BCA) reacts with Cu+ to produce a purple colour, the intensity of which correlates with the protein concentration. The assay was performed in flat-bottomed, uncoated 96-well plates. Bovine serum albumin (BSA) was used as a standard and diluted with distilled water to eight known concentrations (range 0-2000 µg/mL). Six serial dilutions of each test sample were also prepared. The working reagent was made by adding 200 µL cupric sulphate pentahydrate solution to 10 mL BCA reagent immediately prior to use. 25 µL of each sample dilution or protein standard was pipetted into a well of a 96-well plate, 200 µL of BCA working reagent was added to each well, and the plate was mixed gently on a plate shaker for 30 seconds and then incubated at 37°C for 30 minutes. The colorimetric change was measured by absorbance at 562 nm in a plate reader. Absorbance measurements from the BSA standard dilutions were used to construct a standard curve, from which the protein concentrations of the test samples were calculated.
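The standard-curve step can be sketched as a least-squares line through the BSA standards, inverted to read off unknowns and corrected for sample dilution. This is a simplification: the BCA response curves at higher concentrations, so a quadratic or point-to-point fit may be preferable in practice. The example data below are synthetic, not study measurements, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(conc: Sequence[float], absorbance: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of absorbance = slope * conc + intercept."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_ug_per_ml(a562: float, slope: float, intercept: float,
                      dilution_factor: float = 1.0) -> float:
    """Read an unknown off the standard curve and correct for its dilution."""
    return (a562 - intercept) / slope * dilution_factor


# Synthetic, perfectly linear example data for eight BSA standards.
STANDARDS_UG_PER_ML = [0, 125, 250, 500, 750, 1000, 1500, 2000]
EXAMPLE_A562 = [0.1 + 0.001 * c for c in STANDARDS_UG_PER_ML]

if __name__ == "__main__":
    slope, intercept = fit_line(STANDARDS_UG_PER_ML, EXAMPLE_A562)
    # A sample reading A562 = 0.6 after a 1-in-4 dilution.
    print(round(protein_ug_per_ml(0.6, slope, intercept, dilution_factor=4)))  # 2000
```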

2.6.6.3 Western blotting

Western blotting was used to determine protein expression in transfected HEK293T cells.

2.6.6.3.1 SDS-Polyacrylamide gel electrophoresis (SDS-PAGE)

Proteins of interest in HEK293T cell lysates were resolved by SDS-PAGE under reducing conditions. Lysates were first diluted in water to a standard concentration, based on the total protein concentration measured by BCA assay, and then diluted in 4X reducing loading buffer [Novex LDS sample buffer (Invitrogen) + 2% β-mercaptoethanol] and heated at 90°C for 10 minutes. 10-20 µg protein in a volume of 15 or 30 µL was loaded per well of Novex Bolt Bis-Tris 4-12% pre-cast polyacrylamide gels (Invitrogen). A pre-stained protein molecular weight marker (Invitrogen) was run alongside all samples. All gels were run in Novex Bolt MES SDS running buffer (Invitrogen) at a constant 200 V for 30-45 minutes.
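Equal loading follows from the BCA result: the lysate volume required is the target protein mass divided by the lysate concentration, the 4X loading buffer makes up one quarter of the final volume and water makes up the remainder. A hypothetical helper illustrating the arithmetic:

```python
def loading_volumes(protein_ug: float, lysate_ug_per_ml: float,
                    final_volume_ul: float, buffer_x: float = 4.0) -> dict:
    """Volumes (µL) of lysate, loading buffer and water for one gel lane.

    lysate_ug_per_ml: concentration from the BCA assay (µg/mL).
    buffer_x: concentration factor of the loading buffer (4X here).
    """
    lysate_ug_per_ul = lysate_ug_per_ml / 1000
    lysate_ul = protein_ug / lysate_ug_per_ul
    buffer_ul = final_volume_ul / buffer_x
    water_ul = final_volume_ul - buffer_ul - lysate_ul
    if water_ul < 0:
        raise ValueError("lysate too dilute for this load volume")
    return {"lysate_ul": lysate_ul, "loading_buffer_ul": buffer_ul, "water_ul": water_ul}


if __name__ == "__main__":
    # 15 µg protein from a 2000 µg/mL lysate in a 30 µL load.
    print(loading_volumes(15, 2000, 30))  # {'lysate_ul': 7.5, 'loading_buffer_ul': 7.5, 'water_ul': 15.0}
```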

2.6.6.3.2 Probing and developing

Proteins were transferred from polyacrylamide gels to a nitrocellulose membrane (Amersham Protran Premium, 0.45 µm) in transfer buffer (ThermoFisher 1-Step transfer buffer) using the Pierce Power Blotter automated transfer system (ThermoFisher) at 25 V for 15 minutes. Following transfer, membranes were blocked with 5% non-fat milk in phosphate buffered saline containing 0.1% Tween-20 (PBST) for either 1-2 hours on a shaker at room temperature or overnight at 4°C. They were then incubated with primary antibody overnight at 4°C, or for 1-2 hours on a shaker at room temperature if the primary antibody was conjugated to horseradish peroxidase (HRP). Membranes were then washed with PBST, followed by incubation with HRP-conjugated secondary antibody where required for 1-2 hours on a shaker at room temperature. All primary and secondary antibodies were diluted in PBST; dilutions are listed in table 2.9. Membranes were thoroughly washed in PBST prior to protein detection using an ECL detection kit (Millipore Immobilon chemiluminescent HRP substrate). Briefly, 3 mL HRP substrate luminol reagent and 3 mL HRP substrate peroxide solution were mixed together and applied to each membrane for 5 minutes. The membrane was then exposed to Hyperfilm (Amersham, UK) and manually developed.

Table 2.9: Antibodies used in Western blotting. HRP: horseradish peroxidase

Primary antibody | Dilution in PBST | Secondary antibody | Dilution in PBST
Rabbit polyclonal anti-ROCK1 (Abcam ab45171) | 1:4000 | Goat polyclonal anti-rabbit-HRP (DAKO P0448) | 1:5000
Rabbit polyclonal anti-6X His tag-HRP (Abcam ab1187) | 1:20,000 | n/a | n/a
Mouse monoclonal anti-GAPDH (Santa Cruz sc365062) | 1:5000 | Goat polyclonal anti-mouse-HRP | 1:5000

2.6.6.3.3 Stripping and re-probing

Western blot membranes were stripped prior to re-probing with a different primary antibody, usually anti-GAPDH antibody as a loading control. Membranes were washed twice in stripping buffer (15 g glycine, 1 g SDS, 10 mL Tween-20 in 1 L distilled H2O, pH 2.2), followed by washing twice in PBS and PBST, before re-blocking in PBST-milk and probing with anti-GAPDH antibodies as above.

2.6.7 Scratch wound healing assay

HEK293T cells were seeded in 6-well plates in 2 mL complete growth medium and transfected with plasmid DNA as described. When cells formed a confluent monolayer, usually 24 hours after transfection, a scratch was made down the centre of each well using a sterile 20 µL pipette tip. Medium was then removed, cells were washed twice in 1 mL PBS to remove dead cells, and 2 mL fresh complete growth medium was added. Each scratch wound was marked in three places on the bottom of the well to ensure the same areas were imaged at serial time-points. Scratch wounds were imaged at these three places at 0 hours and then at 24-hour intervals using an inverted microscope at 200X magnification (Olympus CKX41) and a QImaging camera (Rolera XR) with QCapture Pro software. ImageJ software was used to analyse the digitised images and measure wound area at each time-point. Wound area was calculated using a scale of 764 pixels per millimetre at 200X magnification.
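Because the scale is linear (764 pixels per mm), an area measured in pixels converts to mm² by dividing by 764². The sketch below also averages the three marked fields of a well and reports percentage wound closure relative to 0 hours; closure is a common way to summarise scratch assays but is an assumption here, as the text only specifies wound area. Function names are hypothetical.

```python
from statistics import mean
from typing import Sequence

PIXELS_PER_MM = 764.0  # scale at 200X magnification (section 2.6.7)


def wound_area_mm2(area_px: float, pixels_per_mm: float = PIXELS_PER_MM) -> float:
    """Convert an ImageJ area in pixels to mm² (divide by pixels-per-mm squared)."""
    return area_px / pixels_per_mm ** 2


def mean_wound_area_mm2(field_areas_px: Sequence[float]) -> float:
    """Average the marked fields of one well at one time-point."""
    return mean(wound_area_mm2(a) for a in field_areas_px)


def percent_closure(area_t0: float, area_t: float) -> float:
    """Percentage of the initial (0 hour) wound area closed at a later time-point."""
    return (area_t0 - area_t) / area_t0 * 100


if __name__ == "__main__":
    t0 = mean_wound_area_mm2([1_200_000, 1_150_000, 1_180_000])  # hypothetical pixel areas
    t24 = mean_wound_area_mm2([600_000, 640_000, 590_000])
    print(f"{percent_closure(t0, t24):.1f}% closure at 24 h")
```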

2.6.8 Statistical analysis

Data analysis was performed using GraphPad Prism 7 (GraphPad Software, CA, USA). Analysis of variance (ANOVA) and t-tests were used to test for differences between groups, and p values <0.05 were considered significant.

The BRIDGE-Bleeding and platelet disorders (BRIDGE-BPD) study: recruitment and phenotyping

The BRIDGE bleeding and platelet disorders study (BRIDGE-BPD) was set up in 2010-2011 with the aim of using next-generation sequencing to understand the molecular basis of inherited rare bleeding and platelet disorders. The target population was children and adults with any abnormality of platelet number, morphology or function and/or any abnormality of bleeding for which a specific molecular explanation could not be found using conventional laboratory testing. It was essential that there was a high likelihood of the disorder being genetic (e.g. other affected family members or childhood onset) and that acquired causes had been excluded. Methods of recruitment were described in chapter 2. This chapter details the cases recruited locally at Hammersmith Hospital and the outcomes of my role in the recruitment of, and data collection from, over 1000 patients globally with inherited rare bleeding and platelet disorders. I also describe the optimisation and utility of the Human Phenotype Ontology (HPO) for systematically phenotyping all patients to facilitate data analysis.

3.1 BRIDGE-BPD study collection

Between 2011 and 2017, more than 2000 eligible BPD patients and their relatives were enrolled into the BRIDGE-BPD and NIHR BioResource RD-BPD projects from nine countries (table 3.1). At the time of writing, high-throughput sequencing and phenotypic data were available for 1951 cases, including 1602 probands.

Table 3.1: Total number of cases enrolled into the BRIDGE and NIHR BioResource bleeding and platelet disorders (BPD) studies.

Country | Total recruitment
UK | 1312
Belgium | 562
France | 170
USA | 109
Germany | 25
Italy | 18
Israel | 9
Netherlands | 8
Australia | 1
Total | 2214

Recruitment at the Hammersmith Hospital Haemophilia Centre (Imperial College NHS Trust) began in September 2012 and continued until the study closed to recruitment in March 2017. I initially identified 153 patients who appeared to meet the eligibility criteria through clinic letter archives and historical platelet function test results. Of these 153 patients, 7 did not meet the eligibility criteria on subsequent re-evaluation of clinical and laboratory phenotypes, 11 declined to participate and 56 did not respond to the study invitation by letter or phone call. One patient could not attend the clinic, so I interviewed, consented and obtained samples from her at home. The remaining 78 patients were interviewed face-to-face in the Haemophilia Centre by either me (n=70) or another trained clinician (n=8), recruited to the study, and provided written informed consent prior to having blood samples taken. Between 2012 and 2017, a further 52 eligible patients were identified when they attended the haemostasis clinic at the Hammersmith Hospital Haemophilia Centre or the joint haematology-obstetric clinic at Queen Charlotte’s and Chelsea Hospital (QCCH), or were referred by clinicians at other local hospitals. A total of 130 cases were enrolled at Imperial College NHS Trust, including 106 probands and 15 affected relatives meeting eligibility criteria (fig 3.1). Venous blood samples were obtained from all enrolled cases at Hammersmith Hospital and sent for extraction of genomic DNA and high-throughput sequencing as described in chapter 2, section 2.3.

[Figure 3.1 flow-chart. From clinic letter and platelet function testing archives: 153 patients eligible for study; 7 not eligible when interviewed, 11 declined to participate, 56 did not respond to invitation; 78 patients recruited (70 by me, 8 by other clinicians). Cases identified during clinic visits: 38 eligible probands, 5 affected relatives, 9 unaffected relatives. Overall: 130 patients recruited (106 probands, 15 affected relatives, 9 unaffected relatives).]

Figure 3.1: Flow-chart illustrating recruitment of patients at Hammersmith Hospital.

3.2 Collection of data

Clinical and laboratory characteristics of all locally recruited patients were recorded both on a local database and on the central BRIDGE database, as described in Chapter 2 Methods, section 2.1.5.


3.3 Description of the study collection

The majority of the 130 patients recruited from Hammersmith Hospital were female (n=104, 80%). Early analysis of the BRIDGE-BPD cohort in 2014 found that 63% of all enrolled patients were female128. Fig. 3.2 shows the bleeding and platelet phenotypes of the 1602 probands enrolled to the BRIDGE-BPD study alongside those of the 106 probands recruited from Hammersmith Hospital. In both the BRIDGE-BPD and Hammersmith cohorts, more than three-quarters of cases were enrolled with abnormal bleeding symptoms (1245 of 1602 probands (78%) and 90 of 106 probands (85%) respectively). In locally recruited cases, the label abnormal bleeding was applied to cases with a bleeding score at enrolment of at least 4 and/or any bleeding that was judged clinically to be excessive and unexplained. In cases enrolled at other centres, abnormal bleeding was defined by the recruiting clinician according to their individual practice and coded as the abnormal bleeding HPO term (see section 3.4). Abnormal platelet function was recorded in 49% (n=789) of probands in BRIDGE-BPD and 65% (n=69) of Hammersmith cases. In locally recruited cases, the label abnormal platelet function was applied to cases with any of the following:
• an abnormal platelet light-transmission aggregometry (LTA) response to any routinely used agonist;
• an otherwise unexplained prolonged PFA-100 closure time;
• reduced platelet granule content measured by lumi-aggregometry, nucleotide assay or electron microscopy.
Abnormal platelet function in other recruiting centres was defined according to local practice.


Figure 3.2: Phenotypes of probands recruited to the BRIDGE-BPD study. The bar chart shows, for the entire BRIDGE-BPD collection (total n=1602, in blue) and for the probands recruited at Hammersmith Hospital (HH) (total n=106, in red), the percentage of probands enrolled in each of the four phenotype categories. These are abnormal bleeding, abnormal platelet function, thrombocytopenia and thrombocytosis, as defined in the text. Patients can be in more than one category (for example, thrombocytopenia and abnormal bleeding), so the cumulative percentages total >100%. (TCP = thrombocytopenia). Data are correct for all patients for whom both sequencing and phenotype data were available as of January 2018.

Both cohorts had similar proportions of cases enrolled with thrombocytopenia: n=486 (30%) in BRIDGE-BPD and n=25 (24%) from Hammersmith. Thrombocytopenia was defined according to the normal ranges at individual recruiting centres. Of the 25 Hammersmith cases enrolled with thrombocytopenia, 16 (64%) had a platelet count <100 × 10⁹/L. Eighteen (72%) had large platelets, detected by either automated measurement of mean platelet volume or light microscopy. Similar proportions were seen among thrombocytopenic cases in the BRIDGE-BPD cohort: 287 of 486 cases (59%) had a platelet count <100 × 10⁹/L and 283 of 486 (58%) had macrothrombocytopenia (fig 3.3 A&B). Seven percent of cases in the BRIDGE-BPD cohort were enrolled with thrombocytosis; no thrombocytosis cases were enrolled from Hammersmith (fig 3.2).


Figure 3.3: Comparison of thrombocytopenic and bleeding cases between the locally enrolled BPD cases and the entire BRIDGE-BPD cohort. (A) Cases with thrombocytopenia enrolled at Hammersmith Hospital (top) and across BRIDGE-BPD (bottom) are shown in dark blue. The proportion of thrombocytopenic cases with a platelet count <100 × 10⁹/L is shown in light blue. n = total number of cases with thrombocytopenia (platelet count <150 × 10⁹/L). (B) The proportion of thrombocytopenic cases with large platelets is shown in green. (C) Cases enrolled with abnormal bleeding (as defined in the text). The proportion with no identified platelet or coagulation abnormality (undefined bleeding disorder) is marked in pale purple. HH = Hammersmith Hospital; BRIDGE = BRIDGE-BPD cohort.

Of the 90 Hammersmith patients with abnormal bleeding, 16% (n=14) had no platelet or coagulation abnormality detected on routine laboratory testing. They had normal platelet counts and morphology, normal coagulation parameters and no abnormal platelet function (as defined above). This subgroup is henceforth referred to as having an undefined bleeding disorder (UBD) (fig 3.3 C). A similar proportion was identified across the BRIDGE-BPD cohort: 17% (n=212) of the 1245 probands with abnormal bleeding were recorded as having no platelet or coagulation abnormality (fig 3.3 C).
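The local labelling rules described above can be expressed as simple predicates. The Python sketch below is a hypothetical illustration, not software used in the study. The `LocalCase` fields are invented names, and each boolean stands in for a clinical or laboratory judgement described in the text. Cases from other centres were labelled according to local practice rather than by these rules.

```python
from dataclasses import dataclass


@dataclass
class LocalCase:
    """Hypothetical summary of one locally recruited case's routine work-up."""
    bleeding_score: int
    excessive_unexplained_bleeding: bool  # clinical judgement of the recruiting clinician
    abnormal_lta: bool                    # abnormal LTA response to any routine agonist
    unexplained_prolonged_pfa100: bool    # otherwise unexplained prolonged PFA-100 closure time
    reduced_granule_content: bool         # lumi-aggregometry, nucleotide assay or EM
    abnormal_platelet_count: bool
    abnormal_platelet_morphology: bool
    abnormal_coagulation: bool


def abnormal_bleeding(case: LocalCase) -> bool:
    """Bleeding score of at least 4 and/or clinically excessive, unexplained bleeding."""
    return case.bleeding_score >= 4 or case.excessive_unexplained_bleeding


def abnormal_platelet_function(case: LocalCase) -> bool:
    """Any one of the three local platelet function criteria."""
    return (case.abnormal_lta
            or case.unexplained_prolonged_pfa100
            or case.reduced_granule_content)


def undefined_bleeding_disorder(case: LocalCase) -> bool:
    """Abnormal bleeding with no platelet or coagulation abnormality on routine testing."""
    return abnormal_bleeding(case) and not (
        case.abnormal_platelet_count
        or case.abnormal_platelet_morphology
        or abnormal_platelet_function(case)
        or case.abnormal_coagulation
    )


# Example: a bleeding score of 6 with an entirely normal routine work-up.
example = LocalCase(6, False, False, False, False, False, False, False)
print(undefined_bleeding_disorder(example))
```

Writing the rule this way makes explicit that UBD is defined by exclusion: a single abnormal routine test removes a case from the category.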


3.4 Application of the Human Phenotype Ontology

Thanks to rapid advances in sequencing technology, the greatest challenge in large-scale sequencing projects is no longer the sequencing itself but the interpretation of huge amounts of sequence data. For the BRIDGE-BPD study, correct interpretation of sequence variation first required systematic organisation of the clinical and laboratory data collected from patients across 45 recruiting centres in 9 countries. To make this large, heterogeneous dataset amenable to computation, we used the Human Phenotype Ontology (HPO)129 to annotate the phenotype of each patient enrolled into the BRIDGE-BPD study.

3.4.1 Introduction to the Human Phenotype Ontology (HPO)

An ontology is defined as ‘A set of concepts and categories in a subject area or domain that shows their properties and the relations between them’130. The HPO was first developed in 2007 131 with the aim of enabling large-scale computational analysis of all phenotypes encountered in human disease, i.e. a computational tool for the human phenome132. It was originally built by mining the text of OMIM for descriptions of all disease types, and contained 8,000 phenotype terms in its first publication131. It is manually curated and updated by clinical experts. The current data release contains nearly 12,000 HPO terms with more than 120,000 annotations to rare diseases. HPO is released under an open licence. Since its introduction it has been adopted by Orphanet, the online reference for rare diseases133; by the Monarch Initiative, which aims to integrate genotype and phenotype data across species134; by DECIPHER135,136; and by gene discovery projects across a wide range of diseases104. Each HPO term describes a clinical abnormality, and the terms are grouped into 4 sub-ontologies: Phenotypic abnormality, Clinical modifier, Mortality/aging and Mode of inheritance. Phenotypic abnormality is the main sub-ontology and contains the majority of the terms. These are grouped into 23 main categories or parent terms (referred to as ‘leading classes’), which are listed in table 3.2. HPO terms are organised in a structured hierarchy: each term is a subclass of its parent term(s), and terms become increasingly specific as one moves down any path through the hierarchy (Fig. 3.4). All terms are in “is-a” relationships with their parent terms129. The HPO terms for a particular patient or group of patients can be illustrated as a directed acyclic graph (DAG). Fig 3.4 shows the example of a patient with Wiskott-Aldrich syndrome: thrombocytopenia is-a abnormality of platelet number, which is-a abnormality of thrombocytes, which is-a abnormality of blood and blood-forming tissues. Each term has a unique identifier (e.g. HP:0011877 for increased mean platelet volume; HP:0004846 for prolonged bleeding after surgery).


Table 3.2: The 23 leading classes which make up the phenotypic abnormality sub-ontology of the HPO

Phenotypic abnormality
• Abnormality of the genitourinary system
• Abnormality of head or neck
• Abnormality of the eye
• Abnormality of the ear
• Abnormality of the nervous system
• Abnormality of the breast
• Abnormality of the endocrine system
• Abnormality of the skeletal system
• Abnormality of prenatal development or birth
• Growth abnormality
• Abnormality of the integument
• Abnormality of the voice
• Abnormality of the cardiovascular system
• Abnormality of blood and blood-forming tissues
• Abnormality of metabolism/homeostasis
• Abnormality of the respiratory system
• Neoplasm
• Abnormality of the immune system
• Abnormality of the musculature
• Abnormality of connective tissue
• Abnormality of the digestive system
• Abnormality of limbs
• Abnormality of the thoracic cavity


Figure 3.4: Directed acyclic graph (DAG) representation of HPO terms. (A) DAG showing the hierarchical is-a relationships between the child term thrombocytopenia and its successive parent terms. (B) DAG representation of HPO terms in a patient with Wiskott-Aldrich syndrome.

When all the phenotypes are recorded for a particular patient or group of patients, a more complex directed acyclic graph representing all the phenotypes can be produced. The DAG in figure 3.5 illustrates the phenotypes recorded for a group of 7 unrelated patients found to have MYH9-related disorder. This DAG also illustrates two other attractive features of the HPO. Firstly, a child term can be related to more than one parent term (for example, epistaxis is both an abnormality of the nose and abnormal bleeding). Secondly, child terms are related not just to their immediate parent but to every ancestor in the hierarchy. For example, epistaxis is an abnormality of the nose but is also an abnormality of the head and neck. This is clearly illustrated in DAG format (fig 3.5).
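The way a term inherits every ancestor, including through more than one parent, can be sketched in Python as a small graph traversal. This is a hypothetical, hand-written fragment built only from the relationships named in this section. It uses term labels instead of HP identifiers and omits the intermediate terms of the real ontology, which a real analysis would load from the released HPO files. The `leading_classes` helper assumes, as described above, that leading classes are the immediate children of ‘Phenotypic abnormality’.

```python
from collections import deque

# Hypothetical, hand-written fragment of the HPO is-a hierarchy: each term maps to
# its immediate parent(s). Intermediate terms of the real ontology are omitted.
PARENTS: dict[str, list[str]] = {
    "Thrombocytopenia": ["Abnormality of platelet number"],
    "Abnormality of platelet number": ["Abnormality of thrombocytes"],
    "Abnormality of thrombocytes": ["Abnormality of blood and blood-forming tissues"],
    "Epistaxis": ["Abnormality of the nose", "Abnormal bleeding"],  # two parents
    "Abnormality of the nose": ["Abnormality of head or neck"],
    "Abnormal bleeding": ["Abnormality of blood and blood-forming tissues"],
    "Abnormality of head or neck": ["Phenotypic abnormality"],
    "Abnormality of blood and blood-forming tissues": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}


def ancestors(term: str) -> set[str]:
    """All terms reachable from `term` by following is-a links upwards (breadth-first)."""
    seen: set[str] = set()
    queue = deque(PARENTS.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(PARENTS.get(parent, []))
    return seen


def implied_terms(annotated: set[str]) -> set[str]:
    """Annotated terms plus every ancestor: the nodes of the patient's DAG."""
    closure = set(annotated)
    for term in annotated:
        closure |= ancestors(term)
    return closure


def leading_classes(annotated: set[str]) -> set[str]:
    """Leading classes reached by the annotations (children of 'Phenotypic abnormality')."""
    return {t for t in implied_terms(annotated)
            if "Phenotypic abnormality" in PARENTS.get(t, [])}


print(sorted(leading_classes({"Epistaxis", "Thrombocytopenia"})))
```

Because the closure follows every parent, a single annotation such as epistaxis reaches two leading classes. This property allows one phenotype to be counted under more than one organ system.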


Figure 3.5: Directed acyclic graph representation of HPO terms in 7 patients with MYH9-related disorder


3.4.2 Rationale for using HPO

HPO was selected to record phenotypes in BRIDGE in preference to established disease coding systems such as ICD-10 137 and SNOMED-CT138, for several reasons. Firstly, we considered HPO to be better suited to describing a range of phenotypes in cases who do not have a specific diagnosis. ICD-10 is well-established but is a post-diagnosis classification system, and is less well suited to describing phenotypes in patients who have not yet received a diagnosis. SNOMED-CT is likewise better suited to recording specific diseases than to recording clinical and laboratory phenotypes across multiple organs. HPO has a greater range of terminology, and therefore greater specificity, than either of these two systems. Online Mendelian Inheritance in Man (OMIM139) is another open-access reference with comprehensive descriptions of rare diseases. However, its vocabulary is neither controlled nor consistent and the relationships between phenotypes are not well defined. Secondly, HPO is specifically designed to describe the clinical phenotypes of human rare diseases. It is designed to be intuitive for the clinical community and is interpretable by both humans and computers. It is not a fixed system and can be modified and optimised as required for specific disease areas. Since its inception, HPO has been validated in a variety of rare disease gene discovery projects104 and used to develop computational algorithms140,141. It has also proved useful in the prioritisation of candidate genes142,143, including the comparison of phenotypes across species144. As far as the BRIDGE-BPD team were aware, no alternative phenotyping system fulfilled all these criteria.

3.4.3 Development of HPO terminology for BPD

When we began using HPO for the BRIDGE-BPD project, the ontology had been developed relatively recently, and the set of terms describing disorders of the blood was not as comprehensive as for some other organ systems. For example, we were unable to accurately annotate platelet function disorders, and there were insufficient terms to record platelet granule abnormalities. I worked with Dr Anne Kelly, Dr Tadbir Bariana and Prof Kathleen Freson to develop the areas of HPO version 887 that were insufficient for the needs of the project. This was mostly achieved by identifying terms needed to code the phenotypes of patients we had already recruited that were not yet present in the HPO. As a result of this analysis, 78 HPO terms were added to the ontology (table 3.3). The majority of these terms were members of the ‘abnormality of blood and blood-forming tissues’ leading class. Terms in other leading classes include vitamin K deficiency (in the ‘abnormality of metabolism/homeostasis’ leading class). They also include 4 terms in the ‘abnormality of limbs’ leading class which can be associated with specific platelet disorders: unilateral radial aplasia, flattened metacarpal heads, shortening of all phalanges of fingers and abnormality of metacarpophalangeal joint. Two terms, embryonal onset and fetal onset, are not phenotypic abnormalities and were added to a different set of HPO terms labelled clinical modifiers. One entirely new super class was added: abnormality of bone marrow cell morphology. The majority of new terms fell under either the abnormality of thrombocytes or the abnormal bleeding super class, reflecting the characteristics of the study collection and the expertise of the group.

Table 3.3: New terms added to the Human Phenotype Ontology version 887. (BBFT = Abnormality of blood and blood-forming tissues; super class = immediate child term of the leading class term; * = terms which are members of more than one leading class)

| New HPO ID | New HPO term | HPO leading class | HPO super class |
|---|---|---|---|
| HP:0011880 | Acute disseminated intravascular coagulation | BBFT | abnormal thrombosis |
| HP:0002584 | Intestinal bleeding* | BBFT | abnormality of bleeding |
| HP:0003010 | Prolonged bleeding time | BBFT | abnormality of bleeding |
| HP:0007420 | Spontaneous hematomas* | BBFT | abnormality of bleeding |
| HP:0007902 | Vitreous haemorrhage* | BBFT | abnormality of bleeding |
| HP:0011884 | Abnormal umbilical stump bleeding | BBFT | abnormality of bleeding |
| HP:0011885 | Haemorrhage of the eye* | BBFT | abnormality of bleeding |
| HP:0011886 | Hyphema | BBFT | abnormality of bleeding |
| HP:0011887 | Choroid haemorrhage* | BBFT | abnormality of bleeding |
| HP:0011888 | Bleeding requiring red cell transfusion | BBFT | abnormality of bleeding |
| HP:0011889 | Bleeding with no or minor trauma | BBFT | abnormality of bleeding |
| HP:0011890 | Prolonged bleeding following procedure | BBFT | abnormality of bleeding |
| HP:0011891 | Post-partum hemorrhage | BBFT | abnormality of bleeding |
| HP:0011896 | Subconjunctival haemorrhage* | BBFT | abnormality of bleeding |
| HP:0005528 | Bone marrow hypocellularity | BBFT | abnormality of bone marrow cell morphology |
| HP:0012130 | Abnormality of cells of the erythroid lineage | BBFT | abnormality of bone marrow cell morphology |
| HP:0012135 | Abnormality of cells of the granulocytic lineage | BBFT | abnormality of bone marrow cell morphology |
| HP:0100827 | Lymphocytosis* | BBFT | abnormality of bone marrow cell morphology |
| HP:0011898 | Abnormality of circulating fibrinogen | BBFT | abnormality of coagulation |
| HP:0011899 | Hyperfibrinogenemia | BBFT | abnormality of coagulation |
| HP:0011900 | Hypofibrinogenemia | BBFT | abnormality of coagulation |
| HP:0011901 | Dysfibrinogenemia | BBFT | abnormality of coagulation |
| HP:0012146 | Abnormality of von Willebrand factor | BBFT | abnormality of coagulation |
| HP:0012147 | Reduced quantity of von Willebrand factor | BBFT | abnormality of coagulation |
| HP:0005527 | Reduced kininogen activity* | BBFT | abnormality of coagulation cascade |
| HP:0011895 | Anemia due to reduced life span of red cells | BBFT | abnormality of erythrocytes |
| HP:0011902 | Abnormal hemoglobin | BBFT | abnormality of erythrocytes |
| HP:0011903 | Hemoglobin H | BBFT | abnormality of erythrocytes |
| HP:0011904 | Persistence of haemoglobin F | BBFT | abnormality of erythrocytes |
| HP:0011905 | Reduced haemoglobin A | BBFT | abnormality of erythrocytes |
| HP:0011906 | Reduced beta/alpha synthesis ratio | BBFT | abnormality of erythrocytes |
| HP:0011907 | Reduced alpha/beta synthesis ratio | BBFT | abnormality of erythrocytes |
| HP:0012131 | Abnormal number of erythroid precursors | BBFT | abnormality of erythrocytes |
| HP:0012132 | Erythroid hyperplasia | BBFT | abnormality of erythrocytes |
| HP:0012133 | Erythroid hypoplasia | BBFT | abnormality of erythrocytes |
| HP:0011893 | Abnormal leukocyte count* | BBFT | abnormality of leukocytes |
| HP:0011897 | Neutrophilia* | BBFT | abnormality of leukocytes |
| HP:0001973 | Autoimmune thrombocytopenia* | BBFT | abnormality of thrombocytes |
| HP:0003540 | Impaired platelet aggregation | BBFT | abnormality of thrombocytes |
| HP:0004813 | Post-transfusion thrombocytopenia | BBFT | abnormality of thrombocytes |
| HP:0008148 | Impaired epinephrine-induced platelet aggregation | BBFT | abnormality of thrombocytes |
| HP:0008320 | Impaired collagen-induced platelet aggregation | BBFT | abnormality of thrombocytes |
| HP:0011869 | Abnormal platelet function | BBFT | abnormality of thrombocytes |
| HP:0011870 | Impaired arachidonic acid-induced platelet aggregation | BBFT | abnormality of thrombocytes |
| HP:0011871 | Impaired ristocetin-induced platelet aggregation | BBFT | abnormality of thrombocytes |
| HP:0011872 | Impaired thrombin-induced platelet aggregation | BBFT | abnormality of thrombocytes |
| HP:0011873 | Abnormal platelet count | BBFT | abnormality of thrombocytes |
| HP:0011874 | Heparin-induced thrombocytopenia | BBFT | abnormality of thrombocytes |
| HP:0011875 | Abnormal platelet morphology | BBFT | abnormality of thrombocytes |
| HP:0011876 | Abnormal platelet volume | BBFT | abnormality of thrombocytes |
| HP:0011877 | Increased mean platelet volume | BBFT | abnormality of thrombocytes |
| HP:0011878 | Abnormal platelet membrane protein expression | BBFT | abnormality of thrombocytes |
| HP:0011879 | Decreased platelet glycoprotein Ib-IX-V | BBFT | abnormality of thrombocytes |
| HP:0011881 | Decreased platelet glycoprotein VI | BBFT | abnormality of thrombocytes |
| HP:0011882 | Decreased platelet P2Y12 receptor | BBFT | abnormality of thrombocytes |
| HP:0011883 | Abnormal platelet granules | BBFT | abnormality of thrombocytes |
| HP:0011894 | Impaired thromboxane A2 agonist-induced platelet aggregation | BBFT | abnormality of thrombocytes |
| HP:0012483 | Abnormal alpha granules | BBFT | abnormality of thrombocytes |
| HP:0012484 | Abnormal dense granules | BBFT | abnormality of thrombocytes |
| HP:0012491 | Abnormal dense tubular system | BBFT | abnormality of thrombocytes |
| HP:0012524 | Abnormal platelet shape | BBFT | abnormality of thrombocytes |
| HP:0012525 | Abnormal alpha granule distribution | BBFT | abnormality of thrombocytes |
| HP:0012526 | Absence of alpha granules | BBFT | abnormality of thrombocytes |
| HP:0012527 | Abnormal alpha granule content | BBFT | abnormality of thrombocytes |
| HP:0012528 | Abnormal number of alpha granules | BBFT | abnormality of thrombocytes |
| HP:0012529 | Abnormal dense granule content | BBFT | abnormality of thrombocytes |
| HP:0012530 | Abnormal number of dense granules | BBFT | abnormality of thrombocytes |
| HP:0012148 | Multiple lineage myelodysplasia* | BBFT | hematological neoplasm |
| HP:0012149 | Bilineage myelodysplasia* | BBFT | hematological neoplasm |
| HP:0012150 | Single lineage myelodysplasia* | BBFT | hematological neoplasm |
| HP:0005561 | Abnormality of bone marrow cell morphology | BBFT | n/a |
| HP:0011460 | Embryonal onset | Clinical modifier | Onset |
| HP:0011461 | Fetal onset | Clinical modifier | Onset |
| HP:0011908 | Unilateral radial aplasia | Abnormality of limbs | abnormality of upper limb |
| HP:0011909 | Flattened metacarpal heads | Abnormality of limbs | abnormality of upper limb |
| HP:0011910 | Shortening of all phalanges of fingers | Abnormality of limbs | abnormality of upper limb |
| HP:0011911 | Abnormality of metacarpophalangeal joint | Abnormality of limbs | abnormality of upper limb |
| HP:0011892 | Vitamin K deficiency | Abnormality of metabolism/homeostasis | abnormality of vitamin metabolism |

Many HPO terms appear in more than one leading class, super class or subclass. This reflects the fact that the same phenotype can have more than one aetiology, and that phenotypes can be categorised either by their cause or by their anatomical location. For example, menorrhagia (HP:0000132) can be part of the abnormal bleeding super class, which is an abnormality of blood and blood-forming tissues. Alternatively, it can be an abnormality of the menstrual cycle (HP:0000140), which is in the abnormality of the genitourinary system (HP:0000078) leading class. Among the newly added terms, autoimmune thrombocytopenia (HP:0001973) can be categorised as a thrombocytopenia (HP:0001873), which is an abnormality of thrombocytes and of blood and blood-forming tissues. It can also be categorised as an autoimmunity (HP:0002960), which falls within the abnormality of the immune system leading class (HP:0002715). Overall, 14 of the newly added HPO terms are members of more than one leading class (table 3.3, starred). Figure 3.6 illustrates how frequently terms within the abnormality of blood and blood-forming tissues leading class also occur in other leading classes across the whole ontology. The system that most commonly overlaps with blood and blood-forming tissues is the immune system.


Figure 3.6: Distribution of HPO terms shared with the blood and blood-forming tissues leading class. Bar plot showing the total number of HPO terms in each leading class. The blue bars show the number of terms which are shared with the abnormality of blood and blood-forming tissues leading class. (Figure adapted from Westbury et al, Genome Medicine 2015145)

Development of HPO terms extended beyond the BRIDGE-BPD project. I worked with 5 clinical PhD fellows (Dr Suthesh Sivapalaratnam, Dr Sarah Westbury, Dr Tadbir Bariana, Dr Minke Vries and Dr Sol Schulman) to develop HPO profiles for 63 genes on the Thrombogenomics diagnostic gene panel78. The Thrombogenomics gene panel was set up in 2013, initially as a research tool and later, in 2015, as a clinical diagnostic service. It is a sequencing panel designed to target the exons, splice regions and some regulatory regions of the genes known to be associated with coagulation, platelet and thrombotic disorders. The tier-1 gene list is curated and approved by members of the International Society on Thrombosis and Haemostasis Scientific and Standardization Committee (ISTH-SSC). We developed an HPO profile for each tier-1 gene in the Thrombogenomics panel, comprising the HPO terms commonly, typically or pathognomonically associated with the disorders linked to that gene. For example, pathogenic variants in ITGA2B cause Glanzmann's thrombasthenia, and the HPO terms for this gene are abnormal bleeding, decreased platelet glycoprotein IIb-IIIa and impaired platelet aggregation. The HPO terms for MYH9-related disorder are abnormal bleeding, cataract, giant platelets, nephropathy, neutrophil inclusion bodies, sensorineural hearing impairment and thrombocytopenia. The full list of 63 genes and associated HPO terms is provided in Appendix 3 (p.297), as published in Simeoni et al, 2016146. The HPO annotations for each gene are used in the Thrombogenomics multidisciplinary meetings. At these meetings, sequencing data from patient samples are assessed and a pathogenicity status is assigned to variants before reporting. The success of this approach is shown in section 3.4.6.4.

3.4.4 Automated HPO suggestions

Comparison of the categorical laboratory results with the HPO phenotypes coded by recruiting clinicians showed that the two did not always agree. For example, a patient may have had a low haemoglobin level recorded but no anaemia HPO term. This could result from an error in transcribing either the laboratory result or the HPO phenotype. Alternatively, if the anaemia was caused by an acquired disorder or was transient, it may have been deemed not pertinent to the inherited platelet disorder and therefore deliberately not coded by the recruiting clinician. To improve the accuracy of HPO phenotyping and account for these issues, an automated system was set up. Clinicians were prompted with ‘suggested’ HPO terms whenever a laboratory parameter fell outside a pre-set reference range. I provided clinical input to establish appropriate reference ranges for all laboratory parameters. For the full blood count (FBC), the reference range for each parameter was defined by the 2.5th and 97.5th percentiles from the UK Biobank147. Automated suggestions also appeared if abnormal bleeding symptoms were documented or if other laboratory test results had been marked as ‘abnormal’. All automated HPO suggestions had to be manually confirmed by the recruiting clinician.
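The suggestion rule can be sketched in a few lines of Python, assuming a reference range is taken as the 2.5th and 97.5th percentiles of a reference population, as for the FBC ranges above. This is an illustration rather than the study's software. The parameter names, the mapping from an out-of-range parameter to an HPO term label and the toy reference values are all hypothetical. The function only returns suggestions, mirroring the requirement that the recruiting clinician confirm each one.

```python
import statistics

# Hypothetical mapping from an out-of-range parameter to a suggested HPO term label.
SUGGESTED_TERM = {
    ("platelet_count", "low"): "Thrombocytopenia",
    ("platelet_count", "high"): "Thrombocytosis",
    ("haemoglobin", "low"): "Anemia",
    ("mean_platelet_volume", "high"): "Increased mean platelet volume",
}


def reference_range(values: list[float]) -> tuple[float, float]:
    """Return the 2.5th and 97.5th percentiles of a reference population."""
    # n=40 yields 39 cut points at 2.5%, 5.0%, ..., 97.5%.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def suggest_terms(results: dict[str, float],
                  ranges: dict[str, tuple[float, float]]) -> list[str]:
    """List HPO terms to suggest; a clinician must still confirm each one."""
    suggestions = []
    for parameter, value in results.items():
        if parameter not in ranges:
            continue  # no reference range defined for this parameter
        low, high = ranges[parameter]
        if value < low:
            direction = "low"
        elif value > high:
            direction = "high"
        else:
            continue  # within the reference range: nothing to suggest
        term = SUGGESTED_TERM.get((parameter, direction))
        if term is not None:
            suggestions.append(term)
    return suggestions


if __name__ == "__main__":
    # Toy reference population (not UK Biobank data), in units of 10^9/L.
    toy_platelet_counts = [float(x) for x in range(150, 451)]
    ranges = {"platelet_count": reference_range(toy_platelet_counts)}
    print(ranges)
    print(suggest_terms({"platelet_count": 90.0, "haemoglobin": 130.0}, ranges))
```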

3.4.5 HPO-based clustering

One of the driving forces behind the use of HPO to phenotype patients was to enable identification of phenotypically similar cases, which were hypothesised to share a genetic basis. An algorithm to identify clusters of patients with similar phenotypes was developed by Dr Ernest Turro, the chief bioinformatics analyst of the BRIDGE-BPD study, and Dr Daniel Greene, a computational statistics PhD fellow, both at the University of Cambridge. It was based on the following principles:
1. The information content of each HPO term is defined by the rarity of that term within the entire BRIDGE-BPD cohort.
2. The similarity between two cases is measured by the overall rarity of the terms which they share.
3. Clusters are compared with the distribution obtained from random samples of cases, to test whether a cluster reflects genuine HPO term similarity rather than chance.
This methodology was central to the analysis of BRIDGE data128.
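The three principles above can be illustrated with a short Python sketch. It is not the published BRIDGE-BPD implementation. Several simplifying assumptions are made:
• a term's information content (IC) is taken as -log of the fraction of cases annotated with it;
• pairwise similarity is the summed IC of shared terms;
• a group is scored by its mean pairwise similarity;
• significance is estimated from randomly drawn groups of the same size;
• case annotations are assumed to already include every ancestor term.
The toy cohort and term choices are hypothetical.

```python
import math
import random
from itertools import combinations

Cohort = dict[str, set[str]]  # case ID -> HPO terms (assumed to include all ancestors)


def information_content(cases: Cohort) -> dict[str, float]:
    """IC(term) = -log(fraction of cohort cases annotated with the term); rarer terms score higher."""
    counts: dict[str, int] = {}
    for terms in cases.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    n = len(cases)
    return {term: -math.log(c / n) for term, c in counts.items()}


def similarity(a: set[str], b: set[str], ic: dict[str, float]) -> float:
    """Similarity of two cases: summed IC of the terms they share."""
    return sum(ic[term] for term in a & b)


def group_score(members: list[str], cases: Cohort, ic: dict[str, float]) -> float:
    """Mean pairwise similarity within a group of at least two cases."""
    pairs = list(combinations(members, 2))
    return sum(similarity(cases[x], cases[y], ic) for x, y in pairs) / len(pairs)


def clustering_p_value(members: list[str], cases: Cohort, ic: dict[str, float],
                       n_samples: int = 10_000, seed: int = 0) -> float:
    """Empirical p-value: how often a random group of the same size scores at least as high."""
    rng = random.Random(seed)
    observed = group_score(members, cases, ic)
    ids = list(cases)
    hits = sum(
        group_score(rng.sample(ids, len(members)), cases, ic) >= observed
        for _ in range(n_samples)
    )
    return (hits + 1) / (n_samples + 1)


# Toy cohort: 20 cases with only a common term, and 3 cases sharing rarer terms.
cohort: Cohort = {f"c{i}": {"Abnormal bleeding"} for i in range(20)}
for i in range(3):
    cohort[f"m{i}"] = {"Thrombocytopenia", "Giant platelets",
                       "Sensorineural hearing impairment"}
ic = information_content(cohort)
print(clustering_p_value(["m0", "m1", "m2"], cohort, ic))
```

Weighting shared terms by rarity means that three cases sharing uncommon features score far above groups that share only a common term such as abnormal bleeding.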

3.4.6 The utility of HPO phenotyping in patients with bleeding and platelet disorders

3.4.6.1 HPO demonstrates the phenotypic complexity of BPDs

An early observation made when using the HPO was that patients enrolled into the bleeding and platelet disorders study frequently also had phenotypes in other organ systems. The median number of HPO terms annotated per case was 6 in both the BRIDGE-BPD cohort (range 1-26) and the locally enrolled cases (range 1-15) (fig 3.7).


Figure 3.7: Human Phenotype Ontology terms in BRIDGE-BPD cases. (A) Bar chart showing the number of HPO terms annotated per case in the BRIDGE-BPD study. (B) Bar chart showing the number of HPO terms annotated per case enrolled from Hammersmith Hospital. (Figure adapted from Westbury et al, 2015128)

Sixty percent of enrolled cases were annotated with at least one HPO term belonging to a leading class other than blood and blood-forming tissues128. This highlights the breadth of phenotypes seen in patients with bleeding and platelet disorders. It also supports the hypothesis that phenotypes outside the blood may aid clustering. Outside the blood and blood-forming tissues leading class, the most frequently annotated HPO terms belong to the nervous system (16.2% of patients) and immune system (11.7% of patients) leading classes (fig 3.8).


Figure 3.8: BRIDGE-BPD cases grouped by HPO phenotypes outside of the blood. The colours in the heatmap show the proportion of BRIDGE-BPD cases in each non-BPD phenotype group (horizontal axis) who have the bleeding or platelet abnormality, categorised into 5 sub-classes (vertical axis). The percentages indicate the proportion of the entire BPD cohort with a phenotype in that category. Figure courtesy of Dr Daniel Greene, University of Cambridge.

3.4.6.2 HPO identifies clusters of cases who have similar phenotypes

To validate the clustering methodology, the algorithm was first applied to cases known to share a genetic background (patients within the same family) or to have similar phenotypes (patients enrolled with syndromic disorders). Significant clustering was shown in 40 of the 50 pedigrees analysed (p<0.05). Eight did not reach statistical significance and two clearly failed to cluster (p=0.33, p=0.42) (fig 3.9). Some cases had been enrolled into BRIDGE-BPD with a putative diagnosis of a known syndrome for which a genetic basis had not yet been identified. These included cases with phenotypes suggestive of Hermansky-Pudlak and Wiskott-Aldrich syndromes in whom a causative mutation had not been identified. They also included cases with features typical of Roifman syndrome, Gorham-Stout syndrome and pseudohypoparathyroidism type 1b, for which no gene was known at the time of enrolment. Cases within all 5 of these syndromic groups clustered appropriately and significantly (p<0.05) (fig 3.9).


Figure 3.9: HPO-based clustering of cases within pedigrees and syndromic cases. This heatmap illustrates the similarity between cases within the same pedigree, cases enrolled with typical features of one of five named syndromes and cases with MYH9 pathogenic variants. Cases are positioned in the same order along the horizontal axis from left to right and along the vertical axis from top to bottom (indicated by colour on each axis). Each individual is assigned a row and a column, and the visible grey lines separating rows and columns mark the boundaries between pedigrees and syndromes. Each square is colour-coded to reflect the similarity between the case on the x-axis and the case on the y-axis at the position where they meet. The phenotypic similarity score is highest along the diagonal, indicated by the green-yellow squares. The p-values of the clustering within each pedigree and syndrome are shown in the scatterplot to the left of the heatmap. Figure courtesy of Dr Ernest Turro and Dr Daniel Greene, University of Cambridge.

3.4.6.3 HPO-based similarity clustering distinguished cases with MYH9-related disorder from cases with non-pathogenic variants in MYH9

Early analysis of the initial 716 whole exome sequenced cases revealed 13 unrelated index cases carrying 9 distinct variants in the MYH9 gene128. Seven of these cases clustered significantly by HPO phenotypic similarity (p=0.005) (fig 3.9); the remaining six cases did not demonstrate significant clustering (p=0.684). The variants were then individually assessed for pathogenicity. The seven cases harbouring likely pathogenic variants were the same cases that had clustered significantly. The six cases that did not cluster had MYH9 variants which did not meet the criteria for pathogenicity and were labelled variants of uncertain significance. This analysis showed that the clustering correctly identified a group of cases with MYH9-related disorder that had not been suspected clinically. It further suggested that computerised clustering based on HPO phenotyping could help identify pathogenic variants.

3.4.6.4 HPO phenotyping helps identify pathogenic variants in known or suspected disorders

The HPO terms annotated to each gene on the Thrombogenomics gene panel were used to prioritise candidate variants in tier-1 genes discussed at the multidisciplinary team meeting (MDT). When a case submitted for targeted sequencing on the Thrombogenomics gene panel is discussed at the MDT, variants are usually detected in more than one gene on the panel. To aid selection of the correct pathogenic variant, the HPO phenotypes of the patient in question are compared with the HPO phenotype profiles assigned to each gene. The gene whose profile is most similar to the patient's HPO profile is then prioritised as the candidate. In a validation exercise of 109 cases, this approach selected the correct variant in 85% of cases78 (fig 3.10). The HPO gene profiles are now used routinely in the Thrombogenomics MDT process to facilitate gene prioritisation. The MDT formally assigns pathogenicity to all candidate variants identified both on the Thrombogenomics sequencing panel and on the BRIDGE-BPD whole genome sequencing platform.
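The prioritisation step can be sketched in Python using the two gene profiles quoted in section 3.4.3. The similarity measure used by the Thrombogenomics pipeline is not specified here. The Jaccard index below is a deliberately simple stand-in that ignores the ontology hierarchy and term rarity. The patient and the function names are hypothetical.

```python
# Gene HPO profiles quoted in the text (term labels only, no ontology structure).
GENE_PROFILES: dict[str, set[str]] = {
    "ITGA2B": {
        "Abnormal bleeding",
        "Decreased platelet glycoprotein IIb-IIIa",
        "Impaired platelet aggregation",
    },
    "MYH9": {
        "Abnormal bleeding", "Cataract", "Giant platelets", "Nephropathy",
        "Neutrophil inclusion bodies", "Sensorineural hearing impairment",
        "Thrombocytopenia",
    },
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared terms as a fraction of all terms in either profile (a stand-in similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidate_genes(patient_terms: set[str], candidate_genes: list[str],
                         profiles: dict[str, set[str]] = GENE_PROFILES
                         ) -> list[tuple[str, float]]:
    """Order genes carrying a candidate variant by similarity of their profile to the patient's."""
    scored = [(gene, jaccard(patient_terms, profiles[gene]))
              for gene in candidate_genes if gene in profiles]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical patient with candidate variants in both genes.
patient = {"Abnormal bleeding", "Thrombocytopenia", "Giant platelets",
           "Sensorineural hearing impairment"}
print(rank_candidate_genes(patient, ["ITGA2B", "MYH9"]))
```

Here MYH9 shares four of its seven profile terms with the patient, whereas ITGA2B shares only abnormal bleeding, so MYH9 is ranked first. An ontology-aware, rarity-weighted measure would be needed to handle patients annotated with more specific or more general terms than those in a gene's profile.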


Figure 3.10: HPO-based gene prioritisation. Graph showing the similarity between the patient phenotype profile and the profiles of the genes in which the patient had a candidate variant. The variant that was deemed to be pathogenic in each case is highlighted in red. In 85% of cases, the pathogenic variant was in the gene which scored the highest similarity to the case phenotype out of all the candidate variants (Figure reproduced from Simeoni et al, Blood 201678)

3.5 Discussion

This chapter has described the recruitment and phenotyping of more than 2000 patients and their relatives with rare bleeding and platelet disorders to the BRIDGE-BPD and NIHR Bioresource Rare Disease studies. This is the largest collection of patients of this kind reported to date.

I led the coordination of the BRIDGE-BPD study and the recruitment of patients locally at the Hammersmith Hospital Haemophilia Centre. Of the patients initially identified as eligible, 52% were recruited to the study. All eligible patients were sent a letter of invitation with an accompanying information leaflet, followed by a phone call. Only 2 patients responded to the letter of invitation, whereas the majority were willing to participate when contacted by phone. Only 11 patients declined to participate in the study, most for health or geographical reasons. The main barrier to recruitment was the lack of up-to-date contact information for patients who were no longer under regular follow-up at the haemophilia centre. The transient population of London and the often mild nature of platelet disorders mean that many patients do not see a haematologist regularly. Five patients declined to participate after being interviewed. Some had concerns about the implications of DNA sequencing for themselves and their family; others did not have time to return for blood samples to be taken. The patients recruited from the Hammersmith Hospital Haemophilia Centre were broadly representative of the whole BRIDGE-BPD collection. As in other BPD cohorts148,149, the majority of patients in both collections were female. This is likely attributable to the more apparent bleeding symptoms associated with menstruation and childbirth. The very high proportion of females in the Hammersmith cohort may be partly explained by the local association with Queen Charlotte's and Chelsea maternity hospital. Several of the BPD patients recruited at Hammersmith were first diagnosed with their BPD during pregnancy at the joint obstetric-haematology clinic. In contrast to the 7% of patients in the entire cohort who were enrolled with thrombocytosis, no patients were recruited in this category locally. This is probably because patients with thrombocytosis may be seen in another haematology clinic at Hammersmith Hospital, and the local recruiting focus was on thrombocytopenias and bleeding disorders.

As expected from the enrolment criteria of the study, abnormal bleeding was the most frequently recorded phenotype. Interestingly, 17% of cases with bleeding had normal platelet number, morphology and function, and their bleeding was not thought to be due to a coagulation defect. This is likely to reflect two major factors. The first is the complex and often multifactorial nature of haemostasis and bleeding. The second is the limitations of our understanding of haemostatic pathways and of the current diagnostic work-up for bleeding disorders. Some patients have significant bleeding and yet have ‘normal’ laboratory results. This suggests either that some aspects of bleeding, such as vessel wall abnormalities, cannot be routinely tested for, or that our platelet testing has limited sensitivity. Whole genome sequencing may help to identify more of the genetic determinants of bleeding. However, this sub-group is highly heterogeneous. The presence of both genetic and acquired modifiers of bleeding, such as age and non-haematological co-morbidities, makes a molecular diagnosis particularly challenging in these patients.

Comprehensive phenotyping using the Human Phenotype Ontology has been central to all analysis in the BRIDGE-BPD study. Extensive development of the HPO terms to meet the needs of our project meant we could use the HPO with a high degree of sensitivity and specificity and convert heterogeneous patient phenotypes into a set of standardised, computable terms. We were able to show that patients with bleeding or platelet disorders frequently have pathologies in other organ systems. This is an important consideration when assessing patients and their families in clinic. In this collection, neurological and immune phenotypes were most highly represented. This may reflect some selection bias in certain centres with particular research interests – for example, one collaborator has a large collection of patients with autistic spectrum disorder co-existing with platelet function abnormalities – but may also reflect the emerging knowledge that many genes associated with BPD are expressed in a variety of tissues. For example, NBEAL2 and LYST (two BEACH-domain containing genes affecting granule biogenesis and trafficking in platelets) are expressed in other blood cell types and also outside the blood, reflected in the immune and neurological phenotypes seen in patients with gray-platelet syndrome and Chediak-Higashi syndrome respectively150. The validation exercises performed on the HPO-based clustering methodology showed proof-of-principle that this method can cluster patients with similar phenotypes together (syndromes) and patients who were known to have similar genetics (pedigrees). Not all of the cases within families clustered significantly more than would be expected by chance and two families surprisingly failed to cluster at all. 
In one family, the subjects were recruited at different centres. One was phenotyped in great detail and the other with only one or two HPO terms, so the sparsely phenotyped case was calculated to be more similar to cases outside the pedigree than to its own relatives. The other family that failed to cluster contained cases recruited at different ages. The much older relative had experienced many more haemostatic challenges and pathologies than the relative who was still a young child. The example of the patients with MYH9-related disorder also demonstrates the utility of the HPO clustering algorithm in identifying cases who share a genetic basis. MYH9-related disorders are a group of well-established platelet disorders. They typically cause macrothrombocytopenia and neutrophil inclusion bodies, and are variably associated with renal disease, cataracts and hearing loss. The number of patients with MYH9-related disorders in the BRIDGE-BPD study was unexpected, given that patients in whom this disorder was suspected were excluded. The cases described above had not been suspected by their recruiting clinician to have MYH9-RD. All the cases had macrothrombocytopenia, but only one was recorded as having neutrophil inclusion bodies. None of the other syndromic features, such as hearing loss or renal failure, were present in any of the cases. This highlights the ability of HPO-based clustering to identify similar cases even when this is not suspected clinically.
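The way uneven phenotyping depth can stop relatives from clustering can be illustrated with a deliberately simple similarity measure. The sketch below is not the HPO clustering method used in the study. It swaps in Jaccard similarity between sets of HPO term labels, an assumption made only for illustration. It then adds a permutation test asking whether a family is more similar than randomly drawn groups of the same size. All case names and term sets are hypothetical.

```python
import random


def jaccard(a: set, b: set) -> float:
    """Shared terms divided by all terms used for either case (illustrative measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_within_group(cases: dict, ids: list) -> float:
    """Mean pairwise similarity within a group; requires at least two IDs."""
    pairs = [(x, y) for i, x in enumerate(ids) for y in ids[i + 1:]]
    return sum(jaccard(cases[x], cases[y]) for x, y in pairs) / len(pairs)


def permutation_p(cases: dict, family: list, n_perm: int = 999, seed: int = 0) -> float:
    """Fraction of random same-sized groups at least as similar as the family."""
    rng = random.Random(seed)
    observed = mean_within_group(cases, family)
    ids = list(cases)
    hits = sum(
        mean_within_group(cases, rng.sample(ids, len(family))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Hypothetical cases: two relatives phenotyped at very different depths
cases = {
    "relative_detailed": {"Macrothrombocytopenia", "Epistaxis", "Menorrhagia",
                          "Bruising susceptibility", "Abnormal platelet aggregation"},
    "relative_sparse": {"Macrothrombocytopenia"},
    "unrelated": {"Macrothrombocytopenia", "Epistaxis"},
}
print(jaccard(cases["relative_detailed"], cases["relative_sparse"]))  # 0.2
print(jaccard(cases["relative_sparse"], cases["unrelated"]))          # 0.5
```

With these made-up term sets, the sparsely phenotyped relative shares a larger fraction of its terms with the unrelated case (0.5) than with its detailed relative (0.2). The family's permutation p-value is 1.0 because every random pair is at least as similar as the family itself.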

In conclusion, this chapter has described the optimisation of data collection and phenotyping of this large patient cohort. In subsequent chapters I will discuss how this has helped interpretation of the next-generation sequencing data and enabled gene discovery and genetic diagnosis for these patients.


Analysis of genotyping results from the BRIDGE-BPD study

Chapter 3 described the enrolment and phenotyping of cases with inherited, rare bleeding and platelet disorders. DNA samples from these cases were then submitted for high-throughput sequencing. In this chapter I will demonstrate the success of whole exome and whole genome sequencing in providing a diagnosis for patients with this group of disorders. I will describe the process of identifying pathogenic variants in established BPD genes, both for cases enrolled locally at Hammersmith Hospital and for cases across the entire BRIDGE-BPD study.

4.1 Genotyping of local samples

I described the recruitment of BPD probands and their affected relatives at Hammersmith hospital in Chapter 3. The majority of these samples passed quality control (QC) and repeat samples were sought from cases whose samples did not. A total of 113 DNA samples passed QC of which 101 have been sent to sequencing platforms at the time of writing. While the first 36 of these samples were initially sent for WES, the falling costs of WGS made it feasible to perform WGS on later enrolled cases (n=64); some WES samples were also re-run on the WGS platform as sequencing capacity allowed (n=27) (fig 4.1).


[Figure 4.1 flow diagram: 121 DNA samples (106 probands, 15 affected relatives) → 113 DNA samples passed QC → 100 sequenced (89 probands, 11 affected relatives): 9 WES only, 64 WGS only, 27 both WES and WGS]

Figure 4.1: Flow diagram indicating the number of DNA samples from patients enrolled at Hammersmith hospital who had whole exome sequencing (WES), whole genome sequencing (WGS) or both. (QC: quality control; affected relative: member of a pedigree who shares the same bleeding and/or platelet phenotype as the proband in that pedigree)

4.2 Identification of candidate variants in tier-1 BPD genes in cases enrolled locally

Of the 89 probands who have been sequenced, 64 (72%) were found to have rare variants in tier-1 BPD genes meeting all of the following criteria:
• frequency <0.001 in ExAC/GnomAD117
• CADD114 score >10
• missense or predicted high-impact (splice-site, frameshift, stop-lost or stop-gained) variant; 5’UTR variants in ANKRD26 were also included as this is a well-established location of pathogenic variants in this gene151
• if more than one member of the pedigree had been sequenced, the variant is present in all pedigree members affected by the phenotype (12 pedigrees had more than one member sequenced).

In total, 104 different candidate variants were identified in 49 tier-1 genes; these are listed in table 4.1. The number of variants per case ranged from 1 to 4, and five cases carried 4 different variants in tier-1 genes (table 4.1). All candidate variants were annotated with:
• predicted consequence of the variant as per the Variant Effect Predictor107
• presence of the variant in HGMD118
• allele frequency in ExAC/GnomAD
• CADD score.

Candidate variants were considered to have a phenotype match if:
• the patient phenotype was similar to the disease phenotype previously associated with that particular gene, and
• the patient phenotype was similar to the phenotypes of other cases in the NIHR BioResource with that variant.
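The four candidate-variant filters listed above can be written as a single predicate. The sketch below illustrates the stated rules and is not the study's pipeline. Its field names are hypothetical. It assumes Sequence Ontology consequence labels of the kind reported by the Variant Effect Predictor, and a set of sample IDs for the sequenced affected members of each pedigree. It also assumes that variants lacking a CADD score are not excluded on that criterion. Some indels and structural variants appear in table 4.1 without one.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical set of qualifying consequence labels (Sequence Ontology terms)
HIGH_IMPACT_OR_MISSENSE = {
    "missense_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "frameshift_variant",
    "stop_lost",
    "stop_gained",
}


@dataclass
class Variant:
    gene: str
    consequence: str
    gnomad_af: Optional[float]  # None if the variant is absent from ExAC/gnomAD
    cadd: Optional[float]       # None if no CADD score is available
    carriers: Set[str] = field(default_factory=set)  # sample IDs carrying the variant


def is_candidate(v: Variant, tier1_genes: Set[str], affected_sequenced: Set[str]) -> bool:
    """Apply the candidate-variant criteria to one variant within one pedigree."""
    if v.gene not in tier1_genes:
        return False
    # Rare: frequency < 0.001 (absence from gnomAD is treated as frequency 0)
    if (v.gnomad_af or 0.0) >= 0.001:
        return False
    # CADD > 10 (assumption: unscored variants are not excluded here)
    if v.cadd is not None and v.cadd <= 10:
        return False
    # Missense or high-impact, plus the ANKRD26 5'UTR exception
    is_ankrd26_utr = v.gene == "ANKRD26" and v.consequence == "5_prime_UTR_variant"
    if v.consequence not in HIGH_IMPACT_OR_MISSENSE and not is_ankrd26_utr:
        return False
    # Co-segregation: present in every sequenced affected pedigree member
    return affected_sequenced <= v.carriers


# Usage with a hypothetical single-proband pedigree
utr = Variant("ANKRD26", "5_prime_UTR_variant", None, 16.58, {"proband"})
print(is_candidate(utr, {"ANKRD26", "MYH9"}, {"proband"}))  # True
```

The co-segregation rule is a subset test. When only the proband has been sequenced, it reduces to checking that the proband carries the variant.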

As expected, the majority (n=72, 69%) of variants were in genes associated with platelet disorders (platelet genes). Variants in coagulation (n=16), thrombotic (n=13) and other genes (n=6) were seen less frequently (figure 4.2). One third (n=23) of the variants in platelet genes were in a case with a phenotype match. Phenotype matches were less common in cases with variants in other genes: there were six phenotype matches in coagulation genes and only one in a thrombosis gene. This was consistent with the expectation that genetic coagulation disorders would be seen infrequently in the BRIDGE cohort. They are more easily diagnosed than platelet disorders by standard laboratory techniques, so these patients would generally already have been identified in the clinic. It also reflects the enrolment criteria of the study, with very few thrombotic disorders recruited. Overall, 70% of identified candidate variants in tier-1 BPD genes were not considered to have a phenotype match (table 4.1). In these cases, the case did not have the phenotype associated with defects in that gene, or the variant was present in NIHR BioResource cases with different phenotypes. Alternatively, the type of variant was incompatible with the known mechanism of disease or inheritance pattern for that gene. These variants were considered to be benign with respect to the BPD phenotype and were not considered further in this analysis.

Figure 4.2: Categories of tier-1 BPD genes in which rare coding variants were identified in Hammersmith cases. Bar chart showing that the majority of variants occurred in genes associated with platelet disorders and that only a minority of variants had a phenotype match. Fewer variants occurred in coagulation, thrombotic or incompletely categorised genes. The red columns indicate the number of variants in each category in which there was a phenotype match (i.e. where the patient’s phenotype was compatible with the expected phenotype caused by a defect in the respective gene).

Table 4.1: Rare high-impact or missense variants in tier-1 BPD genes identified in 64 pedigrees enrolled at Hammersmith hospital. Bleeding or platelet phenotypes within each pedigree are summarised into categories [MTCP – macrothrombocytopenia; UBD – undefined bleeding disorder (abnormal bleeding but normal aggregation responses to all routine agonists and normal PFA-100 closure times); PFD – platelet function defect (abnormal aggregation response to any agonist in LTA or prolonged PFA-100 closure times); PFD+ - PFD with additional evidence of granule secretion or formation defect measured either by lumi-aggregometry or platelet nucleotide assay; TCP – thrombocytopenia with normal sized platelets; IMPV – increased mean platelet volume]. Variants recorded in the Human Gene Mutation Database (HGMD)118 as disease-causing are listed. Reasons for excluding the variant from further analysis are given [P – phenotype of patient does not match that expected from a defect in the gene; V – same variant occurs in other cases with a different phenotype; M – type of variant and/or inheritance pattern is incompatible with known mechanism of disease]. * pedigrees in which more than one family member underwent whole exome or whole genome sequencing

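| Pedigree identifier | Phenotype | # tier-1 variants | Gene | Variant | Protein effect | HGMD | GnomAD frequency | CADD | Phenotype match | Reason for exclusion |
| 1 | MTCP | 1 | GP1BB | 22:19711761 T/A | Leu132Gln | - | - | 23 | y | - |
| 2 | UBD | 1 | SLFN14 | 17:33879748 C/T | splice donor | - | 0.0000287 | 25 | n | P,V |
| 3 | UBD | 1 | FERMT3 | 11:63978242 G/A | Arg107His | - | 0.0003794 | 20.4 | n | V |
| *4 | PFD+; MTCP | 1 | RUNX1 | 21:36318516-37284936 del | large deletion | - | - | - | y | - |
| 5 | PFD | 2 | VWF | 12:6212480 8.6Kb del | exon 4&5 deletion | - | - | - | y | - |
|  |  |  | NBEAL2 | 3:47033017 T/C | Val255Ala | - | 0.0000041 | 10 | n | M |
| 6 | PFD | 2 | RUNX1 | 21:36164621 C/G | Met391Ile | - | - | - | n | P |
|  |  |  | MYH9 | 22:36688151 C/T | Asp1409Asn | - | 0.000285 | 34 | n | P,V |
| 7 | PFD+ | 1 | F8 | X:154130352 | Ser2030Asn | y | 0.0000336 | 25.5 | y | - |
| 8 | TCP | 1 | ANKRD26 | 10:27389373 G/A | 5'UTR c.-118C>T | y | - | 16.58 | y | - |
| 9 | PFD | 1 | VWF | 12:6059049 C/T | splice region missense Cys2719Tyr | - | - | 27.4 | y | - |
| *10 | PFD | 2 | STIM1 | 11:4105953 A/G | Thr240Ala | - | 0.00000243 | 16.19 | n | P,V |
|  |  |  | FERMT3 | 11:63987709 G/A | Asp412Asn | - | 0.0000081 | 33 | n | P |
| 11 | PFD | 1 | PROC | 2:128180774 G/GAA | frameshift | - | - | - | n | P |
| *12 | TCP | 1 | CYCS | 7:25163343 G/C | Leu99Val | - | - | 25.6 | y | - |
| 13 | UBD | 1 | STXBP2 | 19:7712322 G/A | Gly552Ser | - | 0.0002299 | 29.2 | n | V,P |
| 14 | PFD | 1 | PLA2G4A | 1:186925459 T/C | Leu521Pro | - | 0.0000081 | 24.3 | n | M |
| 15 | PFD | 2 | NBEAL2 | 3:47033332 G/A | splice region missense Ala310Thr | - | 0.0000613 | 12.92 | n | P |
|  |  |  | STIM1 | 11:4076834 G/A | Arg155Gln | - | 0.0000185 | 26.2 | n | P,M |
| 16 | PFD | 2 | SLC45A2 | 5:33947285 G/A | Arg451Cys | - | 0.0000541 | 34 | n | V |
|  |  |  | ITGB3 | 17:45384941 A/G | Ile747Val | - | 0.0000041 | 19.79 | y | - |
| *17 | PFD+ | 2 | GFI1B | 9:135864478 C/T | His181Tyr | - | - | 26 | y | - |
|  |  |  | THBD | 20:23029459 G/A | Pro683Leu | - | 0.0002389 | 8.45 | n | V |
| *18 | UBD | 1 | VWF | 12:6138548 | Arg976His | - | 0.0000366 | 10.68 | n | P |
| 19 | PFD | 2 | LYST | 1:235872609 C/A | splice acceptor | - | - | 28.1 | n | M |
|  |  |  | MYH9 | 22:36691696 A/G | Ser1114Pro | - | 0.0003179 | 23.7 | n | P,V |
| 20 | PFD+ | 1 | STIM1 | 11:4104609 C/G | Ala452Gly | - | 0.0000217 | 23.2 | n | P,M |
| 21 | UBD | 1 | VWF | 12:6128092 C/T | Asp1498Asn | - | 0.0000122 | 24.8 | n | P |
| 22 | PFD+ | 1 | LYST | 1:235884048 T/C | Asn3158Ser | - | 0.0000447 | 17.84 | y | - |
| 23 | PFD | 2 | DIAPH1 | 5:140960535 T/G | Lys261Thr | - | - | 24.3 | n | P,M |
|  |  |  | HPS1 | 10:100183615 C/T | Arg466Gln | - | 0.0000651 | 23.9 | n | P,M |
| 24 | UBD | 2 | COL3A1 | 2:189872869 G/A | splice donor | y | - | 28.1 | y | - |
|  |  |  | SERPINC1 | 1:173883846 T/C | Lys85Glu | - | - | 11.63 | n | P |
| 25 | PFD | 1 | GP1BB | c.505-516dupGTGCTGCTGCTG | p.Val169-Leu172dup | - | - | - | y | - |
| 26 | PFD | 1 | F8 | X:154195804 C>T | intronic | y | - | - | y | - |
| 27 | UBD | 1 | HOXA11 | 7:27224368 C/G | Arg132Ser | - | 0.0004763 | 25.4 | n | V |
| 28 | PFD+ | 4 | ITGA2B | 17:42453314 G/A | Leu830Pro | - | - | 24.9 | y | - |
|  |  |  | RASGRP2 | 11:64504445 C/T | Arg292Gln | - | 0.0000287 | 23.4 | y | - |
|  |  |  | COL3A1 | 2:1898667789 A/T | splice donor | y | - | 27.3 | y | - |
|  |  |  | FLI1 | 11:128679073 C/T | Pro268Leu | - | 0.0000253 | 35 | y | - |
| 29 | PFD | 4 | LYST | 1:235955271 C/T | Gly1424Asp | - | 0.0000406 | 23 | n | V |
|  |  |  | ANKRD26 | 10:27311569 T/G | Glu1378Ala | - | 0.0000072 | 23.3 | n | M,P |
|  |  |  | THBD | 20:23028854 C/T | Gly430Ser | - | 0.0000577 | 29.4 | n | P |
|  |  |  | F8 | X:154157065 C/T | Arg1667Gln | - | 0.000056 | - | n | P,M |
| 30 | PFD | 2 | LYST | 1:235973523 C/T | Asp199Asn | - | 0.0001789 | 16.92 | n | P,M |
|  |  |  | ORAI1 | 12:122079485 | Arg283His | - | - | 26.9 | n | P |
| 31 | TCP | 1 | TUBB1 | 20:57594594 A/G | His6Arg | - | 0.0000487 | 25.4 | y | - |
| 32 | TCP | 1 | ANKRD26 | 10:27389381 A/G | 5'UTR c.-126T>C | y | - | 13.86 | y | - |
| 33 | PFD | 3 | DIAPH1 | 5:140961908 G/A | Arg219Cys | - | 0.0000041 | 27.7 | n | P,M |
|  |  |  | CHST14 | 15:40763778 GC/G | frameshift | - | - | - | n | P |
|  |  |  | ITGB3 | 17:45331247 | Pro7His | - | - | 17.96 | n | P |
| *34 | PFD | 4 | HRG | 3:186395181 G/A | Gly363Arg | - | 0.0000126 | 16.82 | n | P |
|  |  |  | ANKRD26 | 10:27382394 | Pro139Thr | - | 0.0000081 | 14.48 | n | M |
|  |  |  | STXBP2 | 19:7707186 A/G | - | - | - | 24.4 | n | P |
|  |  |  | SERPIND1 | 22:21133973 | Arg125Cys | - | 0.0000397 | 34 | n | P |
| *35 | PFD | 1 | GP9 | - | Thr44Met | - | 0.0001186 | 20.29 | n | V,P |
| 36 | PFD | 1 | SERPINC1 | 1:173879923 A/T | Val244Asp | - | 0.0000487 | 26.2 | n | P |
| 37 | PFD | 1 | NBEA | 13:35782874 A/T | Ser1802Cys | - | 0.0001233 | 25.1 | n | P,M |
| 38 | MTCP; PFD | 1 | MYH9 | 22:36688074 C/G | Gln1434His | - | 0.0000041 | 23.5 | y | - |
| 39 | IMPV; PFD | 3 | F5 | 1:169495162 G/A | Thr1903Met | - | 0.0000289 | 33 | n | P |
|  |  |  | ETV6 | 12:11993401 ACT/A | frameshift | - | - | - | y | - |
|  |  |  | SLFN14 | 17:33884243 G/A | Thr280Ile | - | 0.0000202 | 11.92 | n | P |
| 40 | TCP; PFD | 1 | DIAPH1 | 5:140905944 C/T | Glu1121Lys | - | 0.0000081 | 23.2 | n | P,M |
| 41 | PFD | 1 | F10 | 13:113803475 G/A | Gly371Ser | - | - | 22.9 | n | P |
| 42 | PFD | 4 | FGA | 4:155510103 C/T | Arg69Lys | - | - | 25.4 | n | V |
|  |  |  | VWF | 12:6138549 G/A | Arg976Cys | y | - | 26 | y | - |
|  |  |  | GP6 | 19:55525788 T/C | Thr509Ala | - | - | - | n | P |
|  |  |  | VWF | 12:6230434 G/C | Phe42Leu | - | - | 12.33 | y | - |
| 43 | MTCP | 2 | MYH9 | 22:36688106 C/T | Asp1424Asn | y | - | 33 | y | - |
|  |  |  | NBEA | 13:35517126 C/T | Pro57Ser | - | - | 18.61 | n | P |
|  |  |  | BLOC1S3 | 19:45682571 G/A | Arg6His | - | 0.0004887 | 28 | n | V |
| 44 | Other | 3 | HPS3 | 3:148880655 C/T | Ser824Leu | - | 0.0000977 | 24.9 | n | P,M |
|  |  |  | ANO6 | 12:45740851 A/G | Gln109Arg | - | 0.0001012 | 23.6 | n | P |
|  |  |  | CHST14 | 15:40763432 C/A | Thr7Asn | - | 0.0000556 | 22.1 | n | P |
| 45 | PFD | 2 | DIAPH1 | 5:140907136 A/G | splice donor | - | - | 18 | n | P |
| 46 | PFD+ | 1 | GATA1 | X:48650823 A/G | Tyr231Cys | - | - | 26 | y | - |
| 47 | PFD | 4 | PROS1 | 3:93692574 C/G | Arg7Pro | - | - | - | n | P |
|  |  |  | NBEA | 13:36046590 A/G | Lys2168Glu | - | - | - | n | P,M |
|  |  |  | LMAN1 | 18:57006053 A/G | Val363Ala | - | - | - | n | P |
|  |  |  | TUBB1 | 20:57597987 G/A | Val49Met | - | - | - | n | P |
| 48 | PFD | 1 | ANO6 | 12:45664245 G/A | Gly24Asp | - | 0.0000984 | 27.7 | n | V,M |
| 49 | PFD; IMPV | 1 | MYH9 | 22:36682793 | Met1678Val | - | - | 18.4 | y | - |
| 50 | PFD | 3 | SERPINC1 | 1:17883863 | Arg79His | y | 0.0000974 | 33 | n | P |
|  |  |  | HPS1 | 10:100185308 T/G | Gln442Pro | - | - | 15.65 | n | P,M |
|  |  |  | SERPIND1 | 22:21138428 T/A | Ile353Asn | - | 0.0004402 | 33 | n | P |
| *51 | Other | 1 | PROS1 | - | large deletion | - | - | - | y | - |
| 52 | PFD | 1 | MPL | 1:43804305 G/C | Arg102Pro | y | 0.0003752 | 25.8 | n | V |
| 53 | PFD | 1 | PLAT | 8:42038084 C/T | Gly337Arg | - | 0.0000831 | 17.14 | n | P |
| 54 | UBD | 1 | LYST | 1:235872482 G/A | Ser3351Leu | - | 0.0000072 | 34 | n | P,M |
| 55 | PFD+ | 1 | NBEAL2 | 3:47040552 T/TG | frameshift | - | - | - | n | P,M |
| 56 | UBD | 3 | GP9 | 3:128781109 T/TGG[12]CCC | frameshift & stop | - | - | - | n | P |
|  |  |  | HPS6 | 10:103826577 C/G | Ser449Cys | - | 0.0001732 | 24 | n | P |
|  |  |  | FLI1 | 11:128628194 G/T | Arg68Leu | - | 0.0002828 | 34 | n | P |
| 57 | UBD | 1 | VWF | 12:6172185 C/T | Val490Met | - | 0.0000869 | 26.5 | n | V,P |
| 58 | PFD | 3 | MPL | 1:43818201 G/T | Val556Phe | y | 0.0002454 | 26.4 | n | P |
|  |  |  | SERPINC1 | 1:173883867 G/A | Arg78Trp | - | 0.0000325 | 34 | n | P |
|  |  |  | SERPIND1 | 22:21134448 C/G | Pro283Arg | - | 0.0000902 | 26.9 | n | P |
| 59 | MTCP | 1 | DIAPH1 | 5:140998453 G/C | Pro10Arg | - | 0.0000978 | 16.54 | n | M |
| 60 | TCP | 1 | ANKRD26 | 10:27389383 C/A | 5'UTR c.-128G>T | y | - | 14.43 | y | - |
| 61 | PFD+; TCP | 2 | FLI1 | 11:128628185 A/G | Asn65Ser | - | - | 10.77 | n | P |
|  |  |  | ETV6 | 12:12037447 TG/T | frameshift | - | - | - | y | - |
| 62 | PFD | 1 | GP1BA | 17:4837701 C/T | Ser601Leu | - | 0.0000203 | 20.6 | y | - |
| 63 | PFD | 1 | SERPIND1 | 22:21138428 T/A | Ile353Asn | - | 0.0004402 | 33 | n | V,P |
| 64 | MTCP | 1 | GP1BA | 17:4836384 A/G | Asn150Ser | - | - | 24.3 | y | - |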

4.3 Assessment of tier-1 BPD gene variants identified in cases enrolled locally

The variants identified in tier-1 BPD genes were next assessed for their likelihood of being disease-causing. They were categorised as clearly pathogenic (CPV), likely pathogenic (LPV) or a variant of uncertain significance (VUS), according to guidance from the American College of Medical Genetics122 as described in chapter 2, section 2.4. Briefly, two kinds of variant were labelled clearly pathogenic: missense variants that had been described previously in at least four independent cases with a similar phenotype, and loss-of-function variants with an established mechanism of action. Variants previously reported in fewer than four independent pedigrees were labelled likely pathogenic. A previously unreported variant could also be labelled likely pathogenic if there was strong experimental evidence in support of that type of variant (for example, loss of function) and/or supportive co-segregation data. Variants not meeting these criteria were labelled variants of uncertain significance. I assessed all tier-1 gene variants identified in Hammersmith cases locally. The variants also underwent a central assessment process. Variants meeting CPV or LPV criteria are discussed at the central Thrombogenomics MDT and documented in a formal research report. I assessed each tier-1 BPD gene variant listed in table 4.1 in the context of the phenotype in each case and the co-segregation data where available. I took into account GnomAD frequency, presence in other cases in the NIHR BioResource and other published reports. I also considered the results of three pathogenicity prediction tools (CADD, SIFT and Polyphen). Where possible, I also performed a limited assessment of the known mechanism by which variants in the gene cause disease and of whether this is compatible with the type of variant found. Each case or pedigree was then annotated according to whether the variant could fully explain the phenotype or not. All cases with a VUS remain unexplained; LPVs may either fully or partially explain the phenotype.
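The classification rules summarised above can be expressed as a small decision function. This is a simplified sketch under several assumptions. The count of independent reports and the evidence flags are hypothetical inputs that in practice come from manual review, and a phenotype match is assumed to have been established already. The at-least-four-cases rule is applied to any variant type. The text states it for missense variants, but the ANKRD26 5'UTR variant in case #32 was treated the same way. The function does not capture the further judgements described in this chapter, such as downgrading a variant that is too frequent in GnomAD.

```python
from enum import Enum


class Call(Enum):
    CPV = "clearly pathogenic"
    LPV = "likely pathogenic"
    VUS = "variant of uncertain significance"


def classify(independent_reports: int,
             lof_with_established_mechanism: bool = False,
             strong_experimental_evidence: bool = False,
             supportive_cosegregation: bool = False) -> Call:
    """Simplified CPV/LPV/VUS rules; assumes the phenotype already matches."""
    # CPV: seen in >= 4 independent cases with a similar phenotype, or a
    # loss-of-function variant acting through an established mechanism
    if independent_reports >= 4 or lof_with_established_mechanism:
        return Call.CPV
    # LPV: previously reported, but in fewer than four independent pedigrees
    if independent_reports >= 1:
        return Call.LPV
    # LPV: unreported, but supported by experimental and/or segregation data
    if strong_experimental_evidence or supportive_cosegregation:
        return Call.LPV
    return Call.VUS


print(classify(5).name)                                 # CPV
print(classify(2).name)                                 # LPV
print(classify(0, supportive_cosegregation=True).name)  # LPV
print(classify(0).name)                                 # VUS
```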
Detailed assessment was performed on twelve variants listed in HGMD as disease-causing (labelled DM or DM? in the HGMD database), plus an additional 22 variants with a phenotype match. These assessments are discussed in the following sections. The remaining 70 variants were not listed in HGMD and did not have a phenotype match. For some of these, the type of variant was not compatible with the known mechanism of disease for that gene and phenotype (table 4.1). These variants were not considered for further analysis in this project.


4.3.1 Assessment of tier-1 gene variants listed in HGMD

Twelve of the tier-1 gene variants are listed in HGMD as disease-causing (table 4.2). Nine of these variants were detected in cases with a phenotype match between gene and patient phenotype, and five of these nine have been assessed at the central MDT. Seven variants are clearly or likely pathogenic. Six of these either partially or fully explain the BPD phenotype for which the case was enrolled in the BRIDGE-BPD study. Formal MDT reports have been issued on three of the pathogenic variants, enabling this diagnosis to be fed back to individual families.

Table 4.2: Tier-1 BPD gene variants identified in Hammersmith cases and listed in HGMD. Bleeding or platelet phenotypes within each pedigree are summarised into categories as defined previously (MTCP: macrothrombocytopenia; TCP: thrombocytopenia; PFD: platelet function defect; PFD+: platelet function defect with additional evidence suggestive of a granule disorder; UBD: undefined bleeding disorder). Pathogenic variants (CPV or LPV) can either fully or partially contribute to the bleeding or platelet (BPD) phenotype seen in the patient. Pedigree identifiers correspond to those in table 4.1.

| Pedigree identifier | Case phenotype | Gene | Variant | MDT pathogenicity | Local assessment | Contribution to BPD phenotype |
| 43 | MTCP | MYH9 | p.Asp1424Asn | CPV | CPV | Full |
| 7 | PFD+ | F8 | p.Ser2030Asn | - | CPV | Unexplained |
| 8 | TCP | ANKRD26 | 5’UTR | LPV | LPV | Full |
| 24 | UBD | COL3A1 | Splice site | VUS | LPV | Partial |
| 26 | PFD | F8 | Intronic | VUS | VUS | - |
| 28 | PFD+ | COL3A1 | Splice site | - | LPV | Partial |
| 32 | TCP | ANKRD26 | 5’UTR | LPV | CPV | Full |
| 42 | PFD | VWF | p.Arg976Cys | - | VUS | - |
| 60 | TCP | ANKRD26 | 5’UTR | - | LPV | Full |
| 50 | PFD | SERPINC1 | p.Arg79His | - | Likely benign | - |
| 52 | PFD | MPL | p.Arg102Pro | - | Likely benign | - |
| 58 | PFD | MPL | p.Val556Phe | - | Likely benign | - |

4.3.1.1 Clearly pathogenic variant which fully explains the phenotype

A missense variant in MYH9 (p.Asp1424Asn) was identified in case #43 with macrothrombocytopenia without platelet dysfunction or associated features. The variant has been reported in the literature and is also seen in other BRIDGE cases with a similar phenotype. There are also several published cases with different amino acid substitutions at the same residue152-155. The presence of this variant in at least four independent cases with the same phenotype, plus an established mechanism for missense variants in MYH9 causing disease enabled it to be assigned clearly pathogenic status. Although defects in MYH9 can be associated with a wider phenotype, including renal disease and hearing impairment, many cases have isolated platelet defects and this missense variant is felt to fully explain the phenotype in this case.

4.3.1.2 Likely pathogenic 5’UTR variants

Variants in the 5’UTR have rarely been found to be pathogenic in most genes and, as such, are usually not considered as candidate variants for Mendelian diseases. However, several single nucleotide substitutions in the 5’UTR of ANKRD26 have recently been identified as a cause of non-syndromic thrombocytopenia with an increased risk of myeloid malignancy120,151. Pathogenic 5’UTR variants are predicted to disrupt the binding of the transcription factors RUNX1 and FLI1 to the ANKRD26 5’UTR. This in turn leads to persistent ANKRD26 expression during megakaryopoiesis. The mechanism of thrombocytopenia has not been fully elucidated. However, persistent ANKRD26 expression seems to increase signalling through the TPO/MPL pathway and lead to defective proplatelet formation156. Probands #8, #32 and #60, all with non-syndromic thrombocytopenia, were found to carry different variants in the 5’UTR of ANKRD26. All three variants are listed in HGMD as disease-causing for thrombocytopenia151,157. The variant in case #32 was also seen in four unrelated thrombocytopenic cases in the BRIDGE-BPD study, with supportive co-segregation evidence in two pedigrees, so this variant meets CPV criteria. The variants in cases #8 and #60 have been previously reported, but not in enough cases to be clearly pathogenic, so they are LPVs. All three variants can fully explain the non-syndromic thrombocytopenia phenotype seen in their respective pedigrees. Case #60 had not been reviewed at the central MDT at the time of writing. It was passed on for formal assessment and production of a research report to enable patient feedback.

4.3.1.3 Likely pathogenic variants where the pathogenicity has changed

For some variants, the pathogenicity assignment was later reviewed and changed, either because the variant was subsequently listed in HGMD or because the evidence behind an HGMD listing was re-examined. Two COL3A1 splice site variants that are now listed in HGMD were detected in two probands with vascular Ehlers-Danlos syndrome (vEDS). The first, c.3525+1 G>A, was identified in a 23-year-old female (case #24) with vEDS and excessive, atypical mucocutaneous bleeding symptoms. She had no demonstrable platelet function defect (UBD). She was enrolled into the BRIDGE-BPD study to identify a cause for the UBD. At the time of central MDT assessment, this COL3A1 variant had not been reported in the literature and the majority of known vEDS-causing variants were missense, so it was labelled as a VUS. Patient #28 had clinically diagnosed vEDS, with no variants identified on previous targeted sequencing of the COL3A1 gene at another centre. He was enrolled in the BRIDGE-BPD study in an attempt to identify the cause of an additional platelet storage pool disorder. The c.1150-2 A>T splice region variant in COL3A1 was identified and labelled VUS since it was previously unreported. Both variants were recently reported in a large series of vEDS cases, alongside other splice variants, and are now listed in HGMD as disease-causing158. Both can therefore now be labelled LPV for vEDS. Bleeding symptoms are heterogeneous in vEDS, and it is uncertain whether the COL3A1 defect fully or partially accounts for the bleeding phenotype in case #24. Case #28 had a PFD. There have been many reports of platelet function defects in patients with vEDS, often involving storage pool defects and affecting up to 50% of patients in some studies159-162. However, no mechanism has been elucidated and no large-scale studies demonstrating a statistical association have been performed. The COL3A1 splice variant can therefore only be labelled as partially explaining the phenotype in this case, as its association with the platelet defect cannot be proven.

An intronic single nucleotide variant in a splice region of F8 was identified in a young female patient investigated for UBD. As previously stated, this means that no cause for bleeding was found on routine coagulation or platelet function testing. Her factor VIII level was normal. The variant is listed in HGMD as DM?163. This category is allocated to variants that have been previously reported and are thought to be probably pathogenic, but for which there is some evidence of uncertainty. Normal factor VIII levels are frequently seen in female carriers of disease-causing F8 variants, so the variant was initially labelled as a CPV on the basis of the HGMD listing. However, there was no family history of Haemophilia A, which raised the possibility of a de novo variant, and the implications of carrying a pathogenic F8 variant are significant. The variant is present in GnomAD with a MAF of 0.1% in Europeans. This is far more common than Haemophilia A itself, which occurs in approximately 1 in 5000 live male births. We therefore reviewed the original report of this variant in three unrelated pedigrees. All the cases with mild Haemophilia A also carried a second F8 intronic variant163, and it is not clear whether either variant could be pathogenic on its own. It was concluded that this variant was too common to be clearly pathogenic for Haemophilia A and that there was not enough evidence of pathogenicity in the original case reports, so it was reassigned as a VUS.
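The frequency argument above can be made explicit with a back-of-the-envelope calculation using the two figures quoted in the text. The sketch assumes that the European MAF of 0.1% applies equally to male X chromosomes, so that about 1 in 1000 males would be hemizygous for the variant. It then compares this with the incidence of Haemophilia A from all causes. If the variant alone caused Haemophilia A with full penetrance, males carrying it would outnumber all Haemophilia A cases several-fold.

```python
# Back-of-the-envelope check using the figures quoted in the text.
# Assumption: the allele frequency applies equally to male X chromosomes,
# so the fraction of hemizygous males equals the allele frequency.

variant_maf = 0.001                 # ~0.1% in European GnomAD samples
haemophilia_a_incidence = 1 / 5000  # ~1 in 5000 live male births, all causes

hemizygous_males = variant_maf
fold_excess = hemizygous_males / haemophilia_a_incidence

print(f"Males hemizygous for the variant: about 1 in {round(1 / hemizygous_males)}")
print(f"Fold excess over all Haemophilia A: {fold_excess:.1f}")
```

A fivefold excess over the incidence of the disease from every cause is difficult to reconcile with a fully penetrant causal variant. This is consistent with the decision to reassign it as a VUS.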

4.3.1.4 Pathogenic, HGMD-listed variants do not necessarily explain the BPD phenotype

Four variants listed in HGMD as disease-causing do not fully explain the bleeding or platelet disorder phenotype for which the case was enrolled in the BRIDGE-BPD study. Two of these are the COL3A1 variants discussed in the previous section. Their contribution to the BPD phenotype is uncertain, and these cases remain only partially explained. The third is a missense variant (p.Ser2030Asn) in F8, identified in a young female patient (case #7). She had a family history of Haemophilia A and was enrolled into the study for investigation of a platelet function defect. The patient had normal factor VIII levels, a common finding in Haemophilia A carriers. The F8 variant is seen in GnomAD at a frequency of about 1 in 30,000 and is listed in HGMD as a disease-causing variant associated with mild Haemophilia A164. It therefore meets criteria to be clearly pathogenic for a FVIII defect. However, the platelet defect for which the patient was enrolled in the study remains unexplained. This assessment has been performed locally, but the case will be highlighted to the central MDT so that a formal research report can be issued.

A fourth HGMD-listed variant, a missense variant in VWF (p.Arg976Cys; case #42, table 4.2), was identified in an index case with an undefined bleeding disorder. This case also carried a second, previously unreported missense variant in VWF (p.Phe42Leu). The patient did not have a clinical diagnosis of von Willebrand disease because their von Willebrand factor-related parameters were normal. The HGMD variant was described in a case labelled with Type 2A/2E von Willebrand disease. In this reported subtype of VWD, it has been proposed that decreased susceptibility to proteolysis by ADAMTS13 results in an abnormal oligomeric structure, with normal VWF antigen and collagen binding levels165. VWF multimeric analysis had not been performed in the Hammersmith case, so an accurate comparison with the reported case was not possible. The Arg976Cys variant was labelled VUS and the Phe42Leu variant was not considered further. This analysis shows that HGMD-listed variants, even when pathogenic, are not necessarily responsible for the full phenotype of the patient. This must be considered and made clear when reporting findings back to clinicians and patients.

4.3.1.5 HGMD variants without a BPD phenotype match

Three variants listed in HGMD were present in cases without a phenotype match and were not considered for further analysis. A variant in SERPINC1, which encodes antithrombin, was detected in a patient with a bleeding disorder and a platelet function defect (case #50). This SERPINC1 variant has been reported both in association with antithrombin deficiency166 and as a likely benign variant167. There is no personal or family history of thrombosis in this pedigree, and antithrombin levels are not known in this case. The SERPINC1 variant can therefore be considered an incidental finding; such findings are not routinely fed back. The MPL variants in cases #52 and #58 have previously been reported in patients with amegakaryocytic thrombocytopenia and Diamond-Blackfan anaemia respectively168,169. Both patients were enrolled in BRIDGE-BPD with bleeding and a platelet function disorder, with a normal platelet count and no evidence of bone marrow failure. The MPL variant in case #52 was also present in other BRIDGE cases with a different phenotype. These MPL variants were therefore also considered incidental findings that are unlikely to be pathogenic for the BPD seen in these patients.

4.3.2 Assessment of tier-1 BPD gene variants with a phenotype match and not listed in HGMD

In total, 31 variants (30% of all tier-1 gene variants identified) in 26 Hammersmith probands were labelled with a phenotype match between gene and proband. The nine of these variants listed in HGMD have been discussed in the previous section. The remaining 22 variants were not listed in HGMD at the time of assessment (table 4.3). As before, all variants have undergone local assessment, and some have also been assigned a pathogenicity at the central MDT.

4.3.2.1 Likely pathogenic variants

Seven variants were likely pathogenic and all fully explained the BPD phenotype in each case except a deletion in the VWF gene in case #5.

4.3.2.1.1 Likely pathogenic structural variants

Three large deletions were identified and labelled LPV. The first is a large deletion including exons 4 and 5 of VWF, found in a case with low VWF levels previously diagnosed with Type 1 VWD. The same deletion was detected in another BRIDGE-BPD case with low VWF antigen levels, and a similar deletion has previously been reported in Type 1 and Type 3 VWD170. The patient was enrolled in the BRIDGE-BPD study for investigation of an additional platelet function defect, which is not explained by the VWF variant. This variant is therefore labelled as only partially explaining the patient’s BPD phenotype. Two other large deletions, in PROS1 and RUNX1, were also previously unreported and were designated LPV for two reasons. First, they remove regions of the respective genes that encode parts of the proteins critical for function. Second, the patient phenotypes were identical to those of previously reported cases with similar deletions. The proband with the PROS1 LPV was initially sequenced on the Thrombogenomics targeted sequencing panel78 in an attempt to identify a cause for a known inherited protein S deficiency. The Thrombogenomics platform performs targeted sequencing mainly of exons, splice regions and UTRs of genes, and no candidate variant was identified in PROS1. Samples from this patient and her mother, who also has protein S deficiency, were then transferred to the BRIDGE whole genome sequencing platform, where the large deletion in PROS1 was identified.


A large deletion predicted to remove the first exon of RUNX1, a transcription factor important in megakaryopoiesis, was found in a father and daughter. Both had macrothrombocytopenia and a platelet function defect with abnormal platelet nucleotides and ATP release. The phenotype of patients with pathogenic variants in RUNX1 is variable, but both macrothrombocytopenia and platelet function defects are recognised. Very large deletions, removing several exons or even the entire gene, have been reported in RUNX1-related thrombocytopenias, but the consequences of a deletion involving only exon 1 are unclear. Nonetheless, it was concluded that this RUNX1 deletion was likely to be pathogenic and could fully explain the phenotype seen in both sequenced and affected members of this pedigree.


Table 4.3: Hammersmith tier-1 variants with a phenotype match not listed in HGMD. Cases highlighted in blue have been discussed at the central Thrombogenomics MDT and a formal research report has been received. Other cases have either been assessed only locally or carry VUS; VUS may have been through MDT, but no formal research report was issued for them. All variants are annotated with minor allele frequency (MAF) in the GnomAD database, CADD score, deleteriousness assessment using SIFT and PolyPhen2, and pathogenicity assessment as outlined in the text, made either at the central Thrombogenomics MDT or by me. Phenotype abbreviations have been previously defined (TCP: thrombocytopenia, MTCP: macrothrombocytopenia, PFD: platelet function defect, PFD+: platelet function defect with additional evidence of granule defect, IMPV: increased mean platelet volume, UBD: undefined bleeding disorder). Pedigrees in which there is sequencing data for more than one family member are marked by an asterisk *. SIFT analysis: O Deleterious; O Deleterious low confidence; O Tolerated. PolyPhen2 analysis: O Probably damaging; O Possibly damaging; O Benign.

Case | Phenotype | Gene | Variant | GnomAD MAF | CADD | SIFT / PolyPhen2 | Pathogenicity assessment | Contribution to phenotype
1 | MTCP | GP1BB | p.Leu132Gln | 0 | 23 | O O | LPV | Full
*4 | PFD+; MTCP | RUNX1 | Large deletion | 0 | - | - | LPV | Full
5 | PFD | VWF | Exon 4&5 deletion | 0 | - | O O | LPV | Partial
*12 | TCP | CYCS | p.Leu99Val | 0 | 25.6 | O O | LPV | Full
25 | PFD | GP1BB | Hom p.Val169-Leu172dup | 0 | - | - | LPV | Full
*51 | Other | PROS1 | Large deletion | 0 | - | - | LPV | Full
*17 | PFD+ | GFI1B | p.His181Tyr | 0 | 26 | O O | VUS | Unexplained
28 | PFD+ | ITGA2B | p.Leu830Pro | 0 | 24.9 | O O | VUS | Unexplained
28 | PFD+ | RASGRP2 | p.Arg292Gln | 0.0000287 | 23.4 | O O | VUS | Unexplained
28 | PFD+ | FLI1 | p.Pro268Leu | 0.0000253 | 35 | O O | VUS | Unexplained
9 | PFD | VWF | Splice site missense Cys2719Tyr | 0 | 27.4 | O O | VUS | Unexplained
16 | PFD | ITGB3 | p.Ile474Val | 0.0000041 | 19.79 | - | VUS | Unexplained
22 | PFD+ | LYST | p.Asn3158Ser | 0.0000447 | 17.8 | O O | VUS | Unexplained
31 | TCP | TUBB1 | p.His6Arg | 0.0000487 | 25.4 | O O | VUS | Unexplained
38 | MTCP; PFD | MYH9 | p.Gln1434His | 0.0000041 | 23.5 | O O | VUS | Unexplained
39 | IMPV; PFD | ETV6 | p.Ser85* | 0 | - | - | VUS | Unexplained
42 | UBD | VWF | p.Phe42Leu | 0 | 12.3 | O O | VUS | Unexplained
46 | PFD+ | GATA1 | p.Tyr231Cys | 0 | 26 | O O | VUS | Unexplained
49 | PFD; IMPV | MYH9 | p.Met1678Val | 0 | 18.4 | O O | VUS | Unexplained
61 | PFD+; TCP | ETV6 | p.Glu361fs* | 0 | - | - | VUS | Unexplained
62 | PFD | GP1BA | p.Ser601Leu | 0.0000203 | 20.6 | O O | VUS | Unexplained
64 | MTCP | GP1BA | p.Asn150Ser | 0 | 24.3 | O O | VUS | Unexplained

4.3.2.1.2 Likely pathogenic novel missense variants

Two heterozygous missense variants, one in GP1BB and one in CYCS, are novel variants identified in cases with non-syndromic thrombocytopenia. These specific variants have not been reported before, but heterozygous missense variants in both genes have been reported in cases with thrombocytopenia. The novel CYCS variant is predicted to cause a Leu99Val substitution. The mechanism by which variants in CYCS cause thrombocytopenia is not fully understood. However, this variant co-segregates with non-syndromic thrombocytopenia in a large pedigree enrolled in the BRIDGE-BPD study. The same variant is also present in a second pedigree with non-syndromic thrombocytopenia and is completely absent from GnomAD. On these grounds, this CYCS variant was labelled LPV. Variants in CYCS are discussed in more detail in section 4.4.2.

The Leu132Gln missense variant in GP1BB was identified in the proband of a small pedigree with mild macrothrombocytopenia (platelet count 115×10⁹/L, MPV 13.7 fL). There was no history of abnormal bleeding in the proband. Platelet light transmission aggregometry responses were normal, including normal agglutination in response to ristocetin. The family history suggested autosomal dominant inheritance, with the proband's father, sibling and son all reported to have a low platelet count (fig. 4.3). GP1BB encodes glycoprotein (GP) 1Bβ, which together with GP1Bα, GPV and GPIX forms the platelet transmembrane receptor for VWF171. Variants in GP1BB, GP1BA and GP9 are classically associated, in homozygous or compound heterozygous form, with autosomal recessive macrothrombocytopenia and bleeding with failure of platelet agglutination in response to ristocetin (Bernard-Soulier syndrome (BSS), OMIM #231200)172-176. Heterozygous variants in GP1BB have been reported in association with a giant platelet disorder, but without substantive family or functional studies177-179. The cause of thrombocytopenia in biallelic BSS is not well defined, but it may be due to reduced proplatelet formation180,181. Furthermore, the phenotype expected from pathogenic heterozygous variants in GP1BB is unclear. Obligate carriers of BSS variants usually have normal platelets. Patients with monoallelic BSS usually have mild macrothrombocytopenia without bleeding, and in many of these cases ristocetin-induced platelet agglutination and flow cytometric assessment of GP1B-IX-V expression have not been performed181-183. Complete loss of one GP1BB allele can reduce receptor expression by 50% without causing any abnormal platelet phenotype184. Missense variants in GP1BB may instead exert a dominant negative effect because of the stoichiometry of the GP1B-IX-V complex. In this complex, four GP1Bβ subunits (two disulfide-linked to each GP1Bα) assemble with two GP1Bα, two GPIX and one GPV subunit. This effect has not, however, been demonstrated in functional studies.
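The dominant negative argument rests on subunit stoichiometry, and a short calculation makes it concrete. The sketch below is a hypothetical illustration, not a model used in the study. It assumes that the mutant and wild-type GP1BB alleles are expressed equally, that subunits assemble into complexes at random, and that a single mutant GP1Bβ subunit is enough to impair a whole complex. Under those assumptions, only (1/2)⁴ = 1/16 of complexes containing four GP1Bβ subunits would be fully wild type. By contrast, complete loss of one allele leaves half the normal amount of receptor, all of it normal. The function and variable names are illustrative.

```python
# Hypothetical illustration of why subunit stoichiometry matters for a
# dominant negative effect. Assumptions (not from the study data):
#   1. mutant and wild-type GP1BB alleles are expressed equally,
#   2. subunits are incorporated into complexes at random,
#   3. one mutant GP1Bβ subunit is enough to impair a whole complex.


def fraction_fully_wild_type(subunits_per_complex: int, mutant_fraction: float = 0.5) -> float:
    """Probability that every copy of the subunit in one complex is wild type."""
    if subunits_per_complex < 1:
        raise ValueError("a complex needs at least one copy of the subunit")
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must lie between 0 and 1")
    return (1.0 - mutant_fraction) ** subunits_per_complex


# Null allele: half as much receptor is made, but all of it is normal.
null_allele_normal_receptor = 0.5

# Missense allele under the dominant negative assumptions, four GP1Bβ per complex.
missense_normal_receptor = fraction_fully_wild_type(4)

if __name__ == "__main__":
    print(f"Null allele: {null_allele_normal_receptor:.0%} of normal receptor, all functional")
    print(f"Dominant negative missense: {missense_normal_receptor:.2%} of complexes fully wild type")
```

With one copy of the subunit per complex, the same function returns 0.5, which matches simple haploinsufficiency. The penalty grows with the number of copies per complex, which is why a missense allele could plausibly produce a phenotype that a null allele does not.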

Figure 4.3: Pedigree of Hammersmith case with Leu132Gln heterozygous variant in GP1BB. The proband is marked with a black arrow. Individuals in red have macrothrombocytopenia. Individuals in white have a normal platelet count. Individuals in grey have an unknown platelet phenotype. The proband carries the GP1BB variant but other family members have not yet been sequenced.

The Leu132Gln variant seen in the Hammersmith case is located in a conserved hydrophobic region of GP1BB. The leucine to glutamine substitution is predicted to interfere with the interaction between GP1Bβ and GPIX during assembly of the GP1B-IX-V complex on the platelet surface. A pathogenic missense variant affecting the adjacent amino acid (p.Ala133Pro) has previously been reported in a case with large platelets177, and Leu132Gln is completely absent from the GnomAD control dataset. Further support for the pathogenicity of Leu132Gln came from a significant association, demonstrated in the BRIDGE-BPD cohort, between heterozygous missense variants in GP1BB and autosomal dominant thrombocytopenia. A similarity regression technique developed by computational statisticians in the BRIDGE consortium was used to estimate the probability of association between an HPO-coded phenotype and rare variants in a particular gene185. An interim analysis of 1542 BRIDGE-BPD cases and 5422 control cases from the NIHR BioResource-Rare Diseases revealed a statistical association between rare, non-synonymous heterozygous variants in GP1BB and autosomal dominant macrothrombocytopenia (Fisher's exact test p = 2.1×10⁻⁶)186. This association was driven by eight unrelated pedigrees with autosomal dominant macrothrombocytopenia in the BRIDGE-BPD collection, including the Hammersmith case. Sequencing data from all eight probands were scrutinised for variants in tier-1 BPD genes to identify potential alternative causes of the thrombocytopenia, but no rare variants were seen in known thrombocytopenia genes. The association was validated by co-segregation studies where possible. It was also validated by the identification of additional pedigrees with autosomal dominant macrothrombocytopenia and heterozygous GP1BB variants in two further collections of thrombocytopenia cases. In total, nine variants were identified in 18 unrelated pedigrees186. All variants except one were absent from GnomAD, and three had previously been implicated in BSS in homozygous or compound heterozygous form. This statistical association helped to define a novel mode of inheritance for GP1BB-associated macrothrombocytopenia. However, further co-segregation studies or the identification of more cases carrying the same variants will be necessary to confirm causality in all cases. The Leu132Gln variant was therefore labelled LPV in the Hammersmith case (i.e. likely to be the cause of the macrothrombocytopenia). This assessment was based on the location and predicted effect of the variant, its rarity in the general population and the association, described above, between missense variants in GP1BB and the macrothrombocytopenia phenotype.
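The association statistic quoted above is a Fisher's exact test on a 2×2 table of carriers and non-carriers among cases and controls. The underlying counts are not reproduced here. The sketch below is therefore a generic pure-Python implementation of the two-sided test, and `scipy.stats.fisher_exact` provides the same calculation in practice. It is not the consortium's analysis code, and it does not implement the similarity regression method. The function names and the example table are illustrative assumptions.

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """P(top-left cell == a) for a 2x2 table with fixed row and column margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: cases, controls. Columns: carriers, non-carriers.
    Sums the probabilities of every table with the observed margins
    that is no more likely than the observed table.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, col1, n)
    total = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_pmf(x, row1, col1, n)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(total, 1.0)


if __name__ == "__main__":
    # Hypothetical textbook table, not BRIDGE data.
    print(f"{fisher_exact_two_sided(3, 1, 1, 3):.4f}")  # 0.4857
```

Exact integer binomial coefficients keep the calculation stable for cohort-sized tables, where a handful of carriers among a few hundred cases against thousands of controls yields very small p values.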

A second likely pathogenic GP1BB variant was identified in a case with thrombocytopenia (platelet count 30×10⁹/L). Platelet aggregation was normal to all agonists except ristocetin, and expression of the GP1B-IX-V complex on platelets was decreased. This is characteristic of Bernard-Soulier syndrome (BSS). Atypically, however, this patient had undergone several haemostatic challenges, including major surgery, without bleeding sequelae, and this atypical presentation prompted enrolment into the BRIDGE-BPD study. A homozygous in-frame insertion that has not previously been reported in the literature was detected. The effect of this short in-frame insertion on the function of GP1BB is uncertain. However, given the characteristic BSS phenotype seen in the patient, the variant was considered likely to be pathogenic. Family studies would be informative but have not been possible thus far.


4.3.2.2 Variants of unknown significance

A total of sixteen variants in twelve genes, found in fourteen probands, did not fulfil CPV or LPV criteria and remain variants of unknown significance (table 4.3). Their contribution to the BPD phenotype in each case is unclear. These fourteen pedigrees remain under analysis until either a pathogenic variant is found or further evidence emerges to support the pathogenicity of the VUS.

4.3.2.2.1 Interpretation of multiple variants in the same pathway

Eleven of the 64 unrelated cases with tier-1 gene variants (table 4.1) had variants in three or more different tier-1 genes. Four cases had a phenotype match with more than one tier-1 variant. One of these (#28) had rare missense variants in four tier-1 genes with a potential phenotype match (table 4.3): ITGA2B, RASGRP2, COL3A1 and FLI1. This case was a 20-year-old male with an existing diagnosis of vascular Ehlers-Danlos syndrome (vEDS) due to a known COL3A1 mutation. He had additionally been diagnosed with a platelet storage pool disorder at another hospital at the age of 11. He was enrolled into the BRIDGE-BPD study in an attempt to identify the cause of the platelet defect. Local tests showed normal platelet count and size, with impaired platelet aggregation to all agonists (ADP, epinephrine, collagen, ristocetin and arachidonic acid). ATP release in response to ADP, measured by lumi-aggregometry, was also reduced. The ITGA2B missense variant is predicted to result in a Leu830Pro substitution in the Calf-2 extracellular domain of ITGA2B. Glanzmann thrombasthenia (GT) is classically caused by homozygous or compound heterozygous mutations in ITGA2B or ITGB3 that lead to decreased expression and/or function of the αIIbβ3 receptor on platelets. Heterozygous variants have been shown to produce subtle variations in structure187, but their role in causing platelet phenotypes is less well defined188. Variants affecting Leu830 have been reported in patients with GT in both homozygous and heterozygous form, indicating that missense variants at this codon can affect protein function189. However, rare variants in ITGA2B and ITGB3 are relatively common in the general population and probably account for natural variation in ITGA2B expression and function188. The effect of heterozygous missense mutations on the bleeding phenotype is not yet known.


RASGRP2 encodes CalDAG-GEFI, a guanine nucleotide exchange factor that activates the small GTPase Rap1 and thereby mediates inside-out activation of αIIbβ3 integrin in platelets190. Homozygous variants in RASGRP2 have recently been identified as a cause of a platelet function disorder in humans191. The characteristic features are reduced platelet aggregation in response to ADP and collagen, variable responses to arachidonic acid and TRAP6, and impaired dense granule release192,193. No pathogenic heterozygous variants have yet been identified, and the significance of heterozygous RASGRP2 variants is unknown. The RASGRP2 missense variant in this patient predicts an arginine to glutamine change at residue 292 (Arg292Gln). This residue lies in the CDC25 catalytic domain of CalDAG-GEFI, which interacts directly with Rap1 and is essential for guanine nucleotide exchange factor activity194. Homozygous pathogenic variants have been reported at a nearby residue (C296Y195 and C296R193) in cases with a phenotype similar to that of this patient. A synergistic effect on αIIbβ3 signalling from defects in these two genes, which act in the same pathway, must be considered, but it cannot be proven without laboratory studies. Heterozygous missense variants in the ETS DNA-binding domain of FLI1 have been reported in association with macrothrombocytopenia and platelet dense granule defects196. The Pro268Leu substitution predicted in this patient lies outside this domain, and the effect of this missense variant is uncertain without further functional studies. The patient with these four variants has a platelet abnormality that could be compatible with a defect in αIIbβ3 signalling. Whether the heterozygous RASGRP2 and ITGA2B variants have a mild but cumulative effect cannot be determined without functional studies.
ITGA2B is a transcriptional target of FLI1. However, because this FLI1 variant lies outside the DNA-binding domain, it is unlikely to impair ITGA2B expression unless it causes a conformational change that affects DNA binding. These three variants therefore remain VUS. The COL3A1 variant is discussed in section 4.3.1 and is an LPV for vEDS; its contribution to bleeding in this patient remains uncertain.

4.3.2.2.2 Phenotypic heterogeneity in MYH9 variants

In total, five missense variants in MYH9 were identified in Hammersmith cases with a variety of platelet phenotypes (table 4.4). Variants in MYH9 are an established cause of MYH9-related disorder (MYH9-RD), the commonest cause of inherited thrombocytopenia. Patients typically have macrothrombocytopenia with neutrophil inclusions. However, additional platelet function defects and disorders of other organs, such as the kidney, ear, eye and liver, are variable, and the neutrophil inclusions (Döhle bodies) are not consistently detected197,198. In chapter 3 I described how a number of cases with MYH9-RD were unexpectedly identified in the BRIDGE-BPD collection, because MYH9-RD can be clinically difficult to distinguish from other causes of thrombocytopenia.

Table 4.4: MYH9 variants identified in Hammersmith cases. Variants are annotated with the number of unrelated non-BPD cases in the NIHR BioResource in which they were also seen, the MAF in the GnomAD database and the pathogenicity concluded from local and central assessment. MTCP: macrothrombocytopenia; PFD: platelet function defect; IMPV: increased mean platelet volume.

Case ID | Variant | Platelet phenotype | Identified in non-BPD cases | GnomAD frequency | Pathogenicity assessment
43 | p.Asp1424Asn | MTCP | - | 0 | CPV
38 | p.Gln1434His | MTCP; PFD | - | 1 in 240,000 | VUS
49 | p.Met1678Val | IMPV; PFD | - | 0 | VUS
19 | p.Ser1114Pro | PFD | ×5 | 1 in 3000 | Likely benign
6 | p.Asp1409Asn | PFD | ×15 | 1 in 3500 | Likely benign

All variants identified in local cases were located in the coiled-coil domain of MYH9. Three cases had large platelets, two of them with co-existent platelet function defects. Two cases had platelet function defects only, with normal platelet count and size. None of the cases had non-haematological features of MYH9-RD such as renal disease or deafness, and none was noted to have Döhle body inclusions on blood smear analysis. All five cases were assessed as previously described to assign pathogenicity. The variants in the latter two cases (p.Ser1114Pro and p.Asp1409Asn) are unlikely to be pathogenic because they were also identified in five and fifteen non-BPD cases in the NIHR BioResource, respectively. These two cases also had a normal platelet volume, whereas macrothrombocytes are a consistent feature of MYH9-RD197, so they were deemed not to have a phenotype match. The p.Asp1424Asn variant is discussed in section 4.3.1.1. It is clearly pathogenic, having been previously described in other cases with a similar phenotype. The two remaining cases both had large platelets, but their variants have not been previously described. Case #38 (p.Gln1434His) had macrothrombocytopenia and a co-existent platelet function defect. Case #49 (p.Met1678Val) had a normal platelet count but increased platelet volume, and also a co-existent platelet function defect. Unlike the variants in cases #19 and #6, these variants were not seen in any non-BPD cases, so they must be considered candidates. However, their significance in relation to platelet phenotypes is uncertain, so they remain VUS until more cases with the same variants are identified or experimental studies confirm their effect. The analysis of MYH9 variants highlights how even missense variants in the same domain, in cases with platelet phenotypes, require individual assessment.
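The reasoning applied to the five MYH9 variants can be summarised as a small decision rule. The sketch below is a simplified, hypothetical encoding of that reasoning for illustration only. The field names and the order of the checks are assumptions, and the blank non-BPD entries in table 4.4 are treated as zero. In practice, each variant was assessed individually at local and central MDT rather than by an automated rule. Applied to the data in table 4.4, the rule reproduces the assessments reached for these five cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Myh9Finding:
    case_id: str
    variant: str
    large_platelets: bool            # macrothrombocytopenia or increased MPV
    non_bpd_carriers: int            # unrelated non-BPD NIHR BioResource carriers
    reported_in_similar_cases: bool  # previously described with a similar phenotype


def classify(f: Myh9Finding) -> str:
    """Hypothetical, simplified version of the reasoning applied to MYH9 variants."""
    if f.non_bpd_carriers > 0:
        # Seen repeatedly in individuals without a BPD: unlikely to be causal.
        return "Likely benign"
    if not f.large_platelets:
        # Macrothrombocytes are a consistent feature of MYH9-RD.
        return "No phenotype match"
    if f.reported_in_similar_cases:
        return "CPV"
    return "VUS"


# Values transcribed from table 4.4 (blank non-BPD entries assumed to be zero).
TABLE_4_4 = [
    Myh9Finding("43", "p.Asp1424Asn", True, 0, True),
    Myh9Finding("38", "p.Gln1434His", True, 0, False),
    Myh9Finding("49", "p.Met1678Val", True, 0, False),
    Myh9Finding("19", "p.Ser1114Pro", False, 5, False),
    Myh9Finding("6", "p.Asp1409Asn", False, 15, False),
]

if __name__ == "__main__":
    for finding in TABLE_4_4:
        print(finding.case_id, finding.variant, classify(finding))
```

The control-population check comes first because repeated observation in individuals without a BPD is the strongest evidence against causality here. The phenotype check then separates candidates from variants without a phenotype match.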

4.4 Genotyping of the BRIDGE-BPD collection

Enrolment into the BRIDGE-BPD study and phenotyping were described in chapter 3. Of the 2214 individuals enrolled into BRIDGE-BPD worldwide, 1859 affected cases (84%) have been sequenced on a high-throughput sequencing platform and have had their phenotype data submitted to the study database. Of these, 1602 are unrelated probands. 689 cases have had exome sequencing only; the majority have had their whole genome sequenced. Central analysis of the entire collection was performed in parallel with the analysis of local cases described earlier in this chapter. Methods for analysis are detailed in chapter 2, sections 2.3 and 2.4. The aims of this analysis were novel gene discovery and identification of pathogenic variants in tier-1 BPD genes.

4.4.1 Variants identified in Tier-1 genes in the BRIDGE-BPD collection

Variants in tier-1 genes were prioritised for assessment of pathogenicity at the MDT using the criteria described. Briefly, variants were considered potential candidates if they were predicted to have a moderate or high impact on the protein and had an allele frequency of <0.1% in reference databases (ExAC/GnomAD/UK10K/1000 Genomes). Variants were also prioritised if they were listed in HGMD, with a more relaxed allele frequency threshold of <2.5% in the same reference datasets. Candidate variants were then assessed in the same manner as the variants in local cases described above. Each variant was reviewed at the central MDT, and the phenotype in each case was compared with the phenotypes of cases reported in the peer-reviewed literature. GnomAD frequency, presence in other cases in the NIHR BioResource, published reports and results from pathogenicity prediction tools were all considered alongside the predicted impact of the variant on the protein and established mechanisms of disease.
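The prioritisation criteria above amount to two alternative filters. The minimal sketch below assumes VEP-style impact classes (HIGH, MODERATE, LOW, MODIFIER). It also assumes that the allele frequency threshold is applied to the highest frequency observed across the reference datasets. The `Variant` fields, the function name and the example gene set are hypothetical, and this is not the pipeline described in chapter 2.

```python
from dataclasses import dataclass

RARE_AF = 0.001   # <0.1% for variants not listed in HGMD
HGMD_AF = 0.025   # relaxed <2.5% for HGMD-listed variants
QUALIFYING_IMPACTS = {"MODERATE", "HIGH"}


@dataclass(frozen=True)
class Variant:
    gene: str
    protein_change: str
    impact: str        # assumed VEP-style class: HIGH/MODERATE/LOW/MODIFIER
    max_ref_af: float  # highest AF across ExAC/GnomAD/UK10K/1000 Genomes (assumption)
    in_hgmd: bool


def is_candidate(v: Variant, tier1_genes: frozenset) -> bool:
    """Hypothetical encoding of the tier-1 prioritisation criteria."""
    if v.gene not in tier1_genes:
        return False
    if v.in_hgmd and v.max_ref_af < HGMD_AF:
        return True
    return v.impact in QUALIFYING_IMPACTS and v.max_ref_af < RARE_AF


# Small illustrative subset of tier-1 genes, not the full list.
TIER1_EXAMPLE = frozenset({"GP1BB", "CYCS", "MYH9", "RUNX1"})

if __name__ == "__main__":
    examples = [
        Variant("GP1BB", "p.Leu132Gln", "MODERATE", 0.0, False),           # absent from GnomAD
        Variant("CYCS", "hypothetical missense", "MODERATE", 0.004, False),  # too common
        Variant("RUNX1", "hypothetical HGMD entry", "LOW", 0.01, True),      # HGMD route
    ]
    for v in examples:
        print(v.gene, v.protein_change, is_candidate(v, TIER1_EXAMPLE))
```

The HGMD check is evaluated first because it is the more permissive route. A listed variant with a frequency below 2.5% is passed to review regardless of its predicted impact class.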


At the time of writing, 204 cases (13% of all sequenced cases) had had a clearly pathogenic or likely pathogenic variant confirmed at the central MDT and a research report sent to the referring clinician. Of these diagnoses, 88% were made in genes associated with platelet disorders, 10% in coagulation genes and 1% in genes associated with thrombotic disorders (figure 4.4). Two cases were found to have pathogenic variants in the collagen genes COL1A1 and COL5A1. These are categorised as 'other' because they have no well-defined mechanism of impact on either platelet function or coagulation proteins, although they are associated with bleeding.

Figure 4.4: 210 pathogenic tier-1 BPD gene variants (CPV and LPV) identified in 1602 cases sequenced in the BRIDGE-BPD study. Genes identified as BPD genes since the start of the BRIDGE project are marked with an asterisk *. The number of cases identified with pathogenic variants in each gene is written in brackets

Variants in ACTN1, MYH9, ANKRD26 and RUNX1 together accounted for 45% of all pathogenic variants, all of them in cases with thrombocytopenia. Thirteen of the genes (marked with * in figure 4.4) were not associated with inherited platelet disorders before the start of the BRIDGE-BPD study and have been added to the tier-1 gene list as evidence has accrued. At the time of writing, the BRIDGE consortium had identified pathogenic variants in these recently discovered genes in 49 cases in total. Five of these genes were identified and first reported as novel BPD genes by the BRIDGE consortium41,199-201.

4.4.2 Replication studies in the BRIDGE-BPD cohort validate CYCS as a BPD gene

4.4.2.1 Introduction to CYCS

The CYCS gene encodes human cytochrome c, a highly conserved protein of 105 amino acids. Cytochrome c is a haem protein that sits on the inner mitochondrial membrane, bound to cardiolipin in the resting state. It has two main functions: first, as an electron carrier in the mitochondrial respiratory chain, and second, as a critical element of the intrinsic apoptosis pathway202. When cytochrome c binds to Apaf-1, it triggers formation of the apoptosome with subsequent caspase activation203. CYCS was first reported in association with inherited autosomal dominant thrombocytopenia in a large New Zealand pedigree in 2008204. Affected members of the six-generation pedigree carried a p.Gly42Ser variant in CYCS and had thrombocytopenia with no associated haematological or non-haematological disorders (Thrombocytopenia 4: OMIM #612004). The G42S variant was predicted to alter the conformation of cytochrome c, particularly affecting its interactions with haem205,206. This variant was also shown to increase the apoptotic activity of cytochrome c and to decrease the mitochondrial respiratory rate204,207. However, platelets from patients appear to form normally and platelet life span was not reduced, so enhanced apoptosis has not provided an explanation for the thrombocytopenia204,208. Two further CYCS variants were subsequently reported in cases with non-syndromic thrombocytopenia: p.Tyr49His207 and p.Ala52Val209. Both variants are located near G42S, and both increase caspase activation and cytochrome c peroxidase activity.


Figure 4.5: CYCS expression in haemopoietic cells. CYCS is highly expressed in haemopoietic cells, with the exception of neutrophils. Peaks are seen in macrophages and platelets (adult blood, top row). Data generated by the Blueprint epigenome consortium210.

4.4.2.2 Identification of CYCS variant in Hammersmith proband

A missense variant (g.7:25163343 G>C) predicting a Leu99Val substitution in CYCS was identified in a female case with congenital thrombocytopenia enrolled at Hammersmith Hospital. Thrombocytopenia had been diagnosed in infancy, and the platelet count had fluctuated between 40 and 70×10⁹/L. Platelet morphology was normal, and PFA-100 closure times were normal with both collagen/epinephrine and collagen/ADP cartridges. There was no history of abnormal bleeding or other medical disorders, and the condensed MCMDM-1-VWD bleeding score was zero. At enrolment, a limited family history was obtained. It suggested that the proband's father might also have thrombocytopenia, consistent with possible autosomal dominant inheritance, but no other relatives were known to be affected. This variant is not present in any of the control datasets and has not previously been described in the literature.

4.4.2.3 CYCS variants in BRIDGE-BPD and Thrombogenomics collections

The BPD collection was scrutinised, and six rare missense variants in CYCS were identified in a further seven BPD probands. Phenotype information was sought from all pedigrees and is detailed in table 4.5. Two cases were found to harbour the same Leu99Val variant present in the Hammersmith pedigree. Sequencing data indicated that one of these cases was related to the Hammersmith proband, and this was confirmed by the 'probands' of the two halves of the pedigree. Sanger sequencing of further relatives confirmed co-segregation of the CYCS variant with thrombocytopenia in this pedigree (fig 4.6). Analysis of the coding variants present in both cases III/4 and IV/1 did not reveal any alternative candidates for the thrombocytopenia.

Figure 4.6: Pedigree HH23E carrying the p.Leu99Val variant. The family members with thrombocytopenia are marked in red. Those marked with a '?' have unconfirmed phenotypes. Where individuals have had WES, WGS or Sanger sequencing, their genotype (reference/alternate or reference/reference alleles) is labelled.

Relatives in all families were recalled for co-segregation studies where possible. CYCS variants were associated with non-syndromic thrombocytopenia in six probands (table 4.5). All cases were reported to have normal platelet morphology except two (RHM and RP4), who were noted to have some large platelets on the blood smear. Platelet function testing was not performed in most cases because of the thrombocytopenia, and abnormal bleeding was reported in only two cases from one pedigree (BRI). Abnormalities of platelet function were recorded in one case with thrombocytopenia and a double-granule defect (case IPD1) and in one non-thrombocytopenic case (IPD2), as detailed in table 4.5. Non-platelet phenotypes were present only in pedigree IPD2. In all pedigrees, thrombocytopenia appeared to be highly penetrant and to follow an autosomal dominant inheritance pattern. All thrombocytopenic probands and relatives tested so far carry the CYCS variant (figure 4.7). Two cases in BPD pedigree IPD2 did not have thrombocytopenia (table 4.5). Both had a platelet function defect and a neurological disorder, and both phenotypes were explained by a pathogenic variant in another gene.


Table 4.5: CYCS variants identified in BRIDGE-BPD cases and their respective phenotypes. Phenotype abbreviations have been previously defined (TCP: thrombocytopenia, MTCP: macrothrombocytopenia, PFD: platelet function defect, PFD+: platelet function defect with additional evidence of granule defect, UBD: undefined bleeding disorder, Neuro: neurological disorder). SIFT analysis: O Deleterious; O Deleterious low confidence; O Tolerated. PolyPhen analysis: O Probably damaging; O Possibly damaging; O Benign.

Pedigree ID | Case | Variant | Plt count (×10⁹/L) | Phenotype | GnomAD MAF | CADD | SIFT | PolyPhen | Supporting evidence | Pathogenicity assessment
IPD1 | I | p.K26E | 89 | TCP, PFD+ | - | 21 | O | O | - | VUS
RA7 | I | p.R39W | 119 | TCP, bleeding | 4.06×10⁻⁶ | 23 | O | O | R39Q also in ExAC | VUS
BRI | IV/4 | p.G42S | 101 | TCP, bleeding | - | 32 | O | O | Reported in literature | LPV
BRI | III/3 | | 103 | TCP, bleeding | | | | | |
IPD2 | I | p.N53S | 228 | PFD, neuro | 2.84×10⁻⁵ | 16.17 | O | O | N53D also in ExAC | Likely benign
IPD2 | II | | 243 | PFD, neuro | | | | | | Alternative explanation
RP4 | I/1 | p.L65V | 64 | MTCP | - | 25 | O | O | Seen in NIHR BioResource case with TCP | VUS
RP4 | II/1 | | 44 | MTCP | | | | | |
RHM | IV/5 | p.L99V | 44 | TCP, bleeding | - | 25.6 | O | O | Present in HH23E | LPV
RHM | V/1 | | 54 | TCP | | | | | |
RHM | V/2 | | 42 | TCP | | | | | |
HH23E | III/4 | p.L99V | 57 | TCP | - | 25.6 | O | O | Present in RHM | LPV
HH23E | IV/1 | | 66 | TCP | | | | | |
HH23E | III/2 | | - | TCP | | | | | |
HH23E | IV/2 | | 88 | TCP | | | | | |

Figure 4.7: Four pedigrees with autosomal dominant thrombocytopenia who carry CYCS variants. Pedigrees are labelled with the amino acid substitution and the pedigree ID. Cases marked in red have low platelet counts. Those in white have normal platelet counts, and those marked with a '?' have unknown platelet counts. Probands are marked with a black arrow. Where individuals have had WES, WGS or Sanger sequencing, their genotype (reference/alternate or reference/reference alleles) is labelled.

A search for CYCS variants across the NIHR BioResource (approximately 12,000 additional cases) identified five missense variants and one splice donor variant in nine unrelated non-BPD cases (table 4.6). Platelet count was available in only three cases. One case carrying a Lys74Asn variant was not thrombocytopenic (platelets >300×10⁹/L). Two related cases in another (non-BPD) BRIDGE study were found to harbour the same missense variant seen in BPD case RP4 (Leu65Val). Under the ethical agreement of the study, additional phenotype information cannot be requested for cases enrolled in other NIHR BioResource studies beyond the information already collected during recruitment. These cases had been enrolled into the BRIDGE-Stem cell and myeloid disorders study, so full blood count data were available. Both members of this pedigree (SMD) who had been enrolled in the study were thrombocytopenic, and the proband had several recorded platelet counts ranging from 32 to 89×10⁹/L.

Table 4.6: CYCS variants in non-BPD cases in the NIHR BioResource. Variants identified in non-BPD NIHR BioResource cases are listed. The presence of thrombocytopenia (TCP) is indicated if known (nk: not known). The number of unrelated cases carrying the same variant is given. Deleteriousness predictions using the SIFT and PolyPhen2 prediction tools are shown alongside the CADD phred score. SIFT analysis: O Deleterious; O Deleterious low confidence; O Tolerated. PolyPhen analysis: O Probably damaging; O Possibly damaging; O Benign.

Variant | TCP | Number of unrelated cases who carry the same variant | GnomAD MAF | CADD | SIFT | PolyPhen
p.Thr29Ala | nk | 1 non-BPD | 4.06×10⁻⁶ | 15.32 | O | O
p.Ser48Ala | nk | 1 non-BPD | 2.44×10⁻⁵ | 5.82 | O | O
p.Ile58Val | nk | 4 non-BPD; 1 unaffected relative of a proband in BPD | 2.43×10⁻⁵ | 12.33 | O | O
p.Leu65Val | y | 1 non-BPD; 1 BPD pedigree (RP4) | 0 | 25 | O | O
p.Lys74Asn | n | 1 non-BPD | 0 | 27 | O | O
Splice site | nk | 1 non-BPD | 0 | 7 | na | na

These data show that variants in CYCS are seen more frequently in BPD cases than in non-BPD cases in the NIHR BioResource. Variants in non-thrombocytopenic and non-BPD cases tend to have consistently lower pathogenicity scores across all three methods used and to be present in GnomAD. Most BPD cases with CYCS variants have thrombocytopenia, irrespective of variant pathogenicity status, and their variants have more consistently high pathogenicity scores. There are two exceptions. The first is BPD pedigree RA7, which carries the R39W variant. This case is thrombocytopenic, but this variant and other variants affecting the same amino acid are present in GnomAD, albeit at <1 in 10,000. The CADD, SIFT and PolyPhen2 assessments also disagree for this variant. Co-segregation and functional studies for this variant would be informative. The second is the Lys74Asn variant seen in a non-BPD case in the NIHR BioResource. It is absent from GnomAD, scores highly in CADD and is deleterious in SIFT, but it is benign in PolyPhen2. This individual is known to have a normal platelet count, so the variant is clearly not causing thrombocytopenia in this case. The mechanism by which CYCS variants cause thrombocytopenia is still unknown, so definitive assessments of pathogenicity cannot be made for the novel missense variants identified in the BRIDGE-BPD study. However, L65V and L99V are each seen in two pedigrees with thrombocytopenia, are absent from control datasets and score highly on all three pathogenicity assessments, so they have been labelled likely pathogenic.
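The comparison above depends on whether CADD, SIFT and PolyPhen2 agree, and a simple concordance check makes that explicit. The CADD phred cut-off of 20 used below is an assumption, because the text does not state a threshold. The sketch also treats PolyPhen2 'possibly damaging' and 'probably damaging' as damaging, and all names are hypothetical. The two example inputs use the calls stated in the text for Lys74Asn (CADD 27, SIFT deleterious, PolyPhen2 benign) and for Leu99Val (CADD 25.6, damaging on all three tools).

```python
from dataclasses import dataclass

CADD_PHRED_CUTOFF = 20.0  # assumed threshold, not taken from the study


@dataclass(frozen=True)
class ToolCalls:
    variant: str
    cadd_phred: float
    sift_deleterious: bool
    polyphen_damaging: bool  # possibly or probably damaging


def damaging_votes(calls: ToolCalls, cutoff: float = CADD_PHRED_CUTOFF) -> int:
    """Number of the three tools that call the variant damaging."""
    return sum([calls.cadd_phred >= cutoff, calls.sift_deleterious, calls.polyphen_damaging])


def all_tools_agree_damaging(calls: ToolCalls, cutoff: float = CADD_PHRED_CUTOFF) -> bool:
    return damaging_votes(calls, cutoff) == 3


lys74asn = ToolCalls("p.Lys74Asn", 27.0, sift_deleterious=True, polyphen_damaging=False)
leu99val = ToolCalls("p.Leu99Val", 25.6, sift_deleterious=True, polyphen_damaging=True)

if __name__ == "__main__":
    for calls in (lys74asn, leu99val):
        print(calls.variant, damaging_votes(calls), all_tools_agree_damaging(calls))
```

Lys74Asn receives two of three damaging calls, which shows that prediction scores alone cannot settle pathogenicity. What excludes it as a cause of thrombocytopenia is the normal platelet count of the carrier.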

4.4.2.4 Statistical association of CYCS with thrombocytopenia (BeviMed)

The BeviMed method developed by Dr Daniel Greene211 was applied to test for an association between rare missense CYCS variants and the presence of thrombocytopenia in BPD cases. HPO phenotype codes were used to identify cases with the indicator phenotype of straightforward thrombocytopenia. This was defined by the presence of HPO term HP:0001873 (thrombocytopenia) and the absence of HP:0011869 (abnormal platelet function). A total of 235 cases in 192 pedigrees were identified with straightforward thrombocytopenia. The probability of association of CYCS missense variants with this indicator phenotype was 0.907, and CYCS ranked 9th of all genes for association with straightforward thrombocytopenia (table 4.7). The top eight genes are all established causes of inherited thrombocytopenia, which supports CYCS being a true thrombocytopenia gene.
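The indicator phenotype is a simple Boolean rule over each case's HPO terms, and counting the cases and pedigrees that satisfy it is straightforward. The record layout and identifiers in the sketch below are hypothetical, and the example cases are invented for illustration. The sketch matches the two HPO terms exactly, whereas a real query would usually also count descendant terms of HP:0001873 by walking the ontology. It prepares input for BeviMed but does not reimplement BeviMed itself.

```python
from dataclasses import dataclass

THROMBOCYTOPENIA = "HP:0001873"
ABNORMAL_PLATELET_FUNCTION = "HP:0011869"


@dataclass(frozen=True)
class Case:
    case_id: str
    pedigree_id: str
    hpo_terms: frozenset


def straightforward_thrombocytopenia(case: Case) -> bool:
    """Indicator phenotype: thrombocytopenia without abnormal platelet function."""
    return THROMBOCYTOPENIA in case.hpo_terms and ABNORMAL_PLATELET_FUNCTION not in case.hpo_terms


def count_cases_and_pedigrees(cases) -> tuple:
    """Return (number of qualifying cases, number of distinct pedigrees among them)."""
    selected = [c for c in cases if straightforward_thrombocytopenia(c)]
    return len(selected), len({c.pedigree_id for c in selected})


# Invented example records, not BRIDGE-BPD data.
EXAMPLE_CASES = [
    Case("A1", "PED1", frozenset({THROMBOCYTOPENIA})),
    Case("A2", "PED1", frozenset({THROMBOCYTOPENIA})),
    Case("B1", "PED2", frozenset({THROMBOCYTOPENIA, ABNORMAL_PLATELET_FUNCTION})),
    Case("C1", "PED3", frozenset({THROMBOCYTOPENIA})),
    Case("D1", "PED4", frozenset({ABNORMAL_PLATELET_FUNCTION})),
]

if __name__ == "__main__":
    print(count_cases_and_pedigrees(EXAMPLE_CASES))  # (3, 2)
```

Related individuals collapse to a single pedigree in the second count, which is why the number of qualifying cases (235 in the study) exceeds the number of pedigrees (192).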


Table 4.7: Top-ranking genes associated with thrombocytopenia using BeviMed211

Gene | Probability of association with straightforward thrombocytopenia under a moderate dominant model
ANKRD26 | 1
MYH9 | 1
GP1BB | 1
RUNX1 | 1
ACTN1 | 1
GATA1 | 0.981
GP9 | 0.979
GP1BA | 0.946
CYCS | 0.907

4.5 Discussion

In this chapter I have described the analysis of genotyping data from locally enrolled cases at Hammersmith Hospital. This was performed in parallel to the central study-wide analysis and required the integration of data from an additional 12,000 non-BPD cases in the NIHR BioResource. The main outcome presented in this chapter is the identification of pathogenic variants in tier-1 BPD genes.

4.5.1 Assigning pathogenicity to variants in tier-1 BPD genes

The most striking finding from this analysis is that most rare, protein-altering variants identified in known BPD genes in patients with BPD phenotypes could not be classified as pathogenic. Even where the patient phenotype matched the phenotype expected for a defect in the corresponding gene, the majority of variants were labelled VUS. This highlights how difficult it can be to be confident that a variant is pathogenic, even in a gene known to be associated with a particular phenotype. It is a result of a necessarily cautious approach to assigning pathogenicity and is one limitation of the whole genome sequencing approach for BPD. Co-segregation studies and functional experimental data remain important for clarifying the significance of a VUS, but these are not always feasible. Sharing data from other WGS studies in this area will be necessary to identify additional cases and confirm the pathogenicity of some VUS.

The presence of multiple tier-1 VUS within single cases or pedigrees raises the question of whether these variants could interact synergistically as modifiers of a phenotype. Some heterozygous variants are themselves known to be pathogenic in homozygous form, or lie in genes associated with autosomal recessive disease. Such variants may have small effects on phenotype which are not detected clinically. It is also possible that intronic and regulatory defects on the other allele remain unidentified in these cases. There are a number of reports of heterozygous variants in ITGA2B, ITGB3, GP1BB and GP1BA causing mildly reduced platelet counts with or without mild bleeding177-179,182,212-215. These genes are classically associated with autosomal recessive Glanzmann thrombasthenia (OMIM #273800) and Bernard-Soulier syndrome (OMIM #231200). However, many variants which are pathogenic in homozygous form may have no phenotype in heterozygous form, as seen in obligate carriers of BSS-causing variants. Moreover, the phenotype associated with heterozygous variants is variable, and the mechanism by which they cause thrombocytopenia is not well defined215. The clinical relevance of this is illustrated by the Leu132Gln GP1BB variant identified in a Hammersmith case and described in section 4.3.2.1.2. This variant has been labelled LPV for the macrothrombocytopenia seen in the patient, but it is not known whether it also carries a risk of bleeding. The lack of bleeding symptoms to date in the patient, combined with the observation that reported monoallelic BSS cases have no or only mild bleeding symptoms, suggests that it does not. However, the identification of further cases and case series through WGS studies will enable a better understanding of genotype-phenotype relationships. Furthermore, co-segregation studies within the Hammersmith pedigree and/or identification of more unrelated cases with the same variant are essential to confirm pathogenicity.

This analysis has also highlighted some of the limitations of commonly used methods for assessing the pathogenicity of variants. CADD, SIFT and Polyphen2 are all in silico methods for assessing the likely deleteriousness of a variant, yet there was little correlation between their scores. Individual CPV, LPV and VUS shown here may be classed as likely deleterious by one score and likely benign by another. Even if all scoring systems suggest deleteriousness, this is not sufficient evidence to reassign a VUS as an LPV, so the utility of these methods in variant assessment is limited.

It is also important not to rely absolutely on the presence or absence of a variant in the GnomAD database. Samples from people with severe paediatric disease are excluded from the ExAC/GnomAD databases, but we can expect that some individuals with genetic disorders will have been included. Bleeding and platelet disorders can be relatively mild and may go undetected or unreported by patients. It is therefore reasonable to expect that some variants causing BPD will be present in ExAC and GnomAD, albeit rarely. Indeed, we have VUS which are absent from GnomAD and pathogenic variants which are present in the database.

The intronic F8 variant discovered in case #26 enrolled at Hammersmith illustrates that, even when a variant is listed in HGMD as DM or DM?, it must be assessed on its own merits and pathogenicity must not be presumed. This variant is labelled as DM in HGMD, but it occurs in 1 in 1141 cases in GnomAD. This is clearly too frequent to cause haemophilia (incidence 1 in 5000 male births). The variant was originally labelled CPV and subsequently downgraded to VUS. The GnomAD frequency data would not have been available when the case was originally reported. This underlines the need to continually re-assess variants in the light of new data. Large-scale and disease-specific sequencing projects such as ExAC/GnomAD, the 100,000 Genomes Project and Thrombogenomics are providing increasingly accurate information about the population frequencies of variants. They are also revealing that many variants previously labelled as disease-causing occur too frequently in the general population to be pathogenic78,111.
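The frequency argument against the F8 variant can be written as a simple ceiling check. For an X-linked variant, the allele frequency approximates the frequency of hemizygous carriers among males. With full penetrance, it therefore approximates the frequency of affected males that the variant alone would cause, and this cannot exceed the incidence of the disease as a whole. The Python sketch below rests on two assumptions: the GnomAD figure of 1 in 1141 can be treated as a male allele frequency, and penetrance is complete. Reduced penetrance would weaken the argument. The function names are hypothetical.

```python
def implied_affected_male_frequency(allele_freq: float, penetrance: float = 1.0) -> float:
    """Expected frequency of affected males attributable to one X-linked variant.

    Assumes the allele frequency approximates the hemizygous carrier frequency in males.
    """
    return allele_freq * penetrance


def too_common_for_disease(allele_freq: float, incidence_in_males: float,
                           penetrance: float = 1.0) -> bool:
    """True if this variant alone would explain more affected males than are observed."""
    return implied_affected_male_frequency(allele_freq, penetrance) > incidence_in_males


if __name__ == "__main__":
    af = 1 / 1141          # GnomAD frequency quoted for the intronic F8 variant
    incidence = 1 / 5000   # haemophilia incidence in male births, as quoted in the text
    print(too_common_for_disease(af, incidence))  # True
    print(round(af / incidence, 2))               # 4.38
```

Under these assumptions, the variant on its own would account for roughly four times more affected males than the total observed incidence of haemophilia.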

Variants must be assessed in the context of their clinical phenotype. HGMD-listed F8, VWF and SERPINC1 variants were all identified in Hammersmith BPD cases where the clinical phenotype did not correlate with a defect in these genes. This highlights the importance of either clinician involvement, which may not be feasible in the analysis of large-scale sequencing projects, or accurate and comprehensive phenotype information being available when pathogenicity is assigned.


4.5.2 The utility of high-throughput sequencing in bleeding and platelet disorders

4.5.2.1 Diagnosis

The application of HTS to patients with BPD of unknown cause has resulted in the identification of pathogenic variants in BPD genes in 13% of cases, both among those enrolled locally and across the BRIDGE-BPD study. Pathogenic variants provide a diagnosis for the patients and relatives who harbour them. This represents a significant step forward for this group of patients, who previously rarely received a specific genetic diagnosis. However, known causes of BPD were specifically excluded from entry to the study, so at the outset we were not expecting so many of these diagnoses. The presence of these variants in BRIDGE-BPD cases can be explained by 3 factors. Firstly, the traditional diagnostic pathway for patients with BPD is unsatisfactory. The routinely available investigations frequently do not point to a defect in a specific pathway or gene. Even where a specific genetic defect is suspected, individual gene testing is often available only in specialist laboratories or may not exist at all (for example ACTN1, ANKRD26), so specific gene defects are either not suspected or remain unconfirmed. Secondly, it may not be possible to identify a single-gene disorder clinically. For example, inherited thrombocytopenias may be caused by several different genes which are clinically indistinguishable, and until now it has not been possible to sequence them all. Finally, several of the tier-1 BPD genes have only been discovered since the start of this study, and these diagnoses would not have been possible or even suspected prior to 2012. A specific genetic diagnosis is valuable in BPD because it can inform appropriate care planning, follow-up and family counselling. This is highlighted by the thrombocytopenic cases. More than 20 families with autosomal dominant thrombocytopenia have ACTN1-related thrombocytopenia, which is a benign, relatively mild, stable thrombocytopenia not associated with other pathologies.
Confirming this diagnosis is reassuring for families and avoids unnecessary investigations and therapies. Some inherited thrombocytopenias are associated with non-haematological phenotypes, for example MYH9-related disorders and DIAPH1-related thrombocytopenia and hearing loss. Others carry a risk of haematological malignancy, for example ANKRD26, ETV6 and RUNX1-related thrombocytopenias. High-throughput sequencing through the BRIDGE study has enabled confirmed diagnoses of these disorders in more than 60 patients, enabling appropriate family counselling, risk stratification and follow-up. In one case of KDSR-related thrombocytopenia with early-onset myelofibrosis, cytopenias in this disorder improve with age. Identifying the disorder therefore meant that the affected child was able to avoid a planned bone marrow transplant (Bariana et al, submitted).

4.5.2.2 Validation through replications

Diagnosis of known BPD was not the primary aim of the BRIDGE project. Nevertheless, by applying high-throughput sequencing to the largest ever collection of bleeding and platelet disorder cases, we have identified pathogenic variants in several BPD genes recently discovered by other groups. These replications are important for two reasons. Firstly, they validate very rare disorders for which there may be only one or two previously reported unrelated cases. For example, the work presented here provides further evidence for CYCS as a BPD gene. An ongoing collaboration with Prof Elizabeth Ledgerwood and Dr Ian Morison at the University of Otago, NZ, is expressing the CYCS variants identified here. It aims to confirm the pathogenicity of some of these variants and to understand the mechanism by which they cause thrombocytopenia. Secondly, replications in BRIDGE-BPD cases are improving genotype-phenotype correlations in recently described and established BPDs. Detailed genotype-phenotype descriptions have recently been published for ACTN1216 and RASGRP2193. Manuscripts are in preparation for MYH9 and NBEAL2, with 3 pedigrees enrolled at Hammersmith contributing to the MYH9 collection. In the next chapter I will demonstrate how a Hammersmith pedigree has been central to the assessment of rare variants in GFI1B. I will also show how analysis of the NIHR BioResource is improving understanding of a new BPD and expanding the repertoire of GFI1B variants associated with disease.

4.6 Further work

Assessment of the tier-1 BPD gene variants identified in the BRIDGE-BPD collection is ongoing. Cases in whom no pathogenic variant has been identified continue to be included in the search for novel BPD genes. For cases in whom a VUS has been identified, further evidence will be sought for its pathogenicity. For Hammersmith cases we will aim to perform co-segregation studies in relatives where feasible. As we become better able to interpret the regulatory regions of the genome, we will be able to scrutinise the other allele in many cases with heterozygous variants. This may reveal non-coding variants that were not identified in this round of analysis. For the CYCS variants, an ongoing collaboration with a team at the University of Otago in New Zealand will help to confirm the pathogenicity of some of these variants. Expressing the mutant proteins and assessing their effect on CYCS function will help to establish whether or not the variants are benign. Identifying the mechanism by which CYCS variants cause thrombocytopenia will be more challenging. Ideally, all variants should be continually re-assessed in the light of emerging evidence. More published cases could upgrade VUS to pathogenic variants and LPV to CPV, and could also downgrade pathogenic variants to VUS or benign. In practice, continual re-assessment may not be feasible and will certainly take a long time.


Assigning pathogenicity to variants in GFI1B

The identification of rare variants in tier-1 BPD genes has been described in the previous chapter, including a missense variant (9:135864478 C/T) predicting a His181Tyr substitution in GFI1B in a male patient with a historical diagnosis of storage pool disorder (pedigree #17). In this chapter I will discuss this variant in more detail and in doing so highlight some of the general challenges faced in assigning pathogenicity to variants, even in a known BPD gene.

5.1 Structure and function of GFI1B

GFI1B (growth factor independent 1B), located on 9q34.13, contains 11 exons and encodes a 330-amino-acid protein. This protein functions as a transcriptional repressor essential for normal megakaryopoiesis. GFI1B contains a SNAG domain (Snai1/Gfi1) at its N terminus and 6 highly conserved C2H2 zinc-finger domains at the C-terminal end217 (fig 5.1). Zinc fingers 3 to 5 are essential for DNA binding, but the function of zinc fingers 1, 2 and 6 is uncertain. The SNAG domain is responsible for nuclear localisation. It also recruits histone deacetylases and other corepressors to the promoters of GFI1B target genes, modifying their chromatin structure and repressing transcription218. Murine studies have confirmed that GFI1B is essential for megakaryopoiesis and erythropoiesis, and complete knockout of GFI1B is embryonically lethal219,220. Alternative splicing can produce a second, shorter 284-amino-acid isoform (GFI1B-S) which lacks the first and second zinc fingers. GFI1B-S has been implicated in, but is not critical for, erythropoiesis221,222. GFI1B was recently identified as a BPD gene in humans (platelet-type bleeding disorder 17, OMIM #187900). Two groups simultaneously described dominant gain-of-function mutations in GFI1B causing a gray-platelet-like syndrome (GPS-like)223,224. Since then, several more variants have been described in association with a variety of platelet defects (Table 5.1; Fig 5.1).


Figure 5.1: Schematic representation of GFI1B protein structure. Truncating and missense variants reported in the literature in association with platelet disorders are marked below the diagram: heterozygous (dominant) variants in green, homozygous (recessive) variants in blue. Zinc-finger domains important for DNA binding are highlighted in green. The location of the His181Tyr amino acid change seen in the Hammersmith case discussed in this chapter is indicated in red above the diagram.

SNAG=Snai1/Gfi1 domain, ZNF=zinc-finger domains.

5.2 Previously reported pathogenic variants in GFI1B

Two groups simultaneously described dominant gain-of-function mutations in GFI1B causing a gray-platelet-like syndrome (GPS-like). This was at around the same time that GFI1B variants were first identified in the BRIDGE-BPD collection224-226. The original reports described an autosomal dominant platelet disorder characterised by macrothrombocytopenia, platelet dysfunction and reduced platelet alpha-granule content. One pedigree also had red cell abnormalities. Both initially reported pedigrees had truncating mutations in GFI1B with a dominant-negative effect on the wild-type protein224,225. In the first pedigree, a single nucleotide insertion caused a frameshift (294fs) in the 5th zinc-finger domain (fig 5.1). This was predicted to destabilise the structure of the DNA-binding domain and was shown to alter the transcriptional activity of wild-type GFI1B224. The second variant introduced a premature stop codon (Gln287*), also in the 5th zinc-finger domain. This led to a truncated protein lacking the C-terminal 44 amino acids, which inhibited the function of the wild-type protein225. Since then, 2 additional nonsense or frameshift variants in zinc fingers 4 and 5 have been identified in pedigrees that also have thrombocytopenia and alpha-granule defects227,228. Dense granule abnormalities have also been noted in some cases228.

Schulze et al report a homozygous frameshift variant in the 1st ZNF (Ser185Leufs*3). This leads to the formation of a truncated GFI1B protein missing the first two zinc fingers, similar to the short GFI1B isoform produced by alternative splicing. It was associated with autosomal recessive macrothrombocytopenia and a mixed alpha-delta granule platelet defect222.

Three splice variants in GFI1B have been reported. The first was identified in another whole exome sequencing study of bleeding and platelet disorders, in 2 members of the same family with macrothrombocytopenia209. This variant, c.814+1G>A, is located at the end of exon 4 and is predicted to affect splicing, but no mechanism for causing macrothrombocytopenia has been elucidated. Rabbolini et al reported a second splice variant in another pedigree with macrothrombocytopenia without bleeding or platelet dysfunction. The family also carried an MYH9 variant which co-segregated with the thrombocytopenia and was labelled as the cause of that phenotype, rather than the GFI1B variant. In this pedigree, the GFI1B splice variant was labelled only as the cause of CD34 expression on the platelets229. Polfus et al identified a third rare splice variant (MAF 0.009) in a genome-wide association study, associated with a mild reduction in platelet count but not affecting other platelet traits. This synonymous variant, rs150813342, is located in exon 5, which encodes ZNF 1 and 2. It is predicted to result in alternative splicing, producing a short isoform of GFI1B missing zinc fingers 1 and 2. This is similar to the case reported by Schulze et al, but with a much milder phenotype230. GFI1B represses the CD34 promoter, and several groups have noted that cases with GFI1B variants show increased CD34 expression on the platelet surface225,227-229. Indeed, CD34+ platelets were the only phenotype attributed to the GFI1B splice variant seen by Rabbolini et al229.

Missense variants in GFI1B have also been reported. A homozygous missense variant in the 6th ZNF (Leu308Pro) was identified in a pedigree with macrothrombocytopenia, bleeding, an alpha-delta granule defect and CD34 expression on platelets. However, the mechanism by which this variant is pathogenic has not been elucidated228. Rabbolini et al also report a Cys168Phe substitution in the first zinc finger in association with macrothrombocytopenia but no bleeding or granule abnormality (Fig 5.1 and Table 5.1). This variant is predicted to alter the conformation of the 1st ZNF. It was shown to affect GFI1B transcriptional repression, but to a lesser degree than the truncating mutations discussed above229.


In summary, truncating GFI1B mutations affecting the DNA-binding capacity of GFI1B appear to inhibit the repressive activity of the wild-type protein in a dominant-negative manner, but this mechanism has not yet been confirmed for missense GFI1B variants.


Table 5.1: Previously reported variants in GFI1B. Patient phenotypes are given as described in the respective publications. They include the presence or absence of bleeding and thrombocytopenia; platelet size; whether red blood cell morphology is normal or abnormal; the number of alpha and dense platelet granules; and the expression of CD34 on the platelet surface. TCP=thrombocytopenia; RBC=abnormal red blood cell morphology; na=results not available.

Variant | Variant effect | Bleeding | TCP | Large platelets | RBC | Platelet function abnormal | α granules | Dense granules | CD34+ platelets | Ref
c.859C>T | p.Gln287* | y | y | y | na | na | reduced | na | y | 225
c.880-881insC | p.His294fs | y | y | y | y | y | reduced | na | na | 224
c.814+1G>C | p.Gly272fs | y | y | y |  | y | reduced | na | y | 227
c.814+1G>A | splice variant | y | y | n | na | y | na | na | na | 209
c.551insG (hom) | p.Ser185fs | y | y | y | na | y | reduced | reduced | y | 222
c.793A>T | p.Lys265* | y | y | n | y | na | reduced | reduced | na | 228
c.293T>C (hom) | p.Leu308Pro | y | y | y | na | na | reduced | reduced | y | 228
c.503G>T | p.Cys168Phe | n | y | y | na | n | normal | na | y | 229
c.2520+1_2520+8delGTGGGCAC | splice variant | n | n | n | na | n | na | na | y | 229


5.3 GFI1B variant identified in a case enrolled at Hammersmith Hospital

The proband of this family (pedigree #17 II/3) was 66 years old when enrolled into the BRIDGE-BPD study. He had been diagnosed with a platelet storage pool disorder several years previously. He described severe bleeding symptoms starting in childhood. These included life-threatening bleeding after a tonsillectomy, bleeding for several days after dental extractions and bleeding requiring blood transfusion after surgery. He did not report a history of spontaneous bleeding but bruised excessively after trauma. His condensed MCMDM-1 bleeding score was 11 at enrolment. His platelet count was variable and towards the lower end of the normal range (around 150 ×10⁹/L), mean platelet volume was raised and large, agranular platelets were noted on the blood smear (table 5.2, fig 5.4). Platelet light transmission aggregometry demonstrated several abnormalities compared to control:
• impaired aggregation to ADP, with disaggregation at lower concentrations and no secondary wave;
• impaired aggregation in response to epinephrine and collagen;
• reduced agglutination to ristocetin.
Nucleotide assays showed low ADP content, and the ATP:ADP ratio was elevated at 2.38 (ref 1-2.2). Family history suggested a highly penetrant disorder with autosomal dominant inheritance (fig 5.2).


Figure 5.2: Pedigree #17 from the BRIDGE-BPD collection. The black arrow points to the proband. Family members with a history of significant bleeding symptoms suggestive of a platelet function disorder are marked in red; those for whom the phenotype is unknown are marked in grey; those known to be unaffected are white. Case III/4, marked with ?, was reported by the proband to have bleeding symptoms, but her platelet phenotype is unknown and she was not enrolled in the study.

5.4 Variant identified

Whole exome sequencing identified a missense variant in the GFI1B gene (9:135864478 C/T) predicted to cause a His181Tyr substitution in the first zinc finger of GFI1B (fig 5.1). The histidine residue at position 181 is well conserved across species, in the paralogue GFI1 and in the SNAG domain-containing zinc-finger protein SNAI1 (fig 5.3). The variant is not present in the GnomAD database. However, a different nucleotide substitution producing a histidine-to-arginine change at the same position (His181Arg) is present with a MAF of 4×10⁻⁶.


Figure 5.3: Conservation of histidine residue 181 in GFI1B. Alignment of the protein sequences of human GFI1B long and short isoforms (GFI1B short is missing ZNF 1&2), GFI1B in other species, the paralogue GFI1 and another SNAG domain-containing protein, SNAI1. The first ZNF domain and the start of the second ZNF are highlighted in purple.

5.5 Co-segregation studies and extended phenotyping

In order to gain more evidence for the pathogenicity of this variant, relatives of the proband were contacted, interviewed and enrolled in the study as described previously (fig 5.2). Case II/2 was already a patient at the Hammersmith Hospital haemophilia centre, having been investigated for a platelet disorder several years previously after the diagnosis of her brother. Other family members were not under the care of a haematologist. Published cases have shown alpha-granule defects and CD34 expression on platelets, so extended phenotyping was performed where possible. Clinical and laboratory characteristics of all enrolled relatives are summarised in table 5.2. Two siblings (II/2 and II/6) and a niece (III/1) of the proband had a bleeding tendency associated with platelet dysfunction. Platelet counts were in the normal range in all cases. Peripheral blood smears were examined in all enrolled family members. Three cases (II/2, II/3, II/6) had platelet anisocytosis, with large, hypogranular platelets seen in variable quantities between cases (fig 5.4). Case III/1, the niece of the proband, had a normal platelet count and morphology on the blood smear. Cases III/1 and II/2 had impaired platelet aggregation to 3 agonists, similar to the proband. LTA was not performed in the other affected sibling (II/6), but PFA-100 closure times were markedly prolonged with both the ADP/collagen (>300s) and epinephrine/collagen (188s) membranes.

One sibling of the proband (II/5) did not have any bleeding symptoms despite previous haemostatic challenges. This sibling's platelets were of normal size and appearance on the blood smear, and PFA-100 closure times were within the normal range (marked unaffected in fig 5.2). Dr Kathleen Freson at the University of Leuven performed electron microscopy of platelet thin sections from 2 affected cases (II/2 and II/3). Both had a reduced number of alpha granules (table 5.2). Whole-mount electron microscopy was not performed, so an accurate assessment of dense granules could not be made. However, lumi-aggregometry in III/1 showed reduced ATP secretion in response to both ADP and thrombin. Platelet nucleotide assays in II/2 and II/3 showed reduced ADP content with an increased ATP:ADP ratio. These results suggest a platelet dense granule defect in all three of these cases.


Table 5.2: Bleeding and platelet characteristics of all family members tested from Hammersmith pedigree #17. All affected family members have abnormal bleeding and a platelet function defect with abnormal aggregation responses to at least 3 agonists. They also have either platelet nucleotide assay or ATP secretion (lumi-aggregometry) results suggestive of a dense granule defect. The two cases who had thin-section electron microscopy (II/2 and II/3) were also found to have decreased numbers of alpha granules. I performed blood smear analysis using light microscopy. Blood smears from cases II/3, II/2, II/5 and II/6 were also independently assessed by Prof Wendy Erber, Perth, Australia.

Family ID | II/3 | II/2 | II/5 | II/6 | III/1
Bleeding score | 11 | 10 | -1 | 4 | 8
Platelets (×10⁹/L) | 156 | 152 | 237 | 178 | 257
MPV (fl) | 12.7 | 11.5 | 10.4 | 12.4 | 10.8
Blood smear | Some large, hypogranular platelets. Majority have normal granulation | Platelet anisocytosis with variable granulation | Normal platelet morphology | Large, hypogranular platelets | Platelet anisocytosis
PFA-100 closure time, Coll/ADP | prolonged | prolonged | normal | prolonged | prolonged
PFA-100 closure time, Coll/Epi | prolonged | prolonged | normal | prolonged | normal
LTA, ADP | abnormal | abnormal | n/a | n/a | abnormal
LTA, Epinephrine | abnormal | abnormal | n/a | n/a | abnormal
LTA, Collagen | abnormal | normal | n/a | n/a | normal
LTA, Ristocetin | abnormal | abnormal | n/a | n/a | normal
LTA, AA | n/a | n/a | n/a | n/a | abnormal
ATP secretion (lumi-aggregometry) | n/a | n/a | n/a | n/a | reduced ATP release with ADP and thrombin
Platelet nucleotides (nM), ATP (ref) | 0.51 (0.35-0.81) | 0.62 (0.35-0.8) | n/a | n/a | n/a
Platelet nucleotides (nM), ADP (ref) | 0.18 (0.25-0.6) | 0.15 (0.25-0.6) | n/a | n/a | n/a
ATP:ADP ratio (ref) | 2.38 (1-2.2) | 4.13 (1-2.2) | n/a | n/a | n/a
Electron microscopy | Reduced number of alpha granules. Vacuoles | Reduced number of alpha granules. Vacuoles | n/a | n/a | n/a


Figure 5.4: Phenotypes of affected members of pedigree #17.
(A) Peripheral blood smear (May-Grünwald Giemsa stain, 100X) of whole blood from proband II/3. Platelet anisocytosis is evident, and large, hypogranular platelets are indicated by arrows.
(B) Representative light transmission aggregometry traces from case III/1 (right), showing:
• lack of a secondary wave in response to ADP and disaggregation at 3 lower concentrations;
• delayed and reduced maximum aggregation in response to epinephrine;
• normal responses to ristocetin and collagen;
• delayed and reduced aggregation in response to low-dose arachidonic acid.
All aggregation responses are compared to an age- and sex-matched control (left). Agonists used at various concentrations are indicated. The platelet aggregation responses from cases II/2 and II/3 were similar to this case.
(C) Thin-section electron microscopy of platelets from II/3 (C-1&2) showing marked absence of alpha granules in comparison to platelets from a healthy control (C-3&4). Granule-sized empty vacuoles are also seen (arrow). Images courtesy of Dr Kathleen Freson, University of Leuven.


DNA was extracted from whole blood from all enrolled relatives as described previously, and Sanger sequencing was performed to detect the 9:135864478 C/T variant seen in the proband. The GFI1B variant co-segregated in an autosomal dominant manner with the bleeding and platelet granule defects in the family. The unaffected sibling of the index case (II/5) was homozygous for the reference allele (C/C) (fig 5.5).
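The co-segregation result can be expressed as a consistency check under a fully penetrant dominant model: every affected relative should carry the variant and every unaffected relative should not. The Python sketch below encodes the genotypes and affection status described for the sequenced members of pedigree #17 in the text and figure 5.5. Affected relatives II/2, II/3, II/6 and III/1 are heterozygous, and unaffected II/5 is C/C. The sketch also shows a standard approximation. With m informative meioses and no discordant individuals, the chance that an unlinked variant co-segregates is (1/2)^m, corresponding to a LOD score of m·log10(2). The number of informative meioses depends on pedigree structure and is not computed here. The function names are hypothetical.

```python
import math

# (affected, carries 9:135864478 C/T) for the relatives described in the text
PEDIGREE_17 = {
    "II/2": (True, True),
    "II/3": (True, True),    # proband
    "II/5": (False, False),  # unaffected sibling, homozygous C/C
    "II/6": (True, True),
    "III/1": (True, True),
}


def cosegregates(family: dict[str, tuple[bool, bool]]) -> bool:
    """Fully penetrant dominant model: carrier status must equal affection status."""
    return all(affected == carrier for affected, carrier in family.values())


def chance_cosegregation(informative_meioses: int) -> float:
    """Probability that an unlinked variant co-segregates purely by chance."""
    return 0.5 ** informative_meioses


def lod_score(informative_meioses: int) -> float:
    """LOD score for m informative meioses with no recombinants (theta = 0)."""
    return informative_meioses * math.log10(2)


if __name__ == "__main__":
    print(cosegregates(PEDIGREE_17))  # True
```

A single affected non-carrier or unaffected carrier makes the check fail. Under reduced penetrance, an unaffected carrier would not exclude the variant, so the strict rule applies only to the fully penetrant model.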

Figure 5.5: Co-segregation of pedigree #17. (A) Pedigree diagram indicating the proband (black arrow), affected cases (red), unaffected relatives (white) and relatives in whom the phenotype is unknown (grey). Sanger sequencing results for the 9:135864478 C>T variant in GFI1B show that the missense variant co-segregates with the bleeding and platelet abnormalities in those family members who have been sequenced. (B) Sanger sequencing chromatograms from members of this family, showing the heterozygous peak in the affected relatives and homozygosity for the reference C allele in the unaffected sibling of the proband (II/5).


5.5.1 CD34+ expression on platelets is increased in cases with GFI1B His181Tyr

Harriet McKinney at the University of Cambridge performed flow cytometry to measure CD34 expression on the surface of platelets from cases II/2, II/3 and II/6. All three samples showed increased CD34 expression on the platelet surface compared with platelets from a group of healthy controls (n=175), in whom CD34 expression was not detected (fig 5.6).
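The comparison with controls can be sketched as a reference-range check on mean fluorescence intensity (MFI). The thesis does not state a numerical threshold. The rule below is therefore an assumption for illustration: a case is flagged when its value exceeds both the highest control value and the control mean plus three standard deviations. The function name is hypothetical.

```python
import statistics


def above_control_range(case_mfi: float, control_mfi: list[float], k: float = 3.0) -> bool:
    """Flag a case whose MFI exceeds the control mean + k SD and every control value."""
    if len(control_mfi) < 2:
        raise ValueError("need at least two control measurements")
    upper = statistics.mean(control_mfi) + k * statistics.stdev(control_mfi)
    return case_mfi > upper and case_mfi > max(control_mfi)
```

Requiring the case to exceed the maximum control as well as mean + k·SD guards against a control distribution that is skewed or narrow, which is likely when most controls have no detectable CD34.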

Figure 5.6: CD34+ expression on platelets in GFI1B cases. Scatter plot showing the mean fluorescence intensity of platelets incubated with FITC-labelled CD34 antibody from cases II/2, II/3 and II/6 (GFI1B cases) and 175 healthy controls.

5.5.2 Alternative candidate variants

Sequencing data from the proband (WES) and relative III/1 (who had undergone WGS) were scrutinised for the presence of alternative candidate variants. Forty-five rare (MAF <1:1000) missense or predicted high-impact variants were present in both cases. Data were compared across all BRIDGE-BPD cases and the NIHR BioResource. Variants were excluded if they were present in BPD or non-BPD cases whose phenotype differed from that of the proband. Ten missense or high-impact variants in ten genes met these phenotype criteria. None of these genes is known to be associated with platelet disorders, megakaryopoiesis or granule biogenesis. I compared the relative expression of these genes in megakaryocytes and platelets, measured by RNA expression levels (data from the Blueprint Epigenome consortium226; fig 5.7). GFI1B had the highest relative mRNA expression in megakaryocytes and platelets of all the candidate genes.
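The filtering steps described above can be sketched with set operations in Python:
1. keep variants shared by the two sequenced relatives;
2. retain those that are rare (MAF < 1/1000) and missense or high-impact;
3. discard any variant also seen in a case with a different phenotype.

The Variant record, consequence labels and function name below are hypothetical. A real pipeline would operate on annotated VCF data.

```python
from dataclasses import dataclass
from typing import Iterable

RARE_MAF = 1 / 1000
PROTEIN_ALTERING = {"missense", "high_impact"}  # hypothetical consequence labels


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (hashable, so it can live in sets)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str
    maf: float
    consequence: str


def shared_candidates(proband: set[Variant], relative: set[Variant],
                      discordant_cases: Iterable[set[Variant]]) -> set[Variant]:
    """Rare protein-altering variants shared by both relatives and absent from
    every case whose phenotype differs from the proband's."""
    shared = proband & relative
    rare = {v for v in shared if v.maf < RARE_MAF and v.consequence in PROTEIN_ALTERING}
    seen_in_discordant = set().union(*discordant_cases)  # empty set if no cases given
    return rare - seen_in_discordant
```

Ranking the surviving genes by expression in megakaryocytes and platelets, as shown in fig 5.7, would be a further step not included in this sketch.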

Figure 5.7: Expression of candidate genes in which cases II/3 and III/1 share missense or high-impact variants with a MAF <0.001. Heatmap courtesy of Luigi Grassi, University of Cambridge. Data from Blueprint Epigenome consortium226.

5.6 Identification of GFI1B variants in other BRIDGE-BPD cases

The BRIDGE-BPD collection was also scrutinised for GFI1B variants. Twelve additional missense variants were identified in 16 index cases with a variety of bleeding and platelet phenotypes (detailed in table 5.3 and figure 5.6). Two missense variants, Asp23Asn and Gly139Ser, are located in the intermediary domain between the SNAG domain and the 1st ZNF, whose function is unknown. No pathogenic missense variants have yet been described in the intermediary domain. The Asp23Asn variant has a MAF of 0.001461 in GnomAD and is also seen in homozygous form in the database. The case with this variant is of South Asian descent. The variant is particularly prevalent in the South Asian population (MAF 0.01268), at a frequency too high to be considered pathogenic. The p.Gly139Ser substitution was seen in a case with thrombocytopenia but normal platelet size and function. Gly139 is a highly evolutionarily conserved residue (Chen 2016), and this variant is seen at a frequency below 1 in 100,000 in GnomAD. However, variants causing different amino acid substitutions at the same position are also present (p.Gly139Arg, MAF 0.000004; p.Gly139Cys, MAF 0.0001). The significance of p.Gly139Ser in causing thrombocytopenia is therefore uncertain.

Four missense variants are predicted to cause amino acid substitutions in the 1st zinc finger: Cys165Gly, Cys168Phe and Arg184Pro, as well as the His181Tyr variant already discussed. All 4 of these variants are in pedigrees with thrombocytopenia, but abnormal platelet function, bleeding and increased platelet size are not consistently present. The Cys165Gly variant is present in 2 members of a pedigree with macrothrombocytopenia without bleeding. It is absent in a third relative who has large platelets but a preserved platelet count. The variant is absent from GnomAD and is located close to residues previously shown to be important for the conformation of the first zinc-finger domain226. Further studies are required to determine the significance of the Cys165Gly variant and whether the large platelets in the non-thrombocytopenic relative could have an alternative cause.
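The Asp23Asn example shows why filtering on the overall allele frequency alone can be misleading: a variant may be rare overall but common in one population. Below is a minimal Python sketch of a "popmax"-style filter, which uses the highest frequency observed in any group. The threshold reuses the MAF < 1/1000 cut-off applied to candidate variants in this chapter. The function names and group labels are hypothetical.

```python
def popmax(af_by_group: dict[str, float]) -> tuple[str, float]:
    """Return the group with the highest allele frequency and that frequency."""
    group = max(af_by_group, key=af_by_group.__getitem__)
    return group, af_by_group[group]


def is_rare_everywhere(af_by_group: dict[str, float], threshold: float = 1 / 1000) -> bool:
    """True only if the variant is below the threshold in every group."""
    return popmax(af_by_group)[1] < threshold


if __name__ == "__main__":
    # Frequencies quoted in the text for Asp23Asn: overall and South Asian.
    asp23asn = {"overall": 0.001461, "South Asian": 0.01268}
    print(popmax(asp23asn))              # ('South Asian', 0.01268)
    print(is_rare_everywhere(asp23asn))  # False
```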

One case had a homozygous missense variant (Cys168Phe). This variant is seen in the GnomAD dataset and has also been reported in heterozygous form in 3 families with macrothrombocytopenia229. The zinc-finger structure suggests that the cysteine-to-phenylalanine substitution would disrupt folding of the zinc-finger domain. It has also been shown to affect transcriptional repression by GFI1B229. The variant is seen in GnomAD with an allele frequency of 1:2000, but it is even more prevalent (MAF 1:250) in South Asian populations, and the BRIDGE-BPD case is of South Asian descent. This frequency is very high for a pathogenic variant in a rare disease. The platelet phenotype also differs from the published one. The BRIDGE case has additional bleeding and platelet dysfunction, which are absent in the reported case. This difference could arguably be explained by the variant being homozygous rather than heterozygous. An Arg184Pro variant was seen in 2 BRIDGE-BPD probands with quite different phenotypes. One had mild macrothrombocytopenia without significant bleeding or platelet dysfunction. The other had more significant thrombocytopenia, small platelets, platelet dysfunction and abnormal alpha-granule distribution. In the first case (pedigree 7 in table 5.3), four family members were sequenced. The GFI1B variant co-segregated with macrothrombocytopenia but not with the bleeding and platelet defect (Dr Sarah Westbury, personal communication). The Arg184 residue is highly conserved throughout evolution and the Arg184Pro variant is not seen in GnomAD. However, variants causing different amino acid substitutions at the same position are seen (Arg184Gly, Arg184Cys, Arg184His), with a combined MAF of 1:10,000. It is unlikely that all of these variants cause macrothrombocytopenia. The variation in phenotypes between the BRIDGE-BPD cases with this variant also makes its significance uncertain. Four unrelated cases were identified carrying the same variant, which is predicted to cause an Arg190Trp substitution in the linker region between the first and second zinc-finger domains. This variant is also seen in GnomAD (MAF 0.000134), and the platelet phenotypes were not consistent between cases (table 5.3). One case (#11, table 5.3) had small platelets and a likely pathogenic variant in WAS, which fully explained the phenotype. This evidence supports the conclusion that Arg190Trp is unlikely to be pathogenic. Two missense variants are located in the 2nd ZNF domain (Gly198Ser and Thr211Met). The phenotypes of these two cases are very different, and it is unlikely that they share a common mechanism. Case #13 (Gly198Ser, table 5.3) had macrothrombocytopenia with no associated platelet function phenotype, which is explained by a pathogenic variant in ACTN1. Gly198 is highly conserved231. However, 2 variants in GnomAD produce different amino acid substitutions at the same position (Gly198Arg and Gly198Asp), which suggests that the glycine may not be critical for function. Case #14 (Thr211Met) had bleeding and a specific aggregation defect in response to ristocetin only. Platelet count and morphology were normal. This variant remains of uncertain significance. 
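The role of ancestry in the Asp23Asn and Cys168Phe assessments can be shown with a "popmax"-style check. This check compares the highest population-specific allele frequency with a rare-disease ceiling, rather than using the global frequency. It is a sketch under stated assumptions. The 0.001 ceiling mirrors the MAF <0.1% filter used elsewhere in this thesis and is not a formal threshold. The frequencies are those quoted above. The population keys ("overall", "south_asian") are illustrative labels.

```python
# Assumed ceiling for an allele frequency compatible with a rare, penetrant
# platelet disorder; mirrors the MAF <0.1% filter, not a formal threshold.
RARE_DISEASE_MAX_AF = 0.001

# Allele frequencies quoted in the text (overall GnomAD and South Asian).
VARIANTS = {
    "Asp23Asn": {"overall": 0.001461, "south_asian": 0.01268},
    "Cys168Phe": {"overall": 1 / 2000, "south_asian": 1 / 250},
    "Arg190Trp": {"overall": 0.000134},
}


def popmax(freqs_by_population):
    """Return (population, allele frequency) for the most frequent population."""
    population = max(freqs_by_population, key=freqs_by_population.get)
    return population, freqs_by_population[population]


def too_common(freqs_by_population, ceiling=RARE_DISEASE_MAX_AF):
    """True if the variant exceeds the ceiling in any listed population."""
    return popmax(freqs_by_population)[1] > ceiling


for name, freqs in VARIANTS.items():
    population, af = popmax(freqs)
    print(f"{name}: popmax {af:.5f} ({population}) too common: {too_common(freqs)}")
```

Cys168Phe passes on its global frequency (0.0005) but fails on its South Asian frequency (0.004), which is the point made in the text. Arg190Trp is not excluded by frequency alone. The argument against it rests on the inconsistent phenotypes and the alternative WAS diagnosis.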
Three missense variants were identified that could potentially affect the DNA-binding ZNFs:
- Cys222Arg, in the linker region between the 2nd and 3rd ZNFs, in case #15 with bleeding and platelet dysfunction without thrombocytopenia.
- Ser233Pro in case #16 with macrothrombocytopenia and platelet dysfunction.
- Gly282Arg in case #18 with a mild ATP secretion defect and bleeding without thrombocytopenia.

These variants remain of uncertain significance until ongoing functional studies can ascertain the effects of missense variants in these regions. The phenotypes in these cases are too different to be explained by a common mechanism. One GFI1B frameshift was identified in the BPD cohort, at Gln89, in a case with macrothrombocytopenia and abnormal alpha granules. This frameshift occurs earlier in the protein than the previously reported frameshift variants. However, the case has a similar phenotype to those reported, and the variant is likely to be pathogenic.

Figure 5.8: GFI1B variants identified in BRIDGE-BPD cases. Missense and frameshift variants identified in BRIDGE-BPD cases by whole exome and whole genome sequencing are indicated above the diagram with the Hammersmith H181Y mutation highlighted in red. Previously reported variants (heterozygous variants in green; homozygous variants in blue) are labelled below the diagram for comparison.

A splice donor variant was identified in 2 members of pedigree #17 (table 5.3) with macrothrombocytopenia, bleeding and platelet dysfunction. HGMD lists this as a disease-causing variant on the basis of a case reported by the GAPP whole exome sequencing study209 (table 5.1). The phenotypes of the BRIDGE-BPD and GAPP cases are very similar, and this would ordinarily be enough to assign 'likely pathogenic' status to a variant. However, further enquiry revealed that the same patient had undergone whole exome sequencing in the GAPP study and whole genome sequencing in the BRIDGE-BPD study. The two reports therefore describe the same pedigree, and the variant has still only been seen once. This variant is predicted to affect splicing, but the mechanism by which it causes the platelet phenotype is unknown.
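The GAPP/BRIDGE overlap shows why database evidence should be counted per independent pedigree rather than per report. Below is a minimal sketch. It assumes each observation can be linked to a family identifier. The identifier "FAM-1" is a hypothetical placeholder, and in this case the link was only established by direct enquiry.

```python
# Hypothetical evidence records for the splice donor variant. The study names
# come from the text; "FAM-1" is an invented identifier standing for the single
# pedigree that was sequenced in both studies.
EVIDENCE = [
    {"source": "GAPP whole exome sequencing (reported in HGMD)", "pedigree": "FAM-1"},
    {"source": "BRIDGE-BPD whole genome sequencing", "pedigree": "FAM-1"},
]


def independent_pedigrees(records):
    """Count distinct pedigrees, so one family sequenced twice counts once."""
    return len({record["pedigree"] for record in records})


print(len(EVIDENCE), "reports,", independent_pedigrees(EVIDENCE), "independent pedigree")
```

Deduplication is only as good as the linkage between studies. Without a shared identifier, two reports of the same family look like independent replication.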


Table 5.3: Phenotypes of BRIDGE-BPD cases with rare coding variants in GFI1B. TCP: thrombocytopenia; RBC: red blood cell abnormalities; EM: electron microscopy; LTA: light transmission aggregometry; na: information not available. * indicates that a variant causing a different amino acid substitution at the same residue is present in the GnomAD database. Y indicates the variant is listed in HGMD as DM.

| # | Variant | GnomAD MAF | Bleeding | TCP | Large platelets | RBC | Platelet function (LTA or PFA-100) | Platelet thin-section EM | CD34+ platelets | Alternative cause |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | p.Asp23Asn | 0.001461 | y | n | na | na | abnormal | na | na | |
| 2 | p.Gln89fs | 0 | y | y | n | na | abnormal | decreased α granules | na | |
| 3 | p.Gly139Ser | 8.13E-06 | y | y | n | na | na | na | na | |
| 4 | p.Cys165Gly | 0 | n | y | y | ? | na | na | na | |
| 5 | p.Cys168Phe (hom) | 0.0005 | y | y | na | na | abnormal | na | na | |
| 6 | p.His181Tyr | 0 | y | n | y | na | abnormal | decreased α granules | y | |
| 7 | p.Arg184Pro | 0* | y | y/n | y/n | y | normal | na | na | |
| 8 | p.Arg184Pro | 0* | y | y | n | na | abnormal | normal number of α granules but abnormal distribution | na | |
| 9 | p.Arg190Trp | 0.000134 | n | y | y | na | normal | na | na | |
| 10 | p.Arg190Trp | 0.000134 | n | y | y | na | abnormal | na | na | |
| 11 | p.Arg190Trp | 0.000134 | y | y | n | na | | normal α granules | na | WAS |
| 12 | p.Arg190Trp | 0.000134 | n | n | y | na | | normal α granules | y | |
| 13 | p.Gly198Ser | 0.000144 | n | y | y | na | abnormal | na | na | ACTN1 |
| 14 | p.Thr211Met | 0.000012 | y | n | n | na | abnormal | na | na | |
| 15 | p.Cys222Arg | 0 | y | n | n | na | abnormal | na | na | |
| 16 | p.Ser233Pro | 0 | y | y | y | na | abnormal | na | na | |
| 17 | splice donor | Y | y | y | n | y | abnormal | na | na | |
| 18 | p.Gly282Arg | 0.000012 | y | n | n | na | abnormal | na | na | |

5.7 GFI1B variants in the general population

To estimate the frequency of different GFI1B variants in the general population, two sources were scrutinised for the presence of GFI1B variants. The first was the 13,049 cases enrolled across the NIHR BioResource, UK Biobank and GEL pilot studies (referred to as non-BPD cases). The second was the GnomAD database of 123,136 exomes and 15,496 genomes from unrelated individuals. In the non-BPD cases, there were 49 different GFI1B variants (each with a MAF <0.001) occurring in 63 unrelated individuals. This gives an estimated frequency of 0.0048, or a rare GFI1B variant in approximately 1 in every 207 cases in this group. This is a much higher frequency than in the GnomAD cohort. There, the overall frequency of GFI1B variants is 0.0001, giving an estimated occurrence of 1 in 10,000 in the general population. The types of variants seen in the non-BPD cases are listed in table 5.4. Only 2 of these variants were also seen in BRIDGE-BPD cases. One missense variant, predicted to cause an Arg190Trp substitution, was detected in a case from the Primary immune disorders (PID) study and also in 2 related cases from the GEL pilot. The platelet phenotypes of the non-BPD cases are unknown. This variant was also seen in 4 BRIDGE-BPD cases with a variety of BPD phenotypes (listed in table 5.3). This adds to the evidence that it is unlikely to be pathogenic. A splice donor variant (g.9:135865295 G>A, c.814+1 G>A) was seen in a case with multiple primary tumours (MPMT study). It is also present in a BRIDGE-BPD pedigree (listed in table 5.3) and is listed in HGMD as a disease-causing variant for thrombocytopenia209. HGMD lists two splice variants at this position as disease-causing (c.814+1 G>A and c.814+1 G>C), in two unrelated pedigrees with autosomal dominant thrombocytopenia. Functional studies on the latter mutation have shown that it results in a frameshift leading to loss of the 5th zinc finger. This has a dominant-negative effect on wild-type GFI1B 227. 
Without knowing the platelet count of the non-BPD case in the NIHR BioResource, it is not possible to comment on the significance of the GFI1B variant in this case. This case also illustrates that we cannot presume that variants present in a non-BPD case are non-pathogenic. The types of variants present in GnomAD are listed in table 5.4, and the predicted locations of the missense variants in the GFI1B protein are depicted in figure 5.9. A total of 215 different variants were seen. There are 20 loss-of-function (LOF) variants, including 7 stop gains and 2 frameshifts (which are also predicted to lead to a premature stop). All except 1 of these occur in the intermediary domain (fig 5.9). This suggests that heterozygous LOF variants may be better tolerated here than in the ZF domains. There are 195 different missense variants in GnomAD, occurring in 6754 alleles. Seven of these variants are relatively common, each occurring at a frequency of >0.001, and together they account for 4918 of these alleles. Again, the vast majority of missense variants occur in the intermediary domain. Comparatively few variants are seen in ZF domains 3 and 4. More variants are seen in ZFs 1 and 2, with cumulative allele counts of 248 and 1504 respectively. There are two possible explanations. Alterations of the amino acid sequence in these domains may be better tolerated than in the DNA-binding domains. Alternatively, the resulting human phenotype may be mild and go undiagnosed, allowing these individuals into the GnomAD and NIHR BioResource/UK Biobank/GEL pilot datasets. Notably, no variants are seen in the SNAG domain in either dataset. This suggests that disruption of this domain is not tolerated in humans, and is consistent with the observation that loss of GFI1B is embryonically lethal in mice219.
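The frequency estimates above follow from simple arithmetic on the counts given in the text and in table 5.4. The sketch below reproduces them. Dividing the case count by the number of carriers gives about 1 in 207. Inverting the rounded frequency of 0.0048 gives about 1 in 208.

```python
# Counts taken from the text and table 5.4.
NON_BPD_CASES = 13_049
NON_BPD_CARRIERS = 63            # unrelated individuals with a rare GFI1B variant

GNOMAD_MISSENSE_VARIANTS = 195   # distinct missense variants in GnomAD
GNOMAD_MISSENSE_ALLELES = 6754   # alleles carrying any of those variants
COMMON_MISSENSE_ALLELES = 4918   # alleles from the 7 variants with frequency > 0.001
GNOMAD_LOF = {"splice donor/acceptor": 11, "frameshift": 2, "stop gain": 7}

carrier_freq = NON_BPD_CARRIERS / NON_BPD_CASES
print(f"carrier frequency {carrier_freq:.4f}, "
      f"about 1 in {NON_BPD_CASES / NON_BPD_CARRIERS:.0f}")

common_share = COMMON_MISSENSE_ALLELES / GNOMAD_MISSENSE_ALLELES
rare_alleles = GNOMAD_MISSENSE_ALLELES - COMMON_MISSENSE_ALLELES
print(f"{common_share:.1%} of missense alleles come from 7 variants; "
      f"the other {GNOMAD_MISSENSE_VARIANTS - 7} variants share {rare_alleles} alleles")

lof_total = sum(GNOMAD_LOF.values())
total_distinct = lof_total + GNOMAD_MISSENSE_VARIANTS
print(f"{lof_total} LOF + {GNOMAD_MISSENSE_VARIANTS} missense = {total_distinct} variants")
```

About three quarters of GnomAD missense alleles come from just 7 variants. Raw allele counts per domain therefore partly reflect a few common alleles rather than broad tolerance.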

Table 5.4: Types of GFI1B variants seen in the general population. The number of different variants seen in the GnomAD database and NIHR BioResource/UK Biobank/GEL pilot collections (non-BPD cases) are grouped by the type of effect they are predicted to have on the protein.

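| Variant type | GnomAD | Non-BPD cases |
|---|---|---|
| Loss of function (LOF): splice donor/acceptor | 11 | 1 |
| Loss of function (LOF): frameshift | 2 | 0 |
| Loss of function (LOF): stop gain | 7 | 1 |
| Missense | 195 | 46 |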

Figure 5.9: GnomAD and NIHR BioResource GFI1B variants. The GFI1B protein structure is annotated with the location of variants present in GnomAD (above the diagram) and in non-BPD cases in the NIHR BioResource/UK Biobank/GEL pilot (below the diagram). Variants are grouped by the functional domain of GFI1B in which they occur. Some missense variants occur in the linker regions between the zinc-finger domains, and these are indicated by arrows pointing between ZF domains. Frameshift variants and other variants predicted to produce a premature stop codon are listed in green. Missense variants are in black. For GnomAD missense variants, the number of different variants per domain is given, with the allele count in italics in brackets (e.g. there are 19 different missense variants listed in GnomAD affecting the 1st ZF domain, seen in 248 alleles). For variants in non-BPD cases, the number of different variants is given, followed by the number of index cases in which they are seen (e.g. 7 missense variants in 9 index cases in the 1st ZF).
181

5.8 Discussion

The His181Tyr missense variant detected in the Hammersmith family shares many phenotypic features with the reported cases with gain-of-function variants in GFI1B. Despite this, it does not meet the criteria to be designated pathogenic. This case highlights many of the challenges of assigning pathogenicity to a novel variant in a BPD case, even in a known BPD gene. His181 is a highly conserved residue in the 1st zinc-finger domain of GFI1B. In the Hammersmith pedigree, the His181Tyr variant co-segregated with bleeding and platelet dysfunction, but not with platelet count or morphology. The phenotype of the affected cases is compatible with a pathogenic variant in GFI1B (increased platelet size, alpha-granule defect, bleeding and CD34 expression on platelets). However, there are as yet no functional studies confirming how a heterozygous missense variant would cause this phenotype, so the variant cannot be classified as anything more than a VUS. Further reported cases of this variant with the same phenotype could upgrade it to likely pathogenic. A different heterozygous missense variant in the 1st zinc-finger domain (Cys168Phe) was recently reported and shown to affect GFI1B transcriptional repression229. If both variants were pathogenic, one might expect two such closely located variants to produce the same patient phenotype. However, the reported Cys168Phe case does not have platelet dysfunction. Rabbolini et al labelled the Cys168Phe variant pathogenic only for platelet CD34 expression, and attributed the macrothrombocytopenia to an MYH9 variant. The Cys168Phe variant was also identified in homozygous form in a BRIDGE-BPD case with thrombocytopenia, bleeding and platelet dysfunction. The variant may have a more severe effect in homozygous than in heterozygous form, but this requires confirmation in functional studies. 
This variant is also seen at high frequency in South Asian populations, which highlights the importance of taking a patient's ethnicity into account when assessing a variant. I have described examples of GFI1B variants in pedigrees with multiple platelet phenotypes that do not necessarily co-segregate, for example the Arg184Pro variant described above. This raises the possibility that these phenotypes are caused by different genetic variants (as in Rabbolini et al, 2017). It also suggests that there may be important, possibly quite frequent, modifiers causing variable penetrance within families.


Alternative candidate variants were sought in this family. The index case and his niece shared rare coding variants in 11 genes, and GFI1B was the only one of these with a known role in megakaryopoiesis. RNA-seq data from the Blueprint epigenome project can help identify genes that are megakaryocyte-specific or relatively highly expressed in MKs. On this occasion, these data prioritised GFI1B as the gene with the highest MK and platelet expression among the potential candidates. A non-coding variant could be responsible for the platelet phenotype in this family. However, the proband II/3 was submitted for whole exome sequencing only, so whole genome sequencing data are available only for case III/1. The BRIDGE team is using epigenomic data to highlight regulatory regions of genes that are important for megakaryopoiesis, and this ongoing work will help to prioritise non-coding variants.
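The filtering logic described here can be sketched with set operations. It takes genes carrying rare coding variants in both sequenced relatives, restricts them to genes with a known megakaryopoiesis role, and ranks them by MK expression. All gene names other than GFI1B, and all expression values, are hypothetical placeholders. The sketch only shows the shape of the computation, not the BRIDGE pipeline.

```python
# Genes with rare coding variants in each sequenced relative (hypothetical
# placeholders apart from GFI1B).
INDEX_CASE_GENES = {"GFI1B", "GENE_A", "GENE_B", "GENE_C"}
NIECE_GENES = {"GFI1B", "GENE_A", "GENE_C", "GENE_D"}

# Hypothetical curated list of genes with a known role in megakaryopoiesis.
MEGAKARYOPOIESIS_GENES = {"GFI1B"}

# Hypothetical MK expression values (log2 FPKM); genes missing from the
# table are placed last.
MK_EXPRESSION = {"GFI1B": 6.0, "GENE_A": 0.5, "GENE_C": 2.5}


def rank_shared_genes(*gene_sets, expression):
    """Genes shared by every relative, ordered by decreasing expression."""
    shared = set.intersection(*gene_sets)
    return sorted(shared, key=lambda g: expression.get(g, float("-inf")), reverse=True)


shared_ranked = rank_shared_genes(INDEX_CASE_GENES, NIECE_GENES, expression=MK_EXPRESSION)
known_role = [g for g in shared_ranked if g in MEGAKARYOPOIESIS_GENES]
print(shared_ranked)  # ['GFI1B', 'GENE_C', 'GENE_A']
print(known_role)     # ['GFI1B']
```

The knowledge-based filter and the expression ranking are kept separate here. This way a highly expressed gene without a known role still remains visible in the ranked list rather than being silently dropped.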

Comparison of the BRIDGE dataset with sequencing data from GnomAD and the NIHR BioResource shows that there are relatively more missense variants in the 1st and 2nd ZNFs of GFI1B than in the other ZNF domains. Six of these variants were identified in an early analysis of the BRIDGE dataset and published226. The later analysis presented here has revealed 8 more variants in cases with platelet disorders. This shows that missense variants in GFI1B are reasonably common and that their significance in the pathogenesis of BPD must be assessed on an individual basis.

The published GFI1B cases exhibit a variety of phenotypes, and analysis of the BRIDGE-BPD cohort corroborates this observation. Frameshift and truncating variants have so far all been associated with macrothrombocytopenia with platelet dysfunction and alpha-granule defects, plus dense-granule defects where these have been assessed. However, the mechanism may differ depending on the zygosity and location of the variant. The truncating mutations in the 4th and 5th ZNFs appear to exert a dominant-negative effect on the wild-type protein. By contrast, the reported frameshift in the 1st ZNF is homozygous, and heterozygous carriers are unaffected. A heterozygous frameshift variant in the intermediary domain (Q89fs) was seen in a BPD case, also with macrothrombocytopenia and alpha-granule defects. However, several frameshift variants in this domain are seen in GnomAD and in a case in the NIHR BioResource non-BPD collection. We therefore cannot be certain about the pathogenicity of the Q89fs variant without further functional studies to establish a mechanism of action for heterozygous frameshifts in this domain. A variety of platelet phenotypes have also been seen in cases with missense variants in the BRIDGE-BPD collection. Many of these variants are likely to be benign. This highlights the need to consider carefully the pathogenicity of every individual variant, even if the patient phenotype appears to 'match' that expected from a defect in the gene of interest. The frequency of variants in the intermediary domain shows that variants here need to be treated with particular caution. In the BRIDGE-BPD dataset, the ability to compare sequencing and phenotype data from cases across the entire NIHR BioResource has been useful in assessing variants such as Arg190Trp and the splice donor variant. Analysis of the c.814+1G>A splice variant in the BRIDGE-BPD case has highlighted the problem of being too reliant on variant databases. HGMD reports the variant seen in the BRIDGE-BPD case as a disease-causing mutation in a case with a similar phenotype. This would traditionally be enough to label a variant 'pathogenic'. However, closer inspection revealed that the HGMD case, published in 2016 by another study group209, was the same case. The same variant is also seen in another case in the NIHR BioResource Multiple Primary Tumours study, which may make it less likely to be pathogenic. We do not, however, know the platelet phenotype of this NIHR BioResource case. This raises a further issue with 'control' datasets, because mild thrombocytopenia and platelet dysfunction have historically been underdiagnosed. Patients with platelet abnormalities may therefore not have been excluded from population datasets such as GnomAD. Platelet phenotypes may also not be recorded in the NIHR BioResource and GEL cohorts. We therefore cannot necessarily exclude variants from analysis because they appear in these datasets.

In conclusion, this analysis covered the 18 pedigrees in the BRIDGE-BPD cohort with platelet disorders and rare variants in GFI1B. It has confirmed the variety of platelet phenotypes that may be associated with variants in the same gene. It has also underlined the difficulties of assigning pathogenicity to variants in tier-1 genes. Each variant must be assessed on its own merits. Control datasets such as GnomAD and variant databases such as HGMD are useful tools, but pathogenicity must be supported by functional and phenotypic data. With the rapid expansion of genome sequencing data, the number of variants identified in genes with some degree of phenotype match is set to rise sharply. It is not practical to perform functional and in-depth phenotyping studies (for example platelet CD34 expression) on every variant identified in large-scale sequencing studies. A large number of variants of uncertain significance are therefore likely to remain.

5.9 Further work

Functional studies are necessary to confirm the phenotype associated with the GFI1B variants presented here and to assess their effect on GFI1B function. An ongoing collaboration with a team at Radboud University Medical Center, Nijmegen aims to characterise the GFI1B variants detected in the BRIDGE-BPD study. This is the largest case series of GFI1B variants yet described. Publication of this work will help to confirm the role of missense variants in the pathogenesis of BPD, which in turn will enable better interpretation of GFI1B variants in the future. The variant in the Hammersmith pedigree remains a VUS. The family members who have not been sequenced on an HTS platform are therefore being entered into the 100,000 Genomes Project232 to identify any other potential candidate variants in this pedigree.


Novel gene discovery

6.1 Novel gene discovery in BRIDGE-BPD

In previous chapters I presented variants identified in known BPD genes in cases enrolled in the BRIDGE-BPD study, and the resulting diagnoses in patients. The primary aim of the BRIDGE-BPD study was to identify novel genes associated with bleeding and platelet disorders. BPD are heterogeneous and individual BPD are often extremely rare, so the objective was to recruit more than 1000 BPD cases internationally. We hypothesised that sequencing a large number of cases would increase the chance of recruiting several cases with the same variant, or with variants in the same gene. This in turn would increase the probability of identifying that gene. We employed a series of informatics approaches to facilitate the identification of novel BPD genes, many of which were described in previous chapters:
- HPO-based clustering
- the use of large reference datasets such as ExAC and GnomAD to exclude common variants
- prioritisation using the CADD pathogenicity score

In addition, the BRIDGE computational scientists developed a variety of novel algorithms to facilitate novel gene discovery. One key approach was the integration of data from other sources, including mouse phenotypes from knockout databases and RNA sequencing data from the BLUEPRINT epigenome consortium226. Mouse knockout phenotypes (MPO) were compared with BPD case HPO phenotypes and combined with pathogenicity scoring and RNA sequencing data from Blueprint. This approach facilitated the discovery of SRC as a novel BPD gene200. An HPO-based similarity regression analysis led to the discovery of DIAPH1 as a novel cause of thrombocytopenia and sensorineural hearing loss199. As part of the core analysis team of the BRIDGE-BPD consortium, I was directly involved in developing the methodologies and analyses that led to these discoveries. 
Since analysis of all cases across the NIHR BioResource was central to all strategies employed by the BRIDGE consortium, my wider contributions to study design, data collection, patient recruitment, phenotyping and co-segregation have also enabled other BRIDGE-BPD gene discoveries during the course of my PhD, listed in table 6.1. In this chapter I describe some of the approaches taken to discover other novel candidate BPD genes in locally enrolled cases.
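The sketch below gives a flavour of phenotype-based matching. It ranks cases by how much their HPO term sets overlap with a query phenotype, using a Jaccard index. This is a deliberately simplified stand-in. It ignores the ontology hierarchy and term specificity, and it is not the similarity regression method that identified DIAPH1. Case identifiers are hypothetical, and HPO terms are written by name as illustrative labels rather than by ID.

```python
def jaccard(terms_a, terms_b):
    """Shared terms divided by all distinct terms (0.0 if both sets are empty)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


# Hypothetical cases annotated with HPO term names.
CASES = {
    "case_1": {"Thrombocytopenia", "Sensorineural hearing impairment"},
    "case_2": {"Thrombocytopenia", "Sensorineural hearing impairment", "Menorrhagia"},
    "case_3": {"Bruising susceptibility", "Menorrhagia"},
}

QUERY = {"Thrombocytopenia", "Sensorineural hearing impairment"}

ranking = sorted(CASES, key=lambda case: jaccard(QUERY, CASES[case]), reverse=True)
for case in ranking:
    print(case, round(jaccard(QUERY, CASES[case]), 2))
```

A flat set overlap treats a match on a very general term as equal to a match on a rare, specific one. Ontology-aware methods that weight terms by information content are designed to avoid this.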

Table 6.1: Novel BPD genes discovered by the BRIDGE consortium since 2012.

| Gene | BPD phenotype | Year | Reference |
|---|---|---|---|
| DIAPH1 | Autosomal dominant thrombocytopenia & sensorineural hearing loss | 2016 | 199 |
| SRC | Autosomal dominant macrothrombocytopenia, early-onset myelofibrosis and bone pathologies | 2016 | 233 |
| TRPM7 | Macrothrombocytopenia | 2016 | 234 |
| TPM4 | Macrothrombocytopenia | 2017 | 201 |
| KDSR | Thrombocytopenia, platelet granule defect and early-onset myelofibrosis | 2018 | Bariana et al, submitted |

6.2 Identification of candidate genes in a local pedigree with a platelet function defect using gene prioritisation methods

I applied the methods described above to identify candidate variants causing a well-defined platelet function disorder in a locally enrolled family, Hammersmith family 428.

6.2.1 Hammersmith family 428 characteristics

The proband in this family was diagnosed with a platelet function disorder in her 40s. This followed excessive post-operative bleeding after a hysterectomy, which required blood transfusion and repeat surgery. She had requested the hysterectomy because of severe menorrhagia. She also described life-long easy bruising, gum bleeding and severe bleeding after surgery and dental extractions. The proband's condensed MCMDM-1VWD bleeding score was 16 at the time of enrolment. The diagnosis in the proband led to the investigation of her two daughters and subsequent diagnoses of a platelet function disorder in both (fig 6.1). Both daughters had a history of mucocutaneous bleeding, with menorrhagia, bruising and excessive bleeding after dental extractions. Only the younger daughter had undergone surgery at the time of her diagnosis. This was a tonsillectomy, after which she required multiple blood transfusions, consistent with the severe bleeding phenotype in the family. The proband and her daughters had a normal platelet count and size. Platelet light transmission aggregometry produced similar responses in all cases. The most marked defect was reduced aggregation in response to ADP, with no secondary wave and disaggregation at lower doses. Aggregation was reduced in response to arachidonic acid, and there were variable responses to epinephrine and collagen. All 3 cases had abnormal platelet ATP release, measured by either lumi-aggregometry or nucleotide assay (table 6.2). Local interpretation of these results in the proband suggested a specific ADP signalling defect. The proband's DNA had therefore previously been sent for sequencing of the P2Y12 receptor gene, but no causative variant was identified. The proband (III/3) and one daughter (IV/1) were enrolled early in the study, and their DNA samples were analysed on the whole exome sequencing platform. The second daughter (IV/3) was enrolled later, and whole genome sequencing was performed on her sample (fig 6.1). Two grandchildren of the proband (V/1 and V/2) were born during the BRIDGE study and were investigated at a paediatric centre. Child V/1 was enrolled in the study when old enough to provide a saliva sample, which was used for Sanger sequencing of candidate variants. Both children had limited platelet function testing suggestive of a similar defect to other affected family members (table 6.2).


Figure 6.1: Hammersmith family 428. Cases marked in red are considered ‘affected’ by the platelet disorder. Case II/2 has not had formal platelet testing but has a severe bleeding history. Cases V/1 and V/2 do not have a severe bleeding history but were diagnosed in infancy and have a platelet function defect similar to other affected family members. Cases where there is insufficient information to call affected or unaffected are marked in grey. Proband is marked by a black arrow. Cases where DNA was sent for either whole exome (WES) or whole genome sequencing (WGS) are marked by an asterisk.


Table 6.2: Platelet function results of Hammersmith family 428. Bleeding score was calculated using the condensed MCMDM-1VWD bleeding assessment tool. LTA = light transmission aggregometry. LTA responses to ADP, epinephrine and arachidonic acid (AA) are summarised. ATP release from platelets was measured by either platelet nucleotide assay or lumi-aggregometry. na = results not available.

| Pedigree ID | III/3 | IV/1 | IV/3 | V/1 |
|---|---|---|---|---|
| Bleeding score | 16 | 5 | 12 | na |
| LTA: ADP | Reduced | Reduced. No 2° wave. Disaggregation at lower doses | Reduced. No 2° wave. Disaggregation at lower doses | Reduced. No 2° wave. Disaggregation at lower dose |
| LTA: epinephrine | Reduced | Same as control | Delayed and reduced | na |
| LTA: AA | Reduced | Reduced | Reduced | na |
| Lumi-aggregometry: ADP | na | Reduced | Reduced | na |
| Lumi-aggregometry: thrombin | na | Normal | Normal | na |

Platelet nucleotides: reduced ADP release and increased ATP:ADP ratio (two cases tested).

6.2.2 Variant identification

Sequencing data from cases III/3, IV/1 and IV/3 were analysed as described previously (Chapter 2, sections 2.3 and 2.4). In total, 181 rare variants (MAF <0.1% in reference databases) were present in all 3 sequenced cases. None of these variants were in a tier-1 BPD gene. Each variant was annotated with:
- its presence in other BRIDGE-BPD cases or in the non-BPD NIHR BioResource Rare Disease studies
- its allele frequency in GnomAD
- its predicted protein impact using VEP107

I applied a step-wise approach to prioritise variants into a shortlist of potential candidates. In the first step, rare coding variants with moderate or high impact and associated with a BPD phenotype were selected. Variants were excluded if any of the following applied:
- they were present in BRIDGE-BPD cases with a different platelet phenotype
- they were present in at least five unrelated non-BPD cases
- they were neither missense nor predicted high-impact (defined as frameshift, stop-gain, stop-loss or splice-site variants)

Thirty-one rare variants met the criteria and are listed in table 6.3. The majority were missense, and three were predicted to cause a frameshift or premature stop. Ten variants were absent from GnomAD. For four of these, GnomAD listed a different variant causing a different amino acid change at the same protein residue (marked d in table 6.3). Seven variants were absent from both GnomAD and non-BPD cases in the NIHR BioResource. Two variants were absent from GnomAD but were seen in 2 pedigrees in the NIHRBR, neither of which had any platelet phenotype recorded.
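The first prioritisation step combines two operations. It keeps only variants shared by all three sequenced relatives, then applies the frequency, consequence and in-house exclusion filters. Below is a minimal sketch, assuming a hypothetical `Variant` record and hypothetical variant identifiers and call lists. Consequence names are Sequence Ontology terms of the kind Ensembl VEP reports. The phenotype-matching criterion is not modelled. The AEBP1 and BRAP values follow table 6.3, and the other two records are invented to show rejections.

```python
from dataclasses import dataclass

# VEP consequence terms counted as "high impact" under the text's definition.
HIGH_IMPACT = {"frameshift_variant", "stop_gained", "stop_lost",
               "splice_donor_variant", "splice_acceptor_variant"}


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str           # most severe consequence reported by VEP
    gnomad_af: float           # allele frequency in GnomAD (0.0 if absent)
    non_bpd_carriers: int      # unrelated non-BPD NIHR BioResource carriers
    other_bpd_phenotype: bool  # seen in a BPD case with a different phenotype


def shared_by_all(*calls_per_relative):
    """Variant IDs called in every sequenced relative."""
    return set.intersection(*(set(calls) for calls in calls_per_relative))


def passes_step_one(v, max_af=0.001, max_non_bpd=4):
    """Rare, missense or high impact, and not excluded by in-house data."""
    return (v.gnomad_af < max_af
            and (v.consequence == "missense_variant" or v.consequence in HIGH_IMPACT)
            and v.non_bpd_carriers <= max_non_bpd
            and not v.other_bpd_phenotype)


CATALOGUE = {
    "v1": Variant("AEBP1", "missense_variant", 0.0000653, 4, False),
    "v2": Variant("BRAP", "frameshift_variant", 0.0, 0, False),
    "v3": Variant("GENE_X", "synonymous_variant", 0.0, 0, False),  # hypothetical
    "v4": Variant("GENE_Y", "missense_variant", 0.00002, 6, False),  # hypothetical
}

# Hypothetical variant calls for III/3, IV/1 and IV/3 respectively.
shared = shared_by_all(["v1", "v2", "v3", "v4", "v5"],
                       ["v1", "v2", "v3", "v4"],
                       ["v1", "v2", "v3", "v4", "v6"])
candidates = sorted(CATALOGUE[vid].gene for vid in shared if passes_step_one(CATALOGUE[vid]))
print(candidates)  # ['AEBP1', 'BRAP']
```

Intersecting relatives before filtering keeps the expensive annotation lookups to variants that could explain a shared phenotype. This assumes all affected relatives carry the causal variant, as in a fully penetrant dominant model.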

Table 6.3: Thirty-one candidate variants shared by all three affected relatives. Variants that are present in >4 non-BPD cases or in a BPD case with a different phenotype have been excluded. Genes are listed alphabetically. The variant effect is given as predicted by Ensembl VEP (McLaren 2016). NIHRBR = NIHR BioResource-Rare Diseases. SIFT analysis: O Deleterious; O Deleterious low confidence; O Tolerated. PolyPhen analysis: O Probably damaging; O Possibly damaging; O Benign. d = a different variant producing an alteration at the same protein residue is present in GnomAD. n/c = not calculable.

| Gene | Variant effect | GnomAD | NIHRBR | CADD Phred score | SIFT | PolyPhen |
|---|---|---|---|---|---|---|
| AEBP1 | Ala550Thr | 0.0000653 | 4 | 1.79 | O | O |
| ATP8B4 | Ile1170Thr | 0.0000081 | 0 | 23.8 | O LC | O |
| BPI | Gly48Arg | 0 | 0 | 8.26 | O | O |
| BRAP | Met142AsnfsTer5 | 0d | 0 | n/c | n/c | n/c |
| CAMK2B | Val549Ile | 0.0000568 | 2 | 12.65 | O | O |
| CD200R1L | Arg92His | 0.0000217 | 0 | 1.88 | O | O |
| CYP27B1 | His396Arg | 0.0000323 | 0 | 0.02 | O | O |
| DSC1 | Asp193Asn | 0.0000163 | 0 | 13.71 | O | O |
| DYSF | Glu1779Lys | 0.0000163 | 0 | 28.1 | O | O |
| FAM115C | Gln797* | 0.000053 | 2 | 37 | n/c | n/c |
| FPGT | Ile324Val | 0.000029 | 1 | 5.34 | O | O |
| GPAA1 | Glu576Lys | 0.0001845 | 0 | 24.1 | O | O |
| GPAT2 | Ser552Arg | 0d | 0 | 19.16 | O | O |
| IL21R | Gly4Asp | 0 | 0 | 11.75 | O | O |
| KNTC1 | Lys258Thr | 0d | 1 | 23.5 | O | O |
| LIMCH1 | Pro398Ser | 0.0000076 | 0 | 3.73 | O | O |
| NOM1 | Asp668ValfsTer10 | 0.0000122 | 1 | n/c | n/c | n/c |
| NRK | Arg730Gly | 0.0000198 | 0 | 10.42 | O | O |
| NRK | Pro1195Ser | 0 | 0 | 0 | O | O |
| NT5C | Arg36Leu | 0.0000042 | 1 | 9.43 | O LC | O |
| PLCB3 | Ser531Ile | 0.0001113 | 0 | 15.18 | O | O |
| RAVER1 | Ala463Thr | 0 | 0 | 16.98 | O | O |
| RBP2 | Arg105Cys | 0.0000217 | 0 | 34 | O | O |
| SFRP4 | Missense & splice: Met265Thr | 0 | 0 | 23.4 | O | O |
| SNX25 | Val633Leu | 0.0000072 | 0 | 21.1 | O | O |
| STON1-GTF2A1L | Pro820Ser | 0.0000165 | 0 | 23.1 | O | O |
| TECTA | Gly1576Val | 0.0000323 | 0 | 29.5 | O | O |
| THAP6 | Cys75Tyr | 0.0000231 | 0 | 12 | O LC | O |
| TTC30A | Gln138Arg | 0.0000041 | 0 | 10.91 | O | O |
| ZNF44 | Cys517Phe | 0d | 1 | 24.7 | O | O |

In the second prioritisation step, I hypothesised that the responsible variant should be in a gene expressed in megakaryocytes (MKs) and/or platelets in order to cause the observed phenotype. Genes with low levels of expression in MKs and platelets are less likely to harbour pathogenic variants affecting platelet function. I used RNA sequencing data for all known genes in MKs and platelets from the Blueprint epigenome project210 to rank the genes in table 6.3 by their expression level in MKs and platelets.
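Ranking by expression is a sort on each cell type's values, kept as two separate lists in the way columns 1 and 2 of table 6.4 are. The sketch below uses placeholder genes and invented log2 FPKM values. It does not use Blueprint data, and the optional `top_n` cut-off is an assumption about how a shortlist might be taken.

```python
def rank_by_expression(expression, top_n=None):
    """Gene names ordered from highest to lowest expression (log2 FPKM)."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical log2 FPKM values for three placeholder genes.
MK_LOG2_FPKM = {"GENE_A": 5.2, "GENE_B": -1.3, "GENE_C": 3.8}
PLATELET_LOG2_FPKM = {"GENE_A": 1.1, "GENE_B": 0.2, "GENE_C": 4.0}

print("MK:", rank_by_expression(MK_LOG2_FPKM))
print("Platelets:", rank_by_expression(PLATELET_LOG2_FPKM))
```

Keeping the two rankings separate avoids having to decide how to weight MK against platelet expression. As with GENE_A and GENE_C here, the two orders can disagree.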


Figure 6.2: Heatmap showing the expression in haematopoietic progenitor cells of the 29 genes harbouring rare coding variants listed in table 6.3. It illustrates the variety of expression levels among genes harbouring candidate variants in Hammersmith family 428. Figure courtesy of Luigi Grassi (RNA-seq data from the Blueprint project226).


Next, variants were prioritised if they were predicted to be pathogenic by three different pathogenicity scoring algorithms: CADD, SIFT and PolyPhen2. Five missense variants had a CADD score >10, were classified as deleterious by SIFT and were classified as probably or possibly damaging by PolyPhen2. The three variants predicted to result in premature stop codons were also automatically prioritised under this criterion (table 6.4, column 4). Finally, variants absent from both GnomAD and the in-house NIHR BioResource/UK Biobank/GEL pilot controls were highlighted (table 6.4, column 5).
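The combined pathogenicity criterion can be expressed as a single filter. This is a minimal sketch, assuming each variant has been annotated with VEP-style Sequence Ontology consequence terms and SIFT/PolyPhen prediction labels; the `Variant` class and function name are hypothetical, and treating SIFT "deleterious_low_confidence" calls as failing the filter is an assumption, not something stated in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Consequences treated as high impact (predicted premature stop codon).
HIGH_IMPACT = {"stop_gained", "frameshift_variant"}
DAMAGING_POLYPHEN = {"probably_damaging", "possibly_damaging"}


@dataclass
class Variant:
    gene: str
    consequence: str             # e.g. "missense_variant", "frameshift_variant"
    cadd_phred: Optional[float]  # None when CADD does not score the variant
    sift: Optional[str]          # e.g. "deleterious", "tolerated"
    polyphen: Optional[str]      # e.g. "probably_damaging", "benign"


def passes_combined_pathogenicity(v: Variant, cadd_cutoff: float = 10.0) -> bool:
    """High-impact variants pass automatically; missense variants must satisfy
    all three predictors (CADD above cutoff, SIFT deleterious, PolyPhen damaging)."""
    if v.consequence in HIGH_IMPACT:
        return True
    if v.consequence != "missense_variant" or v.cadd_phred is None:
        return False
    return (
        v.cadd_phred > cadd_cutoff
        and v.sift == "deleterious"  # assumption: low-confidence calls excluded
        and v.polyphen in DAMAGING_POLYPHEN
    )


# Hypothetical annotated variants for illustration.
examples = [
    Variant("GENE_X", "frameshift_variant", None, None, None),
    Variant("GENE_Y", "missense_variant", 24.1, "deleterious", "probably_damaging"),
    Variant("GENE_Z", "missense_variant", 24.1, "tolerated", "benign"),
]
print([v.gene for v in examples if passes_combined_pathogenicity(v)])
```

With these inputs the filter keeps `GENE_X` (truncating) and `GENE_Y` (concordantly damaging missense) and drops `GENE_Z`.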

Table 6.4: Summary prioritisation of candidate genes. Variants listed in table 6.3 are prioritised by five different methods. BRAP is highlighted in red as the only gene present in all five prioritisation lists. Column 1: genes ordered by expression in megakaryocytes. Column 2: genes ordered by expression in platelets. Column 3: top 10 genes ranked by CADD score, plus frameshift variants not scored by CADD. Column 4: genes scoring highly on all three pathogenicity prediction scores (i.e. high-impact variants, or missense variants with a CADD score >10 that are listed as deleterious by SIFT and as probably or possibly damaging by PolyPhen). Column 5: genes in which the variant is absent from both GnomAD and the NIHR BioResource. Gene expression is measured in log2 FPKM; data are from Blueprint.
Columns: MK expression | Platelet expression | CADD score | Combined pathogenicity | Absent from control databases
IL21R BRAP FAM115C FAM115C BRAP NT5C GPAA1 BRAP BRAP BPI GPAA1 FPGT NOM1 RBP2 GPAT2 KNTC1 NT5C RBP2 TECTA IL21R PLCB3 DYSF TECTA SFRP4 KNTC1 BRAP RAVER1 DYSF ZNF44 NRK FPGT NOM1 ZNF44 GPAA1 RAVER1 RAVER1 KNTC1 GPAA1 NOM1 RP11 NOM1 ATP8B4 ATP8B4 SFRP4 THAP6 STON1-GTF2A1L KNTC1 ZNF44 SFRP4


The prioritisation analysis identified BRAP as the only candidate gene that is highly expressed in MKs and platelets and carries a variant that both meets the combined pathogenicity criterion and is absent from all control datasets. The next step was to determine whether loss of BRAP function provides a biologically plausible explanation for the bleeding and platelet phenotype.
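Finding the gene common to all five lists is a set intersection. The sketch below assumes each prioritisation method yields a plain list of gene symbols; the list names and contents are hypothetical placeholders, not the actual columns of table 6.4. Counting how many lists each gene appears in is also useful, because it shows near-misses that satisfy four of the five criteria.

```python
from typing import Dict, List, Set


def genes_in_all_lists(lists: Dict[str, List[str]]) -> Set[str]:
    """Return the genes present in every prioritisation list."""
    sets = [set(genes) for genes in lists.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


def count_lists_per_gene(lists: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of prioritisation lists in which each gene appears."""
    counts: Dict[str, int] = {}
    for genes in lists.values():
        for gene in set(genes):  # count each gene at most once per list
            counts[gene] = counts.get(gene, 0) + 1
    return counts


# Hypothetical prioritisation lists for illustration only.
lists = {
    "mk_expression": ["GENE_A", "GENE_B"],
    "platelet_expression": ["GENE_B", "GENE_C"],
    "cadd": ["GENE_B"],
    "combined_pathogenicity": ["GENE_B", "GENE_A"],
    "absent_from_controls": ["GENE_B"],
}
print(genes_in_all_lists(lists))  # {'GENE_B'}
print(count_lists_per_gene(lists))
```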

6.2.3 BRCA-1 associated protein

The BRAP gene encodes BRCA1-associated protein (BRAP, also known as IMP and BRAP2). BRAP is a 592 amino-acid cytoplasmic protein containing a nuclear localisation signal (NLS)-binding domain at the N-terminus, two zinc-finger domains and a coiled-coil domain towards the C-terminus (fig 6.3). The variant identified in Hammersmith pedigree #428 is predicted to introduce a frameshift at Met142, leading to a premature stop codon. This could result either in a truncated protein lacking the critical zinc-finger domains or in nonsense-mediated decay of the transcript. BRAP has an established role in regulating the Ras/Raf/ERK signalling pathway235, which is important in platelets. Although no role has been established for BRAP itself in platelets or megakaryocytes, its expression is higher in platelets than in other haematopoietic cells (figure 6.4), suggesting that it may have an unidentified role there. BRAP can also bind galectin-2, a cytoskeletal protein, and interacts with several molecules involved in inflammation, including NF-κB236.

Figure 6.3: Diagram of the BRAP protein showing the major domains. NLS = nuclear localisation signal-binding domain, which can bind nuclear localisation signal motifs, implying a role for BRAP in regulating the transport of nuclear proteins. RING = Really Interesting New Gene zinc finger domain. ZNF = zinc finger domain. CC = coiled coil.


Figure 6.4: BRAP is highly expressed in platelets. Heatmap showing the expression of the BRAP canonical transcript (ENST00000327551) in haematopoietic progenitor cells. The expression in platelets and megakaryocytes is highlighted. Figure courtesy of Luigi Grassi (University of Cambridge), using RNA sequencing data generated by the BLUEPRINT epigenome project226.

6.2.3.1 BRAP in BRIDGE-BPD cases

A look-up in BRIDGE-BPD identified eight missense variants and a second frameshift variant predicted to introduce a premature stop codon in the BPD cohort. No consistent platelet phenotype was seen in cases with missense variants. The frameshift variant (p.Thr39AspfsTer37) was predicted to introduce a premature stop at residue 76. This case also had a phenotype similar to that of the Hammersmith family: bleeding, normal platelet count and size, and decreased platelet aggregation in response to ADP, arachidonic acid and epinephrine. A second case with a frameshift variant and a similar phenotype supports the pathogenicity of the Hammersmith variant, but co-segregation studies in the second family are crucial to confirm the pathogenicity of the second variant. The lack of phenotypic clustering among the missense variants is not unexpected, because a truncating frameshift variant would be expected to act through a different mechanism, and to produce a different phenotype, from a missense variant in the same gene.


Table 6.5: BRAP variants identified in BRIDGE-BPD cases. The Hammersmith case is marked with an asterisk (*).

| Variant | BPD phenotype | Alternative explanation |
|---|---|---|
| p.Thr39AspfsTer37 | PFD+ | |
| p.Met142AsnfsTer5 * | PFD+ | |
| p.Met162Ile | MTCP | GP9 |
| p.Ser174Gly | Dense granule defect; Neurological | |
| p.Met203Arg | MTCP | MYH9 |
| p.Ala214Gly | Dense granule defect | |
| p.Arg498Gln | TCP; PFD | RUNX1 |
| p.Val536Ile | TCP; Dense & alpha granule defect | novel gene, unpublished |
| p.Tyr539Cys | Bleeding only | |
| p.Arg587His | Bleeding only | |

Figure 6.5: BRAP variants identified in the NIHR BioResource. Variants identified in BPD cases are indicated above the diagram, with frameshift variants in red and missense variants in black. The frameshift variant identified in a non-BPD case in the NIHR BioResource is labelled in red below the diagram; other non-BPD variants are not labelled.

6.2.3.2 BRAP in non-BPD cases in the NIHR BioResource

The NIHRBR was also scrutinised for BRAP variants. Thirty-four variants were identified, including 25 missense variants, two predicted frameshifts and three splice-site variants. One frameshift variant (p.Thr39AspfsTer37) was also seen in the BPD case described above and is predicted to introduce a premature stop towards the N-terminus. The platelet phenotype of the two related non-BPD cases who carry this variant is unknown. The second frameshift variant (p.Glu456ArgfsTer35) is present in a non-BPD NIHRBR-RD case whose platelet phenotype is also unknown. This variant is predicted to produce a premature stop codon towards the C-terminus of the protein (fig 6.5). If it led to the formation of a truncated protein, that protein would retain the zinc-finger domains, unlike the N-terminal frameshift variants identified in the BPD collection. Both variants are seen in GnomAD with a MAF <1 in 10,000.

6.2.3.3 BRAP in the general population

The overall allele frequency of BRAP variants in GnomAD is 0.0071, which would predict 8.3 cases in the BRIDGE-BPD cohort. Nine cases with BRAP variants were observed in the BPD cohort, providing no evidence of enrichment of BRAP variants in this group (p=0.807). There are 24 BRAP variants in GnomAD predicted to cause loss of function (LOF): eleven predicted stop-gains, nine frameshifts (also predicted to introduce a premature stop) and four predicted splice donor or acceptor variants. The probability of loss-of-function intolerance (pLI) score for BRAP, calculated from the ExAC dataset, is 0.02, which suggests that BRAP is tolerant of loss of function117.
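The test behind p=0.807 is not specified here. As an illustration only, the sketch below swaps in a one-sided exact Poisson test of whether 9 observed carriers exceed the 8.3 expected. It is not expected to reproduce the reported value (it gives p≈0.45), but it leads to the same conclusion of no detectable enrichment. The function name is hypothetical.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected): a one-sided enrichment test."""
    if observed <= 0:
        return 1.0
    # Cumulative probability of seeing fewer than `observed` carriers.
    cdf = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - cdf)


p = poisson_upper_tail(observed=9, expected=8.3)
print(f"one-sided Poisson p = {p:.3f}")
```

A Poisson model is a reasonable approximation here because carriers of rare variants are few relative to the cohort size; a binomial or Fisher's exact test against GnomAD allele counts would be alternative choices.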

6.2.3.4 BRAP in genome-wide association studies

There is evidence that genes harbouring common variants associated with platelet traits (detected by GWAS) are also enriched for rare variants causing Mendelian disease237. I used this observation to prioritise candidate genes that had been identified in GWAS of platelet traits. Two of the 31 original candidates identified in Hammersmith pedigree #428 (table 6.3) were associated with platelet traits in GWAS: an intronic variant in BRAP (rs749237684) was associated with platelet count and plateletcrit (the volume fraction of blood occupied by platelets), and a 3’UTR variant in RAVER1 (rs74956615) was associated with plateletcrit237. Sanger sequencing of DNA from a grandchild of the proband (V/1) confirmed that the BRAP (p.Met142AsnfsTer5) variant was present in the child but the RAVER1 (Ala463Thr) variant was not, adding further evidence to prioritise BRAP as the top candidate gene in this family. BRAP SNPs have also been linked with coronary artery disease and stroke in two large GWAS; platelets play a central role in both disorders.

6.2.3.5 Conclusion:

BRAP emerged as the most likely candidate gene in Hammersmith family 428 based on its rarity in population and in-house datasets and its expression profile in megakaryocytes and platelets. There is also supporting evidence from GWAS associating BRAP with platelet traits and with cardiovascular disease, in which platelets play a key role. However, this candidate gene search was limited by its focus on coding variants only and by the lack of functional studies. Further work is essential to establish a mechanism by which a heterozygous loss-of-function variant in BRAP would lead to platelet dysfunction.

6.3 Identification of a potentially novel candidate gene for thrombocytopenia

A potentially novel thrombocytopenia gene was highlighted by applying the BeviMed method to the BRIDGE-BPD and NIHR BioResource collections. BeviMed was developed by Dr Daniel Greene for analysis of NIHR BioResource Rare Disease study data, to calculate the probability of association between variants and particular HPO phenotypes211. The method integrates information from allele frequency databases and pathogenicity prediction algorithms to estimate the probability of association between a locus and a phenotype, the mode of inheritance and the probability of pathogenicity of individual variants211. When applied to the BRIDGE-BPD collection, BeviMed inferred a high probability of association between rare, protein-altering variants in IKZF5 and thrombocytopenia. IKZF5 ranked 13th of all genes for probability of association with thrombocytopenia. Ten of the 12 highest-ranking genes were tier-1 BPD genes with well-established associations with thrombocytopenia, suggesting that the method was capturing genuine associations and supporting the likelihood that the IKZF5 association is also real.
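BeviMed's model is considerably richer: it also infers which individual variants are pathogenic and the mode of inheritance. As a deliberately simplified illustration of how a posterior probability of association arises from comparing two models, the sketch below uses a swapped-in Beta-Binomial comparison. In model 0, cases and non-cases share one case probability regardless of carrier status. In model 1, carriers of a rare variant in the gene have their own case probability. Both models use Beta(1,1) priors, and the prior probability of association (0.01) is an assumption. The function names are hypothetical; this is not the BeviMed implementation.

```python
import math


def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(k: int, n: int, a: float = 1.0, b: float = 1.0) -> float:
    """Log marginal likelihood of a given labelling with k cases among n
    individuals, integrating a case probability with a Beta(a, b) prior."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def posterior_association(k_carrier: int, n_carrier: int,
                          k_other: int, n_other: int,
                          prior: float = 0.01) -> float:
    """Posterior probability that carrier status affects case status."""
    # Model 0: one shared case probability for everyone.
    log_m0 = log_marginal(k_carrier + k_other, n_carrier + n_other)
    # Model 1: carriers and non-carriers have separate case probabilities.
    log_m1 = log_marginal(k_carrier, n_carrier) + log_marginal(k_other, n_other)
    log_odds = math.log(prior) - math.log1p(-prior) + log_m1 - log_m0
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Hypothetical counts: all 10 carriers are cases vs 100 of 1000 non-carriers.
print(posterior_association(10, 10, 100, 1000))
# Carriers affected at the background rate: posterior falls below the prior.
print(posterior_association(1, 10, 100, 1000))
```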


6.3.1 Ikaros zinc-finger 5

IKZF5 (Ikaros zinc finger 5, also known as Pegasus) is a 419 amino-acid protein whose function is poorly understood. It is a member of the Ikaros family of zinc finger-containing transcription factors (IKZF1-5). All Ikaros proteins are highly homologous and contain N-terminal and C-terminal zinc fingers. The two C-terminal zinc fingers mediate dimerisation, which in turn appears to facilitate protein interactions and DNA binding by the three N-terminal zinc fingers238. IKZF5 is highly homologous to the other Ikaros proteins except at the N-terminus, where it contains three rather than four zinc fingers and has distinct DNA-binding sites238,239 (fig 6.6 & 6.7). Other Ikaros proteins are established as important for lymphocyte development, but IKZF5 is more widely expressed and its role remains elusive239.

6.3.2 Variants identified in IKZF5 in BPD cases

Nine rare (MAF <1:1000), protein-altering variants in ten unrelated pedigrees were identified in the BRIDGE-BPD collection (table 6.6 and fig 6.7). All variants were predicted missense, and six were in pedigrees where the proband had thrombocytopenia with no associated platelet dysfunction or abnormal bleeding. The remaining three variants occurred in pedigrees with a normal platelet count, where the predominant phenotype was bleeding and/or platelet dysfunction. One missense variant (Gln16His) was present in two unrelated BPD probands and in several cases in other studies in the NIHR BioResource, and is also listed in GnomAD. It was present in three members of one BPD pedigree affected by bleeding and platelet dysfunction but absent from a fourth affected member, ruling Gln16His out as pathogenic in this family despite its relatively high CADD Phred score.


Table 6.6: IKZF5 variants identified in BRIDGE-BPD cases. Presence of thrombocytopenia and assessment of mean platelet volume (MPV) in the proband are given. n/a = not available.

| Pedigree ID | Variant effect | Thrombocytopenia in proband | MPV | GnomAD allele frequency | CADD Phred score |
|---|---|---|---|---|---|
| A010544 | Gln16His | n | normal | 0.0002491 | 27.4 |
| A007255 | Gln16His | n | normal | 0.0002491 | 27.4 |
| B200714 | Tyr89Cys | y | n/a | 0 | 33 |
| B200715 | Arg96Trp | y | n/a | 0 | 24.2 |
| A009243 | Gly134Glu | y | normal | 0 | 25.7 |
| A008207 | Cys140Arg | y | normal | 0 | 25.9 |
| A007192 | His155Tyr | y | normal | 0 | 26 |
| K002602 | Ser192Gly | n | decreased | 0 | 12.31 |
| A002748 | Ser200Gly | y | increased | 0 | 23 |
| A007327 | Pro266Leu | n | normal | 0 | 25.2 |

Four of the six variants found in thrombocytopenic cases are located within the N-terminal zinc finger domains, and one (Gly134Glu) lies at a highly conserved residue in the short linker between the second and third zinc finger domains (fig 6.7). Only one variant found in a thrombocytopenic case (Ser200Gly) lies neither in nor near the zinc finger domains. All three variants found in non-thrombocytopenic BPD cases are located outside the zinc finger domains.


Figure 6.6: Alignment of the three N-terminal zinc finger domains of IKZF5 and position of variants associated with thrombocytopenia in the BRIDGE-BPD collection. Human IKZF5 is aligned with IKZF proteins from other species and with other members of the Ikaros family. The sequence from residue 45 to 173 of human IKZF5 is shown. Zinc finger domains are highlighted in purple. A fourth zinc finger domain is visible in other Ikaros family members but is absent from IKZF5 in all species. Amino acid residues predicted to be altered in four thrombocytopenic BPD pedigrees are highlighted in red. Human = human sequences; Danre = Danio rerio; Xentr = Xenopus tropicalis. The alignment was created using the free online Clustal Omega software with sequences from UniProt; UniProt IDs are labelled in blue to the left of each sequence240,241.

Twenty-seven missense variants and one frameshift variant in IKZF5 were seen in non-BPD cases in the NIHR BioResource/UKBB/GEL pilot collection. Most of these variants (18) were listed in GnomAD, and only one (Gln16His) was also present in a BPD case, as described above. All but two of the non-BPD variants are located outside the N-terminal zinc finger domains (fig 6.7). These two, Pro136fs and Glu135Lys, lie between the second and third zinc fingers, occur in the same case and are likely to represent a single variant that has been called twice.


Figure 6.7: IKZF5 variants. Diagram of the IKZF5 protein structure, including the zinc-finger domains (dark blue), showing the relative positions of the variants discussed in the text. Variants identified in BPD cases are labelled above the diagram (thrombocytopenic cases in red, non-thrombocytopenic cases in grey). Variants in non-BPD cases enrolled in the NIHR BioResource-Rare Diseases are labelled below the diagram in black. Single-letter amino acid codes are used.

Co-segregation was sought in all pedigrees with thrombocytopenia; pedigree diagrams are shown in figure 6.8. Three pedigrees showed a clear autosomal dominant inheritance pattern, and in two pedigrees the variant appears to have arisen de novo (Cys140Arg and His155Tyr), with both parents confirmed homozygous for the wild-type allele. The presence of two de novo variants may be considered strongly supportive of pathogenicity122. In the remaining pedigree (Ser200Gly), there is not enough information to judge an inheritance pattern. In the pedigree with the Gly134Glu variant, one family member carries the variant but is not classified as thrombocytopenic: their platelet count is 184 ×10⁹/L, which is within the normal range (150-400 ×10⁹/L) although much lower than that of the unaffected mother (327 ×10⁹/L). This may represent variable penetrance of the IKZF5 variant, or alternatively the variant may not be pathogenic in this family. Co-segregation studies in all pedigrees are ongoing.
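Co-segregation evidence in a dominant pedigree is often quantified by counting informative meioses. The sketch below uses the standard simplification of full penetrance and a recombination fraction of zero, so each informative meiosis halves the probability that a non-causal variant tracks the phenotype by chance, and the LOD score is m·log10(2). This model is an assumption for illustration and not the method used in this study. The function names are hypothetical, and de novo events are not modelled. A carrier with a normal count, as in the Gly134Glu pedigree, would count against co-segregation under this model unless a penetrance parameter is added.

```python
import math


def chance_cosegregation(informative_meioses: int) -> float:
    """Probability that a non-causal variant co-segregates with a dominant
    trait across the given number of informative meioses by chance alone."""
    return 0.5 ** informative_meioses


def lod_full_penetrance(informative_meioses: int) -> float:
    """LOD score at recombination fraction 0 for a fully penetrant dominant
    trait with fully informative meioses: m * log10(2)."""
    return informative_meioses * math.log10(2)


for m in (1, 3, 5):
    print(m, chance_cosegregation(m), round(lod_full_penetrance(m), 2))
```

Under these assumptions, three informative meioses give a chance probability of 0.125 and a LOD of about 0.9, which illustrates why small pedigrees provide only supportive rather than conclusive evidence.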


Figure 6.8: BRIDGE-BPD pedigrees with IKZF5 missense variants associated with thrombocytopenia. Pedigrees are labelled by the IKZF5 amino acid substitution detected in the proband. Individuals shown in black have low platelet counts, those in white have normal platelet counts and those in grey have unknown platelet counts (at the time of writing). Probands are marked with a black arrow. Where individuals have had WES, WGS or Sanger sequencing, their genotype is labelled as variant/wild-type (v/+) or wild-type/wild-type (+/+).


6.3.3 Conclusion and further work

The statistical association between IKZF5 missense variants and thrombocytopenia in BPD cases, combined with the location of the thrombocytopenia-associated variants in highly conserved zinc finger domains, is compelling and identifies IKZF5 as a potentially novel inherited thrombocytopenia gene. However, further work is required to confirm the association and to establish a mechanism by which IKZF5 variants cause thrombocytopenia. Much of this work is currently underway. Co-segregation studies are continuing in all pedigrees with thrombocytopenia, including recall of patients and relatives for DNA samples and for blood samples to confirm thrombocytopenia and any additional platelet phenotype. Functional studies investigating altered DNA binding by mutant IKZF5 and the effect of morpholino knockdown on zebrafish thrombocytes are being conducted by Dr Kathleen Freson at the University of Leuven.

6.4 Investigation of a novel candidate BPD gene in three families with a similar bleeding phenotype.

An interim analysis of the first 650 sequenced BRIDGE-BPD cases revealed four unrelated cases carrying monoallelic variants in ROCK1. ROCK1 was highlighted because, although it was not a known BPD gene, it was known to be involved in key platelet signalling pathways. It was also notable that three of the unrelated probands shared a BPD phenotype that was uncommon across the cohort as a whole. All three had significant bleeding histories, specifically delayed-onset bleeding after procedures or trauma, and all had completely normal platelet function on routine laboratory testing (i.e. they were classified as bleeders only). The investigation of ROCK1 as a candidate BPD gene is detailed in the next chapter.

6.5 Discussion and further work

The three examples of gene discovery presented in this chapter all illustrate the value of analysing individual cases and variants as part of a larger dataset. Considering the variants and candidate genes present in an individual in the context of nearly 2000 other BPD cases and more than 11,000 non-BPD cases in the NIHRBR-RD/UKBB/GEL pilot collection gives these analyses their power. These non-BPD cases are as close to true controls as it is possible to get, because they were sequenced and had variants called in the same analysis pipeline. The comprehensive phenotype information available for all the BPD cases enabled the identification of genuinely similar cases in every example presented. Phenotype information for the non-BPD cases is more limited, but it can help exclude variants from a list of potential candidates if the phenotype is severe.

The case of Hammersmith family 428 highlighted the challenges of identifying the pathogenic variant even in a family with a well-defined platelet phenotype and three sequenced relatives. No rare coding variants in tier-1 genes were identified in this family, so it was necessary to employ a variety of prioritisation methods to compile a shortlist of candidate variants in potentially novel genes. These approaches have several limitations. Firstly, the widely used approach of excluding variants present in control datasets is potentially hazardous when dealing with platelet disorders. Platelet dysfunction is likely to be underdiagnosed in the general population because bleeding symptoms go unreported and uninvestigated, so it is plausible that variants responsible for platelet defects are present in individuals in both the NIHR BioResource and GnomAD datasets used in the BRIDGE analysis. Conversely, an individual variant may be absent from GnomAD and therefore prioritised as a candidate, yet its estimated MAF of zero is not truly representative if a different genomic variant causing the same or a different amino acid change at that locus is present, as in the example of the BRAP Met142fs variant. Secondly, only limited reliance can be placed on in silico pathogenicity prediction algorithms.
In the BRIDGE study we use the CADD score because it integrates other in silico prediction tools and has been shown to be more predictive than individual tools114, but it does not always agree with other widely used scoring systems such as SIFT115 and PolyPhen116, which also showed poor concordance with each other for the candidate variants in family 428. In the BRIDGE study we have not yet found a pathogenic variant with a CADD score below 10 (unpublished data), and this value has often been used as an arbitrary cut-off; however, it may inadvertently exclude pathogenic variants. Furthermore, the relationship between the magnitude of the CADD score and pathogenicity is unknown; for example, it is unclear whether a variant scoring 35 should be prioritised over one scoring 25.
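One reason the score-to-pathogenicity relationship is unclear is that CADD Phred scores are a rank transformation, not a calibrated probability: a scaled score S corresponds to the top 10^(-S/10) fraction of all possible reference-genome SNVs (10 = top 10%, 20 = top 1%). The sketch below converts scores to that fraction; the function name is hypothetical.

```python
def cadd_phred_to_top_fraction(phred: float) -> float:
    """Fraction of all possible reference-genome SNVs ranked at least as
    deleterious, under CADD's Phred scaling: phred = -10 * log10(rank / total)."""
    return 10 ** (-phred / 10)


for score in (10, 20, 25, 35):
    print(f"CADD {score}: top {100 * cadd_phred_to_top_fraction(score):.3f}%")
```

A score of 25 places a variant in roughly the top 0.3% of possible substitutions and 35 in roughly the top 0.03%. Both are extreme ranks, and the conversion says nothing about how much more likely the higher-scoring variant is to cause disease.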


One of the factors that highlighted BRAP as a candidate in family 428 was its identification in a GWAS of platelet traits237, on the basis that genes carrying common variants with small effects on platelet phenotypes may also harbour rare variants causing rare disease, as shown for GFI1B, DIAPH1 and ACTN1237. The association of SNVs in BRAP with cardiovascular disease, in which platelets play a major role, is also supportive. One concern about the BRAP Met142fs variant is that its prioritisation presumes that heterozygous loss of function of BRAP is sufficient to impair protein function. It is difficult to predict whether a LOF variant will be deleterious, but analysis of large datasets such as ExAC has enabled a score that predicts which genes are more or less likely to tolerate LOF: the pLI score111,242. BRAP has a low pLI score, suggesting that a heterozygous truncating variant will be well tolerated; however, pLI does not necessarily correlate with disease, and some established disease genes, such as BRCA1 and ITGA2B, have pLI scores of zero117. Finally, there are other plausible candidate genes in this family that are expressed in MKs and platelets, are absent from or present at extremely low frequency in reference datasets, and are predicted to have deleterious effects on the encoded protein. Co-segregation studies using Sanger sequencing of all shortlisted variants would be the most effective way of excluding variants from this list; however, this has been limited by a lack of unaffected relatives willing to participate in the study.

The potentially novel BPD gene IKZF5 (and the novel mode of inheritance in GP1BB described in chapter 4) were identified using statistical methods developed specifically for the BRIDGE study. New analytical methods were needed for this unique dataset, to account for extremely rare and heterogeneous diseases with different modes of inheritance and to encompass multiple phenotypic traits185,211. However, statistical methods can only demonstrate an association, and more evidence is required to confirm the pathogenicity of individual variants. This is a unifying conclusion from all the examples presented in this chapter. Co-segregation studies may confirm a statistical association in genes where a mechanism of disease is already established, such as GP1BB186, but on an individual basis they are laborious and not always feasible. Similar cases carrying the same variants may eventually be identified through ongoing genome sequencing studies such as the 100,000 Genomes Project, but probably not on a timescale useful for novel gene discovery.


Functional studies are required to confirm pathogenicity where there is no established mechanism of disease or too few reported cases, and they are essential to establish a novel BPD gene such as IKZF5.

A final and crucial limitation of the analysis to date has been its focus on the coding regions of the genome, which means that potentially pathogenic non-coding variants will have been excluded. This focus has been necessary because the consequences of variants affecting coding regions are much better understood, but the large number of whole genome sequencing datasets now available is gradually enabling better identification of pathogenic non-coding variants.


Identification and characterisation of ROCK1 as a candidate gene for undefined bleeding disorder.

In chapters 4 to 6 I have described the identification and classification of variants in tier-1 BPD genes and some of the strategies employed by the BRIDGE core analysis team to identify novel candidate genes for BPD. In this chapter I describe the identification of one potentially novel BPD gene, ROCK1, and attempts to provide experimental evidence supporting its association with BPD in cases enrolled in the BRIDGE-BPD study.

7.1 Rho-associated coiled-coil containing kinase 1: structure and function

The human ROCK1 gene (ENSG00000067900, GRCh37) is located on chromosome 18 (18:18,526,867-18,691,812). ROCK1 has six transcripts, of which ENST00000399799 is the canonical transcript, encoding the 1354 amino-acid ROCK1 protein across 33 exons (ENSP00000382697; UniProt Q13464). This transcript also appears to be the most highly expressed in blood cell lineages (fig 7.1). There are two ROCK isoforms, ROCK1 and ROCK2, which are encoded by separate genes and share 65% overall amino acid sequence identity, with more than 90% homology in their kinase domains243,244. Both isoforms are ubiquitously expressed but show some tissue-specific differences in expression and function, although the extent of redundancy between them in different tissues is not yet fully known243,245-247. ROCK1 is expressed at higher levels than ROCK2 in blood cells; it is highly expressed across a wide range of blood cell types and particularly highly in neutrophils210 (fig 7.1).


Figure 7.1: ROCK1 expression in haematopoietic cells. (A) Heatmap showing the expression levels of ROCK1 transcripts. ENST00000399799 is the canonical transcript and is highly expressed across all blood cell types, particularly in neutrophils. Non-coding transcripts are marked (nc). (B) Relative expression of ROCK1 and ROCK2 in 8 major haematopoietic progenitors. Heatmap in (A) courtesy of Luigi Grassi; all data from the BLUEPRINT epigenome project226.


ROCK1 contains an N-terminal kinase domain with an ~75 amino-acid tail at its C-terminal end, which regulates the accessibility of the ATP binding site248 (fig 7.2). This is followed by a coiled-coil domain containing the Rho-binding domain, and a pleckstrin homology domain at the C-terminus that is split by a cysteine-rich zinc-finger-like domain249. N- and C-terminal extensions of the kinase domain and a C-terminal coiled-coil region are critical for ROCK1 dimerisation248,250. Dimerisation appears to be important for ROCK1 function, as the dimeric structure favours ATP and substrate binding248. The coiled-coil domain also acts as a hinge, enabling the C-terminus to interact with and auto-inhibit the kinase domain (fig 7.2). ROCK1 is activated either by Rho binding or by caspase 3 cleavage of the C-terminus; both disrupt the interaction between the C-terminus and the kinase domain, enabling kinase activity251,252.

Figure 7.2: Human ROCK1 protein showing the major functional domains. DD: dimerisation domain. KT: kinase tail. RBD: Rho-binding domain. PHD: Pleckstrin homology domain. CRD: cysteine-rich zinc finger-like domain. Protein domains and their indicated positions were taken from Uniprot (ROCK1 Q13464)

ROCK1 and ROCK2 are downstream effectors of the small GTPase RhoA, and both function predominantly as regulators of the actin cytoskeleton through phosphorylation of downstream target proteins including myosin light chain (MLC), myosin light chain phosphatase 1 (MLCP1) and LIM kinase (LIMK). Consequently, ROCKs play a major role in cell adhesion, migration, apoptotic morphological changes and embryonic development in a variety of cell types246,252-255. In platelets, Rho GTPase-ROCK signalling and regulation of the actin cytoskeleton are required for integrin activation, granule secretion, platelet shape change and thrombus stability32. Studies in ROCK1 haploinsufficient mice have shown decreased neointima formation, vascular smooth muscle proliferation and leucocyte infiltration after vascular injury compared with wild-type mice256. The central role of ROCKs in cell migration and adhesion has made them a therapeutic target, and ROCK inhibitors have been developed to treat a wide range of diseases. ROCK inhibitors are anti-proliferative in a variety of human cell types257: they can decrease tumour invasiveness and metastasis, ameliorate pulmonary arterial hypertension and cerebral vasospasm, and reduce inflammation by inhibiting leucocyte migration and recruitment258. However, ROCK inhibitors are non-specific, inhibiting both ROCK isoforms and also other kinases at higher doses259.

7.2 Identification of ROCK1 variants in interim BRIDGE-BPD analysis

Analysis of the first ~600 cases sequenced in the BRIDGE-BPD study identified three unrelated probands with rare missense variants in ROCK1: p.Met156Leu, p.Arg403Cys and p.Arg403Leu (table 7.1, fig 7.3). Sequencing data from the three probands were scrutinised for variants in tier-1 genes, but no potential candidates were identified. Prior knowledge of ROCK1 as a key regulator of the platelet actin cytoskeleton prompted further investigation of its role in human BPD. The probands all shared the HPO term bleeding after procedure (HP:0011890), and two had delayed onset bleeding (HP:0040231). Closer scrutiny of the third case revealed that they also had delayed onset bleeding, in the form of a delayed post-partum haemorrhage and bleeding starting hours after minor trauma, but this had not been specifically coded. Cases B200248 and N006225 had an undefined bleeding disorder with normal platelet count and morphology and normal platelet function in clinical laboratory tests. Case B200436 was recorded as having abnormal aggregation in response to ADP and arachidonic acid; closer scrutiny revealed that this was a mild, inconsistent abnormality, and case B200436 was re-evaluated as having an undefined bleeding disorder, making all three phenotypes consistent. The three missense variants in BPD cases were all located in the kinase domain (fig 7.3).
• Met156 is a highly conserved residue in the ATP binding pocket. The Met156Leu variant had a high CADD pathogenicity score (28.6) and was also predicted deleterious and probably damaging by SIFT and PolyPhen respectively. The alternative allele was present at a frequency of 1 in 40,000 in ExAC. In the crystal structure of the ROCK1 kinase domain resolved by Doran et al, 2004248, Met156 is located in the hinge region of the kinase domain, in the ATP-binding site (fig 7.4). In this crystal structure, Met156 forms a critical hydrogen bond with the ATP-competitive inhibitor, suggesting that it may also be a critical residue for ATP binding. A methionine to leucine substitution here is therefore postulated to affect ATP binding.
• Arginine 403 sits in the kinase tail, which lies across the kinase domain and interacts with the dimerisation domain (fig 7.4). It is less well conserved but lies in a region known to be important for ROCK1 dimerisation and for regulating access to the ATP binding site248. Based on the crystal structure of the kinase domain, a cysteine substituted for the arginine at residue 403 could form a disulphide bond with a cysteine in the dimerisation domain, thereby disrupting the 3D conformation and dimerisation and/or access to the ATP binding pocket. Experimental studies have shown that deleting the kinase tail increases the Km for ATP, so a variant that disrupts the kinase tail may also disrupt ATP binding. The variant causing the Arg403Cys substitution was present at 1 in 13,000 in ExAC and had inconsistent pathogenicity predictions from in silico algorithms. The Arg403Leu variant was absent from ExAC, but the effect of this substitution on structure is uncertain; it has an intermediate CADD score and is predicted to be tolerated and benign by SIFT and PolyPhen (table 7.1).
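Allele frequencies appear in this chapter both as decimals (table 7.1) and as "1 in N", and a MAF threshold of 1:1000 was used to define rare variants for IKZF5. The small sketch below converts between the two forms and applies that threshold, using the ExAC frequencies from table 7.1. The helper names are hypothetical. An absent variant (frequency 0, as for Arg403Leu) has no "1 in N" form, so the helper rejects it rather than returning a misleading number.

```python
def one_in_n(allele_frequency: float) -> int:
    """Express an allele frequency as '1 in N' (N rounded to the nearest integer)."""
    if allele_frequency <= 0:
        raise ValueError("allele frequency must be positive")
    return round(1 / allele_frequency)


def is_rare(allele_frequency: float, threshold: float = 1 / 1000) -> bool:
    """Rare if below the MAF threshold (1:1000, as used for IKZF5)."""
    return allele_frequency < threshold


exac_af = {"p.Met156Leu": 2.5e-5, "p.Arg403Cys": 7.5e-5}  # from table 7.1
for variant, af in exac_af.items():
    print(variant, f"1 in {one_in_n(af):,}", "rare" if is_rare(af) else "not rare")
```

The exact conversion of 7.5×10⁻⁵ is 1 in 13,333, which the text rounds to 1 in 13,000.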


Table 7.1: ROCK1 variants identified in early analysis of BRIDGE-BPD cases. Allele frequencies in ExAC at the time of initial analysis are shown. CADD: CADD-phred score. SIFT analysis: O Deleterious; O Deleterious low confidence; O Tolerated. Polyphen analysis: O Probably damaging; O Possibly damaging; O Benign.

Case ID | Variant | Variant effect | Bleeding phenotype | ExAC AF | CADD | SIFT | Polyphen
B200436 | 18:18625377 T/G | p.Met156Leu | Delayed post-surgical bleeding | 2.5x10-5 | 28.6 | O | O
B200248 | 18:18608741 G/A | p.Arg403Cys | Delayed onset, severe bleeding post surgery | 7.5x10-5 | 26.9 | O | O
N006225 | 18:18608740 C/T | p.Arg403Leu | Delayed onset bleeding after trauma and post-partum. Severe epistaxis. Menorrhagia | Absent | 22.7 | O | O


Figure 7.3: ROCK1 protein annotated with BPD variants. The locations of the three BPD variants (Met156Leu, Arg403Cys and Arg403Leu) are indicated. The sequence alignments of ROCK1 in various species and human ROCK2 in these regions are expanded to show the conservation.


Figure 7.4: Predicted position of Met156 and Arg403 in the ROCK1 kinase domain. The crystal structure of the ROCK1 kinase domain (residues 6 to 415) was resolved bound to ATP-competitive inhibitors (illustrated here in the active site) by Jacobs et al, 2006. The ROCK1 protein dimer is shown with one monomer in grey and the other monomer coloured by region, showing the proposed head-to-head dimerisation (interacting dimerisation residues shown in red). The kinase domain is illustrated as N- and C-terminal kinase domains (dark and light blue) connected by a hinge (residues 154-158, orange), and the kinase tail is in yellow, shown lying across the kinase domain and interacting with the dimerisation domain. Met156 lies in the hinge region, in the ATP-binding pocket. R403 lies in the kinase tail, which interacts with the dimerisation domain. (Figure adapted from Jacobs et al, 2006²⁵⁰.)

7.3 Co-segregation

Co-segregation was sought in the families of all three probands. Only limited co-segregation data were available for each family, but the family history was compatible with an autosomal dominant bleeding disorder in all cases (fig 7.5). The ROCK1 variant co-segregated with the bleeding phenotype in the families of probands B200436 and B200248. In the family of N006225, the variant was present in one of the proband's daughters who did not have any bleeding tendency. The other two daughters carrying the variant both had definite abnormal bruising, epistaxis and bleeding after trauma since birth (fig 7.5).
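The co-segregation reasoning can be made explicit by listing the pedigree members whose genotype and bleeding phenotype disagree. A minimal sketch, using the N006225 pedigree as described above with hypothetical member labels; it checks genotype-phenotype concordance only and is not a formal linkage (LOD score) analysis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    label: str             # hypothetical identifier
    carries_variant: bool
    affected: bool         # significant, abnormal bleeding symptoms


def discordant(pedigree: List[Member]) -> List[str]:
    """Labels of members whose genotype does not match their bleeding phenotype.

    Carriers without bleeding may reflect incomplete penetrance or lack of
    haemostatic challenge; affected non-carriers argue more strongly against
    the variant.
    """
    return [m.label for m in pedigree if m.carries_variant != m.affected]


# Pedigree N006225 as described in the text (labels are hypothetical).
n006225 = [
    Member("proband", carries_variant=True, affected=True),
    Member("daughter_1", carries_variant=True, affected=False),
    Member("daughter_2", carries_variant=True, affected=True),
    Member("daughter_3", carries_variant=True, affected=True),
]

print(discordant(n006225))  # ['daughter_1']
```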

Figure 7.5: Co-segregation of ROCK1 variants in three BPD pedigrees. The ROCK1 Met156Leu and Arg403Cys variants and bleeding phenotype co-segregate together in the respective pedigrees, but the Arg403Leu variant does not co-segregate with the bleeding phenotype in all family members. Cases affected by significant, abnormal bleeding symptoms are marked in black; unaffected cases are in white; cases where the bleeding phenotype is uncertain are in grey. Where cases have had whole exome or Sanger sequencing, the presence of the ROCK1 variant or wild-type is labelled. V: presence of variant allele; +: presence of wild-type allele. The proband in the Arg403Cys pedigree died in childhood from severe bleeding prior to the start of the sequencing study.


7.4 INTERVAL donor search

The lack of a definitive laboratory phenotype and the small pedigrees prompted a search for additional individuals carrying ROCK1 variants. The Met156Leu and Arg403Cys variants were present in ExAC at allele frequencies of approximately 1 in 40,000 and 1 in 13,000 respectively, suggesting that carriers would be found in the general population. An application was made to the INTERVAL study of 50,000 blood donors260. The primary aim of this study was to establish the optimum, safe frequency of blood donation, but a secondary aim was to establish a national epidemiological resource. We applied to access the DNA samples, together with the corresponding full blood counts and bleeding histories, from all INTERVAL study participants. The aim was to identify healthy, well-phenotyped donors carrying the same ROCK1 variants as BPD cases. DNA samples would be selectively genotyped for the Met156Leu and Arg403 variants, and individuals carrying the variants could be invited for further assessment of their bleeding and platelet phenotype to assist assessment of variant pathogenicity. Based on ExAC allele frequencies, I estimated that between seven and ten INTERVAL donors would carry either the Met156Leu or the Arg403Cys variant. Eight INTERVAL participants were identified who carried the same ROCK1 variants, consistent with the population allele frequencies estimated from ExAC: six were heterozygous carriers of Met156Leu and two were heterozygous carriers of Arg403Cys. Unfortunately, recalling these donors required re-consent, which was not achievable within the timescale of the study.
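One simple way to estimate the expected number of carriers is to assume Hardy-Weinberg equilibrium, under which the heterozygote frequency is 2p(1-p) for an allele frequency p. The sketch below applies this to the ExAC allele frequencies in table 7.1 and a cohort of 50,000; the text does not state exactly how the seven-to-ten estimate was derived, so this is an illustration rather than the original calculation, and the function name is hypothetical.

```python
def expected_carriers(allele_freq: float, n_individuals: int) -> float:
    """Expected heterozygous carriers under Hardy-Weinberg equilibrium: N * 2p(1 - p)."""
    p = allele_freq
    return n_individuals * 2 * p * (1 - p)


N_DONORS = 50_000  # approximate size of the INTERVAL cohort
exac_af = {"Met156Leu": 2.5e-5, "Arg403Cys": 7.5e-5}  # ExAC AFs from table 7.1

for variant, af in exac_af.items():
    print(f"{variant}: AF 1 in {1 / af:,.0f} alleles, "
          f"~{expected_carriers(af, N_DONORS):.1f} expected carriers")

total = sum(expected_carriers(af, N_DONORS) for af in exac_af.values())
print(f"Total expected carriers: {total:.1f}")
```

With these inputs the sketch gives about 2.5 expected Met156Leu carriers and 7.5 Arg403Cys carriers, roughly ten in total. Note that carrier frequency is about twice the allele frequency for rare variants.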

7.5 Identification of ROCK1 variants in entire BRIDGE-BPD collection

As the study progressed, sequencing data from additional BPD cases were scrutinised for ROCK1 variants in July 2017. As before, variants were analysed if they had a MAF <1/1000 in reference datasets and a CADD score of at least 10. Seven heterozygous ROCK1 variants (six missense variants and one in-frame deletion) were identified in eight additional unrelated BPD cases, including a second case carrying the Arg403Cys variant. Variants and proband phenotypes are listed in table 7.2. As previously, three different algorithms were used to score the likely deleteriousness of each variant, and the presence of the same variant in non-BPD cases and in the GnomAD database was sought (table 7.2).
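The two filtering thresholds can be expressed as a simple predicate. A minimal sketch, assuming a single reference MAF per variant (the study used several reference datasets); the `Variant` record and function name are hypothetical, the first two entries use values from table 7.2, and the last two are hypothetical variants chosen to fail each threshold.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    case_id: str
    protein_change: str
    ref_maf: float     # minor allele frequency in the reference dataset
    cadd_phred: float  # CADD-phred score


MAF_MAX = 1 / 1000  # keep variants with MAF < 1/1000
CADD_MIN = 10.0     # keep variants with CADD-phred >= 10


def filter_variants(variants: List[Variant]) -> List[Variant]:
    """Keep rare variants that also meet the CADD threshold."""
    return [v for v in variants if v.ref_maf < MAF_MAX and v.cadd_phred >= CADD_MIN]


variants = [
    Variant("B200436", "p.Met156Leu", 4.07e-5, 28.6),            # table 7.2
    Variant("A013603", "p.Thr1334Ala", 1.27e-4, 15.96),          # table 7.2
    Variant("hypothetical_1", "common missense", 2.0e-3, 25.0),  # too common
    Variant("hypothetical_2", "rare, low CADD", 1.0e-5, 4.2),    # CADD too low
]

kept = filter_variants(variants)
print([v.case_id for v in kept])  # ['B200436', 'A013603']
```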


Three cases (carrying Arg403Cys, Leu312Phe and Ala905Val) had undefined bleeding disorders (UBD), similar to the original three ROCK1 cases. The Ala905Val variant did not co-segregate with the bleeding phenotype in the BPD family, and was also present in a BPD case with thrombocytopenia already explained by a GP9 variant and in two non-BPD cases, so it is unlikely to be pathogenic for the bleeding disorder. Four cases had thrombocytopenia as the primary platelet phenotype. Two of these were already explained by alternative pathogenic variants (case N013776 by a variant in MYH9 and A014068 by a variant in GP9), and in a third case the ROCK1 variant was not present in a relative with thrombocytopenia, so ROCK1 was not considered a candidate gene for thrombocytopenia in this pedigree. Two of the thrombocytopenic cases (A013603 and N013776) had a co-existent bleeding disorder with platelet dysfunction. Case A016021 had a familial thrombotic disorder and harboured a missense variant in a splice region of ROCK1 (p.Ser853Leu). This focussed interest on probands A010795 (p.Leu312Phe) and A012362 (p.Arg403Cys), who both had undefined bleeding disorders, and further information was sought on these pedigrees. No pedigree information was available for A010795, who was recruited from a non-UK centre, but the HPO terms encoded for this patient were subcutaneous haemorrhage, prolonged bleeding after surgery and bleeding with minor or no trauma. Platelet function was recorded as completely normal, which is consistent with the phenotype seen in the original BPD cases described above. Co-segregation was prioritised for proband A012362, who was recruited in the UK and is described in more detail in section 7.5.1 below.


Table 7.2: ROCK1 variants in BRIDGE-BPD cases. The eleven BRIDGE-BPD probands harbouring rare coding variants in ROCK1 (MAF <1/1000 in reference datasets) are listed. Non-BPD cases refers to the NIHR BioResource/UKBiobank/GEL pilot collection. Where an alternative pathogenic variant has been identified as an explanation for a case, the gene name is given. Bleeding and platelet phenotypes are summarised in the following categories: MTCP – macrothrombocytopenia; TCP – thrombocytopenia; UBD – undefined bleeding disorder (abnormal bleeding but no definable platelet defect using light transmission aggregometry); PFD – platelet function defect (any abnormal aggregation response to any agonist in LTA or prolonged PFA-100 closure times); PFD+ – PFD with additional evidence of granule secretion defect measured either by lumi-aggregometry or platelet nucleotide assay. SIFT analysis: O Deleterious; O Tolerated. Polyphen analysis: O Probably damaging; O Possibly damaging; O Benign.

Proband ID | Phenotype | ROCK1 variant | CADD | SIFT | Polyphen | GnomAD AF | Present in non-BPD case | Variant present in affected relatives | Alternative variant explanation
B200436 | UBD | p.Met156Leu | 28.6 | O | O | 4.07x10-5 | y | y |
A010795 | UBD | p.Leu312Phe | 23.6 | O | O | 0 | n | n/a |
B200248 | UBD | p.Arg403Cys | 26.9 | O | O | 6.74x10-5 | y | y |
A012362 | UBD | p.Arg403Cys | 26.9 | O | O | 6.74x10-5 | y | y |
N006225 | UBD | p.Arg403Leu | 22.7 | O | O | 0 | n | n |
A010819 | MTCP | p.Arg697Cys | 34 | O | O | 0 | n | n |
A016021 | Thrombosis | p.Ser853Leu | 35 | O | O | 4.13x10-6 | y | n/a |
A014068 | MTCP, PFD+ | p.Ala905Val | 26.6 | O | O | 4.47x10-5 | y | n/a | GP9
N013618 | UBD, neutropenia | p.Ala905Val | 26.6 | O | O | 4.47x10-5 | y | n |
A013603 | TCP, PFD+ | p.Thr1334Ala | 15.96 | O | O | 0.000127 | y | n |
N013776 | MTCP, bleeding | p.1022-1030 in-frame deletion | n/a | n/a | n/a | 0 | y | n | MYH9


7.5.1 Further investigation of pedigree A012362

BPD case A012362 had a history of easy bruising, prolonged bleeding from cuts and prolonged post-operative bleeding after tonsillectomy, hysterectomy (performed to treat menorrhagia) and knee replacement requiring transfusion. The bleeding score using the condensed MCMDM-1VWD BAT at the time of enrolment was 10. The time of onset of the post-operative bleeding is unknown. Laboratory investigations revealed normal platelet count and morphology, normal coagulation factors and normal platelet aggregation in response to all tested agonists (ADP, epinephrine, collagen, ristocetin and arachidonic acid). Bleeding histories of the proband's parents and sister were unknown since they had died at relatively young ages from illnesses unrelated to a bleeding disorder. One of the proband's daughters also had a severe bleeding history, with epistaxis, menorrhagia requiring hysterectomy and bleeding sufficient to require blood transfusion after two ectopic pregnancies. However, she reported dental extractions and laparoscopic cholecystectomy without bleeding complications. Her bleeding score was 10 and routine laboratory testing was all normal, similar to the proband. Two granddaughters of the proband are reported to have epistaxis and easy bruising, suggestive of a bleeding tendency, but have not been formally assessed (marked with ? in fig 7.6). A second daughter of the proband reported only easy bruising and had not had any bleeding after hysterectomy, placenta praevia or multiple wisdom tooth extractions. Routine laboratory tests were normal and her bleeding score was 5. She was labelled as unaffected by the bleeding tendency in the family (fig 7.6). The phenotype in this family is similar to that in the family of proband B200248, which carries the same variant (table 7.2). Sanger sequencing for the Arg403Cys ROCK1 variant showed that it was present in the affected daughter and absent from the unaffected daughter of the proband (fig 7.6).


Figure 7.6: Pedigree of proband A012362. Cases affected by significant, abnormal bleeding symptoms are marked in black; unaffected cases are in white; cases where the bleeding phenotype is uncertain are in grey. Where cases have had whole genome or Sanger sequencing, the presence of the ROCK1 variant or wild-type is labelled. V: presence of variant allele; +: presence of wild-type allele.

7.6 ROCK1 variants in control and reference populations

To support the hypothesis that ROCK1 missense variants are associated with UBD, the GnomAD database and NIHR BioResource Rare Disease/UKBiobank/GEL pilot collections (n=11,686) – hereafter referred to as non-BPD cases – were scrutinised for ROCK1 variants. I hypothesised that the frequency of ROCK1 variants would be higher or that the distribution of variants would be different in the BPD cases compared to the non-BPD and GnomAD populations.

7.6.1 ROCK1 variants in non-BPD cases

Forty-five rare ROCK1 variants were identified in 57 unrelated pedigrees in the non-BPD cases. Forty-three were missense variants, occurring across all ROCK1 domains, and two were predicted to be splice-acceptor variants (fig 7.7). Since the variants associated with undefined bleeding disorder in BPD cases were all located in the kinase domain and kinase tail, the non-BPD cases were particularly scrutinised for variants in these domains. Seven variants were identified in the kinase domain in nine unrelated pedigrees and are summarised in table 7.3. Two cases, one from the UK BioBank healthy donors and one from the primary immune disorders (PID) study, carried the same Met156Leu variant described above in a BPD case. It is notable that five of these variants have intermediate CADD scores and are absent from GnomAD, making them extremely rare. Three variants were identified in the kinase tail region (residues 409-425) in eight unrelated cases and are listed in table 7.4. Five of these cases, from the GEL pilot, neurodevelopmental disorders (NDD) and pulmonary arterial hypertension (PAH) studies, shared the same Arg403His variant, which is also listed in GnomAD with a MAF of 1/12,198. A Ser407Pro variant was present in a single case from the NDD study and also in a case from the steroid-resistant nephropathy (SRNS) study, although it can be excluded as pathogenic for the latter phenotype because the variant was not present in an affected relative. Only two predicted high-impact variants were present in the non-BPD collections. Both were splice-acceptor variants, predicted to affect splicing of exon 12, and occurred in single patients in the GEL and NDD studies. The fact that these two variants are predicted to have a similar effect on splicing but are present in cases with different rare disorders makes them unlikely to be pathogenic. The presence of the Met156Leu and Arg403Cys variants in the non-BPD cohort was not unexpected given their MAF in the GnomAD dataset; however, the presence of 7 unrelated cases with variants at Arg403 does not support the hypothesis that a variant at this residue causes a severe BPD phenotype. Unfortunately, no bleeding or platelet phenotype information was available for any of the non-BPD cases, so it is possible that some of them have a mild bleeding phenotype.
The Arg403 variants are unlikely to cause the other phenotypes for which the cases were entered into the other rare disease studies because each variant was absent from at least one affected relative.


Table 7.3: ROCK1 kinase domain variants in non-BPD cases. GEL: GEL pilot study; PID: primary immune disorders study; IRD: inherited retinal dystrophy study; PAH: pulmonary arterial hypertension study; UKBB: UK BioBank; NDD: neurodevelopmental disorders study.

Variant | Study | Number of sequenced cases in pedigree with variant | CADD | GnomAD AF | Comments
p.Gly328Cys | GEL | not in affected relative | 35 | 0 |
p.Val284Ile | PID | 1 | 25.4 | 0 |
p.Glu258Asp | IRD | 1 | 26.8 | 0 | Already explained
p.Asp187His | PID | 1 | 31 | 0 | Homozygous
p.Met156Leu | PAH | 1 | 28.6 | 4.34x10-5 | Present in BPD case
p.Met156Leu | UKBB | 1 | 28.6 | 4.34x10-5 | Present in BPD case
p.Arg147His | GEL | 1 | 24.7 | 8.3x10-6 |
p.Arg147His | NDD | 1 | 24.7 | 8.3x10-6 |
p.Arg147Ser | GEL | 2 | 33 | 0 |

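Table 7.4: ROCK1 kinase tail variants in non-BPD collections. SRNS: steroid-resistant nephropathy study.

Variant | Study | Number of sequenced cases in pedigree with variant | CADD | GnomAD AF | Comments
p.Ser407Pro | NDD | 1 | 23 | 5.29x10-5 |
p.Ser407Pro | SRNS | not in affected relative | 23 | 5.29x10-5 |
p.Arg403His | GEL | not in affected relative | 22.9 | 8.2x10-5 |
p.Arg403His | NDD | 1 | 22.9 | 8.2x10-5 |
p.Arg403His | PAH | 1 | 22.9 | 8.2x10-5 | Already explained
p.Arg403His | GEL | 1 | 22.9 | 8.2x10-5 |
p.Arg403His | GEL | 1 | 22.9 | 8.2x10-5 |
p.Arg403Cys | GEL | not in affected relative | 26.9 | 6.74x10-5 | Present in 2 BPD cases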

7.6.2 ROCK1 variants in wider populations

There are 347 different ROCK1 variants in the GnomAD database, distributed throughout the functional domains of the ROCK1 protein (fig 7.7). Two variants (Thr773Ser and Thr112Pro) occur in more than 1 in 1000 individuals and were excluded from comparative analysis with the BPD and non-BPD cases, which only include variants with a MAF<1:1000 in reference datasets. ROCK1 has a calculated pLI score of 1, which suggests that it is extremely likely to be intolerant of loss-of-function variation117, and such variants are rarely seen in the GnomAD dataset: there were seven frameshift or stop-gain variants, occurring throughout the protein, none with an allele frequency greater than 5.23x10-6. In addition, there were eight in-frame deletions, all located in the coiled-coil domain, five splice-donor variants and 327 missense variants (fig 7.7). The Z-score for the number of observed vs. expected ROCK1 missense variants in the ExAC database is 4.96, indicating that there were fewer ROCK1 missense variants than would be expected if mutations occurred randomly111. This suggests increased constraint and low tolerance of missense variation in ROCK1. Forty alternative alleles were present at Arg403, producing Arg403Cys and Arg403His variants with a combined MAF of 1.4x10-4. Met156Leu has an allele count of 12 in GnomAD (MAF 4.34x10-5) and Leu312Phe was not seen.
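The allele frequencies quoted in this section are allele counts divided by the number of alleles called at a site. The sketch below reproduces the order of magnitude of the reported figures, assuming every GnomAD individual was called at the site (allele number 2 × 138,632); real per-site allele numbers are usually somewhat lower, which is consistent with the reported Met156Leu MAF (4.34x10-5) being slightly higher than this estimate. The function names are hypothetical.

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Allele frequency = alternative allele count / total alleles called."""
    return allele_count / allele_number


def one_in(af: float) -> int:
    """Express an allele frequency as '1 in N' alleles."""
    return round(1 / af)


# Assumption: all 138,632 GnomAD individuals were called at the site (diploid).
AN_MAX = 2 * 138_632

arg403_af = allele_frequency(40, AN_MAX)  # 40 alternative alleles at Arg403
met156_af = allele_frequency(12, AN_MAX)  # 12 Met156Leu alleles

print(f"Arg403 (combined): {arg403_af:.2e}, about 1 in {one_in(arg403_af):,} alleles")
print(f"Met156Leu: {met156_af:.2e}, about 1 in {one_in(met156_af):,} alleles")
```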


Figure 7.7: Distribution of rare variants in the ROCK1 protein. The location of rare (MAF<1:1000) ROCK1 variants in the GnomAD database (top, blue), non-BPD cases in the NIHR BioResource Rare Diseases/UK BioBank/GEL pilot collection (middle, green) and BRIDGE-BPD cases (bottom, red) is shown along the X-axis. The ROCK1 protein schematic is shown to indicate the locations of the variants relative to the ROCK1 domains. The relative proportion of missense variants at each amino acid residue is indicated by blue (GnomAD) and green (non-BPD) bars. The locations of BPD missense variants listed in table 7.2 are indicated by red bars; for simplicity, the relative frequency of each variant is not given because the sample size is much smaller than in the other collections. Variants associated with undefined bleeding disorder in BPD cases are marked with an asterisk. High-impact variants are indicated by •: splice-donor or acceptor; x: stop-gain or frame-shift; p: in-frame deletion.

7.6.3 Statistical comparison of BPD ROCK1 variants with other populations

The cumulative allele count of rare ROCK1 variants was 1473 in the 138,632 individuals in the GnomAD database (1.06%). Based on this frequency, 12.4 individuals heterozygous for a rare ROCK1 variant would be expected among the 1169 whole-genome-sequenced individuals in the BPD collection. A total of nine variants were seen in 11 BPD cases, indicating that there is no enrichment of ROCK1 variants in BPD cases compared to the general population (fig 7.8A). The frequency of rare variants in each ROCK1 domain was compared between the BPD, non-BPD and GnomAD collections to see if there was any enrichment of variants in a particular domain in the BPD cases (fig 7.8B). In all three collections, variants were most frequently seen in the coiled-coil domain. This is not surprising, because the coiled-coil domain is the largest domain, comprising ~600 amino acids, and is only known to have a spacer function. In the BPD collection, the frequency of cases harbouring variants in the kinase domain (KD) and kinase tail (KT) was higher (allele count 2 for KD and 2 for KT in 1169 cases) than in the non-BPD cases (AC(KD)=8 and AC(KT)=7 in 11,686 cases) and in GnomAD (AC(KD)=114 and AC(KT)=197 in 138,632 cases). However, these differences did not reach statistical significance (Fisher’s exact test p>0.05).
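The expected count above follows from scaling the GnomAD rate of rare ROCK1 alleles to the size of the BPD collection. The text does not name a formal test for this comparison; as an illustration only, the sketch below adds a one-sided Poisson tail probability of observing at least 11 carriers, assuming each rare allele in GnomAD is carried by a different heterozygous individual. The function name is hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


gnomad_rate = 1473 / 138_632   # rare ROCK1 alleles per GnomAD individual (~1.06%)
expected = gnomad_rate * 1169  # expected carriers among 1169 BPD cases
observed = 11                  # BPD cases carrying a rare ROCK1 variant

p_enrichment = poisson_sf(observed, expected)
print(f"expected {expected:.1f}, observed {observed}, "
      f"one-sided P(X >= {observed}) = {p_enrichment:.2f}")
```

Because fewer carriers were observed than expected, the tail probability is large, in line with the conclusion that there is no enrichment.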


Figure 7.8: ROCK1 variants in BPD, non-BPD and GnomAD cases. (A) Bar graph showing the proportion of individuals in the GnomAD database, the non-BPD in-house studies and the BRIDGE-BPD study who carry rare, heterozygous missense variants in ROCK1. (B) Bar graph showing the frequency of rare ROCK1 missense variants in each collection, categorised by ROCK1 domain. The location of variants was not significantly different between the collections using Fisher’s exact test. ROCK1 domains were defined by the following amino acids: N-terminus: 0-75; kinase: 76-338; kinase tail: 341-405; coiled-coil: 409-1022; pleckstrin homology domain (PHD): 1118-1317; C-terminus: 1318-1354. Non-missense variants were not included.
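Fisher’s exact test for a 2×2 table sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed one. A self-contained sketch of the two-sided test is given below (in practice `scipy.stats.fisher_exact` provides the same test); the example uses the kinase-domain counts quoted in the text and assumes that each allele corresponds to one carrier. The function names are hypothetical.

```python
from math import comb


def _table_prob(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of `a` in the top-left cell, given fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are collections (e.g. BPD vs non-BPD); columns are carriers vs
    non-carriers. Tables no more probable than the observed one are summed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = _table_prob(a, row1, col1, n)
    p = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        px = _table_prob(x, row1, col1, n)
        if px <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += px
    return min(p, 1.0)


# Kinase-domain carriers (counts from the text, one allele per carrier assumed):
# BPD 2 of 1169 cases vs non-BPD 8 of 11,686 cases.
p_kd = fisher_exact_two_sided(2, 1169 - 2, 8, 11_686 - 8)
print(f"Kinase domain, BPD vs non-BPD: two-sided p = {p_kd:.2f}")
```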


In summary, the analysis of the genomic distribution of ROCK1 variants in human populations indicates that they occur throughout the protein, with no particular domain preference between groups, and that there is no enrichment of ROCK1 variants in BPD cases. Variants in cases with UBD are located in the kinase domain and kinase tail, but the sample size is too small to ascertain whether this is significant. This analysis has also highlighted the frequency of variants at Arg403, making it unlikely that they cause a severe phenotype. Population data for ROCK1 suggest that missense and loss-of-function variants in ROCK1 are poorly tolerated. ROCK1 LOF variants would therefore be predicted to cause a severe phenotype in humans, which is supported by the observation that high-impact variants are rare in the BPD and non-BPD collections.

7.7 Investigation of the functional effects of ROCK1 variants

The identification of rare ROCK1 kinase domain variants in association with a similar undefined bleeding disorder phenotype in four unrelated pedigrees was interesting but inconclusive without supporting functional studies. This prompted an experimental plan to investigate the effect of the ROCK1 variants identified in BRIDGE-BPD cases on ROCK1 functions in vitro. My aim was to over-express wild-type and variant ROCK1 in HEK293T cells and measure effects on cell migration, the actin cytoskeleton and ROCK1 kinase function.

7.7.1 Expression of ROCK1 in HEK293T cells

ROCK1 cDNA was cloned into a pcDNA3.1-B vector, in-frame with a His tag at the C-terminus (hereafter referred to as ROCK1-His), as described in section 2.6.2. HEK293T cells cultured in 6-well plates were transfected with either 2500ng or 1250ng ROCK1-His per well to establish the optimal amount. A dose of 1250ng was sufficient to induce ROCK1 over-expression and was used in all subsequent ROCK1 transfections. Transfected ROCK1 was detected with both ROCK1 and His antibodies, and transfected HEK293T cells over-expressed ROCK1 compared to non-transfected control cells (fig 7.9).


Figure 7.9: Overexpression of His-tagged wild-type ROCK1 in HEK293T cells. Cell lysates were collected at 48 hours post-transfection and subjected to Western blot then incubated with ROCK1 or His antibodies. WT: wild-type.

7.7.1.1 Creation of ROCK1 mutants

Site-directed mutagenesis was performed to create mutant ROCK1-His plasmids containing the ROCK1 variants identified in BRIDGE-BPD cases associated with undefined bleeding disorder (UBD). Two additional mutants were also planned for comparison: firstly, a variant that introduces a premature stop codon at tyrosine 405 (ROCK1-Y405*), leading to a truncated protein that is constitutively active; secondly, a variant substituting glycine for lysine at position 105 (ROCK1-K105G), which leads to a ‘kinase dead’ ROCK1 protein261. Successful mutagenesis was confirmed for all variants by Sanger sequencing, but at the time of writing only ROCK1-Y405* and ROCK1-M156L expression in HEK293T cells had been confirmed by Western blotting (table 7.5).

Table 7.5: ROCK1 mutants created by site-directed mutagenesis of the ROCK1-His plasmid. UBD: undefined bleeding disorder.

Variant | Rationale | Confirmed by Sanger sequencing | Confirmed expression in HEK293T cells
Y405* | Kinase active | y | y
K105G | Kinase dead | y | n
M156L | Present in BRIDGE-BPD UBD pedigree B200436 | y | y
R403C | Present in BRIDGE-BPD UBD pedigrees B200248 & A012362 | y | n
R403L | Present in BRIDGE-BPD UBD pedigree N006225 | y | n
L312F | Present in BRIDGE-BPD UBD pedigree A010795 | y | n

7.7.1.2 Expression of ROCK1 mutants Y405* and M156L in transfected HEK293T cells is time-dependent

Cell lysates collected at 24, 48 and 72 hours from HEK293T cells transfected with wild-type, kinase-active (Y405*) and Met156Leu mutant ROCK1 showed that expression of wild-type and M156L ROCK1 increased over time compared to control HEK293T cells (fig 7.10). Immunoblotting with His antibodies confirmed that this was due to the transfected, not the endogenous, ROCK1. ROCK1-Y405* was not detected by anti-His or anti-ROCK1 antibodies because this variant produces a truncated protein that lacks both the C-terminal residues recognised by the ROCK1 antibody and the His tag. However, the Y405* mutant does appear to have some effect, because endogenous ROCK1 expression appears to decrease over time while the intensity of other bands increases. Non-specific bands are detected with increasing intensity in the WT and Y405* lysates but not in the M156L lysate.


Figure 7.10: Expression of transfected ROCK1 increases over time. Cell lysates were collected at the indicated time-points from HEK293T cells transfected with wild-type ROCK1 (WT), kinase-active ROCK1 (Y405*) and ROCK1 with the Met156Leu variant (M156L), and Western blotting was performed with anti-ROCK1 and anti-His antibodies. Anti-GAPDH was used as a control.

7.7.2 ROCK1 effects on wound closure

In order to investigate the effects of the ROCK1 variants on cell migration, a scratch wound healing assay was performed in HEK293T cells transfected with ROCK1 wild-type and mutants.

7.7.2.1 Over-expression of wild-type ROCK1 in HEK293T cells reduces wound closure

Scratch tests were performed on HEK293T cells 24hrs after transfection with WT ROCK1-His, when cells were in a confluent monolayer. Wound area was measured at the time of the scratch and then at 24hr intervals for either 48 or 72 hours. Wound area was measured at three locations at each time-point and the percentage wound closure was calculated (as described in chapter 2, section 2.6.7). HEK293T cells overexpressing ROCK1 (WT-ROCK1-His) had slower wound closure than non-transfected cells (fig 7.11).
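Section 2.6.7 is not reproduced here, so the sketch below assumes the common definition of percentage wound closure, (A0 - At) / A0 × 100, where A0 is the wound area at the time of the scratch and At the area at a later time-point, averaged over the three measurement locations. The wound areas and function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def percent_closure(area_t0: float, area_t: float) -> float:
    """Percentage of the initial wound area closed at a later time-point."""
    return (area_t0 - area_t) / area_t0 * 100.0


def mean_percent_closure(areas_t0: Sequence[float], areas_t: Sequence[float]) -> float:
    """Average closure over paired measurement locations along one scratch."""
    if len(areas_t0) != len(areas_t):
        raise ValueError("each location needs a baseline and a follow-up area")
    return mean(percent_closure(a0, at) for a0, at in zip(areas_t0, areas_t))


# Hypothetical wound areas (arbitrary units) at three locations.
baseline = [1.00, 0.95, 1.05]
at_48h = [0.40, 0.38, 0.42]
closure = mean_percent_closure(baseline, at_48h)
print(f"{closure:.1f}% wound closure at 48 h")
```

Averaging per-location percentages, as here, weights each location equally; an alternative is to sum the areas first and compute a single percentage.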


Figure 7.11: HEK293T cells transfected with wild-type ROCK1 have reduced wound closure compared to non-transfected cells. HEK293T cells were cultured in 6-well plates and transfected with 1250ng wild-type ROCK1 when in a confluent monolayer. Scratch was performed 24hrs post-transfection and wound area measured at 0, 24 and 48hrs. (A) Representative images of scratch wound taken at indicated time- points (B) Bar graph showing the mean percentage wound closure at 48 hours from three separate experiments. WT: cells transfected with wild-type ROCK1. Statistical comparison was performed using an unpaired t-test, p-value shown.
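The unpaired t-test used for figure 7.11 compares two group means relative to the variability within groups. A minimal sketch of the test statistic, assuming the classical equal-variance (Student's) form; the p-value would then come from the t distribution with n1 + n2 - 2 degrees of freedom, for example via `scipy.stats.ttest_ind` with `equal_var=True`. The closure values and function name are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def students_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Unpaired t statistic with pooled (equal) variance, and its degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical % wound closure at 48 h in three independent experiments.
non_transfected = [62.0, 58.0, 60.0]
wt_rock1 = [45.0, 50.0, 40.0]
t, df = students_t(non_transfected, wt_rock1)
print(f"t({df}) = {t:.2f}")
```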

7.7.2.2 Overexpression of wild-type and mutant ROCK1 reduces wound closure and effects are time-dependent

HEK293T cells transfected with WT-ROCK1-His, ROCK1-Y405* and ROCK1-M156L exhibited slower wound closure than cells transfected with the empty pcDNA3.1(-)B vector. There was no difference in mean wound closure between non-transfected cells and cells transfected with the empty vector, suggesting that the effect of WT and mutant ROCK1 over-expression is due to ROCK1 itself and not to the transfection procedure (fig 7.12). An overall difference between the groups was already present at 24 hours post-scratch (ANOVA p=0.0347); however, differences between individual groups were not evident until at least 48 hours (ANOVA p=0.0017; multiple comparison testing shown in fig 7.12). Wound closure in cells overexpressing WT-ROCK1-His and ROCK1-M156L was reduced compared to non-transfected cells (p<0.05). Wound closure in cells overexpressing ROCK1-M156L was also significantly reduced compared to cells transfected with the empty vector at 48 hours (p<0.05; fig 7.12). ROCK1-Y405* and ROCK1-M156L did not have a significant effect on wound closure compared to WT-ROCK1-His (fig 7.12).

Figure 7.12: Effects of wild-type and mutant ROCK1 on wound closure after 24 and 48 hours. Bar graphs showing the mean percentage wound closure at 24hrs (n=2) and 48hrs (n=3) post scratch test. N=1 for the M156L experiment. Statistical comparison between groups was performed by analysis of variance (ANOVA) followed by Tukey’s multiple comparison testing. * indicates a significant difference between the means of the two indicated groups (p<0.05).
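One-way ANOVA compares the spread of group means with the spread of replicates within groups. A stdlib-only sketch of the F statistic is given below; p-values and Tukey’s post hoc comparisons would in practice come from a statistics package (for example `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`). The wound-closure values and function name are hypothetical. A group with a single replicate contributes no information to the within-group variance.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical % wound closure at 48 h (three replicates per condition).
closure_48h = {
    "non-transfected": [62.0, 58.0, 60.0],
    "empty vector": [61.0, 57.0, 59.0],
    "WT-ROCK1": [45.0, 50.0, 40.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(closure_48h.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```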

7.8 Discussion

ROCK1 was identified as a candidate BPD gene because of prior knowledge of the importance of the RhoA-ROCK pathway in platelet signalling and the identification of variants in three probands with a similar bleeding phenotype. The positions of the p.Met156Leu and p.Arg403Cys variants predicted that they could affect ATP binding and ROCK1 dimer conformation. A major limitation of this study has been the lack of a well-defined, reproducible phenotype against which to assess co-segregation. Bleeding is subjective and can be influenced by heritable and non-heritable factors; there is no well-defined, reproducible test for it, and its detection often requires exposure to a haemostatic challenge. As a result, the lack of co-segregation of the Arg403Leu variant with bleeding in the N006225 pedigree may be due to variable penetrance, other factors affecting bleeding or differential exposure to bleeding risk. We are therefore left uncertain, in all pedigrees discussed in this chapter, whether the co-segregation is ‘true’ or occurring by chance. The presence of candidate variants in population datasets such as GnomAD is often used as an argument against pathogenicity. This may be reasonable for severe, visible, congenital phenotypes; however, bleeding symptoms are frequently under-reported or undiagnosed because of lack of risk exposure, and may present later in life, so they are more likely to appear in ‘healthy’ population datasets. This is why the Arg403Cys and Met156Leu variants were not immediately excluded from further analysis on the basis of their frequency in ExAC/GnomAD. Conversely, some variants absent from GnomAD cannot be pathogenic for a severe BPD: for example, the Arg697Cys variant is carried by a case with macrothrombocytopenia but is absent from another affected relative. However, it remains possible that this variant acts as a modifier contributing to the proband’s presentation. The ROCK1 study again illustrates the limitations of pathogenicity scoring systems. Concordance between CADD, SIFT and Polyphen predictions for the variants identified in BPD cases is variable, and scores do not necessarily correlate with pathogenicity. For example, the Arg697Cys variant has a CADD-phred score of 34, placing it in the top 0.1% of all variants, but co-segregation studies do not support its pathogenicity.
The lack of bleeding and platelet phenotype information for non-BPD cases makes the ROCK1 variants in these cases difficult to interpret, but most are unlikely to be pathogenic for the presenting phenotypes because they either do not occur in affected relatives or the phenotypes in cases with the same variant are too different to share a genetic cause. It is possible that some of the ROCK1 variants listed cause mild bleeding that is undiagnosed or unreported. Similarly, two BPD cases, A014068 and N013776, had macrothrombocytopenia in association with a platelet function defect and bleeding. Pathogenic variants in MYH9 and GP9 had already been identified as the cause of the macrothrombocytopenia, but the potential influence of these variants on the bleeding symptoms is unknown without further studies. Rare disease studies traditionally set MAF limits of 1 in 1000 or even 1 in 10,000; however, this is likely to exclude some pathogenic variants causing milder or less well-defined disorders. Analysis of the genomic landscape of ROCK1 identified neither an enrichment of ROCK1 variants in the BPD cohort nor any domain specificity, but the analysis was limited by the very different cohort sizes and the small number of BPD variants. All the variants in cases with UBD are located in the kinase domain or kinase tail, but being in the same region does not imply that they will have the same impact.

The identification of four ROCK1 variants in five pedigrees with similar phenotypes was an interesting association, although inconclusive without functional studies to confirm the link between variant and phenotype. Experimental studies were therefore planned to confirm the effect of the ROCK1 variants identified in BPD cases on known ROCK1 functions. ROCK1 was cloned into an expression vector with a myc-His tag at the C-terminus, and ROCK1 mutants were created using site-directed mutagenesis. This process was more problematic than anticipated: ROCK1 was extremely susceptible to mutation, and it took repeated attempts to clone full-length ROCK1 cDNA into the expression vector without any unwanted mutations. A similar problem was found in ROCK1 plasmids gifted from another lab and purchased from a specialist commercial company: both contained mutations within the ROCK1 open reading frame (ORF) that did not substantially change its size, and were therefore not detected by restriction digest but only by Sanger sequencing of full-length ROCK1. Site-directed mutagenesis (SDM) also frequently introduced random off-target mutations into ROCK1 that were unrelated to the primers used. Unwanted variants were introduced at the cloning, SDM and transformation stages. The most consistent success was eventually achieved by transforming into DH10B cells at 30°C. Transfection of HEK293T cells resulted in over-expression of WT-, M156L- and Y405*-ROCK1, with expression levels increasing over time. One limitation is that neither the ROCK1 nor the His antibody could detect the Y405* variant, because it is a truncated protein lacking the C-terminus; however, the changing band pattern over time, compared with non-transfected cells and cells transfected with WT-ROCK1 and M156L-ROCK1, suggests an effect of the transfected protein.
Overexpression of WT and mutant ROCK1 was also associated with reduced wound healing in preliminary scratch assay experiments. M156L appears to have the greatest effect on wound closure: only cells transfected with M156L-ROCK1 had significantly less wound closure at 48 hours than control cells transfected with the empty vector, although the difference compared with WT-ROCK1 was not significant. Repeating these experiments, especially the M156L-ROCK1 transfections alongside controls, would be necessary to confirm this possible association, but the result supports the hypothesis that ROCK1 overexpression affects cell migration, as expected from the literature 246,258. It also suggests an effect of the Met156Leu variant on ROCK1 activity. If M156L-ROCK1 does indeed inhibit wound closure through inhibition of ROCK1 activity, then one would expect Y405*-ROCK1 (which has constitutive kinase activity261) to have the opposite effect and increase wound closure. There was a trend towards this in preliminary experiments, but further work is needed to investigate it. Unfortunately, in the time available I was not able to perform more than these preliminary experiments, which provide proof of principle that the Met156Leu variant may impact ROCK1 function. Although incomplete, these experiments demonstrate that ROCK1 variants and ROCK1 expression levels can affect cellular function. It is not clear how altered ROCK1 function might cause the bleeding phenotype seen in BPD cases. RhoA-ROCK signalling has an established role in platelet function31,32, although the bleeding in these cases does not appear to be driven by platelet dysfunction, since their laboratory platelet function tests were normal. This does not rule out a role for platelets, however, since ROCK1-deficient mouse platelets have increased actin polymerisation, phosphatidylserine exposure and thrombin generation whilst maintaining normal platelet aggregation, shape change and ATP secretion in response to collagen262.
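The wound-closure comparisons discussed above depend on how closure is expressed. A common convention, assumed here, is the percentage reduction in open wound area relative to time zero, closure(%) = 100 × (A0 − At) / A0. The sketch below applies that convention and summarises replicates; the function names and area values are hypothetical. A sample standard deviation needs at least two replicates, which is one practical reason why the repeats proposed in the future work are required.

```python
from statistics import mean, stdev


def percent_closure(area_0h: float, area_t: float) -> float:
    """Percentage of the initial wound area that has closed by time t."""
    if area_0h <= 0:
        raise ValueError("initial wound area must be positive")
    return 100.0 * (area_0h - area_t) / area_0h


def summarise(replicates: list) -> tuple:
    """Mean and sample SD of % closure across (area_0h, area_t) replicate pairs.

    statistics.stdev raises an error for fewer than two values, so a single
    experiment cannot give a measure of spread.
    """
    closures = [percent_closure(a0, at) for a0, at in replicates]
    return mean(closures), stdev(closures)


# Hypothetical wound areas (arbitrary pixel units) at 0 h and 48 h.
empty_vector = [(1000.0, 400.0), (1000.0, 500.0), (1000.0, 600.0)]
print(summarise(empty_vector))
```

Normalising each well to its own starting area removes variation in the initial scratch width, so replicates from different wells can be pooled before comparing conditions.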
Alternatively, the haemostatic defect in these cases may lie not in the platelets but in the endothelium or leucocytes, leading to defective thrombus formation and/or stability. ROCK1 heterozygous mice exhibit decreased vascular wall neointima formation, aberrant leucocyte migration and decreased expression of endothelial adhesion molecules after injury256. ROCK1 is particularly highly expressed in neutrophils, which play a key role both in maintaining vascular integrity and in thrombus formation263,264. A defect in neutrophil recruitment could be consistent with the delayed-onset bleeding after surgery or trauma seen in the ROCK1 cases.


7.9 Future work

The preliminary data presented here suggest that the Met156Leu ROCK1 variant may affect ROCK1 function, but further studies are essential to characterise this effect and to establish a mechanism by which ROCK1 variants could contribute to bleeding in humans. Unfortunately, I was not able to complete the following experiments during this PhD, but I outline here the next lines of investigation. Expression of all ROCK1 mutants should be confirmed in HEK293T cells, as already done for WT-, M156L- and Y405*-ROCK1. All scratch wound healing assays should be repeated to achieve at least n=3 for all mutants in order to establish the effect of WT and mutant ROCK1 overexpression on cell migration. If an effect is shown, a more specific migration assay could be used to differentiate between cell migration and proliferation. Regulation of the actin cytoskeleton was the first established role of ROCK1 and has been extensively investigated. To establish an effect of the ROCK1 mutants on the actin cytoskeleton, I planned initially to stain transfected HEK293T cells for F-actin and image them using confocal microscopy. ROCK1 directly phosphorylates a series of downstream effectors including myosin light chain (MLC), myosin phosphatase target subunit 1 (MYPT1) and LIM kinase (LIMK). Lysates from HEK293T cells transfected with ROCK1 mutants and appropriate controls could be analysed by western blotting with antibodies that detect the phosphorylated proteins.


Chapter 8: Discussion

This thesis presents my work as part of the BRIDGE-Bleeding and Platelet Disorders (BRIDGE-BPD) consortium. I have shown how the application of high-throughput sequencing, in the form of whole exome and whole genome sequencing, has enabled novel gene discovery for BPD and has also provided a diagnosis for many patients in whom a specific genetic diagnosis had not previously been possible. Specifically, the work in this thesis has shown:
1. How comprehensive phenotyping of BPD can facilitate the interpretation of sequencing data.
2. The specific diagnoses made in a cohort of locally enrolled cases and the challenges encountered in interpreting variants in established BPD genes. I have also illustrated how these diagnoses contribute to our understanding of the mechanisms of disease and genotype-phenotype correlations in BPD.
3. The identification of BRAP, IKZF5 and ROCK1 as potential novel candidate BPD genes.

8.1. Study design and recruitment

The BRIDGE-BPD consortium study was set up to investigate the genetic basis of rare, inherited bleeding and platelet disorders; it was one of several rare disease studies that over time became incorporated into the NIHR BioResource for Rare Diseases265. BRIDGE-BPD was specifically designed to recruit probands with bleeding and platelet disorders, and the international collaboration enabled recruitment of more than 2000 probands and their relatives with presumed inherited BPD. A central hypothesis of the BRIDGE-BPD study was that by sequencing and comprehensively phenotyping large numbers of unrelated cases with rare diseases, many cases with the same disorder would be enrolled: similar cases would then be identified by phenotypic clustering and a common genetic basis would be revealed. The emphasis was therefore on recruiting probands, and relatives were not recruited as a priority. In many cases where relatives were enrolled at the same time as the proband, their DNA samples were not routinely sequenced on the HTS platform and were instead stored for later co-segregation studies by Sanger sequencing only. With hindsight, despite the success of phenotypic clustering, the novel gene discoveries were made in the larger, more informative pedigrees, and co-segregation studies became critical for confirming variant pathogenicity in both known and novel genes. Enrolling more relatives at the outset of the study would have made later co-segregation studies easier and would also have enabled selective WES or WGS of relatives in particularly informative pedigrees, for example more distantly related relatives, or families with several recruited members with a well-defined and reproducible phenotype.
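Why larger pedigrees and more distant relatives are so informative can be shown with one simple co-segregation measure. Assuming a fully penetrant autosomal dominant variant and no phenocopies, the probability that a variant co-segregates with the phenotype by chance alone over m informative meioses is (1/2)^m, so the likelihood ratio in favour of co-segregation is 2^m. This is a simplification under those stated assumptions, not the method used in BRIDGE-BPD; counting informative meioses correctly depends on pedigree structure and is left to the analyst in this sketch, and the pedigree data are hypothetical.

```python
import math


def discordant(relatives: dict) -> list:
    """Return relatives whose carrier status does not match their phenotype.

    relatives maps an identifier to (affected, carrier). Under full penetrance
    and no phenocopies, any mismatch argues against a causal variant.
    """
    return [rid for rid, (affected, carrier) in relatives.items() if affected != carrier]


def cosegregation_lr(informative_meioses: int) -> float:
    """Likelihood ratio 2**m for m meioses that co-segregate with the phenotype."""
    if informative_meioses < 0:
        raise ValueError("number of meioses cannot be negative")
    return 2.0 ** informative_meioses


# Hypothetical pedigree: identifiers and genotypes are invented for illustration.
pedigree = {
    "II-1": (True, True),
    "II-2": (False, False),
    "III-1": (True, True),
    "III-2": (True, False),  # affected non-carrier: discordant
}
print("discordant:", discordant(pedigree))

for m in (2, 5):
    lr = cosegregation_lr(m)
    print(f"{m} meioses: LR = {lr:g}, LOD = {math.log10(lr):.2f}")
```

Because each additional meiosis doubles the likelihood ratio, an affected distant relative who shares the variant can add more evidence than several close relatives, which is why such individuals are especially valuable to sequence.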

The design of an online data collection tool that was specific to BPD enabled focussed data capture relevant to this disease category. Involvement of clinicians and bioinformaticians in the design process resulted in the collection of data that was both clinically informative and computationally useful. Our dedicated online collection tool enabled individual clinicians to input their own data, which reduced transcription errors and allowed clinicians to record both the results and their interpretation of the data. There is significant variation between recruiting clinicians and laboratories in the range and number of investigations performed in a patient presenting with BPD and, as a result, there was substantial inter-patient variability in the laboratory data recorded. This often made direct comparison of cases difficult and limited the phenotypic clustering; for example, cases could be labelled with storage pool disorder on the basis of lumi-aggregometry, platelet nucleotide assay or electron microscopy results, or any combination of the three. Ideally, data collection would be standardised, with the same laboratory tests recorded for all enrolled individuals; however, this was not feasible for two main reasons. Firstly, not all tests were available to all recruiting clinicians and, secondly, many cases were enrolled with historically collected data and DNA samples.

Another limitation of the phenotype data collection was the subjective nature of one of the main phenotypes in this study: bleeding. The nature and severity of bleeding are influenced by many non-hereditary variables, such as age and exposure to haemostatic challenge, making direct comparison of bleeding between two individuals almost impossible. We attempted to overcome this firstly by optimising the use of HPO to record the anatomical location and type of bleeding and whether blood transfusion was required; however, HPO does not capture the subjective measure of bleeding severity. Clinicians were also asked to score bleeding severity using the condensed MCMDM1-VWD bleeding score, but not all clinicians use this tool, and it was validated for VWD assessment rather than for unselected bleeding, so it cannot really be used to quantify bleeding severity in this cohort. The ISTH-BAT65 is very similar to the condensed MCMDM1-VWD63 tool and would probably now be a better choice, since it was specifically designed for the evaluation of bleeding symptoms in patients referred with any possible bleeding disorder and has been shown to be useful for documenting bleeding symptoms in such unselected patients148.
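The dependence of bleeding scores on haemostatic challenge can be made concrete. The ISTH-BAT sums 14 symptom items, each scored from 0 to 4. The sketch below assumes that an item to which a person has never been exposed (for example, no surgery) contributes 0, so an individual with a genuine bleeding tendency who has never been challenged may score lower than that tendency warrants. The item keys are abbreviated labels invented for this example, and the questionnaire rules for scoring each individual item are not reproduced.

```python
from typing import Dict, List, Optional, Tuple

# Abbreviated labels for the 14 ISTH-BAT symptom items (names chosen for this sketch).
ISTH_BAT_ITEMS = (
    "epistaxis", "cutaneous", "minor_wounds", "oral_cavity", "gastrointestinal",
    "haematuria", "tooth_extraction", "surgery", "menorrhagia", "postpartum",
    "muscle_haematomas", "haemarthrosis", "cns", "other",
)
# Items that can only score if a specific haemostatic challenge has occurred.
CHALLENGE_ITEMS = {"tooth_extraction", "surgery", "menorrhagia", "postpartum"}


def bat_total(scores: Dict[str, Optional[int]]) -> Tuple[int, List[str]]:
    """Sum item scores (0-4); None or a missing item means 'not exposed/not scored'.

    Returns the total and the challenge items the person was never exposed to,
    which flags totals that may understate the bleeding tendency.
    """
    unknown = set(scores) - set(ISTH_BAT_ITEMS)
    if unknown:
        raise ValueError(f"unrecognised items: {sorted(unknown)}")
    total, unexposed = 0, []
    for item in ISTH_BAT_ITEMS:
        score = scores.get(item)
        if score is None:
            if item in CHALLENGE_ITEMS:
                unexposed.append(item)
            continue
        if not 0 <= score <= 4:
            raise ValueError(f"{item}: score must be between 0 and 4")
        total += score
    return total, unexposed


# Hypothetical individual: nosebleeds, bruising and heavy periods,
# but no surgery, tooth extraction or pregnancy.
print(bat_total({"epistaxis": 2, "cutaneous": 1, "menorrhagia": 3}))
```

Reporting the unexposed challenge items alongside the total is one way to make the limitation visible rather than hiding it inside a single number.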

8.2. Phenotyping

The phenotypes of all cases enrolled in the BRIDGE-BPD study were recorded using the comprehensive, standardised terminology of the Human Phenotype Ontology104,266. This facilitated the aggregation and computational analysis of phenotype and genotype data from heterogeneous cases enrolled by clinicians around the world. Annotating BPD with an ontology encompassing all organ systems revealed the presence of non-haematological pathologies in 60% of cases, particularly of the central nervous system (e.g. autism spectrum disorders) and the immune system, highlighting that BPD can often be part of a more complex spectrum of pathologies83. HPO phenotyping was also able to identify specific BPD that were not clinically suspected (e.g. MYH9-related disorder) and enabled computational clustering of similar cases, which helped to identify DIAPH1 and SRC as novel BPD genes199,200. Annotation of tier-1 BPD genes with a set of HPO terms is also facilitating diagnosis of BPD on the Thrombogenomics targeted sequencing platform78. HPO phenotype coding is now used by a variety of large-scale whole genome sequencing studies, including the Deciphering Developmental Disorders267 and 100,000 Genomes projects232. Increasing use of this standardised terminology will enable data sharing across these large-scale projects, which is particularly necessary for rare disease research. Standardised phenotyping will also aid the transfer of data between registries, biobanks and curated gene and variant databases268.
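Part of what makes an ontology useful for clustering is that related terms share ancestors. In the minimal sketch below, each case's terms are expanded to include all of their ancestors before a Jaccard index is computed, so that epistaxis and menorrhagia count as related through "abnormal bleeding". The miniature hierarchy is a simplified, hypothetical stand-in for HPO (real HPO terms have identifiers and a much deeper structure), and Jaccard similarity is chosen for brevity; it is not necessarily the similarity measure used in BRIDGE-BPD.

```python
# Hypothetical, heavily simplified fragment of a phenotype ontology:
# each term maps to its parent terms.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormality of blood": ["Phenotypic abnormality"],
    "Abnormal bleeding": ["Abnormality of blood"],
    "Epistaxis": ["Abnormal bleeding"],
    "Menorrhagia": ["Abnormal bleeding"],
    "Abnormal platelet count": ["Abnormality of blood"],
    "Thrombocytopenia": ["Abnormal platelet count"],
    "Hearing impairment": ["Phenotypic abnormality"],
}


def with_ancestors(terms: set) -> set:
    """Return the terms plus every ancestor reachable through PARENTS."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(PARENTS[term])
    return closed


def jaccard(a: set, b: set) -> float:
    """Shared terms as a fraction of all terms in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


case_1 = {"Epistaxis", "Thrombocytopenia"}
case_2 = {"Menorrhagia", "Thrombocytopenia"}
case_3 = {"Hearing impairment"}

print(jaccard(case_1, case_2))  # raw terms only
print(jaccard(with_ancestors(case_1), with_ancestors(case_2)))
print(jaccard(with_ancestors(case_1), with_ancestors(case_3)))
```

Terms near the root, such as "Phenotypic abnormality", are shared by almost every case, which is why measures that weight terms by their information content are usually preferred over a plain Jaccard index in practice.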

8.3. Utility of large-scale high-throughput sequencing in bleeding and platelet disorders

Until recently, the majority of BPD genes were discovered by Sanger sequencing of candidate genes identified through a series of laborious clinical investigations and through linkage studies in large informative pedigrees. However, over the last five to ten years the emergence of high-throughput sequencing (HTS) has led to the discovery of many novel BPD genes and the development of targeted HTS gene panels for BPD diagnosis78. Through the 100,000 Genomes Project, WGS is now being introduced into mainstream NHS clinical practice. The application of HTS presented in this thesis has contributed to the diagnosis of BPD in two ways. Firstly, five novel BPD genes have been identified through WES and WGS in the BRIDGE study199-201,234, and other manuscripts are in progress. The BRIDGE-BPD study is closely allied to the Thrombogenomics consortium, enabling rapid transfer of novel genes to the targeted sequencing panel once they have been approved by the ISTH-SSC for genomics in haemostasis and thrombosis. This continually increases the repertoire of possible diagnoses on this panel. Many more potential candidate genes have been identified through WGS, some of which are presented in chapters 6 and 7; these require confirmation through co-segregation and functional studies. Secondly, HTS has identified causal (pathogenic) variants in known BPD genes that fully or partially explain the BPD phenotype in 13% of enrolled probands. In practice, the identification of a pathogenic variant enables feedback to recruiting clinicians and a genetic diagnosis to be given to the patient. The identification of variants in ‘known’ BPD genes in the BRIDGE cohort, which supposedly consists of patients in whom known BPD are either not suspected or have been excluded, highlights two issues. Firstly, the existing diagnostic process is unsatisfactory and fails to recognise a significant proportion of established disorders. Secondly, HTS is rapidly transforming the genetic landscape of BPD: many genes in which diagnoses have been made were only discovered after the BRIDGE study was initiated (e.g. ACTN1, GFI1B, RASGRP2, SLC45A2, ANKRD26, ETV6).

It is increasingly recognised, and shown in chapter 3, that many BPD are associated with additional pathologies including myelofibrosis, renal failure, lung fibrosis and malignancy; establishing a conclusive molecular diagnosis prevents misdiagnosis, informs optimal treatment and family counselling, and can provide clarity about disease progression83. The majority of diagnoses made in the BRIDGE study were in cases with non-syndromic thrombocytopenia. A genetic diagnosis in these cases distinguishes between rare BPD that may all present with just an isolated low platelet count but may be associated with a risk of developing severe pathologies in the future, such as haematological malignancy (RUNX1-, ETV6- or ANKRD26-related thrombocytopenia) or renal failure (MYH9-related disorder). Current guidelines place candidate gene Sanger sequencing at the final tier of BPD diagnosis269, but the increasing availability and decreasing cost of WGS now make it possible to perform it at an earlier stage in the diagnostic pathway. WGS is not limited by candidate gene bias, can be cheaper and more efficient than sequential candidate gene testing, and is better suited to the investigation of genetically heterogeneous disorders such as BPD because it can identify disorders for which there is no definitive laboratory test268.

In addition to identifying novel genes and providing genetic diagnoses, this work has identified many novel and previously reported variants in BPD genes recently discovered by other groups, such as CYCS (Lentaigne et al, in preparation), GFI1B (van Oorschot et al, submitted), ACTN1216 and RASGRP2193. These replications are important to validate these disorders and to improve our understanding of genotype-phenotype correlations in recently discovered BPD. The identification of further cases of disorders for which only a few cases have previously been reported worldwide also helps to improve our understanding of these disorders and of how to interpret variants in these genes. We have also identified many heterozygous variants in genes previously associated with autosomal recessive disorders, and confirmed autosomal dominant macrothrombocytopenia caused by heterozygous variants in GP1BB186, thus establishing a novel mode of inheritance. The significance of heterozygous variants in one or more genes traditionally associated with recessive disease will be elucidated as data emerge from more sequencing studies like this one.

Whilst WES and WGS have identified pathogenic variants in 13% of locally enrolled cases, the majority remain without a specific diagnosis. This may be explained by three factors. Firstly, all the novel genetic loci and the majority of the pathogenic variants in known genes identified during this PhD have been in the coding portions of the genome. Coding variants are less frequent than non-coding variants and are easier to interpret: exonic regions tend to be more highly conserved than non-exonic regions, whereas non-coding variants do not alter the protein sequence, so their consequence is difficult to interpret. Variants in the 5’UTR of ANKRD26 120 and RBM8A 270 have already been implicated in BPD, so it is likely that there are more pathogenic non-coding variants to be discovered in both known and unknown genes. An ongoing focus of the BRIDGE consortium is to establish the regulatory regions of the genome most relevant to megakaryocytes and platelets, with the hypothesis that variants in these regions may be more likely to affect gene expression and cause disease. Secondly, BPD are highly heterogeneous disorders and it is likely that many BPD genes remain undiscovered. As illustrated in chapter 6, confirming a candidate gene as a true BPD gene usually requires international collaboration to prove co-segregation in multiple pedigrees and functional studies to establish a mechanism of disease. This process takes time, and more candidate BPD genes are yet to emerge from BRIDGE and similar studies. Even larger sequencing studies may also be required to achieve the numbers necessary to find multiple cases of the same disorder, such as the BRAP variant in Hammersmith pedigree #428. This may also be the case for ‘storage pool disorders’, which remain poorly defined and for which there has been little gene discovery on a global scale83. Finally, the heterogeneous nature of bleeding disorders suggests that they may not all be Mendelian. Affected members of the same family often exhibit a range of bleeding severity and different platelet phenotypes. This suggests that some BPD may be oligogenic and/or modified by more common variants of smaller effect, which would be filtered out by current analytical methods. Extremely large studies are necessary to provide sufficient case numbers to power the detection of oligogenic disorders and/or modifier variants of small effect, and this also poses a major computational challenge.

8.4. Interpretation of variants

Given that each individual's genome contains millions of variants, thousands of which are rare, one of the main challenges of this work was determining which variants are disease-causing. The correct identification of novel causal variants critically depends on their allele frequency in relevant control samples, and we have relied on data from the 1000 Genomes108, UK10K109, Exome Aggregation Consortium (ExAC) and GnomAD projects 117 (http://gnomad.broadinstitute.org) to define the rarity of variants in the general population. However, these resources have some limitations: there is still a relative paucity of sequencing data from individuals of non-European ancestry, making it harder to identify truly rare variants in these populations. Furthermore, for under-reported and milder phenotypes such as BPD, disease-causing variants can be present in GnomAD and non-pathogenic variants can be completely absent. We also annotated variants identified in BPD cases against the Human Gene Mutation Database (HGMD)118, which maintains a curated catalogue of variants derived from the literature. Data from ExAC/GnomAD and other HTS studies have shown that, unfortunately, HGMD and other variant databases are not 100% accurate and contain benign variants that are mistakenly labelled as disease-causing78,271. Similar findings are presented in this thesis, with HGMD-listed variants identified that cannot be pathogenic, either because they occur too frequently in the population or because the phenotype in the BPD case is not that expected from a defect in the gene. Such incorrect classifications often arose historically, before studies such as ExAC provided more accurate population MAF data and highlighted the degree of variation in the human population. They may also arise if insufficient scrutiny is given to the evidence for pathogenicity presented in the original case report, or if a variant is repeatedly submitted from the same proband or pedigree, thereby giving a false impression of the number of independently reported cases. Large-scale WGS projects such as the 100,000 Genomes Project will undoubtedly identify more incorrectly labelled variants and help us make increasingly accurate assessments of variant allele frequency across ethnicities. International efforts such as ClinGen, a National Institutes of Health-funded resource that aims to become an authoritative central resource defining the clinical relevance of genes and variants to disease, will help to avoid incorrect classifications in the future272.
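The two population-frequency problems described above, under-representation of some ancestries and database entries that are too common to be pathogenic, can be illustrated with a minimal filter that uses the highest allele frequency observed in any single population (sometimes called popmax) rather than one pooled value. The threshold, population labels, variant names and frequencies below are all hypothetical; an appropriate threshold depends on the prevalence, inheritance and penetrance of the disorder.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    name: str
    # Allele frequency per population; an empty dict means not observed at all.
    population_af: dict = field(default_factory=dict)
    listed_as_disease_causing: bool = False  # e.g. flagged in a curated database


def popmax(variant: Variant) -> float:
    """Highest allele frequency across populations (0.0 if never observed)."""
    return max(variant.population_af.values(), default=0.0)


def triage(variants: list, max_af: float) -> tuple:
    """Split variants into those rare enough to remain candidates, and database
    'disease-causing' entries too common to plausibly cause a rare, highly
    penetrant disorder."""
    candidates, too_common_but_listed = [], []
    for v in variants:
        if popmax(v) < max_af:
            candidates.append(v.name)
        elif v.listed_as_disease_causing:
            too_common_but_listed.append(v.name)
    return candidates, too_common_but_listed


variants = [
    # Rare in Europeans but common in another population: a European-only
    # frequency would make it look rare.
    Variant("var_A", {"European": 0.00002, "African": 0.004}, True),
    Variant("var_B", {"European": 0.00001}),
    # Absent from the reference data; absence is weak evidence of rarity when
    # the relevant ancestry is poorly represented.
    Variant("var_C"),
]
print(triage(variants, max_af=0.001))
```

Using the maximum over populations is deliberately conservative: a variant only needs to be common in one well-sampled population to be excluded as a cause of a rare, highly penetrant disorder.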

Variant prioritisation methods presented here can provide a shortlist of candidate variants, but further assessment in the context of the patient’s phenotype is essential to confirm a diagnosis268. This is the role of the central Thrombogenomics MDT: to independently assess all the evidence for each variant and follow standardised criteria to assign pathogenicity. The initial results from the Thrombogenomics project showed that each patient had on average five candidate variants in tier-1 BPD genes78. A centralised reporting system allows the combined expertise of a geneticist, clinician and bioinformatician to be used and standardised criteria to be followed before a formal research report is issued. Resources allow the automated integration of data from a variety of free and pay-per-use publicly available databases to enable a comprehensive assessment of evidence. Some of the limitations of a centralised MDT approach are highlighted in this thesis. Firstly, the absence of the referring clinician, who knows the patient, can lead to incorrect assessment, as illustrated by the F8 variant discussed in Chapter 4, whose classification changed from CPV to VUS after local review. Secondly, the MDT can only process a limited volume of cases at the level of detail required. In the BRIDGE-BPD study this has led to the selection for MDT review of cases that may be considered more likely to be pathogenic or that form part of a case series of variants in a particular gene. Consequently, not all rare variants in tier-1 BPD genes are discussed at MDT. This is a problem for two reasons: many VUS will not be formally recorded in the central database, and some pathogenic variants may be inadvertently missed and not reported. Finally, as per generally accepted guidelines122,273, variants of unknown significance are not issued a formal research report and are not reported back to clinicians. This is necessary to avoid both clinical over-interpretation of these variants and inappropriate feedback to patients, but it does mean that co-segregation studies (which may provide evidence that the variant is either pathogenic or benign) are less likely to be performed. Ideally, all VUS in tier-1 BPD genes should be recorded, publicly deposited and continually re-appraised in the light of emerging evidence and new cases. In an entirely centralised process, this responsibility rests entirely with the MDT rather than with the treating clinician or the patient. In the results presented in Chapter 4, there are some discrepancies between the MDT and local assessments of variants in tier-1 BPD genes.
The relationship between the BRIDGE and Thrombogenomics consortia allows these discrepancies to be addressed, but this issue needs to be considered in future large-scale WGS projects. One reason for discrepancies between locally and centrally assessed variants is that the MDT has been reporting on variants since 2013. In that time, new evidence about individual variants has emerged from both GnomAD and the literature, and the criteria by which variants are assigned pathogenicity are continually updated. In 2015 and 2016, guidance was published by the American College of Medical Genetics and Genomics122 and the European Society of Human Genetics273 respectively. The ACMG guidelines are widely followed in many HTS studies and have been increasingly adopted by the Thrombogenomics MDT. The ACMG guidance suggests scoring variants by the strength of evidence for their pathogenicity across multiple criteria, including population data, functional studies, co-segregation, de novo observations and computational predictions. Each variant is scored against multiple criteria, and a certain strength of evidence needs to be reached for it to be classified as pathogenic. However, few specific parameters are defined, and professional judgement can be used to change the strength of the evidence for any particular criterion. Grading is therefore highly subjective, and studies have shown poor inter-laboratory concordance274. Applying ACMG criteria to each variant requires assessment of a variety of evidence, which is best done in an automated manner by a central MDT process. Weight is rightly placed on experimental and co-segregation data but, as shown in this thesis, these are laborious to obtain and not always feasible. Consequently, the majority of variants in tier-1 BPD genes are VUS and will remain so until further evidence emerges.
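The combining step of the 2015 ACMG guidance is mechanical once each criterion and its strength have been assigned; the subjectivity described above lies in assigning them. The sketch below encodes the published combining rules using each criterion's default strength only: it ignores strength modifiers (a code such as "PS4_Moderate" would still be read as strong) and treats a variant that meets both a pathogenic-side and a benign-side rule as uncertain. Later refinements, such as those from the ClinGen Sequence Variant Interpretation working group, are not included, and the function is an illustration rather than a validated classifier.

```python
from collections import Counter

# Default strength of each ACMG criterion, inferred from its code prefix.
_PREFIXES = (
    ("PVS", "very_strong"), ("PS", "strong"), ("PM", "moderate"), ("PP", "supporting"),
    ("BA", "benign_standalone"), ("BS", "benign_strong"), ("BP", "benign_supporting"),
)


def _strength(code: str) -> str:
    for prefix, strength in _PREFIXES:
        if code.startswith(prefix):
            return strength
    raise ValueError(f"unrecognised criterion: {code}")


def classify(codes: list) -> str:
    """Combine met criteria (e.g. ['PVS1', 'PM2']) into a five-tier class."""
    n = Counter(_strength(code) for code in codes)
    pvs, ps = n["very_strong"], n["strong"]
    pm, pp = n["moderate"], n["supporting"]
    ba, bs, bp = n["benign_standalone"], n["benign_strong"], n["benign_supporting"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm == 1 and pp == 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps == 1 and pm >= 1)  # 1 strong + 1-2 moderate (3+ is already pathogenic)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs == 1 and bp >= 1) or bp >= 2

    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"  # contradictory evidence
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


for criteria in (["PVS1", "PS3"], ["PVS1", "PM2"], ["PM1", "PM2", "PP3"], ["BA1"]):
    print(criteria, "->", classify(criteria))
```

The third example shows how sensitive the outcome is to evidence strength: two moderate criteria plus one supporting criterion remain of uncertain significance, whereas one additional supporting criterion, such as co-segregation data, would reach likely pathogenic.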

8.5. The future

Although it is a necessary consequence of a cautious approach to variant interpretation, the number of cases with VUS and the number of BPD that remain undiagnosed are unsatisfactory. Genomic sequencing alone cannot explain the mechanisms underlying the relationship between genotype and phenotype in cases with an apparent inherited disorder275. To ascertain the functional consequences of rare variants, it is essential that knowledge of phenotypes and pathways is integrated systematically within a frame of reference similar to that of the human genome276. I have shown some examples of novel methods already applied to BPD cases in the BRIDGE study, such as HPO phenotyping145, the BeviMed statistical association method211 and the integration of GWAS data237. It is likely that additional pathogenic non-coding variants in regulatory regions of the genome remain undiscovered. Ongoing work by the BRIDGE-BPD consortium to identify key regulatory loci relevant to megakaryopoiesis and platelet function, through the integration of data from GWAS, ChIP-seq and proteomics, will hopefully highlight key regulatory regions on which variant analysis can be focussed. With the introduction of the 100,000 Genomes Project, WGS is becoming part of mainstream clinical practice in the UK, and with falling costs it is likely to replace targeted sequencing platforms completely within the foreseeable future. This means that the issues faced in interpreting the role of novel rare variants will become increasingly relevant in the clinic. The international collaboration and data sharing demonstrated by the BRIDGE-BPD project will be essential. Providing a molecular diagnosis to patients is highly desirable, but making incorrect assumptions about variants could be harmful.


Publications related to this thesis

Sivapalaratnam, S., Westbury, S. K., Stephens, J. C., Greene, D., Downes, K., Kelly, A. M., Lentaigne, C., Astle, W. J., Huizinga, E. G., Nurden, P., Papadia, S., Peerlinck, K., Penkett, C. J., Perry, D. J., Roughley, C., Simeoni, I., Stirrups, K., Hart, D. P., Tait, R. C., Mumford, A. D., Laffan, M. A., Freson, K., Ouwehand, W. H., Kunishima, S., Turro, E. & NIHR BioResource (2017) Rare variants in GP1BB are responsible for autosomal dominant macrothrombocytopenia. Blood, 129(4), 520-524.

Lentaigne, C., Freson, K., Laffan, M. A., Turro, E., Ouwehand, W. H. & BRIDGE-BPD Consortium and the ThromboGenomics Consortium (2016) Inherited platelet disorders: toward DNA-based diagnosis. Blood, 127(23), 2814-23.

Simeoni, I., Stephens, J. C., Hu, F., Deevi, S. V., Megy, K., Bariana, T. K., Lentaigne, C., Schulman, S., Sivapalaratnam, S., Vries, M. J., Westbury, S. K., Greene, D., Papadia, S., Alessi, M. C., Attwood, A. P., Ballmaier, M., Baynam, G., Bermejo, E., Bertoli, M., Bray, P. F., Bury, L., Cattaneo, M., Collins, P., Daugherty, L. C., Favier, R., French, D. L., Furie, B., Gattens, M., Germeshausen, M., Ghevaert, C., Goodeve, A. C., Guerrero, J. A., Hampshire, D. J., Hart, D. P., Heemskerk, J. W., Henskens, Y. M., Hill, M., Hogg, N., Jolley, J. D., Kahr, W. H., Kelly, A. M., Kerr, R., Kostadima, M., Kunishima, S., Lambert, M. P., Liesner, R., Lopez, J. A., Mapeta, R. P., Mathias, M., Millar, C. M., Nathwani, A., Neerman-Arbez, M., Nurden, A. T., Nurden, P., Othman, M., Peerlinck, K., Perry, D. J., Poudel, P., Reitsma, P., Rondina, M. T., Smethurst, P. A., Stevenson, W., Szkotak, A., Tuna, S., van Geet, C., Whitehorn, D., Wilcox, D. A., Zhang, B., Revel-Vilk, S., Gresele, P., Bellissimo, D. B., Penkett, C. J., Laffan, M. A., Mumford, A. D., Rendon, A., Gomez, K., Freson, K., Ouwehand, W. H. & Turro, E. (2016) A high-throughput sequencing test for diagnosing inherited bleeding, thrombotic, and platelet disorders. Blood, 127(23), 2791-803.

Stritt, S., Nurden, P., Turro, E., Greene, D., Jansen, S. B., Westbury, S. K., Petersen, R., Astle, W. J., Marlin, S., Bariana, T. K., Kostadima, M., Lentaigne, C., Maiwald, S., Papadia, S., Kelly, A. M., Stephens, J. C., Penkett, C. J., Ashford, S., Tuna, S., Austin, S., Bakchoul, T., Collins, P., Favier, R., Lambert, M. P., Mathias, M., Millar, C. M., Mapeta, R., Perry, D. J., Schulman, S., Simeoni, I., Thys, C., BRIDGE-BPD Consortium, Gomez, K., Erber, W. N., Stirrups, K., Rendon, A., Bradley, J. R., van Geet, C., Raymond, F. L., Laffan, M. A., Nurden, A. T., Nieswandt, B., Richardson, S., Freson, K., Ouwehand, W. H. & Mumford, A. D. (2016) A gain-of-function variant in DIAPH1 causes dominant macrothrombocytopenia and hearing loss. Blood.

Turro, E., Greene, D., Wijgaerts, A., Thys, C., Lentaigne, C., Bariana, T. K., Westbury, S. K., Kelly, A. M., Selleslag, D., Stephens, J. C., Papadia, S., Simeoni, I., Penkett, C. J., Ashford, S., Attwood, A., Austin, S., Bakchoul, T., Collins, P., Deevi, S. V., Favier, R., Kostadima, M., Lambert, M. P., Mathias, M., Millar, C. M., Peerlinck, K., Perry, D. J., Schulman, S., Whitehorn, D., Wittevrongel, C., De Maeyer, M., Rendon, A., Gomez, K., Erber, W. N., Mumford, A. D., Nurden, P., Stirrups, K., Bradley, J. R., Raymond, F. L., Laffan, M. A., Van Geet, C., Richardson, S., Freson, K., Ouwehand, W. H. & BRIDGE-BPD Consortium (2016) A dominant gain-of-function mutation in universal tyrosine kinase SRC causes thrombocytopenia, myelofibrosis, bleeding, and bone pathologies. Sci Transl Med, 8(328), 328ra30.

*Westbury, S. K., *Turro, E., *Greene, D., *Lentaigne, C., *Kelly, A. M., *Bariana, T. K., Simeoni, I., Pillois, X., Attwood, A., Austin, S., Jansen, S. B., Bakchoul, T., Crisp-Hihn, A., Erber, W. N., Favier, R., Foad, N., Gattens, M., Jolley, J. D., Liesner, R., Meacham, S., Millar, C. M., Nurden, A. T., Peerlinck, K., Perry, D. J., Poudel, P., Schulman, S., Schulze, H., Stephens, J. C., Furie, B., Robinson, P. N., van Geet, C., Rendon, A., Gomez, K., Laffan, M. A., Lambert, M. P., Nurden, P., Ouwehand, W. H., Richardson, S., Mumford, A. D., Freson, K. & BRIDGE-BPD Consortium (2015) Human phenotype ontology annotation and cluster analysis to unravel genetic defects in 707 cases with unexplained bleeding and platelet disorders. Genome Med, 7(1), 36. *Contributed equally to this paper.


References

1 Machlus, K. R., Thon, J. N. & Italiano, J. E. Interpreting the developmental dance of the megakaryocyte: a review of the cellular and molecular processes mediating platelet formation. Br J Haematol 165, 227-236, doi:10.1111/bjh.12758 (2014).

2 Machlus, K. R. & Italiano, J. E. The incredible journey: From megakaryocyte development to platelet formation. J Cell Biol 201, 785-796, doi:10.1083/jcb.201304054 (2013).

3 Avecilla, S. T. et al. Chemokine-mediated interaction of hematopoietic progenitors with the bone marrow vascular niche is required for thrombopoiesis. Nat Med 10, 64-71, doi:10.1038/nm973 (2004).

4 Zimmet, J. & Ravid, K. Polyploidy: occurrence in nature, mechanisms, and significance for the megakaryocyte-platelet system. Exp Hematol 28, 3-16 (2000).

5 Deutsch, V. R. & Tomer, A. Megakaryocyte development and platelet production. Br J Haematol 134, 453-466, doi:10.1111/j.1365-2141.2006.06215.x (2006).

6 Raslova, H. et al. Megakaryocyte polyploidization is associated with a functional gene amplification. Blood 101, 541-544, doi:10.1182/blood-2002-05-1553 (2003).

7 Mazzi, S., Lordier, L., Debili, N., Raslova, H. & Vainchenker, W. Megakaryocyte and polyploidization. Exp Hematol 57, 1-13, doi:10.1016/j.exphem.2017.10.001 (2018).

8 Geddis, A. E. Megakaryopoiesis. Semin Hematol 47, 212-219, doi:10.1053/j.seminhematol.2010.03.001 (2010).

9 Kaushansky, K. The molecular mechanisms that control thrombopoiesis. J Clin Invest 115, 3339-3347, doi:10.1172/JCI26674 (2005).

10 Deutsch, V. R. & Tomer, A. Advances in megakaryocytopoiesis and thrombopoiesis: from bench to bedside. Br J Haematol 161, 778-793, doi:10.1111/bjh.12328 (2013).

11 Ihara, K. et al. Identification of mutations in the c-mpl gene in congenital amegakaryocytic thrombocytopenia. Proc Natl Acad Sci U S A 96, 3132-3136 (1999).

12 Dasouki, M. J. et al. Exome sequencing reveals a thrombopoietin ligand mutation in a Micronesian family with autosomal recessive aplastic anemia. Blood 122, 3440-3449, doi:10.1182/blood-2012-12-473538 (2013).

13 Wiestner, A., Schlemper, R. J., van der Maas, A. P. & Skoda, R. C. An activating splice donor mutation in the thrombopoietin gene causes hereditary thrombocythaemia. Nat Genet 18, 49-52, doi:10.1038/ng0198-49 (1998).

14 Zheng, C. et al. TPO-independent megakaryocytopoiesis. Crit Rev Oncol Hematol 65, 212-222, doi:10.1016/j.critrevonc.2007.11.003 (2008).

15 Tijssen, M. R. & Ghevaert, C. Transcription factors in late megakaryopoiesis and related platelet disorders. J Thromb Haemost 11, 593-604, doi:10.1111/jth.12131 (2013).

16 Song, W. J. et al. Haploinsufficiency of CBFA2 causes familial thrombocytopenia with propensity to develop acute myelogenous leukaemia. Nat Genet 23, 166-175, doi:10.1038/13793 (1999).

17 Nichols, K. E. et al. Familial dyserythropoietic anaemia and thrombocytopenia due to an inherited mutation in GATA1. Nat Genet 24, 266-270, doi:10.1038/73480 (2000).

18 Hart, A. et al. Fli-1 is required for murine vascular and megakaryocytic development and is hemizygously deleted in patients with thrombocytopenia. Immunity 13, 167-177 (2000).

19 Italiano, J. E., Lecine, P., Shivdasani, R. A. & Hartwig, J. H. Blood platelets are assembled principally at the ends of proplatelet processes produced by differentiated megakaryocytes. J Cell Biol 147, 1299-1312 (1999).

20 Thon, J. N. et al. Cytoskeletal mechanics of proplatelet maturation and platelet release. J Cell Biol 191, 861-874, doi:10.1083/jcb.201006102 (2010).

21 Chen, Y. et al. The abnormal proplatelet formation in MYH9-related macrothrombocytopenia results from an increased actomyosin contractility and is rescued by myosin IIA inhibition. J Thromb Haemost 11, 2163-2175, doi:10.1111/jth.12436 (2013).

22 Pecci, A. et al. Megakaryocytes of patients with MYH9-related thrombocytopenia present an altered proplatelet formation. Thromb Haemost 102, 90-96, doi:10.1160/TH09-01-0068 (2009).

23 Kunishima, S., Kobayashi, R., Itoh, T. J., Hamaguchi, M. & Saito, H. Mutation of the beta1-tubulin gene associated with congenital macrothrombocytopenia affecting microtubule assembly. Blood 113, 458-461, doi:10.1182/blood-2008-06-162610 (2009).

24 Blair, P. & Flaumenhaft, R. Platelet alpha-granules: basic biology and clinical correlates. Blood Rev 23, 177-189, doi:10.1016/j.blre.2009.04.001 (2009).

25 Kosaki, G. & Kambayashi, J. Thrombocytogenesis by megakaryocyte; Interpretation by protoplatelet hypothesis. Proc Jpn Acad Ser B Phys Biol Sci 87, 254-273 (2011).

26 Howell, W. H. & Donahue, D. D. The production of blood platelets in the lungs. J Exp Med 65, 177-203 (1937).

27 Lefrançais, E. et al. The lung is a site of platelet biogenesis and a reservoir for haematopoietic progenitors. Nature 544, 105-109, doi:10.1038/nature21706 (2017).

28 Gremmel, T., Frelinger, A. L. & Michelson, A. D. Platelet Physiology. Semin Thromb Hemost 42, 191-204, doi:10.1055/s-0035-1564835 (2016).

29 White, J. G. in Platelets (3rd Edition) (ed Alan D Michelson) Ch. 7, 117-144 (Elsevier Academic Press, 2013).

30 Hartwig, J. H. & DeSisto, M. The cytoskeleton of the resting human blood platelet: structure of the membrane skeleton and its attachment to actin filaments. J Cell Biol 112, 407-425, doi:10.1083/jcb.112.3.407 (1991).

31 Aslan, J. E. & McCarty, O. J. T. Rho GTPases in platelet function. J Thromb Haemost 11, 35-46, doi:10.1111/jth.12051 (2013).

32 Pleines, I. et al. Megakaryocyte-specific RhoA deficiency causes macrothrombocytopenia and defective platelet activation in hemostasis and thrombosis. Blood 119 (2012).

33 Golebiewska, E. M. & Poole, A. W. Platelet secretion: From haemostasis to wound healing and beyond. Blood Rev 29, 153-162, doi:10.1016/j.blre.2014.10.003 (2015).

34 Oh, J. et al. Positional cloning of a gene for Hermansky-Pudlak syndrome, a disorder of cytoplasmic organelles. Nat Genet 14, 300-306, doi:10.1038/ng1196-300 (1996).

35 Dell'Angelica, E. C., Shotelersuk, V., Aguilar, R. C., Gahl, W. A. & Bonifacino, J. S. Altered trafficking of lysosomal proteins in Hermansky-Pudlak syndrome due to mutations in the beta 3A subunit of the AP-3 adaptor. Mol Cell 3, 11-21 (1999).

36 Anikster, Y. et al. Mutation of a new gene causes a unique form of Hermansky-Pudlak syndrome in a genetic isolate of central Puerto Rico. Nat Genet 28, 376-380, doi:10.1038/ng576 (2001).

37 Suzuki, T. et al. Hermansky-Pudlak syndrome is caused by mutations in HPS4, the human homolog of the mouse light-ear gene. Nat Genet 30, 321-324, doi:10.1038/ng835 (2002).

38 Li, W. et al. Hermansky-Pudlak syndrome type 7 (HPS-7) results from mutant dysbindin, a member of the biogenesis of lysosome-related organelles complex 1 (BLOC-1). Nat Genet 35, 84-89, doi:10.1038/ng1229 (2003).

39 Zhang, Q. et al. Ru2 and Ru encode mouse orthologs of the genes mutated in human Hermansky-Pudlak syndrome types 5 and 6. Nat Genet 33, 145-153, doi:10.1038/ng1087 (2003).

40 Morgan, N. V. et al. A germline mutation in BLOC1S3/reduced pigmentation causes a novel variant of Hermansky-Pudlak syndrome (HPS8). Am J Hum Genet 78, 160-166, doi:10.1086/499338 (2006).

41 Albers, C. A. et al. Exome sequencing identifies NBEAL2 as the causative gene for gray platelet syndrome. Nat Genet 43, 735-737, doi:10.1038/ng.885 (2011).

42 Kahr, W. H. et al. Mutations in NBEAL2, encoding a BEACH protein, cause gray platelet syndrome. Nat Genet 43, 738-740, doi:10.1038/ng.884 (2011).

43 Gunay-Aygun, M. et al. NBEAL2 is mutated in gray platelet syndrome and is required for biogenesis of platelet alpha-granules. Nat Genet 43, 732-734, doi:10.1038/ng.883 (2011).

44 Barbosa, M. D. et al. Identification of mutations in two major mRNA isoforms of the Chediak-Higashi syndrome gene in human and mouse. Hum Mol Genet 6, 1091-1098 (1997).

45 Lindemann, S., Krämer, B., Seizer, P. & Gawaz, M. Platelets, inflammation and atherosclerosis. J Thromb Haemost 5 Suppl 1, 203-211, doi:10.1111/j.1538-7836.2007.02517.x (2007).

46 Gay, L. J. & Felding-Habermann, B. Contribution of platelets to tumour metastasis. Nat Rev Cancer 11, 123-134, doi:10.1038/nrc3004 (2011).

47 Golebiewska, E. M. & Poole, A. W. Secrets of platelet exocytosis - what do we really know about platelet secretion mechanisms? Br J Haematol, doi:10.1111/bjh.12682 (2013).

48 Versteeg, H. H., Heemskerk, J. W., Levi, M. & Reitsma, P. H. New fundamentals in hemostasis. Physiol Rev 93, 327-358, doi:10.1152/physrev.00016.2011 (2013).

49 Swystun, L. L. & Liaw, P. C. The role of leukocytes in thrombosis. Blood 128, 753-762, doi:10.1182/blood-2016-05-718114 (2016).

50 Genetic Alliance UK. Rare Disease UK. http://www.raredisease.org.uk (accessed Jan 2018).

51 Rodeghiero, F., Castaman, G. & Dini, E. Epidemiological investigation of the prevalence of von Willebrand's disease. Blood 69, 454-459 (1987).

52 UK Haemophilia Centre Doctors Organisation. UK National Haemophilia Database Bleeding Disorder Statistics for 2011-2012. www.ukhcdo.org (accessed Jan 2018).

53 Bryant, N. & Watts, R. Thrombocytopenic syndromes masquerading as childhood immune thrombocytopenic purpura. Clin Pediatr (Phila) 50, 225-230, doi:10.1177/0009922810385676 (2011).

54 Bader-Meunier, B. et al. Misdiagnosis of chronic thrombocytopenia in childhood. J Pediatr Hematol Oncol 25, 548-552 (2003).

55 Arnold, D. M. et al. Misdiagnosis of primary immune thrombocytopenia and frequency of bleeding: lessons from the McMaster ITP Registry. Blood Adv 1, 2414-2420, doi:10.1182/bloodadvances.2017010942 (2017).

56 Bolton-Maggs, P. H. et al. A review of inherited platelet disorders with guidelines for their management on behalf of the UKHCDO. Br J Haematol 135, 603-633, doi:10.1111/j.1365-2141.2006.06343.x (2006).

57 Harrison, P. et al. Guidelines for the laboratory investigation of heritable disorders of platelet function. Br J Haematol 155, 30-44, doi:10.1111/j.1365-2141.2011.08793.x (2011).

58 Sadler, J. E. Slippery criteria for von Willebrand disease type 1. J Thromb Haemost 2, 1720-1723, doi:10.1111/j.1538-7836.2004.00933.x (2004).

59 Friberg, B., Ornö, A. K., Lindgren, A. & Lethagen, S. Bleeding disorders among young women: a population-based prevalence study. Acta Obstet Gynecol Scand 85, 200-206 (2006).

60 Quiroga, T. et al. High prevalence of bleeders of unknown cause among patients with inherited mucocutaneous bleeding. A prospective study of 280 patients and 299 controls. Haematologica 92, 357-365 (2007).

61 Mauer, A. C. et al. Impact of sex, age, race, ethnicity and aspirin use on bleeding symptoms in healthy adults. J Thromb Haemost 9, 100-108, doi:10.1111/j.1538-7836.2010.04105.x (2011).

62 Quiroga, T. & Mezzano, D. Is my patient a bleeder? A diagnostic framework for mild bleeding disorders. Hematology Am Soc Hematol Educ Program 2012, 466-474, doi:10.1182/asheducation-2012.1.466 (2012).

63 Bowman, M. et al. Generation and validation of the Condensed MCMDM-1VWD Bleeding Questionnaire for von Willebrand disease. J Thromb Haemost 6, 2062-2066, doi:10.1111/j.1538-7836.2008.03182.x (2008).

64 Tosetto, A., Castaman, G., Plug, I., Rodeghiero, F. & Eikenboom, J. Prospective evaluation of the clinical utility of quantitative bleeding severity assessment in patients referred for hemostatic evaluation. J Thromb Haemost 9, 1143-1148, doi:10.1111/j.1538-7836.2011.04265.x (2011).

65 Rodeghiero, F. et al. ISTH/SSC bleeding assessment tool: a standardized questionnaire and a proposal for a new bleeding score for inherited bleeding disorders. J Thromb Haemost 8, 2063-2065, doi:10.1111/j.1538-7836.2010.03975.x (2010).

66 Mauer, A. C. et al. Creating an ontology-based human phenotyping system: The Rockefeller University bleeding history experience. Clin Transl Sci 2, 382-385, doi:10.1111/j.1752-8062.2009.00147.x (2009).

67 Favaloro, E. J. Clinical utility of the PFA-100. Semin Thromb Hemost 34, 709-733, doi:10.1055/s-0029-1145254 (2008).

68 Hayward, C. P. et al. Development of North American consensus guidelines for medical laboratories that perform and interpret platelet function testing using light transmission aggregometry. Am J Clin Pathol 134, 955-963, doi:10.1309/AJCP9V3RRVNZMKDS (2010).

69 Nieuwenhuis, H. K., Akkerman, J. W. & Sixma, J. J. Patients with a prolonged bleeding time and normal aggregation tests may have storage pool deficiency: studies on one hundred six patients. Blood 70, 620-623 (1987).

70 Gordon, N., Thom, J., Cole, C. & Baker, R. Rapid detection of hereditary and acquired platelet storage pool deficiency by flow cytometry. Br J Haematol 89, 117-123 (1995).

71 Cattaneo, M. Light transmission aggregometry and ATP release for the diagnostic assessment of platelet function. Semin Thromb Hemost 35, 158-167, doi:10.1055/s-0029-1220324 (2009).

72 Cattaneo, M. et al. Results of a worldwide survey on the assessment of platelet function by light transmission aggregometry: a report from the platelet physiology subcommittee of the SSC of the ISTH. J Thromb Haemost 7, 1029, doi:10.1111/j.1538-7836.2009.03458.x (2009).

73 Watson, S. et al. Phenotypic approaches to gene mapping in platelet function disorders - identification of new variant of P2Y12, TxA2 and GPVI receptors. Hamostaseologie 30, 29-38 (2010).

74 Pereira, J., Quiroga, T. & Mezzano, D. Laboratory assessment of familial, nonthrombocytopenic mucocutaneous bleeding: a definitive diagnosis is often not possible. Semin Thromb Hemost 34, 654-662, doi:10.1055/s-0028-1104544 (2008).

75 Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 74, 5463-5467 (1977).

76 Rehm, H. L. Disease-targeted sequencing: a cornerstone in the clinic. Nat Rev Genet 14, 295-300 (2013).

77 Gresele, P. Diagnosis of inherited platelet function disorders: guidance from the SSC of the ISTH. J Thromb Haemost 13, 314-322, doi:10.1111/jth.12792 (2015).

78 Simeoni, I. et al. A high-throughput sequencing test for diagnosing inherited bleeding, thrombotic, and platelet disorders. Blood 127, 2791-2803, doi:10.1182/blood-2015-12-688267 (2016).

79 UK Haemophilia Centre Doctors Organisation (UKHCDO). Directory of molecular diagnostic services for inherited bleeding disorders. http://www.ukhcdo.org/genetics-networkdirect/.

80 Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet 12, 745-755, doi:10.1038/nrg3031 (2011).

81 Gilissen, C., Hoischen, A., Brunner, H. G. & Veltman, J. A. Unlocking Mendelian disease using exome sequencing. Genome Biol 12, 228, doi:10.1186/gb-2011-12-9-228 (2011).

82 Robinson, P. N., Krawitz, P. & Mundlos, S. Strategies for exome and genome sequence data analysis in disease-gene discovery projects. Clin Genet 80, 127-132, doi:10.1111/j.1399-0004.2011.01713.x (2011).

83 Lentaigne, C. et al. Inherited platelet disorders: toward DNA-based diagnosis. Blood 127, 2814-2823, doi:10.1182/blood-2016-03-378588 (2016).

84 Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376-380, doi:10.1038/nature03959 (2005).

85 McKernan, K. J. et al. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res 19, 1527-1541, doi:10.1101/gr.091868.109 (2009).

86 Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53-59, doi:10.1038/nature07517 (2008).

87 Rothberg, J. M. et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature 475, 348-352, doi:10.1038/nature10242 (2011).

88 Henson, J., Tischler, G. & Ning, Z. Next-generation sequencing and large genome assemblies. Pharmacogenomics 13, 901-915, doi:10.2217/pgs.12.72 (2012).

89 Johnsen, J. M., Nickerson, D. A. & Reiner, A. P. Massively parallel sequencing: the new frontier of hematologic genomics. Blood 122, 3268-3275, doi:10.1182/blood-2013-07-460287 (2013).

90 Ley, T. J. et al. A pilot study of high-throughput, sequence-based mutational profiling of primary human acute myeloid leukemia cell genomes. Proc Natl Acad Sci U S A 100, 14275-14280, doi:10.1073/pnas.2335924100 (2003).

91 de Ligt, J. et al. Diagnostic exome sequencing in persons with severe intellectual disability. N Engl J Med 367, 1921-1929, doi:10.1056/NEJMoa1206524 (2012).

92 Worthey, E. A. et al. Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genet Med 13, 255-262, doi:10.1097/GIM.0b013e3182088158 (2011).

93 Choi, M. et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A 106, 19096-19101, doi:10.1073/pnas.0910672106 (2009).

94 Morgan, J. E. et al. Genetic diagnosis of familial breast cancer using clonal sequencing. Hum Mutat 31, 484-491, doi:10.1002/humu.21216 (2010).

95 Ng, S. B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272-276, doi:10.1038/nature08250 (2009).

96 Ng, S. B. et al. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 42, 30-35, doi:10.1038/ng.499 (2010).

97 Kryukov, G. V., Pennacchio, L. A. & Sunyaev, S. R. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am J Hum Genet 80, 727-739, doi:10.1086/513473 (2007).

98 O'Sullivan, J. et al. A paradigm shift in the delivery of services for diagnosis of inherited retinal disease. J Med Genet 49, 322-326, doi:10.1136/jmedgenet-2012-100847 (2012).

99 Teekakirikul, P., Kelly, M. A., Rehm, H. L., Lakdawala, N. K. & Funke, B. H. Inherited cardiomyopathies: molecular genetics and clinical genetic testing in the postgenomic era. J Mol Diagn 15, 158-170, doi:10.1016/j.jmoldx.2012.09.002 (2013).

100 National Human Genome Research Institute (NHGRI). The cost of sequencing a human genome. https://www.genome.gov/sequencingcosts/ (accessed Jan 2018).

101 Gilissen, C. et al. Genome sequencing identifies major causes of severe intellectual disability. Nature 511, 344-347, doi:10.1038/nature13394 (2014).

102 Genomics England. The 100,000 Genomes Project. https://www.genomicsengland.co.uk/the-100000-genomes-project (accessed Jan 2018).

103 Azzam, H. A. et al. The condensed MCMDM-1 VWD bleeding questionnaire as a predictor of bleeding disorders in women with unexplained menorrhagia. Blood Coagul Fibrinolysis 23, 311-315, doi:10.1097/MBC.0b013e32835274d9 (2012).

104 Köhler, S. et al. The Human Phenotype Ontology in 2017. Nucleic Acids Res 45, D865-D876, doi:10.1093/nar/gkw1039 (2017).

105 Illumina Inc. Isaac genome alignment and Isaac variant caller. https://www.illumina.com/Documents/products/whitepapers/whitepaper_isaac_workflow.pdf (accessed March 2018).

106 Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80-92, doi:10.4161/fly.19695 (2012).

107 McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol 17, 122, doi:10.1186/s13059-016-0974-4 (2016).

108 Auton, A. et al. A global reference for human genetic variation. Nature 526, 68-74, doi:10.1038/nature15393 (2015).

109 Walter, K. et al. The UK10K project identifies rare variants in health and disease. Nature 526, 82-90, doi:10.1038/nature14962 (2015).

110 NHLBI Exome Sequencing Project (ESP). Exome variant server. http://evs.gs.washington.edu/EVS/ (accessed June 2018).

111 Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285-291, doi:10.1038/nature19057 (2016).

112 HUGO Gene Nomenclature Committee. Genenames.org. https://www.genenames.org/ (accessed March 2018).

113 den Dunnen, J. T. et al. HGVS Recommendations for the Description of Sequence Variants: 2016 Update. Hum Mutat 37, 564-569, doi:10.1002/humu.22981 (2016).

114 Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46, 310-315, doi:10.1038/ng.2892 (2014).

115 Ng, P. C. & Henikoff, S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 31, 3812-3814 (2003).

116 Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet Chapter 7, Unit7.20, doi:10.1002/0471142905.hg0720s76 (2013).

117 Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285-291, doi:10.1038/nature19057 (2016).

118 Stenson, P. D. et al. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum Genet 133, 1-9, doi:10.1007/s00439-013-1358-4 (2014).

119 International Society on Thrombosis and Haemostasis (ISTH). SSC: Genomics in Thrombosis and Haemostasis. http://www.isth.org/members/group.aspx?id=104628 (accessed Feb 2018).

120 Pippucci, T. et al. Mutations in the 5' UTR of ANKRD26, the ankirin repeat domain 26 gene, cause an autosomal-dominant form of inherited thrombocytopenia, THC2. Am J Hum Genet 88, 115-120, doi:10.1016/j.ajhg.2010.12.006 (2011).

121 Thrombogenomics. Gene and disorder list. http://thrombo.cambridgednadiagnosis.org.uk/ (accessed Jan 2018).

122 Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 17, 405-424, doi:10.1038/gim.2015.30 (2015).

123 Ellard, S. et al. (Association for Clinical Genomic Science, 2017).

124 Jarvik, G. P. & Browning, B. L. Consideration of Cosegregation in the Pathogenicity Classification of Genomic Variants. Am J Hum Genet 98, 1077-1081, doi:10.1016/j.ajhg.2016.04.003 (2016).

125 Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res 44, D862-868, doi:10.1093/nar/gkv1222 (2016).

126 EMBL-EBI. European Genome-phenome Archive (EGA). www.ebi.ac.uk/ega (accessed May 2018).

127 New England Biolabs Inc. NEBuilder Assembly Tool. http://nebuilder.neb.com (last accessed June 2017).

128 Westbury, S. K. et al. Human phenotype ontology annotation and cluster analysis to unravel genetic defects in 707 cases with unexplained bleeding and platelet disorders. Genome Med 7, 36, doi:10.1186/s13073-015-0151-5 (2015).

129 Köhler, S. et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res 42, D966-974, doi:10.1093/nar/gkt1026 (2014).

130 Oxford University Press. Oxford English Dictionary. https://en.oxforddictionaries.com (accessed Jan 2018).

131 Robinson, P. N. et al. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet 83, 610-615, doi:10.1016/j.ajhg.2008.09.017 (2008).

132 Köhler, S. Human Phenotype Ontology. http://human-phenotype-ontology.github.io (accessed Jan 2018).

133 Orphanet. www.orpha.net (accessed Jan 2018).

134 Monarch Initiative. www.monarchinitiative.org (accessed Jan 2018).

135 Firth, H. V. et al. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am J Hum Genet 84, 524-533, doi:10.1016/j.ajhg.2009.03.010 (2009).

136 DECIPHER. https://decipher.sanger.ac.uk (accessed Jan 2018).

137 World Health Organisation. ICD-10 online version. http://apps.who.int/classifications/icd10/browse/2016/en (accessed Jan 2018).

138 SNOMED International. SNOMED-CT. https://www.snomed.org/snomed-ct (accessed Jan 2018).

139 Johns Hopkins University. OMIM. https://omim.org (last accessed July 2018).

140 Zemojtel, T. et al. Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Sci Transl Med 6, 252ra123, doi:10.1126/scitranslmed.3009262 (2014).

141 Köhler, S. et al. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am J Hum Genet 85, 457-464, doi:10.1016/j.ajhg.2009.09.003 (2009).

142 Masino, A. J. et al. Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology. BMC Bioinformatics 15, 248, doi:10.1186/1471-2105-15-248 (2014).

143 Bone, W. P. et al. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genet Med 18, 608-617, doi:10.1038/gim.2015.137 (2016).

144 Robinson, P. N. Genomic data sharing for translational research and diagnostics. Genome Med 6, 78, doi:10.1186/s13073-014-0078-2 (2014).

145 Westbury, S. K. et al. Human phenotype ontology annotation and cluster analysis to unravel genetic defects in 707 cases with unexplained bleeding and platelet disorders. Genome Med 7, 36, doi:10.1186/s13073-015-0151-5 (2015).

146 Simeoni, I. et al. A high-throughput sequencing test for diagnosing inherited bleeding, thrombotic, and platelet disorders. Blood 127, 2791-2803, doi:10.1182/blood-2015-12-688267 (2016).

147 Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 12, e1001779, doi:10.1371/journal.pmed.1001779 (2015).

148 Lowe, G. C., Lordkipanidzé, M., Watson, S. P. & UK GAPP study group. Utility of the ISTH bleeding assessment tool in predicting platelet defects in participants with suspected inherited platelet function disorders. J Thromb Haemost 11, 1663-1668, doi:10.1111/jth.12332 (2013).

149 Leinøe, E. et al. Application of whole-exome sequencing to direct the specific functional testing and diagnosis of rare inherited bleeding disorders in patients from the Öresund Region, Scandinavia. Br J Haematol 179, 308-322, doi:10.1111/bjh.14863 (2017).

150 Bariana, T. K., Ouwehand, W. H., Guerrero, J. A., Gomez, K. & BRIDGE Bleeding, Thrombotic and Platelet Disorders and ThromboGenomics Consortia. Dawning of the age of genomics for platelet granule disorders: improving insight, diagnosis and management. Br J Haematol 176, 705-720, doi:10.1111/bjh.14471 (2017).

151 Noris, P. et al. Mutations in ANKRD26 are responsible for a frequent form of inherited thrombocytopenia: analysis of 78 patients from 21 families. Blood 117, 6673-6680, doi:10.1182/blood-2011-02-336537 (2011).

152 Kunishima, S. et al. Mutations in the NMMHC-A gene cause autosomal dominant macrothrombocytopenia with leukocyte inclusions (May-Hegglin anomaly/Sebastian syndrome). Blood 97, 1147-1149 (2001).

153 Kunishima, S. et al. Identification of six novel MYH9 mutations and genotype-phenotype relationships in autosomal dominant macrothrombocytopenia with leukocyte inclusions. J Hum Genet 46, 722-729, doi:10.1007/s100380170007 (2001).

154 Seri, M. et al. Mutations in MYH9 result in the May-Hegglin anomaly, and Fechtner and Sebastian syndromes. The May-Heggllin/Fechtner Syndrome Consortium. Nat Genet 26, 103-105, doi:10.1038/79063 (2000).

155 Saposnik, B. et al. Mutation spectrum and genotype-phenotype correlations in a large French cohort of MYH9-Related Disorders. Mol Genet Genomic Med 2, 297-312, doi:10.1002/mgg3.68 (2014).

156 Bluteau, D. et al. Thrombocytopenia-associated mutations in the ANKRD26 regulatory region induce MAPK hyperactivation. J Clin Invest 124, 580-591, doi:10.1172/jci71861 (2014).

157 Averina, M., Jensvoll, H., Strand, H. & Sovershaev, M. A novel ANKRD26 gene variant causing inherited thrombocytopenia in a family of Finnish origin: Another brick in the wall? Thromb Res 151, 41-43, doi:10.1016/j.thromres.2017.01.001 (2017).

158 Weerakkody, R. A. et al. Targeted next-generation sequencing makes new molecular diagnoses and expands genotype-phenotype relationship in Ehlers-Danlos syndrome. Genet Med 18, 1119-1127, doi:10.1038/gim.2016.14 (2016).

159 Anstey, A., Mayne, K., Winter, M., Van De Pette, J. & Pope, F. M. Platelet and coagulation studies in Ehlers-Danlos syndrome. Br J Dermatol 125, 155-163, doi:10.1111/j.1365-2133.1991.tb06063.x (1991).

160 Karaca, M., Cronberg, L. & Nilsson, I. M. Abnormal platelet-collagen reaction in Ehlers-Danlos syndrome. Scand J Haematol 9, 465-469 (1972).

161 Español, I., Hernández, A., Pujol, R. M., Urrutia, T. & Pujol-Moix, N. Type IV Ehlers-Danlos syndrome with platelet delta-storage pool disease. Ann Hematol 77, 47-50 (1998).

162 Busch, A. et al. Vascular type Ehlers-Danlos syndrome is associated with platelet dysfunction and low vitamin D serum concentration. Orphanet J Rare Dis 11, 111, doi:10.1186/s13023-016-0491-2 (2016).

163 Bach, J. E., Wolf, B., Oldenburg, J., Müller, C. R. & Rost, S. Identification of deep intronic variants in 15 haemophilia A patients by next generation sequencing of the whole factor VIII gene. Thromb Haemost 114, 757-767, doi:10.1160/TH14-12-1011 (2015).

164 Liu, M., Murphy, M. E. & Thompson, A. R. A domain mutations in 65 haemophilia A families and molecular modelling of dysfunctional factor VIII proteins. Br J Haematol 103, 1051-1060 (1998).

165 Schneppenheim, R. et al. A cluster of mutations in the D3 domain of von Willebrand factor correlates with a distinct subgroup of von Willebrand disease: type 2A/IIE. Blood 115, 4894-4901, doi:10.1182/blood-2009-07-226324 (2010).

166 Perry, D. J. & Carrell, R. W. CpG dinucleotides are "hotspots" for mutation in the antithrombin III gene. Twelve variants identified using the polymerase chain reaction. Mol Biol Med 6, 239-243 (1989).

167 Amendola, L. M. et al. Actionable exomic incidental findings in 6503 participants: challenges of variant classification. Genome Res 25, 305-315, doi:10.1101/gr.183483.114 (2015).

168 Pospisilova, D., Cmejlova, J., Slavik, L. & Cmejla, R. Elevated thrombopoietin levels and alterations in the sequence of its receptor, c-Mpl, in patients with Diamond-Blackfan anemia. Haematologica 89, 1391-1392 (2004).

169 van den Oudenrijn, S. et al. Mutations in the thrombopoietin receptor, Mpl, in children with congenital amegakaryocytic thrombocytopenia. Br J Haematol 110, 441-448 (2000).

170 Sutherland, M. S. et al. A novel deletion mutation is recurrent in von Willebrand disease types 1 and 3. Blood 114, 1091-1098, doi:10.1182/blood-2008-08-173278 (2009).

171 George, J. N., Nurden, A. T. & Phillips, D. R. Molecular defects in interactions of platelets with the vessel wall. N Engl J Med 311, 1084-1098, doi:10.1056/NEJM198410253111705 (1984).

172 Bernard, J. & Soulier, J. P. Sur une nouvelle variété de dystrophie thrombocytaire-hémorragipare congénitale. Sem Hop 24, 3217-3223 (1948).

173 Ware, J. et al. Nonsense mutation in the glycoprotein Ib alpha coding sequence associated with Bernard-Soulier syndrome. Proc Natl Acad Sci U S A 87, 2026-2030 (1990).

174 López, J. A., Andrews, R. K., Afshar-Kharghan, V. & Berndt, M. C. Bernard-Soulier syndrome. Blood 91, 4397-4418 (1998).

175 Ludlow, L. B. et al. Identification of a mutation in a GATA binding site of the platelet glycoprotein Ibbeta promoter resulting in the Bernard-Soulier syndrome. J Biol Chem 271, 22076-22080 (1996).

176 Moran, N. et al. Surface expression of glycoprotein ib alpha is dependent on glycoprotein ib beta: evidence from a novel mutation causing Bernard-Soulier syndrome. Blood 96, 532-539 (2000).

177 Kunishima, S. et al. Missense mutations of the glycoprotein (GP) Ib beta gene impairing the GPIb alpha/beta disulfide linkage in a family with giant platelet disorder. Blood 89, 2404-2412 (1997).

178 Kunishima, S., Naoe, T., Kamiya, T. & Saito, H. Novel heterozygous missense mutation in the platelet glycoprotein Ib beta gene associated with isolated giant platelet disorder. Am J Hematol 68, 249-255 (2001).

179 Kurokawa, Y. et al. A missense mutation (Tyr88 to Cys) in the platelet membrane glycoprotein Ibbeta gene affects GPIb/IX complex expression--Bernard-Soulier syndrome in the homozygous form and giant platelets in the heterozygous form. Thromb Haemost 86, 1249-1256 (2001).

180 Balduini, A. et al. Proplatelet formation in heterozygous Bernard-Soulier syndrome type Bolzano. J Thromb Haemost 7, 478-484, doi:10.1111/j.1538-7836.2008.03255.x (2009).

181 Savoia, A. et al. Spectrum of the mutations in Bernard-Soulier syndrome. Hum Mutat 35, 1033-1045, doi:10.1002/humu.22607 (2014).

182 Miller, J. L., Lyle, V. A. & Cunningham, D. Mutation of leucine-57 to phenylalanine in a platelet glycoprotein Ib alpha leucine tandem repeat occurring in patients with an autosomal dominant variant of Bernard-Soulier disease. Blood 79, 439-446 (1992).

183 Noris, P. et al. Clinical and laboratory features of 103 patients from 42 Italian families with inherited thrombocytopenia derived from the monoallelic Ala156Val mutation of GPIbα (Bolzano mutation). Haematologica 97, 82-88, doi:10.3324/haematol.2011.050682 (2012).

184 Liang, H. P. et al. Heterozygous loss of platelet glycoprotein (GP) Ib-V-IX variably affects platelet function in velocardiofacial syndrome (VCFS) patients. Thromb Haemost 98, 1298-1308 (2007).

185 Greene, D., NIHR BioResource, Richardson, S. & Turro, E. Phenotype similarity regression for identifying the genetic determinants of rare diseases. Am J Hum Genet (2016).

186 Sivapalaratnam, S. et al. Rare variants in GP1BB are responsible for autosomal dominant macrothrombocytopenia. Blood 129, 520-524, doi:10.1182/blood-2016-08-732248 (2017).

187 Nurden, A. T., Fiore, M., Nurden, P. & Pillois, X. Glanzmann thrombasthenia: a review of ITGA2B and ITGB3 defects with emphasis on variants, phenotypic variability, and mouse models. Blood 118, 5996-6005, doi:10.1182/blood-2011-07-365635 (2011).

188 Buitrago, L. et al. αIIbβ3 variants defined by next-generation sequencing: predicting variants likely to cause Glanzmann thrombasthenia. Proc Natl Acad Sci U S A 112, E1898-1907, doi:10.1073/pnas.1422238112 (2015).

189 Nelson, E. J. et al. Diversity of Glanzmann thrombasthenia in southern India: 10 novel mutations identified among 15 unrelated patients. J Thromb Haemost 4, 1730-1737, doi:10.1111/j.1538-7836.2006.02066.x (2006).

190 Stefanini, L., Roden, R. C. & Bergmeier, W. CalDAG-GEFI is at the nexus of calcium-dependent platelet activation. Blood 114, 2506-2514, doi:10.1182/blood-2009-04-218768 (2009).

191 Canault, M. et al. Human CalDAG-GEFI gene (RASGRP2) mutation affects platelet function and causes severe bleeding. J Exp Med 211, 1349-1362, doi:10.1084/jem.20130477 (2014).

192 Westbury, S. K. et al. Expanded repertoire of RASGRP2 variants responsible for platelet dysfunction and severe bleeding. Blood, doi:10.1182/blood-2017-03-776773 (2017).

193 Westbury, S. K. et al. Expanded repertoire of RASGRP2 variants responsible for platelet dysfunction and severe bleeding. Blood 130, 1026-1030, doi:10.1182/blood-2017-03-776773 (2017).

194 Stefanini, L. & Bergmeier, W. CalDAG-GEFI and platelet activation. Platelets 21, 239-243, doi:10.3109/09537101003639931 (2010).

195 Sevivas, T. et al. Identification of two novel mutations in RASGRP2 affecting platelet CalDAG-GEFI expression and function in patients with bleeding diathesis. Platelets 29, 192-195, doi:10.1080/09537104.2017.1336214 (2018).

196 Saultier, P. et al. Macrothrombocytopenia and dense granule deficiency associated with FLI1 variants: ultrastructural and pathogenic features. Haematologica 102, 1006-1016, doi:10.3324/haematol.2016.153577 (2017).

197 Pecci, A. et al. MYH9-related disease: a novel prognostic model to predict the clinical evolution of the disease based on genotype-phenotype correlations. Hum Mutat 35, 236-247, doi:10.1002/humu.22476 (2014).

198 Savoia, A., De Rocco, D. & Pecci, A. MYH9 gene mutations associated with bleeding. Platelets 28, 312-315, doi:10.1080/09537104.2017.1294250 (2017).

199 Stritt, S. et al. A gain-of-function variant in DIAPH1 causes dominant macrothrombocytopenia and hearing loss. Blood, doi:10.1182/blood-2015-10-675629 (2016).

200 Turro, E. et al. A dominant gain-of-function mutation in universal tyrosine kinase SRC causes thrombocytopenia, myelofibrosis, bleeding, and bone pathologies. Sci Transl Med 8, 328ra30, doi:10.1126/scitranslmed.aad7666 (2016).

201 Pleines, I. et al. Mutations in tropomyosin 4 underlie a rare form of human macrothrombocytopenia. J Clin Invest 127, 814-829, doi:10.1172/JCI86154 (2017).

202 Hao, Z. et al. Specific Ablation of the Apoptotic Functions of Cytochrome c Reveals a Differential Requirement for Cytochrome c and Apaf-1 in Apoptosis. Cell 121, 579-591, doi:10.1016/j.cell.2005.03.016 (2005).

203 Chandra, D. et al. Intracellular nucleotides act as critical prosurvival factors by binding to cytochrome C and inhibiting apoptosome. Cell 125, 1333-1346, doi:10.1016/j.cell.2006.05.026 (2006).

204 Morison, I. M. et al. A mutation of human cytochrome c enhances the intrinsic apoptotic pathway but causes only thrombocytopenia. Nature Genetics 40, 387-389, doi:10.1038/ng.103 (2008).

205 Banci, L., Bertini, I., Rosato, A. & Varani, G. Mitochondrial cytochromes c: a comparative analysis. JBIC Journal of Biological Inorganic Chemistry 4, 824-837, doi:10.1007/s007750050356 (1999).

206 Josephs, T. et al. Conformational change and human cytochrome c function: mutation of residue 41 modulates caspase activation and destabilizes Met-80 coordination. JBIC Journal of Biological Inorganic Chemistry 18, 289-297, doi:10.1007/s00775-012-0973-1 (2013).

207 De Rocco, D. et al. Mutations of cytochrome c identified in patients with thrombocytopenia THC4 affect both apoptosis and cellular bioenergetics. Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease 1842, 269-274, doi:10.1016/j.bbadis.2013.12.002 (2014).

208 Ong, L., Morison, I. M. & Ledgerwood, E. C. Megakaryocytes from CYCS mutation-associated thrombocytopenia release platelets by both proplatelet-dependent and -independent processes. Br J Haematol 176, 268-279, doi:10.1111/bjh.14421 (2017).

209 Johnson, B. et al. Whole exome sequencing identifies genetic variants in inherited thrombocytopenia with secondary qualitative function defects. Haematologica 101, 1170-1179, doi:10.3324/haematol.2016.146316 (2016).

210 Chen, L. et al. Transcriptional diversity during lineage commitment of human blood progenitors. Science 345, 1251033, doi:10.1126/science.1251033 (2014).

211 Greene, D., Richardson, S., Turro, E. & BioResource, N. A Fast Association Test for Identifying Pathogenic Variants Involved in Rare Diseases. Am J Hum Genet 101, 104-114, doi:10.1016/j.ajhg.2017.05.015 (2017).

212 Ghevaert, C. et al. A nonsynonymous SNP in the ITGB3 gene disrupts the conserved membrane-proximal cytoplasmic salt bridge in the αIIbβ3 integrin and cosegregates dominantly with abnormal proplatelet formation and macrothrombocytopenia. Blood 111, 3407-3414, doi:10.1182/blood-2007-09-112615 (2008).

213 Peyruchaud, O. et al. R to Q amino acid substitution in the GFFKR sequence of the cytoplasmic domain of the integrin αIIb subunit in a patient with a Glanzmann's thrombasthenia-like syndrome. Blood 92, 4178-4187 (1998).

214 Savoia, A. et al. Autosomal dominant macrothrombocytopenia in Italy is most frequently a type of heterozygous Bernard-Soulier syndrome. Blood 97, 1330-1335 (2001).

215 Savoia, A. et al. Spectrum of the mutations in Bernard-Soulier syndrome. Hum Mutat 35, 1033-1045, doi:10.1002/humu.22607 (2014).

216 Westbury, S. K., Shoemark, D. K. & Mumford, A. D. ACTN1 variants associated with thrombocytopenia. Platelets 28, 625-627, doi:10.1080/09537104.2017.1356455 (2017).

217 Chiang, C. & Ayyanathan, K. Snail/Gfi-1 (SNAG) family zinc finger proteins in transcription regulation, chromatin dynamics, cell signaling, development, and disease. Cytokine Growth Factor Rev 24, 123-131, doi:10.1016/j.cytogfr.2012.09.002 (2013).

218 Vassen, L., Fiolka, K. & Möröy, T. Gfi1b alters histone methylation at target gene promoters and sites of gamma-satellite containing heterochromatin. EMBO J 25, 2409-2419, doi:10.1038/sj.emboj.7601124 (2006).

219 Saleque, S., Cameron, S. & Orkin, S. H. The zinc-finger proto-oncogene Gfi-1b is essential for development of the erythroid and megakaryocytic lineages. Genes Dev 16, 301-306, doi:10.1101/gad.959102 (2002).

220 Foudi, A. et al. Distinct, strict requirements for Gfi-1b in adult bone marrow red cell and platelet generation. J Exp Med 211, 909-927, doi:10.1084/jem.20131065 (2014).

221 Laurent, B. et al. A short Gfi-1B isoform controls erythroid differentiation by recruiting the LSD1-CoREST complex through the dimethylation of its SNAG domain. J Cell Sci 125, 993-1002, doi:10.1242/jcs.095877 (2012).

222 Schulze, H. et al. Recessive grey platelet-like syndrome with unaffected erythropoiesis in the absence of the Splice Isoform GFI1B-p37. Haematologica, doi:10.3324/haematol.2017.167957 (2017).

223 Monteferrario, D. et al. A dominant-negative GFI1B mutation in the gray platelet syndrome. N Engl J Med 370, 245-253, doi:10.1056/NEJMoa1308130 (2014).

224 Stevenson, W. S. et al. GFI1B mutation causes a bleeding disorder with abnormal platelet function. J Thromb Haemost 11, 2039-2047, doi:10.1111/jth.12368 (2013).

225 Monteferrario, D. et al. A Dominant-Negative GFI1B Mutation in the Gray Platelet Syndrome. New England Journal of Medicine 370, 245-253, doi:10.1056/NEJMoa1308130 (2014).

226 Chen, L. et al. Transcriptional diversity during lineage commitment of human blood progenitors. Science 345, 1251033, doi:10.1126/science.1251033 (2014).

227 Kitamura, K. et al. Functional characterization of a novel GFI1B mutation causing congenital macrothrombocytopenia. J Thromb Haemost 14, 1462-1469, doi:10.1111/jth.13350 (2016).

228 Ferreira, C. R. et al. Combined alpha-delta platelet storage pool deficiency is associated with mutations in GFI1B. Mol Genet Metab 120, 288-294, doi:10.1016/j.ymgme.2016.12.006 (2017).

229 Rabbolini, D. J. et al. Thrombocytopenia and CD34 expression is decoupled from α-granule deficiency with mutation of the first growth factor-independent 1B zinc finger. J Thromb Haemost 15, 2245-2258, doi:10.1111/jth.13843 (2017).

230 Polfus, L. M. et al. Whole-Exome Sequencing Identifies Loci Associated with Blood Cell Traits and Reveals a Role for Alternative GFI1B Splice Variants in Human Hematopoiesis. Am J Hum Genet 99, 481-488, doi:10.1016/j.ajhg.2016.06.016 (2016).

231 Chen, L. et al. Transcriptional diversity during lineage commitment of human blood progenitors. Science 345, 1251033, doi:10.1126/science.1251033 (2014).

232 Genomics England. The 100,000 genomes project. https://www.genomicsengland.co.uk/the-100000-genomes-project (accessed Jan 2018).

233 Turro, E. et al. A dominant gain-of-function mutation in universal tyrosine kinase SRC causes thrombocytopenia, myelofibrosis, bleeding, and bone pathologies. Science Translational Medicine 8, 328ra30, doi:10.1126/scitranslmed.aad7666 (2016).

234 Stritt, S. et al. Defects in TRPM7 channel function deregulate thrombopoiesis through altered cellular Mg(2+) homeostasis and cytoskeletal architecture. Nat Commun 7, 11097, doi:10.1038/ncomms11097 (2016).

235 Matheny, S. A. et al. Ras regulates assembly of mitogenic signalling complexes through the effector protein IMP. Nature 427, 256-260, doi:10.1038/nature02237 (2004).

236 Ozaki, K. & Tanaka, T. Molecular genetics of coronary artery disease. J Hum Genet 61, 71-77, doi:10.1038/jhg.2015.70 (2016).

237 Astle, W. J. et al. The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease. Cell 167, 1415-1429.e19, doi:10.1016/j.cell.2016.10.042 (2016).

238 Fan, Y. & Lu, D. The Ikaros family of zinc-finger proteins. Acta Pharm Sin B 6, 513-521, doi:10.1016/j.apsb.2016.06.002 (2016).

239 Perdomo, J., Holmes, M., Chong, B. & Crossley, M. Eos and pegasus, two members of the Ikaros family of proteins with distinct DNA binding activities. J Biol Chem 275, 38347-38354, doi:10.1074/jbc.M005457200 (2000).

240 The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res 46, 2699, doi:10.1093/nar/gky092 (2018).

241 UniProt Consortium. Align: Clustal Omega alignment tool. https://www.uniprot.org/align/ (last accessed March 2018).

242 MacArthur, D. G. et al. Guidelines for investigating causality of sequence variants in human disease. Nature 508, 469-476, doi:10.1038/nature13127 (2014).

243 Nakagawa, O. et al. ROCK-I and ROCK-II, two isoforms of Rho-associated coiled-coil forming protein serine/threonine kinase in mice. FEBS Letters 392, 189-193, doi:10.1016/0014-5793(96)00811-3 (1996).

244 Julian, L. & Olson, M. F. Rho-associated coiled-coil containing kinases (ROCK). Small GTPases 5, e29846, doi:10.4161/sgtp.29846 (2014).

245 Lock, F. E. & Hotchin, N. A. Distinct roles for ROCK1 and ROCK2 in the regulation of keratinocyte differentiation. PLoS One 4, e8190, doi:10.1371/journal.pone.0008190 (2009).

246 Shi, J. et al. Distinct roles for ROCK1 and ROCK2 in the regulation of cell detachment. Cell Death Dis 4, e483, doi:10.1038/cddis.2013.10 (2013).

247 Kümper, S. et al. Rho-associated kinase (ROCK) function is essential for cell cycle progression, senescence and tumorigenesis. Elife 5, e12994, doi:10.7554/eLife.12203 (2016).

248 Doran, J. D., Liu, X., Taslimi, P., Saadat, A. & Fox, T. New insights into the structure-function relationships of Rho-associated kinase: a thermodynamic and hydrodynamic study of the dimer-to-monomer transition and its kinetic implications. Biochem J 384, 255-262, doi:10.1042/bj20040344 (2004).

249 Wen, W., Liu, W., Yan, J. & Zhang, M. Structure Basis and Unconventional Lipid Membrane Binding Properties of the PH-C1 Tandem of Rho Kinases. Journal of Biological Chemistry 283, 26263-26273, doi:10.1074/jbc.M803417200 (2008).

250 Jacobs, M. et al. The Structure of Dimeric ROCK I Reveals the Mechanism for Ligand Selectivity. Journal of Biological Chemistry 281, 260-268, doi:10.1074/jbc.M508847200 (2006).

251 Ishizaki, T. et al. The small GTP-binding protein Rho binds to and activates a 160 kDa Ser/Thr protein kinase homologous to myotonic dystrophy kinase. EMBO J 15 (1996).

252 Coleman, M. L. et al. Membrane blebbing during apoptosis results from caspase-mediated activation of ROCK I. Nat Cell Biol 3, 339-345, doi:http://www.nature.com/ncb/journal/v3/n4/suppinfo/ncb0401_339_S1.html (2001).

253 Amano, M. et al. Formation of Actin Stress Fibers and Focal Adhesions Enhanced by Rho-Kinase. Science 275, 1308-1311, doi:10.1126/science.275.5304.1308 (1997).

254 Shimizu, Y. et al. ROCK-I regulates closure of the eyelids and ventral body wall by inducing assembly of actomyosin bundles. J Cell Biol 168, doi:10.1083/jcb.200411179 (2005).

255 Chang, J. et al. Activation of Rho-associated coiled-coil protein kinase 1 (ROCK-1) by caspase-3 cleavage plays an essential role in cardiac myocyte apoptosis. Proceedings of the National Academy of Sciences 103, 14495-14500, doi:10.1073/pnas.0601911103 (2006).

256 Noma, K. et al. ROCK1 mediates leukocyte recruitment and neointima formation following vascular injury. J Clin Invest 118, 1632-1644, doi:10.1172/jci29226 (2008).

257 Zhao, Z. & Rivkees, S. A. Rho-associated kinases play an essential role in cardiac morphogenesis and cardiomyocyte proliferation. Dev Dyn 226, 24-32, doi:10.1002/dvdy.10212 (2003).

258 Biro, M., Munoz, M. A. & Weninger, W. Targeting Rho-GTPases in immune cell migration and inflammation. Br J Pharmacol 171, 5491-5506, doi:10.1111/bph.12658 (2014).

259 Defert, O. & Boland, S. Rho kinase inhibitors: a patent review (2014 - 2016). Expert Opin Ther Pat 27, 507-515, doi:10.1080/13543776.2017.1272579 (2017).

260 Moore, C. et al. The INTERVAL trial to determine whether intervals between blood donations can be safely and acceptably decreased to optimise blood supply: study protocol for a randomised controlled trial. Trials 15, 363, doi:10.1186/1745-6215-15-363 (2014).

261 Lochhead, P. A., Wickman, G., Mezna, M. & Olson, M. F. Activating ROCK1 somatic mutations in human cancer. Oncogene 29, 2591-2598, doi:10.1038/onc.2010.3 (2010).

262 Dasgupta, S. K. et al. Rho associated coiled-coil kinase-1 regulates collagen-induced phosphatidylserine exposure in platelets. PLoS One 8, e84649, doi:10.1371/journal.pone.0084649 (2013).

263 Kambas, K., Mitroulis, I. & Ritis, K. The emerging role of neutrophils in thrombosis – the journey of TF through NETs. Front Immunol 3, 385, doi:10.3389/fimmu.2012.00385 (2012).

264 Deppermann, C. Platelets and vascular integrity. Platelets, 1-7, doi:10.1080/09537104.2018.1428739 (2018).

265 NIHR BioResource. NIHR BioResource - Rare Diseases. https://bioresource.nihr.ac.uk/rare-diseases/rare-diseases/ (last accessed June 2018).

266 Robinson, P. N. & Mundlos, S. The human phenotype ontology. Clin Genet 77, 525-534, doi:10.1111/j.1399-0004.2010.01436.x (2010).

267 Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223-228, doi:10.1038/nature14135 (2015).

268 Freson, K. & Turro, E. High-throughput sequencing approaches for diagnosing hereditary bleeding and platelet disorders. Journal of Thrombosis and Haemostasis 15, 1262-1272, doi:10.1111/jth.13681 (2017).

269 Gresele, P. et al. Diagnosis of suspected inherited platelet function disorders: results of a worldwide survey. J Thromb Haemost 12, 1562-1569, doi:10.1111/jth.12650 (2014).

270 Albers, C. A. et al. Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome. Nat Genet 44, 435-439, doi:10.1038/ng.1083 (2012).

271 Xue, Y. et al. Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am J Hum Genet 91, 1022-1032, doi:10.1016/j.ajhg.2012.10.015 (2012).

272 ClinGen. Clinical Genome Resource. https://www.clinicalgenome.org (Accessed March 2018).

273 Matthijs, G. et al. Guidelines for diagnostic next-generation sequencing. Eur J Hum Genet 24, 2-5, doi:10.1038/ejhg.2015.226 (2016).

274 Amendola, L. M. et al. Performance of ACMG-AMP Variant-Interpretation Guidelines among Nine Laboratories in the Clinical Sequencing Exploratory Research Consortium. Am J Hum Genet 99, 247, doi:10.1016/j.ajhg.2016.06.001 (2016).

275 Vidal, M., Cusick, M. E. & Barabasi, A. L. Interactome networks and human disease. Cell 144, 986-998, doi:10.1016/j.cell.2011.02.016 (2011).

276 Rolland, T. et al. A Proteome-Scale Map of the Human Interactome Network. Cell 159, 1212-1226, doi:10.1016/j.cell.2014.10.050 (2014).

Appendix

1. Case Report Form 1

Communication Record Name: Study:

Hospital Number: Date of birth:

Tel. Home: Mobile: Work:

GP details:

Date of contact:

Outcome:

Date of contact:

Outcome:

Date of contact:

Outcome:

General information

Age at presentation (age of onset of first significant symptoms/signs):

Gender:

Ethnicity:

Reason for entering study: Bleeding disorder / Abnormality of platelet number or morphology / Both

Clinical details of bleeding/platelet problem and current diagnosis:

Relevant genetic testing already performed:

Clinical syndrome & Orphanet number (if the patient phenotype fits a known clinical syndrome but no gene has been identified):

Name of Patient: Date of Clinic:

Hospital Number: Name of Assessor:

Date of Birth: Bleeding Score:

Grades of bleeding severity used to compute the bleeding score

| Symptom | -1 | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|
| Epistaxis | - | No or trivial (<5 episodes/year) | >5 episodes or >10 min duration | Consultation only | Packing or cauterisation or TXA | Blood products / DDAVP / concentrate |
| Cutaneous | - | No or trivial (<1 cm) | >1 cm and no trauma | Consultation only | - | - |
| Bleeding from minor wounds | - | No or trivial (<5 episodes/year) | >5 episodes or >5 min | Consultation only | Surgical haemostasis | Blood products / DDAVP / concentrate |
| Oral cavity | - | None | Referred at least once | Consultation only | Surgical haemostasis or TXA | Blood products / DDAVP / concentrate |
| Tooth extraction | No bleeding in at least 2 extractions | No extractions or no bleeding in 1 extraction | Referred in <25% of all procedures, no intervention | Referred in ≥25% of all procedures, no intervention | Resuturing or packing | Blood products / DDAVP / concentrate |
| Surgery | No bleeding in at least 2 surgeries | None performed or no bleeding in one surgery | Referred in <25% of all surgeries | Referred in ≥25% of all procedures, no intervention | Surgical haemostasis or TXA | Blood products / DDAVP / concentrate |
| Menorrhagia | - | No | Consultation only | TXA or pill use | Dilatation & curettage, iron therapy | Blood products / DDAVP / concentrate / hysterectomy |
| Postpartum haemorrhage | No bleeding in at least 2 deliveries | No deliveries or no bleeding in one delivery | Consultation only | Dilatation & curettage, iron therapy, TXA | Blood products / DDAVP / concentrate | Hysterectomy |
| Muscle haematomas | - | Never | Post-trauma, no therapy | Spontaneous, no therapy | Spontaneous or traumatic, treated with DDAVP / concentrate | Surgical intervention or blood transfusion |
| Haemarthrosis | - | Never | Consultation only | Spontaneous, no therapy | Spontaneous or traumatic, treated with DDAVP / concentrate | Surgical intervention or blood transfusion |
| CNS bleeding | - | Never | - | - | Subdural, any intervention | Intracerebral, any intervention |
| GI bleeding | - | No | Associated with GI pathology (e.g. ulcer, varices, haemorrhoids) | Spontaneous | Surgical haemostasis, blood products / DDAVP / concentrate / TXA | - |

MCMDM-1: Molecular and Clinical Markers for the Diagnosis and Management of type 1 von Willebrand disease

TXA, tranexamic acid; DDAVP, desmopressin
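The bleeding score is obtained by summing the grade assigned to each symptom in the table above. The arithmetic can be sketched in Python as follows. The symptom keys, the `ALLOWED_GRADES` table and the `bleeding_score` function are hypothetical names introduced for illustration, and treating an unrecorded symptom as grade 0 is an assumption. The permitted grades are read directly off the table: -1 is available only for tooth extraction, surgery and postpartum haemorrhage, cutaneous bleeding stops at 2, GI bleeding at 3, and CNS bleeding can only score 0, 3 or 4. This is a sketch, not the scoring software used in the study.

```python
# Hypothetical helper: symptom keys and permitted grades are transcribed from the
# MCMDM-1 grading table above; this is not the study's own scoring software.
ALLOWED_GRADES: dict[str, tuple[int, ...]] = {
    "epistaxis": (0, 1, 2, 3, 4),
    "cutaneous": (0, 1, 2),
    "minor_wounds": (0, 1, 2, 3, 4),
    "oral_cavity": (0, 1, 2, 3, 4),
    "tooth_extraction": (-1, 0, 1, 2, 3, 4),        # -1: no bleeding in >= 2 extractions
    "surgery": (-1, 0, 1, 2, 3, 4),                 # -1: no bleeding in >= 2 surgeries
    "menorrhagia": (0, 1, 2, 3, 4),
    "postpartum_haemorrhage": (-1, 0, 1, 2, 3, 4),  # -1: no bleeding in >= 2 deliveries
    "muscle_haematomas": (0, 1, 2, 3, 4),
    "haemarthrosis": (0, 1, 2, 3, 4),
    "cns_bleeding": (0, 3, 4),                      # grades 1 and 2 are not defined
    "gi_bleeding": (0, 1, 2, 3),
}


def bleeding_score(grades: dict[str, int]) -> int:
    """Sum the per-symptom grades; symptoms not supplied are assumed to be grade 0."""
    total = 0
    for symptom, grade in grades.items():
        if symptom not in ALLOWED_GRADES:
            raise KeyError(f"unknown symptom: {symptom}")
        if grade not in ALLOWED_GRADES[symptom]:
            raise ValueError(f"grade {grade} is not defined for {symptom}")
        total += grade
    return total


if __name__ == "__main__":
    # Made-up example patient, not study data.
    example = {"epistaxis": 2, "cutaneous": 1, "tooth_extraction": -1, "menorrhagia": 3}
    print(bleeding_score(example))  # 2 + 1 - 1 + 3 = 5
```

With these limits the score can range from -3 to 45. Rejecting grades that the table does not define is one way to catch transcription errors when paper forms are entered electronically.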

Full details of the clinical questions on which the above questionnaire is based.

Epistaxis:

1. Current average number per year, age of maximum severity with average number of episodes, and any seasonal variation
2. The average duration of epistaxis, and the duration of the longest episode
3. Spontaneous cessation, short compression or other medical interventions

Cutaneous Symptoms:

1. New-onset bruising or a lifelong history of bruising; type of symptoms (petechiae, bruises and subcutaneous haematomas)
2. Average size, location in exposed or unexposed areas, whether related to trauma or spontaneous, and the types of intervention

Bleeding from Minor Wounds:

1. The average number of episodes per year, the average duration of bleeding (>5 min?), and any medical attention required

Oral Cavity Bleeding:

1. Spontaneous gum bleeding or bleeding from the gums after brushing
2. Bleeding following tooth eruption, and bleeding after self-induced bites to the lips and tongue
3. Medical or surgical interventions required to arrest bleeding

Tooth Extraction:

1. Total number of teeth extracted, including permanent, molar and deciduous teeth
2. Bleeding following tooth extraction, e.g. whether suturing or packing was required, or comments from the dental surgeons

Surgery:

1. The total number of surgeries, including minor and major, and the number with no bleeding
2. Comments from surgeons and other medical interventions

Menorrhagia:

1. Age at menarche, current average duration of menstruation and cycle length, regularity, duration of heavy days and frequency of pad changes
2. Age when menstruation became heavy and the type of treatment that was required

Postpartum Haemorrhage:

1. The number of deliveries, and the number followed by bleeding
2. Terminations and miscarriages, and whether these were followed by profuse bleeding
3. The type of assistance provided to help with PPH

GI Bleeding:

1. If haematemesis, melaena or fresh blood per rectum is reported, detail the medical investigations for the underlying diagnosis

Haematomas and Haemarthrosis:

1. Whether the bleed was spontaneous or traumatic; describe the location, precipitating events and resolution

CNS Bleeding:

1. Please elaborate

Past medical history: Ask specifically about Autoimmune disease; Skeletal abnormalities/dysmorphism; Eye disease; Renal disease; Deafness; Congenital heart disease; Acquired cardiovascular disease at a young age; Metabolic disease; Developmental problems/Learning difficulties; Neurological or psychological disorders; Skin disease; Malignancy

Family History and Pedigree tree: Again, ask specifically about the disorders above

Laboratory Data: (Other than a platelet count and an assessment of platelet size (either via MPV or blood film analysis), these fields are not mandatory, but clinicians should record whatever information is available.)

FBC

Coagulation screen

Other relevant coagulation tests (e.g. VWF levels, clotting factors)

Platelet function testing

PFA-100

Light transmission aggregometry

Lumi-aggregometry

Nucleotide assays

Flow cytometry

Microscopy Blood Film

Bone Marrow

Electron Microscopy

Overall Clinical Interpretation (and any other relevant tests/information):

2. Case report form 2

3. The phenotypic abnormalities associated with defects in 63 BPD genes coded as HPO terms.

The HPO-coded phenotypes of cases can be compared to these gene-specific HPO profiles to prioritise candidate variants. (As published in Simeoni et al, Blood 2016)

Gene HPO profile

ANKRD26 Abnormal alpha granule content; Abnormality of cells of the megakaryocyte lineage; Bruising susceptibility; Epistaxis; Petechiae; Thrombocytopenia

AP3B1; BLOC1S3; DTNBP1; HPS1; HPS3; HPS4; HPS5; HPS6 Abnormal bleeding; Abnormal platelet aggregation; Abnormal platelet dense granule secretion; Abnormality of retinal pigmentation; Albinism; Hypoplasia of the fovea; Immunodeficiency; Inflammation of the large intestine; Iris transillumination defect; Neutropenia; Nystagmus; Pulmonary fibrosis; Severe visual impairment; Strabismus

CYCS Thrombocytopenia

F10 Abnormal bleeding; Prolonged partial thromboplastin time; Prolonged prothrombin time; Reduced factor X activity

F11 Persistent bleeding after trauma; Prolonged bleeding following procedure; Prolonged partial thromboplastin time; Reduced factor XI activity

F13A1; F13B Abnormal umbilical stump bleeding; Reduced factor XIII activity; Spontaneous abortion

F2 Abnormal bleeding; Prolonged partial thromboplastin time; Prolonged prothrombin time; Reduced prothrombin activity

F5 Abnormal bleeding; Prolonged partial thromboplastin time; Prolonged prothrombin time; Reduced factor V activity

F7 Abnormal bleeding; Prolonged prothrombin time; Reduced factor VII activity

F8 Joint hemorrhage; Prolonged partial thromboplastin time; Reduced factor VIII activity; X-linked inheritance

F9 Joint hemorrhage; Prolonged partial thromboplastin time; Reduced factor IX activity; X-linked inheritance

FGA; FGB; FGG Abnormal bleeding; Deep venous thrombosis; Dysfibrinogenemia; Hypofibrinogenemia; Prolonged prothrombin time; Pulmonary embolism; Splenic rupture; Spontaneous abortion

FLI1 Abnormal alpha granules; Abnormal bleeding; Clinodactyly of the 5th finger; Coarctation of aorta; Cognitive impairment; Downturned corners of mouth; Hepatomegaly; Hypertelorism; Long philtrum; Low-set ears; Macrocephaly; Micrognathia; Prominent epicanthal folds; Ptosis; Pyloric stenosis; Syndactyly; Thin upper lip vermilion; Thrombocytopenia; Trigonocephaly; Wide nasal bridge

FLNA Giant platelets; Intestinal pseudo-obstruction; Patent ductus arteriosus; Thrombocytopenia

GATA1 Abnormal alpha granules; Abnormal bleeding; Abnormality of cells of the megakaryocyte lineage; Increased mean platelet volume; Macrocytic dyserythropoietic anemia; Thrombocytopenia

GGCX Abnormal bleeding; Angioid streaks of the retina; Atherosclerosis; Cutis laxa; Epiphyseal stippling; Prolonged partial thromboplastin time; Prolonged prothrombin time; Reduced factor IX activity; Reduced factor VII activity; Reduced factor X activity; Reduced protein C activity; Reduced protein S activity; Reduced prothrombin activity; Short distal phalanx of finger; Short nose

GP1BA Abnormal bleeding; Decreased platelet glycoprotein Ib-IX-V; Impaired ristocetin-induced platelet aggregation; Increased mean platelet volume; Intermittent thrombocytopenia; Reduced quantity of Von Willebrand factor; Total absence von Willebrand Factor multimers

GP1BB; GP9 Abnormal bleeding; Decreased platelet glycoprotein Ib-IX-V; Impaired ristocetin-induced platelet aggregation; Increased mean platelet volume; Thrombocytopenia

GP6 Abnormal bleeding; Decreased platelet glycoprotein VI; Impaired collagen-induced platelet aggregation; Thrombocytopenia

HOXA11 Abnormal bleeding; Clinodactyly of the 5th finger; Hip dislocation; Limited pronation/supination of forearm; Megakaryocytopenia; Radioulnar synostosis; Shallow acetabular fossae; Thrombocytopenia

HRG; PLAT Abnormal thrombosis

ITGA2B; ITGB3 Abnormal bleeding; Decreased platelet glycoprotein IIb-IIIa; Impaired platelet aggregation

KLKB1 Abnormality of the kinin-kallikrein system; Prolonged partial thromboplastin time

KNG1 Prolonged partial thromboplastin time; Reduced kininogen activity

LMAN1; MCFD2 Epistaxis; Menorrhagia; Persistent bleeding after trauma; Prolonged partial thromboplastin time; Prolonged prothrombin time; Reduced factor V activity; Reduced factor VIII activity

LYST Abnormal bleeding; Abnormal platelet dense granule secretion; Abnormality of retinal pigmentation; Areflexia; Ataxia; Cognitive impairment; Gait imbalance; Hemophagocytosis; Immunodeficiency; Impaired neutrophil bactericidal activity; Iris transillumination defect; Large clumps of pigment irregularly distributed along hair shaft; Neutropenia; Nystagmus; Parkinsonism; Partial albinism; Peripheral neuropathy; Spastic paraplegia; Tremor; Visual impairment

MPL Abnormal bleeding; Amegakaryocytic thrombocytopenia; Megakaryocytopenia; Pancytopenia

MYH9 Abnormal bleeding; Cataract; Giant platelets; Nephropathy; Neutrophil inclusion bodies; Sensorineural hearing impairment; Thrombocytopenia

NBEA Abnormal dense granules; Autism

NBEAL2 Abnormal alpha granules; Abnormal bleeding; Abnormal serum cobalamin; Increased mean platelet volume; Myelofibrosis; Splenomegaly; Thrombocytopenia

P2RY12 Impaired ADP-induced platelet aggregation; Persistent bleeding after trauma; Prolonged bleeding after surgery

PLA2G4A Duodenal ulcer; Gastrointestinal hemorrhage; Impaired platelet aggregation; Jejunoileal ulceration

PLAU Hematuria; Impaired epinephrine-induced platelet aggregation; Joint hemorrhage; Persistent bleeding after trauma; Prolonged bleeding after surgery; Thrombocytopenia

PLG Conjunctivitis; Deep venous thrombosis; Hydrocephalus; Pulmonary embolism

PROC Deep venous thrombosis; Pulmonary embolism; Reduced protein C activity; Warfarin-induced skin necrosis

PROS1 Deep venous thrombosis; Pulmonary embolism; Reduced protein S activity

RBM8A Abnormal bleeding; Abnormality of the genitourinary system; Aplasia/hypoplasia involving bones of the lower limbs; Aplasia/Hypoplasia of the radius; Arteriovenous malformation; Cleft palate; Edema of the dorsum of feet; Lactose intolerance; Malformation of the heart and great vessels; Sensorineural hearing impairment; Thrombocytopenia

RUNX1 Abnormal bleeding; Impaired ADP-induced platelet aggregation; Myelodysplasia; Myeloid leukemia; Thrombocytopenia

SERPINC1 Abnormal thrombosis; Pulmonary embolism; Reduced antithrombin III activity

SERPIND1 Arterial thrombosis; Recurrent deep vein thrombosis

SERPINE1 Menorrhagia; Persistent bleeding after trauma; Prolonged bleeding following procedure

TBXA2R Abnormal bleeding; Impaired arachidonic acid-induced platelet aggregation; Impaired thromboxane A2 agonist-induced platelet aggregation

TBXAS1 Abnormal cortical bone morphology; Abnormal form of the vertebral bodies; Abnormality of immune system physiology; Abnormality of the femur; Abnormality of the metaphyses; Abnormality of the tibia; Bone marrow hypocellularity; Bowing of the long bones; Bruising susceptibility; Craniofacial hyperostosis; Diaphyseal dysplasia; Hyperostosis cranialis interna; Increased bone mineral density; Myelofibrosis; Prolonged bleeding time; Refractory anemia; Splenomegaly; Thrombocytopenia

THBD Cerebral venous thrombosis; Reduced protein C activity

THPO Myeloproliferative disorder; Thrombocytosis

TUBB1 Giant platelets; Thrombocytopenia

VKORC1 Intracranial hemorrhage; Prolonged partial thromboplastin time; Prolonged prothrombin time; Reduced factor IX activity; Reduced factor VII activity; Reduced factor X activity; Reduced protein C activity; Reduced protein S activity; Reduced prothrombin activity

VWF Absence of intermediate von Willebrand factor multimers (sic); Absence of large von Willebrand factor multimers (sic); Enhanced ristocetin cofactor assay activity; Impaired ristocetin cofactor assay activity; Impaired ristocetin-induced platelet aggregation; Impaired von Willebrand factor collagen binding activity (sic); Joint hemorrhage; Persistent bleeding after trauma; Prolonged bleeding time; Prolonged partial thromboplastin time; Reduced factor VIII activity; Reduced quantity of Von Willebrand factor; Reduced von Willebrand factor activity; Thrombocytopenia

WAS Abnormal bleeding; Autoimmunity; Decreased mean platelet volume; Eczema; Hyperostosis; Immunodeficiency; Increased IgA level; Neoplasm; Nephropathy; Thrombocytopenia
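The comparison of a case's HPO terms with these gene-specific profiles can be illustrated with a deliberately simple sketch. It uses the Jaccard index (shared terms divided by all distinct terms) on exact term names. This is a swapped-in measure that ignores the HPO hierarchy, and it is not the similarity method used by Simeoni et al. The four profiles are copied from the list above; the example case and the names `GENE_PROFILES`, `jaccard` and `rank_genes` are hypothetical.

```python
# Hypothetical example: a subset of the gene-specific HPO profiles listed above,
# compared with a made-up case using plain Jaccard overlap (not the published method).
GENE_PROFILES: dict[str, set[str]] = {
    "ANKRD26": {"Abnormal alpha granule content",
                "Abnormality of cells of the megakaryocyte lineage",
                "Bruising susceptibility", "Epistaxis", "Petechiae",
                "Thrombocytopenia"},
    "GP6": {"Abnormal bleeding", "Decreased platelet glycoprotein VI",
            "Impaired collagen-induced platelet aggregation", "Thrombocytopenia"},
    "MYH9": {"Abnormal bleeding", "Cataract", "Giant platelets", "Nephropathy",
             "Neutrophil inclusion bodies", "Sensorineural hearing impairment",
             "Thrombocytopenia"},
    "TUBB1": {"Giant platelets", "Thrombocytopenia"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared terms divided by all distinct terms; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_genes(case_terms: set[str],
               profiles: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Return (gene, score) pairs, highest similarity first, ties broken by gene name."""
    scores = [(gene, jaccard(case_terms, terms)) for gene, terms in profiles.items()]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Made-up case, not study data.
    case = {"Thrombocytopenia", "Giant platelets", "Sensorineural hearing impairment"}
    for gene, score in rank_genes(case, GENE_PROFILES):
        print(f"{gene}\t{score:.3f}")
```

For this made-up case TUBB1 (2/3) ranks ahead of MYH9 (3/7), even though only the MYH9 profile contains all three case terms, because the Jaccard index penalises larger profiles. This is one reason to prefer similarity measures that account for profile size and for the ontology's structure when prioritising real variants.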

Permissions

| Figure | Reference | Journal | Permission |
|---|---|---|---|
| Fig 1.1 | 7 | Experimental Haematology | Granted |
| Fig 1.1 | 10 | Br J Haematology | Granted |
| Fig 1.2 | 19 | J Cell Biology | Granted |
| Fig 1.4 | 47 | Br J Haematology | Granted |
| Fig 1.5 | 48 | Phys Reviews | Granted |
| Fig 1.8 | 83 | Blood | Not required |
| Fig 1.9 | 89 | Blood | Granted |
| Fig 3.6 | 145 | Genome Medicine | Not required |
| Fig 3.7 | 145 | Genome Medicine | Not required |
| Fig 3.10 | 78 | Blood | Not required |
| Fig 7.4 | 250 | J Cell Biology | Granted |
