ELUCIDATION OF THE GENETIC ETIOLOGY OF HEMOSTASIS AND THROMBOSIS

THROUGH FAMILY-BASED STUDIES

By

JESSE DARRELL HINCKLEY

B.S., Brigham Young University, 2005

A thesis submitted to the

Faculty of the Graduate School of the

University of Colorado in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

Human Medical Genetics Program

2013 This thesis for the Doctor of Philosophy degree by

Jesse Darrell Hinckley

has been approved for the

Human Medical Genetics Program

by

Tasha Fingerlin, Chair

Jorge Di Paola, Advisor

Douglas Graham

Elaine Spector

Richard Spritz

Date __04/15/13__

ii

Hinckley, Jesse Darrell (Ph.D., Human Medical Genetics)

Elucidation of the Genetic Etiology of Hemostasis and Thrombosis through Family-

Based Studies

Thesis directed by Associate Professor Jorge A. Di Paola

ABSTRACT

Von Willebrand Factor (VWF) plays a key role in primary hemostasis by initiating platelet recruitment and adhesion and an indirect role in secondary hemostasis as the noncovalent chaperone of coagulation factor VIII (FVIII) in circulation. Larger blood components such as erythrocytes and leukocytes contribute to the etiology of hemostasis and thrombosis by displacing platelets and VWF toward the endothelium.

I first characterized an Amish pedigree with VWD type 2. Genome-wide linkage analyses identified candidate quantitative trait loci (QTL) of VWF and FVIII levels on 6p22, 9q34 (ABO), and 16q11. I also replicated previously reported associations confirming STXBP5, SCARA5, STAB2, and TC2N as candidate QTL.

Secondly, I evaluated VWF single nucleotide polymorphisms (SNPs) as sources of genetic variation of VWF levels in the Amish pedigree. I replicated association with two SNPs in three replication cohorts (n=3,316). I also found significant differences of the minor allele frequency of five SNPs between VWD type 1 index cases (n=310) and a control cohort.

Thirdly, I mapped candidate QTL of complete blood count traits by genome-wide quantitative trait linkage analysis in the Amish pedigree. I identified four candidate loci with LOD scores above 2.0: 6q25, 9q33, 10p12, and 20q13. I also demonstrated that the most common causal variant of hereditary hemochromatosis was associated with common variation of complex erythrocyte traits.

iii

Fourthly, I characterized the novel VWF c.3911T>G mutation in a family that demonstrated phenotypic variability. Results demonstrated the extent of mutant VWF peptide incorporation into multimers may explain phenotypic variability in VWD patients.

Fifthly, I studied two families with inherited thrombocytopenias. Homozygosity mapping and analysis of platelet mRNA in a large Native American kindred identified

NBEAL2 as the most likely cause of gray platelet syndrome. Whole genome sequence analysis in a family with autosomal dominant thrombocytopenia (THC2) identified

MASTL and ANKRD26 as the most likely causal .

Taken together, these findings demonstrate the importance of multiple family- based study strategies in elucidating the genetic etiology of hemostasis and thrombosis.

The genes and candidate QTL reported herein provide the foundation for molecular biology experiments to further characterize the complex pathways that mediate hemostasis and thrombosis.

The form and content of this abstract are approved. I recommend its publication.

Approved: Jorge A. Di Paola

iv

DEDICATION

This thesis is dedicated to Frank Frerman and MJ Stewart, two valuable mentors and friends whose lives and friendship motivate me to excel in my academic pursuits in order to be able make a meaningful contribution to the field of hematology and oncology.

They grounded me and gave my training purpose.

This thesis is also dedicated to my wife who followed me to Colorado and has supported me in decisions that have taken us away from family and friends on more than one occasion.

v

ACKNOWLEDGMENTS

This work would not have been possible without my thesis advisor, Jorge Di

Paola. He provided me the space and resources to grow as a scientist by allowing me to lead out on collaborative projects and to attend conferences and meetings. I greatly appreciate all that has been provided during my dissertation. This work also would not have been possible without all of the amazing families that were willing to give of their time and blood (literally). My experience was enriched by participating in patient enrollment and meeting so many selfless patients.

I would not be here without the many others that made my transfer possible, including the Human Medical Genetics and Genomics Program director Rich Spritz. Rich has also been an extremely valuable mentor and committee member. He challenged me, improving the depth of my training and the quality of my science.

During my time in the program, I had two of the best committee chairs: Frank

Frerman and Tasha Fingerlin. Frank was a close friend and mentor who treated students as equals. He was an anchor through the turbulence of academic politics and a true advocate as my committee chair. Tasha is no second to Frank as a committee chair. I thank her for agreeing to serve as my chair this past year. Her expertise and support were invaluable. She has also been an advocate and a voice of reason.

I am also grateful for the mentorship I have received from Doug Graham and

Elaine Spector as committee members. Their friendship and insight enriched my training experience. Doug also served as my first clinical mentor in the Medical Scientist Training

Program at Colorado, and I have learned much from him as a physician scientist. I also owe a debt of gratitude to Stephen Hunger, section chief for Hematology and Oncology at Children’s Hospital Colorado, for his efforts to make my transfer possible.

I am thankful to those that entered the trenches of science, helped me find my footing, and contributed their own time and efforts to the work presented herein. Kai

vi

Wang and Trudy Burns at the University of Iowa spent countless hours conducting statistical genetics analyses and editing manuscripts. Amy Shapiro and Meadow

Heimann at the Indiana Hemophilia and Thrombosis Center made it possible to enroll so many members of the Amish pedigree. Sandy Haberichter and Pam Christopherson at the Blood Center of Wisconsin were invaluable in providing expertise and resources.

Many other scientists have made their labs and resources available including Andy

Weyrich at the University of Utah; Walter Kahr at the Hospital for Sick Kids in Toronto;

Karl Desch, Jordan Shavit, and David Ginsburg at the University of Michigan; Bob

Montgomery at the Blood Center of Wisconsin; Jose Lopez at the Puget Sound Blood

Center; and Jeff Murray at the University of Iowa.

Friends and coworkers at the University of Colorado have also assisted with this research including Di Paola Lab alumni Shay Fabbro, Shawn Jobe, and Brie Nixon and our current lab manager Gary Brodsky. Randy Wong has been invaluable. I am also grateful for resources provided by Uma Pugazhenthi, Anna Peljto, and many others.

In addition to those who have assisted at the bench, I am thankful for all our amazing administrators who make science happen including Tina Green, Cammie

Kennedy, Judie Lehman, and MJ Stewart. My research was also supported by the

Hemophilia and Thrombosis Center, and I owe a debt of gratitude to Marilyn Manco-

Johnson and Judy Primeaux for their support. I also appreciate the support of Tamim

Shaikh, Arthur Gutierrez-Hartmann, Angie Ribera, and Jodi Cropper.

Most importantly, I am thankful for good friends and family that have grounded me during my research experience including fellow genetics students, Taylor Blades,

Alex Blasky, Cassidy Punt, and Erin Bashal. My wife, Latha, has tolerated me spending nights and weekends sequestered at the computer. She has been very understanding and facilitating. I am also grateful to my parents and immediate family members who have been encouraging and understood our absence during mandatory training events.

vii

CONTENTS

CHAPTER

I. HEMOSTASIS AND THROMBOSIS...... 1

Significance...... 1

Background...... 2

Thrombus formation: VWF, FVIII, and platelets...... 4

Von Willebrand factor...... 6

FVIII and the coagulation cascade...... 12

Environmental and genetic etiology of VWF and FVIII...... 16

Platelets...... 20

Objective...... 24

II. ANALYSIS IN A MULTIGENERATIONAL AMISH PEDIGREE IDENTIFIES NOVEL CANDIDATE QTL THAT AFFECT VWF AND FVIII LEVELS AND REPLICATES GWAS QTL ASSOCIATIONS...... 26

Abstract...... 26

Introduction...... 27

Methods...... 29

Participants...... 29

VWF:Ag, VWF:RCo and FVIII:C levels...... 30

Bleeding score...... 30

ABO blood group...... 31

VWF sequencing...... 31

In silico modeling...... 32

Cell culture, transfection, and multimer analysis...... 32

Genome-wide scan...... 33

SNP selection and genotyping...... 34

Heritability, linkage, and association analyses...... 34

viii

STXBP5 and STAB2 sequencing...... 35

Results...... 35

VWF and factor VIII levels are variable yet reproducible...... 35

VWF:Ag, VWF:RCo and FVIII:C are heritable traits in the pedigree ...... 37

A c.4120C>T missense mutation in the A1 domain of VWF underlies familial VWD...... 37

c.4120C>T demonstrates complete penetrance for VWF levels but incomplete penetrance for bleeding scores...... 39

The ABO blood group is a modifier of VWF levels in the Amish pedigree...... 41

Linkage analysis identifies candidate QTL for VWF:Ag and VWF:RCo...... 42

Common variants in ABO, but not ADAMTS13, contribute to the 9q34 QTL...... 44

Chromosome 6p22 and 16q11 candidate QTL fine-mapping is ongoing...... 48

STXBP5, SCARA5, STAB2, and TC2N are confirmed as candidate QTL...... 49

Two-point linkage analysis identifies regions of suggestive linkage to the bleeding score phenotype...... 50

Discussion...... 51

III. VWF SNP ANALYSIS IDENTIFIES MULTIPLE COMMON VARIANTS THAT OCCUR AT DIFFERENT FREQUENCIES IN VWD TYPE 1 PATIENTS...... 56

Abstract...... 56

Introduction...... 57

Methods...... 59

Participants...... 59

Amish pedigree...... 59

ZPMCB-VWD cohort...... 59

Genes and Blood-Clotting and Trinity Student cohorts...... 60

ix

VWF:Ag and VWF:RCo levels...... 60

SNP selection and genotyping in the Amish pedigree...... 60

SNP genotyping in the ZPMCB-VWD control and the VWD type 1 Index case cohorts...... 61

Genotype imputation in GABC and TSS cohorts...... 61

Statistical analysis...... 63

Results...... 63

Single SNP analysis in the Amish identifies 12 SNPs associated with VWF:RCo...... 65

Replication of SNP associations in the ZPMCB-VWD, GABC, and TSS cohorts...... 68

Analysis of minor allele frequency in the ZPMCB-VWD type 1 index case cohort...... 71

Discussion...... 73

IV. QUANTITATIVE TRAIT LOCUS LINKAGE ANALYSIS IN A LARGE AMISH PEDIGREE IDENTIFIES NOVEL CANDIDATE LOCI FOR ERYTHROCYTE TRAITS...... 78

Abstract...... 78

Introduction...... 78

Methods...... 81

Amish pedigree and complete blood counts...... 81

Genome-wide scan...... 81

HFE rs1800562 and ABO rs687289 genotyping...... 82

Covariate, heritability and linkage analyses...... 82

Results...... 83

Description of CBC traits and heritability estimates...... 83

Genome-wide QTL linkage analysis of CBC traits...... 85

Proportion of variation attributable to candidate QTL...... 91

HFE C282Y is associated with MCH and MCHC...... 92

x

Discussion...... 92

V. DIFFERENTIAL VWF MUTANT MONOMER INCORPORATION DRIVES THE PHENOTYPIC VARIABILITY OBSERVED IN A FAMILY WITH VWD TYPE 2...... 96

Abstract...... 96

Introduction...... 97

Methods...... 98

Participants...... 98

VWF testing...... 99

Bleeding score...... 99

Isolation of genomic DNA and VWF sequencing...... 100

In silico molecular modeling...... 100

Mammalian cell culture and transfection...... 102

Confocal immunofluorescence microscopy...... 102

Allele-specific mRNA assay...... 103

AU/VWFa-11 nanobody...... 104

Results...... 104

VWD in this family is caused by a novel c.3911T>G mutation in VWF..105

The M1304R variant results in defective VWF multimerization, trafficking, and platelet binding...... 108

The phenotypic variability observed is not due to a hypomorphic VWF allele...... 112

VWF M1304R multimerization depends on the wild-type/mutant monomer ratio...... 113

The nanobody AU/VWFa-11 efficiently estimates mutant monomer incorporation in VWF multimers...... 113

Discussion...... 116

VI. MUTATIONS IN NBEAL2 ARE THE PUTATIVE CAUSE OF GRAY PLATELET SYNDROME...... 120

xi

Abstract...... 120

Introduction...... 121

Methods...... 122

Research subjects...... 122

Transmission electron microscopy...... 122

Homozygosity mapping and linkage analysis...... 123

Candidate sequencing and mutational analysis...... 123

Next generation RNA-sequencing...... 124

Protein sequence data collection and analysis...... 124

Multiple sequence alignment...... 125

Phylogeny reconstruction...... 125

Results...... 125

GPS locus maps to chromosome 3p21...... 126

NBEAL2 identified as putative GPS gene...... 130

Homozygosity for NBEAL2 mutations alters the platelet transcriptome...... 133

Discussion...... 135

VII. AN ANKRD26 5’-UTR MUTATION IN LINKAGE DISEQUILIBRIUM WITH A MASTL MUTATION IN A FAMILY WITH AUTOSOMAL DOMINANT THROMBOCYTOPENIA...... 138

Abstract...... 138

Introduction...... 139

Methods...... 140

Subjects...... 140

Linkage analysis...... 141

Whole genome sequencing and in silico modeling...... 141

Candidate mutation and THC2 tagging SNP sequencing...... 142

xii

RNA isoform analysis...... 142

Results...... 144

Whole genome sequencing identifies an ANKRD26 and a MASTL mutation...... 146

ANKRD26 and MASTL mutations are in linkage disequilibrium...... 150

ANKRD26 c.-126T>C variant identified in the original MASTL kindred and a third unrelated kindred...... 150

ANKRD26 is a widely expressed gene...... 152

Discussion...... 154

VIII. SUMMARY AND FUTURE DIRECTIONS...... 160

Summary...... 160

Future directions...... 163

REFERENCES...... 168

APPENDIX

A. VWF and Pseudogene Alignment for Primer Design...... 198

B. Reported CBC Candidate QTL...... 220

C. Marshfield Map Screening Set 16...... 229

D. VWF c.4120C>T Expression in Cell Culture...... 246

E. Chromosome 9q34 ABO-ADAMTS13 Tagging SNPs...... 247

F. VWF Tagging SNPs...... 249

G. VWF Tagging SNP Primers...... 250

H. Screenshot of rs216308 in the USCS Genome Browser with GERP++ and ENCODE Tracks...... 252

I. CBC Genome-Wide QTL Linkage Analysis LOD Scores > 1.0...... 254

J. M1304R Molecular Dynamics Simulations Methods...... 255

K. M1304R Individual I.1 and III.1 VWF Variants...... 259

L. NBEAL2 Sequencing Primers...... 261

xiii

M. Isolating CD45 Negative selected Platelets...... 262

N. Autosomal Dominant Thrombocytopenia Candidate Mutation and THC2 Locus Sequencing Primers...... 264

O. ANKRD26 qPCR Primers and Melt Curve Gels...... 267

P. Variants Mapping to the THC2 Locus Where Autosomal Dominant Thrombocytopenia Patients Are Heterozygous...... 271

Q. Autosomal Dominant Thrombocytopenia Candidate Mutation Sequencing Results in Full Pedigree...... 276

xiv

TABLES

TABLE

1.1 Classification of VWD...... 9

1.2 VWF and FVIII candidate QTL by association...... 19

1.3 VWF candidate QTL by linkage...... 20

1.4 Candidate QTL for platelet count...... 24

2.1 Description of the 384 examined pedigree members by gender...... 36

2.2 Heritability estimates of log-transformed biochemical phenotypes...... 37

2.3 VWF:Ag, VWF:RCo, and FVIII:C and the bleeding score by age group and VWF mutation status...... 40

2.4 Least squares mean VWF:Ag, VWF:RCo, and FVIII:C by ABO blood group and VWF 4120 genotype...... 41

2.5 SNPs associated with VWF levels...... 47

2.6 CHARGE and MARTHA meta-analyses SNPs evaluated for association with VWF and FVIII levels in the Amish pedigree...... 49

3.1 VWF levels by gender in Amish subjects with the CC4120 genotype...... 63

3.2 VWF tagging SNPs that achieved study-wide significance for association with VWF:RCo...... 66

3.3 Top six VWF SNPs associated with VWF:Ag...... 67

3.4 VWF levels by population-based replication cohort...... 69

3.5 MAF by ethnic group in the ZPMCB-VWD control cohort...... 69

3.6 P-values of association tests by ethnicity and phenotype...... 69

3.7 SNP MAF and p-values of association with VWF:Ag in the GABC and TSS replication cohorts...... 71

3.8 Comparison of MAF between the ZPMCB-VWD control and VWD type 1 index case cohorts...... 72

3.9 Odds ratio of having the minor allele in VWD type 1 index cases...... 73

3.10 MAF in VWD type 1 index cases by VWF mutation status...... 73

4.1 Comparison of CBC traits by gender and VWF mutation...... 83

xv

4.2 Correlation of CBC traits...... 84

4.3 CBC traits covariate and heritability estimates analysis...... 84

4.4 Genome-wide QTL linkage analysis results LOD score > 1.5...... 88

4.5 Previously published candidate QTL concomitant with Amish pedigree...... 89

5.1 Phenotypic characterization of family members...... 106

6.1 Three kindreds with gray platelet syndrome...... 127

6.2 Platelet transcriptome top 20 fold changes...... 134

7.1 ADT candidate mutations allowing for incomplete penetrance...... 147

7.2 ANKRD26 mRNA isoforms expression by tissue type...... 154

xvi

FIGURES

FIGURE

1.1 Hemostasis and thrombosis...... 3

1.2 VWF, FVIII, and platelets in thrombogenesis...... 5

1.3 The coagulation cascade...... 13

1.4 Congenital thrombocytopenia...... 22

2.1 VWF c.4120C>T mutation...... 38

2.2 SOLAR genome-wide QTL scans for VWF:Ag, VWF:RCo, and FVIII:C...... 42

2.3 SOLAR QTL chromosome scans for VWF:Ag and VWF:RCo...... 43

2.4 Haploview LD and R2 patterns demonstrate distinct blocks of LD corresponding to ABO and ADAMTS13...... 46

2.5 P-values of association between VWF:Ag and individual SNPs on 9q34...... 47

3.1 Haploview linkage disequilibrium and R2 patterns identify four haplotype blocks in VWF...... 64

3.2 P-values of association between VWF:RCo and single SNPs in VWF...... 66

4.1 Genome-wide SOLAR QTL linkage scans...... 86

4.2 Chromosome-specific SOLAR QTL linkage scans for LOD scores > 2.0...... 87

4.3 SOLAR QTL linkage scans of chromosome 4 for RBCC and MCV...... 89

4.4 SOLAR QTL linkage scan of chromosome 20 for MCV-MCH...... 91

5.1 Pedigree and VWF multimer analysis...... 105

5.2 Mutation analysis and in silico modeling of M1304R...... 107

5.3 Flexibility of the M1304R mutant vs. WT peptide in MD simulations...... 109

5.4 Expression and intracellular localization of recombinant VWF-M1304R...... 110

5.5 Cell culture VWF multimer and platelet binding analyses...... 111

5.6 Allele-specific VWF mRNA levels...... 112

5.7 Two-color multimer analysis demonstrates M1304R mutant incorporation...... 114

xvii

5.8 AU/VWFa-11 nanobody binding predicts differential mutant monomer incorporation into VWF multimers...... 115

5.9 Schematic representation VWF multimerization incorporating WT and 1:1 WT/mutant protomers while excluding mutant promoters...... 118

6.1 GPS kindreds and NBEAL2 mutations...... 126

6.2 GPS blood smear and platelet transmission electron microscopy...... 127

6.3 Linkage established to chromosome 3p21...... 128

6.4 High-density homozygosity mapping narrows the chromosome 3p21 interval to 1.4 Mb...... 129

6.5 RNA-Seq reveals intron 9 retention in NBEAL2...... 131

6.6 RNA-Seq reveals reduced NBEAL2 mRNA levels in GPS patients...... 132

6.7 NBEAL2 mutations alter the platelet transcriptome heat map...... 133

6.8 Phylogenetic analysis of NBEAL family members...... 137

7.1 Index case blood smear and bone marrow aspirate...... 144

7.2 ADT pedigree and candidate mutations...... 145

7.3 ANKRD26 5’-UTR structure predicted by mfold...... 149

7.4 THC2 interval linkage disequilibrium and pedigree haplotype analysis...... 151

7.5 Expression of ANKRD26 mRNA by tissue type...... 153

xviii

ABBREVIATIONS

ACS: acute coronary syndrome

ADT: autosomal dominant thrombocytopenia

ANOVA: Analysis of Variance

CBC: complete blood count

CEPH: Centre d’Etude du Polymorphisme Humain

CEU: Centre d’Etude du Polymorphisme Humain from Utah

CI: confidence interval cM: centi-Morgan

CVD: cardiovascular disease

FVIII: coagulation factor VIII

FVIII:C: factor VIII coagulant, a measure of FVIII clotting time

GPIbα: platelet glycoprotein Ib-alpha

GVC: genetic variance components

GWAS: genome-wide association study

HB: hemoglobin

HCT: hematocrit

IBD: identical by descent

ICC: intraclass correlation coefficient kb: kilobase

LD: linkage disequilibrium

LOD: logarithm of odds

MAF: minor allele frequency

Mb: megabase

MCH: mean corpuscular hemoglobin

MCHC: mean corpuscular hemoglobin concentration

xix

MCV: mean corpuscular volume

MI: myocardial infarction

OR: odds ratio

PAD: peripheral artery disease

PLT: platelet count

QTL: quantitative trait locus, quantitative trait loci

RBCC: red blood cell count

RDW: red cell distribution width

SNP: single nucleotide polymorphism

SOLAR: Sequential Oligogenic Linkage Analysis Routines

STRP: short tandem repeat polymorphic

VTE: venous thromboembolism

VWD: von Willebrand disease

VWF: von Willebrand factor

VWF:Ag: VWF antigen

VWF:CB: VWF collagen binding activity

VWF:RCo: ristocetin-induced VWF binding to platelets

VWFpp: VWF propeptide

WBCC: white blood cell count

WT: wild-type

xx

CHAPTER I

HEMOSTASIS AND THROMBOSIS

Significance

Hemostasis and thrombosis are important physiologic processes that maintain the integrity of the circulatory system through multiple interacting pathways. Excessive bleeding and pathologic thromboembolism, dichotomous presentations of abnormalities in hemostasis and thrombosis, respectively, are significant medical issues worldwide, including in the United States. In addition to posing a significant financial burden on the healthcare system, both hemostasis and thrombosis may be permanently debilitating or fatal. In the United States approximately 14 million blood transfusions are administered each year, with subsequent risk of transfusion-related reactions such as anaphylaxis and sepsis. Menorrhagia (heavy menstrual bleeding), in particular, represents a significant cause of morbidity, with approximately 100,000 admissions per year to acute-care hospitals. Of those admissions, 82% result in hysterectomy, yet 50% of patients have no identified cause of bleeding1,2. Post-partum bleeding is a leading cause of maternal mortality and causes serious morbidity in developing countries3. Furthermore, mild

bleeding and bruising are common in the general population in the absence of an

identifiable bleeding disorder, and some symptoms overlap between patients with a

bleeding disorder and normal controls4.

Pathologic thrombotic events are also a common clinical presentation and a leading cause of death in the United States. Venous thromboembolism (VTE), encompassing deep-vein thrombosis (DVT) and pulmonary embolism (PE), and myocardial infarction (MI) are two of the three most common presentations of cardiovascular disease (CVD). CVD affects an estimated 82.6 million American adults, and it is the most common cited underlying cause (32.8%) and contributing to cause

1 (55.0%) of deaths reviewed in 20085. The estimated prevalence of coronary heart

disease in the United States is 7.0%, with an incidence of 610,000 new MI and of

325,000 recurrent MI annually. Approximately 15% of patients with MI will die due to the

event. Furthermore, an estimated 1 million patients present each year with VTE, and

approximately two-thirds of VTE-related deaths have an iatrogenic component, making

VTE the most common preventable cause of death during hospitalization6. The overall stroke prevalence (hemorrhagic and ischemic) is an estimated 3%, with a prevalence of silent cerebral infarction ranging from 6 to 28%; approximately 795,000 individuals present with new or recurrent strokes annually5. Therefore, due to the prevalence of hemostatic and thrombotic pathology in our population, it is vital to better characterize the genetic etiology of key components of hemostasis and thrombosis in order to improve risk assessment and provide new hypotheses related to pathogenesis of excessive bleeding and pathologic thromboembolism. Furthermore, understanding the genetic basis of defects in hemostasis and thrombosis may yield new treatment insights.

Background

The circulatory system consists of the cardiovascular and the lymphatic systems.

The cardiovascular system is a closed system with regions of specialized endothelium that create zones of exchange between the blood and interstitial fluid. Blood is composed of various circulating cells, organic molecules, , and salt suspended in aqueous plasma7. Cell types include erythrocytes which carry oxygen to and carbon

dioxide from tissues, leukocytes which participate in the immune system, and platelets

which function primarily to maintain vascular integrity. Complete blood counts (CBC) are

one of the most commonly ordered clinical tests and are used to provide a global

evaluation of the patient’s cellular blood components and to assess overall wellbeing.

2 Standard CBC assays evaluate the quantification and size of erythrocytes, leukocytes, and platelets as well as several other erythrocyte traits.

The integrity of the cardiovascular system is maintained by complex interactions of multiple biologic pathways that mediate a balance between hemostasis and thrombosis (Figure 1.1). At the site of endothelial damage, a thrombus is formed through platelet plug formation (primary hemostasis) and deposition of cross-linked fibrin fibrils as the result of the coagulation cascade (secondary hemostasis) to strengthen the platelet plug8. Von Willebrand factor (VWF) plays a key role in primary hemostasis by recruiting and activating platelets at the site of endothelial damage and by functioning as the noncovalent chaperone of coagulation factor VIII (FVIII), thus participating indirectly in secondary hemostasis. Additionally, hemostasis is tempered by multiple regulatory pathways to allow the circulatory system to be fluid yet dynamically primed to respond to hemostatic challenges. Each pathway is affected by environmental and genetic factors as well as by vascular and endothelial physiology. The convergence of these various pathways and components produces a finely balanced system that, when altered or deficient, results in excessive bleeding or pathologic thrombosis.

Blood cells Coagulation cascade Platelet function Regulatory VWF pathways

Vascular physiology Environmental Factors Subendothelium Modifier genes Endothelial function Hemostasis and Quantitative trait loci Thrombosis

Figure 1.1: Hemostasis and thrombosis Hemostasis and thrombosis are complex physiologic endpoints that result from the interaction of multiple biologic processes tempered by regulatory pathways. Each biologic process is affected by environmental and genetic factors.

3 Thrombus formation: VWF, FVIII, and platelets

The main role of VWF in thrombogenesis is thought to be initiation of platelet adhesion as the foundation of the platelet plug9. At the site of endothelial damage, increased shear stress induces a conformational change in VWF from a globular state to an extended chain conformation that exposes intracellular domains essential for platelet plug formation10 (Figure 1.2). VWF binds exposed collagen in the extracellular matrix via

the A3 domain11. Immobilized VWF then transiently interacts with platelet glycoprotein

Ibα (GPIbα) via the A1 domain to mediate platelet rolling and tethering12,13.

GPIbα is part of the GPIB-IX-V membrane complex and is extended from the

platelet surface by an O-glycosylated stalk, making it accessible for transient interactions

with VWF14,15. However, VWF-GPIbα interactions do not occur at detectable levels in static conditions, as shear stress-induced unfolding of VWF occurs only under flofw conditions16,17. Though transient, the VWF-GPIbα interaction is able to withstand the

high shear stresses of arterial flow18,19. This shear stress-induced activation of VWF localizes VWF-mediated thrombogenesis to regions of vascular injury.

The VWF-GPIBα interaction stimulates platelet activation via protein kinase C and tyrosine kinases, resulting in elevated levels of cytosolic calcium and activation of

18,20,21 phospholipase A2 . Platelet rolling increases contact time with reactive components

of the cellular matrix, leading to further platelet activation and integrin-mediated

adhesion of platelets to form the base of the platelet plug12,16. Platelet activation and

signal transduction in turn trigger activation of platelet GPIIb/IIIa. Fibrinogen, and to

some degree VWF, act as ligands to bridge platelets via GPIIb/IIIa, forming platelet

aggregates22.

Native VWF is a globular protein in circulation and functions as a noncovalent chaperone for coagulation FVIII, protecting FVIII from rapid inactivation by activated protein C (APC) and FXa10,22. While FVIII levels are influenced by VWF levels, there is

4

Figure 1.2: VWF, FVIII, and platelets in thrombogenesis VWF circulates in a globular form and chaperones coagulation FVIII. Platelets circulate in an inactive state with membrane-bound GPIbα extended to interact with VWF. VWF is also stored in platelet α-granules as well as endothelial Weibel-Palade bodies. At a site of endothelial damage, higher shear rates unfold and activate VWF, exposing domains that bind exposed collagen and platelet GPIbα. Unfolding also releases FVIII to participate in the coagulation cascade. Activated platelets and endothelial cells secrete VWF, increasing the local concentration. Adapted and reprinted from Clinics in Laboratory Medicine, Vol 29, Torres and Fedoriw, Laboratory Testing for von Willebrand Disease: Toward a Mechanism-Based Classification, Pages 193-228, Copyright (2009), with permission from Elsevier23. not a reciprocal dependency for the stability of VWF levels24. VWF may also traffic FVIII

into storage vesicles to allow for rapid local increases in concentration to participate in

hemostasis25. It is hypothesized that VWF may participate in the activation of FVIII by thrombin26. Upon VWF unfolding, released FVIII is able to participate in the coagulation cascade to organize a cross-linked fibrin clot in conjunction with the platelet plug.

Though not directly involved in hemostasis and thrombosis, the viscosity of the blood and number of erythrocytes in circulation affect the extent of platelet adhesion to the vessel wall26. Erythrocytes contribute to the central layer of circulating blood,

5 displacing platelets and the VWF-FVIII complex to the boundary layer near the vessel wall. This hemorheologic effect is more pronounced at increased erythrocyte counts and at higher shear rates in arterial flow. Blood viscosity is also inversely proportional to shear stress, an important stimulus of VWF activation18. Given these unique interactions

between VWF, FVIII, and common CBC traits, the work presented herein focuses on

elucidating the genetic etiology of variation in VWF and FVIII levels as well as CBC traits

including platelet count and erythrocyte count. Extremes in variation of each trait are

causes of or risk factors for excessive bleeding or pathologic thromboembolism. Hence,

a more complete understanding of the genetic etiology underlying trait variability will

allow improved diagnostics and risk assessment.

Von Willebrand factor

VWF is a large multimeric protein synthesized exclusively by endothelial cells and megakaryocytes27. The VWF gene (MIM: 613160) spans 178 kilobases (kb) on chromosome 12p13 and is composed of 52 exons ranging from 40 base pairs (bp; exon

50) to 1.3 kb (exon 28)28-30. Exon 1 encodes a signal peptide, exons 2-17 encode the

VWF propeptide, and exons 18-52 encode the VWF monomer (P04275). The propeptide

is composed of 741 amino acids and demonstrates internal homologies24. The VWF

monomer also consists of four types of repeating domains that make up over 80% of the

2,050 amino acid sequence: NH2-D1-D2-D’-D3-A1-A2-A3-D4-B1-B2-B3-C1-C2-CK-

COOH18,31,32. This suggests the VWF gene evolved from a series of gene segment

duplications and exon shuffling. Genetic analysis of VWF is complicated by an

unprocessed pseudogene located on chromosome 22 that duplicates the VWF gene

sequence for exons 23-34 with 97% (See Appendix A)33.

Cell-specific expression of VWF is regulated by positive and negative regulatory

elements spanning the transcription start site (TSS)34. Deletion analysis identified a 724-

6 bp fragment spanning from -487 to +247, relative to the TSS of VWF, as the cell-specific

promoter of VWF in endothelial cells. The minimal core promoter consists of bases -90

to +22 and is not cell-specific. A strong negative regulatory element common to all cell

types was mapped from bases -500 to -300. Downstream of the core promoter,

mutational analysis of the positive regulatory region identified a GATA binding element

that relieves inhibition only in endothelial cells, resulting in endothelial-specific

expression of VWF. The VWF promoter also contains a GT repeat segment 2 kb

upstream of the TSS that encodes a shear stress response element responsible for

mediating an increase in gene expression in regions of high shear stress35. Thus, endothelial expression of VWF is not uniform throughout the vasculature with highest levels in regions of disturbed blood flow or steep shear stress gradients. Regulation of

VWF expression in megakaryocytes is less understood.

In cell culture, COS-1 and CHO cells transfected with VWF expression constructs

synthesize and process VWF, including dimerization, proteolytic processing of the

propeptide, polymerization, and secretion via the constitutive pathway, suggesting VWF

synthesis occurs by common cellular mechanisms26. Cysteine-rich domains at the amino

and carboxy-terminal ends of the VWF monomer participate in formation of head-to-

head dimerization and tail-to-tail polymerization36. Within the endoplasmic reticulum, the

VWF signal peptide is cleaved and VWF undergoes initial glycosylation events and dimerization24. These early glycosylation events are required for VWF dimerization and

may protect VWF from degradation26,37. VWF is further processed in the Golgi apparatus

by multiple glycosylation events. Carbohydrate moieties consistent with H antigens

processed by the ABO blood system are also present on mature VWF37. Cleavage of the propeptide occurs in the trans-Golgi and secretory granules, though it is not required before multimerization of VWF.32 Glycosylated VWF dimers polymerize within the trans-

Golgi and secretory granules, and polymerization is arrested after secretion32.

7 VWF is synthesized via constitutive and secretory pathways, generating monomers ranging in size from 0.5 to 20 million daltons32. The largest (high molecular weight) VWF multimers are the most biologically active, suggesting multivalency of binding sites is optimal for adhesion and bridging function of VWF in homeostasis22,38.

High molecular weight VWF multimers are stored in specialized intracellular granules in endothelial cells and platelets (Weibel Palade bodies and α-granules, respectively) whereas constitutively secreted multimers have undergone less polymerization36.

However, VWF released through the constitutive pathway consists of multimers of all

molecular weights and contributes 95% of plasma VWF levels measured by laboratory

assays32,39. Storage granule release is stimulated by endothelial stress, increasing the local concentration of high molecular weight VWF multimers40-42. VWF undergoes

proteolytic cleavage in circulation under physiologic conditions, which is mediated in part

by the metalloproteinase ADAMTS1322,36. As a result of cleavage, VWF multimer

analysis by electrophoresis produces a characteristic banding pattern that distinguishes

the most active high molecular weight multimers from smaller, less- effective multimers.

VWF protein domains exhibit specific binding activity that characterize their

recognized functions: the D’ domain binds FVIII and heparin, the A1 domain binds

platelet GPIbα and mimetic molecules such as botrocetin and ristocetin, the A3 domain binds fibril forms of collagen, and the C1 domain binds platelet GPIIb/IIIa18,23. Thus, VWF bridges the pathways of hemostasis and thrombosis as a noncovalent chaperone of

FVIII and the initial binding partner of platelets to exposed subendothelial cellular matrix collagen (Figure 1.2). The D3 domain contains cysteine residues that participate in multimerization and the CK domain contains cysteine residues that participate in dimerization23. The A2 domain is the target cleavage site for ADAMTS1318.

In the clinical setting, VWF levels are assessed by an enzyme-linked

immunosorbent assay (ELISA; VWF:Ag)43. The FVIII coagulation assay (FVIII:C)

8 contributes to the understanding of VWF levels in the clinical context due to their noncovalent relationship. The interaction of VWF with platelet GPIbα can be simulated using the antibiotic ristocetin which agglutinates formalin-fixed platelets in the presence of VWF (VWF:RCo). The VWF:RCo/VWF:Ag ratio is used to evaluate qualitative vs. quantitative VWF defects. Multimer distribution is analyzed by low-resolution agarose gel electrophoresis in order to evaluate the banding pattern for presence or loss of high molecular weight VWF multimers.

Von Willebrand disease (VWD) is the most common inherited bleeding disorder with an estimated prevalence ranging from 0.8 to 1.3%44. The clinical presentation of

VWD is characterized by mucocutaneous bleeding symptoms including epistaxis,

gingival bleeding, and menorrhagia as well as easy bruising and bleeding upon

significant challenge including tooth extraction, vaginal live births, surgeries, and minor

traumas26. VWD is classified as quantitative (type 1 and 3) or qualitative (type 2) abnormalities of VWF (Table 1.1)45.

VWD type 1 (OMIM: 193400) is characterized by mild deficiencies in VWF levels that may or may not result in a clinical phenotype. More than 100 mutations have been reported throughout the VWF promoter and coding region. The International Society on

Table 1.1: Classification of VWD Type VWF: VWF: FVIII: Multimer Mechanism Ag RCo C pattern Type 1 ↓ ↓ ↓ Normal VWF transcription or transcript processing, secretion, clearance Type 2A ↓ ↓↓ ↓ Abnormal Dimerization or multimerization, secretion, increased ADAMTS13- mediated cleavage Type 2B ↓ ↓↓ ↓ Abnormal Increased affinity for platelet GPIbα Type 2M ↓ ↓↓↓ ↓ Normal Decreased affinity for platelet GPIbα Type 2N ↓ ↓ ↓↓↓ Normal Decreased affinity for FVIII Type 3 ↓↓↓ ↓↓↓ ↓↓↓ Absent Null mutations

9 Thrombosis and Hemostasis Scientific Standardization Committee maintains a comprehensive database of VWF variants (http://www.VWF.group.shef.ac.uk). Proposed disease mechanisms include disruption of gene transcription or transcript processing, inefficient VWF secretion, and accelerated VWF clearance.

VWD type 1 is a challenging diagnosis in the clinical setting because two-thirds of patients with moderately low VWF levels are asymptomatic or have minimal bleeding symptoms that are indistinguishable from the general population46. Though classically

considered “monogenic,” type 1 mutations commonly exhibit incomplete penetrance and

variable expressivity, even in families with a causal VWF mutation47. The majority of type

1 mutations are inherited in an autosomal dominant pattern and have an estimated penetrance of 60%47. Approximately 37-45% of patients with mild VWD have no putative

mutation in VWF48,49. Cases with no identifiable VWF mutation are likely due to

previously unreported mutations or modifier gene variants (quantitative trait loci, QTL)

involved in VWF biosynthesis and processing50. VWF mutations are likely to be found in

cases with VWF:Ag <30IU/mL50. Mutations in VWF are also more likely to be found in

patients with abnormal VWF multimer patterns51.

VWD type 3 (OMIM: 277480) is a more severe quantitative deficiency, in which patients have markedly decreased or immeasurable VWF levels. Whereas type 1 mutations demonstrate variable expressivity and incomplete penetrance, type 3 VWF mutations tend to generate null alleles that result in clinically significant bleeding47.

Mutations are typically nonsense or frameshift mutations or large deletions that are inherited in an autosomal recessive pattern47,52. Of note, heterozygosity for type 3 mutations is typically asymptomatic, and type 3 mutations are rarely if ever found in cases of VWD type 146,47,53. Patients with VWD type 1 typically have VWF levels less than 50 IU/dL, whereas obligate carriers of type 3 mutations often have normal VWF levels53. Because VWF functions to chaperone FVIII in circulation, VWD type 3 patients

10 also have decreased FVIII levels and present with hemophilia-like muscle and joint hemorrhages in addition to classical VWD mucocutaneous bleeding46.

Qualitative defects in VWF comprise VWD type 2 (OMIM: 613554). Mutations are

typically missense mutations inherited in an autosomal dominant pattern46. Penetrance

for the biochemical phenotype of low VWF levels is typically 100%, whereas penetrance

for the clinical phenotype of bleeding is commonly much less with variable expressivity,

even in families with a unifying causal VWF mutation. Pathomechanisms of mutations

include defective dimerization and multimerization, impaired secretion of high molecular

weight VWF multimers, increased ADAMTS13-mediated clearance, altered VWF-GPIbα

or VWF-collagen interactions, and altered VWF-FVIII interactions54,55. VWD type 2

mutations are further subdivided into types 2A, 2B, 2M, and 2N. Type 2A is

characterized by selective loss of intermediate and high molecular weight multimers

caused by defective dimerization or multimerization, impaired secretion, or increased

clearance54-56. Type 2B mutations are generally gain-of-function mutations in the A1

domain that increase the affinity of VWF for platelet GPIbα54,55,57,58. The gain-of-function

A1 mutations result in increased clearance of high molecular weight VWF multimers bound to platelets at otherwise physiologic conditions. Conversely, type 2M mutations in the A1 domain result in decreased affinity or disruption of the GPIbα binding site leading to defective recruitment of platelets to the site of endothelial damage54,55. Type 2N mutations consist of mutations in the D’ and D3 domains of VWF and disrupt the FVIII binding domain54,55,59.

Recombination between the VWF pseudogene and VWF locus has been implicated as a source of pathogenic alleles in VWD60-66. Gene conversion is a form of

homologous recombination where a donor sequence transfers genetic material to a

highly homologous acceptor sequence in a unidirectional manner67. Evidence of multiple

genetic changes within an exon consistent with the pseudogene sequence has

11 implicated gene conversion in familial cases of all VWD types, with an apparent pathogenic preference of conversion events encompassing exon 2862,65. Additionally, multiple genetic changes consistent with gene conversion have been identified in both patients and unaffected relatives, suggesting gene conversion is not always pathogenic66.

The role of VWF in thrombogenesis is well-characterized. However, the

significance of elevations in VWF levels in pathologic thromboembolic events is currently

a topic of clinical interest. Numerous studies reported high VWF levels in patients

presenting with acute MI and established associations between elevated VWF levels

and future risk of MI, as well as reinfarction and higher mortality rates68-74. At the time of

clinical presentation, VWF levels are associated with increased incidence of major

adverse cardiac events, including death75. High levels of VWF (VWF:Ag and VWF:RCo)

are associated with lacunar infarct and ischemic stroke76-78. In patients with atrial fibrillation or acute coronary syndrome, elevated VWF levels are predictive of clinical outcome. Furthermore, elevated VWF levels are a risk factor for stroke and vascular events in patients with atrial fibrillation79. In the general population associations have

been reported between VWF levels and the risk of future cardiovascular events and

stroke as well as future coronary artery disease75,79. The role of VWF in vascular disease

may be amplified by other disease, including type 2 diabetes mellitus80. VWF levels are

also associated with cardiovascular disease and are an independent risk factor of CVD

in patients with insulin resistance or type 2 diabetes mellitus80,81.

FVIII and the coagulation cascade

Coagulation factors are circulating zymogens activated by a cascade of stepwise limited proteolysis events to serine proteases that ultimately result in deposition of a cross-linked fibrin clot (Figure 1.3)82. The penultimate step activates prothrombin to

12 thrombin, and thrombin then coverts fibrinogen, a soluble protein, into fibrin monomers that polymerize to form an insoluble clot. By convention the coagulation cascade divided into the intrinsic pathway, extrinsic pathway, and common pathway. Many zymogen activations require cell membrane phospholipids as a cofactor. This brings activated coagulation factors into close proximity to form complexes such as TF-FVIIa, FXa-FVIIIa, and FXa-FVa (prothrombinase).

Tissue factor (TF) is a type I integral membrane receptor that is expressed constitutively on several cell types83. At the site of vascular injury, TF binds circulating

Figure 1.3: The coagulation cascade The coagulation cascade is a series of stepwise limited proteolysis events that convert inactive circulating zymogens to serine proteases. Inactive isoforms are traditionally denoted by Roman numerals and active isoforms are denoted by “a” following the Roman numeral. By convention, the coagulation cascade is divided into the intrinsic pathway consisting of FXII, FXI, FIX, and FVIII; the extrinsic pathway consisting of tissue factor (TF) and FVII; and the common pathway consisting of FX and FV which form the prothrombinase complex, prothrombin (FII), fibrinogen, and FXIII. Cleavage of fibrinogen by thrombin releases fibrin monomers that are covalently cross-linked by FXIIIa to form an insoluble clot.

13 FVII in a one-to-one complex and functions as a cofactor in the activation of FVII by limited proteolysis in a calcium-dependent manner82,84. The TF-FVIIa complex then activates FX, the first zymogen in the common pathway82. TF is an important cofactor for

the activation of both FVII and FX and accelerates both cascade steps85. FXa forms a one-to-one complex with FVa, constituting the prothrombinase complex with calcium

82,86 ions and phospholipids . FVa increases the Vmax of prothrombin activation by 1,000-

82,87 fold and the phospholipid component reduces the Km 1,400-fold . In addition to activating fibrinogen to free fibrin monomers, thrombin also activates FV, FVIII, and FXIII as well as regulatory protein C and protein S82. Thrombin cleaves fibrinogen to generate

fibrin monomers A and B which cross-link through side-to-side overlapping88-90. When

activated, fibrin acts as a cofactor with thrombin to activate FXIII82,91. FXIIIa then forms

covalent cross-links between overlapping fibrin monomers82.

At low concentrations TF-FVIIa complex also activates FIX, a component of the

intrinsic or contact pathway82,92,93. FIXa complexes with FVIIIa, calcium ions, and

phospholipids to activate FX82. The FVIII zymogen circulates in complex with VWF and is released upon VWF unfolding at the site of endothelial damage94. As are other coagulation zymogens, FVIII is activated by limited proteolysis, initially by FXa95-97.

Thrombin also acts to cleave FVIII, generating a positive feedback loop to accelerate FX activation and thereby prothrombin activation82. FVIIIa functions as a cofactor for FIXa in a similar manner to FVa in the prothrombinase complex and increases the Vmax of FX activation by 200,000 fold82,98.

Fibrinolysis is a complementary cascade of zymogen activations that culminates in the activation of plasminogen to the serine protease plasmin82. Thrombin stimulates endothelial cells to secrete tissue-type plasminogen activator (tPA)99. Fibrin binds tPA

and plasminogen and functions as the cofactor in the activation of plasminogen to

plasmin100. Plasmin then functions to dissolve fibrin clots. Furthermore, there are several

14 inhibitors of coagulation that localize clot formation to the site of damage: tissue factor pathway inhibitor (TFPI) inhibits the TF-FVIIa complex and FXa101,102, FXII is inhibited by

103 104 C1 inhibitor , FXI is inhibited by α1-antitrypsin , and FX is inhibited by Z protease

inhibitor105. Antithrombin III inhibits thrombin, FIXa, FXa, and FXIa by forming one-to-one

complexes of the inhibitor and serine protease82. The rate of sequestration is effectively

immediate in the presence of the drug heparin. Thrombin substrate specificity is altered

by thrombomodulin, an endothelial cell membrane protein, from procoagulant proteins

fibrinogen and FX to anticoagulant proteins C and S82. APC complexes with protein S,

calcium ions, and phospholipids and inactivates cofactors FVa and FVIIIa by proteolysis

to significantly slow the activation of FX and prothrombin97,106,107.

F8 (MIM: 300841), which encodes the FVIII zymogen, is 186 kb in length, consists of 26 exons, and maps to chromosome Xq28108. F8 mRNA has been detected

in multiple tissues, but liver transplant cures FVIII deficiency, indicating the importance of

the liver as the primary site of FVIII synthesis109. The 9,000 nucleotide transcript encodes a 2,332 amino acid peptide with a 19 amino acid leader sequence

(P00451)110,111. The resulting peptide is cleaved a second time during processing and

circulates as a two-chain molecule. FVIII has two types of internal homology in triplicated

repeats that are also homologous to FV and ceruloplasmin97,108,111,112. As with ceruloplasmin, copper is bound to processed and secreted FVIII, but the role of copper remains unclear113,114.

Hemophilia A, which accounts for 80% of hemophilia cases, is diagnosed in 1 in

5,000 live male births115. Most cases of hemophilia A are caused by single-base

mutations in F8 that result in impaired biosynthesis, secretion, or stability, resulting in

quantitative deficiencies of FVIII116. Additional mutations have been reported that result

in functional or qualitative defects of FVIII116-120. Approximately one-third of cases are

sporadic and caused by de novo mutations121,122. Less than 5% of cases are caused by

15 large indels or rearrangements123-125. However, approximately 50% of all cases of severe

hemophilia A are caused by an inversion in intron 22. Most deletions are thought to be

caused by nonhomologous recombination126. Unlike the discrepancy between VWF

levels and bleeding in VWD, bleeding severity usually correlates well with FVIII levels in

hemophilia127. In addition to hemophilia, combined FV-FVIII deficiency (OMIM: 613625)

causes low FVIII levels in an autosomal recessive pattern. The first mutations described

were identified in the gene LMAN1 (MIM: 601567), previously called endoplasmic

reticulum intermediate Golgi compartment-53 (ERGIC53)128. Additional mutations in

MCFD2 (MIM: 607788), an ERGIC component that associates with LMAN1, have also been identified as putative causes of combined FV-FVIII deficiency.

While low levels of FVIII cause excessive bleeding, elevated levels of FVIII are associated with increased risk and recurrence of pathologic thromboembolic events including VTE129,130. Elevated FVIII:C is also associated with higher incidence of arterial thrombotic events as well as with incidence of ischemic stroke77,81,131. In women with type 2 diabetes mellitus, elevated FVIII levels were associated with cardiovascular disease132. However, this association may be due to the role of FVIII as an acute phase

reactant and the significance of elevated FVIII levels as a risk factor for thrombotic

events remains unclear. In a large Austrian cohort, FVIII:C levels were associated with

all-cause mortality with a stronger association in women133.

Environmental and genetic etiology of VWF and FVIII

Among the general population VWF levels exhibit a high degree of variability, ranging from 50 to 200 IU/dL134-136. Even in families with a single VWF mutation that causes VWD, variable expressivity and incomplete penetrance are often noted, but the mechanism of variation has yet to be elucidated47. Furthermore, normal variation of

VWF:Ag overlaps with the distribution of VWF:Ag in patients with VWD, which

16 complicates diagnosis and risk stratification. Elucidation of the environmental and genetic etiology underlying VWF and FVIII variation may provide clinical insight for the diagnosis of VWD, as well as risk stratification for excessive bleeding and pathologic thrombosis47. Identification of genetic risk factors will allow classification of individuals as high-risk due to persistently elevated levels as opposed to individuals with environmentally-acquired elevations in VWF:Ag or FVIII:C137. Genetic variants may also

explain incomplete penetrance and variable expressivity of VWF mutations138-143.

VWF:Ag and FVIII:C are highly correlated traits144; thus, it is feasible that environmental

and genetic components may modify both traits through their noncovalent relationship.

Several environmental modifiers have been described that influence hemostasis

and thrombosis. Diet is an important environmental factor that affects both platelet plug

formation and the coagulation cascade145. FVIII and VWF levels also differ by ethnicity

and gender, with higher FVIII and VWF levels in African Americans than

Caucasians144,146-148. FVIII levels are higher in women than men in both ethnic groups, whereas VWF level gender differences were demonstrated only in African-American individuals144. Age and body mass index are also strong covariates of VWF levels149. It is thought that age mediates a secondary effect of FVIII:C through its interaction with

VWF150.

A significant genetic variance component has been reported for both VWF:Ag and FVIII:C. Twin studies demonstrate that an estimated 66-75% of the variability in

VWF levels is heritable, whereas pedigree analyses estimate a genetic variance component of 0.25-0.32150-153. Similarly, twin studies report a heritability estimate of 0.57-

0.61 for FVIII:C, whereas pedigree analyses report a more conservative estimate of

0.40150,151,153. Though heritability calculations are population-specific, pedigree studies

consistently generate lower heritability estimates for hemostatic proteins than twin

studies, which is more pronounced for heritability estimates of VWF antigen levels137.

17 Elucidating QTL that influence VWF and FVIII levels is important to understand the underlying physiology of phenotypic variation.

VWF is a highly polymorphic gene with over 1,100 single nucleotide polymorphisms (SNPs), of which more than 150 are exonic or in closely flanking intronic sequence154. The contribution of these SNPs in isolation or as part of a functional allelic unit remains to be elucidated. Because sequencing VWF would be cost-prohibitive, we proposed to study VWF haplotypes as a functional unit to evaluate the contribution of allelic heterogeneity to the variability of VWF levels. Recently, apparently silent polymorphisms have been demonstrated to affect VWF levels and even to cause VWD type 1155. Allelic variants in VWF which are not sufficient to cause VWD but can

modulate VWF levels and activity have also been reported156.

The ABO locus, which determines blood type, is the best-characterized QTL of

VWF levels, accounting for up to 30% of the genetic variability of VWF and, by

extension, FVIII levels37,136,150. The effect of blood type is large enough to mitigate known variants within the VWF gene previously associated with VWF:Ag levels157. Individuals

with type O blood have the lowest mean VWF:Ag level (74.8 IU/dL) and individuals with

type AB blood have the highest mean VWF:Ag level (123.3 IU/dL)136. Type O blood is

overrepresented in cohorts with VWD type 1 and other blood types are

underrepresented. ABO genotyping provides a more accurate representation of the

influence of ABO on VWF:Ag by identifying individuals heterozygous for the inactive

(type O) H transferase allele158. Multimeric analysis of VWF indicated that subjects with type O blood exhibited higher rates of proteolysis than for non-O blood types

(O≥B>A≥AB) with preferential loss of high molecular weight VWF multimers, suggesting that processed H antigens protect VWF from rapid clearance37. Other components of the

ABO system have been implicated as QTL that affect VWF and FVIII levels, including the secretor locus and the Bombay blood group159-165.

18 Agnostic genome-wide linkage and association analyses have been published reporting multiple QTL that affect VWF and FVIII levels, including the Genetic Analysis of

Idiopathic Thrombophilia (GAIT) study linkage analysis134, the Framingham Heart Study

(FHS) linkage and association studies166, the Cohorts for Heart and Aging Research in

Genome Epidemiology (CHARGE) Consortium meta-analysis167, and a meta-analysis of

3 French-Canadian cohorts (MARTHA)168. Candidate QTL identified by association and

by linkage are presented in Table 1.2 and in Table 1.3, respectively.

Table 1.2: VWF and FVIII candidate QTL by association Chr. Phenotype SNP P-value Gene Study 2p23.1 FVIII rs6708166 1.30x10-6 MARTHA 3q27.2 VWF rs4686760 1.08x10-6 VPS8 (intergenic) MARTHA 5q22.2 VWF rs13361927 4.51x10-6 EPB41L4A MARTHA 5q22.2 VWF rs379440 9.82x10-7 EPB41L4A MARTHA 6p22.1 VWF rs9295740 2.8x10-5 FHS 6p22.1 VWF rs202906 2.1x10-5 FHS 6q14.1 FVIII rs1321761 9.54x10-6 MARTHA 6q15 VWF rs6454764 5.12x10-6 ANKRD6 MARTHA 6q24 VWF rs9390459 1.2x10-22 STXBP5 CHARGE 6q24 FVIII rs9390459 6.7x10-10 STXBP5 CHARGE 8p21 VWF rs2726953 1.3x10-16 SCARA5 CHARGE 8p21 FVIII rs9644133 4.4x10-15 SCARA5 CHARGE 9q21.31 VWF rs1757948 7.37x10-6 MARTHA 9q34.2 VWF rs687621 <5.0x10-324 ABO CHARGE 9q34.2 FVIII rs687249 <5.0x10-324 ABO CHARGE 9q34.2 FVIII rs12344583 7.92x10-6 VAV2 MARTHA 12p13 VWF rs1063857 1.7x10-32 VWF CHARGE 12p13 FVIII rs1063856 3.6x20-9 VWF CHARGE 12q21.33 VWF rs1438993 6.25x10-6 MARTHA 12q21.33 VWF rs10745527 5.43x10-6 MARTHA 12q21.33 VWF rs2579103 7.61x10-6 MARTHA 12q23 VWF rs4981022 7.3x10-10 STAB2 CHARGE 12q23 FVIII rs122299292 7.2x10-9 STAB2 CHARGE 12q23.3 FVIII rs7306642 2.95x10-6 STAB2 MARTHA 12q24.3 VWF rs7978987 3.8x10-11 STX2 CHARGE 12q24.33 VWF rs708356 1.5x10-5 FHS 13q13.3 VWF rs7319671 3.6x10-5 FHS 14q32 VWF rs10133762 2.3x10-10 TC2N CHARGE 17q11.2 FVIII rs1354492 2.41x10-6 ACCN1 MARTHA 17q11.2 FVIII rs12941510 5.67x10-6 ACCN1 MARTHA 18q12.1 VWF rs1954971 1.0x10-5 FHS 18q12.1 VWF rs2298574 5.67x10-6 CDH2 MARTHA 19p13.2 VWF rs868875 1.3x10-9 CLEC4M CHARGE 19p13.3 VWF rs732505 9.38x10-6 MARTHA

19 Table 1.3: VWF candidate QTL by linkage Chromosome Phenotype LOD Gene Study 1p36.13 VWF 1.48 GAIT 3p25.3-p24.2 VWF 2.1 FHS 2q23.2 VWF 1.84 GAIT 5q31.1 VWF 1.27 GAIT 6p22.3 VWF 1.4 GAIT 9q34.2 VWF 3.46 ABO GAIT 22q11.1 VWF 1.47 VWFP1 GAIT

Platelets

First described in the late 1800s, platelets are small discoid subcellular fragments that circulate in a resting state169. At the site of vascular injury, platelets activate,

undergoing a dynamic transformation, and form the bulk of the thrombus to stop

bleeding. Platelets are derived from megakaryocytes by thrombopoiesis at a rate of

approximately 1011 platelets per day and have a circulating lifespan of about 7-10 days170-172. The small size and high concentration results in platelets being displaced in

circulatory flow (hemorheologic effect) to patrol the cardiovascular endothelium

efficiently throughout the body.

Megakaryocytes are polyploid myeloid cells that each give rise to hundreds of

platelets173-177. Thrombopoietin, a glycoprotein produced primarily in the bone marrow,

liver, and kidneys, is the most potent regulator of thrombopoiesis and promotes the

expansion of all hematopoietic stem cells and progenitor cells with megakaryocytic

potential178. Thrombopoietin-induced thrombopoiesis is autoregulated by high-affinity platelet receptors that sequester the hormone178,179. Hence, when platelet counts are

low, more thrombopoietin is available to stimulate megakaryocytes and when platelet

counts are replenished, thrombopoietin is sequestered and thrombopoiesis slows.

20 The process of platelet formation and release is complex and many steps remain unclear. Megakaryocytes extend long intermediate proplatelet formations that are continuous with the megakaryocyte cytoplasm in a microtubule-dependent manner180-182.

Mature platelets are found only at the ends of protruding proplatelet formations, and the proplatelet microtubule network functions to traffic granules and organelles to nascent platelet buds182. Repeated bending and bifurcation of the proplatelet projection generates proplatelet ends in an actin-dependent manner. Filamin, a protein that crosslinks actin filaments, also connects the actin cytoskeleton with platelet GPIbα.

Filamin-GPIbα complexes are abundant within the platelet and each actin filament is linked an average of six times to GPIbα170. Based on clinical observation in patients with

Bernard-Soulier syndrome, these filamin-GPIbα complexes appear to be necessary for

normal platelet formation170,183,184. When bound to VWF, this actin-filamin-GPIbα

interaction initiates intracellular signaling and platelet activation185,186.

Early observations by electron microscopy noted the ability of platelets to

radically change shape187,188. Platelet activation is a dynamic process that includes rapid

shape change, translocation of proteins to the cell membrane, and degranulation and

exocytosis of cellular vesicles189. Actin is an abundant platelet protein that forms a

complex network connected to the plasma membrane throughout the platelet170. In the

resting state, actin polymerization is restricted by sequestration of the assembly ends of

the filaments by capping proteins170,190. Upon activation, assembly ends are exposed

and dynamic actin nucleation sites are organized191. Platelet activation also results in exocytosis of vesicles, α-granules (procoagulant proteins including VWF, fibrinolytic regulators, and other proteins), and dense granules (cell activating agents including ADP and calcium)189,192. Activated platelets release microvesicles that contain platelet

transmembrane proteins including GPIbα and GPIIb/IIIa. Platelet degranulation releases

21 proteins and agents that amplify and regulate thrombotic response, recruit and activate cells, and initiate tissue remodeling and healing192.

Congenital platelet disorders are classified as thrombocytopenias with otherwise

normal platelet function and thrombocytopenias with thrombocytopathy193-200.

Thrombocytopenias are caused by mutations in several genes that encode components of platelet structure and function (Figure 1.4). Thrombocytopenias with normal platelet function include autosomal dominant thrombocytopenia which demonstrates linkage to (the THC2 locus; OMIM: 188000)201. Thrombocytopenias with

Figure 1.4: Congenital thrombocytopenias Congenital thrombocytopenias are caused by mutations in genes that encode components underlying platelet structure and function. α:α-granules; δ: dense granules; L: lysosomes; G: G-protein coupled receptor; TxA2: thromboxane A2; GPS: gray platelet syndrome; QPD: Quebec platelet disorder; ARC: arthrogryposis, renal dysfunction, cholestasis syndrome; SPD: storage pool deficiency; CHS: Chediak-Higashi syndrome; HPS: Hermansky-Pudlak syndrome; VWD: von Willebrand disease; WASP: Wiskott- Aldrich syndrome protein. This material is reproduced with permission of John Wiley & Sons, Inc200.

22 thrombocytopathy include platelet storage pool disorders. The classic α-granule storage pool disorder gray platelet syndrome (OMIM: 139090) demonstrates little phenotypic variation, whereas dense granule storage pool disorders are heterogeneous192. Dense storage pool disorders may also present as part of syndromes, including Chediak-

Higashi syndrome (OMIM: 214500) and Hermansky-Pudlak syndrome (OMIM:

203300)192,202,203.

Platelets play an important role in atherosclerotic disease, including acute MI, peripheral artery disease, and stroke204. Baseline platelet count at the time of diagnosis is a strong predictor of reocclusion, reinfarction, and mortality even one year post- infarct205. Elevated platelet counts are also associated with increased risk of fatal coronary heart disease206. Both platelet count and mean platelet volume are independent risk factors for ischemic stroke, and platelet count demonstrates a significant negative correlation with hemorrhagic stroke207. Both low and high platelet

counts are associated with non-cardiovascular mortality208. In certain cancers including breast and ovarian, higher platelet counts are associated with worse prognosis209.

Though platelet counts demonstrate a large range in the general population, the degree of intra-individual variation is much more modest, suggesting that platelet count is under strong genetic influence210-214. The largest study of serial platelet counts to date

reported that the repeatability of platelet counts was 0.871 in male subjects and 0.849 in

female subjects214. Twin studies estimate that the heritability of platelet count is as high

as 0.80212,215-217. However, heritability estimates are population-specific, and some studies do not establish a significant genetic component216. Multiple studies also report that platelet function is a heritable trait, as evaluated by in vitro aggregation218-220.

Several genome-wide studies have been conducted to map candidate QTL of CBC traits including platelet count. Table 1.4 presents candidate QTL reported for platelet count.

For a comprehensive review of candidate QTL of other CBC traits, see Appendix B.

23 Table 1.4: Candidate QTL for platelet count Chromosome P-value Gene Study 3q27.1 5.38x10-11 THPO-CHRD Kamatani et al. 2010221 6p21.3 3.70x10-10 BAK1 Soranzo et al. 2009222 6p21.31 6.66x10-11 BAK1 Kamatani et al. 2010221 6p21.31 6.20x10-8 BAK1 Lo et al. 2011223 6p21.31 7.60x10-73 BAK1 Lo et al. 2011223 6q23.3 2.54x10-14 HBS1L-MYB Kamatani et al. 2010221 9p24.1 2.95x10-13 RCL1 Kamatani et al. 2010221 9p24.1 –p24.3 8.50x10-17 AK3 Soranzo et al. 2009222 12q13.13 1.60x10-6 NFE2-COPZI Lo et al. 2011223 12q24 2.20x10-13 ATXN2 Soranzo et al. 2009222 12q24 7.70x10-12 PTPN11 Soranzo et al. 2009222 12q24.12 4.75x10-19 SH2B3 Kamatani et al. 2010221 17p13.2 2.13x10-12 GP1BA Kamatani et al. 2010221 19p13.12 3.00x10-7 TPM4 Lo et al. 2011223 19p13.12 3.00x10-7 TPM4 Lo et al. 2011223

Objective

The objective of this dissertation is to further elucidate the genetic etiology of

VWF, FVIII, and platelets through the study of human pedigrees. The first aim was to further characterize allelic variation of VWF. Subsequent chapters report identification and characterization of a novel VWF mutation (M1304R) and haplotype analysis of common variants in VWF. The second aim was to map candidate QTL of VWF and FVIII

levels and CBC traits including platelet count in a large Amish pedigree. One approach

to identifying QTL for complex traits like the aforementioned is to focus on a single large,

culturally-isolated pedigree, in which it is reasonable to assume that the genetic

component represents the common variation of an outbred population while being under

the control of fewer QTL224,225. Thus, it is anticipated that related individuals with similar

trait values will share a higher amount of genetic material identical by descent near the

gene or genes that contribute to the quantitative trait. Additionally, genetic isolates

demonstrate a more homogenous environmental component, whereas a collection of

ancestrally unrelated smaller pedigrees tend to have more variable lifestyles. The third

24 aim was to identify genes involved in platelet biogenesis by mapping the putative causative genes of gray platelet syndrome and autosomal dominant thrombocytopenia,

THC2.

25 CHAPTER II

ANALYSIS IN A MULTIGENERATIONAL AMISH PEDIGREE IDENTIFIES NOVEL

CANDIDATE QTL THAT AFFECT VWF AND FVIII LEVELS AND REPLICATES

GWAS QTL ASSOCIATIONS1

Abstract

Von Willebrand disease (VWD) is the most common heritable bleeding disorder and is characterized by incomplete penetrance and variable expressivity. It is highly likely that modifier genes influence phenotypic variability. We phenotypically and genetically characterized a 2,000-member Amish pedigree with VWD type 2. Sequence analysis of the VWF gene demonstrated heterozygosity for a missense mutation at position c.4120 (C>T) in 74 individuals that predicts an arginine to cysteine amino acid change at position 1374 (R1374C). The R1374C mutation disrupts the VWF A1 domain, resulting in defective binding to platelet glycoprotein 1bα. Distribution of VWF levels among pedigree members with the familial VWF mutation ranges three-fold between the maximum and minimum observation. Bleeding patterns also vary significantly. A primary genetic screen at an average interval of 10 cM was performed to search for candidate modifier loci. Genome-wide linkage analyses of VWF antigen levels

(VWF:Ag), VWF ristocetin cofactor activity (VWF:RCo), and coagulation factor VIII

(FVIII:C) levels were performed using SOLAR and MORGAN. Dichotomous multipoint linkage analysis of the bleeding score was performed using MORGAN. We

1 The Sandy Haberichter laboratory conducted all cell culture data. The in silico modeling of the R1374C mutation was completed by Daniel Ferraro. STRP genotyping and Illumina VeraCode and GoldenGate genotyping were conducted by third party facilities. Statistical genetics software packages were operated by Kai Wang and Trudy Burns. Jesse Hinckley designed and/or conducted all other aspects and experiments presented herein including subject recruitment clinics, scoring all bleeding score surveys, VWF and ABO sequencing, Illumina custom array design and quality control analysis of genotypes, and analysis of results provided by aforementioned individuals.

26 independently confirmed a previously reported candidate QTL that maps to chromosme

6q22. Additionally, we identified a novel candidate QTL on chromosome 16q11. Fine- mapping of the chromosome 9q34 QTL established significant associations between tagging SNPs in ABO and VWF levels whereas no significant associations were established with tagging SNPs in ADAMTS13, confirming ABO is the primary 9q34 QTL gene modulating VWF levels. We also independently replicated previously reported associations with VWF and FVIII levels, confirming STXBP5, SCARA5, STAB2, and

TC2N as candidate QTL.

Introduction

Von Willebrand disease (VWD) is one of the most common congenital bleeding disorders. Qualitative defects of von Willebrand factor (VWF) comprise VWD type 2, whereas mild and severe quantitative deficiencies comprise VWD types 1 and 3, respectively45. Alternatively, elevated levels of VWF have been associated with cardiovascular disease, specifically with acute coronary syndromes226.

Though considered “monogenic,” VWD, particularly type 1, is characterized by incomplete penetrance and variable expressivity, even in families with a unifying causal

VWF mutation47. Individuals with very low levels of VWF antigen (VWF:Ag; < 30 IU/dL)

or abnormal VWF multimer patterns typically have mutations in the VWF gene50,63.

However, a large number of VWD patients who have VWF:Ag levels greater than 30

IU/dL do not demonstrate linkage to VWF, suggesting the involvement of other genes48,227. Recent studies have shown that approximately 40% of patients with mild

VWD have no presently identifiable putative mutation in VWF48,49. The complexity of

VWD is underscored by the variability observed in the clinical phenotype (bleeding) as

well as the biochemical phenotype (VWF levels). Furthermore, the extent of clinical

bleeding in patients with VWD does not always correlate with VWF levels. Patients with

27 mild or moderate deficiencies (types 1 and 2) may show considerable variation in bleeding tendency even within the same family. Conversely, mild bleeding and bruising are common in the general population without an identifiable bleeding disorder and some symptoms may overlap between “bleeders” and healthy controls4.

VWF levels are also highly variable in the general population, ranging between

50 and 200 IU/dL134-136. VWF levels also increase with age. While a percentage of that

variation can be attributed to environmental factors such as stress, exercise, and

hormonal status, twin and family studies have estimated that 25 to 75% of the variation

in VWF levels may be genetically determined137,150,151,228. Family-based genetic study designs have been used to conduct linkage analysis of VWF levels in individuals without VWD, and genetic linkage of VWF levels to VWF has been difficult to demonstrate134,229.

The best-characterized quantitative trait locus (QTL) of VWF levels is that encoding the ABO blood group136. However, ABO accounts for no more than 30% of the genetic component of VWF:Ag levels150. In terms of bleeding severity, it is likely that other modifiers may explain the significant disparity in bleeding tendencies observed between patients with similar laboratory profiles and levels of VWF. Variants in the

ITGA2B, GP6 and ITGA2 genes have been identified by association studies as candidate modifiers of disease severity230,231. These genetic factors may also contribute

to the incomplete penetrance and variable expressivity observed in VWD.

One approach to identify QTL of complex traits such as VWF levels is to focus

on a single large, culturally isolated pedigree in which it is reasonable to assume that

trait variability is under the control of fewer genes and environmental influences than

would be found in a collection of ancestrally unrelated smaller pedigrees with more

variable life styles224,225. We identified, characterized, and collected blood samples from

a single Amish pedigree diagnosed with VWD that shows significant variability in VWF

28 levels and bleeding severity among both the individuals with a familial VWF mutation and unaffected relatives. In addition to identifying a novel candidate QTL on chromosome 16q11, we also replicated candidate QTL mapping to chromosomes 6p22 and 9q34. Additionally, we report replication of meta-analyses associations with VWF levels to seven SNPs in six genes: STXBP5, SCARA5, ABO, VWF, STAB2, and TC2N.

Elucidation of the genetic etiology of the variability of VWF and FVIII levels will provide further insight into the phenotypic variability of VWD and the underlying cause of cases in which no putative VWF mutation is identified. This pedigree is a unique source of genetic information.

Methods

Participants

The Amish pedigree is composed of over 2,000 individuals, of whom 788 provided blood samples. Eleven pedigree members were excluded because of their uncertain position in the pedigree. Although this is a community with limited outbreeding, no consanguineous marriages were identified. The bleeding phenotype was assessed by survey and blood was drawn, processed, and immediately frozen for DNA extraction and laboratory evaluation of VWF and FVIII levels. The number of examined pedigree members included in the genome-wide linkage analysis presented herein is 384.

Additional samples were collected after the initial mapping study, increasing the sample resource from 395 samples to 788. The number of Amish pedigree members included in fine-mapping assays is 460. Informed consent was obtained following institutional guidelines and according to the Declaration of Helsinki.

29 VWF:Ag, VWF:RCo and FVIII:C levels

Testing for VWF and FVIII:C levels was performed at the Blood Center of

Wisconsin (Milwaukee, WI). VWF:Ag and VWF:RCo levels were measured for all recruited pedigree members who provided a blood sample. VWF:Ag was measured on standard enzyme-linked immunosorbent assay (ELISA) plates coated with a combination of two monoclonal antibodies. Plasma samples were plated into duplicate wells for three different dilutions per sample. Captured VWF was detected with polyclonal rabbit antibody with an enzyme conjugate reaction. Agreement between dilutions (and observed ristocetin cofactor [RCo] level – see below) was evaluated as a measure of quality. VWF:RCo was measured by automated analysis of formalin-fixed platelet agglutination on the Dade-Behring BCS analyzer (Dade Behring Inc. Newark, DE). Data were analyzed using the manufacturer’s end-point based software package. To investigate intra-individual variability in VWF:Ag, VWF:RCo, and FVIII:C, 15 pedigree members were re-sampled at the time of the second enrollment clinic.

Bleeding score

To characterize the bleeding phenotype of pedigree members, a modified bleeding questionnaire was employed that was originally designed and validated by the

Epidemiology Branch of the Centers for Disease Control and Prevention (CDC) to screen women with VWD232. The questionnaire consists of 10 questions selected from a

more extensive set of questions on the basis of their positive predictive value. Questions

focused on nosebleeds, minor injury (cuts), bruising, gingival bleeding, bleeding after

challenges dental and surgical procedures, and transfusion requirements.

Questionnaires were graded on a Likert scale with the lowest (no bleeding phenotype)

possible score being zero and the highest (most severe) score being seven. To evaluate

sensitivity and specificity, the questionnaire was also given to a group of unrelated

30 patients with VWD recruited from the Bleeding Disorders Clinic at the University of Iowa and a group of unrelated controls (data not shown). All questionnaires were administered to pedigree members by clinical personnel to increase accuracy, and they were evaluated by one individual to standardize scoring and implementation of clinical judgment233.

ABO blood group

DNA was extracted with the Gentra Puregene Kit (Qiagen Valencia, CA) and sequencing of exons 6 and 7 of the ABO gene was performed in all collected samples.

Blood type determination was made based on haplotypes formed by sequence variants at positions 261, 802, 803 and 1,069234. Although this is not a standard technique for

ABO determination, the advantage of DNA sequencing over serum antibody detection includes the ability to distinguish individuals heterozygous for an O allele (inactive glycosyltransferase; AO and BO) from individuals with no O allele (AA, BB, and AB).

Genotyping of ABO was validated by sequencing samples from the University of Iowa

Blood Bank with known serologic ABO status (data not shown).

VWF sequencing

Low VWF:RCo and VWF:Ag levels in several pedigree members suggested the diagnosis of VWD type 2. Thus, a sequencing strategy for VWF was initiated in exon 28, which harbors the majority of type 2 mutations. Exon 28 of VWF was sequenced for all pedigree members who provided a blood sample. PCR and sequencing primers were designed for selective amplification of VWF in order to avoid amplification of the VWF pseudogene located on chromosome 2233. Targeted sequence analysis identified a C>T

mutation at position c.4120 that segregated with VWF:RCo < 25 IU/dL. To rule out

additional mutations, the entire VWF sequence, including introns, was analyzed in three

31 pedigree members: two VWD patients and one unaffected relative. Sequencing was performed at the Partners Healthcare Center for Genetics and Genomics (Harvard

Medical School, Boston, MA). A 5’ allelic discrimination high-throughput fluorescent genotyping assay (TaqMan)235 was designed to determine the population frequency of the c.4120C>T mutation in a group of individuals with European ancestry (n=358) from different populations included in the Centre d’Etude du Polymorphisme Humain (CEPH) diversity panel236.

In silico modeling

The R1374C amino acid substitution was modeled using the structure of the wild-type VWF A1 domain bound to platelet glycoprotein Ibα, 1SQ0237. The arginine

side chain of reside 1374 (residue 611 using the numbering from 1SQ0) was replaced

with a cysteine side chain using the program COOT238. The most common protomer

was selected for the position of the side chain. In this position, the alpha and beta

carbons of the arginine and cysteine residues were in the same position. There was no

steric clash with any of the cysteine atoms and any other atoms in the model. Images

were rendered using the program PyMol239.

Cell culture, transfection, and multimer analysis

The human VWF expression vector, pCIneo-VWF-WT, has been previously

described240. The c.4120C>T mutation was introduced into the full-length VWF expression vector to create pCIneo-VWF-R1374C. Human embryonic kidney cells

(HEK293T, D. Ginsburg, University of Michigan, Ann Arbor, MI) and mouse pituitary tumor cells (AtT-20/D16v-F2, CRL 1795, American Type Culture Collection) were

240 cultured at 37°C in an atmosphere of 5% CO2 as previously described . The variant pCIneo-VWF-R1374C was expressed alone or co-expressed with pCIneo-VWF-WT.

32 Conditioned media were harvested from the cells 72 hours after transfection and centrifuged to remove debris. Transfected HEK293T cells were washed twice with PBS

(phosphate buffered saline) and lysed in Reporter Lysis Buffer (Promega, Madison, WI).

The concentration of VWF:Ag in conditioned media and in cell lysate was determined by

ELISA. VWF multimer structure was analyzed by electrophoresis through a 0.8% (w/v)

HGT(P) agarose (FMC Bioproducts, Rockland, ME) stacking gel and 2% (w/v) HGT(P) agarose running gel containing 0.1% SDS for 16 hours at 40 V using the Laemmli buffer system and Western blotting as previously described240. Binding of expressed VWF to platelets in the presence or absence of either ristocetin or botrocetin was determined using a modification of the platelet-binding assay previously described. Briefly, conditioned media from transfected cells was incubated with Alexa-647 labeled rabbit- anti-human VWF (DAKO, Carpintera, CA), and then added to formalin-fixed platelets.

Ristocetin or botrocetin was added to the VWF-antibody-platelet mixture and labeled cells were analyzed using a BD LSRII flow cytometer.

Genome-wide scan

Genotyping was completed by the NHLBI Mammalian Genotyping Service at

Marshfield, WI (http://research.marshfieldclinic.org/genetics/home/index.asp), using the

Screening Set 16 with 400 short tandem repeat polymorphic (STRP) markers. For a complete list of Screening Set 16 STRP markers, refer to Appendix C. This generated a total of approximately 160,000 genotypes for the 384 samples. Data were processed and linkage analysis data files were created. Mendelian inconsistencies were evaluated using the computer program PEDCHECK and resolved241.

33 SNP selection and genotyping

For all candidate QTL intervals, tagging SNPs were selected using the Tagger algorithm, implemented in Haploview using pairwise tagging with an r2 threshold of 0.80

(http://www.broadinstitute.org/scientific-community/science/programs/medical-and- population-genetics/haploview/haploview)242,243. Centre d’Etude du Polymorphisme

Humain from Utah (CEU) data downloaded from the HapMap Genome Browser Release

#28 (http://hapmap.ncbi.nlm.nih.gov/cgi-perl/gbrowse /hapmap28_B36/) were used to

evaluate linkage disequilibrium (LD) patterns and pairwise r2 of tagging SNPs for the

candidate intervals. All coordinates were based on the NCBI36/hg18 build. In addition to

fine-mapping tagging SNPs, we also genotyped 14 SNPs identified as candidate QTL of

VWF and/or FVIII levels by previously reported meta-analyses (rs10133762, rs1063857,

rs12229292, rs12941510, rs13361927, rs1354492, rs2726953, rs379440, rs4981022,

rs687289, rs7978987, rs868875, rs9390459, and rs9644133)167,168. Chromosome 6 tagging SNPs and genome-wide association study (GWAS) replication SNPs were genotyped using an Illumina VeraCode assay by BodySync (Aurora, CO). Chromosome

9 and chromosome 16 tagging SNPs were genotyped using an Illumina GoldenGate assay by the University of Colorado Denver Genomics and Microarray Core (Aurora,

CO). All SNPs were filtered per manufacturer’s quality control recommendations using

GenomeStudio. Mendelian inconsistencies were evaluated using PEDCHECK and inconsistent genotypes were zeroed out241.

Heritability, linkage, and association analyses

VWF:Ag, VWF:RCo, and FVIII:C are quantitative traits that demonstrate high

kurtosis (data not shown). Hence, VWF:Ag, VWF:RCo and FVIII:C were transformed by

taking the natural logarithm to better meet model assumptions. Heritability estimates

were obtained using the SOLAR (Sequential Oligogenic Linkage Analysis Routines,

34 version 4.2.7) program assuming a polygenic model244. All individuals in the pedigree were included in the analysis, and STRP genotypes were provided for the original 384 samples genotyped by the NHLBI Mammalian Genotyping Service. Multipoint variance components linkage analysis of quantitative traits was also performed in SOLAR.

Because of the large size of this pedigree, the multipoint estimates of pairwise identity by descent (IBD) status that is required for SOLAR computation were obtained using the software package Loki (version 2.4.5)245. Parametric multipoint linkage analysis of dichotomized bleeding score (2,  3) and VWF:RCo (≤ 25 IU/dL) were performed using the program lm_markers in MORGAN (version 2.9, http://www.stat.washington.edu/thompson/Genepi/genepi.shtml). Association analyses were also conducted in SOLAR. Multiple testing corrections were implemented using the

Holm-Bonferroni method246. For analysis of bleeding scores with VWF levels, the PROC

LOGISTIC procedure in SAS (version 9.2) was used to fit a multivariable logistic

regression model; pedigree members were treated as independent.

STXBP5 and STAB2 sequencing

Candidate gene primers were designed covering all exons and exon/intron

boundaries of the genomic sequence of STXBP5 and STAB2 using Primer3

(http://frodo.wi.mit.edu/). PCR was performed on genomic DNA and Sanger sequencing

was performed using BigDye V3.1 on an ABI3730xl DNA analyzer247. Analysis of

sequence was conducted using Sequencher V4.9.

Results

VWF and FVIII levels are variable yet reproducible

VWF:Ag, VWF:RCo, and FVIII:C were measured and analyzed in the 384 Amish pedigree members included in the genome-wide linkage analysis. VWF:RCo levels were

35 reported as < 10 IU/dL (lower limit of detection) for 21 pedigree members. For analysis, a value of 5 IU/dL was assigned for these individuals. As the biochemical measures were not normally distributed based on the Kolmogorov-Smirnov test (p < 0.01), log- transformed values were used for subsequent analyses. The VWF c.4120C>T mutation

(detailed below) was present in 74 of the pedigree members, including all pedigree members with VWF:RCo < 10 IU/dL. Of the pedigree members analyzed, males were younger overall, and male unaffected relatives had lower levels of VWF:Ag, VWF:RCo, and FVIII:C than female unaffected relatives (Table 2.1). There was a very strong association between log-transformed VWF:Ag and FVIII:C, VWF:Ag and VWF:RCo, and

VWF:RCo and FVIII:C within unaffected relatives (Spearman rank correlation coefficients = 0.74, 0.88, and 0.69, respectively; all p<0.0001) as well as within VWD patients (Spearman rank correlation coefficients = 0.90, 0.61, and 0.60, respectively; all p<0.0001).

The intra-individual variability was minimal, with intraclass correlation coefficient

(ICC) of 0.94 for VWF:Ag; 0.98 for VWF:RCo (when individuals with VWF:RCo <10, were excluded from the analysis the ICC = 0.97), and 0.95 for FVIII:C. It could be argued that measures for the 11 pedigree members from this subset who carry the

VWF c.4120C>T mutation were less likely to vary. However, four of the 15 pedigree members were non-carriers, and their levels also showed minimal variability.

Table 2.1: Description of the 384 examined pedigree members by gender

All Examined Females Males N = 384 N = 195 N = 189 p-value† Mean Age (Range) 21.5 (0.2, 86.1) 24.1 (0.2, 86.1) 18.9 (0.2, 76.7) < 0.005 Unaffected Relatives* N = 310 N = 159 N = 151 Mean VWF:Ag IU/dL (Range) 102.7 (26, 483) 108.0 (26, 483) 97.0 (42, 239) > 0.10‡ Mean VWF:RCo IU/dL 115.9 (43, 508) 121.4 (46, 508) 110.1 (43, 263) > 0.20‡ (Range) Mean FVIII:C IU/dL (Range) 115.8 (64, 290) 119.3 (64, 290) 112.1 (65, 232) < 0.10‡ * Pedigree members without the VWF c.4120C>T mutation † Wilcoxon Rank Sum Test (two-tailed) ‡ Analysis of log-transformed traits corrected by age

36 Table 2.2: Heritability estimates of log-transformed biochemical phenotypes Trait Heritability estimate p-value VWF:Ag 0.30 p < 610-7 VWF:RCo 0.39 p < 210-6 FVIII:C 0.30 p < 210-8

VWF:Ag, VWF:RCo, and FVIII:C are heritable traits in the pedigree

To investigate the heritability of biochemical phenotypes, a polygenic model was assumed with VWF c.4120C>T mutation status included as a covariate. Gender was initially considered as a possible covariate; however, its effect was not significant, and gender was excluded from the final models. There was strong evidence that each phenotype is influenced by genetic factors (Table 2.2).

Interestingly, when age was included as a covariate, heritability for VWF:Ag significantly decreased to h2 = 0.11 (p < 0.0005) due to a reduced variance component while heritability for VWF:RCo remained stable (h2 = 0.37, p < 2  10-8),

underscoring the previously described effect of age on VWF:Ag136.

A c.4120C>T missense mutation in the A1 domain of VWF underlies familial VWD

Sequence analysis of VWF exon 28 identified a missense mutation at position

c.4120, represented by a single C>T base substitution (c.4120C>T) that segregated with

all pedigree members found to have VWF:RCo < 25 IU/dL. The c.4120C>T mutation

predicts an arginine to cysteine change at amino acid position 1374 (R1374C) in the A1

domain of the mature VWF molecule (Figure 2.1A). The A1 domain has been well

characterized and consists of six β-sheets surrounded by six -helices forming a typical

β structure248. This domain represents the main binding site for the platelet glycoprotein

1b (GPIbα), and mutations in this particular region increase or impair binding of VWF to

37

Figure 2.1: VWF c.4120C>T mutation The c.4120C>T mutation predicts an arginine to cysteine change at position 1374 in the A1 domain of the mature VWF molecule. A) Sequence chromatograms demonstrate a heterozygous C>T change in individuals with VWF:RCo < 25 IU/dL at position c.4120 compared to a homozygous CC4120 signal in WT individuals. B) VWF A1 domain (orange)-GPIbα (dark grey) complex modeled with the WT arginine and mutant cysteine amino acids. As seen in the magnified area, the R1374C mutation appears to cause a conformational change causing disruption in the A1 domain. C) Ristocetin-induced platelet binding to recombinant WT VWF (•), R1374C-VWF (■), and co-expressed alleles R1374C/WT (○). Co-expression of the WT VWF expression vector partially rescues ristocetin-induced platelet binding activity.

platelets resulting in VWD types 2B and 2M, respectively. VWD type 2M is characterized

by VWF:RCo < 25 IU/dL, consistent with reduced VWF binding to GPIbα, yet an

apparently normal VWF multimer pattern16. All 358 CEPH samples evaluated by

TaqMan assay were homozygous for the wild-type (WT) C nucleotide at position c.4120.

Full length sequence of VWF in three pedigree members, two VWD patients and one

unaffected relative, did not identify any additional mutations (data not shown).

This R1374C amino acid change was modeled using the structure of the WT

VWF A1 domain bound to GPIbα, 1SQ0237. Molecular modeling demonstrated that the

38 R1374C substitution affects a salt bridge interaction between the arginine side chain and three residues in the 1-2 loop of the A1 domain: the carbonyl oxygen of residues

K1312 and W1313 and the arginine side chain of residue R1315 (Figure 2.1B).

Disruption of the A1 domain has been shown to cause VWD type 2M54,249-251. Though

R1374 is not located in the 1-2 loop, it most likely helps stabilize the loop conformation. A loss of this salt-bridge in the R1374C mutant was predicted to result in a conformational change in the 1-2 loop, altering the functionality of the A1 domain237.

Expression of the VWF R1374C variant alone and co-expressed with WT VWF showed impaired secretion, which resolves as the WT/R1374C mutant ratio increased.

Confocal microscopic imaging of transfected HEK293 and AtT-20 cells demonstrated granular storage of VWF when R1374C-VWF was expressed alone, or co-expressed with wild-type VWF (data not shown). Multimer analysis of R1374C-VWF demonstrates a smeared pattern compared to the WT (see Appendix D). Functional analysis of ristocetin-induced platelet binding showed markedly decreased binding of R1374C-VWF that was partially corrected with the co-expression of the wild-type allele (Figure 2.1C; also see Appendix D).

c.4120C>T demonstrates complete penetrance for VWF levels but incomplete penetrance for bleeding scores

VWF:RCo levels for all of the pedigree members heterozygous for the c.4120C>T mutation were  25 U/dL, indicating 100% penetrance for the biochemical phenotype. Among these individuals, all but one exhibited VWF:Ag levels below 33

IU/dL (Table 2.3). Of the 74 members who heterozygous for the c.4120C>T mutation, 38 have bleeding scores  3, estimating penetrance for the bleeding phenotype is 51%.

Among individuals older than 12 years-old, penetrance for the bleeding phenotype was

39 Table 2.3: VWF:Ag, VWF:RCo, and FVIII:C and the bleeding score by age group and VWF mutation status Age Group < 12 years*  12 years* Mutation Status VWF C4120T VWF CC4120 VWF C4120T VWF CC4120 Gender (Males/ Females) 16 / 8 75 / 51 20 / 28 77 / 108 Mean Age† 5.1 (0.3, 11.2) 4.9 (0.2, 11.7) 35.4 (16.6, 86.1) 31.5 (12.0, 73.7) Mean VWF:Ag IU/dL† 22.2 (13, 33) 99.6 (44, 239) 27.9 (9, 76§) 104.8 (26, 483) Mean VWF:RCo IU/dL†,‡ 14.0 (10, 21) 113.5 (43, 263) 13.5 (10, 25) 117.6 (43, 508) Mean FVIII:C IU/dL† 47.9 (31, 82) 117.3 (76, 232) 53.6 (25, 108) 114.7 (64, 290) Bleeding Score N (%) 0-1 16 (66.7) 102 (81.6) 6 (12.5) 80 (43.7) 2 3 (12.5) 15 (12.0) 12 (25.0) 53 (29.0) 3 4 (16.6) 6 (4.8) 15 (31.2) 30 (16.4)  4 1 (4.2) 2 (1.6) 15 (31.2) 20 (10.9) * Bleeding score distribution < 12 years of age Fisher’s Exact Test p < 0.10;  12 years of age Pearson chi- square test p < 0.000025 † Mean (Range) ‡Lower limit of measurement: 10 IU/dL § Only one individual with the VWF c.4120C>T mutation exhibited a VWF:Ag level > 35 IU/dL. This value of 76 is likely a lab error the test could not be repeated.

62% (33 of 53). Nineteen (49%) of 39 pedigree members with the c.4120C>T mutation and a bleeding score  2 are children under the age of 12. The distribution of bleeding scores for individuals with the c.4120C>T mutation compared with unaffected relatives was significantly different among pedigree members  12 years of age (p<5.0x10-6) but

not for members younger than 12 years of age (p<0.10). This is consistent with the

progression of clinical symptoms as patients reach menarche and begin to experience

hemostatic challenges such as menstruation, tooth extraction, surgery, and parturition.

In all pedigree members ≥ 12 years of age, the odds of having a bleeding score < 3 were

1.78 (95% CI 1.19-2.65; p<0.005) times higher for each unit increase in log(VWF:Ag),

and the odds of having a bleeding score < 3 were 1.78 (95% CI 1.36-2.34; p<0.0001)

times higher for each unit increase in log(VWF:RCo). When c.4120C>T mutation status

was added to each of the models, neither VWF:Ag nor VWF:RCo were able to predict

the bleeding score with any accuracy in individuals with the c.4120C>T mutation (data

not shown).

40 The ABO blood group is a modifier of VWF levels in the Amish pedigree

ABO genotypes were available in 382 pedigree members. Approximately 60% of pedigree members were found to have type A blood: 8% A1A1, 45% A1O1, and 7% A2O1.

Type O was the second most common blood type: 24% O1O1 and 7% O1O2. Type B

blood, type AB blood, and other type A and type O blood genotypes were rare in the

Amish pedigree (0.3 to 4%). Age-gender-adjusted least squares VWF:RCo, VWF:Ag,

and FVIII:C levels are displayed in Table 2.4 by copy number of an inactive

glycosyltransferase allele (type O) and by VWF c.4120 genotype. The general linear

models analysis identified a significant age-gender-adjusted ABO type  VWF C4120T genotype interaction effect for log-transformed VWF:Ag, VWF:RCo, and FVIII:C levels.

Among unaffected relatives, those with type O blood had significantly lower mean levels of VWF:Ag, VWF:RCo, and FVIII:C than did individuals with type A blood (AA and

AO) as has previously been reported136. Interestingly, among unaffected relatives,

individuals with type A or type B blood that were heterozygous for an inactive

glycosyltransferase allele (AO, BO) had lower VWF:RCo levels than individuals with two

copies of the type A allele or with type AB blood. Among unaffected relatives, ABO

Table 2.4: Least squares mean VWF:Ag, VWF:RCo, and FVIII:C by ABO blood group and VWF 4120 genotype VWF C4120T * VWF CC4120 OO AO+BO AA+AB OO AO+BO AA+AB N = 24 N = 40 N = 8 N = 95 N = 174 N = 41 p-value VWF:Ag† 25.11 21.99 26.34 83.49 100.48 111.88 < 0.01 VWF:RCo‡ 12.03 12.50 11.64 90.29 112.38 134.56 < 0.025 FVIII:C§ 48.58 47.84 52.47 103.27 115.46 116.55 NS * No significant age-gender-adjusted ABO type pairwise differences for VWF:Ag or VWF:RCo or FVIII:C or Platelet Count within VWF C4120T † Age-gender-adjusted VWF:Ag within VWF CC4120: OO vs. AO+BO p<0.001; OO vs. AA+AB p<0.001; AO+BO vs. AA+AB NS, adjusted for multiple comparisons ‡ Age-gender-adjusted VWF:RCo within VWF CC4120: OO vs. AO+BO p<0.001; OO vs. AA+AB p<0.001; AO+BO vs. AA+AB p<0.05, adjusted for multiple comparisons § Age-gender-adjusted FVIII:C within VWF CC4120 ABO type p<0.001: OO vs. AO+BO p<0.0025; OO vs. AA+AB p<0.10; AO+BO vs. AA+AB NS, adjusted for multiple comparisons

41 genotypes explained an additional 8, 12, and 7% of the variability in log-transformed

VWF:Ag, VWF:RCo, and FVIII:C levels, respectively. There were no significant age- adjusted pairwise ABO type differences among individuals with the c.4120C>T mutation, which is consistent with the impact the c.4120C>T mutation has on VWF levels.

Linkage analysis identifies candidate QTL for VWF:Ag and VWF:RCo

Given the significant heritability of the biochemical phenotypes, a genome-wide

QTL linkage analysis was conducted in an attempt to identify new candidate modifier loci

(Figure 2.2). The highest LOD score from the QTL analysis of log- transformed VWF:Ag,

VWF:RCo, and FVIII:C levels corresponded to chromosome 12p13 where VWF has

Figure 2.2: SOLAR genome-wide QTL scans for VWF:Ag, VWF:RCo, and FVIII:C Chromosomes are represented along the x-axis with an estimated LOD score (y-axis) plotted at every 1 cM interval. Chromosome numbers correspond to the middle of the respective region. The y-axis scales with the maximum LOD score for the linkage model. A) VWF:Ag, B) VWF:RCo, C) FVIII:C.

42

Figure 2.3: SOLAR QTL chromosome scans for VWF:Ag and VWF:RCo CentiMorgan (cM) distance is mapped along the x-axis with a LOD score estimate (y- axis) plotted every 1 cM interval to generate a continuous linkage estimate. The y-axis scales with the maximum LOD score for the linkage model. Marshfield Map STRP markers are listed vertically. A) VWF:Ag chromosome 3 scan (3q28, LOD=1.93). B) VWF:Ag chromosome 9 scan (LOD=2.85). C) VWF:RCo chromosome 9 scan (LOD- 2.79). D) VWF:Ag chromosome 16 scan (LOD=1.56). E) VWF:RCo chromosome 16 scan (LOD=2.79)

43 been physically mapped. The maximum LOD score of 64.73 corresponds to linkage with

VWF:RCo levels, the phenotype most directly affected by the VWF c.4120C>T mutation.

A LOD score of 47.12 was obtained in the same region for VWF:Ag. The next highest

LOD scores were established for linkage of VWF:Ag and VWF:RCo to chromosome

9q34 (LOD=2.85 and 2.79, respectively; Figure 2.3). Chromosome 9q34 is the physical

location of the ABO gene that determines the ABO blood type as well as ADAMTS13, which encodes the metalloproteinase that cleaves VWF in circulation. We also identified novel candidate QTL of VWF:Ag and VWF:RCo that map to chromosome 16q11

(LOD=1.56 and 2.79, respectively; Figure 2.3) and to chromosome 3q28 (VWF:Ag,

LOD=1.93; Figure 2.3).

Prior to implementation of SOLAR, the pedigree size limited genome-wide linkage analyses to multipoint parametric analysis of dichotomous traits. We dichotomized VWF:RCo levels at < 25 IU/dL. Among candidate QTL identified, we established a LOD score of 2.18 to a previously reported QTL on chromosome 6p22.

Outside of chromosome 12p13 (VWF) and 9q34 (ABO), this is the first QTL reported concomitantly in multiple studies including the Genetic Analysis of Idiopathic

Thrombophilia (GAIT) Project genome-wide linkage analysis and the Framingham

Heart Study genome-wide association study (GWAS) 134,166,252. While linkage was not replicated by SOLAR, multiple independent studies validate the chromosome 6p22 candidate QTL.

Common variants in ABO, but not ADAMTS13, contribute to the chromosome

9q34 QTL

Previous reports demonstrated tagging SNPs selected from the HapMap Project are applicable to other Caucasian populations and capture a majority of the genetic variation253,254. As an isolated founder population, the Amish have a more uniform

44 genetic background with wider intervals of LD than outbred Caucasian populations255.

However, the Amish demonstrate similar mean heterozygosity of HapMap tagging SNPs

to the CEU dataset, HapMap tagging SNPs capture much of the same variation in the

Amish, and the Amish demonstrate similar LD patterns compared with the CEU

dataset256. The chromosome 9 candidate interval was defined as 135,119,000-

135,315,000 and encompasses both ABO (135,120,384-135,140,451) and ADAMTS13

(135,284,856-135,314,329). We and others have established evidence of ABO as a QTL

of VWF levels. The role of ADAMTS13, which encodes the metalloproteinase that

cleaves VWF in circulation under physiologic conditions, as a candidate QTL is largely

uninvestigated22,36. Fifty-eight tagging SNPs were required to capture both genes in a

continuous interval. Of the 58 tagging SNPs, 11 SNPs were excluded due to low signal

intensity, low cluster separation or low minor allele frequency in the Amish pedigree,

resulting in a total of 47 SNPs used in the analysis.

Analysis of the Amish pedigree revealed evidence of strong LD corresponding to

each ABO and ADAMTS13 and evidence of pockets of long-range LD (Figure 2.4).

There is no evidence of strong pairwise correlations (r2 ≥ 0.80) between tagging SNPs in

ABO and tagging SNPs in ADAMTS13. Three tagging SNPs within ABO and one

intergenic tagging SNP approximately 3.5 kb telomeric to ABO demonstrated significant

associations with VWF:Ag and VWF:RCo (Figure 2.5; Table 2.5). Additionally, we

established significant associations between VWF:Ag and three additional SNPs in ABO

and one additional intergenic SNP approximately 6.5 kb telomeric to ABO. All eight

tagging SNPs to which significant associations were established demonstrate moderate

to high pairwise correlation (r2=0.34-0.81). Intergenic SNPs rs579459 and rs558240 demonstrate high pairwise correlations with ABO SNPs (rs657152 [r2=0.73] and rs500498 [r2=0.70], respectively). Prior to multiple testing corrections, one SNP in

ADAMTS13 demonstrated association with VWF:Ag and VWF:RCo: rs1055432 (p=0.028

45

Figure 2.4: Haploview LD and r2 patterns demonstrate distinct blocks of LD corresponding to ABO and ADAMTS13 LD (A) and r2 (B) patterns were visualized using Haploview. Haplotype blocks were generated by Haploview using the confidence intervals function242. The ABO and ADAMTS13 genes are represented by blue and green boxes, respectively. Lines radiating from the boxes represent tagging SNPs within the respective gene. A) There is evidence strong local LD with pockets of long-range LD within the pedigree. B) Strong pairwise correlations are demonstrated locally.

46

Table 2.5: SNPs associated with VWF levels SNP Gene Phenotype p-value rs2073824 ABO VWF:Ag 2.7x10-4 rs8176694 ABO VWF:Ag 1.9 x10-4 rs657152 ABO VWF:Ag 1.8 x10-7 VWF:RCo 9.9x x10-7 rs474279 ABO VWF:Ag 6.7 x10-6 VWF:RCo 3.4 x10-5 rs500498 ABO VWF:Ag 5.5 x10-4 rs630014 ABO VWF:Ag 2.4 x10-7 VWF:RCo 1.8 x10-6 rs579459 Intergenic VWF:Ag 6.3 x10-6 VWF:RCo 4.3 x10-5 rs558240 Intergenic VWF:Ag 5.1 x10-4

Figure 2.5: P-values of association between VWF:Ag and individual SNPs on 9q34 Association between individual tagging SNPs mapping to chromosome 9q34 and log- transformed VWF:Ag adjusting for age was evaluated using SOLAR. The data show one cluster of SNP associations that reach study-wide significance (dashed line) corresponding to ABO (red) and nearby intergenic (blue) SNPs.

47 and 0.025, respectively; Appendix E). Within the Amish pedigree, rs1055432 is not correlated with ABO tagging SNPs (highest r2=0.12). When ABO is considered as a covariate, no significant associations were established with SNPs in ADAMTS13 (p- values: 0.17-0.96). Taken together, these data suggest common variants (MAF ≥ 0.01)

in ABO contribute to variation in VWF levels whereas common variants in ADAMTS13

were not found to be associated with VWF levels.

Chromosome 6p22 and 16q11 candidate QTL fine-mapping is ongoing

For the chromosome 6p22 candidate QTL, we narrowed the candidate interval to

27,700,000-28,200,000 based on fine-mapping inferred from the resolution of the

Framingham Heart Study GWAS. In addition to 52 tagging SNPs required to capture the

interval, we also genotyped 2 SNPs within the candidate interval (rs9295740 and

rs202906) to which associations with VWF:Ag were reported166. All tagging SNPs

passed quality control standards and no Mendelian inconsistencies were identified. The

chromosome 16q11 candidate QTL 1-LOD confidence interval is 39 Mb in size

(19,400,000-58,451,000) and contains 264 reported genes. To analyze the CEU

HapMap data in Haploview for tagging SNP selection, it was necessary to divide this

interval into four regions. A total of 6,211 tagging SNPs are required to capture this

candidate interval. However, due to GoldenGate platform limitations, we were limited to

3,009 SNPs. To determine SNP coverage by region we multiplied the percentage of total

SNPs required to completely capture the region by 3,009 and limited Tagger to select

the most informative SNPs based on that constraint. Of 3,009 tagging SNPs, 473 were

eliminated for low signal intensity or low cluster separation. An additional 9 SNPs were

eliminated for more than 3 genotype clusters or one genotype cluster for a common

variant (minor allele frequency > 0.15). In the Amish pedigree 10 SNPs were not

informative and were eliminated from further analysis. Association analyses of VWF:Ag

48 and VWF:RCo are ongoing to fine-map candidate QTL on chromosomes 6p22 and

16q11.

STXBP5, SCARA5, STAB2, and TC2N are confirmed as candidate QTL

To date, two meta-analyses of genome-wide association studies of VWF and

FVIII levels have been reported by the Cohorts for Heart and Aging in Genomic

Epidemiology (CHARGE) consortium and the Marseille Thrombosis Association

(MARTHA)167,168. We evaluated association of 14 previously reported SNPs: 10 SNPs in eight genes selected from the CHARGE consortium meta-analysis and 4 SNPs in two genes selected from the MARTHA meta-analysis, with VWF levels and FVIII levels based on previously reports (Table 2.6). For SNPs previously associated with VWF levels, we evaluated association with VWF:Ag and VWF:RCo. Of the 14 SNPs we evaluated, association with VWF or FVIII levels were replicated for seven SNPs in six genes: STXBP5, SCARA5, ABO, VWF, STAB2, and TC2N. P-values are reported for all

SNPs evaluated by VWF:Ag and FVIII:C and for SNPs. P-values are only provided for

SNPs to which significant associations were established with VWF:RCo. Additionally,

where significant associations were established, we report the percent of the trait

genetic variance component explained by the test SNP.

Significant associations were established for 3 SNPs with VWF:Ag and

VWF:RCo (rs9644133, SCARA5; rs687289, ABO; rs1063857, VWF), for 2 SNPs with

VWF:Ag (rs9390459, STXBP5; rs12229292, STAB2), and for 2 SNPs with VWF:RCo

(rs4981022, STAB2; rs10133762, TC2N). Three SNPs associated with VWF levels also demonstrated significant associations with FVIII:C: rs9644133 (SCARA5), rs687289

(ABO), and rs12229292 (STAB2). ABO SNP rs687289 explained 24.6%, 20.6%, and

20.0% of the heritability of VWF:Ag, VWF:RCo, and FVIII:C, respectively. VWF SNP

rs1063857 explained 12.5% and 22.5% of the heritability of VWF:Ag and VWF:RCo,

49 Table 2.6: CHARGE and MARTHA meta-analyses SNPs evaluated for association with VWF and FVIII levels in the Amish pedigree Gene SNP Chr. Phenotype p-value % GVC* Study EPB41L4A rs13361927 5q22.2 VWF:Ag 0.28 MARTHA rs379440 5q22.2 VWF:Ag 0.30 MARTHA STXBP5 rs9390459 6q24 VWF:Ag 0.03 3.5% CHARGE SCARA5 rs2726953 8p21 VWF:Ag 0.22 CHARGE rs9644133 8p21 FVIII:C 0.01 6.0% CHARGE rs9644133 8p21 VWF:Ag 0.02 5.3% CHARGE rs9644133 8p21 VWF:RCo 0.0047 7.8% CHARGE ABO rs687289 9q34 FVIII:C 8.26x10-6 20.0% CHARGE rs687289 9q34 VWF:Ag 1.34x10-5 24.6% CHARGE rs687289 9q34 VWF:RCo 7.07x10-7 20.6% CHARGE VWF rs1063857 12p13 VWF:Ag 0.01 12.5% CHARGE rs1063857 12p13 VWF:RCo 3.62x10-5 22.5% CHARGE STAB2 rs12229292 12q23 FVIII:C 0.021 6.0% CHARGE rs12229292 12q23 VWF:Ag 0.001 8.9% CHARGE rs4981022 12q23 VWF:Ag 0.09 CHARGE rs4981022 12q23 VWF:RCo 0.04 3.9% CHARGE STX2 rs7978987 12q24.3 VWF:Ag 0.16 CHARGE TC2N rs10133762 14q32 VWF:RCo 0.048 7.8% CHARGE ACCN1 rs1354492 17q11.2 FVIII:C 0.77 MARTHA rs12941510 17q11.2 FVIII:C 0.59 MARTHA CLEC4M rs868875 19p13.2 VWF:Ag 0.66 CHARGE * Percent of trait genetic variance component explained by the test SNP respectively. All other SNPs explained less than 10% of trait heritability each. Exonic sequence analysis including exon/intron boundaries of STXBP5 and STAB2 identified only common variants (MAF ≥ 0.05) previously reported in dbSNP (data not shown).

Two-point linkage analysis identifies regions of suggestive linkage to the bleeding score phenotype

To evaluate modifier genes of the bleeding phenotype, we dichotomized the bleeding score at a threshold of ≥ 3 and ≥ 4. Nine regions showed suggestive LOD

scores when the phenotype was defined as bleeding score ≥ 3: 7q32 (LOD = 2.52),

50 12p13.3 (LOD = 1.37), 12p12 (LOD = 1.27), 12q12 (LOD = 1.13), 16q12.2 (LOD =

1.23), 16q23 (LOD = 1.65), 19p13.2 (LOD = 1.10), 21q22 (LOD = 1.38) and 22q13

(LOD = 1.87). Eight regions with LOD scores > 1.0 were identified with the more stringent phenotype of BS ≥ 4: 6p23 (LOD = 1.09), 7q32 (LOD = 1.52), 8p23 (LOD=

1.33), 8q11 (LOD = 1.82), 10q26.3 (LOD = 1.05), 12q12 (LOD = 1.26), 16q12.2 (LOD =

1.48) and 18q12 (LOD = 1.33). Three regions suggested linkage with both bleeding score phenotypes: 7q32, 12q12, and 16q12. Perhaps even more interesting, two regions of suggestive linkage for bleeding overlapped with candidate QTL of VWF levels: 12p13.3 and 16q12.2.

Discussion

In this study we genetically and phenotypically characterized the largest single pedigree with VWD described to date and identified several candidate QTLs that affect

VWF and FVIII levels. One of the advantages of this particular pedigree is the presence of a large number of individuals that exhibit a single unifying VWF mutation in the presence of a variable phenotype characterized by VWF levels and bleeding patterns.

Over the last decade the phenotypes of prevalent human diseases such as cystic fibrosis and cleft lip and palate have been proven to be modulated at least in part by genetic modifiers139-141. Because of the complexity of the genetic and environmental

etiologies, large families or populations are needed to identify genes of even modest

impact257-259. A strategy that can circumvent the limitations of association analysis is to

take advantage of more homogeneous groups, such as the Amish pedigree presented

herein, to conduct genome-wide linkage analysis. Environmental factors such as

physical activity, medications and diet are also known to modify VWF levels and

bleeding tendency and are likely to play a role in the observed variability of VWD145,260.

The environmental homogeneity of the Amish may also provide a distinct advantage in

51 our dissection of the genetic components of this variability261. Linkage analysis results can then be fine-mapped using association-based studies and candidate gene sequencing. This two-step approach has been successfully employed in Alzheimer disease (APOE) and Crohn’s disease (NOD2)257,262.

The VWF c.4120C>T mutation, which exhibits high penetrance for VWF:RCo and VWF:Ag levels, was first described in a single individual from northern France with

VWD type 2 and has been further identified in four individuals from the European Union study48,263. Based on previous reports and our own studies, it is clear that this mutation results in a qualitative defect of VWF. As shown in the model presented in Figure 2.1B and supported by the ristocetin binding experiments, the mutation affects the A1 domain of VWF and appears to alter its binding to platelet GP1bα. When the mutated allele was co-expressed 1:1 with a WT allele in a mammalian cell line, we observed only a modest decrease in total VWF secretion. In contrast, pedigree members with the c.4120C>T mutation have substantially decreased plasma VWF:Ag levels and very low

VWF:RCo activity, suggesting a dominant negative effect.

While the c.4120C>T allele exhibits a significant effect on VWF:Ag, VWF:RCo, and FVIII:C levels, we also identified regions suggestive of linkage on several other chromosomes. The ABO gene, the only well-characterized QTL of VWF levels, maps to chromosome 9q33-34. We established evidence for ABO as a QTL of VWF levels by association and linkage. Furthermore, fine-mapping of the chromosome 9q34 QTL confirmed ABO is the primary gene contributing to variation in VWF levels whereas significant associations were not established between VWF levels and tagging SNPs in

ADAMTS13. To the knowledge of the authors, this study represents the first report to fine-map the chromosome 9q34 QTL distinguishing ABO from ADAMTS13.

Identification of ABO by linkage and association allows us to postulate that other signals identified in this study may have biological and clinical relevance. Interestingly,

52 the chromosome 6p22 candidate QTL is concomitant with the GAIT Project and

Framingham Heart Study and also displays orthology with murine Vwf modifiers264. The

highest LOD score in the Amish pedigree outside of VWF and ABO maps to a new

candidate QTL on chromosome 16q11. Fine-mapping of the chromosome 6p22 and

16q11 candidate QTL is ongoing.

In addition to mapping QTL by genome-wide linkage analysis, we independently

replicated previously reported associations with VWF and FVIII levels, confirming

STXBP5, SCARA5, STAB2, and TC2N as candidate QTL. Significant associations of

tagging SNPs in STXBP5, STAB2, and SCARA5 with VWF and FVIII levels were also

reported by Antoni et al. within the MARTHA cohorts168. The role of these genes in VWF

biosynthesis, processing, secretion, or clearance is unknown.

Associations were reported between STXBP5 variants and VWF:Ag and VWF

collagen binding activity (VWF:CB) in a case-control cohort with cardiovascular disease,

though no significant association was established between STXBP5 and the prevalence

of cardiovascular disease265. The association with VWF:Ag and with VWF:CB was more

significant in the case cohort, indicating STXBP5 may modulate regulatory pathway

VWF secretion. The majority of VWF detected by laboratory testing is released via the

constitutive pathway. Thus, differences in VWF levels between cases and controls are

likely mediated by the regulatory pathway. Furthermore, STXBP5 encodes tomosyn, a

protein implicated in exocytosis266.

In a cohort of VWD type 1 patients, genetic variation in STXBP5 was associated with the bleeding score among females267. Furthermore, significant associations between STXBP5 variants and the incidence of venous thromboembolism (VTE) have also been reported268. The CHARGE consortium study reported the TC2N SNP rs10133762 was associated with increased VWF levels. In the MARTHA and FARIVE cohort studies, the minor allele of rs1884841 (TC2N), which was highly correlated with

53 the rs10133762 (r2=0.961), was more common in cases with VTE than controls

(combined OR = 1.28 [95% CI = 1.11-1.48; p=9.9x104)269.

Interestingly, there are few individuals in the pedigree who exhibit bleeding scores greater than 3 and do not have the VWF c.4120C>T mutation. Amish

communities have been reported to have hemophilia B and plasminogen activator (PAI-

1) deficiency270,271. However, initial screening laboratory analyses among unaffected relatives with bleeding scores greater than 3 have been negative for these deficiencies.

In terms of bleeding, the analysis was complicated by several factors. Linkage analysis was limited by the subjective nature of the phenotype. Because the bleeding score is a Likert scale, we employed two-point linkage analyses and identified several regions suggestive of linkage. These regions are now being pursued for identification of modifiers of bleeding. One could argue that for the bleeding phenotype, the most informative analysis will compare those who bleed upon challenge with those who do not among individuals with the c.4120C>T mutation. In the era of next-generation sequencing, we have seen that rare variants, while at low minor allele frequencies

(<0.01), are collectively common in the general population272. Linkage analyses will

likely prove valuable in sorting through the massive amounts of data to identify causal

variants with a large effect size. Thus, as sequencing costs continue to decrease,

further analysis of the entire exome of pedigree members with VWD will likely yield rare

variants that may affect their bleeding tendency and could potentially be tested in

unrelated populations.

In summary, we characterized a large Amish pedigree with VWD and identified

several chromosomal regions of linkage to VWF levels and suggestive linkage for

disease severity. These regions are actively being investigated at higher resolution.

While genome-wide association studies are characterized by the identification of

common variants of relatively modest effect, linkage studies such as the one we

54 present here may have the ability to identify rare variants as well as allelic heterogeneity, which may explain a large portion of the observed variability. These newly discovered regions that are likely to harbor modifier genes will provide new insights into the genetics of VWF and FVIII levels and could have important implications for understanding the genetic basis of bleeding and thrombosis. Fine-mapping of the chromosome 9q34 QTL confirms that common variants in ABO primarily modulate VWF levels as no significant SNP associations were established within ADAMTS13.

Sequencing of ADAMTS13 is needed to evaluate the possibility of rare or private variants in ADAMTS13 that may contribute to linkage to the 9q34 QTL for VWF levels.

We also replicated GWAS associations confirming STXBP5, STAB2, SCARA5, and

TC2N as candidate QTL of VWF and FVIII levels. Consistent with previous genome- wide QTL analyses, all candidate FVIII:C QTL loci identified by linkage analysis or replicated by association overlap with candidate VWF:Ag loci. Thus, all QTL effects on

FVIII are likely mediated indirectly through its noncovalent relationship with VWF.

55 CHAPTER III

VWF SNP ANALYSIS IDENTIFIES MULTIPLE COMMON VARIANTS THAT OCCUR

AT DIFFERENT FREQUENCIES IN VWD TYPE 1 PATIENTS

Abstract

Von Willebrand disease (VWD) type 1 is a complex diagnosis in which nearly half of patients do not have an identifiable mutation in the VWF gene. To date, the role of common variants that affect VWF levels is not understood in cases of VWD type 1. We genotyped 94 tagging SNPs through VWF in an Amish pedigree (n=445). Linkage

disequilibrium (LD) patterns and pairwise correlations predict four haplotype blocks with

evidence of long-range LD. Single SNP association tests established significant

associations between VWF:RCo and twelve tagging SNPs: rs2362479 (telomeric to

VWF), rs10849363 and rs7964554 (intron 4); rs216308 (intron 24); rs216310 and

rs216311 (exon 28); rs11612370 and rs11612384 (intron 33); rs216330, rs11064003,

and rs2239162 (intron 37); and rs216293 (intron 38). rs216308 through rs216293 map

to haplotype block 3, which encompasses exon 28 and the functional domains of the

VWF polypeptide. To next evaluate the relevance of these SNPs in a broader Caucasian

population, we conducted association analyses in three population-based cohorts. In

total, rs11064003 and rs2239162 VWF associations were replicated in the ZPMCB-VWD

control (n=150), GABC (n=934), and TSS (n=2,145) cohorts. An additional four SNP

associations were replicated in the ZPMCB-VWD control and TSS cohorts. Three SNP

associations were replicated in ZPMCB-VWD control cohort. These findings confirm

VWF SNPs identified in the Amish pedigree are relevant among Caucasians of Northern

European ancestry. Multivariable logistic regression models predict different VWF SNPs

are most important in explaining the variation of VWF:Ag, VWF:RCo, and the

VWF:RCo/VWF:Ag ratio. We then compared the minor allele frequency (MAF) of the

56 tagging SNPs between the ZPMCB-VWD control cohort and the ZPMCB-VWD type 1 index case cohort (n=310) and established that the minor allele of rs7964554 (lower

VWF:Ag) was more common whereas the minor alleles of rs11612370, rs11612384, rs11064003, and rs2239162 (all higher VWF:Ag) were less common in VWD type 1 index cases.

Introduction

Von Willebrand disease (VWD) is the most common inherited bleeding disorder and presents clinically with a pattern of mucocutaneous bleeding44. VWD is classified as

quantitative (type 1 or 3) or qualitative (type 2) abnormalities of the glycoprotein von

Willebrand factor (VWF)45. Conversely, elevated VWF levels are associated with acute myocardial infarction (MI) as well as reinfarction and mortality rates68-74. Elevated levels

of VWF are also associated with lacunar infarct and ischemic stroke76-78.

VWD type 1 encompasses mild quantitative deficiencies in VWF. Kindreds with familial VWD type 1 are characterized by incomplete penetrance, estimated at 60%, and variable expressivity. Despite being considered a “monogenic” disease, approximately

37-45% of patients with VWD type 1 have no putative VWF mutation. Mutations in VWF are more likely to be identified in cases with VWF antigen (VWF:Ag) levels less than 30

IU/dL48-50. Over 100 VWD type 1 mutations have been reported throughout VWF, the majority of which are inherited in an autosomal dominant pattern47.

Von Willebrand Factor (VWF) is a large multimeric protein synthesized exclusively by endothelial cells and megakaryocytes that plays a prominent role in hemostasis and thrombosis27. The VWF gene spans 178 kb on chromosome 12p13 and is composed of 52 exons, where exon 1 encodes the 5’-untranslated region (UTR) and a signal peptide, exons 2-17 encode the VWF propeptide, and exons 18-52 encode the

VWF peptide monomer and 3’-UTR28-30. VWF is synthesized via constitutive and

57 secretory pathways, generating multimers ranging in size from 0.5 to 20 million daltons32. Within the endoplasmic reticulum, the signal peptide is cleaved and the VWF

pre-propeptide undergoes initial glycosylation events and C-terminal disulfide-linked

dimerization24. VWF is further processed in the Golgi apparatus, including glycosylation

with H antigen carbohydrate moieties processed by the ABO blood system37.

Glycosylated VWF dimers polymerize via N-terminal disulfide bonds within the trans-

Golgi and secretory granules to form multimers of variable size32. High molecular weight

multimers, the most biologically active multimers, are stored in endothelial Weibel

Palade bodies and platelet α-granules36. VWF is also released through the constitutive pathway, which contributes 95% of plasma VWF levels measured by laboratory assays32,39. Endothelial stress induces storage granule release, increasing the local concentration of high molecular weight VWF multimers40-42.

Among the general population, VWF levels exhibit a high degree of inter-person variability134-136. Significant determinants of VWF levels include ethnicity, age, gender,

and body mass index144,146-149. Pedigree and twin studies estimate a genetic component of 0.25-0.75 with pedigree analyses providing consistently more conservative estimates150-153. The genetic variance component consists of allelic variation at VWF as

well as other QTLs. VWF is a highly polymorphic gene with over 1,100 reported

mutations and polymorphisms154. The impact of these changes remains unknown. To

date, ABO is the best-characterized QTL that affects VWF levels, accounting for up to

30% of the genetic variability of VWF levels37,136,150.

In this study we genotyped 94 tagging SNPs through VWF in a

multigenerational Amish pedigree to evaluate the contribution of VWF variants to protein

levels and activity. Replication in three independent population cohorts identified multiple

VWF variants associated with VWF:Ag, VWF:RCo, and the VWF:RCo/VWF:Ag ratio.

Evaluation of the minor allele frequency (MAF) identified significant differences in the

58 occurrence of five variants in VWD type 1 index cases compared to controls. Elucidation of the environmental and genetic etiology underlying VWF and FVIII variation may provide insight to improve the diagnosis of VWD as well as more accurate risk stratification for excessive bleeding.

Methods

Participants

Amish pedigree

As described in Chapter II, the Amish pedigree studied is composed of over

2,000 individuals, of whom 788 provided blood samples. Informed consent was obtained following institutional guidelines and according to the Declaration of Helsinki. Of the pedigree member samples available, 445 were included in this analysis.

ZPMCB-VWD cohort

The ZPMCB-VWD cohort is a multicenter collection of smaller cohorts enrolled at eight primary clinical centers in Atlanta, Detroit, Houston, Indianapolis, Iowa City,

Milwaukee, New Orleans, and Pittsburgh146. In this study, we utilized the control cohort and the VWD type 1 index case cohort. The control cohort consists of 231 unrelated individuals with no known family history of VWD. Of these individuals, 150 are

Caucasian and 66 are African American. The remaining 10 individuals are of other ethnicities and were not included in this study. The VWD type 1 index case cohort consists of 361 patients, of whom 310 are Caucasian, 16 are African American, and 14 are multiethnic. Ethnicity is not available on the remaining 20 individuals. Of the VWD type 1 index case cohort, we focused on Caucasian samples due to insufficient sample size for the other ethnicities. Informed consent was obtained following institutional guidelines at each primary site and according to the Declaration of Helsinki.

59 Genes and Blood-Clotting and Trinity Student cohorts

As previously described, the Genes and Blood-Clotting (GABC) study cohort consists of 1,152 healthy siblings between the ages of 14 and 35 years-old that were recruited from the University of Michigan (Ann Arbor, MI) between June 2006 and

January 2009273. The Trinity Student Study (TSS) cohort consists of 2,524 healthy ethnically Irish Caucasian students attending Trinity College Dublin (Dublin, Ireland) during the academic year of 2003-2004273,274. Informed consent was obtained following

institutional guidelines at each primary site and according to the Declaration of Helsinki.

VWF:Ag and VWF:RCo levels

As described in Chapter II, testing for VWF levels in the Amish pedigree was

performed at the Blood Center of Wisconsin (Milwaukee, WI). VWF:Ag was measured by

enzyme-linked immunosorbent assay (ELISA) and VWF:RCo was measured by

automated analysis of formalin-fixed platelet agglutination. VWF:Ag levels in the GABC

and TSS cohort were measured by ELISA as previously described 273. The

VWF:RCo/VWF:Ag ratio is used to evaluate the potential of a qualitative deficiency in the

VWF peptide. A ratio near 1 suggests a quantitative deficiency whereas a ratio ≤ 0.7 suggests a qualitative deficiency. VWD type 1 is defined as a mild deficiency in VWF levels characterized by moderately reduced VWF:Ag and VWF:RCo, a

VWF:RCo/VWF:Ag ratio near 1, and with all species of multimers usually present by electrophoretic analysis36.

SNP selection and genotyping in the Amish pedigree

VWF tagging SNPs were selected using the Tagger algorithm as implemented in

Haploview using pairwise tagging with an r2 threshold of 0.80242,243. As described in

Chapter II, Centre d’Etude du Polymorphisme Humain from Utah (CEU) data

60 downloaded from the HapMap Genome Browser Release #28 was used to evaluate linkage disequilibrium and pairwise r2 of tagging SNPs. A total of 94 SNPs were required to tag an interval of 5,888,000-6,144,000 (NCBI36/hg18 coordinates) on chromosome

12, capturing VWF plus 40 kb upstream and downstream of the gene. For a complete list of tagging SNPs genotyped, see Appendix F. Of the 94 SNPs, 14 map within exons 23-

34, a region with 97% homology to the VWF pseudogene on chromosome 2233. Primers

were designed with at least 2 differences between VWF and the corresponding

pseudogene sequence (see Appendix G for primer sequences), and the 14

corresponding SNPs were genotyped by Sanger Sequencing247. The remaining 80 SNPs

were genotyped using an Illumina VeraCode assay by BodySync (Aurora, CO). All SNPs

were filtered per manufacturer’s quality control recommendations using GenomeStudio.

Mendelian inconsistencies were evaluated using PEDCHECK and inconsistent

genotypes were resolved241.

SNP genotyping in the ZPMCB-VWD control and VWD type 1 index case cohorts

SNPs rs216308, rs216310, rs216311, rs11612370, and rs11612384, which map

between exons 23 and 34 (pseudogene region) were genotyped by Sanger sequencing

as previously described in Chapter II. Primers were designed using Primer3

(http://frodo.wi.mit.edu/) to amplify rs10849363, rs7964554, rs216330, rs11064003,

rs2239162, and rs216293 (see Appendix G).

Genotype imputation in the GABC and TSS cohorts

The GABC and TSS cohorts were genotyped using the Illumina Human Omni 1M

Quad v1_B SNP array. For the purposes of imputation in the GABC cohort, study SNPs

were selected using a composite quality filter combined with a minor allele frequency

(MAF) threshold of 0.01, with all founder samples included in the imputation. The

61 GENEVA (http://www.genevastudy.org) Coordinating Center (CC) performed genotype imputation of the GABC project using BEAGLE software 3 (http:// http://faculty.washington.edu/ browning/beagle/beagle.html). Imputation was performed for all autosomes as well as the X chromosome, with a reference panel selected from the 1000 Genomes Project (http://www.1000genomes.org). While the majority of GABC participants self-identified as Caucasian, a range of other racial groups were also represented. In order to allow imputation of all GABC participants simultaneously, the

CC created a “cosmopolitan” 1000 Genomes reference panel comprising the unique intersection of BEAGLE-formatted haplotypes from the AFR, EUR, and ASN panels.

Relatedness in GABC subjects was ignored here and all high quality samples were imputed together in one set. For a detailed description of genotype quality control analysis (QC) for the GABC project, please see the dbGAP report

(http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000304.v1.p1).

Within the TSS cohort, 757,533 autosomal SNPs were retained following standard QC for all 2,145 individuals to impute 2,512,420 SNPs based on HapMap II

(release #22, n = 90) and 3,908,761 SNPs based on HapMap III (release #28, n = 90) in

CEU phased haplotypes using MACH1.0. We used 200 randomly selected individuals from all 2,145 subjects for estimating model parameters with 100 iterations of rounds using the Monte Carlo procedure and all available haplotypes. We then imputed all

SNPs in the HapMap II and HapMap III reference panels to the 2,145 individuals’ genotypes in our dataset. We carried out single marker analysis using simple linear regression of the dosage of each imputed marker (scaled from 0 to 2) of high quality based on the squared correlations between imputed and actual genotypes.

62 Statistical analysis

For all association analyses of VWF tagging SNPs with VWF:Ag or VWF:RCo,

SNPs were modeled including age, gender, and rs687289 (ABO surrogate marker) as covariates. As described in Chapter II, association analyses within the Amish pedigree were also conducted in SOLAR; gender was not found to be a significant covariate and was removed from the final analysis model. Multiple testing corrections were implemented using the Holm-Bonferroni method246. In the ZPMBC-VWD control and

VWD type 1 index case cohorts, SAS (9.2) PROC FREQ was used to evaluate the genotype and allele frequency distributions. Tests for associations between SNPs and

VWF:Ag or VWF:RCo were conducted assuming an additive genetic model. The PROC

LOGISTIC procedure in SAS was then used to fit a multivariable logistic regression model. In the GABC cohort, sibling relationships were adjusted for using EMMAX. For both the GABC and TSS cohorts, association analyses were conducted using PLINK.

Results

VWF:Ag and VWF:RCo were measured in a multigenerational Amish pedigree.

Of the 445 subjects included in this study, 83 were heterozygous for a familial c.4120C>T mutation in VWF that causes VWD type 2 that is inherited in an autosomal dominant pattern (see Chapter II). We analyzed VWF:Ag and VWF:RCo with respect to gender in individuals homozygous for the wild-type (WT) allele at position 4120 (Table

3.1).

Table 3.1: VWF levels by gender in Amish subjects with the CC4120 genotype All Examined Females Males p-value* Number of subjects 362† 184 176 Mean VWF:Ag IU/dL (Range) 107.0 (26, 483) 110.9 (26, 483) 102.9 (42, 262) 0.10 Mean VWF:RCo IU/dL 114.6 (43, 508) 118.2 (44, 508) 110.8 (43, 263 0.15 (Range) * t Test, GraphPad Prism † Gender information was not available for two subjects

63

Figure 3.1: Haploview linkage disequilibrium and r2 patterns identify four haplotype blocks in VWF Linkage disequilibrium (LD; A) and r2 (B) patterns were visualized using Haploview. There is evidence of long-range LD within the pedigree. Based on LD and r2 patterns, we defined four haplotype blocks. Strong pairwise correlations are demonstrated within defined haplotype blocks.

64 A total of 94 tagging SNPs were genotyped encompassing the VWF gene and 40 kb upstream and downstream of the gene. Nine tagging SNPs genotyped were not informative in the Amish pedigree. An additional 4 SNPs in complete LD with other tagging SNPs were excluded from further analysis. One additional SNP was excluded because of Mendelian inconsistencies. As expected, visualization of haplotype block structure demonstrated extensive long-range LD within the Amish pedigree (Figure

3.1A). Additionally, we identified several pockets of short-range LD, defining four haplotype blocks for further analysis. Tagging SNPs within haplotype blocks demonstrate high pairwise correlation with pockets of moderate to high long-range correlation (Figure 3.1B). Based on LD and r2 patterns, we defined four haplotype blocks. Block 1 encompasses the VWF promoter region through intron 4 and block 3 encompasses exon 28 and the functional domains of the VWF peptide. Haplotype estimations within blocks 1-4 and association analyses with VWF:Ag and VWF:RCo are ongoing.

Single SNP analysis in the Amish identifies 12 SNPs associated with VWF:RCo

In addition to ongoing haplotype analysis, single SNP association analyses with

VWF:Ag and VWF:RCo were conducted using SOLAR. The c.4120C>T mutation exhibits 100% penetrance for VWF:Ag and VWF:RCo and is subsequently included as a covariate. Previously, we demonstrated ABO and age are strong modifiers of VWF levels (see Chapter II), both of which were included as covariates in this study. After multiple testing corrections, 12 SNPs achieved study-wide significance for association with VWF:RCo (Table 3.2). Though tagging SNPs were distributed throughout the VWF gene, significant associations were concentrated in two regions of the gene: intron 4 and exon 28 through intron 38 (Figure 3.2). VWF exon 1 encodes a signal peptide, exons 2-

17 encode the propeptide, and exons 18-52 encode the VWF monomer28-30.

65 Table 3.2: VWF tagging SNPs that achieved study-wide significance for association with VWF:RCo SNP Position Location P-value rs2362479 6,026,354 Upstream 9.03 x10-5 rs10849363 6,066,357 Intron 4 3.16x10-4 rs7964554 6,067,152 Intron 4 4.17x10-7 rs216308 6,124,666 Intron 24 5.08x10-5 rs216310 6,127,943 Exon 28 5.32x10-5 rs216311 6,128,443 Exon 28 9.17x10-5 rs11612370 6,136,832 Intron 33 2.20x10-5 rs11612384 6,136,846 Intron 33 6.72x10-5 rs216330 6,146,630 Intron 37 1.65x10-4 rs11064003 6,149,706 Intron 37 1.16x10-4 rs2239162 6,151,670 Intron 37 3.78x10-4 rs216293 6,153,659 Intron 38 1.80x10-4

Figure 3.2: P-values of association between VWF:RCo and single SNPs in VWF Association between individual tagging SNPs in VWF and log-transformed VWF:RCo adjusting for age and ABO was evaluated using SOLAR. The data show two clusters of SNP associations that reach study-wide significance (dashed line) corresponding to haplotype blocks 1 and 3. The largest cluster of significant SNP associations maps to block 3, which includes exon 28 and the functional domains of the VWF peptide.

66 Two SNPs that map to intron 4 achieved study-wide significance for association

with VWF:RCo. rs10849363 and rs7964554 are in moderate LD (r2=0.31). Neither SNP

demonstrated high pairwise correlation with the remaining VWF tagging SNPs in Table

3.2 (r2 range: 0.00-0.14). Only one SNP outside of VWF, rs2362479, achieved study- wide significance. rs2362479 maps 31.7 kb telomeric to VWF and demonstrates moderate long-range LD with VWF tag SNP rs7964554 (r2=0.53).

We established significant associations with VWF:RCo for two SNPs in exon 28, the largest and most polymorphic exon: rs216310, a synonymous variant, and rs216311, a nonsynonymous variant. rs216311 encodes an A>G change at position 4391 of VWF,

resulting in a threonine to alanine substitution at amino acid position 1381 (T1381A).

PolyPhen-2 analysis predicts the T1381A variant is benign (score=0.00). rs216311 is in

LD with several other significant SNPs, including rs216308, rs11612370, rs11612384,

rs216330, rs11064003, rs2239162, and rs216293, which demonstrated

moderate to high pairwise correlations (r2 range: 0.42-0.95). Interestingly, rs216310 is not correlated with the other significant SNPs reported herein (r2 range: 0.015-0.031).

After multiple testing corrections, no SNP achieved study-wide significance for association with VWF:Ag. Table 3.3 lists the top six VWF tagging SNPs associated with

VWF:Ag. For rs2362489, rs10849363, and rs7964554, we also established significant associations with VWF:RCo. rs1159993 is a less common variant with a MAF of 0.013 in the CEU population and does not demonstrate pairwise correlations with other reported

Table 3.3: Top six VWF SNPs associated with VWF:Ag SNP BP Location p-value rs2362479 6,026,354 Upstream 0.049 rs10849363 6,066,357 Intron 4 0.017 rs7964554 6,067,152 Intron 4 6.46x10-4 rs1159993 6,083,564 Intron 9 0.0092 rs216314 6,136,580 Intron 34 0.030 rs216295 6,154,206 Intron 38 0.022

67 tagging SNPs. rs216314 and rs216295 are in almost complete LD and demonstrate strong correlation in the Amish pedigree (r2=0.99). Neither of these SNPs demonstrates pairwise correlations with VWF:RCo SNPs sufficient to be considered adequate substitutes.

Replication of SNP associations in the ZPMCB-VWD, GABC, and TSS cohorts

Eleven VWF tagging SNPs with established VWF:RCo associations were selected for analysis in three replication cohorts: rs10849363, rs7964554, rs216308, rs216310, rs216311, rs11612370, rs11612384, rs216330, rs11064003, rs2239162, and rs216293. The replication cohorts consisted of three distinct population cohorts, including the ZPMCB-VWD control cohort, the GABC sibling cohort, and the TSS cohort

(Table 3.4).

The ZPMCB-VWD control cohort includes 150 Caucasian subjects and 66

African American samples. Variants associated with VWF levels in African-American control subjects within the cohort were previously reported146. We demonstrated using two-factor Analysis of Variance (ANOVA) that VWF:Ag was significantly higher in

African-American control subjects compared to Caucasian subjects (p=0.016), which has a profound effect on the VWF:RCo/VWF:Ag ratio (p<0.0001), consistent with previous observations146. Based on ethnic differences in VWF:Ag, we anticipated significant distribution differences of allele frequencies between the Caucasian and African-

American control subjects. For 10 of the 11 candidate SNPs genotyped, the MAF was significantly different between the Caucasian and African-American control subjects

(Table 3.5).

Within the Caucasian control subjects, we replicated nine significant associations with VWF:RCo (Table 3.6). The most significant VWF:RCo associations were

68 Table 3.4: VWF levels by population-based replication cohort Cohort VWF:Ag VWF:RCo ZPMCB-VWD Control (Caucasian)* 119.9 (41, 289) 120.2 (46, 301) ZPMCB-VWD Control (A. American) * 151.82 (67, 399) 130.0 (52, 290) GABC 108.8 (0, 350)† Not available TSS 108.4 (0, 265) † Not available * VWF:Ag differs significantly between the Caucasian and African American control subjects (p=0.016). † VWF:Ag of 0 is suspected to be a failed test and corresponding individuals are excluded from analysis

Table 3.5: MAF by ethnic group in the ZPMCB-VWD control cohort Minor Allele Frequency SNP Caucasian African American p-value rs10849363 0.43 0.18 2.0×10-12 rs7964554 0.34 0.45 2.7×10-4 rs216308 0.4 0.14 5.5×10-7 rs216310 0.36 0.12 1.5×10-6 rs216311 0.35 0.13 4.2×10-6 rs11612370 0.26 0.41 0.0019 rs11612384 0.26 0.42 9.3×10-4 rs216330 0.33 0.3 0.6 rs11064003 0.29 0.47 1.8×10-6 rs2239162 0.33 0.44 1.9×10-4 rs216293 0.42 0.2 1.2×10-12

Table 3.6: P-values of association tests by ethnicity and phenotype Caucasian African American SNP VWF:Ag VWF:RCo Ratio* VWF:Ag VWF:RCo Ratio* rs10849363 0.20 0.44 0.59 0.94 0.97 0.90 rs7964554 0.25 0.55 0.51 0.29 0.28 0.99 rs216308 0.28 0.01 0.0002 0.22 0.70 0.093 rs216310 0.30 0.0024 2.5x10-6 0.071 0.41 0.050 rs216311 0.17 0.0013 1.2x10-5 0.080 0.46 0.041 rs11612370 0.0008 5.1x10-5 0.087 0.69 0.11 0.024 rs11612384 0.0003 1.9x10-5 0.083 0.19 0.02 0.077 rs216330 0.10 0.0025 0.0014 0.90 0.86 0.54 rs11064003 0.0014 0.0002 0.098 0.50 0.62 0.060 rs2239162 0.019 0.03 0.80 0.98 0.35 0.18 rs216293 0.021 0.01 0.22 0.54 0.69 0.070 * VWF:RCo/VWF:Ag ratio

69 established with rs11612370 and rs1162384, which are in almost complete LD (r2=0.91).

We established five significant associations with VWF:Ag and four significant

associations with the VWF:RCo/VWF:Ag ratio. Interestingly, the five SNPs with

associations to both VWF:Ag and VWF:RCo (rs11612370, rs11612384, rs11064003,

rs2239162, and rs216293) were not found to be associated with the VWF:RCo/VWF:Ag

ratio. Conversely, the four SNPs associated with VWF:RCo but not VWF:Ag (rs216308,

rs216310, rs216311, and rs216330) also established significant associations with the

VWF:RCo/VWF:Ag ratio. In light of extensive LD within the VWF gene, these findings

suggest that these nine tagging SNPs constitute two different haplotypes that contain

unique variants modulating VWF levels. Multivariable logistic regression models predict

rs11612384 is the most important SNP in explaining variation of VWF:Ag, rs216311 and

rs11064003 are the most important SNPs in explaining variation in VWF:RCo, and

rs216310 is the most important SNPs in explaining the variation of the

VWF:RCo/VWF:Ag ratio, respectively.

As suspected, the SNP VWF:RCo associations established in the Amish

pedigree, a Caucasian discovery cohort, were not replicated in the African-American

control subjects (Table 3.6). The most significant p-value established corresponded to

rs11612384 and VWF:RCo (p=0.02). At a threshold of α=0.05, significant associations

were established between both rs216310 and rs216311 and the VWF:RCo/VWF:Ag ratio

(p=0.050 and 0.041, respectively).

To further analyze the 11 tagging SNPs identified in the Amish pedigree study,

we imputed genotypes from the GABC and TSS population cohorts. In the GABC cohort,

genotypes were available in 934 subjects. In the TSS cohort, we were able to impute

genotypes for nine SNPs in 2,145 subjects. VWF:Ag was the only phenotype available in

both cohorts for evaluation. The MAF and p-values for association tests with VWF:Ag

are provided in Table 3.7.

70 Table 3.7: SNP MAF and p-values of association with VWF:Ag in the GABC and TSS replication cohorts GABC TSS SNP MAF(%) p-value MAF(%) p-value rs10849363 41.5 0.16 43.4 0.86 rs7964554 35.8 0.97 36.4 0.95 rs216308 37.2 0.76 Not available rs216310 36.5 0.52 Not available rs216311 35.4 0.60 38.7 0.42 rs11612370 23.4 0.050 27.8 8.0x10-8 rs11612384 23.4 0.050 27.8 8.0x10-8 rs216330 32.6 0.59 35.1 0.0099 rs11064003 26.1 0.0017 30.9 1.34x10-11 rs2239162 30.3 9.6x10-6 33.3 7.35x10-15 rs216293 43.5% 0.080 47.7% 2.59x10-8

Within the GABC cohort, rs2239162 demonstrated the most significant association (p=9.6x10-6) with VWF:Ag. Three SNPs in moderate LD with rs2239162 were also found to be associated with VWF:Ag: rs11612370 (r2=0.66, p=0.050), rs11612384 (r2=0.66, p=0.050), and rs11064003 (r2= 0.57, p=0.0017).

We established significant associations with VWF:Ag for six of nine SNPs queried in the TSS cohort (Table 3.7). As in the GABC cohort, the most significant association with VWF:Ag was rs2239162 (p=7.35x10-15). Pairwise correlation data were not available in the TSS cohort to assess independence of the remaining four associated

SNPs, but we anticipate that rs11612370 and rs11612384 are in complete or near complete LD, as demonstrated in the GABC cohort.

Analysis of minor allele frequency in the ZPMCB-VWD Type 1 index case cohort

Within the ZPMCB-VWD cohort, we genotyped 11 candidate SNPs in 310

Caucasian VWD type 1 index cases. We determined the MAF in the index cases and compared it to those established in the ZPMCB-VWD control cohort (Table 3.8).

71 Table 3.8: Comparison of MAF between the ZPMCB-VWD control and VWD type 1 index case cohorts Minor Allele Frequency (%) SNP Control (n=150)* VWD Type 1 (n=310) p-value rs10849363 43.2 49.7 0.067 rs7964554 34.0 46.2 0.0005 rs216308 40.3 36.0 0.21 rs216310 36.1 31.8 1.20 rs216311 35.3 32.2 0.36 rs11612370 25.0 18.3 0.018 rs11612384 25.9 18.3 0.009 rs216330 33.0 28.8 0.19 rs11064003 29.5 21.8 0.012 rs2239162 33.0 22.0 0.0004 rs216293 41.6 38.1 0.31 * Caucasian subjects from the ZPMCB-VWD control cohort

The minor allele of rs7964554 was more common (p=0.0005), while the minor alleles of

rs11612370, rs11612384, rs11064003, and rs2239162 were less common (p=0.018,

0.009, 0.012, and 0.0004, respectively) in VWD type 1 index cases compared to

controls. If the subject has VWD type 1, then the odds of being heterozygous for the

minor allele of rs7964554 was 1.47 (95% CI: 1.08, 2.00; p=0.014; Table 3.9). Similarly,

the odds of being heterozygous for the minor allele of rs216308 was 0.64 (95% CI: 0.45,

0.91; p=0.014). The estimated odds ratios of a VWD type 1 index case being

homozygous for the minor alleles of rs7964554 or rs216308 were 2.17 and 0.41,

respectively.

Sequencing of VWF exons and exon/intron boundaries was previously

conducted, and suspected VWD type 1 mutations were reported by individual (data not

shown). To date, molecular classification of VWD type 1 mutations has been lacking.

Hence, we also compared MAF of VWD type 1 index cases with no identified mutations

to index cases with reported mutations (Table 3.10). Interestingly, the minor alleles of

exon 28 variants rs216310 and rs216311 were more common in VWD type 1 index

72 Table 3.9: Odds ratio of having the minor allele in VWD type 1 index cases SNP Odds Ratio 95% CI p-value rs7964554 1.47 1.08, 2.00 0.014 rs216308* 0.49 0.32, 0.74 0.0009 rs2239162 0.64 0.45, 0.91 0.014 * Comparing VWD type 1 index cases with no mutation to index cases with a reported VWF mutation

Table 3.10: MAF in VWD type 1 index cases by VWF mutation status Minor Allele Frequency (%) SNP No Mutation (n=180) Mutation (n=130) p-value rs10849363 51.4 47.2 0.3071 rs7964554 48.3 43.4 0.231 rs216308 41.3 28.7 0.0015 rs216310 36.9 24.8 0.0016 rs216311 36.7 26.0 0.0055 rs11612370 19.6 16.4 0.3138 rs11612384 19.7 16.4 0.3066 rs216330 32.5 23.6 0.0173 rs11064003 22.3 21.1 0.733 rs2239162 21.9 22.3 0.9087 rs216293 41.4 33.5 0.0501

cases with no mutations than in index cases with a reported mutation (p=0.0016 and

0.0055, respectively). The minor alleles of rs216308 and rs216330 were also more common in index cases with no mutation. Interestingly, the MAF of rs7964554, rs11612370, rs11612384, rs11064003, and rs2239162 were not significantly different with respect to VWD type 1 mutations. If the subject had a reported VWF mutation, then the odds of being heterozygous for the minor allele of rs2163308 was 0.49 (95% CI:

0.32, 0.74; p=0.0009; Table 3.9), while the odds ratio for being homozygous was 0.24.

Discussion

We established significant associations of 12 VWF tagging SNPs with VWF:RCo

in the Amish pedigree. Nine of the twelve SNPs identified in the Amish discovery cohort

73 map to haplotype block 3, which contains exon 28 and the functional domains of VWF, including the A1 domain that binds platelet receptor glycoprotein Ibα (GPIbα) to mediate platelet tethering and rolling at the site of endothelial damage12,13. The interaction of

VWF with platelet GPIbα can be simulated using the antibiotic ristocetin which agglutinates formalin-fixed platelets in the presence of VWF (VWF:RCo). Haplotype analysis within the Amish pedigree is ongoing.

Of the 12 VWF:RCo SNPs identified in the Amish pedigree, two are exonic: rs216310 (synonymous variant) and rs216311 (T1381A). The T1381A variant maps to the A1 loop domain that binds GP1bα (data not shown). Interestingly, D1472H, an A1 loop variant which was previously reported to be associated with a lower

VWF:RCo/VWF:Ag ratio in the ZPMCB-VWD control cohort 146, is also predicted to be a

benign variant (Polyphen-2 score = 0.00). Further characterization of D1472H

determined the variant affected ristocetin binding but did not ultimately alter VWF-

platelet interactions. While T1381A may be a functional variant, it may also be in LD with

other variants modulating VWF levels. Alternatively, as with D1472H, T1381A may

represent an artificial variant that affect the binding of ristocetin but does not alter VWF-

platelet interactions.

Nine VWF:RCo SNPs were intronic with one intergenic variant 31.7 kb telomeric

to VWF. Recently, the Encyclopedia of DNA Elements (ENCODE) Project Consortium

provided experimental and computational data to predict potential protein binding sites

including transcription factor binding sites, DNA methylation sites, histone binding

patterns, and other regulatory features275,276. Multiple components of the ENCODE dataset can be explored using the UCSC Genome Browser

(http://genome.ucsc.edu/ENCODE/). The region of intron 24 encompassing rs216308 demonstrated DNase hypersensitivity in the ENCODE dataset, suggesting it may comprise a protein binding site (APPENDIX H). For all other variants analyzed, the

74 ENCODE data do not predict the SNPs coincide with protein binding sites or other regulatory features (data not shown).

Providing further exploration of predicted functionality, GERP++ is an algorithm that allows genomic evolutionary rate profiling by estimating the evolutionary rates for individual alignment columns of multiple mammalian species to generate a rejected substitutions score277,278. The rejected substitution score ranges from -12.3 to 6.17 at the base-pair-level where positive rejection scores are calculated when fewer substitutions are observed than expected assuming a model of neutrality, which is suggestive of evolutionary constraint. The rejection score spectrum shifts toward neutral (0) as allele frequency increases such that approximately 75% of common variants (MAF between

0.01 and 0.50) have rejection scores < 2.0. For rs216308, the rejection score is 2.28.

Thus, rs216308 may represent a functional variant under evolutionary constraint.

Previously, Van Hout et al. conducted a genome-wide survey of LD in the Old

Order Amish and compared their findings to the CEU dataset from the International

HapMap project256. Allele frequencies were not significantly different for more than 99%

of SNPs evaluated, though the authors report a slightly lower heterozygosity rate in the

Amish reflecting the larger number of monomorphic SNPs in the Amish relative to the

CEU dataset. LD patterns were also similar between the Amish and the CEU dataset

with slightly less haplotype diversity and larger haplotype blocks in the Amish. The

authors concluded the limited founder population of the Amish is not sufficient to

significantly alter the MAF or LD patterns when compared to outbred populations. Thus,

it is reasonable to assume association analyses and LD patterns established in the

Amish may be applicable to other Caucasian populations.

All nine SNPs in haplotype block 3 were replicated in the ZPMCB-VWD control

cohort. We established significant associations with VWF:Ag and VWF:RCo to four

SNPs. Significant associations were established for the remaining five SNPs with

75 VWF:RCo and the VWF:RCo/VWF:Ag ratio. Cogently, this observation suggests that variants altering both VWF:Ag and VWF:RCo do not ultimately significantly alter the

VWF:RCo/VWF:Ag ratio whereas variants that alter only VWF:RCo also alter the

VWF:RCo/VWF:Ag ratio. Multivariable logistic regression analysis established that different variants are most important for determining the variability of VWF:Ag,

VWF:RCo, and the VWF:RCo/VWF:Ag ratio. Within the TSS cohort, we also established significant associations between six of nine SNPs available for analysis and VWF:Ag.

Inconsistent replication in the GABC cohort may be due to population stratification.

Though the majority of the GABC cohort self-reported as Caucasian, other ethnic groups were also represented. In total, of the 12 VWF tagging SNPs identified in the Amish pedigree, significant associations were established between VWF levels and rs11064003 and rs2239162 in the ZPMCB-VWD control, GABC, and TSS cohorts.

Furthermore, significant associations were established between VWF levels and rs11612370, rs11612384, rs216330, and rs216293 in the ZPMCB-VWD control and TSS cohorts. We established a significant association for VWF levels with rs216311 in the

ZPMCB-VWD control cohort as well as rs216308 and 216310, which were not available for analysis in the TSS cohort.

VWD type 1 is a complex diagnosis where symptoms and levels may overlap with the general population. Furthermore, 37-45% of patients do not have a “causal” mutation in VWF. The role of VWF variants that contribute to normal variation of levels remains unclear in VWD type 1. We report two SNPs with significantly different MAF in

VWD type 1 index cases when compared to controls: rs7964554 (odds ratio=1.47) and rs2239162 (odds ratio=0.64). The minor allele of rs216308 was less common in VWD type 1 index cases with a reported VWF mutation than in VWD type 1 index cases with no mutation. For rs7964554, we established a significant association with VWF levels only in the Amish pedigree whereas significant associations for rs216308 and rs2239162

76 were established in multiple cohorts. Our results suggest that SNPs associated with common variation in VWF levels may contribute to the complex genetic etiology of VWD type 1.

77 CHAPTER IV

QUANTITATIVE TRAIT LOCUS LINKAGE ANALYSIS IN A LARGE AMISH

PEDIGREE IDENTIFIES NOVEL CANDIDATE LOCI FOR ERYTHROCYTE TRAITS

Abstract

We characterized a large Amish pedigree and, in 377 pedigree members, analyzed the genetic variance components with covariate screen as well as genome- wide quantitative trait locus (QTL) linkage analysis of red blood cell count (RBCC), hemoglobin (HB), hematocrit (HCT), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), red cell distribution width (RDW), platelet count (PLT), and white blood cell count

(WBCC) using SOLAR. Age and gender were found to be significant covariates in many

CBC traits. We obtained significant heritability estimates for RBCC, MCV, MCH, MCHC,

RDW, PLT, and WBCC. We report four candidate loci with LOD scores above 2.0: 6q25

(MCH), 9q33 (WBCC), 10p12 (RDW), and 20q13 (MCV). We also report eleven candidate loci with LOD scores between 1.50 and 1.99. Bivariate linkage analysis of

MCV and MCH on chromosome 20 resulted in a higher maximum LOD score of 3.14.

Linkage signals on chromosomes 4q28, 6p22, 6q25, and 20q13 are concurrent with previously reported QTL. All other linkage signals reported herein represent novel evidence of candidate QTL. Interestingly rs1800562, the most common causal variant of hereditary hemochromatosis in HFE (6p22) was associated with MCH and MCHC in this family.

Introduction

Complete blood counts (CBC) are one of the most commonly ordered laboratory tests to provide a rapid and inexpensive appraisal of a patient’s status. Standard CBC

78 analyses report several erythrocyte measures, including red blood cell count (RBCC), hemoglobin level (HB), hematocrit (HCT), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), and red blood cell distribution width (RDW), as well as platelet count (PLT) and white blood cell count (WBCC). In addition to providing a cursory overall evaluation of current status, several associations have been reported between CBC traits and thrombotic disease. RBCC and HCT demonstrate association with increased risk of cardiovascular disease (CVD); HCT and HB are risk factors for ischemic heart disease; MCV is a risk factor for peripheral artery disease (PAD); and RDW is associated with increased risk of myocardial infarction (MI), heart failure, and stroke279-284. Furthermore, HCT is

associated with morbidity and mortality in CVD and RDW is associated with higher

mortality rates in cases of heart failure, pulmonary emboli, and PAD279,281,283-285. Platelets play an important role in acute MI pathogenesis; furthermore, baseline PLT at the time of diagnosis is a strong predictor of reocclusion, reinfarction, and mortality even one year post-MI205. WBCC is an independent risk factor for the development of new coronary

heart disease and a prognostic marker in existing cardiovascular disease286.

CBC traits are multifactorial phenotypes determined by complex etiologies that

display significant inter-person variation as well as intra-person longitudinal

stability212,287. Direct and inverse correlations have been well-documented among erythrocyte traits, including RBCC, HB, HCT, and MCV287. Intra-individual longitudinal

stability and strong correlations among hematologic traits suggest these traits are

subjected to homeostatic regulation, which is influenced by both environmental and

genetic factors. Several co-morbidities of thrombotic disease are also independent

environmental modifiers of CBC traits including smoking, alcohol consumption, and use

of oral contraceptives288-290. Likewise, gender, age, and body mass index are

documented modifiers of multiple CBC traits including RBCC, MCV, and WBCC287,288,291.

79 Multiple heritability and variance components analyses report genetic variance components ranging from 0.10 to 0.86 for RBCC, HB, HCT, MCV, MCH, MCHC, WBCC, and PLT216,217,292. Though heritability estimates have varied, there is evidence for both a significant genetic component and a significant environmental component for all standard CBC traits (with the exception of RDW which has not been previously evaluated)287.

Several genes have been implicated in hematopoiesis due to the identification of rare mutations in patients with Mendelian phenotypes. However, common variants in these same and other genes are likely involved in modulating more mild variation in hematologic phenotypes. One approach to QTL that affect complex traits like the CBC traits is to focus on a single large, culturally isolated pedigree where it is reasonable to assume that the genetic component represents the common variation of an outbred population while being under the control of fewer QTL224,225. Thus, it is anticipated that

related individuals with similar trait values will have a higher amount of sharing of genetic

material identical by descent (IBD) near the gene or genes that contribute to the

quantitative trait. Additionally, genetic isolates demonstrate a more homogenous

environmental component than populations that are not socially or geographically

isolated, whereas a collection of ancestrally unrelated smaller pedigrees tend to have

more variable lifestyles than genetic isolates. We identified, characterized, and collected

blood samples from a single Amish pedigree diagnosed with von Willebrand disease

(VWD). CBC traits in this pedigree show variability that is representative of the normal

population. QTL analysis of the pedigree identified several chromosomal regions that

appear to harbor QTLs for CBC traits, as reported herein. In addition to identifying novel

candidate loci, we also report independent replication of three previously reported QTLs.

80 Methods

Amish pedigree and complete blood counts

As described in Chapter II, the Amish pedigree is composed of more than 2,000 individuals. Informed consent was obtained following institutional guidelines and according to the Declaration of Helsinki. Of the 395 pedigree members genotyped for genome-wide linkage analysis, eleven pedigree members were not included because of their uncertain position in the pedigree and seven others did not have CBC measures.

Seventy-two individuals were heterozygous for a C>T missense mutation at position c.4120 (c.4120C>T) in VWF that results in an arginine to cysteine amino acid substitution at codon 1374 (R1374C), resulting in VWD type 2. CBC analyses were performed within 24 hours for all individuals and consisted of RBCC, HB, HCT, MCV,

MCH, MCHC, RDW, PLT, and WBCC. RBCC, HB, and MCV that were measured directly, whereas HCT, MCH, and MCHC were mathematically derived from directly measured erythrocyte traits (MCVxRBC, HB/RBCC, and MCH/MCV, respectively)293.

RDW is a measure of the variation of erythrocyte width as evaluated by MCV and is

defined by an inverse relationship between the standard deviation of MCV and the mean

MCV. Age-adjusted mean CBC measures were compared between males and females

and between VWF c.4120C>T mutation carriers and non-carriers (wild-type sequence at

position c.4120; CC4120) by fitting a two-factor analysis of variance model.

Genome-wide scan

As described in Chapter II, DNA was extracted with the Gentra Puregene Kit

(Qiagen) and genotyping of pedigree members was completed by the NHLBI

Mammalian Genotyping Service at Marshfield, WI using Screening Set 16 with 400 short

tandem repeat polymorphic (STRP) markers at a 10 centiMorgan (cM) average interval.

Data were processed and linkage analysis data files were created. Mendelian

81 inconsistencies were identified using the computer program PEDCHECK and resolved241.

HFE rs1800562 and ABO rs687289 genotyping

rs1800562 (HFE C282Y) and rs687289 (ABO) were genotyped in the Amish

cohort using a commercially-available 5’ allelic discrimination high throughput

fluorescent genotyping assay (TaqMan, Life Technologies).235 Hardy-Weinberg

equilibrium was assessed using MENDEL. Assessment of an association of rs180562

and rs687280 with CBC traits was conducted using SOLAR.

Covariate, heritability, and linkage analyses

Variance components analysis was conducted using the SOLAR (Sequential

Oligogenic Linkage Analysis Routines, version 4.2.7) program and the significance of

age, gender, rs687289 (an ABO blood group proxy), and the VWF c.4120C>T mutation

as covariates for CBC traits RBCC, HB, HCT, MCV, MCH, MCHC, RDW, PLT, and

WBCC was assessed244. The genetic variance component of the quantitative phenotypes was estimated using SOLAR assuming a polygenic model incorporating covariates found to be significantly associated with the analyzed trait. All individuals in the pedigree were included in the analysis. Multipoint variance component quantitative trait linkage analysis was also conducted using SOLAR. Because of the large size of this pedigree, the multipoint estimates of pairwise identity by descent (IBD) status that are required for SOLAR computation were obtained using the software package Loki

(version 2.4.5)245.

82 Results

Description of CBC traits and heritability estimates

Standard CBC analyses were conducted for 384 Amish pedigree members; CBC data were available for 377 (Table 4.1). As gender is known to be associated with several erythrocyte traits, we described all CBC traits by gender and VWF mutation status. Males had a higher mean RBCC, HB, HCT, and MCHC and lower mean WBCC than females, which is consistent with previously reported findings217. Two-factor age-

adjusted Analysis of Variance (ANOVA) established males were younger than females

and had a significantly higher mean RBCC, HB, HCT, and MCHC. Pedigree members

who were heterozygous (C4120T, VWD) mutation carriers had significantly higher PLT,

WBCC, and RDW than homozygous (CC4120, wild-type [WT]) members (Table 4.1).

Additionally, previous studies reported that RBCC, HB, and HCT are highly

correlated traits287. Within the Amish cohort, we demonstrated strong correlations

(r2≥0.80) between RBCC, HB, and HCT as well as MCV and MCH (Table 4.2).

Table 4.1: Comparison of CBC traits by gender and VWF mutation Age RBCC HB HCT MCV MCH MCHC RDW PLT WBCC Mean 21.5 4.8 13.8 40.4 84.4 29.0 34.1 13.2 299.5 7.3 Male 18.8* 4.9† 14.1† 41.3† 83.4 28.6 34.2* 13.3 298.0 7.3 Female 24.2 4.6 13.4 39.6 85.4 29.3 33.9 13.2 301.1 7.4

VWF C4120T 25.1 4.8 13.8 40.5 84.8 28.8 34.0 13.4‡ 310.8‡ 7.7‡ VWF WT 20.7 4.8 13.8 40.4 84.3 29.0 34.1 13.2 297.0 7.2

Standard 17.2 0.4 1.5 4.1 4.8 3.3 1.1 1.3 78.2 2.8 Deviation Male 16.5 0.4 1.6 4.6 4.3 1.7 1.2 1.5 80.4 3.2 Female 17.5 0.3 1.2 3.4 5.1 4.3 0.9 1.0 76.1 2.4

VWF C4120T 19.6 0.4 1.6 4.6 5.4 2.1 1.1 1.0 77.7 3.5 VWF WT 16.5 0.4 1.4 4.0 4.7 3.5 1.1 1.3 78.2 2.6 * p<0.025 comparing males vs. females † p<0.0001 comparing males vs. females ‡ p<0.05 comparing VWF C4120T vs. VWF WT

83 Table 4.2: Correlation of CBC traits RBCC HB HCT MCV MCH MCHC RDW PLT WBCC RBCC 1 0.80 0.82 0.04 0.05 0.06 -0.15 -0.18 -0.19 HB 0.80 1 0.97 0.54 0.56 0.24 -0.42 -0.37 -0.30 HCT 0.82 0.97 1 0.56 0.49 0.05 -0.41 -0.32 -0.22 MCV 0.04 0.54 0.56 1 0.89 -0.03 -0.55 -0.32 -0.19 MCH 0.05 0.56 0.49 0.89 1 0.34 -0.11 -0.23 -0.19 MCHC 0.06 0.24 0.05 -0.03 0.34 1 -0.11 -0.23 -0.19 RDW -0.15 -0.42 -0.41 -0.55 -0.52 -0.11 1 0.19 0.10 PLT -0.18 -0.37 -0.32 -0.32 -0.37 -0.23 0.19 1 0.51 WBCC -0.19 -0.30 -0.22 -0.19 -0.26 -0.19 0.10 0.51 1

We also identified moderate correlations (r2 range: 0.49 to 0.56) of HB and HCT with

MCV and MCH as well as moderate inverse correlations between RDW and HB, HCT,

MCV, and MCH (r2 range: -0.41 to -0.55).

In the variance components analysis, age was significantly associated with all

CBC traits (Table 4.3). Gender was significantly associated with RBCC, HB, HCT,

MCHC, and WBCC. The VWF c.4120C>T mutation was not associated with any CBC trait and was excluded as a covariate from all subsequent analyses. For RBCC, age- squared was also significant. Significant genetic variance components estimates were obtained for RBCC, MCV, MCH, MCHC, RDW, PLT, and WBCC, with MCHC, RDW,

PLT, and WBCC demonstrating the highest genetic variance components (0.48, 0.35,

Table 4.3: CBC traits covariate and heritability estimates analysis Age Gender C4120T * rs687289† GVC§ p-value p-value p-value p-value GVC§ p-value RBCC 7.005x10-14 2.478x10-21 0.119 0.258 0.23 0.00675 HB 6.056x10-39 2.699x10-19 0.312 0.267 0.12 0.0792 HCT 2.435x10-33 2.750x10-16 0.229 0.596 0.04 0.215 MCV 5.482x10-31 0.153 0.553 0.716 0.20 0.00213 MCH 6.135x10-25 0.841 0.789 0.840 0.22 0.00916 MCHC 0.01881 1.056x10-4 0.813 0.663 0.48 1.060x10-9 RDW 1.773x10-13 0.411 0.134 0.462 0.35 5.103x10-9 PLT 1.682x10-13 0.0972 0.994 0.348 0.57 3.140x10-12 WBCC 1.238x10-7 7.601x10-3 0.427 0.258 0.41 2.360x10-11 * VWF c.4120C>T mutation, VWD † rs687289 SNP in ABO, a surrogate marker of ABO blood type § GVC: genetic variance components

84 0.57, and 0.41, respectively; Table 4.3).Within the Amish pedigree, the genetic variance components for HB and HCT were not significant.

Previous studies reported evidence for the ABO gene on chromosome 9q33 as a candidate QTL for several erythrocyte traits223,294. We identified linkage of WBCC to

chromosome 9q33 (Table 4.4), but we did not identify concomitant linkage for

erythrocyte traits. In characterizing the Amish pedigree for VWD, ABO blood types were

determined by sequencing (see Chapter II). Type O blood is over-represented within the

Amish cohort and type B blood is under-represented. Analyses of blood types with CBC

traits were limited to homozygous type A (A,A), heterozygous type A (A,O), and

homozygous type O (O,O). No significant differences in CBC trait means were identified

between blood group genotypes (data not shown). Additionally, rs687289, a surrogate

genetic marker for ABO, was previously genotyped within the Amish cohort. With

SOLAR, rs687289 was evaluated as a covariate of CBC traits, and no association was

found with WBCC (p=0.258) or other erythrocyte traits (Table 4.3).

Genome-wide QTL linkage analysis of CBC traits

Genome-wide QTL linkage analysis was conducted using SOLAR for RBCC,

MCV, MCH, MCHC, RDW, PLT, and WBCC (Figure 4.1). Age was included as a

covariate for all CBC traits and gender was included as a covariate for RBCC, MCHC,

and WBCC (Table 4.3). HB and HCT were excluded as there was no evidence for a

significant genetic component in the Amish pedigree. Four candidate loci with logarithm

of odds (LOD) scores above 2.0 were identified (Table 4.4; Figure 4.2): 6q25 (MCH),

9q33 (WBCC), 10p12 (RDW), and 20q13 (MCV). We also identified eleven candidate

loci on eight chromosomes with LOD scores between 1.5 and < 2.0 (Table 4.4): RBCC

(2q37.3, 3p22, 4q28), MCV (4q35, 7p15), MCHC (6p22), RDW (13q32), WBCC (2q35,

10p15, 13q14), and PLT (22q11).

85

Figure 4.1: Genome-wide SOLAR QTL linkage scans Quantitative trait locus linkage analysis was performed using SOLAR. The graphs represent the genome-wide scan results for RBCC (A), MCV (B), MCH (C), MCHC (D), RDW (E), PLT (F), and WBCC (G). Somatic chromosomes are represented along the x- axis, and chromosome numbers are centered below the respective graph. LOD scores are represented along the y-axis with the scale relative to the maximum LOD score of the scan. As QTL analysis is a multipoint chromosome-wide analysis, the traces represent the LOD score as a continuous trace across the chromosome evaluated at each centiMorgan (cM). The maximum LOD score for single-trait analyses corresponds to MCH and maps to chromosome 6q25 (LOD=2.46) followed by WBCC on 9q33 (LOD=2.18), MCV on 20q13 (LOD=2.17), and RDW on 10p12 (LOD=2.09).

86

Figure 4.2: Chromosome-specific SOLAR QTL linkage scans for LOD scores > 2.0 Each panel represents a chromosome-wide scan for phenotypes where a candidate QTL was identified with a LOD score greater than 2.0. The map position along the chromosome is represented in cM along the X-axis and the LOD score is represented along the Y-axis. STRP marker names genotyped along each respective chromosome are imposed on the graphs vertically to represent the approximate coverage of each signal. Linkage traces are plotted continuously across a chromosome with an estimated LOD score at each cM. A) Chromosome 6, MCH (LOD=2.46). B) Chromosome 9, WBCC (LOD=2.18). C) Chromosome 10, RDW (LOD=2.09). D) Chromosome 20, MCV (LOD=2.17).

87 Table 4.4: Genome-wide QTL linkage analysis results LOD score > 1.5 Nearest Chromosome Phenotype 1-LOD Interval (Mb) Marker LOD p-value 1q32 PLT 208.28-227.02 GATA48B01 1.77 p=0.003 2q35 WBCC 222.70-235.96 GATA4G12 1.53 p=0.005 GATA12H10, 2q37.3 RBCC 242.01-243.20 1.80 p=0.001 GATA178G09 3p22 RBCC 26.72-41.17 GATA27C08 1.71 p=0.001 4q28 RBCC 112.12-136.85 GATA62A12 1.91 p=0.001 4q35 MCV 190.78-191.15 165zf8ZP 1.76 p=0.001 6q25 MCH 151.41-161.61 GATA184A08 2.46 p<0.001 6p22 MCHC 16.00-38.78 GATA163B10 1.55 p=0.003 7p15 MCV 18.03-26.97 GATA41G07 1.77 p=0.001 9q33 WBCC 121.94-133.23 GATA64G07 2.18 p=0.002 10p15 WBCC 0.00-4.93 GATA88F09 1.98 p=0.002 10p12 RDW 18.41-38.59 GATA70E11 2.09 p=0.001 13q32 RDW 102.10-109.72 ATA26D07 1.61 p=0.005 13q14 WBCC 47.06-77.57 GATA11C08 1.61 p=0.005 20p12.1-p11.23 MCV-MCH 11.14-48.64 GATA129B03 3.14 N.A. 20q13 MCV 44.00-48.51 GATA47F05 2.17 p<0.001 22q11 PLT 0.00-23.22 GATA198B05 1.69 p=0.004

The chromosome 10p12 QTL for RDW represents one of the strongest linkage signals identified herein. Linkage between chromosome 9q33 and WBCC represents a novel finding. The candidate QTL for RBCC that maps to chromosome 4q28 is a novel finding, although linkage of RBCC to 4q31.2-q34.1 was reported in an Australian adolescent twin cohort (Table 4.5)295. Additionally, we identified a candidate QTL with a

LOD score suggestive of linkage for MCV at 4q35 (Table 4.4, Figure 4.3). These three

linkage intervals for RBCC and for MCV, which is inversely proportional to RBCC,

suggest the presence of a QTL on the long arm of chromosome 4. The candidate

linkage intervals for MCHC and MCH on chromosome 6p22 and 6p25, respectively, and

MCV on chromosome 20q13 are consistent with previously reported QTL (Table 4.5). All

other candidate linkage intervals in Table 4.4 are evidence of novel candidate QTL.

88 Table 4.5: Previously published candidate QTL concomitant with Amish pedigree Chromosome Phenotype SNP Gene Statistics Study 4q31.2-q34.1 RBCC LOD=3.07 Iliadou295 6p22.1 HB rs1800562 HFE p=5.74x10-19 Ganesh296 6p22.1 HB rs1800562 HFE p=1.40x10-15 Lo223 6p22.1 HCT rs1800562 HFE p=2.04x10-9 Ganesh296 6p22.1 HCT rs1800562 HFE p=2.50x10-10 Lo223 6p22.1 MCH rs1800562 HFE p=2.76x10-9 Kullo297 6p22.1 MCV rs1800562 HFE p=1.40x10-23 Soranzo222 6p22.1 MCV rs1800562 HFE p=1.01x10-46 Ganesh296 6p22.1 MCV rs1800562 HFE p=6.80x10-16 Lo223 6p22.2 MCH rs1408272 SLC17A3 p=3.87x10-39 Ganesh296 6p22.2 MCH rs17342717 SLC17A1 p=4.66x10-8 Kullo297 6q25.1 RBCC rs727979 TAB2 p=6.30x10-6 Yang166 6q25.3-q27 RBCC rs2023048 LOD=2.9 Yang166 20p12.3 RBCC rs6108011 p=5.8x10-6 Yang166 20q13.2 HB rs6013509 TSHZ2 p=1.054x10-10 Ganesh296

Figure 4.3: SOLAR QTL linkage scans of chromosome 4 for RBCC and MCV The map position along the chromosome is represented in cM along the X-axis and the LOD score is represented along the Y-axis. STRP marker names genotyped along each respective chromosome are imposed on the graphs vertically to represent the approximate coverage of each signal. Linkage traces are plotted continuously across a chromosome with an estimated LOD score at each cM. (A) RBCC (LOD=1.91), (B) MCV (LOD=1.76).

89 Two candidate loci on chromosome 6, corresponding to 6p22 for MCHC and

6q25 for MCH, were identified (Table 4.4). Ten associations with five erythrocyte measures were previously reported on chromosome 6p22, eight of which are with rs1800562, a nonsynonymous variant in the hemochromatosis gene, HFE (Table 4.5). rs1800562 has been associated with HB, HCT, and MCV in two meta-analyses of

Caucasian cohorts223,296. Additionally, rs1800562 associations have been reported with serum iron as well as transferrin traits298. Fifty linkage and association signals of erythrocyte measures, including RBCC, HCT, HB, MCH, MCHC, and MCV, have been reported at multiple loci on the long arm of chromosome 6 in multiple Caucasian cohorts166,223,295,296. The Framingham Heart Study reported linkage and association of

RBCC with 6q25.3-q27 and 6q25.1, respectively, which map within the 6q25 candidate

linkage interval for MCH (Table 4.5)166. Additionally, for MCV, a LOD score weakly suggestive of linkage was identified on 6q25, overlapping with the MCH QTL (Appendix

I). Due to the strong correlation between MCV and MCH, we conducted bivariate QTL linkage analysis for MCV-MCH on chromosome 6. Bivariate analysis lowered the maximum LOD score achieved with MCH, suggesting that 6q25 primarily impacts MCH.

Multiple associations between erythrocyte traits and loci on chromosome 20 have been reported166,296. QTL analysis also identified weak evidence of linkage between chromosome 20q13 and MCH (Appendix I). We therefore conducted bivariate analysis for MCV-MCH on chromosome 20, which resulted in a higher maximum LOD score of

3.14 (Table 4.4, Figure 4.4). When considered alone, the peak linkage region for MCV maps to 20q13 and the 1-LOD interval is restricted to the long arm of chromosome 20.

Bivariate analysis of MCH-MCV shifted the peak linkage region to 20p12.1-p11.23. The

1-LOD interval of the bivariate QTL encompasses the centromere and the entire MCV

QTL reported herein. RBCC is inversely proportional to both MCV and MCH, and HB is proportional to MCH. In Caucasian populations, an association between RBCC and

90

Figure 4.4: SOLAR QTL linkage scan of chromosome 20 for MCV-MCH The map position along the chromosome is represented in cM along the X-axis and the LOD score is represented along the Y-axis. STRP marker names genotyped along each respective chromosome are imposed on the graphs vertically to represent the approximate coverage of each signal. Linkage traces are plotted continuously across a chromosome with an estimated LOD score at each cM. The maximum LOD score of 3.14 maps to 20p12.1-p11.23 whereas the peak LOD score for the univariate analysis of MCV maps to the long arm of chromosome 20 (figure 4.2). Markers GATA29F06 and GATA42A03 flank the centromere, which corresponds to the trough within the 1-LOD candidate QTL interval.

rs6108011 corresponds to the peak of the 20p12.1-p11.23 MCV-MCH linkage interval

and an association between HB and rs6013509 (3 Mb telomeric to the 1-LOD linkage

interval for MCV-MCH).166,296

Proportion of variation attributable to candidate QTL

To evaluate the proportion of variation accounted for by the candidate QTL reported herein, we employed an internal function of SOLAR that estimates the proportion of variance associated with a linkage element (denoted as H2q1) for each marker. Table 4.4 lists the short tandem repeat polymorphic (STRP) marker nearest the peak LOD score that was used to estimate H2q1 for the corresponding QTL. From this analysis, we determined that 6q25 (GATA184A08) accounted for 24.15% of the variation

91 in MCH, 9q33 (GATA64G07) accounted for 13.40% of the variation in WBCC, 10p12

(GATA70E11) accounted for 26.67% of the variation in RDW, and 20q13 (GATA47F05) accounted for 37.41% of the variation in MCV in the Amish pedigree. Furthermore, on chromosome 4, 4q28 (GATA62A12Z) accounted for 24.15% of the variation in RBCC and 4q35 (165zf8ZP) accounted for 31.41% of the variation in MCV.

HFE C282Y is associated with MCH and MCHC

Interestingly, rs1800562, which has been associated erythrocyte traits in multiple studies (Table 4.5), corresponds to the amino acid change C282Y in HFE, the most common causal variant of hereditary hemochromatosis (MIM: 235200). Within the Amish pedigree, rs1800562 was found to be in Hardy-Weinberg equilibrium, with a minor allele frequency of 0.16. With adjustment for age and gender, there was significant associations of rs1800562 genotype with MCH (p=0.0036) and MCHC (p=0.0056), but not with RBCC (p=0.4966), HB (p=0.0666), HCT (p=0.5241), MCV (p=0.3354), or RDW

(p=0.1025).

Discussion

CBC traits are multifactorial phenotypes determined by environmental and genetic components. Clinical assessment of CBC traits as risk factors and prognostic factors in thrombotic disease has been inconsistent due to the complex etiology of the phenotypes as well as a publication bias for small clinical studies that identify large effects299,300. Further elucidating environmental and genetic determinants of hematologic

traits will allow for more accurate analyses and improve estimates of clinical significance.

No genes within the 1-LOD interval for RBCC on chromosome 4q28 have been

previously reported to affect erythropoiesis. Three genes involved in cellular replication

and differentiation, though not directly implicated in erythropoiesis, demonstrate

92 relatively high expression in hematopoietic cell linages, including early erythroid cells

(GNF Expression Atlas 2 Data from U133A and GNF1H Chips, UCSC Genome Browser; http://www.genome.uscs.edu): MAD2L1, EXOSC9, and CCNA2. MAD2L1 encodes a component of the mitotic spindle assembly checkpoint and is homologous to MAD2, a checkpoint component that stabilizes kinetochore-microtubule attachments301. Cyclin A2

(CCNA2) is a mammalian cell cycle regulator that promotes G1/S and G2/M transitions, and siRNA-mediated knockdown of CCNA2 in K562 cells resulted in a significant

decrease in erythroid differentiation following doxorubicin administration302. In low doses doxorubicin induces erythroid differentiation of K562 cells (human leukemic cell lineage), suggesting that CCNA2 may play a role as a key regulator of erythropoiesis.

Additionally, we identified a candidate QTL with a LOD score suggestive of linkage for MCV that maps to chromosome 4q35 (Table 4.4) and contains MLF1IP which encodes MDS/myeloid leukemia factor 1 (MLF1) interacting protein, a component of the kinetochore complex. Overexpression of Hls7, the mouse ortholog of MLF1, results in suppression of erythropoietin-induced differentiation of the J2E erythroleukemic cell line, suggesting MLF1 and its binding partners (including MLF1IP) may play a role in suppression of erythrocyte maturation303. While the causal gene on chromosome 4 remains unclear, the two linkage intervals for RBCC and MCV reported herein and the previous report of linkage of RBCC strongly suggest the presence of a QTL for RBCC on the telomeric half of the long arm of chromosome 4.

The short arm of chromosome 6 contains several gene clusters, including transfer RNA genes, histone cluster component genes, and the major histocompatibility complex gene cluster. Three candidate genes within the linkage interval for MCHC include MAD2L1BP, the binding partner of MAD2L1, a candidate gene for RBCC previously discussed; HFE, to which multiple MCH, HB, and HCT associations have been reported (Table 4.5); and FANCE, a Fanconi anemia gene.

93 HFE encodes an MCH class-I type gene that regulates serum iron levels by increasing the dissociation constant of transferrin from the transferrin receptor complex, reducing transferrin-mediated iron uptake within the gastrointestinal tract, and inhibiting iron efflux from monocyte and macrophage cells304,305. rs1800562, which has been associated with multiple erythrocyte traits as previously discussed, corresponds to a cysteine to tyrosine mutation at position 282 (C282Y), the most common mutation underlying the autosomal recessive disease hereditary hemochromatosis.

Hemochromatosis is characterized by systemic iron overload but also anemia of chronic disease due to iron withholding from maturing erythrocytes306.

C282Y is a common HFE variant, with an estimated prevalence of 5-10% among

populations of northern European ancestry and rare among other ethnic groups.307 In the

Amish pedigree, we identified a higher MAF, approximately 16%, which may be due to sampling bias or may reflect a higher prevalence of the C282Y variant among Anabaptist population isolates. Hfe -/- mice are resistant to Salmonella typhimurium septicemia, which may explain a selective pressure resulting in the prevalence of the C282Y among

Caucasian populations308. Healthy cohorts screened for C282Y status and evaluated for

evidence of hemochromatosis or iron overload-related disease demonstrate a range in

the prevalence of clinically significant hemochromatosis from less than 1% to 28%,

estimates that have been disputed as being too low and too high, respectively309,310.

Thus, it is reasonable to hypothesize that C282Y, as well as other common variants within HFE, may also contribute to complex iron-dependent traits, such as the erythrocyte measures MCH and MCHC.

The peak linkage region for MCV-MCH on chromosome 20 maps to 20p12.1-

20p11.23 and contains SEC23B, a COPII vesicle coat component involved in ER-to-

Golgi trafficking. Mutations in SEC23B have been identified in several familial cases of

congenital dyserythropoietic anemia (CDA) type II, a form of anemia resulting from

94 ineffective erythropoiesis which is characterized by phenotypic variability311. Short hairpin RNA (shRNA)-mediated knock-down of sec23b in zebrafish results in aberrant erythropoiesis311. BMP2, which maps to 20p12.3 (telomeric of the 1-LOD linkage interval

reported herein), encodes a bone morphogenetic protein that is part of the transforming

growth factor-beta superfamily. In a study of hemochromatosis penetrance to identify

genetic modifiers, significant association was demonstrated between rs235756 and

ferritin levels; however, the role of the BMP pathway in modulating hemochromatosis

penetrance remains unclear and BMP2 has not to date been implicated in erythropoiesis312. E2F1, which maps to 20q11.22, within the 1-LOD interval reported

herein, encodes a component of cell cycle regulation that is mediated by BMP2, which

suggests that HFE, BMP2, and E2F1 may encode components of a single pathway that affect erythrocyte traits313. XKR7, a Kell blood group complex subunit; EPB41L1, a neuronal variant of EPB41; and HCK (hemopoietic cell kinase), a member of the Src tyrosine kinase family primarily expressed in myeloid and B-lymphoid cell lineages, are also contained within the 1-LOD interval but seem less likely as modulators of erythrocyte traits.

In summary, we characterized a large Amish pedigree and identified several chromosomal regions of linkage to CBC traits. We also demonstrated association between rs1800562 and MCH and MCHC, confirming HFE as a candidate QTL of erythrocyte traits. These newly discovered QTLs are likely to harbor genes that may provide insight into genetic components underlying hematopoiesis and could have important implications for understanding the genetic basis of bleeding and thrombosis.

Furthermore, a more complete elucidation of the variation of CBC traits will allow a more accurate characterization of the association of CBC traits with common diseases.

95 CHAPTER V

DIFFERENTIAL VWF MUTANT MONOMER INCORPORATION DRIVES THE

PHENOTYPE VARIABILITY OBSERVED IN A FAMILY WITH VWD TYPE 22

Abstract

Von Willebrand disease (VWD) is characterized by incomplete penetrance and variable expressivity. We characterized a 24-member pedigree with VWD type 2 due to a novel VWF T>G missense mutation at position c.3911 (c.3911T>G) that predicts a methionine to arginine change (M1304R) in the A1 domain of the von Willebrand factor

(VWF) polypeptide. The c.3911T>G mutation segregated with low VWF levels in an autosomal-dominant manner and resulted in clinical and biochemical phenotypic variability, including different VWF multimer patterns among VWD patients heterozygous for the c.3911T>G mutation. Sequencing of the complete coding regions of VWF in one patient and one unaffected relative did not identify additional mutations or

VWF polymorphisms previously associated with VWF levels or activity. Evaluation of allele-specific mRNA levels by a novel primer extension assay demonstrated consistent mRNA ratios that cluster within unaffected relatives and VWD patients, ruling out a hypomorphic wild-type allele. In vitro expression of the c.3911T>G allele showed decreased VWF secretion and lack of storage granules. When co-expressed with the wild-type (WT) VWF allele, total VWF secretion was also decreased and no storage granules were formed, indicating that the M1304R substitution affects VWF trafficking.

Gel electrophoresis analysis and immunostaining of VWF isolated form cell culture with

2 The Sandy Haberichter laboratory conducted all cell culture data. The in silico modeling of the M1304R mutation was completed by Donald Backos and Gianluca Interlandi. Nanobody binding experiments were conducted by the Jose Lopez laboratory. Jesse Hinckley designed and/or conducted all other experiments presented herein including pedigree characterization, VWF exon 28 and SNP sequencing, allele-specific mRNA quantification, and analysis and description of data provided by others.

96 demonstrated incorporation of the mutant monomer into VWF multimers. A llama nanobody (AU/VWFa-11), which detects the mutated A1 domain, bound with a wide range of variability among affected family members, indicating that VWF multimers contain different amounts of mutant monomers. These findings suggest that the observed variability in VWD phenotypes could in part be defined by the rate of mutant monomer incorporation into the mature VWF multimeric molecule.

Introduction

Von Willebrand disease (VWD) is the most common congenital bleeding disorder and is caused by quantitative or qualitative defects in von Willebrand factor (VWF). The

VWF gene is approximately 178 kilobases (kb) long and maps to chromosome

12p13.328,29,314. Exons 1-17 encode the signal peptide and propeptide and exons 18-52

encode the mature VWF polypeptide monomer. A pseudogene with homology to exons

23-34 of VWF maps to chromosome 22 and contains several nonsense and splice site

mutations, suggesting that it is unable to generate functional transcripts30,33.

VWF is an adhesive glycoprotein that is necessary for the initial attachment of platelets to the blood vessel wall at sites of injury. It is synthesized by endothelial cells and platelets where it is stored in the Weibel-Palade bodies and platelet α-granules, respectively, until released at the site of endothelial damage32,315,316. VWF monomers

homodimerize; undergo several posttranslational modifications in the endoplasmic

reticulum and trans-Golgi network; and form large multimers through polymerization,

ranging in size from 0.5 to 200 million daltons22,317.

Mild and severe quantitative defects of VWF are classified as VWD types 1 and

3, respectively, whereas qualitative defects of VWF are classified as VWD type 245.

VWD type 2 can be the result of abnormal multimerization or dimerization; impaired

secretion of VWF high molecular weight multimers (HMWM); increased ADAMTS13

97 cleavage; or abnormal interaction of VWF binding with platelet receptor glycoprotein

(GP) 1b, collagen, or coagulation factor VIII318. The complexity of VWD is underscored by the variability observed in both the clinical phenotype (mucocutaneous bleeding) and the biochemical phenotype (represented by VWF levels, activity, and multimer pattern) among VWD patients. Though considered a monogenic disease, VWD is characterized by incomplete penetrance and variable expressivity, even in families with a single causal

VWF mutation47,319. Variability may be explained in part by differences in VWF expression or incorporation of the mutant peptide monomer into VWF multimers.

We identified a 24-member family including 11 individuals with VWD caused by a novel VWF mutation in exon 28. Interestingly, affected members of this family exhibit significant variability in VWF levels, multimer distribution, and bleeding severity. We thoroughly investigated the potential genetic and molecular mechanisms involved in this variability and demonstrated differential incorporation of the mutant subunit into plasma

VWF multimers of affected individuals that may explain the observed variability.

Methods

Participants

Twenty-four members of a single Caucasian multi-generational pedigree (11

VWD patients and 13 unaffected relatives) were recruited at the bleeding disorders clinic of the University of Iowa. The study received institutional review board approval from the

University of Iowa and the University of Colorado Anschutz Medical Campus, and informed consent was obtained following institutional guidelines and according to the

Declaration of Helsinki.

98 VWF testing

VWF antigen and ristocetin cofactor levels (VWF:Ag and VWF:RCo, respectively) and factor VIII coagulant activity (FVIII:C) were measured for all recruited pedigree members who provided blood samples. VWF:Ag was measured by standard enzyme- linked immunosorbent assay (ELISA) with a combination of two monoclonal antibodies as previously described146. Briefly, plasma samples were plated into duplicate wells at

three different dilutions per sample. Captured VWF was detected with polyclonal rabbit

antibody with an enzyme conjugate reaction. Agreement between dilutions was

evaluated as a quality control measure. VWF:RCo was measured by automated analysis

of formalin-fixed platelet agglutination on a Dade-Behring BCS analyzer (Dade Behring

Inc. Newark, DE). Data were analyzed using the manufacturer’s end-point based

software package. FVIII:C was determined by one-stage clotting assay in an MLA 900

analyzer using a "lupus-anticoagulant" insensitive activator reagent (Actin FS, Dade

Behring Inc.). Samples were measured at three dilutions and results were compared to a

standard curve. FVIII:C was reported in relation to a population reference range. VWF

multimer analysis was performed in all affected individuals. The VWF multimer pattern

was evaluated by 1.5% agarose gel electrophoresis followed by Western blotting with a

polyclonal VWF antibody, as previously described320.

Bleeding score

As described in Chapter II, to evaluate the bleeding phenotypes of pedigree members, a modified bleeding questionnaire that consisted of 10 questions that were selected on the basis of their positive predictive value was administered at study enrollment. Questions focused on nosebleeds, minor injury (cuts), bruising, gingival bleeding, bleeding after dental and surgical procedures, and transfusion requirements.

99 Questionnaires are graded on a Likert scale with the lowest (no clinical symptoms of bleeding) possible score being 0 and the highest (most severe) score being 7.

Isolation of genomic DNA and VWF sequencing

Genomic DNA was isolated from whole blood using a commercially available kit

(Qiagen, Germantown, MD). Since VWF:RCo and VWF:Ag levels in several pedigree members suggested the diagnosis of VWD type 2, the sequencing strategy for VWF focused on exon 28, which harbors the majority of type 2 mutations. PCR and sequencing primers were designed to discriminate and selectively amplify VWF exon 28

versus its corresponding pseudogene33. PCR was performed on genomic DNA using standard conditions, and Sanger sequencing was performed using BigDye V3.1 on an

ABI3730xl DNA analyzer (Life Technologies, Carlsbad, CA). Analysis was conducted using Sequencher V4.9 (Gene Codes Corporation, Ann Arbor, MI). To rule out the presence of additional mutations in the VWF gene, full-length sequencing of genomic

DNA for VWF, including introns, was performed in one VWD patient and one unaffected relative at the Partners Healthcare Center for Genetics and Genomics (Harvard Medical

School, Boston, MA).

In silico molecular modeling

Molecular modeling studies were conducted using Accelrys Discovery Studio 3.0

(Accelrys Software, Inc., San Diego, CA; http://accelrys.com). The crystal structure coordinates of the human von Willebrand factor fragment (residues 1269-1466) in complex with GP1bα (PDB: 1SQ0) was obtained from the protein data bank

(http://www.pdb.org). Structures of the M1304R mutant were constructed by altering the wild-type (WT) residue to the mutant form. Predicted mutation-induced changes in energy and/or protein conformation were determined by subjecting the WT and mutant

100 structures to the conjugate gradient minimization protocol using a CHARMM forcefield and the Generalized Born implicit solvent model321,322. Predicted protein-protein interactions between the minimized WT and/or mutant fibrinogen fragment D structures were calculated using the ZDock algorithm323. The ZRank scoring function was used to calculate the interaction energies of the resulting predicted protein complexes and represents a combination of van der Waals attractive and repulsive energies, short and long range repulsive and attractive energies, and desolvation324. Similarity of the

predicted protein complexes to the crystal structure (1SQ0) was determined by

calculating and the root mean square deviation (RMSD) using the crystal structure as

the reference. The four poses with the lowest RMSD values (i.e. the highest similarity to

the crystal structure) were selected from each group for comparison.

Molecular dynamics (MD) simulations were performed in order to investigate

possible structural mechanisms by which the M1304R substitution may inhibit the

function of the A1 domain. The simulation with the WT was started from the

crystallographic structure with PDB code 1AUQ. The mutant M1304R was obtained per

homology using the program CHARMM to construct missing atoms from internal

coordinates. The coordinates of the mutated side chain were minimized with 100 steps

of steepest descent325. The MD simulations were performed with the program NAMD using the CHARMM all-hydrogen force field (PARAM22) and the TIP3P model of water326,327. Electrostatic interactions were calculated within a cutoff of 10 Å, while long- range electrostatic effects were taken into account by the Particle Mesh Ewald summation method328. Van der Waals interactions were treated with the use of a switch function starting at 8 Å and turning off at 10 Å. The dynamics were integrated with a time step of 2 fs using SHAKE to rigidly constrain hydrogen atoms. Snapshots were saved every 10 ps for trajectory analysis. One simulation was performed with the WT and one with the M1304R mutant. The free energy perturbation calculations were

101 performed using the so-called "dual-topology" approach, where the initial and final states are defined329,330. MD simulation was performed where the atoms belonging to the initial

state (M1304) slowly disappear while the atoms of the final state (R1304) slowly appear.

For a complete description of MD simulations method, see Appendix J.

Mammalian cell culture and transfection

The human VWF expression vector, pCIneo-VWF-WT, has been previously

described240. The c.3911T>G mutation was introduced into the full-length VWF expression vector to create pCIneo-VWF-M1304R. Human embryonic kidney cells

(HEK293T, D. Ginsburg, University of Michigan, Ann Arbor, MI) and mouse pituitary tumor cells (AtT-20/D16v-F2, CRL 1795, American Type Culture Collection) were

240 cultured at 37°C in an atmosphere of 5% CO2 as previously described . The variant pCIneo-VWF-M1304R was expressed alone or co-expressed with pCIneo-VWF-WT.

AtT-20 and HEK293T cells were transiently transfected as previously described331.

Conditioned media were harvested from the cells 72 hours after transfection and

centrifuged to remove debris. Transfected HEK293T cells were washed twice with PBS

(phosphate buffered saline) and lysed in Reporter Lysis Buffer (Promega, Madison, WI).

VWF:Ag concentration in conditioned media and cell lysate was determined by ELISA.

Confocal immunofluorescence microscopy

The intracellular localization of VWF was examined by immunofluorescence antibody staining and confocal laser scanning microscopy in the Imaging Core of the

Medical College of Wisconsin using a Leica TCS SP2 confocal laser imaging system as previously described240. Transfected HEK293T cells grown on 2-well chamber slides

were fixed using 3.7% (vol/vol) buffered formalin, permeabilized in 1% Triton X-100 (in

20 mM N-2-hydroxyethylpiperazine-N′-2-ethanesulfonic acid, 300 mM sucrose, 50 mM

102 NaCl, 3 mM MgCl2·6 H2O, pH 7.0), and then blocked with 2% normal goat serum in

Hanks balanced salt solution (HBSS). Cells were incubated at 4°C overnight in 2 to 5

μg/mL primary antibodies diluted in HBSS containing 1% BSA. Cells were washed 3

times and then incubated in secondary antibodies diluted 1:1000 in HBSS containing 1%

BSA for 30 minutes at room temperature. Cells were washed 3 times with HBSS and

mounted using Vectashield (Vector Laboratories).

Allele-specific mRNA assay

Allele-specific mRNA ratios were calculated as previously described264. Blood

samples were collected and stored in RNA later per manufacturer recommendations

(Ambion, Austin, TX). Total RNA isolations were performed using Ribopure Isolation kit

(Ambion, Austin, TX), and quality was assessed using the RNA Nano chip on a

Bioanalyzer 2100 (Agilent, Santa Clara, CA). First strand cDNA was synthesized from

200 ng of total RNA using iScript cDNA kit (BioRad, Hercules, CA). Standard locus-

specific PCR using exon 28 primers (5-GAGCCCCACCACTCTGTATG-3 and 5-

TGCCCGCATACTTCACCT-3) was performed. Amplicons were treated with ExoSap-IT

per manufacturer protocol (Affymetrix, Cleveland, OH). Allele-specific primer extension

was performed with 0.15 uM of 6-FAM-labeled probe (5-

TGAAGTGCTGAAGGCCTTTGTGGTGGACAT-3), mixing with Qiagen PCR buffer at 1.5

mM MgCl2 (Qiagen, Valencia, CA), 200 uM nucleotide mixture of dATP, dGTP, dCTP

and ddTTP (GE Amersham, Piscataway, NJ), 2.5 U Taq polymerase (Epicentre

Biotechnologies, Madison, WI), and 2 ul of ExoSap-treated PCR product and amplified

via an initial denaturation at 94°C for 2 minutes; 30 cycles of 94°C for 30 seconds, 60°C

for 30 seconds, and 72°C for 40 seconds; and a final extension of 72°C for 7 minutes.

The resulting extension product(s) are differently sized (33 bp or 42 bp) because of the

c.3911T>G mutation. Products were diluted 1:10 in Hi-Di formamide, separated by

103 capillary electrophoresis on an Applied Biosystems 3100-Avant Genetic Analyzer, and analyzed with GeneMapper 3.5 (Applied Biosystems, Carlsbad, CA). Relative allele- specific mRNA accumulation was determined using the primer extension protocol with the fluorescently labeled probe and minigene plasmid constructs (Integrated DNA

Technologies, Coralville, IA) containing each homozygous allele (T or G) of the c.3911

T>G SNP. Each clone was quantitated with PicoGreen (Invitrogen, Carlsbad, CA) and mixed in specific ratios to generate a 10-point standard curve. All separations were completed in quadruplicate. The standards were fit to a line resulting in an r2 value of

0.996, calculated by linear regression with Microsoft Excel.

AU/VWFa-11 nanobody

The llama nanobody AU/VWFa-11 detects an epitope in the VWF A1 domain that

is exposed when VWF conformation is able to bind GPIbα332. Briefly, the nanobody was

immobilized on an ELISA plate as a capture antibody, and oxidized or nonoxidized VWF

in serial dilutions was then added to the coated wells. Bound VWF was detected by an

HRP-conjugated polyclonal VWF antibody, was plotted against the concentration of

VWF added to the wells, and the slopes of these plots were compared.

Results

We enrolled 11 patients with VWD type 2 and 13 unaffected relatives and evaluated VWF:Ag, VWF:RCo, and FVIII:C (Figure 5.1A, Table 5.1). The mode of transmission in this family was autosomal dominant. The bleeding scores, obtained from the bleeding questionnaire at the time of enrollment, differed significantly among family members with VWD. Interestingly, VWF levels also varied significantly among VWD patients within this family (Table 5.1). VWF multimer analysis showed decreased numbers of HMWM in some individuals (Figure 5.1B). However, other VWD patients in

104

Figure 5.1: Pedigree and VWF multimer analysis A) A three-generation pedigree with VWD type 2 due to the novel c.3911T>G mutation, which results in the M1304R amino acid substitution. Shaded individuals are heterozygous for the c.3911T>G and were diagnosed with VWD type 2 by VWF:Ag and VWF:RCo. B) Electrophoretic gel analysis of VWF multimers by bleeding score. Bleeding scores are labeled along the top x-axis and separated by vertical dashed lines. Pedigree members are labeled along the bottom x-axis. Individual III.4 is an infant and a bleeding survey was not completed for this subject. PNP: pooled normal plasma.

the family exhibited multimer patterns that are almost indistinguishable from the unaffected relatives. The distribution of multimers did not correlate with the bleeding scores. All but two VWD patients had a VWF propeptide (VWFpp)/ VWF:Ag ratio greater than 2, indicating that the mutant confers increased clearance (Table 5.1). Additionally,

VWD patients exhibited decreased binding to collagens type III and VI, likely due to the paucity of HMWM (Table 5.1). Biochemical characterization suggested a qualitative defect in VWF, resulting in classification as VWD type 2.

VWD in this family is caused by a novel c.3911T>G mutation in VWF

Sequencing of exon 28 revealed a novel T>G substitution at position c.3911

(c.3911T>G), which predicts a methionine to arginine change at amino acid position

105 Table 5.1: Phenotypic characterization of family members VWFpp/ VWF:GPIb/ Individual Age Sex VWF:Ag VWF:RCo VWFpp VWF:CBIII VWF:CBVI VWF:GPIb VWF:Ag VWF:Ag FVIII BS I.1 63 F 32 15 72 22 7 22 2.23 0.68 45 7 I.2 71 M 347 201 226 233* 295 215* 0.65 0.62 301 1 I.3 65 F 33 13 75 15 5 12 2.29 0.37 70 5 II.1 41 M 33 17 82 7 16† 12 2.51 0.35 54 3 II.3 40 M 21 3 40 12 9 10 1.91 0.49 42 5 II.4 37 F 50 60 81 36 22† 27† 1.62 0.53 135 1 II.5 38 M 60 16 82 51 17‡ 26 1.37 0.44 70 5 II.6 36 F 102 91 100 112 69† 85 0.97 0.83 129 2 II.7 43 F 20 11 55 11 11 10 2.72 0.47 40 7 II.8 42 M 122 82 127 109 57 111 1.04 0.91 155 1 III.1 20 M 75 91 103 66 51 53 1.38 0.71 83 1 III.2 18 M 10 7 46 <1.6 <1.6 <1.6 4.69 27 4 III.3 14 F 20 14 82 9 10 7 4.13 0.36 40 4 III.4 <1 F 20 5 44 9 8 9 2.27 0.48 42 N/A§ III.5 <1 M 67 46 111 47 31† 35 1.66 0.53 128 N/A§ III.6 8 F 27 9 60 15 16 11 2.20 0.41 47 3 III.7 11 F 34 13 67 16 14 10 2.00 0.31 69 4 III.8 9 M 121 110 147 115 121‡ 121 1.21 1.00 208 1 III.9 10 M 92 121 140 80 60† 83 1.52 0.90 149 1 * Value based on one dilution † % difference of dilutions > 20% ‡ % difference of dilutions 15-20% § Subjects are young children and bleeding surveys were not administered

106

Figure 5.2: Mutation analysis and in silico modeling of M1304R A) Representative chromatograms of the c.3911T>G variant; homozygous WT chromatogram (top) and heterozygous c.3911T>G chromatogram (bottom). B) WT crystal structure of VWF (PDB: 1SQ0) with the location of M1304, the disulfide bonded C1272 and C1458, and the region of interaction with GP1bα indicated. WT (C) and M1304R (D) mutant and the surrounding residues after energy minimization of the respective structures. Predicted hydrogen bond interactions are represented as dashed green lines.

1304 (M1304R) in the A1 domain of the VWF peptide (Figure 5.2A). The c.3911T>G mutation segregated to all VWD patients (defined by VWF:RCo < 20 IU/dL) in the family and was not present in the unaffected relatives. Complete VWF gene sequencing, including introns, in individual I.1 (VWD patient) and individual III.1 (unaffected relative) showed several sequence variants (Appendix K). There were 5 variants in individual I.1 in addition to the c.3911T>G mutation, all of which were previously reported single- nucleotide polymorphisms (SNPs). A total of 21 variants were identified in individual III.1, most of which were intronic variants or known SNPs. Several exonic SNPs found in

107 these two individuals were sequenced in all members of the pedigree to assess their segregation status. No SNP segregated with the c.3911T>G mutation (data not shown).

Mutations in the same region as the c.3911T>G mutation are known to cause VWD type

2B.

The M1304R variant results in defective VWF multimerization, trafficking, and platelet binding

Despite the larger side chain, introduction of the arginine residue in the substituted protein using Accelrys Discovery Studio did not substantially alter the predicted static conformation of the protein structure. The mutant also did not exhibit any energetic differences in the predicted association with GP1bα (Figure 5.2B-D). Molecular dynamics (MD) simulations were performed in order to investigate possible structural reasons why the M1304R substitution inhibits function of the A1 domain in binding to

GP1bα. It is reasonable to speculate that a charged side chain introduced into an otherwise hydrophobic environment, such as the location of M1304 in the WT peptide, might thermodynamically destabilize the protein fold and alter functionality of the A1 domain. We performed 50-ns long, room temperature simulations of the WT and the

M1304R substituted peptide. Plots of the Cα root mean square fluctuations (RMSF)

indicated that the α-helix containing the mutant arginine was more flexible than in the

WT polypeptide (Figure 5.3A). Also, time series of the Cα root mean square deviation

(RMSD) from the crystallographic structure indicated that the α-helix containing the arginine substitution deviated more from the starting configuration than the WT polypeptide and underwent larger fluctuations (Figure 5.3B). Furthermore, visual inspection of the trajectory demonstrated the arginine variant explored a larger space than the methionine residue in the WT polypeptide, indicating that an arginine side chain may kinetically destabilize peptide folding. Free energy perturbation calculations

108

Figure 5.3: Flexibility of the M1304R variant vs. WT peptide in MD simulations A) RMSF of Cα atoms over the last 40 ns of the run with the mutant (M1304R; solid line) and WT (dashed line) peptides. Atoms in secondary structure elements are highlighted through solid circles for the mutant and hatched circles for the WT models. Red circles denote α-helices, green circles denote β-sheets, and violet circles denote 310 helices. The α-helix to which position 1304 maps is highlighted by a blue ellipse. The magenta line indicates RMSF values derived from crystallographic B-factors using the formula 2 √(3/[8*PI ]*8Bi) where Bi is the B-factor of Cα of residue (i). The y-axis was truncated at 2.5 Å in order to visually distinguish differences in the mutation site. The value accompanied by an up-arrow indicates that Cα atoms near the C-terminus have RMSF values up to 10 Å. B) Time series of the Cα RMSD from the initial conformation of the α- helix containing residues 1290 to 1305. Grey and orange indicate the raw values for the M1304R mutant and WT, respectively, whereas black and red represent the respective averages over a 200 ps time window.

109

Figure 5.4: Expression and intracellular localization of recombinant M1304R VWF A) Left: Homozygous expression of WT (black) and M1304R (white) VWF in HEK293T cells. Right: Heterozygous expression of WT-MH tagged VWF with WT VWF (black) or M1304R VWF (white) at a 1:1 ratio in HEK293T cells. The concentration of recombinant VWF in conditioned media (supernatant) and cell lysate was assessed by ELISA and compared to WT VWF. WT VWF was normalized to 100% and M1304R VWF is presented as % of WT VWF. Each bar represents the mean ± standard deviation of at least three separate experiments. B) WT-MH tagged VWF (green) was co-expressed with WT (top, red) or M1304R (bottom, red) VWF at a 1:1 ratio in HEK293T, immunostained, and examined by confocal microscopy. Scale bars represent 10 μm. Top: WT-MH tagged and WT VWF demonstrate localization to Weibel-Palade body-like granules (WPBLG) indicated by punctate staining. Bottom: M1304R VWF demonstrates diffuse staining indicating loss of regulated storage. WT-MH tagged VWF colocalizes with M1304R in a diffuse staining pattern and localizes to WPBLG.

indicated that the M1304R substitution was thermodynamically unfavorable and might lead to unfolding or misfolding of the A1 domain (ΔΔG=41kcal/mol, SEM ±3 kcal/mol).

To assess the impact of the M1304R substitution on VWF synthesis, trafficking, and function, a VWF expression vector containing the c.3911T>G mutation (pCIneo-

VWF-M1304R) was constructed and expressed in HEK293T cells. The mutant VWF

110 vector was expressed alone (homozygous expression) and co-expressed with WT VWF

(pClneo-VWF-WT; heterozygous expression). Homozygous expression of recombinant mutant full-length VWF resulted in an almost normal amount of VWF protein in cell lysates but a severe reduction of secretion into the medium (17% of WT; p<0.001; Figure

5.4A). Confocal images of M1304R VWF co-expressed with WT showed a diffuse fine- granular pattern, whereas the WT control formed defined Weibel Palade body-like storage granules with little diffuse staining (Figure 5.4B). The VWF multimer pattern of the 100% mutant transfection showed a more marked abnormality with high and intermediate multimers being barely detectable on low-resolution gel (Figure 5.5A). The ability of VWF to bind platelets in the presence of either ristocetin or botrocetin was

Figure 5.5: Cell culture VWF multimer and platelet binding analyses A) Multimer analysis by gel electrophoresis demonstrates selective loss of HMWM in 100% mutant expression assays with recovery as percent-WT construct increases. Ristocetin- (B) and botrocetin- (C) induced platelet binding demonstrate the M1304R mutation results in defective GPIbα binding which recovers with increasing percent-WT.

111 significantly impaired with increasing amounts of mutant expression compared to WT

(Figure 5.5B and C). These findings indicate that the degree of mutant monomer incorporation defines not only expression of the mature VWF but also its function.

The phenotypic variability observed is not due to a hypomorphic VWF allele

One potential source of variation in VWF multimers and activity may be due to allele-specific expression of VWF, especially if the WT allele harbor “common” variants that alter VWF levels. Several polymorphisms have been reported that modify VWF levels and may modify VWF activity47,154-156. Thus, differential expression of VWF alleles

may drive phenotypic variability due to differences in the ratio of WT and mutant VWF

polypeptide monomers available for multimerization. In pedigree members with the

c.3911T>G mutation, the median percent WT allele of total VWF mRNA was 76.4%

(range 73.8% to 78.1%). Clustering of WT VWF mRNA levels in VWD patients around

Figure 5.6: Allele-specific VWF mRNA levels Allele-specific mRNA levels are represented as percent T allele (WT allele at base 3911) along the y-axis. Individual pedigree members are represented along the x-axis. All samples were quantitated in quadruplicate, and each bar represents the mean percent WT with one-sided standard deviation bars. Samples are ordered by mean percent wild- WT to demonstrate clustering of unaffected relatives around 95.2% WT and VWD patients around 76.4% WT.

112 76.4% and unaffected relatives around 95.3% indicate there is no significant variation in allele-specific expression of VWF in this pedigree (Figure 5.6).

VWF M1304R multimerization depends on the wild-type/mutant monomer ratio

As observed in Figure 5.5A, the transfection ratios of WT vs. M1304R expression constructs determined the degree of VWF multimerization. With increasing percentage of M1304R substitution construct, there was concomitant decrease in the HMWM with complete loss of HMWM when the M1304R substitution construct was transfected alone.

However, the expression of M1304R at equal concentration with the WT did not appear to grossly affect multimerization. To explore the possibility that M1304R was incorporated in smaller multimeric forms (i.e. dimers, tetramers, etc.), we studied co- expression of the WT and M1304R mutant alleles in a two-color multimer analysis.

Antibodies with binding specificity for VWF or a Myc-His tag were used to visualize multimers generated from co-expression of the M1304R mutant construct and WT VWF-

Myc-His construct (Figure 5.7). Both WT and mutant monomers were readily detected, indicating that the M1304R mutant is incorporated into secreted multimers.

The nanobody AU/VWFa-11 efficiently estimates mutant monomer incorporation in

VWF multimers

We then used the VWF AU/VWFa-11 nanobody to evaluate the amount of VWF mutant peptide incorporated in patient multimer structures. The AU/VWFa-11 nanobody detects an epitope within the A1 domain that is exposed when the A1 domain is activated and has been shown to bind at elevated levels to VWF multimers with causal

VWD type 2B mutations332,333. We found that AU/VWFa-11 also binds at elevated levels

to plasma from patients with the M1304R mutation. We tested multiple VWD patients

with the M1304R variant, and inter-person AU/VWFa-11 binding to plasma VWF was

113

M1304R + WT-VWF mH 1:3 Mock WT-VWF mH + WT-VWF pCIneo 1:1 WT-VWF pCIneo + Mock mH 1:1 WT-VWF pCIneo Mock mH Mock pCIneo M1304R M1304R + WT-VWF mH 3:1 WT-VWF mH WT-VWF mH + Mock pCIneo 1:1 M1304R + WT-VWF1:1 mH

Figure 5.7: Two-color multimer analysis demonstrates M1304R mutant incorporation The M1304R mutant was expressed alone or co-expressed with WT VWF-mH (Myc-His tag) in 3:1, 1:1, and 1:3 ratios. Two antibodies were used to discriminate VWF without a mH tag (mutant or WT; red) from WT VWF-mH (green). Lane 1 demonstrates loss of HMWM due to the M1304R substitution. Lane 2 demonstrates M1304R mutant incorporation into multimers (yellow). As the ratio of M1304R/WT construct decreases, M1304R mutant incorporation also decreases. Lanes 7-12 are control experiments that demonstrate the mH tag does not interfere with the ability of VWF to multimerize.

114

Figure 5.8: AU/VWFa-11 nanobody binding predicts differential mutant monomer incorporation into VWF multimers A) AU/VWFa-11 nanobody binding varies between patients, suggesting differential mutant monomer incorporation into VWF multimers. In most patients, nanobody binding was higher than pooled normal plasma (PNP, denoted by gray dashed line). This indicates the AU/VWFa-11 nanobody binds at elevated levels to VWF with the M1304R mutation. B) AU/VWFa-11 nanobody binding to VWF multimers derived from cell culture of VWF-M1304R expressed alone or co-expressed with VWF-WT at multiple ratios. Notice that with no ristocetin treatment (filled circles), nanobody binding decreases whereas ristocetin-induced nanobody epitope exposure increases with a decreased mutant/WT transfection ratio.

115 heterogeneous (Figure 5.8A). This indicates that incorporation of mutant VWF peptides into multimers was variable.

The antibody ristocetin is a mimetic that simulates the ability of VWF to bind platelet GPIbα (VWF:RCo)334. Ristocetin also results in exposure of the AU/VWFa-11

nanobody epitope. We expressed the VWF-M1304R mutant alone or co-expressed with

the VWF-WT construct at multiple mutant/WT ratios. Analysis of ristocetin-induced

exposure of the AU-VWFa-11 nanobody epitope demonstrated the increase with

ristocetin was much less pronounced in VWF derived from the mutant construct

compared to the WT construct (Figure 5.8B). This suggests that enhanced nanobody

binding in the presence of ristocetin is a characteristic unique to WT monomers in the

VWF multimer. Based on this observation, we devised an estimate of percent

incorporation of mutant subunits into VWF multimers by examining nanobody indices

(the slope of nanobody binding as a function of added VWF) in the absence and

presence of ristocetin in VWF with various compositions of mutant dimers as estimated

by transfections with different ratios of wild-type and mutant DNA. We found that a

mutant/WT transfection ratio of 4:1 was sufficient to facilitate VWF multimerization.

These findings are consistent with the hypothesis that the degree of mutant

incorporation into VWF multimers is an important determinant of the synthetic and

functional impairments observed.

Discussion

We thoroughly characterized the molecular pathogenesis of VWD in a family with

VWD type 2 caused by a novel VWF c.3911T>G mutation that has not been previously reported. Interestingly, several other missense mutations that alter amino acids in the same region cause VWD type 2B. In this family, the platelet count was normal in all

VWD patients and response to low-dose ristocetin was not enhanced, ruling out VWD

116 type 2B. Additionally, recombinant M1304R VWF did not bind platelets in the presence of ristocetin or botrocetin. Expression of a VWF construct containing the c.3911T>G mutation demonstrated that the M1304R substitution resulted in defective VWF multimerization and trafficking. In silico modeling suggested that the arginine amino acid may destabilize an α-helix in the A1-domain, resulting in defective GPIbα binding.

One of the most remarkable features about this family is the phenotypic variability despite a unifying c.3911T>G mutation. Potential explanations include the presence of a hypomorphic WT allele in addition to the c.3911T>G mutant allele335. We sequenced the VWF gene in two individuals of this family, one VWD patient and one unaffected relative. We subsequently sequenced all exonic variants and found that none co-segregated with disease status nor were any variants associated with VWF levels

(data not shown). Allele-specific mRNA ratios ruled out the presence of a hypomorphic

WT allele, as all c.3911T>G/WT ratios were comparable among VWD patients.

Another possible explanation for the observed variability in multimer distribution and ultimately in bleeding patterns is variable incorporation of mutant monomers into

VWF multimers. In our studies of the mutant VWF M1304R peptide in cell culture, we found that the ability of VWF to multimerize greatly depended on the mutant/WT transfection ratio (Figure 5.5A). Interestingly, a 3:1 mutant-to-WT transfection ratio was sufficient to facilitate synthesis of HMWM. At this ratio expression of WT VWF decreased the percentage of protomers composed of two mutant monomers to 56%. Increasing the transfection ratio to 1:3 mutant-to-WT resulted in 6.25% of protomers containing two mutant monomers. Additionally, the VWF nanobody AU/VWFa-11, which preferentially recognizes the activated (exposed) A1 domain, exhibited heterogeneous binding to plasma from VWD patients in this family, suggesting a variable distribution of mutant monomer incorporation into VWF multimers.

117

Figure 5.9: Schematic representation VWF multimerization incorporating WT and 1:1 WT/mutant protomers while excluding mutant promoters

Our results suggest phenotypic variability is driven by the degree of mutant monomer incorporation into VWF multimers (Figure 5.9). In this model, we assume a

VWD patient heterozygous for the causal mutation synthesizes the mutant and WT polypeptides in equal quantities. Dimeric protomers would consist of two WT polypeptides, one WT and one mutant polypeptide, and two mutant polypeptides in a ratio 1:2:1. The protomers containing two mutant polypeptides are either recognized as defective within the endoplasmic reticulum and sequestered for degradation, or evade detection and are trafficked to the Golgi network where they are unable to incorporate into growing multimers. Mutant dimers would subsequently be released into the plasma as C-terminal disulfide-bonded protomers. Multimers would form only from WT dimers

118 and from dimers containing one WT and one mutant monomer. Thus, multimers would consist of an average of two WT monomers to every one mutant monomer.

Recently, Siguret et al. reported that when the VWF c.1458C>G mutation was

expressed alone protomers consisted of only minor quantities of tetramers and

hexamers336. Additionally, Rosenberg et al. reported that co-expression of WT VWF with

a serine487 variant showed primarily WT homodimer formation with only 9% of the

multimers containing heterodimers337. A similar observation was reported by Bodo et al.

with the VWF C1149R mutation338.

In summary, our findings suggest that the rate of mutant monomer incorporation into VWF protomers may underlie the efficiency of multimerization, resulting in the variable phenotype observed with some VWD type 2 mutations. Therefore, some mutations such as M1304R may alter VWF multimer biosynthesis in addition to resulting in a functional deficiency. Further studies are needed to directly quantify the amount of mutant protein incorporated into VWF multimers, which, to date, is limited by the complexity of VWF multimer structure and requires the development or adaptation of new techniques.

119 CHAPTER VI

MUTATIONS IN NBEAL2 ARE THE PUTATIVE CAUSE OF GRAY PLATELET

SYNDROME3

Abstract

Gray platelet syndrome (GPS) is a rare autosomal recessive bleeding disorder characterized by thrombocytopenia with large gray-appearing platelets and paucity of α-

granules. Genome-wide mapping analyses by our group and others mapped the GPS

locus within a 8.73 MB region on chromosome 3p21. We employed homozygosity

mapping in five patients with GPS from two related Native American families and one

patient with GPS of Pakistani origin to redefine localization of the GPS locus to a 1.4 Mb region. RNA isolated from platelets of a GPS patient was subjected to focused next- generation RNA sequencing (RNA-Seq), which revealed abnormal transcript reads, including intron retention, mapping to NBEAL2 (Neurobeachin-like 2). DNA sequencing of genomic NBEAL2 identified compound heterozygosity for a canonical 3’ splice site mutation and a frameshift indel in this patient and a familial nonsense mutation and a frameshift indel in the Native American kindreds, supporting NBEAL2 as the genetic cause of GPS. NBEAL2 encodes a BEACH domain-containing protein predicted to be

3 This chapter contains previously published data as well as data that are not yet published. Figures adapted from publications are reproduced with permissions and will be cited within the figure legend. In this chapter, Jesse Hinckley and Shawn Jobe designed and conducted the initial homozygosity mapping experiments that originally mapped the GPS locus to chromosome 3p21 and defined an initial interval of 8.73 Mb. Jesse Hinckley conducted microsatellite genotyping to confirm the GPS locus by linkage analysis. Shay Fabbro and Jesse Hinckley conducted fine-mapping analysis of the Affymetrix 6.0 chip experiment to narrow the region of interest. Kai Wang performed the statistical genetics analysis. Jesse Hinckley analyzed genes in the interval to prioritize candidate genes for sequencing, sequenced NBEAL2 in all patients and controls identifying, and verifying the putative mutations. Andrew Weyrich’s laboratory performed the RNA-Seq experiments and provided results for analysis. John Patterson provided the evolutionary genetics analysis. James G. White provided the platelet electron microscopy images.

120 involved in vesicular trafficking, NBEAL2 arose through a recent gene duplication event in the vertebrate lineage and appears to be uniquely associated with platelet -granule development.

Introduction

Thrombocytopenia is a common clinical presentation. Thrombocytopenias are classified as microthrombocytic, normothrombocytic, or macrothrombocytic. In fewer than 5% of those with a chief complaint of thrombocytopenia is the finding congenital339.

X-linked, autosomal dominant, and autosomal recessive inheritance patterns have been reported, as well as multisystem thrombocytopenic syndromes as part of chromosomal disorders.

Gray platelet syndrome (GPS; OMIM: 13090) is a pure α-granule storage

disorder characterized by large gray-appearing platelets when evaluated by stained

peripheral blood smear340-344. Although familial clusters of GPS patients have been

reported that are consistent with X-linked or autosomal dominant inheritance, most GPS

patients and families autosomal recessive inheritance345-347. GPS is classified as a

macrothrombocytopenia, though the mechanism of the thrombocytopenia is unclear.

Patients with GPS have thrombocytopenia to varying degrees, mild to moderate

bleeding, and splenomegaly342,348-350. Later in life, GPS patients also develop progressive

bone marrow fibrosis due to abnormal reticulin deposition, thought to result from several

of the proteins, including growth factors, normally stored in α-granules being “leaked” by

megakaryocytes into the extracellular space340,342,351-353.

Ultra-structural studies of GPS platelets show selective depletion of -granules and normal formation of -granules, lysosomes, mitochondria and peroxisomes by bone marrow-resident megakaryocytes, confirming GPS is a pure α-granule storage

121 disorder341,342,344,354,355. To date, very little is known about α-granule biogenesis. Hence,

though a rare disorder, the identification of the genetic cause of GPS will provide new

insight into platelet physiology. The main objective of the study presented here is the

identification of the gene involved in GPS.

Methods

Research subjects

The study population consisted of two Native American kindreds and one

Pakistani kindred, totaling six GPS patients, seven parents, and five control siblings.

The mother of family 2 is from the same tribe as family 1 and the father is of Mexican ethnicity. The parents of family 3 are of Pakistani ethnicity and are not consanguineous.

Informed consent was obtained from all participants according to institutional guidelines of the University of Iowa, University of Colorado, and the Research Ethics Board at The

Hospital for Sick Children in Toronto. DNA was extracted from blood using the Gentra

Puregene or QIAmp Blood Kit (Qiagen, Valencia, CA).

Transmission electron microscopy

Platelets were isolated using established methods356. Platelet-rich plasma was

centrifuged at 800x g for 10 minutes and fixed with 2.5% glutaraldehyde in PBS pH 7.4

at 4°C for 1 hour. Supernatants were discarded and platelets were washed with 0.1 M

phosphate buffer (pH 7.4), followed by dH2O. Platelets were then post-fixed with 2% osmium tetroxide, dehydrated and embedded in Epon. Thin sections were examined with a JEOL JEM-1011 electron microscope after uranyl acetate and lead citrate staining. Digital images were captured with a side mounted Advantage HR CCD camera

(Advanced Microscopy Techniques (AMT) Danvers, MA) saved in TIFF format, and trimmed for publication using Adobe Photoshop (CS3).

122 Homozygosity mapping and linkage analysis

DNAs from 3 affected GPS patients, 2 parents, and 1 control sibling from family 1 were interrogated using the Affymetrix GeneChip Mapping 50K assay. The data were analyzed using Excel (Microsoft). Short-tandem repeat polymorphic (STRP) markers were selected corresponding to the candidate intervals, including 14 STRP markers in the 3p21 region, and genotyped according to manufacturer’s protocol (Applied

Biosystems, Life Technologies). Two-point and multipoint non-parametric linkage analyses were performed using GeneHunter (http://linkage.rockefeller.edu/soft/gh/)357.

Families 1 and 2 were analyzed individually and a combined logarithm of odds (LOD)

score is reported herein.

DNAs from each of eleven unaffected and five affected individuals were

interrogated on the Genome-Wide Human SNP Array 6.0 (Affymetrix), which contains

906,600 SNPs, using the Microarray Core facilities at the University of Colorado Denver

as previously described 358. The data were analyzed using the Genotyping Console

Software (Affymetrix). Homozygosity and copy number variation (CNV) analyses were performed on all samples using default parameters.

Candidate gene sequencing and mutational analysis

Candidate gene primers were designed covering all exons and exon/intron boundaries of the genomic sequence using Primer3 (see Appendix K). PCR was performed on genomic DNA and Sanger sequencing was performed using BigDye V3.1 on an ABI3730xl DNA analyzer247. Sequence analysis was conducted using Sequencher

V4.9 and Mutation Surveyor V3.97. All mutations identified in NBEAL2 were

resequenced on both DNA strands.

123 Next generation RNA-sequencing

Platelets were isolated from healthy volunteers and the Pakistani subject with

GPS using established methods and protocols approved by the University of Utah and the Research Ethics Board at The Hospital for Sick Children in Toronto (Appendix

L)356,359. Leukocytes were removed by CD45+ selection followed by RNA isolation359.

The integrity of the RNA was evaluated on an Agilent BioAnalyzer to ensure that each sample had an RNA integrity number (RIN) greater than 7.0. Poly(A)-tailed RNA was subsequently prepared and used to create libraries for deep sequencing. Samples were sequenced for 50 cycles on an Illumina GAIIx sequencer and sequence reads were processed and aligned as previously described359. For visualization of actual read

sequences in IGB, reads were re-aligned with Bowtie and processed into BAM format

using Samtools360,361.

Protein sequence data collection and analysis

Protein sequences for the NBEAL-LYST family (NBEA, NBEAL1, NBEAL2 and

LYST) were retrieved by the combination of two methods: Ensembl and PhyloPro searches362,363. Orthologous protein sequences were retrieved from the Ensembl database (version 62)362. These orthologs had been identified previously by performing a

genome-wide reciprocal WUBlastp+SmithWaterman search of each gene across all

completed genomes. Additional orthologous protein sequences were retrieved from

PhyloPro. Representative sequences were selected for further analysis from the

Vertebrata (H. sapiens, P. troglodytes, M. mulatta, M. muscuus, B. Taurus, G. gallus, M.

domestica, X. tropicalis, T. rubripes and D. rerio), other Chordata (C. savignyi and C.

intestinalis) and other eukaryotic species (D. melanogaster, C. elegans, N. vectensis, M.

brevicollis and C. albicans). Pfam searches were performed to identify known domain

architecture for each representative protein sequence364.

124 Multiple sequence alignment

All representative protein sequences were aligned in MUSCLE v3.6 using default parameter settings365. In addition, NBEAL2 and its non-vertebrate orthologs (NBEAL2

gene family) were also aligned as above. Protein multiple sequence alignments (MSAs)

were manually edited by removing ambiguous regions from the alignment using the

sequence alignment editor, Se-Al 2.0a11 (http://tree.bio.ed.ac.uk/software/seal/).

Phylogeny reconstruction

The model for amino acid substitution to be used for the NBEAL-LYST and

NBEAL2 gene families MSAs were determined using ModelGenerator v0.85366. For the

NBEAL-LYST family a neighbor joining (NJ) phylogeny was generated using MEGA5 under the predetermined amino acid substitution model, JTT + G, and 1000 bootstrap replicates367. A maximum likelihood (ML) phylogeny of the NBEAL2 gene family was generated using PhyML with 1,000 bootstrap replicates368. The selected substitution

model was JTT with invariable sites and a discrete gamma model of rate heterogeneity.

Results

The study cohort included two nuclear families from a Native American settlement (families 1 and 2) and a male patient of Pakistani origin and his parents

(family 3; Figure 6.1A). The Native American families are part of the same settlement with limited outbreeding, suggesting the possibility of founder effect resulting in homozygosity for a causal familial mutation in the patients. The father of family 2 is of

Mexican origin, which suggests compound heterozygosity in this family. The Pakistani family is not consanguineous. Platelet counts in patients ranged from 34-84 x109 platelets/L (normal: 150-400 x109 platelets/L; Table 6.1). By transmission electron

microscopy (EM), platelets showed complete absence of -granules (Figure 6.2).

125 GPS locus maps to chromosome 3p21

Homozygosity mapping is a proven method to elucidate candidate genes underlying Mendelian disorders in consanguineous families369,370. Due to founder effect among Native American tribes, we hypothesized GPS patients in family 1 would be homozygous for the familial mutation and the GPS patients in family 2 would be compound heterozygotes for the familial mutation. We therefore genotyped individuals

II.11, III.2, III.6, III.87, III.9, and IV.5 of family 1 using the Affymetrix Gene Chip Mapping

50K assay. Analysis of homozygosity among individuals III.6, III.7, and IV.5 while the control parents II.1 and III.2 were heterozygous and control sibling III.9 was heterozygous or homozygous for the major allele identified an 8.73 Mb region on chromosome 3p21 as the candidate GPS locus.

Figure 6.1: GPS kindreds and NBEAL2 mutations A) The study population consists of three kindreds. Family 1 is a Native American family. The mother of family 2 is of the same tribe as family one and the father is of Mexican ethnicity. Family 3 is a nonconsanguenious Pakistani family. B) NBEAL2 is a 30 kb gene and consists of 54 exons. C) Four mutations were identified in NBEAL2, including c.1029+1G>A (Pakistani), c.1820G>A (W607X; Native American), c.4371_4375dupCGTGG (Pakistani), and c.5413dupG (Mexican). This material is adapted from Kahr and Hinckley et al. with permission of the Nature Publishing Group371.

126 Table 6.1: Three kindreds with Gray Platelet Syndrome. Subject Ethnicity* Gender PLT (K/μl) Hb (g/dL) WBCC (K/μl) 1 (III-1) NA M 188 11.4 6.1 1 (III-2) NA F 298 11.7 7.4 1 (IV-1) NA F 345 13.4 5.9 1 (IV-2) NA F 353 13.3 10.3 1 (IV-4) NA F 360 10.7 7.2 1 (IV-5) NA F 83 11.2 4.5 1 (II-10) NA M 207 11.8 2.1 1 (II-11) NA F 255 13.1 6.8 1 (III-6) NA F 58 13.1 5.2 1 (III-7) NA F 34 13.2 4.5 1 (III-9) NA F 242 13.8 6.5 2 (I-1) MEX M 247 15.1 4.5 2 (I-2) NA F 311 13 8.6 2 (II-1) MEX/NA M 407 14.8 8.1 2 (II-2) MEX/NA M 84 11.8 7.4 2 (II-3) MEX/NA F 68 12.6 4.8 3 (I-1) PAK M 262 17.2 6.5 3 (I-2) PAK F 168 12.9 7.4 3 (II-1) PAK M 82 12.6 9.1 * NA: Native American; MEX: Mexican; PAK: Pakistani

Figure 6.2: GPS blood smear and platelet transmission electron microscopy A) Blood smear demonstrated macrocytic thrombocytopenia in a GPS patient (top left) compared to a unaffected relative (top right). B) Platelet transmission electron microscopy revealed platelets with a paucity of α-granules that are enlarged in the GPS patient (bottom left) compared to control (bottom right). This research was originally published in Blood. Fabbro, Kahr, and Hinckley et al. Homozygosity mapping with SNP arrays confirms 3p21 as a recessive locus for gray platelet syndrome and narrows the interval significantly. Blood. 2011;117:3430-3434. © the American Society of Hematology372.

127

Figure 6.3: Linkage established to chromosome 3p21 Focal linkage analysis of gray platelet syndrome established a maximum LOD score of 2.7 at 70.6 cM (chromosome 3p21). This research was originally published in Blood. Fabbro, Kahr, and Hinckley et al. Homozygosity mapping with SNP arrays confirms 3p21 as a recessive locus for gray platelet syndrome and narrows the interval significantly. Blood. 2011;117:3430-3434. © the American Society of Hematology372.

Though the mother of family 2 is from the same tribe as family 1, we were unable

to establish a direct relationship between the families. To validate 3p21 as the GPS

locus, we genotyped 14 STRP microsatellite markers that map to the 8.73 Mb candidate

interval. Linkage to 3p21 was evaluated and a cumulative LOD score of 2.7 was

established to markers D3S3615, D3S648, and D3S1578 (70.6 cM) for GPS (Figure

6.3). Our findings are consistent with a 9.4 Mb GPS locus reported by Gunay-Aygun et

al., published after our initial mapping studies348.

To narrow the candidate interval, we genotyped individuals I-10, I-11, II-3, II-4, II-

9, II-11, III-2, III-3, III-5, and III-6 from family 1; all individuals of family 2, and individual

II-1 from family 3 using the Affymetrix 6.0 chip and identified 3 regions of homozygosity shared by the GPS patients of each family: 1q25.1 (171.8-172.3 Mb), 3p21 (48.4-50.1

Mb), and a large centromeric region on chromosome 16. Based on our previous GPS homozygosity mapping and linkage results, we further investigated 3p21 as the locus

128

Figure 6.4: High-density homozygosity mapping narrows the chromosome 3p21 interval to 1.4 Mb Due to founder effect, we hypothesized family 1 GPS patients would be homozygous for a familial mutation whereas family 2 GPS patients would be heterozygous for the familial mutation. Homozygosity mapping using the Affymetrix Genome-wide Human SNP Array 6.0 identified a 1.4 Mb interval (47.0-48.4 Mb) where family 1 patients II.9 and III.6 were homozygous and family 2 patients II.2 and II.3 were heterozygous. This research was originally published in Blood. Fabbro, Kahr, and Hinckley et al. Homozygosity mapping with SNP arrays confirms 3p21 as a recessive locus for gray platelet syndrome and narrows the interval significantly. Blood. 2011;117:3430-3434. © the American Society of Hematology.372

129 and identified a 1.4 Mb region preceding the homozygosity block where GPS patients II-

9 and III-6 of family 1 were homozygous and GPS patients II-2 and II-3 of family 2 were heterozygous (Figure 6.4).

Systematic review of candidate genes and coding region sequencing conducted by our lab and others did not identify any homozygous or compound heterozygous mutations. Gunay-Aygun et al. employed a nonbiased genome sequencing approach encompassing 69% of the coding region of 165 protein-coding genes within the 9.4 Mb interval which also did not identify any candidate mutations348.

NBEAL2 identified as putative GPS gene

While platelets are anucleate cell fragments derived from megakaryocytes, they retain a diverse transcriptome360. To identify the putative GPS gene, we performed next-

generation genome-wide RNA sequencing (RNA-Seq) of platelet mRNA isolated from

individual II-1 of family 3. We focused our analysis strategy on transcripts corresponding

to genes within the 3p21 candidate interval. NBEAL2 transcripts from the GPS patient

demonstrated intronic retention, suggesting abnormal mRNA splicing (Figure 6.5A).

Sequencing of genomic DNA of individual II-1 identified a heterozygous c.1029+1G>A

mutation resulting in disruption of the canonical 3’ splice site of exon 9, which predicts

intron 9 would be abnormally retained within NBEAL2 transcripts derived from this allele

(Figure 6.1C). Assuming intron 9 retention, three in-frame premature stop codons are

introduced within 80 nucleotides of the putative splice site. The c.1029+1G>A mutation

and intron 9 retention was confirmed by RNA-Seq analysis (Figure 6.5B).

Subsequent sequencing of NBEAL2 coding sequence and exon/intron

boundaries identified compound heterozygosity for a five-base pair duplication in exon

28 of bases 4,371-4,375 (c.4371_4375dupCGTGG; Figure 6.1C). Sequence analysis in

130

Figure 6.5: RNA-Seq reveals intron 9 retention in NBEAL2 Platelets were isolated from individual II.1 of family 3 and an unrelated control and mRNA was extracted. A) RNA-Seq revealed intron 9 retention in NBEAL2 suggesting abnormal splicing. B) RNA-Seq sequence analysis confirms the c.1029+1G>A mutation disrupting the canonical exon 9 3’ splice site. This material is adapted from Kahr and Hinckley et al. with permission of the Nature Publishing Group371.

individuals III-6, III-7, and IV-5 of family 1 identified the familial mutation as a homozygous G>A substitution at position 1820 (c.1820G>A), resulting in a premature stop codon (W607X; Figure 6.1C). An additional 55 unaffected relatives and 2 GPS patients where both parents are confirmed members of the Native American tribe were available for sequencing. Within this sample population, the c.1820G>A minor allele

131 frequency is 0.158 among tribe members evaluated with 100% penetrance homozygous individuals. In family 2, we identified single base pair duplication at position in 5413

(c.5413dupG) for which the father and his three children are heterozygous (Figure

6.1C). Position 5413 maps to exon 33, and a one-base duplication results in a frameshift that predicts the introduction of a premature TGA stop codon 172 bases downstream

(bases 5,586-5,588).

Figure 6.6: RNA-Seq reveals reduced NBEAL2 mRNA levels in GPS patients Platelets were isolated from family 1 patients III.7 and IV.5, obligate carrier III.1, and a control individual and mRNA was extracted. A) RNA-Seq analysis reveals the c.1820G>A mutation results in reduced NBEAL2 transcript levels. B). Relative quantity of NBEAL2 mRNA transcripts.

132 Following identification of the putative disease gene, we isolated platelet mRNA from individuals III.1, III.7 and IV.5 from family 1 and individual II.1 from family 3 and conducted quantitative RNA-Seq analysis of the platelet transcriptome. NBEAL2 transcript levels were significantly lower (p=<0.05) in GPS patients III.7 and IV.5 than in individual III.1 (heterozygote) and a control individual (Figure 6.6). These findings suggest the familial c.1820G>A mutation results in nonsense-mediated decay.

Homozygosity for NBEAL2 mutations alters the platelet transcriptome

We also evaluated the RNA heat maps of the platelet transcriptome of GPS patients III.7 and IV.5 of family 1 and compared them between each other, to obligate carrier III.1 of family 1, and to an unrelated healthy control (Figure 6.7). Interestingly, we found a profound alteration of the GPS platelet transcriptome. GPS platelets exhibit a unique molecular signature with several differentially expressed transcripts when compared to normal platelets. Of the top 20 differentially expressed transcripts that were common to individuals III.7 and IV.5 (Table 6.2), some genes are over or under expressed by as much as 90-fold and 558-fold, respectively. Specifically, we found that

ALAS2 and SELENBP1 were overexpressed by 94- and 61-fold respectively. Genes from the SLC superfamily such as SLC4A1 and SLC6A8, and SLC25A39 were also

Figure 6.7: NBEAL2 mutations alter the platelet transcriptome heat map Platelets were isolated from family 1 patients III.7 and IV.5, family 1 obligate carrier III.1, and an unrelated control and platelet mRNA was extracted. The platelet transcriptome was evaluated by RNA-Seq and quantitated by reads per kilobase per million. The platelet transcriptome heat map revealed similar patterns of alteration between GPS patients III.7 and IV.5.

133 Table 6.2: Platelet transcriptome top 20 fold changes Control Gene GPS RPKM* RPKM* Δ Fold ALAS2 400.43 4.25 94.28 ↑ CA1 122.02 1.87 65.40 ↑ SELENBP1 49.20 0.82 59.87 ↑ SLC4A1 51.45 1.14 45.16 ↑ AHSP 103.51 2.42 42.71 ↑ EPB42 19.76 0.48 40.99 ↑ IFIT1B 5.44 0.13 40.64 ↑ SLC6A8 3.86 0.14 27.99 ↑ HBE1 104.62 3.74 27.99 ↑ MYL4 25.78 1.19 21.72 ↑ HBM 86.41 3.99 21.63 ↑ HBA1 74993.67 3982.35 18.83 ↑ HBA2 81754.06 4522.02 18.08 ↑ SLC25A39 242.24 14.35 16.88 ↑ FGL2 0.13 7.56 58.21 ↓ VCAN 0.0706 5.84 82.77 ↓ MPEG1 0.10 9.14 89.65 ↓ FGD2 0.0610 6.92 113.44 ↓ RPS4Y1 0.0254 7.32 287.52 ↓ EIF1AY 0.0166 9.27 557.91 ↓ * Reads per kilobase per million

overexpressed by 45-, 28-, and 17-fold, respectively. SLC genes encode solute carrier proteins including passive transporters and ion-coupled cotransporters and exchangers373. The SLC superfamily is subdivided into 43 families that consist of 298 genes. Families are defined by proteins that have at least 20-25% sequence homology.

The SLC4 family encodes bicarbonate transporter proteins important in cellular pH homeostasis, the SLC6 family encodes sodium- and chloride-dependent neurotransmitter transporters, and the SLC25 family encodes mitochondrial carrier proteins.

134 Discussion

Nonsense mutations as well as frameshift mutations and splice site mutations are known to have deleterious effects on gene expression often constituting a null allele as a result of nonsense-mediated mRNA decay and other regulatory mechanisms374. We

report a familial nonsense mutation (c.1820G>A) as well as a canonical splice site

mutation (c.1029+1G>A) and 2 exonic duplications (c.4371_4375dupCGTGG and

c.5413dupG), strongly suggesting that NBEAL2 mutations are the putative cause of

autosomal recessive GPS. Platelet transcriptome analysis confirmed the c.1820G>A

mutation results in decreased NBEAL2 mRNA levels, suggesting that the mutant

transcript is subjected to nonsense-mediated decay. Heat map analysis of the platelet

transcriptome also identified patterns of alteration in GPS patients.

Additionally, mutations in NBEAL2 were reported as the cause of GPS by Albers

et al. and Gunay-Aygun et al., whose articles were published in series with our own

work371,375,376. Gunay-Aygun et al. demonstrated NBEAL2 mRNA is present in platelets

and cultured megakaryocytes376. NBEAL2 protein fragments were also identified by

mass spectrometry in platelets isolated from control individuals. Albers et al. reported that silencing of nbeal2 by injection of antisense morpholino oligonucleotides in zebrafish embryos demonstrated complete loss of thrombocytes while erythropoiesis was unaffected375. Furthermore, 41% of nbeal2-depleted embryos developed spontaneous

caudal bleeds whereas spontaneous bleeding was not observed in any control embryos.

NBEAL2 encodes a 2,574 amino acid polypeptide that shows homology to the

protein product of LYST. Mutations in LYST underlie Chediak-Higashi syndrome (CHS;

OMIM: 214500), a rare autosomal recessive disorder characterized by bleeding due to

depletion of platelet dense granules377-381. LYST and NBEAL2 gene products contain a concavalin A-like lectin domain, a pleckstrin homology (PH) domain predicted to function in recruitment of proteins to cellular membranes, a Beige and CHS (BEACH) domain,

135 and numerous WD40 repeats382-385. BEACH proteins are implicated in vesicular trafficking by loss of platelet dense granules as well as abnormally large cytoplasmic granules in other cells in CHS386,387. BEACH proteins may also affect trafficking of endosomes, lysosomes, and lysosome-related organelles388. LYST orthologs and other

BEACH proteins have been reported to participate in protein sorting, lysosome-

endosome fusion, and Rab11-dependent trafficking389-391.

Phylogenetic analysis conducted by John Parkinson revealed that NBEAL2 and its close homolog NBEAL1 are members of a family of BEACH domain-containing proteins related to neurobeachin (expressed in neurons) and distinct from LYST (Figure

6.8)371. During the emergence of metazoa the ancestral NBEAL gene underwent duplication, producing the neurobeachin and neurobeachin-like gene subfamilies. This event appears to have followed the divergence of protostomes from deuterostomes. A more recent duplication within the NBEAL lineage led to the formation of the NBEAL1 and NBEAL2 subfamilies. A search of the cephalochordate Amphioxus genome identified only a single gene of the NBEAL1/NBEAL2 subfamily, suggesting the latter duplication is unique to the craniate (vertebrate) lineage.

Our findings strongly suggest NBEAL2 mutations underlie GPS, identifying

NBEAL2 as a key component required for platelet α-granule biogenesis. Phylogenetic analysis reported herein also indicates duplication of an ancestral cellular trafficking gene may have resulted in neofunctionalization of NBEAL2, accounting for the tissue- specific phenotype of GPS. These results provide evidence of the first known gene in platelet α-granule biogenesis, which provides an anchor point for future studies to elucidate a previously unsolved pathway.

136

Figure 6.8: Phylogenetic analysis of NBEAL protein family members The maximum likelihood tree shows phylogenetic relationships between LYST and NBEAL family PH-BEACH domains. Numbers at nodes indicate bootstrap support for 1000 samplings. From this tree it is apparent that NBEAL1 and NBEAL2 arose from a unique gene duplication event in the early vertebrate lineage. A search of the genome of the cephalochordate Amphioxus identified a single gene of the NBEAL1/NBEAL2 subfamily (data not shown), suggesting the duplication is unique to the craniate (vertebrate) lineage and that LRBA and NBEA diverged prior to the emergence of Vertebrata. An initial neighbor joining tree of the complete NBEAL-LYST multiple sequence alignment produced the same overall topology. There were no topological changes to the overall PH-BEACH phylogeny on testing for the presence of long branch attraction, indicating that the analysis is not affected by this potential artifact (data not shown).This material is reproduced from Kahr and Hinckley et al. with permission of the Nature Publishing Group371.

137 CHAPTER VII

AN ANKRD26 5’-UTR MUTATION IN LINKAGE DISEQUILIBRIUM WITH A MASTL

MUTATION IN A FAMILY WITH AUTOSOMAL DOMINANT THROMBOCYTOPENIA

Abstract

Autosomal dominant thrombocytopenia, THC2 is a familial nonsyndromic normocytic thrombocytopenia that demonstrates linkage to the THC2 locus on chromosome 10p11.2-12. Previous reports suggest causal mutations in ACBD5,

MASTL, and ANKRD26. To date, six mutations in a 22-base pair segment of the 5’- untranslated region (UTR) of ANKRD26 have been reported in 21 families. We report a new family with nonsyndromic thrombocytopenia inherited in an autosomal dominant pattern. Genome-wide linkage analysis was significant only for linkage to the THC2

locus on chromosome 10 (LOD=3.29). Sequence analysis revealed all family members

with thrombocytopenia were heterozygous for the previously reported MASTL c.506G>C variant as well as a novel ANKRD26 c.-126T>C variant that maps to the previously implicated 5’-UTR segment. Haplotype analysis established both variants are in linkage disequilibrium within the family, which is consistent with the extensive long-range linkage disequilibrium present in the CEU HapMap dataset. We also identified a non-penetrant

ANKRD26 c.-113A>C variant previously reported as causal. Sequence analysis of two patients from the original MASTL kindred revealed they are also heterozygous for the

ANKRD26 c.-126T>C variant. We also report a third unrelated kindred heterozygous for the ANKRD26 c.-126T>C variant in which no mutations in MASTL were present.

Expression analysis by quantitative PCR demonstrated ANKRD26 is a widely expressed gene with low expression in hematopoietic tissues. Modeling of the ANKRD26 5’-UTR variants predicted the c.-126T>C and c.-134G>A (most common variant published to date) variants alter a stem structure within a very stable UTR structure. This results in

138 the introduction of a terminal stem loop. The c.-113A>C variant is not predicted to alter the stem structure, which is consistent with a nonpenetrant variant. Stem loop structures have previously been characterized as protein binding sites, suggesting that ANKRD26

5’-UTR mutations that introduce this stem loop structure may alter protein binding resulting in a gain-of-function variant that causes nonsyndromic thrombocytopenia.

Introduction

Congenital platelet disorders are a heterogeneous array of diseases classified as thrombocytopenias with otherwise normal platelet function and thrombocytopenias with thrombocytopathy193-200. Thrombocytopenia may be the only laboratory finding or may

present as part of a syndrome such as Chediak-Higashi syndrome, Fanconi anemia,

Hermansky Pudlak syndrome, and Wiskott-Aldrich syndrome. Thrombocytopathies may

also result from mutations in proteins that interact with platelets, including VWF54.

Thrombocytopenias are caused by mutations in several genes that encode components

of platelet structure and function200. Reported inheritance patterns include autosomal dominant and recessive and X-linked recessive. By platelet volume, nonsyndromic thrombocytopenias are classified as normocytic, macrocytic, and microcytic.

Nonsyndromic autosomal dominant thrombocytopenias are typically moderate thrombocytopenias characterized by normocytic platelets with no detectable alterations of platelet morphology, normal megakaryocyte physiology, and a mild bleeding diathesis392-395. Linkage was established between autosomal dominant

thrombocytopenia and chromosome 10p11.2-12 (THC2, OMIM: 188000) in two large

unrelated Caucasian kindreds from Italy and the United States201,395. Candidate

mutations were subsequently reported in ACBD5, encoding an acyl-CoA binding domain containing gene, and in MASTL (MIM: 608221), encoding a microtubule-associated serine/threonine kinase-like protein, respectively396,397. Further screening for MASTL and

139 ACBD5 mutations was negative in four Italian pedigrees in which linkage to the THC2 locus was established398. Subsequently, DNA sequence analysis of 32 genes identified six mutations in a conserved 19-base pair (bp) region of the 5’-untranslated region

(UTR) of ANKRD26 (MIM: 610855) in nine Italian families, including the original kindred with the putative ACBD5 mutation398. Further sequence analysis of ANKRD26 exon 1

(encodes the 5’-UTR) identified a total of 12 candidate mutations confined to a 22-bp

region of the 5’-UTR in 21 of 210 families with congenital thrombocytopenias of unknown

cause399.

Here we present a new kindred with nonsyndromic autosomal dominant thrombocytopenia in which linkage was established to the THC2 locus. Whole genome sequence analysis identified the previously reported candidate mutation in MASTL as well as a novel variant in the purported 22 bp region of ANKRD26. No other variants on chromosome 10 co-segregated among affected individuals. Subsequent haplotype analysis determined the MASTL and ANKRD26 variants were in linkage disequilibrium within the pedigree. Of note, a phenotypically normal founder who married into the kindred was heterozygous for an ANKRD26 variant previously reported as causal. We also report identification of an ANKRD26 5’-UTR mutation in the original MASTL kindred.

Our findings suggest ANKRD26 is the most likely cause of nonsyndromic autosomal dominant thrombocytopenia, THC2.

Methods

Subjects

The study population consists of three kindreds. The first kindred is a four- generation pedigree, of whom 17 individuals have been diagnosed with nonsyndromic autosomal dominant thrombocytopenia. A total of 24 family members provided blood samples for DNA extraction. The second kindred consists of two cousins who are part of

140 the original MASTL pedigree reported by Drachman et al201,396. No known relationship was established between the first and second kindreds. The third kindred consists of a mother and two children diagnosed with nonsyndromic thrombocytopenia whose status was confirmed by platelet count at the time of enrollment. All patients and family members were enrolled according to institutional guidelines at the University of Iowa, the

University of Colorado, and the Puget Sound Blood Center. DNA was extracted from blood using the Gentra Puregene Kit (Qiagen, Valencia, CA).

Linkage analysis

In the first kindred, DNA from 24 pedigree members was genotyped using the

Affymetrix Gene Chip Mapping 50K assay. Multipoint linkage analysis was conducted using GeneHunter (v2.1r5) as implemented in easyLINKAGE (v5.08)400. Significance of linkage results are reported as a logarithm of odds (LOD) score. Linkage results from the second kindred were previously published201. Linkage was not assessed in the third kindred.

Whole genome sequencing and in silico modeling

Ten samples were submitted to Illumina for whole genome sequencing using the

Genome Analyzer IIx platform. Though a candidate interval was identified by linkage, whole genome sequencing was the most cost-effective technique to sequence the entire

THC2 locus. Reads were mapped to the reference sequence

(GRCh37/hg19) with large-scale alignment software (ELAND). Sequence calls for variants (single-nucleotide polymorphisms, insertions, and deletions) were performed using the Broad’s Genome Analysis Toolkit (GATK). The resulting variants were input to the program ANNOVAR to annotate the variants by cross referencing various genetic variation databases (e.g., dbSNP, the 1,000 Genomes Project, AVSIFT,

141 etc.) to extract information about variant frequencies and location within genes if previously reported 401. For non-synonymous changes, variants were cross-referenced to the dbSNP database to determine whether the changes to the protein would be considered tolerable or damaging, using prediction scores from SIFT, Polyphen2, LRT, and Mutation Taster, and conservation scores from PhyloP and GERP++. The

ANKRD26 5’-UTR structure and variant structures were modeled using mfold402.

Candidate mutation and THC2 tagging SNP sequencing

Primers to amplify candidate mutations were designed based on the

GRCh37/hg19 build human genome sequence using Primer3 (http://frodo.wi.mit.edu/).

As described in Chapter II, tagging SNPs were selected using the Tagger algorithm implemented in Haploview using pairwise tagging with an r2 threshold of 0.80242,243.

Centre d’Etude du Polymorphisme Humain from Utah (CEU) data downloaded from the

HapMap Genome Browser Release #28 was used to evaluate linkage disequilibrium and pairwise correlations of tagging SNPs for the candidate intervals indicated. Based on candidate mutation sequencing results, the THC2 locus was limited to chr10:27,330,000-

27,516,000 (GRCh36/hg18), encompassing ANKRD26 and MASTL. Primers to amplify tagging SNPs were designed using Primer3. For candidate mutation and tagging SNP primer sequences, see Appendix N. PCR was performed on genomic DNA and Sanger sequencing was performed using BigDye V3.1 on an ABI3730xl DNA analyzer247.

Analysis of sequence was conducted using Sequencher V4.9. THC2 haplotype analysis was conducted using MERLIN (version 1.1.2) with command line option”—best”403.

RNA isoform analysis

Whole blood samples were collected from one patient and 3 unrelated control subjects directly into PAXgene blood RNA tubes (PreAnalytiX/Qiagen, Valencia, CA)

142 and total RNA isolation was performed using PAXgene Blood RNA kit per manufacturer protocol. RNA was treated with 2U DNase I digestion for 15 min at 37°C and purified with DNAse Inactivation Reagent (Ambion/Life Technologies AM1914, Grand Island,

NY). RNA integrity was evaluated on a RNA Nano chip run on a BioAnalyzer 2100

(Agilent Technologies, Santa Clara, CA). RNA integrity numbers (RIN) for all samples were 8.1-9.3. RNA for tissue panel was purchased from Ambion (FirstChoice Human

Total RNA Survey Panel, AM600). Human bone marrow total RNA (636591) and peripheral leukocyte total RNA (636592) were purchased from Clontech (Mountain View,

CA). Adult bone marrow CD33+ (RNA-BM006C) and CD36+ (RNA-BM013C) total RNA were purchased from AllCells (Emeryville, CA). cDNA first strand synthesis was performed in duplicate using 0.5 μg (whole blood) or 1μg (tissue panel) of total RNA with

the iScript RT mix for RT-qPCR following manufacturer protocol (BioRad 170-8840,

Hercules, CA). Primers were designed to amplify specific regions of sequence unique to

each protein-coding ANKRD26 isoform as well as one region (exon 21) common to all

isoforms using Primer3 (http://frodo.wi.mit.edu/) with a target amplicon length of 80-150

bp (Appendix N). Primer sequences were evaluated for specificity and degeneracy by

searching BLASTN (http://blast.ncbi.nlm.nih.gov/).

Reference gene primers were designed and purchased from IDT [PrimeTime

qPCR primers, Coralville, IA). Quantitative PCR was performed using a Roche

LightCycler 480II Real Time PCR instrument, using Roche SYBR Green I Master (Roche

04 707 516 001, Mannheim, Germany). All standard curve and validation reactions were

performed in triplicate with no template control reactions for all primer sets; all samples

were run in duplicate. Comparative quantitative analysis was performed using the

LC480II data collection software (release 1.5.0.39 SP4). Seven log10 dilutions of

plasmid template for all ANKRD26 isoforms and reference gene amplicon sequences

were prepared and used for primer validation and standard curve reference. PCR

143 efficiency for all target gene primer sets ranged from 90.1%-95.6% with all r2 values

>0.99. To avoid confounding technical variation, one standard curve sample for all reference and target genes were included on every plate. Geometric averaging of five reference genes (ACTB, GAPDH, GUSB, HPRT, and TBP) were used normalize target

gene transcripts. Melt curve analysis for all assays verify single product amplification and

absence of primer dimers.

Results

A five year-old male, the index case of the first kindred, presented with a bleeding predisposition consistent with thrombocytopenia and a platelet count of

58,000/mm3. Peripheral blood smear showed thrombocytopenia with normocytic

platelets (Figure 7.1A). Bone marrow aspiration revealed normal numbers of

megakaryocytes with numerous hypolobated forms (Figure 7.1B). Medical history was

consistent with congenital nonsyndromic thrombocytopenia, and family history revealed

extensive vertical transmission with multiple affected family members (Figure 7.2A).

Figure 7.1: Index case blood smear and bone marrow aspirate A) The blood smear revealed normocytic thrombocytopenia. B) Bone marrow aspirate exhibited normal numbers of megakaryocytes at all stages of development that are characterized by hypolobated nuclei.

144

Figure 7.2: ADT pedigree and candidate mutations A) We characterized three generations of a multigenerational kindred with nonsyndromic autosomal dominant thrombocytopenia. B) For candidate mutations, wild-type chromatograms are presented above heterozygous mutant chromatograms.

Twenty-four pedigree members provided blood for DNA isolation including 10

individuals with a diagnosis of nonsyndromic thrombocytopenia confirmed by platelet

count at the time of enrollment (≤ 140,000/mm3; II.3, III.5, III.7, III.11, III.12, IV.5, IV.10,

IV.12, IV.14, and IV.15) and 5 relatives with normal platelet counts confirmed at the time of enrollment (≥ 234,000/mm3; II.4, III.6, III.8, and IV.7). An additional 9 pedigree members provided samples, but platelet counts are not available to confirm their status.

Two individuals (II.5 and II.7) self-reported a diagnosis of nonsyndromic thrombocytopenia. Four individuals (II.6, II.8, III.10, and III.13) are spouses of pedigree

145 members and self-report no bleeding diatheses. For three individuals (IV.11, IV.13, and

IV.16), no phenotypic data was available.

Genome-wide analysis was undertaken by multipoint linkage analysis using easyLINKAGE. Linkage of the phenotype of thrombocytopenia was established to chromosome 10p11-12 with a maximum LOD score of 3.29. We observed no LOD scores greater than 2 in the genome-wide linkage analysis. Our findings are consistent with previously reported linkage of nonsyndromic autosomal dominant thrombocytopenia to the THC2 locus on chromosome 10201,395.

Whole genome sequencing identifies an ANKRD26 and a MASTL mutation

Ten DNA samples were submitted to Illumina for whole genome sequencing

including six pedigree members with thrombocytopenia (II.5, III.11, III.12, IV.10, IV.12,

and IV.14) and four unaffected relatives (II.6, III.10, III.13, and IV.11). Chromosome 10

variants were filtered by unaffected relatives, resulting in identification of nine exonic

variants, 18 5’- and 3’-UTR variants, 1,438 intronic variants, 90 noncoding RNA variants,

and 1,667 intergenic variants among individuals with thrombocytopenia. Of the nine

exonic variants, eight were synonymous and one variant was nonsynonymous (MASTL

c.506G>C, rs2894147). Based on the previously reported THC2 linkage intervals, we

narrowed the candidate interval to a 5 Mb region (24,650,000-29,580,000). Within this

interval when filtering by unaffected relatives, we retained one exonic variant, one 5’-

UTR variant, 210 intronic variants, seven noncoding RNA variants, and two intergenic

variants (Appendix O). One insertion resulting in a frameshift was detected in PTCHD3.

To allow for incomplete penetrance, we focused on exonic changes (UTR and amino acid coding regions) with the THC2 locus where patients were heterozygous and three or fewer of the unaffected relatives were heterozygous (Table 7.1). We then

146 Table 7.1: ADT candidate mutations allowing for incomplete penetrance Amino Heterozygous Gene Region Acid Base rs# Patients Controls ANKRD26 5' UTR N/A 27,389,381 6/6 0/4 MASTL Exonic E167D 27,450,059 rs28941470 6/6 0/4 ARHGAP21 3' UTR N/A 24,872,922 6/6 1/4 PTCHD3 Exonic C407G 27,692,279 rs2484180 6/6 1/4 GPR158 Exonic I>V 25,888,180 rs10828833 6/6 2/4 MPP7 5' UTR N/A 28,543,716 rs11006956 6/6 2/4 MPP7 5' UTR N/A 28,543,735 rs11006957 6/6 2/4 PRTFDC1 3' UTR N/A 25,137,772 rs1045873 6/6 3/4 ANKRD26 Exonic Q20R 27,389,197 rs7897309 6/6 3/4

sequenced seven variants for which all six patients and two or fewer of the unaffected relatives analyzed were heterozygous in 24 pedigree members including candidate mutations in ANKRD26, MASTL, ARHGAP21, PTCHD3, GPR158, and MPP7 (Appendix

P). Only two variants were identified for which all patients were heterozygous for the minor allele and all unaffected relatives were homozygous for the wild-type allele:

ANKRD26 c.-126T>C and MASTL c.506G>C (Figure 7.2B).

The MASTL variant has been previously reported as causal396. The ANKRD26

variant represents a novel variant within the 22-bp region of the 5’-UTR that has been

previously reported to contain multiple causal variants, including a c.-126T>G variant399.

No individuals, including the 10 samples analyzed by whole genome sequencing, were found to have the ARHGAP21 variant by Sanger sequencing. The remaining four variants in PTCHD3, GPR158, and MPP7 did not co-segregate with disease status within the pedigree. Interestingly, individual II.4, who married into the pedigree, carries an ANKRD26 variant previously reported as causal (ANKRD26 c.-113A>C; Figure 7.2B).

Her platelet count is 246,000/mm3, and she does not exhibit any bleeding propensity.

Her daughter, individual III.7, is heterozygous for both ANKRD26 c.-126T>C and c.-

113A>C variants as well as the MASTL c.506G>C variant and has a platelet count of

147 29,000/mm3. The ANKRD26 c.-126T>C and c.-113A>C and the MASTL c.506G>C variants are not present in the 1,000 Genomes Project.

The MASTL c.506G>C variant predicts a glutamic acid to aspartic acid substitution at amino acid position 167 (E167D). In silico analysis of the aspartic acid variant is discrepant. SIFT reports the site is not conserved among species with a score of 0.4 (< 0.05 is conserved), whereas GERP++ reports the site is conserved with a score of 5.58 (site is considered more conserved as the score approaches 6.17). Similarly,

PolyPhen2 suggests the variant is possibly damaging in one isoform with a score of

0.83, while being benign in two other reported isoforms (0.33, 0.39). In contrast, LRT and MutationTaster predict the MASTL variant is damaging, with scores of 8.2x10-5 and

0.998, respectively.

Modeling using mfold predicted a stable structure for the ANKRD26 5’-UTR

(ΔG=-74.1 kcal/mol). All reported mutations map to the distal half of a stem structure

(bases 39-60, Figure 7.3A). This stem structure is present in five predicted models of the

5’-UTR alone as well as five predicted models of exons 1-3, suggesting accurate

modeling of the 5’-UTR when considered in isolation (data not shown). Introduction of

the ANKRD26 c.-126T>C variant alters the shape of the predicted stem structure,

resulting in a smaller internal loop and a new terminal stem loop (base 47, Figure 7.3B).

The c.-113A>C variant is not predicted to alter the 5’-UTR structure (base 60, ΔG=-74.1

kcal/mol; Figure 7.3C). Modeling of the c.-134G>A variant, the most common variant

reported by Noris et al. 399, predicted abolition of the internal loop of the stem structure with introduction of a new terminal loop (base 39, Figure 7.3D). The c.-126T>C and c.-

134G>A variants are predicted to introduce the same nine base loop structure.

148

Figure 7.3: ANKRD26 5’-UTR structure predicted by mfold A) mfold simulation predicts a stable 5’-UTR structure. All reported ANKRD26 5’-UTR mutations map to a stem structure between model bases 39 and 60. Bases -134, -126, and -113 are shaded in gray and correspond to model bases 39, 47, and 60, respectively. Only the stem structure represented in the box is altered by the modeled 5’- UTR mutations. The c.-126T>C (B) and c.-134G>A (D) variants alter the stem structure and are reported to be penetrant variants resulting in thrombocytopenia. The c.-113A>C (C) variant is not predicted to alter the stem structure, and we report it is nonpenetrant.

149 ANKRD26 and MASTL variants are in linkage disequilibrium

ANKRD26 and MASTL are separated by approximately 55 kb. To determine if the ANKRD26 c.-126T>C and MASTL c.506G>C variants were in linkage disequilibrium,

we selected tagging SNPs based on the HapMap project CEU dataset to capture an

interval of 186 kb that contained both candidate genes (GRCh36/hg18: 27,330,000-

27,516,000). In the CEU dataset, this region demonstrates extensive linkage

disequilibrium (Figure 7.4A). The majority of the interval is contained within one

haplotype block, including most of ANKRD26 and MASTL. Thirty-two tagging SNPs were

required to capture 100% of alleles in the interval. Phased haplotypes were generated

via MERLIN including the sites of the two candidate mutations (Figure 7.4B). The

ANKRD26 c.-126T>C and MASTL c.506G>C variants were found to be in linkage disequilibrium in haplotype 1. Haplotypes 2-6 represent wild-type founder haplotypes that did not contain either candidate mutation. Haplotype 2 differs from haplotype 1 at only the sites of the two candidate mutations. Haplotype 7 contains the wild-type sequence at both candidate mutation sites; however, this haplotype contains the

ANKRD26 c.-113A>C variant.

ANKRD26 c.-126T>C variant identified in the original MASTL kindred and a third unrelated kindred

In order to evaluate if the ANKRD26 c.-126T>C and MASTL c.506G>C variants are in linkage disequilibrium in an ancestral allele, we genotyped two cousins with ADT from the original MASTL kindred described by Drachman et al201,396. Both individuals were heterozygous for the previously reported MASTL c.506G>C variant as well as the

ANKRD26 c.-126T>C. We did not have access to additional samples to generate phased haplotypes within this kindred. We also genotyped ANKRD26 exon 1 and all

MASTL exons in a third unrelated kindred consisting of a mother and two children, all of

150 ihiiO:: ...... lit); ..... •'*"' """'" ...... A. l""tl_ _,- · - ....••• -~~ ANKRD26--·--'

B. 1.1 1. 2

11. 5 11. 6

5 ? 1 4 1 2 2 2

IV.1

Figure 7.4: THC2 interval linkage disequilibrium and pedigree haplotype analysis A) In the CEU dataset, THC2 demonstrates significant linkage disequilibrium between ANKRD26 and MASTL (generated using Haploview). B) Phased haplotypes were generated using MERLIN. Haplotype 1 represents the ANKRD26 c.-126T>C and MASTL c.506G>C haplotype. Haplotypes 2-7 represent wild-type haplotypes. Haplotypes were imputed for individuals I.1, I.2, II.1, and III.10.

151 whom presented with nonsyndromic thrombocytopenia consistent with ADT. All three individuals were heterozygous for the ANKRD26 c.-126T>C variant. No mutations were

identified in MASTL exonic regions including exon/intron boundaries.

ANKRD26 is a widely expressed gene

Previous studies reported MASTL mRNA expression is restricted to

hematopoietic cells and cancer cell lines404. Four protein coding ANKRD26 mRNA isoforms have been reported (http://www.ensembl.org). Isoform 001

(ENST00000376087) is the only isoform that includes all 34 exons. Isoform 003

(ENST00000445828) is the shortest isoform at 761 bp and is completely redundant with other isoforms. Isoform 201 (ENST00000376070) excludes the first 15 exons of

ANKRD26, beginning with a 306 bp segment of intron 16 followed by exons 17-34.

Isoform 202 (ENST00000436985) differs from isoform 001 by inclusion a 147 bp region from intron 6 and exclusion of exon 13. An additional three processed transcript isoforms have been reported (002, 004, and 005) that are less than 725 bp in length and do not encode a protein product. Exon 21 was selected to evaluate total ANKRD26 expression as it is present in all reported isoforms.

ANKRD26 is a widely expressed gene with the highest levels of expression in cervix, heart, skeletal muscle, and small intestine total tissue mRNA (Figure 7.5).

ANKRD26 expression was also detected in hematopoietic tissue including total bone marrow mRNA as well as CD33+ (myeloid progenitor) and CD36+ (erythroid progenitor) cells. ANKRD26 mRNA was not detected in thymus, thyroid, or trachea total mRNA.

Isoforms 201 and 202 are the most commonly expressed ANKRD26 isoforms (Table

7.2). Expression levels of isoform 001 were below the limit of quantitation for this assay, though the target product was confirmed by sequencing (data not shown).

152

Figure 7.5: Expression of ANKRD26 mRNA by tissue type ANKRD26 expression was assessed by qPCR of exon 21 (common to all reported isoforms) and quantitated for comparison to a standard curve. All mRNA samples except MEQ 01 and MEQ α (cell lines for future analyses of ANKRD26) are commercially available. X-axis: tissue or cell type. Y-axis: relative quantity of ANKRD26 exon 21 mRNA based on quantitation to standard curve. A) Ambion FirstChoice human total RNA survey panel, Clontech human bone marrow and peripheral leukocyte total RNA, AllCells adult bone marrow CD33+ and CD36+ total RNA, and cell line Meq 01 and Meq α RNA. ANKRD26 is widely expressed with the highest expression levels cervix, heart, skeletal muscle, and small intestines. B) ANKRD26 is expressed in hematopoietic tissues at lower levels relative to other tissue types.

153 Table 7.2: ANKRD26 mRNA isoforms expression by tissue type ANKRD26 001 201 202 Tissue Exon 21 Exon 13 Intron 16 Intron 6 Adipose + - + + Bladder + - + + Bone Marrow + - + + Brain + - + + CD33+ + - + + CD36+ + - + + Cervix + - + + Colon + - + + Esophagus + - + + Heart + - + + Kidney + - + + Liver + - + +/- Lung + - + + MEQ 01 + - + + MEQ α + - + + Ovary + - + + P. leukocytes + - + + Placenta + - + + Prostate + - + + Skeletal muscle + - + + Small intestine + - + + Spleen + - + + Testes + - + + Thymus - - - - Thyroid - - - - Trachea - - - -

Discussion

To date, the MASTL c.506G>C mutation has been reported in two North

American Caucasian kindreds with nonsyndromic autosomal dominant thrombocytopenia, THC2, including the new kindred reported herein. Whereas in silico analysis of the conservation of position 506 and the potential impact of the E167D substitution is discrepant, MASTL cannot be ruled out as a candidate gene. Previously published Northern blot analysis reported MASTL mRNA is expressed in hematopoietic cells404. Furthermore, transient knockdown of the MASTL ortholog in zebrafish resulted

154 in decreased expression of the thrombopoietin receptor (c-mpl) as well as platelet protein CD41 (GPIIb) and decreased circulating thrombocytes404. These findings suggest

that MASTL may participate in hematopoiesis. Moreover, the zebrafish model indicates

that the mechanism of disease may be haploinsufficiency. Though there is no known

microdeletion syndrome that includes 10p12.1, at least six individuals have been

reported with segmental duplications or deletions encompassing MASTL and ANKRD26

(https://decipher.sanger.ac.uk/). Neither thrombocytopenia nor bleeding diathesis is

listed as phenotypic traits in any of these six reports, suggesting that haploinsufficiency

may not be the correct mechanism of disease.

In the study reporting identification of ANKRD26 5’-UTR mutations in

nonsyndromic autosomal dominant thrombocytopenia, sequence analysis of MASTL

revealed only variants previously reported in dbSNP, though no other details are

provided about these sequence results398. It is worth noting the MASTL c.506G>C

variant is listed in dbSNP (rs28941470). Subsequently, Pippucci et al., report five

mutations confined to a 19-bp segment of the 5’-UTR of ANKRD26, which they propose

is the causal gene underlying nonsyndromic autosomal dominant thrombocytopenia,

THC2. A subsequent report by the same group increased the total number of ANKRD26

variants to 12 confined to a 22-bp segment of the 5’-UTR399. The 1,000 Genomes

Project did not reveal any variation in the candidate segment of ANKRD26 5’-UTR nor at position 506 of MASTL. Nearby SNPs are present within the 1,000 Genomes dataset, suggesting adequate coverage of these regions.

Additionally, we identified the ANKRD26 c.-113A>C variant in two pedigree

members: a mother who married into the kindred and a daughter with thrombocytopenia

that is also heterozygous for the ANKRD26 c.126T>C and MASTL c.506G>C haplotype.

The ANKRD26 c.-113A>C variant was previously reported as causal in an Italian family

with three patients399. The previous report agnostically sequenced ANKRD26 exon 1 in

155 210 index cases of inherited thrombocytopenia of unknown cause. Thus, it is possible the family with the ANKRD26 c.-113A>C variant may not demonstrate linkage to THC2 and thus be misclassified.

Bone marrow examination of four Italian patients and the index case presented herein revealed normal numbers of megakaryocytes at all stages of maturity, though with hypolobated nuclei, suggesting that thrombocytopenia likely arises as a result of defective platelet release or decreased platelet lifespan. The proposed role of ANKRD26

5’-UTR mutations resulting in increased apoptosis is mechanistically consistent with a gain-of-function mutation.

The potential role of ANKRD26 in hematopoiesis and the pathogenesis of

nonsyndromic thrombocytopenia are not well understood. ANKRD26 is believed to be

the earliest known common ancestor to the POTE (prostate-, ovary-, testis-, and

placenta-expressed ankyrin domain) family of genes based on sequence homology with

POTE8, the earliest POTE gene by phylogeny405. Ankyrin is a conserved 33 amino-acid

domain, and multiple ankyrin domains in a gene are thought to stack side-by-side to

form specific protein-binding interfaces that mediate protein-protein interactions406.

ANKRD26 also contains a coiled-coil region of α-helices thought to serve a mechanical role in forming stiff bundles of fibers. Sequence analysis predicts a strong nuclear localization signal near the amino terminal region of the polypeptide405. Previous reports

suggest that the mouse ortholog peptide colocalizes with filamentous actin just beneath

the plasma membrane when expressed in HeLa and 293T cells407. However, our

sequence analysis of the Ankrd26-EGFP plasmid provided by Bera et al., reveals the

expression construct contains a truncated Ankrd26 coding region (data not shown).

To date, no data have been published demonstrating that ANKRD26 participates

in apoptosis. POTE paralogs fused with actin cDNA have been reported, and

overexpression in HeLa cells proved to be toxic to the cells408,409. Furthermore,

156 expression of a POTE-2α-actin fusion induced Fas ligand- and TRAIL-mediated apoptosis409. Autosomal dominant thrombocytopenia in which linkage to the THC4 locus was established is also hypothesized to be due to increased apoptotic activity410. Taken

together, these reports establish apoptosis as a mechanism of thrombocytopenia, and

these reports show genes with ancestral homology to ANKRD26 may encode apoptotic

proteins.

Interestingly, a mouse model in which Ankrd26 was disrupted by β-

galatctosidase insertion into intron 4, resulting in partial inactivation of Ankrd26,

demonstrated that homozygous mutant mice develop obesity and insulin resistance due

to hyperphagia.407 Heterozygous mice did not display an observable phenotype, and heterozygous and homozygous mice have a normal platelet count, supporting previous data to suggest that haploinsufficiency is not the disease mechanism underlying autosomal dominant thrombocytopenia, THC2398,407.

When considering ANKRD26 as the putative gene underlying autosomal

dominant thrombocytopenia, a key unanswered question is the role of the implicated 5’-

UTR mutations in mRNA biology. The 5’-UTR of mRNA is an important regulatory

component of gene expression through diverse mechanisms that can include regulating

translational efficiency411. The predicted ΔG of the ANKRD26 5’-UTR suggests that the

folded 5’-UTR of ANKRD26 is a stable structure and would pose an obstacle to

translation. While the predicted structural changes introduced by ANKRD26 5’-UTR

variants are local, they may impact key protein binding domains within the 5’-UTR. The

ANKRD26 c.-126T>C and c.-134G>A variants are predicted to alter a stem structure

consistent in five mfold model simulations, introducing a novel terminal stem loop. 5’-

UTR stem loops have been implicated as protein binding sites that mediate translational

efficiency412,413. Furthermore, gain-of-function 5’-UTR variants have been described that result in decreased binding of repressive proteins414.

157 Alternatively, 5’-UTR mRNA folded structures are dynamic and can adopt multiple conformations415. The 5’-UTR mRNA stem structure in ANKRD26 implicated in nonsyndromic autosomal dominant thrombocytopenia, THC2, may undergo conformational changes under physiologic conditions that would result in the creation of a similar stem loop as that introduced by the c.-126T>C variant. Hence, it is possible

ANKRD26 5’-UTR mutations may generate a more readily translated ANKRD26 transcript. Recently, it has been demonstrated that the 5’-UTR folded structure is important to the formation and function of internal ribosome entry sites (IRES)416.

Mutations within IRES domains may alter the efficiency of ribosome recruitment and

assembly417. The c.-126T>C and c.-134G>A variants may result in the conformational

changes necessary for recognition of an IRES within the ANKRD26 5’-UTR.

Modeling the c.-113A>C variant results in no predicted structural changes of the

5’-UTR stem structure implicated in nonsyndromic autosomal dominant

thrombocytopenia, THC2. This is consistent with our observation that the c.-113A>C

variant is non-penetrant, suggesting that it was previously misclassified as a causal

variant. It is likely that ANKRD26 5’-UTR mutations underlying thrombocytopenia result

in conformational changes of this stem structure and introduce a stem loop similar to that

introduced by the c.-126T>C and c.-134 G>A variants. Thus, variants that do not disrupt

the stem structure are likely nonpenetrant and may result in misclassification of the

cause of thrombocytopenia if not further interrogated.

Overexpression reporter assays of transfected K562 (undifferentiated myeloid)

and Dami (megakaryoblastic) cells with wild-type or ANKRD26 c.-127A>T, c.-128G>A,

or c.-134G>A demonstrated the ANKRD26 c.-128G>A and c.-134G>A constructs

resulted in increased expression of the reporter gene398. When Dami cells were stimulated to mature toward a megakyarocytic linkage, all three ANKRD26 variant constructs resulted in increased expression of the reporter gene. Increased luciferase

158 activity is consistent with a gain-of-function mechanism, though the authors provide insufficient experimental details or results to differentiate transcript stability or levels from translational efficiency.

It was previously reported that ANKRD26 transcripts were detected in megakaryocytes and erythroblasts cultured from cord blood418. We detected ANKRD26

mRNA in multiple hematopoietic tissues and cell types including total bone marrow

mRNA, CD33+ and CD36+ progenitor cells, and peripheral leukocytes. Preliminary

experiments demonstrate slight elevation in total ANKRD26 expression in the one

patient evaluated to date when compared to pooled unrelated control samples (1.51-fold

increase in exon 21, 2.32-fold increase in isoform 201, and 2.25-fold increase in isoform

202). Replication experiments are currently underway using mRNA isolated from a

normalized number of platelets from patients and controls.

In summary, we report the presence of the ANKRD26 c.-126T>C variant in

linkage disequilibrium with the MASTL c.506G>C variant in a new kindred with

autosomal dominant thrombocytopenia in which linkage to THC2 has been established.

Furthermore, we report identification of the ANKRD26 c.-126T>C variant in the original

MASTL kindred described by Drachman et al201,396. We also report a third kindred in which the ANKRD26 c.-126T>C variant was identified, but no exonic MASTL variants

were present. Taken together with ANKRD26 5’-UTR variants reported in multiple Italian

kindreds in which linkage to the THC2 locus has been established, these findings

suggest that certain 5’-UTR mutations in ANKRD26 are the most likely cause of

nonsyndromic autosomal dominant thrombocytopenia, THC2398.

159 CHAPTER VIII

SUMMARY AND FUTURE DIRECTIONS

Summary

The focus of this dissertation was to further elucidate the genetic etiology of the von Willebrand factor (VWF) – coagulation factor VIII (FVIII) – platelet axis of hemostasis and thrombosis through family-based studies. The multigenerational Amish pedigree, in which we diagnosed VWD type 2 with an underlying c.4120C>T mutation, was enrolled for the purpose of mapping quantitative trait loci (QTL) of VWF and FVIII levels, bleeding, and complete blood count (CBC) traits such as erythrocyte measures and platelet count. The candidate QTLs mapped within the Amish pedigree that may affect

VWF and FVIII levels will provide new insights into the genetic etiology of these two continuous traits. Furthermore, our findings suggest that the effect of QTLs on FVIII may be mediated through its noncovalent relationship with VWF, which highlights the importance of VWF in determining FVIII levels. Additionally, the candidate QTLs that may affect CBC traits will provide new insights into erythropoiesis.

By dichotomous multipoint linkage analysis, we established linkage between

VWF:RCo and chromosome 6p22, a candidate QTL previously reported in multiple studies. With the implementation of SOLAR, genome-wide QTL linkage analysis identified candidate QTL of VWF:Ag and VWF:RCo mapping to chromosomes 3q28,

9q34, and 16q11. Fine-mapping analysis of chromosome 9q34 established that ABO,

and not ADAMTS13, contributes to variation in VWF levels in the Amish pedigree. To

our knowledge, we are the first to specifically evaluate ADAMTS13 in light of linkage or

association to chromosome 9q34. We also replicated previously reported associations

with VWF and FVIII levels from the largest meta-analysis reported to date, confirming

STXBP5, SCARA5, STAB2, and TC2N as candidate QTL.

160 We also conducted genome-wide linkage analyses of CBC traits. Due to hemorheologic effect, larger blood components such as erythrocytes and leukocytes displace the circulating VWF-FVIII complex and platelets toward the endothelium. Thus, for all standard CBC traits, independent associations with pathologic thromboembolism have been reported. Within the Amish, we identified four candidate QTL with LOD scores greater than 2.0, replicating two previously reported QTL and identifying two novel QTL. To our knowledge, this study is also the first genome-wide evaluation of red blood cell distribution width (RDW), for which we identified candidate linkage to chromosome 10p12 (LOD=2.09). Interestingly, we demonstrated that rs1800562, the most common causal variant of the Mendelian disease hereditary hemochromatosis, is also associated with common variation in erythrocyte traits.

In addition to mapping QTL of VWF levels, we sought to further characterize variation as a result of the biology of the VWF gene and VWF polypeptide biosynthesis.

We genotyped 94 tagging SNPs encompassing VWF in the Amish pedigree and

established significant associations between 12 SNPs and VWF:RCo. Nine VWF:RCo

SNPs map to a haplotype block within the pedigree that includes the functional domains

of the VWF polypeptide. Of 11 SNPs selected for follow-up, we ultimately replicated

significant associations for two SNPs in three different population-based cohorts

(n=3,316). An additional four SNPs were replicated in two of three cohorts. Interestingly,

we found that different VWF variants were most important in determining variation in

VWF:Ag, VWF:RCo, and the VWF:RCo/VWF:Ag ratio. Furthermore, we identified a

nonsynonymous variant (T1381A) as well as a potentially functional intronic variant

(rs216308) that are associated with VWF levels. The ultimate goal of this project was to

identify variants that may contribute to the complex etiology of VWD type 1. We

established that one variant was overrepresented and four variants were

161 underrepresented in a VWD type 1 index case cohort compared to a control cohort, suggesting these variants may contribute to the complex etiology of VWD type 1.

As discussed, VWD is characterized by phenotypic variability. Management of mild cases of VWD is further complicated by the absence of a putative VWF mutation in nearly half of patients. In a cohort of VWD type 1 index cases, we established that the minor allele frequencies of five VWF SNPs were significantly different compared to unrelated control individuals. These findings suggest that common variants that affect

VWF levels may also contribute to the genetic etiology of mild cases of VWD, particularly in the absence of a putative VWF mutation.

We identified the novel VWF c.3911T>G mutation and evaluated other VWF variants, allele-specific mRNA expression, and VWF multimer structure to characterize sources of phenotypic variation in a family with VWD type 2. Through cell expression assays we established that the c.3911T>G mutation results in defective VWF polypeptide synthesis and trafficking, though the mutant polypeptide is incorporated into

VWF multimers. Using the AU/VWFa-11 nanobody we determined VWD patients had differential incorporation of the mutant polypeptide into VWF multimers. Thus, we proposed that the phenotypic variation in VWD type 2 may be due, in part, to differential mutant monomer incorporation. As better assays are developed to quantitate mutant monomer incorporation in VWF multimers, we may be able to establish a new biomarker of VWF function that may be more predictive of bleeding risk.

The final aim of this dissertation was to identify genes involved in platelet biogenesis through studying two families with congenital forms of thrombocytopenia. In three kindreds, including a large Native American pedigree, we identified multiple deleterious mutations in NBEAL2, ascertaining the first gene implicated in platelet α- granule biogenesis. We also characterized a family with nonsyndromic autosomal dominant thrombocytopenia. Focused analysis of whole genome sequencing identified

162 two candidate mutations: c.506G>C in MASTL and c.-126T>C in ANKRD26. We established that both mutations are in complete linkage disequilibrium within the family and identified both mutations in a second kindred. However, in a third kindred, we identified only the ANKRD26 c.-126T>C mutation. This finding, in combination with recently published studies, suggests that 5’-untranslated region (UTR) mutations in

ANKRD26 are the most likely cause of nonsyndromic autosomal dominant thrombocytopenia. In silico modeling determined the c.-126T>C mutation introduces a terminal stem loop, which may alter protein binding or ribosomal recruitment to the transcript, in turn affecting translational efficiency.

Future directions

Analysis of the VWF gene and VWF multimerization provided unique insights into the genetic etiology of VWD type 1 as well as the phenotypic variability of VWD type 2.

Ultimately, the interest in further characterizing common variation in VWF is to identify patients that are at high risk for excessive bleeding or pathologic thromboembolism due to chronically elevated or decreased VWF levels. Furthermore, recombinant protein manufacturing is expensive and manufacturers are interested interest in making the process more efficient by increasing protein expression or increasing product half-life115.

VWF SNP analysis in the Amish pedigree identified four haplotype blocks that demonstrate strong linkage disequilibrium. Analyses are ongoing to identify haplotypes associated with VWF levels. Further biochemical characterization of the T1381A variant is needed to determine if it is an artifact altering ristocetin binding, a clinically relevant variant altering VWF levels or activity, or a marker of other functional variants. The most direct approach is to analyze blood plasma samples from individuals with and without the T1381A variant with respect to multiple laboratory evaluations of VWF-mediated binding including ristocetin-induced platelet aggregation and VWF collagen binding

163 assays. Identification of variants that alter laboratory test results such as VWF:RCo but have no other clinical impact continue to provide clarification of a complex clinical challenge. Additionally, experiments are needed to determine if the intronic variant rs216308 is functional or is in linkage disequilibrium with functional variants.

Electrophoretic mobility shift assay (EMSA) would allow one to determine if the DNA sequence containing the minor allele of rs216308 demonstrates altered protein binding when compared with the wild-type sequence. All other SNPs to which significant associations were established likely tag functional variants, which have yet to be elucidated. Based on haplotype structure within the Amish, we can conduct focused sequencing to elucidate variants for functional characterization.

Within a smaller pedigree with VWD type 2, we identified and characterized the novel c.3911T>G mutation and proposed that differential mutant VWF monomer incorporation into multimers underlies phenotypic variability. Due to the complexity and size of VWF multimers, it remains a challenge to quantitate mutant and wild-type monomers in VWF multimers. Efforts are ongoing to optimize mass spectrometry protocols that would allow dissection of and allele-specific quantitation of VWF monomer subunits. This will provide a robust tool to validate our hypothesis and to further characterize the phenotypic variation of VWD type 2.

While we did not identify differences in allele-specific mRNA expression within the VWF c.3911T>G family, we did optimize a novel assay for allele-specific mRNA quantitation in human subjects. Phenotypic variation in a larger family, such as the

Amish pedigree, may be explained in part by differences in mRNA levels. Additional enrollment clinics will be required to collect samples for platelet isolation, and the allele- specific quantitation protocol can then be adapted to the c.4120C>T mutation to evaluate the presence of a hypomorphic allele within the Amish pedigree. Furthermore,

164 the nanobody binding assay, and ultimately mass spectrometry, will be used to evaluate mutant monomer incorporation and phenotypic variability in the Amish pedigree.

In terms of the bleeding phenotype of VWD, analyses are complicated due limitations in phenotype ascertainment. The bleeding survey is a subjective screening tool used to assess one’s risk to bleed. However, its usefulness as a measure of bleeding for research purposes has been poor. Arguably, the most informative individuals to study are those with a known causal VWF mutation that do not bleed upon clinical challenge such as tooth extraction, surgery, or parturition versus patients with the same mutation that demonstrate clinically significant bleeding. Limiting analysis of bleeding to a dichotomous trait will simplify the phenotype while simultaneously eliminating uninformative individuals from the study design. Within the Amish pedigree, work is ongoing to objectively define bleeding upon challenge and quality of life impacts that bleeding has upon this agrarian culture. Furthermore, as the cost of next generation sequencing continues to decline, it is becoming affordable to conduct whole genome sequencing or exome sequencing within the “bleeders” and “non-bleeders” with the c.4120C>T mutation to identify variants of the bleeding phenotype. Linkage analysis may prove valuable to sort through next generation sequencing data and detect variants with a large effect on the bleeding phenotype.

In addition to studying the phenotypic variability of VWD, we also sought to map candidate QTL of VWF and FVIII levels. Association analyses to fine-map the candidate

QTL on chromosomes 6p22 and 16q11 are ongoing. Subsequently, we will increase fine-mapping resolution and sequence candidate genes to identify variants for replication in unrelated cohorts. Furthermore, we know that VWF:Ag and FVIII:C are highly correlated traits. All candidate QTL of FVIII:C identified are concomitant with VWF:Ag

QTL, which suggests that variation is mediated through the noncovalent relationship of

VWF and FVIII in circulation. Thus, we are currently conducting bivariate analysis of

165 VWF:Ag and FVIII:C to identify candidate QTL whose effect on VWF:Ag levels is sufficient to subsequently alter FVIII:C as well.

Identification of NBEAL2 in gray platelet syndrome provided the first gene implicated in α-granule biogenesis. We presented initial platelet transcriptome heat map analyses which are currently being replicated and further characterized within additional patients. Efforts are also ongoing to characterize the protein product of NBEAL2 in megakaryocytic cell lines to determine protein expression and localization as well as elucidate protein-protein interactions. By siRNA, we can knock down NBEAL2 expression and evaluate the impact on α-granule and proplatelet formation within cell culture models. In addition, the nbeal2 null mouse is now available. We will conduct

experiments to determine if the absence of murine nbeal2 expression triggers a similar

disruption of the transcriptome as well as the role of nbeal2 in platelet biogenesis.

The elucidation of the gene in nonsyndromic autosomal dominant

thrombocytopenia was complicated by the presence of two candidate mutations in

complete linkage disequilibrium within the family. Identification of only the ANKRD26 5’-

UTR variant within an unrelated family in combination with other recently published

reports suggests 5’-UTR mutations in ANKRD26 are the most likely cause of nonsyndromic autosomal dominant thrombocytopenia. Interestingly, we identified a nonpenetrant 5’-UTR variant (c.-113A>C) that has previously been reported as “causal.”

In silico modeling analyses predicted the penetrant c.-126T>C variant introduces a novel terminal stem loop whereas the c.-113A>C variant does not alter the predicted wild-type

5’-UTR structure. We will next evaluate the potential functionality of the ANKRD26 c.-

126 T>C and c.-113 A>C variants by EMSA and luciferase promoter assays. Based on clinical reports of microdeletions encompassing ANKRD26, we hypothesize the mutation functions through gain-of-function rather than haploinsufficiency. Increased luciferase activity would confirm a gain-of-function hypothesis. Furthermore, characterization of

166 mRNA and protein levels within the luciferase assay would allow differentiation of transcriptional stability versus translational efficiency.

Pending luciferase assay results, we plan to evaluate the gain-of-function hypothesis by overexpressing ANKRD26 in megakaryocytic cell lines to determine the effect on megakaryopoiesis and proplatelet formation. We also plan to introduce gain-of- function variants using the recently developed zinc finger nuclease methodology into a zebrafish model to evaluate thrombopoiesis419.

Hemostasis and thrombosis represent a complex clinical spectrum. The research presented in this dissertation provided valuable insights and hypotheses regarding variation of common hemostatic traits as well as the phenotypic variability of VWD.

Additionally, we identified genes implicated in platelet biogenesis. Additional studies will allow further elucidation of the complex etiology of hemostasis and thrombosis, providing new clinical insights for diagnosis and risk assessment. Furthermore, a better understanding of the molecular mechanisms of hemostasis and thrombosis may provide insights for the development of new treatments.

167 REFERENCES

1. James, A., Matchar, D.B. & Myers, E.R. Testing for von Willebrand disease in women with menorrhagia: a systematic review. Obstet Gynecol 104, 381-8 (2004).

2. Clarke, A., Black, N., Rowe, P., Mott, S. & Howle, K. Indications for and outcome of total abdominal hysterectomy for benign disease: a prospective cohort study. Br J Obstet Gynaecol 102, 611-20 (1995).

3. El-Refaey, H. & Rodeck, C. Post-partum haemorrhage: definitions, medical and surgical management. A time for change. Br Med Bull 67, 205-17 (2003).

4. Sadler, J.E. Von Willebrand disease type 1: a diagnosis in search of a disease. Blood 101, 2089-93 (2003).

5. Roger, V.L. et al. Heart disease and stroke statistics--2012 update: a report from the American Heart Association. Circulation 125, e2-e220 (2012).

6. Mahan, C.E., Holdsworth, M.T., Welch, S.M., Borrego, M. & Spyropoulos, A.C. Deep-vein thrombosis: a United States cost model for a preventable and costly adverse event. Thromb Haemost 106, 405-15 (2011).

7. Baskurt, O.K. & Meiselman, H.J. Blood rheology and hemodynamics. Semin Thromb Hemost 29, 435-50 (2003).

8. Furie, B. & Furie, B.C. Thrombus formation in vivo. J Clin Invest 115, 3355-62 (2005).

9. Casonato, A. et al. Post-DDAVP thrombocytopenia in type 2B von Willebrand disease is not associated with platelet consumption: failure to demonstrate glycocalicin increase or platelet activation. Thromb Haemost 81, 224-8 (1999).

10. Siedlecki, C.A. et al. Shear-dependent changes in the three-dimensional structure of human von Willebrand factor. Blood 88, 2939-50 (1996).

11. Sadler, J.E. Biochemistry and genetics of von Willebrand factor. Annu Rev Biochem 67, 395-424 (1998).

12. Savage, B., Saldivar, E. & Ruggeri, Z.M. Initiation of platelet adhesion by arrest onto fibrinogen or translocation on von Willebrand factor. Cell 84, 289-97 (1996).

13. Ruggeri, Z.M. Structure of von Willebrand factor and its function in platelet adhesion and thrombus formation. Best Pract Res Clin Haematol 14, 257-79 (2001).

14. Andrews, R.K., Gardiner, E.E., Shen, Y., Whisstock, J.C. & Berndt, M.C. Glycoprotein Ib-IX-V. Int J Biochem Cell Biol 35, 1170-4 (2003).

168 15. Berndt, M.C., Shen, Y., Dopheide, S.M., Gardiner, E.E. & Andrews, R.K. The vascular biology of the glycoprotein Ib-IX-V complex. Thromb Haemost 86, 178- 88 (2001).

16. Huizinga, E.G. et al. Structures of glycoprotein Ibalpha and its complex with von Willebrand factor A1 domain. Science 297, 1176-9 (2002).

17. Goto, S., Salomon, D.R., Ikeda, Y. & Ruggeri, Z.M. Characterization of the unique mechanism mediating the shear-dependent binding of soluble von Willebrand factor to platelets. J Biol Chem 270, 23352-61 (1995).

18. Mendolicchio, G.L. & Ruggeri, Z.M. New perspectives on von Willebrand factor functions in hemostasis and thrombosis. Semin Hematol 42, 5-14 (2005).

19. Ruggeri, Z.M. Platelets in atherothrombosis. Nat Med 8, 1227-34 (2002).

20. Kasirer-Friede, A. et al. Lateral clustering of platelet GP Ib-IX complexes leads to up-regulation of the adhesive function of integrin alpha IIbbeta 3. J Biol Chem 277, 11949-56 (2002).

21. Mazzucato, M., Pradella, P., Cozzi, M.R., De Marco, L. & Ruggeri, Z.M. Sequential cytoplasmic calcium signals in a 2-stage platelet activation process induced by the glycoprotein Ibalpha mechanoreceptor. Blood 100, 2793-800 (2002).

22. Furlan, M., Robles, R. & Lammle, B. Partial purification and characterization of a protease from human plasma cleaving von Willebrand factor to fragments produced by in vivo proteolysis. Blood 87, 4223-34 (1996).

23. Torres, R. & Fedoriw, Y. Laboratory testing for von Willebrand disease: toward a mechanism-based classification. Clin Lab Med 29, 193-228 (2009).

24. de Wit, T.R. & van Mourik, J.A. Biosynthesis, processing and secretion of von Willebrand factor: biological implications. Best Pract Res Clin Haematol 14, 241- 55 (2001).

25. Rosenberg, J.B. et al. Intracellular trafficking of factor VIII to von Willebrand factor storage granules. J Clin Invest 101, 613-24 (1998).

26. Baruch, D., Bahnak, B., Girma, J.P. & Meyer, D. von Willebrand factor and platelet function. Baillieres Clin Haematol 2, 627-72 (1989).

27. de Wit, T.R. & van Mourik, J.A. Biosynthesis, processing, and secretion of von Willebrand factor: biological implications. Best Practice & Research Clinical Haematology 14, 241-255 (2001).

28. Ginsburg, D. et al. Human von Willebrand factor (vWF): isolation of complementary DNA (cDNA) clones and chromosomal localization. Science 228, 1401-6 (1985).

169 29. Mancuso, D.J. et al. Structure of the gene for human von Willebrand factor. J Biol Chem 264, 19514-27 (1989).

30. Shelton-Inloes, B.B., Chehab, F.F., Mannucci, P.M., Federici, A.B. & Sadler, J.E. Gene deletions correlate with the development of alloantibodies in von Willebrand disease. J Clin Invest 79, 1459-65 (1987).

31. Verweij, C.L., Diergaarde, P.J., Hart, M. & Pannekoek, H. Full-length von Willebrand factor (vWF) cDNA encodes a highly repetitive protein considerably larger than the mature vWF subunit. EMBO J 5, 1839-47 (1986).

32. Mayadas, T.N. & Wagner, D.D. von Willebrand factor biosynthesis and processing. Ann N Y Acad Sci 614, 153-66 (1991).

33. Mancuso, D.J. et al. Human von Willebrand factor gene and pseudogene: structural analysis and differentiation by polymerase chain reaction. Biochemistry 30, 253-69 (1991).

34. Jahroudi, N., Ardekani, A.M. & Greenberger, J.S. An NF1-like protein functions as a repressor of the von Willebrand factor promoter. J Biol Chem 271, 21413-21 (1996).

35. Hough, C. et al. Influence of a GT repeat element on shear stress responsiveness of the VWF gene promoter. J Thromb Haemost 6, 1183-90 (2008).

36. Budde, U., Pieconka, A., Will, K. & Schneppenheim, R. Laboratory testing for von Willebrand disease: contribution of multimer analysis to diagnosis and classification. Semin Thromb Hemost 32, 514-21 (2006).

37. Bowen, D.J. An influence of ABO blood group on the rate of proteolysis of von Willebrand factor by ADAMTS13. J Thromb Haemost 1, 33-40 (2003).

38. Budde, U., Metzner, H.J. & Muller, H.G. Comparative analysis and classification of von Willebrand factor/factor VIII concentrates: impact on treatment of patients with von Willebrand disease. Semin Thromb Hemost 32, 626-35 (2006).

39. O'Donnell, J. & Laffan, M.A. The relationship between ABO histo-blood group, factor VIII and von Willebrand factor. Transfus Med 11, 343-51 (2001).

40. Sporn, L.A., Marder, V.J. & Wagner, D.D. Inducible secretion of large, biologically potent von Willebrand factor multimers. Cell 46, 185-90 (1986).

41. Galbusera, M. et al. Fluid shear stress modulates von Willebrand factor release from human vascular endothelium. Blood 90, 1558-64 (1997).

42. Brouland, J.P. et al. In vivo regulation of von willebrand factor synthesis: von Willebrand factor production in endothelial cells after lung transplantation between normal pigs and von Willebrand factor-deficient pigs. Arterioscler Thromb Vasc Biol 19, 3055-62 (1999).

170 43. Federici, A.B. & Canciani, M.T. Clinical and laboratory versus molecular markers for a correct classification of von Willebrand disease. Haematologica 94, 610-5 (2009).

44. Werner, E.J. et al. Prevalence of von Willebrand disease in children: a multiethnic study. J Pediatr 123, 893-8 (1993).

45. Nichols, W.C. & Ginsburg, D. von Willebrand disease. Medicine (Baltimore) 76, 1-20 (1997).

46. Di Paola, J. Determinants of bleeding severity in von Willebrand disease. Curr Hematol Rep 4, 345-9 (2005).

47. Levy, G. & Ginsburg, D. Getting at the variable expressivity of von Willebrand disease. Thromb Haemost 86, 144-8 (2001).

48. Goodeve, A. et al. Phenotype and genotype of a cohort of families historically diagnosed with type 1 von Willebrand disease in the European study, Molecular and Clinical Markers for the Diagnosis and Management of Type 1 von Willebrand Disease (MCMDM-1VWD). Blood 109, 112-21 (2007).

49. James, P.D. et al. The mutational spectrum of type 1 von Willebrand disease: Results from a Canadian cohort study. Blood 109, 145-54 (2007).

50. Lillicrap, D. Genotype/phenotype association in von Willebrand disease: is the glass half full or empty? J Thromb Haemost 7 Suppl 1, 65-70 (2009).

51. Eikenboom, J. et al. Linkage analysis in families diagnosed with type 1 von Willebrand disease in the European study, molecular and clinical markers for the diagnosis and management of type 1 VWD. J Thromb Haemost 4, 774-82 (2006).

52. Ginsburg, D. & Sadler, J.E. von Willebrand disease: a database of point mutations, insertions, and deletions. For the Consortium on von Willebrand Factor Mutations and Polymorphisms, and the Subcommittee on von Willebrand Factor of the Scientific and Standardization Committee of the International Society on Thrombosis and Haemostasis. Thromb Haemost 69, 177-84 (1993).

53. Castaman, G., Federici, A.B., Rodeghiero, F. & Mannucci, P.M. Von Willebrand's disease in the year 2003: towards the complete identification of gene defects for correct diagnosis and treatment. Haematologica 88, 94-108 (2003).

54. Sadler, J.E. et al. Update on the pathophysiology and classification of von Willebrand disease: a report of the Subcommittee on von Willebrand Factor. J Thromb Haemost 4, 2103-14 (2006).

55. Schneppenheim, R. & Budde, U. Phenotypic and genotypic diagnosis of von Willebrand disease: a 2004 update. Semin Hematol 42, 15-28 (2005).

171 56. Lyons, S.E., Bruck, M.E., Bowie, E.J. & Ginsburg, D. Impaired intracellular transport produced by a subset of type IIA von Willebrand disease mutations. J Biol Chem 267, 4424-30 (1992).

57. Cooney, K.A., Lyons, S.E. & Ginsburg, D. Functional analysis of a type IIB von Willebrand disease missense mutation: increased binding of large von Willebrand factor multimers to platelets. Proc Natl Acad Sci U S A 89, 2869-72 (1992).

58. Kroner, P.A., Kluessendorf, M.L., Scott, J.P. & Montgomery, R.R. Expressed full- length von Willebrand factor containing missense mutations linked to type IIB von Willebrand disease shows enhanced binding to platelets. Blood 79, 2048-55 (1992).

59. Mazurier, C. von Willebrand disease masquerading as haemophilia A. Thromb Haemost 67, 391-6 (1992).

60. Holmberg, L. et al. von Willebrand factor mutation enhancing interaction with platelets in patients with normal multimeric structure. J Clin Invest 91, 2169-77 (1993).

61. Zhang, Z.P., Blomback, M., Nyman, D. & Anvret, M. Mutations of von Willebrand factor gene in families with von Willebrand disease in the Aland Islands. Proc Natl Acad Sci U S A 90, 7937-40 (1993).

62. Eikenboom, J.C., Vink, T., Briet, E., Sixma, J.J. & Reitsma, P.H. Multiple substitutions in the von Willebrand factor gene that mimic the pseudogene sequence. Proc Natl Acad Sci U S A 91, 2221-4 (1994).

63. Eikenboom, J.C., Castaman, G., Vos, H.L., Bertina, R.M. & Rodeghiero, F. Characterization of the genetic defects in recessive type 1 and type 3 von Willebrand disease patients of Italian origin. Thromb Haemost 79, 709-17 (1998).

64. Surdhar, G.K., Enayat, M.S., Lawson, S., Williams, M.D. & Hill, F.G. Homozygous gene conversion in von Willebrand factor gene as a cause of type 3 von Willebrand disease and predisposition to inhibitor development. Blood 98, 248-50 (2001).

65. Gupta, P.K. et al. Gene conversions are a common cause of von Willebrand disease. Br J Haematol 130, 752-8 (2005).

66. Berber, E. et al. A Common VWF Exon 28 Haplotype in the Turkish Population. Clin Appl Thromb Hemost (2012).

67. Chen, J.M., Cooper, D.N., Chuzhanova, N., Ferec, C. & Patrinos, G.P. Gene conversion: mechanisms, evolution and human disease. Nat Rev Genet 8, 762- 75 (2007).

68. Goto, S., Sakai, H., Ikeda, Y. & Handa, S. Acute myocardial infarction plasma augments platelet thrombus growth in high shear rates. Lancet 349, 543-4 (1997).

172 69. Goto, S. et al. Enhanced shear-induced platelet aggregation in acute myocardial infarction. Circulation 99, 608-13 (1999).

70. Yamashita, A. et al. Detection of von Willebrand factor and tissue factor in platelets-fibrin rich coronary thrombi in acute myocardial infarction. Am J Cardiol 97, 26-8 (2006).

71. Hoshiba, Y., Hatakeyama, K., Tanabe, T., Asada, Y. & Goto, S. Co-localization of von Willebrand factor with platelet thrombi, tissue factor and platelets with fibrin, and consistent presence of inflammatory cells in coronary thrombi obtained by an aspiration device from patients with acute myocardial infarction. J Thromb Haemost 4, 114-20 (2006).

72. Thompson, S.G., Kienast, J., Pyke, S.D., Haverkate, F. & van de Loo, J.C. Hemostatic factors and the risk of myocardial infarction or sudden death in patients with angina pectoris. European Concerted Action on Thrombosis and Disabilities Angina Pectoris Study Group. N Engl J Med 332, 635-41 (1995).

73. Haines, A.P. et al. Haemostatic variables and the outcome of myocardial infarction. Thromb Haemost 50, 800-3 (1983).

74. Jansson, J.H., Nilsson, T.K. & Johnson, O. von Willebrand factor in plasma: a novel risk factor for recurrent myocardial infarction and death. Br Heart J 66, 351- 5 (1991).

75. Paulinska, P., Spiel, A. & Jilma, B. Role of von Willebrand factor in vascular disease. Hamostaseologie 29, 32-8 (2009).

76. Bongers, T.N. et al. High von Willebrand factor levels increase the risk of first ischemic stroke: influence of ADAMTS13, inflammation, and genetic variability. Stroke 37, 2672-7 (2006).

77. Folsom, A.R. et al. Prospective study of markers of hemostatic function with risk of ischemic stroke. The Atherosclerosis Risk in Communities (ARIC) Study Investigators. Circulation 100, 736-42 (1999).

78. Gottesman, R.F. et al. Hemostatic factors and subclinical brain infarction in a community-based sample: the ARIC study. Cerebrovasc Dis 28, 589-94 (2009).

79. Conway, D.S., Pearce, L.A., Chin, B.S., Hart, R.G. & Lip, G.Y. Prognostic value of plasma von Willebrand factor and soluble P-selectin as indices of endothelial damage and platelet activation in 994 patients with nonvalvular atrial fibrillation. Circulation 107, 3141-5 (2003).

80. Frankel, D.S. et al. Von Willebrand factor, type 2 diabetes mellitus, and risk of cardiovascular disease: the framingham offspring study. Circulation 118, 2533-9 (2008).

173 81. Folsom, A.R., Wu, K.K., Shahar, E. & Davis, C.E. Association of hemostatic variables with prevalent cardiovascular disease and asymptomatic carotid artery atherosclerosis. The Atherosclerosis Risk in Communities (ARIC) Study Investigators. Arterioscler Thromb 13, 1829-36 (1993).

82. Davie, E.W., Fujikawa, K. & Kisiel, W. The coagulation cascade: initiation, maintenance, and regulation. Biochemistry 30, 10363-70 (1991).

83. Rao, L.V., Williams, T. & Rapaport, S.I. Studies of the activation of factor VII bound to tissue factor. Blood 87, 3738-48 (1996).

84. Sakai, T., Lund-Hansen, T., Paborsky, L., Pedersen, A.H. & Kisiel, W. Binding of human factors VII and VIIa to a human bladder carcinoma cell line (J82). Implications for the initiation of the extrinsic pathway of blood coagulation. J Biol Chem 264, 9980-8 (1989).

85. Nemerson, Y. & Gentry, R. An ordered addition, essential activation model of the tissue factor pathway of coagulation: evidence for a conformational cage. Biochemistry 25, 4020-33 (1986).

86. Tracy, P.B., Nesheim, M.E. & Mann, K.G. Coordinate binding of factor Va and factor Xa to the unstimulated platelet. J Biol Chem 256, 743-51 (1981).

87. Krishnaswamy, S., Church, W.R., Nesheim, M.E. & Mann, K.G. Activation of human prothrombin by human prothrombinase. Influence of factor Va on the reaction mechanism. J Biol Chem 262, 3291-9 (1987).

88. Ferry, J.D. The Mechanism of Polymerization of Fibrinogen. Proc Natl Acad Sci U S A 38, 566-9 (1952).

89. Blomback, B. & Blomback, M. The molecular structure of fibrinogen. Ann N Y Acad Sci 202, 77-97 (1972).

90. Hermans, J. & McDonagh, J. Fibrin: structure and interactions. Semin Thromb Hemost 8, 11-24 (1982).

91. Lorand, L. & Konishi, K. Activation of the Fibrin Stabilizing Factor of Plasma by Thrombin. Arch Biochem Biophys 105, 58-67 (1964).

92. Komiyama, Y., Pedersen, A.H. & Kisiel, W. Proteolytic activation of human factors IX and X by recombinant human factor VIIa: effects of calcium, phospholipids, and tissue factor. Biochemistry 29, 9418-25 (1990).

93. Zur, M. & Nemerson, Y. Kinetics of factor IX activation via the extrinsic pathway. Dependence of Km on tissue factor. J Biol Chem 255, 5703-7 (1980).

94. Tuddenham, E.G. et al. Response to infusions of polyelectrolyte fractionated human factor VIII concentrate in human haemophilia A and von Willebrand's disease. Br J Haematol 52, 259-67 (1982).

174 95. Rapaport, S.I., Schiffman, S., Patch, M.J. & Ames, S.B. The importance of activation of antihemophilic globulin and proaccelerin by traces of thrombin in the generation of intrinsic prothrombinase activity. Blood 21, 221-36 (1963).

96. Osterud, B., Rapaport, S.I., Schiffman, S. & Chong, M.M. Formation of intrinsic factor-X activator activity, with special reference to the role of thrombin. Br J Haematol 21, 643-60 (1971).

97. Vehar, G.A. & Davie, E.W. Preparation and properties of bovine factor VIII (antihemophilic factor). Biochemistry 19, 401-10 (1980).

98. van Dieijen, G., Tans, G., Rosing, J. & Hemker, H.C. The role of phospholipid and factor VIIIa in the activation of bovine factor X. J Biol Chem 256, 3433-42 (1981).

99. Levin, E.G., Marzec, U., Anderson, J. & Harker, L.A. Thrombin stimulates tissue plasminogen activator release from cultured human endothelial cells. J Clin Invest 74, 1988-95 (1984).

100. Lijnen, H.R. & Collen, D. Interaction of plasminogen activators and inhibitors with plasminogen and fibrin. Semin Thromb Hemost 8, 2-10 (1982).

101. Rao, L.V. & Rapaport, S.I. Studies of a mechanism inhibiting the initiation of the extrinsic pathway of coagulation. Blood 69, 645-51 (1987).

102. Broze, G.J., Jr. et al. The lipoprotein-associated coagulation inhibitor that inhibits the factor VII-tissue factor complex also inhibits factor Xa: insight into its possible mechanism of action. Blood 71, 335-43 (1988).

103. Schapira, M., Scott, C.F. & Colman, R.W. Protection of human plasma kallikrein from inactivation by C1 inhibitor and other protease inhibitors. The role of high molecular weight kininogen. Biochemistry 20, 2738-43 (1981).

104. Scott, C.F., Schapira, M., James, H.L., Cohen, A.B. & Colman, R.W. Inactivation of factor XIa by plasma protease inhibitors: predominant role of alpha 1-protease inhibitor and protective effect of high molecular weight kininogen. J Clin Invest 69, 844-52 (1982).

105. Broze, G.J., Jr. Protein Z-dependent regulation of coagulation. Thromb Haemost 86, 8-13 (2001).

106. Walker, F.J. Regulation of activated protein C by a new protein. A possible function for bovine protein S. J Biol Chem 255, 5521-4 (1980).

107. Marlar, R.A., Kleiss, A.J. & Griffin, J.H. Mechanism of action of human activated protein C, a thrombin-dependent anticoagulant enzyme. Blood 59, 1067-72 (1982).

108. Gitschier, J. et al. Characterization of the human factor VIII gene. Nature 312, 326-30 (1984).

175 109. Bontempo, F.A. et al. Liver transplantation in hemophilia A. Blood 69, 1721-4 (1987).

110. Toole, J.J. et al. Molecular cloning of a cDNA encoding human antihaemophilic factor. Nature 312, 342-7 (1984).

111. Vehar, G.A. et al. Structure of human factor VIII. Nature 312, 337-42 (1984).

112. Leyte, A. et al. Sulfation of Tyr1680 of human blood coagulation factor VIII is essential for the interaction of factor VIII with von Willebrand factor. J Biol Chem 266, 740-6 (1991).

113. Tagliavacca, L., Moon, N., Dunham, W.R. & Kaufman, R.J. Identification and functional requirement of Cu(I) and its ligands within coagulation factor VIII. J Biol Chem 272, 27428-34 (1997).

114. Sudhakar, K. & Fay, P.J. Effects of copper on the structure and function of factor VIII subunits: evidence for an auxiliary role for copper ions in cofactor activity. Biochemistry 37, 6874-82 (1998).

115. Branchford, B.R., Monahan, P.E. & Di Paola, J. New developments in the treatment of pediatric hemophilia and bleeding disorders. Curr Opin Pediatr 25, 23-30 (2013).

116. Higuchi, M. et al. Characterization of mutations in the factor VIII gene by direct sequencing of amplified genomic DNA. Genomics 6, 65-71 (1990).

117. Foster, P.A., Fulcher, C.A., Houghten, R.A. & Zimmerman, T.S. An immunogenic region within residues Val1670-Glu1684 of the factor VIII light chain induces antibodies which inhibit binding of factor VIII to von Willebrand factor. J Biol Chem 263, 5230-4 (1988).

118. Gitschier, J., Kogan, S., Levinson, B. & Tuddenham, E.G. Mutations of factor VIII cleavage sites in hemophilia A. Blood 72, 1022-8 (1988).

119. Arai, M. et al. Direct characterization of factor VIII in plasma: detection of a mutation altering a thrombin cleavage site (arginine-372----histidine). Proc Natl Acad Sci U S A 86, 4277-81 (1989).

120. Pattinson, J.K. et al. The molecular genetic analysis of hemophilia A: a directed search strategy for the detection of point mutations in the human factor VIII gene. Blood 76, 2242-8 (1990).

121. Tuddenham, E.G. et al. Haemophilia A: database of nucleotide substitutions, deletions, insertions and rearrangements of the factor VIII gene. Nucleic Acids Res 19, 4821-33 (1991).

122. Kemball-Cook, G., Tuddenham, E.G. & Wacey, A.I. The factor VIII Structure and Mutation Resource Site: HAMSTeRS version 4. Nucleic Acids Res 26, 216-9 (1998).

176 123. Youssoufian, H. et al. Characterization of five partial deletions of the factor VIII gene. Proc Natl Acad Sci U S A 84, 3772-6 (1987).

124. Youssoufian, H., Kasper, C.K., Phillips, D.G., Kazazian, H.H., Jr. & Antonarakis, S.E. Restriction endonuclease mapping of six novel deletions of the factor VIII gene in hemophilia A. Hum Genet 80, 143-8 (1988).

125. Millar, D.S. et al. The molecular genetic analysis of haemophilia A; characterization of six partial deletions in the factor VIII gene. Hum Genet 86, 219-27 (1990).

126. Woods-Samuels, P., Kazazian, H.H., Jr. & Antonarakis, S.E. Nonhomologous recombination in the human genome: deletions in the human factor VIII gene. Genomics 10, 94-101 (1991).

127. Hoyer, L.W. Molecular pathology and immunology of factor VIII (hemophilia A and factor VIII inhibitors). Hum Pathol 18, 153-61 (1987).

128. Zhang, B. et al. Bleeding due to disruption of a cargo-specific ER-to-Golgi transport complex. Nat Genet 34, 220-5 (2003).

129. O'Donnell, J. et al. High prevalence of elevated factor VIII levels in patients referred for thrombophilia screening: role of increased synthesis and relationship to the acute phase reaction. Thromb Haemost 77, 825-8 (1997).

130. Kyrle, P.A. et al. High plasma levels of factor VIII and the risk of recurrent venous thromboembolism. N Engl J Med 343, 457-62 (2000).

131. Bank, I. et al. Absolute annual incidences of first events of venous thromboembolism and arterial vascular events in individuals with elevated FVIII:c. A prospective family cohort study. Thromb Haemost 98, 1040-4 (2007).

132. Soares, A.L. et al. Elevated plasma factor VIII and von Willebrand factor in women with type 2 diabetes: inflammatory reaction, endothelial perturbation or else? Blood Coagul Fibrinolysis 22, 600-5 (2011).

133. Kovar, F.M. et al. Coagulation factor VIII levels are associated with long-term survival - interactions with gender in a large hospital-based cohort. Wien Klin Wochenschr 122, 334-40 (2010).

134. Souto, J.C. et al. Genome-wide linkage analysis of von Willebrand factor plasma levels: results from the GAIT project. Thromb Haemost 89, 468-74 (2003).

135. Casana, P., Martinez, F., Haya, S., Espinos, C. & Aznar, J.A. Significant linkage and non-linkage of type 1 von Willebrand disease to the von Willebrand factor gene. Br J Haematol 115, 692-700 (2001).

136. Gill, J.C., Endres-Brooks, J., Bauer, P.J., Marks, W.J., Jr. & Montgomery, R.R. The effect of ABO blood group on the diagnosis of von Willebrand disease. Blood 69, 1691-5 (1987).

177 137. Rosendaal, F.R. & Bovill, E.G. Heritability of clotting factors and the revival of the prothrombotic state. Lancet 359, 638-9 (2002).

138. Nadeau, J.H. Modifier genes in mice and humans. Nat Rev Genet 2, 165-74 (2001).

139. Zucchero, T.M. et al. Interferon regulatory factor 6 (IRF6) gene variants and the risk of isolated cleft lip or palate. N Engl J Med 351, 769-80 (2004).

140. Rozmahel, R. et al. Modulation of disease severity in cystic fibrosis transmembrane conductance regulator deficient mice by a secondary genetic factor. Nat Genet 12, 280-7 (1996).

141. Zielenski, J. et al. Detection of a cystic fibrosis modifier locus for meconium ileus on human chromosome 19q13. Nat Genet 22, 128-9 (1999).

142. Nichols, W.C. et al. Moderation of hemophilia A phenotype by the factor V R506Q mutation. Blood 88, 1183-7 (1996).

143. Kutlar, A. Sickle cell disease: a multigenic perspective of a single-gene disorder. Med Princ Pract 14 Suppl 1, 15-9 (2005).

144. Conlan, M.G. et al. Associations of factor VIII and von Willebrand factor with age, race, sex, and risk factors for atherosclerosis. The Atherosclerosis Risk in Communities (ARIC) Study. Thromb Haemost 70, 380-5 (1993).

145. Allman-Farinelli, M.A. & Dawson, B. Diet and aging: bearing on thrombosis and hemostasis. Semin Thromb Hemost 31, 111-7 (2005).

146. Flood, V.H. et al. Common VWF exon 28 polymorphisms in African Americans affecting the VWF activity assay by ristocetin cofactor. Blood 116, 280-6 (2010).

147. Miller, C.H., Dilley, A., Richardson, L., Hooper, W.C. & Evatt, B.L. Population differences in von Willebrand factor levels affect the diagnosis of von Willebrand disease in African-American women. Am J Hematol 67, 125-9 (2001).

148. Miller, C.H. et al. Measurement of von Willebrand factor activity: relative effects of ABO blood type and race. J Thromb Haemost 1, 2191-7 (2003).

149. Campos, M. et al. Genetic determinants of plasma von Willebrand factor antigen levels: a target gene SNP and haplotype analysis of ARIC cohort. Blood 117, 5224-30 (2011).

150. Orstavik, K.H. et al. Factor VIII and factor IX in a twin population. Evidence for a major effect of ABO locus on factor VIII level. Am J Hum Genet 37, 89-101 (1985).

151. de Lange, M., Snieder, H., Ariens, R.A., Spector, T.D. & Grant, P.J. The genetics of haemostasis: a twin study. Lancet 357, 101-5 (2001).

178 152. Vossen, C.Y. et al. Heritability of plasma concentrations of clotting factors and measures of a prethrombotic state in a protein C-deficient family. J Thromb Haemost 2, 242-7 (2004).

153. Souto, J.C. et al. Genetic determinants of hemostasis phenotypes in Spanish families. Circulation 101, 1546-51 (2000).

154. Hampshire, D.J. et al. Polymorphic variation within the VWF gene contributes to the failure to detect mutations in patients historically diagnosed with type 1 von Willebrand disease from the MCMDM-1VWD cohort. Haematologica 95, 2163-5 (2010).

155. Daidone, V. et al. An apparently silent nucleotide substitution (c.7056C>T) in the von Willebrand factor gene is responsible for type 1 von Willebrand disease. Haematologica 96, 881-7 (2011).

156. Harvey, P.J., Keightley, A.M., Lam, Y.M., Cameron, C. & Lillicrap, D. A single nucleotide polymorphism at nucleotide -1793 in the von Willebrand factor (VWF) regulatory region is associated with plasma VWF:Ag levels. Br J Haematol 109, 349-53 (2000).

157. Hickson, N. et al. Effect of the VWF promoter (GT)n repeat and single-nucleotide polymorphism c.-2527G>A on circulating von Willebrand factor levels under normal conditions. J Thromb Haemost 9, 603-5 (2011).

158. O'Donnell, J., Boulton, F.E., Manning, R.A. & Laffan, M.A. Amount of H antigen expressed on circulating von Willebrand factor is modified by ABO blood group genotype and is a major determinant of plasma von Willebrand factor antigen levels. Arterioscler Thromb Vasc Biol 22, 335-41 (2002).

159. Bhende, Y.M. et al. A "new" blood group character related to the ABO system. Lancet 1, 903-4 (1952).

160. Sneath, J.S. & Sneath, P.H. Transformation of the Lewis groups of human red cells. Nature 176, 172 (1955).

161. Rouger, P., Riveau, D. & Salmon, C. Detection of the H and I blood group antigens in normal plasma. A comparison with A and i antigens. Vox Sang 37, 78-83 (1979).

162. Orstavik, K.H., Kornstad, L., Reisner, H. & Berg, K. Possible effect of secretor locus on plasma concentration of factor VIII and von Willebrand factor. Blood 73, 990-3 (1989).

163. Oriol, R., Candelier, J.J. & Mollicone, R. Molecular genetics of H. Vox Sang 78 Suppl 2, 105-8 (2000).

164. O'Donnell, J., Boulton, F.E., Manning, R.A. & Laffan, M.A. Genotype at the secretor blood group locus is a determinant of plasma von Willebrand factor level. Br J Haematol 116, 350-6 (2002).

179 165. O'Donnell, J.S., McKinnon, T.A., Crawley, J.T., Lane, D.A. & Laffan, M.A. Bombay phenotype is associated with reduced plasma-VWF levels and an increased susceptibility to ADAMTS13 proteolysis. Blood 106, 1988-91 (2005).

166. Yang, Q., Kathiresan, S., Lin, J.P., Tofler, G.H. & O'Donnell, C.J. Genome-wide association and linkage analyses of hemostatic factors and hematological phenotypes in the Framingham Heart Study. BMC Med Genet 8 Suppl 1, S12 (2007).

167. Smith, N.L. et al. Novel associations of multiple genetic loci with plasma levels of factor VII, factor VIII, and von Willebrand factor: The CHARGE (Cohorts for Heart and Aging Research in Genome Epidemiology) Consortium. Circulation 121, 1382-92 (2010).

168. Antoni, G. et al. Combined analysis of three genome-wide association studies on vWF and FVIII plasma levels. BMC Med Genet 12, 102 (2011).

169. Osler, W. An account of certain organisms occuring in the liquor sanguinis. Proceedings of the Royal Society of London 22, 391-398 (1874).

170. Hartwig, J. & Italiano, J., Jr. The birth of the platelet. J Thromb Haemost 1, 1580- 6 (2003).

171. Italiano, J.E., Jr. & Shivdasani, R.A. Megakaryocytes and beyond: the birth of platelets. J Thromb Haemost 1, 1174-82 (2003).

172. Patel, S.R., Hartwig, J.H. & Italiano, J.E., Jr. The biogenesis of platelets from megakaryocyte proplatelets. J Clin Invest 115, 3348-54 (2005).

173. Ebbe, S. & Stohlman, F., Jr. Megakaryocytopoiesis in the Rat. Blood 26, 20-35 (1965).

174. Kaufman, R.M., Airo, R., Pollack, S. & Crosby, W.H. Circulating megakaryocytes and platelet release in the lung. Blood 26, 720-31 (1965).

175. Harker, L.A. & Finch, C.A. Thrombokinetics in man. J Clin Invest 48, 963-74 (1969).

176. Paulus, J.M. DNA metabolism and development of organelles in guinea-pig megakaryocytes: a combined ultrastructural, autoradiographic and cytophotometric study. Blood 35, 298-311 (1970).

177. Trowbridge, E.A. et al. The origin of platelet count and volume. Clin Phys Physiol Meas 5, 145-70 (1984).

178. Kaushansky, K. The molecular mechanisms that control thrombopoiesis. J Clin Invest 115, 3339-47 (2005).

179. Kuter, D.J. & Rosenberg, R.D. The reciprocal relationship of thrombopoietin (c- Mpl ligand) to changes in the platelet mass during busulfan-induced thrombocytopenia in the rabbit. Blood 85, 2720-30 (1995).

180

180. Radley, J.M. & Scurfield, G. The mechanism of platelet release. Blood 56, 996-9 (1980).

181. Tablin, F., Castro, M. & Leven, R.M. Blood platelet formation in vitro. The role of the cytoskeleton in megakaryocyte fragmentation. J Cell Sci 97 ( Pt 1), 59-70 (1990).

182. Italiano, J.E., Jr., Lecine, P., Shivdasani, R.A. & Hartwig, J.H. Blood platelets are assembled principally at the ends of proplatelet processes produced by differentiated megakaryocytes. J Cell Biol 147, 1299-312 (1999).

183. Bernard, J. & Soulier, J.P. [Not Available]. Sem Hop 24, 3217-23 (1948).

184. Ware, J., Russell, S. & Ruggeri, Z.M. Generation and rescue of a murine model of platelet dysfunction: the Bernard-Soulier syndrome. Proc Natl Acad Sci U S A 97, 2803-8 (2000).

185. Andrews, R.K., Lopez, J.A. & Berndt, M.C. Molecular mechanisms of platelet adhesion and activation. Int J Biochem Cell Biol 29, 91-105 (1997).

186. Uff, S., Clemetson, J.M., Harrison, T., Clemetson, K.J. & Emsley, J. Crystal structure of the platelet glycoprotein Ib(alpha) N-terminal domain reveals an unmasking mechanism for receptor activation. J Biol Chem 277, 35657-63 (2002).

187. Zucker, M.B. & Borrelli, J. Reversible alterations in platelet morphology produced by anticoagulants and by cold. Blood 9, 602-8 (1954).

188. White, J.G. & Krivit, W. Changes in platelet microtubules and granules during early clot development. Thromb Diath Haemorrh Suppl 26, 29-42 (1967).

189. Heijnen, H.F., Schiel, A.E., Fijnheer, R., Geuze, H.J. & Sixma, J.J. Activated platelets release two types of membrane vesicles: microvesicles by surface shedding and exosomes derived from exocytosis of multivesicular bodies and alpha-granules. Blood 94, 3791-9 (1999).

190. Barkalow, K., Witke, W., Kwiatkowski, D.J. & Hartwig, J.H. Coordinated regulation of platelet actin filament barbed ends by gelsolin and capping protein. J Cell Biol 134, 389-99 (1996).

191. Hartwig, J.H. et al. Thrombin receptor ligation and activated Rac uncap actin filament barbed ends through phosphoinositide synthesis in permeabilized human platelets. Cell 82, 643-53 (1995).

192. Reed, G.L. Platelet secretory mechanisms. Semin Thromb Hemost 30, 441-50 (2004).

193. Najean, Y. & Lecompte, T. Hereditary thrombocytopenias in childhood. Semin Thromb Hemost 21, 294-304 (1995).

181 194. Bellucci, S. Megakaryocytes and inherited thrombocytopenias. Baillieres Clin Haematol 10, 149-62 (1997).

195. Nurden, A.T. & Nurden, P. Inherited defects of platelet function. Rev Clin Exp Hematol 5, 314-34; quiz following 431 (2001).

196. Balduini, C.L. & Savoia, A. Inherited thrombocytopenias: molecular mechanisms. Semin Thromb Hemost 30, 513-23 (2004).

197. Nurden, A.T. & Nurden, P. Inherited thrombocytopenias. Haematologica 92, 1158-64 (2007).

198. Salles, II et al. Inherited traits affecting platelet function. Blood Rev 22, 155-72 (2008).

199. Balduini, C.L. & Savoia, A. Genetics of familial forms of thrombocytopenia. Hum Genet 131, 1821-32 (2012).

200. Israels, S.J. & Rand, M.L. What we have learned from inherited platelet disorders. Pediatric Blood & Cancer 60, S2-S7 (2013).

201. Drachman, J.G., Jarvik, G.P. & Mehaffey, M.G. Autosomal dominant thrombocytopenia: incomplete megakaryocyte differentiation and linkage to human chromosome 10. Blood 96, 118-25 (2000).

202. Karim, M.A. et al. Mutations in the Chediak-Higashi syndrome gene (CHS1) indicate requirement for the complete 3801 amino acid CHS protein. Hum Mol Genet 6, 1087-9 (1997).

203. Oh, J. et al. Positional cloning of a gene for Hermansky-Pudlak syndrome, a disorder of cytoplasmic organelles. Nat Genet 14, 300-6 (1996).

204. Bigalke, B., Schuster, A., Sopova, K., Wurster, T. & Stellos, K. Platelets in atherothrombosis--diagnostic and prognostic value of platelet activation in patients with atherosclerotic diseases. Curr Vasc Pharmacol 10, 589-96 (2012).

205. Nikolsky, E. et al. Impact of baseline platelet count in patients undergoing primary percutaneous coronary intervention in acute myocardial infarction (from the CADILLAC trial). Am J Cardiol 99, 1055-61 (2007).

206. Thaulow, E., Erikssen, J., Sandvik, L., Stormorken, H. & Cohn, P.F. Blood platelet count and function are related to total and cardiovascular death in apparently healthy men. Circulation 84, 613-7 (1991).

207. Mayda-Domac, F., Misirli, H. & Yilmaz, M. Prognostic role of mean platelet volume and platelet count in ischemic and hemorrhagic stroke. J Stroke Cerebrovasc Dis 19, 66-72 (2010).

208. van der Bom, J.G. et al. Platelet count and the risk for thrombosis and death in the elderly. J Thromb Haemost 7, 399-405 (2009).

182 209. Gupta, G.P. & Massague, J. Platelets and metastasis revisited: a novel fatty link. J Clin Invest 114, 1691-3 (2004).

210. Sloan, A.W. The Normal Platelet Count in Man. J Clin Pathol 4, 37-46 (1951).

211. Brecher, G., Schneiderman, M. & Cronkite, E.P. The reproducibility and constancy of the platelet count. American Journal of Clinical Pathology 23, 15-26 (1953).

212. Costongs, G.M. et al. Short-term and long-term intra-individual variations and critical differences of haematological laboratory parameters. J Clin Chem Clin Biochem 23, 69-76 (1985).

213. Ross, D.W., Ayscue, L.H., Watson, J. & Bentley, S.A. Stability of hematologic parameters in healthy subjects. Intraindividual versus interindividual variation. American Journal of Clinical Pathology 90, 262-7 (1988).

214. Buckley, M.F. et al. A novel approach to the assessment of variations in the human platelet count. Thromb Haemost 83, 480-4 (2000).

215. Fraser, C.G. et al. Biologic variation of common hematologic laboratory quantities in the elderly. American Journal of Clinical Pathology 92, 465-70 (1989).

216. Dal Colletto, G.M., Fulker, D.W., Barretto, O.C. & Kolya, M. Genetic and environmental effects on blood cells. Acta Genet Med Gemellol (Roma) 42, 245- 52 (1993).

217. Garner, C. et al. Genetic influences on F cells and other hematologic variables: a twin heritability study. Blood 95, 342-6 (2000).

218. O'Donnell, C.J. et al. Genetic and environmental contributions to platelet aggregation: the Framingham heart study. Circulation 103, 3051-6 (2001).

219. Bray, P.F. et al. Heritability of platelet function in families with premature coronary artery disease. J Thromb Haemost 5, 1617-23 (2007).

220. Faraday, N. et al. Heritability of platelet responsiveness to aspirin in activation pathways directly and indirectly related to cyclooxygenase-1. Circulation 115, 2490-6 (2007).

221. Kamatani, Y. et al. Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat Genet 42, 210-5 (2010).

222. Soranzo, N. et al. A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium. Nat Genet 41, 1182-90 (2009).

223. Lo, K.S. et al. Genetic association analysis highlights new loci that modulate hematological trait variation in Caucasians and African Americans. Hum Genet 129, 307-17 (2011).

183 224. Blangero, J., Williams, J.T. & Almasy, L. Novel family-based approaches to genetic risk in thrombosis. J Thromb Haemost 1, 1391-7 (2003).

225. Heutink, P. & Oostra, B.A. Gene finding in genetically isolated populations. Hum Mol Genet 11, 2507-15 (2002).

226. Spiel, A.O., Gilbert, J.C. & Jilma, B. von Willebrand factor in cardiovascular disease: focus on acute coronary syndromes. Circulation 117, 1449-59 (2008).

227. James, P.D. et al. The mutational spectrum of type 1 von Willebrand disease: results from a Canadian cohort study. Blood 109,145-54 (2006).

228. Takeuchi, M., Nagura, H. & Kaneda, T. DDAVP and epinephrine-induced changes in the localization of von Willebrand factor antigen in endothelial cells of human oral mucosa. Blood 72, 850-4 (1988).

229. De Visser, M.C. et al. Linkage analysis of factor VIII and von Willebrand factor loci as quantitative trait loci. J Thromb Haemost 1, 1771-6 (2003).

230. Di Paola, J. et al. Low platelet alpha2beta1 levels in type I von Willebrand disease correlate with impaired platelet function in a high shear stress system. Blood 93, 3578-82 (1999).

231. Kunicki, T.J. et al. An association of candidate gene haplotypes and bleeding severity in von Willebrand disease (VWD) type 1 pedigrees. Blood 104, 2359-67 (2004).

232. Drews, C.D., Dilley, A.B., Lally, C., Beckman, M.G. & Evatt, B. Screening questions to identify women with von Willebrand disease. J Am Med Womens Assoc 57, 217-8 (2002).

233. Rodeghiero, F. et al. The discriminant power of bleeding history for the diagnosis of type 1 von Willebrand disease: an international, multicenter study. J Thromb Haemost 3, 2619-26 (2005).

234. Pearson, S.L. & Hessner, M.J. A(1,2)BO(1,2) genotyping by multiplexed allele- specific PCR. Br J Haematol 100, 229-34 (1998).

235. Ranade, K. et al. High-throughput genotyping with single nucleotide polymorphisms. Genome Res 11, 1262-8 (2001).

236. Dausset, J. et al. Centre d'etude du polymorphisme humain (CEPH): collaborative genetic mapping of the human genome. Genomics 6, 575-7 (1990).

237. Dumas, J.J. et al. Crystal structure of the wild-type von Willebrand factor A1- glycoprotein Ibalpha complex reveals conformation differences with a complex bearing von Willebrand disease mutations. J Biol Chem 279, 23327-34 (2004).

238. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 60, 2126-32 (2004).

184 239. DeLano, W.L. Th PyMOL molecular graphics system. in DeLano Scientific (San Carlos, CA, USA,, 2002).

240. Haberichter, S.L., Jacobi, P. & Montgomery, R.R. Critical independent regions in the VWF propeptide and mature VWF that enable normal VWF storage. Blood 101, 1384-91 (2003).

241. O'Connell, J.R. & Weeks, D.E. PedCheck: a program for identification of genotype incompatibilities in linkage analysis. Am J Hum Genet 63, 259-66 (1998).

242. Barrett, J.C., Fry, B., Maller, J. & Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263-5 (2005).

243. de Bakker, P.I. et al. Efficiency and power in genetic association studies. Nat Genet 37, 1217-23 (2005).

244. Almasy, L. & Blangero, J. Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet 62, 1198-211 (1998).

245. Heath, S.C. Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am J Hum Genet 61, 748-60 (1997).

246. Holm, S. A Simple Sequentially Rejective Multiple Test Procedure. Scandinavian Journal of Statistics 6, 65-70 (1979).

247. Sanger, F., Nicklen, S. & Coulson, A.R. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 74, 5463-7 (1977).

248. Emsley, J., Cruz, M., Handin, R. & Liddington, R. Crystal structure of the von Willebrand Factor A1 domain and implications for the binding of platelet glycoprotein Ib. J Biol Chem 273, 10396-401 (1998).

249. Hillery, C.A. et al. Type 2M von Willebrand disease: F606I and I662F mutations in the glycoprotein Ib binding domain selectively impair ristocetin- but not botrocetin-mediated binding of von Willebrand factor to platelets. Blood 91, 1572- 81 (1998).

250. Keeney, S. & Cumming, A.M. The molecular biology of von Willebrand disease. Clin Lab Haematol 23, 209-30 (2001).

251. Mancuso, D.J. et al. Type 2M:Milwaukee-1 von Willebrand disease: an in-frame deletion in the Cys509-Cys695 loop of the von Willebrand factor A1 domain causes deficient binding of von Willebrand factor to platelets. Blood 88, 2559-68 (1996).

252. Lemmerhirt, H.L., Broman, K.W., Shavit, J.A. & Ginsburg, D. Genetic regulation of plasma von Willebrand factor levels: quantitative trait loci analysis in a mouse model. J Thromb Haemost 5, 329-35 (2007).

185 253. Montpetit, A. et al. An evaluation of the performance of tag SNPs derived from HapMap in a Caucasian population. PLoS Genet 2, e27 (2006).

254. Lundmark, P.E. et al. Evaluation of HapMap data in six populations of European descent. Eur J Hum Genet 16, 1142-50 (2008).

255. Peltonen, L., Palotie, A. & Lange, K. Use of population isolates for mapping complex traits. Nat Rev Genet 1, 182-90 (2000).

256. Van Hout, C.V. et al. Extent and distribution of linkage disequilibrium in the Old Order Amish. Genet Epidemiol 34, 146-50 (2010).

257. Hugot, J.P. et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature 411, 599-603 (2001).

258. Ogura, Y. et al. A frameshift mutation in NOD2 associated with susceptibility to Crohn's disease. Nature 411, 603-6 (2001).

259. Rioux, J.D. et al. Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nat Genet 29, 223-8 (2001).

260. Schafer, A.I. Effects of nonsteroidal anti-inflammatory therapy on platelets. Am J Med 106, 25S-36S (1999).

261. Francomano, C.A., McKusick, V.A. & Biesecker, L.G. Medical genetic studies in the Amish: historical perspective. Am J Med Genet 121C, 1-4 (2003).

262. Strittmatter, W.J. et al. Apolipoprotein E: high-avidity binding to beta-amyloid and increased frequency of type 4 allele in late-onset familial Alzheimer disease. Proc Natl Acad Sci U S A 90, 1977-81 (1993).

263. Hilbert, L., Gaucher, C. & Mazurier, C. Identification of two mutations (Arg611Cys and Arg611His) in the A1 loop of von Willebrand factor (vWF) responsible for type 2 von Willebrand disease with decreased platelet-dependent function of vWF. Blood 86, 1010-8 (1995).

264. Shavit, J.A., Manichaikul, A., Lemmerhirt, H.L., Broman, K.W. & Ginsburg, D. Modifiers of von Willebrand factor identified by natural variation in inbred strains of mice. Blood 114, 5368-74 (2009).

265. van Loon, J.E. et al. Effect of genetic variations in syntaxin-binding protein-5 and syntaxin-2 on von Willebrand factor concentration and cardiovascular risk. Circ Cardiovasc Genet 3, 507-12 (2010).

266. Fujita, Y. et al. Tomosyn: a syntaxin-1-binding protein that forms a novel complex in the neurotransmitter release process. Neuron 20, 905-15 (1998).

267. van Loon, J.E. et al. Effect of genetic variation in STXBP5 and STX2 on von Willebrand factor and bleeding phenotype in type 1 von Willebrand disease patients. PLoS One 7, e40624 (2012).

186 268. Smith, N.L. et al. Genetic variation associated with plasma von Willebrand factor levels and the risk of incident venous thrombosis. Blood 117, 6007-11 (2011).

269. Morange, P.E., Saut, N., Antoni, G., Emmerich, J. & Tregouet, D.A. Impact on venous thrombosis risk of newly discovered gene variants associated with FVIII and VWF plasma levels. J Thromb Haemost 9, 229-31 (2011).

270. Fay, W.P., Shapiro, A.D., Shih, J.L., Schleef, R.R. & Ginsburg, D. Brief report: complete deficiency of plasminogen-activator inhibitor type 1 due to a frame-shift mutation. N Engl J Med 327, 1729-33 (1992).

271. Ketterling, R.P., Bottema, C.D., Koeberl, D.D., Ii, S. & Sommer, S.S. T296----M, a common mutation causing mild hemophilia B in the Amish and others: founder effect, variability in factor IX activity assays, and rapid carrier detection. Hum Genet 87, 333-7 (1991).

272. Bailey-Wilson, J.E. & Wilson, A.F. Linkage analysis in the next-generation sequencing era. Hum Hered 72, 228-36 (2011).

273. Desch, K.C. et al. Linkage analysis identifies a locus for plasma von Willebrand factor undetected by genome-wide association. Proc Natl Acad Sci U S A 110, 588-93 (2013).

274. Stone, N. et al. Bioinformatic and genetic association analysis of microRNA target sites in one-carbon metabolism genes. PLoS One 6, e21851 (2011).

275. Birney, E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799-816 (2007).

276. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9, e1001046 (2011).

277. Cooper, G.M. et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 15, 901-13 (2005).

278. Davydov, E.V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput Biol 6, e1001025 (2010).

279. Felker, G.M. et al. Red cell distribution width as a novel prognostic marker in heart failure: data from the CHARM Program and the Duke Databank. J Am Coll Cardiol 50, 40-7 (2007).

280. Haltmayer, M., Mueller, T., Luft, C., Poelz, W. & Haidinger, D. Erythrocyte mean corpuscular volume associated with severity of peripheral arterial disease: an angiographic evaluation. Ann Vasc Surg 16, 474-9 (2002).

281. Ozsu, S., Abul, Y., Gunaydin, S., Orem, A. & Ozlu, T. Prognostic Value of Red Cell Distribution Width in Patients With Pulmonary Embolism. Clin Appl Thromb Hemost 00, 1-6 (2012).

187 282. Puddu, P.E. et al. Red blood cell count in short-term prediction of cardiovascular disease incidence in the Gubbio population study. Acta Cardiol 57, 177-85 (2002).

283. Tonelli, M. et al. Relation Between Red Blood Cell Distribution Width and Cardiovascular Event Rate in People With Coronary Disease. Circulation 117, 163-168 (2008).

284. Ye, Z., Smith, C. & Kullo, I.J. Usefulness of red cell distribution width to predict mortality in patients with peripheral artery disease. Am J Cardiol 107, 1241-5 (2011).

285. Gagnon, D.R., Zhang, T.J., Brand, F.N. & Kannel, W.B. Hematocrit and the risk of cardiovascular disease--the Framingham study: a 34-year follow-up. Am Heart J 127, 674-82 (1994).

286. Madjid, M., Awan, I., Willerson, J.T. & Casscells, S.W. Leukocyte count and coronary heart disease: implications for risk assessment. J Am Coll Cardiol 44, 1945-56 (2004).

287. Evans, D.M., Frazer, I.H. & Martin, N.G. Genetic and environmental causes of variation in basal levels of blood cells. Twin Res 2, 250-7 (1999).

288. Chalmers, D.M., Levi, A.J., Chanarin, I., North, W.R. & Meade, T.W. Mean cell volume in a working population: the effects of age, smoking, alcohol and oral contraception. Br J Haematol 43, 631-6 (1979).

289. Schwartz, J. & Weiss, S.T. Host and environmental factors influencing the peripheral blood leukocyte count. Am J Epidemiol 134, 1402-9 (1991).

290. Whitehead, T.P., Robinson, D., Allaway, S.L. & Hale, A.C. The effects of cigarette smoking and alcohol consumption on blood haemoglobin, erythrocytes and leucocytes: a dose related study on male subjects. Clin Lab Haematol 17, 131-8 (1995).

291. Yokoyama, Y. & Akiyama, T. Intrapair differences of the blood cell components and lymphocyte subsets in monozygotic and dizygotic twins. Acta Genet Med Gemellol (Roma) 44, 203-14 (1995).

292. Whitfield, J.B. & Martin, N.G. Genetic and environmental influences on the size and number of cells in the blood. Genet Epidemiol 2, 133-44 (1985).

293. Walters, M.C. & Abelson, H.T. Interpretation of the complete blood count. Pediatr Clin North Am 43, 599-622 (1996).

294. Lin, J.P., O'Donnell, C.J., Levy, D. & Cupples, L.A. Evidence for a gene influencing haematocrit on chromosome 6q23-24: genomewide scan in the Framingham Heart Study. J Med Genet 42, 75-9 (2005).

295. Iliadou, A. et al. Genomewide scans of red cell indices suggest linkage on chromosome 6q23. J Med Genet 44, 24-30 (2007).

188 296. Ganesh, S.K. et al. Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. Nat Genet 41, 1191-8 (2009).

297. Kullo, I.J., Ding, K., Jouni, H., Smith, C.Y. & Chute, C.G. A genome-wide association study of red blood cell traits using the electronic medical record. PLoS One 5(2010).

298. Benyamin, B. et al. Variants in TF and HFE explain approximately 40% of genetic variation in serum-transferrin levels. Am J Hum Genet 84, 60-5 (2009).

299. Danesh, J., Collins, R., Peto, R. & Lowe, G.D. Haematocrit, viscosity, erythrocyte sedimentation rate: meta-analyses of prospective studies of coronary heart disease. Eur Heart J 21, 515-20 (2000).

300. Froom, P. Blood viscosity and the risk of death from coronary heart disease. Eur Heart J 21, 513-4 (2000).

301. Kabeche, L. & Compton, D.A. Checkpoint-independent stabilization of kinetochore-microtubule attachments by Mad2 in human cells. Curr Biol 22, 638- 44 (2012).

302. Wang, X., Song, Y., Ren, J. & Qu, X. Knocking-down cyclin A(2) by siRNA suppresses apoptosis and switches differentiation pathways in K562 cells upon administration with doxorubicin. PLoS One 4, e6665 (2009).

303. Williams, J.H. et al. HLS7, a hemopoietic lineage switch gene homologous to the leukemia-inducing gene MLF1. EMBO J 18, 5559-66 (1999).

304. Drakesmith, H. et al. The hemochromatosis protein HFE inhibits iron export from macrophages. Proc Natl Acad Sci U S A 99, 15602-7 (2002).

305. Feder, J.N. et al. The hemochromatosis gene product complexes with the transferrin receptor and lowers its affinity for ligand binding. Proc Natl Acad Sci U S A 95, 1472-7 (1998).

306. Feder, J.N. et al. A novel MHC class I-like gene is mutated in patients with hereditary haemochromatosis. Nat Genet 13, 399-408 (1996).

307. Ajioka, R.S. et al. Haplotype analysis of hemochromatosis: evaluation of different linkage-disequilibrium approaches and evolution of disease chromosomes. Am J Hum Genet 60, 1439-47 (1997).

308. Nairz, M. et al. Absence of functional Hfe protects mice from invasive Salmonella enterica serovar Typhimurium infection via induction of lipocalin-2. Blood 114, 3642-51 (2009).

309. Allen, K.J. et al. Iron-overload-related disease in HFE hereditary hemochromatosis. N Engl J Med 358, 221-30 (2008).

189 310. Andersen, R.V., Tybjaerg-Hansen, A., Appleyard, M., Birgens, H. & Nordestgaard, B.G. Hemochromatosis mutations in the general population: iron overload progression rate. Blood 103, 2914-9 (2004).

311. Schwarz, K. et al. Mutations affecting the secretory COPII coat component SEC23B cause congenital dyserythropoietic anemia type II. Nat Genet 41, 936- 40 (2009).

312. Milet, J. et al. Common variants in the BMP2, BMP4, and HJV genes of the hepcidin regulation pathway modulate HFE hemochromatosis penetrance. Am J Hum Genet 81, 799-807 (2007).

313. Tomari, K., Kumagai, T., Shimizu, T. & Takeda, K. Bone morphogenetic protein-2 induces hypophosphorylation of Rb protein and repression of E2F in androgen- treated LNCaP human prostate cancer cells. Int J Mol Med 15, 253-8 (2005).

314. Verweij, C.L. et al. Construction of cDNA coding for human von Willebrand factor using antibody probes for colony-screening and mapping of the chromosomal gene. Nucleic Acids Res 13, 4699-717 (1985).

315. Jaffe, E.A., Hoyer, L.W. & Nachman, R.L. Synthesis of antihemophilic factor antigen by cultured human endothelial cells. J Clin Invest 52, 2757-64 (1973).

316. Nachman, R., Levine, R. & Jaffe, E.A. Synthesis of factor VIII antigen by cultured guinea pig megakaryocytes. J Clin Invest 60, 914-21 (1977).

317. Lethagen, S., Carlson, M. & Hillarp, A. A comparative in vitro evaluation of six von Willebrand factor concentrates. Haemophilia 10, 243-9 (2004).

318. Schneppenheim, R. & Budde, U. von Willebrand factor: the complex molecular genetics of a multidomain and multifunctional protein. J Thromb Haemost 9 Suppl 1, 209-15 (2011).

319. Abildgaard, C.F., Suzuki, Z., Harrison, J., Jefcoat, K. & Zimmerman, T.S. Serial studies in von Willebrand's disease: variability versus "variants". Blood 56, 712-6 (1980).

320. Chen, J.M. et al. N-acetylcysteine reduces the size and activity of von Willebrand factor in human plasma and mice. Journal of Clinical Investigation 121, 593-603 (2011).

321. Feig, M., Im, W. & Brooks, C.L. Implicit solvation based on generalized Born theory in different dielectric environments. Journal of Chemical Physics 120, 903- 911 (2004).

322. Momany, F.A. & Rone, R. Validation of the General-Purpose Quanta(R)3.2/Charmm(R) Force-Field. Journal of Computational Chemistry 13, 888-900 (1992).

190 323. Chen, R. & Weng, Z.P. Docking unbound proteins using shape complementarity, desolvation, and electrostatics. Proteins-Structure Function and Genetics 47, 281-294 (2002).

324. Pierce, B. & Weng, Z.P. ZRANK: Reranking protein docking predictions with an optimized energy function. Proteins-Structure Function and Bioinformatics 67, 1078-1086 (2007).

325. Brooks, B.R. et al. CHARMM: The Biomolecular Simulation Program. Journal of Computational Chemistry 30, 1545-1614 (2009).

326. MacKerell, A.D. et al. All-atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B 102, 3586-3616 (1998).

327. Phillips, J.C. et al. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry 26, 1781-1802 (2005).

328. Darden, T., York, D. & Pedersen, L. Particle Mesh Ewald - an N.Log(N) Method for Ewald Sums in Large Systems. Journal of Chemical Physics 98, 10089-10092 (1993).

329. Gao, J., Kuczera, K., Tidor, B. & Karplus, M. Hidden Thermodynamics of Mutant Proteins - a Molecular-Dynamics Analysis. Science 244, 1069-1072 (1989).

330. Pearlman, D.A. A Comparison of Alternative Approaches to Free-Energy Calculations. Journal of Physical Chemistry 98, 1487-1493 (1994).

331. Haberichter, S.L. et al. Re-establishment of VWF-dependent Weibel-Palade bodies in VWD endothelial cells. Blood 105, 145-52 (2005).

332. Hulstein, J.J. et al. A novel nanobody that detects the gain-of-function phenotype of von Willebrand factor in ADAMTS13 deficiency and von Willebrand disease type 2B. Blood 106, 3035-42 (2005).

333. Groot, E., de Groot, P.G., Fijnheer, R. & Lenting, P.J. The presence of active von Willebrand factor under various pathological conditions. Curr Opin Hematol 14, 284-9 (2007).

334. Othman, M. & Favaloro, E.J. Genetics of type 2B von Willebrand disease: "true 2B," "tricky 2B," or "not 2B." What are the modifiers of the phenotype? Semin Thromb Hemost 34, 520-31 (2008).

335. Hickson, N. et al. von Willebrand factor variant p.Arg924Gln marks an allele associated with reduced von Willebrand factor and factor VIII levels. J Thromb Haemost 8, 1986-93 (2010).

336. Siguret, V. et al. Characterization of recombinant von Willebrand factors mutated on cysteine 509 or 695. Thromb Haemost 76, 453-9 (1996).

191 337. Rosenberg, J.B. et al. The role of the D1 domain of the von Willebrand factor propeptide in multimerization of VWF. Blood 100, 1699-706 (2002).

338. Bodo, I. et al. Type 1 von Willebrand disease mutation Cys1149Arg causes intracellular retention and degradation of heterodimers: a possible general mechanism for dominant mutations of oligomeric proteins. Blood 98, 2973-9 (2001).

339. Lambert, M.P. & Poncz, M. Inherited Thrombocytopenias. in Platelets (ed. Michelson, A.D.) 985-998 (Academic Press, Burlington, MA, 2007).

340. Cramer, E.M., Vainchenker, W., Vinci, G., Guichard, J. & Breton-Gorius, J. Gray platelet syndrome: immunoelectron microscopic localization of fibrinogen and von Willebrand factor in platelets and megakaryocytes. Blood 66, 1309-16 (1985).

341. Levy-Toledano, S. et al. Gray platelet syndrome: alpha-granule deficiency. Its influence on platelet function. J Lab Clin Med 98, 831-48 (1981).

342. Nurden, A.T. & Nurden, P. The gray platelet syndrome: clinical spectrum of the disease. Blood Rev 21, 21-36 (2007).

343. Raccuglia, G. Gray platelet syndrome. A variety of qualitative platelet disorder. Am J Med 51, 818-28 (1971).

344. White, J.G. Ultrastructural studies of the gray platelet syndrome. Am J Pathol 95, 445-62 (1979).

345. Balduini, C.L., De Candia, E. & Savoia, A. Why the disorder induced by GATA1 Arg216Gln mutation should be called "X-linked thrombocytopenia with thalassemia" rather than "X-linked gray platelet syndrome". Blood 110, 2770-1; author reply 2771 (2007).

346. Mori, K., Suzuki, S., Akutsu, Y., Ishikawa, M. & Sakai, H. Gray platelet syndrome: relationship between morphological abnormality of the dense tubular system (DTS) and intracellular Ca++ mobilization in the platelet. Nippon Ketsueki Gakkai Zasshi 52, 1534-41 (1989).

347. Tubman, V.N. et al. X-linked gray platelet syndrome due to a GATA1 Arg216Gln mutation. Blood 109, 3297-9 (2007).

348. Gunay-Aygun, M. et al. Gray platelet syndrome: natural history of a large patient cohort and locus assignment to chromosome 3p. Blood 116, 4990-5001 (2010).

349. Jantunen, E. Inherited giant platelet disorders. Eur J Haematol 53, 191-6 (1994).

350. Jantunen, E., Hanninen, A., Naukkarinen, A., Vornanen, M. & Lahtinen, R. Gray platelet syndrome with splenomegaly and signs of extramedullary hematopoiesis: a case report with review of the literature. Am J Hematol 46, 218-24 (1994).

351. Blair, P. & Flaumenhaft, R. Platelet alpha-granules: basic biology and clinical correlates. Blood Rev 23, 177-89 (2009).

192 352. Coppinger, J.A. et al. Characterization of the proteins released from activated platelets leads to localization of novel platelet proteins in human atherosclerotic lesions. Blood 103, 2096-104 (2004).

353. Harrison, P. & Cramer, E.M. Platelet alpha-granules. Blood Rev 7, 52-62 (1993).

354. Breton-Gorius, J., Vainchenker, W., Nurden, A., Levy-Toledano, S. & Caen, J. Defective alpha-granule production in megakaryocytes from gray platelet syndrome: ultrastructural studies of bone marrow cells and megakaryocytes growing in culture from blood precursors. Am J Pathol 102, 10-9 (1981).

355. Gerrard, J.M. et al. Biochemical studies of two patients with the gray platelet syndrome. Selective deficiency of platelet alpha granules. J Clin Invest 66, 102-9 (1980).

356. Licht, C. et al. Platelet-associated complement factor H in healthy persons and patients with atypical HUS. Blood 114, 4538-45 (2009).

357. Kruglyak, L., Daly, M.J., Reeve-Daly, M.P. & Lander, E.S. Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58, 1347-63 (1996).

358. McCarroll, S.A. et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 40, 1166-74 (2008).

359. Rowley, J.W. et al. Genome wide RNA-seq analysis of human and mouse platelet transcriptomes. Blood (2011).

360. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory- efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25 (2009).

361. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-9 (2009).

362. Flicek, P. et al. Ensembl 2011. Nucleic Acids Res 39, D800-6 (2011).

363. Xiong, X. et al. PhyloPro: a web-based tool for the generation and visualization of phylogenetic profiles across Eukarya. Bioinformatics 27, 877-8 (2011).

364. Finn, R.D. et al. The Pfam protein families database. Nucleic Acids Res 38, D211-22 (2010).

365. Edgar, R.C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).

366. Keane, T.M., Creevey, C.J., Pentony, M.M., Naughton, T.J. & McLnerney, J.O. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol 6, 29 (2006).

193 367. Tamura, K. et al. MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol Evol 10, 2731-9 (2011).

368. Guindon, S. & Gascuel, O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52, 696-704 (2003).

369. Chiang, A.P. et al. Homozygosity mapping with SNP arrays identifies TRIM32, an E3 ubiquitin ligase, as a Bardet-Biedl syndrome gene (BBS11). Proc Natl Acad Sci U S A 103, 6287-92 (2006).

370. Walsh, T. et al. Whole exome sequencing and homozygosity mapping identify mutation in the cell polarity protein GPSM2 as the cause of nonsyndromic hearing loss DFNB82. Am J Hum Genet 87, 90-4 (2010).

371. Kahr, W.H. et al. Mutations in NBEAL2, encoding a BEACH protein, cause gray platelet syndrome. Nat Genet 43, 738-40 (2011).

372. Fabbro, S. et al. Homozygosity mapping with SNP arrays confirms 3p21 as a recessive locus for gray platelet syndrome and narrows the interval significantly. Blood 117, 3430-4 (2011).

373. Hediger, M.A. et al. The ABCs of solute carriers: physiological, pathological and therapeutic implications of human membrane transport proteinsIntroduction. Pflugers Arch 447, 465-8 (2004).

374. Hentze, M.W. & Kulozik, A.E. A perfect message: RNA surveillance and nonsense-mediated decay. Cell 96, 307-10 (1999).

375. Albers, C.A. et al. Exome sequencing identifies NBEAL2 as the causative gene for gray platelet syndrome. Nat Genet 43, 735-7 (2011).

376. Gunay-Aygun, M. et al. NBEAL2 is mutated in gray platelet syndrome and is required for biogenesis of platelet alpha-granules. Nat Genet 43, 732-4 (2011).

377. Barbosa, M.D. et al. Identification of the homologous beige and Chediak-Higashi syndrome genes. Nature 382, 262-5 (1996).

378. Certain, S. et al. Protein truncation test of LYST reveals heterogenous mutations in patients with Chediak-Higashi syndrome. Blood 95, 979-83 (2000).

379. Introne, W.J., Westbroek, W., Golas, G.A. & Adams, D. Chediak-Higashi Syndrome. (1993).

380. Kaplan, J., De Domenico, I. & Ward, D.M. Chediak-Higashi syndrome. Curr Opin Hematol 15, 22-9 (2008).

381. Nagle, D.L. et al. Identification and mutation analysis of the complete gene for Chediak-Higashi syndrome. Nat Genet 14, 307-11 (1996).

194 382. Burgess, A., Mornon, J.P., de Saint-Basile, G. & Callebaut, I. A concanavalin A- like lectin domain in the CHS1/LYST protein, shared by members of the BEACH family. Bioinformatics 25, 1219-22 (2009).

383. Lemmon, M.A. Membrane recognition by phospholipid-binding domains. Nat Rev Mol Cell Biol 9, 99-111 (2008).

384. Musacchio, A., Gibson, T., Rice, P., Thompson, J. & Saraste, M. The PH domain: a common piece in the structural patchwork of signalling proteins. Trends Biochem Sci 18, 343-8 (1993).

385. Smith, T.F., Gaitatzes, C., Saxena, K. & Neer, E.J. The WD repeat: a common architecture for diverse functions. Trends Biochem Sci 24, 181-5 (1999).

386. Burkhardt, J.K., Wiebel, F.A., Hester, S. & Argon, Y. The giant organelles in beige and Chediak-Higashi fibroblasts are derived from late endosomes and mature lysosomes. J Exp Med 178, 1845-56 (1993).

387. Zhao, H. et al. On the analysis of the pathophysiology of Chediak-Higashi syndrome. Defects expressed by cultured melanocytes. Lab Invest 71, 25-34 (1994).

388. Shiflett, S.L., Kaplan, J. & Ward, D.M. Chediak-Higashi Syndrome: a rare disorder of lysosomes and lysosome related organelles. Pigment Cell Res 15, 251-7 (2002).

389. Khodosh, R., Augsburger, A., Schwarz, T.L. & Garrity, P.A. Bchs, a BEACH domain protein, antagonizes Rab11 in synapse morphogenesis and other developmental events. Development 133, 4655-65 (2006).

390. Kypri, E., Schmauch, C., Maniak, M. & De Lozanne, A. The BEACH protein LvsB is localized on lysosomes and postlysosomes and limits their fusion with early endosomes. Traffic 8, 774-83 (2007).

391. Shiflett, S.L., Vaughn, M.B., Huynh, D., Kaplan, J. & Ward, D.M. Bph1p, the Saccharomyces cerevisiae homologue of CHS1/beige, functions in cell wall formation and protein sorting. Traffic 5, 700-10 (2004).

392. Bithell, T.C., Didisheim, P., Cartwright, G.E. & Wintrobe, M.M. Thrombocytopenia Inherited as an Autosomal Dominant Trait. Blood 25, 231-40 (1965).

393. Najean, Y. & Lecompte, T. Genetic thrombocytopenia with autosomal dominant transmission: a review of 54 cases. Br J Haematol 74, 203-8 (1990).

394. Fabris, F. et al. Chronic isolated macrothrombocytopenia with autosomal dominant transmission: a morphological and qualitative platelet disorder. Eur J Haematol 58, 40-5 (1997).

395. Savoia, A. et al. An autosomal dominant thrombocytopenia gene maps to chromosomal region 10p. Am J Hum Genet 65, 1401-5 (1999).

195 396. Gandhi, M.J., Cummings, C.L. & Drachman, J.G. FLJ14813 missense mutation: a candidate for autosomal dominant thrombocytopenia on human chromosome 10. Hum Hered 55, 66-70 (2003).

397. Punzo, F. et al. A mutation in the acyl-coenzyme A binding domain-containing protein 5 gene (ACBD5 ) identified in autosomal dominant thrombocytopenia. J Thromb Haemost 8, 2085-7 (2010).

398. Pippucci, T. et al. Mutations in the 5' UTR of ANKRD26, the ankirin repeat domain 26 gene, cause an autosomal-dominant form of inherited thrombocytopenia, THC2. Am J Hum Genet 88, 115-20 (2011).

399. Noris, P. et al. Mutations in ANKRD26 are responsible for a frequent form of inherited thrombocytopenia: analysis of 78 patients from 21 families. Blood 117, 6673-80 (2011).

400. Lindner, T.H. & Hoffmann, K. easyLINKAGE: a PERL script for easy and automated two-/multi-point linkage analyses. Bioinformatics 21, 405-7 (2005).

401. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38, e164 (2010).

402. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31, 3406-15 (2003).

403. Abecasis, G.R., Cherny, S.S., Cookson, W.O. & Cardon, L.R. Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30, 97- 101 (2002).

404. Johnson, H.J. et al. In vivo inactivation of MASTL kinase results in thrombocytopenia. Exp Hematol 37, 901-8 (2009).

405. Hahn, Y., Bera, T.K., Pastan, I.H. & Lee, B. Duplication and extensive remodeling shaped POTE family genes encoding proteins containing ankyrin repeat and coiled coil domains. Gene 366, 238-45 (2006).

406. Mosavi, L.K., Cammett, T.J., Desrosiers, D.C. & Peng, Z.Y. The ankyrin repeat as molecular architecture for protein recognition. Protein Sci 13, 1435-48 (2004).

407. Bera, T.K. et al. A model for obesity and gigantism due to disruption of the Ankrd26 gene. Proc Natl Acad Sci U S A 105, 270-5 (2008).

408. Lee, Y. et al. Evolution and expression of chimeric POTE-actin genes in the human genome. Proc Natl Acad Sci U S A 103, 17885-90 (2006).

409. Liu, X.F., Bera, T.K., Liu, L.J. & Pastan, I. A primate-specific POTE-actin fusion protein plays a role in apoptosis. Apoptosis 14, 1237-44 (2009).

196 410. Morison, I.M. et al. A mutation of human cytochrome c enhances the intrinsic apoptotic pathway but causes only thrombocytopenia. Nat Genet 40, 387-9 (2008).

411. Gauss, K.A. et al. Variants of the 5'-untranslated region of human NCF2: expression and translational efficiency. Gene 366, 169-79 (2006).

412. Sean, P., Nguyen, J.H. & Semler, B.L. Altered interactions between stem-loop IV within the 5' noncoding region of coxsackievirus RNA and poly(rC) binding protein 2: effects on IRES-mediated translation and viral infectivity. Virology 389, 45-58 (2009).

413. Cai, L., Fritz, D., Stefanovic, L. & Stefanovic, B. Binding of LARP6 to the conserved 5' stem-loop regulates translation of mRNAs encoding type I collagen. J Mol Biol 395, 309-26 (2010).

414. Coppotelli, G., Summers, A., Chidakel, A., Ross, J.M. & Celi, F.S. Functional characterization of the 258 A/G (D2-ORFa-Gly3Asp) human type-2 deiodinase polymorphism: a naturally occurring variant increases the enzymatic activity by removing a putative repressor site in the 5' UTR of the gene. Thyroid 16, 625-32 (2006).

415. Halvorsen, M., Martin, J.S., Broadaway, S. & Laederach, A. Disease-associated mutations that alter the RNA structural ensemble. PLoS Genet 6, e1001074 (2010).

416. Plank, T.D. & Kieft, J.S. The structures of nonprotein-coding RNAs that drive internal ribosome entry site function. Wiley Interdiscip Rev RNA 3, 195-212 (2012).

417. El Awady, M.K. et al. Positional effect of mutations in 5'UTR of hepatitis C virus 4a on patients' response to therapy. World J Gastroenterol 15, 1480-6 (2009).

418. Macaulay, I.C. et al. Comparative gene expression profiling of in vitro differentiated megakaryocytes and erythroblasts identifies novel activatory and inhibitory platelet membrane proteins. Blood 109, 3260-9 (2007).

419. Sood, R. et al. Efficient Methods for Targeted Mutagenesis in Zebrafish Using Zinc-Finger Nucleases: Data from Targeting of Nine Genes Using CompoZr or CoDA ZFNs. PLoS One 8, e57239 (2013).

197 APPENDIX A

VWF Gene and Pseudogene Alignment for Primer Design

Homo sapiens chromosome 22 genomic contig, GRCh37.p2 reference primary assembly Length=3661581

Query: VWF sequence Subject: Pseudogene sequence Sort alignments for this subject sequence by: E value Score Percent identity Query start position Subject start position Features flanking this part of subject sequence: 89813 bp at 5' side: putative T-complex protein 1 subunit theta-like 2 87433 bp at 3' side: XK-related protein 3

Score = 2.252e+04 bits (12193), Expect = 0.0 Identities = 13368/13919 (96%), Gaps = 146/13919 (1%) Strand=Plus/Minus Score = 1.562e+04 bits (8457), Expect = 0.0 Identities = 9022/9293 (97%), Gaps = 45/9293 (0%) Strand=Plus/Minus

Query 98631 AGCAAGATAGAATACATTTTTTCTCTTTGTCTCAACATCAGGGCTCTAAGTATATTGTAA 98690 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 338498 AGCAAGATAGAATACATTTTTTCTCTTTGTCTCAACATCAGGGCTCTAAGTATATTGTAA 338439

Query 98691 AAGTATAGCACCTTTTCTTCTCCAGGCCAAATATTTCCAGttttttctgtag---ttttt 98747 |||||||||||||||||||||||||||||||||||||||||||||||||||| ||||| Sbjct 338438 AAGTATAGCACCTTTTCTTCTCCAGGCCAAATATTTCCAGTTTTTTCTGTAGTTTTTTTT 338379

Query 98748 ttgttttattattatttttagttgacatttaactgtatgtatttactggtgacagtgtga 98807 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 338378 TTGTTTTATTATTATTTTTAGTTGACATTTAACTGTATGTATTTACTGGTGACAGTGTGA 338319

Query 98808 tatttcagtacatgtatacagtgtgtagtgatcaaatcagggtagttactatatctgttc 98867 ||||||| ||||||||||||||||||| |||||||||||||||||||| ||||||| ||| Sbjct 338318 TATTTCAATACATGTATACAGTGTGTAATGATCAAATCAGGGTAGTTAGTATATCTCTTC 338259

Query 98868 cctcaaacatttaccatttctttgtgttgggaacattcaaagtctgctcttccagctact 98927 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 338258 CCTCAAACATTTACCATTTCTTTGTGTTGGGAACATTCAAAGTCTGCTCTTCCAGCTACT 338199

Query 98928 tgaaaatatacaataaactgttgataattatagtcaccctgtggtgctaccgaacactag 98987 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct 338198 TGAAAATATACAATAAACTGTTGATAATTATAGTCACCCTGTGGTGCTACAGAACACTAG 338139

Query 98988 aacttatccctcctctttgtgtagttttCACTTGTTTCTCACTCATTCGTAACTACTTTT 99047 |||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||| Sbjct 338138 AACTTATCCCTCCTCTTTGTGTAGTTTTCACTTGTTTCTCACTCATTCATAACTACTTTT 338079

Query 99048 TCAGAACTTGTTGCATGTTGCACTCCATGATGGGTGAACATGGAGACAGGTGAGACATAT 99107 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 338078 TCAGAACTTGTTGCATGTTGCACTCCATGATGGGTGAACATGGAGACAGGTGAGACATAT 338019

Query 99108 CTTCCTGATTACATTTATGAGAGAGCAGAATTCTGCCCAGCCTACTCTGAACAGGCTGCA 99167 ||||||| |||||||||||| |||||||||||||||||||||| |||||||||||||| Sbjct 338018 CTTCCTGGTTACATTTATGATAGAGCAGAATTCTGCCCAGCCTGGTCTGAACAGGCTGCG 337959

198

Query 99168 CTTCTCCAGGCAAGGGGACTCACCTTCCCAGAGGCAGGCAGTTCTATAACTTTCCCTCCT 99227 |||||||||||||||||||||||| |||||||||||||||||||||||| ||||||||| Sbjct 337958 GTTCTCCAGGCAAGGGGACTCACCTGCCCAGAGGCAGGCAGTTCTATAACGTTCCCTCCT 337899

Query 99228 TGGACCCCTACACCTCCATTGGTGGAGCAGTGTACCTGGCTCCTCTCTCTGCTTCTATGC 99287 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| || Sbjct 337898 TGGACCCCTACACCTCCATTGGTGGAGCAGTGTACCTGGCTCCTCTCTCTGCTTCTACGC 337839

Query 99288 TTGTTGGGGACAGTGATAGTTCCCACCAGTGATCTCAGGGCCAAGGCTGCCTGATTCCCA 99347 ||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||| Sbjct 337838 TTGTTGGGGACAGCGATAGTTCCCACCAGTGATCTCAGGGCCAAGGCTGCCTGATTCCCA 337779

Query 99348 CCTCTGCCCTTGGCTGACTATGTGACATGGGCATGTTGCCTCTC--TGTTTCCATAGCTT 99405 | |||||||||||||||||||||||||||||||||||||||||| |||||||||||||| Sbjct 337778 CTTCTGCCCTTGGCTGACTATGTGACATGGGCATGTTGCCTCTCTTTGTTTCCATAGCTT 337719

Query 99406 TAAATAAAATGGGGCCAGCAAGGAAGCTCAGGAAT-GGGTCTTGGCAATGGCAAGGCTTT 99464 |||||||||||||||||||||||||||||||||| ||| |||||||||||||||||||| Sbjct 337718 CAAATAAAATGGGGCCAGCAAGGAAGCTCAGGAATAGGG-CTTGGCAATGGCAAGGCTTT 337660

Query 99465 GCTGCTCAcctcgggcctcctctgagtctctgtcccgctcctcctcctcttcctcgaatg 99524 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct 337659 GCTGCTCACCTCGGGCCTCCTCTGAGTCTCTGTCCCGCTCCTCCTCCTCTCCCTCGAATG 337600

Query 99525 ccctctgcctccATTGCCGCCAGGAATGTTCCCCTTTCCCCTGAGCCGGAGAGCATGCTC 99584 |||||||||||||||||||||||||| |||||||||||||||||||| |||||||||| | Sbjct 337599 CCCTCTGCCTCCATTGCCGCCAGGAACGTTCCCCTTTCCCCTGAGCCAGAGAGCATGCCC 337540

Query 99585 CTGGGCTTGACGGTGCTCATCCCTCAACTTGTCTCTCAAGGAGAAAGTGTGTGGCCTGTG 99644 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 337539 CTGGGCTTGACGGTGCTCATCCCTCAACTTGTCTCTCAAGGAGAAAGTGTGTGGCCTGTG 337480

Query 99645 TGGGAATTTTGATGGCATCCAGAACAATGACCTCACCAGCAGCAACCTCCAAGTGGAGGA 99704 |||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||| Sbjct 337479 TGGGAATTTTGATGGCATCCAGAACAGTGACCTCACCAGCAGCAACCTCCAAGTGGAGGA 337420

Query 99705 AGACCCTGTGGACTTTGGGAACTCCTGGAAAGTGAGCTCGCAGTGTGCTGACACCAGAAA 99764 |||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||| Sbjct 337419 AGACCCTGTGGACTTTGGGAACTCCTGGAAAGTGAGCTCGCAGTCTGCTGACACCAGAAA 337360

Query 99765 AGTACGTCTGGGTCTCTGTGTGGACAGAGCCCTAGAGCTTGCTTCCTGGAATGTCCCTCT 99824 | |||||||||||||||||||||||||||||| ||||||||||||||||||||||||||| Sbjct 337359 ACTACGTCTGGGTCTCTGTGTGGACAGAGCCCGAGAGCTTGCTTCCTGGAATGTCCCTCT 337300

Query 99825 GTCCCCATTGTCATGGGGGCTGGAAGGGGGGTTGTGGGTGGTATGACCTCCAGGTGGCTG 99884 ||||||||| | ||||||||||||||||||||||||||||||||||||| |||||||||| Sbjct 337299 GTCCCCATTCTTATGGGGGCTGGAAGGGGGGTTGTGGGTGGTATGACCTTCAGGTGGCTG 337240

Query 99885 CAGGGTGGGAAGGAGGGTCTCTTGGATCCTTCTGGGCTGAATAACCCCAGTTTGACCAGC 99944 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 337239 CAGGGTGGGAAGGAGGGTCTCTTGGATCCTTCTGGGCTGAATAACCCCAGTTTGACCAGC 337180

Query 99945 TGACGGCTGGCCTATCTCTTGCCTGGTTCCCAGGTGCCTCTGGACTCATCCCCTGCCACC 100004 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 337179 TGACGGCTGGCCTATCTCTTGCCTGGTTCCCAGGTGCCTCTGGACTCATCCCCTGCCACC 337120

Query 100005 TGCCATAACAACATCATGAAGCAGACGATGGTGGATTCCTCCTGTAGAATCCTTACCAGT 100064 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 337119 TGCCATAACAACATCATGAAGCAGACGATGGTGGATTCCTCCTGTAGAATCCTTACCAGT 337060

Query 100065 GACGTCTTCCAGGACTGCAACAAGCTGGTGAGGACCTTGAGGGTAGTGGGAAGCAGACGG 100124 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| || Sbjct 337059 GACGTCTTCCAGGACTGCAACAAGCTGGTGAGGACCTTGAGGGTAGTGGGAAGCAGATGG 337000

Query 100125 TCCCAAGGCTTGGCCTGGTGGTATGGACACAGAGTGTGACCTTCTAACGTGGACACTACC 100184 ||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||| Sbjct 336999 TCCCAAGGCTTGGCCCAGTGGTATGGACACAGAGTGTGACCTTCTAACGTGGACACTACC 336940

Query 100185 CTCGTGTCTTGACATGATCTGCACCAAGACACCACTTCGGC-tttttttCTTGGCTTTCA 100243 || |||||||||||||||||||||||||||||||||| || |||||||||||||||||| Sbjct 336939 CTTGTGTCTTGACATGATCTGCACCAAGACACCACTTTAGCTTTTTTTTCTTGGCTTTCA 336880

199

Query 100244 ATCTGGGAAACAAAAAGTAAAATCAACAGTTTCTAGGGGAAGCAATGCCTGGCAAAACAT 100303 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 336879 ATCTGGGAAACAAAAAGTAAAATCAACAGTTTCTAGGGGAAGCAATGCCTGGCAAAACAT 336820

Query 100304 TTCCTTCTGCATGAGAAGTAACTCCCCTTGGCATGTGCCAATGCTTCTCTTTCAGCCCCA 100363 ||||||||||||||||||||||||||||||||||||||||| | ||||||||| |||||| Sbjct 336819 TTCCTTCTGCATGAGAAGTAACTCCCCTTGGCATGTGCCAACGTTTCTCTTTCGGCCCCA 336760

Query 100364 GTCTTAGGATTTGTTCTCTTATTGAAGTATCTTGTTTTCAACACCAGAGCCAGAGATTTC 100423 |||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||| Sbjct 336759 GTCTTAGGATTTGTTCTCTTATGGAAGTATCTTGTTTTCAACACCAGAGCCAGAGATTTC 336700

Query 100424 CTTTTCCTGTCACTGCTGCATTTGTCCAGACCAAAAGACCTTCCTCTCCCACCCCCT-AA 100482 |||||||||||||||||||||||||||||||||||||||| |||||||||| | ||| || Sbjct 336699 CTTTTCCTGTCACTGCTGCATTTGTCCAGACCAAAAGACCGTCCTCTCCCA-CTCCTCAA 336641

Query 100483 AACCCCTTGGTGCCCATTTCTTGTCTCACAGAAATTCTTTTCTGGCCTTAATTTTGGTGA 100542 |||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct 336640 AACCCCTTGGTGCCCATTTCTTGTCTCACAGAAATTCTTTTCTGGCCTTACTTTTGGTGA 336581

Query 100543 TTTTGAGTCCTCGTATTATGACTTATTTTTGTGTCTTCATCTCTAATGACAAGGAGGAAT 100602 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 336580 TTTTGAGTCCTCGTATTATGACTTATTTTTGTGTCTTCATCTCTAATGACAAGGAGGAAC 336521

Query 100603 TCCGCAAGGTTAAGTGCAGttgtctctctcttttttttttttttttgagacagagtctcg 100662 ||| || ||||||||||||||| |||||| || ||||||||||||||||||||||| Sbjct 336520 TCCCCAGGGTTAAGTGCAGTTG--TCTCTC----TTCTTTTTTTTTGAGACAGAGTCTCG 336467

Query 100663 ctctgtcacccaggctggagtgcaatggcatgatcttggcttagtgcagcctctgcctcc 100722 ||||||||| |||||||||||||||||||||||||||||||||||||||||||||| ||| Sbjct 336466 CTCTGTCACTCAGGCTGGAGTGCAATGGCATGATCTTGGCTTAGTGCAGCCTCTGCTTCC 336407

Query 100723 tgggctcaagcgattctcctgcctcaacctcccgagtacctgggattacaggcacctgcc 100782 ||| ||||||||||||||||||||||||||||||||| ||||||||||||| ||| ||| Sbjct 336406 CGGGTTCAAGCGATTCTCCTGCCTCAACCTCCCGAGTAGCTGGGATTACAGGTACCCGCC 336347

Query 100783 accacgcccagctaatttttgtatttttagtagagtcaaggtttcaccatgttggtcagg 100842 ||||| ||||||||||||||||||||||||||||| | ||||||||||| |||||||||| Sbjct 336346 ACCACACCCAGCTAATTTTTGTATTTTTAGTAGAGACGAGGTTTCACCACGTTGGTCAGG 336287

Query 100843 ctggccttgaactcctga-ctcaggtgatctacccacctcg-gcctcccaaagtgctggg 100900 ||||||| |||||||||| |||||||||||||||||||| | |||||||||||||||||| Sbjct 336286 CTGGCCTCGAACTCCTGACCTCAGGTGATCTACCCACCT-GAGCCTCCCAAAGTGCTGGG 336228

Query 100901 attacaggtgtgagccaccatgcccgaccATAGTGCAGTTGTCTCTTATCTGCTGCATGT 100960 ||||||||||| |||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct 336227 ATTACAGGTGTAAGCCACCATGCCCGACCATAGTGCGGTTGTCTCTTATCTGCTGCATGT 336168

Query 100961 ATGTGTGTACACACATGTGGACATGGGCTTGTACATGGACATATGTGACTACAGAAGTCT 101020 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 336167 ATGTGTGTACACACATGTGGACATGGGCTTGTACATGGACATATGTGACTACAGAAGTCT 336108

Query 101021 GTGT--ACACACACAGCCATGCACACATACATGCTGCTGTGTAACACCACCAATGTGGGA 101078 ||| ||||||||||||||||||||||||||| ||||||||| |||||||||||||||| Sbjct 336107 ATGTACACACACACAGCCATGCACACATACATGTTGCTGTGTAGCACCACCAATGTGGGA 336048

Query 101079 GAGACTGGTTGAAAACATGGATCTCAATTCTCTTTCTATCTATGCAGTGATTTTCGGTCT 101138 ||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| Sbjct 336047 GAGACTGGTTGAAAACATGGATCTCAATTCTCTTTCTATCTATGCAGTGATTTTCAGTCT 335988

Query 101139 CTGAGGACTCCAAAGGATACTTACATTCCCTGGTTTGGTGGAAATCCTGGGCATCTGAGT 101198 |||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||| Sbjct 335987 CTGAGGACTCCAAAGGATACTTACATTCCC-GGTTTGGTGGAAATCCTGGGCATCTGAGT 335929

Query 101199 TGGAAGAGTGAGGACAGGGGAGGAGTTGGGGACATTGAGACTGTTGGAACGTCTTGGAAA 101258 ||||||||||||||||||||||||| ||||||||||| |||||||||||||||||||||| Sbjct 335928 TGGAAGAGTGAGGACAGGGGAGGAGCTGGGGACATTGGGACTGTTGGAACGTCTTGGAAA 335869

Query 101259 CAATGACCCACTCAGTGTCTGAATTCATTTTCTGTCATAACTGCCCTGAAAAAGTCCAGT 101318 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 335868 CAATGACCCACTCAGTGTCTGAATTCATTTTCTGTCATAACTGCCCTGAAAAAGTCCAGT 335809

200

Query 101319 GTGTTTAGAGGCGTGTTTTGGGGATGAGGAAGGGTTGGAACTGGTTGAACTGGATTTGGA 101378 ||||||||||||||||||| ||||||||||||||| |||||||||||||||||||||||| Sbjct 335808 GTGTTTAGAGGCGTGTTTT-GGGATGAGGAAGGGTGGGAACTGGTTGAACTGGATTTGGA 335750

Query 101379 ATCAGAGTCTAGGCCCTATTGTCCTGCATACCTGCCCCATAGCACTGCAGTGAGGCAGCT 101438 |||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||| Sbjct 335749 ATCAGAGTCTAGGCCCTATTGTTCTGCATACCTGCCCCATAGCACTGCAGTGAGGCAGCT 335690

Query 101439 GCAGGGCCTGATGTATTTCCCCATTCT-CTTTGCTGAATTGAGGCAAAGAAAGACAGTGA 101497 ||||||| ||||||||||||||||||| | |||||||||||||||||||||||||||||| Sbjct 335689 GCAGGGCTTGATGTATTTCCCCATTCTGC-TTGCTGAATTGAGGCAAAGAAAGACAGTGA 335631

Query 101498 CCAGCACATATGTGTGTTTGTGTTTTTGTAAAAGCACCCACATGCTCATGAGGCTAAGAG 101557 ||||||||||||||||||||||||||||||||| |||||||||||||||||||| ||||| Sbjct 335630 CCAGCACATATGTGTGTTTGTGTTTTTGTAAAAACACCCACATGCTCATGAGGCCAAGAG 335571

Query 101558 TGGGTTGTGAGGACAGATGGGTGGCTGAGCAGGGAGGTAGGCAGAGGGACAGAGGGAATG 101617 ||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||| Sbjct 335570 TGGGTTGTGAGGACAGATGGGTGGCTGAGCAGGGAGGCAGGCAGAGGGACAGAGGGAATG 335511

Query 101618 TTCTTCTGGAAAATCCTCAGGCTCATTGTGTTCTGCAGAAGGCCAGCAGCACTGCATTAT 101677 ||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 335510 TTCTTCTGGAATATCCTCAGGCTCATTGTGTTCTGCAGAAGGCCAGCAGCACTGCATTAT 335451

Query 101678 TCAACTCTTCTTGCTGGAATGCAGATTAGAAACTAAGAATCTTGCCTTCCCACTCATTCC 101737 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 335450 TCAACTCTTCTTGCTGGAATGCAGATTAGAAACTAAGAATCTTGCCTTCCCACTCATTCC 335391

Query 101738 CTCTTTGAGACCATTGAGCTACATTTCTCCTTCTACCTGGA-CCCCCTTATCCTTAAATT 101796 ||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||| Sbjct 335390 CTCTTTGAGACCATTGAGCTACATTTCTCCTTCTACCTGGACCCCCCTTATCCTTAAATT 335331

Query 101797 GACCATCAGAACATTTGCACCCAGACTAAGAGCCAGAGTTCCTGACACCTGGCCATAGGC 101856 |||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||| Sbjct 335330 GACCATCAGAACATTTGCACCCAGACTAAGAGCCAGAGTTTCTGACACCTGGCCATAGGC 335271

Query 101857 CTGGGCCACCTGAGGCTGCCTTTGCAGGTGGACCCCGAGCCATATCTGGATGTCTGCATT 101916 |||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||| Sbjct 335270 CTGGGCCACCTGAGGCTGCCTTTGCAGGTGGACCCCGAGCCATACCTGGATGTCTGCATT 335211

Query 101917 TACGACACCTGCTCCTGTGAGTCCATTGGGGACTGCGCCTGCTTCTGCGACACCATTGCT 101976 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 335210 TACGACACCTGCTCCTGTGAGTCCATTGGGGACTGCGCCTGCTTCTGCGACACCATTGCT 335151

Query 101977 GCCTATGCCCACGTGTGTGCCCAGCATGGCAAGGTGGTGACCTGGAGGACGGCCACATTG 102036 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 335150 GCCTATGCCCACGTGTGTGCCCAGCATGGCAAGGTGGTGACCTGGAGGACGGCCACATTG 335091

Query 102037 TGCCGTGAGTACTGACGCCCTCATGTTCTCAGATGCCCTCCCTTCTTCCCATGTGTCTAT 102096 |||| ||||| ||||| ||||||||||||||||||||||||||||||||||||||||||| Sbjct 335090 TGCCATGAGTCCTGACACCCTCATGTTCTCAGATGCCCTCCCTTCTTCCCATGTGTCTAT 335031

Query 102097 GCTTGAAGACCTTGTGAGTGCAGGGGGATATCTTCATGGGCGAGAGAAGACAGAGTGTTA 102156 ||||||||||||||||||||||||||||||||||||||||||||||||||| |||| || Sbjct 335030 GCTTGAAGACCTTGTGAGTGCAGGGGGATATCTTCATGGGCGAGAGAAGAC--AGTGCTA 334973

Query 102157 GTA-GGGACTGGATGGCCAAGGGCTAAGGAGGGTTAAGACATTGGCTGTGTAAGAAGTTT 102215 ||| ||||||||||||||||||||||| |||||||||||| ||||||| ||||||||||| Sbjct 334972 GTATGGGACTGGATGGCCAAGGGCTAACGAGGGTTAAGACGTTGGCTGAGTAAGAAGTTT 334913

Query 102216 ATATTACGG---GTGAGGTGGGACATGGATTCAAGGCATGAACATGTGGAGACTTTTCTT 102272 |||||| || ||||| ||||||||||||||||||||||||||||||| ||||||||| Sbjct 334912 ATATTATGGACTTTGAGGCGGGACATGGATTCAAGGCATGAACATGTGGATACTTTTCTT 334853

Query 102273 CTGGAGAGATTCTGGCAGGGGAGAAGAGGGAATACTGATGAAAAGAAGGAAGTCGATTTA 102332 |||||||||||||||||| |||||||||||||||||||||||||||||||||| |||||| Sbjct 334852 CTGGAGAGATTCTGGCAGAGGAGAAGAGGGAATACTGATGAAAAGAAGGAAGTTGATTTA 334793

Query 102333 TGTCTTTAATTAGGCGTGATTATATTTGCCAATATGAGTGATCAACTCATACATTCATGT 102392 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 334792 TGTCTTTAATTAGGCGTGATTATATTTGCCAATATGAGTGATCAACTCATACATTCATGT 334733

201

Query 102393 CTATAGAATCTTGCTTCTTTGGACAGAAGCAACTTAATGTTTTTATGtagaaaactgggc 102452 |||||||||||| |||||||||||| ||||||||||||| |||||||||||||||||||| Sbjct 334732 CTATAGAATCTTCCTTCTTTGGACATAAGCAACTTAATGCTTTTATGTAGAAAACTGGGC 334673

Query 102453 cgggcacagtgactcatacctgtaatccctgtgttttgggaggctgaggccagaggattc 102512 | |||||||||||||| |||||| || ||||||||||||||||||||||||||||||||| Sbjct 334672 CAGGCACAGTGACTCACACCTGTGATACCTGTGTTTTGGGAGGCTGAGGCCAGAGGATTC 334613

Query 102513 cttgagcccaggagttcaagaccagcctgggcaacatagcaagaccccatctgtacaaaa 102572 |||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||| Sbjct 334612 CTTGAGCCCAGGAGTTCAAGACCAGCCTGGGCAACATAGCAAGACCCCATCTCTACAAAA 334553

Query 102573 attaaaaaaagaaaTGAATTCAGAACCAATAGATTCTGGTTTAGGTGCTTCAACAATCCA 102632 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 334552 ATTAAAAAAAGAAATGAATTCAGAACCAATAGATTCTGGTTTAGGTGCTTCAACAATCCA 334493

Query 102633 GAAGTCTCTAATATTGGTGACGCCCATAGTCCCCTAGTTCCCCAACATTATCTCCAGATG 102692 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 334492 GAAGTCTCTAATATTGGTGACGCCCATAGTCCCCTAGTTCCCCAACATTATCTCCAGATG 334433

Query 102693 GCGCAGGCCATCACCACATGGGTCTGCAGTCCTGGAGGCTTTGCCTGTTGTGGCCACAGC 102752 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 334432 GCGCAGGCCATCACCACATGGGTCTGCAGTCCTGGAGGCTTTGCCTGTTGTGGCCACAGC 334373

Query 102753 CTTGTCTCCTGTCTACACAGCCCAGAGCTGCGAGGAGAGGAATCTCCGGGAGAACGGGTA 102812 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 334372 CTTGTCTCCTGTCTACACAGCCCAGAGCTGCGAGGAGAGGAATCTCCGGGAGAACGGGTA 334313

Query 102813 TGAGTGTGAGTGGCGCTATAACAGCTGTGCACCTGCCTGTCAAGTCACGTGTCAGCACCC 102872 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 334312 TGAGTGTGAGTGGCGCTATAACAGCTGTGCACCTGCCTGTCAAGTCACGTGTCAGCACCC 334253

Query 102873 TGAGCCACTGGCCTGCCCTGTGCAGTGTGTGGAGGGCTGCCATGCCCACTGCCCTCCAGG 102932 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 334252 TGAGCCACTGGCCTGCCCTGTGCAGTGTGTGGAGGGCTGCCATGCCCACTGCCCTCCAGG 334193

Query 102933 TGAGGCCTCTATCCCTGGGGGTCAGGCTGGTGGGATGGGATAGGGATGGATGGAAAGGTG 102992 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 334192 TGAGGCCTCTATCCCTGGGGGTCAGGCTGGTGGGATGGGATAGGGATGGATGGAAAGGTG 334133

Query 102993 CTTCTAGGTCTTGCTTCATCTCAGCCTCCACCTGCCACGTCCTATCTCTGACCTGCAAGG 103052 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 334132 CTTCTAGGTCTTGCTTCATCTCAGCCTCCACCTGCCACGTCCTATCTCTGACCTGCAAGG 334073

Query 103053 CTGCTGCAGGTTCCGTGGGTTCTTTCATCAGAGTCAGGACAGTCGTGATTTTTCTCAAGT 103112 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 334072 CTGCTGCAGGTTCCGTGGGTTCTTTCATCAGAGTCAGGACAGTCGTGATTTTTCTCAAGT 334013

Query 103113 CGAGCTCCTCCAAAATGCTTTTCTGTGCCTATTTATGGGATTCTCACCTAAAGCAGCCCC 103172 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 334012 CGAGCTCCTCCAAAATGCTTTTCTGTGCCTATTTATGGGATTCTCACCTAAAGCAGCCCC 333953

Query 103173 TGCCGATAGAACTTTCTGCAGTGGGGGAATGTTGTATTGAATGTaactgtgacgagtgtg 103232 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 333952 TGCCGATAGAACTTTCTGCAGTGGGGGAATGTTGTATTGAATGTAACTGTGACGAGTGTG 333893

Query 103233 tctgaggaactgaagttttgaattttattgaattttacttaatttaacgtaaatagccac 103292 |||||||||||||| ||||||||||||||||||||||||||||||||| ||||||||||| Sbjct 333892 TCTGAGGAACTGAATTTTTGAATTTTATTGAATTTTACTTAATTTAACTTAAATAGCCAC 333833

Query 103293 acgcagtttgtgacgatcctatggtacagcacagCTCTAGAGTTACTGAAGGTGTTATTC 103352 | ||||||| |||| |||||||||||||||||||||| |||||||||||||||||||||| Sbjct 333832 ATGCAGTTTATGACTATCCTATGGTACAGCACAGCTCGAGAGTTACTGAAGGTGTTATTC 333773

Query 103353 AGAAGAGCAGAAAGAGCCCCGGAGATAAGATTTCATTTGTCCTGAGGCTTGGGGAGGTGA 103412 ||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||| Sbjct 333772 AGAAGAGCAGAAAGAGCCCTGGAGATAAGATTTCATTTGTCCTGAGGCTTGGGGAGGTGA 333713

Query 103413 GGTAGGGTGAAGGAATCCCCGCTCCCAGTTTTGCAGAGGGATCAATCAAGGCACAAGCAG 103472 |||||| |||||||||||||||||||||||||||||||||||||||||||||||||| || Sbjct 333712 GGTAGGATGAAGGAATCCCCGCTCCCAGTTTTGCAGAGGGATCAATCAAGGCACAAGGAG 333653

202

Query 103473 GAGAGATGCTCCTTGAGTGATGGGGTGACCCCT-GGGAGTGCAGGCAGGAGGAGTTGGCT 103531 ||||||||||||||||||| ||||||||||| ||| |||||||||||||||||||||| Sbjct 333652 GAGAGATGCTCCTTGAGTGGCAGGGTGACCCCTAGGG-GTGCAGGCAGGAGGAGTTGGCT 333594

Query 103532 TCTAGGGCAGGAGGAGGAGTTGGCTCCTCCCTTTTAGTTAAAAATGAGGCTTCCTCGTGG 103591 |||||||||||||||||||||||||||||||||||||||||||||||| |||||| |||| Sbjct 333593 TCTAGGGCAGGAGGAGGAGTTGGCTCCTCCCTTTTAGTTAAAAATGAGACTTCCTTGTGG 333534

Query 103592 GAAAGGGGAGCGTTTTGGTTCCTAATGAGAGCTTTCTTTTGCAGGGAAAATCCTGGATGA 103651 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 333533 GAAAGGGGAGCGTTTTGGTTCCTAATGAGAGCTTTCTTTTGCAGGGAAAATCCTGGATGA 333474

Query 103652 GCTTTTGCAGACCTGCGTTGACCCTGAAGACTGTCCAGTGTGTGAGGTGGCTGGCCGGCG 103711 ||||||||||||||||||||||||||||||||| || |||||||||||||||||| |||| Sbjct 333473 GCTTTTGCAGACCTGCGTTGACCCTGAAGACTGCCCGGTGTGTGAGGTGGCTGGCTGGCG 333414

Query 103712 TTTTGCCTCAGGAAAGAAAGTCACCTTGAATCCCAGTGACCCTGAGCACTGCCAGATTTG 103771 |||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 333413 TTTTGCCTCAAGAAAGAAAGTCACCTTGAATCCCAGTGACCCTGAGCACTGCCAGATTTG 333354

Query 103772 GTAAAACAGATTCCTGGGTTGTTTGAAGTGATGAATCTTATTGCTTCTCCAGGGGCCAGC 103831 ||||||||||| ||||| ||||||||||||| |||||||||||||||||||||||||||| Sbjct 333353 GTAAAACAGATGCCTGGATTGTTTGAAGTGACGAATCTTATTGCTTCTCCAGGGGCCAGC 333294

Query 103832 TTAGAGACTAGGTTTTGGGTAAAAAGTCCTAGCCACATGAGTTGAGAGACTGAgtttatt 103891 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 333293 TCAGAGACTAGGTTTTGGGTAAAAAGTCCTAGCCACATGAGTTGAGAGACTGAGTTTATT 333234

Query 103892 ttatttgtttatttatttaattcattatttaatgtagtttattgttttgtttgttCCTAG 103951 ||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||| Sbjct 333233 TTATTTGTTTATTTATTTAATTCATTATTTAGTGTAGTTTATTGTTTTGTTTGTTCCTAG 333174

Query 103952 GGTTTGCTTCTTTTTCTTTAGGGATGGACTAGACTGCTTCCTCCACATATATTCCTCCAC 104011 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| || Sbjct 333173 GGTTTGCTTCTTTTTCTTTAGGGATGGACTAGACTGCTTCCTCCACATATATTCCTCTAC 333114

Query 104012 AGTGTTTTTGCTCTTAATAGGTTTTTAAGTCCTAACAGGTTTTTGCTTTTCTCACTGAAG 104071 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 333113 AATGTTTTTGCTCTTAATAGGTTTTTAAGTCCTAACAGGTTTTTGCTTTTCTCACTGAAG 333054

Query 104072 GTGGGGGTATGGTCACTTATTGTCCTTGTAGGTCATGACTAGTCTGAGCACAGGTGTCCA 104131 ||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 333053 GTGAGGGTATGGTCACTTATTGTCCTTGTAGGTCATGACTAGTCTGAGCACAGGTGTCCA 332994

Query 104132 TATACTGGAGGTGGGGACTCAGGAGAGGAAAGTGAGATGGAACCCAGGGCCCTCAGG-TA 104190 |||||||||||||| |||||||||||||||||||||||||||| | ||||||||||| || Sbjct 332993 TATACTGGAGGTGGCGACTCAGGAGAGGAAAGTGAGATGGAACGCGGGGCCCTCAGGGTA 332934

Query 104191 ACATGAACGTCGTGGGGGCCATAGTCAAGGTGGCAGGGTGTGGCTGTCATAATGGGGAGG 104250 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 332933 ACATGAACGTCGTGGGGGCCATAGTCAAGGTGGCAGGGTGTGGCTGTCATAATGGGGAGG 332874

Query 104251 GAAGGTGCTCGTTGCTCTCACCCTGGCTTCTCACTGGACTTGCCTCTGGGAAACTGGCCT 104310 |||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||| Sbjct 332873 GAAGGTGCTCGTTGCTCTCACCCTGGGTTCTCACTGGACTTGCCTCTGGGAAACTGGCCT 332814

Query 104311 TTGGGGAGGTGAGAAGGAGTCCTCAGGGTGAGCTCTGAGCCCTCTGAAGTCCTGCTCATA 104370 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 332813 TTGGGGAGGTGAGAAGGAGTCCTCAGGGTGAGCTCTGAGCCCTCTGAAGTCCTGCTCATA 332754

Query 104371 AGTAGGGAGCATTTATTAAAATCCAGATCTATTCAACACCACTCCCTGACACTAATCAAG 104430 |||||||| |||||||||||||||| ||||||||||||||||||||||||||| |||| | Sbjct 332753 AGTAGGGAACATTTATTAAAATCCACATCTATTCAACACCACTCCCTGACACTGATCAGG 332694

Query 104431 CCTTCTGGGAGCTTAAACTTGAGAACGTAAATTCGCTGGGGAGGGGCTATGGTCACAGCT 104490 ||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||| Sbjct 332693 CCTTCTGGGAGCTTAAACTTGAGAATGTAAATTCGCTGGGGAGGGGCTATGGTCACAGCT 332634

Query 104491 GCTAGAGTCAGATCCTGCCGCCACCATCTGTACAAACAAGAATGAACACTGCCTGGGGTG 104550 || |||||||||||||||| |||||||||| ||||||||||| ||||||||||||||||| Sbjct 332633 GCCAGAGTCAGATCCTGCCACCACCATCTGCACAAACAAGAACGAACACTGCCTGGGGTG 332574

203

Query 104551 AGTTTTCTAGCAAACACAGGAAACTAGAAAGTATGTCACTCATGCACTGTTAACCAAGGA 104610 |||||||||||||||||||||||||||||||||||||||||| |||| |||||||||||| Sbjct 332573 AGTTTTCTAGCAAACACAGGAAACTAGAAAGTATGTCACTCACGCACCGTTAACCAAGGA 332514

Query 104611 TTGAAGCAATAAATCTTGTGGATCACGGGAAGGAGGAATTAACTTCCTAGTAAGACTCTA 104670 |||||||||||||||| ||||||| |||||||||||||||||||||||||||||||||| Sbjct 332513 TTGAAGCAATAAATCTCATGGATCATGGGAAGGAGGAATTAACTTCCTAGTAAGACTCTA 332454

Query 104671 CCTGCGCACACTCTTTATGCTGAAACAGAGACTGTAGGGAGCAAAGGACTGGGTGATTCT 104730 |||||||||||||||||||||||||||| ||||||||||||||||||| ||||||||||| Sbjct 332453 CCTGCGCACACTCTTTATGCTGAAACAGGGACTGTAGGGAGCAAAGGATTGGGTGATTCT 332394

Query 104731 GATGCTGGCTTGGGGTTCTGTGCCAAAAGGGCATGAGAGCTTCGAGCCCTCTCCCCACCT 104790 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 332393 GATGCTGGCTTGGGGTTCTGTGCCAAAAGGGCATGAGAGCTTCGAGCCCTCTCCCCACCT 332334

Query 104791 CCTTTGCTGGCACACGGGCAGGGGCAGCTGCCACGGTGCACGCTGAAGCTGCAAGCAGTG 104850 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 332333 CCTTTGCTGGCACACGGGCAGGGGCAGCTGCCACGGTGCACGCTGAAGCTGCAAGCAGTG 332274

Query 104851 TGTGGACAGAAGTCCCACAGCAGGCTGGTGGCATCACCTGGTATCAAGCCGCAGGGGGAC 104910 |||||||||||||||||||||||||||| ||||||||||||||||||||| ||||||||| Sbjct 332273 TGTGGACAGAAGTCCCACAGCAGGCTGGGGGCATCACCTGGTATCAAGCCACAGGGGGAC 332214

Query 104911 ACTTCCCAGGCTCAGGGCGCTGCCGCCTTGTCAGCTTTACTGGGAGTGCTGAAGGCCAGG 104970 |||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||| Sbjct 332213 ACTTCCCAGGCTCAGGGCACTGCCGCCTTGTCAGCTTTACTGGGAGTGCTGAAGGCCAGG 332154

Query 104971 TGCCTCCAAGGCTGAGAGTCCAAAGAGGAGATGGTCCTGGCCCATATTGGTCTCCACCAC 105030 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 332153 TGCCTCCAAGGCTGAGAGTCCAAAGAGGAGATGGTCCTGGCCCATATTGGTCTCCACCAC 332094

Query 105031 ACGGTCACTTGATTTCACCTGTGGGCTGGCAGGCCGTGCCACTTTCCGGGACACTGGCAA 105090 | || ||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 332093 ATGGCCACTTGATTTCACCTGTGGGCTGGCAGGCCGTGCCACTTTCCGGGACACTGGCAA 332034

Query 105091 AAAGGTGATGTTACCAGCAGAGCCTGGAAATTTGGCATGGTAGTGTAGGAATTTGTGTGT 105150 ||||||||||||||||||||||||| |||||||||||| ||||||||||||||||||||| Sbjct 332033 AAAGGTGATGTTACCAGCAGAGCCTCGAAATTTGGCATAGTAGTGTAGGAATTTGTGTGT 331974

Query 105151 GGGGGGCCACTGCTCTACTACAGACTTGTGGTGTGTGCTTGGGGCTGGGCACAGAGCCTG 105210 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 331973 GTGGGGCCACTGCTCTACTACAGACTTGTGGTGTGTGCTTGGGGCTGGGCACAGAGCCTG 331914

Query 105211 GCCAGTACCTGGCACATGAGAGGTGCTCACCCTATGTCTGCTGGAGGAATGAGTGAATGT 105270 |||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 331913 GCCAATACCTGGCACATGAGAGGTGCTCACCCTATGTCTGCTGGAGGAATGAGTGAATGT 331854

Query 105271 CTTTTCCTAGGCCTGACACCGTGGAGACCCCTGTCTGCTACTCCCAAATAATTTGTTCTG 105330 |||||| ||||||| ||||| ||| ||||||||||||||||||||||||||||||||||| Sbjct 331853 CTTTTCTTAGGCCTAACACCTTGGGGACCCCTGTCTGCTACTCCCAAATAATTTGTTCTG 331794

Query 105331 CTAGATAAAGCTGGGGTTTTGAATCTTTATTCATCTTTCTACAATGTTTTTGCTCCAATT 105390 ||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||| Sbjct 331793 CTAGATAAAGCTGGGGTTTTGAATCTTTATTCATCTTTCCACAATGTTTTTGCTCCAATT 331734

Query 105391 TTTTCTCTCTTCGTTAGGGCCCAACATATCAAACATGTTTTAAAACTCATGCAATTTACA 105450 ||||||||||||||||||||||||| ||| |||||||||||||||||||||||||||||| Sbjct 331733 TTTTCTCTCTTCGTTAGGGCCCAACGTATTAAACATGTTTTAAAACTCATGCAATTTACA 331674

Query 105451 TTCTGTTAGAATATGTACTGTAAAGCATTTATCTTCAACAGAACATGTACAAACACAGGT 105510 |||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||| Sbjct 331673 TTCTGTTAGAATATGTACTGTAAAGCATTTATCTTCAACAGAACATGTACAAGCACAGGT 331614

Query 105511 TAGCGTCTACATTTTAGTAACATTTGTTCTTGGAGACACTTGTAAGAAGGCTTGGATTAT 105570 ||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||| Sbjct 331613 TAGCGTCTACATTTTAGTAACATTTGTTCTTGGAGACACTTGTAAGAAGGCTTAGATTAT 331554

Query 105571 AATTAAGTATTCATGTTGCTAAAGATGTAGGTTGAAAAGAAATCAGCTTTGAGTGGTTGT 105630 |||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| Sbjct 331553 AATTAAGTATTCATGTTGCTAAAGATGTAGGTTGAAAAGAAATCAGCTTTGAGTGGCTGT 331494

204

Query 105631 GTGGATGGCCTAGAACAACGAGTATATTTTTTCATATCCTAGGATGGGCTAAAGCTTTGG 105690 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 331493 GTGGATGGCCTAGAACAACGAGTATATTTTTTCATATCCTAGGATGGGCTAAAGCTTTGG 331434

Query 105691 TTTCTAGAATCAAAGGAAGTCGGCTATGTGTGTGTTTTGAAGGTGGGGGG-CATGCTATT 105749 |||||||||||||||||||||||||||||||||||||||||| ||||||| ||||||||| Sbjct 331433 TTTCTAGAATCAAAGGAAGTCGGCTATGTGTGTGTTTTGAAG-TGGGGGGGCATGCTATT 331375

Query 105750 TGGGGACAGATGTTAAACAATGACATCTCACTTGGATGTGGAATGGTCCATGGGATCTCA 105809 ||||||||||||||||||| ||||||||||||||| ||||||||||||| |||||||||| Sbjct 331374 TGGGGACAGATGTTAAACAGTGACATCTCACTTGGGTGTGGAATGGTCCGTGGGATCTCA 331315

Query 105810 AGTTCAGGTGGAACAGAGGAGATTCTGTGGGAATATGGAAGTCATTG-TACACTGTAGGG 105868 |||||||||||||||||||||||||||||||||||||||||| | | |||||||||||| Sbjct 331314 AGTTCAGGTGGAACAGAGGAGATTCTGTGGGAATATGGAAGT-AGTCATACACTGTAGGG 331256

Query 105869 CTCAGAAGTGTCCACAGGTTCTTCCTGAACCATTTTAATTTCTTCGCTCTTTTCTGCAGC 105928 |||||||||||||| |||||||| |||||||||||||||||||||||||||||||||||| Sbjct 331255 CTCAGAAGTGTCCATAGGTTCTTTCTGAACCATTTTAATTTCTTCGCTCTTTTCTGCAGC 331196

Query 105929 CACTGTGATGTTGTCAACCTCACCTGTGAAGCCTGCCAGGAGCCGGGAGGCCTGGTGGTG 105988 |||||||||| ||||| |||||||||||||||||||||||||||||||||||||||||| Sbjct 331195 CACTGTGATGGTGTCACCCTCACCTGTGAAGCCTGCCAGGAGCCGGGAGGCCTGGTGGTA 331136

Query 105989 CCTCCCACAGATGCCCCGGTGAGCCCCACCACTCTGTATGTGGAGGACATCTCGGAACCG 106048 ||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||| Sbjct 331135 CCTCCCACAGATGCCCCGGTGAGCCCCACCACTCTGTATGTGGAGGACATCTCAGAACCG 331076

Query 106049 CCGTTGCACGATTTCTACTGCAGCAGGCTACTGGACCTGGTCTTCCTGCTGGATGGCTCC 106108 | ||||||||||||||||||||||||||||||||||||| |||||||||||||||||||| Sbjct 331075 CTGTTGCACGATTTCTACTGCAGCAGGCTACTGGACCTGATCTTCCTGCTGGATGGCTCC 331016

Query 106109 TCCAGGCTGTCCGAGGCTGAGTTTGAAGTGCTGAAGGCCTTTGTGGTGGACATGATGGAG 106168 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 331015 TCCAGGCTGTCCGAGGCTGAGTTTGAAGTGCTGAAGGCCTTTGTGGTGGACATGATGGAG 330956

Query 106169 CGGCTGCGCATCTCCCAGAAGTGGGTCCGCGTGGCCGTGGTGGAGTACCACGACGGCTCC 106228 ||||||||||||||| ||||||||||||||||||| |||||||||||||||||||||||| Sbjct 330955 CGGCTGCGCATCTCCTAGAAGTGGGTCCGCGTGGCTGTGGTGGAGTACCACGACGGCTCC 330896

Query 106229 CACGCCTACATCGGGCTCAAGGACCGGAAGCGACCGTCAGAGCTGCGGCGCATTGCCAGC 106288 ||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||| Sbjct 330895 CACGCCTACATCGGGCTCAAGGACCGGAAGCGACCGTCAGAGCTGCGGCGCGTTGCCAGC 330836

Query 106289 CAGGTGAAGTATGCGGGCAGCCAGGTGGCCTCCACCAGCGAGGTCTTGAAATACACACTG 106348 ||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||| Sbjct 330835 CAGGTGAAGTATGCGGGCAGCCAGGTGGCCTCCACCAGCGAGGCCTTGAAATACACACTG 330776

Query 106349 TTCCAAATCTTCAGCAAGATCGACCGCCCTGAAGCCTCCCGCATCACCCTGCTCCTGATG 106408 ||||||||| ||||||||||||||||||||||||||| ||||||| |||||||||||||| Sbjct 330775 TTCCAAATCATCAGCAAGATCGACCGCCCTGAAGCCTTCCGCATCGCCCTGCTCCTGATG 330716

Query 106409 GCCAGCCAGGAGCCCCAACGGATGTCCCGGAACTTTGTCCGCTACGTCCAGGGCCTGAAG 106468 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 330715 GCCAGCCAGGAGCCCCAACGGATGTCCCGGAACTTTGTCCGCTACGTCCAGGGCCTGAAG 330656

Query 106469 AAGAAGAAGGTCATTGTGATCCCGGTGGGCATTGGGCCCCATGCCAACCTCAAGCAGATC 106528 |||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||| Sbjct 330655 AAGAAGAAGGTCATCGTGATCCCGGTGGGCATTGGGCCCCATGCCAACCTCAAGCAGATC 330596

Query 106529 CGCCTCATCGAGAAGCAGGCCCCTGAGAACAAGGCCTTCGTGCTGAGCAGTGTGGATGAG 106588 |||||||| ||||||||||||||||||||||||||||||||||||||| ||||||||||| Sbjct 330595 CGCCTCATTGAGAAGCAGGCCCCTGAGAACAAGGCCTTCGTGCTGAGCGGTGTGGATGAG 330536

Query 106589 CTGGAGCAGCAAAGGGACGAGATCGTTAGCTACCTCTGTGACCTTGCCCCTGAAGCCCCT 106648 ||||||||||||||||| ||||||||||||||||||||||||||||||||||| |||||| Sbjct 330535 CTGGAGCAGCAAAGGGAAGAGATCGTTAGCTACCTCTGTGACCTTGCCCCTGACGCCCCT 330476

Query 106649 CCTCCTACTCTGCCCCCCGACATGGCACAAGTCACTGTGGGCCCGGGGCTCTTGGGGGTT 106708 |||||||||||||||||| |||||||||||||||||||||||||||||||||||||| || Sbjct 330475 CCTCCTACTCTGCCCCCCCACATGGCACAAGTCACTGTGGGCCCGGGGCTCTTGGGGCTT 330416

205

Query 106709 TCGACCCTGGGGCCCAAGAGGAACTCCATGGTTCTGGATGTGGCGTTCGTCCTGGAAGGA 106768 |||||||||||||||||||||||||||||||||||||||||||| || |||||||||||| Sbjct 330415 TCGACCCTGGGGCCCAAGAGGAACTCCATGGTTCTGGATGTGGCATTTGTCCTGGAAGGA 330356

Query 106769 TCGGACAAAATTGGTGAAGCCGACTTCAACAGGAGCAAGGAGTTCATGGAGGAGGTGATT 106828 ||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||| Sbjct 330355 TCGGACAAAATTGGTGAAGCCAACTTCAACAGGAGCAAGGAGTTCATGGAGGAGGTGATT 330296

Query 106829 CAGCGGATGGATGTGGGCCAGGACAGCATCCACGTCACGGTGCTGCAGTACTCCTACATG 106888 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 330295 CAGCGGATGGATGTGGGCCAGGACAGCATCCACGTCACGGTGCTGCAGTACTCCTACATG 330236

Query 106889 GTGACTGTGGAGTACCCCTTCAGCGAGGCACAGTCCAAAGGGGACATCCTGCAGCGGGTG 106948 ||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 330235 GTGACCGTGGAGTACCCCTTCAGCGAGGCACAGTCCAAAGGGGACATCCTGCAGCGGGTG 330176

Query 106949 CGAGAGATCCGCTACCAGGGCGGCAACAGGACCAACACTGGGCTGGCCCTGCGGTACCTC 107008 |||||||||||||||||||| ||||||||||||||||||||| ||||||||| ||||| | Sbjct 330175 CGAGAGATCCGCTACCAGGGTGGCAACAGGACCAACACTGGGGTGGCCCTGCTGTACCCC 330116

Query 107009 TCTGACCACAGCTTCTTGGTCAGCCAGGGTGACCGGGAGCAGGCGCCCAACCTGGTCTAC 107068 || |||||||||||||||||||||||||||||||||||||||| |||||||||||||| | Sbjct 330115 TCCGACCACAGCTTCTTGGTCAGCCAGGGTGACCGGGAGCAGGTGCCCAACCTGGTCTCC 330056

Query 107069 ATGGTCACCGGAAATCCTGCCTCTGATGAGATCAAGAGGCTGCCTGGAGACATCCAGGTG 107128 ||||||||| ||||||||||||||||| ||||||||||| |||||||||||||||||||| Sbjct 330055 ATGGTCACCAGAAATCCTGCCTCTGATAAGATCAAGAGGTTGCCTGGAGACATCCAGGTG 329996

Query 107129 GTGCCCATTGGAGTGGGCCCTAATGCCAACGTGCAGGAGCTGGAGAGGATTGGCTGGCCC 107188 |||||||||||||||||||||||||||||||| ||||||||||||||||| ||||||||| Sbjct 329995 GTGCCCATTGGAGTGGGCCCTAATGCCAACGTTCAGGAGCTGGAGAGGATCGGCTGGCCC 329936

Query 107189 AATGCCCCTATCCTCATCCAGGACTTTGAGACGCTCCCCCGAGAGGCTCCTGACCTGGTG 107248 |||||||| |||||||||||||||||||||||||||||| |||||||||||||||||||| Sbjct 329935 AATGCCCCCATCCTCATCCAGGACTTTGAGACGCTCCCCTGAGAGGCTCCTGACCTGGTG 329876

Query 107249 CTGCAGAGGTGCTGCTCCGGAGAGGGGCTGCAGATCCCCACCCTCTCCCCTGCACCTGGT 107308 ||||||||||||||||||||||||||||||||||||||||||| ||||||||| |||||| Sbjct 329875 CTGCAGAGGTGCTGCTCCGGAGAGGGGCTGCAGATCCCCACCCGCTCCCCTGCCCCTGGT 329816

Query 107309 ATGCTGGCACCTTGTGTGCAGGTGGGAGGGCTGGGCGAGGGCTGGCATGGCCTTGGTGCT 107368 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| || Sbjct 329815 ATGCTGGCACCTTGTGTGCAGGTGGGAGGGCTGGGCGAGGGCTGGCATGGCCTTGGTACT 329756

Query 107369 ACATGCATCTGCCAAGATACGACTCGGGTTCTAATCCTGGCTTCCCTGGTCTGTGTGGCC 107428 ||||||||||||||||||| |||| |||||||||||||||||||||||||||||||| || Sbjct 329755 ACATGCATCTGCCAAGATATGACTTGGGTTCTAATCCTGGCTTCCCTGGTCTGTGTGCCC 329696

Query 107429 TTGGTTGAAACTTGCCTTCAAAGGGCCTGTGTTTCCTCACCTCCCTGGCAGGGAGACAAA 107488 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 329695 TTGGTTGAAACTTGCCTTCAAAGGGCCTGTGTTTCCTCACCTCCCTGGCAGGGAGACAAA 329636

Query 107489 CTGTGATCCTTTTTCGGGGGAAGATCTTTTCTTATTGTGTGTGCTTCTATAAATTCAAAG 107548 ||||||||||||| |||||||||||||||||||||||||||||||||| ||||||||||| Sbjct 329635 CTGTGATCCTTTTGCGGGGGAAGATCTTTTCTTATTGTGTGTGCTTCTGTAAATTCAAAG 329576

Query 107549 GTAATGGAAGCCTATTGTTAACATTTAAACAATGCCAATAAGGAGTaaaaaaaaaaaaaa 107608 ||||||||||||||||||||||||||||||||||||| ||||||||||||||||||| | Sbjct 329575 GTAATGGAAGCCTATTGTTAACATTTAAACAATGCCACTAAGGAGTAAAAAAAAAAATA- 329517

Query 107609 aaTGCAAGTCTTACTTTATCACACAGCCTCTTCCCCACCCAATTCCACCCTCTGCCTTCA 107668 |||||||||||||||||||||| ||||||||||||||||||||||||||| |||||| Sbjct 329516 --TGCAAGTCTTACTTTATCACACGGCCTCTTCCCCACCCAATTCCACCCTCCACCTTCA 329459

Query 107669 AGGTAACCTCTGTTAAAGGTGGTGATGATAAGCTTCATTTTGACCACACTGGGTGGGTGA 107728 ||||||||||||||||||||| |||||||||||||| ||||||||||||||||||||||| Sbjct 329458 AGGTAACCTCTGTTAAAGGTGATGATGATAAGCTTCGTTTTGACCACACTGGGTGGGTGA 329399

Query 107729 AATTATTCCGAGTAGCTGACATTCTTCAGGGCAAGACCAGGATGTCAGGTGTCAGGTAGC 107788 |||||||| ||||||||||||||||||||||| |||||||||||||||||| |||||||| Sbjct 329398 AATTATTCTGAGTAGCTGACATTCTTCAGGGCGAGACCAGGATGTCAGGTGCCAGGTAGC 329339

206

Query 107789 AGGTGCAGTCTTCAGACACTGGAGTCAGA--CTGCCTTAGTTTGAATCTTATCTCTGCTA 107846 |||||||||||||||||||||||||||| ||||||||||||||||||||||||||| | Sbjct 329338 AGGTGCAGTCTTCAGACACTGGAGTCAGTGTCTGCCTTAGTTTGAATCTTATCTCTGCCA 329279

Query 107847 CTTGCTAGCTGGGTAATTTTGGGCAAATACTTAACCTCTTTGTATATC-GTCT 107898 |||||||||||||||||||||||||||||||||||||||||||||||| |||| Sbjct 329278 CTTGCTAGCTGGGTAATTTTGGGCAAATACTTAACCTCTTTGTATATCAGTCT 329226

Query 107969 tcctcatctataaaatggggataataatagcacctacTTGCTTAAAATAGTATCTGGCAC 108028 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 329226 TCCTCATCTATAAAATGGGGATAATAATAGCACCTACTTGCTTAAAATAGTATCTGGCAC 329167

Query 108029 ATGACAAGTGCTCaaaaaaaaaTGCTTGCTCCCACTGCTGTTACTACTACCTTTTACTGA 108088 ||||| ||||||| ||||||||||||||||||||||||||||| |||||||||||||||| Sbjct 329166 ATGAC-AGTGCTC-AAAAAAAATGCTTGCTCCCACTGCTGTTATTACTACCTTTTACTGA 329109

Query 108089 CACTGGCGTCTAATCCATTCCTAGTTCCTGAACATCTTTATTCGGTGTGTTTGGGACCAT 108148 |||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 329108 CACTGGTGTCTAATCCATTCCTAGTTCCTGAACATCTTTATTCGGTGTGTTTGGGACCAT 329049

Query 108149 CCCAGAATAATAAGCCTTCTTTTAGTATCTTTTGAGTCACACACTTGTCAGTACttttgt 108208 | |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| Sbjct 329048 CTCAGAATAATAAGCCTTCTTTTAGTATCTTTTGAGTCTCACACTTGTCAGTACTTTTGT 328989

Query 108209 ttctttgtttttgtttttATTCACGGCCATAGATTTATTTAAATTCTTGTAATATTTCTG 108268 ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||| Sbjct 328988 TTCTTTGTTTTTGTTTTTATTCATGGCCATAGATTTATTTAAATTCTTGTAATATTTCTG 328929

Query 108269 CTGAGGAAAACAACAATTACATCATTTCATCAAATCTCAGATGTGCTCAGACACTAACAG 108328 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 328928 CTGAGGAAAACAACAATTACATCATTTCATCAAATCTCAGATGTGCTCAGACACTAACAG 328869

Query 108329 GAGCACTAGGCATTTATAGCTGGAAGAATCACAGTATGTTCACCTGCCCTGCAAGATCTG 108388 |||||||||||||||||||||||||||||||||||| ||||||||||| ||||||||||| Sbjct 328868 GAGCACTAGGCATTTATAGCTGGAAGAATCACAGTACGTTCACCTGCCTTGCAAGATCTG 328809

Query 108389 AGGACACAGCAGCTCATTGTCCAGGGAGGGGCTGCCGCTCCATTTCCTTTTGCAGTCTCC 108448 ||||||||||||||||||||||||||||||||||||||||||||||||||||||| | | Sbjct 328808 AGGACACAGCAGCTCATTGTCCAGGGAGGGGCTGCCGCTCCATTTCCTTTTGCAG--T-C 328752

Query 108449 TTGTATTGCCAGGCCAGTATTTTACTCATTTCTAGAAGAATGGTGGCCCCTTCTTACCGA 108508 ||||||||||||||||||||||||||||||||||||||| ||| |||||||||||||||| Sbjct 328751 TTGTATTGCCAGGCCAGTATTTTACTCATTTCTAGAAGAGTGGCGGCCCCTTCTTACCGA 328692

Query 108509 GGAAGCCTATGCCTGCTGCTTTTATTTGTAGACATTTAAACTTCCTTTGGGTAGAATTGG 108568 ||||||| |||||| |||||||||||||||||||||||| |||||||||||||||| ||| Sbjct 328691 GGAAGCCCATGCCTACTGCTTTTATTTGTAGACATTTAAGCTTCCTTTGGGTAGAACTGG 328632

Query 108569 AGTCTTCTCAGTGTCTCTAAATCTGAGGTAGTCCGGACCCAGGAACTCCATCTCCCC-AT 108627 || |||||||||||||| ||| |||||||||||| | |||||||||||||||||||| || Sbjct 328631 AGGCTTCTCAGTGTCTCCAAACCTGAGGTAGTCCAGCCCCAGGAACTCCATCTCCCCTAT 328572

Query 108628 CCCCTCCTCCCTGGCCCACATTGCCCTTGTACTCACGAAGGCAccccccgccccccTTGG 108687 ||||||||||||||||||| ||||||||| |||||| |||||| |||||||||||||||| Sbjct 328571 CCCCTCCTCCCTGGCCCACGTTGCCCTTGCACTCACAAAGGCAGCCCCCGCCCCCCTTGG 328512

Query 108688 TGGTGCCACGTGGTCAGCACGCCCTGCAGATCCTATTGGATGTCAGGTTGTAGGCCTGGT 108747 |||||||||||||||||||||||||||||||||||| |||||||||||||||||||||| Sbjct 328511 TGGTGCCACGTGGTCAGCACGCCCTGCAGATCCTATGGGATGTCAGGTTGTAGGCCTGGC 328452

Query 108748 GGCCATTGTCCCTGCTGGCACCTGTGTGCTCACCTTCCTGGTTGTCTTTGCAGACTGCAG 108807 ||||| || |||||||||||||||||||||||||||||||||| |||||||||||||||| Sbjct 328451 GGCCAGTGCCCCTGCTGGCACCTGTGTGCTCACCTTCCTGGTTATCTTTGCAGACTGCAG 328392

Query 108808 CCAGCCCCTGGACGTGATCCTTCTCCTGGATGGCTCCTCCAGTTTCCCAGCTTCTTATTT 108867 |||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| Sbjct 328391 CCAGCCCCTGGACGTGATCCTTCTCCTGGATGGCTCCTCCAGTTTCCCAGCTTCTTCTTT 328332

Query 108868 TGATGAAATGAAGAGTTTCGCCAAGGCTTTCATTTCAAAAGCCAATATAGGTGGGTGAGC 108927 |||||||||||||||||| || ||||||||||||||||||||||||||||||| ||||| Sbjct 328331 TGATGAAATGAAGAGTTTTGCTAAGGCTTTCATTTCAAAAGCCAATATAGGTGAGTGAGT 328272

207

Query 108928 GAGGCACCTGAAGCAGCAGGTGACGAAGAGGCTCTTTTTGTGGCTCTACTTGATTCAAAA 108987 |||||||||||||||||||||||||||||||||||||||| ||||| |||||||||| || Sbjct 328271 GAGGCACCTGAAGCAGCAGGTGACGAAGAGGCTCTTTTTGGGGCTCCACTTGATTCAGAA 328212

Query 108988 TAATCCGCATTTTCTCGTTCCGTTTAGGGCCTCGTCTCACTCAGGTGTCAGTGCTGCAGT 109047 |||||| |||||||| ||||||||||||||||| ||||||||||| ||||||||| |||| Sbjct 328211 TAATCCCCATTTTCTTGTTCCGTTTAGGGCCTCCTCTCACTCAGGGGTCAGTGCTCCAGT 328152

Query 109048 ATGGAAGCATCACCACCATTGACGTGCCATGGAACGTGGTCCCGGAGAAAGCCCATTTGC 109107 ||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||| Sbjct 328151 ATGGAAGCATCACCACCATTGACGTGCCATGGAACGTGGCCCCGGAGAAAGCCCATTTGC 328092

Query 109108 TGAGCCTTGTGGACGTCATGCAGCGGGAGGGAGGCCCCAGCCAAATCGGTAACGTTGGTG 109167 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 328091 TGAGCCTTGTGGACGTCATGCAGCGGGAGGGAGGCCCCAGCCAAATCGGTAACGTTGGTG 328032

Query 109168 CCACAGGCTGGATGCAGAAGCTGCATTCTGGTTCTTATTTTTGGCATAAGTGACTGTGTG 109227 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 328031 CCACAGGCTGGATGCAGAAGCTGCATTCTGGTTCTTATTTTTGGCATAAGTGACTGTGTG 327972

Query 109228 ACCTCGGCCAGTCACTTTGCTCCTTGGCCTTAGTTTCTTCTCCTGGAAAGTGAGGGGCTA 109287 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 327971 ACCTCGGCCAGTCACTTTGCTCCTTGGCCTTAGTTTCTTCTCCTGGAAAGTGAGGGGCTA 327912

Query 109288 GATGCTCTTCCACGTCTCTCCAGATCTCAACTGGGTGTTCCTTGGAGTTTCTGAATCATT 109347 |||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||| Sbjct 327911 GATGCTCTTCCACGTCTCTCCAGATCTCAACTGGGTGTTCCTTGGAGTCTCTGAATCATT 327852

Query 109348 CAGCTTTTAAGTGACTTAAGGATCCACCGTTAAGACAGGGTGTCGAGCCGCAGTCAGTAC 109407 ||||||||||||||||||||||||||||| | |||||||||||| ||||||||||||||| Sbjct 327851 CAGCTTTTAAGTGACTTAAGGATCCACCG-TGAGACAGGGTGTCCAGCCGCAGTCAGTAC 327793

Query 109408 TGACTTGGCGTGATCTGTTCTCCATCCTCAGGGGATGCCTTGGGCTTTGCTGTGCGATAC 109467 |||||||| |||||||||||||||||||||||||||||||||||||||||||||||| || Sbjct 327792 TGACTTGGTGTGATCTGTTCTCCATCCTCAGGGGATGCCTTGGGCTTTGCTGTGCGACAC 327733

Query 109468 TTGACTTCAGAAATGCATGGTGCCAGGCCGGGAGCCTCAAAGGCGGTGGTCATCCTGGTC 109527 |||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| Sbjct 327732 TTGACTTCAGAAATGCATGGTGCCAGGCCGGGAGCCTCGAAGGCGGTGGTCATCCTGGTC 327673

Query 109528 ACGGACGTCTCTGTGGATTCAGTGGATGCAGCAGCTGATGCCGCCAGGTCCAACAGTAAG 109587 ||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 327672 ACGGATGTCTCTGTGGATTCAGTGGATGCAGCAGCTGATGCCGCCAGGTCCAACAGTAAG 327613

Query 109588 AATCTGGTGTACAGTCCTCAATTCAGGAGAGCGAGACTTCAGAAATGATAGAGGATTAAA 109647 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 327612 AATCTGGTGTACAGTCCTCAATTCAGGAGAGCGAGACTTCAGAAATGATAGAGGATTAAA 327553

Query 109648 AGTGGGCTGGGGTTACTTTTGGATGTTGTGGCTGATATTAAATATGCTCAATACCAACCT 109707 ||||| |||||||||||||||||||||||||||||||||||||||||||| ||||||||| Sbjct 327552 AGTGGCCTGGGGTTACTTTTGGATGTTGTGGCTGATATTAAATATGCTCAGTACCAACCT 327493

Query 109708 CTGTCCTGGTTTCAAACAGCAGGAAATTATCCAACCACGATGTGATTAGGAAACCAAACA 109767 |||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||| Sbjct 327492 CTGTCCTGGTTTCAAACAGCAGGAAATTATCCAATCACGATGTGATTAGGAAACCAAACA 327433

Query 109768 ATAAAAAGTGTGTATTGATCCTATATCTTTATAACTAATTATAA-TCTTTAATTTCTCCA 109826 |||| |||||||| |||||||||||||||||||||||||||||| |||||||||||||| Sbjct 327432 ATAATAAGTGTGTGTTGATCCTATATCTTTATAACTAATTATAACCCTTTAATTTCTCCA 327373 Query 109827 CATTTTCCCTTTGTGTCCAGGAGTGATCATACCTCTATGCCCTTTCTTTCTAAAGTAGGA 109886 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 327372 CATTTTCCCTTTGTGTCCAGGAGTGATCATACCTCTATGCCCTTTCTTTCTAAAGTAGGA 327313

Query 109887 TTCGTGGTCTCTTACTAAAATGTCATGTCTGAGGCCAGAGTGATTCATTGTAGGGCCTTG 109946 ||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||| Sbjct 327312 TTCGTGGTCTCTTACTAAAATGTCATGTCTGAGGCCAGAGTGATTCATTGTGGGGCCTTG 327253

Query 109947 AGAACCAATGGGGGCTTTCATGTATTTAGAGTCCTTTCCTGGTACCAGGTAATGGGCTTG 110006 ||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||| Sbjct 327252 AGAACCAATGGGGACTTTCATGTATTTAGAGTCCTTTCCTGGTACCAGGTAATGGGCTTG 327193

208

Query 110007 GCAGGGTTGATGGATGAGAGACAAAGGATCATGCTGTAGGAAAGGGAAGGATCCAGATGT 110066 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 327192 GCAGGGTTGATGGATGAGAGACAAAGGATCATGCTGTAGGAAAGGGAAGGATCCAGATGT 327133

Query 110067 CCAGCCTTGTGCTCCATAAGAACATGTGCTTATCATGTTCTTAATATCAACAGGATGTGA 110126 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 327132 CCAGCCTTGTGCTCCATAAGAACATGTGCTTATCATGTTCTTAATATCAACAGGATGTGA 327073

Query 110127 GCTGGGACAGGTGGTAGAGTCT-AGGGACCACCTCCCAGCTGAAGAAGGTCTGGGTGAGT 110185 |||||||||||||||||||||| |||||||||||||||||||||| |||||||||||||| Sbjct 327072 GCTGGGACAGGTGGTAGAGTCTGAGGGACCACCTCCCAGCTGAAGGAGGTCTGGGTGAGT 327013

Query 110186 GGAGTGACATTGAGTCCCCTCGTGGCCACCTGGAGAACTGTTCATCTTTCAGAACCCTGC 110245 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 327012 GGAGTGACATTGAGTCCCCTCGTGGCCACCTGGAGAACTGTTCATCTTTCAGAACCCTGC 326953

Query 110246 CTAGGAGTGCTGTCAGAACCCTGTCTTTTCTCTGCTGTCTTCCTGATGTTCTTCCTGAGG 110305 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 326952 CTAGGAGTGCTGTCAGAACCCTGTCTTTTCTCTGCTGTCTTCCTGATGTTCTTCCTGAGG 326893

Query 110306 AAGTTATTTGCTCCGTTCTCTATGTCACATATGAGCTACTCAAAATGTGCAAATCGTATT 110365 |||||||||||||||||||||||||||||||||||||||| |||||||||||||| Sbjct 326892 AAGTTATTTGCTCCGTTCTCTATGTCACATATGAGCTACTAAAAATGTGCAAATC----- 326838

Query 110366 GTATTGTATTGTATTTGTAACATATCTATTCCTTGCAAAAAACTGGGAGTTCTTTGAAGG 110425 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 326837 GTATTGTATTGTATTTGTAACATATCTATTCCTTGCAAAAAACTGGGAGTTCTTTGAAGG 326778

Query 110426 CAAGGGAAGATCTTGCTTATTTTTGTATCATCTGTGTATGATGCACAAAATAGGTGACCA 110485 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 326777 CAAGGGAAGATCTTGCTTATTTTTGTATCATCTGTGTATGATGCACAAAATAGGTGACCA 326718

Query 110486 TTTAACGTTTGTTGAGTGAATGAATGTGACATGTTTATGAAATCTATGATTTTAGGGTAA 110545 ||||| ||||||||||||||||||||||||||||||||||||||||| |||||||||||| Sbjct 326717 TTTAATGTTTGTTGAGTGAATGAATGTGACATGTTTATGAAATCTATAATTTTAGGGTAA 326658

Query 110546 AGGACACTAAAAGGGAAAGGATATATATTTGGATAATATAACGAACTTACAAGAGCTGAA 110605 |||||||||||||||||||||||||||||||||| |||||||||||||||||| |||||| Sbjct 326657 AGGACACTAAAAGGGAAAGGATATATATTTGGATGATATAACGAACTTACAAGGGCTGAA 326598

Query 110606 AAGCTGCTAATAGAAAATCCGTTCCAACCTGAAGCACATTGCAAATTCCTAGAACTTTCT 110665 ||||||||| |||||||||| ||||||||||||||||||||||||||||||||||||||| Sbjct 326597 AAGCTGCTAGTAGAAAATCCATTCCAACCTGAAGCACATTGCAAATTCCTAGAACTTTCT 326538

Query 110666 GGAGAGCTGAATTGAGGAATTTCAGGATGGAGATTTTTTCTCCTTCAGTATCAAGAATCC 110725 ||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||| Sbjct 326537 GGAGAGCTGAATTGAGGAATTTCAGGATGGAGATTTTTTCTCCTTTAGTATCAAGAATCC 326478

Query 110726 AAAGAAAGATTTTCACTGGGAAAGAAAGCAGTTGTGGTCTT---GAAGCAGATATGATCC 110782 |||||||||||||||||||||||| |||||||||||||||| |||||||||||||||| Sbjct 326477 AAAGAAAGATTTTCACTGGGAAAGGAAGCAGTTGTGGTCTTGAGGAAGCAGATATGATCC 326418

Query 110783 AGAGCAAGTCATTTAACCGTCCAAAGCCTCGGTTTCTCATGTGTAAATGAGCATGCACAT 110842 |||||| ||||||||||| || |||||||||||||||||||||||||||||||||||||| Sbjct 326417 AGAGCACGTCATTTAACCATCTAAAGCCTCGGTTTCTCATGTGTAAATGAGCATGCACAT 326358

Query 110843 TCTCTGGGCTCAGCCCTGTGATATTTTCTCTTCTCTTCTTACATATACCCCCTTGGTGGT 110902 |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||| Sbjct 326357 TCTCTGGGCTCAGCCCTGTGATATTTTCTCTTCTCTTCTTACATATACCCTGTTGGTGGT 326298

Query 110903 CTAGTTCAGTGTCATGGGTCTAAATACATTTATACATCTTTGACTTTGAAATTCGTATCT 110962 ||||||||||||||||||||| |||| |||||||||||| |||||||||||||||||||| Sbjct 326297 CTAGTTCAGTGTCATGGGTCTGAATATATTTATACATCTCTGACTTTGAAATTCGTATCT 326238

Query 110963 CCAGCCTCACCTCAACTCAGTTTGCATGCTCTAATAGGCAAAACTAAACTcctgattttc 111022 |||||||||||||||||||| ||||||||||||| ||||||||||||||||||||||||| Sbjct 326237 CCAGCCTCACCTCAACTCAGCTTGCATGCTCTAACAGGCAAAACTAAACTCCTGATTTTC 326178

Query 111023 acatctgtctttcttatcttagaaaatggcaattccaaccttccagttgttcagttcaaa 111082 ||||| |||||||||||||||||||||||| ||||||||||||||||||||||||||||| Sbjct 326177 ACATCCGTCTTTCTTATCTTAGAAAATGGCGATTCCAACCTTCCAGTTGTTCAGTTCAAA 326118

209

Query 111083 aagctcagcatccttgattcttcttccttttccctgatacctccatgcaagccacgaaca 111142 |||| |||||||||||||||||||||||||| |||||||||||||||||||||| ||||| Sbjct 326117 AAGCCCAGCATCCTTGATTCTTCTTCCTTTTTCCTGATACCTCCATGCAAGCCATGAACA 326058

Query 111143 aactatagctccatcttcaaaatacttcgagagtcacacctccccacaccatgtccacta 111202 ||||||||||||||||||||||||||| |||||| |||||| ||||||||||||||| | Sbjct 326057 AACTATAGCTCCATCTTCAAAATACTTTGAGAGTTGCACCTCTCCACACCATGTCCACCA 325998

Query 111203 gggtcacctcaatccgagtccccatcatctctcacctggactattgaaatagctgtctaa 111262 |||||||||||||| ||||||||| ||||||||||||||||||||||||||||||||||| Sbjct 325997 GGGTCACCTCAATCTGAGTCCCCAGCATCTCTCACCTGGACTATTGAAATAGCTGTCTAA 325938

Query 111263 ctggaaccccagcttccacctttgtcccctgcagatcattctcagcacagcagctgcagt 111322 |||||| ||||||||||||||||||||||||||| ||||||||||||||||||||||||| Sbjct 325937 CTGGAATCCCAGCTTCCACCTTTGTCCCCTGCAGCTCATTCTCAGCACAGCAGCTGCAGT 325878

Query 111323 ggccctttaatatcaaaatcagctcctgtccctctgctctgtagtcttggctggctttgt 111382 ||||||||| ||||||| ||||| |||||||||||||||||||||||||||||| ||||| Sbjct 325877 GGCCCTTTACTATCAAAGTCAGCCCCTGTCCCTCTGCTCTGTAGTCTTGGCTGG-TTTGT 325819

Query 111383 ttcattcacagaacatggtgagcccttactgtggccaatggagcattgcctgatccatgc 111442 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 325818 TTCATTCACAGAACATGGTGAGCCCTTACTGTGGCCAATGGAGCATTGCCTGATCCATGC 325759

Query 111443 cctttccctctctTGCTTGTCCCTCTCTCCCCAGCCCCCGGGAGCATGcagtcacatttc 111502 |||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| Sbjct 325758 CCTTTCCCTCTCTTGCTTGTCCCTCTCTCCCCAGCCCCCGGGAGCATGCAGTCACACTTC 325699

Query 111503 tgctctggggcctttgcactcactcttctctctgcttgggatgtccctccctcagacatc 111562 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 325698 TACTCTGGGGCCTTTGCACTCACTCTTCTCTCTGCTTGGGATGTCCCTCCCTCAGACATC 325639

Query 111563 ctcttggctccttccatccttacatcaggtttctgctcagatgccatcttttcaccaaat 111622 ||||||||||||||| |||||||||||||| ||||||||||||||||||||||||||||| Sbjct 325638 CTCTTGGCTCCTTCCCTCCTTACATCAGGTCTCTGCTCAGATGCCATCTTTTCACCAAAT 325579

Query 111623 gcttctccaactgccatatatgcaacgataacccacacctcagccctgggcattctcatc 111682 ||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||| Sbjct 325578 GCTTCTCCAACTGCCATATATACAACGATAACCCACACCTCAGCCCTGGGCATTCTCATC 325519

Query 111683 ccccttatctcattttgtttttttcctaagcatttatccacatctgaaattctatatatt 111742 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 325518 CCCCTTATCTCATTTTGTTTTTTTCCTAAGCATTTATCCACATCTGAAATTCTATATATT 325459

Query 111743 tactttgtttgtgtgtttgttgtctatctctccatgaggacgggggacagggagggactt 111802 ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 325458 TAC----TTTGTGTGTTTGTTGTCTATCTCTCCATGAGGACGGGGGACAGGGAGGGACTT 325403

Query 111803 tatgtgcttggttcactgctgtacccctattgcttacaatagtacctgacacagagtagc 111862 |||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| Sbjct 325402 TATGTGCTTGGTTCACTGCTGTACCCCTATTGCTGACAATAGTACCTGACACAGAGTAGG 325343

Query 111863 agctcattaatatctgttgactgaaCATCTTCCTCATAGGGCTGATGTATGTGACCAGCC 111922 |||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||| Sbjct 325342 AGCTCATTAATATCTGTTGACTGAACATCTTCCTCATAGGGCTGGTGTATGTGACCAGCC 325283

Query 111923 TGGAAAACATGAGGCTGTATTCAGATGCTGGATATAACGTCAGGCCAGTCCATTTTGAGC 111982 ||||||||| |||||||||||||||||||||||||||||||||||||||||||||||| | Sbjct 325282 TGGAAAACACGAGGCTGTATTCAGATGCTGGATATAACGTCAGGCCAGTCCATTTTGAAC 325223

Query 111983 CTTCTTGCCCACAGATCCTTTCTTGTCTCTTTGCTAACTCTAGGAGTGACAGTGTTCCCT 112042 ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| Sbjct 325222 CTTCTTGCCCACAGATCCTTTCTTGTCTCTTTGCTGACTCTAGGAGTGACAGTGTTCCCT 325163

Query 112043 ATTGGAATTGGAGATCGCTACGATGCAGCCCAGCTACGGATCTTGGCAGGCCCAGCAGGC 112102 |||||||||||||||||||||||||||||||||||||||||||||| |||||||||||| Sbjct 325162 GTTGGAATTGGAGATCGCTACGATGCAGCCCAGCTACGGATCTTGGCCGGCCCAGCAGGC 325103

Query 112103 GACTCCAACGTGGTGAAGCTCCAGCGAATCGAAGACCTCCCTACCATGGTCACCTTGGGC 112162 ||||||||| ||||||||||||||| |||||||||||||||||||||||||||||||||| Sbjct 325102 GACTCCAACATGGTGAAGCTCCAGCAAATCGAAGACCTCCCTACCATGGTCACCTTGGGC 325043

210

Query 112163 AATTCCTTCCTCCACAAACTGTGCTCTGGTGAGTCTTATAATACCTTTCTTACTTCCCTC 112222 ||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||| Sbjct 325042 AATTCCTTCCTCCACAAACTGTGCTCTCGTGAGTCTTATAATACCTTTCTTACTTCCCTC 324983

Query 112223 aaaaaaaagttccaaaataaacccgccaaagaaacaaaaaTAGTATTCAAGAGACCCCAG 112282 |||||||||||||||| |||||||||||||||||||||||| ||||||||||||||||| Sbjct 324982 -AAAAAAAGTTCCAAAACAAACCCGCCAAAGAAACAAAAATACTATTCAAGAGACCCCAG 324924

Query 112283 GGTGCCCATGCATAAGATTTGGCCTTGAATGAAGATTATTATATAGACCAGGACCTTTAA 112342 ||||||||||||||||||||||||||||| |||||||||||||||||||||||| || || Sbjct 324923 GGTGCCCATGCATAAGATTTGGCCTTGAAAGAAGATTATTATATAGACCAGGACATTCAA 324864

Query 112343 GTTTCTGTTCATGGGAGGTGTATACCACAAGCCCCCATGGGGCCAGCAGAAAGCTTTCTG 112402 ||||||| |||||||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct 324863 GTTTCTGCTCATGGGAGGTGTATACCACAAGCCCCCGTGGGGCCAGCAGAAAGCTTTCTG 324804

Query 112403 TGTATGGTCCTATTTCCACTCTAATTGTCCATGTGAACTTGGCAGAGAGAGCCTAAGCTT 112462 ||||||||||||||||||||||||||| ||||||||||||||| ||||||||||||||| Sbjct 324803 TGTATGGTCCTATTTCCACTCTAATTGGCCATGTGAACTTGGC--AGAGAGCCTAAGCTT 324746

Query 112463 ATTTGTGTAGAGGGATCTTGGACAAAAGAATCAGAAGACATAAGCAAGGAACCCCAGCGT 112522 |||||||||||||||||||||||||| |||||||||||||||||||||||||||||| || Sbjct 324745 ATTTGTGTAGAGGGATCTTGGACAAAGGAATCAGAAGACATAAGCAAGGAACCCCAGAGT 324686

Query 112523 TCCGTAGGTATAGTCTTCTTCCCCATGGGCCTAGACTAACTCTCAACATGGGTATAAAGG 112582 ||| | || |||||||||||||||||||||||||| |||||||||||||||||||||| Sbjct 324685 TCC--A--TAGAGTCTTCTTCCCCATGGGCCTAGACTCACTCTCAACATGGGTATAAAGG 324630

Query 112583 GCTTTAGAAATACGATAACACGGAGACTCATATCAAAGTACCATAGTTTAAGTTGATTTT 112642 ||||||||||||| |||||| ||||||||||||||||||||||||||||||||||||||| Sbjct 324629 GCTTTAGAAATACAATAACATGGAGACTCATATCAAAGTACCATAGTTTAAGTTGATTTT 324570

Query 112643 AGGTTAGAAacttaaaaaatatgcttttggccaggtgcagtggctcacgtctgtaatccc 112702 ||||||||||||||||||||| ||||||||||||| | |||||||| |||||||||||| Sbjct 324569 AGGTTAGAAACTTAAAAAATACGCTTTTGGCCAGGCACGGTGGCTCATGTCTGTAATCCC 324510

Query 112703 agcactttgggaggccgaggcgggcggatcatgaggtcaggagatcgagaccatcttggc 112762 |||||||||||||||||||||||| |||||| |||||||||||||||||||| ||||||| Sbjct 324509 AGCACTTTGGGAGGCCGAGGCGGGTGGATCACGAGGTCAGGAGATCGAGACCGTCTTGGC 324450

Query 112763 taacacagtgaaaccctgtctctactaaaaatacaaaaaattagccgggcgtggtggcgg 112822 |||||||||||| ||| | |||||||||| |||||||||||||||||||| ||||||||| Sbjct 324449 TAACACAGTGAATCCCCGCCTCTACTAAAGATACAAAAAATTAGCCGGGCATGGTGGCGG 324390

Query 112823 gcacctgtagtaccagctacttgggaggctgaggcaggagaatggtgtgaacccgggagg 112882 ||||||||||||||||||||||||||||||||||||||||||||| |||||||| ||||| Sbjct 324389 GCACCTGTAGTACCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCAGGAGG 324330

Query 112883 cagagcttgcagtgagccgagatcgcgccactgtactccagcctgggcaacagagtgaga 112942 | ||||||||||||||||||||||||| ||||| |||||||||||||| |||||| ||| Sbjct 324329 CGGAGCTTGCAGTGAGCCGAGATCGCGTCACTGCACTCCAGCCTGGGCGACAGAGCAAGA 324270

Query 112943 ctccatctc-aaa-a-a-a-a-a-a-a-a-a-a-a-a-atacacacacacacacacacac 112988 ||| |||| ||| | | | | | | | | | | | | | |||||||||||||||||||| Sbjct 324269 CTCTGTCTCAAAACACACACACACACACACACACACACACACACACACACACACACACAC 324210

Query 112989 --gctttttttGTGGCGGG-GGCCTGGGTTTGTATATTTTCCCGTTACTAGATGTAAGTC 113045 |||||||||| || ||| ||||||||||| |||||||||||||||||||||||||||| Sbjct 324209 ATGCTTTTTTTG-GG-GGGAGGCCTGGGTTTATATATTTTCCCGTTACTAGATGTAAGTC 324152

Query 113046 AAAACCTGCATAAAGCTACTGTCCTTCGGGGGAATAAGTCAATGCAAGTTTGCCCTTAAA 113105 ||||||||||||||||||||||| ||| ||||||||||||| |||||||||||||||||| Sbjct 324151 AAAACCTGCATAAAGCTACTGTCTTTCAGGGGAATAAGTCAGTGCAAGTTTGCCCTTAAA 324092

Query 113106 GGGCAATAACTCTATGCAAGTTTTGACTTATAGCTAATAACATTAGCTGTACAGAGAGAT 113165 |||||||||||||| |||||||||||||||||| ||| ||||||||||||||||||||| Sbjct 324091 GGGCAATAACTCTACACAAGTTTTGACTTATAGCAAATCACATTAGCTGTACAGAGAGAT 324032

Query 113166 GGCAGCTCTCCTGGTAGGAATCTTCAAGTAGATCTCTTTCAGGTTTCCAGGATCTTGCTT 113225 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 324031 GGCAGCTCTCCTGGTAGGAATCTTCAAGTAGATCTCTTTCAGGTTTCCAGGATCTTGCTT 323972

211

Query 113226 CATCTCCCCACCTTCCCCATCCCTGGCGTGATCTACATGTGAACCAAGATAATGACAGCG 113285 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 323971 CATCTCCCCACCTTCCCCATCCCTGGCGTGATCTACATGTGAACCAAGATAATGACAGCG 323912

Query 113286 TAAGCTGTAGTTATTGCCATATTATCGCTGTTGTTGGCATCATAATTATTAATAACTGCA 113345 |||||||||||||||||||||||||||||||||||||||||||||||||||| ||| ||| Sbjct 323911 TAAGCTGTAGTTATTGCCATATTATCGCTGTTGTTGGCATCATAATTATTAACAACCGCA 323852

Query 113346 GAGCATGTCTGAAGAACCACAGGATGACCACCTCAGCCTCATGTCCCTATGTCTCCACTG 113405 |||||||||||||||||||||||||||||||||||||| |||||||||| |||||||||| Sbjct 323851 GAGCATGTCTGAAGAACCACAGGATGACCACCTCAGCCACATGTCCCTAGGTCTCCACTG 323792

Query 113406 TTAACCTTGTTCAGATTCTTTTCAGAGTTGAGTTGACTTCAAAAACTAGACCAGGTTGCT 113465 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 323791 TTAACCTTGTTCAGATTCTTTTCAGAGTTGAGTTGACTTCAAAAACTAGACCAGGTTGCT 323732

Query 113466 TAAGCAGACATTGTGAATGGTTCAGAATTTCTGGGTGAAAGATGGGAACTAAGGTCTTAT 113525 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 323731 TAAGCAGACATTGTGAATGGTTCAGAATTTCTGGGTGAAAGATGGGAACTAAGGTCTTAT 323672

Query 113526 TTGTGTCTGTTGCAGGATTTGTTAGGATTTGCATGGATGAGGATGGGAATGAGAAGAGGG 113585 ||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||| Sbjct 323671 TTGTGTCTGTTGCAGGATTTGTTAGGATTCGCATGGATGAGGATGGGAATGAGAAGAGGG 323612

Query 113586 TAAGTTCCTTTCTGTTGACTTTGAAAGAAAGGTTAGAGATGTGTTTGGGGCTCTTGTTCC 113645 ||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||| Sbjct 323611 TAAGTTCCTTTCTGTTGACTTTGAAAGAAAGATTAGAGATGTGTTTGGGGCTCTTGTTCC 323552

Query 113646 CACTGGTTAATTTTTCCTCCTTTGGTCTTAGTCCAGTGCTTCCTTTTACTATTATCTTGT 113705 ||||| |||||||||||||||||||||||||||||| ||||||| ||||||||||||||| Sbjct 323551 CACTGTTTAATTTTTCCTCCTTTGGTCTTAGTCCAGCGCTTCCTCTTACTATTATCTTGT 323492

Query 113706 TTTTGCGGGTCCATCTGTACATCTTGTGTTTTGCTTCCTGTCTCATGTACAGGGGGCCTC 113765 ||||| ||||||||||||||||||||||||||||||||||||||||||||||| |||||| Sbjct 323491 TTTTGAGGGTCCATCTGTACATCTTGTGTTTTGCTTCCTGTCTCATGTACAGGAGGCCTC 323432

Query 113766 CTTGCTGTGTAGGCCTGTGTTCAATTCTAGGGGTCAGTTGTCTGGCAGATGGGCTTAGAG 113825 ||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||| Sbjct 323431 CTTGCTGTGTAGGCCCGTGTTCAATTCTAGGGGTCAGTTGTCTGGCAGATGGGCTTAGAG 323372

Query 113826 TTGGAGTACCTCATCTTATTCCCTGCCTGAATCTGCTGTTTTCTTCTGCAGCCCGGGGAC 113885 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 323371 TTGGAGTACCTCATCTTATTCCCTGCCTGAATCTGCTGTTTTCTTCTGCAGCCCGGGGAC 323312

Query 113886 GTCTGGACCTTGCCAGACCAGTGCCACACCGTGACTTGCCAGCCAGATGGCCAGACCTTG 113945 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 323311 GTCTGGACCTTGCCAGACCAGTGCCACACCGTGACTTGCCAGCCAGATGGCCAGACCTTG 323252

Query 113946 CTGAAGAGTCATCGGGTCAACTGTGACCGGGGGCTGAGGCCTTCGTGCCCTAACAGCCAG 114005 |||||||||||||||||||||||||||||||||| ||||||||| ||||||||||||||| Sbjct 323251 CTGAAGAGTCATCGGGTCAACTGTGACCGGGGGCCGAGGCCTTCATGCCCTAACAGCCAG 323192

Query 114006 TCCCCTGTTAAAGTGGAAGAGACCTGTGGCTGCCGCTGGACCTGCCCCTGTGAGTCCTTT 114065 |||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||| Sbjct 323191 TCCCCTGTTAAAGTGGAAGACACCTGTGGCTGCCGCTGGACCTGCCCCTGTGAGTCCTTT 323132

Query 114066 GCTTCTCCAGCCAGGGCAGCGTCAAAGGGGCAGTGCTTTTAGCTTGGCTGTGCAGAAAAG 114125 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 323131 GCTTCTCCAGCCAGGGCAGCGTCAAAGGGGCAGTGCTTTTAGCTTGGCTGTGCAGAAAAG 323072

Query 114126 TAGAGCAGGCACCCACCAGCCCAGAAGTACCCTTTCCCTCATCACCACATGCACAGTGCT 114185 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 323071 TAGAGCAGGCACCCACCAGCCCAGAAGTACCCTTTCCCTCATCACCACATGCACAGTGCT 323012

Query 114186 ACCTTCACTCACCTTCCTTTCCTCCTGTGCTCTTTGGACATGCATGCAGCCAGTCTCAGG 114245 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 323011 ACCTTCACTCACCTTCCTTTCCTCCTGTGCTCTTTGGACATGCATGCAGCCAGTCTCAGG 322952

Query 114246 GATCACTGCCCTCTTTCTCTGTCTTTGGAAGGCACTTCCCCAGATTATGCATAACTGGAA 114305 ||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| Sbjct 322951 GATCACTGCCCTCTTTCTCTGTCTTTGGAAGGCACTTCCCCAGATTATGCATAACCGGAA 322892

212

Query 114306 GGAAGAATTGCTTTTCTGAGGTCAATGCTCAGCTTGGCTGTTGGCAAGTCAACCTTTAGG 114365 |||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||| Sbjct 322891 GGAAGAATTGCTTTTCTGAGGTCAGTGCTCAGCTTGGCTGTTGGCAAGTCAACCTTTAGG 322832

Query 114366 AATCTGTGTATTCAGGGTATAGCAGTGGAAGTATAGCAGTGAGAGAGACGCTTCCAGAAA 114425 ||||| |||||||||||||||||||||||||||||||||||||||||| ||||||||||| Sbjct 322831 AATCTATGTATTCAGGGTATAGCAGTGGAAGTATAGCAGTGAGAGAGATGCTTCCAGAAA 322772

Query 114426 ACTCTCCAACCACACCTATAGCAACAGTACATCACCAAGGCGGCCTGGATTAAGAGGAGG 114485 |||||||||||||||||||||||| ||||||||||||||| |||||| |||||||||||| Sbjct 322771 ACTCTCCAACCACACCTATAGCAAGAGTACATCACCAAGGTGGCCTGAATTAAGAGGAGG 322712

Query 114486 AAATATTTAAAGTCCATCTCATGTACACTTGTGGAGAGCATGTCATCCATGGGCTGGGTT 114545 ||||||||||||||||||||| ||||||||||||||||||| |||||||||||||||||| Sbjct 322711 AAATATTTAAAGTCCATCTCACGTACACTTGTGGAGAGCATTTCATCCATGGGCTGGGTT 322652

Query 114546 AGCCTTCTGTCTCATGTTTGGGAGCAATGACTGCTTATGGTGACCATGACTCAGCAGTGG 114605 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 322651 AGCCTTCTGTCTCATGTTTGGGAGCAATGACTGCTTATGGTGACCATGACTCAGCAGTGG 322592

Query 114606 ATtgtaacctttatgctgaacaggcataactggtgggatgcaatcaaaactgtat-ttat 114664 |||||||| |||||||||||||||||||||||||||||||||||||||||||||| |||| Sbjct 322591 ATTGTAAC-TTTATGCTGAACAGGCATAACTGGTGGGATGCAATCAAAACTGTATGTTAT 322533

Query 114665 agctcattccccctttagaatttatttagtgagataaatttccctcttggtctaggtgaa 114724 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 322532 AGCTCATTCCCCCTTTAGAATTTATTTAGTGAGATAAATTTCCCTCTTGGTCTAGGTGAA 322473

Query 114725 aactccaccagactttaggtttcttctgtttattcataaagaattctatagttactttgt 114784 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 322472 AACTCCACCAGACTTTAGGTTTCTTCTGTTTATTCATAAAGAATTCTATAGTTACTTTGT 322413

Query 114785 cagaagacaaagtctcagttgataactagatacccactcttgcgtaataatcaaaacttt 114844 ||||||||||||||||||||||||| |||||||||||||||| ||||||||||||||||| Sbjct 322412 CAGAAGACAAAGTCTCAGTTGATAATTAGATACCCACTCTTGTGTAATAATCAAAACTTT 322353

Query 114845 gaatttgattagagttaaaacacctcccAGTTCCAATTTAG-GCTGCCTGCCTCTGGGAA 114903 ||||||||||||||||||||||||||||||||||||||| | |||||||||||||||||| Sbjct 322352 GAATTTGATTAGAGTTAAAACACCTCCCAGTTCCAATTT-GCGCTGCCTGCCTCTGGGAA 322294

Query 114904 GGAGGGTATGGTCAGCTTTCCTCTGTACAAAGAGCTttgccttttgtgattaaaagcata 114963 ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| Sbjct 322293 GGAGGGTATGGTCAGCTTTCCTCTGTACAAAGAGCATTGCCTTTTGTGATTAAAAGCATA 322234

Query 114964 tcctggaaattactcgatgtcatttcaaagagagcttcctaatacttcactgtgtaccat 115023 ||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||| Sbjct 322233 TCCTGGAAATTACCCGATGTCATTTCAAAGAGAGCTTCCTAATACTTCACTGTGTACCAT 322174

Query 115024 aggttacccaactgttcttctatggatgtgcgtttaggctgtttccagtgttttgtaatg 115083 ||||||||||||||||||||||||||||||| ||||||||||||||||||| |||||||| Sbjct 322173 AGGTTACCCAACTGTTCTTCTATGGATGTGCATTTAGGCTGTTTCCAGTGTCTTGTAATG 322114

Query 115084 acaagcaatgctgcaatgagtaaacctgaacatacgaacgaattttcatatggttggagg 115143 |||||||||||||||||||||||||||| ||||||||| || ||||||||||||||| Sbjct 322113 ACAAGCAATGCTGCAATGAGTAAACCTGCACATACGAA----TTGTCATATGGTTGGAGG 322058

Query 115144 tgtgtcttcagggtagattcctagaagtgagattcttgggtcaaaagataagtgcatatg 115203 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 322057 TGTGTCTTCAGGGTAGATTCCTAGAAGTGAGATTCTTGGGTCAAAAGATAAGTGCATATG 321998

Query 115204 aagttccccttagacattcccagattctcctctaacaggtgagatgtggttgtgcaaaac 115263 |||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||| Sbjct 321997 AAGTTCCCCTTAGATATTCCCAGATTCTCCTCTAACAGGTGAGATGTGGTTGTGCAAAAC 321938

Query 115264 tggaatgtccatcagtttggttttcgatcagcgaggtatgagagtgcatgcatctccacg 115323 ||||||||||||||||||| |||||||||||||||||||||||||||| || |||||||| Sbjct 321937 TGGAATGTCCATCAGTTTGCTTTTCGATCAGCGAGGTATGAGAGTGCAAGCTTCTCCACG 321878

Query 115324 gccttgctgagtgtgttgttacagtttttatttattttttttgcaaatttcttacttttc 115383 |||||||| |||| ||||||||||| ||| ||| |||||||| ||||||||||||||||| Sbjct 321877 GCCTTGCTCAGTGCGTTGTTACAGTGTTT-TTT-TTTTTTTT-CAAATTTCTTACTTTTC 321821

213

Query 115384 aggaaagaaatagtatctcagtgtacttgaaattcatttctctaatgatgagtgaggtta 115443 |||||||||||||||||| |||||||||||||||||||||| ||||||||||||||| || Sbjct 321820 AGGAAAGAAATAGTATCTGAGTGTACTTGAAATTCATTTCTGTAATGATGAGTGAGGGTA 321761

Query 115444 aacattttatcatatgtttaattgccattttttggtctaattttgtgaattttctgctca 115503 |||||||| ||||||||||||||||||||||||||||||||||| ||||||||||||||| Sbjct 321760 AACATTTT-TCATATGTTTAATTGCCATTTTTTGGTCTAATTTTATGAATTTTCTGCTCA 321702

Query 115504 tgcctttccccattttcccattagaaaagttgtgataagaaacacatgagatctacccta 115563 |||||||||||||||||||||||||| || |||||||| ||||||||||||||||||||| Sbjct 321701 TGCCTTTCCCCATTTTCCCATTAGAAGAGCTGTGATAAAAAACACATGAGATCTACCCTA 321642

Query 115564 ttaataaattttccagtgtacaataccatattgttagctatttgcaagttgttgtacagc 115623 |||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||| Sbjct 321641 TTAATAAATTTTCCAGTGTACAATACAATATTGTTAGCTATTTGCAAGTTGTTGTACAGC 321582

Query 115624 agatctctagaagtttttgttttgtgtgactgaaactatgcccactgaacagcaacttcc 115683 ||| || ||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 321581 AGACCTTTAGAAGTTTTTGTTTTGTGTGACTGAAACTATGCCCACTGAACAGCAACTTCC 321522

Query 115684 cattttccccttcccctagcccccgcaaccaccattctgctttttgcttctatgaatttg 115743 |||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||| Sbjct 321521 CATTTTCCCCTTCCCCTAGCCCCCACAACCACCATTCTGCTTTTTGCTTCTATGAATTTG 321462

Query 115744 acgattttagatacctcctgtaagtggattcatgcagtagttatccttctgtgactggct 115803 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 321461 ACGATTTTAGATACCTCCTGTAAGTGGATTCATGCAGTAGTTATCCTTCTGTGACTGGCT 321402

Query 115804 catttcacttagcttaacatcctccaagttcatctatgttgtggcatatgaaaggatttc 115863 ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||| Sbjct 321401 CATTTCACTTAGCTTAATGTCCTCCAAGTTCATCTATGTTGTGGCATATGAAAGGATTTC 321342

Query 115864 cttctttctaaagactgaatagtattccattgcatgtatataccacattttctttatcca 115923 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 321341 CTTCTTTCTAAAGACTGAATAGTATTCCATTGCATGTATATACCACATTTTCTTTATCCA 321282

Query 115924 tttgtccactgatgtgcatttggattatttctaactcttggctattgtgaataatgttgc 115983 |||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||| Sbjct 321281 TTTGTCCACTGATGTGCATTTGGATTGTTTCTAACTCTTGGCTATTGTGAATAATGTTGC 321222

Query 115984 agtgaatgtggaagtgtataagtacctttttgagatcctgatttcaatgcttttggatga 116043 |||||| ||||||||||| | ||| ||||||||||||||||||||||||||||||||||| Sbjct 321221 AGTGAACGTGGAAGTGTA-A-GTATCTTTTTGAGATCCTGATTTCAATGCTTTTGGATGA 321164

Query 116044 atacccagaagtgggattactgggtcatatgctaatactagttttaattttttgaggaac 116103 |||||||||||||||||| | ||||||||||||||||| |||||||||||| |||||||| Sbjct 321163 ATACCCAGAAGTGGGATTGCGGGGTCATATGCTAATACCAGTTTTAATTTTCTGAGGAAC 321104

Query 116104 cttcttactgttttctgtagtcgcgacaccactttacattcctaccatcaatgcacaagg 116163 |||||||| |||||||||||| || ||||||||||||||||||||||||||||||||||| Sbjct 321103 CTTCTTACCGTTTTCTGTAGTGGCTACACCACTTTACATTCCTACCATCAATGCACAAGG 321044

Query 116164 gtttcaatttcttcacattcttgttaacacgtggtatttcctggtgtgtgtgtgtgtgtg 116223 |||||| ||||||||||||||||||||||| ||||||||||||||||||||||||||||| Sbjct 321043 GTTTCAGTTTCTTCACATTCTTGTTAACACTTGGTATTTCCTGGTGTGTGTGTGTGTGTG 320984

Query 116224 tgtgtgtgtgtgtgtgtgtgtttgataatggccattctagcacgtatgaggtgatacctc 116283 ||||||||||||||||| | | ||||||||||||||||||||||||||||| ||||||| Sbjct 320983 TGTGTGTGTGTGTGTGT-T-T--GATAATGGCCATTCTAGCACGTATGAGGTAATACCTC 320928

Query 116284 attatggttttgatttgcatttccctgatgatcagtaatgttggatattttttcatataa 116343 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 320927 ATTATGGTTTTGATTTGCATTTCCCTGATGATCAGTAATGTTGGATATTTTTTCATATAA 320868

Query 116344 ttgagtttttggatgtctctttggaggaatgtcttttcaagtcctttgctcatttttaaa 116403 |||| ||||| ||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 320867 TTGACTTTTTAGATGTCTCTTTGGAGGAATGTCTTTTCAAGTCCTTTGCTCATTTTTAAA 320808

Query 116404 ttgggttatttgcattttttttttttttgctattgagttgtatgagttcctaatatattt 116463 ||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||| Sbjct 320807 TTGGGTTATTTGCATTTTTTTTTTTTTTGCTATTGAGTTGTGTGAGTTCCTAATATATTT 320748

214

Query 116464 tggatattaaccctttatcaaatgtacggtttgcaaatattttcttccattctgtaggct 116523 |||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 320747 TGGATATTAACCTTTTATCAAATGTACGGTTTGCAAATATTTTCTTCCATTCTGTAGGCT 320688

Query 116524 gcctttttattctgttgattgtttctttggctgcacagaagctttttggtttgctgtagt 116583 ||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||| Sbjct 320687 GCCTTTTTATTCTGTTGATTGTTTCTTTGGCTGCACAGAAGCTTTTTGGTTTGATGTAGT 320628

Query 116584 cccacttgtctatttttgcttttgttgccttgtttttggcaatgatatccaagaaatcgt 116643 ||||||||||||||||||||||||||||||||||||||| || ||||||||||||||||| Sbjct 320627 CCCACTTGTCTATTTTTGCTTTTGTTGCCTTGTTTTTGGTAACGATATCCAAGAAATCGT 320568

Query 116644 cgccaagaccaatgttgcaaagttcttgtgttttcttttaggagttttataatttcaggt 116703 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 320567 CGCCAAGACCAATGTTGCAAAGTTCTTGTGTTTTCTTTTAGGAGTTTTATAATTTCAGGT 320508

Query 116704 tttacgtatcagtgtttaagtttgattttgggttaactttgtgtatgatataaggtaagg 116763 |||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||| Sbjct 320507 TTTACGTATCAGTGTTTAAGTTTGATTTTGGGTTAACTTTGTGTATGATATAAGATAAGG 320448

Query 116764 gtccaatttcattctttttcatgtggacaggttcattgttagtgtattgaaacaaatggt 116823 |||||| ||||||||||||||||||||||||| |||||||||||||| ||||||||||| Sbjct 320447 GTCCAACTTCATTCTTTTTCATGTGGACAGGT-CATTGTTAGTGTATAAAAACAAATGGT 320389

Query 116824 tttgtatgttgattttgtattctgcaactgaatttgttttaacttcgctgaatttgctaa 116883 ||| ||||||||||||||||||||||||||||||||||| ||||| |||||||||||||| Sbjct 320388 TTTCTATGTTGATTTTGTATTCTGCAACTGAATTTGTTTCAACTTTGCTGAATTTGCTAA 320329

Query 116884 caaatttttggtgaaatctttagagttttctatgcatatgatcatgtcatctgcaaatgg 116943 |||||||||||||||||||||||||||||||| |||| ||| ||||||||||||||| | Sbjct 320328 CAAATTTTTGGTGAAATCTTTAGAGTTTTCTACACATAAGATTATGTCATCTGCAAATTG 320269

Query 116944 agataattttacttcttttttgatttgggtgatgttcatatcttcttgattcattgctct 117003 |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct 320268 AGATAATTTTACTTCTTTTTTGATTTGGGTGATGTTTATATCTTCTTGATTCATTGCTCT 320209

Query 117004 agctaagacttctcgtactatgttgaatagaagtggtgagagtgggcatctttgccgtct 117063 ||||||||||||| |||||||||||||||||||||||||||||||||||| |||| |||| Sbjct 320208 AGCTAAGACTTCTGGTACTATGTTGAATAGAAGTGGTGAGAGTGGGCATCCTTGCTGTCT 320149

Query 117064 tcctgattagaggaaaagcttttagttttgcactattcggtgtgatgttagctgtggtat 117123 |||||||||||||||||||||||||||||||||||||| | ||||||||||||||||||| Sbjct 320148 TCCTGATTAGAGGAAAAGCTTTTAGTTTTGCACTATTCAG-GTGATGTTAGCTGTGGTAT 320090

Query 117124 tttcattaatggcctttattttgttgaggtaattttcttctattcttagtttgttgaatg 117183 |||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||| Sbjct 320089 TTTCATTAATGGCCTTAATTTTGTTGAGGTAATTTTCTTCTATTCTTAGTTTGTTGAATG 320030 Query 117184 ttttaatcatgaaatggtgtttgaattatgacaaatgctttatctgtatctgttgacatg 117243 ||||||||||||||||||||||||||| |||||||||||||||||| |||| |||||||| Sbjct 320029 TTTTAATCATGAAATGGTGTTTGAATTTTGACAAATGCTTTATCTGCATCTATTGACATG 319970

Query 117244 atagtatgatttttatcctttgttctgttaatgctatgtatcacctttattgatttttag 117303 | ||||||||||||||||||||| ||||||||||| |||||||| |||| |||||||||| Sbjct 319969 ACAGTATGATTTTTATCCTTTGTCCTGTTAATGCTGTGTATCACATTTACTGATTTTTAG 319910

Query 117304 acgttgagccattctcctatctgagggataaatcccactccgtcagagtgtattatcctt 117363 | | ||||||||||||| |||||||||||||||||||||| ||||||||| |||||| Sbjct 319909 ATGCTGAGCCATTCTCCCATCTGAGGGATAAATCCCACTC----AGAGTGTATGATCCTT 319854

Query 117364 tgattgtgctgtttaattcagtttgctgtcattttgttgagaatttttgcatctgtgttc 117423 ||||||||||||| ||||||||||||| |||||||||||||||||||||||||| ||||| Sbjct 319853 TGATTGTGCTGTTGAATTCAGTTTGCTATCATTTTGTTGAGAATTTTTGCATCTATGTTC 319794

Query 117424 atcaggaatattgacttgtagttttttttaatgtaaagtctttgactttggtatcagggt 117483 ||||||||||||||||||||||||| ||||| |||||||||||| ||||||||||||||| Sbjct 319793 ATCAGGAATATTGACTTGTAGTTTTCTTTAACGTAAAGTCTTTGGCTTTGGTATCAGGGT 319734

Query 117484 aattctggcttcataaagtaggtttcccacttcagtgttttggaagagtgtgagaaagat 117543 ||||||||||||| | | |||||||||||| |||||||||||||| ||||||||| Sbjct 319733 AATTCTGGCTTCA----G--G-TTTCCCACTTCAATGTTTTGGAAGAGTTTGAGAAAGAC 319681

215

Query 117544 tggcattaattcttctttaaatctttggtagaattatccagtgaaagcatctgagctctt 117603 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 319680 TGGCATTAATTCTTCTTTAAATCTTTGGTAGAATTATCCAGTGAAAGCATCTGAGCTCTT 319621

Query 117604 atttattgagagattatgattactaatttcatctccttactaat-gatggggctgttcag 117662 ||| ||| ||||||||||||| |||||| |||||||||||||| | ||||||||||||| Sbjct 319620 CTTTGTTGGGAGATTATGATTATTAATTTGATCTCCTTACTAATTG-TGGGGCTGTTCAG 319562

Query 117663 attttctgtttattcatcattcaggcttggtaggttgtatgtttcagggatttattcatt 117722 |||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||| Sbjct 319561 ATTTTCTGTTTATTCATCATTCAGTCTTGGTAGGTTGTATGTTTCAGGGATTTATTCATT 319502

Query 117723 tctttaggttatccaatttgttagtgtgtagttgttcatagtagtctcttactaaccttt 117782 |||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||| Sbjct 319501 TCTTTAGGTTATCCAATTTGTTAGTGTGTAGTTGTTCATAGTAGTCTCTTACTAGCCTTT 319442

Query 117783 ttatttctgcgctatcaattgtaatgtaagctcttctatttctgattttatttatttgag 117842 ||||||||| ||||||||||||||| ||| |||||||||||||||||||||||||||||| Sbjct 319441 TTATTTCTGTGCTATCAATTGTAATATAACCTCTTCTATTTCTGATTTTATTTATTTGAG 319382

Query 117843 tcttctctttttgtcttggttaatctagctaaagctttgttaattttgttgacctttaaa 117902 |||| | |||| |||||||||||||||||||||| ||||||||||||||||||||||||| Sbjct 319381 TCTT-T-TTTT-GTCTTGGTTAATCTAGCTAAAGATTTGTTAATTTTGTTGACCTTTAAA 319325

Query 117903 aaaccaattctaagttttattgatattttgtattctatatttcatttatttttcctcaaa 117962 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 319324 AAACCAATTCTAAGTTTTATTGATATTTTGTATTCTATATTTCATTTATTTTTCCTCAAA 319265

Query 117963 tctttattatttccttccttctgctaacttacggcttagtttattcattttctagttc-t 118021 |||| | ||||||||||||||||||||||| |||||||||||||||||||||||||| | Sbjct 319264 TCTT-A--ATTTCCTTCCTTCTGCTAACTTAGGGCTTAGTTTATTCATTTTCTAGTTCCT 319208

Query 118022 tacattgtaaaattaggttatttgtttgagatctttcttactgtagacatttactgctgt 118081 || ||||||||||||||| ||||||||||||||||||||||||||||||||||||| Sbjct 319207 TATGTTGTAAAATTAGGTT----GTTTGAGATCTTTCTTACTGTAGACATTTACTGCTGT 319152

Query 118082 aaactttccttttattgcttttgctgtttcccataagctttgctatgttgtgctttagtt 118141 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 319151 AAACTTTCCTTTTATTGCTTTTGCTGTTTCCCATAAGCTTTGCTATGTTGTGCTTTAGTT 319092

Query 118142 tttgtttttctcaaaatattttctgatttttcttttgatttcttctttgactcattggtt 118201 ||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||| Sbjct 319091 TTTGTTTTTCTCAAAATATTTTCTGATTTTTCTTTTGATTTCTTCTTTGACCCATTGGTT 319032

Query 118202 gttcaagagtgttttttaatttccacatatttgagaactttctgaaattatttctcttag 118261 ||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||| Sbjct 319031 GTTCAAGAGTGTTTTTTAATTTCCACATATTTGAGAACTTTCTGAAATTATTTCTATTAG 318972

Query 118262 caatgtctcattttattctgttgtcagaaaagatacttggtatgattttagcctccttaa 118321 |||||||||||| || || ||||||||||||||||||||||||||||| |||||||| Sbjct 318971 CAATGTCTCATTCCGTTGTG--GTCAGAAAAGATACTTGGTATGATTTTAGTCTCCTTAA 318914

Query 118322 atttgttaagatttgtttcgtgggctagcatgtgagctattctggagactgttctatgtg 118381 |||||||||||||||||| |||||||||||||||| |||||||||||||||||||||||| Sbjct 318913 ATTTGTTAAGATTTGTTTTGTGGGCTAGCATGTGAACTATTCTGGAGACTGTTCTATGTG 318854

Query 118382 tgtttgagaagaatgtataatttgctgttgttgggtagaatgttctgtatttgtttgata 118441 |||||||| |||||||||||||||||||||||||||||||| |||||||||||||||||| Sbjct 318853 TGTTTGAGGAGAATGTATAATTTGCTGTTGTTGGGTAGAATATTCTGTATTTGTTTGATA 318794

Query 118442 ggtccacttgggccatagtgttgtttgagttctctgtttccttattgatcttctatctgg 118501 ||||||||||||||||||||||||||||||| |||||||||||||||||||||| ||||| Sbjct 318793 GGTCCACTTGGGCCATAGTGTTGTTTGAGTTATCTGTTTCCTTATTGATCTTCTGTCTGG 318734

Query 118502 attttttatccattattgaaagtgagatattaaaatctctcactattatgttgttctata 118561 |||||||||||||||||||||||||||||||| ||| ||||||||||||||||||||||| Sbjct 318733 ATTTTTTATCCATTATTGAAAGTGAGATATTACAATTTCTCACTATTATGTTGTTCTATA 318674

Query 118562 tttatcatcacttctgtcaatatttgcttcttatatttgtgtgctttaatgttgggtgca 118621 |||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||| Sbjct 318673 TTTATCATCACTTCTGTCAACATTTGCTTCTTATATTTGTGTGCTTTAATGTTGGGTGCA 318614

216

Query 118622 atgtacttata-t-----cttcctggtgaattgatgcatttatcactatataatgtccgt 118675 ||||| ||||| | ||||||| |||||||||||||||||||||||||||||||||| Sbjct 318613 ATGTATTTATAATTATATCTTCCTGATGAATTGATGCATTTATCACTATATAATGTCCGT 318554

Query 118676 ttttttgtctcttgtgaccatctttgacttgaagtttaatttctctaagtatggctacct 118735 |||||||||||||||||||||||||||||| ||||||||||| ||||||||||||||||| Sbjct 318553 TTTTTTGTCTCTTGTGACCATCTTTGACTTAAAGTTTAATTTGTCTAAGTATGGCTACCT 318494

Query 118736 gtgctgtcttttagttaacattggcatgaaatatctttttccatcttctcactttaaacc 118795 || | || ||||||||||||||||||||||||||||||||||||||||||| |||| Sbjct 318493 ---CT-T-TT--AGTTAACATTGGCATGAAATATCTTTTTCCATCTTCTCACTTTCAACC 318441

Query 118796 tatgtgcattcttaaatctaaagtgagtctctctcttgtagacagcacatagttggatct 118855 |||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||| Sbjct 318440 TATGTGCATTCTTAAATCCAAAGTGAGTCTCTCTCTTGTAGACAGCACATAGTTGGATCT 318381

Query 118856 tgttttatattt-t--atttatttatttactta---a-tttatttagatggagtcttgct 118908 |||||||||||| | ||||||||||||| ||| | |||||||||||||||||| ||| Sbjct 318380 TGTTTTATATTTATTTATTTATTTATTTATTTATTTAATTTATTTAGATGGAGTCTGGCT 318321

Query 118909 ctgttgcccaggctggagtgcaatggctcgatctcggctcactgcaacctctgcctcccg 118968 ||||||||||||||||||||||||||||||||||||||||||||||| || ||||||||| Sbjct 318320 CTGTTGCCCAGGCTGGAGTGCAATGGCTCGATCTCGGCTCACTGCAATCTATGCCTCCCG 318261

Query 118969 ggttcaagtgattctcctccctcagcctcctaagtagctggaactacaagcacgtgccac 119028 |||||||||||||||||||||||||||||||||||||||||||||| |||| | |||||| Sbjct 318260 GGTTCAAGTGATTCTCCTCCCTCAGCCTCCTAAGTAGCTGGAACTAGAAGCGCATGCCAC 318201

Query 119029 ca-tgcctggctaattttttgtatttttagtagagacggggtttcaccatgttagccagg 119087 || | |||||||||||||||||||||||||||||||||||||||||| ||||||||||| Sbjct 318200 CACT-CCTGGCTAATTTTTTGTATTTTTAGTAGAGACGGGGTTTCACTGTGTTAGCCAGG 318142

Query 119088 atggtctcgatctcctgatctctcatgatctgcctacctcagcctcccaaagtgctggga 119147 ||||||| ||||||||||||||||||||||||||||||||||| |||||||||||||||| Sbjct 318141 ATGGTCTTGATCTCCTGATCTCTCATGATCTGCCTACCTCAGCGTCCCAAAGTGCTGGGA 318082

Query 119148 tcacagacatgagccaccgcacctggctgttttattttatttttaaaaattcatttcagc 119207 | |||||| ||||||||| ||||||||||||||||||||||||||||||||||||||||| Sbjct 318081 TTACAGACGTGAGCCACCACACCTGGCTGTTTTATTTTATTTTTAAAAATTCATTTCAGC 318022

Query 119208 cacttcatagcttttgattggagagtttattctacttatacttaaagtaacta-tggata 119266 |||||||||||||||||||||||||||||||||| |||||||||||||||||| || ||| Sbjct 318021 CACTTCATAGCTTTTGATTGGAGAGTTTATTCTATTTATACTTAAAGTAACTACTG-ATA 317963

Query 119267 gggaaggacttactgttgccattttg-taattattttcagtcagtcttgtagcttttttg 119325 |||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||| Sbjct 317962 GGGAAGGACTTACTGTTGCCATTTTGGTAATTATTTTCAGTCAGTCTTGTAGCTTTTTTG 317903

Query 119326 tccctcttttctactcttactgcccttctttgtgtttcattgat-ttttttgggtaatga 119384 |||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||| Sbjct 317902 TCCCTCTTTTCTACTCTTACTGCCCTTCTTTGTGTTTCATTGATGTTTTTTGGGTAATGA 317843

Query 119385 tgagatttgactccttcctcatttccttttgtgtatcttctatagatattttctttgtgg 119444 |||||||||| ||||| ||||||||||||||||||||||||||||||||||||||||||| Sbjct 317842 TGAGATTTGATTCCTTTCTCATTTCCTTTTGTGTATCTTCTATAGATATTTTCTTTGTGG 317783

Query 119445 ttaccatgaagcttacataaaacagcctatacttataacagtctattctaagcttataac 119504 |||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||| Sbjct 317782 TTACCATGAAGCTTACTTAAAACAGCCTATACTTATAACAGTCTATTCTAAGCTTATAAC 317723

Query 119505 aacttaatttcaattgctcacgaaaactctactcttttactgcctctcccccactttaca 119564 ||||||||||||||||| ||| ||||||||||| |||||||||||||||||||||||| Sbjct 317722 AACTTAATTTCAATTGCACACAAAAACTCTACTTTTTTACTGCCTCTCCCCCACTTTATG 317663

Query 119565 ttattgatgtcacaaattatattttttaatgttgtgtattcattaactgatttttatagt 119624 ||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 317662 TTATTCATGTCACAAATTATATTTTTTAATGTTGTGTATTCATTAACTGATTTTTATAGT 317603

Query 119625 gatagttatttttataagtttgtcttttaagtactataccagaattcaaagatttacaca 119684 || ||||||||||||| ||||||||||||||||||||||||||||| |||||||||||| Sbjct 317602 GA--GTTATTTTTATAACTTTGTCTTTTAAGTACTATACCAGAATTCCAAGATTTACACA 317545

217

Query 119685 ttgtcattatagtattaaaatattctatatttgtttatatatttacctttaccaggaatt 119744 ||||||||||||||||||||||| |||||||||||||||||||||||||||| ||||||| Sbjct 317544 TTGTCATTATAGTATTAAAATATCCTATATTTGTTTATATATTTACCTTTACAAGGAATT 317485

Query 119745 ttacctttcatatgttttcatgttgctgcctagcattcttttatttcaacatgatggact 119804 |||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||| Sbjct 317484 TTACCTTTCATATGTTTTGATGTTGCTGCCTAGCATTCTTTTATTTCAACATGATGGACT 317425

Query 119805 ccctttagcatttcttataaggcaggtatagtggtgatgaactctttcagcttttgtgca 119864 |||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||| Sbjct 317424 CCCTTTAGCATTTCTTATAAGGCAGGTATAGTGGCGATGAACTCTTTCAGCTTTTGTGCA 317365

Query 119865 catgagaaagtctttatttcccctctatttttgaaagacagttttgctagacagagtatt 119924 |||||||||||||||||||||| || ||||||||||||||||||||||||| |||||||| Sbjct 317364 CATGAGAAAGTCTTTATTTCCCTTCCATTTTTGAAAGACAGTTTTGCTAGAGAGAGTATT 317305

Query 119925 cttggctgcccgttgttttcttttgaatagatcatccatctcccttctggcctgcaaggt 119984 |||| |||||| ||||||||||||||||| ||||||||||||||||||||| || ||||| Sbjct 317304 CTTGACTGCCCATTGTTTTCTTTTGAATATATCATCCATCTCCCTTCTGGCTTGTAAGGT 317245

Query 119985 ttctgctgagaaatttgttgatagtctcattgtggcccccttttatatgaaaagttgctt 120044 ||||||| ||||||||||| |||||||||| ||||||||||||||||||||||||||||| Sbjct 317244 TTCTGCTAAGAAATTTGTTAATAGTCTCATCGTGGCCCCCTTTTATATGAAAAGTTGCTT 317185

Query 120045 ttctcttgctgctttcaaaattctgtttttgtctttgacttttgacaaattgattataat 120104 ||||||||||||||||||||||||||||||||||||||||| ||||||||| |||||||| Sbjct 317184 TTCTCTTGCTGCTTTCAAAATTCTGTTTTTGTCTTTGACTTCTGACAAATT-ATTATAAT 317126

Query 120105 atatgttgtagcggacttctttaaattcattctagttggagtcttttgggcttcttgaag 120164 |||| |||||| ||||||||||||||| ||||||||||||||||||||| |||||||||| Sbjct 317125 ATATCTTGTAGTGGACTTCTTTAAATTAATTCTAGTTGGAGTCTTTTGGACTTCTTGAAG 317066

Query 120165 ctggatgtctattttcttccctaaatttATTCATTATTTCTACTTTCCATTGGGTTTTAA 120224 |||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||| Sbjct 317065 CTGGATGTCTATTTTCTTCCCTAAATTTGTTCATTATTTCTACTTTCCATTGGGTTTTAA 317006

Query 120225 TCTTATTCGTTATTTTAAATATTTGAACTATTAGCTCATTATCTGTGATATAGGTTGCAG 120284 |||||||| |||||||||||||| ||||||||||||||||||| |||||||||||||||| Sbjct 317005 TCTTATTCTTTATTTTAAATATTGGAACTATTAGCTCATTATCCGTGATATAGGTTGCAG 316946

Query 120285 TACTATTTATTGACATGTTAATATTTGCCCCAGTGGTGTGAGATAAG--AATAATAATCA 120342 |||||||||||||||||| ||| |||||||||||| |||||||||| ||||||||||| Sbjct 316945 TACTATTTATTGACATGTCAATGTTTGCCCCAGTGATGTGAGATAACTTAATAATAATCA 316886

Query 120343 TAAACCAAATTTTCATATGTGCATGGGCTTATTTCTGAACtttttttCTGATTCCACTTG 120402 |||| ||||||||||| |||||||||||||||||||||| |||||||||||||||||||| Sbjct 316885 TAAATCAAATTTTCATGTGTGCATGGGCTTATTTCTGAA-TTTTTTTCTGATTCCACTTG 316827

Query 120403 TTGTAAGCTATTCATATACCAAAACCACATTGTCTTAATTACCACACTGTCTTAATTAAG 120462 ||||||||||||||||||||||| |||||||||||||||||||||||||||||||||| | Sbjct 316826 TTGTAAGCTATTCATATACCAAAGCCACATTGTCTTAATTACCACACTGTCTTAATTA-G 316768

Query 120463 AGGAaatacctctaatattttgtcattaaaggctttatggtatgttttattgtctggtag 120522 |||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| Sbjct 316767 AGGAAATACCTCTAATATTTTGTCATTAAAGGCTTTATAGTATGTTTTATTGTCTGGTAG 316708

Query 120523 cactagtacctccacatagctttattctaaagtgttttcctggctattcttttatgtttg 120582 |||||||||||| ||||||||||||||||| |||||||||||||||| ||| ||||||| Sbjct 316707 GACTAGTACCTCCGCATAGCTTTATTCTAAAATGTTTTCCTGGCTATTATTT-ATGTTTG 316649

Query 120583 ttttttctgtatagactttagtgtcaacttgtttacctccataaaaagtttgttggtatt 120642 ||||||| |||||||||||||||||||||||||||||||||||| ||||||||||||||| Sbjct 316648 TTTTTTCCGTATAGACTTTAGTGTCAACTTGTTTACCTCCATAAGAAGTTTGTTGGTATT 316589

Query 120643 tttattggaattgcattaaatttatacattaactttgggagaactggcatttttatgatg 120702 ||||||||||||||||||||||||||||||||||||||||||||| ||||||||||| || Sbjct 316588 TTTATTGGAATTGCATTAAATTTATACATTAACTTTGGGAGAACTAGCATTTTTATGGTG 316529

Query 120703 ttaaatcatcctatccaagaacaagtttgtccacttattcaagctgacttgtctgtcttc 120762 |||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 316528 TTAAATGATCCTATCCAAGAACAAGTTTGTCCACTTATTCAAGCTGACTTGTCTGTCTTC 316469

218

Query 120763 tagaagtgttttcaggtttattcatttgggttttgcatatttcttgctaagtttattcca 120822 |||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 316468 TAGAAGTGTTTTAAGGTTTATTCATTTGGGTTTTGCATATTTCTTGCTAAGTTTATTCCA 316409

Query 120823 agtattgaatcttccttgttgcaattgtaaatgagacttttttcttctccattatgttct 120882 |||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||| Sbjct 316408 AGTATTGAATCTTCCTTGTTGCTATTGTAAATGAGACTTTTTTCTTCTCCATTATGTTCT 316349

Query 120883 ttaactacttattatttgtgtatataaagggtattgatatcaatatatataacatctata 120942 ||||||||||||| |||||||||||||||||||||| |||||||||||||||| | ||| Sbjct 316348 TTAACTACTTATTGTTTGTGTATATAAAGGGTATTGGTATCAATATATATAACGTATATC 316289

Query 120943 tatctatatataacctatatatatctgtatatgttaattccatatcctatagctttactg 121002 ||| |||| |||| ||||||||||||||||||||||||||||||||||||||||||||| Sbjct 316288 TAT--ATATCTAACATATATATATCTGTATATGTTAATTCCATATCCTATAGCTTTACTG 316231

Query 121003 aattcttttattattgagttagttttatcatggattctctagggttttctgggtatgatg 121062 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 316230 AATTCTTTTATTATTGAGTTAGTTTTATCATGGATTCTCTAGGGTTTTCTGGGTATGATG 316171

Query 121063 gtttgtcatttgcaaatagaattgtttttcttttttgtcaatttttgtgcctccaattaa 121122 || | ||||||||||||||||||||||||||| ||||||||||||||||||||||||||| Sbjct 316170 GTGTATCATTTGCAAATAGAATTGTTTTTCTTCTTTGTCAATTTTTGTGCCTCCAATTAA 316111

Query 121123 tttctcttgtataattccattaataaactttttagtacaatgttgaatagcagtggagta 121182 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 316110 TTTCTCTTGTATAATTCCATTAATAAACTTTTTAGTACAATGTTGAATAGCAGTGGAGTA 316051

Query 121183 tcttgcacacacacacacgcacacacacacgtgcacacacat-atcatgtattttat-at 121240 ||||||||||||||||||||||||||||| ||||||||||| ||||| |||||||| || Sbjct 316050 TCTTGCACACACACACACGCACACACACA--TGCACACACATTATCATATATTTTATCAT 315993

Query 121241 -t-atgaaagtatccctcaattctcatttttgagtgtttttttgtcaagagtgattgtgg 121298 | ||||||||||||||| |||||||||||||||||||||||||| |||||||||||| | Sbjct 315992 ATTATGAAAGTATCCCTCGATTCTCATTTTTGAGTGTTTTTTTGTAAAGAGTGATTGTAG 315933

Query 121299 aattttggtgttggtgaaggctttttcagcatctatggagataatctcttgatttttttt 121358 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 315932 AATTTTGGTGTTGGTGAAGGCTTTTTCAGCATCTATGGAGATAATCTCTTGATTTTTTTT 315873

Query 121359 -ctttaaatatattgatatggtatattatttgaagaaatttcctaatgttgatcccactt 121417 ||||||||||||||||| |||||||||||||||| |||||||||||||||||||||||| Sbjct 315872 TCTTTAAATATATTGATACGGTATATTATTTGAAGGAATTTCCTAATGTTGATCCCACTT 315813

Query 121418 ggctatgctttgttattttcataatgtggtgttggattctgtttggtaatattttattta 121477 ||||||| ||||||||||||||||||||||||||||||||||||||| |||||||||||| Sbjct 315812 GGCTATGGTTTGTTATTTTCATAATGTGGTGTTGGATTCTGTTTGGTTATATTTTATTTA 315753

Query 121478 gtacgttggttttgacattcataagtatttttggtcagcaattttAACACTACttattta 121537 |||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||| Sbjct 315752 GTACGTTGGTTTTGACATTCATAAGTATTTTTGGTCTGCAATTTTAACACTACTTATTTA 315693

Query 121538 aattttcagtaatatacatataacataaaatttatccttttaactatttttaagtatata 121597 |||||||||||||||||||||||||| || ||||||||||||||||||||||||||| Sbjct 315692 AATTTTCAGTAATATACATATAACAT----TT-ATCCTTTTAACTATTTTTAAGTATATA 315638

Query 121598 tccagtgacattatcatgaatgaggtcaggagattgagaccagcctggctaacacacggt 121657 |||||||||||||||| || | | || |||||| ||||||||| ||||||||||||||| Sbjct 315637 TCCAGTGACATTATCACGA--G-G-TCGGGAGATCGAGACCAGCATGGCTAACACACGGT 315582

Query 121658 gaaaccccgtctctactaaaaatacaaaaaattagccgggcgtggtggcgggcgcctgta 121717 |||||||||||||||||||| |||||||||||||||||||||||||||| || ||||||| Sbjct 315581 GAAACCCCGTCTCTACTAAAGATACAAAAAATTAGCCGGGCGTGGTGGCAGGTGCCTGTA 315522

Query 121718 gttccagttactcgggaggctgaggcaggagaatggcgtgaacccgggaggtggagcttg 121777 || |||| |||| |||||||||||||||||||||||||||||||||||||| |||||||| Sbjct 315521 GTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTG 315462

Query 121778 tagtgagccgagatcttgccactgcactccagcctgggcaacagagcgagactccgtct 121836 |||||||||||||| |||||||||||||||||||||| ||||||||||||||||||| Sbjct 315461 CAGTGAGCCGAGATCACGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCT 315403

219

APPENDIX B

Reported CBC Candidate QTL

Chromosome Trait Population Statistics Notes PMID 1p34.2-p36.23 MCH- Baboons LOD 2.5 17557178 MCHC 1p34.2-p36.23 MCH- Baboons LOD 1.5 17557178 MPV 1p34.2-36.23 MCV Baboons LOD 3.5 17557178 1p34.2-p36.23 MCV-HB Baboons LOD 3.2 17557178 1p34.2-p36.23 MCV- Baboons LOD 3.2 17557178 HCT 1p34.2-p36.23 MCV- Baboons LOD 2.5 17557178 MCH 1p34.2-p36.23 MCV- Baboons LOD 2.6 17557178 MCHC 1p34.2-p36.23 MCV- Baboons LOD 3.4 17557178 MPV 1p34.2-p36.23 MCV-PLT Baboons LOD 3.1 17557178 1p34.2-p36.23 MCV- Baboons LOD 3.2 17557178 RBC 1p34.2-p36.23 MCV- Baboons LOD 2.8 17557178 RDW 1p34.2-p36.23 MCV- Baboons LOD 3.0 17557178 WBC 1p34.3 RBC Caucasian 4.20E-06 FLJ11730; BC016328 17903294 1p34.2-p36.23 RBC-HCT Baboons LOD 3.5 17557178 1q21 Hct Mice LOD 3.9 Locus named Hctq1, 16596451 recesive model 1q22 MCHC Japanese 3.39E-09 BGLAP-PAQR6- 20139978 SMG5-TMEM79- C1orf85-VHLL-CCT3- C1orf182 1q23.2 HB Caucasian 1.60E-07 OR10J1; OR10J5 17903294 1q23.1 MCHC Caucasian p=1.033E-10 SPTA1 19862010 1q23.2 WBC African American LOD 96.8 DARC 18179887 1q23.2 WBC African American 1.40E-103 DARC 21153663 1q24.3 MPV Caucasian 2.10E-14 DNM3 19820697 1q31.3 MCH Japanese 6.76E-10 ATP6V1G3-PTPRC 20139978 1q32.1 MPV Caucasian 1.40E-20 TMCC2 19820697 1q41-42 RBC Mice LOD 3.8 Locus named Rbcq3, 16596451 dominant model, same signal as 17p11 1q44 HB Caucasian LOD 2.04 16950815 1q44 HCT Caucasian LOD 2.72 16950815 1q44 MCV Japanese 2.33E-08 TRIM58 20139978 2p11.2-q11.2 MCV Caucasian LOD1.63 16950815 2p16.2 HB Caucasian 1.00E-06 17903294 2p16.1 MCV Caucasian p=1.125E-14 BCL11A 19862010 2p21 HB Caucasian p=7.052E-13 PRKCE 19862010

220

2p21 HCT Caucasian p=3.748E-15 PRKCE 19862010 2p21 MPV Caucasian 3.20E-11 EHD3 19820697 2p21 RBC Japanese 3.81E-08 PRKCE 20139978 2q11.1-q12.1 MCV- Baboons LOD 1.5 17557178 RDW 2q32 MCHC Mice LOD 4.7 Locus named Chmq1, 16596451 recessive model 3p14.2-p13 RBC Caucasian LOD 2.2 16950815 3p21–p13 MPV Caucasian 5.50E-31 ARHGEF3 19820697 3p23-p22.3 RDW- Baboons LOD 1.6 17557178 PLT 3p24.2 MCH Japanese 3.52E-10 THRB 20139978 3p24.2 MCV Japanese 3.33E-08 THRB 20139978 3q12-13 Hgb Mice LOD 3.7 Locus named Hgbq1, 16596451 dominant model 3q27.1 PLT Japanese 5.38E-11 THPO-CHRD 20139978 3q29 MCH Japanese 4.46E-11 TFRC-ZDHHC19 20139978 3q29 MCH Caucasian p=7.729E-13 TFRC 19862010 3q29 MCV Japanese 1.66E-08 TFRC-ZDHHC19 20139978 3q29 MCV Caucasian p=8.499E-14 TFRC 19862010 3q29 MCV Caucasian 6.40E-04 TFRC-ZDHHC19 21153663 4p12-q13.3 RBC Caucasian LOD 2.5 17903294 4p15.1-p14 MCV Caucasian LOD 1.73 16950815 4p15.33 RBC Caucasian 4.50E-06 17903294 4q12 MCH Japanese 2.80E-25 PDGFRA-HK1 20139978 4q12 MCV Japanese 2.19E-29 PDGFRA-HK1 20139978 4q12 MCV Caucasian p=9.816E-10 KIT 19862010 4q12 RBC Japanese 1.92E-17 PDGFRA-HK1 20139978 4q31.2-q34.1 RBC Caucasian LOD 3.07 16950815 4q34.3-q35.1 WBC- Baboons LOD 2.0 17557178 MCHC 5p15.33 RBC Japanese 3.01E-08 TERT 20139978 5q21.1 MCH Caucasian 5.80E-06 17903294 6p12, 6q13 MCHC Mice LOD 6.1 Locus named 16596451 Chcmq2, dominant model 6p21.1 MCH Japanese 1.80E-20 USP49-MED20- 20139978 BYSL-CCND3 6p21.1 MCH Caucasian p=8.198E-20 CCND3/BYSL 19862010 6p21 MCV Australian, Dutch 1.20E-09 CCND3 19853236 6p21.1 MCV Japanese 3.62E-27 USP49-MED20- 20139978 BYSL-CCND3 6p21.1 MCV Caucasian p=1.121E-31 CCND3/BYSL 19862010 6p21.1 MCV Caucasian 7.00E-19 BYSL/ CCND3 19820697 6p21.1 RBC Japanese 1.09E-10 USP49-MED20- 20139978 BYSL-CCND3 6p21 WBC Caucasian 1.90E-08 21153663

221

6p21.3 MCV Caucasian 1.40E-23 HFE 19820697 6p21.31 PLT Japanese 6.66E-11 BAK1 20139978 6p21.3 PLT Caucasian 3.70E-10 BAK1 19820697 6p21.31 PLT African American 6.20E-08 BAK1 21153663 6p21.31 PLT Caucasian 7.60E-73 BAK1 21153663 6p21.33 WBC Japanese 6.76E-09 CDSN-PSORS1C1 20139978 6p22 HB Caucasian p=5.737E-19 HFE 19862010 6p22.1 HB Caucasian 1.40E-15 HFE 21153663 6p22 HCT Caucasian p=2.035E-9 HFE 19862010 6p22.1 HCT Caucasian 2.50E-10 HFE 21153663 6p22.2 MCH Caucasian p=3.868E-39 SLC17A3 19862010 6p22.1 MCH Emergency 2.76E-09 HFE 20927387 department 6p22.2 MCH Emergency 4.66E-08 SLC17A1 20927387 department 6p22.2 MCHC Japanese 5.00E-10 SLC12A7 20139978 6p22 MCV Caucasian p=1.012E-46 HFE 19862010 6p22.1 MCV Caucasian 6.80E-16 HFE 21153663 6q21 MCH Japanese 1.49E-08 C6orf182-CD164 20139978 6q21 MCV Japanese 2.51E-08 C6orf182-CD164 20139978 6q21 MCV Caucasian p=4.198 CD164 19862010 6q21 RBC Japanese 6.92E-09 C6orf182-CD164 20139978 6q21 RBC Caucasian LOD 1.73 16950815 6q23.1-q23.2 HB Caucasian LOD 3.03 16950815 6q23.3 HCT Caucasian p=2.811E-15 HBS1L-MYB 19862010 6q23-24 HCT Caucasian LOD 3.4 EBP41L2, HEBP2 15635079 HPFH 6q23.3 HCT Caucasian 5.50E-04 MYB 21153663 6q23.1-q23.2 HCT Caucasian LOD 2.95 16950815 6q23.3 Ht Japanese 9.52E-11 HBS1L-MYB 20139978 6q23.3 MCH Japanese 2.82E-66 HBS1L-MYB 20139978 6q23.3 MCH Caucasian p=7.356E-69 HBS1L-MYB 19862010 6q23.3 MCH Emergency 5.17E-16 HBS1L/MYB 20927387 department 6q23.3 MCH Emergency 3.12E-14 HBS1L/MYB 20927387 department 6q23.3 MCH Emergency 4.94E-14 HBS1L/MYB 20927387 department 6q23.3 MCH Emergency 1.36E-15 HBS1L/MYB 20927387 department 6q23.3 MCH Emergency 7.05E-11 HBS1L/MYB 20927387 department 6q23.3 MCH Emergency 1.42E-08 HBS1L/MYB 20927387 department 6q23.3 MCHC Japanese 6.22E-12 HBS1L-MYB 20139978 6q23.3 MCHC Caucasian p=6.486E-10 HBS1L-MYB 19862010 6q23.3 MCV Japanese 3.44E-56 HBS1L-MYB 20139978 6q23.3 MCV Caucasian p=7.241E-86 HBS1L-MYB 19862010

222

6q23.3 MCV Emergency 1.37E-14 HBS1L/MYB 20927387 department 6q23.3 MCV Emergency 5.03E-13 HBS1L/MYB 20927387 department 6q23.3 MCV Emergency 7.94E-13 HBS1L/MYB 20927387 department 6q23.3 MCV Emergency 2.82E-15 HBS1L/MYB 20927387 department 6q23.3 MCV Emergency 2.50E-10 HBS1L/MYB 20927387 department 6q23.3 MCV Emergency 3.49E-08 HBS1L/MYB 20927387 department 6q23 –q24 MCV Caucasian 7.40E-42 HBS1L -MYB 19820697 6q23.3 MCV Caucasian 3.90E-09 MYB 21153663 6q23.3 PLT Japanese 2.54E-14 HBS1L-MYB 20139978 6q23.3 RBC Japanese 7.31E-48 HBS1L-MYB 20139978 6q23.3 RBC Caucasian p=1.148E-47 HBS1L-MYB 19862010 6q23.3 RBC Emergency 1.11E-14 HBS1L/MYB 20927387 department 6q23.3 RBC Emergency 2.46E-13 HBS1L/MYB 20927387 department 6q23.3 RBC Emergency 2.85E-13 HBS1L/MYB 20927387 department 6q23.3 RBC Emergency 7.18E-12 HBS1L/MYB 20927387 department 6q23.3 RBC Emergency 5.57E-09 HBS1L/MYB 20927387 department 6q23.3 RBC Caucasian 4.70E-04 MYB 21153663 6q23.1-q23.2 RBC Caucasian LOD 1.88 16950815 6q23.3 WBC Japanese 1.67E-09 HBS1L-MYB 20139978 6q23.3 WBC Caucasian 4.40E-04 MYB 21153663 6q24.1 MCH Japanese 1.42E-09 CITED2 20139978 6q24.1 MCH Caucasian p=1.262E-17 CITED2 19862010 6q24.1 MCV Australian, Dutch 5.30E-09 19853236 6q24.1 MCV Japanese 1.08E-09 CITED2 20139978 6q24.1 MCV Caucasian p=4.665E-25 CITED2 19862010 6q25.1 RBC Caucasian 6.30E-06 MAP3K7IP2 17903294 6q25.3-q27 RBC Caucasian LOD 2.9 17903294 7p12.2 MCV Caucasian p=4.689E-13 IKZF1 19862010 7p13-11 MCV Mice LOD 4.7 Locus named Mcvq2, 16596451 heterozygotes low 7q11.21-q11.22 HB Caucasian LOD 1.55 16950815 7q21.2 WBC Japanese 2.44E-08 CDK6 20139978 7q21.2 WBC Caucasian 4.10E-04 CDK6 21153663 7q22.1 HCT Caucasian p=4.45E-10 TFR2 19862010 7q22 HCT Caucasian 4.30E-07 TFR2 21153663 7q22.1 MCV Caucasian p=2.543E-11 TFR2 19862010 7q22 MCV Caucasian 1.60E-07 TFR2 21153663 7q22.3 MPV Caucasian 1.60E-33 PIK3CG 19820697

223

7q22.1 RBC Caucasian p=1.123E-9 EPO 19862010 7q22 RBC Caucasian 4.90E-10 TFR2 19820697 7q22 RBC Caucasian 1.60E-05 TFR2 21153663 7q22.1 RBC Caucasian 1.30E-06 EPO 21153663 7q36.1 HB Caucasian p=3.025E-15 PRKAG2 19862010 7q36.1 HB Caucasian 1.60E-10 PRKAG2 21153663 7q36.1 HCT Caucasian p=6.045E-15 PRKAG2 19862010 7q36.1 HCT Caucasian 5.00E-09 PRKAG2 21153663 7q36.1-q36.2 MCV Caucasian LOD 2.2 17903294 7q36.2 MCV Caucasian LOD 1.71 16950815 7q36.2-q36.3 MCV Caucasian LOD 2.14 16950815 8p12 RBC Caucasian LOD 1.52 16950815 8p21.3 MCV Japanese 3.32E-08 DOK2-XPO7 20139978 8p21.3-p22 RBC- Baboons LOD 1.6 17557178 MPV 8q22.3-q24.11 RBC Caucasian LOD 1.99 16950815 8q24.13 HB Caucasian 2.10E-06 17903294 8q24.13-q24.11 HCT Caucasian LOD 2.02 16950815 9p24.1 MCH Japanese 5.69E-11 RCL1 20139978 9p24.1 MCH Caucasian p=2.166E-14 RCL1 19862010 9p24.1 MCV Japanese 2.50E-14 RCL1 20139978 9p24.1 MCV Caucasian p=3.184E-20 RCL1 19862010 9p24.2 MCV Caucasian LOD 1.61 16950815 9p24.1 PLT Japanese 2.95E-13 RCL1 20139978 9p24.1 –p24.3 PLT Caucasian 8.50E-17 AK3 19820697 9p24.2 RBC Caucasian LOD 1.62 16950815 9q34.2 Hb Japanese 1.18E-11 ABO 20139978 9q34.2 HB Caucasian 9.60E-04 ABO 21153663 9q34 HCT-HB Caucasian LOD 3.1 15635079 9q34.2 Ht Japanese 6.06E-10 ABO 20139978 9q34.2 MCHC Japanese 4.32E-08 ABO 20139978 9q34 MCV Korean 5.06E-05 ABO. Identified a 22963146 CNV where deletion associated with MCV. 9q34.2-34.3 MCV- Baboons LOD 1.7 17557178 MPV 9q34.2 RBC Japanese 3.33E-12 ABO 20139978 9q34 RBC Korean 2.13E-06 ABO 22963146 10p11.21 MCH Caucasian 5.90E-06 ANKRD30A 17903294 10q11.21 MCH Japanese 3.82E-12 ALOX5-MARCH8- 20139978 ANUBL1-FAM21C- AGAP4 10q11.23 MCH Japanese 6.42E-10 MSMB-NCOA4- 20139978 TIMM23

224

10q11.21 MCV Japanese 2.60E-11 ALOX5-MARCH8- 20139978 ANUBL1-FAM21C- AGAP4 10q11.23 MCV Japanese 6.73E-09 MSMB-NCOA4- 20139978 TIMM23 10q11.21 MCV Caucasian p=1.346E-10 Gene: MARCH8 19862010 10q11.21 MCV Caucasian 4.10E-04 ALOX5-MARCH8- 21153663 ANUBL1-FAM21C- AGAP4 10q22.1 HB Caucasian p=2.116E-11 HK1 19862010 10q22.1 HCT Caucasian p=9.636E-14 HK1 19862010 10q21.2 –q21.3 MPV Caucasian 3.30E-21 JMJD1C 19820697 10q26.3 PLT-MPV Baboons LOD 2.0 17557178 10q26.3 RDW Baboons LOD 1.9 17557178 10q26.3 RDW- Baboons LOD 2.0 17557178 MPV 10q26.3 RDW- Baboons LOD 1.5 17557178 PLT 11p15.4 MCH Caucasian LOD 3.6 Hemoglobin B gene 17211848 cluster 11p15.3-p15.1 MCV Caucasian LOD 3.3 17903294 11p15.4 MCV Caucasian LOD 3.8 Hemoglobin B gene 17211848 cluster 11p15.4 MCV African American 2.90E-04 HBB 21153663 11p15.4 MCV African American 4.90E-07 MMP26 21153663 11p15.5 MPV Caucasian 1.30E-14 BET1L 19820697 11p15.3-p15.2 RBC Caucasian LOD 3.2 17903294 11q12.1 MCH Caucasian 6.90E-08 OR5AP2; OR5AR1; 17903294 OR9G1; OR9G4 12p13.32 RBC Japanese 6.44E-09 CCND2 20139978 12p13.2 RBC Caucasian LOD 3.0 17211848 12p13.32-p13.31 RBC Caucasian LOD 2.8 17903294 12q13.13 PLT African American 1.60E-06 NFE2-COPZI 21153663 12q13.13 RBC Caucasian LOD 1.63 16950815 12q15 WBC Japanese 2.85E-08 RAP1B-NUP107- 20139978 SLC35E3 12q21 MCH Mice LOD 4.4 Locus named Mchq1, 16596451 recessive model 12q21 MCV Mice LOD 4.7 Locus named Mcvq3, 16596451 recessive model 12q21 RBC Mice LOD 3.7 Locus named Rbcq4, 16596451 dominant model 12q23.1 RBC Caucasian LOD 1.61 16950815 12q24.13 HB Caucasian p=1.159E-11 TRAFD1 19862010 12q24.12 HB Caucasian 3.20E-07 SH2B3 21153663 12q24.12 HCT Caucasian p=1.363E-12 SH2B3-ATXN2 19862010 12q24.12 HCT Caucasian 1.50E-06 SH2B3 21153663 12q24.14 MCHC Japanese 7.25E-10 ALDH2 20139978

225

12q24.31 MPV Caucasian 2.70E-44 WDR66 19820697 12q24.12 PLT Japanese 4.75E-19 SH2B3 20139978 12q24 PLT Caucasian 2.20E-13 ATXN2 19820697 12q24 PLT Caucasian 7.70E-12 PTPN11 19820697 13q22.1 MCV Caucasian LOD 1.74 16950815 14q23.3 MCV Caucasian p=4.907E-8 FNTB 19862010 14q24.3-q32.12 RBC Caucasian LOD 2.6 17903294 15q14 MCV Mice LOD 9.8 Locus named Mcvq1, 16596451 recessive model 15q22.31 MCH Japanese 2.88E-09 IGDCC4-DPP8- 20139978 PTPLAD1-C15orf44- SLC24A1-DENND4A 15q22.1 MPV Caucasian 1.90E-08 TPM1 19820697 15q26.3 RBC Caucasian LOD 2.3 17903294 16p13.3 HB African American 6.10E-08 HBA2-HBA1 21153663 16p13.3 HB African American 1.30E-06 MMP25 21153663 16p13.3 MCH Japanese 2.83E-09 HBA2-HBA1-LUC7L- 20139978 ITFG3-RGS11 16p13.3 MCH Caucasian p=2.675E-10 ITFG3 19862010 16p13.3 MCHC African American 7.40E-13 HBA2-HBA1 21153663 16p13.3 MCV Caucasian p=1.819E-12 ITFG3 19862010 16p13 MCV African American 5.00E-07 21153663 16p13.3 MCV African American 1.40E-33 HBA2-HBA1 21153663 16q23.2-q24.3 Hb Baboons LOD 1.6 17557178 16q23.2-q24.3 HB-RDW Baboons LOD 2.2 17557178 16q24.3 MCHC Japanese 3.55E-13 CDT1 20139978 17p13.1-p12 MCV Caucasian LOD 1.95 16950815 17p13.2 PLT Japanese 2.13E-12 GP1BA 20139978 17p11 RBC Mice LOD 3.8 Locus named Rbcq3, 16596451 dominant model, same signal as 1q41- 42 17p34-36 RBC Mice LOD 5.3 Locus named Rbcq2, 16596451 recessive model 17q11.2-q22 HB Caucasian LOD 2.61 16950815 17q11.2-q22 HCT Caucasian LOD 2.98 16950815 17q11.2 MPV Caucasian 1.40E-22 TAOK1 19820697 17q12 WBC Caucasian 9.40E-09 GSDMA/ ORMDL3 19820697 17q21.32-q22 RBC Caucasian LOD 2.05 16950815 17q21.1 WBC Japanese 2.94E-14 GSDMA-PSMD3- 20139978 CSF3-MED24 17q21.1 WBC Caucasian 1.80E-19 CSF3 21153663 17q25.1-q25.3 RBC Caucasian LOD 2.3 17903294 18p11.32-p11.31 RBC Caucasian LOD 3.3 17903294

226

18q21 RBC Mice LOD 3.4 Locus named Rbcq1, 16596451 recessive model 18q22.3 MPV Caucasian 1.40E-10 CD226 19820697 19p13.12-q13.11 Hb Baboons LOD 1.9 17557178 19p13.12-q13.41 HB-PLT Baboons LOD 2.0 17557178 19p13.12-q13.41 Hct Baboons LOD 1.6 17557178 19p13.2 MCH Caucasian p=1.415E-11 GCDH 19862010 19p13.2 MCV Caucasian p=2.173E-12 RTBDN 19862010 19q13.43 MCV Caucasian 8.30E-07 GALP 21153663 19p13.12 PLT African American 3.00E-07 TPM4 21153663 19p13.12 PLT Caucasian 3.00E-07 TPM4 21153663 19p13.3 RBC Caucasian LOD 3.2 EPOR, EKLF 17211848 20p12.3 RBC Caucasian 5.80E-06 17903294 20p12.1-q11.2 RDW- Baboons LOD 2.4 17557178 MPV 20p13 MPV Caucasian 7.70E-11 SIRPA 19820697 20q13.2 HB Caucasian p=1.054E-10 TSHZ2 19862010 20q13.31 MCV Japanese 1.37E-08 RBM38 20139978 22q11.21 MCV Japanese 1.04E-08 HIC2-UBE2L3 20139978 22q11.21-q11.22 MCV Caucasian LOD 1.7 16950815 22q11.23-q13.2 RBC Caucasian LOD 2.2 17903294 22q12 –q13 MCV Caucasian 4.30E-10 FBXO7 19820697 22q12.1-q12.3 MCV Caucasian LOD 2.16 16950815 22q12.3 HB Australian 5.30E-07 TMPRSS6, also 19820699 associated with iron status (mechanism) 22q12.3 Hb Japanese 1.64E-10 TMPRSS6 20139978 22q12.3 HB Caucasian p=3.25E-25 TMPRSS6 19862010 22q12.3 HB Caucasian 1.60E-13 TMPRSS6, V736A 19820698 substitution in serine protease domain reported w/ dose effect 22q12.3 HCT Caucasian p=1.846E-13 TMPRSS6 19862010 22q12.3 MCH Japanese 4.84E-25 TMPRSS6 20139978 22q12.3 MCH Caucasian p=8.77E-34 TMPRSS6 19862010 22q12.3 MCH Emergency 1.10E-12 TMPRSS6 20927387 department 22q12.3 MCH Emergency 3.73E-08 TMPRSS6 20927387 department 22q12.3 MCH Emergency 2.41E-11 TMPRSS6 20927387 department 22q12.3 MCH Caucasian 9.50E-10 TMPRSS6 19820697 22q12.3 MCHC Japanese 7.73E-14 TMPRSS6 20139978 22q12.3 MCHC Emergency 2.40E-11 TMPRSS6 20927387 department 22q12.3 MCHC Emergency 1.13E-12 TMPRSS6 20927387 department

227

22q12.3 MCV Australian 1.10E-10 TMPRSS6, also 19820699 associated with iron status (mechanism) 22q12.3 MCV Japanese 1.22E-15 TMPRSS6 20139978 22q12.3 MCV Caucasian p=2.772E-41 TMPRSS6 19862010 22q12.3 MCV Emergency 5.41E-09 TMPRSS6 20927387 department 22q13.33 MCH Japanese 4.37E-08 NCAPH2-SCO2- 20139978 TYMP-KLHDC7B 22q13.33 MCV Caucasian p=1.033E-15 ECGF1 19862010 22q13.33 MCV Caucasian 1.00E-06 ECGF1 21153663 Xp21.3-p11.4 MCV Caucasian LOD 2.3 17903294

Xq25 HB African American 1.20E-15 G6PD 21153663 Xq25 HCT African American 1.20E-13 G6PD 21153663 Xq25 MCV African American 3.00E-18 G6PD 21153663 Xq25 RBC African American 4.30E-21 G6PD 21153663

228

APPENDIX C

Marshfield Map Screening Set 16 Weber Lab Screening Set 16

MARKER LETTER DESIGNATIONS A = Amplify 2 polymorphic sites. N = Minor change in primer sequence with no change in primer position or allele size. Z = Primer moved from its initial position; the allele size changed. Y = Primer moved twice from its initial position; the allele size changed twice. P = A 7-base pigtail was added to the reverse primer. The allele size has changed. M = An "Adenine" was added to the reverse primer. The allele size has changed by one base. L = The reverse primer was shifted to a position which would accommodate a "Thymine" on the 5' of the primer. The allele size has changed. Q = A 4-base pigtail was added to the reverse primer. The allele size has changed.

229

Genetic Map Ends Allele Size CEPH CEPH Chr. Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 1 GTTTT002P 1 0.20 0 0 0.40 117-184 150,160 150,155 Tetra 1 D1S1612 GGAA3A07M 2 7.83 16.2 16.2 0.75 139-201 181,161 185,181 Tetra 1 D1S1597 GATA27E01 3 13.15 29.9 13.7 0.71 135-187 171,163 175,171 Tetra 1 D1S3669 GATA29A05P 4 17.16 37.1 7.2 0.75 174-226 194,190 210,210 Tetra 1 ATA47D07 5 20.38 46.6 9.5 0.71 115-151 136,127 136,133 Tri 1 300wb9 6 23.01 52.7 6.1 0.75 123-151 230,226 141,133 Di 1 ATA79C10 7 34.06 62.7 10 0.80 223-265 250,247 253,244 Tri 1 D1S3721 GATA129H04 8 41.14 72.6 9.9 0.88 192-268 232,212 228,216 Tetra 1 D1S2134 GATA72H07 9 47.65 75.7 3.1 0.81 245-305 265,265 293,257 Tetra 1 D1S1596 GATA26G09P 10 59.10 89.5 13.8 0.75 84-140 124,120 124,116 Tetra 1 GATA152F05L 11 73.55 102 12.5 0.75 219-255 239,239 235,231 Tetra 1 D1S1728 GATA109Z 12 81.42 109 7 0.71 276-316 292,292 296,292 Tetra 1 D1S551 GATA6A05 13 82.32 113.7 4.7 0.72 154-194 178,174 178,186 Tetra 1 GATA124C08N 14 98.15 129.4 15.7 0.75 127-172 155,151 155,147 Tetra 1 GATA133A08Q 15 105.64 137.6 8.2 0.75 102-167 130,122 126,122 Tetra 1 D1S1627 ATA25E07M 16 106.32 139 1.4 0.75 87-126 117,114 114,105 Tri 1 D1S534 GATA12A07N 17 119.02 151.9 12.9 0.75 184-228 210,208 206,206 Tetra 1 D1S1653 GATA43A04 18 155.15 164.1 12.2 0.68 88-130 108,100 104,18 Tetra 1 D1S1679 GGAA5F09 19 159.55 170.8 6.7 0.85 132-184 164,148 168,148 Tetra 1 D1S1677 GGAA22G10N 20 160.75 175.6 4.8 0.75 172-224 204,196 200,196 Tetra 1 TATC028 21 165.58 184.6 9 0.75 243-287 271,255 275,255 Tetra 1 D1S1589 ATA4E02 22 171.50 192.1 7.5 0.74 187-229 205,205 214,205 Tri 1 D1S518 GATA7C01 23 184.79 202.2 10.1 0.84 187-227 211,201 215,199 Tetra 1 D1S1660 GATA48B01 24 195.90 212.4 10.2 0.78 218-262 238,238 238,234 Tetra 1 GATA124F08 25 206.05 226 13.6 0.66 213-253 237,237 237,229 Tetra 1 D1S549 GATA4H09 26 216.70 239.7 13.7 0.76 153-205 193,189 177,173 Tetra 1 D1S3462 ATA29C07L 27 228.99 247.2 7.5 0.75 237-276 264,255 255,255 Tri 1 ATA009 28 231.70 248.8 1.6 0.74 255-300 282-282 282-282 Tri 230 1 D1S1594 GATA22D12 29 237.82 265.5 16.7 0.60 96-144 128, 116 124,120 Tetra

1 D1S1609 GATA50F11 30 241.01 274.5 9 0.79 152-224 188,188 188,180 Tetra Genetic Map Ends 289.7 15.2 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 2 SraP 1 1.40 0 0 0.75 104-148 124,124 124,124 Tetra 2 130yg9P 2 4.38 10.86 10.86 0.75 230-264 254,250 254,244 Di 2 D2S2952 GATA116B01N 3 8.10 17.9 7.04 0.75 164-221 205,181 189,189 Tetra 2 D2S1400 GGAA20G10M 4 11.63 27.6 9.7 0.75 92-165 120,116 120,120 Tetra 2 D2S1360 GATA11H10 5 17.48 38.3 10.7 0.83 112-188 144,140 176,160 Tetra 2 GATA8F07 6 29.45 48 9.7 0.70 229-308 253,245 249,249 Tetra 2 D2S1788 GATA86E02P 7 36.23 55.5 7.5 0.75 158-222 198,174 198,170 Tetra 2 ATA47C04P 8 44.11 67.6 12.1 0.75 85-121 106,103 103,97 Tri 2 D2S1352 ATA27D04P 9 50.81 73.6 6 0.75 108-147 123,120 135,120 Tri 2 GATA66D01ZP 10 67.03 85.5 11.9 0.81 256-328 284,300 292,296 Tetra 2 D2S1394 GATA69E12M 11 73.11 90.8 5.3 0.75 147-187 163,163 171,167 Tetra 2 D2S1790 GATA88G05 12 85.05 103 12.2 0.80 280-340 296,292 300,296 Tetra 2 D2S2972 GATA176C01 13 102.19 114.4 11.4 0.74 200-256 232,228 236,228 Tetra 2 D2S410 GATA4E11 14 116.34 125.2 10.8 0.81 146-190 178,164 168,166 Tetra 2 D2S1328 GATA27A12 15 126.28 132.6 7.4 0.74 142-166 162,154 162,158 Tetra 2 D2S1334 GATA4D07 16 136.67 145.1 12.5 0.79 246-322 282,282 286,286 Tetra 2 D2S1399 GGAA20G04 17 148.42 152 6.9 0.83 129-181 153,141 161,141 Tetra 2 D2S1353 ATA27H09 18 159.76 164.5 12.5 0.81 126-168 156,147 153,150 Tri 2 D2S1776 GATA71D01 19 169.85 173 8.5 0.76 280-320 304,296 300,300 Tetra 2 D2S1391 GATA65C03M 20 185.20 186.2 13.2 0.75 102-154 130,126 126,122 Tetra 2 D2S1384 GATA52A04M 21 205.43 200.4 14.2 0.75 130-174 154,150 150,146 Tetra 2 D2S2944 GATA30E06P 22 214.78 210.4 10 0.75 115-143 127,123 135,127 Tetra 2 D2S434 GATA4G12 23 218.78 215.8 5.4 0.77 246-294 274,270 274,266 Tetra 2 D2S1363 GATA23D03ZP 24 227.23 227 11.2 0.75 271-323 291,299 291,295 Tetra 2 D2S427 GATA12H10 25 232.41 236.7 9.7 0.74 211-267 259,251 259,255 Tetra 2 GATA23A02 26 236.31 248 11.3 0.71 123-163 143,131 147,143 Tetra 2 D2S2968 GATA178G09M 27 238.37 251.9 3.9 0.75 172-196 184,172 172,172 Tetra 231 2 AGAT021 28 242.67 266.2 14.3 0.75 190-242 230,222 230,226 Tetra

Genetic Map Ends 269.1 2.9 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 3 D3S2387 GATA22G12 1 1.01 5.5 5.5 0.86 157-225 201,197 197,193 Tetra MFD433- 3 AGAT010 2 2.69 9.5 4 0.75 264-308 292,284 288,284 Tetra 3 GATA131D09 3 4.29 17.7 8.2 0.75 114-179 163,159 167,159 Tetra 3 295yc9P 4 9.34 29.92 12.22 0.75 158-188 184,168 182,176 Di 3 079yg5ZP 5 11.49 36.1 6.18 0.79 102-143 117,119 119,113 Di 3 D3S3038 GATA73D01 6 21.92 44.8 8.7 0.75 175-231 199,191 203,191 Tetra 3 D3S2432 GATA27C08P 7 32.14 58 13.2 0.75 117-182 153,149 153,153 Tetra 3 D3S1768 GATA8B05M 8 34.60 62 4 0.75 175-219 195,195 195,191 Tetra 3 D3S2409 ATA10H11 9 49.38 71 9 0.69 100-139 124,124 124,115 Tri 3 D3S1766 GATA6F06 10 58.94 78.6 7.6 0.71 196-244 220,216 224,220 Tetra 3 AGAT128 11 62.46 84.7 6.1 0.75 258-286 274,274 274,274 Tetra 3 AAC023 12 67.95 94.2 9.5 0.75 177-231 207,204 219,219 Tri 3 D3S3039 GATA7F05 13 73.76 104 9.8 0.71 276-316 300,296 300,296 Tetra 3 D3S4529 GATA128C02M 14 85.77 112 8 0.75 140-176 164,152 164,152 Tetra 3 D3S3045 GATA84B12 15 108.31 124.2 12.2 0.83 158-212 184,176 184,176 Tetra 3 D3S2460 GATA68F07 16 118.72 135 10.8 0.74 139-179 163,159 155,151 Tetra 3 D3S4523 ATA34G06 17 122.21 138 3 0.72 213-261 243,234 240,234 Tri 3 D3S1764 GATA4A10 18 140.51 152.6 14.6 0.80 213-269 253,249 241,233 Tetra 3 D3S1744 GATA3C02ZP 19 148.41 161 8.4 0.75 149-201 169,177 177,185 Tetra 3 AAT071 20 151.48 164.6 3.6 0.75 168-207 186,186 192,189 Tri 3 D3S1763 GATA3H01 21 168.56 176.5 11.9 0.79 240-292 272,272 268,268 Tetra 3 D3S2427 GATA22F11NZ 22 177.11 188 11.5 0.75 141-211 181,177 176,161 Di 3 TTTA040 23 188.03 201.1 13.1 0.75 140-188 164,160 176,164 Tetra 3 D3S2398 GATA6G12 24 190.74 209.4 8.3 0.75 266-298 274,274 286,278 Tetra 3 D3S2418 ATA22E01 25 193.64 216 6.6 0.67 84-126 111,96 111,105 Tri Genetic Map Ends 228.1 12.1 Allele Size CEPH CEPH Repeat

232 Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map 0 0

Begins 4 D4S3360 4PTEL04 1 0.11 0 0 0.73 164-192 180,174 180,174 Di 4 D4S2366 GATA22G05M 2 6.55 12.9 12.9 0.75 113-149 125,121 133,125 Tetra 4 ATT015 3 13.68 25.9 13 0.75 271-310 280,277 295,292 Tri 4 GATA70E01 4 20.86 35 9.1 0.82 150-238 202,170 198,174 Tetra 4 D4S2397 ATA27C07P 5 27.01 43 8 0.75 130-160 148,148 151,145 Tri 4 D4S2632 GATA72G09Z 6 35.60 51 8 0.75 93-177 97-141 121,129 Tetra 4 ATA21F01 7 46.84 62.2 11.2 0.00 244-274 259,256 259,256 Tri 4 D4S3248 GATA28F03 8 60.02 72.5 10.3 0.71 224-269 245,241 245,245 Tetra 4 D4S2367 GATA24H01N 9 68.29 78 5.5 0.75 127-147 143,135 143,135 Tetra 4 D4S3243 GATA10G07 10 81.39 88 10 0.70 150-197 170,166 162,162 Tetra 4 D4S2361 ATA2A03 11 85.46 93.5 5.5 0.73 137-173 158,155 155,155 Tri 4 D4S1647 GATA2F11 12 99.89 104.9 11.4 0.77 124-164 140,132 132,132 Tetra 4 D4S2623 GATA62A12Z 13 111.35 114 9.1 0.73 224-288 240,260 240,244 Tetra 4 TAGA006 14 119.48 124.6 10.6 0.75 212-256 236,232 240,232 Tetra 4 D4S2394 ATA26B08 15 130.76 130 5.4 0.80 232-268 256,238 256,241 Tri 4 D4S1644 GATA11E09 16 142.33 143.3 13.3 0.74 162-222 202,194 198,194 Tetra 4 D4S1625 GATA107 17 144.08 146 2.7 0.76 162-218 202,198 198,198 Tetra 4 D4S1629 GATA8A05 18 158.91 158 12 0.72 129-165 141,141 153,153 Tetra 4 D4S2368 GATA27G03 19 169.41 167.6 9.6 0.74 284-336 320,320 312,304 Tetra 4 D4S2431 GGAA19H07 20 175.52 176.2 8.6 0.83 208-278 250,242 250,242 Tetra 4 D4S2417 GATA42H02P 21 180.86 181.9 5.7 0.75 246-286 275,263 275,271 Tetra 4 165zf8ZP 22 185.61 195.1 13.16 0.75 176-204 192,190 194,190 Di 4 D4S1652 GATA5B02M 23 190.78 208 12.94 0.75 123-159 151,143 139,139 Tetra Genetic Map Ends 211.7 3.7 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 5 D5S2488 ATA20G07M 1 0.18 0 0 0.75 248-290 281,272 272,263 Tri 5 D5S2849 GATA145D10N 2 3.48 8 8 0.75 182-230 210,202 202,198 Tetra 5 D5S2505 GATA84E11 3 5.87 14.3 6.3 0.80 253-321 289,273 293,289 Tetra 5 D5S817 GATA3E10 4 11.64 23 8.7 0.69 244-284 268,264 268,268 Tetra 233 5 D5S2845 GATA134B03 5 22.44 36 13 0.67 135-175 163,151 163,159 Tetra

5 D5S1470 GATA7C06M 6 32.54 45 9 0.75 158-222 186,182 198,182 Tetra 5 D5S1457 GATA21D04 7 41.08 59 14 0.74 92-132 97,97 115,97 Tetra 5 MFD601 8 58.18 68.6 9.6 155-180 Di 5 D5S2500 GATA67D03 9 58.71 69 0.4 0.75 133-185 169,149 165,157 Tetra 5 GATA138B05ZP 10 71.49 77 8 0.75 107-157 Tetra 5 D5S1501 GATA52A12 11 78.55 85 8 0.81 78-126 102,102 180,102 Tetra 5 GATA89G08Z 12 89.25 98 13 0.75 141-201 177,173 169,169 Tetra 5 D5S1462 GATA3H06M 13 96.45 105 7 0.75 168-212 194,184 188,180 Tetra 5 D5S2501 GATA68A03 14 110.11 117 12 0.72 302-346 322,318 330,330 Tetra 5 D5S1505 GATA62A04 15 119.18 130 13 0.82 199-287 263,255 259,255 Tetra 5 D5S816 GATA2H09 16 135.38 139 9 0.83 221-261 253,249 249,225 Tetra 5 D5S1480 ATA23A10M 17 144.17 147 8 0.75 207-252 234,219 231,231 Tri 5 D5S820 GATA6E05 18 156.10 160 13 0.78 170-230 206,202 202,198 Tetra 5 D5S1456 GATA11A11P 19 169.01 175 15 0.75 186-233 214,202 206,198 Tetra 5 AAT013 20 173.31 183.9 8.9 0.75 253-292 274,271 280,274 Tri 5 AAAT072 21 179.99 196.6 12.7 0.75 189-217 205,197 205,201 Tetra Genetic Map Ends 197.5 0.9 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 6 D6S942 UT6540 1 0.77 0 0 0.66 192-240 216,212 220,204 Tetra 6 TTA032Z 2 7.64 13.2 13.2 0.75 168-204 192,192 195,192 Tri 6 ATTT030 3 9.54 17.9 4.7 0.75 108-144 132,120 132,124 Tetra 6 ATA50C05ZP 4 14.05 25.1 7.2 0.67 84-161 108,111 117,120 Tri 6 D6S1959 GATA29A01 5 20.02 34 8.9 0.65 170-206 Tetra 6 D6S2439 GATA163B10N 6 24.41 42 8 0.75 214-278 246,238 234,234 Tetra 6 ATA12D05P 7 28.75 44.4 2.4 0.75 116-158 137,134 143,134 Tri 6 D6S1051 GATA61E03 8 36.68 51 6.6 0.75 211-251 239,235 235,231 Tetra 6 D6S1017 GGAT3H10M 9 41.72 63 12 0.75 152-184 172,160 160,152 Tetra 6 D6S2410 GATA11E02N 10 50.69 73 10 0.75 220-264 252,248 248,248 Tetra 6 D6S1053 GATA64D02 11 64.59 80 7 0.81 285-329 297,297 317,309 Tetra 6 D6S1031 ATA28B11 12 77.46 89 9 0.85 236-278 260,260 260,251 Tri 234 6 D6S1056 GATA68H04 13 94.09 103 14 0.85 229-277 253,241 265,245 Tetra

6 D6S1021 ATA11D10Z 14 104.72 112 9 0.75 121-163 145,142 148,148 Tri 6 D6S474 GATA31 15 112.92 119 7 0.77 139-179 167,167 167,151 Tetra 6 D6S1040 GATA23F08 16 130.97 129 10 0.76 253-297 281,277 281,273 Tetra 6 D6S1009 GATA32B03 17 137.28 138 9 0.79 221-277 257,237 249,241 Tetra 6 GATA184A08 18 147.95 146 8 0.77 149-209 185,173 173,169 Tetra 6 D6S2436 GATA165G02M 19 154.17 155 9 0.75 122-158 146,122 130,122 Tetra 6 ATA6C09P 20 159.94 164.8 9.8 0.75 126-162 147,141 147,144 Tri 6 D6S1277 GATA81B01 21 164.21 173 8.2 0.73 274-322 306,290 294,294 Tetra 6 D6S1027 ATA22G07P 22 168.97 187 14 0.75 118-152 145,139 145,124 Tri Genetic Map Ends 193.1 6.1 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 7 D7S2477 035xb9ZP 1 0.26 0 0 0.80 192-258 Di 7 D7S1819 GATA24F03ZP 2 4.24 7 7 0.73 149-193 169,173 165,177 Tetra 7 GATA119B03 3 9.22 17 10 0.69 148-192 176,168 172,168 Tetra 7 D7S3051 GATA137H02N 4 18.03 29 12 0.75 134-194 166,166 162,162 Tetra 7 D7S1802 GATA41G07M 5 20.45 33 4 0.75 162-210 190,182 190,178 Tetra 7 D7S1808 GGAA3F06 6 27.78 42 9 0.79 240-300 272,260 268,252 Tetra 7 D7S817 GATA13G11ZP 7 31.88 50 8 0.78 139-189 169,169 169,161 Tetra 7 D7S2846 GATA31A10 8 37.88 58 8 0.76 156-204 184,172 184,180 Tetra 7 D7S1818 GATA24D12P 9 49.13 70 12 0.75 170-214 198,194 198,190 Tetra 7 D7S3046 GATA118G10 10 67.96 79 9 0.84 290-378 338,326 342,326 Tetra 7 D7S2204 GATA73D10L 11 77.74 91 12 0.75 197-273 261,225 265,229 Tetra 7 D7S820 GATA3F01 12 83.40 98 7 0.83 196-240 224,216 220,216 Tetra 7 D7S821 GATA5D08 13 95.67 109 11 0.83 234-282 258,258 258,238 Tetra 7 D7S1799 GATA23F05 14 103.76 114 5 0.73 155-211 183,179 179,175 Tetra 7 D7S3061 GGAA6D03N 15 122.84 128 14 0.75 110-166 146,134 134,114 Tetra 7 D7S1804 GATA43C11 16 131.70 137 9 0.87 228-306 286,266 266,262 Tetra 7 GATA63F08P 17 139.19 149.9 12.9 0.75 143-188 160,156 172,168 Tetra 7 GATA104 18 143.12 155 5.1 0.71 177-224 205,201 201,201 Tetra 7 D7S3070 GATA189C06M 19 150.96 163 8 0.75 165-229 209,201 197,197 Tetra 235 7 D7S3058 GATA30D09N 20 153.96 174 11 0.75 195-247 231,227 235,215 Tetra

MFD442- 7 GTTT002 21 155.41 178.6 4.6 0.75 263-299 279,279 279,279 Tetra Genetic Map Ends 182 3.4 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 8 ATT023 1 4.13 4.3 4.3 0.75 176-223 191,188 188,188 Tri 8 TTCA004P 2 9.16 16 11.7 0.75 103-147 127,123 135,127 Tetra 8 ATT070Z 3 14.93 30 14 0.75 193-227 215,215 218,215 Tri 8 ATAA018P 4 21.26 43.4 13.4 0.75 101-137 125,113 121,113 Tetra 8 D8S1048 UT7129L 5 26.83 54 10.6 0.75 177-221 205,201 205,205 Tetra 8 D8S1477 GGAA20C10Z 6 32.12 60 6 0.75 239-299 251,159 267,271 Tetra 8 D8S1110 GATA8G10M 7 53.23 67 7 0.75 251-307 279,263 279,275 Tetra 8 D8S1113 GGAA8G07 8 59.76 78 11 0.81 209-265 241,217 233,217 Tetra 8 D8S1136 GATA41A01 9 66.12 82 4 0.73 225-273 249,249 257,253 Tetra 8 D8S2324 GATA14E09 10 74.30 94 12 0.75 180-224 212,204 208,204 Tetra 8 GATA8B01 11 91.98 104 10 0.74 214-266 234,234 246,234 Tetra 8 GAAT1A4N 12 99.15 110 6 0.75 128-168 152,140 156,140 Tetra 8 D8S1132 GATA26E03M 13 107.29 119 9 0.75 119-179 160,156 144,140 Tetra 8 D8S592 GATA6B02P 14 118.41 125 6 0.75 157-169 165,161 161,157 Tetra 8 D8S1128 GATA21C12 15 128.55 140 15 0.74 240-268 256,248 256,256 Tetra 8 D8S1108 GATA50D10 16 136.26 154 14 0.64 240-284 264,256 264,260 Tetra 8 D8S373 UT721M 17 144.33 164 10 0.75 182-234 206,206 210,202 Tetra Genetic Map Ends 167.9 3.9 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 9 MFD455-AAT052 1 0.43 0 0 0.75 281-311 302,296 308,302 Tri 9 AAAAC001 2 4.17 9.8 9.8 0.75 165-205 180,180 190,170 Penta 9 D9S2169 GATA62F03M 3 5.19 14 4.2 0.75 260-308 292,288 292,288 Tetra 9 GATA187D09N 4 10.22 21.9 7.9 0.75 168-216 188,184 196,192 Tetra 9 AGAT142P 5 17.26 33.9 12 0.75 176-220 200,200 196,196 Tetra 236 9 D9S1121 GATA87E02N 6 25.39 44 10.1 0.75 160-232 208,184 208,208, Tetra

9 GATA5E06P 7 32.31 58.3 14.3 0.75 131-183 158,142 142,142 Tetra 9 D9S301 GATA7D12 8 69.26 66 7.7 0.81 193-249 237,209 229,213 Tetra 9 D9S1122 GATA89A11 9 75.15 76 10 0.71 178-218 202,190 198,190 Tetra 9 D9S922 GATA21F05 10 78.43 80 4 0.78 239-276 263,251 263,263 Tetra 9 D9S1120 GATA81C04M 11 83.55 89 9 0.75 271-312 308,304 304,300 Tetra 9 D9S910 ATA18A07M 12 97.00 104 15 0.75 94-139 127,106 127,109 Tri 9 GATA27Z 13 104.07 112.8 8.8 0.75 185-229 213,205 209,209 Tetra 9 D9S930 GATA48D07 14 110.62 120 7.2 0.78 270-314 298,290 290,290 Tetra 9 D9S934 GATA64G07 15 116.47 128 8 0.76 198-242 222,222 226,218 Tetra 9 D9S2157 ATA59H06Z 16 131.31 147 19 0.85 139-178 163,142 163,149 Tri 9 NEW TTTTA002 17 133.86 156.3 9.3 0.75 114-179 154,154 154,149 Penta Genetic Map Ends 169 12.7 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 10 D10S1435 GATA88F09 1 2.20 4 4 0.75 240-292 268,264 268,268 Tetra 10 ATCC001 2 6.40 16.3 12.3 0.75 296-332 316,312 320,316 Tetra 10 D10S1412 ATA31G11P 3 9.27 28 11.7 0.75 155-185 167,158 164,158 Tri 10 D10S1430 GATA84C01ZP 4 12.74 33 5 0.68 171-219 187,187 191,187 Tetra 10 D10S1423 GATA70E11 5 19.44 46 13 0.77 202-250 230,226 234,226 Tetra 10 D10S1426 GATA73E11 6 30.50 59 13 0.73 140-184 168,164 160,160 Tetra 10 D10S1208 ATA5A04N 7 35.26 63 4 0.75 164-212 197,194 194,179 Tri 10 D10S1221 ATA21A03Z 8 56.87 76 13 0.78 104-143 122,143 137,141 Tri 10 D10S1225 ATA24F10 9 64.10 81 5 0.76 166-202 190,187 190,184 Tri 10 GATA121A08N 10 69.86 88 7 0.75 164-216 192,184 192,188 Tetra 10 D10S1432 GATA87G01 11 74.00 94 6 0.73 145-193 173,169 177,173 Tetra 10 D10S2327 GGAT1A4 12 80.06 101 7 0.62 192-232 223,204 208,204 Tetra 10 D10S2470 GATA115E01N 13 92.03 113 12 0.75 231-283 259,243 267,259 Tetra 10 D10S677 GGAA2F11N 14 95.63 117 4 0.75 181-237 213,205 221,201 Tetra 10 D10S1239 GATA64A09 15 102.86 125 8 0.76 148-196 184,172 180,180 Tetra 10 GATA71C09 16 119.00 137 12 0.54 156-196 180,168 180,180 Tetra 10 D10S1230 ATA29C03 17 122.41 143 6 0.73 108-144 129,126 132,123 Tri 237 10 D10S1222 ATA22D02 18 128.74 156 13 0.73 215-254 236,236 239,224 Tri

10 D10S1248 GGAA23C05N 19 130.57 165 9 0.75 217-273 257,245 253,245 Tetra 10 ATGT006Z 20 131.99 167.1 2.1 0.75 156-196 176,176 172,168 Tetra Genetic Map Ends 173.1 6 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 11 D11S1984 GGAA17G05P 1 1.53 2 2 0.75 169-223 209,201 197,189 Tetra 11 D11S2362 ATA33B03Z 2 4.88 9 7 0.81 85-130 106,103 Tri 11 D11S1999 GATA23F06L 3 10.68 17 8 0.75 100-148 128,116 120,112 Tetra 11 D11S1981 GATA48E02 4 17.05 21 4 0.81 130-190 174,170 158,150 Tetra 11 ATA34E08N 5 25.90 33 12 0.75 144-184 165,162 165,162 Tri 11 D11S1392 GATA6B09P 6 34.60 43 10 0.75 191-235 215,215 215,211 Tetra 11 D11S1993 ATA1B07 7 43.57 54 11 0.77 215-254 230,230 245,224 Tri 11 D11S2365 GATA63F09 8 55.98 58 4 0.60 222-262 250,242 246,246 Tetra 11 D11S4459 ATA9B04N 9 56.29 58.1 0.1 0.75 237-267 246,246 258,246 Tri 11 D11S2006 GATA46A12 10 59.50 59 0.9 0.69 199-339 323,319 323,319 Tetra 11 D11S2371 GATA90D07N 11 73.23 76 17 0.75 181-221 201,201 201,197 Tetra 11 D11S2002 GATA30G01 12 79.69 85 9 0.80 216-264 244,224 240,232 Tetra 11 D11S2000 GATA28D01M 13 105.10 101 16 0.75 189-245 220,210 222,216 Tetra 11 NEW GATA71E06 14 107.06 103 2 0.74 165-200 187,175 191,187 Tetra 11 D11S1998 GATA23E06L 15 117.24 113 10 0.75 128-172 156,148 160,152 Tetra 11 D11S4464 GATA64D03 16 123.16 123 10 0.75 213-261 241,237 241,241 Tetra 11 D11S4463 GATA117D01N 17 130.41 136.5 13.5 0.75 107-147 131,123 127,127 Tetra 11 ATA27C11ZP 18 133.70 147.5 11 0.75 112-146 121,134 134,137 Tri Genetic Map Ends 147.8 0.3 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 12 D12S372 GATA4H03 1 3.46 6 6 0.74 156-206 186,182 190,182 Tetra 12 GATA49D12N 2 8.05 18 12 0.75 161-249 197,193 205,189 Tetra 12 AAC040Z 3 13.07 27 9 0.75 280-325 Tri

238 12 D12S373 GATA6C01 4 16.91 36 9 0.72 192-232 220,212 220,212 Tetra 12 D12S1042 ATA27A06P 5 27.54 49 13 0.75 119-154 128,125 143,131 Tri

12 GATA91H06M 6 42.35 56 7 0.75 97-129 101,101 121,101 Tetra 12 D12S297 UT5029 7 50.90 65 9 0.85 201-285 245,209 241,233 Tetra 12 D12S1294 GATA73H09N 8 66.22 78 13 0.75 152-216 188,176 196,184 Tetra 12 D12S1052 GATA26D02M 9 73.90 83 5 0.75 146-166 158,154 162,158 Tetra 12 D12S1064 GATA63D12P 10 89.33 95 12 0.75 168-220 200,200 200,188 Tetra 12 D12S1300 GATA85A04M 11 97.00 104 9 0.75 104-148 124,120 128,120 Tetra 12 PAH PAH 12 101.79 109 5 0.79 213-265 237,237 249,237 Tetra 12 D12S2070 ATA25F09M 13 114.49 125 16 0.75 81-111 99,87 90,90 Tri 12 D12S2082 GGAA22C05 14 116.80 130.9 5.9 0.65 286-306 294-290 294-290 TETRA 12 D12S395 GATA4H01 15 118.60 136.8 5.9 0.78 211-263 239,223 239,223 Tetra 12 D12S2078 GATA32F05 16 126.31 150 13.2 0.78 246-298 274,274 282,270 Tetra 12 D12S1045 ATA29A06P 17 128.75 161 11 0.75 77-113 101,98 104,83 Tri Genetic Map Ends 168.8 7.8 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 13 D13S787 GATA23C03P 1 22.18 9 9 0.75 241-282 266,266 270,262 Tetra 13 ATA5A09N 2 28.87 20 11 0.75 138-171 156,150 159,156 Tri 13 D13S1493 GGAA29H03N 3 31.81 26 6 0.75 207-259 239,227 231,231 Tetra 13 D13S894 GATA86H01 4 36.54 33 7 0.69 172-216 192,192 192,192 Tetra 13 D13S1807 GATA11C08P 5 52.90 47 14 0.75 185-229 217,217 213,205 Tetra 13 D13S800 GATA64F08 6 71.67 55 8 0.74 283-331 303,299 307,299 Tetra 13 D13S317 GATA7G10 7 80.52 64 9 0.78 167-207 199,195 191,179 Tetra 13 D13S793 GATA43H03N 8 95.65 76 12 0.75 245-281 265,265 261,261 Tetra 13 D13S779 ATA26D07 9 99.20 83 7 0.70 168-204 195,189 195,192 Tri 13 D13S796 GATA51B02ZP 10 105.59 94 11 0.75 174-234 198,206 202,212 Tetra 13 D13S895 GGAA22G01ZP 11 107.29 99 5 0.66 118,170 142,146 150,154 Tetra 13 AGAT113Z 12 109.99 107.9 8.9 0.75 156-196 180,176 188,184 Tetra Genetic Map Ends 115 7.1 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map

239 Begins 0 0 14 D14S742 GATA74E02Z 1 20.19 12 12 0.75 344-388 372,372 376,372 Tetra

14 ATA77F05Z 2 23.74 23.2 11.2 0.75 105-153 126,129 126,132 Tri 14 D14S608 GATA43H01M 3 26.84 28 4.8 0.75 185-237 201,201 217,201 Tetra 14 D14S599 ATA29G03Z 4 32.64 41 13 0.75 96-132 116,107 122,113 Tri 14 GATA90G11M 5 46.07 51.1 10.1 0.75 221-261 249,245 245,241 Tetra 14 GGAA30H04ZP 6 55.28 60 8.9 0.75 235-295 259,275 267,279 Tetra 14 D14S592 ATA19H08 7 59.39 67 7 0.75 207-246 234,231 237,234 Tri 14 D14S588 GGAA4A12 8 68.21 76 9 0.75 101-150 133,133 125,125 Tetra 14 GATA169E06ZP 9 73.60 84.7 8.7 0.75 114-146 130,130 134,130 Tetra 14 GATA193A07 10 87.45 96 11.3 0.75 331-391 375,347 371,367 Tetra 14 D14S617 GGAA21G11L 11 90.19 106 10 0.75 125-183 167,167 147,143 Tetra 14 D14S1434 GATA168F06 12 93.30 113 7 0.75 209-233 228,224 224,224 Tetra 14 ATGG002 13 97.21 119 6 0.75 208-258 228,224 228,220 Tetra 14 ATT198Z 14 102.32 134.5 15.5 0.75 148-188 167,167 170,167 Tri Genetic Map Ends 138.2 3.7 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 15 D15S1513 GATA143C02 1 23.73 11 11 0.61 225-250 240,240 244,236 Tetra 15 D15S822 GATA88H02N 2 24.95 12 1 0.75 238-314 294,286 290,267 Tetra 15 GTTTT001Z 3 31.21 25.8 13.8 0.75 262-307 277,277 277,287 Penta 15 GATA50C03N 4 34.71 35 9.2 0.75 250-306 286,282 282,278 Tetra 15 D15S659 GATA63A03N 5 44.09 43 8 0.75 174-206 194,182 198,178 Tetra 15 D15S643 GATA50G06 6 57.43 52 9 0.83 185-239 215,197 219,215 Tetra 15 D15S1507 GATA151F03N 7 63.06 60 8 0.75 192-228 208,204 212,208 Tetra 15 D15S818 GATA85D02 8 71.95 72 12 0.69 138-174 158,154 158,150 Tetra 15 204zg5ZP 9 83.71 80.04 8.04 0.75 103-139 117,113 117,125 Di 15 D15S652 ATA24A08 10 90.25 90 9.96 0.73 273-325 303,288 306,303 Tri 15 D15S816 GATA73F01M 11 92.75 101 11 0.75 113-157 137,133 137,133 Tetra 15 GATA197B10P 12 96.06 109 8 0.75 199-243 215,215 215,215 Tetra 15 D15S642 GATA27A03 13 100.07 122 13 0.84 200-218 210,200 214,210 Tetra Genetic Map Ends 122.2 0.2 Allele Size CEPH CEPH Repeat 240 Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map 0 0

Begins 16 TTTA028 1 0.61 2.3 2.3 0.75 175-211 195,191 199,195 Tetra 16 ATA41E04 2 6.27 11 8.7 0.69 106-145 133,127 130,127 Tri 16 D16S748 ATA3A07 3 12.11 23 12 0.81 178-217 202,187 187,187 Tri 16 TTAT023Z 4 19.47 38.4 15.4 0.75 155-223 203,185 203,203 Tetra 16 CATA002z 5 23.72 46 7.6 0.75 232-252 240,236 244,244 Tetra 16 D16S769 GATA71H05 6 26.13 51 5 0.69 246-278 266,258 262,262 Tetra 16 D16S753 GGAA3G05 7 31.31 58 7 0.80 240-296 268,252 272,268 Tetra 16 D16S3396 ATA55A11N 8 50.97 64 6 0.75 124-163 148,142 151,139 Tri 16 D16S3253 GATA22F09P 9 54.57 72 8 0.75 166-214 182,182 194,186 Tetra 16 AAT107Z 10 59.67 81.1 9.1 0.75 253-259 271,271 271,274 Tri 16 D16S2624 GATA81D12M 11 71.51 88 6.9 0.75 121-161 137,137 149,137 Tetra 16 MFD466-TTA001 12 78.89 97.1 9.1 0.75 266-299 290,275 287,278 Tri 16 ATACC001 13 82.03 110.1 13 0.75 280-335 325,315 320,310 Penta 16 D16S539 GATA11C06N 14 86.17 125 14.9 0.75 132-180 164,156 168,164 Tetra 16 D16S2621 GATA71F09 15 87.76 130 5 0.79 223-275 255,255 255,255 Tetra Genetic Map Ends 134.1 4.1 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 17 044xg3 1 0.00 0 116.86 0.77 226-238 232,232 232,230 Di - 17 D17S1308 GTAT1A05 2 0.61 1 115.86 0.64 292-328 304,304 308,304 Tetra 17 GATA158H04 3 6.53 14.7 13.7 0.77 158-202 186,178 186,174 Tetra 17 D17S974 GATA8C04 4 10.72 22 7.3 0.68 189-225 209,205 205,205 Tetra 17 ATA78D02N 5 13.32 32 10 0.75 178-212 193,187 190,190 Tri 17 D17S2196 GATA185H04N 6 17.47 45 13 0.75 127-177 155,139 151,139 Tetra 17 D17S1294 GGAA9D03 7 28.53 51 6 0.71 220-284 252,252 256,256 Tetra 17 D17S1299 GATA25A04 8 39.37 62 11 0.69 176-220 Tetra 17 095tc5ZP 9 50.64 74.99 12.99 0.75 100-140 Di 17 AAT245 10 55.77 79.5 4.51 0.75 148-204 192,188 188,172 Tetra 17 D17S1290 GATA49C09N 11 56.81 82 2.5 0.75 162-230 206,202 196,190 Tetra 241 17 300xa5P 12 62.92 88.06 6.06 0.75 250-286 270,268 262,260 Di

17 D17S1301 GATA28D11 13 73.28 100 11.94 0.65 131-175 159,151 155,151 Tetra 17 TTCA006M 14 76.20 106.1 6.1 0.75 147-183 171,163 167,163 Tetra Genetic Map Ends 126.5 20.4 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 18 GATA178F11z 1 2.11 2.8 2.8 0.75 295-335 315,303 323,319 Tetra 18 ATA45G06 2 3.15 7 4.2 0.82 119-155 146,140 140,137 Tri 18 AGAT060 3 6.30 20.6 13.6 0.75 251-287 271,259 275,259 Tetra 18 D18S843 ACT1A01 4 8.60 28 7.4 0.74 173-197 185,179 188,185 Tri 18 D18S542 GATA11A06 5 11.48 41 13 0.80 168-214 194,194 190,182 Tetra 18 D18S877 GATA64H04 6 24.98 54 13 0.69 105-145 133,129 129,121 Tetra 18 D18S535 GATA13 7 36.40 64 10 0.77 123-167 147,131 155,139 Tetra 18 D18S851 GATA6D09 8 48.36 75 11 0.75 232-284 264,256 268,268 Tetra 18 D18S858 ATA23G05 9 53.05 80 5 0.76 184-217 196,196 208,205 Tri 18 D18S1357 ATA7D07 10 56.97 89 9 0.81 114-151 129,129 135,126 Tri 18 D18S1364 GATA7E12 11 61.55 99 10 0.80 152-204 172,172 172,164 Tetra 18 ATA82B02N 12 67.06 107 8 0.75 160-205 190,175 187,172 Tri 18 D18S1371 GATA177C03N 13 71.24 116 9 0.75 125-165 149,145 149,149 Tetra Genetic Map Ends 126 10 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 19 D19S591 GATA44F10P 1 3.03 9.8 9.8 0.75 87-131 115,107 115,111 Tetra 19 D19S1034 GATA21G05 2 6.06 21 11.2 0.72 216-232 238,230 234,230 Tetra 19 D19S586 GATA23B01N 3 9.67 33 12 0.75 218-258 238,238 238,238 Tetra 19 D19S714 GATA66B04 4 15.59 42 9 0.79 216-264 248,244 240,224 Tetra 19 D19S433 GGAA2A03 5 35.11 52 10 0.79 183-229 209,203 221,215 Tetra 19 Unknown GATA156F11 6 42.56 62 10 0.67 218-242 234,230 234,230 Tetra 19 D19S559 UT7544 7 50.02 68 6 0.77 160-200 184,184 188,188 Tetra 19 D19S246 Mfd232 8 55.65 78 10 0.84 177-261 213,185 229,209 Tetra

242 19 D19S589 GATA29B01L 9 58.50 88 10 0.75 148-192 176,168 172,172 Tetra 19 D19S254 Mfd238 10 62.36 101 13 0.78 106-162 146,118 146,138 Tetra

Genetic Map Ends 105 4 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 20 D20S482 GATA51D03 1 4.50 12 12 0.69 135-179 159,155 167,151 Tetra 20 D20S602 GATA72E11 2 7.32 21.2 9.2 0.63 264-296 288,280 288,280 Tetra 20 D20S1143 GATA129B03N 3 15.29 36 14.8 0.75 152-192 180,176 176,172 Tetra 20 GATA29F06Z 4 22.41 48 12 0.75 216-273 261,257 253,241 Tetra 20 D20S478 GATA42A03 5 37.92 54 6 0.82 223-283 263,263 263,263 Tetra 20 D20S481 GATA47F05 6 44.45 62 8 0.83 201-269 241,237 245,233 Tetra 20 AAT269 7 48.64 71.3 9.3 0.75 220-253 241,238 238,238 Tri 20 D20S480 GATA45B10N 8 52.54 80 8.7 0.75 276-316 304,292 300,292 Tetra 20 D20S451 UT254 9 57.35 90 10 0.84 268-356 320,320 308,296 Tetra 20 D20S164 UT1772 10 57.74 101 11 0.73 179-227 191,191 191,191 Tetra Genetic Map Ends 101.2 0.2 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 21 D21S1437 GGAA3C07 1 20.57 13 13 0.79 103-155 139,131 131,127 Tetra 21 D21S2052 GATA129D11N 2 27.74 25 12 0.75 96-166 137,125 145,137 Tetra 21 D21S1440 ATA27F01 3 38.06 37 12 0.72 145-181 160,157 166,157 Tri 21 D21S2055 GATA188F04 4 40.11 40 3 0.88 105-213 157,153 153,121 Tetra 21 UT1355z 5 43.05 51 11 0.75 264-340 300,300 312,300 21 D21S1446 GATA70B08 6 46.89 57.8 6.8 0.73 187-235 223,209 223,223 Tetra Genetic Map Ends 57.8 0 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 22 GATA198B05N 1 16.03 2 2 0.75 113-177 157,141 153,141 Tetra 22 AGAT120 2 19.67 11.9 9.9 0.75 243-287 279,279 279,279 Tetra 22 ATTT019M 3 23.05 15.8 3.9 0.75 136-174 156,156 160,152 Tetra

243 22 D22S689 GATA21F03 4 27.18 29 13.2 0.77 190-238 222,214 218,214 Tetra 22 D22S683 GATA11B12 5 34.79 36 7 0.89 154-233 184,176 198,184 Tetra

22 D22S1045 ATA37D06 6 35.78 43 7 0.64 125-167 155,152 152,155 Tri 22 D22S532 UT7136 7 44.40 53 10 0.66 149-197 181,177 177,173 Tetra 22 TCTA015M 8 47.76 61.4 8.4 0.75 198-234 218,210 218,206 Tetra Genetic Map Ends 62.3 0.9 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 23 GATA52B03 4.20 4.4 4.4 0.67 245-285 253,253 269,265 Tetra 23 AGAT144 8.12 11.3 6.9 0.75 294-338 322,322 326,306 Tetra 23 DXS9902 GATA175D03 14.69 22 10.7 0.70 158-198 178,178 178,174 Tetra 23 ATA28C05 22.61 37.4 15.4 0.62 270-306 279,279 291,285 Tri 23 DXS9896 GATA124E07 28.70 39.5 2.1 0.75 180-248 204,204 196,192 Tetra 23 GATA027M 33.80 52.5 13 0.68 163-203 183,183 187,183 Tetra 23 DXS6810 GATA69C12 41.95 63.6 11.1 0.67 211-227 223,223 219,215 Tetra 23 GATA144D04 44.06 71.3 7.7 0.75 222-254 242,242 296,288 Tetra 23 DXS7132 GATA72E05M 63.52 83 11.7 0.75 280-304 296,296 296,288 Tetra 23 DXS6800 GATA31D10M 77.44 93 10 0.75 198-222 198,198 206,202 Tetra 23 DXS6789 GATA31F01P 94.22 104 11 0.75 113-169 153,153 157,153 Tetra 23 GATA172D05 111.94 116 12 0.81 102-146 130,130 126,110 Tetra 23 DXS6805 GATA48H04 115.26 120.5 4.5 0.65 308-352 340,336 336,336 Tetra 23 GATA165B12P 119.58 133 12.5 0.75 125-157 141,141 149,145 Tetra 23 ATCT003 127.51 143.5 10.5 0.75 279-323 299,299 299,299 Tetra 23 D3S2390 GATA31E08 138.93 154 10.5 0.78 218-262 246,246 246,242 Tetra 23 TATC043 144.37 165.6 11.6 0.75 242-282 270,270 266,262 Tetra 23 224ZG11 146.30 169 3.4 0.75 96-132 113,113 115,115 Di 23 TTTA062 153.05 181 12 0.75 132-164 152,152 148,148 Tetra Genetic Map Ends 184.3 3.3 Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 24 DXYS218 GATA2A12ZP 0.88 0 0 0.68 99-147 131,131 127,127 Tetra

244 24 DXS9900 GGAT3F08 1.44 4 4 0.75 146-190 178,170 166,158 Tetra 24 DXYS154 sDF1 154.64 184 180 0.77 222-254 244,240 240,238 Di

Genetic Map Ends Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 25 DYS395 GATA81D05 2.78 0 0 0.38 110-138 126,126 0,0 Tetra 25 DYS391 GATA32C10 13.11 0 0 0.53 265-301 289,289 0,0 Tetra 25 DYS389 GATA30F10L 13.62 0 0 0.75 251-259 251,251 0,0 Tetra 25 GATA30F10LA 13.62 0 0 0.75 383-423 403,403 0,0 Tetra 25 GATA31E10 16.22 0 0 0.72 197-233 221,221 0,0 Tetra Genetic Map Ends Allele Size CEPH CEPH Repeat Chromosome Locus Marker Name Mb cM Gap Het Range 133101 133102 Type Genetic Map Begins 0 0 26 MID0448Z 90.32 0 0 0.75 80-110 100,90 90,90 Penta

245

APPENDIX D

VWF c.4120C>T Expression in Cell Culture

Figure D.1: Functional characterization of the c.4120C>T (R1374C) mutation (A) Secretion of recombinant VWF from HEK293 cells transfected with the mutated allele alone, the wild-type allele alone and both alleles co-expressed at different concentrations. (B) Multimer patterns of the secreted VWF show “smeared” configuration when the 4120T mutation is expressed alone (left) and clearly defined bands when co- expressed with wild type. (C) Ristocetin-induced platelet binding to recombinant wild- type VWF (), R1374C-VWF (■), and co-expressed alleles R1374C/WT (○). (D) Botrocetin-induced platelet binding to recombinant R1374C-VWF (■), wild-type VWF (), and co-expressed R1374C/WT VWF (○).

246

APPENDIX E

Chromosome 9q34 ABO-ADAMTS13 Tagging SNPs

SNP Gene log.VWFAg log.VWFRColog.FVIII rs8176749 ABO 0.1293463 0.0201647 0.1239331 rs8176739 ABO 0.4596651 0.5589601 0.9932298 rs2073824 ABO 0.000266 0.001971 0.0032395 rs8176721 ABO 1 1 1 rs512770 ABO 0.011546 0.0124137 0.087933 rs8176707 ABO 0.0013131 0.004541 0.0235884 rs8176704 ABO 0.5423531 0.7414893 0.5254834 rs8176694 ABO 0.0001901 0.0019563 0.0002499 rs657152 ABO 1.82E-07 9.89E-07 8.58E-06 rs474279 ABO 6.65E-06 3.35E-05 0.0002325 rs8176681 ABO 0.7520361 0.8533457 0.774533 rs500498 ABO 0.0005473 0.0013157 0.0036143 rs630014 ABO 2.38E-07 1.75E-06 1.42E-05 rs8176630 Intergenic 0.0632992 0.045705 0.0861703 rs579459 Intergenic 6.29E-06 4.34E-05 2.61E-05 rs7030248 Intergenic 0.9807863 0.9429567 0.4159449 rs558240 Intergenic 0.0005096 0.00285 0.001577 rs7867739 Intergenic 0.7211956 0.463971 0.8908319 rs7025162 Intergenic 0.6696191 0.6353766 0.6238697 rs4962043 Intergenic 0.8012191 0.879608 0.7514891 rs554710 Intergenic 0.0090429 0.0404194 0.0057972 rs17150319 Intergenic 0.995778 0.3666132 0.2418583 rs493014 Intergenic 0.2162935 0.3788538 0.4548992 rs9411508 Intergenic 0.7660441 0.6006146 0.6238885 rs10901263 Intergenic 0.0096928 0.0090574 0.0017128 rs17150482 Intergenic 0.9849042 0.6556539 0.5885834 rs10751505 Intergenic 0.0902177 0.0681751 0.0184148 rs10793964 SURF6 0.0571819 0.0223046 0.0243949 rs563666 SURF6 1 1 1 rs11655 MED22 0.198296 0.2864716 0.1608362 rs621907 MED22 0.738893 0.1767743 0.230251 rs669056 MED22 1 1 1 rs553877 MED22 0.4857907 0.2105227 0.1030687 rs2021976 SURF1 1 1 1 rs4962046 SURF4 0.5363695 0.334225 0.3163036 rs1179037 SURF4 0.5317921 0.9531754 0.4657656 rs28602591 c9orf96 0.6090107 0.5324088 0.2507892

247

rs36091105 c9orf96 0.1599194 0.179287 0.1187587 rs4549850 c9orf96 0.2432379 0.331567 0.1514483 rs7038253 c9orf96 1 1 1 rs3118667 ADAMTS13 0.4176771 0.4501604 0.6319319 rs2073933 ADAMTS13 0.4734933 0.3742429 0.2121071 rs2073932 ADAMTS13 0.5576623 0.491613 0.2321513 rs603551 ADAMTS13 0.8884007 0.6814501 0.5190141 rs652600 ADAMTS13 0.5030662 0.4851808 0.609375 rs4962153 ADAMTS13 0.9816329 0.2478957 0.2755092 rs1055432 ADAMTS13 0.027755 0.0249294 0.1641412

248

APPENDIX F

VWF Tagging SNPs

SNP Base SNP Base SNP Base rs10160866 6014627 rs17491334 6103844 rs980131 6169198 rs10466913 6022495 rs12099542 6103998 rs7139057 6169323 rs7306590 6023686 rs216904 6106018 rs4764482 6169733 rs2362479 6026354 rs216905 6106255 rs12319392 6170486 rs3825331 6030154 rs216800 6107946 rs1800378 6172202 rs7958705 6044881 rs216805 6111476 rs1800387 6182828 rs7398412 6046849 rs216811 6115274 rs3782711 6183181 rs2362481 6049161 rs216812 6117211 rs10849380 6185309 rs12367467 6049568 rs2363329 6119396 rs2109118 6194655 rs7310736 6052265 rs542993 6120431 rs2239147 6195527 rs2270151 6060960 rs9634155 6123777 rs2238101 6198231 rs7961998 6061341 rs216308 6124666 rs2239142 6199078 rs2286646 6061491 rs11613273 6126097 rs12317079 6202462 rs12297370 6062777 rs216309 6126927 rs11836843 6202818 rs3759321 6062878 rs1800385 6127891 rs41388848 6208618 rs12317523 6063142 rs1800384 6127919 rs1579229 6223766 rs10774387 6063429 rs216310 6127943 rs4764537 6225242 rs10849363 6066357 rs216311 6128443 rs4764538 6248983 rs7964554 6067152 rs216313 6129264 rs10849396 6265578 rs12423243 6067240 rs2885752 6132784 rs12320321 6270739 rs12369177 6073520 rs2854871 6133079 rs10849399 6271103 rs11063961 6074016 rs216314 6136580 rs10774401 6272849 rs12829220 6074650 rs11611917 6136634 rs4764478 6078125 rs11612370 6136832 rs12300917 6079058 rs11612384 6136846 rs7301070 6080290 rs216330 6146630 rs917857 6081824 rs11064003 6149706 rs1159993 6083564 rs216340 6149799 rs4764521 6084323 rs11064005 6150066 rs216855 6085513 rs2239162 6151670 rs216865 6090303 rs11064008 6153264 rs216867 6091000 rs216293 6153659 rs2058473 6093926 rs216295 6154206 rs216873 6095274 rs7954351 6163705 rs11063977 6097946 rs12304995 6164455 rs216896 6100796 rs980130 6169027

249

APPENDIX G

VWF Tagging SNP Primers

SNP Forward Reverse Size rs216811 TTGACTCCTTCCTCATTTCC CGGGCAGCCAAGAATAC 546 rs216812 CTCCGTCAGAGTGTATTATCCT TACATTACAATTGATAGCGCAG 470 rs2363329 GAGTGCATGCATCTCCAC CTTCTAGAGATCTGCTGTACAACA 332 rs542993 GCTGAGGCCTTCGTG TCTTAATCCAGGCCGC 503 rs9634155 GCAAGTCATTTAACCGTCC GGAGGTGTGACTCTCGAAG 400 rs216308 GTTAAGACAGGGTGTCGAGC CAATACGATTTGCACATTTTG 991 rs11613273 CTAAATCTGAGGTAGTCCGGAC ACTGACACCTGAGTGAGACG 455 rs216309 TTGAGCACTTGTCATGTGC GGTGGTGATGATAAGCTTCAT 358 rs1800385 GCGTTCGTCCTGGAAG TGTGGTCAGAGAGGTACCG 268 rs1800384 rs1800385 captures N/A N/A rs216311 GTTGTCAACCTCACCTGTG CCTTCCAGGACGAACG 830 rs216313 GCCCAACATATCAAACATG CATGGACCATTCCACATC 393 rs2885752 CTGATGTATTTCCCCATTCTCT GCCAATGTCTTAACCCTCC 756 rs2854871 rs2885752 captures N/A N/A rs216314 TGTTCCGTCTGCCTAAAC TCATATGGTTCTAGAGGTTGG 481 rs11612370 rs216314 captures N/A N/A

Primer Name Sequence Tm Size VWF_rs216311 F GATGTTGTCAACCTCACCTG 53.4 832 VWF_rs216311 R CTTCCAGGACGAACGC 53.2

VWF_rs10849363 F TCATTGTGTTACAGTTGCCTAC 55.44 210 VWF_rs10849363 R TCACGTGTTGCTTAACAATG 55.77

VWF_rs11064003 F TTTCCCACCACGAACTTAC 56.01 209 VWF_rs11064003 R CACACTTCCAGGTTTGAATC 55.55

VWF_rs216293 F GTACACTCGAGCCCTTCAG 55.92 295 VWF_rs216293 R CAATGTAAGCCAGAGGTCAG 55.93

VWF_rs7964554 F TAAATCACCACTTTCAGTTCAC 54.47 249 VWF_rs7964554 R TGGCCTTTTCACTGTATTTC 55.38

VWF_rs216310 F GCGTTCGTCCTGGAAG 56 268 VWF_rs216310 R TGTGGTCAGAGAGGTACCG 57.1

250

VWF_rs216308 F GTTAAGACAGGGTGTCGAGC 57.4 991 VWF_rs216308 R CAATACGATTTGCACATTTTG 56.8

VWF_rs11612370 F agttcaaaaggtgctgtgac 55.35 213 VWF_rs11612370 R ggttctagaggttggaaagc 55.61

VWF_rs11612384 F acagacagaagcagaaaagg 54.72 385 VWF_rs11612384 R TGTATGTGTCAAACGTGGTC 55.23

VWF_rs10849363.1 F GGTAAGTGCCCTATACAGTATGTC 56.1 305 VWF_rs10849363 R TCACGTGTTGCTTAACAATG 55.8

VWF_rs216330 F TTAACACTCTGGAGGTGAGG 55.29 244 VWF_rs216330 R TGGGGCTGGTAGTTCTTAG 55.41

VWF_rs2239162 F CACTTGGATACCTGTCACG 54.85 249 VWF_rs2239162 R CAGCTCCATTTTCTTTGTG 54.39

VWF_rs1800383 F AGCAGTGTGGATGAGCTG 55.65 208 VWF_rs1800383 R CAATTTTGTCCGATCCTTC 55.54

VWF_rs216330.1 F TTGGGCATAAGTAGAAATCG 55.51 233 VWF_rs216330.1 R GGTTTGGCTGACTCTTCTG 55.88

VWF_rs216330.2 F CCAAGTGTTCTTCAATGAGC 55.89 346 VWF_rs216330.2 R AGAGTGGGTTTGGCTGAC 56.01

251

APPENDIX H

Screenshot of rs216308 in the USCS Genome Browser with GERP++ and ENCODE

Tracks

"~

~ •;

i - I •; . ~ ! 1; ! f . I ! f l !h ~ ! I ! i r: .i ii JH ~ !! P i 11 I a .! ! u •·J s i ju uuuu I f j " I1 1PJ • i i j I I f ~ •; ! J " . i . "'" ! I I I i f ! i I i l.

"i ~~

252

This screenshot was generated using the USCS genome browser

(http://genome.ucsc.edu/) showing the Conservation (GERP++) and ENCODE tracks. rs216308 corresponds to the single black bar highlighted in red above. GERP rejection scores are displayed in full view as staggered bars to the right of the SNPs track. All subsequent tracks correspond to the ENCODE project dataset. Project track names are provided in the center of the track and species or cell lines are listed at the bottom of each track in this screenshot. Prediction strength of a regulatory element mapping to the corresponding location is scaled with black being the upper limit.

253

APPENDIX I

CBC Genome-Wide QTL Linkage Analysis LOD Scores > 1.0

Phenotype Chromosome Nearest Marker LOD PLT 1q31.3 GATA48B01 1.77 PLT 2p25.1 GGAA20G10 1.20 PLT 2p22 GATA86E02 1.20 WBC 2q35 GATA4G12 1.53 MCH 3q26 GATA3H01 1.39 MCHC 3p14 AAC023 1.06 PLT 4p16 GATA22G05 1.33 WBC 4q13 GATA28F03 1.19 MCV 4q35 165zf8ZP 1.76 WBC 5q22 GATA68A03 1.45 MCV 6q25 GATA165G02 1.23 MCH 6q24 GATA184A08 2.46 MCHC 6p22 GATA163B10 1.55 MCV 7p21 GATA41G07 1.77 MCH 7p21 GATA41G07 1.06 WBC 8p23 ATT023 1.33 WBC 8q24 GATA21C12 1.28 PLT 9q22 ATA18A07 1.34 WBC 9q33 GATA64G07 2.18 WBC 10p15 GATA88F09 1.98 RDW 10p12 GATA70E11 2.09 WBC 13q14 GATA11C08 1.61 MCV 13q32 ATA26D07 1.30 RDW 13q32 ATA26D07 1.61 MCV 14q24 GGAA4A12 1.35 MCH 16q24 GATA11C06 1.13 PLT 19p13 GATA21G05 1.42 WBC 20p12 GATA129B03 1.16 MCV 20q13 GATA47F05 2.17 MCH 20q13 GATA47F05 1.06 PLT 22q11 GATA198B05- 1.69 AGAT120 MCHC 22q11 GATA198B05 1.11

254

APPENDIX J

M1304R Molecular Dynamics Simulations Methods

The simulation with the wild-type was started from the crystallographic structure with PDB code 1AUQ. The mutant M1304R was obtained per homology using the program CHARMM to construct missing atoms from internal coordinates. The coordinates of the mutated side chain were minimized with 100 steps of steepest descent1.

Unperturbed simulations

The MD simulations were performed with the program NAMD using the

CHARMM all-hydrogen force field (PARAM22) and the TIP3P model of water2,3. The protein was solvated in a cubic water box with side length of 84 Å. Chloride and sodium ions were added to neutralize the system and approximate a salt concentration of 150 mM. To avoid finite size effects, periodic boundary conditions were applied. After solvation, the system underwent 500 steps of minimization while the coordinates of the heavy atoms of the protein were held fixed and subsequent 500 steps with no restraints.

Electrostatic interactions were calculated within a cutoff of 10 Å, while long-range electrostatic effects were taken into account by the Particle Mesh Ewald summation method4. Van der Waals interactions were treated with the use of a switch function starting at 8 Å and turning off at 10 Å. The dynamics were integrated with a time step of

2 fs using SHAKE to rigidly constrain hydrogen atoms. Snapshots were saved every 10 ps for trajectory analysis. Before production runs, harmonic constraints were applied to the positions of all heavy atoms of the protein to equilibrate the system at 300 K during a time length of 0.2 ns. For the simulation with the mutant, harmonic constraints were kept on all heavy atoms except those of the mutated residue and of amino acids where any heavy atom is at least 4 Å from any heavy atom of the mutated residue, and equilibration

255 was continued for another 2 ns. After this equilibration phase, the harmonic constraints were released. One simulation was performed with the wild-type and one with the

M1304R mutant. Each run lasted 50 ns. During all runs, the temperature was kept constant at 300 K by using the Langevin thermostat with a damping coefficient of 1 ps-1,

while the pressure was held constant at 1 atm by applying a pressure piston5,6.

Free energy perturbation (FEP) calculations

Using MD simulation to predict differences in folding ΔG due to mutations is challenging because an exhaustive sampling of the free energy landscape of a protein is computationally prohibitive. By combining the thermodynamic cycle and free energy perturbation simulations, an estimate of the difference in folding ΔG (ΔΔG) can be

calculated. The free energy perturbation calculations were performed using the so

called "dual-topology" approach, where the initial and final states are defined7,8. A MD simulation is performed where the atoms belonging to the initial state (M1304) slowly disappear while the atoms of the final state (R1304) slowly appear. This so called

"alchemical transformation" gives an estimate of the free energy difference between the two states. An illustration of the method can be found here: http://www.ks.uiuc.edu/Research/namd/2.9/ug/node58.html A similar protocol was used here as in a previous study were FEP was used to study binding affinities9. Each FEP

run lasted in total 1 ns and contained 40 intermediate states. Each intermediate state

consisted of 12.5 ps of equilibration and 12.5 ps of data collection. A soft-core term was

used to avoid singularities in the van der Waals potential10. In total, three FEP runs were

performed with the folded state of the protein. They were started from snapshots

collected after 10 ns, 20and 30 ns, respectively, during the 50-ns simulation with the

wild-type. In order to perform FEP in the unfolded state, the assumption was made that

when the protein is denatured the mutation site is completely solvent exposed and can

256 be approximated by a tri-peptide consisting of the mutated side-chain and the neighboring residues. The resulting tri-peptide Met1303-Met1304-Glu1305 was solvated in a cubic water box of 40 Å side length and one FEP run was performed. Using the thermodynamic cycle illustrated in Figure J.1, an estimate for ΔΔG was calculated. Since three runs with the folded state were performed, the average and standard error of the mean were calculated.

Figure J.1: Illustration of the thermodynamic cycle. The thermodynamic cycle allows estimation of the folding DeltaDeltaG in the following 2 1 2 1 1 way: ΔΔG = ΔGfold – ΔGfold = ΔGalch – ΔG alch . ΔGfold is calculated from the FEP run using the tri-peptide containing the mutated side chain, which mimics the unfolded state 2 by completely exposing the mutation site to solvent. ΔGfold is calculated from the FEP runs of the folded state.

1. Brooks, B.R. et al. CHARMM: The Biomolecular Simulation Program. Journal of Computational Chemistry 30, 1545‐1614 (2009). 2. MacKerell, A.D. et al. All‐atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B 102, 3586‐3616 (1998). 3. Phillips, J.C. et al. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry 26, 1781‐1802 (2005). 4. Darden, T., York, D. & Pedersen, L. Particle Mesh Ewald ‐ an N.Log(N) Method for Ewald Sums in Large Systems. Journal of Chemical Physics 98, 10089‐10092 (1993). 5. Feller, S.E., Zhang, Y.H., Pastor, R.W. & Brooks, B.R. Constant‐Pressure Molecular‐ Dynamics Simulation ‐ the Langevin Piston Method. Journal of Chemical Physics 103, 4613‐4621 (1995).

257

6. Schneider, T. & Stoll, E. Molecular‐Dynamics Study of a 3‐Dimensional One‐Component Model for Distortive Phase‐Transitions. Physical Review B 17, 1302‐1322 (1978). 7. Gao, J., Kuczera, K., Tidor, B. & Karplus, M. Hidden Thermodynamics of Mutant Proteins ‐ a Molecular‐Dynamics Analysis. Science 244, 1069 ‐1072 (1989). 8. Pearlman, D.A. A Comparison of Alternative Approaches to Free‐Energy Calculations. Journal of Physical Chemistry 98, 1487‐1493 (1994). 9. Vergara‐Jaque, A. et al. Molecular Basis of Drug Resistance in A/H1N1 Virus. Journal of Chemical Information and Modeling 52, 2650‐2656 (2012). 10. Zacharias, M., Straatsma, T.P. & Mccammon, J.A. Separation‐Shifted Scaling, a New Scaling Method for Lennard‐Jones Interactions in Thermodynamic Integration. Journal of Chemical Physics 100, 9025‐9031 (1994).

258

APPENDIX K

M1304R Individual I.1 and III.1 VWF Variants

Table K.1: Individual I.1 VWF variants Variant

Variant AA Change class Variant_function

c.2365A>G T789A SNP NON_SYNONYMOUS_CODING

c.2555A>G Q852R SNP NON_SYNONYMOUS_CODING

c.3911T>G M1304R SNP NON_SYNONYMOUS_CODING

c.4141A>G T1381A SNP NON_SYNONYMOUS_CODING

c.4196G>A R1399H SNP NON_SYNONYMOUS_CODING

c.4751A>G Y1584C SNP NON_SYNONYMOUS_CODING

c.4751A>G Y1584C SNP NON_SYNONYMOUS_CODING

Table K.2: Individual III.1 VWF variants Variant Genotype Variant type Variant function

C C IVS3+52T>C SNP INTRONIC

G G c.4141A>G SNP NON SYNONYMOUS CODING

G G c.4173A>G SNP SYNONYMOUS CODING

A G c.1451A>G SNP NON SYNONYMOUS CODING

CTT CTT IVS15-17_-16insCTT insertion INTRONIC

A A IVS17-42C>A SNP INTRONIC

G T IVS18+38G>T SNP INTRONIC

T T IVS19+25C>T SNP INTRONIC

G G c.2555A>G SNP NON SYNONYMOUS CODING

T T IVS20+67G>T SNP INTRONIC

C C c.4641T>C SNP SYNONYMOUS CODING

259

G G c.4751A>G SNP NON SYNONYMOUS CODING

C T c.5844C>T SNP SYNONYMOUS CODING

T T IVS38-14C>T SNP INTRONIC

C T IVS41-7C>T SNP INTRONIC

C C c.7239T>C SNP SYNONYMOUS CODING

A C IVS44-59A>C SNP INTRONIC

C T c.-2709C>T SNP UPSTREAM

A G c.-2661A>G SNP UPSTREAM

G A c.-2527G>A SNP UPSTREAM

C T c.-64C>T SNP UPSTREAM

T C c.270T>C SNP DOWNSTREAM

260

APPENDIX L

NBEAL2 Sequencing Primers

Forward Reverse Amplicon 1 attcagaccctggggaag caaggcataagtgccagag Amplicon 2 cattcttcgcctctgtctac ggtcacaataatacaccatgc Amplicon 3 GCTACCACTGGATGAGCTG tctttggttctcccagtcc Amplicon 4 ccctcacactggatcttttac gcaaaacactgactctgtcc Amplicon 5 accctactcttccctagtgc cacccatttctagaccacac Amplicon 6 gtagaaggcctggtgctc ggagagctcaccatctgtg Amplicon 7 tagatcatgttgctgactgc aagccagttatgtctgcttg Amplicon 8 TGTCTGCACCCTATGGATAC ATACAGCAGGAGGAGAAAGG Amplicon 9 acaggcctgactttagcc GCAGATGTTGTTCTTACAAGC

Amplicon 10 TCCTCCATTACTCACCTCAG caggaggttcttcatgactg Amplicon 11 AACCAAGAGAGCCTGGTG TGGACATGTACTGGATGTGAC Amplicon 12 CTGGACCCTCAGTGACTTC AGAAAGACAGCCAAGGACTC Amplicon 13 GTCTCTCAGCAGATGACGTG GCAGATCTGAGAGGTTCAGG Amplicon 14 GCCTCTACAAGCTGTTCCTG TAGGTGACTCTGGGGAAGAC Amplicon 15 CGTTCGCCTAGACATCTG acctttggagaggctacac Amplicon 16 ctgttagccaaatgctctgac GCTGCATACACCTTCAAAGAG Amplicon 17 TAACCAGGAACTGTGGAGTG GCATACGTGTCCATTTCG Amplicon 18 GTGCCACACCCGAATGG ACTTTGGCCTCTTTGGTC Amplicon 19 ATCACTTCGACCCTCACC cagaagaggacacactcagg Amplicon 20 catctctgggagtcatgagag GGATATCTCACGCTGTACCC Amplicon 21 CCCTCTCAAGGCTACCTAAG atctacccttgaccaagctc Amplicon 22 cctgtgacccctctaagtg GAAGTCCTCAGGAGAGCTG Amplicon 23 CCGGAATTCTTCTACTTTCC ttgattaggattcccatctg Amplicon 24 cttctcacccactttctgag AGATGAGGTAGATGCCACAG Amplicon 25 GCAGTGGTGTGAGTGGAC GACTGAATACAGGTGCAAGG Amplicon 26 CATTCCCTGGACCTATTTTC AGGGTTGTATTCCGTCTCTC Amplicon 27 AAGCTCATCGTGGTGGTC actgccgggtaaactgag Amplicon 28 ATGCTGGCATGTGAAGAC ATCGGGCTCAGTCAGTAAC Amplicon 29 CTACCCGGTTACTGACTGAG TAGAACCAGAAGAACCTTCG Amplicon 30 AGCCGAATATGACCTAGTCC AGTCACTCATGATGCAGGTC

261

APPENDIX M

Isolating CD45 Negative selected Platelets

Materials

ACD: Acid/Citrate/Dextrose 8 g/L Citric Acid (Sigma C‐2404) 22.4 g/L Sodium Citrate (Sigma S‐4641) 2 g/L Dextrose Anhydrous (aka Glucose, Sigma G‐8270) pH to 5.1, top off to 1 L with double distilled water. Use 0.2 µm bottle‐top filter to sterilize.

PGE1: Prostaglandin E1, Analytical Grade (Cayman Chemical # 13010.1) 1 mg suspend in 940 µl DMSO [3 mM stock solution]. Aliquot into 10 µl/tube, store at ‐80 C. At time of use, thaw by adding 90 µl PSG to 10 µl aliquot [300 µM]. Use at 1:1000 in platelet samples [final concentration is 300 nM].

PSG: (Pipes/Saline/Glucose) 2 L 4 L 6 L 5 mM PIPES (Sigma P‐3768) 3.464 g 6.928 g 10.392 g 145 mM NaCl (Fisher S‐271‐3) 17 g 34 g 51 g 4 mM KCl (Sigma P‐4504) 0.6 g 1.2 g 1.8 g

50 µM Na2HPO4 (Sigma S‐3264) 0.0142 g 0.0284 g 0.0426 g

1 mM MgCl2•6H2O (Fisher BP‐214‐500) 0.4g 0.8 g 1.2 g 5.5 mM Glucose (Sigma G‐8270) 2.0 g 4.0 g 6.0 g pH to 6.8 Q.S. to final volume with water Filter with 0.2 µm bottle top filter.)

Buffer: PBS pH7.2 with 0.5% BSA and 2mM EDTA

MACS CD45 MicroBeads: (Miltenyi Biotec MACS Beads 130‐045‐801) – will bind WBC and allow platelets to pass through

M199 (Medium 199 with EBSS, L‐Glutamine & HEPES) (Lonza 12‐117F) 500 ml

TRIZOL (Invitrogen 15596‐018)

Note: We did side by side comparison of LD column (suggested by the manufacture for negative selection) and LS column and the final result from LS column is just as good as LD column if not better).

‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐

262

Platelet isolation

1. Warm PSG, ACD and centrifuge to room temperature 2. Collect 80ml of whole blood into blue top tubes 3. Pool blue top tubes for ONE individual into a 50 ml conical tube 4. Add ACD at 14% of final volume (volume of blood x 0.1628 = amt ACD to add) 5. Spin at 100 x g for 30 mins 6. Pull 10 μl aliquot of PGE1 from ‐80 freezer 7. Add 90 μl PSG to aliquot of PGE1 8. Collect PRP into a new 50 ml conical tube 9. Add PGE1 (final concentration at 300nM) to PRP to a final concentration of 300 nM (1:1000 dilution from diluted aliquot) 10. Spin at 500 x g for 20 mins 11. Bring MACS CD45 microbeads to RT 12. Note volume level of plasma and discard plasma 13. Suspend platelet pellet in equal volume of PSG containing PGE1 (1:1000 dilution) as original PRP volume by pipetting up and down gently 14. Add 3ul MACS CD45 microbeads /per ml of original PRP to the cells 15. Incubate at RT for 20 mins mixing periodically by gentle inversion 16. While cells are incubating with beads, place LS column (Miltenyi Biotec 130‐042‐401) in the magnetic field of a suitable MACS Seperator 17. Prepare column by rinsing with 2ml of buffer and discard wash tube 18. Load the entire volume of platelets suspension onto the column and collect platelets (will pass through) in 50 ml conical 19. Wash column 1ml of buffer 2 times and collect into sample tube (only add new buffer when the column reservoir is empty) 20. Add additional PGE1 (1:1000 dilution) to enrich platelets 21. Spin down at 500 x g for 20 mins 22. Discard supernatant 23. Suspend platelets in a small volume M199 (1‐5ml) for platelet counting 24. Acquire platelet count 25. Spin down cells at 12000 x g for 4 mins 26. Discard supernatant 27. Add 1 ml of TRIzol for every 1 x 109 platelets 28. Transfer to RNase free tube 29. Store at 4°C overnight 30. Store ‐800C on day 2 until RNA extraction or shipment (on dry ice)

Reference: http://www.miltenyibiotec.com/download/datasheets_en/24/DS130_045_801.pdf

263

APPENDIX N

Autosomal Dominant Thrombocytopenia Candidate Mutation and THC2 Locus

Sequencing Primers

Table N.1: ADT Candidate mutation sequencing primers. Primer Name Sequence Tm Size ADT ARHGAP21 F TTAAACTGGGGGTATGTCC 54.82 684 ADT ARHGAP21 R CCATCCAGGATCCACTATC 55.14 ADT PTCHD3 F CTCACCTAGAATAAGAAATGGTG 54.9 296 ADT PTCHD3 R AGTCCCTGGGAATTAAAGG 55.65 ADT GPR158 F TTCCAGCAACCTTTAACATC 55.38 270 ADT GPR158 R TTTCCTTGATTGACAAATCC 55.1 ADT MPP7 F AACAGAATTATCTTCGGTGTAAG 54.54 334 ADT MPP7 R GGTCTGTAGCGATTATGTCC 54.73 ADT ARHGAP21.A.1 F CCCCTTCTTTCCATACCAC 56.88 ADT ARHGAP21.A.1 R CTGTTACCAATAAAACGCATTC 56.52 587 ADT ARHGAP21.A.2 F TTGTCTCTTCATTGCAACAC 54.69 ADT ARHGAP21.A.2 R GGAAGTTCTGTATCCACCTG 54.59 691 ADT ARHGAP21.B.1 F CAGGTGGATACAGAACTTCC 54.59 ADT ARHGAP21.B.1 R agtctctacccaccacagc 55.1 489 ADT ARHGAP21.B.1 F CAGGTGGATACAGAACTTCC 54.59 ADT ARHGAP21.B.2 R agcaagaaagaggagaagatg 55.5 473 ADT ANKRD26.1 F acatttgtttgacaggaacg 56.1 ADT ANKRD26.1 R GCAGCTTTGTGGATCTTG 55.24 477 ADT ANKRD26.1 F acatttgtttgacaggaacg 56.1 ADT ANKRD26.2 R CAGCTTTGTGGATCTTGC 55.24 476 ADT ANKRD26.2 F catttgtttgacaggaacg 55.01 ADT ANKRD26.1 R GCAGCTTTGTGGATCTTG 55.24 476 ADT ANKRD26.2 F catttgtttgacaggaacg 55.01 ADT ANKRD26.2 R CAGCTTTGTGGATCTTGC 55.24 475 ADT ANKRD26.3 F accgacatttgtttgacag 54.35 ADT ANKRD26.1 R GCAGCTTTGTGGATCTTG 55.24 481

264

Table N.2: THC2 Locus Tagging SNP Sequencing Primers Name Sequence Tm Size rs3824593 F ACTGCGATCCTATTCACG 54.55 289 rs3824593 R GCCCACGAAGAGTGTAAG 54.14 rs12572344 F GGAACGCTTCTGTTGTTG 55.15 260 rs12572344 R GCAATTCACAGGACCTAAAG 54.96 rs7911024 F TCCACAAAAGGGATAACAAC 55.05 295 rs7911024 R caaagtgctgggattatagg 54.94 rs11015498 F GTTTCTTGATCCACTCATGC 55.62 275 rs11015498 R CGATAGCTGTTTCATAAAATCC 55.74 rs3781099 F AAGCACCACCACATGAAC 55.07 236 rs3781099 R CCACTGAAGAAATTGAGACAC 54.83 rs9731310 F GGTTCGAGTGATTCTCCTG 55.62 395 rs9731310 R GAGCCCTCCTGAACTAAATG 56.48 rs11015574 F TGGAGTTTCACCATGTTAGC 56.19 236 rs11015574 R AGTTTGCCATTGTTCTCAAC 55.28 rs10159601 F gcaacatcaagaagttaccc 54.78 234 rs10159601 R aggcgagagagtctataaagtg 54.77 rs10829165 F GGACTGAAAACACTGGACTC 54.6 250 rs10829165 R AAATGCTCATGTTTTTCACC 55.2 rs10764667 F CTGAACTTATTCCGGTTCG 55.83 295 rs10764667 R TGCTTTCTCATTTTAGTCAACC 56.25 rs9651377 F CTTAACCACAGTAGCCTAAAATG 55.17 388 rs9651377 R CATGTCTGTCTCCATTTGC 54.88 rs12411584 F CTCCTCAAACTCTGAACCAG 54.98 250 rs12411584 R aaaattagccaggtgtgatg 55.25 rs788221 F TTTGCTCACCTGGTACATAAC 55.34 236 rs788221 R TGAGAGTTCCTACAGTTTTGC 54.76 rs12772934 F CAGAATCAAGTGAAATGTATGG 54.85 248 rs12772934 R ACTGGTGAGGATTATCATTTG 54.16 rs788208 F CTATGACAGCATCACTGCAC 55.21 234 rs788208 R TCTCTGCCTTTGATCCTTAC 54.57 rs10458707 F gcaatgggatatgtcctg 54.56 268 rs10458707 R tcatgcagctatttgaacac 54.84 rs3802526 F AAAACTTCGAAAAGCAACAG 54.94 224 rs3802526 R AAAGTCAGCCCTTAGATTCAG 55.38 rs11015501 F GCTAAAGGATCAACATACTTCTG 55.04 495 rs11015501 R CGCTGTACTATAGCAAGTGTATTC 55.34 rs7907059 F TTATGAATACACCTGGCAAAG 55.44 227 rs7907059 R tagagctagtcaggctcagg 55.02 rs6482593 F ACCAGGCTCCTAAGAAAAAC 55.21 235 rs6482593 R AACTGTGTGGTTCAGAGAGG 55.18

265

rs4749219 F AGACGGTAGGGGACAATC 54.79 243 rs4749219 R tagtagacagggcatttcatc 54.49 rs788222 F CTTCTTTCCATAACCACTGC 54.96 240 rs788222 R AGGTTGGAGAAAAGACTTGAG 55.26 rs11015506 F ACTTACAGATCCCTCTTCTGC 55.21 208 rs11015506 R CAAATCATGGAGTTGGGTAG 55.51 rs6482601 F CTGCTCTTCAGTTCCTTCC 55.04 231 rs6482601 R CCCAACATTTGAAAGATCAG 55.65 rs4749233 F GGGAAGATGGGATTATATGG 55.79 268 rs4749233 R CAAAGTGTTTCCTAAGAAAAGG 55.09 rs12266154 F tgacctcaggtggttcac 55.03 272 rs12266154 R TCCAATAGCTTATCTGAAATCC 55.26 rs3781101 F TCCCTTCACCCTAAAACAG 55.2 244 rs3781101 R TTTCTCCAGCATTCTTTCTC 54.71 rs12250031 F GGCCAATTAATCAACCTCTC 56.22 216 rs12250031 R TGGATGAAAGCTCTGAAGAC 55.5 rs11819631 F ACTCAAGTGTGGAGAACACTG 55.28 196 rs11819631 R GGCCTAACATATAGTCCTTGG 55.47 rs10829168 F GTACAGATGGGGTTTCACC 55.13 493 rs10829168 R TGCACACACACAAGTTAAGG 55.68 rs12770357 F ACAAATTGGAATTAGCAACG 55.42 247 rs12770357 R ATTCTTGGTGACGATAGTGG 55.07 rs2297144 F ACATTTACCTGCTTTGGATG 55.25 221 rs2297144 R TCCCTTTTGGTTTTTAGAGTC 55.21

rs10829168.1 F TTACGGTCTAAAAACATGCAC 55.51 363 rs10829168 R TGCACACACACAAGTTAAGG 55.68 rs10829168.1 F TTACGGTCTAAAAACATGCAC 55.51 362 rs10829168.1 R GCACACACACAAGTTAAGGAC 55.75 rs10829168.1 F TTACGGTCTAAAAACATGCAC 55.51 364 rs10829168.2 R ATGCACACACACAAGTTAAGG 56.22

266

APPENDIX O

ANKRD26 qPCR Primers and Melt Curve Gels

Expanded protocol:

Sample Collection, RNA isolation, Reverse Transcription

Whole blood samples were collected from 4 patients (two control and two positive) directly into PAXgene blood RNA tubes [PreAnalytiX/Qiagen, Valencia CA,

USA]. Tubes were stored at 4C overnight and total RNA isolation was performed using

PAXgene Blood RNA kit per manufacturer protocol. RNA was treated with 2U DNAase I digestion for 15min at 37C and removed/purified with DNAse Inactivation Reagent

[Ambion/Life Technologies AM1914, Grand Island NY, USA]. RNA integrity was evaluated on a RNA Nano chip run on a Bioanalyzer 2100 [Agilent Technologies, Santa

Clara CA, USA]. RNA integrity numbers (RIN) for all samples were 8.1-9.3. RNA for tissue panel was purchased from Ambion [AM6000]. The cDNA first strand transcription was performed in duplicate using 0.5ug (patient) or 1ug (tissue panel) of total RNA with the iScript RT mix for RT-qPCR following manufacturer protocol [BioRad 170-8840,

Hercules CA, USA]. The synthesized cDNA was stored at -20C until further use.

qPCR: Primer design for SYBR green

Design for primers that assay various isoforms of ANKRD26 were limited to specific regions of sequence unique to each isoform. Primers were selected using

Primer3 (http://frodo.wi.mit.edu/) with a target amplicon length of 80-150bp. Primer sequences are provided in Table O.1. Primer sequences were evaluated for specificity and degeneracy by searching BLASTN and known SNPs (http://blast.ncbi.nlm.nih.gov/ and http://genome.ucsc.edu/index.html). Reference gene primers were designed and purchased from IDT [PrimeTime qPCR primers, Coralville IA, USA].

267

Quantification by qPCR

Quantitative PCR was performed on a Roche LightCycler 480II Real Time PCR instrument, using Roche SYBR Green I Master [Roche 04 707 516 001, Mannheim

Germany]. PCR was carried out in a 20ul volume and a final concentration of 1X reaction buffer, 200nM forward and reverse primers and 0.5ul cDNA reaction. All standard curve and validation reactions were performed in triplicate with NTC reactions for all primer sets, all samples were done in duplicate. PCR cycling parameters were as follows: hot-start at 95C for 5min, 45 cycles of 95C for 10sec, 60C for 15sec, 72C for

30sec followed by a dissociation curve measurement from 65C to 97C. Comparative quantitative analysis was performed using the LC480II data collection software release

1.5.0.39 SP4. Seven log10 dilutions of plasmid template for all ANKRD26 isoforms and reference gene amplicon sequences were prepared and used for primer validation and standard curve reference. PCR efficiency for all target gene primer sets ranged from

90.1%-95.6% with all r-squared values >0.99. To avoid confounding technical variation, one standard curve sample for all reference and target genes were included on every plate. Geometric averaging of five reference genes (ACTB, GAPDH, GUSB, HPRT1 andTBP) were used normalize target gene transcripts. Melt curve analysis for all assays verify single product amplification and absence of primer dimers. NTC reactions for all primer sets were >5Cq from all control and unknown samples.

268

Table O.1: ANKRD26 qPCR primers Name Sequence Tm Size Notes ANKRD26 Intron 16 F TTCTGCTGGTACAGACGCTT 58.7 137 Amplifies intron 16, ANKRD26 Intron 16 R GTGGTGGAGAAGACCTCACA 58.6 which is unique to isoform 201 ANKRD26 Intron 6 F CAAAGTCTTGCACTGTTGCC 59.5 100 Amplifies intron 6, ANKRD26 Intron 6 R AGGTGGAGGCAGAAGAATTG 59.3 which is unique to isoform 202 ANKRD26 Exon 13 F GCTGCAGACGGAAAAGAAAA 60.5 138 Amplifies exons 12 ANKRD26 Exon 13 R GGCTACTGGCATGCCTACAT 60.13 and 13. Exon 13 is unique to isoform 001 ANKRD26 Exon 21 F CCAGGATGCAGCTCTTTCAT 60.36 184 Amplifies exon 21, ANKRD26 Exon 21 R GCACAGTTCTCGTTCCCATT 60.12 which is in all isoforms and is the control amplicon

Figure O.1: Intron 16 melt curve

Figure O.2: Intron 6 melt curve

269

Figure O.3: Exon 13 melt curve

Figure O.4: Exon 21 melt curve

Figure O.5: Exon 21 amplification plot

270

APPENDIX P

Variants Mapping to the THC2 Locus Where Autosomal Dominant

Thrombocytopenia Patients Are Heterozygous

Gene Position Type rs# Major Minor Genotype KIAA1217 24653841 intronic . C A het KIAA1217 24696474 intronic rs7911930 A T het KIAA1217 24696521 intronic rs7895880 C A het KIAA1217 24697579 intronic rs74122300 C T het KIAA1217 24699516 intronic . T G het KIAA1217 24700710 intronic . C A het KIAA1217 24704065 intronic . T G het KIAA1217 24707576 intronic . A G het KIAA1217 24708278 intronic rs74125404 A T het KIAA1217 24708422 intronic rs74125405 A C het KIAA1217 24711326 intronic rs74125407 G A het KIAA1217 24713241 intronic . C T het KIAA1217 24714839 intronic . A G het KIAA1217 24719505 intronic rs79641988 T C het KIAA1217 24754373 intronic . G A het KIAA1217 24757541 intronic . C T het KIAA1217 24824294 intronic . G C het ARHGAP21 24949517 intronic . G T het PRTFDC1 25139002 intronic . C T het PRTFDC1 25179724 intronic rs12572969 C G het PRTFDC1 25227898 intronic rs10764511 C T het PRTFDC1 25227988 intronic rs10828727 C T het ENKUR 25293765 intronic . G A het GPR158 25497651 intronic rs12762009 T C het GPR158 25584182 intronic . C T het GPR158 25606005 intronic rs71495423 C T het GPR158 25624290 intronic rs1340002 C T het GPR158 25646519 intronic rs12219956 A G het GPR158 25646974 intronic rs11014510 A C het GPR158 25655989 intronic rs80352191 A G het GPR158 25668778 intronic rs12762462 G A het GPR158 25669508 intronic rs10764539 A C het GPR158 25685872 intronic rs35663899 G T het GPR158 25685958 intronic rs35942963 A G het GPR158 25686900 intronic rs11818261 C T het GPR158 25687792 intronic rs10828791 G A het GPR158 25687835 intronic rs10828792 C T het GPR158 25690861 intronic rs11014527 G A het GPR158 25693677 intronic rs11014528 A G het GPR158 25693844 intronic rs12764320 G C het

271

GPR158 25693958 intronic rs12766749 A G het GPR158 25694041 intronic rs12258359 T C het GPR158 25694049 intronic rs12256778 A G het GPR158 25694156 intronic rs12256851 A G het GPR158 25694871 intronic rs12254479 G C het GPR158 25695278 intronic rs11014529 G A het GPR158 25695652 intronic rs12240665 C T het GPR158 25695936 intronic rs12249852 A G het GPR158 25696592 intronic rs12251917 T C het GPR158 25696712 intronic rs12251868 A G het GPR158 25697570 intronic rs12259043 G T het GPR158 25697921 intronic rs10764543 T C het GPR158 25698931 intronic rs10828793 A C het GPR158 25699152 intronic rs11014530 T C het GPR158 25699776 intronic rs12358447 T C het GPR158 25699841 intronic rs12259696 T C het GPR158 25700535 intronic rs7094439 T G het GPR158 25700603 intronic rs7090972 A G het GPR158 25700861 intronic rs11014533 A T het GPR158 25702280 intronic rs10508694 G A het GPR158 25714197 intronic rs7905216 C G het GPR158 25714311 intronic rs1958947 A C het GPR158 25714391 intronic rs1958948 A G het GPR158 25716511 intronic rs11014546 T G het GPR158 25717594 intronic rs11014547 C T het GPR158 25718331 intronic rs4431923 G A het GPR158 25720502 intronic rs10430566 G A het GPR158 25724277 intronic rs11014553 A G het GPR158 25724391 intronic rs11014554 C A het GPR158 25724744 intronic rs12773262 G A het GPR158 25725491 intronic rs16925850 G T het GPR158 25728288 intronic rs10764553 G A het GPR158 25728493 intronic rs11014558 G T het GPR158 25729644 intronic rs10741087 A C het GPR158 25730280 intronic rs10399962 C T het GPR158 25731702 intronic rs11014559 C T het GPR158 25731764 intronic rs11014560 C T het GPR158 25731770 intronic rs11014561 G A het GPR158 25731823 intronic rs10828807 C T het GPR158 25732083 intronic rs10828808 A G het GPR158 25762055 intronic . C T het GPR158 25787705 intronic rs4747522 G C het GPR158 25787906 intronic rs4749041 C G het GPR158 25788668 intronic rs10828810 C T het GPR158 25790197 intronic rs11014580 G A het GPR158 25791017 intronic rs4749042 A G het GPR158 25792709 intronic rs1887925 T C het GPR158 25793569 intronic rs943984 G C het

272

GPR158 25803024 intronic rs4623781 A T het GPR158 25803988 intronic rs4747525 A G het GPR158 25810258 intronic rs11014589 G A het GPR158 25810281 intronic rs11014591 G A het GPR158 25811764 intronic rs10828815 T C het GPR158 25814295 intronic rs1932150 G A het GPR158 25814527 intronic rs80074459 G T het GPR158 25817558 intronic rs1932149 A C het GPR158 25817666 intronic rs1932148 C T het GPR158 25818777 intronic rs4459183 G A het GPR158 25822360 intronic rs4749043 A G het GPR158 25825929 intronic . A T het GPR158 25836299 intronic rs4747529 G C het GPR158 25846158 intronic . T C het GPR158 25864381 intronic rs12359026 G T het GPR158 25865777 intronic rs4749050 C A het GPR158 25867000 intronic rs17557638 T G het GPR158 25867088 intronic rs11014622 A T het GPR158 25867106 intronic rs11014623 T C het GPR158 25867301 intronic rs4237372 G A het GPR158 25867510 intronic rs4237374 T C het GPR158 25867733 intronic rs4747533 G A het GPR158 25868377 intronic rs12355530 C T het GPR158 25869543 intronic rs11014625 T G het GPR158 25872099 intronic rs11014630 C A het GPR158 25874222 intronic rs1932147 A C het GPR158 25874515 intronic rs11014631 A G het GPR158 25874581 intronic rs10764563 G A het GPR158 25874863 intronic rs10764564 G A het GPR158 25875012 intronic rs10764565 C G het GPR158 25875892 intronic rs58254885 G A het GPR158 25877060 intronic rs10458686 T C het MYO3A 26226598 intronic . C T het MYO3A 26234176 intronic rs76951627 C T het MYO3A 26274559 intronic . C G het MYO3A 26363300 intronic . T C het MYO3A 26395920 intronic . T C het MYO3A 26424614 intronic . A C het MYO3A 26426444 intronic . C T het MYO3A 26453823 intronic rs11014980 C T het GAD2 26514047 intronic rs8190617 A T het GAD2 26518749 intronic . A G het GAD2 26527590 intronic rs11015008 G A het GAD2 26545425 intronic . G A het GAD2 26556560 intronic rs59147413 G A het GAD2 26566456 intronic rs11015025 C T het GAD2 26594192 downstream . C T het APBB1IP 26759447 intronic . G A het

273

APBB1IP 26790109 intronic rs787039 T C het APBB1IP 26790824 intronic . C A het APBB1IP 26797343 intronic rs787032 C A het APBB1IP 26810937 intronic . C A het APBB1IP 26842099 intronic rs2040027 C T het NCRNA00264 26881968 ncRNA rs72788603 T C het NCRNA00264 26884079 downstream rs73594379 A C het ABI1 27127607 intronic . A C het NCRNA00202 27221728 ncRNA . C T het NCRNA00202 27222874 ncRNA rs2477937 A G het NCRNA00202 27226655 ncRNA . T G het NCRNA00202 27226684 ncRNA rs36188550 T C het ANKRD26 27348615 intronic . G A het ANKRD26 27389381 UTR5 . A G het YME1L1 27402315 intronic . T C het YME1L1 27438066 intronic . A G het MASTL 27450059 exonic rs28941470 G C het MASTL 27455431 intronic . G T het RAB18 27812465 intronic rs76443154 G A het MKX 27980516 intronic rs11015947 A C het MKX 27999164 intronic rs79185604 T A het ARMC4 28112002 intronic . G A het ARMC4 28170243 intronic rs79110720 G C het ARMC4 28258247 intronic . G C het MPP7 28343645 intronic . A T het MPP7 28351855 intronic rs11006819 T C het MPP7 28355450 intronic rs11006828 C T het MPP7 28355817 intronic rs11006829 T C het MPP7 28356187 intronic rs11006831 C G het MPP7 28358540 intronic . A C het MPP7 28363867 intronic rs11006842 G C het MPP7 28370133 intronic . A T het MPP7 28370770 intronic . C T het MPP7 28372497 intronic rs72801882 G A het MPP7 28376999 intronic rs11006848 T A het MPP7 28380196 intronic rs72801897 C G het MPP7 28381071 intronic rs11006856 T C het MPP7 28383385 intronic rs72801902 C A het MPP7 28385737 intronic rs11006860 G C het MPP7 28387958 intronic rs11006864 T C het MPP7 28388750 intronic rs11006865 T A het MPP7 28393809 intronic rs72803614 T C het MPP7 28395460 intronic rs11006872 A G het MPP7 28414791 intronic rs7080796 C T het MPP7 28414954 intronic rs7098896 T G het MPP7 28416388 intronic rs66764206 G A het MPP7 28417828 intronic rs12411923 A G het MPP7 28428205 intronic rs76812798 T A het

274

MPP7 28428240 intronic rs72803630 G A het MPP7 28430418 intronic rs72803634 A G het MPP7 28430572 intronic rs78131314 C T hom MPP7 28434569 intronic rs72803636 T C het MPP7 28435081 intronic rs58193497 A C het MPP7 28441519 intronic . C T het MPP7 28451221 intronic rs78477008 T C het MPP7 28452477 intronic rs76009353 A G het MPP7 28453215 intronic rs78236200 C G het MPP7 28455686 intronic rs76826800 G A het MPP7 28456021 intronic rs75818413 C T het MPP7 28457035 intronic . G A het MPP7 28458567 intronic rs74835530 T C het MPP7 28467599 intronic rs79041263 G A het MPP7 28479036 intronic rs75243964 A G het MPP7 28484696 intronic rs76528336 G A het MPP7 28490983 intronic . T C het MPP7 28493391 intronic rs80053794 T A het MPP7 28504944 intronic rs11006923 T C hom MPP7 28513112 intronic rs11006927 T G hom MPP7 28514249 intronic rs11006928 A T hom MPP7 28515799 intronic rs17831816 G A hom MPP7 28519598 intronic rs11006933 T C hom MPP7 28523598 intronic rs11006937 C A hom MPP7 28525378 intronic rs11006938 G A hom MPP7 28525523 intronic rs11006939 C T hom MPP7 28536445 intronic rs11006945 T C hom MPP7 28538289 intronic rs78086976 G C hom MPP7 28539175 intronic rs11006947 C A hom MPP7 28544070 intronic rs11006958 A C hom MPP7 28546620 intronic rs11006960 C A hom MPP7 28549726 intronic rs11006964 T C hom MPP7 28557741 intronic rs77457054 G A hom WAC 28858846 ncRNA . A G het WAC 28908889 ncRNA rs80068643 T A het BAMBI 28967570 intronic rs78860727 C T het LYZL1 29579979 intronic rs1751951 T C het

275

APPENDIX Q

Autosomal Dominant Thrombocytopenia Candidate Mutation Sequencing Results

in Full Pedigree

PLT MASTL ANKRD26 ARHGAP21 PTCHD3 GPR158 MPP7 ID (K) 27450059 27389381 24872922 27692279 25888180 28543716 28543735 D9 303 G,G T,T G,G A,C G,G C,C G,G D10 285 G,G T,T G,G A,A G,G C,T G,T D11 33 G,C T,C G,G A,C G,G C,T G,T D13 285 G,G T,T G,G A,A A,G C,C G,G D14 29 G,C T,C G,G C,C A,G C,T G,T D15 246 G,G T,T G,G A,C A,A C,C G,G D16 G,C T,C G,G A,C A,G T,T T,T D17 G,G T,T G,G A,A A,A C,C G,G D18 G,C T,C G,G A,C A,G T,T T,T D19 G,G T,T G,G C,C A,A C,C G,G D20 81 G,C T,C G,G A,C A,G C,T G,T D21 G,G T,T G,G A,C A,G C,T G,T D22 94 G,C T,C G,G C,C A,G C,T G,T D23 105 G,C T,C G,G A,C G,G T,T T,T D24 G,C T,C G,G C,C A,G C,C G,G D25 G,G T,T G,G A,A A,G C,T G,T D26 140 G,C T,C G,G A,C A,G C,T G,T D27 108 G,C T,C G,G A,C A,G C,T G,T D28 G,G T,T G,G A,A A,A C,C G,G D29 119 G,C T,C G,G A,C G,G T,T T,T D30 G,C T,C G,G A,C A,G C,T G,T D31 74 G,C T,C G,G C,C A,G C,T G,T D32 234 G,G T,T G,G A,C A,G C,C G,G D33 55 G,C T,C G,G A,C A,G C,T G,T F1P1 70 G,C T,C G,G C,C A,G C,T G,T F1P2 30 G,C T,C G,G A,C G,G C,T G,T

276