
UMEÅ UNIVERSITY MEDICAL DISSERTATIONS New Series No 1060 – ISSN 0346-6612 – ISBN 91-7264-191-6

Genetic Studies of Stroke in Northern Sweden

Sofie Nilsson Ardnor

Department of Medical Biosciences Medical and Clinical Genetics Umeå University Umeå 2006

Department of Medical Biosciences Medical and Clinical Genetics Umeå University SE-901 87 Umeå, Sweden

Copyright © 2006 by Sofie Nilsson Ardnor

ISSN 0346-6612-1060

ISBN 91-7264-191-6

Printed by Solfjädern Offset AB, Umeå, 2006

“…beskrivning av generationerna på papper” (“…description of the generations on paper”). Copyright © 2002 by Jacob Ardnor, age 6. Reprinted with permission.

Curiosity killed cat… but cat has many lives!

ABSTRACT

Stroke is a common disorder of later life with a complex etiology, including both environmental and genetic risk factors. The inherited predisposition is challenging to study due to the complexity of the stroke phenotype. Genetic studies in an isolated population have successfully identified a positional candidate gene for stroke, phosphodiesterase 4D (PDE4D). The aim of this thesis was to identify stroke susceptibility loci and positional candidate genes, taking advantage of the low genetic variation in the northern Sweden population. All stroke cases were identified in a population-based stroke registry at the northern Sweden MONICA Centre. Fifty-six families containing multiple cases of stroke and a follow-up set of an additional 53 families were used for linkage studies. For association studies, 275 cases of first-ever stroke together with 550 matched community controls were included. In paper I, we used a candidate region approach to investigate the PDE4D region on chromosome 5q. Linkage was obtained with a maximum allele-sharing LOD score of 2.06; P = 0.001. However, no significant association of ischemic stroke to the previously defined at-risk allele in PDE4D was observed. We next performed a genome wide linkage scan to explore new susceptibility loci for common forms of stroke (paper II). Non-parametric multipoint linkage analysis yielded allele-sharing LOD scores > 1.2 at nine locations: 1p34, 5q13, 7q35, 9q22, 9q34, 13q32, 14q32, 18p11 and 20q13. The highest allele-sharing LOD score was obtained on chromosome 18p (LOD = 2.14). Fine mapping resulted in increased allele-sharing LOD scores for chromosomes 5q13 and 9q22. In the follow-up analysis of the nine regions, including all 109 families, the highest allele-sharing LOD scores were obtained on chromosomes 5q, 13q and 18p, although none reached the initial genome wide values. In paper III, we focused on the chromosome 5q region, and further mapping and haplotype analysis in the families was performed.
A common 1 cM haplotype was found to be shared among affected members of five families. The only gene located in this region was the phosphatidylinositol 3-kinase regulatory subunit 1 (PIK3R1) gene. Association of three single nucleotide polymorphisms in the PIK3R1 gene with common stroke was obtained in the case-control material. Finally, in paper IV, an extended pedigree containing seven families connected to common founders eight generations back was identified by genealogical analysis, and submitted to a separate genome wide scan analysis. A significant allele-sharing LOD score of 4.66 (genome wide P < 0.001) at chromosome 9q31-33 was obtained. Haplotype analysis identified a minimal common region of 3.2 cM, which was shared by four of the seven families. These four families contained all of the primary intracerebral hemorrhagic cases present in the extended pedigree. In conclusion, we have replicated linkage of stroke susceptibility to the PDE4D region on chromosome 5q, but no significant association of ischemic stroke to PDE4D was observed. Linkage analysis of stroke did not identify any new major stroke loci, indicating that multiple minor susceptibility loci, in addition to the previously known locus on chromosome 5q, could contribute to the disease. In the chromosome 5q region a novel positional candidate gene for stroke was identified, the PIK3R1 gene. The PIK3R1 protein has several biological actions with potential roles in stroke susceptibility. In addition, a novel susceptibility locus for common forms of stroke at chromosome 9q was identified in a large pedigree, which may be of special importance for susceptibility to hemorrhagic stroke.

CONTENTS

LIST OF PAPERS
ABBREVIATIONS
INTRODUCTION
  Stroke
    Definitions and classifications
    Pathophysiology
    Risk factors
    Genetic epidemiology
  Genetics
    Basic genetic concepts
    Inheritance patterns
    Molecular genetics
    Population genetics
    Genetic mapping
    Recombination and genetic markers
    Linkage
    Association
  Stroke genetics
    Monogenic stroke
    Polygenic stroke
    Animal models
AIMS
SUBJECTS AND METHODS
  Subjects
  Genealogy
  Molecular genetic methods
  Statistical analysis
RESULTS
  Study samples
  Replication study, Paper I
  Genome wide scan, Paper II
  PIK3R1 association, Paper III
  Novel susceptibility loci, Paper IV
DISCUSSION
CONCLUSIONS
CONCLUDING REMARKS
POPULÄRVETENSKAPLIG SAMMANFATTNING (Swedish summary)
ACKNOWLEDGEMENTS
REFERENCES
ARTICLES and MANUSCRIPTS

LIST OF PAPERS

This thesis is based on the following papers, which will be referred to in the text by their Roman numerals:

I Nilsson-Ardnor S, Wiklund PG, Lindgren P, Nilsson AK, Janunger T, Escher SA, Hallbeck B, Stegmayr B, Asplund K, Holmberg D. Linkage of ischemic stroke to the PDE4D region on 5q in a Swedish population. Stroke 2005;36:1666-1671

II Nilsson-Ardnor S*, Janunger T*, Wiklund PG, Lackovic K, Nilsson AK, Lindgren P, Escher SA, Stegmayr B, Asplund K, Holmberg D. Genome wide linkage scan of common stroke in families from northern Sweden. Stroke 2006, in press

III Nilsson-Ardnor S, Lackovic K, Wiklund PG, Nilsson AK, Lindgren P, Janunger T, Escher SA, Stegmayr B, Asplund K, Holmberg D. Variation in the regulatory subunit p85alpha of Phosphatidylinositol 3-kinase, PIK3R1, is associated with common forms of stroke. Submitted

IV Janunger T, Nilsson-Ardnor S, Wiklund PG, Lindgren P, Escher SA, Lackovic K, Nilsson AK, Stegmayr B, Asplund K, Holmberg D. A novel stroke susceptibility locus mapped to chromosome 9q in an extended pedigree from northern Sweden. Manuscript

*These authors contributed equally to this work.

Reprinted with permission from the publisher Lippincott Williams & Wilkins.

ABBREVIATIONS

ACE       angiotensin-converting enzyme
AD        autosomal dominant
ALOX5AP   arachidonate 5-lipoxygenase-activating protein
AR        autosomal recessive
AT1R      angiotensin II receptor type 1
BMI       body mass index
bp        basepairs
CADASIL   cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy
CBF       cerebral blood flow
CEPH      Centre d’Étude du Polymorphisme Humain
cM        centimorgan
COL4A1    procollagen type IV alpha1
CRP       C-reactive protein
CT        computed tomography
CVA       cerebrovascular accident
DNA       deoxyribonucleic acid
FLAP      five-lipoxygenase activating protein
HWE       Hardy-Weinberg equilibrium
IBD       identical by descent
IMT       intima media thickness
LD        linkage disequilibrium
LOD       logarithm of the odds
LTA4H     leukotriene A4 hydrolase
LTB4      leukotriene B4
Mb        million base pairs
MCA       middle cerebral artery
MLS       maximum lod score
MONICA    Monitoring of Trends and Determinants in Cardiovascular Diseases
MRI       magnetic resonance imaging
mRNA      messenger RNA
MTHFR     methylene-tetrahydrofolate reductase
NPL       nonparametric linkage
OR        odds ratio
PAI-1     plasminogen activator inhibitor 1
PCR       polymerase chain reaction
PDE4D     phosphodiesterase 4D
PICH      primary intracerebral hemorrhage
PIK3R1    phosphatidylinositol 3-kinase, regulatory subunit 1 (p85α)
QTLs      quantitative trait loci
RAS       renin-angiotensin system
RFLPs     restriction fragment length polymorphisms
RNA       ribonucleic acid
SAH       subarachnoid hemorrhage
SHR       spontaneously hypertensive rat
SHRSP     stroke-prone spontaneously hypertensive rat
SNPs      single nucleotide polymorphisms
SWISS     Siblings With Ischemic Stroke Study
TDT       transmission/disequilibrium test
TIA       transient ischemic attack
TOAST     the trial of ORG 10172 in acute stroke treatment
tPA       tissue plasminogen activator
VIP       Västerbotten Intervention Project
VSMC      vascular smooth muscle cells
WHO       World Health Organization
XR        X-linked recessive

Why study stroke?

Stroke is

…a common disease. Annually, stroke hits 15 million people worldwide, with 2/3 of the stroke cases in developing countries.1 In Sweden, about 30,000 individuals suffer a stroke each year.2 Stroke predominantly affects the elderly; the mean age of onset is 75 years.2 The incidence in Sweden has been stable for many years,3 but from a worldwide perspective the incidence is expected to increase in an almost epidemic way due to the aging populations.4

…a severe disease. Globally, 5 million people die and 5 million more are permanently disabled each year.1 Stroke is a major cause of adult disability.2

…costly in human and economic terms. A major stroke is viewed by more than half of those at risk as being worse than death.5 In Sweden alone, the direct and indirect costs for society are estimated to be about 10 billion SEK annually.2

…preventable. Treatment of hypertension can reduce the risk of a stroke by up to forty percent.6 Primary and secondary prevention of stroke is important to decrease stroke incidence and the associated morbidity.

Why study the genetics of stroke in northern Sweden?

In northern Sweden

…founder effects exist. A history of low immigration, few founders and a rapid expansion of the population contribute to a relatively low genetic diversity in the northern Swedish population.7,8

…epidemiological studies of stroke have long traditions. Since 1985, the two northernmost counties in Sweden, Västerbotten and Norrbotten, have been a participating centre in the World Health Organization's Monitoring of Trends and Determinants in Cardiovascular Diseases (WHO MONICA) project.9 Standardized and validated data collection and population-based stroke incidence registers provide valuable information regarding stroke cases in the region.10

…blood samples are stored at the medical biobank. At the end of 2005, about 90,000 unique population-based samples from the WHO MONICA project and the Västerbotten Intervention Project (VIP) were stored in the Northern Sweden Medical Biobank.

…the public has a favorable attitude towards genetic research. The population has a great willingness to give consent to genetic research.11,12

INTRODUCTION

Stroke

Definitions and classifications

What is a stroke? To interpret and compare data in epidemiological and genetic studies, it is important to define this. The most widely used definition of stroke was set by the World Health Organization13 as the clinical syndrome of “rapidly developing clinical signs of focal (or global) disturbance of cerebral function, with symptoms lasting 24 hours or longer, or leading to death, with no apparent cause other than of vascular origin”. This definition includes all pathological types: ischemic stroke, primary intracerebral hemorrhage (PICH) and subarachnoid hemorrhage (SAH). Transient ischemic attack (TIA) is distinguished from ischemic stroke by an arbitrary time window of less than 24 h duration of the neurological deficit. A term at times used synonymously with stroke is cerebrovascular accident (CVA).14 Broader concepts that include stroke are cerebrovascular disease and cardiovascular disease. Cerebrovascular disease is sometimes used for both stroke and other vascular diseases affecting the brain, such as vascular dementia.15 The term cardiovascular disease includes disorders of the heart and the peripheral and central vascular system.1

The vast majority, approximately 80%, of stroke cases are ischemic. The remaining twenty percent is split into the two main pathological types of hemorrhagic stroke: PICH (15%) and SAH (5%). Pathological subtypes of ischemic stroke are atherothromboembolism (about 50%), embolism from the heart (about 20%), intracranial small vessel disease (about 25%) and other rare causes, e.g. vasculitis and arterial dissection (about 5%).16 The proportions are approximate since multiple potential causes for ischemic stroke can be detected; one patient can suffer both from atrial fibrillation (cardiac embolism?) and severe carotid atherosclerosis (atherothromboembolism?), and in some cases no definitive cause can be found, despite thorough investigations.

The distribution of stroke types and subtypes varies between populations and age groups. Hemorrhagic stroke has been shown to be more frequent in Asian than in Western populations,17,18 and small vessel disease is the most common subtype of ischemic stroke in the Japanese population.18 Embolisms from the heart tend to be more common in the elderly,19 while other rare causes are more common in younger age groups.20

A classification system for ischemic stroke based on pathophysiological subtypes is the trial of ORG 10172 in acute stroke treatment (TOAST) classification.21 The TOAST classification (Figure 1) has been shown to have good interobserver agreement, in particular with the use of a computerized algorithm,22 and is presently the most frequently used classification system in genetic studies of stroke. To differentiate between stroke types and subtypes, brain imaging by computed tomography (CT) or magnetic resonance imaging (MRI) is performed. Other common techniques used to further classify ischemic strokes are carotid duplex ultrasound to diagnose blockages of the carotid arteries in the neck, and echocardiography to image the heart and assess its function.

Figure 1. Pathophysiological categorization of stroke based on the TOAST classification system.

Pathophysiology

An understanding of the underlying pathophysiology of stroke is important for the identification of potential targets for both prevention and treatment. Factors affecting any of the different biological mechanisms can be considered as candidates influencing stroke risk and outcome.

Brain function is dependent on a constant supply of oxygen and glucose because the brain does not store glucose and is incapable of anaerobic metabolism. In ischemic stroke, the blood supply is interrupted, resulting in tissue hypoperfusion and neuronal damage. In hemorrhagic stroke, extravascular release of blood within the brain parenchyma causes pressure injury of the tissue. The signs and symptoms of stroke have a large variability depending on the type of stroke and the location of the brain injury. Depending on the severity of the stroke, patients can experience a loss of consciousness, cognitive deficits, speech dysfunction, eye problems and limb weakness. Damaged neurons release noxious metabolites, injuring healthy surrounding neurons and causing edema. In ischemic stroke, the destructive cascade can be reversed in the surrounding zone of brain tissue, the ischemic penumbra, if blood flow is restored within a critical period of a few hours.23,24

The cerebral autoregulation keeps the cerebral blood flow (CBF) constant throughout a wide range of systemic blood pressures. In response to ischemia, the autoregulatory mechanisms compensate for a reduction in CBF by local vasodilation to open the collaterals. The vascular process leading to an acute ischemic event is dynamic and influenced by many factors. Not only are the degree of obstruction and the collateral circulation important, but also systemic blood pressure, coagulation factors, body temperature and hypo- and hyperglycemia.25,26 Systemic hypotension, e.g. due to cardiac arrhythmia, cardiac arrest or shock, activates the autoregulation and small arterioles constrict in an attempt to maintain pressure. Ischemia can then develop in the areas between the territories of the major cerebral and cerebellar arteries, in the watershed region.

In ischemic stroke, the three main pathological mechanisms are related to atherothrombotic, embolic, and small-vessel disease. Atherosclerosis is the most common pathological feature of vascular obstruction resulting in thrombotic stroke. In the atherosclerotic process, lipid deposition in the vessel walls leads to the formation of plaque, the vessel lumen is narrowed and the blood flow becomes turbulent. Alterations in flow velocities lead to disruption of the intima or plaque rupture, activating the clotting cascade. Platelets then become activated and adhere to the plaque surface, and eventually form a fibrin clot. The lumen of the vessel becomes successively occluded and ischemia may develop distal to the obstruction.

While a thrombus is a clot formed within a blood vessel, an embolus is formed when a blood clot migrates through the circulation and causes a blockage of a blood vessel. The most frequent sources of emboli are the heart and the large neck arteries. Emboli formed in the heart are most commonly due to atrial fibrillation, but can also be due to valvular disease, left ventricular dysfunction and patent foramen ovale. In the internal carotid artery or vertebral arteries, detachment of a thrombus from atherosclerotic lesions is a source of emboli. Most emboli end up in the middle cerebral artery (MCA) distribution because of the conditions of the blood flow.

Small vessel disease typically results in lacunar infarcts, small "lakes", in the areas of the deep perforating arteries. Microatheromata can occlude the penetrating arteries, or lipohyalinosis can occur in the presence of chronic hypertension. In lipohyalinosis, endothelial injury is caused by the elevated blood pressure, and a deposition of plasma proteins is followed by a degeneration of the tunica media smooth muscle. The smooth muscle is replaced with collagen fibres, which reduce the elasticity of the blood vessel. The vessel lumen narrows and eventually the clotting cascade is activated, leading to thrombosis.

Primary intracerebral hemorrhage (PICH) is the result of the rupture of a vessel within the brain tissue, commonly caused by hypertension or, in the elderly, amyloid angiopathy. Deposition of amyloid protein in the vessel wall causes the vessels to become more rigid, fragile, and prone to rupture. Subarachnoid hemorrhage (SAH) is a bleeding within the surrounding meningeal spaces. Common causes of SAH are rupture of vascular malformations and of aneurysms at the base of the brain. A likely factor in the pathophysiology of intracranial aneurysms is disruption of the extracellular matrix of the arterial wall, which provides strength and elasticity to intracranial arteries.27

Risk factors

The number of established and proposed risk factors for stroke is overwhelming.1 Risk factors can be classified according to the extent of possible change (non-modifiable, modifiable, or potentially modifiable) or the strength of evidence (well documented or less well documented).28 A risk factor can be causal, if treating it reduces risk in randomized controlled trials, or correlated, if it is significantly associated with stroke in several studies.29 The term major risk factor refers to a high prevalence of the specific factor in many populations.1

Modifiable, causal risk factors for ischemic stroke include high blood pressure, carotid stenosis and atrial fibrillation, since treating those factors reduces the incidence of stroke in randomized controlled trials.30-32 Treatment of high cholesterol with statins also reduces stroke risk.33 However, there is no overall association between plasma cholesterol and stroke. Other, non-lipid favourable effects mediated by statin therapy, such as modified endothelial function and atherosclerotic plaque stabilization, are proposed to be the cause of the reduced stroke risk.34,35 Other major risk factors with a significant independent impact on the risk of stroke are cigarette smoking, diabetes mellitus, physical inactivity, previous TIA, ischemic heart disease and valvular heart disease.2 Additional lifestyle factors correlated to stroke risk are an unhealthy diet with a low fruit and vegetable intake, excessive alcohol use and obesity, as measured by a high body mass index (BMI).36,37 Social conditions such as low socioeconomic status as well as psychosocial stress increase the risk of stroke.38,39

Non-modifiable risk factors are advancing age,40,41 male gender41 and a positive family history of stroke or TIA.42 Specific risk factor patterns for different ethnic groups also exist.43 Less well documented, emerging risk factors include excess levels of homocysteine and elevated levels of C-reactive protein (CRP), fibrinogen or other blood clotting markers.1,28 Several genetic polymorphisms are also associated with an increased stroke risk.44,45

The large prospective studies of stroke risk factors have rarely distinguished between the main pathological types, ischemic and hemorrhagic stroke.16 The exclusive risk factor pattern for PICH is therefore less well known, but several studies have shown that independent risk factors for hemorrhagic stroke are hypertension, heavy alcohol intake and anticoagulant treatment.46,47 For SAH, Feigin et al. stated that hypertension, smoking and excessive alcohol intake are important risk factors.48 Advances in neuroimaging and other diagnostic procedures make it possible to consider specific risk factors for different subtypes of ischemic stroke. Hypertension tends to be an important risk factor for all subtypes, but the impact of other risk factors varies with the subtype.49,50

Genetic epidemiology

There are several strategies employed to study the potential genetic influence in common multifactorial diseases. In humans, twin and adoption studies as well as familial aggregation studies are used in genetic epidemiology. Animal models of a specific disease can also assist in elucidating the contribution of genetic factors. In twin studies, differences among a cohort of monozygotic twins and dizygotic twins are studied. If a disease has a genetic contribution, the concordance rate is higher in monozygotic twins. In adoption studies, adopted affected individuals are identified and the presence of the same disease in the adoptive family or among biological relatives is searched for. A common study design of genetic predisposition is family history studies. If there are genetic variants that predispose to a disease, a positive family history should be a risk factor for the disease.

[…] history of stroke was shown to be more frequent in studies with younger subjects, aged <70 years. Monozygotic twins were more likely to be concordant than dizygotic twins, and a family history was a risk factor for stroke in both case-control and cohort studies.63 In family history studies of ischemic subtypes, individuals suffering a cardioembolic stroke tended to have less family history of stroke than large vessel and small vessel stroke cases.64-67 Taken together, genetic epidemiological studies of stroke support the notion of a contribution of genetic factors to stroke susceptibility.

Genetics

Basic genetic concepts

Inheritance, in genetic terms, is the transmission of traits through genes from parents to offspring. The physical carrier of inheritance is deoxyribonucleic acid (DNA), and the complete set of genetic instructions for an organism is the genome.

Inheritance patterns

The pattern of inheritance is the manner in which a gene is transmitted. Geneticists use pedigree information to determine the mode of inheritance of a trait or a disease. A pedigree chart displays the ancestry of a given individual, the descendants of a given individual and sibling or cousin relationships. The first affected individual recorded in a pedigree is the proband. Pedigree charts are also a common tool in human genealogy studies (Figure 2).

Figure 2. A simple pedigree with key to symbols.

Monogenic disorders are diseases where a mutation in a single gene can cause disease. The most common inheritance patterns for monogenic diseases are autosomal dominant (AD), autosomal recessive (AR) and X-linked recessive (XR) inheritance. In autosomal dominant diseases, one mutation in a gene on one chromosome is enough to result in a phenotype (an observable trait/characteristic or disease). In an AD pedigree, 50% of the offspring of an affected parent are expected to be affected. For AR diseases, both copies of the gene have to carry a mutation to affect an individual. AR pedigrees are therefore expected to show 25% affected offspring when both parents are carriers. In the specific sex chromosomal inheritance patterns, males are most frequently affected. Cells also contain mitochondrial genes, and disease-causing mutations in these genes can also occur. Single gene disorders can also have a variable penetrance, meaning that the expression of a specific trait is not absolute, even in carriers of a mutation. Environmental factors and/or additional protective genes at other loci can modulate or abolish the phenotype. When the penetrance is very low, the influence of other genes and the environment is more substantial and the monogenic trait becomes closer to a multifactorial trait. In multifactorial diseases, the mode of inheritance is not clear; multiple interacting genes and environmental factors interplay to contribute to a disease phenotype.
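The expected proportions of affected offspring quoted above follow from enumerating the four equally likely allele combinations at a single locus. The following is a minimal illustrative sketch, not part of the thesis methods; the allele symbols ("D"/"d", "A"/"a") and the function names are hypothetical, and complete penetrance is assumed.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return {genotype: probability} for one autosomal locus.

    Each parent is a two-character string of alleles (e.g. "Dd") and
    transmits one of its two alleles with equal probability.
    """
    probabilities = {}
    for allele1, allele2 in product(parent1, parent2):
        genotype = "".join(sorted((allele1, allele2)))
        probabilities[genotype] = probabilities.get(genotype, 0) + Fraction(1, 4)
    return probabilities


def affected_fraction(parent1, parent2, is_affected):
    """Expected fraction of affected offspring, assuming complete penetrance."""
    genotypes = offspring_genotypes(parent1, parent2)
    return sum(p for g, p in genotypes.items() if is_affected(g))


# Autosomal dominant: hypothetical disease allele "D"; one heterozygous
# affected parent and one unaffected parent.
ad = affected_fraction("Dd", "dd", lambda g: "D" in g)

# Autosomal recessive: hypothetical disease allele "a"; both parents carriers.
ar = affected_fraction("Aa", "Aa", lambda g: g == "aa")

print(ad, ar)  # 1/2 1/4
```

Reduced penetrance, as described above, would lower the observed fraction of affected offspring below these Mendelian expectations.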

Molecular genetics

A draft of the human genome was presented in 2001; the DNA sequence of 3 × 10⁹ basepairs (bp) was estimated to comprise about 30,000 genes.68,69 The number of genes was less than expected, and today the number of human genes is estimated to be even lower, about 25,000. Much of the variation on the protein level must then be explained by other mechanisms, e.g. multiple transcription starts or alternative splicing to produce isoforms of proteins. In the transcription process, genetic information is transferred from DNA into ribonucleic acid (RNA). RNA is then processed and spliced into messenger RNA (mRNA) and subsequently translated into a protein. In translation, each specific sequence of three nucleotides (a codon) directs the cell's protein-synthesizing machinery to add a specific amino acid to the developing protein. Humans are capable of synthesizing at least 100,000 different kinds of proteins. A typical human gene consists of 5-10 coding exons, separated by non-coding introns, which are removed in the splicing process, and a promoter region, which initiates the transcription of the gene. However, human genes vary widely in length, often extending over thousands, or even millions, of bases.

The polymerase chain reaction (PCR) technique allows a small amount of DNA to be amplified exponentially. PCR is a key technique in molecular genetics, invented in 1983 by Kary Mullis. PCR is an enzymatic replication of DNA in vitro (in a test tube). Three basic steps, denaturation (separating the DNA helix), annealing (joining the specific primer and the DNA strand), and extension of the specific part of DNA, take place at different temperatures and are repeated for about 30 or 40 cycles, resulting in a vast number of copies of a specific DNA sequence (Figure 3).
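To give a sense of scale for the exponential amplification described above, the number of copies after a given number of cycles can be sketched as follows. This is an idealized illustration only: the function name and the per-cycle `efficiency` parameter are assumptions introduced here, and real reactions plateau as reagents are depleted.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized number of target copies after a number of PCR cycles.

    efficiency is the (hypothetical) fraction of templates duplicated in
    each cycle; 1.0 means perfect doubling in every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Perfect doubling of a single template molecule.
copies_30 = pcr_copies(1, 30)  # 2**30, roughly 1.07e9 copies
copies_40 = pcr_copies(1, 40)  # 2**40, roughly 1.10e12 copies

# A less efficient reaction yields markedly fewer copies.
copies_30_at_90_percent = pcr_copies(1, 30, efficiency=0.9)

print(f"{copies_30:.3g} {copies_40:.3g} {copies_30_at_90_percent:.3g}")
```

Under perfect doubling, 30 cycles turn one template into about a billion copies, which is why a very small amount of starting DNA suffices.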


Figure 3. Basic illustration of the PCR technique. Illustration freely available at: www.genome.gov/glossary.cfm

Population genetics

Population genetics is the study of the variation in genes among a group of individuals. The genetic difference between two people is very small; about 99.5-99.8% of one person's genome is identical to the next person's.70 Hence, the genetic difference between populations is also small, and the genetic variability within a population can be even smaller, especially in the case of isolated populations. Still, these minor genetic variations can be very useful in genetic studies.

A population could be defined as a number of people in a certain geographical region or within a certain social setting. New populations can be formed when, for instance, settlers (founders) decide to stay in a region far away from the parental population. Descendants tend to stay in the same region, inbreeding is common and the original gene pool is maintained in a larger number of people. A founder effect is obtained. The frequencies of the gene variants in the new population may differ largely from those in the original population. Old populations can be diminished by war, famine or diseases, leaving only a few individuals from the original population to expand the subsequent population. When a population suddenly contracts to a small size, a population bottleneck, dramatic changes in the gene frequencies can occur and the new population will have less genetic variability than the original population71 (Figure 4).

Figure 4. Few founders and a rapid expansion of the population contribute to low genetic diversity in the population.

Both founder effects and bottleneck phenomena are extreme examples of genetic drift. Cavalli-Sforza defines random genetic drift as the random fluctuation of gene frequencies in a population of finite size.72 Random sampling of the genetic variants in successive generations of a population results in genetic drift. The ultimate outcome is fixation of one variant and loss of the other (in the absence of other evolutionary forces such as selection, mutation and migration). The size of the population is an important factor for the impact of the different evolutionary forces. In a small population, fixation of a genetic variant can occur within a few generations, whilst in a large population genetic drift occurs slowly, allowing for selection forces. If fixation of a genetic variant occurs, the only way of change in a finite population is by introduction of a new variant by mutation. New gene variants can also be introduced by transferring variants from one population to another, a migration of gene variants. An example of population admixture is the African American population.

Patterns and frequencies of single gene disorders can be very diverse in different populations due to the population's genetic structure. In the Finnish population many rare monogenic diseases are present, whilst at the same time other genetic variants, common in other European populations, are rare.73 In northern Sweden, genetic drift, founder effects and consanguinity (inbreeding) have been suggested to be important factors influencing the population's genetic structure.74 A number of rare monogenic conditions have also been detected in northern Sweden, including infantile genetic agranulocytosis (morbus Kostmann),75 epidermolysis bullosa letalis (Mb Herlitz),76 familial amyloid polyneuropathy77 and acute intermittent porphyria.78 Common multifactorial diseases are suggested to display similar regional differences, where selected genetic variants predisposing to disease are enriched in the population due to a restricted number of ancestors.79

In a large population with random mating, and without mutation, selection, migration or random genetic drift, the genotype frequency in the subsequent generation is only determined by the gene frequencies of the parents, known as the Hardy-Weinberg equilibrium (HWE). In HWE, the frequencies of the homozygotes are equal to the allele frequencies squared (p² and q²) and the frequency of the heterozygotes equals twice the product of the two allele frequencies (2pq). Deviations from HWE for alleles in genetic studies could indicate a stratified population with subpopulations or genotyping errors.
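As an illustration of how a deviation from HWE might be checked for a biallelic marker, the sketch below compares observed genotype counts with the expected p², 2pq and q² proportions using a chi-square statistic. It is a minimal example under stated assumptions, not the procedure used in the thesis: the genotype counts and function name are hypothetical, and the 3.841 threshold is the conventional critical value for one degree of freedom at the 5% level.

```python
def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic for deviation from Hardy-Weinberg equilibrium.

    Returns (p, q, expected_counts, chi2) for a biallelic locus, where p is
    the frequency of allele A estimated from the observed genotype counts.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; HWE cannot be tested")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, q, expected, chi2


CRITICAL_VALUE_1DF_5PCT = 3.841  # chi-square, 1 degree of freedom, alpha = 0.05

# Hypothetical counts that match HWE exactly (p = 0.6, q = 0.4).
_, _, _, chi2_in_hwe = hwe_chi_square(360, 480, 160)

# Hypothetical counts with a deficit of heterozygotes, as could arise from
# population stratification or genotyping error.
_, _, _, chi2_deviating = hwe_chi_square(500, 200, 300)

print(chi2_in_hwe < CRITICAL_VALUE_1DF_5PCT, chi2_deviating > CRITICAL_VALUE_1DF_5PCT)
```

For small samples or rare alleles an exact test is generally preferred over the chi-square approximation.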

Genetic mapping

The aim in human disease gene mapping is to identify disease genes by the use of genetic markers with known locations which cosegregate with the disease phenotype. If a marker seems to be inherited together with the disease phenotype in a pedigree, the true disease gene is probably close by, and not separated from the marker by recombination.

Recombination and genetic markers

Each somatic cell in the human body has a set of 22 pairs of autosomal chromosomes and a set of sex chromosomes (diploid cell). In each pair, one of the chromosomes is inherited from the father and one from the mother. The germ cells (egg or sperm) have a single copy of each chromosome (haploid cell). In order to increase variability, the maternal and paternal homologs in a chromosome set are independently assorted in the germ cells (first level of diversity) and the chromosome pair undergoes crossover events (second level of diversity) in the first meiosis. In the crossover, or recombination, the DNA double helices in one maternal and one paternal chromatid physically break and join together. Recombination normally occurs at least once in each chromosome arm. These mechanisms allow for an almost unlimited number of different germ cells in an individual.

In genetic studies, recombination events are used to determine genetic distances between two loci, to create genetic maps and to map human disease genes. Two loci on different chromosomes will segregate independently. Two loci (e.g. A and B) far apart on the same chromosome will also segregate independently due to recombination. To be able to detect recombination between these two loci, the loci must have variable DNA sequences, defined as alleles (e.g. A1, A2, B1, B2). A series of alleles along a chromosome is a haplotype. In a homologous pair of chromosomes, an individual is heterozygous at a locus if the alleles are different on each chromosome (e.g. A1A2), and homozygous if the same allele is represented on both copies of the chromosome (e.g. A1A1 or A2A2). If a germ cell receives one of the independently segregated alleles (e.g. A1), the likelihood of also receiving a given allele at locus B (e.g. B1) is 50%. The proportion of germ cells that are recombinant for two loci, the recombination fraction (theta), will thus never exceed 0.5. Loci located close to each other in a small chromosomal region tend to be inherited together as a block and are not separated by recombination (Figure 5).

Figure 5. Recombination between marker loci A and B, exemplified by crossover of the two homologous chromosomes of the father. To the left, a haplotype of closely linked loci (A1,C1,D1,E1,F1).

Genetic distance is measured in centimorgans (cM). One cM is equal to a 1% chance of a single crossover in a single meiosis. At short distances the probability of a double crossover is very small and thus 1 cM also corresponds to approximately 1% recombination. The genetic distances differ between males and females and the frequency of recombination varies in different chromosomal regions. However, 1 cM is roughly equivalent to 1 million base pairs (1 Mb).69 A human genetic map is created by estimation of the recombination fraction between marker loci, preferably in large multigenerational pedigrees. Families have been assembled for the purpose of making genetic maps at the Centre d’Étude du Polymorphisme Humain (CEPH) in collaboration with the University of Utah.80 There is a permanent supply of DNA from each CEPH individual by immortalized cell lines and CEPH DNA is now also used worldwide as control DNA in genotyping experiments. Other commonly used genetic maps include the deCODE map using Icelandic pedigrees81 and maps combining information of the genetic and physical distance of markers.82 At low resolution, a human physical map is the banding pattern of chromosomes in the cytogenetic map83 (Figure 6), and at high resolution, the DNA sequence of all chromosomes.68,69
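The relation between genetic distance and recombination fraction can be sketched with a map function. The text does not name a specific map function, so the Haldane function used below (which assumes crossovers occur independently, i.e. no interference) is an assumption chosen for illustration only. It reproduces the two properties stated above: about 1% recombination per cM at short distances, and a recombination fraction that never exceeds 0.5.

```python
import math


def haldane_theta(distance_cm):
    """Recombination fraction for a genetic distance in cM (Haldane, no interference)."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_distance_cm(theta):
    """Inverse Haldane map function: genetic distance in cM for a given theta."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


for d in (1, 10, 50, 200):
    print(d, "cM ->", round(haldane_theta(d), 4))
```

Other map functions (for example Kosambi) make different assumptions about interference and give slightly different values at intermediate distances.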

Figure 6. Cytogenetic map of the human chromosomes. Illustration freely available at: www.genome.gov/glossary.cfm

There are several different types of genetic markers; among the most commonly used are microsatellites. Microsatellites are highly informative markers that most frequently occur as dinucleotide repeats, e.g. (CA)n, where the number of repeats (n) varies between individuals. Typically, a microsatellite marker has 5-15 different alleles at a particular locus. Dense maps of thousands of microsatellite markers have been created in the Human Genome Project.84-86 Genetic markers are used for tracking inheritance of traits through a pedigree. Before the DNA markers were discovered, blood groups and serum proteins were used. The first DNA markers were the bi-allelic restriction fragment length polymorphisms (RFLPs). Nowadays, single nucleotide polymorphisms (SNPs), DNA markers that represent a variation in a single base, are increasingly used. The advantage of SNPs is that they are widespread in the genome, on average one per 1000 bp, and they allow a very high throughput of samples. SNPs are most commonly bi-allelic and thus contain less information about meiosis than microsatellite markers. The informativeness of a specific marker can be determined by the mean heterozygosity, the chance that a randomly selected person will be heterozygous at the marker locus. For a bi-allelic marker, the maximum mean heterozygosity is 0.5. Polymorphic markers with many alleles are more informative and facilitate the detection of recombinants in a pedigree.
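The mean heterozygosity described above can be computed from allele frequencies as one minus the sum of the squared frequencies, assuming Hardy-Weinberg proportions. The sketch below is illustrative only; the function name and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), assuming Hardy-Weinberg proportions."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in allele_freqs)


# Bi-allelic SNP with equally frequent alleles: the maximum for two alleles.
print(expected_heterozygosity([0.5, 0.5]))
# Hypothetical microsatellite with ten equally frequent alleles.
print(expected_heterozygosity([0.1] * 10))
```

With equal allele frequencies the value is 1 - 1/k for k alleles, which is why a multi-allelic microsatellite is more informative per marker than a bi-allelic SNP.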

Linkage

Linkage is the tendency for markers to be inherited together because of close proximity on a chromosome. Two loci are said to be genetically linked when the recombination fraction (theta) is less than 0.5. The object of linkage analysis is to estimate theta and to test if theta is less than 0.5; that is, whether an observed deviation from 50% recombination is statistically significant.87 However, in most human pedigree data it is not possible to count recombinants and nonrecombinants. For this reason likelihood-based methods are used in linkage analysis. Linkage analysis is performed in human genetic studies as well as in animal models. Linkage is usually reported as a logarithm of the odds (LOD) score. Traditionally, a LOD score greater than 3, corresponding to 1000:1 odds for linkage, is the threshold for significant linkage and a LOD score less than -2 allows exclusion of linkage.88

Single-point (or two-point) linkage analysis is a test of linkage between a disease locus and a marker locus, independent of other markers. Multipoint linkage analysis, on the other hand, makes use of information from several markers to test linkage between the disease and a specific position on the marker map. As long as the marker map is correct, the multipoint analysis is more powerful than single-point analysis.

Linkage analysis methods can be applied for both monogenic diseases, with known patterns of inheritance (parametric linkage), and for complex disorders (model-free or non-parametric linkage). In parametric linkage analysis, disease model parameters such as allele frequencies and penetrance (the proportion of mutation carriers who develop the disease) are assumed and included in the calculations. If the assumptions made are close to the true model, parametric linkage is a powerful way to detect genes causing monogenic disorders. Non-parametric linkage analysis is applied when the mode of inheritance is unknown, as in complex diseases. The basic assumption in non-parametric linkage (NPL) is that affected individuals have an excess sharing of a specific marker allele close to the disease locus. Alleles at the same locus on two homologous chromosomes are said to be identical by descent (IBD) if they originate from the same ancestral chromosome. Two individuals can share an allele IBD at a particular locus on none, one, or two chromosomes. If random inheritance is assumed, sib pairs share 0, 1 or 2 alleles IBD with probabilities of 25%, 50% and 25%, respectively (Figure 7).
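To make the LOD score concrete, the sketch below treats the idealised textbook case in which recombinants can be counted directly (phase-known, fully informative meioses). As noted in the text, this is rarely possible in human pedigrees, where likelihood-based programs are used instead, so the example is a simplification and not the method applied in this thesis. The function names are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for k recombinants among n phase-known meioses."""
    k, n = recombinants, meioses
    likelihood = theta ** k * (1.0 - theta) ** (n - k)
    if likelihood == 0.0:
        return float("-inf")
    # Compare against the likelihood under free recombination (theta = 0.5).
    return math.log10(likelihood) - n * math.log10(0.5)


def max_lod(recombinants, meioses):
    """Maximum-likelihood estimate of theta (capped at 0.5) and its LOD score."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


# Ten informative meioses without recombinants just exceed the threshold of 3.
print(max_lod(0, 10))
# One recombinant among ten meioses.
print(max_lod(1, 10))
```

The first case shows why a LOD score of 3 corresponds to odds of about 1000:1: ten non-recombinant meioses are 2^10 = 1024 times more likely under complete linkage than under free recombination.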

Figure 7. Linkage is obtained if the affected sib pairs share significantly more alleles IBD than expected by chance. M1-M4 = marker alleles.
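The comparison of observed and expected IBD sharing in affected sib pairs can be illustrated with a simple goodness-of-fit test against the proportions 0.25, 0.50 and 0.25. The sketch assumes that the IBD status of every pair is known without ambiguity, which is rarely true in practice, and it uses a two-sided chi-square test with two degrees of freedom rather than the one-sided likelihood or scoring methods used in real analyses. The counts are hypothetical.

```python
import math


def ibd_sharing_test(n0, n1, n2):
    """Chi-square test (2 df) of sib-pair IBD counts against 0.25/0.50/0.25.

    n0, n1, n2 are the numbers of affected sib pairs sharing 0, 1 and 2
    alleles IBD. Returns the statistic, the approximate p-value and the
    mean proportion of alleles shared IBD (0.5 under no linkage).
    """
    n = n0 + n1 + n2
    expected = (0.25 * n, 0.50 * n, 0.25 * n)
    observed = (n0, n1, n2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 2 df.
    p_value = math.exp(-chi2 / 2.0)
    mean_sharing = (n1 + 2 * n2) / (2.0 * n)
    return chi2, p_value, mean_sharing


# Hypothetical data: 100 affected sib pairs with excess sharing.
print(ibd_sharing_test(10, 50, 40))
```

Only a mean sharing above 0.5 is evidence for linkage; a significant statistic with mean sharing below 0.5 would point to some other problem in the data.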

The proportion of the observed alleles shared IBD is tested against the expected proportion 0.25, 0.50 and 0.25. Linkage is obtained if the affected sib pairs share significantly more alleles IBD than expected by chance. Statistical methods used to detect excess allele sharing are likelihood-based methods (maximum LOD score, MLS) or scoring methods based on excess average IBD sharing (NPL score). For linkage studies using affected sib pairs only, the MLS method is well suited, but when handling more complex pedigrees the NPL score is preferable.

In a whole genome linkage scan, typically more than 400 microsatellite markers, evenly spread about every 10 cM, are genotyped and tested for linkage to a disease. In 1995, Lander and Kruglyak presented guidelines for genome-wide significance for complex diseases.89 The threshold for genome-wide significance of 5% was derived using mathematical theory assuming a dense map with fully informative genetic markers and corresponded to a LOD score of 3.3. Lander and Kruglyak also introduced the terms suggestive and significant linkage, defined as statistical evidence that would be expected to occur by chance once and 0.05 times in a genome wide scan, respectively. The thresholds for significant and suggestive linkage vary depending on specific study designs and have been the subject of debate. A meta-analysis of 101 linkage studies showed that only a handful of the linkage studies could fulfil these LOD thresholds90 and important disease associated genes have been identified in linkage regions well below the significance thresholds.91 A complementary approach to determine significance values for a specific study is to use simulation. Empirical genome-wide significance levels can be established by whole genome simulations, taking into account the same family structures, marker map, allele frequencies and patterns of missing data as in the study itself.92

Once a specific chromosomal region is found to be linked to a disease, a common approach is to fine map this candidate region, aiming to reduce its size. Additional microsatellite markers and SNPs are genotyped to confirm or reject the initial linkage findings and to construct haplotypes within families linked to the region. As a complement, additional families can be genotyped for markers in the linked region. Once the smallest candidate region is identified, potential disease-causing genes (candidate genes) in the region are investigated by resequencing or SNP association analysis. Another approach to the positional candidate strategy is to type a dense set of SNPs across the whole region in the families or a case-control material and look for association to the disease.

Association

The term association describes two variables that are related, and does not necessarily imply causation. Association between a genetic polymorphism and a disease in a population can be direct (causal or functional), indirect (nearby a causal variant) or spurious due to population stratification or admixture. Two popular study designs for genetic association studies are case-control studies and family-based studies. In case-control analysis, allele frequencies are compared between affected individuals and controls from the general population. In family-based association, a common method is the transmission/disequilibrium test (TDT), where the allele transmissions from the parents to the affected offspring are compared with the non-transmitted alleles (artificial internal control)93 (Figure 8).


Figure 8. Study designs for genetic association studies, a case-control study and a family-based study using families with two parents and an affected offspring (trios).
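For the family-based design, the TDT in its basic form reduces to a McNemar-type statistic on heterozygous parents: the number of times a given allele is transmitted to the affected offspring is compared with the number of times it is not. The sketch below is a minimal illustration for a bi-allelic marker with hypothetical counts; the p-value relies on the large-sample chi-square approximation with one degree of freedom.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission/disequilibrium test for one allele.

    transmitted / untransmitted: number of heterozygous parents who did or
    did not pass the allele of interest to an affected child.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical trios: 60 transmissions versus 40 non-transmissions.
print(tdt(60, 40))
```

Because the non-transmitted parental alleles serve as the control, the comparison is made within families and is therefore protected against spurious association due to population stratification.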

Direct association of a putative causative variant and a disease is powerful but difficult to achieve. Variants in a coding region, altering an amino acid in the translated protein, are candidate causal variants. However, in complex disorders many causal variants may be non-coding, affecting splicing, gene regulation or expression. Indirect association depends on the existence of association between the causal variant and nearby polymorphisms at a population level. Non-random association of alleles because of the proximity of loci in the genome is termed linkage disequilibrium (LD). Alleles in LD form haplotype blocks, blocks of alleles that tend to be inherited together at a population level. A few SNPs tagging these haplotype blocks can capture most of the polymorphisms in a gene.94 However, with this method a causal variant may not be picked up by the markers chosen. In the International HapMap Project, LD patterns over the genome are described for different populations, aiding the complete coverage of marker information in specific genes or the entire genome.95 With the completion of the HapMap project, genetic association analysis is not only used for functional and positional candidate gene studies, but also for whole genome association studies.
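The strength of LD between two bi-allelic loci is commonly summarised by D, D' and r². The sketch below computes these standard measures from a haplotype frequency and the two allele frequencies. The input values are hypothetical, and in real data the haplotype frequencies would first have to be estimated from genotypes.

```python
def ld_measures(p_ab, p_a, p_b):
    """Pairwise LD measures for alleles A and B at two bi-allelic loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of alleles A and B
    Returns (D, D', r^2).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # D is scaled by its largest possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


# Allele A always occurs together with B, but B is more common than A.
print(ld_measures(0.3, 0.3, 0.5))
```

In this example D' is 1 although r² is well below 1, which illustrates the limitation mentioned above: a tagging SNP can be in complete LD with a rarer causal variant and still be only a modest proxy for it.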

Stroke genetics

Monogenic stroke

Monogenic, mendelian forms of stroke are rare. They comprise less than 1% of all stroke cases and the stroke event is frequently part of a multisystem disorder. There is no uniform classification for mendelian stroke syndromes, but the underlying mechanism is often used when describing the different forms. Arterial dissection can occur in Marfan syndrome and Ehlers-Danlos syndrome type IV. Hemoglobinopathies, such as sickle cell disease, mitochondrial disorders and familial hemiplegic migraine are other examples of monogenic stroke disorders.44 Familial types of hemorrhagic stroke have also been reported, where mutations in the Cystatin C gene96 and in the Amyloid beta protein precursor gene97 have been found to be causative. One of the most studied monogenic forms is the small vessel disease cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL). This condition was earlier reported as “hereditary multi-infarct dementia” or “familial subcortical dementia” and was mapped to chromosome 19.98 Causative mutations were later identified in the gene Notch3.99 Notch3 encodes a Notch receptor, mediating intracellular signalling, and is expressed in vascular smooth muscle cells (VSMC). VSMCs are important in small vessel pathology and thus a good model for studying mechanisms in vessel wall degeneration.100 In Notch3 -/- mice it was shown that Notch3 is required for arterial differentiation and maturation of VSMCs.101

Polygenic stroke

The vast majority of stroke cases are not caused by a single genetic defect, but are sporadic with a complex etiology. As for other common complex diseases, susceptibility genes are thought to interact with environmental factors to increase stroke risk. Many functional candidate genes, in several pathways with plausible biological importance, have been proposed in stroke pathology.44 Biological pathways considered to be of specific importance in stroke pathology are the renin-angiotensin system (RAS), homocysteine and lipid metabolism, haemostatic factors, pro-inflammatory genes and nitric oxide synthesis. Genetic variants are targeted in large combined cohorts such as MORGAM, an international pooling of cardiovascular cohorts.102 A common and constantly recurring problem is the replication of initial associations, due to, for instance, spurious association and population differences.

In a meta-analysis of 120 case-control studies including 32 candidate genes for stroke, statistically significant association with ischemic stroke was noted for four variants in the most investigated candidate genes: angiotensin-converting enzyme (ACE) insertion/deletion, factor V Leiden Arg506Gln, prothrombin G20210A, and methylene-tetrahydrofolate reductase (MTHFR) C677T. For the next three most studied genes (human platelet antigen type 1, Factor XIII and Apo E), no statistically significant association was identified.45 Functional candidate genes investigated in the northern Sweden population include MTHFR C677T in homocysteine metabolism,103 haemostatic factors such as plasminogen activator inhibitor (PAI-1) 4G/5G,104 tissue plasminogen activator (tPA) 7,351C>T and also components of the RAS; angiotensin II receptor type 1 (AT1R) A1166C and ACE I/D.104 All above mentioned polymorphisms, except tPA, seem to be of importance for stroke risk in the northern Sweden population. However, the MTHFR polymorphism was associated with hemorrhagic stroke only.

A hypothesis-free way of identifying stroke susceptibility genes is linkage analysis. Very few genome wide studies of common stroke105 and intermediate phenotypes such as carotid intima media thickness (IMT)106 have been published. For intracranial aneurysm or SAH families107-109 some linkage studies have been presented. However, a multitude of genome wide scans of established risk factors for stroke have been performed including hypertension, diabetes mellitus and coronary heart disease, many of these also combined in meta-analyses.110-113

In the first published genome wide scan of common stroke, Gretarsdottir et al. made use of a population-based registry of patients with stroke in Iceland, connected with an extensive computerized genealogy database. The study included 476 patients with stroke within 179 extended pedigrees. 
Linkage to chromosome 5q12 was detected with a multipoint allele-sharing lod score of 4.40, which met the criteria for genomewide significance. This locus, designated STRK1, was the first risk locus identified by linkage mapping for common stroke.105 By extensive fine mapping, a positional candidate gene was identified in the linkage region, the phosphodiesterase4D (PDE4D) gene. Strong association between PDE4D polymorphisms and stroke was detected, especially for carotid (atherosclerotic) and cardiogenic (embolic) ischemic stroke, and PDE4D haplotypes were classified into three distinct groups: wild-type, at-risk and protective.114 The gene structure of PDE4D is complicated with several intronic promoters. In 2002 LeJeune et al. reported 17 exons of the PDE4D gene, spanning about 1Mb. For each of the first five isoforms, putative intronic promoters were identified upstream from the start codons.115 Gretarsdottir et al. concluded that the PDE4D gene consists of at least 22 exons spanning over a 1.5Mb genomic region (Figure 9).


Figure 9. A schematic drawing of the PDE4D gene. Black crossbars represent exons. All PDE4D isoforms share exons 3-11. All long isoforms of PDE4D share exons LF1-4. The exons denoted with capital D are most often unique for each isoform.

The PDE4D protein degrades the second messenger cAMP, a key signal transduction molecule in multiple cell types, including vascular cells.116 There are many splice variants and at least nine different isoforms of PDE4D. The different isoforms are expressed differently in diverse tissues, for instance PDE4D6 is expressed only in brain, whilst PDE4D7 has a strong expression in lung and kidney, and PDE4D8 is expressed at high levels in heart and skeletal muscle.117 Gretarsdottir et al. also observed that multiple PDE4D isoforms were differently regulated in affected individuals. PDE4D is proposed by Gretarsdottir et al. to be involved in the pathogenesis of stroke, possibly through the atherosclerosis process.

Another positional candidate for stroke is the arachidonate 5-lipoxygenase-activating protein (ALOX5AP) gene, identified by suggestive linkage to chromosome 13q12-13 in a genome wide scan of myocardial infarction. A SNP haplotype (HapA) was shown to contribute to increased risk of both myocardial infarction and stroke,118 and males had the strongest association to the at-risk haplotype. The gene structure of ALOX5AP is less complicated than that of the PDE4D gene and consists of 5 exons spanning about 31 kb.119 The encoded protein, five-lipoxygenase activating protein (FLAP), is a regulator of a leukotriene pathway. This pathway produces inflammatory mediators such as leukotriene-B4 (LTB4) and has been implicated in the atherosclerotic process both in a mouse model120 and in human studies.121 Individuals with myocardial infarction had significantly greater production of LTB4 in the study by Helgadottir et al., suggesting that proinflammatory activity has a role in the pathogenesis of myocardial infarction. Gulcher et al. proposed a model for the involvement of both PDE4D and FLAP in the pathological process contributing to atherosclerosis122 (Figure 10).


Figure 10. Proposed mechanism for PDE4D and ALOX5AP variants in the pathogenesis of atherosclerosis and stroke, modified from Gulcher et al.122

The initial evidence in support of associations of variants in PDE4D and ALOX5AP with stroke and myocardial infarction has consequently been the subject of replication efforts in independent cohorts. The attempt to replicate the findings for PDE4D has been less straightforward than for ALOX5AP. To date, 13 independent follow-up studies for PDE4D association have been reported.123-135 The study cohorts differ in ethnicity, ascertainment criteria and adjustments for risk factors. Several SNPs have shown positive associations between PDE4D variants and stroke, but none of the studies have found the same high-risk haplotype associated with ischemic stroke as in the original Icelandic report. However, a recent meta-analysis performed by Staton et al., including nine of the above mentioned studies, demonstrated a strong association of three PDE4D variants to ischemic stroke, indicating that PDE4D is involved in the pathogenesis of ischemic stroke.123

The association of ALOX5AP with myocardial infarction was confirmed in an independent cohort of individuals who had sporadic myocardial infarction, but with a separate haplotype (HapB).118 In a Scottish case-control material (450 cases with ischemic stroke and 710 controls) HapA, but not HapB, was associated with ischemic stroke.136 Variants in the leukotriene A4 hydrolase (LTA4H) gene, encoding an enzyme in the same biochemical pathway as ALOX5AP/FLAP, were shown to confer modest risk of myocardial infarction in an Icelandic cohort. This haplotype (HapK) was also shown to confer modest risk in European Americans, but showed a threefold larger risk in African Americans, indicating an ethnicity-specific risk. HapK is uncommon in Africa and the occurrence of the haplotype in African Americans is most probably due to European admixture. Unfavourable interactions with other genes and environmental factors more common among African Americans could account for the increased risk.137 In a randomized placebo-controlled phase II trial, including myocardial infarction patients carrying at-risk variants of ALOX5AP or LTA4H, treatment with the FLAP inhibitor, DG031, led to significant and dose-dependent suppression of biomarkers (LTB4, myeloperoxidase and CRP) that are associated with increased risk for myocardial infarction.138 Hakonarson concluded that these data support the notion that the LTB4 arm of the leukotriene pathway may play a fundamental role in ischemic heart disease and stroke.139

Animal models

Chromosomal mapping of quantitative trait loci (QTLs) contributing to stroke in rat models has been reported by several research groups. In a study by Rubattu et al. the stroke-prone spontaneously hypertensive rat (SHRSP) was mated with the stroke-resistant spontaneously hypertensive rat (SHR) and a genome wide screen was performed in the resulting F2 cohort. Three major QTLs were detected on rat chromosomes 1, 4 and 5.140 Jeffs et al. performed a genome wide scan in an F2 cohort derived from SHRSP and the normotensive reference rat strain WKY, looking for the genetic component responsible for the large infarct volumes in SHRSP. A highly significant QTL was identified on rat chromosome 5 that was blood pressure independent and co-localized with the genes encoding atrial and brain natriuretic factors.141 Recently, Simard et al. found an upregulation of the cation channel regulatory subunit Sur1 in a rat model of ischemic stroke. Glibenclamide treatment reduced cerebral edema, infarct volume and mortality by blocking Sur1, and was proposed as a new therapeutic approach for stroke.142

Mice have also been a useful tool in genetic studies of stroke. Recently, Gould et al. showed that a mutation in the mouse Col4a1 gene, encoding procollagen type IV alpha1, predisposes both newborn and adult mice to intracerebral hemorrhage. A COL4A1 mutation was also identified in a human family with a small-vessel disease syndrome.143 Gould et al. concluded that mutations of COL4A1 may cause a spectrum of cerebrovascular phenotypes and that individuals carrying COL4A1 mutations may be predisposed to hemorrhage.144

AIMS

The general aim of this thesis was to identify novel stroke susceptibility loci and genes. The strategy was to make use of the relatively low genetic diversity in the population of northern Sweden in linkage and association studies.

Specific aims were:

Paper I

To replicate the original Icelandic linkage findings on chromosome 5q in our population and to test the positional candidate gene PDE4D for association to stroke in northern Sweden.

Paper II

To perform a genome wide linkage scan to detect novel susceptibility loci for stroke in the northern Sweden population.

Paper III

To fine map the linked region on chromosome 5q in order to identify novel candidate genes for stroke.

Paper IV

To perform a separate genome wide linkage scan in order to detect novel susceptibility loci in an extended family from northern Sweden.

SUBJECTS AND METHODS

Subjects

In papers I-IV, stroke subjects were identified in a population-based stroke registry established in 1985 at the northern Sweden MONICA Centre.3,145 Two study designs have been used: a family-based design for linkage analysis and a case-control design for association studies (Figure 11).

Figure 11. Identification of subjects in the MONICA Stroke Incidence Registry.

The stroke registry covers patients between 25 and 74 years of age and each case included in the register has been strictly validated according to the WHO MONICA criteria.13 A diagnosis of acute stroke was based on clinical presentation. Differentiation of ischemic and hemorrhagic events was based on computed tomography (CT) scans or autopsy, while cases lacking unambiguous CT or autopsy data were classified as unspecified stroke. Stroke patients were identified by screening all hospital records, reports from general practitioners and death certificates in the region. The event registration and survey procedures have been validated and are described in detail by Stegmayr et al.10,145

This register was used to identify probands for the family-based study. Familial cases of stroke were identified by questionnaires asking for a family history of stroke, sent to all living patients in the registry who were affected at <70 years of age between the years 1985 and 1996. A family history of stroke was defined in the questionnaire as an additional 1st to 3rd degree relative affected at <75 years of age. As a result of this approach, 101 families were ascertained. In a follow-up study, covering the years 1997-2001, 63 additional families were identified (Figure 12).

Questionnaires sent to individuals in the stroke register who suffered and survived a stroke < 70 years of age

Period       Individuals   Replied   Families
1985-1996    2054          78%       101
1997-2001    1701          79%       63

Figure 12. Ascertainment of the family-based material

The diagnosis of each affected family member was validated in the MONICA register or by hospital records. In the sampling procedure blood samples were collected from all available affected family members as well as available unaffected siblings and children of those affected. The samples were usually obtained via arrangement with the local general practitioner. After identification and collection of individual samples, families underwent genealogical studies using public and private databases. Most families were nuclear families but extended pedigrees were also identified. Families were then only included in the study if the parents of the probands were born in the counties of Norrbotten or Västerbotten, the two northernmost counties in Sweden, and if the families had at least one affected sib pair and at least one unaffected sibling. Medical records of affected individuals in both recruitments were reviewed for the prevalence of risk factors by an experienced stroke physician and ischemic subjects were subtyped according to the TOAST classification system.21 Clinical and genealogical information on participants in the study was continuously updated by screening the registers for new stroke events among previously healthy siblings and by genealogical research. Each subject was given a unique identification number prior to genotyping. The participants in the study provided informed consent for access to medical records and donation of blood for genetic research.

In the case-control study, all subjects had been participants in population-based cardiovascular risk factor surveys, either the MONICA project or the Västerbotten Intervention Project (VIP). In the MONICA survey, 2000 or 2500 randomly selected individuals were invited to participate in health surveys in 1986, 1990, 1994, 1999 and 2004. Individuals were stratified by age and sex and were aged between 25 and 74 years. The mean participation rate was 77%. In the VIP survey, all residents in Västerbotten are invited to a health exam the year they turn 40, 50 or 60 years of age (with a design similar to that of the MONICA population surveys). The mean participation rate in VIP is 59%, with a somewhat lower participation rate among younger individuals, the unemployed and those with a lower income.146 Participants in both the MONICA and the Västerbotten surveys were asked to donate a blood sample to be stored at the northern Sweden Medical Biobank for future research and written consent was obtained from all participants. The present study used a nested case-control design, including patients affected by stroke that occurred between the first of September 1996 and the 20th of September 2000. In total, 275 cases of first ever stroke were included. 
Two controls for each case were selected from participants in the MONICA or the Västerbotten surveys. They were matched for sex, age (± 2 years), cohort (MONICA or VIP), date of health survey (± 1 year) and place of domicile. Control subjects were excluded if they had died or had moved away from the MONICA region before the end of the stroke registration period. Both cases and controls were excluded if known to have had myocardial infarction or stroke before the health survey or if they had been diagnosed with cancer during the past 5 years. Procedures for case ascertainment in the nested case-control study are described in detail elsewhere.147
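The matching criteria listed above can be expressed as a simple eligibility rule. The sketch below is illustrative only: the field names, the use of 365 days for "± 1 year" and the rule of taking the first two eligible candidates are assumptions, since the text does not state how controls were chosen among several eligible individuals. The exclusion criteria (prior myocardial infarction, stroke or recent cancer, death or migration) are not modelled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Subject:
    # Hypothetical record layout, not the actual MONICA/VIP data structure.
    subject_id: str
    sex: str
    age: int            # age at health survey, in years
    cohort: str         # "MONICA" or "VIP"
    survey_date: date
    domicile: str


def is_match(case, candidate):
    """True if the candidate meets all matching criteria for the case."""
    return (
        candidate.sex == case.sex
        and abs(candidate.age - case.age) <= 2
        and candidate.cohort == case.cohort
        and abs((candidate.survey_date - case.survey_date).days) <= 365
        and candidate.domicile == case.domicile
    )


def select_controls(case, pool, n_controls=2):
    """Return up to n_controls matched controls (first eligible, by assumption)."""
    matches = [s for s in pool
               if s.subject_id != case.subject_id and is_match(case, s)]
    return matches[:n_controls]


case = Subject("case1", "F", 60, "VIP", date(1995, 5, 1), "Lycksele")
pool = [
    Subject("p1", "F", 61, "VIP", date(1995, 9, 1), "Lycksele"),
    Subject("p2", "M", 60, "VIP", date(1995, 5, 1), "Lycksele"),
    Subject("p3", "F", 58, "VIP", date(1994, 6, 1), "Lycksele"),
]
print([s.subject_id for s in select_controls(case, pool)])
```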

Genealogy

In the genealogical analysis, church records, publicly available through the Department of the National Archives (www.svar.ra.se), and private genealogical databases were used. The church records, kept for centuries, include birth, death and marriage records. The genealogy software used, DISGEN, allowed for the export of data in the GEDCOM format and the pedigrees were displayed using Cyrillic Pedigree Editor 2.0.

Molecular genetic methods

DNA was prepared from 10 ml whole blood or buffy coat, according to standard extraction methods. In paper I, 43 polymorphic microsatellite markers were genotyped on chromosome 5 and in paper II a total of 445 markers were used. All markers were from the Applied Biosystems (ABI) Prism Linkage Mapping Set v.2.5 HD10 and HD5. In papers III and IV, microsatellite markers were also obtained from DNA Technology. Sets of microsatellite markers with non-overlapping allele sizes and fluorescent labels were amplified together in multiplex PCR reactions. At least one sample of CEPH DNA was included on each 96 or 384 well plate. The genotyping was performed according to protocols supplied by the manufacturer and analyzed on an ABI PRISM 3100 Genetic Analyzer or a 3730XL Genetic Analyzer. Data were assembled by 3100 Data Collection software v.1.1. Analysis of results, allele calling, check of quality and editing of called genotypes were performed using GeneMapper Genotyping software v.3.0, v.3.5 or v.3.7 (Figure 13a). The genotypes were checked and, if necessary, edited manually. Applied Biosystems supplied all the above-mentioned software. To verify the family relationship and to detect genotype errors, data were analyzed for Mendelian inheritance using the program PedCheck.148 Genotypes inconsistent with Mendelian laws were reexamined and if possible corrected or deleted.

In paper I, three SNPs were selected based on information available from the public databases, rs1971940 (SNP1), rs716908 (SNP2), and rs294492 (SNP3). Sequences for two SNPs, rs12188950 (SNP4, deCODE SNP45), rs12153798 (SNP5, deCODE SNP41) and one microsatellite (AC008818-1) were obtained by personal communication from Dr Solveig Gretarsdottir, deCODE, Iceland. 
In papers III and IV, SNPs were selected on the basis of information available in public databases; Celera Discovery System, National Center for Biotechnology Information (NCBI) and SNPper (CHIP Bioinformatics Tools), or by resequencing. For SNP analysis we used TaqMan and SNPlex Genotyping Systems (ABI). For each SNP, DNA samples were used for verification and quality control. The genotypes in TaqMan were read and determined in ABI PRISM 7900HT Sequence Detection System (SDS) version 2.1 (ABI). The SNPlex assay was performed according to the manufacturer, analyzed on a 3730XL Genetic Analyzer and evaluated in GeneMapper v.3.7 (ABI) (Figure 13b). Haplotypes in papers III and IV were constructed manually in Cyrillic Pedigree Editor v.2.0, using all available genotype information from siblings, children and individuals married into the families. In paper III, coding regions and 30bp of adjacent intronic regions of the PIK3R1 gene were examined for variants using direct resequencing (Figure 13c). Primers for PCR and DNA resequencing analysis were designed to amplify all 15 exons of p85alpha and the first exon of each of the p50alpha and p55alpha isoforms, using Primer Express v.2.0 (ABI)


Figure 13a. Analysis of marker D5S2003 in GeneMapper showing an individual with two different alleles at the locus (heterozygous individual).

Figure 13b. Analysis of SNP rs251406. Blue dots = homozygous carriers for allele 2. Red dots = homozygous carriers for allele 1. Green dots = heterozygous carriers (allele 1 and allele 2).

Figure 13c. Resequencing of PIK3R1. A novel gene variant is detected at position 231, in close proximity to exon 15 (PIK3R1exon15_1).

Statistical analysis

Linkage analysis was performed using a model-free, multipoint approach in Allegro.149 We used the spairs scoring function, which assesses identity by descent (IBD) sharing among all pairs of affected individuals within families. This scoring function is suggested to perform well with all types of disease models.150 The NPL scores were converted into allele-sharing LOD scores using the exponential model described by Kong and Cox.151 When combining the family scores to obtain an overall score, we used a compromise between weighting the families equally and weighting the affected pairs equally, using the “power:0.5” option in Allegro. Marker allele frequencies were estimated among all individuals in the pedigrees according to the algorithm of Merlin.152 In papers I and III, we used the marker map of Genethon,85 and for comparison in paper I, we also did the analysis based on the deCODE genetic framework map.81 In paper I, interpolations with respect to the physical distances in NCBI genome build 34.3 and the available genetic positions in the deCODE map were performed to estimate positions of markers not included in the deCODE map. In papers II and IV, we used the genetic map with interpolated genetic marker positions from David Duffy, Queensland Institute for Medical Research (www.qimr.edu.au/davidD/master_map.dat), where the genetic positions are interpolated via locally weighted linear regression from the National Center for Biotechnology Information (NCBI) build 34.3 physical map positions and the Rutgers genetic map.82 The p-values reported are computed by comparing the observed allele-sharing LOD score with its complete data distribution and are not corrected for multiple testing, unless otherwise specified.
To adjust for the number of markers tested and to estimate the significance of our findings (empirical p-values), we did simulation studies: in paper I for the candidate region, in paper II for the genome wide scan for model II and in paper IV for the extended family. The simulations were performed in Allegro using 1000 randomly created data sets under the null hypothesis of no linkage, assuming a genotyping success rate of 95 % and the same marker allele frequencies and pedigree structure as in the original analysis. The association analysis in the case-control material was performed using the software package SPSS in paper I (v. 11.5) and in paper III (v. 14.0). We calculated genotype-based odds ratios (ORs) with respective 95% confidence intervals, using conditional logistic regression. The conditional approach is an effective way to adjust for possible confounders in individually matched case-control studies. The genotype-based ORs were calculated using individuals homozygous for the most common allele as the reference. Allele-specific calculations were performed in papers I and III under the assumption of a multiplicative model of the risk, by including a genotype trend variable (given the values 0, 1 and 2, depending on each individual’s number of at-risk alleles) in the regression model. For three of the SNPs in paper III, according to the genotype risk patterns, calculations were also carried out assuming a recessive mode of inheritance. Linkage disequilibrium between markers was calculated by D' and R2 statistics in the case-control material, using the program ldmax from the GOLD software package.153 Haplotype association analysis and estimations of LD measurements based on the EM algorithm were performed in Haploview.154 For testing association in the family dataset in paper I, we used the transmission/disequilibrium test (TDT) within the program TRANSMIT, version 2.5.4, which has the advantage of allowing unknown parental genotypes.155 All association analysis p-values presented are 2-tailed and are not corrected for multiple testing, unless otherwise specified.
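As an illustration of the two linkage disequilibrium measures named above, the following sketch computes D' and the squared correlation (written R2 in the text) for two biallelic markers from haplotype and allele frequencies using the standard definitions. It is not the ldmax or Haploview implementation; in particular, it assumes the haplotype frequency is already known, whereas with unphased genotype data it would first have to be estimated. The function and argument names are chosen for this example.

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at the first
          locus and allele B at the second locus
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # Raw disequilibrium coefficient D.
    d = p_ab - p_a * p_b
    # Largest |D| possible given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # Squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d_prime, r_squared


# Hypothetical frequencies: allele B occurs only on haplotypes carrying A.
d_prime, r_squared = ld_statistics(p_ab=0.2, p_a=0.5, p_b=0.2)
print(round(d_prime, 3), round(r_squared, 3))
```

The example shows why the two measures are usually reported together: D' can equal 1 when one haplotype is simply absent, while the squared correlation stays well below 1 unless the allele frequencies at the two loci also match.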

RESULTS

Study samples

In the initial family recruitment we collected 556 individual samples from 101 families. Of those 101 families, 25 were excluded after genealogical and clinical validation or due to lack of blood samples, according to the criteria displayed in Figure 14. Of the remaining 76 families, the 56 families considered to provide the most information regarding identity by descent (IBD) sharing were selected for genotyping in the replication study (paper I), in the genome wide scan (paper II) and in the fine mapping (papers II and III). These families included 128 affected individuals (including 11 cases of SAH) from a total of 376 individuals. In a follow-up we collected 63 additional families from the same region, with 246 individual samples. Thirty families were excluded, mainly due to lack of samples at the time of genotyping, but also after genealogical and clinical validation. The remaining 33 families from the second recruitment were added to the 20 non-genotyped families from the first recruitment, forming a follow-up cohort for fine mapping (papers II and III) that consisted of 129 affected individuals (including 15 cases of SAH) from a total of 242 individuals (Figure 14).

First recruitment, 1985-1996: 101 families
   Excluded, 10 families: proband not born in the region
   Excluded, 4 families: parents not born in the region
   Excluded, 7 families: no sample from affected relative
   Excluded, 4 families: affected relative did not fulfill MONICA criteria for stroke
   Remaining: 76 families
   Selected for potential for high information on IBD sharing: 56 families, used for initial GWS and fine mapping
   Not selected: 20 families

Second recruitment, 1997-2002: 63 families
   Excluded, 6 families: related to earlier identified family
   Excluded, 6 families: parents not born in the region
   Excluded, 16 families: no sample from affected relative
   Excluded, 2 families: affected relative did not fulfill MONICA criteria for stroke
   Remaining: 33 families

Follow-up set (20 + 33): 53 families, used for follow up and fine mapping only

Figure 14. Selection of the families in the genome wide scan and the follow up.

The main characteristics of the stroke cases (SAH excluded) in the 56 families are displayed together with the characteristics for the cases in the case-control study in Table 1, paper I. In the family-based material a larger group of cases had an unknown etiology for the stroke event, which is explained by the lack of access to CT scans in some of the smaller hospitals in the region during the earlier years. The age of onset was higher in the family sample, which is most likely explained by the ascertainment differences between the samples. For statistical analysis, we selected three different disease models. Model I is broad, including all stroke cases captured by the MONICA WHO definition of stroke, thus excluding TIAs but including SAH subjects. Model II was restricted to ischemic and intracerebral hemorrhagic stroke, and model III consisted of ischemic stroke only. Further subdivision according to TOAST classification21 was not considered for our linkage calculations due to the low number of subjects with complete classification data. In all models the diagnosis was based on the first stroke event, with one exception: if the subject first suffered from a SAH and later an ischemic stroke (four cases), the patient was counted as affected in all three models. The distribution of stroke types within the families of model I is displayed in Table 1, paper II. To enable comparisons to previous studies mainly investigating common stroke excluding SAH, we display in Table 2, paper II clinical characteristics and risk factor distribution for model II only. The characteristics of the stroke cases in our study do not differ markedly from the general stroke population in northern Sweden. In paper IV, genealogical analysis of the family-based material identified an extended pedigree containing seven families connected to common founders eight generations back (Figure 1, paper IV). Of the affected individuals in this extended pedigree, 40 % suffered from hemorrhagic stroke. In a review of medical records, those affected did not display symptoms of a familial stroke syndrome but rather common forms of stroke.

Replication study, Paper I

Two peaks with allele-sharing LOD scores >1.0 were revealed on chromosome 5 by the initial nonparametric multipoint linkage analysis of the family-based data set, using the Genethon genetic map. These peaks were further investigated by running additional markers. As a result, the maximum allele-sharing LOD scores increased to 2.06 (P=0.0010) at marker D5S424 and to 1.60 (P=0.0033) at marker D5S1969 (Figure 1a, paper I). Analyzing the results using the deCODE map, the allele-sharing LOD score decreased to 1.86 (P=0.0017) at marker D5S424 while it increased to 1.95 (P=0.0014) at marker D5S1969 (Figure 1b, paper I). Excluding hemorrhagic strokes (n=15) had only a minor effect on the results (Figure 1, paper I). In the deCODE map the position of marker D5S1969 is at 68.9 cM, consistent with the findings in the Icelandic stroke genome scan displaying one peak at ~69 cM (LOD = 2.00).105 The simulation study of 1000 random sets yielded a candidate region p-value of 0.01 for the allele-sharing LOD score peak of 2.06 at marker D5S424. These findings replicate the previously reported linkage at the chromosome 5q region105 (Figure 15).
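For orientation, the relation between an allele-sharing LOD score and a pointwise p-value can be sketched with the usual large-sample approximation, in which 2 ln(10) x LOD is treated as a one-sided chi-square statistic with one degree of freedom. This is an assumption made for illustration and not the method used in the thesis, where the p-values were obtained by comparison with the complete data distribution; for the chromosome 5 peaks reported above, the approximation nevertheless appears to agree with the quoted values to the precision given.

```python
import math


def lod_to_pointwise_p(lod):
    """One-sided asymptotic p-value for a non-negative LOD score.

    Uses Z = sqrt(2 * ln(10) * LOD) and the upper tail of the standard
    normal distribution (large-sample approximation, 1 degree of freedom).
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    z = math.sqrt(2.0 * math.log(10.0) * lod)
    # Upper-tail probability of a standard normal variable at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# LOD scores taken from the chromosome 5 results in the text.
for lod in (2.06, 1.60, 1.86, 1.95):
    print(lod, round(lod_to_pointwise_p(lod), 4))
```

These are pointwise values only. Region-wide or genome-wide significance requires a correction for the number of positions examined, which is what the simulation studies provide.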

Figure 15. Allele-sharing LOD score curves for chromosome 5, deCODE map. The LOD score curve to the left is adapted from Nilsson-Ardnor et al.133 and the LOD score curve to the right is adapted from Gretarsdottir et al.105

We next aimed to replicate the reported association of ischemic stroke to the PDE4D gene variants.114 A directed association study using our nested case-control material was performed. The microsatellite marker, AC008818-1, and the two SNPs, SNP41 and SNP45, reported by Gretarsdottir et al. to have the strongest association to stroke were included.114 Three additional SNPs in the PDE4D gene that were genotyped prior to the Icelandic report of association of ischemic stroke to the PDE4D gene were also analyzed. The positions of the SNPs and the microsatellite marker in the PDE4D gene are shown in Fig 2, paper I. Each of the genotyped markers was found to be in Hardy-Weinberg equilibrium (HWE) in the control group. For two of the markers analyzed, the B allele of AC008818-1 and SNP3, conditional logistic regression calculations revealed association with p-values <0.05. SNP3 displayed an OR of 0.68 (CI 95%, 0.48-0.96) and the B allele (-4 bp compared to the shortest allele of CEPH 1347-02) in AC008818-1 displayed an OR of 0.69 (CI 95%, 0.49-0.98), assuming an additive model in both cases. When correcting for the number of markers and alleles tested, p-values did not reach formal significance levels. No significant association to the Icelandic-defined at-risk allele 0 of AC008818-1, here denoted as allele C, was obtained in this study, OR: 1.1 (CI 95%, 0.84-1.45). The linkage disequilibrium (LD) data for the markers were consistent with those of the Icelandic study. LD coverage of PDE4D is not complete (Table 3, paper I) since the association study primarily aimed to test the previously reported, most significant disease-associated genetic markers: SNP41, SNP45 and AC008818-1. These markers were also analyzed for association to stroke in the family-based material, but no significant values for association were obtained (Table 4, paper I).
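The Hardy-Weinberg check on the control genotypes mentioned above can be sketched for a biallelic SNP as a one-degree-of-freedom chi-square goodness-of-fit test. The thesis does not state which HWE test was applied, so this is only one common choice, and the genotype counts in the example are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Arguments are the observed genotype counts (AA, AB, BB).
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(30, 40, 30)
print(round(chi2, 2), round(p_value, 3))
```

For markers with rare alleles, such as a variant with a minor allele frequency below one percent, the chi-square approximation becomes unreliable and an exact test is generally preferred.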

Genome wide scan, Paper II

In the genome wide scan, initial linkage analysis revealed allele-sharing LOD scores > 1.2 (an arbitrarily chosen cut-off) at nine locations, 1p34, 5q13, 7q35, 9q22, 9q34, 13q32, 14q32, 18p11 and 20q13, considering any of the three models with a broad to narrower definition of stroke (Table 3 and Figure 2, paper II and Figure 16). The highest allele-sharing LOD scores were obtained on chromosomes 5q (previously reported in paper I), 1p (LOD = 2.10 at marker D1S255, model II) and 18p (LOD = 2.14 at marker D18S59, model III). From our simulation studies, an allele-sharing LOD score value of 2.14 is expected to occur by chance once in every five genome wide scans.
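The way a statement such as "once in every five genome wide scans" follows from simulated data can be sketched as below. The sketch assumes that each simulated null data set has been reduced to its maximum allele-sharing LOD score across the genome; the function name and the toy numbers are illustrative and are not taken from the Allegro simulations.

```python
def empirical_p_value(observed, null_maxima):
    """Proportion of simulated null scans whose maximum LOD score
    is at least as large as the observed score."""
    if not null_maxima:
        raise ValueError("at least one simulated scan is required")
    exceed = sum(1 for score in null_maxima if score >= observed)
    return exceed / len(null_maxima)


# Toy null distribution: 200 of 1000 simulated scans reach LOD 2.14.
toy_null = [2.5] * 200 + [1.0] * 800
print(empirical_p_value(2.14, toy_null))
```

With 1000 simulated data sets, an observed score that is never reached under the null can only be reported as an empirical p-value below 0.001; the resolution of the estimate is limited by the number of simulations.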

Figure 16. Genome wide linkage scan of patients with stroke. Each chromosome is displayed along the X-axis, with the multipoint allele-sharing LOD score given on the y-axis. The solid grey line represents Model I (all stroke cases defined by WHO MONICA criteria), the solid black line represents Model II (ischemic stroke and intracerebral hemorrhage) and the dotted line represents Model III (ischemic stroke only).

Fine mapping of the nine chromosomal regions with allele-sharing LOD scores > 1.2 was performed by genotyping additional microsatellite markers in these regions. Increased allele-sharing LOD scores for chromosome 5q (previously reported in paper I) and 9q22 (LOD = 1.56 at marker D9S287, model I) were obtained. The scores at all other loci decreased. However, an allele-sharing LOD score >1.2 was still displayed in four chromosomal regions: 1p34, 9q34, 13q32 and 18p11. On chromosome 9q22 and 9q34, linkage was detected in models I and II (but not in model III, the ischemic-only model). The strongest evidence for linkage to the chromosome 18 region was found for model III and to the chromosome 13 region for model I (Table 3 and Figure 3A, paper II and Figure 17). In a follow-up set of 53 families, the original genome wide microsatellite markers and the fine mapping markers were genotyped in all nine regions of interest. When calculated separately, the 53 families did not show the same level of linkage to any of the analyzed regions as the initial 56 families. In the combined analysis, including all 109 families, the highest allele-sharing LOD scores were obtained on chromosomes 5q (LOD = 1.71 at marker D5S1982, model II), 13q (LOD = 1.59 at marker D13S265, model I) and 18p (LOD = 1.53 at marker D18S59, model III) (Table 3 and Figure 3B, paper II and Figure 17).

Figure 17. Results of fine mapping regions on chromosomes 9, 13 and 18 in the first family set (A) and the combined family set (B). Allele-sharing LOD scores are plotted along the y-axis and genetic map positions are plotted along the X-axis.

PIK3R1 association, Paper III

In paper I, the linkage analysis for chromosome 5q indicated a second peak, suggesting that another gene in this chromosomal region, in addition to PDE4D, could contribute to stroke susceptibility. In this region, additional microsatellite markers were genotyped in our 109 families. Applying multipoint non-parametric methods we obtained a maximum allele-sharing LOD score of 1.70 at marker D5S2003 (Figure 1, paper III). Genotype information from all available family members was then used to search for haplotype patterns within the affected individuals and to seek crossing over events among the affected siblings in the 21 cM 1-LOD drop region between markers D5S1956 and D5S641. A common 10 cM haplotype among affected individuals was initially identified in 2 families based on genotype information of 16 microsatellite markers. Part of the same haplotype was found to be represented in additional families, but due to the limited information provided by several of the markers within this haplotype, we could not unequivocally establish this as part of the larger haplotype. By genotyping nine SNPs in the identified region we were able to confirm that this region was indeed part of the larger haplotype (Figure 2, paper III). The markers D5S2046 and D5S2019 could therefore be used to restrict a common risk haplotype in these families to a region of approximately 1 cM. Database searches of this chromosomal region revealed that only one known gene was situated there, the phosphatidylinositol 3-kinase regulatory subunit 1 (p85 alpha) gene, PIK3R1 (Figure 18).

Figure 18. Physical location of the PIK3R1 gene. Adapted from Ensembl Genome Browser, www.ensembl.org

Since the haplotype analysis reduced the disease critical region to a single positional candidate gene, we next aimed to test for association of PIK3R1 gene variants to stroke in a case-control cohort. Thus, 18 SNPs in the 75kb PIK3R1 gene were genotyped in the case-control material. One of the SNPs tested (PIK3R1exon15-1) was identified in an affected individual by resequencing of the exons in PIK3R1. However, the identified genetic variant had a minor allele frequency of less than one percent in the case-control material. The positions of the SNPs within the PIK3R1 gene are given in Figure 3A, paper III. All markers used were found to be in Hardy-Weinberg equilibrium (HWE) in the control group. An LD map of the region is displayed in Figure 3B, paper III. Three polymorphisms were found to be significantly associated with stroke, two in intron 1 (rs251406 and rs251404) and one in intron 6 (rs34306) (Table 1, paper III). The two SNPs in intron 1 are not in LD with one another, but one of them, rs251404, is in weak LD (D' = 0.58) with the associated SNP in intron 6, rs34306. For all three SNPs associated with stroke in the case-control material, homozygous carriers have a greater risk compared to heterozygous carriers. For two SNPs (rs251404 and rs34306), homozygosity for the common allele confers the greatest risk, and for one SNP (rs251406), homozygosity for the rare allele confers an increased risk (Table 1, paper III). Met326Ile (rs3730089), a previously described type 2 diabetes functional variant,156 was not associated with stroke in our case-control material, and was not in LD with the adjacent associated polymorphism, rs34306. Haplotype association analysis for SNPs in LD was performed using the computer program Haploview,154 but the resulting p-values were no more significant than the individual SNP associations.

Novel susceptibility loci, Paper IV

A separate genome wide linkage analysis was performed in the extended pedigree (Figure 1, paper IV and Figure 19).

Figure 19. Seven families connected to common founders eight generations back.

A highly significant genome wide allele-sharing LOD score of 4.66 was obtained for marker D9S1776 (Figure 2, paper IV). From simulation studies this value corresponds to a genome wide P < 0.001. The 27.6 cM 1-LOD drop region between markers D9S1677 and D9S1682 was further investigated by adding microsatellite markers. The allele-sharing LOD score increased to 4.81 over 21 consecutive markers, from D9S1835 to D9S1864. This region was shown to be identical by descent in all affected individuals in the families, and the region of primary interest was limited to 12.3 cM (Figure 3, paper IV). Haplotype analysis of the region revealed that two of the families, D and F, shared an identical region of 10.5 cM, between markers D9S1828 and D9S1811. Further, families A and B each shared a smaller portion of this haplotype, limiting the common shared haplotype to 3.2 cM between markers D9S1683 and D9S262 (Table 2, paper IV). This particular haplotype was found among all affected individuals in the four families with hemorrhagic cases in the extended pedigree.

DISCUSSION

The use of isolated populations for the analysis of complex genetic traits has been successful in many cases.118,157 Genetic factors identified in homogeneous populations have also been confirmed to be more general in nature than previously suspected.91,136 In northern Sweden, the population history makes the population favorable for genetic studies. For centuries, the population of northern Sweden has had a low immigration rate. The settlers tended to have large families and their descendants stayed in close geographical proximity.7,8 These factors contribute to founder effects and a relatively low genetic diversity within the population. The data presented in papers I-IV support the notion that the northern Swedish population provides a unique resource for genetic studies of complex diseases. To achieve significant results in linkage and association studies of stroke, huge sample sizes are estimated to be required.158 These recommendations do not take into account the scenario of low genetic variability in a population. We have shown that significant results can be achieved both in linkage studies and association studies using a limited number of cases. In paper I, we replicated the original Icelandic linkage findings for chromosome 5q. In paper II we reached similar values in the genome wide scan as other large studies in heterogeneous populations,159 even though no significant values were obtained. In paper III we localized a novel candidate gene associated with stroke and in paper IV we showed significant linkage to a novel stroke locus in an extended pedigree from northern Sweden. The subjects in the genetic studies represent the younger part of stroke cases: the mean age of onset was 61 years in the combined family-based study and 55 years in the case-control study.
Fifty percent of individuals affected by stroke in Sweden are below 75 years of age and only twenty percent are below 65 years of age.2,160 A younger age of onset means less exposure to environmental factors and thus probably a larger degree of hereditary factors contributing to the disease.63,66 Susceptibility genes identified in a younger population could also have implications for stroke risk in an elderly population, but other study designs that also include elderly stroke cases are then required. The TOAST subclassification within the ischemic stroke cases was difficult to complete fully due to the design of the study. All stroke cases were identified in routine health care, and clinical information and samples were retrieved from 1985 onwards, so clinical information, such as CT of the head and ultrasound of the carotid vessels, was not always available in the medical records. The case-control material is small in comparison with international recommendations, but has the advantage of proper epidemiological matching.

Guidelines for the design of genetic association studies of candidate genes have been published by many journals; they call for sufficient numbers of well-matched subjects and controls, adjustments for multiple testing, replication in independent cohorts and a biologically plausible role for the gene of interest. We have reported nominal p-values, not corrected p-values, in papers I and III. When correcting for multiple testing by the use of the Bonferroni correction, the p-values for SNPs in paper I are not significant, while one SNP in paper III is still significant at the level of 0.05. The Bonferroni correction is often used when multiple tests have been performed, but the method treats all tests as if they were completely independent. When testing multiple SNPs, many SNPs will show patterns of LD and are thus not independent of one another. The Bonferroni correction could therefore be considered too conservative in some instances. A new independent case-control cohort for replication studies is currently being assembled from the Medical Biobank in Umeå, with the goal of including 1000 stroke cases and 1000 matched controls. Using the family-based material and the case-control material, we performed a genome wide linkage scan in the families, separately replicated linkage on chromosome 5q and investigated this region further using the case-control material, and identified an extended family which was the subject of a separate genome wide analysis. In the genome wide linkage analysis presented in paper II, the initial linkage findings on nine chromosomal regions were refined in two stages. In the first fine mapping step we detected moderate linkage on chromosomes 9q, 13q and 18p in addition to the previously described region on chromosome 5q.133 After the inclusion of additional families in a second step, linkage was still detected in each of these regions apart from chromosome 9q.
In the separate analysis of the second family set, linkage was not found. One possible explanation for the lack of linkage could be that the families in the second set did not provide as much information about IBD sharing within the families as the first family set. The first family set was enriched for families more likely to provide maximum information regarding IBD sharing among affected individuals, obtained by selecting families with multiple affected members and unaffected relatives available for genotyping. Both family study sets were selected using the same genealogical criteria, and displayed a similar stroke risk factor profile. Linkage calculations were performed using three different disease models, from a very broad to a narrower stroke phenotype. Since most reports consider a different pathophysiology for SAH,161 the phenotype in model I, including SAH, might be too broad. However, in 23 of the 109 described families, SAH is clustered with other stroke diagnoses and in several cases the same patient has been affected by more than one stroke type. This may indicate common underlying susceptibility factors for all stroke types and subtypes, similar to the way that the ALOX5AP gene has been reported to contribute to susceptibility for different diseases, i.e. myocardial infarction and ischemic stroke.118 Wiklund et al. investigated the aggregation of ischemic stroke subtypes within affected sibling pairs in a combined analysis of our family-based study and the Siblings With Ischemic Stroke Study (SWISS).162 The subtype of ischemic stroke in a proband was shown to be a poor determinant of the subtype of ischemic stroke in their sibling, suggesting that many genetic risk factors for ischemic stroke may not be specific for one subtype.163 The absence of significant linkage findings in the genome wide screen in paper II indicates that there might be several loci contributing to stroke susceptibility and that the study was underpowered to detect small genetic effects with statistical significance. In addition to the study design, there are also other important issues to consider in order to improve the chances of detecting significant linkage in a genome wide scan, such as the genotyping error rate and the information content. In a study by Abecasis et al., an error rate of 2% at a λs of 2.0 is predicted to reduce the expected LOD score by almost half.164 For genome wide linkage studies of sibs, Cardon and Evans concluded that the information content when using a traditional map of microsatellite markers, with one marker about every 10 cM, was significantly lower than the information content using a dense map of SNPs or microsatellites.165 In the genome wide scan, we have tried to avoid some of these problems by increasing the genetic information through including all available relatives in the genotyping, which also allowed us to check for Mendelian errors and exclude genotypes with Mendelian inconsistencies from the linkage analysis. Linkage studies for common stroke are rare. Worldwide, few family-based materials for linkage studies of stroke seem to be available.
Obtaining large family materials in heterogeneous populations is a difficult task, since stroke is such a severe disease and predominantly affects the elderly. Hence, combining linkage data from different stroke studies in a meta-analysis is not possible. Linkage data can, however, still be useful in combination with genome wide association data and whole genome expression studies. The replication of linkage to chromosome 5q in paper I, performed on our families from northern Sweden, as well as the linkage analysis of Icelandic stroke families revealed two peaks in the region of interest (Gretarsdottir, personal communication). In one of these regions extensive fine mapping and association studies of candidate genes in the Icelandic study have identified the PDE4D gene as the prime candidate for stroke susceptibility. The 5q region has also been investigated by Woo et al. in an association study, reporting association between single SNPs and SNP haplotypes in the PDE4D gene and stroke but also for other SNPs located in the 5q region.128 These findings give further support to the notion that the 5q region could contain several genes with major or minor influences on stroke susceptibility. Meschia et al. attempted to replicate linkage to the PDE4D region using 104 families in SWISS, but were unsuccessful.132 The PDE4D gene has been extensively investigated in several association studies. In a recent meta-analysis of nine case-control studies (data in paper I included), three SNPs showed significant association between PDE4D and ischemic stroke: SNP87, SNP83 and SNP41, with pooled p-values in the range 0.002-0.003.123 Of those, SNP41 was included in the original at-risk haplotype presented by Gretarsdottir et al.114 In our initial association analysis of PDE4D in paper I we investigated only a few SNPs without obtaining significant association. We have recently extended our association study of PDE4D by analyzing an additional set of 23 SNPs in close proximity to SNP41. However, none of those SNPs or SNP haplotypes showed significant association to stroke in our population, after correcting for the number of SNPs genotyped. In the meta-analysis by Staton et al., the associated SNPs showed heterogeneity in the direction of association for each of the individual SNPs tested. This could imply spurious associations166 but can also suggest that the SNPs tested are in linkage disequilibrium with the causal allele(s). The idea that opposing effects (protective or susceptible) of the same allele can be associated with disease was supported by Lin et al. in a recent symposium, where they concluded that opposite effects of alleles may be found in populations characterized by different patterns of inter-locus correlations. Taken together, the results imply that PDE4D is involved in the pathogenesis of ischemic stroke. In paper III, the chromosome 5q region was investigated further by fine mapping, haplotype analysis and association studies. We subsequently identified PIK3R1 as a positional candidate gene contributing to stroke susceptibility. The PIK3R1 candidate gene was identified by investigation of a region linked to common stroke.
Haplotype analysis indicated a shared haplotype for affected individuals in some of the families, where PIK3R1 was the only known gene in the region. In addition to these findings, significant association between variants in PIK3R1 and stroke was obtained in a case-control material from the same geographical region. The PIK3R1 protein has a biologically plausible role in stroke pathogenesis. PIK3R1 interacts with tyrosine kinase in signal transduction within the cell, it has a potential role in glucose homeostasis and it directly interacts with the estrogen receptor alpha. Biological actions of PIK3R1, such as participating in inflammation processes, affecting glucose homeostasis and/or activating endothelial nitric oxide synthase (eNOS),167-170 make PIK3R1 an attractive candidate for future stroke research. There are several ways to further explore the potential involvement of PIK3R1 in the pathogenesis of stroke. The initial association needs to be replicated in independent stroke cohorts, additional SNPs should be genotyped in order to achieve full LD coverage of the gene according to the HapMap data (which will hopefully aid in identifying a causal SNP), and genotype-phenotype functional studies have to be performed.

In paper IV, a separate genome wide scan was performed in an extended pedigree from northern Sweden, revealing a highly significant allele-sharing LOD score on chromosome 9q. Haplotype analysis in the families identified a common haplotype of 3.2 cM that was shared among four of the families. These four families contained all the identified cases of primary intracerebral hemorrhage in the extended pedigree, indicating that this region could be of specific importance for susceptibility to hemorrhagic stroke. The candidate region includes 19 genes, but none is an obvious candidate gene for stroke susceptibility. The region will be further explored by SNP association studies.

Future efforts in elucidating the genetic predisposition to common stroke in northern Sweden will involve not only the previously mentioned studies of PIK3R1 and the 9q region in the extended pedigree, but also SNP association analyses aimed at investigating the other linkage regions from the genome wide scan and, eventually, a whole genome association study.

CONCLUSIONS

The main conclusions of this thesis are that, by using the relatively genetically homogeneous population of northern Sweden, we were able to detect significant linkage and association for stroke susceptibility. We also identified a novel candidate gene for stroke.

Specific conclusions:

Paper I

Replication of linkage to the 5q region was obtained, but no significant association between variants in the PDE4D gene and stroke was detected, suggesting that different alleles confer susceptibility to stroke in the 5q region in the northern Swedish population.

Paper II

No new major stroke susceptibility loci were identified, indicating that multiple minor susceptibility loci in addition to the previously known locus on chromosome 5 could contribute to the disease.

Paper III

A novel positional candidate gene for stroke, PIK3R1, was identified by haplotype analysis in the linked 5q region and significant association between variants in PIK3R1 and stroke was found.

Paper IV

A novel susceptibility locus was identified on chromosome 9q in an extended family from northern Sweden, which may be of specific importance for susceptibility to hemorrhagic stroke.

CONCLUDING REMARKS

The genetics of stroke is a complicated puzzle. We have only recently left the starting blocks and begun to put the first pieces together, still lagging behind other common complex diseases such as Alzheimer's disease and diabetes mellitus. During the past 30 years the elucidation of the molecular basis of monogenic diseases has been very successful, bringing new insights into disease mechanisms and new diagnostic tests. For complex diseases the inherited predisposition is more difficult to decipher. Efforts have been made to simplify the complex genetic background of common diseases by the use of familial cases, population isolates, strictly defined or intermediate clinical phenotypes and animal models. Genetic studies of familial cases in large pedigrees showing Mendelian inheritance of a common disease might only identify rare, high-impact genes but could also reveal defective pathways of more general importance. In populations that are geographically or socially isolated, a higher degree of genetic homogeneity is expected; fewer genetic variants exist and these are technically easier to identify. Also, an isolated population often shows a higher degree of environmental homogeneity in lifestyle, diet and culture. The clinical phenotype is important to define in genetic studies. Stroke is the end result of a number of pathologically different processes with many genes involved and can be regarded as a prototype for late-onset, complex diseases. Numerous interrelated phenotypes and subphenotypes complicate the interpretation of data. An intermediate phenotype (e.g. carotid artery IMT for large vessel ischemic stroke) is likely to be easier to study because fewer genes contribute to the phenotype, even though the results may not directly translate into clinical risk.
Animal models of human disease are useful in studying complex genetics but are always somewhat hampered by the question of to what extent the biological conclusions can be translated into human disease processes. The strategy of performing genetic studies of complex diseases in relatively homogeneous populations has proven successful. A recent success story is the identification of variations in a G protein-coupled receptor for asthma susceptibility in Finnish and Canadian populations,157 subsequently confirmed in two independent replication studies.171,172 By far the largest number of publications identifying susceptibility genes for complex diseases using a homogeneous population in combination with genealogical data has been produced by the Icelandic pharmaceutical company deCODE genetics. A multitude of projects in complex diseases has led to the mapping of many susceptibility loci and isolated genes.91,114,118,173 The profile of the company has recently changed and is now focused on drug development. For the prevention of heart attack, the leukotriene pathway is targeted for the development of new drugs. An initial compound is now in Phase III clinical trials, and in August 2006 deCODE genetics announced that it had begun enrolling patients in its Phase I clinical development program for a second compound, also inhibiting the production of LTB4. The researchers at deCODE genetics recently reported a new gene associated with type 2 diabetes. A common microsatellite allele in the transcription factor 7-like 2 gene increased the risk for homozygous carriers by a factor of 2.41.91 This is an example supporting the “common disease common variant” hypothesis. Other common genetic polymorphisms that have major effects on disease risk are found in Crohn's disease and in macular degeneration.174,175 It is tempting to generalize the “common disease common variant” hypothesis to all complex diseases, but in some cases there will be rare variants contributing to disease risk.

Novel technologies supported by bioinformatics facilitate the genotyping and analysis of data. Linkage analysis has been the hypothesis-free way of detecting novel susceptibility loci and genes, whereas association studies have been restricted by the knowledge of potential candidate genes. Technological achievements in genotyping now make it possible to perform hypothesis-free whole genome association scans. In whole genome association studies, panels of 300,000 SNPs or more are used. There are, however, statistical concerns in performing whole genome association studies. Elston commented on the problem at a recent symposium by summarizing: “too many SNPs, but yet too few”. Too many SNPs from a statistical point of view, since the problem of multiple testing is immense; too few in a biological context, since important SNPs will be missed, especially if rare variants are important for disease susceptibility. In addition, massive investments in new technologies, running costs, database facilities, advanced bioinformatics and biostatistics are needed.
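The scale of the multiple-testing problem can be illustrated with a minimal sketch. It assumes the simplest (Bonferroni) correction and a conventional family-wise significance level of 0.05; neither the correction method nor the function names below are taken from the studies in this thesis, and the panel size of 300,000 SNPs is the only figure taken from the text. Because neighbouring SNPs are often in linkage disequilibrium, the tests are not independent and Bonferroni is likely to be conservative, so the numbers should be read as rough orders of magnitude only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of tests passing an uncorrected threshold alpha
    when no SNP is truly associated with the disease."""
    return alpha * n_tests


# Illustrative values: 300,000 SNPs is the panel size mentioned in the text;
# alpha = 0.05 is an assumed, conventional significance level.
n_snps = 300_000
alpha = 0.05

print("Expected chance findings without correction:",
      expected_false_positives(alpha, n_snps))
print("Bonferroni per-SNP threshold:",
      bonferroni_threshold(alpha, n_snps))
```

Under these assumptions an uncorrected threshold of 0.05 would be expected to flag roughly 15,000 SNPs by chance alone, whereas the corrected per-SNP threshold is on the order of 10^-7, which in turn demands large study samples to retain power.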
A trend in complex genetics is to combine information from whole genome linkage studies, whole genome association studies and, lately, also whole genome expression studies in large sets of study samples to identify susceptibility genes and pathways. Ultimately, the expression studies should include data from different tissues at different time points for each individual. Enormous computing capacity will then be required to store and analyse the huge amount of data generated. To extend the knowledge of the impact of genetic variants on disease risk at a population level, and also to investigate possible gene-environment interactions, large international collaborations have been initiated. The Canadian initiative Public Population Project in Genomics (P3G) is an international consortium for large-scale genetic epidemiology projects and biobanks, including GenomeEUtwin, the Estonian Genome Project, the Western Australian Genetic Health Project and LifeGene in Sweden. The collaboration is also multidisciplinary, as expertise in genetics, epidemiology, biocomputing, biostatistics, database handling, ethics and legal issues needs to be merged.

What are the practical health implications and the clinical use of genetic information? Discovery of the molecular bases underpinning monogenic diseases has had rapid consequences in health care. Numerous diagnostic tests are available for symptomatic, presymptomatic and prenatal testing. The health benefits for patients and their families are clear: with the right diagnosis of a rare disease, the need for and the intensity of treatment and follow-up can be planned accurately. Knowledge of the exact disease-causing mutation makes prenatal testing more accurate. For complex diseases, the genetic susceptibility variant must be validated in the context of other known risk factors for the disease. For common stroke, the genetic contribution to increased stroke risk has been estimated both to be as large as that of environmental factors176 and to be relatively small compared to other risk factors.29 Genetically defined subgroups of patients could gain from specific behavioural or pharmacological therapy in programs of primary and secondary prevention, keeping in mind that identification of high-risk individuals might also raise social and ethical concerns. Clinical benefits will be large if new therapeutic approaches become possible via the development of drugs modulating new disease pathways. In the “Head in the sand or head in the cloud” debate regarding the implementation of complex genetics in routine health care, some advocate a paradigm shift in the classification, prevention and treatment of common diseases, while others consider the future use of genetic information as making a “bird of a feather”, stealing attention from more pressing issues of world health. A future scenario is probably somewhere in between. For some complex diseases, genetic information will certainly contribute to a large extent to daily clinical decisions.

POPULAR SCIENCE SUMMARY (translated from the Swedish summary)

Stroke is a common and serious disease that primarily affects the elderly. The mean age at onset of stroke is 75 years, and approximately 30,000 people in Sweden are affected each year. There are many different types of stroke. Most commonly, a blood clot blocks a blood vessel and parts of the brain are damaged by the lack of oxygen. A ruptured blood vessel with bleeding into the brain tissue also occurs. It is difficult to predict who is at risk of having a stroke. A large number of risk factors have been described, each of which increases the risk of falling ill. High blood pressure is an important risk factor, as are smoking, diabetes, heart disease and atherosclerosis of the carotid arteries. Clustering of stroke within a family has also been shown to be an independent risk factor. These hereditary factors act together with the other risk factors and contribute to an increased risk of disease.

Mapping the hereditary factors that contribute to an increased risk of stroke is very difficult. Where changes in a single gene cause disease, the heritability of stroke is evident and easier to describe. These cases, however, make up only one percent of all stroke cases. Most commonly, variations in many different genes simultaneously influence the risk of stroke. There are different ways of trying to find and describe genes that contribute to disease development in stroke and other common diseases. One method is to study common diseases in countries or areas where the population has had limited mobility. When a group of people is geographically or socially isolated, the genetic variants carried by the founders spread to many individuals in the population. The number of possible gene variants contributing to an increased disease risk in such a population is smaller, and they are therefore easier to find. Several genetic studies of common diseases have used this method with great success, for example in Iceland and Finland.

In northern Sweden, too, population mobility has been relatively low, and fewer gene variants may be expected to contribute to the risk of stroke. Norrbotten and Västerbotten have also taken part in public health surveys (the WHO MONICA study) since the mid-1980s, and there is a long tradition of registering stroke cases in the region. In our study of genetic factors in stroke in northern Sweden, we used the WHO MONICA stroke registry to find participants. We examined genetic factors in families with several cases of stroke, and also in individual stroke cases compared with healthy individuals (controls). In the family study we searched for genetic factors on all chromosomes, across the whole genome, without prior assumptions. We also looked specifically at a region on chromosome 5 that was previously described by researchers in Iceland and shown to contribute to an increased risk of stroke.

In the examination of the whole genome, it was difficult to point out with certainty any chromosomal regions of particular importance for an increased risk of stroke. When we examined the region on chromosome 5, however, we succeeded in finding the same genetic linkage to stroke as previously described by researchers in Iceland. This is an important finding, showing that genes in the chromosome 5 region are of importance for stroke in northern Sweden as well. In this region, variants in a gene (the PDE4D gene) have been found in Iceland that increase the risk of stroke, probably by affecting atherosclerosis in the vessels. We analyzed these variants in the PDE4D gene in a case-control study, but could not find an increased occurrence of the risk variants among the stroke cases compared with the controls. This may mean that other gene variants or other genes in the chromosome 5 region contribute to an increased risk of stroke in the population of northern Sweden.

The continued investigation of chromosome 5 also indicated that variants in another gene (the PIK3R1 gene) may contribute to an increased stroke risk in northern Sweden. The PIK3R1 gene was first identified in a few families, in which those affected by stroke turned out to have inherited the same chromosomal region, where PIK3R1 was the only known gene. Gene variants in PIK3R1 were then analyzed in a case-control study, which showed that some gene variants contributed to an increased risk of stroke. We do not yet know whether these gene variants are important for stroke in other populations, nor do we know how these variants affect the risk of stroke. PIK3R1 has an important function inside cells and influences many different factors that may be involved in a possible disease process for stroke. Further studies are needed to better understand whether and how PIK3R1 is involved in the development of stroke.

During our work with the families we looked at how they were related to one another and found seven families that shared a common ancestor. Among these seven families there was an unusually large number of stroke cases due to hemorrhage. This extended family was analyzed separately, and a region on chromosome 9 proved to be strongly linked to an increased risk of stroke. Further studies are under way to find genes and gene variants in this region that may be of particular importance for the risk of suffering a brain hemorrhage.

Knowledge of the contribution of different gene variants to an increased risk of stroke may be important for the individual. People with many risk factors in combination with an increased genetic risk can be offered particularly extensive preventive measures to avert stroke. Knowledge of the underlying biological processes in disease development may also contribute to the development of new preventive drugs.

ACKNOWLEDGEMENTS

Many people have contributed with their time and professional skills to make this thesis possible. I really would like to express my warmest gratitude and appreciation to all of you!

In particular I want to thank:

My supervisor Dan Holmberg, I am truly grateful for the opportunity to do this thesis in complex genetics and for your contagious enthusiasm and optimism. You are a great visionary! Also many thanks to my co-supervisors and co-authors at the department of public health and clinical medicine; Birgitta Stegmayr, Tommy Olsson, Kjell Asplund and P-G Wiklund (born an entertainer!) for sharing clinical expertise and wisdom.

The GENOS and MONICA staff; in particular Britt-Inger Eklund and Britta Sjölander for their work with the family-based material, and also Åsa Johansson and Gunborg Rönnberg for keeping track of the registers.

Monica Holmberg and Gösta Holmgren (in memory) for letting me come to work in Umeå in the first glorious days of genetics.

Hans K:son Blomquist, my mentor for an inspiring year, thank you for your time and wise comments, always with an open mind!

Ann-Sofie, my dear old friend, and so many things we have shared…how come we ended up with the same supervisor?

Anna Karin and Tomas, my beloved friends and room-mates, for being true and honest in small things as well as for big thoughts in life, you are great companions for laughter (lots!) and tears.

Petter, patiently answering any question about statistics over and over again. You are such a (too) nice person, so kind and generous! You also inherited excellent Illustrator skills!

Stefan, talks a lot and talks a lot again, always funny and (too?) creative. You can really make the worst day happy!

Kurt “Wikström” – What a guy! Never afraid of testing new things (with the exception of certain foods). Thank you for proofreading manuscripts and this thesis with encouraging comments! Ingela, the queen of lists…the “Friday's fika-list” and the “to do thesis-list”. Thank you both and all the best in settling into your new Australian life!

Elin, Regina and Anna, the girls in Monica's group next door, for your explosive laughter!

Mikael, for great computer skills, we've just entered the golden era of database handling!

Birgitta B and Iris for always being helpful with all the important administrative matters.

All very friendly and very interesting past and present members of Dan Holmberg's group, medical genetics and Friday's fika participants, including: Marie, Anna L-B, Sofia, Vinicius, Nadia, Johan, Mario (in memory), Ingela B, Corrado, Maria L, Maria E, Ulrika, Åsa L, Åsa E, Susanne, Pia, Lotta, Lisbet L, Solveig, Carin, Lena, Melker, Kristina F, Carina, Björn, Anna-Lena, Kristina L, Mia, Lisbet Ä, Malin, Urban, Irina, Linda and Martin and also Björn-Anders and Fredrik.

Jenni and Kristina C for inspiring talks and walks and excellent Asian food (except for that Chinese dessert I try to forget).

The Werner family; Mimmi, Mårten, Vera, Lukas, Alvar and Sonja, my home is your home; your home is my home. You really help me not to forget the important things in life; ”keep the dish-rag soaked, never let it dry!” (Må disktrasan aldrig bli torr i handen på dig!)

My far away friends; Stina, from our small hometown street, now travelling around the world with your family…next stop Canada? Sanna, Karin, Maria and Claes, close friends from the crazy days at Karolinska. I am really looking forward to that trip to Greece!

All friends, neighbours and Ulla (our day mother for years) that make everyday life valuable and enjoyable!

My second family in the south, Kerstin, Nils-Berth, Jerker, Maria and Leo, for always being so overwhelmingly welcoming!

My parents, Birgitta and Bengt, for believing in me, letting me follow my own mind, and for your endless support! Helping out and taking care of me and my family when the days have too few hours. Maria and Erika, my sisters, their lovely kids and families; Pelle, Hampus and Lukas, David and Tuva. Thanks just for being and for coping with having me as your big sister.

My family, the precious ones that I love the most; my “Pokemon-monsters” Jacob, Johan and David, and last, but not least, my soul-mate and companion in life Bjarne!

Big hugs and thank you all!!!

REFERENCES

1. Mackay, J. & Mensah, G. The atlas of heart disease and stroke (2004). 2. Socialstyrelsen. (The national board of health and welfare, Stockholm, 2005). 3. Stegmayr, B. & Asplund, K. Stroke in Northern Sweden. Scand J Public Health Suppl 61, 60-9 (2003). 4. Lopez, A. D. & Murray, C. C. The global burden of disease, 1990-2020. Nat Med 4, 1241-3 (1998). 5. Samsa, G. P. et al. Utilities for major stroke: results from a survey of preferences among persons at increased risk for stroke. Am Heart J 136, 703-13 (1998). 6. Collins, R. & MacMahon, S. Blood pressure, antihypertensive drug treatment and the risks of stroke and of coronary heart disease. Br Med Bull 50, 272-98 (1994). 7. Bylund, E. Theoretical considerations regarding the distribution of settlement in inner north Sweden. Geografiska Annaler 53 (1960). 8. Bylund, E. Koloniseringen av Botnia-regionen (Bokförlaget Bra Böcker, Höganäs, 1994). 9. The World Health Organization MONICA Project (monitoring trends and determinants in cardiovascular disease): a major international collaboration. WHO MONICA Project Principal Investigators. J Clin Epidemiol 41, 105-14 (1988). 10. Stegmayr, B. & Asplund, K. Measuring stroke in the population: quality of routine statistics in comparison with a population-based stroke registry. Neuroepidemiology 11, 204-13 (1992). 11. Stegmayr, B. & Asplund, K. Informed consent for genetic research on blood stored for more than a decade: a population based study. Bmj 325, 634-5 (2002). 12. Stegmayr, B. & Asplund, K. [Genetic research on blood samples stored for years in biobanks. Most people are willing to provide informed consent]. Lakartidningen 100, 618-20 (2003). 13. WHO. Stroke--1989. Recommendations on stroke prevention, diagnosis, and therapy. Report of the WHO Task Force on Stroke and other Cerebrovascular Disorders. Stroke 20, 1407-31 (1989). 14. OMIM. Online Mendelian Inheritance in Man (MIM 601367) (Johns Hopkins University, Baltimore, 2006). 15. O'Brien, J. T. et al. 
Vascular cognitive impairment. Lancet Neurol 2, 89-98 (2003). 16. Warlow, C., Sudlow, C., Dennis, M., Wardlaw, J. & Sandercock, P. Stroke. Lancet 362, 1211-24 (2003). 17. Zhang, L. F. et al. Proportion of different subtypes of stroke in China. Stroke 34, 2091-6 (2003). 18. Kitamura, A. et al. Proportions of stroke subtypes among men and women > or =40 years of age in an urban Japanese city in 1992, 1997, and 2002. Stroke 37, 1374-8 (2006). 19. Wolf, P. A., Abbott, R. D. & Kannel, W. B. Atrial fibrillation: a major contributor to stroke in the elderly. The Framingham Study. Arch Intern Med 147, 1561-4 (1987).

20. Nedeltchev, K. et al. Ischaemic stroke in young adults: predictors of outcome and recurrence. J Neurol Neurosurg Psychiatry 76, 191-5 (2005). 21. Adams, H. P., Jr. et al. Classification of subtype of acute ischemic stroke. Definitions for use in a multicenter clinical trial. TOAST. Trial of Org 10172 in Acute Stroke Treatment. Stroke 24, 35-41 (1993). 22. Goldstein, L. B. et al. Improving the reliability of stroke subgroup classification using the Trial of ORG 10172 in Acute Stroke Treatment (TOAST) criteria. Stroke 32, 1091-8 (2001). 23. Hossmann, K. A. Viability thresholds and the penumbra of focal ischemia. Ann Neurol 36, 557-65 (1994). 24. Hakim, A. M. Ischemic penumbra: the therapeutic window. Neurology 51, S44-6 (1998). 25. Reith, J. et al. Body temperature in acute stroke: relation to stroke severity, infarct size, mortality, and outcome. Lancet 347, 422-5 (1996). 26. Bruno, A. et al. Acute blood glucose level and outcome from ischemic stroke. Trial of ORG 10172 in Acute Stroke Treatment (TOAST) Investigators. Neurology 52, 280-4 (1999). 27. Ruigrok, Y. M., Rinkel, G. J. & Wijmenga, C. Familial intracranial aneurysms. Stroke 35 (2004). 28. Goldstein, L. B. et al. Primary prevention of ischemic stroke: a guideline from the American Heart Association/American Stroke Association Stroke Council: cosponsored by the Atherosclerotic Peripheral Vascular Disease Interdisciplinary Working Group; Cardiovascular Nursing Council; Clinical Cardiology Council; Nutrition, Physical Activity, and Metabolism Council; and the Quality of Care and Outcomes Research Interdisciplinary Working Group. Circulation 113, e873-923 (2006). 29. Hankey, G. J. Potential new risk factors for ischemic stroke: what is their potential? Stroke 37, 2181-8 (2006). 30. Hart, R. G., Benavente, O., McBride, R. & Pearce, L. A. Antithrombotic therapy to prevent stroke in patients with atrial fibrillation: a meta-analysis. Ann Intern Med 131, 492-501 (1999). 31. Lawes, C. M., Bennett, D. 
A., Feigin, V. L. & Rodgers, A. Blood pressure and stroke: an overview of published reviews. Stroke 35, 1024 (2004). 32. Rothwell, P. M. et al. Analysis of pooled data from the randomised controlled trials of endarterectomy for symptomatic carotid stenosis. Lancet 361, 107-16 (2003). 33. Baigent, C. et al. Efficacy and safety of cholesterol-lowering treatment: prospective meta-analysis of data from 90,056 participants in 14 randomised trials of statins. Lancet 366, 1267-78 (2005). 34. Rosenson, R. S. & Tangney, C. C. Antiatherothrombotic properties of statins: implications for cardiovascular event reduction. Jama 279, 1643-50 (1998). 35. Thrift, A. G. Cholesterol is associated with stroke, but is not a risk factor. Stroke 35, 1524-5 (2004). 36. Ezzati, M., Lopez, A. D., Rodgers, A., Vander Hoorn, S. & Murray, C. J. Selected major risk factors and global and regional burden of disease. Lancet 360, 1347-60 (2002).

37. Ezzati, M. et al. Estimates of global and regional potential health gains from reducing multiple major risk factors. Lancet 362, 271-80 (2003). 38. Jakovljevic, D. et al. Socioeconomic status and ischemic stroke: The FINMONICA Stroke Register. Stroke 32, 1492-8 (2001). 39. Engstrom, G. et al. Marital dissolution is followed by an increased incidence of stroke. Cerebrovasc Dis 18, 318-24 (2004). 40. Wolf, P. A. et al. Secular trends in stroke incidence and mortality. The Framingham Study. Stroke 23, 1551-5 (1992). 41. Brown, R. D., Whisnant, J. P., Sicks, J. D., O'Fallon, W. M. & Wiebers, D. O. Stroke incidence, prevalence, and survival: secular trends in Rochester, Minnesota, through 1989. Stroke 27, 373-80 (1996). 42. Kiely, D. K., Wolf, P. A., Cupples, L. A., Beiser, A. S. & Myers, R. H. Familial aggregation of stroke. The Framingham Study. Stroke 24, 1366-71 (1993). 43. Sacco, R. L. et al. Race-ethnic disparities in the impact of stroke risk factors: the northern Manhattan stroke study. Stroke 32, 1725-31 (2001). 44. Hassan, A. & Markus, H. S. Genetics and ischaemic stroke. Brain 123, 1784-812 (2000). 45. Casas, J. P., Hingorani, A. D., Bautista, L. E. & Sharma, P. Meta-analysis of genetic studies in ischemic stroke: thirty-two genes involving approximately 18,000 cases and 58,000 controls. Arch Neurol 61, 1652-61 (2004). 46. Woo, D. et al. Genetic and environmental risk factors for intracerebral hemorrhage: preliminary results of a population-based study. Stroke 33, 1190-5 (2002). 47. Zodpey, S. P., Tiwari, R. R. & Kulkarni, H. R. Risk factors for haemorrhagic stroke: a case-control study. Public Health 114, 177-82 (2000). 48. Feigin, V. L. et al. Risk factors for subarachnoid hemorrhage: an updated systematic review of epidemiological studies. Stroke 36, 2773-80 (2005). 49. Jackson, C. & Sudlow, C. Are lacunar strokes really different? A systematic review of differences in risk factor profiles between lacunar and nonlacunar infarcts. 
Stroke 36, 891-901 (2005). 50. Ohira, T. et al. Risk Factors for Ischemic Stroke Subtypes. The Atherosclerosis Risk in Communities Study. Stroke (2006). 51. Risch, N. Genetic linkage and complex diseases, with special reference to psychiatric disorders. Genet Epidemiol 7, 3-16; discussion 17-45 (1990). 52. Hassan, A. & Markus, H. S. in Stroke Genetics (ed. Markus, H. S.) 165-196 (Oxford University Press, Oxford, 2003). 53. Brass, L. M., Isaacsohn, J. L., Merikangas, K. R. & Robinette, C. D. A study of twins and stroke. Stroke 23, 221-3 (1992). 54. Bak, S., Gaist, D., Sindrup, S. H., Skytthe, A. & Christensen, K. Genetic liability in stroke: a long-term follow-up study of Danish twins. Stroke 33, 769-74 (2002). 55. de Faire, U., Friberg, L. & Lundman, T. Concordance for mortality with special reference to ischaemic heart disease and cerebrovascular disease. A study on the Swedish Twin Registry. Prev Med 4, 509-17 (1975). 56. Gifford, A. J. An epidemiological study of cerebrovascular disease. Am J Public Health Nations Health 56, 452-61 (1966).

57. Marshall, J. Familial incidence of cerebrovascular disease. J Med Genet 8, 84-9 (1971). 58. Ostfeld, A. M., Shekelle, R. B., Klawans, H. & Tufo, H. M. Epidemiology of stroke in an elderly welfare population. Am J Public Health 64, 450-8 (1974). 59. Diaz, J. F., Hachinski, V. C., Pederson, L. L. & Donald, A. Aggregation of multiple risk factors for stroke in siblings of patients with brain infarction and transient ischemic attacks. Stroke 17, 1239-42 (1986). 60. Welin, L., Svardsudd, K., Wilhelmsen, L., Larsson, B. & Tibblin, G. Analysis of risk factors for stroke in a cohort of men born in 1913. N Engl J Med 317, 521-6 (1987). 61. Liao, D. et al. Familial history of stroke and stroke risk. The Family Heart Study. Stroke 28, 1908-12 (1997). 62. Jousilahti, P., Rastenyte, D., Tuomilehto, J., Sarti, C. & Vartiainen, E. Parental history of cardiovascular disease and risk of stroke. A prospective follow-up of 14371 middle-aged men and women in Finland. Stroke 28, 1361-6 (1997). 63. Flossmann, E., Schulz, U. G. & Rothwell, P. M. Systematic review of methods and results of studies of the genetic epidemiology of ischemic stroke. Stroke 35, 212-27 (2004). 64. Jerrard-Dunne, P., Cloud, G., Hassan, A. & Markus, H. S. Evaluating the genetic component of ischemic stroke subtypes: a family history study. Stroke 34, 1364-9 (2003). 65. Jood, K., Ladenvall, C., Rosengren, A., Blomstrand, C. & Jern, C. Family history in ischemic stroke before 70 years of age: the Sahlgrenska Academy Study on Ischemic Stroke. Stroke 36, 1383-7 (2005). 66. Schulz, U. G., Flossmann, E. & Rothwell, P. M. Heritability of ischemic stroke in relation to age, vascular risk factors, and subtypes of incident stroke in population-based studies. Stroke 35, 819-24 (2004). 67. Polychronopoulos, P. et al. Family history of stroke in stroke types and subtypes. J Neurol Sci 195, 117-22 (2002). 68. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304-51 (2001). 69. Lander, E. S. et al. 
Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001). 70. Kidd, K. K., Pakstis, A. J., Speed, W. C. & Kidd, J. R. Understanding human DNA sequence variation. J Hered 95, 406-20 (2004). 71. Shifman, S. & Darvasi, A. The value of isolated populations. Nat Genet 28, 309-10 (2001). 72. Cavalli-Sforza, L. L. & Bodmer, W. F. The genetics of human populations (W.H. Freeman and Company, San Francisco, 1971). 73. Pastinen, T. et al. Dissecting a population genome for targeted screening of disease mutations. Hum Mol Genet 10, 2961-72 (2001). 74. Bittles, A. H. & Egerbladh, I. The influence of past endogamy and consanguinity on genetic disorders in northern Sweden. Ann Hum Genet 69, 549-58 (2005).

75. Carlsson, G. & Fasth, A. Infantile genetic agranulocytosis, morbus Kostmann: presentation of six cases from the original "Kostmann family" and a review. Acta Paediatr 90, 757-64 (2001). 76. Beckman, L. & Bergenholtz, A. Epidermolysis bullosa hereditaria letalis in northern Sweden. Hereditas 80, 173-6 (1975). 77. Holmgren, G. et al. Geographical distribution of TTR met30 carriers in northern Sweden: discrepancy between carrier frequency and prevalence rate. J Med Genet 31, 351-4 (1994). 78. Lundin, G., Lee, J. S., Thunell, S. & Anvret, M. Genetic investigation of the porphobilinogen deaminase gene in Swedish acute intermittent porphyria families. Hum Genet 100, 63-6 (1997). 79. Peltonen, L. Identification of Disease Genes in Genetic Isolates. Methods 9, 129-35 (1996). 80. Dausset, J. et al. Centre d'etude du polymorphisme humain (CEPH): collaborative genetic mapping of the human genome. Genomics 6, 575-7 (1990). 81. Kong, A. et al. A high-resolution recombination map of the human genome. Nat Genet 31, 241-7 (2002). 82. Kong, X. et al. A combined linkage-physical map of the human genome. Am J Hum Genet 75, 1143-8 (2004). 83. Caspersson, T. et al. Chemical differentiation along metaphase chromosomes. Exp Cell Res 49, 219-22 (1968). 84. Murray, J. C. et al. A comprehensive human linkage map with centimorgan density. Cooperative Human Linkage Center (CHLC). Science 265, 2049-54 (1994). 85. Dib, C. et al. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380, 152-4 (1996). 86. Broman, K. W., Murray, J. C., Sheffield, V. C., White, R. L. & Weber, J. L. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am J Hum Genet 63, 861-9 (1998). 87. Terwilliger, J. D. & Ott, J. Handbook of Human Genetic Linkage (The Johns Hopkins University Press, Baltimore, 1994). 88. Morton, N. E. Sequential tests for the detection of linkage. Am J Hum Genet 7, 277-318 (1955). 89. Lander, E. & Kruglyak, L. 
Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet 11, 241-7 (1995). 90. Altmuller, J., Palmer, L. J., Fischer, G., Scherb, H. & Wjst, M. Genomewide scans of complex human diseases: true linkage is hard to find. Am J Hum Genet 69, 936- 50 (2001). 91. Grant, S. F. et al. Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nat Genet (2006). 92. Sawcer, S. et al. Empirical genomewide significance levels established by whole genome simulations. Genet Epidemiol 14, 223-9 (1997). 93. Spielman, R. S. & Ewens, W. J. The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet 59, 983-9 (1996). 94. Johnson, G. C. et al. Haplotype tagging for the identification of common disease genes. Nat Genet 29, 233-7 (2001).

95. HapMap. A haplotype map of the human genome. Nature 437, 1299-320 (2005).
96. Jensson, O. et al. Hereditary cystatin C (gamma-trace) amyloid angiopathy of the CNS causing cerebral hemorrhage. Acta Neurol Scand 76, 102-14 (1987).
97. Van Broeckhoven, C. et al. Amyloid beta protein precursor gene and hereditary cerebral hemorrhage with amyloidosis (Dutch). Science 248, 1120-2 (1990).
98. Tournier-Lasserve, E. et al. Cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy maps to chromosome 19q12. Nat Genet 3, 256-9 (1993).
99. Joutel, A. et al. Notch3 mutations in CADASIL, a hereditary adult-onset condition causing stroke and dementia. Nature 383, 707-10 (1996).
100. Joutel, A., Monet, M., Domenga, V., Riant, F. & Tournier-Lasserve, E. Pathogenic mutations associated with cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy differently affect Jagged1 binding and Notch3 activity via the RBP/JK signaling pathway. Am J Hum Genet 74, 338-47 (2004).
101. Domenga, V. et al. Notch3 is required for arterial identity and maturation of vascular smooth muscle cells. Genes Dev 18, 2730-5 (2004).
102. Evans, A. et al. MORGAM (an international pooling of cardiovascular cohorts). Int J Epidemiol 34, 21-7 (2005).
103. Van Guelpen, B. et al. Folate, vitamin B12, and risk of ischemic and hemorrhagic stroke: a prospective, nested case-referent study of plasma concentrations and dietary intake. Stroke 36, 1426-31 (2005).
104. Wiklund, P. G. et al. Plasminogen activator inhibitor-1 4G/5G polymorphism and risk of stroke: replicated findings in two nested case-control studies based on independent cohorts. Stroke 36, 1661-5 (2005).
105. Gretarsdottir, S. et al. Localization of a susceptibility gene for common forms of stroke to 5q12. Am J Hum Genet 70, 593-603 (2002).
106. Wang, D. et al. A genome-wide scan for carotid artery intima-media thickness: the Mexican-American Coronary Artery Disease family study. Stroke 36, 540-5 (2005).
107. Onda, H. et al. Genomewide-linkage and haplotype-association studies map intracranial aneurysm to chromosome 7q11. Am J Hum Genet 69, 804-19 (2001).
108. Olson, J. M. et al. Search for intracranial aneurysm susceptibility gene(s) using Finnish families. BMC Med Genet 3, 7 (2002).
109. Nahed, B. V. et al. Mapping a Mendelian form of intracranial aneurysm to 1p34.3-p36.13. Am J Hum Genet 76, 172-9 (2005).
110. Koivukoski, L. et al. Meta-analysis of genome-wide scans for hypertension and blood pressure in Caucasians shows evidence of susceptibility regions on chromosomes 2 and 3. Hum Mol Genet 13, 2325-32 (2004).
111. Wu, X. et al. An updated meta-analysis of genome scans for hypertension and blood pressure in the NHLBI Family Blood Pressure Program (FBPP). Am J Hypertens 19, 122-7 (2006).
112. Demenais, F. et al. A meta-analysis of four European genome screens (GIFT Consortium) shows evidence for a novel region on chromosome 17p11.2-q22 linked to type 2 diabetes. Hum Mol Genet 12, 1865-73 (2003).

113. Chiodini, B. D. & Lewis, C. M. Meta-analysis of 4 coronary heart disease genome-wide linkage studies confirms a susceptibility locus on chromosome 3q. Arterioscler Thromb Vasc Biol 23, 1863-8 (2003).
114. Gretarsdottir, S. et al. The gene encoding phosphodiesterase 4D confers risk of ischemic stroke. Nat Genet 35, 131-8 (2003).
115. Le Jeune, I. R., Shepherd, M., Van Heeke, G., Houslay, M. D. & Hall, I. P. Cyclic AMP-dependent transcriptional up-regulation of phosphodiesterase 4D5 in human airway smooth muscle cells. Identification and characterization of a novel PDE4D5 promoter. J Biol Chem 277, 35980-9 (2002).
116. Dominiczak, A. F. & McBride, M. W. Genetics of common polygenic stroke. Nat Genet 35, 116-7 (2003).
117. Wang, D. et al. Cloning and characterization of novel PDE4D isoforms PDE4D6 and PDE4D7. Cell Signal 15, 883-91 (2003).
118. Helgadottir, A. et al. The gene encoding 5-lipoxygenase activating protein confers risk of myocardial infarction and stroke. Nat Genet 36, 233-9 (2004).
119. Kennedy, B. P., Diehl, R. E., Boie, Y., Adam, M. & Dixon, R. A. Gene characterization and promoter analysis of the human 5-lipoxygenase-activating protein (FLAP). J Biol Chem 266, 8511-6 (1991).
120. Mehrabian, M. et al. Identification of 5-lipoxygenase as a major gene contributing to atherosclerosis susceptibility in mice. Circ Res 91, 120-6 (2002).
121. Spanbroek, R. et al. Expanding expression of the 5-lipoxygenase pathway within the arterial wall during human atherogenesis. Proc Natl Acad Sci U S A 100, 1238-43 (2003).
122. Gulcher, J. R., Gretarsdottir, S., Helgadottir, A. & Stefansson, K. Genes contributing to risk for common forms of stroke. Trends Mol Med 11, 217-24 (2005).
123. Staton, J. M. et al. Association between phosphodiesterase 4D gene and ischaemic stroke. J Neurol Neurosurg Psychiatry 77, 1067-9 (2006).
124. Song, Q. et al. Phosphodiesterase 4D polymorphisms and the risk of cerebral infarction in a biracial population: the Stroke Prevention in Young Women Study. Hum Mol Genet 15, 2468-78 (2006).
125. Zee, R. Y. et al. Polymorphisms of the phosphodiesterase 4D, cAMP-specific (PDE4D) gene and risk of ischemic stroke: a prospective, nested case-control evaluation. Stroke 37, 2012-7 (2006).
126. Brophy, V. H. et al. Association of phosphodiesterase 4D polymorphisms with ischemic stroke in a US population stratified by hypertension status. Stroke 37, 1385-90 (2006).
127. Kuhlenbaumer, G. et al. Evaluation of single nucleotide polymorphisms in the phosphodiesterase 4D gene (PDE4D) and their association with ischaemic stroke in a large German cohort. J Neurol Neurosurg Psychiatry 77, 521-4 (2006).
128. Woo, D. et al. Association of Phosphodiesterase 4D with ischemic stroke: a population-based case-control study. Stroke 37, 371-6 (2006).
129. Nakayama, T., Asai, S., Sato, N. & Soma, M. Genotype and haplotype association study of the STRK1 region on 5q12 among Japanese: a case-control study. Stroke 37, 69-76 (2006).

130. Saleheen, D. et al. Association of phosphodiesterase 4D gene with ischemic stroke in a Pakistani population. Stroke 36, 2275-7 (2005).
131. van Rijn, M. J. et al. Familial aggregation, the PDE4D gene, and ischemic stroke in a genetically isolated population. Neurology 65, 1203-9 (2005).
132. Meschia, J. F. et al. Phosphodiesterase 4D and 5-lipoxygenase activating protein in ischemic stroke. Ann Neurol 58, 351-61 (2005).
133. Nilsson-Ardnor, S. et al. Linkage of ischemic stroke to the PDE4D region on 5q in a Swedish population. Stroke 36, 1666-71 (2005).
134. Bevan, S., Porteous, L., Sitzer, M. & Markus, H. S. Phosphodiesterase 4D gene, ischemic stroke, and asymptomatic carotid atherosclerosis. Stroke 36, 949-53 (2005).
135. Lohmussaar, E. et al. ALOX5AP gene and the PDE4D gene in a central European population of stroke patients. Stroke 36, 731-6 (2005).
136. Helgadottir, A. et al. Association between the gene encoding 5-lipoxygenase-activating protein and stroke replicated in a Scottish population. Am J Hum Genet 76, 505-9 (2005).
137. Helgadottir, A. et al. A variant of the gene encoding leukotriene A4 hydrolase confers ethnicity-specific risk of myocardial infarction. Nat Genet 38, 68-74 (2006).
138. Hakonarson, H. et al. Effects of a 5-lipoxygenase-activating protein inhibitor on biomarkers associated with risk of myocardial infarction: a randomized trial. JAMA 293, 2245-56 (2005).
139. Hakonarson, H. Role of FLAP and PDE4D in Myocardial Infarction and Stroke: Target Discovery and Future Treatment Options. Curr Treat Options Cardiovasc Med 8, 183-92 (2006).
140. Rubattu, S. et al. Chromosomal mapping of quantitative trait loci contributing to stroke in a rat model of complex human disease. Nat Genet 13, 429-34 (1996).
141. Jeffs, B. et al. Sensitivity to cerebral ischaemic insult in a rat model of stroke is determined by a single genetic locus. Nat Genet 16, 364-7 (1997).
142. Simard, J. M. et al. Newly expressed SUR1-regulated NC(Ca-ATP) channel mediates cerebral edema after ischemic stroke. Nat Med 12, 433-40 (2006).
143. Vahedi, K. et al. Hereditary infantile hemiparesis, retinal arteriolar tortuosity, and leukoencephalopathy. Neurology 60, 57-63 (2003).
144. Gould, D. B. et al. Role of COL4A1 in small-vessel disease and hemorrhagic stroke. N Engl J Med 354, 1489-96 (2006).
145. Stegmayr, B., Lundberg, V. & Asplund, K. The events registration and survey procedures in the Northern Sweden MONICA Project. Scand J Public Health Suppl 61, 9-17 (2003).
146. Weinehall, L., Hallgren, C. G., Westman, G., Janlert, U. & Wall, S. Reduction of selection bias in primary prevention of cardiovascular disease through involvement of primary health care. Scand J Prim Health Care 16, 171-6 (1998).
147. Johansson, L. et al. Hemostatic factors as risk markers for intracerebral hemorrhage: a prospective incident case-referent study. Stroke 35, 826-30 (2004).
148. O'Connell, J. R. & Weeks, D. E. PedCheck: a program for identification of genotype incompatibilities in linkage analysis. Am J Hum Genet 63, 259-66 (1998).

149. Gudbjartsson, D. F., Jonasson, K., Frigge, M. L. & Kong, A. Allegro, a new computer program for multipoint linkage analysis. Nat Genet 25, 12-3 (2000).
150. McPeek, M. S. Optimal allele-sharing statistics for genetic mapping using affected relatives. Genet Epidemiol 16, 225-49 (1999).
151. Kong, A. & Cox, N. J. Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet 61, 1179-88 (1997).
152. Abecasis, G. R., Cherny, S. S., Cookson, W. O. & Cardon, L. R. Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30, 97-101 (2002).
153. Abecasis, G. R. & Cookson, W. O. GOLD--graphical overview of linkage disequilibrium. Bioinformatics 16, 182-3 (2000).
154. Barrett, J. C., Fry, B., Maller, J. & Daly, M. J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263-5 (2005).
155. Clayton, D. A generalization of the transmission/disequilibrium test for uncertain-haplotype transmission. Am J Hum Genet 65, 1170-7 (1999).
156. Chen, S. et al. Association analysis of the variant in the regulatory subunit of phosphoinositide 3-kinase (p85alpha) with Type 2 diabetes mellitus and hypertension in the Chinese Han population. Diabet Med 22, 737-43 (2005).
157. Laitinen, T. et al. Characterization of a common susceptibility locus for asthma-related traits. Science 304, 300-4 (2004).
158. Hassan, A., Sham, P. C. & Markus, H. S. Planning genetic studies in human stroke: sample size estimates based on family history data. Neurology 58, 1483-8 (2002).
159. Samani, N. J. et al. A genomewide linkage study of 1,933 families affected by premature coronary artery disease: The British Heart Foundation (BHF) Family Heart Study. Am J Hum Genet 77, 1011-20 (2005).
160. Socialstyrelsen. (The National Board of Health and Welfare, Stockholm, 2000).
161. Ruigrok, Y. M., Rinkel, G. J. & Wijmenga, C. Genetics of intracranial aneurysms. Lancet Neurol 4, 179-89 (2005).
162. Meschia, J. F. et al. The Siblings With Ischemic Stroke Study (SWISS) protocol. BMC Med Genet 3, 1 (2002).
163. Wiklund, P. G. et al. Lack of Aggregation of Ischemic Stroke Subtypes Within Affected Sibling Pairs. Neurology In press (2006).
164. Abecasis, G. R., Cherny, S. S. & Cardon, L. R. The impact of genotyping error on family-based analysis of quantitative traits. Eur J Hum Genet 9, 130-4 (2001).
165. Evans, D. M. & Cardon, L. R. Guidelines for genotyping in genomewide linkage studies: single-nucleotide-polymorphism maps versus microsatellite maps. Am J Hum Genet 75, 687-92 (2004).
166. Worrall, B. B. & Mychaleckyj, J. C. PDE4D and stroke: a real advance or a case of the Emperor's new clothes? Stroke 37, 1955-7 (2006).
167. Anderson, K. E. & Jackson, S. P. Class I phosphoinositide 3-kinases. Int J Biochem Cell Biol 35, 1028-33 (2003).
168. Terauchi, Y. et al. Increased insulin sensitivity and hypoglycaemia in mice lacking the p85 alpha subunit of phosphoinositide 3-kinase. Nat Genet 21, 230-5 (1999).
169. Almind, K. et al. Characterization of the Met326Ile variant of phosphatidylinositol 3-kinase p85alpha. Proc Natl Acad Sci U S A 99, 2124-8 (2002).

170. Simoncini, T. et al. Interaction of oestrogen receptor with the regulatory subunit of phosphatidylinositol-3-OH kinase. Nature 407, 538-41 (2000).
171. Melen, E. et al. Haplotypes of G protein-coupled receptor 154 are associated with childhood allergy and asthma. Am J Respir Crit Care Med 171, 1089-95 (2005).
172. Kormann, M. S. et al. G-Protein-coupled receptor polymorphisms are associated with asthma in a large German population. Am J Respir Crit Care Med 171, 1358-62 (2005).
173. Gudmundsson, G. et al. Localization of a gene for peripheral arterial occlusive disease to chromosome 1p31. Am J Hum Genet 70, 586-92 (2002).
174. Ogura, Y. et al. A frameshift mutation in NOD2 associated with susceptibility to Crohn's disease. Nature 411, 603-6 (2001).
175. Klein, R. J. et al. Complement factor H polymorphism in age-related macular degeneration. Science 308, 385-9 (2005).
176. Brass, L. M. & Alberts, M. J. The genetics of cerebrovascular disease. Baillieres Clin Neurol 4, 221-45 (1995).
