IDENTIFICATION OF GENETIC PREDICTORS OF RESISTANT HYPERTENSION THROUGH GENOME WIDE ASSOCIATION ANALYSIS COUPLED WITH INDUCED PLURIPOTENT STEM CELLS

By

NIHAL EL ROUBY

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017

© 2017 Nihal El Rouby

To my family: my mother Camilia; my father Mohammed El Rouby; my husband Ihab and sons Yaseen and Yousef; my brother Ameer; my sisters Iman, Sara and Heba; and my friends Amira Khalil, Maha El Badry and Amira Sirag

ACKNOWLEDGMENTS

I would like to give my sincere gratitude to my mentor, Dr. Julie Johnson, for her mentorship, guidance, and continuous support throughout my PhD training. I cannot thank her enough for providing me the opportunity to work in her lab and under her supervision. Her continuous support and mentoring helped me grow and become a better researcher. I learned from her how to think critically of all the details in the research and how to criticize my own work. I believe that any success in my career will be attributed to her continuous teaching, encouragement and support. I can say that I owe her a debt of gratitude for all she has done to help me succeed in my PhD and thereafter.

I would like to thank all my committee members for their continuous guidance and extended support along the way. I am especially grateful to Dr. Caitrin McDonough, who taught me everything about genetic analysis. She has been supportive since the very first day I joined the Johnson lab. She was always there whenever I wanted to consult her or discuss any difficulties that I encountered during my analysis. Her feedback and valuable input have certainly helped me accomplish my project successfully. I would like to thank Dr. Yan Gong for her tremendous help and support of my study, specifically the statistical aspects of the project. Special thanks to Dr. Naohiro Terada, who welcomed me into his lab and provided all the support and help I needed to learn and grasp a field that was very new to me. I would also like to thank Dr. Jefferey Harrison for his advice and insights on the CX3CR1 project. Dr. Harrison has always made time to discuss and support different aspects of my project. Special thanks to Dr. Matthew Gitzendanner for all his help, guidance and support, especially in the bioinformatics analysis. He always provided help whenever I needed it and was always willing to help me troubleshoot code that did not work. I would like to thank Dr. Jatinder Lamba for her valuable advice, input and support for my work during my PhD training. I would also like to thank Dr. Mark Wallet for his advice and input on monocyte differentiation and CX3CR1.

My sincerest gratitude goes to Dr. Taimour Langaee, Dr. Rhonda Cooper-DeHoff, Dr. Larissa Cavallari, and Dr. Reggie Frye for their support, valuable advice, and help during my PhD. Special thanks to Dr. Carl Pepine for giving me the opportunity to participate in the resistant hypertension clinic during my training, and for his continuous support.

My utmost gratitude also goes to Dr. Katherine Santostefano, who taught me everything about iPSC and wet lab experiments. Thank you for your gracious help and support throughout my project. I would also like to thank Dr. Nikolette Biel, Dr. Sonal Singh, Dr. Natalie Fredette, Dr. Noriko Watanabe, Dr. Chul Han, and Bayli DiVita for all their help with the iPSC project. They were supportive throughout my time in the Terada lab.

I would like to thank Dr. Charles Wood in the Department of Physiology and Functional Genomics for his continuous support throughout my PhD. I must acknowledge NIH grant T32HL083810 and the Department of Physiology and Functional Genomics for support of my PhD training.

I would like to extend special thanks to all present and former graduate students and postdocs in the Department of Pharmacotherapy and Translational Research, who have been a family to me. Special thanks to Dr. Mohamed Mohamed Eslam, who graciously helped me when I first came here to UF. Dr. Mohamed and his wife, Rana, have been a family to me. Special thanks to Dr. Mohamed Hossam Shahin, who has always been a brother to me. Dr. Mohamed Shahin has never hesitated to help me and answer any questions I have. Special thanks to my colleague, Carol Sa, who started her PhD journey at the same time as I did. Carol has always been there to support me during my journey. Special thanks to Dr. Issam Hamadeh, Dr. Shin-wen Chang, Dr. Mohamed Solayman, and Dr. Sonal Singh for their kindness, support and friendship. I would also like to extend my thanks to Ben Burkley, Cheryl Galloway, and Lynda Stauffer, who facilitated part of the research in this dissertation.

I would like to deeply thank my wonderful husband, Ihab, for his tremendous support, love, caring, patience and encouragement; without his support, the completion of my PhD would never have been possible. I would like to thank my sons, Yaseen and Yousef, for the joy they have created in my life. They always kept me going, as I have wanted to be a role model for them when they grow up.

I would like to offer my great and sincere thanks to my best friend Amira Khalil, who has always been a motivating force for me. Special thanks to my friends Maha El Badry and Amira Sirag, who have always supported me and made this journey pleasant. Special thanks to my friends Yasmeen Magdy, Amal Bakry, Razan Alfakir, Nadia Salloom and Souad Benchemsi for their kindness and support.

Last, but not least, I would like to thank my family members. My father, Dr. El Rouby, has been a role model for me. He taught me, by example, the importance of hard work, humility and selflessness. My mother has always showered me with unconditional love, support and encouragement. My brother and sisters have surrounded me with care and support throughout every stage of my life; without your support, I would have never accomplished any of my goals.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 GENETICS OF RESISTANT HYPERTENSION – A NOVEL PHARMACOGENOMICS PHENOTYPE

   Introduction
   Recent RHTN Pharmacogenetics Studies
   Genetic Variation in Sodium and Water Handling Pathways
   1) Epithelial Sodium Channel (ENaC)
   2) Aldosterone synthase (Cytochrome P450 11B2, CYP11B2)
   3) Cytochrome P450, family 4, subfamily A, polypeptide 11, CYP4A11
   Genetic Variation Affecting Response to Dietary Salt Reduction and Diuretics
   α-Adducin
   Guanine nucleotide-binding protein β-polypeptide 3 (GNB3)
   Neural precursor cell expressed developmentally down-regulated 4-like (NEDD4L)
   Recent Genetic Studies in RHTN
   related to vascular function
   eNOS variants and phosphodiesterase type 5 inhibitors
   RHTN Associations in Genes of Unrelated Pathways
   Need for Expansive Genetic Approaches in RHTN Pharmacogenomics
   Conclusions
   Significance

2 QUALITY CONTROL AND IMPUTATION PROCEDURES FOR THE SECONDARY PREVENTION OF SUBCORTICAL STROKES DATASET

   Introduction
   Methods
   Tools and software used for QC and imputation
   Study participants
   Secondary Prevention of Subcortical Strokes (SPS3)
   Genotyping
   Per marker QC procedures
   SNP genotyping call rate and removing SNPs below the missingness call rate
   SNP deviating from Hardy Weinberg Equilibrium (HWE)
   SNPs with low frequency and removing Monomorphic SNPs
   Per sample QC procedures
   Sample call rate
   Gender check
   Relatedness check through genome wide IBS/IBD check
   Population stratification
   Imputation
   Checking genome build, strand alignment with reference panel
   Phasing into haplotypes
   Reference panel for imputation
   Post imputation
   Association analysis using EPACTS
   Results
   SNP quality control
   Sample quality control
   Gender mismatch
   Genome wide Identity by state/Identity by descent (IBS/IBD) check
   Principal component analysis of ancestry
   Association analysis
   Post imputation quality metrics
   Discussion

3 GENOME WIDE ASSOCIATION ANALYSIS OF COMMON VARIANTS OF RESISTANT HYPERTENSION (RHTN)

   Introduction
   Methods
   Study Cohorts
   Resistant Hypertension Phenotype
   Genotyping, Quality Control, and Imputation
   Statistical Analyses
   GWAS Analysis
   INVEST–SPS3
   SNP screening and selection for replication in SPS3
   INVEST–SPS3–eMERGE
   Risk Score Analysis in INVEST and SPS3
   Results
   Clinical Characteristics of Study Participants and RHTN phenotype
   Replication of INVEST Significant SNPs in SPS3
   Validating INVEST–SPS3 SNPs in eMERGE
   Genetic Risk Score in INVEST-SPS3
   Discussion

4 IPSC AND CRISPR CAS9 – A FASCINATING TOOL BUT NOT FOR ALL PHENOTYPES

   Introduction
   Limitations of Existing In-Vitro Based-Modeling
   Advantages of induced Pluripotent Stem Cells (iPSC) Based Modeling
   iPSC Based Modeling in Monogenic and Complex Diseases
   Coupling Genome Editing with iPSC
   Successful Examples of iPSC Based Modeling in Pharmacogenomics
   Goal of The Project
   Methods
   Source of iPSC Donors
   a) Generation of iPSC from PBMC and Culture (Step 1, Flow chart, Figure 4-2)
   b) Differentiation of Unedited iPSC into Monocytes
   c) Editing (Step 2, Flow chart, Figure 4-2)
   c-i) Design of gRNA and plasmid vector
   c-ii) Nucleofection
   c-iii) Selection of edited clones
   c-iv) Collecting genomic DNA from edited iPSC
   c-v) Screening strategy for the edited allele
   1) Polymerase Chain Reaction (PCR) screen and Sanger sequencing
   2) PCR to amplify transcript from iPSC differentiated monocytes (iPSC-Mo)
   d) THP-1 culture
   e) Experiments for characterization and evaluation of monocytes
   Flow cytometry of surface markers of unedited iPSC differentiated monocytes (iPSC-Mo)
   Real time PCR to evaluate the gene expression of CX3CR1, CD14 and MAF
   Western blot to evaluate pERK and pAKT
   Chemotaxis
   Results
   Successful Generation of PEAR iPSC
   Gene Editing Results
   Monocytes Production from Unedited iPSC Confirmed by CD14
   CX3CR1 Expression in Unedited iPSC-Mo (PF0380 and PF0052)
   Comparing Expression of CX3CR1 and MAF Between Harvests in Unedited iPSC-Mo (PF0623)
   Western Blot to Detect CX3CR1, pERK and pAKT
   Chemotaxis
   Discussion
   Conclusions

5 CONCLUSIONS AND FUTURE DIRECTIONS

   Summary and Future Directions

APPENDIX

A CODES USED FOR QC AND IMPUTATION PROCEDURES

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

1-1 Summary of the most recent RHTN studies

2-1 Gender check for SPS3-GENES participants using PLINK

2-2 Genome-wide IBS/IBD analysis for SPS3 participants using PLINK

2-3 Imputation of SPS3-GENES to 1000Genomes reference panel

2-4 GWAS association summary statistics in SPS3 whites

2-5 GWAS association summary statistics in SPS3 Hispanics

3-1 Clinical characteristics of INVEST (Discovery) and SPS3 (Replication)

3-2 Top RHTN signals that met the suggestive evidence of association in INVEST whites and Hispanics (p = 1 × 10⁻⁴)

3-3 Top RHTN signals in INVEST that were evaluated for replication in SPS3

3-4 RHTN SNPs: Discovery in INVEST with replication in SPS3

3-5 Genotype frequencies and Hardy Weinberg Equilibrium of replicated / validated SNPs

3-6 Top SNPs from INVEST-SPS3 meta-analysis that were assessed for validation in eMERGE

4-1 Sequences of gRNA and primers used

LIST OF FIGURES

1-1 Multiple interrelated pathways involved in the pathogenesis of RHTN

2-1 A flowchart showing the QC steps that were taken in SPS3 dataset

2-2 Principal component ancestry analysis (PCA) of all SPS3-GENES participants

2-3 Within race PCA plot in SPS3 shows the percent of variability explained by PCs

2-4 GWAS associations with RHTN in SPS3 whites after adjusting for clinical covariates and PC1

2-5 A flowchart shows the steps and software used in the imputation process of SPS3-GENES dataset

3-1 Top SNPs from INVEST-SPS3 meta-analysis using imputed data

3-2 Flowchart showing SNP prioritization for replication in SPS3 (primary approach) or validation in eMERGE (secondary approach)

3-3 Regional plot of MSX2 association with RHTN. The rs11749255 (purple dot) is associated with RHTN (p = 3.8 × 10⁻⁸)

3-4 Adjusted odds ratios and 95% CIs for resistant hypertension risk for MSX2 rs11749255

3-5 Adjusted odds ratios and 95% CIs for resistant hypertension risk for IFLTD1 rs6487504

3-6 Adjusted odds ratios and 95% CIs for resistant hypertension risk for PTPRD rs324498

3-7 RHTN association in INVEST-SPS3 meta-analysis using genotyped data

3-8 Regional plot of BNC2 association with RHTN

3-9 Genetic risk score association with RHTN

3-10 Risk score association with resistant hypertension (RHTN): A) INVEST whites, B) INVEST Hispanics, C) SPS3 whites, D) SPS3 Hispanics

3-11 Data from GTEx portal shows that rs11749255 is an eQTL for MSX2 gene in brain anterior cingulate cortex

3-12 BNC2 functional annotation

3-13 Data from GTEx portal shows that rs16934621 is an eQTL for BNC2 gene in brain cortex

4-1 Schematic figure illustrates the CX3CR1-Fractalkine (FKN) axis

4-2 Flow chart to illustrate the planned experimental work

4-3 The guide RNA in light red font binds to the complementary target sequence (dark red font) in exon one

4-4 Timeline of transfecting iPSC colonies, selecting colonies, and screening colonies

4-5 Schematic figure illustrates the screening strategy of editing event

4-6 PCR screen and Sanger sequencing of PF0052 transfected with gRNA1

4-7 PCR screen and Sanger sequencing of PF0052 transfected with gRNA2

4-8 Differentiation of iPSC (PF0623) into monocytes

4-9 CX3CR1 relative expression of PF0052 and PF0380 compared to THP1 cells

4-10 Comparing CX3CR1 and MAF relative expression of PF0623 among 5 harvests

4-11 Chemotaxis assay of iPSC-monocytes (PF0623) in response to fractalkine (FKN)

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

IDENTIFICATION OF GENETIC PREDICTORS OF RESISTANT HYPERTENSION THROUGH GENOME WIDE ASSOCIATION ANALYSIS COUPLED WITH INDUCED PLURIPOTENT STEM CELLS

By

NIHAL EL ROUBY

MAY 2017

Chair: Julie A. Johnson
Major: Pharmaceutical Sciences

Resistant hypertension (RHTN), defined as uncontrolled blood pressure (BP) ≥140/90 mmHg using 3 or more drugs or controlled BP (<140/90 mmHg) using 4 or more drugs, is associated with adverse outcomes, including stroke, congestive heart failure, and kidney disease. Data on the genetic determinants of RHTN are currently scarce. Therefore, we aimed to identify genetic markers of RHTN and to elucidate the potential mechanisms by which the identified genetic variants may lead to RHTN.

In Aim 1, we conducted a genome-wide association analysis in 1194 white and Hispanic participants with hypertension and coronary artery disease from the INternational VErapamil-SR Trandolapril STudy-GENEtic Substudy (INVEST-GENES). Top variants associated with RHTN at p < 10⁻⁴ were tested for replication in 585 white and Hispanic participants with hypertension and subcortical strokes from the Secondary Prevention of Small Subcortical Strokes-GENEtic Substudy (SPS3-GENES). Additionally, a genetic risk score for RHTN was created by summing the risk alleles of replicated RHTN signals. Further, additional SNPs from the INVEST-SPS3 meta-analysis with p < 5 × 10⁻⁵ were tested for replication in a RHTN dataset identified using electronic health data linked to bio-repository data from the electronic Medical Records and Genomics Network (eMERGE). The SNP rs11749255 in MSX2 was associated with RHTN in INVEST (odds ratio (OR) (95% CI) = 1.50 (1.2-1.8), p = 7.3 × 10⁻⁵) and replicated in SPS3 (OR = 2.0 (1.4-2.8), p = 4.3 × 10⁻⁵), with genome-wide significance in meta-analysis (OR = 1.60 (1.3-1.9), p = 3.8 × 10⁻⁸). Other replicated signals were in IFLTD1 and PTPRD. IFLTD1 rs6487504 was associated with RHTN in INVEST (OR = 1.90 (1.4-2.5), p = 1.1 × 10⁻⁵) and SPS3 (OR = 1.70 (1.2-2.5), p = 4 × 10⁻³). PTPRD rs324498, a previously reported RHTN signal, was among the top signals in INVEST (OR = 1.60 (1.3-2.0), p = 3.4 × 10⁻⁵) and replicated in SPS3 (OR = 1.60 (1.1-2.4), one-sided p = 0.005). Additionally, a BNC2 3’UTR SNP, rs16934621, was associated with increased risk of RHTN in the INVEST-SPS3 meta-analysis (OR = 1.8, p = 1.5 × 10⁻⁵) and eMERGE (OR = 1.4, one-sided p = 0.015). Further, participants with a higher number of risk alleles were at increased risk of RHTN compared to participants with a lower number (p-trend = 1.8 × 10⁻¹⁵).

In Aim 2, we attempted to functionally validate a RHTN association with a missense SNP (rs3732378) in CX3CR1, using genetically edited, induced pluripotent stem cell (iPSC)-differentiated monocytes. Although we differentiated iPSC to monocytes, we were not able to functionally validate the association in iPSC-differentiated monocytes due to technical challenges encountered in the project, some of which were related to gene editing.

In conclusion, we identified and replicated associations with RHTN in the MSX2, IFLTD1, and PTPRD regions, and combined these associations to create a genetic risk score. If further validated, these results may provide a rationale for precision HTN treatment to reduce RHTN and reveal the underpinnings of RHTN.

CHAPTER 1 GENETICS OF RESISTANT HYPERTENSION – A NOVEL PHARMACOGENOMICS PHENOTYPE

Introduction

Hypertension (HTN) is the most common and significant modifiable chronic disease, affecting approximately one billion adults worldwide and about 80 million adults in the US.1-3 Elevated blood pressure (BP) is a major contributor to cardiovascular (CV) morbidity and mortality, including stroke and chronic kidney disease.4 These HTN-related CV complications are largely preventable if chronic BP elevation is treated and successfully reduced.5,6 However, the majority of patients with HTN worldwide do not achieve adequate BP reduction, in many cases despite the use of multiple antihypertensive medications.7 Uncontrolled BP despite treatment with multiple antihypertensive medications is referred to as resistant HTN (RHTN). The American Heart Association defines RHTN in its 2008 position statement as a BP that remains ≥ 140/90 mmHg despite the use of ≥ 3 antihypertensive medications, ideally including a thiazide diuretic, or BP < 140/90 mmHg achieved with the use of ≥ 4 medications.
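The definition above can be written as a simple decision rule. The following is a minimal sketch, assuming that BP is available as systolic and diastolic values in mmHg, that "≥ 140/90" means either component is at or above its threshold, and that the number of concurrent antihypertensive medications is known. The function name `classify_rhtn` and its parameters are hypothetical and are not taken from this dissertation; refinements such as adherence, dosing adequacy, or the presence of a diuretic are ignored.

```python
def classify_rhtn(sbp: float, dbp: float, n_drugs: int) -> bool:
    """Return True if the inputs meet the RHTN definition quoted above.

    Hypothetical helper for illustration only (names are assumptions).
    """
    # Assumption: BP counts as uncontrolled if either component is elevated.
    uncontrolled = sbp >= 140 or dbp >= 90
    if uncontrolled:
        # Uncontrolled BP despite three or more medications.
        return n_drugs >= 3
    # Controlled BP that requires four or more medications.
    return n_drugs >= 4


# Example: BP of 150/85 mmHg on three medications meets the definition.
print(classify_rhtn(150, 85, 3))
```

Under these assumptions, a patient controlled at 130/80 mmHg on three medications would not be classified as having RHTN, whereas the same BP on four medications would.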

RHTN is a complex phenotype with a multifactorial etiology driven by intricate interactions of various pathophysiological mechanisms, and it is not possible to attribute RHTN to a single cause (Figure 1-1). Primary hyperaldosteronism is more prevalent in HTN than generally recognized and is estimated to be present in 20% of patients with RHTN.8 Even in the absence of primary hyperaldosteronism, patients with RHTN demonstrate higher levels of aldosterone and volume overload than individuals without RHTN.9 Additionally, the role of sympathetic hyperactivity in RHTN development is evident and well described,10,11 and is likely interrelated with volume overload in RHTN, particularly through activation of the renin-angiotensin-aldosterone system (RAAS), which is the plausible mechanistic link between elevated aldosterone levels and heightened sympathetic activity.12 Finally, excessive vascular inflammation and endothelial dysfunction, an end result of heightened sympathetic activity and elevated aldosterone levels, may be implicated in the development of RHTN.13-15

Inter-individual variation in drug response exists, much of which is driven by genetic variation. Specifically, genetic variation can influence the pharmacokinetics and pharmacodynamics of a drug; for example, a certain genotype may affect drug-metabolizing enzymes, drug transport, or the pharmacodynamic target of the drug. Pharmacogenomics focuses on studying the genetic variations underlying inter-individual variability in drug response,16,17 with the goal of personalizing pharmacological treatment to an individual patient and facilitating the discovery of molecular targets.

A single nucleotide polymorphism (SNP), defined as a single base substitution, is considered the most frequently occurring type of genetic variation, occurring about once every 300 nucleotides, with a count roughly estimated at 10 million SNPs across the human genome. These genetic polymorphisms contribute to variability in disease risk and drug response. It is conceivable that individuals who respond variably to drugs may have different subtypes of the disease. For example, patients who demonstrate good BP reduction with one antihypertensive drug may have a different genetic and biological basis for their HTN than patients who demonstrate better BP reduction with another drug. As such, the study of genetic determinants of the different BP responses may help elucidate some of the genetic and mechanistic basis of HTN.
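As a rough consistency check on the SNP frequency quoted above, one can assume a haploid genome length of about 3 × 10⁹ base pairs (a commonly quoted approximation that is not stated in the text). One SNP per 300 nucleotides then gives:

```latex
\[
N_{\mathrm{SNP}} \approx \frac{3 \times 10^{9}\ \text{bp}}{300\ \text{bp per SNP}} = 1 \times 10^{7}
\]
```

This agrees with the estimate of roughly 10 million SNPs.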


While HTN pharmacogenomics research has advanced in recent years, as evidenced by the discovery of many genetic variants associated with response to different drug classes,18,19-22 progress has been slow with regard to RHTN genetics. Nevertheless, a handful of genetic studies have been conducted in patients with RHTN, with a focus on identifying the genetic basis of RHTN. A few other studies were conducted with the aim of evaluating a specific genotype-drug response association in patients with RHTN. In Chapter 1, we review the recent genetic and pharmacogenetic literature on RHTN (Table 1-1), highlighting studies that could potentially reveal the genetic basis of RHTN or guide treatment selection. Additionally, we discuss the potential for pharmacogenomics research in RHTN beyond the candidate gene approach, which is likely to identify additional genetic markers and advance the field.

Recent RHTN Pharmacogenetics Studies

Aldosterone exerts its sodium-retaining effects by binding to the mineralocorticoid receptor (MR) in the kidney, with subsequent translocation of the receptor to the nucleus and transcription of genes; this ultimately leads to activation of the epithelial sodium channel (ENaC).23,24 The role of aldosterone in HTN and RHTN, mediated through its sodium-retaining effects in the kidneys, is augmented by its adverse effects on the vasculature, such as inflammation, oxidative stress, vascular remodeling and endothelial dysfunction.25

ENaC plays an important role in BP regulation through sodium and volume handling.26,27 Inappropriate activation of ENaC due to genetic variants can lead to increased sodium and water reabsorption, accompanied by reduced renin secretion, and therefore a low-renin subtype of RHTN.24,28,29 Low-renin RHTN is commonly present in patients of African ancestry who harbor genetic variants in ENaC; these variants are believed to have provided a survival advantage in the sub-Saharan desert.30

Mineralocorticoid receptor antagonists (MRAs), e.g., spironolactone and eplerenone, are highly efficacious medications in RHTN; their BP-lowering effects and vascular benefits can occur even at low to moderate doses.31-33 However, MRAs may not be effective when RHTN is driven by over-activation of ENaC, leading to sodium and water retention and a low-renin HTN/RHTN phenotype. In such cases, amiloride, a diuretic that specifically blocks ENaC, is a more effective BP-lowering agent than an MRA.34 This is one clear example of how a complex phenotype such as RHTN can have sub-phenotypes that may be pathophysiologically distinct, and of how pharmacogenomics research might make it possible to guide the selection of an antihypertensive agent based on a set of genetic markers that drive the phenotype. Below, we review some of the pharmacogenetic studies that evaluated genotype-associated BP-lowering effects in RHTN.

Genetic Variation in Sodium and Water Handling Pathways

1) Epithelial Sodium Channel (ENaC)

The ENaC rs80311498 polymorphism (p.Arg563Gln or R563Q), first identified by sequencing the β-chain of ENaC in South African patients of black or mixed ancestry, was associated with low-renin, low-aldosterone HTN.28 Recently, the association of the R563Q variant with HTN and BP response to amiloride was investigated in patients from South Africa who have RHTN.35 This variant was found to be associated with HTN only in urban areas and not in Namibian or Northern Cape San, perhaps due to low dietary sodium consumption in these regions. In this study, twenty-two patients with RHTN who carried one risk allele for the R563Q variant were treated with amiloride in addition to their antihypertensive regimen. Although the investigators found that BP was significantly reduced, by an average of 36/17 mm Hg (p<0.0001), after amiloride treatment, they neither conducted a comparative analysis against a control group without the variant, nor assessed the association of the variant with RHTN versus controlled BP. Such an analysis will be important to further evaluate the association of this variant with RHTN and BP response. Further, the association needs to be evaluated in other populations. Nonetheless, this study is an example of a pharmacogenetic application in RHTN, with genotype-guided selection of antihypertensive agents.

2) Aldosterone synthase (Cytochrome P450 11B2, CYP11B2)

Aldosterone synthase (cytochrome P450 11B2, CYP11B2) is a rate-limiting enzyme in aldosterone synthesis in humans.36 Because of the role of CYP11B2 in aldosterone synthesis and its previous genetic association with HTN,37 a recent study assessed the association of the –344 C/T CYP11B2 polymorphism (rs1799998) with plasma aldosterone levels in 62 patients with documented RHTN.38 In this study, after adjusting for gender, body mass index and ambulatory BP, patients who were homozygous for the CYP11B2 variant (TT) had higher plasma aldosterone than carriers of the wild-type C allele. Additionally, patients with the TT genotype who were treated with an MRA had higher aldosterone levels than MRA-treated patients with the CT or CC genotype. The authors concluded from this association that patients with the TT genotype may be more susceptible to the aldosterone breakthrough phenomenon,39 which is characterized by an increase in aldosterone following MRA exposure. If confirmed in additional studies with larger sample sizes, this association may suggest that an MRA is not an optimal treatment for patients with RHTN who are homozygous for the –344 C/T CYP11B2 variant (TT), owing to the aldosterone escape or breakthrough phenomenon.

3) Cytochrome P450, family 4, subfamily A, polypeptide 11, CYP4A11

The CYP4A11 enzyme converts arachidonic acid to 20-hydroxyeicosatetraenoic acid (20-HETE), which is known to induce natriuresis through ENaC inhibition.40 Genetic variants with inhibitory effects on CYP4A11 may thus promote volume-dependent RHTN as a result of ENaC activation and increased sodium and water retention. The rs3890011 variant is an intronic SNP that has been associated with HTN when evaluated alone or in a haplotype analysis in a Chinese population.41 Additionally, this SNP (rs3890011) is in linkage disequilibrium with a loss-of-function SNP (rs1126742; Phe434Ser), for which associations with HTN have also been reported.42,43 Although the association of the loss-of-function SNP (rs1126742) with HTN is likely due to a deficiency of 20-HETE, the mechanism underlying the rs3890011 association is not clearly known. Therefore, Laffer et al. sought to evaluate the effect of CYP4A11 rs3890011 on response to drugs that act through the ENaC pathway, hypothesizing that this SNP is associated with increased ENaC activity and that patients with the variant would therefore be resistant to MRAs.44 In this study, eighty-three patients of African descent with plasma renin activity ≤ 2 ng/mL/hr were included. Patients were randomized to spironolactone, amiloride, spironolactone plus amiloride, or placebo. Patients with the homozygous variant genotype (CC) did not achieve a BP-lowering response to spironolactone (+6.8 ± 7.9 / +4.8 ± 8.6 mm Hg) but demonstrated BP reduction with amiloride (SBP/DBP: -6.3 ± 7.3 / -3.2 ± 4 mm Hg, p<0.01/<0.05). On the other hand, patients with the GG and GC genotypes responded similarly to spironolactone and amiloride (SBP/DBP: -9.8 ± 9.4 / -6.3 ± 6.5 for spironolactone, -10.6 ± 8.2 / -5.9 ± 6.4 for amiloride, p=0.41/0.43). BP was reduced in all three genotypes when amiloride and spironolactone were used in combination. These results suggest that patients with the homozygous variant genotype (CC) have increased ENaC activity and therefore respond to amiloride but not spironolactone. Again, this finding should be replicated in other studies with larger sample sizes and additional race groups.

Genetic Variation Affecting Response to Dietary Salt Reduction and Diuretics

Patients with RHTN often have salt sensitivity, and therefore dietary salt restriction is an especially effective non-pharmacological approach.45,46 Additionally, patients with RHTN have occult volume overload, which makes diuretics an essential component of effective RHTN management.47 Variability in response to both dietary salt reduction and diuretics exists, suggesting involvement of genetic variants in sodium-handling mechanisms. These genetic variants may confer a salt-sensitive HTN phenotype that is typically responsive to diuretics.48 Several studies have investigated the association between genetic variants that influence salt sensitivity and response to dietary salt reduction or diuretics. Because these variants may be important in RHTN, we review some of the studies that evaluated such associations with salt interventions and/or diuretics, particularly thiazide diuretics, a highly efficacious drug class in RHTN treatment.

α-Adducin

α-Adducins are cytoskeletal proteins that are directly involved in sodium handling through effects on Na+/K+-ATPase and play an active role in sodium reabsorption.49 The non-synonymous SNP (Gly460Trp) in ADD1, the gene encoding α-adducin, increases the activity of Na+/K+-ATPase, and thus sodium reabsorption. This SNP has been associated with sodium sensitivity, and carriers of the 460Trp allele had a greater reduction in mean arterial pressure two months after hydrochlorothiazide (HCTZ) treatment than patients with the wild-type genotype (-14.7 ± 2 mm Hg vs -7.4 ± 1.3 mm Hg, P=0.001).50 However, the data on the Gly460Trp variant are conflicting, and the majority of studies did not replicate this finding.22,51-53

Guanine nucleotide-binding protein β-polypeptide 3 (GNB3)

Similar to ADD1, GNB3 has been linked to salt-sensitive HTN and response to thiazide diuretics. The variant T allele of the C825T SNP was associated with lower plasma renin and pro-renin, and with higher aldosterone-to-renin ratio and DBP.54 Based on this finding, Turner et al. tested the association of the C825T SNP with response to HCTZ in a cohort of 197 African Americans and 190 whites from the Genetic Epidemiology of Responses to Antihypertensives (GERA) study.55 In that study, the variant T allele was associated with greater BP reduction in response to HCTZ, even after adjusting for clinical variables (SBP/DBP reduction = -6/-5 mm Hg, p=0.01).55 While the same association was replicated in another study from the Netherlands, in which the T allele was also associated with better BP reduction with HCTZ,56 it was not replicated by others.51

Neural precursor cell expressed developmentally down-regulated 4-like (NEDD4L)

NEDD4L encodes a ubiquitin ligase that negatively regulates ENaC, thus influencing tubular sodium reabsorption. The rs4149601 G>A polymorphism results in a cryptic splice site in NEDD4L with the generation of a protein that down-regulates more effectively than the wild type.57 As such, the G allele has been associated with salt-sensitive HTN and lower plasma renin compared to the AA genotype.57, 58 In the NORDIC Diltiazem (NORDIL) study, G carriers who were treated with either a thiazide diuretic or β-blocker had a significantly greater SBP reduction than those with the AA genotype (-19.5 mm Hg vs -15 mm Hg; p<0.001). Similarly, G carriers had a significantly greater DBP reduction than those with the AA genotype (-15.4 mm Hg vs -14.1 mm Hg; p<0.02).59 Not surprisingly, this genotype-treatment interaction had no effect on BP response to diltiazem.59 The association with thiazide diuretics was confirmed in the Pharmacogenomic Evaluation of Antihypertensive Responses (PEAR), whereby a greater reduction in BP in response to HCTZ was observed with increasing copies of the rs4149601 G allele; SBP reduction for GG, GA, AA: -12.4, -10.2 and -7.4 mm Hg (p=0.03) and DBP reduction for GG, GA, AA: -5.5, -5, and -2.2 mm Hg (p=0.068).60

NEDD4L appears to have the most evidence for association with salt sensitivity and thiazide diuretic response, and therefore it may be worthwhile to test its association with RHTN in future studies.

Recent Genetic Studies in RHTN

The previous discussion focused mostly on candidate gene approaches to understanding RHTN through the study of genotype-treatment interactions. Other studies, discussed below, focused on the genetic basis of RHTN outside of the context of treatment response.

Genes related to vascular function

Vascular inflammation and endothelial dysfunction are believed to play a role in the development of RHTN,13 which has stimulated interest in investigating polymorphisms in genes involved in vascular function, with the hypothesis that these polymorphisms may influence BP.

Nitric oxide (NO) has been a focus of HTN research due to its role in BP regulation via effects on the endothelium-derived relaxing factor activity.61 Additionally, depletion of NO is implicated in the pathogenesis of HTN.62, 63 NO is produced by NO synthase (NOS), which has three isoforms: neuronal NOS (nNOS), inducible NOS (iNOS), and endothelial NOS (eNOS). While nNOS and eNOS are constitutively expressed, iNOS is induced by inflammation or cardiac damage.64 During inflammation, NO produced by iNOS reacts with reactive oxygen species and gives rise to peroxynitrite, which leads to endothelial damage and vascular dysfunction.64, 65

Oliveira-Paula et al. assessed the association between three reported functional variants in iNOS and RHTN.66 The variants evaluated in this analysis included a missense SNP (g.2087G>A) that changes an amino acid from serine to leucine with a consequent increase in iNOS activity;67 and the microsatellite (CCTTT)n and the g-1026C>A SNP, which are located in the promoter region of iNOS and are associated with increased expression of iNOS.68-70 Moreover, the association between RHTN and the haplotype made by these three variants was evaluated. The analysis included 113 normotensive individuals, 115 with controlled HTN and 82 with RHTN. The study found that a variant genotype for the g.2087G>A SNP (AA + GA) was more frequent in the HTN and RHTN groups than in the normotensive group (OR=2.05, p=0.0016). Additionally, the haplotype (SCA) made by the short microsatellite repeat, the variant allele of g.2087G>A, and the wild-type allele of g-1026C>A was more frequent in the HTN group than in the RHTN group (OR=0.14, p=0.012). The authors suggested that the g.2087G>A variant may predispose to HTN, and that the SCA haplotype may be protective from RHTN. This study did not investigate the effect of any of the variants on antihypertensive BP response. Additionally, the variants evaluated in the study were common in both HTN and RHTN, and therefore it is not clear how to draw a conclusion about the role of these variants in RHTN.

eNOS variants and phosphodiesterase type 5 inhibitors

As described above, NO plays a key role in maintaining vascular tone and vasodilation. NO also exerts its action by stimulating soluble guanylyl cyclase, which in turn converts GTP to cGMP, leading to vascular smooth muscle relaxation and vasodilation.71 cGMP is finally broken down by phosphodiesterase type 5 (PDE5) into 5'-GMP. Thus, agents that inhibit the breakdown of cGMP, such as PDE5 inhibitors (PDE5I; e.g. sildenafil and tadalafil), can lead to cGMP accumulation and a net vasodilatory effect, and have therefore been a subject of research as potential antihypertensive agents.72 Oliver et al. conducted a proof-of-concept study to evaluate the effect of the combination of isosorbide mononitrate (ISMN) plus sildenafil on BP reduction in patients with RHTN.73 In this small study, six patients with RHTN were randomized in a double-blind, four-way crossover design to sildenafil (50 mg), ISMN (10 mg), a combination of sildenafil plus ISMN, or placebo.73 Patients were maintained on their original antihypertensive medications during the study. While each of the agents resulted in a greater BP reduction than placebo, the combination of sildenafil and ISMN produced the largest decline in BP (SBP/DBP of 26/18 mm Hg). This study suggested the effectiveness of PDE5I, either alone or in combination, in the treatment of RHTN; however, the results have not been confirmed in larger studies with chronic administration.

Since genetic polymorphisms in eNOS may affect NO availability, and thus may influence the response to drugs acting through the NO pathway, researchers sought to investigate the association of polymorphisms in eNOS with response to PDE5 inhibitors. The T786C SNP in eNOS is among the well-studied polymorphisms and has been associated with lower transcriptional level and lower levels of NO.74 Other well-studied polymorphisms in eNOS include the variable number of tandem repeats (VNTR) in intron 4 and Glu298Asp. Additionally, the haplotype made by the C allele of T786C, the Glu allele of Glu298Asp, and the 4b allele of the tandem repeats (C-Glu-b) has been associated with low nitrate and nitrite levels, suggesting low endogenous NO.75-77 The same haplotype was also associated with higher risk for HTN in adolescents and obese children.78 These polymorphisms were hypothesized to affect drug response to PDE5 inhibitors through effects on NO. Therefore, researchers evaluated the association between eNOS T786C and hemodynamic and BP responses after acute use of sildenafil in RHTN patients.79 Despite improvement in some hemodynamic properties after sildenafil administration, including mean arterial pressure, total peripheral resistance (TPR) and diastolic dysfunction parameters, there were no significant changes in nitrite or cGMP levels after treatment or by genotype.79

Currently, PDE5 inhibitors are not used or recommended for use in RHTN; however, if they gain approval for use in RHTN management, a more thorough investigation of eNOS polymorphisms and haplotypes will be important to fully understand their impact on BP reduction in RHTN.

RHTN Associations in Genes of Unrelated Pathways

The multifactorial and polygenic nature of RHTN motivated the investigation of multiple genes in BP pathways. The M235T polymorphism (rs699) in the angiotensinogen gene (AGT), the insertion/deletion polymorphism (rs1799572) in the angiotensin I-converting enzyme gene (ACE), and the Glu298Asp polymorphism (rs1799983) in eNOS were tested for association with RHTN.61 Yugar-Toledo et al. assessed the association of these three genetic variants, their interaction, and the gene-environment interactions in 70 patients with RHTN, 80 patients with controlled BP, and 70 normotensive individuals. Although the study did not find significant associations for any of the studied variants after adjustment for age, gender, body mass index (BMI), low- and high-density lipoproteins, total cholesterol, and glomerular filtration rate, the T allele of AGT M235T was associated with increased risk for RHTN only in patients older than 50 years.

Lynch et al. evaluated the association of 78 candidate polymorphisms in 2,203 participants with RHTN and 2,354 controls in the Genetics of Hypertension Associated Treatment (GenHAT) study,80 the ancillary genetic study to the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT).81, 82 After adjustment for clinical covariates and multiple comparisons, no significant genetic markers were identified in African Americans; however, rs699 (M235T) and rs5051 in AGT were associated with RHTN in Caucasians. The wild-type allele of rs699 (Met allele) and the variant allele of rs5051 (G) were associated with increased risk for RHTN; OR = 1.27 (1.12-1.44, P=0.0001) and OR = 1.36 (1.20-1.53, P<0.0001), respectively. Of note, these variants were less common in African Americans, which may explain the negative associations in African Americans. The association for AGT rs699 was, however, opposite to what was found in the analysis reported by Yugar-Toledo et al.,61 whereby carriers of the T allele were at increased risk of RHTN. This may be explained by differences in LD patterns between the populations investigated in the two studies, as the Brazilian population in the study by Yugar-Toledo et al.61 had a higher degree of admixture than the populations studied in this analysis.83

Lastly, Fontana et al. conducted an association analysis using genetic data from the Human CVD BeadChip array, which contains approximately 50,000 SNPs in ~2,000 genes involved in CV, inflammatory and metabolic processes. The associations for these SNPs were assessed in 526 patients with RHTN and 1,225 with controlled HTN from the genetic sub-study of the INternational VErapamil SR Trandolapril STudy (INVEST).84, 85 The analysis was conducted separately in European Americans and Hispanics, adjusting for predictors of RHTN in INVEST previously reported by Smith et al.,86 which included age, gender, BMI, congestive heart failure, left ventricular hypertrophy, peripheral vascular disease, percutaneous coronary intervention, stroke, and randomized treatment assignment. An intronic SNP (rs12817819) in ATP2B1 was found to be associated with RHTN in both European Americans (OR = 1.57 (1.17-2.01), p = 2.44 x 10^-3) and Hispanics (OR = 1.76 (1.27-2.44), p = 7.69 x 10^-4). Although the ATP2B1 rs12817819 SNP was not statistically significant when evaluated for replication in the Women's Ischemia Syndrome Evaluation (WISE),87, 88 the direction of association was consistent with INVEST. Additionally, the chip-wide significance level was achieved when INVEST and WISE were combined through a meta-analysis (OR = 1.65 (1.36-1.95), meta-analysis p-value = 1.60 x 10^-6). ATP2B1 has been associated with HTN in genome-wide89 and gene-centric studies90 and encodes a plasma membrane calcium/calmodulin-dependent ATPase, which is involved in intracellular calcium homeostasis and smooth muscle cell contraction. This is the most comprehensive candidate genetic analysis conducted to date for RHTN.
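The pooling step behind such a meta-analysis can be sketched with a fixed-effect, inverse-variance calculation. The sketch below is an illustration only: because the WISE estimate is not given in this text, it pools the two ancestry-specific INVEST estimates quoted above, so it is not the INVEST plus WISE meta-analysis that was reported, and the function names are our own. Standard errors are recovered from the published 95% confidence intervals, which assumes the intervals are symmetric on the log-odds scale.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(odds_ratio), se


def fixed_effect_meta(estimates):
    """Inverse-variance weighted pooling of (log_or, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return pooled, pooled_se, p_value


# Illustration only: the two INVEST strata for rs12817819 quoted in the text.
studies = [
    log_or_and_se(1.57, 1.17, 2.01),  # European Americans
    log_or_and_se(1.76, 1.27, 2.44),  # Hispanics
]
beta, se, p = fixed_effect_meta(studies)
low, high = math.exp(beta - Z95 * se), math.exp(beta + Z95 * se)
print(f"pooled OR = {math.exp(beta):.2f} ({low:.2f}-{high:.2f}), p = {p:.2e}")
```

The pooled estimate falls between the two input odds ratios and has a smaller standard error than either, which is the property that lets a meta-analysis reach a significance level that neither cohort reaches alone.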

Need for Expansive Genetic Approaches in RHTN Pharmacogenomics

In Chapter 1, we reviewed the most recent studies conducted to evaluate the genetic basis of RHTN, some of which evaluated the antihypertensive response associated with a certain genotype in patients with RHTN. These studies mostly took a candidate gene approach, with a focus on polymorphisms in a few genes related to sodium/water handling or vascular tone regulation, both of which are essential for BP control. The most recent studies, however, were more comprehensive in that they studied multiple genes, which is more appropriate for a phenotype as complex as RHTN. It is obvious from this review that, despite the recent efforts to study the genetics of RHTN, there has not been sufficient advancement in pharmacogenetics as it relates to RHTN, and that almost none of the evaluated polymorphisms were successfully replicated, even in the most comprehensive candidate studies to date. This is not surprising for candidate gene studies. Additionally, with the narrow scope of candidate gene studies, it is not possible to gain insights into the genetic basis of RHTN or to identify new molecular or therapeutic targets. Clearly, additional work is warranted to replicate some of the previously tested signals and to identify more genetic markers for RHTN through expansive genomics approaches that survey the association of variants across the whole genome with RHTN. Examples include genome wide association analysis (GWAS) and next generation sequencing studies.

The continuous international efforts to catalogue common and rare human genetic variation, represented by the International HapMap Project,91, 92 the 1000 Genomes Project,93, 94 and the UK10K project,95 will continue to provide the scientific community with a wealth of information that can be used to reveal the genetic basis of complex diseases and phenotypes. Additionally, the availability of more comprehensive genotyping platforms (the newer genotyping chips cover ~5 million SNPs) and the decreasing cost of sequencing methods will make it possible to generate genome-wide information for large cohorts of patients, which can be used for association analyses to reveal genetic determinants of RHTN.

The implementation of GWAS analyses in antihypertensive pharmacogenomics has led to successful discoveries of novel genetic loci for BP response to individual drug classes.18, 20, 22 Although the application of GWAS in RHTN may seem complicated owing to the presence of multiple drugs, which defines RHTN, it can be conducted by creating a cohort of patients with RHTN and patients with easily controlled HTN (non-RHTN), followed by testing the association of common genetic variants with RHTN. This association may be conducted separately in each drug class. While performing the GWAS analysis by drug class has the potential to identify genotypes that may portend increased or reduced risk of RHTN within a specific drug class, it can significantly decrease the number of patients and thereby reduce the power to identify significant associations. The presence of large cohorts or consortia with data on medication use and BP response may make this type of analysis feasible.
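The case-control comparison described above can be illustrated for a single SNP with a simple allelic test. This is a minimal sketch under stated assumptions: the genotype counts are invented for illustration, the function name is hypothetical, and a real GWAS would typically fit a covariate-adjusted logistic regression for every SNP rather than this unadjusted test.

```python
import math


def allelic_test(case_genotypes, control_genotypes):
    """Allelic association test for one biallelic SNP.

    Each argument is a (hom_ref, het, hom_alt) tuple of genotype counts.
    Returns the odds ratio for the alt allele, its 95% CI, and a p-value.
    """
    def allele_counts(genotypes):
        hom_ref, het, hom_alt = genotypes
        return 2 * hom_alt + het, 2 * hom_ref + het  # (alt, ref)

    a, b = allele_counts(case_genotypes)     # RHTN cases: alt, ref
    c, d = allele_counts(control_genotypes)  # non-RHTN controls: alt, ref
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf SE of log(OR)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se),
          math.exp(math.log(odds_ratio) + 1.96 * se))
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square with 1 df
    return odds_ratio, ci, p_value


# Hypothetical counts: 500 RHTN cases and 1,200 controlled-HTN controls.
or_alt, ci_alt, p_alt = allelic_test((300, 170, 30), (800, 370, 30))
print(f"OR = {or_alt:.2f} ({ci_alt[0]:.2f}-{ci_alt[1]:.2f}), p = {p_alt:.1e}")
```

Splitting the same cohort by drug class would shrink every count passed to this function, which widens the confidence interval and illustrates the loss of power noted above.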

Ideally, the GWAS analysis should be performed using a dataset that was designed for the purpose of evaluating RHTN. An example of such a dataset is the Resistant Hypertension Optimal Treatment Trial (ReHOT).96 ReHOT is a prospective clinical trial designed to evaluate the prevalence of RHTN and the optimal fourth antihypertensive added to an optimized three-drug regimen in patients with stage II HTN recruited from 26 sites in Brazil. Adherence to drug therapy was confirmed by pill count, and ambulatory blood pressure monitoring (ABPM) was used to determine RHTN after maximum titration of the three drugs (diuretic: chlorthalidone; angiotensin converting enzyme inhibitor: enalapril, or angiotensin receptor blocker: losartan; and calcium channel blocker: amlodipine). Patients with confirmed RHTN were randomized in an open-label fashion to either an MRA (spironolactone) or a sympatholytic (clonidine). The primary outcome was defined as effective BP reduction assessed by office BP and ABPM after 12 weeks of drug treatment. Several biological samples, including DNA, were collected to determine markers associated with RHTN and response to drugs. While this dataset, or datasets with a similar design, are considered optimal for testing RHTN genetic associations, such datasets are rare. An alternative approach would be to create RHTN case-control cohorts within existing, well-designed HTN clinical trials. Herein we use the latter approach.

GWAS analysis assesses the associations for millions of SNPs across the genome, and therefore the chances of false positive associations are considerably high, necessitating replication of the identified associations in similar independent cohorts.
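The scale of the multiple-testing problem can be made concrete with a short calculation. The sketch below assumes one million independent tests, a round number chosen for illustration rather than a figure from this study, and applies a Bonferroni correction; the function names are our own.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of null SNPs passing an uncorrected threshold."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


n_tests = 1_000_000  # assumed number of independent tests
print(expected_false_positives(0.05, n_tests))  # null SNPs expected at p < 0.05
print(bonferroni_threshold(0.05, n_tests))      # corrected per-SNP threshold
```

Under these assumptions, an uncorrected p < 0.05 would be expected to flag about 50,000 SNPs by chance alone, whereas the corrected threshold is 5 x 10^-8, the value conventionally used for genome-wide significance.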

Generally, the identification of appropriate replication cohorts with a phenotype well matched to the discovery cohort is considered a major challenge for successful replication of many GWAS signals. For RHTN, it may be even more daunting to find replication cohorts with a similar phenotype, in part due to the heterogeneity of RHTN.

The lack of a consensus definition for RHTN among different studies may present another challenge, further emphasizing the need for harmonization of the RHTN phenotype. To address the need for increased sample size and suitable replication cohorts, researchers in the same discipline are establishing collaborative efforts in the form of large consortia that can facilitate genomic analyses, utilizing the rich clinical and genetic data generated by research groups. Several pharmacogenomics consortia currently exist and have documented the ability to overcome the shortcomings of individual cohort efforts through these collaborations. The International Consortium for Antihypertensive Pharmacogenomics Studies (ICAPS: https://icaps-htn.org/) is one example, and represents an ongoing collaborative effort among research groups that own genetic and clinical data on BP response, CV outcomes and adverse metabolic effects of antihypertensive medications from small and large clinical trials, as well as epidemiologic studies. Within ICAPS, there is potential for large-scale RHTN GWAS analyses utilizing existing HTN treatment trials with genome-wide data and constructed RHTN phenotypes, including INVEST,85 Secondary Prevention of Small Subcortical Strokes (SPS3),97, 98 and the Anglo-Scandinavian Cardiac Outcomes Trial (ASCOT).99

The ultimate goal of the pharmacogenomics approach is to translate genetic discoveries into clinically actionable information that can be used alone or with clinical predictors to customize treatment to an individual patient, as opposed to the empirical "trial and error" approach commonly used in medicine. In RHTN, this can ultimately optimize BP control and avoid long periods of uncontrolled BP that can lead to adverse cardiovascular outcomes. However, this goal is often hindered by various challenges, including but not limited to a lack of replicable variants owing to small sample sizes; differences in phenotypes; and a lack of understanding of the mechanistic basis of the identified genetic signals. While the first two challenges could be overcome to some extent through large pharmacogenomics consortia, the third challenge remains to be addressed. To further advance the use of pharmacogenomics in RHTN, we set out to undertake a GWAS approach using clinical and genome-wide data from two clinical trials that were available to us, in which we created RHTN datasets for discovery and replication of genetic associations.

Most of the associations discovered and replicated through GWAS and sequencing approaches do not have an obvious biological link to the phenotype. Therefore, after discovery and replication of the genetic associations, additional molecular work is usually needed to understand mechanistically how the associations may lead to the phenotype. This may ultimately help in identifying new molecular targets, another promise of pharmacogenomics. Animal-based studies have generally been applied in disease modeling; however, they may not be suitable for revealing the biological role of human genetic variants, owing to genetic differences between species as well as other physiological and developmental differences. Other in vitro modeling approaches, such as human primary cell lines and immortalized cell lines, are also used, but have limitations related to finite supply and the inaccessibility of some tissues such as heart, brain and vasculature. For immortalized cell lines specifically, issues related to changes in phenotype during transformation may arise. To this end, the use of induced pluripotent stem cells (iPSC), a state-of-the-art technology, offers promise in modeling the functional consequences of discovered genetic associations.100 Using iPSC technology, a somatic cell from a specific patient can be reprogrammed into a pluripotent state, with the potential to differentiate into a relevant cellular model for the phenotype under study.101 For HTN/RHTN pharmacogenomics, iPSC from patients who have HTN and carry the genetic variant of interest can be differentiated into the appropriate tissue for the phenotype, for example cardiomyocytes, neurons, endothelial cells, vascular smooth muscle cells, or monocytes. These iPSC-differentiated cells should retain the genomic and biochemical characteristics of the specific patients, and can therefore serve as a platform to evaluate the molecular consequences of the genetic variant at the cellular level, as well as a platform for screening drug compounds.

While this technology has made strides in the field of genomics, especially in modeling monogenic diseases, its use in complex disease and pharmacogenomics is only starting to progress. Therefore, we hoped to advance the use of iPSC modeling in pharmacogenomics by performing a pilot study to validate genetic association(s) from our GWAS studies.

Conclusions

In this dissertation, we sought to identify genetic determinants of RHTN by performing GWAS analysis using available clinical and genetic data from two outcomes–based clinical trials in our lab. Additionally, we set out to validate the molecular consequences of identified GWAS variant(s) and demonstrate the use of iPSC in pharmacogenomics. The aims of this project include:

Aim 1: Identify the relationship between genetic markers and RHTN in patients treated with antihypertensive medications for BP control through

Aim 1a: performing a genome wide association study (GWAS) in a hypertensive population treated with antihypertensive medications in a clinical trial setting (Discovery phase), followed by replicating the association results in another independent hypertensive population treated with antihypertensive medications in another clinical trial setting (replication phase).

Aim 1b: performing joint meta-analysis between GWAS results of discovery and replication phases followed by replicating the results in other hypertensive study populations.

Aim 2: Understand the mechanistic role and functions of the genetic variant(s) discovered in the association analysis by performing molecular studies using induced pluripotent stem cells (iPSC) that can give rise to various cell lineages relevant to the BP response phenotype.

Significance

To date, few studies have been conducted to identify genetic predictors of RHTN, and most were candidate gene studies focusing on known biological pathways of BP regulation. Genome-wide association studies (GWAS) allow for unbiased discovery of genetic variants associated with the phenotype, but such studies have not been conducted for RHTN. We consider RHTN to be a pharmacogenetic phenotype, because RHTN is defined solely by the pattern of drug response the patient exhibits. Therefore, the moderate to large effect sizes observed in drug response phenotypes, like RHTN, are likely to provide great potential for clinical use of the genetic information to guide therapy, and to allow genetic discoveries to be made with much smaller sample sizes than are required for disease genetics.

We anticipate that discovering genetic variants associated with RHTN through the GWAS approach will be important for the following reasons. First, if sufficient risk for RHTN could be explained through models or algorithms that incorporate clinical and genetic factors, it may be possible to identify those patients at greatest risk for RHTN, and thus a personalized treatment approach (potentially more aggressive) could be developed a priori, allowing BP to be controlled more rapidly, which has been shown to lead to better clinical outcomes. Second, our GWAS studies have the potential to provide a greater understanding of the mechanisms underlying RHTN. Finally, it is possible that novel pathways of HTN/RHTN are identified, leading to new drug targets. To our knowledge, this is the first GWAS conducted to investigate the relationship between genetic factors and RHTN using clinical trial data. We anticipate that we will set the stage for replication of GWAS signals through ICAPS, and for further GWAS meta-analysis. This will likely identify more genetic predictors of RHTN.

The proposal to perform molecular studies using iPSC-based model, a highly innovative technique, will provide the scientific community with a practical and detailed example for using iPSC as a biological model for the validation of pharmacogenomic findings, and will reveal some of the challenges that researchers should be aware of before setting up iPSC based studies. This will hopefully allow for successful applications of iPSC in pharmacogenomics.

Table 1-1. Summary of the most recent RHTN studies

Study | Subjects | Gene/Candidate SNP | Findings/Conclusion

Pharmacogenomic studies related to aldosterone and aldosterone pathways
Jones ES et al.[35] | 1468 HTN, 471 controls from three different ethnic groups | ENAC (R563Q variant) | R563Q variant may be involved in HTN pathogenesis; patients with RHTN and the variant may benefit from amiloride treatment
Fontana et al.[38] | 62 RHTN from Brazil | CYP11B2 (-344C/T) | Spironolactone may not be the preferred treatment for the -344 TT genotype
Laffer et al.[44] | 83 African Americans with volume dependent RHTN randomized to spironolactone, amiloride, combination of spironolactone and amiloride, or placebo | CYP4A11 (rs3890011) | rs3890011 may be associated with increased ENAC activity; spironolactone may not be effective for the rs3890011 CC genotype

Genetic studies related to vascular inflammation
Oliveira-Paula et al.[66] | 113 normotensives, 115 controlled HTN, 70 RHTN from Brazil | iNOS variants: g.2087G>A, microsatellite (CCTTT)n, g-1026C>A; iNOS haplotype: (CCTTT)n, g-1026C>A, g.2087G>A | AA+GA variant of g.2087 was associated with HTN; SCA haplotype was associated with a protective effect against RHTN

Genetic studies in unrelated genes
Yugar-Toledo et al.[61] | 70 normotensives, 80 controlled HTN, 82 RHTN from Brazil | AGT (M235T), ACE (ACE I/D), NOS3 (Glu298Asp) | The T allele of M235T was associated with RHTN in patients older than 50
Lynch et al.[80] | 2203 RHTN, 2354 controlled HTN from ALLHAT trial | 78 candidate SNPs | The Met allele of rs699 and the G allele of rs5051 were associated with RHTN in whites
Fontana et al.[84] | 1225 controlled HTN, 516 RHTN from INVEST | ~50,000 SNPs in 2000 genes from IBC chip | ATP2B1 was associated with RHTN

Figure 1-1. Multiple interrelated pathways involved in the pathogenesis of RHTN. An overactive sympathetic nervous system may activate the renal sympathetic efferents, with subsequent activation of the RAAS. Excessive aldosterone promotes vascular inflammation and remodeling and may also trigger activation of the sympathetic nervous system. Vascular inflammation driven by excessive aldosterone or an activated RAAS may contribute to endothelial dysfunction and arterial stiffness, leading to RHTN. The double-headed arrows highlight the interplay of several pathogenic mechanisms in RHTN.

CHAPTER 2 QUALITY CONTROL AND IMPUTATION PROCEDURES FOR THE SECONDARY PREVENTION OF SUBCORTICAL STROKES DATASET

Introduction

Genome wide association analysis (GWAS) is used to identify genetic variants associated with complex diseases and pharmacogenomic phenotypes, e.g. drug response to antihypertensive agents.22, 102-104

Unlike candidate gene studies, where a limited set of genes and/or variants is studied, GWAS interrogates the whole genome, relying on commercially available arrays.105, 106 These arrays are designed to interrogate the whole genome by capturing common single nucleotide polymorphisms (SNPs), on the order of hundreds of thousands to millions of variants; these SNPs tag other variants with which they are in linkage disequilibrium (LD) and travel in haplotype blocks more often than expected by chance.107 While these arrays are designed to ensure coverage of the whole genome, the risk of false positive findings is greatly increased owing to the multiple testing of markers, and this risk is amplified if GWAS analyses are conducted without properly cleaned datasets. This not only affects downstream analyses but may also waste time, effort and resources if other researchers try to replicate the reported false associations.
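How strongly a genotyped SNP tags a neighboring variant is commonly summarized by the squared correlation r² between the two loci. The following is a minimal sketch that assumes phased haplotypes coded as 0/1 allele pairs; the function name and toy data are illustrative, and estimating LD from unphased genotype data requires additional steps not shown here.

```python
def ld_r2(haplotypes):
    """Squared correlation (r^2) between two biallelic loci.

    haplotypes: list of (allele_at_snp_a, allele_at_snp_b) pairs, coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


# Toy example: the two alleles always travel together, so SNP A tags SNP B.
tagged = [(1, 1)] * 3 + [(0, 0)] * 7
print(round(ld_r2(tagged), 6))
```

An r² close to 1 means that genotyping one SNP captures nearly all the information about the other, which is why an array of common SNPs can represent many untyped variants.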

With the generation of multidimensional data comes the need for rigorous procedures in which systematic approaches are clearly outlined and defined to allow the generation of high-quality genetic and clinical data. For genome-wide genotyping data, these quality control (QC) procedures require the reconciliation of both genetic and demographic/clinical data of participants to identify non-obvious problems such as batch effects, sample mix-ups and population stratification, all of which can lead to errors.

Genome wide imputation is a process by which unobserved genotypes are inferred by statistical methods, and it has become the gold standard prior to performing GWAS analyses, especially when multiple cohorts are involved.108-112 Genotype imputation is based on the concept that unrelated individuals inherit stretches of haplotypes from common ancestors.113 Once these stretches of haplotypes are accurately determined using genotyped SNPs, other untyped SNPs on these haplotype blocks can be inferred, or imputed, using reference panels with densely genotyped SNP data, such as the HapMap populations91 and 1000 Genomes.94, 114
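The core idea, copying untyped alleles from a reference haplotype that matches at the typed sites, can be shown with a deliberately simplified sketch. This toy version picks the single closest reference haplotype; production tools such as those used in this chapter rely on hidden Markov models that copy from a mosaic of reference haplotypes and report genotype probabilities, so the function below is an illustration of the concept and not of any tool's algorithm.

```python
def impute_haplotype(study_hap, reference_panel):
    """Fill untyped sites (None) by copying from the closest reference haplotype.

    study_hap: list of alleles (0/1) with None at untyped SNPs.
    reference_panel: list of fully typed haplotypes of the same length.
    """
    typed = [i for i, allele in enumerate(study_hap) if allele is not None]

    def mismatches(ref_hap):
        # Count disagreements at the sites genotyped in the study sample.
        return sum(study_hap[i] != ref_hap[i] for i in typed)

    best = min(reference_panel, key=mismatches)
    return [best[i] if allele is None else allele
            for i, allele in enumerate(study_hap)]


# Hypothetical reference panel of three haplotypes over five SNPs.
panel = [
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 1, 1],
]
# The study haplotype is typed at SNPs 0, 2 and 4 only.
print(impute_haplotype([1, None, 0, None, 0], panel))
```

Here the study haplotype matches the second reference haplotype at all typed sites, so the two untyped alleles are copied from it.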

Imputation can improve GWAS in two ways: 1) it allows the inclusion of many more variants in the analysis, and 2) it allows meta-analyses to be conducted between large cohorts through the harmonization of data when multiple cohorts are genotyped on different platforms.115, 116 Additionally, through the inclusion of dense markers, imputation allows fine mapping of association results, potentially homing in on the functional or causal variant(s) within regions of association.117-119 Earlier imputation methods used HapMap populations as reference panels; however, it has been shown that imputation using the 1000 Genomes reference panel outperforms imputation using HapMap reference panels, mainly due to the increased depth of coverage of variants, with greater than 40 million variants being assayed in >2,500 individuals from 14 different ancestral populations.120 Additionally, it has been shown that multi-population reference panels, as in the 1000 Genomes Project, perform better than single-population reference panels in imputing low-frequency or rare variants.120

Purpose of Chapter 2: To detail and document the QC steps performed to generate a high-quality dataset prior to performing imputation procedures and association analyses. We provide the rationale for the different approaches or steps we have undertaken. This dataset was created with the intention of performing GWAS in resistant hypertension (RHTN); however, it can be used for any other phenotype. We carried out steps to impute the genetic sub-study of the Secondary Prevention of Small Subcortical Strokes (SPS3-GENES)97, 98 to allow us to perform GWAS meta-analysis of RHTN between SPS3 and the INternational VErapamil SR-Trandolapril STudy GENEtic Substudy (INVEST-GENES).84, 85 This imputed dataset is made available for GWAS analysis or meta-analysis with other phenotypes, and in fact the imputed SPS3 dataset is being used in multiple other genetic association analyses within the Stroke Genetics Network (SiGN).121 Chapter 2 also covers the steps for association analysis of the imputed dataset. Association and replication results will be covered in Chapter 3. This may provide the scientific community with a guide for QC steps, imputation and post-imputation association analysis. We recommend performing the QC steps on a "per-SNP" basis prior to conducting QC steps on a "per-sample" basis, in order not to compromise the power of associations by removing samples that did not perform well because of low-performing SNPs. We frequently refer to other QC protocols that are outlined in these references.122, 123 All the code used to produce this dataset is provided in Appendix A.


Methods

Tools and software used for QC and imputation

PLINK v1.07,124 EIGENSOFT,125 SHAPEIT v2.5.5, Minimac3 v1.0.13,126 and EPACTS (http://genome.sph.umich.edu/wiki/EPACTS)

Study participants

Secondary Prevention of Small Subcortical Strokes (SPS3)

We used genetic data for participants in the genetic sub-study of SPS3 (SPS3-GENES). In brief, SPS3 was an international, multi-center clinical trial that recruited participants with a history of subcortical (lacunar) strokes.97, 98 SPS3 investigated the optimal blood pressure goal and antiplatelet regimen. Patients were randomized to a normal systolic BP (SBP) goal of 130-149 mm Hg or a low SBP goal of less than 130 mm Hg, and to a dual antiplatelet regimen (aspirin + clopidogrel) or a single antiplatelet regimen (aspirin + placebo).

Genotyping

SPS3 DNA samples underwent genotyping using the Infinium HumanOmni5-4 v1.1 BeadChip (Illumina, San Diego, California, USA); 960 samples were genotyped at the NIH-supported Center for Inherited Disease Research (CIDR) and an additional 100 samples were genotyped at AKESOgen, Inc (Norcross, GA, USA). The initial dataset included 1060 patients and 4,511,703 SNPs. The following QC steps were performed after the proper calling of the genotypes.

Per marker QC procedures

This involves removing low-performing markers, which, if left in the final SNP dataset, can lead to false associations. These markers include 1) markers with excessive genotyping failure (low call rates) across all samples, 2) SNPs with low frequency, and 3) SNPs that are out of Hardy-Weinberg Equilibrium (HWE).

SNP genotyping call rate and removing SNPs below the missingness call rate

SNP genotyping call rate is a quality metric that measures the percentage of samples that can be called with certainty for each SNP. The call rate threshold is set by each study; however, there should always be a balance between a strict threshold that yields high-quality data and the number of SNPs removed from the study, which may cause important trait-causing variants to be missed. The SNP genotyping call rate threshold is usually set between 95% and 99%.127, 128 Here, we used PLINK124 to remove SNPs with low genotyping efficiency, defined as a call rate of less than 95%.
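The call-rate filter described above can be illustrated with a minimal Python sketch. This is not the PLINK implementation; the function names and the toy genotype vector are hypothetical and serve only to show how a 95% threshold is applied per SNP.

```python
def snp_call_rate(genotypes):
    """Fraction of samples with a non-missing call for one SNP.

    `genotypes` holds one entry per sample; None marks a missing call.
    """
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def passes_call_rate(genotypes, threshold=0.95):
    # Keep the SNP only if its call rate meets the chosen threshold.
    return snp_call_rate(genotypes) >= threshold


# Hypothetical SNP typed in 20 samples, 2 of which failed to call.
example = [0, 1, 2, 0, 1, None, 0, 0, 1, 2, 0, 1, 0, None, 0, 1, 2, 0, 0, 1]
print(snp_call_rate(example))     # 0.9
print(passes_call_rate(example))  # False
```

Raising the threshold toward 99% removes more SNPs, which is the trade-off between data quality and marker retention noted above.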

SNP deviating from Hardy Weinberg Equilibrium (HWE)

Hardy-Weinberg Equilibrium (HWE) is an important metric in evaluating the quality of a genotyped variant and should always be checked thoroughly. Under HWE, allele and genotype frequencies are expected to remain constant in the population from generation to generation unless evolutionary or selection forces act upon the SNPs.129 Deviations from HWE in large, randomly mating populations with no evidence of selection or inbreeding may signal failed genotyping or genotype-calling errors. In some instances, deviations from HWE may indicate selection and association with trait-causing variants, and therefore removing these SNPs may cause potential causal variants or true associations to be missed.130 Here, we did not exclude SNPs based on a significance threshold for deviation from HWE; rather, we checked HWE for our top associated SNPs and flagged those deviating from HWE (P ≤0.0001). In cases where a pruned, high-quality SNP dataset was needed for other downstream procedures, e.g. principal component analysis (PCA), we filtered out SNPs deviating from HWE (P ≤0.0001).
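To make the HWE check concrete, the sketch below computes a 1-degree-of-freedom chi-square test from genotype counts and applies the P ≤0.0001 flag used here. It is an approximation for illustration only: PLINK reports an exact test, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) P-value for deviation from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts exactly in HWE proportions (p = q = 0.5).
print(hwe_chi2_pvalue(250, 500, 250))  # 1.0

# Hypothetical counts with a strong heterozygote deficit: flagged at P <= 0.0001.
print(hwe_chi2_pvalue(400, 200, 400) <= 1e-4)  # True
```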

SNPs with low frequency and removing Monomorphic SNPs

Removing SNPs with low frequency is important for two reasons. First, the small number of heterozygotes and rare homozygotes makes it hard to call these low-frequency variants accurately,131 and any association with these variants is usually driven by a very small number of individuals. Second, GWAS analyses are not powered to detect associations with low-frequency or rare variants.132 For these two reasons, it is recommended to eliminate low-frequency variants (e.g. MAF below 1-3%) prior to GWAS analysis, in addition to monomorphic SNPs, i.e. SNPs that exist in only one allelic form.

We used PLINK124 to output monomorphic SNPs, which were subsequently removed from the dataset. To identify monomorphic SNPs, we calculated the allele frequency that corresponds to a single heterozygous carrier in the dataset, i.e. 1/(2 x number of subjects). We used this calculated minor allele frequency (MAF) as a cut-off to filter out monomorphic SNPs. We eliminated low-frequency SNPs with a MAF below 3% prior to association analysis with genotyped SNPs.
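The cut-off described above is simple arithmetic, sketched here in Python. The function name is hypothetical, and using all 1060 initially genotyped samples as the number of subjects is an assumption made for illustration; the number of subjects at the time of filtering may differ.

```python
def single_heterozygote_maf(n_subjects):
    """Allele frequency of a variant carried by exactly one heterozygote."""
    # One minor allele among 2 * n_subjects chromosomes.
    return 1 / (2 * n_subjects)


# Assumed sample count: the 1060 samples in the initial dataset.
cutoff = single_heterozygote_maf(1060)
print(round(cutoff, 6))  # 0.000472

# Any SNP with MAF below this cut-off has no minor allele copies at all,
# i.e. it is monomorphic in this dataset.
print(0.0 < cutoff)  # True
```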

Per sample QC procedures

This involves a few more steps than the “per-marker QC”, and some of the steps are performed using high-quality SNP data, filtered on MAF and HWE. These steps include identification of 1) samples with a discrepancy between demographic and SNP-determined gender; 2) samples with excessive missing genotypes across all SNPs; 3) duplicated samples or samples with familial relationships; and 4) samples with disparate ancestral background.


Sample call rate

Sample call rate measures the proportion of SNPs that are successfully called per sample. A high missingness rate is indicative of poor or low-quality DNA, and samples with high missingness should be excluded from analysis. As with the SNP genotyping call rate, there is always a trade-off between a stringent sample call rate threshold that ensures high-quality samples and the loss of power from eliminating a large number of samples that do not meet the threshold. The maximum sample missingness rate is usually set to 3-7%.127, 133 We excluded samples with a call rate below 95%.

Gender check

Comparing reported gender with genetically determined gender allows identification of sample mix-ups resulting from plating errors.131 The gender check is done by determining the homozygosity rate (F) on the X chromosome, which is higher in males (F >0.8) than in females (F <0.2). Any observed discrepancy should be investigated by reviewing the demographic information for the problematic samples, and the issue may be resolved by confirming that gender information was entered erroneously. Samples with unresolved sex discrepancy should be discarded from the dataset, as the discrepancy may indicate a sample swap that incorrectly links a phenotype to a non-matching genotype. We used PLINK to calculate per-sample mean homozygosity (F) across X chromosome markers. The distribution of X chromosome homozygosity was compared to the reported gender. PLINK outputs a file listing individuals with discordance between reported and estimated sex.
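The decision rule behind the gender check can be sketched as follows. This is an illustration of the F thresholds stated above (male if F >0.8, female if F <0.2, otherwise not called), using the sex coding that appears in Table 2-1 (1 = male, 2 = female, 0 = unknown); the function names are hypothetical and this is not PLINK's code.

```python
def call_sex_from_f(f, male_min=0.8, female_max=0.2):
    """Return 1 (male), 2 (female) or 0 (not called) from X homozygosity F."""
    if f > male_min:
        return 1
    if f < female_max:
        return 2
    return 0  # borderline F: sex cannot be called


def sex_check_status(pedsex, f):
    """Compare reported sex with the SNP-derived call."""
    snpsex = call_sex_from_f(f)
    status = "OK" if snpsex != 0 and snpsex == pedsex else "PROBLEM"
    return snpsex, status


# A reported female with borderline F, as in the first row of Table 2-1.
print(sex_check_status(2, 0.2217))  # (0, 'PROBLEM')
```

A PROBLEM status only flags the sample for review; as described above, the sample is retained if the demographic records confirm the reported gender.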

Relatedness check through genome wide IBS/IBD check

The assumption in GWAS is that unrelated samples are included in the analysis, therefore the degree of relatedness between any pair of samples should be assessed.


Identity by state (IBS) is a metric that assesses the proportion of alleles shared between any two individuals across the genome, excluding sex chromosomes.131 In cases of pedigree relationships, a pair of related individuals shares more alleles than expected by chance.131 The calculation of IBS is best done after regions of linkage disequilibrium (LD) in the genome are pruned, so that no pair of correlated SNPs exists within a given distance. Additionally, the extent of shared ancestry between a pair of individuals, called Identity by Descent (IBD), can be estimated using genome-wide data. Carefully scrutinizing these metrics helps to identify cryptic relatedness, sample duplicates and potential mix-ups. We conducted a genome-wide IBS/IBD analysis on LD-pruned, high-quality SNPs (N=174,594 SNPs), with minor allele frequency (MAF) > 10%, Hardy-Weinberg Equilibrium P >0.001 and SNP genotyping call rate >99.5%. LD pruning was done using a window of 100 SNPs in steps of 10 SNPs at an r2 of 0.5. Duplicated samples or samples from monozygotic twins are expected to have an IBD of 1; an IBD of 0.5 indicates first-degree relatives, 0.25 second-degree relatives, and 0.125 third-degree relatives. Typically, one individual is removed from each pair of individuals with an IBD estimate greater than 0.1875, a value halfway between second- and third-degree relatives.131
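The proportion IBD reported by PLINK (PI_HAT) combines the probabilities of sharing one and two alleles identical by descent, PI_HAT = P(IBD=2) + 0.5*P(IBD=1), as defined in the footnote of Table 2-2. The short sketch below recomputes PI_HAT for two pairs from that table and applies the 0.1875 cut-off; the function names are hypothetical.

```python
def pi_hat(z1, z2):
    """Proportion IBD from P(IBD=1) = z1 and P(IBD=2) = z2."""
    return z2 + 0.5 * z1


def exceeds_relatedness_cutoff(z1, z2, cutoff=0.1875):
    # Pairs above the cut-off are closer than halfway between
    # second- and third-degree relatives.
    return pi_hat(z1, z2) > cutoff


# Z1 and Z2 values taken from the last two rows of Table 2-2.
pairs = [(0.9059, 0.0927), (0.4191, 0.2905)]
for z1, z2 in pairs:
    print(f"{pi_hat(z1, z2):.3f}", exceeds_relatedness_cutoff(z1, z2))
```

Both pairs give a PI_HAT close to 0.5, in agreement with the tabulated values of 0.5457 and 0.5.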

Population stratification

Population stratification refers to systematic differences in allele frequencies between sub-populations of a given study population, driven by the presence of different ancestral groups.134, 135 The presence of mixed ancestral groups within the study population may lead to spurious associations, simply because differences in allele frequencies between cases and controls stem from different ancestral origins rather than from a genotypic effect on the phenotype.136 This can be overcome by ensuring a careful distribution of cases and controls with respect to ancestral origin.

Principal component analysis (PCA) is a common statistical procedure to define population stratification within a study sample.125, 137 PCA explains variability in data by generating uncorrelated variables, called principal components, from highly correlated variables (SNPs) within a data matrix of observations (each study participant represents an observation). PCA is preferred to more computationally intensive statistical methods, e.g. multidimensional scaling, a procedure that requires a pairwise IBD matrix.138 To separate individuals according to their genetic ancestry, a highly pruned dataset from HapMap91, 92 representing individuals from Europe (CEU), Africa and Asia may be used to construct a PCA model, in which the first two PCs produce enough separation of individuals along their ancestral origins.131 We used the same data as in the IBS/IBD analysis described above to compute the first 10 principal components (PCs) for all SPS3 participants using EIGENSOFT.125

Imputation

Pre-imputation data processing: We used the SPS3 cleaned data produced by the QC steps described above. We removed 51,013 duplicate SNPs from the file; these are SNPs that had both an rs number and another SNP identifier, such as exm or kgp, at the same chromosome and base-pair position. We also removed rare and monomorphic SNPs as described before, as these rare SNPs may affect imputation quality. We consulted the Minimac3 imputation cookbook (http://genome.sph.umich.edu/wiki/Minimac3_Imputation_Cookbook).
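Removal of positional duplicates can be sketched as below. The text does not state which member of each duplicate pair was retained, so keeping the rs-numbered marker is an assumption made for this illustration; the function name, marker identifiers and positions are all hypothetical.

```python
def drop_positional_duplicates(markers):
    """Keep one marker per (chromosome, position).

    `markers` is a list of (snp_id, chrom, pos) tuples. When several markers
    share a position, an rs-numbered identifier is preferred (an assumption).
    """
    kept = {}
    for snp_id, chrom, pos in markers:
        key = (chrom, pos)
        current = kept.get(key)
        if current is None or (snp_id.startswith("rs") and not current.startswith("rs")):
            kept[key] = snp_id
    return list(kept.values())


# Hypothetical markers: the first two share chromosome and position.
markers = [("kgp123", "1", 1000), ("rs111", "1", 1000), ("rs222", "2", 500)]
print(drop_positional_duplicates(markers))  # ['rs111', 'rs222']
```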


Checking genome build, strand alignment with reference panel

Before phasing the genetic data into haplotypes, one must check that the study dataset is on the same genome build as the reference panel; otherwise, the dataset has to be lifted over to the build of the reference panel using UCSC liftOver.139, 140 We did not have to do this step since the SPS3 dataset was on UCSC hg19 (Genome Reference Consortium GRCh37), the same genome build as the 1000 Genomes reference panel that was used. Additionally, for proper imputation, one has to ensure that the alleles of the study dataset and the reference panel are aligned to the same physical DNA strand of the reference genome.139, 140 Therefore, we obtained from the Illumina manifest file the set of SNPs in our dataset that were oriented to the negative strand. We used PLINK v1.07 to flip these SNPs to the “+” strand of the reference genome.
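Flipping a SNP to the opposite strand amounts to replacing each allele with its complementary base. The sketch below illustrates the operation and also shows why A/T and C/G SNPs need care, since their alleles look the same on both strands; the function names are hypothetical and this is not PLINK's implementation.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(allele1, allele2):
    """Return the two alleles as read on the opposite DNA strand."""
    return COMPLEMENT[allele1], COMPLEMENT[allele2]


def is_strand_ambiguous(allele1, allele2):
    # A/T and C/G SNPs carry the same allele pair on either strand,
    # so strand cannot be inferred from the alleles alone.
    return COMPLEMENT[allele1] == allele2


print(flip_strand("G", "A"))          # ('C', 'T')
print(is_strand_ambiguous("A", "T"))  # True
print(is_strand_ambiguous("A", "G"))  # False
```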

Phasing into haplotypes

We used a pre-phasing approach with SHAPEIT to phase all the alleles of the study dataset into haplotypes.141, 142 This differs from other imputation methods, such as BEAGLE, in which phasing is done along with imputation in one step.109, 110 The pre-phasing approach speeds up the imputation process and allows the pre-phased data to be reused for imputation at a later time.112 Following pre-phasing, we used PLINK to split the phased dataset by chromosome and then updated the SNP identifiers to a chromosome:base-pair position format. We then converted the pre-phased data into VCF format, which is the input format accepted by Minimac3 for imputation.126

Reference panel for imputation

We used the 1000 Genomes phase III version 5 reference panel.94 This release included a large number of individuals with diverse ancestral origins, sampled from 26 populations across Africa, Europe, East Asia, South Asia and the Americas. Individuals were sequenced using whole-genome sequencing and targeted exome sequencing, and were genotyped using dense SNP microarrays. This represented a greater diversity of catalogued human genetic variation, in contrast to the earlier release that focused on European populations with more limited genetic variation. We downloaded the reference panel (M3VCF format) using this link: http://genome.sph.umich.edu/wiki/Minimac3#Reference_Panels_for_Download

We imputed the autosomal chromosomes (1-22) of the SPS3-GENES dataset to the 1000 Genomes phase III reference panel using Minimac3, a low-memory imputation tool with greater computational efficiency than, and similar accuracy to, other imputation software, e.g. BEAGLE and Impute2.126 Minimac3 runs on pre-phased genotypes and can handle large reference panels without compromising the accuracy of imputation.126, 143

Post imputation

Association analysis using EPACTS

We used EPACTS (http://genome.sph.umich.edu/wiki/EPACTS) to perform single-variant association analysis. EPACTS supports the VCF output files from Minimac3 and enables the use of dosages and adjustment for covariates. We used the b.wald test, a logistic Wald test that accommodates a binary phenotype and adjustment for covariates. EPACTS also enables filtering on minor allele frequency. We performed the association analysis separately in SPS3 whites and SPS3 Hispanics using a minor allele count filter of 10, equivalent to a MAF of 2% and 1.6% in SPS3 whites and SPS3 Hispanics, respectively.
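The correspondence between a minor allele count (MAC) filter and a MAF threshold follows from MAF = MAC / (2 x number of samples). The sketch below shows the conversion; the function name is hypothetical, and the sample size of 250 is an illustrative value chosen so that a MAC of 10 corresponds to a MAF of 2%, not a number taken from the study.

```python
def mac_to_maf(minor_allele_count, n_samples):
    """Convert a minor allele count to a minor allele frequency."""
    # Each diploid sample contributes two alleles.
    return minor_allele_count / (2 * n_samples)


# Illustrative sample size only (assumption): 250 analyzed individuals.
print(mac_to_maf(10, 250))  # 0.02
```

Because the MAC filter is fixed, the equivalent MAF threshold is lower in the larger of the two analysis groups.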

To evaluate the performance of EPACTS, we compared the association results of PLINK and EPACTS for the top RHTN signals in SPS3 whites and Hispanics to ensure comparable test statistics and P-values of association. We also assessed the concordance between the genotypes of typed SNPs and the allelic dosages of the corresponding imputed SNPs for our top signals.

Results

SNP quality control

We eliminated SNPs with a genotyping call rate of less than 95% across all samples. We also eliminated SNPs with a minor allele frequency difference ≥ 10% between the two batches (samples genotyped at CIDR and AKESOgen). Additionally, monomorphic SNPs and SNPs with MAF less than 0.4% were removed. This step eliminated a total of 1,226,039 SNPs (Figure 2-1).

Sample quality control

We applied a sample call rate threshold of 95%, under which samples with genotyping efficiency below 95% across all SNPs would be removed from analysis. No samples fell below this threshold, so none were removed in this step (Figure 2-1).

Gender mismatch

Samples were flagged as problematic if the SNP-determined gender and reported gender did not match, or if the SNP or pedigree information related to gender was ambiguous. A sample was called male if the homozygosity estimate “F” was greater than 0.8 and female if “F” was lower than 0.2. Samples with a borderline F (e.g. 0.222) were not called by PLINK. Forty-six samples were flagged as problems because they had a female pedigree sex and an inestimable genetic sex due to borderline homozygosity rates on the X chromosome. However, the gender of these samples was confirmed by checking the demographic information, and none of the samples were removed at this step, which suggests careful handling of the samples during processing (Table 2-1).

Genome wide Identity by state/Identity by descent (IBS/IBD) check

We conducted this analysis using LD-pruned, high-quality SNPs (N=174,594 SNPs), with minor allele frequency (MAF) > 10%, Hardy-Weinberg Equilibrium P >0.001 and SNP genotyping call rate >99.5%. LD pruning of SNPs was done using a window of 100 SNPs in steps of 10 SNPs at an r2 of 0.5. Duplicated samples or monozygotic twins have an IBD of 1, as shown in Table 2-2. We removed 11 duplicate samples (Table 2-2); these duplicated samples were intentionally added to detect the genotyping error rate.

Principal component analysis of ancestry

The SPS3 dataset includes participants from Europe (Spain), North America (USA and Canada) and Latin America (Mexico, Chile, Ecuador, Peru). The first two PCs separated participants into clusters according to their ancestral origins. The PCA plots allowed us to identify a mix-up of samples during data handling and transfer, as demonstrated in Figure 2-2. We also computed within-race PCs to identify any subtle population stratification or substructure. We observed that the first PC (PC1) explained most of the variability in SPS3 whites, whereas PC1, PC2 and PC3 together explained most of the variability and gave the best separation in Hispanics (Figure 2-3). Based on this analysis, we included PC1 as a covariate in the association model for SPS3 whites, and PC1, PC2 and PC3 as covariates in the association model for SPS3 Hispanics.

Association analysis

We performed association analysis using genotyped SNPs in each ethnic/ancestry group separately. The detailed association results are discussed in Chapter 3. We show here the Manhattan and Quantile-Quantile (QQ) plots for the GWAS association of genotyped SNPs with RHTN in SPS3 whites (Figure 2-4). Early departure from the line of unity in a QQ plot may suggest the presence of population stratification. We did not observe an early deviation from the line of unity in the example QQ plot of RHTN association in SPS3 whites (Figure 2-4, B).

Post imputation quality metrics

We summarized the imputation steps performed for SPS3-GENES (Figure 2-5). The imputation of the SPS3 dataset generated 44,155,034 variants; the breakdown of the number of variants per chromosome is provided in Table 2-3. To assess how well our imputation worked, we evaluated a few SNPs that were both genotyped and imputed; these SNPs were amongst our top associations in SPS3 whites, SPS3 Hispanics and the INVEST-SPS3 meta-analysis (association results in Chapter 3). The results for the imputed SNPs were compared to those for the genotyped SNPs, and the degree of concordance between the allelic dosages and the original genotypes was calculated. For the five SNPs we evaluated, we observed > 98% concordance between genotypes and allelic dosages. We also observed similar association results for these SNPs (P-values and effect sizes (odds ratios)) when we compared the results from PLINK and EPACTS, although the two programs used different test statistics: PLINK used a likelihood ratio test and EPACTS used a Wald test (Table 2-4 and Table 2-5).
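One way to quantify concordance between typed genotypes and imputed dosages is to round each dosage to the nearest allele count and compare it with the genotype. The sketch below assumes this best-guess definition, which may differ from the exact calculation used here; the function name and the example values are hypothetical.

```python
def dosage_concordance(genotypes, dosages):
    """Fraction of samples whose rounded dosage equals the typed genotype.

    `genotypes` are allele counts (0, 1 or 2), with None for missing calls;
    `dosages` are imputed expected allele counts between 0 and 2.
    """
    pairs = [(g, d) for g, d in zip(genotypes, dosages) if g is not None]
    matches = sum(round(d) == g for g, d in pairs)
    return matches / len(pairs)


# Hypothetical data for one SNP in five samples (one genotype missing).
genotypes = [0, 1, 2, 1, None]
dosages = [0.02, 0.97, 1.99, 0.10, 1.00]
print(dosage_concordance(genotypes, dosages))  # 0.75
```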

Discussion

We summarized the steps that were taken in quality control and imputation procedures to produce a high-quality SPS3 dataset and prevent the occurrence of false positive or false negative results due to data quality issues. A cleaned dataset is also critical for successful imputation.


Our QC steps removed SNPs with low call rates, which are indicative of low-quality SNPs that can lead to spurious associations. We also screened for samples with low call rates, which may be indicative of low-quality DNA. We followed defined steps to detect any sample mishandling, such as sample swaps; these included the gender check, genome-wide IBS/IBD analysis, and principal component analysis. These procedures are also important for detecting cryptic relatedness or population stratification, which is likely to occur in multi-ethnic/ancestral populations such as SPS3.

Additionally, we summarized the steps taken for successful imputation of the SPS3 dataset to allow the implementation of GWAS analysis. We imputed the dataset to the multi-population 1000 Genomes reference panel using Minimac3. This imputed dataset is comprehensive and rich, with inclusion of rare and structural variants, allowing for fine mapping of associations. Data were made available in two versions: a non-filtered version and a version filtered on Rsq, an imputation quality metric. We did not filter on MAF so that the analyst can choose the optimal MAF filter before performing the association analysis. We provide a Python script to select the MAF filter, and we recommend excluding variants with MAF < 1-3%, as the power of GWAS analysis does not support associations for low-frequency variants. Additionally, we summarized the steps needed for association analysis of imputed Minimac3 VCF output using EPACTS. We report a comparison of the top association results using genotyped and imputed SNPs from PLINK and EPACTS, respectively. We observed that the results are comparable, with high concordance between genotypes and imputation results, which leads us to conclude that our imputation is accurate. We caution the user that EPACTS involves some extra processing steps to prepare the input files for the meta-analysis in METAL.144 We used a Python script (provided in Appendix A) to parse the EPACTS output files and extract the two alleles (reference and variant alleles) and MAF in order to format the input files necessary for METAL. However, it appears that EPACTS is the only available software that can run on imputed VCF files from Minimac3. DosageConvertor is a recently released tool that converts dosage files in VCF format from Minimac3 to MACH or PLINK formats; however, we have not used this tool yet.
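The kind of parsing needed to recover alleles from EPACTS marker identifiers can be sketched as follows. This is not the script provided in Appendix A; it is a minimal illustration that assumes identifiers of the form chromosome:position_REF/ALT, as they appear in Table 2-4 and Table 2-5 (e.g. 5:98093671_T/C), optionally followed by an underscore and an annotation. The function name is hypothetical.

```python
def parse_epacts_marker(marker_id):
    """Split 'CHR:POS_REF/ALT' into (chrom, pos, ref, alt)."""
    position_part, allele_part = marker_id.split("_", 1)
    chrom, pos = position_part.split(":")
    # Drop any trailing annotation after the allele pair (assumed optional).
    alleles = allele_part.split("_", 1)[0]
    ref, alt = alleles.split("/")
    return chrom, int(pos), ref, alt


# Marker identifier as shown in Table 2-4.
print(parse_epacts_marker("5:98093671_T/C"))  # ('5', 98093671, 'T', 'C')
```

The two alleles extracted this way can then be written to separate columns of the input file for the meta-analysis software.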

In summary, this chapter documents the steps taken to prepare for the analyses described in this dissertation and to allow the SPS3 dataset to be used for numerous other analyses. It also serves as a guide for the QC and imputation procedures necessary for GWAS analyses.


Table 2-1. Gender check for SPS3-GENES participants using PLINK

FID         IID         PEDSEX  SNPSEX  STATUS   F       INVESTIGATION
40          44-005-1    2       0       PROBLEM  0.2217  OK
48          56-063-4    2       0       PROBLEM  0.3549  OK
50          56-092-8    2       0       PROBLEM  0.2259  OK
60          72-019-4    2       0       PROBLEM  0.3648  OK
63          72-084-4    2       0       PROBLEM  0.2594  OK
66          72-225-1    2       0       PROBLEM  0.213   OK
101         11-077-9    2       0       PROBLEM  0.3328  OK
2907314217  2907314217  2       0       PROBLEM  0.2852  OK
2908162310  2908162310  2       0       PROBLEM  0.3449  OK
2909508595  2909508595  2       0       PROBLEM  0.2294  OK
2902874079  2902874079  2       0       PROBLEM  0.2219  OK
2904064749  2904064749  2       0       PROBLEM  0.224   OK
2908472810  2908472810  2       0       PROBLEM  0.314   OK
2906665225  2906665225  2       0       PROBLEM  0.2046  OK
2908525022  2908525022  2       0       PROBLEM  0.2874  OK
2906591393  2906591393  2       0       PROBLEM  0.328   OK
2906388372  2906388372  2       0       PROBLEM  0.2854  OK
2906653722  2906653722  2       0       PROBLEM  0.203   OK
2901833275  2901833275  2       0       PROBLEM  0.3567  OK
2904159111  2904159111  2       0       PROBLEM  0.3259  OK
2908206117  2908206117  2       0       PROBLEM  0.2662  OK
2906457507  2906457507  2       0       PROBLEM  0.2217  OK
2904268091  2904268091  2       0       PROBLEM  0.2568  OK
2903133313  2903133313  2       0       PROBLEM  0.2079  OK
2906173395  2906173395  2       0       PROBLEM  0.2761  OK
2904863333  2904863333  2       0       PROBLEM  0.2777  OK

A gender check output from PLINK. Samples with discordant gender information are shown as problem samples in the STATUS field. FID is the family ID; IID is the individual ID; PEDSEX is the pedigree sex (1 = male, 2 = female, 0 = unknown); SNPSEX is the genetic sex estimated by PLINK (1 = male, 2 = female, 0 = unknown); F is the X chromosome inbreeding (homozygosity) coefficient. We demonstrate how we resolved the gender discrepancy for these samples. These samples had a borderline homozygosity estimate and therefore their sex was inestimable by PLINK. We investigated the demographic information files and confirmed that these samples are female.


Table 2-2. Genome-wide IBS/IBD analysis for SPS3 participants using PLINK

FID1  IID1        FID2        IID2        RT  EZ    Z0      Z1      Z2      PI_HAT
4     11-000-0    2900225622  2900225622  UN  NA    0       0.0001  0.9999  1
14    14-032-5    2900205338  2900205338  UN  NA    0       0       1       1
21    22-004-3    2900144552  2900144552  UN  NA    0       0.0001  0.9999  1
26    24-045-1    2900161228  2900161228  UN  NA    0       0       1       1
29    32-008-0    2900161671  2900161671  UN  NA    0       0       1       1
55    62-020-3    2900139745  2900139745  UN  NA    0       0       1       1
95    79-037-3    2900152702  2900152702  UN  NA    0       0       1       1
52    56-217-3    2900239729  2900239729  UN  NA    0.0001  0       0.9999  0.9999
62    72-032-1    2900222130  2900222130  UN  NA    0.0001  0       0.9999  0.9999
100   83-007-0    2900159731  2900159731  UN  NA    0.0001  0       0.9999  0.9999
76    73-013-0    2900193029  2900193029  UN  NA    0       0.0005  0.9995  0.9998
87    2904290168  87          2903424145  FS  0.25  0.0014  0.9059  0.0927  0.5457
7     2906477441  7           2905464668  FS  0.25  0.2904  0.4191  0.2905  0.5

A pairwise IBD estimation using LD-pruned, high-quality SNPs (N=174,594 SNPs), with minor allele frequency (MAF) > 10%, Hardy-Weinberg Equilibrium P >0.001 and SNP genotyping call rate >99.5%. FID1: family ID for first individual; IID1: individual ID for first individual; FID2: family ID for second individual; IID2: individual ID for second individual; RT: relationship type given PED file; EZ: expected IBD sharing given PED file; Z0: P(IBD=0); Z1: P(IBD=1); Z2: P(IBD=2); PI_HAT: P(IBD=2)+0.5*P(IBD=1) (proportion IBD). Eleven pairs had a PI_HAT of ~1, indicating duplicate samples, which were removed. The last two rows show two pairs of individuals with PI_HAT of 0.5457 and 0.5, respectively, indicating related individuals, whom we decided to keep in our dataset.


Figure 2-1. A flowchart showing the QC steps that were taken in SPS3 dataset

Figure 2-2. Principal component analysis (PCA) of ancestry for all SPS3-GENES participants. A) Individuals of the same ancestral origin do not cluster together after PCA, indicating sample swaps. B) The issue was resolved after QC, as indicated by samples clustering along their ancestral origin.

Figure 2-3. Within-race PCA plots in SPS3 showing the percent of variability explained by PCs. A) A plot of PC1 versus PC2 within SPS3 white participants; PC1 and PC2 explain 1.9% and 1.4% of variability, respectively. B) A plot of PC1 versus PC2 within SPS3 Hispanics; PC1 and PC2 explain 11% and 3.7% of variability, respectively. C) A plot of PC1 versus PC3 within SPS3 Hispanics; PC1 and PC3 explain 11% and 2.1% of variability, respectively.

Figure 2-4. GWAS associations with RHTN in SPS3 whites after adjusting for clinical covariates and PC1. A) Manhattan plot of the association of genotyped SNPs with RHTN in SPS3 whites. B) QQ plot of observed versus expected P values for the association of genotyped SNPs with RHTN in SPS3 whites.

Figure 2-5. A flowchart showing the steps and software used in the imputation process of the SPS3-GENES dataset

Table 2-3. Imputation of SPS3-GENES to 1000Genomes reference panel

Chromosome  Imputation output  Filter at Rsq score 0.3
1           3,738,239          1,967,707
2           4,057,612          2,252,065
3           3,355,938          1,893,975
4           3,338,264          1,900,755
5           3,032,421          1,720,709
7           2,753,496          1,522,579
8           2,651,560          1,481,454
9           2,063,095          1,090,983
10          2,334,089          1,295,654
11          2,333,241          1,281,535
12          2,242,719          1,229,655
13          1,661,699          969,097
14          1,535,591          833,205
15          1,404,163          734,111
16          1,549,315          748,459
17          1,345,834          659,635
18          1,319,628          753,740
19          1,084,534          492,860
20          1,047,612          574,417
21          653,790            347,196
22          652,194            310,460
Total       44,155,034         24,060,251


Table 2-4. GWAS association summary statistics in SPS3 whites

     SPS3_Genotyped (PLINK)                          SPS3_Imputed (EPACTS)
CHR  SNP         Mb     A1  A2  OR    P         SNP               A1  A2  OR    P
5    rs10515283  98.1   G   A   3.84  1.05E-07  5:98093671_T/C    C   T   4.00  1.79E-07
3    rs9683037   193.5  G   A   3.08  9.79E-07  3:193529913_A/G   G   A   3.15  9.94E-07
19   rs2216662   9.1    A   G   3.13  2.50E-06  19:9057721_C/T    T   C   3.13  2.51E-06
3    rs9814527   193.5  G   A   2.85  3.26E-06  3:193533703_A/G   G   A   2.66  1.28E-05
1    rs10494190  117.4  A   G   0.13  8.98E-06  1:117392940_G/A   A   G   0.19  4.08E-05

Comparison of summary statistics from PLINK and EPACTS for top GWAS SNPs that were genotyped and imputed in SPS3 whites. CHR: chromosome; Mb: position in megabases; A1 and A2: coded alleles 1 and 2; OR: odds ratio; P: P value of association. Similar values for effect size (OR) and P value of association were observed in PLINK and EPACTS.

Table 2-5. GWAS association summary statistics in SPS3 Hispanics

     SPS3_Genotyped (PLINK)                           SPS3_Imputed (EPACTS)
CHR  SNP          Mb     A1  A2  OR    P         SNP                A1  A2  OR    P
1    kgp349220    237.0  A   G   5.42  1.20E-06  1:236951332_C/T    T   C   7.05  1.07E-06
1    rs2282366    236.9  A   G   3.94  2.90E-06  1:236924506_A/G    A   G   4.00  2.30E-06
9    rs10989766   104.8  A   G   3.44  3.13E-06  9:104771134_G/A    A   G   3.57  3.77E-06
9    rs17206790   104.8  A   G   3.34  5.28E-06  9:104765534_G/A    A   G   3.54  3.25E-06
1    rs4659719    236.9  C   A   3.73  5.62E-06  1:236944429_G/T    G   T   3.70  7.08E-06
9    kgp272368    104.8  A   G   3.13  8.25E-06  9:104768512_C/T    T   C   3.13  8.24E-06
10   kgp10906714  104.9  G   A   4.21  8.38E-06  10:104852002_T/C   C   T   4.58  1.10E-05
10   kgp12243048  104.9  A   G   4.21  8.38E-06  10:104909890_A/G   A   G   4.17  8.39E-06

Comparison of summary statistics from PLINK and EPACTS for top GWAS SNPs that were genotyped and imputed in SPS3 Hispanics. CHR: chromosome; Mb: base-pair position in megabases; A1 and A2: coded alleles 1 and 2; OR: odds ratio; P: P value of association. Similar values for effect size (OR) and P value of association were observed in PLINK and EPACTS.


CHAPTER 3 GENOME WIDE ASSOCIATION ANALYSIS OF COMMON VARIANTS OF RESISTANT HYPERTENSION (RHTN)

Introduction

High blood pressure (BP) is a leading cause of cardiovascular (CV) complications including stroke, heart failure and kidney disease.145 Despite the availability of numerous effective antihypertensive drug classes and medications within each class, nearly half of the patients with hypertension (HTN) continue to have uncontrolled BP, and a subset of these patients suffer from resistant hypertension (RHTN).7 According to the American Heart Association position statement in 2008, RHTN is defined as uncontrolled BP (>140/90 mm Hg) despite the use of maximum tolerated doses of 3 or more antihypertensive medications, or controlled BP with the use of 4 or more medications, ideally with a diuretic included.146

Hypertensive patients with RHTN are at a higher risk of CV outcomes, including stroke and congestive heart failure, compared to patients with easily controlled BP.86, 147-151 These worse outcomes are likely caused by prolonged periods of uncontrolled BP. More concerning, patients with RHTN are at a higher risk of declining kidney function, which can further lead to poor BP control. A secondary analysis of the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) showed that patients with RHTN were twice as likely to develop end-stage renal disease (ESRD) compared to patients with controlled BP, in a model adjusted for clinical comorbidities (OR (95% CI) 2.11 (1.20–3.70)).147 Additionally, in the population-based Reasons for Geographic and Racial Differences in Stroke (REGARDS) study, the rate of dialysis initiation was found to be 6 times higher in patients with RHTN compared to patients with controlled BP (OR (95% CI) 6.32 (4.30–9.30)).152 Further, patients with chronic kidney disease (CKD) and RHTN from the Chronic Renal Insufficiency Cohort (CRIC) were at increased risk for a 50% reduction in estimated glomerular filtration rate (eGFR) or ESRD (OR (95% CI) 1.28 (1.11–1.46)).153

RHTN is driven by a multitude of genetic, lifestyle and clinical factors. CKD is considered among the major comorbidities in patients with RHTN, and the prevalence of RHTN is estimated to be 30%–40% in patients with CKD.153,154,13 Other documented clinical and lifestyle risk factors include increasing age, diabetes, obesity, increased salt intake and lack of exercise.47, 155 While the clinical and lifestyle risk factors of RHTN have been extensively studied, genetic factors of RHTN are less well studied, and most of the published data come from small, non-replicated candidate gene studies35, 38, 44, 156, 157.

Most of these candidate gene studies focused on genes related to vascular dysfunction, and renal handling of sodium and water.35, 38, 44, 156, 157

With the notion that RHTN is a complex pharmacogenomic phenotype driven by multiple genetic variants that lead to inadequate response to different classes of BP-lowering medications and a more difficult-to-control BP, genome-spanning analyses hold the promise of revealing genetic determinants of RHTN156. These types of analyses should identify genes and pathways driving the RHTN phenotype, which may be shared with other pathophysiological mechanisms underlying hypertension, renal disease, and CV disease. More importantly, genetic variants, if discovered and validated, might be utilized in prediction algorithms, along with clinical factors, to identify high-risk individuals for whom strict risk factor modification is highly warranted, or who may benefit from specific interventions that target a more aggressive treatment approach, such as intensive up-titration of antihypertensive medications to ensure adequate BP control.

In the current study, we sought to identify genetic variants of RHTN through the use of a genome wide association analysis (GWAS) and to create a genetic risk score using replicated RHTN signals from this analysis, which will need to be further validated in independent RHTN datasets. Discovery GWAS was performed in a cohort of hypertensive patients with documented coronary artery disease from the INternational VErapamil-SR trandolapril STudy Genetic Sub-study (INVEST-GENES), and replication GWAS was performed in an independent cohort of hypertensive patients with stroke from the Secondary Prevention of Small Subcortical Strokes Genetic Sub-study (SPS3-GENES). As a secondary validation approach, we performed a look-up of associated SNPs in a third RHTN dataset from the electronic MEdical Records & GEnomics (eMERGE) network, the only readily available dataset with both GWAS data and a constructed RHTN phenotype.

Methods

Study Cohorts

The discovery GWAS analysis of the current study included clinical data and DNA samples from white and Hispanic participants recruited as part of the INternational VErapamil-SR trandolapril STudy (INVEST; https://clinicaltrials.gov/ identifier, NCT00133692). INVEST was an international, multi-center clinical trial investigating cardiovascular outcomes of hypertensive patients with coronary artery disease after randomization to a β-blocker strategy (βB, atenolol) or a calcium antagonist strategy (CA, verapamil)85. Briefly, patients in both arms received the assigned drug in a step-wise titration to a pre-determined BP goal of 140/90 mm Hg. Add-on drugs included hydrochlorothiazide (HCTZ), trandolapril and other antihypertensives that were added to the drug regimen in a protocol-specific manner if the BP goal was not attained. Patients were followed for an average of 2.7 years. INVEST-GENES, the genetic sub-study of INVEST, included 5979 participants with DNA samples available, 1529 of whom had genome wide genotypic information available.

The replication effort for the GWAS analysis of the current study included clinical data and DNA samples from white and Hispanic participants recruited as part of the Secondary Prevention of Small Subcortical Strokes – GENES (SPS3–GENES). SPS3 was an international, multicenter clinical trial evaluating the optimal antiplatelet regimen and BP target goal for patients with a history of subcortical stroke (http://www.clinicaltrials.gov, NCT00059306)97, 98. Briefly, 3020 patients with a recent history of subcortical (lacunar) stroke were randomized in a 2x2 factorial design to either a dual antiplatelet regimen of 325 mg aspirin plus 75 mg clopidogrel or a single antiplatelet regimen (325 mg aspirin plus placebo), and to a systolic blood pressure (SBP) goal of 130-149 mm Hg or a lower SBP target (<130 mm Hg). The primary endpoint of SPS3 was time to stroke recurrence. Secondary outcomes were defined as major vascular events including myocardial infarction, ischemic stroke or vascular death. The average follow-up of patients in the BP target arms was 4 years. SPS3-GENES included 1139 participants with available DNA samples, 1049 of whom had genome wide genotypic information available.

The main studies and genetic sub-studies of INVEST and SPS3 were approved by the institutional review board at each study site, and patients in both genetic sub-studies provided informed consent for DNA collection and participation in the genetics research.

A secondary validation analysis for the current study was performed using data from electronic health records linked to a DNA bio-repository as part of the electronic MEdical Records & GEnomics (eMERGE) network from 10 sites in the US. These include the phase I sites: Group Health/University of Washington (GH/UW), Marshfield Clinic (MFC), Mayo Clinic (MC), Northwestern University (NU), and Vanderbilt University (VU); and the phase II sites: Children’s Hospital of Philadelphia, Boston Children’s Hospital, Cincinnati Children’s Hospital Medical Center, Geisinger Health System (GHS), and Mount Sinai School of Medicine (MSSM)158, 159.

Resistant Hypertension Phenotype

INVEST: RHTN was defined using medication and BP measurements at the visit prior to experiencing study outcomes or censoring84, 86. Participants were classified as RHTN if their SBP was ≥140 or DBP ≥90 mm Hg while using three or more medications, or if they were using 4 or more antihypertensive medications regardless of BP. Participants with SBP ≥140 or DBP ≥90 mm Hg while on 2 or fewer medications were excluded from this analysis. Previous analyses in INVEST have shown that a strict definition of RHTN requiring the addition of diuretics did not change the association between RHTN and outcomes86; we therefore did not require the inclusion of thiazide diuretics in the definition of RHTN for this analysis. As a confirmation, we performed a sensitivity analysis for the top associated signals using the strict definition of RHTN: SBP ≥140 or DBP ≥90 mm Hg with the use of at least 3 antihypertensive medications including a diuretic, or the use of 4 or more antihypertensive medications including a diuretic regardless of BP. Participants with SBP <140 and DBP <90 mm Hg using 3 or fewer medications were included in the controlled BP group.
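
The INVEST phenotype rules above can be written out as a small decision function. This is only an illustrative sketch, not the code used in the study: the function name and return labels are hypothetical, and the thresholds are read as SBP ≥140 or DBP ≥90 mm Hg for uncontrolled BP.

```python
from typing import Optional


def classify_rhtn(sbp: float, dbp: float, n_meds: int) -> Optional[str]:
    """Classify one participant under the primary (non-strict) INVEST definition.

    Returns "RHTN", "controlled", or None when the participant is excluded.
    Hypothetical helper; thresholds assume uncontrolled BP means
    SBP >= 140 or DBP >= 90 mm Hg.
    """
    uncontrolled = sbp >= 140 or dbp >= 90
    if n_meds >= 4:
        return "RHTN"  # 4 or more medications, regardless of BP
    if uncontrolled and n_meds == 3:
        return "RHTN"  # uncontrolled BP on 3 medications
    if not uncontrolled:
        return "controlled"  # BP at goal on 3 or fewer medications
    return None  # uncontrolled BP on 2 or fewer medications: excluded
```

For example, `classify_rhtn(150, 85, 3)` yields "RHTN", while the same BP on 2 medications yields `None` (excluded from the analysis).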

For the analyses described herein, we included a total of 1194 participants with GWAS data who met the criteria for the RHTN dataset as either having RHTN or controlled BP. This included 657 whites (226 RHTN; 431 controlled BP) and 537 Hispanics (143 RHTN; 394 controlled BP).

SPS3: To construct the RHTN phenotype in SPS3, we excluded non-hypertensive participants and focused on hypertensive participants in both SBP target groups. RHTN status was defined using BP readings and medications at the 12-month follow-up visit, which allowed enough time for BP medication titration to be complete and ensured that RHTN status was not driven by the addition of more BP-lowering medications in the lower BP target group at later time points. We observed a high concordance rate (>90%) in RHTN status between any two consecutive visits within a 6-month window of the 12-month visit (i.e. 12 months ± 6 months). Similar to INVEST-GENES, RHTN was defined as BP ≥140/90 mm Hg using 3 or more medications or controlled BP (<140/90 mm Hg) on 4 or more medications. For our analyses in SPS3-GENES, we included 585 hypertensive participants with available GWAS data who met the criteria for the RHTN dataset as either having RHTN or controlled BP: 263 whites (71 cases; 192 controlled BP) and 322 Hispanics (83 cases; 239 controlled BP).

eMERGE: The RHTN dataset was constructed using EHR-linked data of hypertensive patients from 7 sites in eMERGE (excluding the pediatric sites: Children’s Hospital of Philadelphia, Boston Children’s Hospital, and Cincinnati Children’s Hospital Medical Center). RHTN was defined according to two algorithms: the first algorithm defined patients as RHTN if they had an outpatient SBP >140 mm Hg or DBP >90 mm Hg despite the use of three or more antihypertensive medication classes for at least one month after meeting medication criteria, and the second algorithm defined patients as RHTN if they used at least 4 concomitant antihypertensive medication classes. Patients with controlled BP were defined as hypertensive patients with SBP <135 mm Hg and DBP <90 mm Hg who used one antihypertensive medication. Patients were excluded if they had systolic heart failure or chronic kidney disease. For this analysis, 2417 patients of European descent were included.

Genotyping, Quality Control, and Imputation

INVEST: Genomic DNA samples from INVEST-GENES were genotyped on the Illumina OmniExpress Exome chip at the RIKEN Center of Integrative Medicine in Yokohama, Japan. Samples and SNPs were excluded if call rates were below 95%. Principal component analysis (PCA) was performed with a linkage disequilibrium (LD) pruned data set using the EIGENSTRAT method. Race/ethnicity was self-described and then confirmed with PCA data from the GWAS, and reassigned based on the PCA data when appropriate. Genotypes were phased using SHAPEIT2 and imputed to the 1000 Genomes phase III reference panel93 using Minimac3126 (http://genome.sph.umich.edu/wiki/Minimac3).
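
As an illustration of the 95% call-rate filter described above, the following sketch applies the threshold first to samples and then to SNPs. The data layout (a dictionary of per-sample call lists with `None` for a missing call), the function names, and the sample-then-SNP ordering are all assumptions made for the example; the study's actual QC pipeline is not reproduced here.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(c is not None for c in calls) / len(calls)


def filter_by_call_rate(genotypes, threshold=0.95):
    """Drop samples, then SNPs, whose call rate is below `threshold`.

    genotypes: dict mapping sample id -> list of calls, one per SNP
    (hypothetical layout). Returns (kept_sample_ids, kept_snp_indices).
    """
    kept_samples = [s for s, calls in genotypes.items()
                    if call_rate(calls) >= threshold]
    if not kept_samples:
        return [], []
    n_snps = len(genotypes[kept_samples[0]])
    kept_snps = [j for j in range(n_snps)
                 if call_rate([genotypes[s][j] for s in kept_samples]) >= threshold]
    return kept_samples, kept_snps
```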

SPS3: Genomic DNA was genotyped using the Infinium HumanOmni5-4 v1.1 BeadChip (Illumina, San Diego, California, USA) at CIDR and AKESOgen. The QC and imputation procedures for SPS3 were described in detail in Chapter 2. Genotype data were imputed to the 1000 Genomes phase III reference panel93 using Minimac3126 (http://genome.sph.umich.edu/wiki/Minimac3).

eMERGE: Patients in eMERGE phase I were genotyped on the Illumina 660W-Quad, and patients in eMERGE phase II were genotyped on the Illumina 550, Illumina 610, Illumina HumanOmniExpress, and Affymetrix 6.0 arrays. The details of the imputation procedures were described in a publication by Verma et al.139 Patients were included if their call rates were >98%, and SNPs were included if they had call rates >99% and minor allele frequencies >2%. Genetic ancestry was evaluated using STRUCTURE and EIGENSTRAT. Race/ethnicity was either self-identified or assigned. Genotypes were phased using SHAPEIT2 and imputed to the 1000 Genomes cosmopolitan reference panel (n=1,092) using IMPUTE2.

Statistical Analyses

Clinical characteristics of INVEST and SPS3 participants are presented as frequency and percentage for categorical variables, and as means ± standard deviations for continuous variables. Univariate logistic regression was used to evaluate the differences in clinical characteristics between participants with and without RHTN. Analysis of clinical characteristics was performed using SAS version 9.3 (SAS Institute Inc, Cary, NC).

The overall analysis framework used in this study is illustrated in Figure 3-1. Our analyses included GWAS analysis in INVEST using genotyped data (discovery), replication in SPS3 (primary validation approach), meta-analysis between INVEST and SPS3, and validation in eMERGE (secondary validation approach).

GWAS Analysis

INVEST–SPS3

First, we assessed the associations between RHTN and 696,317 genotyped SNPs in INVEST-GENES (N=1194) using logistic regression analysis, adjusting for clinical predictors of RHTN in INVEST as reported previously by Smith et al.86 Significant clinical predictors of RHTN in SPS3 were selected using step-wise multivariable logistic regression. An entry p-value of 0.1 was used, and clinical covariates with a p-value less than 0.05 remained in the model and were used as covariates in the genetic analysis. The GWAS analysis was performed separately in whites and Hispanics, based on PCA-defined genetic race, using PLINK124 and adjusting for clinical predictors of RHTN in INVEST-GENES (age, gender, BMI, diabetes, congestive heart failure, left ventricular hypertrophy, drug assignments) and ancestry-specific principal components (PCs): PC1 in whites and PC1 and PC2 in Hispanics.

Second, we performed a study-wide, fixed effect, inverse variance weighted meta-analysis in METAL144 using association summary statistics of INVEST whites and INVEST Hispanics, with the assumption that functional SNPs should have consistent association across racial/ethnic groups160. Genome wide significance was set at p < 5x10^-8 161, and the threshold for suggestive SNPs was arbitrarily set at p < 1x10^-4 in order to not dismiss biologically important SNPs that may be excluded from replication if they do not meet a more stringent cut-off p-value.
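
For readers unfamiliar with the method, a fixed effect, inverse variance weighted meta-analysis pools per-cohort log odds ratios with weights equal to the reciprocal of their squared standard errors. The sketch below illustrates the calculation in plain Python; it is not METAL's implementation, and the function names and the normal-approximation p-value are assumptions made for the example.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(OR) recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def fixed_effect_meta(estimates):
    """Pool (odds_ratio, se_of_log_or) pairs by inverse variance weighting.

    Returns (pooled_or, ci_lower, ci_upper, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    beta = sum(w * math.log(or_) for w, (or_, _) in zip(weights, estimates)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal distribution
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se),
            p)
```

Because weights grow as standard errors shrink, the larger or more precise cohort dominates the pooled estimate, and two cohorts with concordant effects give a smaller p-value than either alone.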

SNP screening and selection for replication in SPS3

We set out to validate genome wide significant and suggestive SNPs from the INVEST meta-analysis by replication in independent hypertensive participants (N=585) from SPS3, the primary replication cohort for this GWAS study. Studies have shown that carefully selected SNPs based on functional evidence and biological plausibility are more likely to be replicated162, 163; therefore, we adopted a screening strategy to prioritize loci to move forward for replication. Suggestive SNPs with the same direction of association in INVEST whites and INVEST Hispanics were prioritized if they were deemed functional or located in gene regions with a biological link to the RHTN phenotype (Figure 3-2). SNPs were determined to have a functional or biological link by meeting one of the following criteria: 1) eQTL(s) or overlap with an active chromatin state as reported by Haploreg v.4 164 or RegulomeDB165; 2) location in genes of documented function and previous association with hypertension, blood pressure regulation or a cardiovascular phenotype in the NHGRI-EBI GWAS Catalog166. If a gene locus had several SNPs in high LD, we selected a representative SNP to move forward for replication. We moved forward a total of 10 SNPs in 10 independent genomic loci based on the criteria above, and SNPs were considered replicated in SPS3 if they met a one-sided Bonferroni corrected p-value of 0.005 (0.05 / 10 signals) and had a directionally similar association, indicated by consistent odds ratios of RHTN in INVEST and SPS3.
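
The replication rule above combines a direction check with a Bonferroni-corrected one-sided threshold. A minimal sketch follows; the function names are hypothetical, and the conversion from a two-sided to a one-sided p-value assumes a symmetric test statistic such as a Wald z.

```python
def one_sided_p(two_sided_p: float, same_direction: bool) -> float:
    """Convert a two-sided p-value to one-sided, given the effect direction."""
    return two_sided_p / 2 if same_direction else 1 - two_sided_p / 2


def is_replicated(discovery_or: float, replication_or: float,
                  replication_p_two_sided: float, n_tests: int,
                  alpha: float = 0.05) -> bool:
    """True if the replication effect points the same way as discovery and
    its one-sided p-value meets the Bonferroni threshold alpha / n_tests."""
    same_direction = (discovery_or > 1) == (replication_or > 1)
    threshold = alpha / n_tests  # e.g. 0.05 / 10 = 0.005
    p_one = one_sided_p(replication_p_two_sided, same_direction)
    return same_direction and p_one <= threshold
```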

INVEST–SPS3–eMERGE

Next, we conducted a meta-analysis of summary statistics of white and Hispanic participants of INVEST and SPS3 using a fixed effect, inverse variance weighted meta-analysis in METAL144, with the rationale of increasing sample size and improving power to detect associations. SNPs with a meta-analysis p < 5x10^-8 were deemed genome wide significant161, and SNPs with p < 5x10^-5 were considered suggestive. We used eMERGE as a secondary validation cohort for the genome wide significant and suggestive SNPs from the INVEST-SPS3 meta-analysis. We considered 5 SNPs for validation in eMERGE that met the screening criteria for functional or biological evidence as described earlier and had the same direction of association in INVEST and SPS3. SNPs were considered validated in eMERGE if they had a consistent direction of association as in INVEST and SPS3 at a Bonferroni corrected one-sided p-value of 0.01 (0.05 / 5 SNPs), since a one-sided hypothesis was being tested.

As a confirmatory analysis, we performed GWAS analysis using the dosage files of imputed data. EPACTS software (http://genome.sph.umich.edu/wiki/EPACTS) was used for the analysis.

Risk Score Analysis in INVEST and SPS3

To advance potential translation of the findings, we set out to construct a genetic risk score of RHTN SNPs to evaluate the effect of having multiple risk alleles and their contribution to the phenotype. A risk score was generated using 3 independent SNPs that were replicated in SPS3: rs11749255 in MSX2, rs6487504 in IFLTD1, and rs324498 in PTPRD. Genetic risk scores were constructed using an unweighted, allele counting method, as numerous studies have documented that this is as effective as more complex weighted models167, 168. Only one participant with a missing genotype was excluded from the genetic risk score analysis. A single point was given to each risk allele, i.e. the allele associated with increased odds of RHTN167. The risk score ranged from 0 to 6 (2 points if the participant was homozygous for the risk allele, 1 point if heterozygous, and 0 points if homozygous for the protective allele, summed across the 3 SNPs). We evaluated the prevalence of RHTN across the risk score groups. A Cochran-Armitage trend test was performed separately within the four ancestry/ethnic groups, and then within the combined dataset of INVEST-SPS3. We also tested the association of the risk score with RHTN using logistic regression in SAS 9.3 (Cary, NC). Adjusted odds ratios and 95% confidence intervals (CIs) were computed.
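
The allele counting score and the trend test can be sketched as follows. This is an illustration only: the study used SAS, the genotype encoding (two-letter strings on the same strand as the reported risk alleles) is an assumption, and the trend test shown is the standard Cochran-Armitage statistic with equally spaced scores and a normal approximation.

```python
import math

# Risk alleles as reported for the three replicated SNPs.
RISK_ALLELES = {"rs11749255": "A", "rs6487504": "A", "rs324498": "G"}


def risk_score(genotypes):
    """Count risk alleles (0-6) across the three SNPs.

    genotypes: dict SNP id -> two-letter genotype such as "AG" (assumed
    encoding). Returns None if any genotype is missing.
    """
    total = 0
    for snp, allele in RISK_ALLELES.items():
        g = genotypes.get(snp)
        if g is None:
            return None
        total += g.count(allele)
    return total


def cochran_armitage_trend(cases, totals, scores=None):
    """Two-sided Cochran-Armitage trend test; returns (z, p).

    cases[i] and totals[i] are the RHTN cases and all participants in
    risk score group i; scores default to 0, 1, 2, ...
    """
    if scores is None:
        scores = list(range(len(totals)))
    n = sum(totals)
    p_bar = sum(cases) / n
    t = sum(s * (c - m * p_bar) for s, c, m in zip(scores, cases, totals))
    sum_sm = sum(s * m for s, m in zip(scores, totals))
    sum_s2m = sum(s * s * m for s, m in zip(scores, totals))
    var = p_bar * (1 - p_bar) * (sum_s2m - sum_sm ** 2 / n)
    z = t / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))
```

An unweighted count treats every risk allele as contributing equally; a weighted alternative would multiply each count by the SNP's log odds ratio, which the cited studies suggest adds little for a small number of SNPs.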

Results

Clinical Characteristics of Study Participants and RHTN phenotype

The baseline clinical characteristics of patients in INVEST and SPS3 are summarized in Table 3-1. On average, INVEST participants were older (mean age 68 years) than SPS3 participants (mean age 63 years). In INVEST, participants with RHTN had a higher prevalence of other cardiovascular co-morbidities such as congestive heart failure, myocardial infarction, and peripheral vascular disease. Participants with RHTN were more likely to be diabetic and to have a higher BMI compared to non-RHTN participants (Table 3-1).

Replication of INVEST Significant SNPs in SPS3

After QC procedures, 696,317 genotyped SNPs were tested for association with RHTN in INVEST. None of the tested SNPs reached genome wide significance in either of the ancestry/ethnicity groups (whites or Hispanics) or in the white-Hispanic meta-analysis. We had 43 independent SNPs (Table 3-2) from INVEST (white-Hispanic meta-analysis) that met the suggestive evidence of association and had consistent association among INVEST white and Hispanic participants; 10 of these were screened and selected for replication in SPS3 (white-Hispanic meta-analysis) as they had the strongest evidence for a functional role according to Haploreg v.4 164 and RegulomeDB165 (BNC2, IER3IP1, ADAMTSL1, BACH2 gene loci) or they were in regions with previous association with a cardiovascular phenotype according to the NHGRI-EBI GWAS Catalog (e.g. atherosclerosis: CX3CR1; stroke: MSX2; hypertension: PTPRD, CASP3, BACH2; heart failure: ADAMTSL1, IFLTD1) (Table 3-3).

Among the 10 evaluated SNPs, 3 SNPs in the MSX2, IFLTD1 and PTPRD gene regions were replicated in SPS3 (Table 3-4). Minor allele frequencies and Hardy-Weinberg equilibrium p-values for these 3 SNPs are shown in Table 3-5. The first replicated gene region included a SNP (rs11749255) located 82 kb upstream of MSX2, as shown in the regional plot (Figure 3-3). This gene region has several associated signals in LD with rs11749255. The A allele of rs11749255 was associated with a 50% increase in the odds of RHTN in INVEST (OR (95% CI) 1.5 (1.20-1.80), p=7.3x10^-5) and 2-fold increased odds of RHTN in SPS3 (OR (95% CI) 2.0 (1.40-2.8), p=4.4x10^-5). This SNP reached genome wide significance when INVEST and SPS3 were combined (OR (95% CI) 1.6 (1.30-1.90), p=3.8x10^-8) (Figure 3-4, Table 3-4).

Additional signals that met the replication criteria were also discovered. The second region associated with RHTN was found near the IFLTD1 gene, where rs6487504 was consistently associated with RHTN in both INVEST (OR (95% CI) 1.9 (1.41-2.47), p=1.1x10^-5) and SPS3 (OR (95% CI) 1.7 (1.19-2.46), p=4.0x10^-3). Each additional copy of the variant allele (A) was associated with 81% higher odds of RHTN in the INVEST and SPS3 meta-analysis (OR (95% CI) 1.8 (1.45-2.26), p=1.6x10^-7) (Figure 3-5, Table 3-4).

The third association of interest was an intronic SNP, rs324498, in PTPRD, a previously reported association with RHTN from INVEST that was identified using a large gene-centric chip analysis84. The SNP was associated with RHTN in INVEST (OR (95% CI) 1.6 (1.30-2.00), p=3.4x10^-5) and replicated in SPS3 (OR (95% CI) 1.6 (1.10-2.40), one-sided p=0.005). Each additional copy of the variant allele (G) was associated with a 62% increase in RHTN risk in the INVEST and SPS3 meta-analysis (OR (95% CI) 1.60 (1.30-2.00), p=1.3x10^-6) (Figure 3-6, Table 3-4).

Validating INVEST – SPS3 SNPs in eMERGE

As a secondary validation step, we set out to validate SNPs from the combined INVEST and SPS3 meta-analysis using data from eMERGE. The Manhattan and Q-Q plots of the INVEST-SPS3 meta-analysis are shown in Figures 3-7A and 3-7B. We selected 5 SNPs to validate in eMERGE, including rs11749255 (MSX2) and rs324498 (PTPRD), which replicated in SPS3 (the IFLTD1 SNP rs6487504 was not available in eMERGE). We were not able to validate the rs11749255 (MSX2) and rs324498 (PTPRD) associations in eMERGE. However, we found a SNP, rs16934621, in the BNC2 gene region (Figure 3-8) that was associated with RHTN in the INVEST-SPS3 meta-analysis and had a directionally similar association in eMERGE (Table 3-6). The A allele of rs16934621 was associated with increased risk of RHTN in the INVEST-SPS3 meta-analysis (OR (95% CI) 1.8 (1.38-2.34), p=1.5x10^-5) and in eMERGE (OR (95% CI) 1.43 (1.03-1.98), one-sided p=0.015).

The associations for the identified loci (MSX2, IFLTD1, PTPRD and BNC2) were confirmed when we performed the GWAS using 1000 Genomes imputed data (Figure 3-7).

Genetic Risk Score in INVEST-SPS3

We constructed a genetic risk score based on the three replicated SNPs (MSX2 rs11749255, PTPRD rs324498 and IFLTD1 rs6487504) and compared the prevalence of RHTN across the different risk score categories. The Cochran-Armitage trend test revealed that participants with a greater number of risk alleles (higher risk score) had a higher prevalence of RHTN compared to participants with lower scores in the combined data of INVEST and SPS3 (p=1.8x10^-15, Figure 3-9). This association with the risk score was also consistent across the four ancestry/ethnic groups (Figure 3-10A-D). In addition, we found a significant association between RHTN and the risk score in the INVEST-SPS3 cohorts; each risk allele was associated with a ~62% increase in RHTN risk (OR (95% CI) 1.6 (1.45-1.81), p<0.0001).

Discussion

We sought to identify and replicate common genetic variants associated with RHTN across two cohorts of hypertensive patients treated with antihypertensive medications for BP control. Through a GWAS analysis approach, we identified three novel regions associated with RHTN in INVEST that validated in SPS3: MSX2, IFLTD1, and PTPRD. We also found another region of interest near the BNC2 region, which was first identified in a meta-analysis of INVEST and SPS3 and validated in a cohort of hypertensive patients from EHR data linked to a bio-repository from the eMERGE network.

The first identified region is in MSX2 and included multiple associated variants. MSX2 encodes a member of the muscle segment homeobox gene family. The encoded protein is a transcription factor that promotes the expression of osteogenic factors including alkaline phosphatases and plays a role in bone development169. Additionally, MSX2 is a transcriptional modulator in vascular calcification170. Transgenic overexpression of MSX2 in mice promotes vascular calcification and activation of Wnt-dependent signaling in a model of vascular calcification171. A study by Chen et al. reported that the inhibition of MSX1 and MSX2 in diabetic mice was associated with a reduction in aortic vascular calcification, suggesting MSX2 as a therapeutic target for treatment of vascular calcification in diabetes.172 Moreover, azelnidipine, a newer generation calcium channel blocker, inhibited the Msx2-dependent process of vascular calcification in mice173. An MSX2 variant (rs11749255) was associated with increased risk of RHTN and reached genome wide significance when INVEST and SPS3 were combined. The consistent association of this genetic signal across four independent cohorts of hypertensive patients from two different studies (INVEST whites, INVEST Hispanics; SPS3 whites, SPS3 Hispanics) provides compelling evidence for the importance of this genetic region in RHTN. rs11749255 is associated with a histone modification mark (H3K4me1) in fetal heart tissue and placenta and with altered binding of several regulatory motifs according to Haploreg v4164. Interestingly, this SNP is an eQTL for MSX2 in brain cortex according to the GTEx portal database174, 175 (Figure 3-11).

The second identified region of interest is in the intermediate filament tail domain containing 1 gene (IFLTD1), which plays a role in structural activity and cell proliferation. The IFLTD1 SNP rs6487504 was associated with RHTN in INVEST and SPS3. Although it is unclear how the association in the IFLTD1 gene region influences HTN and RHTN, several associations in IFLTD1 with cardiovascular phenotypes including body mass index, carotid-femoral pulse wave velocity and left ventricular ejection time have been reported176-178. Moreover, rs6487504 overlaps with promoter histone marks in several brain tissues and enhancer histone marks in Human Umbilical Vein Endothelial Cells (HUVEC) according to Haploreg.

The third associated and replicated region is in the PTPRD locus. The PTPRD protein is a member of the protein tyrosine phosphatase (PTP) family, a family of signaling molecules involved in a variety of cellular processes including the mitotic cycle and cellular differentiation. The PTPRD rs324498 association was first identified in a large gene-centric analysis as a RHTN association84 and was among the top associated SNPs in this GWAS analysis. We confirmed this association in hypertensive patients with a history of stroke from SPS3. The PTPRD locus has been associated with increased risk of coronary artery disease and diabetes179, 180, 181. Recently, two SNPs near PTPRD, rs12346562 and rs10739150, were also associated with BP response to atenolol in hypertensive participants from the Pharmacogenomic Evaluation of Antihypertensive Responses (PEAR) study182.

BNC2 encodes basonuclin 2, a zinc finger transcription factor183. SNPs in BNC2 have been previously associated with glycemic control in type 1 diabetes and with glycemic complications including diabetic nephropathy and retinal complications184. There is also a report on the association of BNC2 with hypertension-associated microalbuminuria in a GWAS of hypertensive individuals185. Another recent analysis from the GenSalt study reported an association of a BNC2-potassium interaction with diastolic blood pressure186. BNC2 is characterized by extreme conservation among vertebrates, suggesting an important regulatory function. While the exact mechanism of the associated BNC2 SNP in the context of RHTN is unknown, data from ENCODE illustrate that rs16934621 is associated with chromatin states in cell lines and affects protein binding (Figure 3-12). Additionally, this SNP is an eQTL for BNC2 in brain cortex according to the GTEx portal database174, 175 (Figure 3-13).

Finally, we developed a genetic risk score in the INVEST and SPS3 cohorts to evaluate the contribution of multiple risk alleles to the RHTN phenotype. Participants with a greater number of risk alleles were at a higher risk of developing RHTN compared to participants with fewer risk alleles. This is in line with the polygenic nature of complex phenotypes, in which multiple genetic variants are likely to act in concert to drive the phenotype. This genetic risk score has yet to be validated in independent RHTN cohorts. To date, there are no available RHTN cohorts with genome wide data in which the risk score can be replicated. However, the International Consortium for Antihypertensive Pharmacogenomics Studies (ICAPS; https://icaps-htn.org/) includes GWAS data on antihypertensive drug response from 29 hypertensive cohorts in 22 research groups. Datasets with ascertained BP response and the potential to infer the RHTN phenotype, similar to INVEST and SPS3, are available in ICAPS, and several groups are in the process of creating the RHTN phenotype within their datasets, which presents potential validation cohorts for the discovered RHTN signals and the genetic risk score developed in this analysis. This is likely to promote the utility of the risk score as a prediction tool to identify high-risk patients, with whom nephrologists/clinicians need to be strict about risk factor modification, for example, dietary sodium restriction. Such patients should have their antihypertensive regimen optimized with the recommended agents, which include a diuretic, a long-acting non-dihydropyridine calcium channel blocker, and a renin-angiotensin system blocker (ACEI or ARB). If BP is still uncontrolled, spironolactone, a highly effective mineralocorticoid receptor antagonist, should be added as a fourth agent187. These patients may also benefit from referral to hypertension specialists, or specific procedural intervention.

To our knowledge, this is the first GWAS analysis in RHTN conducted using data from clinical trials. We focused on discovery and replication of genetic determinants of RHTN in hypertensive individuals from two randomized, outcomes-driven clinical trials, with separate analyses in whites and Hispanics, for a total of four cohorts evaluated to arrive at our findings. Strengths of this study are the consistency of findings across two clinical trials with well-documented drug use and dose optimization to a BP-driven protocol. Specifically in INVEST, a centralized and electronic data reporting system was used, which allowed for accurate monitoring and tracking of drug use. This is in addition to a centralized mail-order pharmacy for processing and delivery of prescribed medications to the patient’s home, with receipt of medications confirmed via a patient postcard188. Moreover, patients in INVEST experienced the BP and heart rate lowering effects of atenolol and verapamil, an expected pharmacodynamic effect, further confirming ingestion of the drugs85. In SPS3, patients were followed monthly until BP was at goal, and then quarterly. Compliance with the medications was assessed at the follow-up visits, and adherence was reported to be good or excellent in >75% of the visits97, 98. Additionally, medications were offered at no cost whenever appropriate97, 98. With these features, we feel that the RHTN observed in the INVEST and SPS3 studies reflects a difficult-to-treat BP phenotype rather than a non-adherence-related one. Further, the identification of genes, and the replication of previously reported associations, that are biologically relevant to hypertension mechanisms suggest that these findings are likely to be related to RHTN or a difficult-to-control BP phenotype. While we feel confident that the associations found in the current study reflect a difficult-to-control BP rather than pseudo-resistance driven by non-adherence, we did not perform biochemical assays to screen for drugs and adherence, and thus pseudo-resistance cannot be definitively excluded.

We acknowledge some limitations in our study. First, the power of our discovery cohort (INVEST) to detect genetic associations with variants of small to moderate genetic effect is limited; this was overcome to some extent by combining association results of two hypertensive cohorts. Second, we sought to utilize data derived from the electronic health record as part of eMERGE as a secondary validation for the identified signals; however, we believe that eMERGE may not be an optimal replication/validation cohort for the INVEST-SPS3 meta-analysis owing to the differences in the RHTN phenotype between the INVEST-SPS3 and eMERGE datasets, and the general quality of data in a clinical trial versus within the electronic health record. Specifically, in INVEST and SPS3, drug use was carefully recorded and BP carefully monitored, versus data arising from the electronic health record, where BP may not have been recorded with the same care as in a clinical trial and medication use is more difficult to track. Additionally, differences in clinical characteristics between the patient populations exist: patients in eMERGE were much older than INVEST and SPS3 participants, and generally healthier, as patients in the eMERGE dataset were excluded if they had heart failure or chronic kidney disease. These differences in the RHTN phenotype may explain the lack of validation of the MSX2 and PTPRD signals in eMERGE and may further emphasize the need for similar datasets for validating RHTN signals. Third, our datasets did not include patients of African American ancestry or patients with CKD due to the reduced power for testing the association with RHTN in these patients. Testing the association with RHTN for these specific patients is encouraged as more datasets become available.

In conclusion, we identified and validated multiple variants in different gene regions. The understanding of the underpinnings of these variants may help in advancing knowledge about the etiological factors of RHTN. Further validating the association of these variants and risk score in emerging RHTN cohorts may help in the precision medicine era, where patients with genetic predisposition to RHTN can be identified and treated accordingly to prevent serious CV and renal sequelae.

Table 3-1. Clinical characteristics of INVEST (Discovery) and SPS3 (Replication)

| Clinical characteristics | INVEST whites, Controls (N=431) | INVEST whites, RHTN cases (N=226) | INVEST Hispanics, Controls (N=394) | INVEST Hispanics, RHTN cases (N=143) | SPS3 whites, Controls (N=192) | SPS3 whites, RHTN cases (N=71) | SPS3 Hispanics, Controls (N=239) | SPS3 Hispanics, RHTN cases (N=83) |
|---|---|---|---|---|---|---|---|---|
| Age | 70±10 | 70±9 | 66±10 | 66±10 | 64±10 | 63±9 | 63±11 | 63±11 |
| Female | 184(43%) | 114(50%) | 225(57%) | 87(61%) | 70(36%) | 18(25%) | 98(41%) | 37(45%) |
| BMI, mean±SD | 29±6 | 29±6 | 28.5±5 | 30±5** | 29±6 | 31±10 | 28±4 | 30±6## |
| SBP at RHTN classification | 147±17 | 154±19* | 147±19 | 149±19 | 126±9 | 138±14# | 122±11 | 133±14## |
| DBP at RHTN classification | 83±10 | 82±11 | 87±10 | 88±12 | 71±8 | 73±9 | 67±10 | 70±10 |
| Diabetes | 64(15%) | 66(29%)* | 51(13%) | 25(18%) | 44(23%) | 25(35%)# | 71(30%) | 33(40%) |
| Heart failure | 27(6%) | 17(8%) | 6(2%) | 8(6%)** | 1(1%) | 2(1.4%) | 2(0.9%) | 1(1.2%) |
| Myocardial infarction | 170(39%) | 92(41%) | 35(9%) | 23(16%) | 6(3%) | 8(11%)# | 8(3%) | 2(2%) |
| Peripheral vascular disease | 40(9%) | 35(16%)* | 34(9%) | 23(16%)** | 4(2%) | 2(3%) | 0(0%) | 2(2%) |
| Smoking | 204(47%) | 113(50%) | 137(35%) | 44(31%) | 41(21%) | 21(30%) | 18(8%) | 3(4%) |

BMI: body mass index, SBP: systolic blood pressure, DBP: diastolic blood pressure. Continuous variables are expressed as means ± standard deviations (SD); categorical variables are expressed as frequencies and percentages. * indicates p<0.05 compared to controlled HTN in INVEST whites, ** indicates p<0.05 compared to controlled HTN in INVEST Hispanics, # indicates p<0.05 compared to controlled HTN in SPS3 whites, ## indicates p<0.05 compared to controlled HTN in SPS3 Hispanics.

Table 3-2. Top RHTN signals that met the suggestive evidence of association in INVEST whites and Hispanics (p=1 x 10^-4)

| SNP | Nearest Gene | A1 | A2 | OR | Direction | P |
|---|---|---|---|---|---|---|
| rs17441872 | 990kb 3' of SFTA1P | A | G | 0.55 | -- | 9.73E-07 |
| rs647769 | 14kb 5' of UPK2 | A | G | 1.57 | ++ | 7.14E-06 |
| rs3732378 | missense CX3CR1 | A | G | 0.49 | -- | 8.03E-06 |
| rs8136758 | TBC1D22A intronic | A | G | 1.85 | ++ | 8.21E-06 |
| rs11980456 | CHN2 intronic | A | G | 1.54 | ++ | 9.11E-06 |
| rs16934621 | 3'UTR BNC2 | A | G | 2.04 | ++ | 9.63E-06 |
| rs11663646 | 166kb 5' of IER3IP1 | A | G | 0.65 | -- | 9.70E-06 |
| rs6487504 | 5.8kb 5' of IFLTD1 | A | G | 1.87 | ++ | 1.11E-05 |
| rs960955 | CLTCL1 intronic | A | G | 1.79 | ++ | 1.49E-05 |
| rs34116 | RASGRF2 | A | G | 0.66 | -- | 1.53E-05 |
| rs4458096 | 117kb 5' of IER3IP1 | A | C | 1.51 | ++ | 1.70E-05 |
| rs10476096 | 54kb 3' of HMP19 | A | C | 1.87 | ++ | 1.72E-05 |
| rs10892471 | 80kb 5' of PVRL1 | A | G | 0.45 | -- | 1.76E-05 |
| rs4715102 | 752kb 3' of LOC100506207 | A | G | 1.49 | ++ | 1.84E-05 |
| rs259499 | ZNF385D intronic | A | G | 0.58 | -- | 3.26E-05 |
| rs7149467 | 200kb 5' of C14orf64 | A | G | 1.61 | ++ | 3.26E-05 |
| rs324498 | PTPRD intronic | A | G | 0.62 | -- | 3.43E-05 |
| rs12375332 | 5.7kb 5' of SDCBP | A | G | 1.89 | ++ | 4.20E-05 |
| rs1371901 | 406kb 5' of BCHE | A | G | 1.52 | ++ | 4.41E-05 |
| rs7169971 | 200kb 3' of MIR4510 | A | C | 1.51 | ++ | 4.90E-05 |
| rs2699032 | 69kb 3' of RAPGEF2 | A | G | 0.56 | -- | 5.49E-05 |
| rs10191709 | 68kb 5' of SLC4A10 | A | G | 0.47 | -- | 5.76E-05 |
| rs4130074 | 215kb 5' of MIR4472-1 | A | C | 1.54 | ++ | 5.77E-05 |
| rs4961653 | ADAMTSL1 intronic | A | G | 0.67 | -- | 5.99E-05 |
| rs1971131 | EXD1 synonymous | A | G | 0.62 | -- | 6.16E-05 |
| rs12278752 | CLMP intronic | A | G | 1.97 | ++ | 6.21E-05 |
| rs17474256 | 91kb 5' of LOC100129138 | A | G | 0.55 | -- | 6.31E-05 |
| rs10237191 | MAD1L1 intronic | A | G | 2.40 | ++ | 6.41E-05 |
| rs6864763 | 81kb 5' of MSX2 | A | G | 0.64 | -- | 6.43E-05 |
| rs8141057 | 544bp 5' of CYTH4 | A | C | 1.58 | ++ | 6.48E-05 |
| rs7016829 | 2.6kb 5' of REXO1L2P | A | G | 1.97 | ++ | 6.94E-05 |
| rs2178094 | 17kb 3' of CASP3 | A | G | 1.47 | ++ | 6.99E-05 |
| rs917248 | 32kb 5' of DPY19L2P4 | A | G | 2.44 | ++ | 6.99E-05 |
| rs6841628 | 31kb 3' of LOC441025 | A | G | 1.74 | ++ | 7.17E-05 |
| rs11749255 | 82kb 5' of MSX2 | A | G | 1.50 | ++ | 7.30E-05 |
| rs1880821 | 84kb 3' of SEMA3A | A | G | 0.36 | -- | 7.80E-05 |
| rs4359751 | PHF21B intronic | A | G | 1.49 | ++ | 7.97E-05 |
| rs12625736 | PTPRT intronic | A | G | 0.67 | -- | 8.09E-05 |
| rs4143021 | GYPC intronic | A | G | 1.63 | ++ | 8.39E-05 |
| rs2194734 | 45kb 3' of TBR1 | A | G | 0.52 | -- | 8.46E-05 |
| rs4624363 | 3.3kb 3' of MDH1B | A | G | 1.80 | ++ | 8.52E-05 |
| rs10930413 | MYO3B intronic | A | C | 0.67 | -- | 8.53E-05 |
| rs10018559 | 95kb 3' of LOC285419 | A | G | 1.46 | ++ | 8.79E-05 |
| rs4388268 | BACH2 intronic | A | G | 0.64 | -- | 8.80E-05 |
| rs12831974 | TRHDE intronic | A | G | 0.58 | -- | 9.11E-05 |

A1: coded allele in each dataset; OR: odds ratio; P: meta-analysis p-value of whites and Hispanics of INVEST

Table 3-3. Top RHTN signals in INVEST that were evaluated for replication in SPS3

| SNP | Nearest Gene | Functional Annotation | INVEST A1 | INVEST OR | INVEST P | INVEST Direction | SPS3 A1 | SPS3 OR | SPS3 P* | SPS3 Direction |
|---|---|---|---|---|---|---|---|---|---|---|
| rs3732378 | CX3CR1 | missense | A | 0.49 | 8.03x10^-6 | -- | A | 1.10 | 0.70 | -+ |
| rs16934621 | 3'UTR BNC2 | Enhancer histone marks, promoter histone marks in multiple tissues | A | 2.04 | 9.63x10^-6 | ++ | A | 1.32 | 0.13 | ++ |
| rs6487504 | 5.8kb 5' IFLTD1 | Enhancer histone marks in blood, HUVEC, brain | A | 1.87 | 1.11x10^-5 | ++ | A | 1.71 | 4.0x10^-3 | ++ |
| rs4458096 | 117kb 5' IER3IP1 | eQTL | A | 1.51 | 1.70x10^-5 | ++ | A | 1.03 | 0.57 | -+ |
| rs324498 | PTPRD | intronic | A | 0.62 | 3.43x10^-5 | -- | A | 0.62 | 0.01 | -- |
| rs11749255 | 82kb 5' MSX2 | Enhancer histone marks in heart | A | 1.50 | 7.30x10^-5 | ++ | A | 2.02 | 4.4x10^-5 | ++ |
| rs4961653 | ADAMTSL1 (intronic) | Enhancer histone marks, promoter histone marks in multiple tissues, eQTL | A | 0.67 | 5.99x10^-5 | -- | A | 0.98 | 0.89 | -+ |
| rs2178094 | 17kb 3' CASP3 | Enhancer histone marks, eQTL | A | 1.47 | 6.99x10^-5 | ++ | A | 0.87 | 0.38 | -+ |
| rs4624363 | 3.3kb 3' MDH1B | eQTL | A | 1.80 | 8.52x10^-5 | ++ | A | 0.99 | 0.99 | -+ |
| rs4388268 | BACH2 (intronic) | Enhancer histone marks, promoter histone marks in multiple tissues | A | 0.64 | 8.52x10^-5 | -- | A | 0.82 | 0.18 | -- |

INVEST and SPS3 columns each report the whites + Hispanics meta-analysis. A1: coded allele in each dataset; OR: odds ratio; P is the meta-analysis p-value of whites and Hispanics of INVEST. P* is the meta-analysis p-value of whites and Hispanics of the SPS3 replication cohort. These ten independent SNPs were taken forward for replication in the SPS3 meta-analysis since they had the highest evidence for a functional role and/or biological relevance. SNPs were replicated in SPS3 if they had the same direction of association as in INVEST and met a one-sided Bonferroni-corrected p-value of 0.005 (0.05 / 10 signals). SNPs that were replicated in SPS3 are displayed in bold font. HUVEC: Human Umbilical Vein Endothelial Cells.
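The replication rule stated in the note to Table 3-3 (same direction of association as in INVEST, and a one-sided p-value meeting 0.05 / 10 = 0.005) can be written out as a short check. The following is a minimal sketch, not the analysis code used in the study; the function name `is_replicated` and its parameters are hypothetical, and it assumes that the P* column can be read directly as the one-sided p-value.

```python
def is_replicated(discovery_or: float, replication_or: float,
                  one_sided_p: float, n_signals: int = 10,
                  alpha: float = 0.05) -> bool:
    """Apply the replication rule described in the Table 3-3 note.

    Assumption: direction is judged by whether each OR lies on the same
    side of 1.0, and one_sided_p is the P* value as tabulated.
    """
    same_direction = (discovery_or > 1.0) == (replication_or > 1.0)
    threshold = alpha / n_signals  # Bonferroni correction: 0.05 / 10 = 0.005
    return same_direction and one_sided_p < threshold


# Values taken from Table 3-3 (INVEST OR, SPS3 OR, SPS3 P*).
msx2 = is_replicated(1.50, 2.02, 4.4e-5)    # rs11749255
cx3cr1 = is_replicated(0.49, 1.10, 0.70)    # rs3732378, opposite direction
bnc2 = is_replicated(2.04, 1.32, 0.13)      # rs16934621, p above threshold
```

Keeping the direction test separate from the p-value test makes it explicit that a small p-value alone is not sufficient when the effect points the other way.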

Table 3-4. RHTN SNPs: Discovery in INVEST with replication in SPS3

| SNP | Chr | Position | Nearest Gene | A1 | Study | White-Hispanic meta-analysis OR (95% CI) | White-Hispanic meta-analysis p | INVEST-SPS3 meta-analysis OR (95% CI) | INVEST-SPS3 meta-analysis p | Heterogeneity p |
|---|---|---|---|---|---|---|---|---|---|---|
| rs11749255 | 5 | 174642665 | MSX2 | A | INVEST | 1.5 (1.2,1.8) | 7.3 x 10^-5 | 1.60 (1.3,1.9) | 3.8 x 10^-8 | 0.14 |
| | | | | | SPS3 | 2.0 (1.4,2.8) | 4.4 x 10^-5 | | | |
| rs6487504 | 12 | 25654374 | IFLTD1 | A | INVEST | 1.9 (1.4,2.5) | 1.1 x 10^-5 | 1.80 (1.4,2.3) | 1.6 x 10^-7 | 0.92 |
| | | | | | SPS3 | 1.7 (1.2,2.5) | 4.0 x 10^-3 | | | |
| rs324498 | 9 | 9059545 | PTPRD | G | INVEST | 1.62 (1.3,2.0) | 3.4 x 10^-5 | 1.62 (1.3,2.0) | 1.3 x 10^-6 | 1 |
| | | | | | SPS3 | 1.63 (1.1,2.4) | 0.01 | | | |

A1: coded allele; OR: odds ratio; Heterogeneity p: INVEST-SPS3 meta-analysis heterogeneity p-value.
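To illustrate how per-study odds ratios such as those in Table 3-4 combine into a single estimate, the sketch below pools two studies by inverse-variance weighting on the log-odds scale. This is an assumption made for illustration: the table does not state which meta-analysis model or software produced the tabulated values, and the standard errors here are back-calculated from the rounded 95% CIs, so the result only approximates the table. The function names are hypothetical.

```python
import math


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float,
                  z: float = 1.96) -> tuple:
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return math.log(odds_ratio), se


def fixed_effect_meta(studies: list) -> tuple:
    """Inverse-variance pooling of (OR, ci_low, ci_high) tuples.

    Returns (pooled OR, lower 95% limit, upper 95% limit, two-sided p).
    """
    estimates = [log_or_and_se(*study) for study in studies]
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(pooled / pooled_se) / math.sqrt(2))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * pooled_se),
            math.exp(pooled + 1.96 * pooled_se),
            p_value)


# rs11749255 (MSX2): INVEST 1.5 (1.2,1.8) and SPS3 2.0 (1.4,2.8) from Table 3-4.
pooled_or, ci_low, ci_high, p = fixed_effect_meta([(1.5, 1.2, 1.8),
                                                   (2.0, 1.4, 2.8)])
```

Under these assumptions the pooled estimate for rs11749255 comes out near an OR of 1.6, in line with the combined estimate reported in the table; exact agreement is not expected because the inputs are rounded.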

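Table 3-5. Genotype frequencies and Hardy Weinberg Equilibrium of replicated / validated SNPs

| Cohort | Marker | Nearest gene | Minor allele (m) | Major allele (M) | MAF | Genotype counts mm/mM/MM | HWE |
|---|---|---|---|---|---|---|---|
| INVEST whites | rs11749255 | MSX2 gene | G | A | 0.32 | 64/296/296 | 0.48 |
| INVEST whites | rs6487504 | IFLTD1 gene | G | A | 0.15 | 12/169/475 | 0.64 |
| INVEST whites | rs16934621 | BNC2 gene | A | G | 0.08 | 6/87/564 | 0.25 |
| INVEST whites | rs324498 | PTPRD gene | G | A | 0.16 | 15/181/451 | 0.67 |
| INVEST Hispanics | rs11749255 | MSX2 gene | G | A | 0.36 | 73/238/226 | 0.46 |
| INVEST Hispanics | rs6487504 | IFLTD1 gene | G | A | 0.17 | 19/143/375 | 0.28 |
| INVEST Hispanics | rs16934621 | BNC2 gene | A | G | 0.1 | 3/100/432 | 0.46 |
| INVEST Hispanics | rs324498 | PTPRD gene | G | A | 0.28 | 41/214/279 | 1 |
| SPS3 whites | rs11749255 | MSX2 gene | G | A | 0.35 | 22/117/124 | 0.71 |
| SPS3 whites | rs6487504 | IFLTD1 gene | G | A | 0.06 | 4/63/196 | 1 |
| SPS3 whites | rs16934621 | BNC2 gene | A | G | 0.13 | 3/44/216 | 1 |
| SPS3 whites | rs324498 | PTPRD gene | G | A | 0.1 | 6/57/200 | 0.27 |
| SPS3 Hispanics | rs11749255 | MSX2 gene | G | A | 0.26 | 29/144/149 | 1 |
| SPS3 Hispanics | rs6487504 | IFLTD1 gene | G | A | 0.36 | 46/142/133 | 0.61 |
| SPS3 Hispanics | rs16934621 | BNC2 gene | A | G | 0.08 | 1/42/279 | 1 |
| SPS3 Hispanics | rs324498 | PTPRD gene | G | A | 0.21 | 9/100/213 | 0.54 |

MAF: minor allele frequency; HWE: Hardy Weinberg Equilibrium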
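The relationship between the genotype counts, the MAF, and the HWE column in Table 3-5 can be illustrated with a small calculation. The sketch below assumes the HWE column reports a p-value and uses a one-degree-of-freedom chi-square goodness-of-fit test; the test actually used for the table is not stated (values of exactly 1 suggest an exact test), so the p-value from this sketch need not match the tabulated one. The function name `maf_and_hwe` is hypothetical.

```python
import math


def maf_and_hwe(n_mm: int, n_mM: int, n_MM: int) -> tuple:
    """Return (minor allele frequency, chi-square statistic, p-value).

    Counts are minor homozygotes, heterozygotes, and major homozygotes.
    """
    n = n_mm + n_mM + n_MM
    q = (2 * n_mm + n_mM) / (2 * n)   # minor allele frequency
    p = 1.0 - q                       # major allele frequency
    # Genotype counts expected under Hardy Weinberg Equilibrium.
    expected = (n * q * q, 2 * n * p * q, n * p * p)
    observed = (n_mm, n_mM, n_MM)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return q, chi2, p_value


# rs11749255 in INVEST whites: genotype counts 64/296/296 from Table 3-5.
maf, chi2, p_hwe = maf_and_hwe(64, 296, 296)
```

For this row the computed MAF rounds to the tabulated 0.32, and the chi-square p-value is well above 0.05, which is consistent with no departure from equilibrium.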

Table 3-6. Top SNPs from INVEST-SPS3 meta-analysis that were assessed for validation in eMERGE

| SNP | Chr | Position | Nearest Gene | Function | INVEST-SPS3 A1 | INVEST-SPS3 Freq | INVEST-SPS3 OR | INVEST-SPS3 P | eMERGE A1 | eMERGE Freq | eMERGE OR | eMERGE one-sided P | INVEST-SPS3-eMERGE meta-analysis P | Direction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs11749255 | 5 | 174.6 | MSX2 | Intergenic | A | 0.68 | 1.62 | 3.8x10^-8 | A | 0.69 | 0.91 | 0.84 | 3.05E-07 | ++++- |
| rs324498 | 9 | 9.1 | PTPRD | Intronic | A | 0.81 | 0.62 | 1.3x10^-6 | A | 0.84 | 1.2 | 0.94 | 3.01E-05 | ----+ |
| rs12228810 | 12 | 125.1 | AACS | Intronic | C | 0.85 | 0.62 | 1.3x10^-5 | C | 0.88 | 0.92 | 0.26 | 9.42E-06 | ----- |
| rs16934621 | 9 | 16.4 | BNC2 | 3'UTR | A | 0.1 | 1.79 | 1.5x10^-5 | A | 0.08 | 1.43 | 0.015 | 4.82E-06 | +++++ |
| rs3766160 | 1 | 15.8 | CELA2B | Missense | A | 0.73 | 0.68 | 1.5x10^-5 | A | 0.74 | 0.92 | 0.19 | 1.06E-06 | +++++ |

A1: coded allele, Freq: frequency of coded allele, OR: odds ratio

Figure 3-1. Top SNPs from INVEST-SPS3 meta-analysis using imputed data

| MarkerName | rs# | Nearest gene | A1 | A2 | OR | P.value | Direction |
|---|---|---|---|---|---|---|---|
| 5:174069668_A/G_Intergenic | rs11749255 | MSX2 | a | g | 1.67 | 4.13E-08 | ++++ |
| 5:174072704_C/G_Intergenic | rs6556150 | MSX2 | c | g | 1.67 | 9.07E-08 | ++++ |
| 5:174075895_C/T_Intergenic | rs11743445 | MSX2 | t | c | 0.60 | 9.61E-08 | ---- |
| 5:174074955_G/C_Intergenic | rs6888000 | MSX2 | c | g | 0.60 | 1.12E-07 | ---- |
| 5:174070820_A/G_Intergenic | rs6864763 | MSX2 | a | g | 0.61 | 1.12E-07 | ---- |
| 5:174070626_G/A_Intergenic | rs6864839 | MSX2 | a | g | 0.61 | 1.16E-07 | ---- |
| 12:25807308_C/T_Intergenic | rs6487504 | IFLTD1 | t | c | 1.82 | 1.26E-07 | ++++ |
| 5:174074073_TA/T_Intergenic | rs111693981 | MSX2 | t | ta | 1.65 | 1.82E-07 | ++++ |
| 5:174073675_G/A_Intergenic | rs56283956 | MSX2 | a | g | 0.61 | 3.09E-07 | ---- |
| 11:118812512_C/T_Intron:UPK2 | rs647769 | UPK2 | t | c | 1.59 | 7.96E-07 | ++++ |
| 2:123330661_A/_Intergenic | rs201648746 | | | | | | |
| 12:25814594_T/C_Intergenic | rs10842594 | IFLTD1 | t | c | 1.74 | 1.13E-06 | ++++ |
| 4:183598169_A/G_Intron:ODZ3 | rs7666397 | ODZ3 | a | g | 0.66 | 1.17E-06 | ---- |
| 1:113626662_A/ATAT_Insertion:LRIG2 | rs3916295 | LRIG2 | a | atat | 0.44 | 1.28E-06 | ---- |
| 9:9059545_G/A_Intron:PTPRD | rs324498 | PTPRD | a | g | 0.62 | 1.29E-06 | ---- |
| 11:118811684_C/G_Intron:UPK2 | rs550999 | UPK2 | c | g | 0.63 | 1.30E-06 | ---- |
| 4:183599392_C/T_Intron:ODZ3 | rs6552592 | ODZ3 | t | c | 1.52 | 1.31E-06 | ++++ |
| 6:72460809_C/T_Intergenic | rs147457421 | 136kb 5' of RIMS1 | t | c | 0.65 | 1.51E-06 | ---- |
| 20:58194047_G/A_Intron:PHACTR3 | rs78878795 | PHACTR3 | a | g | 0.37 | 1.65E-06 | ---- |
| 6:72460815_A/G_Intergenic | rs147264336 | 136kb 5' of RIMS1 | a | g | 1.53 | 1.67E-06 | ++++ |
| 20:6679347_T/TATC_Intergenic | rs112437787 | 69kb 5' of BMP2 | t | tatc | 0.45 | 1.70E-06 | ---- |
| 9:16417153_GA/G_Deletion:BNC2 | | | g | ga | 1.99 | 2.03E-06 | ++++ |
| 6:72460810_A/G_Intergenic | rs139826905 | 136kb 5' of RIMS1 | a | g | 1.53 | 2.03E-06 | ++++ |
| 6:72460811_C/G_Intergenic | rs143921224 | 136kb 5' of RIMS1 | c | g | 1.53 | 2.07E-06 | ++++ |
| 6:72460805_A/T_Intergenic | rs142886166 | 136kb 5' of RIMS1 | a | t | 1.52 | 2.37E-06 | ++++ |
| 20:6678989_A/G_Intergenic | rs7265612 | 70kb 5' of BMP2 | a | g | 0.46 | 2.49E-06 | ---- |
| 6:72460872_G/T_Intergenic | rs148737641 | 136kb 5' of RIMS1 | t | g | 0.66 | 2.60E-06 | ---- |
| 12:125555313_A/C_Intron:AACS | rs11058028 | AACS | a | c | 0.60 | 3.20E-06 | ---- |
| 10:15807958_C/A_Intergenic | rs12571822 | 12kb 3' of FAM188A | a | c | 1.72 | 3.23E-06 | ++++ |
| 4:183597786_G/C_Intron:ODZ3 | rs7666234 | ODZ3 | c | g | 1.48 | 3.33E-06 | ++++ |
| 4:183597788_A/G_Intron:ODZ3 | rs7665739 | ODZ3 | a | g | 0.67 | 3.34E-06 | ---- |
| 12:125555462_G/A_Intron:AACS | rs879993 | AACS | a | g | 1.66 | 3.79E-06 | ++++ |
| 1:57973848_G/A_Intron:DAB1 | rs12120223 | DAB1 | a | g | 0.63 | 3.80E-06 | ---- |
| 10:50792460_G/A_Intergenic | rs10857513 | 25kb 5' of CHAT | a | g | 2.06 | 3.96E-06 | ++++ |
| 19:28645689_G/A_Intergenic | rs113355465 | 361kb 5' of LOC148189 | a | g | 0.63 | 4.14E-06 | ---- |
| 1:15802284_A/AG_Insertion:CELA2B | rs199788900 | 310bp 5' of CELA2B | a | ag | 0.63 | 4.50E-06 | ---- |
| 4:150415036_CA/C_Deletion:RP11-526A4.1 | | | ca | c | 3.06 | 4.80E-06 | ++++ |
| 2:242443829_C/T_Intron:STK25 | rs34227388 | STK25 | t | c | 1.78 | 5.03E-06 | ++++ |
| 1:15801214_G/A_Intron:CELA2B | rs72865196 | 1.4kb 5' of CELA2B | a | g | 1.57 | 5.14E-06 | ++++ |
| 9:16415806_T/C_Intergenic | rs78537784 | BNC2 | t | c | 0.52 | 5.16E-06 | ---- |
| 1:15801361_C/T_Intron:CELA2B | rs35875594 | CELA2B | t | c | 1.57 | 5.18E-06 | ++++ |
| 1:15808702_C/T_Intron:CELA2B | rs4661635 | CELA2B | t | c | 1.58 | 5.54E-06 | ++++ |
| 9:16415411_A/G_Intergenic | rs2296863 | BNC2 | a | g | 0.52 | 5.57E-06 | ---- |
| 1:15808506_TAG/T_Deletion:CELA2B | rs35206351 | | t | tag | 1.57 | 5.69E-06 | ++++ |
| 1:15808767_G/A_Nonsynonymous:CELA2B | rs3820071 | CELA2B | a | g | 1.56 | 5.75E-06 | ++++ |
| 4:183590236_C/A_Intron:ODZ3 | rs2309774 | ODZ3 | a | c | 0.69 | 5.76E-06 | ---- |
| 9:16415036_A/G_Intergenic | rs2296862 | BNC2 | a | g | 0.52 | 5.77E-06 | ---- |
| 1:15809428_C/T_Intron:CELA2B | rs12139377 | | t | c | 1.56 | 5.84E-06 | ++++ |
| 1:15808872_G/A_Nonsynonymous:CELA2B | rs3766160 | CELA2B | a | g | 1.58 | 5.91E-06 | ++++ |
| 4:183598168_C/T_Intron:ODZ3 | rs7666561 | ODZ3 | t | c | 1.47 | 5.95E-06 | ++++ |
| 12:125549956_C/T_Utr5:AACS | rs41474647 | 5'UTR AACS | t | c | 1.64 | 6.02E-06 | ++++ |
| 1:15810252_G/A_Intron:CELA2B | rs10803385 | CELA2B | a | g | 1.56 | 6.05E-06 | ++++ |
| 5:174076669_C/T_Intergenic | rs11744357 | MSX2 | t | c | 0.64 | 6.11E-06 | ---- |
| 1:15811692_C/T_Intron:CELA2B | rs10803386 | CELA2B | t | c | 1.55 | 6.41E-06 | ++++ |
| 10:50784237_T/C_Intergenic | rs60981594 | 33kb 5' of CHAT | t | c | 0.49 | 6.87E-06 | ---- |
| 3:168763395_A/G_Intergenic | rs28608851 | 38kb 3' of MECOM | a | g | 0.53 | 7.03E-06 | ---- |
| 1:15809036_G/A_Intron:CELA2B | rs3766161 | CELA2B | a | g | 1.56 | 7.07E-06 | ++++ |
| 11:63222914_T/C_Intergenic | rs138283171 | 6kb 3' of HRASLS5 | t | c | 0.42 | 7.185E-06 | ---- |
| 1:15808289_C/T_Intron:CELA2B | rs4661634 | CELA2B | t | c | 1.56 | 7.78E-06 | ++++ |
| 7:8363175_T/A_Intron:AC007128.1 | rs6967558 | 61kb 5' of ICA1 | a | t | 1.98 | 7.87E-06 | ++++ |
| 12:125554355_A/G_Intron:AACS | rs11058026 | AACS | a | g | 0.62 | 7.88E-06 | ---- |
| 1:22000789_G/A_Intergenic | rs151158213 | 4kb 3' of USP48 | a | g | 2.58 | 8.04E-06 | ++++ |
| 12:125556425_CAG/C_Deletion:AACS | rs3048262 | AACS | cag | c | 0.61 | 8.18E-06 | ---- |
| 4:44032282_C/T_Intergenic | rs148733901 | 144kb 3' of KCTD8 | t | c | 1.77 | 8.76E-06 | ++++ |
| 4:44036112_GA/G_Intergenic | rs142727333 | 140kb 3' of KCTD8 | g | ga | 1.78 | 8.99E-06 | ++++ |
| 4:44029342_T/A_Intergenic | rs79754981 | 147kb 3' of KCTD8 | a | t | 1.77 | 9.16E-06 | ++++ |
| 1:15808198_C/T_Intron:CELA2B | rs7520335 | CELA2B | t | c | 1.57 | 9.19E-06 | ++++ |
| 4:44026457_G/A_Intergenic | rs77519982 | KCTD8 | a | g | 1.77 | 9.39E-06 | ++++ |
| 11:63225574_C/T_Intergenic | rs117563546 | 3.3kb 3' of HRASLS5 | t | c | 2.41 | 9.40E-06 | ++++ |
| 4:44021134_A/AT_Insertion:RP11-328N19.1 | rs181531127 | 155kb 3' of KCTD8 | a | at | 0.57 | 9.40E-06 | ---- |
| 4:44021133_G/GC_Insertion:RP11-328N19.1 | rs201384613 | 155kb 3' of KCTD8 | g | gc | 0.57 | 9.43E-06 | ---- |
| 11:63225577_C/T_Intergenic | rs117734384 | 3.3kb 3' of HRASLS5 | t | c | 2.41 | 9.48E-06 | ++++ |
| 4:44037652_C/A_Intergenic | rs11945369 | 138kb 3' of KCTD8 | a | c | 1.79 | 9.59E-06 | ++++ |
| 1:207660378_A/G_Intron:CR2 | rs2182913 | CR2 | a | g | 0.63 | 9.75E-06 | ---- |
| 9:16415459_C/T_Intergenic | rs2296864 | 3'UTR BNC2 | t | c | 1.81 | 9.86E-06 | ++++ |
| 3:168754403_CT/C_Intergenic | | | ct | c | 0.54 | 1.00E-05 | ---- |
| 9:16415357_TAC/T_Intergenic | rs147676123 | 3'UTR BNC2 | t | tac | 1.81 | 1.00E-05 | ++++ |

A1; A2: coded alleles, OR: odds ratio, P: meta-analysis association p-values for top SNPs (p ≤ 1 x 10^-5) from INVEST and SPS3 (whites + Hispanics). We performed a confirmatory GWAS analysis using imputed data from INVEST and SPS3. The first column of the table gives the marker name as labeled by EPACTS: Chromosome #: Basepair_Reference allele_variant allele:gene annotation. The second column gives the SNP id (rs#). Blank cells indicate values that were not available for that marker.
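Because the marker names in Figure 3-1 pack several fields into one string, it can help to see how such a name splits into its parts. The sketch below is based only on the naming pattern visible in the table (for example `11:118812512_C/T_Intron:UPK2`); it is not part of EPACTS, and the `Marker` type and `parse_marker` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class Marker(NamedTuple):
    chrom: str
    position: int
    ref: str
    alt: str
    annotation: str
    gene: Optional[str]


def parse_marker(name: str) -> Marker:
    """Split 'chrom:pos_ref/alt_annotation[:gene]' into its fields."""
    # Split on the first two underscores only, so gene names stay intact.
    locus, alleles, annotation = name.split("_", 2)
    chrom, position = locus.split(":")
    ref, alt = alleles.split("/")
    # The gene is present only for genic annotations such as 'Intron:UPK2'.
    kind, _, gene = annotation.partition(":")
    return Marker(chrom, int(position), ref, alt, kind, gene or None)


intronic = parse_marker("11:118812512_C/T_Intron:UPK2")
intergenic = parse_marker("5:174069668_A/G_Intergenic")
```

Intergenic markers carry no gene suffix, so the `gene` field is left as `None` rather than guessed from the nearest-gene column.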

Figure 3-2. Flowchart showing SNP prioritization for replication in SPS3 (primary approach) or validation in eMERGE (secondary approach)

Figure 3-3. Regional plot of MSX2 association with RHTN. The rs11749255 (purple dot) is associated with RHTN (p=3.8 x 10^-8)

Figure 3-4. Adjusted odds ratios and 95% CIs for resistant hypertension risk for MSX2 rs11749255 in INternational VErapamil SR Trandolapril STudy (INVEST) whites, INVEST Hispanics, Secondary Prevention of Small Subcortical Strokes (SPS3) whites, SPS3 Hispanics, and meta-analysis.

Figure 3-5. Adjusted odds ratios and 95% CIs for resistant hypertension risk for IFLTD1 rs6487504 in INternational VErapamil SR Trandolapril STudy (INVEST) whites, INVEST Hispanics, Secondary Prevention of Small Subcortical Strokes (SPS3) whites, SPS3 Hispanics, and meta-analysis.

Figure 3-6. Adjusted odds ratios and 95% CIs for resistant hypertension risk for PTPRD rs324498 in INternational VErapamil SR Trandolapril STudy (INVEST) whites, INVEST Hispanics, Secondary Prevention of Small Subcortical Strokes (SPS3) whites, SPS3 Hispanics, and meta-analysis.

Figure 3-7. RHTN association in INVEST-SPS3 meta-analysis using genotyped data. A) Manhattan plot of RHTN association in INVEST-SPS3 meta-analysis; the top signal is labeled MSX2 rs11749255. B) Quantile-quantile (QQ) plot of RHTN association in INVEST-SPS3 meta-analysis.

Figure 3-8. Regional plot of BNC2 association with RHTN. The rs16934621 (purple dot) is associated with RHTN in INVEST-SPS3 meta-analysis (p=1.5 x 10^-5) and validated in eMERGE (p=0.015).

Figure 3-9. Genetic risk score association with RHTN (P=1.8 x 10^-15). Risk score was calculated using three SNPs: rs11749255 MSX2, rs6487504 IFLTD1 and rs324498 PTPRD in 1778 participants from INternational VErapamil SR Trandolapril STudy (INVEST) and Secondary Prevention of Small Subcortical Strokes (SPS3). One point was given to each allele conferring risk for RHTN. Participants with a higher risk score had a higher prevalence of RHTN compared to participants with a lower risk score.
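The scoring rule in the caption (one point per risk allele across the three SNPs) can be sketched as follows. The risk alleles used here are an assumption read from Table 3-4, where the coded allele for each SNP has an odds ratio above 1 (A for rs11749255, A for rs6487504, G for rs324498); strand and allele coding would need to be checked against the actual genotype data. `RISK_ALLELES` and `risk_score` are hypothetical names.

```python
# Assumed risk alleles: the coded alleles with OR > 1 in Table 3-4.
RISK_ALLELES = {
    "rs11749255": "A",  # MSX2
    "rs6487504": "A",   # IFLTD1
    "rs324498": "G",    # PTPRD
}


def risk_score(genotypes: dict) -> int:
    """Count risk alleles (0 to 6) across the three SNPs.

    `genotypes` maps each rsID to a two-letter genotype such as "AG".
    """
    return sum(genotypes[snp].upper().count(allele)
               for snp, allele in RISK_ALLELES.items())


# One risk allele at MSX2, none at IFLTD1, two at PTPRD.
example_score = risk_score({"rs11749255": "AG",
                            "rs6487504": "GG",
                            "rs324498": "GG"})
```

An unweighted count treats each allele as contributing equally, which matches the caption; a weighted score based on the per-SNP odds ratios would be a different design.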

Figure 3-10. Risk score association with resistant hypertension (RHTN). Risk score was calculated using three SNPs: rs11749255 MSX2, rs6487504 IFLTD1 and rs324498 PTPRD in each of the four cohorts separately. One point was given to each allele conferring risk for RHTN. A) INVEST whites. B) INVEST Hispanics. C) SPS3 whites. D) SPS3 Hispanics. RHTN association with the risk score was consistent among the four cohorts.

Figure 3-11. Data from the GTEx portal show that rs11749255 is an eQTL for the MSX2 gene in brain anterior cingulate cortex (P=0.035). The variant genotype is associated with higher expression of MSX2 compared to the wild-type genotype.

Figure 3-12. BNC2 functional annotation. The rs16934621 variant is characterized by a high Genomic Evolutionary Rate Profiling (GERP) score and is associated with the following chromatin states: transcriptional elongation in human skeletal muscle cells (HSMM cell line) and weak transcription in human umbilical vein endothelial cells (HUVEC). Figure was produced using the UCSC Genome Browser database: http://genome.ucsc.edu/.

Figure 3-13. Data from the GTEx portal show that rs16934621 is an eQTL for the BNC2 gene in brain cortex (P=0.035). The variant allele is associated with lower expression of BNC2 compared to the wild-type genotype.

CHAPTER 4 IPSC AND CRISPR CAS9 – A FASCINATING TOOL BUT NOT FOR ALL PHENOTYPES

Introduction

Resistant hypertension (RHTN) is defined as uncontrolled blood pressure (BP) despite the use of 3 or more BP medications, or controlled BP on 4 or more drugs146. It is a complex phenotype with intricate involvement of multiple systems including brain, kidney, and vasculature; vascular dysfunction and stiffened blood vessels are contributors to the phenotype156. Additionally, RHTN is frequently encountered with other comorbid conditions, for example atherosclerosis, diabetes, and other metabolic/cardiovascular risk factors148-150. Unlike monogenic diseases, where one gene or one genetic variant is often the culprit of the phenotype, RHTN is likely driven by multiple genes and gene variants that work in concert156. For such complex phenotypes, genome-wide analyses, e.g. genome-wide association analysis, are used to reveal phenotype-associated variants189. However, for most of the identified associations, we do not have a clear interpretation of their functional role in the phenotype, as these variants lie outside the coding regions of the genome or do not have an obvious role in the associated genes190, 191. Therefore, modeling of these variants in in vitro or in vivo systems is needed to study the mechanistic link between the variants and the phenotype.
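The two-part definition of RHTN given above can be expressed as a simple rule. The sketch below is illustrative only: the function name is hypothetical, and whether BP is controlled is passed in as a flag because the BP threshold for control is not stated in this passage.

```python
def is_resistant_hypertension(n_bp_drugs: int, bp_controlled: bool) -> bool:
    """Classify RHTN from the number of BP drugs and BP control status.

    RHTN: uncontrolled BP on 3 or more drugs, or controlled BP on 4 or more.
    """
    uncontrolled_on_three_plus = (not bp_controlled) and n_bp_drugs >= 3
    controlled_on_four_plus = bp_controlled and n_bp_drugs >= 4
    return uncontrolled_on_three_plus or controlled_on_four_plus


case_a = is_resistant_hypertension(3, bp_controlled=False)  # uncontrolled on 3 drugs
case_b = is_resistant_hypertension(3, bp_controlled=True)   # controlled on 3 drugs
case_c = is_resistant_hypertension(4, bp_controlled=True)   # controlled on 4 drugs
```

Note that under this definition any patient on 4 or more drugs is classified as RHTN regardless of BP control, while control status only matters at exactly 3 drugs.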

Limitations of Existing In-Vitro Based-Modeling

Mammalian genomes are considered evolutionarily conserved, and therefore animal models such as non-human primates, mice and rats have been used as modeling tools to study human diseases. Their use in biomedical sciences became even more extensive with advances in gene targeting and the emergence of transgenic models.192 Nevertheless, genetic differences between mice and humans have accumulated since the divergence of their ancestors approximately 10^8 years ago.193 For example, although most mouse and human genes are orthologous, meaning that they evolved from a common ancestral gene, approximately 20% of the genes lack an identifiable orthologue.193 Additionally, physiological, embryonic and developmental differences exist between the two species to an extent that may preclude the recapitulation of human phenotypes, especially when trying to uncover the influence of genetic differences. Resting heart rate is one example of such physiological differences.194

Because of these dissimilarities between animals and humans, it is generally preferable to conduct genetic studies in humans; however, this can only be done using in vitro model systems. One way to conduct such studies is to use patient-derived immortalized cell lines that can be generated from blood or tissue samples. While it is feasible to perform cancer studies using patient-derived immortalized cell lines from tumor cells that can be easily isolated from patients, it is often hard to study phenotypes that are cell-specific when the relevant cells cannot be readily isolated from humans because the tissues are inaccessible. Additionally, although immortalized human cell lines may provide a reasonable cellular model for certain phenotypes, questions arise as to whether changes occur during the immortalization process and whether these changes affect the cellular phenotype being studied.

Advantages of induced Pluripotent Stem Cells (iPSC) Based Modeling

Given the shortcomings of animal models and immortalized cell lines in disease modeling, induced pluripotent stem cells (iPSC) have emerged as a promising tool in biomedical sciences. An iPSC is essentially an adult somatic cell that has been reset into a pluripotent stem cell using reprogramming factors and is then capable of differentiating into any cell type of the three germ layers.195 Theoretically, iPSC can be differentiated into virtually any tissue while retaining both the genotype and molecular characteristics of the patient196, 197; hence, the relation between genotype and phenotype can be probed after differentiation into the tissue that is most relevant to the phenotype in question. The major features that make iPSC a particularly attractive modeling tool are their pluripotent nature and their self-renewal properties, which provide an indefinite source of cells that can be used as a platform for drug screening. Another theoretical advantage of iPSC is that they are derived from a specific patient and may therefore recapitulate the clinical phenotype in vitro, where they can be assayed with different drugs until the best therapy is potentially found for that specific patient. This best frames the principle of precision medicine.

iPSC Based Modeling in Monogenic and Complex Diseases

Successful use of iPSC has been facilitated and advanced by studies of diseases that show clear cellular and molecular phenotypes when modeled in a dish. These are mostly monogenic diseases that are characterized by high penetrance and arise from a single mutation in a gene, and they usually have a well-defined, observable cellular phenotype. This differs from complex disease phenotypes, which are polygenic in nature, meaning that they are driven by variants in multiple genes, each of which has a small to moderate contribution to the cellular/molecular phenotype. These complex phenotypes have low penetrance and usually appear late in life, as opposed to monogenic diseases that have an early disease onset. Additionally, complex diseases usually have subtle in vitro phenotypes that are not as distinct as those of monogenic diseases, making them harder to model in vitro. Further, the late onset of complex diseases can add some complexity to their modeling in iPSC, which have fetal-like features198. Collectively, this may explain why iPSC modeling of complex diseases lags behind that of monogenic diseases in the literature.

In the early days of iPSC research, investigators conducted studies in iPSC from individuals with the phenotype (affected individuals, or cases) and without it (controls). Because iPSC from cases and controls may also carry genetic and epigenetic variations that are unrelated to the mutation or variant of interest, this approach makes it challenging to distinguish the phenotype attributable to the gene of interest from the effects of epigenetic or genetic variation at other loci. This is particularly important in complex diseases, which already have subtle phenotypes and contributions from multiple genes. This challenge can be overcome to some extent by using the available genome editing tools to create isogenic cell lines with a constant genetic background except for the variant to be studied.

Coupling Genome Editing with iPSC

Genome editing is another scientific advancement that made it possible to couple the advantages of iPSC with the ability to introduce or correct a genetic variant, followed by studying the functional consequences of the edited locus in the differentiated tissue.199, 200 Different tools for genome editing have been developed in recent years, such as zinc finger nucleases (ZFN),201 transcription activator-like effector nucleases (TALENs),202 and the clustered regularly interspaced short palindromic repeats (CRISPR)-associated protein 9 system (CRISPR-Cas9).203 CRISPR-Cas9 has been widely adopted for gene editing since it is easier to use, more efficient, and generally cheaper than other tools.204 It was repurposed as a site-specific nuclease from the bacterial clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system. In this system, the Cas9 nuclease is associated with an engineered guide RNA containing a 20-nucleotide sequence that binds to the target genetic sequence and thereby recruits the nuclease (Cas9) to induce a double-strand break (DSB). The DSB can then be repaired through one of two pathways: the non-homologous end-joining (NHEJ) repair mechanism, resulting in random insertions or deletions (indels), or the homology-directed repair (HDR) mechanism, which uses an exogenous DNA template and results in precise insertion/deletion gene edits or base substitutions.204-207
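As a rough illustration of how a 20-nucleotide guide target might be located in a sequence, the sketch below scans the forward strand for 20-nucleotide sites that are immediately followed by an NGG motif. The NGG requirement (the protospacer adjacent motif of the commonly used Streptococcus pyogenes Cas9) is an assumption added here and is not discussed in the text above; the function name and the example sequence are hypothetical, and real guide design also considers the reverse strand, off-target matches, and efficiency scores.

```python
import re


def candidate_guides(sequence: str, guide_len: int = 20) -> list:
    """List (start, protospacer, pam) for forward-strand sites followed by NGG."""
    seq = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Made-up sequence: a 20-nt stretch followed by the motif AGG.
toy_sequence = "GATTACAGATTACAGATTAC" + "AGG" + "TT"
guides = candidate_guides(toy_sequence)
```

Each reported protospacer is the DNA sequence the guide RNA would be designed to match; the break itself would then be repaired by NHEJ or HDR as described above.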

Successful Examples of iPSC Based Modeling in Pharmacogenomics

Precision medicine will continue to evolve and promises to use individual-level clinical and genetic data to prescribe the right medication, at the right dose and at the right time, among other proposed benefits. This is also a major pillar of pharmacogenomics, which focuses on studying the effect of genetic variability on drug response, with the promise of identifying the most effective therapy with the fewest adverse outcomes for an individual patient based on a panel of genetic predictors. However, this will only be possible if the discovered variants are transitioned from research into clinical use. Understanding the functional impact of an identified variant is therefore a fundamental step in building the evidence for its clinical use. This knowledge can be gained by modeling the effect of the variant, for which iPSC studies may serve as an excellent platform for some phenotypes. Eventually, this basic understanding of the functional effect of the variant will have to be supported by further evidence that the variant is sufficiently predictive of drug response or adverse effect, if used clinically. In addition to its promise in precision medicine, pharmacogenomics can facilitate discovery of therapeutic drug targets. In this regard, iPSC-based studies can be remarkably useful for screening potential drug targets and modeling drug-induced toxicities, thus facilitating a faster track for drug development.

For these studies, iPSC from patients are differentiated into the relevant tissue, where the effect of the drugs on a phenotype read-out can be probed. Examples of phenotype read-outs include neurite outgrowth when modeling drug-induced peripheral neuropathy, or action potential when modeling drug-induced QT prolongation. The use of iPSC-derived cardiomyocytes to recapitulate doxorubicin-induced cardiotoxicity in breast cancer patients was successfully shown in a recent study.208 iPSC cardiomyocytes from patients with and without cardiotoxicity were treated with doxorubicin. iPSC cardiomyocytes from patients who experienced clinical cardiotoxicity showed sensitivity to doxorubicin treatment, demonstrated by damage to the myofilaments, increased production of reactive oxygen species, and DNA damage208. Another important study investigating the suitability of iPSC cardiomyocytes as a platform for drug-induced toxicity screens was conducted by Matsa et al.209 Through transcriptome profiling of iPSC-cardiomyocytes from two healthy individuals, the authors showed that most of the phenotypic variability was attributable to inter-patient rather than intra-patient variability. Through pathway analysis, the authors found alterations in oxidative stress between the different iPSC lines, more specifically in the nuclear factor erythroid 2-like 2 (NFE2L2, or NRF2) pathway. Additionally, the authors were able to home in on pathways for further study that were suggested by differences in gene expression and protein interaction data, namely the energy metabolism and fatty acid oxidation pathways. Finally, tacrolimus and rosiglitazone were used to target these pathways and investigate differences in responses to these drugs. As expected, the differential drug response observed was related to differences in gene expression and pathways between the two studied cell lines. This study is important in demonstrating the quality control checks needed for iPSC-derived cells if they are to be used as a platform for drug screens.

The ability to model drug induced peripheral neuropathy is another success story for use of iPSC in pharmacogenomics. A recent study used commercially available iPSC derived neurons and demonstrated changes in morphological features after treatment with four chemotherapeutic agents: vincristine, paclitaxel, cisplatin and hydroxyurea (used as a negative control).210 In the same model, knocking out TUBB2, a gene implicated in paclitaxel – induced neurotoxicity led to decreased neurite growth after paclitaxel treatment.210 This finding corroborated findings from other studies that linked decreased expression of TUBB2A with paclitaxel - associated toxicity.

Collectively, these data show the great potential of iPSC for modeling drug-associated phenotypes.

Goal of The Project

We recently conducted a GWAS and identified an association between a missense variant (rs3732378; T280M) in CX3CR1 and RHTN. The variant allele (M280) was associated with reduced risk of RHTN development in white and Hispanic hypertensive participants recruited as part of the INternational VErapamil SR Trandolapril STudy (INVEST)85. CX3CR1 is a chemokine receptor that is expressed particularly on inflammatory cells, including natural killer cells, dendritic cells, and T-cells, with high expression on monocytes211. Additionally, CX3CR1 is expressed on vascular smooth muscle cells212. Fractalkine (FKN), the ligand for CX3CR1, has unique features: it exists both as a soluble chemokine, acting as a chemoattractant for monocytes, and as an adhesion molecule on endothelial cells, facilitating monocyte-endothelial interactions213.

Importantly, the FKN-CX3CR1 axis is instrumental in initiating the inflammatory process involved in atherosclerosis and plaque formation, as illustrated schematically in Figure 4-1. Monocytes expressing CX3CR1 are recruited to endothelial cells in response to FKN and other chemotactic mediators; they then interact with endothelial cells through CX3CR1-FKN binding and undergo trans-endothelial migration213.

Monocytes in the sub-endothelial space interact with vascular smooth muscle cells through the CX3CR1-FKN axis, leading to accentuation of inflammation, release of inflammatory mediators, and vascular smooth muscle cell proliferation, with the end result of vascular dysfunction and atherosclerosis214, 215. Since inflammation and vascular dysfunction are implicated in HTN and RHTN, and atherosclerosis is frequently encountered in patients with RHTN148, we reasoned that the CX3CR1 variant may modulate RHTN risk by attenuating the CX3CR1-FKN axis, thereby decreasing the inflammatory burden. We were particularly interested in pursuing this variant in an iPSC model for the following reasons. First, it is a functional variant that is predicted to have a deleterious effect on CX3CR1. Second, the CX3CR1 polymorphism (T280M) has been associated with decreased risk of cardiovascular disease, atherosclerosis, and coronary artery disease.216-218 In our GWAS, participants carrying the variant allele were at reduced risk of RHTN, which is directionally consistent with these previous associations documenting a protective effect of the variant against cardiovascular disease. Third, since this variant is in the coding region of CX3CR1, it seemed feasible to edit this gene using the available gene editing tools. The biological plausibility and apparent feasibility of editing CX3CR1 were encouraging factors in deciding to study CX3CR1 further, as a pilot to demonstrate the use of iPSC to validate pharmacogenomic variants.

Functional studies assessing the interaction of CX3CR1 with FKN have mostly been conducted in immortalized cell lines that do not normally express CX3CR1. Therefore, these cell lines are usually used after stable transfection with a CX3CR1 construct. An early study by Fong et al. used a CX3CR1-transfected K562 cell line to study the role of the CX3CR1-FKN interaction in mediating leukocyte adhesion to endothelium.219 In this study, CX3CR1-transfected K562 cells demonstrated firm adhesion to ECV-304 cells expressing FKN and to TNFα-stimulated human umbilical vein endothelial cells (HUVEC). The firm adhesion between CX3CR1 and FKN was not inhibited by pertussis toxin, anti-integrin antibodies, or EDTA/EGTA, suggesting an integrin-independent interaction.219 Additionally, immortalized cell lines stably transfected with CX3CR1 variants were used to evaluate the effect of these variants on the functionality of the CX3CR1 receptor and its interaction with FKN. A study by Daoudi et al. used human embryonic kidney (HEK) cell lines transfected with the CX3CR1 I249-M280 haplotype and demonstrated increased adhesiveness of the CX3CR1 variant to membrane-bound FKN.220 In contrast, McDermott et al. showed that CX3CR1-M280-transfected K562 cells demonstrated reduced binding to FKN-transfected human endothelial cells (926 FKN) compared with wild type CX3CR1-transfected K562 cells.218 In the same study, monocytes from patients homozygous for the variant (M280) demonstrated reduced FKN binding and chemotaxis.218 Further, M280 was associated with a reduced risk of cardiovascular disease in the Offspring Cohort of the Framingham Heart Study218. To understand the molecular mechanisms underlying the observed differences in CX3CR1-FKN interaction associated with CX3CR1 variants, Davis et al. used Chinese hamster ovary (CHO) cells transfected with CX3CR1 variants. In this study, there was no observed effect of CX3CR1 variants on FKN-stimulated AKT phosphorylation in CHO cells.221 While the CX3CR1-transfected cell lines in these studies provided some mechanistic insight into the effect of variants on the CX3CR1-FKN interaction, these immortalized cell lines could not model the interaction in a biologically relevant system in the context of the disease of interest. This may have contributed to the discrepant data regarding the effect of the variant on binding between CX3CR1 and FKN.218,219 For this reason, we sought to use iPSC along with gene editing to produce isogenic cell lines, followed by differentiation into monocytes, to validate the RHTN-associated M280 signal and study its effect on CX3CR1-FKN-mediated monocyte interactions. We hypothesized that monocytes of hypertensive patients with this variant may have reduced chemotaxis and altered receptor adhesion, leading to a weakened CX3CR1-FKN axis and thus reduced risk of RHTN development.

Our overall goal was to examine the consequence of the CX3CR1 rs3732378 (T280M) variant for HTN and RHTN by investigating the CX3CR1-FKN axis in iPSC-differentiated monocytes (iPSC-Mo), focusing on monocyte interactions (chemotaxis and adherence).

Herein, we planned to use iPSC that were heterozygous for the rs3732378 SNP (T280M). These iPSC were derived from patients with mild to moderate hypertension, recruited as part of Pharmacogenomics Evaluation of Antihypertensive Responses (PEAR).222, 223 PEAR iPSC were created from peripheral blood mononuclear cells (PBMCs). We set out to use CRISPR-Cas9 gene editing to create three isogenic cell lines from each selected donor, ensuring a constant genetic background that differs only at CX3CR1. The three isogenic cell lines were: a wild type allele-expressing cell line, a variant allele-expressing cell line, and a double knock-out. Following gene editing, we planned to differentiate these edited cell lines into monocytes (iPSC-Mo) and compare chemotaxis, adhesion, and downstream inflammatory signaling pathways among the three isogenic cell lines. A flow chart of the overall experimental approach is shown in Figure 4-2.

Methods

Source of iPSC Donors

We used two PEAR iPSC cell lines that were heterozygous for the CX3CR1 variant: PF0380 and PF0052. These two cell lines were part of the Pharmacogenomics Evaluation of Antihypertensive Responses (PEAR)222, 223 iPSC cohort created in Dr. Terada’s lab and were the only two cell lines available in the database that were heterozygous for the SNP. We also used four iPSC cell lines that were homozygous for the wild type allele at the CX3CR1 rs3732378 SNP. The iPSC cell lines were generated from patients with mild to moderate hypertension (HTN) recruited as part of PEAR and PEAR-2, which evaluated the association between genetic variability and BP response or metabolic adverse events to two commonly prescribed drug classes: β-blockers and thiazide diuretics222, 223. These iPSC cell lines were generated from peripheral blood mononuclear cells, assayed for pluripotency markers such as SSEA4, and karyotyped to exclude chromosomal aberrations.


a) Generation of iPSC from PBMC and Culture (Step 1, Flow chart, Figure 4-2)

iPSC were generated in Dr. Terada’s lab from PEAR participants using the Sendai virus vector SeVdp(KOSM)302L to deliver four reprogramming factors (OCT4, Sox2, Klf4, and c-Myc) to peripheral blood mononuclear cells at a multiplicity of infection (MOI) of 2, as described previously224. After infection with Sendai virus for 2 hours, cells were plated at a density of 2 x 10^5 per well in a 6-well plate onto mitotically inactivated mouse embryonic fibroblasts, which served as a supportive cell type for the generation and maintenance of iPSC. On the day of virus infection and for the following two days, cells were cultured in RPMI 1640 medium. On day 3, the medium was switched to Primate ES Medium (ReproCELL, #RCHEMD001), which was changed every other day. iPSC clones were isolated ~2-3 weeks after infection with Sendai virus, transferred to vitronectin-coated plates, and maintained in Essential 8 Medium (Stem Cell Technologies, #05940). Real time PCR was used to compare relative mRNA expression of the pluripotency-specific nuclear genes OCT4 and Nanog with that in human embryonic stem cells. In addition, immunofluorescence analysis was performed to confirm expression of the pluripotency-specific cell surface marker SSEA4. The iPSC were also karyotyped to detect any chromosomal aberrations.

b) Differentiation of Unedited iPSC into Monocytes

Our experimental plan was to generate edited cell lines and then differentiate them into monocytes. However, we differentiated unedited cell lines in parallel with the gene editing steps to ensure stable production of iPSC-derived monocytes (iPSC-Mo) that express CX3CR1. The monocyte differentiation protocol is therefore described first, before the gene editing experiments.


The first attempt at differentiation was performed using PF0380 and PF0052 (unedited iPSC, heterozygous for the SNP). The second attempt was performed using six cell lines in total: two unedited heterozygous cell lines (PF0052, PF0380) and four unedited wild type cell lines (PF0623, PF0513, PF0204, PF028). We refer to these as the first and second attempts at differentiation, respectively.

The differentiation method used here relied on the formation of embryoid bodies (EBs), three-dimensional aggregates that arise spontaneously from iPSC and can be differentiated into derivatives of the ectoderm, mesoderm, and endoderm germ layers depending on the culture conditions used225. The EBs were induced to differentiate into monocytes by the addition of macrophage colony-stimulating factor (M-CSF) and interleukin-3 (IL-3). Monocytes are detectable in the media 4 to 6 weeks after EB formation and attachment. Because monocytes are non-adherent, they can be harvested from the culture supernatant every week upon media change and can be collected for several months.

iPSC (~1.2 x 10^5) grown on Matrigel in a 6-well plate were expanded and used for EB formation. Immediately before processing the cells for EB formation, the media was removed and replaced with 2 ml of fresh mTeSR1 media in each well. The 6-well plate was then placed on a grid to allow even cutting of the iPSC colonies, and each well was scored using a sterile 18G needle on a 1 ml syringe. The cut colonies were lifted using a cell scraper, transferred into a 6-well non-adherent plate containing 4 ml of pre-warmed media (mTeSR1 plus ROCK inhibitor to prevent cell death), and cultured for 4 days until EBs formed, with a half media change on day 2. After 4 days, clumps of EBs had formed, and ~10 EBs were transferred into one well of a 6-well plate. Culture media was then added, consisting of serum-free X-VIVO15 (Lonza) supplemented with 100 ng/ml M-CSF (PeproTech), 25 ng/ml IL-3 (PeproTech), 2 mM Glutamax (Gibco), 100 U/ml penicillin (Cellgro), 100 μg/ml streptomycin (Cellgro), and 0.055 mM β-mercaptoethanol (Gibco). Two-thirds of the media was changed every five days. Non-adherent monocytes started to appear in the culture media approximately 4-6 weeks later. Monocytes were collected weekly at media change and used for experiments. iPSC-Mo were assessed under the microscope for morphology and were evaluated by flow cytometry for surface expression of CD14 (a monocyte-specific marker) and CX3CR1, by gene expression of CX3CR1 and MAF, and by chemotaxis assay.

c) Gene Editing (Step 2, Flow chart, Figure 4-2)

For gene editing of CX3CR1, we used two iPSC cell lines from donors who were heterozygous for rs3732378, as discussed above (Source of iPSC Donors).

We chose to utilize the homology-directed repair (HDR) mechanism to allow precise editing and selection of the edited allele through knock-in of an antibiotic selection cassette226.

Initially, we planned to use one guide RNA (gRNA1) to direct Cas9 to cut either allele without specificity. A gRNA is a synthetic RNA of ∼20 nucleotides that binds the target DNA by complementarity and recruits the Cas9 nuclease to produce a double-strand break (DSB). However, we observed that the designed gRNA preferentially targeted one allele because of a common missense SNP (rs11715522) at amino acid position 8. We therefore designed another gRNA (gRNA2) to direct Cas9 to the other, untargeted allele.


c-i) Design of gRNA and plasmid vector

We used the plasmid pCRISPR-CG02 (GeneCopoeia), containing Cas9 and a guide RNA targeting exon 1 of CX3CR1, as the targeting plasmid. A second plasmid, pDonor-D01 (GeneCopoeia), was used as the donor vector for homology-directed repair. This plasmid contains a GFP and puromycin cassette flanked by CX3CR1 sequences upstream and downstream of the guide RNA target site (homology arms). In the event that only one allele was targeted by homology-directed repair, a second donor plasmid, pDonor-D05 (GeneCopoeia), containing a neomycin resistance cassette flanked by homology arms to CX3CR1, was used. This is illustrated schematically in Figure 4-3.

c-ii) Nucleofection

We used Nucleofection (Lonza), a modified electroporation method that relies on using electric pulses to deliver the plasmids (containing the Cas9/gRNA and the donor sequence) and enhances the targeting efficiency of stem cells224.

For each transfection, we used a total of 8 x 10^5 cells. Before transfection, we treated the cells with Accutase (1 ml per well of a 6-well plate) for 5 minutes at 37°C. After dissociating the cells into single cells by gentle pipetting 4-6 times, the Accutase was diluted by adding medium (DMEM:F12). Cells were collected into a centrifuge tube and spun at 800 rpm for 3 minutes, and the supernatant was aspirated to leave a cell pellet. The Human Stem Cell Nucleofector Solution I (Lonza) was prepared separately (82 μl Nucleofector solution + 18 μl supplement) and mixed with up to a total of 4 μg DNA (targeting plasmid containing Cas9/gRNA and donor plasmid containing the antibiotic selection cassette). The Nucleofector solution with the DNA was used to resuspend the cell pellet, mixed once, and added to the cuvette supplied by the manufacturer. The cuvette was inserted into the electroporation machine, and the program was set to the conditions specified for stem cells (B-16). Following transfection, we used the transfer pipette supplied by the manufacturer to transfer the transfected cells, together with 500 μl of pre-warmed media (mTeSR1 + 10 μM ROCK inhibitor), into a well of a 24-well culture plate pre-coated with Matrigel.

c-iii) Selection of edited clones

We began treating the transfected cells with puromycin on day 3 (Figure 4-4), at a concentration of 0.6 μg/ml, to kill cells that had not acquired the puromycin cassette and thereby select for cells that had successfully integrated the knock-in insert (Puro/GFP)226. We treated the cells for approximately 14-21 days. Colonies that arose from single cells were manually picked under the microscope using a sterile syringe, and each colony was transferred into a well of a Matrigel-coated 24-well plate with daily media (mTeSR1) change. If we observed mosaics consisting of a mixture of unedited and genetically edited cells, indicated by an undetermined nucleotide sequence on Sanger sequencing, we dissociated the iPSC colonies into single cells using Accutase, repeated puromycin treatment for another 2-3 weeks, and repeated colony picking as described above. The timeline of cell transfection and colony picking is shown in Figure 4-4.

c-iv) Collecting genomic DNA from edited iPSC

Picked colonies were allowed to grow for ~3-7 days, then passaged using the non-enzymatic dissociation reagent ReLeSR (Stem Cell Tech.) and transferred to separate wells of a 12-well plate. Cells were allowed to grow, with daily media (mTeSR1) changes, until they were confluent (~1-1.5 x 10^6). iPSC were then passaged using ReLeSR or EDTA (0.5 mM): ~10% of the cells were transferred into a well of a 6-well plate for expansion with daily media change, and the remaining 90% were pelleted in PBS for genomic DNA extraction. The DNeasy Blood and Tissue kit (Qiagen) was used for genomic DNA extraction according to the manufacturer’s instructions.

c-v) Screening strategy for the edited allele

Because CX3CR1 was not detected by Western blot in unedited iPSC or iPSC-derived monocytes, we could not rely on protein screening methods to confirm knock-out of the allele(s). Instead, we relied on polymerase chain reaction (PCR) to detect gene editing events.

The most straightforward way to discern which allele was edited (the variant allele or the wild type allele) would be to PCR amplify and sequence the region between exon 1 (where the GFP/Puro insert integrates) and the SNP of interest (rs3732378). However, it was not feasible to amplify this region from genomic DNA because of its large size.

Therefore, our strategy to discern which allele was edited was a two-step approach (Figure 4-5). First, we performed Sanger sequencing of the PCR amplicon of the unedited allele at the gRNA binding site using primers 1 and 2. This step was used to determine whether the unedited allele was wild type or variant for rs11715522, the site where the gRNA preferentially binds. Second, we planned to sequence the CX3CR1 transcript from iPSC-differentiated monocytes (iPSC-Mo) to determine the genotype of each individual at SNP1 (rs11715522, at the gRNA binding site) and SNP2 (rs3732378), establish whether the two SNPs are inherited together (same haplotype), and thus determine whether the edited allele was wild type or variant for rs3732378.
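The logic of this two-step strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the haplotype phase in the example are hypothetical (the phase would have to come from transcript sequencing), the donor is assumed to be heterozygous at rs11715522, and only the two rs11715522 codons named in this chapter (TTT and TTG) are handled.

```python
from typing import Dict

# The two codons at rs11715522 (amino acid position 8) reported in this chapter.
RS11715522_CODONS = ("TTT", "TTG")


def infer_edited_allele(unedited_codon: str, phase: Dict[str, str]) -> str:
    """Return the rs3732378 allele carried by the edited chromosome.

    unedited_codon: codon read at rs11715522 by Sanger sequencing of the
        unedited allele (step 1).
    phase: maps each rs11715522 codon to the rs3732378 allele on the same
        haplotype, as would be learned from transcript sequencing (step 2).
    """
    if unedited_codon not in RS11715522_CODONS:
        # For example 'TTN', an undetermined base call.
        raise ValueError(f"ambiguous or unexpected codon: {unedited_codon}")
    # In a heterozygous donor, the edited chromosome carries the other codon.
    edited_codon = "TTT" if unedited_codon == "TTG" else "TTG"
    return phase[edited_codon]


# Hypothetical phase, for illustration only (not a result of this study).
example_phase = {"TTT": "T280", "TTG": "M280"}
print(infer_edited_allele("TTG", example_phase))  # prints T280
```

With this hypothetical phase, an unedited allele reading TTG implies that the edited chromosome carried TTT and therefore the T280 (wild type) allele; an undetermined call is rejected rather than guessed.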


1) Polymerase Chain Reaction (PCR) screen and Sanger sequencing

PCR was used to verify gene editing as well as the insertion of the GFP/Puro cassette, as illustrated in Figure 4-5. PCR was performed to amplify the unedited allele using primers specific to the regions flanking the gRNA site (primers 1 and 2: forward green and reverse green primers; Figures 4-3 and 4-5). Additionally, primers were designed to amplify the Puro/GFP cassette (primers 1 and 3: forward green and reverse teal primers; Figures 4-3 and 4-5). To amplify the unedited allele, we used a PCR reaction mix consisting of 2 μl genomic DNA (40 ng) from the growing iPSC colonies, 0.2 μl of a 10 μM mix of primers 1 and 2 (forward green and reverse green), and 22.5 μl of PCR Platinum Blue master mix, to a total of 25 μl. PCR conditions were an initial denaturation step at 94°C for 3 minutes, followed by 40 cycles of denaturation at 94°C for 30 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 2.5 minutes, with a final extension step at 72°C for 5 minutes. The same protocol was used to amplify the inserted cassette, except that primers 1 and 3 (forward green and reverse teal) were used. The PCR products were run on a 1.5% agarose gel to visualize the specific band sizes. Detecting a band of the expected size (548 bp) using primers 1 and 2 together with a band of the expected size (520 bp) using primers 1 and 3 indicated that insertion of the GFP/Puro cassette took place on one allele.
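The band-reading rule above can be written as a small decision function. The sketch below is illustrative only: the function names and the 10 bp sizing tolerance are hypothetical, and the labels for patterns other than "both bands present" are assumptions, since the text defines only the single-allele case.

```python
from typing import Iterable

UNEDITED_BAND_BP = 548  # primers 1 and 2, flanking the gRNA site
INSERT_BAND_BP = 520    # primers 1 and 3, reaching into the GFP/Puro cassette


def has_band(observed_bp: Iterable[int], expected_bp: int, tolerance_bp: int = 10) -> bool:
    """True if any observed band lies within the (assumed) sizing tolerance."""
    return any(abs(size - expected_bp) <= tolerance_bp for size in observed_bp)


def classify_colony(bands_p1_p2: Iterable[int], bands_p1_p3: Iterable[int]) -> str:
    """Interpret the two PCR reactions run on one colony."""
    unedited = has_band(bands_p1_p2, UNEDITED_BAND_BP)
    insert = has_band(bands_p1_p3, INSERT_BAND_BP)
    if unedited and insert:
        return "insertion on one allele"            # the case defined in the text
    if insert:
        return "insertion, no unedited band"        # assumed label
    if unedited:
        return "no insertion detected"              # assumed label
    return "no amplification"                       # assumed label


print(classify_colony([550], [518]))  # prints: insertion on one allele
```

A colony classified as "insertion on one allele" would then go on to Sanger sequencing of the primer 1 and 2 product.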

The PCR products amplified from unedited alleles using primers 1 and 2 were sequenced by Sanger sequencing to determine the nucleotide sequence of the unedited allele and detect the formation of indels. Chromatograms were visualized using 4Peaks software v1.8 (http://nucleobytes.com/4peaks/).


2) PCR to amplify transcript from iPSC differentiated monocytes (iPSC-Mo)

To amplify the CX3CR1 transcript from unedited iPSC-Mo, we used a PCR reaction mix consisting of 10 μl cDNA (50 ng) from iPSC monocytes derived from PF0052 or PF0380, 2.5 μl of a 50 μM mix of different combinations of forward and reverse primers (Table 1), 2.5 μl of 10X PCR buffer, 0.5 μl of 10 mM dNTPs, and 0.125 μl of Taq polymerase (5 Prime), to a total of 25 μl. PCR conditions were an initial denaturation step at 95°C for 5 minutes, followed by 40 cycles of denaturation at 94°C for 30 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 45 seconds, with a final extension step at 72°C for 10 minutes.

d) THP-1 culture

THP-1 is a myelomonocytic cell line commonly used for monocyte studies227. THP-1 cells were used to practice monocyte handling and as a control cell line for iPSC-derived monocytes, against which CX3CR1 gene expression in iPSC-Mo was compared. The THP-1 cell line was a gift from the lab of Dr. Mark Wallet and was cultured in RPMI-1640 media with 10% FBS, 25 mM HEPES, 1 mM sodium pyruvate, 2 mM glutamine, 0.05 mM 2-mercaptoethanol, penicillin (50 U/ml), and streptomycin (50 U/ml), at a density of 1 x 10^6 cells/ml, at 37°C in a humidified 5% CO2 atmosphere.

e) Experiments for characterization and evaluation of monocytes

Flow cytometry of surface markers of unedited iPSC differentiated monocytes (iPSC-Mo)

Flow cytometry was performed on unedited iPSC-Mo to confirm successful differentiation of iPSC into monocytes by evaluating expression of CD14 (a monocyte cell surface marker). Additionally, the expression of CX3CR1 on unedited iPSC-Mo was evaluated. iPSC-Mo (0.5-1 x 10^6) were washed with phosphate-buffered saline (PBS) and stained in FACS buffer, consisting of 1% fetal calf serum (Hyclone) and 0.01% sodium azide, with either the marker-specific antibody or an isotype-matched control (to control for background fluorescence) on ice for 30 minutes. We used CD14-phycoerythrin (PE) and CX3CR1-PE antibodies. iPSC-Mo were gated according to their side vs. forward scatter (SSC/FSC) dot-plot profiles and positive staining for CD14. Fluorescence was measured on 10,000 events using the BD Accuri C6 flow cytometer. Data were analyzed on FSC/SSC dot plots, and fluorescence intensity histograms were presented showing antibody staining in red relative to the isotype-matched control in black. The percentage of cells expressing CD14 was determined by gating the populations against the background fluorescence of the isotype control. The results were expressed as the percentage of cells positive for CD14.
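As a rough illustration of isotype-based gating, the sketch below places a gate at a percentile of the isotype-control fluorescence and reports the percentage of stained events above it. The 99th-percentile threshold, the function names, and the toy intensity values are all assumptions made for this example; the gate actually used in the cytometer software is not specified in the text.

```python
from math import ceil
from typing import Sequence


def isotype_gate(isotype_values: Sequence[float], percentile: float = 99.0) -> float:
    """Nearest-rank percentile of the isotype control, used as the positive gate."""
    ordered = sorted(isotype_values)
    rank = ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def percent_positive(stained_values: Sequence[float], gate: float) -> float:
    """Percentage of antibody-stained events brighter than the gate."""
    positives = sum(1 for value in stained_values if value > gate)
    return 100.0 * positives / len(stained_values)


# Toy data: 100 isotype events with intensities 1..100, and 100 stained events.
isotype = list(range(1, 101))
stained = [50] * 5 + [500] * 95
gate = isotype_gate(isotype)
print(gate, percent_positive(stained, gate))  # prints: 99 95.0
```

In this toy example, 95 of the 100 stained events exceed the gate, so the population would be reported as 95% CD14-positive.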

Real time PCR to evaluate the gene expression of CX3CR1, CD14 and MAF

We evaluated the gene expression of CX3CR1, CD14, and cMAF in iPSC-Mo to ensure successful differentiation and stable expression between harvests. MAF is important for myeloid cell development and encodes cMAF, which induces differentiation to monocytes.228 We evaluated the expression of CX3CR1 in iPSC-Mo relative to THP-1, which stably expresses CX3CR1. Total RNA was extracted with the RNAqueous total RNA isolation kit (Ambion) from 2 x 10^6 iPSC-Mo or THP-1 cells. For each sample, RNA was extracted from untreated iPSC-Mo or THP-1 cells and treated with DNase I using the Turbo DNase free kit (Ambion) to remove any contaminating DNA. cDNA was synthesized from the extracted RNA with the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) according to the manufacturer’s instructions. Quantitative real time PCR was performed using a StepOnePlus real-time PCR machine (Applied Biosystems). Relative quantification of CX3CR1 mRNA expression, normalized to the housekeeping gene GAPDH, was performed using the ddCt method225.

GAPDH was used as the housekeeping gene because it showed the most stable expression after treatment with stimulants such as MCP1 or FKN, compared with the other housekeeping genes tested, including 18S and β2-microglobulin. For THP-1 and iPSC-derived monocytes, we compared the Ct (cycle threshold) values of CX3CR1 and GAPDH using 2.5 ng, 5 ng, and 25 ng of cDNA and chose 25 ng because this amount produced reasonable Ct values for GAPDH, CX3CR1, and MAF. CX3CR1 expression was compared between THP-1 (control) and unedited iPSC-Mo (PF0380 and PF0052).

Similarly, relative quantification of MAF mRNA expression, normalized to GAPDH, was performed using the ddCt method. We compared the expression of CX3CR1 and MAF in unedited iPSC-derived monocytes (PF0623) among different harvests. All samples were run in triplicate.
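For readers unfamiliar with the ddCt calculation, a minimal sketch follows. It assumes roughly 100% amplification efficiency for both target and reference (the usual assumption behind the 2^-ddCt formula), averages technical triplicates before subtraction, and uses made-up Ct values; the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_cts: Sequence[float], reference_cts: Sequence[float]) -> float:
    """dCt: mean target Ct minus mean housekeeping (e.g. GAPDH) Ct."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(sample_target: Sequence[float], sample_reference: Sequence[float],
                calibrator_target: Sequence[float], calibrator_reference: Sequence[float]) -> float:
    """Relative expression of the sample versus the calibrator, as 2^-ddCt."""
    ddct = (delta_ct(sample_target, sample_reference)
            - delta_ct(calibrator_target, calibrator_reference))
    return 2 ** (-ddct)


# Made-up triplicate Ct values: iPSC-Mo as the sample, THP-1 as the calibrator.
ipsc_mo_cx3cr1, ipsc_mo_gapdh = [27.2, 27.0, 26.8], [18.1, 18.0, 17.9]
thp1_cx3cr1, thp1_gapdh = [26.1, 26.0, 25.9], [18.0, 18.0, 18.0]
print(fold_change(ipsc_mo_cx3cr1, ipsc_mo_gapdh, thp1_cx3cr1, thp1_gapdh))
```

With these illustrative numbers the sample dCt is about 9 and the calibrator dCt about 8, so ddCt is about 1 and the sample expresses roughly half as much target as the calibrator.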

Western blot to evaluate pERK and pAKT

We sought to evaluate the protein expression of CX3CR1 in unedited iPSC-derived monocytes. Additionally, we sought to evaluate the pERK and pAKT signaling pathways after stimulation with fractalkine (FKN), the CX3CR1 ligand. ERK and AKT are important pathways involved in the adhesion and survival of monocytes229, 230. In preliminary work, we observed that THP-1 cells treated with FKN demonstrated increased pAKT and pERK. We expected that similar effects would be observed in iPSC-derived monocytes treated with FKN.

For these experiments, we treated iPSC-derived monocytes with FKN or TNFα (as a positive control) to evaluate the ERK and AKT pathways. iPSC-derived monocytes were collected and transferred into 6-well tissue culture plates at 1 x 10^6 cells/well. We treated iPSC monocytes with TNFα at 100 ng/ul or FKN at 100 nM for 5, 10, and 15 minutes. The concentrations were selected based on preliminary experiments in THP-1 cells showing that FKN at 100 nM and TNFα at 100 ng/ul activated ERK and AKT. Following stimulation, the plates were placed directly on ice and cells were washed twice with ice-cold PBS containing sodium fluoride, protease inhibitor, and Na orthovanadate to stop the ligand stimulation. Cells were collected into 15 ml tubes and spun down at 2000 rpm to form a cell pellet. The pellet was resuspended in 100 μl of lysis buffer (RIPA buffer, sodium fluoride, protease inhibitor, 0.1% SDS, 100 μg/mL PMSF, 20 μg/mL aprotinin, 1 mM Na orthovanadate, pH 7.4), and the lysate was sonicated for 5 sec to shear DNA and reduce viscosity. Extracted protein was quantified using a BSA assay. Samples were mixed with 4X Laemmli sample buffer containing 12% BME, boiled for 10 min, and vortexed. We subjected 30-40 μl of each sample, containing 40 μg of protein, to SDS-PAGE on 10% polyacrylamide gels for Western blot analysis. Proteins were transferred onto PVDF membrane (Millipore, Boston, MA), and membranes were blocked using 1% BSA in TBS-T (20 mM Tris-HCl, 150 mM NaCl, 0.1% Tween 20, pH 7.4) at room temperature on an orbital shaker for 1 hr. Membranes were incubated overnight at 4°C with the primary antibodies: anti-Phos-AKT (1:200 dilution), anti-Phos-ERK (1:500 dilution), anti-CX3CR1 (1:1000 dilution), anti-AKT (1:1000 dilution), anti-ERK (1:1000 dilution), or anti-GAPDH (housekeeping; 1:1000 dilution) (Cell Signaling Technology, Beverly, MA). After overnight incubation, the membranes were washed 3 x 5 min in TBS-T and incubated with secondary antibodies, either goat anti-rabbit IgG/HRP (for Phos-ERK, GAPDH, ERK, and CX3CR1; Cell Signaling Technology, Beverly, MA) or sheep anti-mouse IgG/HRP (for Phos-AKT; Invitrogen, Carlsbad, CA), at a 1:5,000 dilution in TBS-T at room temperature for 1 hr. Membranes were washed vigorously 3 x 5 min in TBS-T, then incubated with Thermo Scientific SuperSignal West Pico (Rockford, IL) chemiluminescence substrate (product #340800) for 1 min. Membranes were imaged using Carestream X-OMAT LS film (Rochester, New York; part #161 1342). To strip primary antibodies, membranes were immersed in stripping buffer (25 mM Glycine-HCl, pH 2.0, 1% SDS) for 10 min at room temperature, followed by washing 3 x 10 min in TBS-T. Membranes were then blocked again and incubated with the desired primary antibody listed above.
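The gel-loading step in this protocol (40 μg of protein per lane in a 30-40 μl sample containing 4X Laemmli buffer) implies a simple volume calculation, sketched below. The function name and the example lysate concentration of 1.6 μg/μl are hypothetical; the only assumption about the chemistry is that a 4X buffer is diluted to 1X, that is, one part buffer to three parts lysate.

```python
from typing import Tuple


def loading_volumes(protein_ug: float, lysate_ug_per_ul: float,
                    buffer_strength_x: float = 4.0) -> Tuple[float, float, float]:
    """Return (lysate ul, sample buffer ul, total ul) for one gel lane."""
    lysate_ul = protein_ug / lysate_ug_per_ul
    # Diluting an N-fold buffer to 1X needs 1 part buffer per (N - 1) parts lysate.
    buffer_ul = lysate_ul / (buffer_strength_x - 1.0)
    return lysate_ul, buffer_ul, lysate_ul + buffer_ul


# Hypothetical lysate concentration of 1.6 ug/ul, loading 40 ug per lane.
lysate, sample_buffer, total = loading_volumes(40.0, 1.6)
print(round(lysate, 1), round(sample_buffer, 1), round(total, 1))  # 25.0 8.3 33.3
```

Under these assumptions, lysates between roughly 1.3 and 1.8 μg/μl give total volumes within the 30-40 μl range stated in the protocol.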

Chemotaxis

iPSC-Mo chemotaxis was evaluated using a transwell migration assay with 5-µm pore size inserts (Cell Biolabs)231. The cell suspension (2 x 10^5 cells) was added to the upper chamber, onto the polycarbonate membrane placed above a well containing serum-free X-VIVO15 (Lonza) alone or X-VIVO15 containing FKN (fractalkine; R&D Systems), and incubated for 3 hours. Two concentrations of FKN, 5 ng/ml and 10 ng/ml, were used to detect a migration dose response. Migrated cells attached to the bottom of the insert were lysed and stained with CyQuant dye. After incubation for 30 min at room temperature in the dark, the fluorescence signal of migrated cells was measured using an EnVision Multilabel Plate Reader (Perkin Elmer).
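One common way to summarize such fluorescence read-outs is a chemotactic index: the signal from wells containing chemoattractant divided by the signal from medium-only wells. The sketch below is an assumption about how the data could be summarized, not the analysis reported in this chapter; the function name, the optional blank subtraction, and the fluorescence values are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def chemotactic_index(fkn_rfu: Sequence[float], control_rfu: Sequence[float],
                      blank_rfu: float = 0.0) -> float:
    """Mean blank-corrected fluorescence with FKN divided by medium-only control."""
    return (mean(fkn_rfu) - blank_rfu) / (mean(control_rfu) - blank_rfu)


# Made-up relative fluorescence units for replicate wells.
medium_only = [1000, 1100, 900]
fkn_5ng_ml = [1500, 1500, 1500]
fkn_10ng_ml = [2000, 2000, 2000]
print(chemotactic_index(fkn_5ng_ml, medium_only))   # prints 1.5
print(chemotactic_index(fkn_10ng_ml, medium_only))  # prints 2.0
```

An index that rises from the 5 ng/ml to the 10 ng/ml wells, as in these invented numbers, is what a migration dose response would look like.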

Results

Successful Generation of PEAR iPSC

iPSC were successfully generated for 17 PEAR patients. The iPSC were shown to express pluripotency-specific nuclear genes (OCT4 and Nanog) at levels comparable to those in embryonic stem cells.224 Additionally, immunofluorescence assay confirmed the expression of SSEA4, a pluripotency-specific cell surface marker.224 Two PEAR iPSC lines (PF0052, PF0380) heterozygous for rs3732378 were used for gene editing and differentiation into monocytes (iPSC-Mo). Four iPSC lines from donors (PF0623, PF028, PF0204, PF0513) who were wild type for rs3732378 were used for generation of monocytes.

Gene Editing Results

Using the protocol described in the gene editing section, we were able to edit iPSC from two PEAR patients, PF0052 and PF0380, and create cell lines with one allele knocked out. The editing event was detected through the PCR screen by amplifying the inserted cassette (GFP/Puro). We transfected PF0052 using gRNA1 and screened 7 colonies by PCR. All 7 colonies amplified the region containing the insert, as indicated by the presence of the 520 bp band specific to the insertion (Figure 4-6, gel image B). We noticed that one allele was preferentially targeted by the gRNA for insertion of the GFP/Puro cassette, since the nucleotide sequence of the unedited allele at the gRNA binding site was always TTG on Sanger sequencing (Figure 4-6, C). This preferential binding of the gRNA to one allele was likely due to the presence of a missense SNP at amino acid position 8 in exon 1 (rs11715522; TTT>TTG). Because of the preferential targeting, another gRNA (gRNA2) was designed to target the other allele. We transfected PF0052 iPSC using gRNA2 and selected 12 iPSC colonies, ten of which were screened by PCR to detect the presence of the amplified insert. Sanger sequencing of the unedited allele of these 10 colonies revealed an undetermined nucleotide sequence, shown as TTN (N is an undetermined nucleotide), likely due to the presence of mosaics (colonies of mixed genotype arising from mixed cells). This mixed genotype problem was resolved by dissociating the iPSC colonies into single cells using Accutase, followed by puromycin treatment for another 2-3 weeks. We repeated the PCR screen for eight selected colonies and found that seven colonies had the insert, as shown in Figure 4-7 (gel B). Sanger sequencing of the unedited allele for one of the colonies (Figure 4-7, C) showed the nucleotide sequence at the gRNA binding site as TTT, suggesting that the other allele was edited. In an attempt to obtain a double knock-out of PF0052, both gRNAs (gRNA1 and gRNA2) were used to transfect iPSC in a single transfection. The PCR screen and Sanger sequencing of 12 colonies suggested that we obtained colonies with a single-allele edit but no double knock-out. For PF0380, we transfected iPSC with gRNA1 and screened 5 colonies, one of which had a single allele edited. We also transfected PF0380 with gRNA2 and screened 6 colonies. Although we observed a PCR band for the insert, the sequence of the colonies could not be determined, likely due to mosaics. For both PF0052 and PF0380, we did not obtain colonies with a double knock-out. Several attempts at PCR using 5 different sets of CX3CR1 primers (Table 1) to amplify the CX3CR1 transcript from iPSC monocytes derived from PF0380 and PF0052 were unsuccessful.

We were not successful at obtaining the desired edited cell lines, and therefore all the results reported in the next section are from unedited iPSC differentiated monocytes.

Monocytes Production from Unedited iPSC Confirmed by CD14

Our first attempt at differentiation into monocytes was performed using iPSC from two donors (PF0052 and PF0380) who are heterozygous for the CX3CR1 rs3732378. Monocytes started to appear in the media ~4-6 weeks post EB attachment onto the 6-well plate. We obtained a homogeneous population of iPSC monocytes that were >90% CD14-positive. Production of monocytes continued over ~5 months, with growth factors (M-CSF and IL-3) replaced at each media change every 5 days. Approximately 1-1.5 × 10^6 monocytes were collected weekly from each donor upon media change. Since iPSC monocytes have the tendency to adhere to the culture plate and differentiate into macrophages, monocytes were used to perform experiments on the same day of harvest and could not be cryopreserved for use at later times.

The second attempt at differentiation was performed using six donors: two heterozygous (PF0052, PF0380) and four wild type for rs3732378. The goal of this differentiation was to compare wild type and variant iPSC after differentiation into monocytes. However, monocyte production was confirmed for only one of the six lines, which was >90% CD14-positive as shown by flow cytometry (Figure 4-8, B). CX3CR1 surface expression was found to be low when assessed by flow cytometry (Figure 4-8, B).

CX3CR1 Expression in Unedited iPSC-Mo (PF0380 and PF0052)

We performed real-time PCR using 25 ng of cDNA to evaluate CX3CR1 expression in iPSC-Mo. CX3CR1 expression in iPSC-Mo from PF0052 and PF0380 was lower than in THP1 cells, which were used as a control (Figure 4-9). Additionally, the difficulty in amplifying the CX3CR1 transcript suggested low expression of CX3CR1 in iPSC-Mo from PF0052 and PF0380.
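Relative expression of this kind is commonly calculated with the comparative Ct (2^-ΔΔCt) method. The text does not state which calculation was used, so the sketch below is an assumption rather than a description of the analysis performed here. The function name and the Ct values in the usage note are hypothetical; GAPDH is taken as the reference gene and THP1 as the control sample, as in Figure 4-9.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Fold change of a target gene in a sample relative to a control sample.

    Assumes the comparative Ct method: each target Ct is first normalized to
    the reference gene (e.g. GAPDH) within the same sample, and the sample is
    then compared with the control (e.g. THP1).
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    # A higher Ct means less template, hence the negative exponent.
    return 2.0 ** (-delta_delta_ct)
```

With hypothetical Ct values, `relative_expression(30.0, 18.0, 28.0, 18.0)` gives a fold change below 1, which would correspond to lower CX3CR1 expression in the sample than in the control.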


Comparing Expression of CX3CR1 and MAF Between Harvests in Unedited iPSC-Mo (PF0623)

We assessed MAF and CX3CR1 expression in unedited iPSC-Mo from PF0623 that were produced in our second differentiation attempt. We observed variable expression between 6 harvests collected over 6 weeks (Figure 4-10).

Western Blot to Detect CX3CR1, pERK and pAKT

Several attempts to detect CX3CR1 and activation of the ERK and AKT signaling pathways upon stimulation of iPSC monocytes with FKN were unsuccessful. We used iPSC-Mo generated from PF0052 and PF0380. This may suggest low levels of CX3CR1 in iPSC-Mo.

Chemotaxis

We evaluated the migratory capacity of iPSC monocytes (PF0623) toward FKN using two concentrations of FKN, 5 ng/ml and 10 ng/ml. We did not observe a dose-dependent increase in the migratory capacity of iPSC monocytes (Figure 4-11).

Discussion

In these studies, we set out to study CX3CR1 polymorphism in iPSC derived cell lines and to shed some light on the mechanistic underpinnings of the genetic association with HTN and RHTN. We originally planned to use differentiated monocytes from genetically edited, isogenic cell lines, derived from hypertensive patients. Several challenges in the project hindered accomplishing our original goals of using the iPSC derived monocytes as a platform to functionally validate an identified RHTN pharmacogenetic variant. We think that reporting the challenges may give researchers insights on the preparatory steps needed before embarking on gene editing and iPSC studies.


First, we observed variability in CX3CR1 expression among the different harvests of monocytes derived from the same cell line. While some variability is expected between different cell lines, we expected minimal to no variability within the same cell line. As such, it is nearly impossible to distinguish inter-individual variation related to genetic variation from inter-batch or inter-harvest variation, especially if genetic variations translate into subtle phenotypic changes in an in vitro cell model. We think that the variability observed in expression may be attributed to the maturity stage of the differentiated monocytes, since we also observed variability in MAF, a marker of monocyte differentiation. Another possible explanation for the variability may be the heterogeneity of the monocyte population. In general, monocytes are heterogeneous, and expression of various receptors depends on both the monocyte subset and the microenvironment.232

Specifically for human blood monocytes, three subsets of monocytes exist according to recent immunological reports.232, 233 This classification relies on the surface expression of CD14, CD16 and other receptors such as CCR2. For example, classical monocytes (CD14++CD16-), which constitute the majority of blood monocytes, express high levels of CCR2 and low levels of CX3CR1. Intermediate monocytes (CD14++CD16+) are another subset of monocytes that contribute remarkably to atherosclerosis. The third subset is the non-classical monocytes (CD14+CD16++), which express high levels of CX3CR1 and low levels of CCR2. During human monocyte differentiation, classical monocytes leave the bone marrow and subsequently differentiate into intermediate monocytes, followed by differentiation into non-classical monocytes in the peripheral blood circulation. We think that the variable CX3CR1 expression could well be due to heterogeneous populations or variable differentiation fate of monocytes. Perhaps the CD16-expressing monocytes (intermediate and non-classical monocytes), which express high levels of CX3CR1 and are known to be pro-inflammatory234, would be the appropriate subset of monocytes for studying CX3CR1, as we focus on the link between inflammation driven by the CX3CR1-FKN axis within the monocyte-vasculature interaction.

Existing protocols for monocyte/macrophage differentiation phenotyped monocyte markers at one time point225, 235, and we are not aware of studies that performed longitudinal immunophenotyping of iPSC monocytes to track their differentiation fate.

Originally, we were interested in studying the effect of the CX3CR1 variant in monocytes differentiated from isogenic iPSC cell lines. However, given the variability of CX3CR1 expression driven by factors unrelated to genetic variability, we felt that continuing to edit the iPSC cell lines was an impractical goal without ascertaining a final and standardized immunophenotype for the differentiated monocytes.

Additionally, we encountered some challenges with respect to CX3CR1 gene editing. First, we expected to be able to create double knock-outs; however, we did not observe any double knock-out from editing of two iPSC cell lines, even when we screened up to 12 colonies in one editing process. Second, editing of one allele was a lengthy process that required multiple rounds of selection and colony screening. In our experiments, editing one allele took over 8 months. We think that this may be attributed to reduced efficiency of CRISPR-Cas9 at the CX3CR1 locus. In general, the efficiency of genome editing is lower in iPSC than in stable immortalized lines, and screening a large number of colonies may be needed to obtain the desired edit.


It has been suggested that genome editing efficiency depends on how accessible the target locus is to the gRNA, and that this accessibility may depend on the DNase I hypersensitivity of the targeted locus236. Of note, CX3CR1 is not expressed in iPSC, and it is not clear whether this lack of expression precluded or reduced the efficiency of gene editing. We therefore suggest that CX3CR1 may not have been amenable to gene editing because of factors related to accessibility of the locus to the gRNA, which in turn affected gene editing efficiency. Other problems encountered included the presence of colonies of mixed genotypes as we selected for edited colonies. This was overcome by dissociating the colonies into single cells, followed by another round of puromycin treatment and screening for colonies that arose from single cells. FACS sorting of GFP+ cells followed by plating the GFP+ cells at a low count, to allow colonies to emerge from single cells, may be an alternative or more efficient means of selecting genetically edited cells, but presents additional technical challenges and time requirements.

Lessons learned from iPSC: Genome editing of iPSC is an attractive and cutting-edge technology; however, it is not without challenges. There are a number of lessons learned from the study presented here. First, a careful evaluation of the gene and gene locus to be edited is critical before embarking on a gene-editing project. Before designing a gRNA for a gene, one must know whether the targeted gene has multiple mRNA isoforms as a result of alternative splicing. This is important because if the gene of interest expresses a splice variant that does not include the targeted exon, the activity of that gene may still be retained even after knocking out the exon. This was not the case in our study, since we opted not to continue the gene editing for reasons related to the variable CX3CR1 expression in unedited monocytes. However, this is certainly an important step to check before starting a gene-editing project.

Second, before starting gene editing, researchers must ensure that both the target gene and other genes related to the studied pathway are expressed in the final differentiated tissue, which should have a stable phenotype and expression profile that does not change between passages or harvests. Herein, we started a lengthy gene editing process before finally concluding that CX3CR1 may not be a practical gene target to study in iPSC-monocytes. Third, genome editing efficiency is much higher in immortalized cell lines236, e.g. HEK293T cells, than in iPSC; therefore, researchers should use these immortalized cell lines to evaluate the targeting activity of the gRNA and the cutting efficiency of the nuclease before embarking on editing iPSC cell lines. If editing of immortalized cell lines is found to be inefficient, then it is highly likely that significant difficulties will be encountered when moving to iPSCs. Additionally, the frequency of indel formation, an indicator of nuclease cutting efficiency, should be evaluated using the available assays, for example T7 endonuclease I (T7EI), which recognizes indels resulting from non-homologous end joining repair in CRISPR-Cas9 editing experiments.236, 237 Further, researchers should be aware of the accessibility of the genome in the regions that will be targeted for gene editing; for example, data on DNase I hypersensitive sites (accessible chromatin) in stem cells can be found in ENCODE, enabling the selection of a gRNA with high targeting efficiency at the genomic locus. Researchers may need to test the efficiency of several gRNAs (3-4 gRNAs) before deciding on the optimum one for cutting at the specific locus.236 Perhaps the most important lesson from this project is that researchers should not start gene editing before carefully evaluating and ensuring a clear, stable and measurable phenotype. Additionally, the phenotype studied should be reproducible over time and over several differentiation attempts. Researchers should repeat the differentiation process from one cell line multiple times and should obtain the same phenotype. Large variability within the same line because of variable differentiation may make it difficult to attribute changes in phenotype to genetic variants or drug treatments, especially for complex phenotypes in which the effects may already be very subtle. More importantly, the gene of interest should be assessed carefully before editing to ensure that it is adequately expressed in the target tissue. In our project, we started at the gene editing level, which should have been undertaken only after many other proof-of-concept experiments in immortalized or non-edited cells had been completed. Had we known that CX3CR1 was not adequately expressed and had variable expression, we would have avoided the challenges related to gene editing and the waste of effort and resources. An important lesson from this project is that one should always start experiments proximally rather than distally, ensuring that all the preparatory steps and validation of approaches are in place before embarking on the critical experiments.

Finally, researchers should set clear endpoints and expectations at each stage of a research project, especially in large projects such as iPSC studies that comprise multiple interrelated stages, each of which may take a considerable amount of time. The ability to identify when an experiment or project is at a dead end, or unlikely to progress as planned, is important for a scientist.


Conclusions

In our iPSC studies, we were not able to validate the CX3CR1-RHTN association because of unforeseen roadblocks in the project. We observed variability in CX3CR1 expression between harvests of monocytes derived from one cell line. This may suggest heterogeneity of monocytes or varying differentiation stages of the monocytes. These preliminary data on variable expression of CX3CR1 call for immunophenotyping studies of different monocyte receptors for an accurate characterization of monocytes. Careful consideration of various aspects of CX3CR1, the gene under study, could have saved us a lengthy process of genome editing before concluding that iPSC-differentiated monocytes are not a suitable platform for studying the CX3CR1 pharmacogenetic variant.


Table 4-1. Sequences of gRNA and primers used

Name                          Sequence
gRNA1                         CCGCAAATTCAACCGTC

Genomic DNA sequencing primers
Primer 1 (forward green)      ataacctccccatccctcac
Primer 2 (reverse green)      ggccaagactctccctctct
Primer 3 (reverse teal)       cacggcgactactgcactta

CX3CR1 transcript primers
Set1                          F1: cttaccaggccgtggactta
                              R1: gaggattcaggcaacaatgg
Set2                          F2: cttaccaggccgtggactta
                              R2: tgagaggattcaggcaacaa
Set3                          F3: taccaggccgtggacttaaa
                              R3: gaggattcaggcaacaatgg
Set4                          F4: taccaggccgtggacttaaa
                              R4: tgagaggattcaggcaacaa
Set5                          F5: aggatgagagaacccctgga
                              R5: gaacttctccccagcaaatg

gRNA1 was designed to target exon 1. Primers 1+2 were used to amplify the genomic region around the nuclease cutting site; thus the presence of a 548 bp band indicated the presence of an unedited allele. Primers 1+3 were used to amplify the insert (GFP/Puro); thus the presence of a 520 bp band indicated the presence of an insert. Five sets of primers were used to amplify the CX3CR1 transcript from iPSC-Mo.


Figure 4-1. Schematic figure illustrates the CX3CR1-Fractalkine (FKN) axis. During inflammation, the release of chemotactic material including FKN leads to recruitment of monocytes that interact with endothelial cells through CX3CR1 expressed on monocytes and FKN expressed on endothelial cells. Monocytes then infiltrate through endothelial cells and can also interact with vascular smooth muscle cells through CX3CR1-FKN leading to the release of inflammatory mediators, vascular smooth muscle cell proliferation and finally vascular dysfunction.


Figure 4-2. Flow chart to illustrate the planned experimental work. iPSC cell lines were generated from peripheral blood mononuclear cells (PBMC) from 17 PEAR patients. Two iPSC that were heterozygous for CX3CR1 rs3732378 were selected to be edited, using CRISPR Cas9-gRNA, and to generate three isogenic cell lines; a wild allele, variant allele, double-knock out cell lines, followed by differentiation into monocytes (iPSC-Mo).


Figure 4-3. The guide RNA in light red font binds to the complementary target sequence (dark red font) in exon one. A plasmid donor with homology arms and puro/GFP cassette integrates in exon one. Green primers (forward and reverse green) amplify genomic region if allele is unedited. Forward green and reverse teal primers amplify puro/GFP insert. (HA-L: left homology arm; HA-R: right homology arm).


Figure 4-4. Timeline of transfecting iPSC colonies, selecting colonies, and screening colonies


Figure 4-5. Schematic figure illustrating the screening strategy for the editing event. Top panel: three scenarios of Puro/GFP integration at SNP1 (rs11715522). Bottom panel: the primers used to amplify the genomic region of the unedited allele (1+2) and the genomic region of the insert (1+3). Detecting a band using primers 1+3 and a band using primers 1+2 indicates that one allele has the insert and the other is unedited (no insert). However, this does not discern which allele was edited (variant or wild type allele with respect to our SNP of interest, SNP2: rs3732378). Detecting a band for the insert using primers 1+3 and no band using primers 1+2 suggests that we obtained a double knock-out, as in case 3. Our strategy for discerning which allele was edited is a two-step approach. First, we sequence the unedited allele using primers 1+2. This determines whether the unedited allele is wild type or variant for SNP1, where the gRNA prefers to bind. Second, we sequence the CX3CR1 transcript to determine the genotypes of the individual at SNP1 (the gRNA target site) and SNP2, and to discern whether both SNPs are inherited together (same haplotype). Case 1: if the nucleotide sequence of the unedited allele with primers 1+2 at SNP1 is TTG, and the individual's haplotype at SNP1 and SNP2 is as shown (the two SNPs, TTG and ATG, are inherited together), we can conclude that the wild type allele A1 at rs3732378 is edited and that this cell line is expressing the variant allele. Case 2: if the nucleotide sequence of the unedited allele with primers 1+2 at SNP1 is TTG, and the individual's haplotype at SNP1 and SNP2 is as shown (the two SNPs are not inherited together), we can conclude that the variant allele A1 at rs3732378 is edited and that this cell line is expressing the wild type allele.
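The decision logic of this screening strategy can be written out as a short sketch. It is an illustration only, not code used in the study: the names `EditStatus`, `classify_colony` and `expressed_snp2_allele` are hypothetical, and the `FAILED_PCR` outcome (no band from either primer pair) is an added assumption that the caption does not discuss.

```python
from enum import Enum


class EditStatus(Enum):
    UNEDITED = "no insert detected"
    SINGLE_ALLELE = "one allele edited"
    DOUBLE_KNOCKOUT = "both alleles edited"
    FAILED_PCR = "no product from either primer pair"  # assumption, not in the caption


def classify_colony(band_1_2: bool, band_1_3: bool) -> EditStatus:
    """Interpret the two PCR products for one colony.

    band_1_2: 548 bp product from primers 1+2 (an unedited allele is present).
    band_1_3: 520 bp product from primers 1+3 (the GFP/Puro insert is present).
    """
    if band_1_3 and band_1_2:
        return EditStatus.SINGLE_ALLELE
    if band_1_3:
        return EditStatus.DOUBLE_KNOCKOUT
    if band_1_2:
        return EditStatus.UNEDITED
    return EditStatus.FAILED_PCR


def expressed_snp2_allele(unedited_snp1_codon: str, phase: dict[str, str]) -> str:
    """Return the SNP2 (rs3732378) allele carried by the unedited chromosome.

    phase maps each SNP1 codon ("TTT" or "TTG") to the SNP2 allele on the
    same haplotype, as would be read from the sequenced CX3CR1 transcript.
    """
    return phase[unedited_snp1_codon]
```

For a colony with both bands, `classify_colony(True, True)` reports a single allele edit. If the unedited allele reads TTG at SNP1 and TTG is phased with the variant SNP2 allele (case 1), `expressed_snp2_allele("TTG", {"TTG": "variant", "TTT": "wild type"})` returns the variant allele as the one still expressed.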


Figure 4-6. PCR screen and Sanger sequencing of PF0052 transfected with gRNA1. A. Gel electrophoresis of PCR bands (548bp) using primers 1+2 (forward green and reverse green) amplifying genomic region around gRNA binding (unedited allele) B. Gel electrophoresis of PCR bands (520bp) using primers 1+3 amplifying genomic region of the GFP/Puro insert. C. Sanger sequence of an unedited cell (top panel) and an iPSC colony #8 (red square). A double peak (green and black peaks) at TTG (arrow) indicates heterozygosity at this position. The sequence of only one peak (bottom panel) indicates the presence of one allele. la: DNA ladder


Figure 4-7. PCR screen and Sanger sequencing of PF0052 transfected with gRNA2. A. Gel electrophoresis of PCR bands (548bp) using primers 1+2 (forward green and reverse green) amplifying genomic region around gRNA binding (unedited allele) B. Gel electrophoresis of PCR bands (520bp) using primers 1+3 amplifying genomic region of the GFP/Puro insert. C. Sanger sequence of an unedited cell (top panel) and an iPSC colony #9 (red square). A double peak (green and black peaks) at TTG (arrow) indicates heterozygosity at this position. The sequence of only one peak (bottom panel) indicates the presence of one allele.


Figure 4-8. Differentiation of iPSC (PF0623) into monocytes. A. iPSC were scraped and transferred to low-adherence plates to form EBs, which were then attached and induced with the growth factors M-CSF and IL-3 to produce monocytes. B. First two panels: forward scatter vs side scatter dot plots of unstained and CD14-stained monocytes with a gate around the monocyte population; third panel: histogram representing CD14 staining (red) compared to unstained (black); fourth panel: histogram representing CX3CR1 staining compared to unstained.


Figure 4-9. CX3CR1 relative expression of PF0052 and PF0380 compared to THP1 cells. Expression is normalized to GAPDH


Figure 4-10. Comparing CX3CR1 and MAF relative expression of PF0623 among 5 harvests. Expression is normalized to GAPDH


Figure 4-11. Chemotaxis assay of iPSC-monocytes (PF0623) in response to fractalkine (FKN). Cell suspension (2 × 10^5) was placed onto the polycarbonate membrane above the well containing serum-free or FKN-containing media and incubated for 3 hours. Two concentrations of FKN, 5 ng/ml and 10 ng/ml, were used to detect a migration dose response.


CHAPTER 5 CONCLUSIONS AND FUTURE DIRECTIONS

Resistant hypertension (RHTN) is an important clinical problem that is associated with adverse cardiovascular outcomes, especially stroke, congestive heart failure and kidney disease.149, 150, 152, 153 While the clinical predictors of RHTN are well studied and documented, the genetic predictors are less well studied. RHTN is a complex phenotype driven by multiple interrelated etiologies. The fact that a subset of hypertensive patients does not respond adequately to multiple antihypertensive medications suggests the presence of genetic variability in the different pathways of HTN, leading to inadequate responses to the prescribed antihypertensive drugs.

Currently, there are limited genetic data on RHTN, and these data do not yet allow the genetic basis of RHTN to be deciphered, owing to limitations related to sample size and the focused nature of the candidate gene approach used in most previous studies. Therefore, in this dissertation, we aimed to identify genetic predictors of RHTN through a genome wide association study (GWAS), which is capable of identifying common genetic variants associated with a complex pharmacogenetic phenotype such as RHTN.

For this analysis, we used available genome wide and clinical data from two randomized outcomes–driven clinical trials, from which we were able to create RHTN datasets from independent datasets of hypertensive patients.

While genome-spanning analyses, including GWAS, can provide insights into the genetic basis of complex phenotypes, revealing the mechanistic link between identified variants and the phenotype is usually challenging; therefore, in vitro functional studies are often needed to unravel this relation and understand the molecular mechanisms of the associations. To this end, induced pluripotent stem cells (iPSC) serve as a promising tool to validate the functional roles of the identified genetic associations in the relevant tissue. The indefinite supply of cells that iPSC can provide and their capability to differentiate into tissues that are hard to access in humans, including brain, vasculature, and heart, are by far the most attractive properties of iPSC as in vitro models. Therefore, we sought to advance the use of state-of-the-art induced pluripotent stem cell technology in pharmacogenomics by performing an iPSC-based pilot study to validate the GWAS signals from our analysis.

In Chapter 2, we performed and documented quality control procedures to create a cleaned and analyzable GWAS dataset. For that purpose, the Secondary Prevention of Small Subcortical Strokes (SPS3)97, 98 – GENEtic substudy (SPS3-GENES) was used. In addition, we performed imputation using the cleaned SPS3 dataset. The dataset was imputed to the 1000 Genomes Phase III reference panel94. Imputation was performed to include variants that were not directly genotyped, which allows datasets genotyped on different platforms to be combined through meta-analysis. Additionally, imputation can help fine-map an association, facilitating the discovery of the actual causal variants. Chapter 2 may serve as an educational guide, as it contains detailed steps for the QC and imputation procedures, with example outputs.

In Chapter 3, we performed GWAS analysis in a discovery cohort that included white and Hispanic hypertensive participants from the INternational VErapamil SR Trandolapril STudy (INVEST)85–GENEtic substudy (INVEST-GENES). We identified ten SNPs in three genetic loci that were prioritized for replication, as they had evidence for biological or functional roles. Replication was attempted in an independent set of hypertensive patients from SPS3-GENES. We were able to replicate three signals, in MSX2 (rs11749255), PTPRD (rs324498) and IFLTD1 (rs6487504). To increase the power of our GWAS analysis, we combined the association results of INVEST and SPS3 in a meta-analysis, and prioritized other signals for validation in an independent RHTN dataset from the electronic MEdical Records & GEnomics (eMERGE) Network. Through this step, we identified a SNP (rs16934621) in the BNC2 region that was consistently associated with RHTN with p = 0.015; however, it did not meet the Bonferroni-corrected p value threshold for replication. The lack of replication was not surprising to us, as we believe that eMERGE is a suboptimal replication cohort for our datasets for multiple reasons. One obvious reason is that the RHTN phenotype in eMERGE may be less robust than in INVEST and SPS3, since BP recordings and medication use are usually less carefully ascertained in the electronic health record setting than in clinical trials. However, at the time of completion of our INVEST-SPS3 meta-analysis, eMERGE was the only available dataset with a constructed RHTN phenotype and GWAS data.

Unlike candidate gene studies, in which selection of variants is based on a priori knowledge of the biological function of the gene and its role in the phenotype, GWAS is an agnostic approach that may identify SNPs in genes whose function or role in the phenotype is little known. Researchers therefore spend time and effort trying to understand the role of the identified SNP(s) and/or genes in the phenotype. In our GWAS analysis, we identified three RHTN associations in independent gene loci.

The first identified and replicated signal (rs11749255) is in the MSX2 region. This signal reached genome wide significance when the INVEST and SPS3 datasets were combined. MSX2 encodes a member of the muscle segment homeobox gene family that promotes the expression of osteogenic factors, including alkaline phosphatases, and plays a role in bone development169. Additionally, MSX2 is a transcriptional modulator in vascular calcification170. According to GTEx, rs11749255 is an eQTL174, 175 for MSX2, and the variant allele is associated with increased expression of MSX2 in the brain. It is possible that patients with the variant have increased vascular calcification, which could lead to vascular dysfunction and RHTN. The consistent associations in four cohorts of INVEST and SPS3, and the relevance of MSX2 to RHTN mechanisms through links to vascular biology, make this association compelling. Additionally, it suggests the involvement of vascular dysfunction in the etiology of RHTN. Certainly, this variant will have to be assessed for replication in other cohorts to gain further support. Should enough evidence accumulate from further replications, this variant may be worthwhile to study in in vitro models to identify its exact mechanism in RHTN.

The second interesting signal is an intronic SNP (rs324498) in PTPRD that was previously identified in a large gene-centric association analysis conducted in INVEST.84 This SNP was among our top associations in the GWAS analysis and replicated in SPS3. Interestingly, other SNPs in PTPRD were associated with BP response to atenolol in a GWAS analysis from the Pharmacogenomic Evaluation of Antihypertensive Responses (PEAR).182 PTPRD encodes the protein tyrosine phosphatase receptor type D (PTPRD), which belongs to the family of receptor protein tyrosine phosphatases. PTPRD dephosphorylates signal transducer and activator of transcription 3 (STAT3),238 which plays a role in renal RAS through the JAK-STAT pathway.239 It is possible that PTPRD is involved in BP regulation through its effect on RAS in the kidneys.182 Additionally, PTPRD has been associated with diabetes and coronary artery disease.180, 181 The fact that PTPRD has several associations with cardiovascular-related phenotypes, and more specifically BP response, provides compelling evidence for the importance of this gene.

The third associated and replicated signal is a SNP (rs6487504) in the IFLTD1 region. Although the exact function of the gene is unknown, and we therefore cannot speculate about its role in RHTN, previous associations with echocardiographic parameters and cardiovascular-related phenotypes have been reported.176-178 These imply that the gene is somehow involved in cardiovascular phenotypes, which increases the likelihood that this is a meaningful association.

Altogether, the consistent association for these three SNPs that was observed in two independent studies is compelling, and warrants further replication in other emerging datasets to support a potential role as genetic predictors of RHTN.

Additionally, we identified an association with RHTN for a SNP (rs3732378) in CX3CR1. In our INVEST GWAS analysis, this signal was associated with reduced risk of RHTN, which was consistent among the different ethnic/ancestry groups. The rs3732378 SNP (T280M) is a missense SNP in CX3CR1 that changes the amino acid of the encoded CX3CR1 from threonine to methionine. CX3CR1 is a chemokine receptor that is particularly expressed on inflammatory cells including natural killer cells, dendritic cells, and T-cells, with high expression on monocytes211. Fractalkine, the ligand for the CX3CR1 receptor, has unique features, as it exists both as a soluble chemokine and as an adhesion molecule on endothelial cells, facilitating monocyte-endothelial interactions.213 Monocytes that express CX3CR1 are recruited to endothelial cells, followed by interaction with endothelial cells through CX3CR1-FKN binding and trans-endothelial migration213. Migrating monocytes interact with vascular smooth muscle cells through the CX3CR1-FKN axis, leading to release of inflammatory mediators and vascular smooth muscle cell proliferation, with an end result of vascular dysfunction and atherosclerosis.214, 215 Previous studies have documented that patients with the variant have reduced risk of atherosclerosis, carotid artery stenosis and cardiovascular diseases,216-218 which corroborated the CX3CR1 association with reduced RHTN in our analysis. We therefore hypothesized that hypertensive patients with the T280M variant may have an attenuated CX3CR1-FKN axis due to altered binding of monocytes with the receptor. Although this variant was not replicated in SPS3, the functional evidence for this variant and its biological relevance to the phenotype, supported by extensive literature, encouraged us to validate this variant in in vitro functional studies.

The ultimate goal of this work is to discover and validate genetic predictors that can be used to identify patients at highest risk of RHTN who may benefit from specific pharmacological or non-pharmacological approaches. This is only possible if the identified variants have effect sizes large enough to be predictive of drug response or adverse effects when used in the clinical setting. One way to facilitate the clinical use of such genetic variants and improve the translatability of the findings is to create risk scores that include a panel of validated SNPs. Ultimately, these risk scores can be used to define an individual's risk of RHTN. In our studies, we created a risk score for each patient by summing the risk alleles of the three replicated SNPs: rs11749255 MSX2, rs324498 PTPRD, and rs6487504 IFLTD1. The risk score was significantly associated with increased risk of RHTN, and this association was consistent in all four cohorts of INVEST and SPS3. When the INVEST and SPS3 datasets were combined, patients with a higher number of risk alleles were at higher risk of RHTN than patients with a lower number. Although this risk score seems promising for identifying patients at risk of RHTN, it will have to be validated in other similar RHTN datasets to gain the level of evidence needed to support its potential clinical use, perhaps together with other clinical predictors.
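The unweighted score described above can be sketched in a few lines of Python. This is only an illustration, not the analysis code used in this work: the genotype coding (0, 1, or 2 copies of the risk allele per SNP), the dictionary layout, the handling of missing genotypes, and the function name are all assumptions made for the example.

```python
# Unweighted genetic risk score: sum of risk-allele counts across SNPs.
# Assumption: each genotype is already coded as the number of risk
# alleles carried (0, 1, or 2); a missing genotype is absent or None.

RISK_SNPS = ("rs11749255", "rs324498", "rs6487504")


def risk_score(genotypes, snps=RISK_SNPS):
    """Return the summed risk-allele count, or None if any SNP is missing."""
    total = 0
    for snp in snps:
        count = genotypes.get(snp)
        if count is None:
            return None  # do not score patients with incomplete genotypes
        if count not in (0, 1, 2):
            raise ValueError("risk-allele count must be 0, 1, or 2")
        total += count
    return total


# Hypothetical patient carrying one, zero, and two risk alleles.
patient = {"rs11749255": 1, "rs324498": 0, "rs6487504": 2}
print(risk_score(patient))  # 3
```

With three biallelic SNPs the score ranges from 0 to 6, so patients can be compared directly by their number of risk alleles.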

In Chapter 4, we sought to advance the use of iPSC-based models in pharmacogenomics by using iPSC-differentiated cells to validate one of the identified GWAS signals. We took forward the association with the missense variant in CX3CR1, since a coding-region variant is more amenable to this investigative approach. Our ultimate goal was to functionally validate the variant and gain insight into its role in RHTN by studying its effects on CX3CR1-FKN interactions.

We were interested in studying the variant in a constant genetic background, to prevent confounding from genetic variation at other loci. We therefore planned to edit the CX3CR1 locus using CRISPR-Cas9 and create three isogenic cell lines (variant allele-expressing, wild-type allele-expressing, and knock-out), followed by differentiation into the relevant tissue.

Since this variant lies in the coding region of the gene (exon 2), we anticipated that gene editing would be straightforward, which was another motivation for studying this gene in an iPSC model. Further, since CX3CR1 is highly expressed on monocytes, we reasoned that monocytes would be an optimal tissue for studying CX3CR1 in iPSC. We set out to genetically edit two iPSC samples from hypertensive patients recruited as part of the Pharmacogenomics Evaluation of Antihypertensive Responses (PEAR), followed by differentiation into monocytes.

After spending over 1.5 years trying to create the edited isogenic cell lines, we were only able to edit one allele in the two PEAR iPSC samples. Additionally, we were not able to obtain double knock-outs as anticipated, even after several editing attempts. The time and effort spent made us appreciate the challenges of editing CX3CR1. Although it is not clear why we were unable to obtain the desired edits, we attributed this difficulty to reduced efficiency of CRISPR-Cas9 at the CX3CR1 locus. Of note, CX3CR1 was not expressed in iPSC, and it is not clear whether the lack of expression reduced the efficiency of guide RNA targeting. It has been suggested that CRISPR-Cas9 guide RNA targeting partially depends on the DNase I hypersensitivity of the targeted genomic locus,236 so it is possible that the CX3CR1 locus was not easily accessible to the guide RNA.

Additionally, while simultaneously differentiating unedited iPSC into monocytes to ensure that we had a working protocol, we observed harvest-to-harvest variation in CX3CR1 expression within one cell line. We considered this intra-individual variability in CX3CR1 expression a major roadblock for the project. Even if we had succeeded in obtaining the edits, the large intra-individual variability would have made it hard to interpret our data and to distinguish differences between cell lines from variability within a cell line due to technical variation. We therefore concluded that continuing to edit CX3CR1 was not a practical goal.

Although we did not achieve our goal of validating the pharmacogenetic association in CX3CR1, we gained an appreciation of the technical difficulties associated with iPSC studies and learned several lessons about iPSC-based modeling. Based on this experience, we provide some recommendations that may be useful for future iPSC and gene editing studies in pharmacogenomics. Since the goal of iPSC models in pharmacogenomics is to capture the phenotypic changes associated with genetic variants, it is important that researchers perform quality control checks to ensure that intra-individual variation is kept to an acceptable minimum. This is necessary to confidently attribute phenotypic changes to the genetic variant rather than to intra-clonal or other technical variation. Additionally, researchers should ensure that the gene expression profile of the final differentiated tissue is enriched for genes relevant to the phenotype in question. Most importantly, researchers should ensure that the differentiation is robust and stable, and that the phenotype of the final tissue is reproducible across multiple rounds of differentiation. Finally, researchers should define a clear and quantifiable cellular phenotype that can be assessed with regard to the variant of interest.

Summary and Future Directions

In summary, we used a GWAS approach to identify novel genetic predictors of RHTN. Through this work, we identified novel gene loci for RHTN and created a genetic risk score to identify individuals at highest risk of RHTN. We also conducted a pilot study using iPSC-based modeling to validate a discovered genetic variant; however, we were unable to accomplish this goal because of unforeseen challenges related to gene editing and to the instability of expression of the target gene in iPSC-differentiated monocytes.

The strengths of this project include the replication of novel genetic loci across two independent clinical trials, which provides compelling evidence for the importance of these loci and emphasizes the need for further replication in similar cohorts from well-designed and appropriately powered studies. Potential replication opportunities will become available through the International Consortium for Antihypertensive Pharmacogenomics Studies (ICAPS: https://icaps-htn.org/), as many participating research groups are already constructing their RHTN datasets. ICAPS brings together research groups that hold genome-wide data for more than 350,000 participants in observational, epidemiological, and clinical trial settings. Therefore, the RHTN definition will have to be constructed separately for observational and clinical trial data, to ensure a harmonized phenotype and improve the chances of replication. Using multiple studies in ICAPS will likely increase the opportunity to identify more replicable associations for RHTN with large effect sizes, comparable to those seen in pharmacogenomic associations, that may have potential for clinical use.

Ultimately, the genetic information may be more useful when used in aggregate, as in the case of risk scores. A useful application would be to generate prediction algorithms that incorporate these genetic risk scores together with clinical factors, to identify at-risk patients who may benefit from specific pharmacological or non-pharmacological approaches.

We believe that RHTN is a pharmacogenomic phenotype that could be driven by inadequate response to the current antihypertensive drug classes. To further understand the role of the associated SNPs in RHTN, and perhaps in the context of drug response, these associations should be investigated within the various drug classes. This type of analysis may help identify genotypes that derive benefit or harm from a particular drug treatment. Additionally, since the available antihypertensive drug classes work through different BP-lowering mechanisms, identifying such SNP-drug interactions may help explain the mechanisms by which these variants lead to RHTN. This requires stratifying the dataset by drug class, which may compromise the power to detect associations in small datasets. The large datasets that will be available through ICAPS hold promise for conducting these types of analyses.

RHTN is likely driven by a combination of variants that are relatively common in the population and others that are less frequent. We may therefore miss important genetic predictors by relying only on a GWAS approach, which is powered only to detect common variants with modest effect sizes. Although we can overcome this to some extent by increasing power through meta-analysis of multiple GWAS datasets in ICAPS, we think that future studies may benefit from next-generation sequencing methods, for example whole-genome or whole-exome sequencing, to capture the whole spectrum of variants in RHTN. Additionally, incorporating multi-omics datasets such as transcriptomics, metabolomics, and epigenomics may provide more insight into the mechanistic pathways of RHTN by integrating multiple layers of data and capturing information on the biology of BP regulation that would be missed by using genetic information alone. However, this approach may not yet be ready for implementation in RHTN, as this kind of data is not currently available. It will be promising to use the multi-omics approach as such data accumulate.

In conclusion, the results of this project highlight the promising potential of comprehensive GWAS analysis for identifying genetic determinants of RHTN. This is the first GWAS analysis of RHTN using data from clinical trials, and it will likely pave the way for more analyses to follow and more genetic variants to be discovered. Our experience with iPSC will likely inform researchers who are interested in using iPSC in their pharmacogenomics studies, and will hopefully serve as a guide on how to prepare before starting iPSC-based studies.


APPENDIX A CODES USED FOR QC AND IMPUTATION PROCEDURES

1) Remove SNPs with a MAF difference >10% between CIDR and AKESOgen

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_remove_SNP.out
#PBS -e SPS3_remove_SNP.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_data_1 --exclude SNPs_Removed_After_QC.txt --make-bed --out SPS3_merged_Caitrin_removed_SNP_050915
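The exclusion list passed to --exclude has to exist before the step above is run. How SNPs_Removed_After_QC.txt was produced is not described here, so the following is only a hedged sketch of one way such a list could be built: compare per-SNP minor allele frequencies from the two genotyping batches and flag SNPs whose MAF differs by more than 0.10. The function name, the dictionary inputs (SNP ID mapped to MAF, for example read from two PLINK .frq files), and the example values are hypothetical, and the sketch assumes the same allele is the minor allele in both batches.

```python
def snps_with_maf_gap(maf_a, maf_b, max_diff=0.10):
    """Return SNP IDs present in both batches whose MAF differs by > max_diff."""
    shared = set(maf_a) & set(maf_b)
    return sorted(snp for snp in shared
                  if abs(maf_a[snp] - maf_b[snp]) > max_diff)


# Hypothetical MAF values for two genotyping batches.
batch_one = {"rs1": 0.25, "rs2": 0.30, "rs3": 0.05}
batch_two = {"rs1": 0.10, "rs2": 0.28, "rs4": 0.40}
print(snps_with_maf_gap(batch_one, batch_two))  # ['rs1']
```

The returned IDs could then be written one per line to a text file and supplied to plink --exclude.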

2) Remove SNPs and samples with a call rate below 95%

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_remove_poor_call_SNP.out
#PBS -e SPS3_remove_poor_call_SNP.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_removed_SNP_050915 --geno 0.05 --mind 0.05 --make-bed --out SPS3_merged_Caitrin_removed_SNP_2_050915

3) Remove monomorphic SNPs

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_remove_SNP_Monomor.out
#PBS -e SPS3_remove_SNP_Monomor.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_removed_SNP_2_050915 --maf 0.0004 --make-bed --out SPS3_merged_Caitrin_removed_SNP_3_050915

4) Code to remove duplicate subjects

I ran this step because I already knew the list of duplicate samples. Normally, one would run IBS/IBD first to output a list of duplicate samples.

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_remove_duplicate.out
#PBS -e SPS3_remove_duplicate.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_removed_SNP_3_050915 --remove SPS3_duplicates_removed_by_maryland_050915.txt --make-bed --out SPS3_merged_Caitrin_removed_SNP_and_duplicate_050915

5) Code to produce high quality SNP data for population stratification and genome-wide IBS/IBD

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_extra_QC.out
#PBS -e SPS3_merged_Caitrin_extra_QC.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_removed_SNP_and_duplicate_050915 --maf 0.1 --hwe 0.001 --geno 0.005 --make-bed --out SPS3_merged_Caitrin_removed_SNP_and_duplicate_extraQC

6) Code for SNP LD pruning

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_prune.out
#PBS -e SPS3_merged_Caitrin_prune.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_removed_SNP_and_duplicate_extraQC --indep 50 5 2 --out SPS3_merged_Caitrin_removed_SNP_and_duplicate_extraQC_prune

6) Code to extract the set of SNPs to keep in the dataset after the LD pruning step above

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_extract_prune.out
#PBS -e SPS3_merged_Caitrin_extract_prune.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_removed_SNP_and_duplicate_extraQC --extract SPS3_merged_Caitrin_removed_SNP_and_duplicate_extraQC_prune.prune.in --make-bed --out SPS3_merged_Caitrin_removed_SNP_and_duplicate_pruned

6) Code to run the frequency command

From the output of this step, you can obtain the list of SNPs on sex chromosomes. These SNPs will be removed from the file of high quality SNPs.

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_freq.out
#PBS -e SPS3_merged_Caitrin_freq.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_removed_SNP_and_duplicate_pruned --freq --out SPS3_merged_Caitrin_removed_SNP_and_duplicate_pruned_freq
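The list of sex-chromosome SNPs mentioned above can be pulled from the frequency output in several ways. The following Python sketch is one possibility and is not the procedure used in this work. It assumes the standard PLINK .frq layout (a header row followed by whitespace-delimited columns CHR, SNP, A1, A2, MAF, NCHROBS) and PLINK's chromosome codes 23 (X), 24 (Y), and 25 (XY); the function name and the example rows are hypothetical.

```python
# Chromosome codes treated as sex chromosomes (numeric and letter forms).
SEX_CHROM_CODES = {"23", "24", "25", "X", "Y", "XY"}


def sex_chromosome_snps(frq_lines):
    """Return SNP IDs from .frq rows whose chromosome is X, Y, or XY."""
    snps = []
    for line in frq_lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] == "CHR":
            continue  # skip blank lines and the header row
        if fields[0].upper() in SEX_CHROM_CODES:
            snps.append(fields[1])
    return snps


# Hypothetical excerpt of a .frq file.
example = [
    " CHR         SNP   A1   A2          MAF  NCHROBS",
    "   1   rs1000001    A    G       0.2100     5000",
    "  23   rs2000002    C    T       0.1500     3800",
    "  24   rs3000003    G    A       0.0500     1200",
]
print(sex_chromosome_snps(example))  # ['rs2000002', 'rs3000003']
```

Writing the returned IDs one per line gives a file in the form expected by plink --exclude.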

7) Code to remove SNPs on sex chromosomes

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_remove_SEX_CH.out
#PBS -e SPS3_merged_Caitrin_remove_SEX_CH.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_removed_SNP_and_duplicate_pruned --exclude merged_Caitrin_SEX_CH.txt --make-bed --out SPS3_merged_Caitrin_extra_QC_2_050915

8) Code to run genome-wide IBS/IBD to detect duplicate or related samples

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_caitrin_IBS.out
#PBS -e SPS3_merged_caitrin_IBS.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_extra_QC_2_050915 --genome --min 0.15 --out SPS3_merged_caitrin_dup_removed_genome_min_051415
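The .genome file written by the step above lists every sample pair with PI_HAT at or above the --min value. As a hedged sketch of how that output might be screened, the Python function below labels each reported pair as a likely duplicate or as related. The 0.9 cut-off for duplicates, the function name, and the example rows are illustrative assumptions and are not taken from this work; the column names (IID1, IID2, PI_HAT) are those of the PLINK .genome header.

```python
def flag_pairs(genome_lines, dup_threshold=0.9):
    """Return (IID1, IID2, PI_HAT, label) for each pair in a .genome file."""
    column = None
    pairs = []
    for line in genome_lines:
        fields = line.split()
        if not fields:
            continue
        if column is None:
            # First non-empty row is the header; map names to positions.
            column = {name: index for index, name in enumerate(fields)}
            continue
        pi_hat = float(fields[column["PI_HAT"]])
        label = "duplicate" if pi_hat >= dup_threshold else "related"
        pairs.append((fields[column["IID1"]], fields[column["IID2"]],
                      pi_hat, label))
    return pairs


# Hypothetical excerpt of a .genome file.
example = [
    "FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO",
    "F1 S1 F2 S2 UN NA 0.0000 0.0000 1.0000 1.0000 -1 1.000000 1.0000 NA",
    "F1 S1 F3 S3 UN NA 0.2500 0.5000 0.2500 0.5000 -1 0.850000 1.0000 2.5",
]
for pair in flag_pairs(example):
    print(pair)
```

Pairs labelled as duplicates would be candidates for the --remove list used in step 4.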

9) Code to get ped/map files to run population stratification using EIGENSTRAT

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_recode.out
#PBS -e SPS3_merged_Caitrin_recode.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_extra_QC_2_050915 --recode --out SPS3_merged_Caitrin_extra_QC_2_pruned_ped

10) Code to run PCA analysis in the whole dataset using EIGENSTRAT

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_PCA_All.out
#PBS -e SPS3_merged_Caitrin_PCA_All.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load eigensoft
smartpca.perl -i SPS3_merged_Caitrin_extra_QC_2_pruned_ped.ped \
    -a SPS3_merged_Caitrin_extra_QC_2_pruned_ped.map \
    -b SPS3_merged_Caitrin_extra_QC_2_pruned_ped.ped -k 10 \
    -o SPS3_merged_Caitrin_PCA_All.pca -p SPS3_merged_Caitrin_PCA_All.plot \
    -e SPS3_merged_Caitrin_PCA_All.eval -l SPS3_merged_Caitrin_PCA_All.log \
    -m 0 -t 0 -s 0

11) Code to create a SNP data file for whites to run PCA in whites

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_White_2.out
#PBS -e SPS3_merged_Caitrin_White_2.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_extra_QC_2_050915 --keep SPS3_cleaned_Whites_FOR_PCA.txt --make-bed --out SPS3_merged_Caitrin_Keep_White_1_051115

12) Code to get ped/map files in whites to run population stratification using EIGENSTRAT

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_recode_whites.out
#PBS -e SPS3_merged_Caitrin_recode_whites.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_Keep_White_1_051115 --recode --out SPS3_merged_Caitrin_White_ped_051115

13) Code to run PCA using EIGENSTRAT in whites

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_PCA_whites.out
#PBS -e SPS3_merged_Caitrin_PCA_whites.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load eigensoft
smartpca.perl -i SPS3_merged_Caitrin_White_ped_051115.ped \
    -a SPS3_merged_Caitrin_White_ped_051115.map \
    -b SPS3_merged_Caitrin_White_ped_051115.ped -k 10 \
    -o SPS3_merged_Caitrin_PCA_White.pca -p SPS3_merged_Caitrin_PCA_White.plot \
    -e SPS3_merged_Caitrin_PCA_White.eval -l SPS3_merged_Caitrin_PCA_White.log \
    -m 0 -t 0 -s 0

14) Code to create a SNP data file for Hispanics to run PCA

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_Hispanics.out
#PBS -e SPS3_merged_Caitrin_Hispanics.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_extra_QC_2_050915 --keep SPS3_cleaned_Hispanics_FOR_PCA.txt --make-bed --out SPS3_merged_Caitrin_Keep_Hisp_v1_051115

15) Code to get ped/map files in Hispanics to run population stratification using EIGENSTRAT

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_recode_Hispanics.out
#PBS -e SPS3_merged_Caitrin_recode_Hispanics.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_Keep_Hisp_v1_051115 --recode --out SPS3_merged_Caitrin_Hisp_ped_051115

16) Code to run PCA using EIGENSTRAT in Hispanics

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_PCA_Hispanics.out
#PBS -e SPS3_merged_Caitrin_PCA_Hispanics.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load eigensoft
smartpca.perl -i SPS3_merged_Caitrin_Hisp_ped_051115.ped \
    -a SPS3_merged_Caitrin_Hisp_ped_051115.map \
    -b SPS3_merged_Caitrin_Hisp_ped_051115.ped -k 10 \
    -o SPS3_merged_Caitrin_PCA_Hisp.pca -p SPS3_merged_Caitrin_PCA_Hisp.plot \
    -e SPS3_merged_Caitrin_PCA_Hisp.eval -l SPS3_merged_Caitrin_PCA_Hisp.log \
    -m 0 -t 0 -s 0

17) Code to create a SNP data file for African Americans to run PCA

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_AA.out
#PBS -e SPS3_merged_Caitrin_AA.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_extra_QC_2_050915 --keep SPS3_merged_Caitrin_Duplicate_removed_African_051115.txt --make-bed --out SPS3_merged_Caitrin_Keep_African

18) Code to get ped/map files in African Americans to run population stratification using EIGENSTRAT

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_recode_AA.out
#PBS -e SPS3_merged_Caitrin_recode_AA.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
plink --bfile SPS3_merged_Caitrin_Keep_African --recode --out SPS3_merged_Caitrin_African_ped_051115

19) Code to run PCA using EIGENSTRAT in African Americans

#!/bin/bash
#PBS -N plink
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_merged_Caitrin_PCA_AA.out
#PBS -e SPS3_merged_Caitrin_PCA_AA.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load eigensoft
smartpca.perl -i SPS3_merged_Caitrin_African_ped_051115.ped \
    -a SPS3_merged_Caitrin_African_ped_051115.map \
    -b SPS3_merged_Caitrin_African_ped_051115.ped -k 10 \
    -o SPS3_merged_Caitrin_PCA_African.pca -p SPS3_merged_Caitrin_PCA_African.plot \
    -e SPS3_merged_Caitrin_PCA_African.eval -l SPS3_merged_Caitrin_PCA_African.log \
    -m 0 -t 0 -s 0

Imputation

1) R code to remove duplicates from the clean genotyped SNP file

SPS3_bim <- read.table("SPS3_merged_Caitrin_sex_updated_051315.bim", header=FALSE)

i) Add headers to the SNP table

colnames(SPS3_bim) <- c("Chr", "SNP", "morgan", "Bp", "A1", "A2")

ii) Extract duplicate SNPs (same chromosome and base-pair position)

dup <- SPS3_bim[duplicated(SPS3_bim[c(1,4)]),]

iii) Output a table with the duplicates

write.table(dup, "SPS3_bim_dups_120415.txt", quote = FALSE, sep = "\t", row.names=FALSE)

2) PLINK code to exclude the duplicate SNPs

plink --bfile SPS3_merged_Caitrin_sex_updated_051315 --exclude SPS3_bim_dups_120415.txt --make-bed --out SPS3_uniq_clean_120415
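For readers who prefer Python, the duplicate detection done in R in step 1 can be sketched as follows. This is an illustrative equivalent rather than the code used in this work. Like R's duplicated(), it reports the second and later rows that share a chromosome and base-pair position with an earlier row; the function name and the example rows are hypothetical.

```python
def duplicated_positions(bim_lines):
    """Return .bim rows whose (chromosome, position) was already seen."""
    seen = set()
    duplicates = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        key = (fields[0], fields[3])  # chromosome and base-pair position
        if key in seen:
            duplicates.append(fields)
        else:
            seen.add(key)
    return duplicates


# Hypothetical .bim rows: chromosome, SNP ID, cM, position, A1, A2.
example = [
    "1\trs1\t0\t1000\tA\tG",
    "1\texm1\t0\t1000\tA\tG",
    "1\trs2\t0\t2000\tC\tT",
]
print([row[1] for row in duplicated_positions(example)])  # ['exm1']
```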

3) Flip negative strand SNPs

Obtain the list of negative strand SNPs from the Illumina manifest file.

i) Use this command to remove the header of the dataset and obtain the needed columns

cat HumanOmni5Exome-4v1_B1.csv | cut -s -f2,10,11,20 -d"," | tail -n+6 > Illumuna_snps_020616

ii) Use this command to extract only the negative strand SNPs

awk -F',' '$4 == "-"' Illumina_negativestrand_abbrev_v1_122115.txt > Illumina_negativestrand_122315.txt

iii) Use this command to convert the comma-separated file into a tab-delimited file

sed 's/,/\t/g' Illumina_negativestrand_020616.txt > SPS3_negativeSNPs_v0_123015.txt

iv) Sort the file that has the negative strand SNPs and keep only unique SNPs

sort SPS3_negativeSNPs_v0_123015.txt | uniq > SPS3_negativeSNPs_unique_123015.txt

v) PLINK code to flip the negative strand

plink --bfile SPS3_uniq_cleaned_chr_pos_120415 --flip SPS3_negativeSNPs_unique_123015.txt --make-bed --out SPS3_uniq_chr_pos_flip_123015
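Steps ii to iv above (filter on strand, then sort and de-duplicate) can also be expressed as a single Python function. This sketch is only illustrative. It assumes that each input row holds the four comma-separated columns retained in step i and that they are, in order, SNP name, chromosome, position, and strand; the manifest layout is not described in the text, so this ordering, the function name, and the example rows are assumptions. Unlike the shell commands, it returns SNP names only.

```python
def negative_strand_snps(rows):
    """Return the sorted, unique SNP names whose strand column is '-'."""
    names = set()
    for row in rows:
        fields = row.strip().split(",")
        if len(fields) == 4 and fields[3] == "-":
            names.add(fields[0])
    return sorted(names)


# Hypothetical rows: name, chromosome, position, strand.
example = [
    "rs10,1,1000,+",
    "rs30,2,3000,-",
    "rs20,1,2000,-",
    "rs30,2,3000,-",
]
print(negative_strand_snps(example))  # ['rs20', 'rs30']
```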

4) Split the dataset by chromosome using PLINK

Loop to use in PLINK to split the data by chromosome

#!/bin/bash
#
#PBS -N split
#PBS -M [email protected]
#PBS -m abe
#PBS -o split_sps3_bychr.out
#PBS -e split_sps3_bychr.err
#PBS -l nodes=1:ppn=8
#PBS -l pmem=3gb
#PBS -l walltime=4:00:00
cd $PBS_O_WORKDIR
module load plink
for chr in $(seq 1 22); do
    plink --bfile SPS3_uniq_chr_pos_flip_123015 --chr $chr --recode --out SPS3_ALL_chr$chr
done

5) Phase the data using SHAPEIT

i) Download the genetic map and then unzip it

From the SHAPEIT software website, download the genetic map file for HapMap phase II b37: https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#gmap

tar -xzvf genetic_map_b37.tar.gz

ii) Phase the dataset using SHAPEIT

It is better to phase one chromosome first to get an estimate of the time. Start with chromosome 22, then use a loop to phase all the chromosomes.

#!/bin/bash
#
#PBS -N phase
#PBS -M [email protected]
#PBS -m abe
#PBS -o phase_SPS3_All_chr.out
#PBS -e phase_SPS3_All_chr.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=15gb
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
module load shapeit
shapeit --input-ped SPS3_ALL_chr22.ped SPS3_ALL_chr22.map -M genetic_map_b37/genetic_map_chr22_combined_b37.txt -O SPS3_ALL_chr22.phased

iii) Convert the phased chromosome into a VCF file using SHAPEIT

#!/bin/bash

#
#PBS -N convertvcf
#PBS -M [email protected]
#PBS -m abe
#PBS -o convertvcf_SPS3_chr22.out
#PBS -e convertvcf_SPS3_chr22.err
#PBS -l nodes=1:ppn=8
#PBS -l pmem=15gb
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
module load shapeit
shapeit -convert --input-haps SPS3_ALL_chr22.phased --output-vcf SPS3_ALL_chr22.phased.vcf


iv) Loop to phase the whole dataset (22 chromosomes)

#!/bin/bash
#
#PBS -N phase
#PBS -M [email protected]
#PBS -m abe
#PBS -o phase_SPS3_All_chr.out
#PBS -e phase_SPS3_All_chr.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=50gb
#PBS -l walltime=264:00:00
#PBS -t 1-22
cd $PBS_O_WORKDIR
module load shapeit
shapeit --input-ped SPS3_ALL_chr${PBS_ARRAYID}.ped SPS3_ALL_chr${PBS_ARRAYID}.map \
    -M genetic_map_b37/genetic_map_chr${PBS_ARRAYID}_combined_b37.txt \
    -O SPS3_ALL_chr${PBS_ARRAYID}.phased

v) Code to convert the 22 phased chromosomes into VCF files

#!/bin/bash

#
#PBS -N convert
#PBS -M [email protected]
#PBS -m abe
#PBS -o convert_SPS3_All_chr.out
#PBS -e convert_SPS3_All_chr.err
#PBS -l nodes=1:ppn=8
#PBS -l pmem=15gb
#PBS -l walltime=24:00:00
#PBS -t 1-22
cd $PBS_O_WORKDIR
module load shapeit
shapeit -convert --input-haps SPS3_ALL_chr${PBS_ARRAYID}.phased --output-vcf SPS3_ALL_chr${PBS_ARRAYID}.vcf

6)Imputation using Minimac3

First, download the 1000 Genomes reference panel (chr1-22; X) with the parameter estimates. Download the M3VCF format from the Minimac3 website (http://genome.sph.umich.edu/wiki/Minimac3).

Code to impute the 22 chromosomes using Minimac3

Again, it is better to try out one chromosome first to get an estimate of the time.

We provide below the loop for imputing the 22 chromosomes.

#!/bin/bash
#
#PBS -N Minimac
#PBS -M [email protected]
#PBS -m abe
#PBS -o Minimac_SPS3_All_chr.out
#PBS -e Minimac_SPS3_All_chr.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=4gb
#PBS -l walltime=24:00:00
#PBS -t 1-22
cd $PBS_O_WORKDIR
module load minimac/3-20150727
Minimac3 --refHaps ${PBS_ARRAYID}.1000g.Phase3.v5.With.Parameter.Estimates.m3vcf.gz \
    --haps SPS3_ALL_chr${PBS_ARRAYID}.vcf \
    --prefix SPS3_ALL_chr${PBS_ARRAYID}_imputed > mimimac3_SPS3_chr${PBS_ARRAYID}.log

7) Python script to filter the imputed files according to R2 and MAF filters

The user can use the script and change the cut-off for R2 or MAF.

import argparse
import gzip
import sys

parser = argparse.ArgumentParser()
parser.add_argument("-i", help="input file")
parser.add_argument("-o", help="output file")
parser.add_argument("-r", help="Filter by R2 value, default=0.0", type=float, default=0.0)
parser.add_argument("-m", help="Filter by MAF value, default=0.0", type=float, default=0.0)
parser.add_argument("-f", help="Print full content of input file, only filtering by R2 and MAF. Default=0 (no, summarize).", type=int, default=0)
parser.add_argument("-z", help="gzip output file, default=0, set to 1 for gzip", type=int, default=0)
args = parser.parse_args()

InFile = args.i
OutFile = args.o
R2FilterValue = args.r
MAFFilterValue = args.m
PrintFull = args.f
gzipOut = args.z

# Open input file (gzipped, read as text)
try:
    IN = gzip.open(InFile, 'rt')
except IOError:
    sys.exit("Can't open file %s" % InFile)

# Open output file (gzipped only if -z 1 was given)
try:
    if gzipOut == 1:
        OUT = gzip.open(OutFile, 'wt')
    else:
        OUT = open(OutFile, 'w')
except IOError:
    sys.exit("Can't open file %s" % OutFile)

for Line in IN:
    if Line.startswith("#"):
        # Header lines are kept only when printing the full file
        if PrintFull == 1:
            OUT.write(Line)
    else:
        Line_bits = Line.rstrip('\n').split('\t')
        ID = Line_bits[2]
        RefAllele = Line_bits[3]
        AltAllele = Line_bits[4]
        Qual = Line_bits[5]
        Filter = Line_bits[6]
        Info = Line_bits[7]
        Filter_bits = Info.split(';')  # e.g. MAF=0.00639;R2=0.00000
        MAF = Filter_bits[0].split('=')[1]
        R2value = Filter_bits[1].split('=')[1]
        # Compare as numbers; the values are read from the file as text
        if (float(R2value) >= R2FilterValue) and (float(MAF) >= MAFFilterValue):
            if PrintFull == 1:
                OUT.write(Line)
            else:
                OUT.write("%s\t%s\t%s\t%s\t%s\t%s\t%s\n" % (ID, RefAllele, AltAllele, Qual, Filter, MAF, R2value))

IN.close()
OUT.close()

8) Running GWAS analysis using EPACTS

Prepare the phenotype file first (consult the EPACTS documentation on how to prepare the phenotype file).

i) Use Tabix to index the files

cd $PBS_O_WORKDIR
module load samtools
for i in {1..22}
do
    tabix -p vcf -f SPS3_ALL_chr${i}_imputed.dose.vcf.gz
done

ii) Loop code to run the association using EPACTS

#!/bin/bash

#
#PBS -N SPS3_Wh
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_Wh_assoc_dose_wald.out
#PBS -e SPS3_Wh_assoc_dose_wald.err
#PBS -l nodes=1:ppn=8
#PBS -l pmem=15gb
#PBS -l walltime=168:00:00
#PBS -t 1-22
cd $PBS_O_WORKDIR
module load epacts
epacts single --vcf SPS3_ALL_chr${PBS_ARRAYID}_imputed.dose.vcf.gz \
    --ped SPS3_RHTN_Wh_Pheno_EPACTS_013016.ped \
    --out SPS3_Wh_assoc_wald_chr${PBS_ARRAYID}_013116 --test b.wald --pheno DISEASE \
    --cov AGE --cov SEX --cov ASSIGN --cov DIAB --cov CHF --cov MI --cov PVD \
    --cov PC1_W --cov BMI --anno --min-mac 1 --field DS --run 10


iii) Python script to parse the association file and extract fields such as A1, A2, MAF, and beta, if the user is planning to run METAL

# Options
#
# -i input file
# -o output file
# -f filter loci with MAF >= value (Default is no filtering)

import argparse
import gzip
import re
import sys

parser = argparse.ArgumentParser()
parser.add_argument("-i", help="Input file, a gzipped INVEST file")
parser.add_argument("-o", help="Output file for METAL, will be gzipped")
parser.add_argument("-f", help="Filter loci, keeping those with MAF >= value, Default=0 (no filtering)", default=0.0)
args = parser.parse_args()

infile = args.i
outfile = args.o
MAFfilter = float(args.f)

try:
    IN = gzip.open(infile, 'rt')
except IOError:
    sys.exit("Can't open in file: %s" % infile)
try:
    OUT = gzip.open(outfile, 'wt')
except IOError:
    sys.exit("Can't open out file: %s" % outfile)

Header = 1
for Line in IN:
    if Header == 1:  # skip the header row and add header to output file
        Header = 0
        OUT.write("#CHROM\tMARKER_ID\tNonRefAllele\tRefAllele\tMAF\tPVALUE\tBETA\tSEBETA\tZSTAT\n")
    else:
        Line_bits = Line.rstrip('\n').split('\t')
        # Keep the locus if the minor allele frequency (Line_bits[7]) is >= the set MAFfilter
        if float(Line_bits[7]) >= MAFfilter:
            # Line_bits[3] is the marker name column. Get the first allele
            NonRefAllele = re.search(r'_(.+?)/', Line_bits[3])
            # Get the second allele
            RefAllele = re.search(r'/(.+?)_', Line_bits[3])
            # Output columns: CHROM MARKER_ID NonRefAllele RefAllele MAF PVALUE BETA SEBETA ZSTAT
            try:
                OUT.write("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n" % (Line_bits[0], Line_bits[3], NonRefAllele.group(1), RefAllele.group(1), Line_bits[7], Line_bits[8], Line_bits[9], Line_bits[10], Line_bits[11]))
            except AttributeError:
                # re.search returned None: the marker name holds no allele information
                print("No Allele information for %s\n" % (Line_bits[3]))

IN.close()
OUT.close()

iv) A bash script with a loop to run the Python script and parse all 22 chromosomes

The user specifies the input file with -i and the output file with -o, and can set the filter -f to any MAF.

#!/bin/bash
#
#PBS -N SPS3_Wh_loop_filter
#PBS -M [email protected]
#PBS -m abe
#PBS -o SPS3_wh_loop_pyth_v1.out
#PBS -e SPS3_wh_loop_pyth_v1.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=4gb
#PBS -l walltime=4:00:00
#PBS -t 1-22
cd $PBS_O_WORKDIR
module load python
python SPS3_to_Metal_020616.py -i SPS3_Wh_assoc_wald_chr${PBS_ARRAYID}_013116.epacts.gz -o SPS3_Wh_Metal_chr${PBS_ARRAYID}_030716.txt.gz -f 0.019

v) A loop in Bash to unzip all 22 parsed chromosomes from the step above

#!/bin/bash
#

#PBS -N unzip_wh_030716
#PBS -M [email protected]
#PBS -m abe
#PBS -o unzip_Wh_030716.out
#PBS -e unzip_Wh_030716.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=4gb
#PBS -l walltime=2:00:00
#PBS -t 1-22
cd $PBS_O_WORKDIR
gunzip SPS3_Wh_Metal_chr${PBS_ARRAYID}_030716.txt.gz

vi) Unix code to concatenate all 22 chromosomes for whites

First, output the header of the chromosomes

head -n 1 SPS3_Wh_Chr1_030716.txt > SPS3_Wh_ALLCHR.txt

Second, concatenate all the chromosomes together (1-22)

#!/bin/bash
#
#PBS -N mergingchromosomes_030716
#PBS -M [email protected]
#PBS -m abe
#PBS -o mergingwhitechr_v1.out
#PBS -e mergingwhitechr_v1.err
#PBS -l nodes=1:ppn=1
#PBS -l pmem=4gb
#PBS -l walltime=2:00:00
#PBS -t 1-22
cd $PBS_O_WORKDIR
tail -n+2 SPS3_Wh_Metal_chr${PBS_ARRAYID}_030716.txt >> SPS3_Wh_ALLCHR_formetal_030716.txt
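Because the array job above runs one task per chromosome and every task appends to the same file, the order of the chromosomes in the merged file depends on when each task happens to run. If a fixed order is wanted, the concatenation can instead be done serially. The Python sketch below shows one way to do this under stated assumptions: it writes the header of the first file once and then the data rows of each file in the order given. The function name and the small demonstration files are hypothetical and are not part of the pipeline used in this work.

```python
import os
import tempfile


def concatenate_chromosomes(paths, out_path):
    """Write the first file's header once, then all data rows in order."""
    with open(out_path, "w") as out:
        for index, path in enumerate(paths):
            with open(path) as handle:
                header = handle.readline()
                if index == 0:
                    out.write(header)  # keep a single header line
                for line in handle:
                    out.write(line)


# Small self-contained demonstration with two hypothetical input files.
with tempfile.TemporaryDirectory() as workdir:
    inputs = []
    for chrom, row in ((1, "1\tm1\n"), (2, "2\tm2\n")):
        path = os.path.join(workdir, "chr%d.txt" % chrom)
        with open(path, "w") as handle:
            handle.write("#CHROM\tMARKER_ID\n" + row)
        inputs.append(path)
    merged = os.path.join(workdir, "all.txt")
    concatenate_chromosomes(inputs, merged)
    with open(merged) as handle:
        merged_text = handle.read()
print(merged_text, end="")
```

For the real data, the list of paths would be the 22 unzipped per-chromosome files in chromosome order.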

User can run EPACTS association analysis for other datasets, and repeat steps (iii-vi) to produce input files needed for METAL. Here, we repeated the steps for Hispanics followed by meta-analysis between whites and Hispanics using the provided METAL script

Bash script to run python script through all the chromosomes of Hispanics #!/bin/bash

#

#PBS -N SPS3_Hisp_loop_filter

#PBS -M [email protected]

#PBS -m abe

#PBS -o SPS3_Hisp_loop_pyth_v1.out

#PBS -e SPS3_Hisp_loop_pyth_v1.err

#PBS -l nodes=1:ppn=1

#PBS -l pmem=4gb

#PBS -l walltime=4:00:00

#PBS -t 1-22

cd $PBS_O_WORKDIR

module load python

python SPS3_to_Metal_020616.py -i SPS3_Hisp_assoc_wald_chr${PBS_ARRAYID}_020116.epacts.gz -o SPS3_Hisp_Metal_chr${PBS_ARRAYID}_030716.txt.gz -f 0.016

A loop in Bash to unzip all files

#!/bin/bash

#

#PBS -N unzip_Hisp_030716

#PBS -M [email protected]

#PBS -m abe

#PBS -o unzip_Hisp_v1.out

#PBS -e unzip_Hisp_v1.err

#PBS -l nodes=1:ppn=1

#PBS -l pmem=4gb

#PBS -l walltime=2:00:00

#PBS -t 1-22

cd $PBS_O_WORKDIR

gunzip SPS3_Hisp_Metal_chr${PBS_ARRAYID}_030716.txt.gz

Unix commands to concatenate all chromosomes for Hispanics

First, write the header line of the chromosome 1 file to the combined output file

head -n 1 SPS3_Hisp_Metal_chr1_030716.txt > SPS3_Hisp_ALLCHR_formetal_072616.txt

Second, concatenate all the chromosomes together (1-22)

#!/bin/sh

#SBATCH --job-name=Hisp_SPS3_mergingchromosomes


#SBATCH --mail-type=ALL

#SBATCH [email protected]

#SBATCH --nodes=1

#SBATCH --ntasks=1

#SBATCH --mem-per-cpu=1gb

#SBATCH -t 00:45:00

#SBATCH -o mergingHisp_SPS3.out

#SBATCH -e mergingHisp_SPS3.err

#SBATCH --array=1-22

tail -n +2 SPS3_Hisp_Metal_chr${SLURM_ARRAY_TASK_ID}_030716.txt >> SPS3_Hisp_ALLCHR_formetal_072616.txt
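Before running METAL, it can help to confirm that each combined file carries the columns that the METAL script in this appendix names. This is a minimal sketch, assuming a tab-separated file whose first line is the header. `missing_metal_columns` is an illustrative helper, and the required column names are copied from that METAL script.

```python
REQUIRED_COLUMNS = ("MARKER_ID", "RefAllele", "NonRefAllele", "MAF",
                    "BETA", "PVALUE", "SEBETA")


def missing_metal_columns(path, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the header line of `path`."""
    with open(path) as handle:
        header = handle.readline().rstrip("\r\n").split("\t")
    return [name for name in required if name not in header]
```

An empty list means every column referenced by the MARKER, ALLELE, FREQ, EFFECT, PVALUE, and STDERR commands is present in the file.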

9) Running the meta-analysis in METAL using the following script (Metal_script). The user can run the script on the cluster by loading the METAL program and then typing: metal Metal_script

MARKER MARKER_ID

ALLELE RefAllele NonRefAllele

FREQ MAF

EFFECT BETA

PVALUE PVALUE

SCHEME STDERR

STDERR SEBETA

SEPARATOR TAB

PROCESS SPS3_Wh_ALLCHR_formetal_030716.txt

MARKER MARKER_ID


ALLELE RefAllele NonRefAllele

FREQ MAF

EFFECT BETA

PVALUE PVALUE

SCHEME STDERR

STDERR SEBETA

SEPARATOR TAB

PROCESS SPS3_Hisp_ALLCHR_formetal_072616.txt

OUTFILE SPS3_Minimac_Wh_Hisp_ALLCHR .tbl

ANALYZE HETEROGENEITY
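With SCHEME STDERR, METAL combines the per-study effect estimates weighted by the inverse of their squared standard errors, and ANALYZE HETEROGENEITY adds Cochran's Q and I² for each marker. The calculation for a single marker can be sketched as follows. This illustrates the standard fixed-effects formulas and is not METAL's source code; the function name and the example numbers are made up.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance meta-analysis for one marker.

    Returns (pooled beta, pooled SE, two-sided p-value, Cochran's Q, I^2 in percent).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return beta, se, p, q, i2


# Made-up effect estimates and standard errors for one marker in two cohorts.
beta, se, p, q, i2 = inverse_variance_meta([0.50, 0.30], [0.10, 0.20])
```

The cohort with the smaller standard error receives the larger weight, so the pooled estimate lies closer to its effect size.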


LIST OF REFERENCES

1. Nwankwo, T, Yoon, SS, Burt, V, Gu, Q: Hypertension among adults in the United States: National Health and Nutrition Examination Survey, 2011-2012. NCHS Data Brief: 1-8, 2013.

2. Joffres, M, Falaschetti, E, Gillespie, C, Robitaille, C, Loustalot, F, Poulter, N, McAlister, FA, Johansen, H, Baclic, O, Campbell, N: Hypertension prevalence, awareness, treatment and control in national surveys from England, the USA and Canada, and correlation with stroke and ischaemic heart disease mortality: a cross-sectional study. BMJ Open, 3: e003423, 2013.

3. Mozaffarian, D, Benjamin, EJ, Go, AS, Arnett, DK, Blaha, MJ, Cushman, M, Das, SR, de Ferranti, S, Després, JP, Fullerton, HJ, Howard, VJ, Huffman, MD, Isasi, CR, Jiménez, MC, Judd, SE, Kissela, BM, Lichtman, JH, Lisabeth, LD, Liu, S, Mackey, RH, Magid, DJ, McGuire, DK, Mohler, ER, Moy, CS, Muntner, P, Mussolino, ME, Nasir, K, Neumar, RW, Nichol, G, Palaniappan, L, Pandey, DK, Reeves, MJ, Rodriguez, CJ, Rosamond, W, Sorlie, PD, Stein, J, Towfighi, A, Turan, TN, Virani, SS, Woo, D, Yeh, RW, Turner, MB, Members, WG, Committee, AHAS, Subcommittee, SS: Executive Summary: Heart Disease and Stroke Statistics--2016 Update: A Report From the American Heart Association. Circulation, 133: 447-454, 2016.

4. Mozaffarian, D, Benjamin, EJ, Go, AS, Arnett, DK, Blaha, MJ, Cushman, M, Das, SR, de Ferranti, S, Després, JP, Fullerton, HJ, Howard, VJ, Huffman, MD, Isasi, CR, Jiménez, MC, Judd, SE, Kissela, BM, Lichtman, JH, Lisabeth, LD, Liu, S, Mackey, RH, Magid, DJ, McGuire, DK, Mohler, ER, Moy, CS, Muntner, P, Mussolino, ME, Nasir, K, Neumar, RW, Nichol, G, Palaniappan, L, Pandey, DK, Reeves, MJ, Rodriguez, CJ, Rosamond, W, Sorlie, PD, Stein, J, Towfighi, A, Turan, TN, Virani, SS, Woo, D, Yeh, RW, Turner, MB, Members, WG, Committee, AHAS, Subcommittee, SS: Heart Disease and Stroke Statistics-2016 Update: A Report From the American Heart Association. Circulation, 133: e38- 360, 2016.

5. Sundström, J, Arima, H, Jackson, R, Turnbull, F, Rahimi, K, Chalmers, J, Woodward, M, Neal, B, Collaboration, BPLTT: Effects of blood pressure reduction in mild hypertension: a systematic review and meta-analysis. Ann Intern Med, 162: 184- 191, 2015.

6. Sundström, J, Arima, H, Woodward, M, Jackson, R, Karmali, K, Lloyd-Jones, D, Baigent, C, Emberson, J, Rahimi, K, MacMahon, S, Patel, A, Perkovic, V, Turnbull, F, Neal, B, Collaboration, BPLTT: Blood pressure-lowering treatment based on cardiovascular risk: a meta-analysis of individual patient data. Lancet, 384: 591-598, 2014.

200

7. Egan, BM, Zhao, Y, Axon, RN: US trends in prevalence, awareness, treatment, and control of hypertension, 1988-2008. JAMA, 303: 2043-2050, 2010.

8. Calhoun, DA, Nishizaka, MK, Zaman, MA, Thakkar, RB, Weissmann, P: Hyperaldosteronism among black and white subjects with resistant hypertension. Hypertension, 40: 892-896, 2002.

9. Gaddam, KK, Nishizaka, MK, Pratt-Ubunama, MN, Pimenta, E, Aban, I, Oparil, S, Calhoun, DA: Characterization of resistant hypertension: association between resistant hypertension, aldosterone, and persistent intravascular volume expansion. Arch Intern Med, 168: 1159-1164, 2008.

10. Oliva, RV, Bakris, GL: Sympathetic activation in resistant hypertension: theory and therapy. Semin Nephrol, 34: 550-559, 2014.

11. Tsioufis, C, Kordalis, A, Flessas, D, Anastasopoulos, I, Tsiachris, D, Papademetriou, V, Stefanadis, C: Pathophysiology of resistant hypertension: the role of sympathetic nervous system. Int J Hypertens, 2011: 642416, 2011.

12. Modolo, R, de Faria, AP, Moreno, H: Resistant hypertension: a volemic or nervous matter? J Am Soc Hypertens, 9: 408-409, 2015.

13. Barbaro, NR, Fontana, V, Modolo, R, De Faria, AP, Sabbatini, AR, Fonseca, FH, Anhê, GF, Moreno, H: Increased arterial stiffness in resistant hypertension is associated with inflammatory biomarkers. Blood Press, 24: 7-13, 2015.

14. Salles, GF, Fiszman, R, Cardoso, CR, Muxfeldt, ES: Relation of left ventricular hypertrophy with systemic inflammation and endothelial damage in resistant hypertension. Hypertension, 50: 723-728, 2007.

15. Whaley-Connell, A, Johnson, MS, Sowers, JR: Aldosterone: role in the cardiometabolic syndrome and resistant hypertension. Prog Cardiovasc Dis, 52: 401-409, 2010.

16. Weinshilboum, R, Wang, L: Pharmacogenomics: bench to bedside. Nat Rev Drug Discov, 3: 739-748, 2004.

17. Weinshilboum, R: Inheritance and drug response. N Engl J Med, 348: 529-537, 2003.

18. Hiltunen, TP, Donner, KM, Sarin, AP, Saarela, J, Ripatti, S, Chapman, AB, Gums, JG, Gong, Y, Cooper-DeHoff, RM, Frau, F, Glorioso, V, Zaninello, R, Salvi, E, Glorioso, N, Boerwinkle, E, Turner, ST, Johnson, JA, Kontula, KK: Pharmacogenomics of hypertension: a genome‐wide, placebo‐controlled cross‐ over study, using four classes of antihypertensive drugs. J Am Heart Assoc, 4: e001521, 2015.

201

19. Duarte, JD, Turner, ST, Tran, B, Chapman, AB, Bailey, KR, Gong, Y, Gums, JG, Langaee, TY, Beitelshees, AL, Cooper-Dehoff, RM, Boerwinkle, E, Johnson, JA: Association of chromosome 12 locus with antihypertensive response to hydrochlorothiazide may involve differential YEATS4 expression. Pharmacogenomics J, 13: 257-263, 2013.

20. Turner, ST, Bailey, KR, Schwartz, GL, Chapman, AB, Chai, HS, Boerwinkle, E: Genomic association analysis identifies multiple loci influencing antihypertensive response to an angiotensin II receptor blocker. Hypertension, 59: 1204-1211, 2012.

21. Gong, Y, McDonough, CW, Wang, Z, Hou, W, Cooper-DeHoff, RM, Langaee, TY, Beitelshees, AL, Chapman, AB, Gums, JG, Bailey, KR, Boerwinkle, E, Turner, ST, Johnson, JA: Hypertension susceptibility loci and blood pressure response to antihypertensives: results from the pharmacogenomic evaluation of antihypertensive responses study. Circ Cardiovasc Genet, 5: 686-691, 2012.

22. Turner, ST, Boerwinkle, E, O'Connell, JR, Bailey, KR, Gong, Y, Chapman, AB, McDonough, CW, Beitelshees, AL, Schwartz, GL, Gums, JG, Padmanabhan, S, Hiltunen, TP, Citterio, L, Donner, KM, Hedner, T, Lanzani, C, Melander, O, Saarela, J, Ripatti, S, Wahlstrand, B, Manunta, P, Kontula, K, Dominiczak, AF, Cooper-DeHoff, RM, Johnson, JA: Genomic association analysis of common variants influencing antihypertensive response to hydrochlorothiazide. Hypertension, 62: 391-397, 2013.

23. Studer, RA, Person, E, Robinson-Rechavi, M, Rossier, BC: Evolution of the epithelial sodium channel and the sodium pump as limiting factors of aldosterone action on sodium transport. Physiol Genomics, 43: 844-854, 2011.

24. Bubien, JK: Epithelial Na+ channel (ENaC), hormones, and hypertension. J Biol Chem, 285: 23527-23531, 2010.

25. Sowers, JR, Whaley-Connell, A, Epstein, M: Narrative review: the emerging clinical implications of the role of aldosterone in the metabolic syndrome and resistant hypertension. Ann Intern Med, 150: 776-783, 2009.

26. Hummler, E: Epithelial sodium channel, salt intake, and hypertension. Curr Hypertens Rep, 5: 11-18, 2003.

27. Garty, H, Palmer, LG: Epithelial sodium channels: function, structure, and regulation. Physiol Rev, 77: 359-396, 1997.

202

28. Rayner, BL, Owen, EP, King, JA, Soule, SG, Vreede, H, Opie, LH, Marais, D, Davidson, JS: A new mutation, R563Q, of the beta subunit of the epithelial sodium channel associated with low-renin, low-aldosterone hypertension. J Hypertens, 21: 921-926, 2003.

29. Eide, IK, Torjesen, PA, Drolsum, A, Babovic, A, Lilledahl, NP: Low-renin status in therapy-resistant hypertension: a clue to efficient treatment. J Hypertens, 22: 2217-2226, 2004.

30. Spence, JD: Physiologic tailoring of treatment in resistant hypertension. Curr Cardiol Rev, 6: 119-123, 2010.

31. Ori, Y, Chagnac, A, Korzets, A, Zingerman, B, Herman-Edelstein, M, Bergman, M, Gafter, U, Salman, H: Regression of left ventricular hypertrophy in patients with primary aldosteronism/low-renin hypertension on low-dose spironolactone. Nephrol Dial Transplant, 28: 1787-1793, 2013.

32. Nishizaka, MK, Zaman, MA, Calhoun, DA: Efficacy of low-dose spironolactone in subjects with resistant hypertension. Am J Hypertens, 16: 925-930, 2003.

33. Oxlund, CS, Henriksen, JE, Tarnow, L, Schousboe, K, Gram, J, Jacobsen, IA: Low dose spironolactone reduces blood pressure in patients with resistant hypertension and type 2 diabetes mellitus: a double blind randomized clinical trial. J Hypertens, 31: 2094-2102, 2013.

34. Spence, JD: Lessons from Africa: the importance of measuring plasma renin and aldosterone in resistant hypertension. Can J Cardiol, 28: 254-257, 2012.

35. Jones, ES, Owen, EP, Rayner, BL: The association of the R563Q genotype of the ENaC with phenotypic variation in Southern Africa. Am J Hypertens, 25: 1286- 1291, 2012.

36. Strushkevich, N, Gilep, AA, Shen, L, Arrowsmith, CH, Edwards, AM, Usanov, SA, Park, HW: Structural insights into aldosterone synthase substrate specificity and targeted inhibition. Mol Endocrinol, 27: 315-324, 2013.

37. Alvarez-Madrazo, S, Mackenzie, SM, Davies, E, Fraser, R, Lee, WK, Brown, M, Caulfield, MJ, Dominiczak, AF, Farrall, M, Lathrop, M, Hedner, T, Melander, O, Munroe, PB, Samani, N, Stewart, PM, Wahlstrand, B, Webster, J, Palmer, CN, Padmanabhan, S, Connell, JM: Common polymorphisms in the CYP11B1 and CYP11B2 genes: evidence for a digenic influence on hypertension. Hypertension, 61: 232-239, 2013.

203

38. Fontana, V, de Faria, AP, Barbaro, NR, Sabbatini, AR, Modolo, R, Lacchini, R, Moreno, H: Modulation of aldosterone levels by -344 C/T CYP11B2 polymorphism and spironolactone use in resistant hypertension. J Am Soc Hypertens, 8: 146-151, 2014.

39. Ubaid-Girioli, S, Adriana de Souza, L, Yugar-Toledo, JC, Martins, LC, Ferreira-Melo, S, Coelho, OR, Sierra, C, Coca, A, Pimenta, E, Moreno, H: Aldosterone excess or escape: Treating resistant hypertension. J Clin Hypertens (Greenwich), 11: 245-252, 2009.

40. Zordoky, BN, El-Kadi, AO: Effect of cytochrome P450 polymorphism on arachidonic acid metabolism and their impact on cardiovascular diseases. Pharmacol Ther, 125: 446-463, 2010.

41. Zhang, R, Lu, J, Hu, C, Wang, C, Yu, W, Ma, X, Bao, Y, Xiang, K, Guan, Y, Jia, W: A common polymorphism of CYP4A11 is associated with blood pressure in a Chinese population. Hypertens Res, 34: 645-648, 2011.

42. Gainer, JV, Lipkowitz, MS, Yu, C, Waterman, MR, Dawson, EP, Capdevila, JH, Brown, NJ, Group, AS: Association of a CYP4A11 variant and blood pressure in black men. J Am Soc Nephrol, 19: 1606-1612, 2008.

43. Williams, JS, Hopkins, PN, Jeunemaitre, X, Brown, NJ: CYP4A11 T8590C polymorphism, salt-sensitive hypertension, and renal blood flow. J Hypertens, 29: 1913-1918, 2011.

44. Laffer, CL, Elijovich, F, Eckert, GJ, Tu, W, Pratt, JH, Brown, NJ: Genetic variation in CYP4A11 and blood pressure response to mineralocorticoid receptor antagonism or ENaC inhibition: an exploratory pilot study in African Americans. J Am Soc Hypertens, 8: 475-480, 2014.

45. Vongpatanasin, W: Resistant hypertension: a review of diagnosis and management. JAMA, 311: 2216-2224, 2014.

46. Pimenta, E, Gaddam, KK, Oparil, S, Aban, I, Husain, S, Dell'Italia, LJ, Calhoun, DA: Effects of dietary sodium reduction on blood pressure in subjects with resistant hypertension: results from a randomized trial. Hypertension, 54: 475-481, 2009.

47. Kumar, N, Calhoun, DA, Dudenbostel, T: Management of patients with resistant hypertension: current treatment options. Integr Blood Press Control, 6: 139-151, 2013.

48. Hall, JE, Granger, JP, do Carmo, JM, da Silva, AA, Dubinion, J, George, E, Hamza, S, Speed, J, Hall, ME: Hypertension: physiology and pathophysiology. Compr Physiol, 2: 2393-2442, 2012.

204

49. Armando, I, Villar, VA, Jose, PA: Genomics and Pharmacogenomics of Salt- sensitive Hypertension. Curr Hypertens Rev, 11: 49-56, 2015.

50. Cusi, D, Barlassina, C, Azzani, T, Casari, G, Citterio, L, Devoto, M, Glorioso, N, Lanzani, C, Manunta, P, Righetti, M, Rivera, R, Stella, P, Troffa, C, Zagato, L, Bianchi, G: Polymorphisms of alpha-adducin and salt sensitivity in patients with essential hypertension. Lancet, 349: 1353-1357, 1997.

51. Matayoshi, T, Kamide, K, Takiuchi, S, Yoshii, M, Miwa, Y, Takami, Y, Tanaka, C, Banno, M, Horio, T, Nakamura, S, Nakahama, H, Yoshihara, F, Inenaga, T, Miyata, T, Kawano, Y: The thiazide-sensitive Na(+)-Cl(-) cotransporter gene, C1784T, and adrenergic receptor-beta3 gene, T727C, may be gene polymorphisms susceptible to the antihypertensive effect of thiazide diuretics. Hypertens Res, 27: 821-833, 2004.

52. Turner, ST, Chapman, AB, Schwartz, GL, Boerwinkle, E: Effects of endothelial nitric oxide synthase, alpha-adducin, and other candidate gene polymorphisms on blood pressure response to hydrochlorothiazide. Am J Hypertens, 16: 834-839, 2003.

53. Schelleman, H, Klungel, OH, Witteman, JC, Hofman, A, van Duijn, CM, de Boer, A, Stricker, BH: The influence of the alpha-adducin G460W polymorphism and angiotensinogen M235T polymorphism on antihypertensive medication and blood pressure. Eur J Hum Genet, 14: 860-866, 2006.

54. Schunkert, H, Hense, HW, Döring, A, Riegger, GA, Siffert, W: Association between a polymorphism in the G protein beta3 subunit gene and lower renin and elevated diastolic blood pressure levels. Hypertension, 32: 510-513, 1998.

55. Turner, ST, Schwartz, GL, Chapman, AB, Boerwinkle, E: C825T polymorphism of the G protein beta(3)-subunit and antihypertensive response to a thiazide diuretic. Hypertension, 37: 739-743, 2001.

56. Schelleman, H, Stricker, BH, Verschuren, WM, de Boer, A, Kroon, AA, de Leeuw, PW, Kromhout, D, Klungel, OH: Interactions between five candidate genes and antihypertensive drug therapy on blood pressure. Pharmacogenomics J, 6: 22- 26, 2006.

57. Dahlberg, J, Nilsson, LO, von Wowern, F, Melander, O: Polymorphism in NEDD4L is associated with increased salt sensitivity, reduced levels of P-renin and increased levels of Nt-proANP. PLoS One, 2: e432, 2007.

58. Russo, CJ, Melista, E, Cui, J, DeStefano, AL, Bakris, GL, Manolis, AJ, Gavras, H, Baldwin, CT: Association of NEDD4L ubiquitin ligase with essential hypertension. Hypertension, 46: 488-491, 2005.

205

59. Svensson-Färbom, P, Wahlstrand, B, Almgren, P, Dahlberg, J, Fava, C, Kjeldsen, S, Hedner, T, Melander, O: A functional variant of the NEDD4L gene is associated with beneficial treatment response with β-blockers and diuretics in hypertensive patients. J Hypertens, 29: 388-395, 2011.

60. McDonough, CW, Burbage, SE, Duarte, JD, Gong, Y, Langaee, TY, Turner, ST, Gums, JG, Chapman, AB, Bailey, KR, Beitelshees, AL, Boerwinkle, E, Pepine, CJ, Cooper-DeHoff, RM, Johnson, JA: Association of variants in NEDD4L with blood pressure response and adverse cardiovascular outcomes in hypertensive patients treated with thiazide diuretics. J Hypertens, 31: 698-704, 2013.

61. Yugar-Toledo, JC, Martin, JF, Krieger, JE, Pereira, AC, Demacq, C, Coelho, OR, Pimenta, E, Calhoun, DA, Júnior, HM: Gene variation in resistant hypertension: multilocus analysis of the angiotensin 1-converting enzyme, angiotensinogen, and endothelial nitric oxide synthase genes. DNA Cell Biol, 30: 555-564, 2011.

62. Giles, TD, Sander, GE, Nossaman, BD, Kadowitz, PJ: Impaired vasodilation in the pathogenesis of hypertension: focus on nitric oxide, endothelial-derived hyperpolarizing factors, and prostaglandins. J Clin Hypertens (Greenwich), 14: 198-205, 2012.

63. Sandrim, VC, de Syllos, RW, Lisboa, HR, Tres, GS, Tanus-Santos, JE: Influence of eNOS haplotypes on the plasma nitric oxide products concentrations in hypertensive and type 2 diabetes mellitus patients. Nitric Oxide, 16: 348-355, 2007.

64. Wilcox, JN, Subramanian, RR, Sundell, CL, Tracey, WR, Pollock, JS, Harrison, DG, Marsden, PA: Expression of multiple isoforms of nitric oxide synthase in normal and atherosclerotic vessels. Arterioscler Thromb Vasc Biol, 17: 2479-2488, 1997.

65. Ballinger, SW, Patterson, C, Yan, CN, Doan, R, Burow, DL, Young, CG, Yakes, FM, Van Houten, B, Ballinger, CA, Freeman, BA, Runge, MS: Hydrogen peroxide- and peroxynitrite-induced mitochondrial DNA damage and dysfunction in vascular endothelial and smooth muscle cells. Circ Res, 86: 960-966, 2000.

66. Oliveira-Paula, GH, Lacchini, R, Coeli-Lacchini, FB, Junior, HM, Tanus-Santos, JE: Inducible nitric oxide synthase haplotype associated with hypertension and responsiveness to antihypertensive drug therapy. Gene, 515: 391-395, 2013.

67. Wang, SS, Davis, S, Cerhan, JR, Hartge, P, Severson, RK, Cozen, W, Lan, Q, Welch, R, Chanock, SJ, Rothman, N: Polymorphisms in oxidative stress genes and risk for non-Hodgkin lymphoma. Carcinogenesis, 27: 1828-1834, 2006.

206

68. Fu, L, Zhao, Y, Lu, J, Shi, J, Li, C, Liu, H, Li, Y: Functional single nucleotide polymorphism-1026C/A of inducible nitric oxide synthase gene with increased YY1-binding affinity is associated with hypertension in a Chinese Han population. J Hypertens, 27: 991-1000, 2009.

69. Kaise, M, Miwa, J, Suzuki, N, Mishiro, S, Ohta, Y, Yamasaki, T, Tajiri, H: Inducible nitric oxide synthase gene promoter polymorphism is associated with increased gastric mRNA expression of inducible nitric oxide synthase and increased risk of gastric carcinoma. Eur J Gastroenterol Hepatol, 19: 139-145, 2007.

70. Li, W, Liu, H, Fu, L, Li, D, Zhao, Y: Identification of Yin Yang 1-interacting partners at -1026C/A in the human iNOS promoter. Arch Biochem Biophys, 498: 119-126, 2010.

71. Brown, KE, Dhaun, N, Goddard, J, Webb, DJ: Potential therapeutic role of phosphodiesterase type 5 inhibition in hypertension and chronic kidney disease. Hypertension, 63: 5-11, 2014.

72. Oliver, JJ, Melville, VP, Webb, DJ: Effect of regular phosphodiesterase type 5 inhibition in hypertension. Hypertension, 48: 622-627, 2006.

73. Oliver, JJ, Hughes, VE, Dear, JW, Webb, DJ: Clinical potential of combined organic nitrate and phosphodiesterase type 5 inhibitor in treatment-resistant hypertension. Hypertension, 56: 62-67, 2010.

74. Miyamoto, Y, Saito, Y, Nakayama, M, Shimasaki, Y, Yoshimura, T, Yoshimura, M, Harada, M, Kajiyama, N, Kishimoto, I, Kuwahara, K, Hino, J, Ogawa, E, Hamanaka, I, Kamitani, S, Takahashi, N, Kawakami, R, Kangawa, K, Yasue, H, Nakao, K: Replication protein A1 reduces transcription of the endothelial nitric oxide synthase gene containing a -786T-->C mutation associated with coronary spastic angina. Hum Mol Genet, 9: 2629-2637, 2000.

75. Metzger, IF, Sertório, JT, Tanus-Santos, JE: Modulation of nitric oxide formation by endothelial nitric oxide synthase gene haplotypes. Free Radic Biol Med, 43: 987- 992, 2007.

76. Metzger, IF, Ishizawa, MH, Rios-Santos, F, Carvalho, WA, Tanus-Santos, JE: Endothelial nitric oxide synthase gene haplotypes affect nitrite levels in black subjects. Pharmacogenomics J, 11: 393-399, 2011.

77. Metzger, IF, Souza-Costa, DC, Marroni, AS, Nagassaki, S, Desta, Z, Flockhart, DA, Tanus-Santos, JE: Endothelial nitric oxide synthase gene haplotypes associated with circulating concentrations of nitric oxide products in healthy men. Pharmacogenet Genomics, 15: 565-570, 2005.

207

78. Souza-Costa, DC, Belo, VA, Silva, PS, Sertorio, JT, Metzger, IF, Lanna, CM, Machado, MA, Tanus-Santos, JE: eNOS haplotype associated with hypertension in obese children and adolescents. Int J Obes (Lond), 35: 387-392, 2011.

79. Quinaglia, T, de Faria, AP, Fontana, V, Barbaro, NR, Sabbatini, AR, Sertório, JT, Demacq, C, Tanus-Santos, JE, Moreno, H: Acute cardiac and hemodynamic effects of sildenafil on resistant hypertension. Eur J Clin Pharmacol, 69: 2027- 2036, 2013.

80. Lynch, AI, Irvin, MR, Davis, BR, Ford, CE, Eckfeldt, JH, Arnett, DK: Genetic and Adverse Health Outcome Associations with Treatment Resistant Hypertension in GenHAT. Int J Hypertens, 2013: 578578, 2013.

81. Arnett, DK, Boerwinkle, E, Davis, BR, Eckfeldt, J, Ford, CE, Black, H: Pharmacogenetic approaches to hypertension therapy: design and rationale for the Genetics of Hypertension Associated Treatment (GenHAT) study. Pharmacogenomics J, 2: 309-317, 2002.

82. Davis, BR, Cutler, JA, Gordon, DJ, Furberg, CD, Wright, JT, Cushman, WC, Grimm, RH, LaRosa, J, Whelton, PK, Perry, HM, Alderman, MH, Ford, CE, Oparil, S, Francis, C, Proschan, M, Pressel, S, Black, HR, Hawkins, CM: Rationale and design for the Antihypertensive and Lipid Lowering Treatment to Prevent Heart Attack Trial (ALLHAT). ALLHAT Research Group. Am J Hypertens, 9: 342-360, 1996.

83. Barbaro, NR, Fontana, V, Moreno, H: Angiotensinogen Variants among Resistant Hypertensive Patients. Int J Hypertens, 2014: 424793, 2014.

84. Fontana, V, McDonough, CW, Gong, Y, El Rouby, NM, Sa, AC, Taylor, KD, Chen, YD, Gums, JG, Chapman, AB, Turner, ST, Pepine, CJ, Johnson, JA, Cooper- DeHoff, RM: Large-scale gene-centric analysis identifies polymorphisms for resistant hypertension. J Am Heart Assoc, 3: e001398, 2014.

85. Pepine, CJ, Handberg, EM, Cooper-DeHoff, RM, Marks, RG, Kowey, P, Messerli, FH, Mancia, G, Cangiano, JL, Garcia-Barreto, D, Keltai, M, Erdine, S, Bristol, HA, Kolb, HR, Bakris, GL, Cohen, JD, Parmley, WW, Investigators, I: A calcium antagonist vs a non-calcium antagonist hypertension treatment strategy for patients with coronary artery disease. The International Verapamil-Trandolapril Study (INVEST): a randomized controlled trial. JAMA, 290: 2805-2816, 2003.

86. Smith, SM, Gong, Y, Handberg, E, Messerli, FH, Bakris, GL, Ahmed, A, Bavry, AA, Pepine, CJ, Cooper-Dehoff, RM: Predictors and outcomes of resistant hypertension among patients with coronary artery disease and hypertension. J Hypertens, 32: 635-643, 2014.

208

87. Merz, CN, Kelsey, SF, Pepine, CJ, Reichek, N, Reis, SE, Rogers, WJ, Sharaf, BL, Sopko, G: The Women's Ischemia Syndrome Evaluation (WISE) study: protocol design, methodology and feasibility report. J Am Coll Cardiol, 33: 1453-1461, 1999.

88. Smith, SM, Huo, T, Delia Johnson, B, Bittner, V, Kelsey, SF, Vido Thompson, D, Noel Bairey Merz, C, Pepine, CJ, Cooper-Dehoff, RM: Cardiovascular and mortality risk of apparent resistant hypertension in women with suspected myocardial ischemia: a report from the NHLBI-sponsored WISE Study. J Am Heart Assoc, 3: e000660, 2014.

89. Levy, D, Ehret, GB, Rice, K, Verwoert, GC, Launer, LJ, Dehghan, A, Glazer, NL, Morrison, AC, Johnson, AD, Aspelund, T, Aulchenko, Y, Lumley, T, Köttgen, A, Vasan, RS, Rivadeneira, F, Eiriksdottir, G, Guo, X, Arking, DE, Mitchell, GF, Mattace-Raso, FU, Smith, AV, Taylor, K, Scharpf, RB, Hwang, SJ, Sijbrands, EJ, Bis, J, Harris, TB, Ganesh, SK, O'Donnell, CJ, Hofman, A, Rotter, JI, Coresh, J, Benjamin, EJ, Uitterlinden, AG, Heiss, G, Fox, CS, Witteman, JC, Boerwinkle, E, Wang, TJ, Gudnason, V, Larson, MG, Chakravarti, A, Psaty, BM, van Duijn, CM: Genome-wide association study of blood pressure and hypertension. Nat Genet, 41: 677-687, 2009.

90. Ganesh, SK, Tragante, V, Guo, W, Guo, Y, Lanktree, MB, Smith, EN, Johnson, T, Castillo, BA, Barnard, J, Baumert, J, Chang, YP, Elbers, CC, Farrall, M, Fischer, ME, Franceschini, N, Gaunt, TR, Gho, JM, Gieger, C, Gong, Y, Isaacs, A, Kleber, ME, Mateo Leach, I, McDonough, CW, Meijs, MF, Mellander, O, Molony, CM, Nolte, IM, Padmanabhan, S, Price, TS, Rajagopalan, R, Shaffer, J, Shah, S, Shen, H, Soranzo, N, van der Most, PJ, Van Iperen, EP, Van Setten, J, Van Setten, JA, Vonk, JM, Zhang, L, Beitelshees, AL, Berenson, GS, Bhatt, DL, Boer, JM, Boerwinkle, E, Burkley, B, Burt, A, Chakravarti, A, Chen, W, Cooper-Dehoff, RM, Curtis, SP, Dreisbach, A, Duggan, D, Ehret, GB, Fabsitz, RR, Fornage, M, Fox, E, Furlong, CE, Gansevoort, RT, Hofker, MH, Hovingh, GK, Kirkland, SA, Kottke-Marchant, K, Kutlar, A, Lacroix, AZ, Langaee, TY, Li, YR, Lin, H, Liu, K, Maiwald, S, Malik, R, Murugesan, G, Newton-Cheh, C, O'Connell, JR, Onland- Moret, NC, Ouwehand, WH, Palmas, W, Penninx, BW, Pepine, CJ, Pettinger, M, Polak, JF, Ramachandran, VS, Ranchalis, J, Redline, S, Ridker, PM, Rose, LM, Scharnag, H, Schork, NJ, Shimbo, D, Shuldiner, AR, Srinivasan, SR, Stolk, RP, Taylor, HA, Thorand, B, Trip, MD, van Duijn, CM, Verschuren, WM, Wijmenga, C, Winkelmann, BR, Wyatt, S, Young, JH, Boehm, BO, Caulfield, MJ, Chasman, DI, Davidson, KW, Doevendans, PA, Fitzgerald, GA, Gums, JG, Hakonarson, H, Hillege, HL, Illig, T, Jarvik, GP, Johnson, JA, Kastelein, JJ, Koenig, W, März, W, Mitchell, BD, Murray, SS, Oldehinkel, AJ, Rader, DJ, Reilly, MP, Reiner, AP, Schadt, EE, Silverstein, RL, Snieder, H, Stanton, AV, Uitterlinden, AG, van der Harst, P, van der Schouw, YT, Samani, NJ, Johnson, AD, Munroe, PB, de Bakker, PI, Zhu, X, Levy, D, Keating, BJ, Asselbergs, FW, CARDIOGRAM, META, Study, LC: Loci influencing blood pressure identified using a cardiovascular gene-centric array. Hum Mol Genet, 22: 1663-1678, 2013.

209

91. Altshuler, DM, Gibbs, RA, Peltonen, L, Dermitzakis, E, Schaffner, SF, Yu, F, Bonnen, PE, de Bakker, PI, Deloukas, P, Gabriel, SB, Gwilliam, R, Hunt, S, Inouye, M, Jia, X, Palotie, A, Parkin, M, Whittaker, P, Chang, K, Hawes, A, Lewis, LR, Ren, Y, Wheeler, D, Muzny, DM, Barnes, C, Darvishi, K, Hurles, M, Korn, JM, Kristiansson, K, Lee, C, McCarrol, SA, Nemesh, J, Keinan, A, Montgomery, SB, Pollack, S, Price, AL, Soranzo, N, Gonzaga-Jauregui, C, Anttila, V, Brodeur, W, Daly, MJ, Leslie, S, McVean, G, Moutsianas, L, Nguyen, H, Zhang, Q, Ghori, MJ, McGinnis, R, McLaren, W, Takeuchi, F, Grossman, SR, Shlyakhter, I, Hostetter, EB, Sabeti, PC, Adebamowo, CA, Foster, MW, Gordon, DR, Licinio, J, Manca, MC, Marshall, PA, Matsuda, I, Ngare, D, Wang, VO, Reddy, D, Rotimi, CN, Royal, CD, Sharp, RR, Zeng, C, Brooks, LD, McEwen, JE, Consortium, IH: Integrating common and rare genetic variation in diverse human populations. Nature, 467: 52-58, 2010.

210

92. Frazer, KA, Ballinger, DG, Cox, DR, Hinds, DA, Stuve, LL, Gibbs, RA, Belmont, JW, Boudreau, A, Hardenbol, P, Leal, SM, Pasternak, S, Wheeler, DA, Willis, TD, Yu, F, Yang, H, Zeng, C, Gao, Y, Hu, H, Hu, W, Li, C, Lin, W, Liu, S, Pan, H, Tang, X, Wang, J, Wang, W, Yu, J, Zhang, B, Zhang, Q, Zhao, H, Zhou, J, Gabriel, SB, Barry, R, Blumenstiel, B, Camargo, A, Defelice, M, Faggart, M, Goyette, M, Gupta, S, Moore, J, Nguyen, H, Onofrio, RC, Parkin, M, Roy, J, Stahl, E, Winchester, E, Ziaugra, L, Altshuler, D, Shen, Y, Yao, Z, Huang, W, Chu, X, He, Y, Jin, L, Liu, Y, Sun, W, Wang, H, Wang, Y, Xiong, X, Xu, L, Waye, MM, Tsui, SK, Xue, H, Wong, JT, Galver, LM, Fan, JB, Gunderson, K, Murray, SS, Oliphant, AR, Chee, MS, Montpetit, A, Chagnon, F, Ferretti, V, Leboeuf, M, Olivier, JF, Phillips, MS, Roumy, S, Sallée, C, Verner, A, Hudson, TJ, Kwok, PY, Cai, D, Koboldt, DC, Miller, RD, Pawlikowska, L, Taillon-Miller, P, Xiao, M, Tsui, LC, Mak, W, Song, YQ, Tam, PK, Nakamura, Y, Kawaguchi, T, Kitamoto, T, Morizono, T, Nagashima, A, Ohnishi, Y, Sekine, A, Tanaka, T, Tsunoda, T, Deloukas, P, Bird, CP, Delgado, M, Dermitzakis, ET, Gwilliam, R, Hunt, S, Morrison, J, Powell, D, Stranger, BE, Whittaker, P, Bentley, DR, Daly, MJ, de Bakker, PI, Barrett, J, Chretien, YR, Maller, J, McCarroll, S, Patterson, N, Pe'er, I, Price, A, Purcell, S, Richter, DJ, Sabeti, P, Saxena, R, Schaffner, SF, Sham, PC, Varilly, P, Stein, LD, Krishnan, L, Smith, AV, Tello-Ruiz, MK, Thorisson, GA, Chakravarti, A, Chen, PE, Cutler, DJ, Kashuk, CS, Lin, S, Abecasis, GR, Guan, W, Li, Y, Munro, HM, Qin, ZS, Thomas, DJ, McVean, G, Auton, A, Bottolo, L, Cardin, N, Eyheramendy, S, Freeman, C, Marchini, J, Myers, S, Spencer, C, Stephens, M, Donnelly, P, Cardon, LR, Clarke, G, Evans, DM, Morris, AP, Weir, BS, Mullikin, JC, Sherry, ST, Feolo, M, Skol, A, Zhang, H, Matsuda, I, Fukushima, Y, Macer, DR, Suda, E, Rotimi, CN, Adebamowo, CA, Ajayi, I, Aniagwu, T, Marshall, PA, Nkwodimmah, C, Royal, CD, Leppert, MF, 
Dixon, M, Peiffer, A, Qiu, R, Kent, A, Kato, K, Niikawa, N, Adewole, IF, Knoppers, BM, Foster, MW, Clayton, EW, Watkin, J, Muzny, D, Nazareth, L, Sodergren, E, Weinstock, GM, Yakub, I, Birren, BW, Wilson, RK, Fulton, LL, Rogers, J, Burton, J, Carter, NP, Clee, CM, Griffiths, M, Jones, MC, McLay, K, Plumb, RW, Ross, MT, Sims, SK, Willey, DL, Chen, Z, Han, H, Kang, L, Godbout, M, Wallenburg, JC, L'Archevêque, P, Bellemare, G, Saeki, K, An, D, Fu, H, Li, Q, Wang, Z, Wang, R, Holden, AL, Brooks, LD, McEwen, JE, Guyer, MS, Wang, VO, Peterson, JL, Shi, M, Spiegel, J, Sung, LM, Zacharia, LF, Collins, FS, Kennedy, K, Jamieson, R, Stewart, J, Consortium, IH: A second generation human haplotype map of over 3.1 million SNPs. Nature, 449: 851-861, 2007.

93. Abecasis, GR, Auton, A, Brooks, LD, DePristo, MA, Durbin, RM, Handsaker, RE, Kang, HM, Marth, GT, McVean, GA, Consortium, GP: An integrated map of genetic variation from 1,092 human genomes. Nature, 491: 56-65, 2012.

94. Auton, A, Brooks, LD, Durbin, RM, Garrison, EP, Kang, HM, Korbel, JO, Marchini, JL, McCarthy, S, McVean, GA, Abecasis, GR, Consortium, GP: A global reference for human genetic variation. Nature, 526: 68-74, 2015.

211

95. Walter, K, Min, JL, Huang, J, Crooks, L, Memari, Y, McCarthy, S, Perry, JR, Xu, C, Futema, M, Lawson, D, Iotchkova, V, Schiffels, S, Hendricks, AE, Danecek, P, Li, R, Floyd, J, Wain, LV, Barroso, I, Humphries, SE, Hurles, ME, Zeggini, E, Barrett, JC, Plagnol, V, Richards, JB, Greenwood, CM, Timpson, NJ, Durbin, R, Soranzo, N, Consortium, UK: The UK10K project identifies rare variants in health and disease. Nature, 526: 82-90, 2015.

96. Krieger, EM, Drager, LF, Giorgi, DM, Krieger, JE, Pereira, AC, Barreto-Filho, JA, da Rocha Nogueira, A, Mill, JG, Investigators, R: Resistant hypertension optimal treatment trial: a randomized controlled trial. Clin Cardiol, 37: 1-6, 2014.

97. Benavente, OR, White, CL, Pearce, L, Pergola, P, Roldan, A, Benavente, MF, Coffey, C, McClure, LA, Szychowski, JM, Conwit, R, Heberling, PA, Howard, G, Bazan, C, Vidal-Pergola, G, Talbert, R, Hart, RG, Investigators, SPS: The Secondary Prevention of Small Subcortical Strokes (SPS3) study. Int J Stroke, 6: 164-175, 2011.

98. Group, SPSS, Benavente, OR, Coffey, CS, Conwit, R, Hart, RG, McClure, LA, Pearce, LA, Pergola, PE, Szychowski, JM: Blood-pressure targets in patients with recent lacunar stroke: the SPS3 randomised trial. Lancet, 382: 507-515, 2013.

99. Sever, PS, Dahlöf, B, Poulter, NR, Wedel, H, Beevers, G, Caulfield, M, Collins, R, Kjeldsen, SE, McInnes, GT, Mehlsen, J, Nieminen, M, O'Brien, E, Ostergren, J: Rationale, design, methods and baseline demography of participants of the Anglo-Scandinavian Cardiac Outcomes Trial. ASCOT investigators. J Hypertens, 19: 1139-1147, 2001.

100. Hankowski, KE, Hamazaki, T, Umezawa, A, Terada, N: Induced pluripotent stem cells as a next-generation biomedical interface. Lab Invest, 91: 972-977, 2011.

101. Zhu, H, Lensch, MW, Cahan, P, Daley, GQ: Investigating monogenic and complex diseases with pluripotent stem cells. Nat Rev Genet, 12: 266-275, 2011.

102. Hindorff, LA, Sethupathy, P, Junkins, HA, Ramos, EM, Mehta, JP, Collins, FS, Manolio, TA: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A, 106: 9362-9367, 2009.

103. McCarthy, MI, Hirschhorn, JN: Genome-wide association studies: past, present and future. Hum Mol Genet, 17: R100-101, 2008.

104. Gong, Y, Wang, Z, Beitelshees, AL, McDonough, CW, Langaee, TY, Hall, K, Schmidt, SO, Curry, RW, Gums, JG, Bailey, KR, Boerwinkle, E, Chapman, AB, Turner, ST, Cooper-DeHoff, RM, Johnson, JA: Pharmacogenomic Genome-Wide Meta-Analysis of Blood Pressure Response to β-Blockers in Hypertensive African Americans. Hypertension, 67: 556-563, 2016.

105. Kitsios, GD, Zintzaras, E: Genome-wide association studies: hypothesis-"free" or "engaged"? Transl Res, 154: 161-164, 2009.

106. Pearson, TA, Manolio, TA: How to interpret a genome-wide association study. JAMA, 299: 1335-1344, 2008.

107. Conrad, DF, Jakobsson, M, Coop, G, Wen, X, Wall, JD, Rosenberg, NA, Pritchard, JK: A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet, 38: 1251-1260, 2006.

108. Marchini, J, Howie, B: Genotype imputation for genome-wide association studies. Nat Rev Genet, 11: 499-511, 2010.

109. Browning, SR: Missing data imputation and haplotype phase inference for genome-wide association studies. Hum Genet, 124: 439-450, 2008.

110. Browning, BL, Browning, SR: A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet, 84: 210-223, 2009.

111. Howie, B, Marchini, J, Stephens, M: Genotype imputation with thousands of genomes. G3 (Bethesda), 1: 457-470, 2011.

112. Howie, B, Fuchsberger, C, Stephens, M, Marchini, J, Abecasis, GR: Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet, 44: 955-959, 2012.

113. Li, Y, Willer, CJ, Ding, J, Scheet, P, Abecasis, GR: MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol, 34: 816-834, 2010.

114. Abecasis, GR, Altshuler, D, Auton, A, Brooks, LD, Durbin, RM, Gibbs, RA, Hurles, ME, McVean, GA, Consortium, GP: A map of human genome variation from population-scale sequencing. Nature, 467: 1061-1073, 2010.

115. Zeggini, E, Ioannidis, JP: Meta-analysis in genome-wide association studies. Pharmacogenomics, 10: 191-201, 2009.

116. de Bakker, PI, Ferreira, MA, Jia, X, Neale, BM, Raychaudhuri, S, Voight, BF: Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet, 17: R122-128, 2008.

117. Chen, F, Chen, GK, Millikan, RC, John, EM, Ambrosone, CB, Bernstein, L, Zheng, W, Hu, JJ, Ziegler, RG, Deming, SL, Bandera, EV, Nyante, S, Palmer, JR, Rebbeck, TR, Ingles, SA, Press, MF, Rodriguez-Gil, JL, Chanock, SJ, Le Marchand, L, Kolonel, LN, Henderson, BE, Stram, DO, Haiman, CA: Fine-mapping of breast cancer susceptibility loci characterizes genetic risk in African Americans. Hum Mol Genet, 20: 4491-4503, 2011.

118. Sanna, S, Li, B, Mulas, A, Sidore, C, Kang, HM, Jackson, AU, Piras, MG, Usala, G, Maninchedda, G, Sassu, A, Serra, F, Palmas, MA, Wood, WH, Njølstad, I, Laakso, M, Hveem, K, Tuomilehto, J, Lakka, TA, Rauramaa, R, Boehnke, M, Cucca, F, Uda, M, Schlessinger, D, Nagaraja, R, Abecasis, GR: Fine mapping of five loci associated with low-density lipoprotein cholesterol detects variants that double the explained heritability. PLoS Genet, 7: e1002198, 2011.

119. Holm, H, Gudbjartsson, DF, Sulem, P, Masson, G, Helgadottir, HT, Zanon, C, Magnusson, OT, Helgason, A, Saemundsdottir, J, Gylfason, A, Stefansdottir, H, Gretarsdottir, S, Matthiasson, SE, Thorgeirsson, GM, Jonasdottir, A, Sigurdsson, A, Stefansson, H, Werge, T, Rafnar, T, Kiemeney, LA, Parvez, B, Muhammad, R, Roden, DM, Darbar, D, Thorleifsson, G, Walters, GB, Kong, A, Thorsteinsdottir, U, Arnar, DO, Stefansson, K: A rare variant in MYH6 is associated with high risk of sick sinus syndrome. Nat Genet, 43: 316-320, 2011.

120. Liu, Q, Cirulli, ET, Han, Y, Yao, S, Liu, S, Zhu, Q: Systematic assessment of imputation performance using the 1000 Genomes reference panels. Brief Bioinform, 16: 549-562, 2015.

121. Meschia, JF, Arnett, DK, Ay, H, Brown, RD, Benavente, OR, Cole, JW, de Bakker, PI, Dichgans, M, Doheny, KF, Fornage, M, Grewal, RP, Gwinn, K, Jern, C, Conde, JJ, Johnson, JA, Jood, K, Laurie, CC, Lee, JM, Lindgren, A, Markus, HS, McArdle, PF, McClure, LA, Mitchell, BD, Schmidt, R, Rexrode, KM, Rich, SS, Rosand, J, Rothwell, PM, Rundek, T, Sacco, RL, Sharma, P, Shuldiner, AR, Slowik, A, Wassertheil-Smoller, S, Sudlow, C, Thijs, VN, Woo, D, Worrall, BB, Wu, O, Kittner, SJ, Study, NS: Stroke Genetics Network (SiGN) study: design and rationale for a genome-wide association study of ischemic stroke subtypes. Stroke, 44: 2694-2702, 2013.

122. Winkler, TW, Day, FR, Croteau-Chonka, DC, Wood, AR, Locke, AE, Mägi, R, Ferreira, T, Fall, T, Graff, M, Justice, AE, Luan, J, Gustafsson, S, Randall, JC, Vedantam, S, Workalemahu, T, Kilpeläinen, TO, Scherag, A, Esko, T, Kutalik, Z, Heid, IM, Loos, RJ, Consortium, GIoATG: Quality control and conduct of genome-wide association meta-analyses. Nat Protoc, 9: 1192-1212, 2014.

123. Turner, S, Armstrong, LL, Bradford, Y, Carlson, CS, Crawford, DC, Crenshaw, AT, de Andrade, M, Doheny, KF, Haines, JL, Hayes, G, Jarvik, G, Jiang, L, Kullo, IJ, Li, R, Ling, H, Manolio, TA, Matsumoto, M, McCarty, CA, McDavid, AN, Mirel, DB, Paschall, JE, Pugh, EW, Rasmussen, LV, Wilke, RA, Zuvich, RL, Ritchie, MD: Quality control procedures for genome-wide association studies. Curr Protoc Hum Genet, Chapter 1: Unit1.19, 2011.

124. Purcell, S, Neale, B, Todd-Brown, K, Thomas, L, Ferreira, MA, Bender, D, Maller, J, Sklar, P, de Bakker, PI, Daly, MJ, Sham, PC: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet, 81: 559-575, 2007.

125. Price, AL, Patterson, NJ, Plenge, RM, Weinblatt, ME, Shadick, NA, Reich, D: Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet, 38: 904-909, 2006.

126. Das, S, Forer, L, Schönherr, S, Sidore, C, Locke, AE, Kwong, A, Vrieze, SI, Chew, EY, Levy, S, McGue, M, Schlessinger, D, Stambolian, D, Loh, PR, Iacono, WG, Swaroop, A, Scott, LJ, Cucca, F, Kronenberg, F, Boehnke, M, Abecasis, GR, Fuchsberger, C: Next-generation genotype imputation service and methods. Nat Genet, 48: 1284-1287, 2016.

127. Silverberg, MS, Cho, JH, Rioux, JD, McGovern, DP, Wu, J, Annese, V, Achkar, JP, Goyette, P, Scott, R, Xu, W, Barmada, MM, Klei, L, Daly, MJ, Abraham, C, Bayless, TM, Bossa, F, Griffiths, AM, Ippoliti, AF, Lahaie, RG, Latiano, A, Paré, P, Proctor, DD, Regueiro, MD, Steinhart, AH, Targan, SR, Schumm, LP, Kistner, EO, Lee, AT, Gregersen, PK, Rotter, JI, Brant, SR, Taylor, KD, Roeder, K, Duerr, RH: Ulcerative colitis-risk loci on chromosomes 1p36 and 12q15 found by genome-wide association study. Nat Genet, 41: 216-220, 2009.

128. Fisher, SA, Tremelling, M, Anderson, CA, Gwilliam, R, Bumpstead, S, Prescott, NJ, Nimmo, ER, Massey, D, Berzuini, C, Johnson, C, Barrett, JC, Cummings, FR, Drummond, H, Lees, CW, Onnie, CM, Hanson, CE, Blaszczyk, K, Inouye, M, Ewels, P, Ravindrarajah, R, Keniry, A, Hunt, S, Carter, M, Watkins, N, Ouwehand, W, Lewis, CM, Cardon, L, Lobo, A, Forbes, A, Sanderson, J, Jewell, DP, Mansfield, JC, Deloukas, P, Mathew, CG, Parkes, M, Satsangi, J, Consortium, WTCC: Genetic determinants of ulcerative colitis include the ECM1 locus and five loci implicated in Crohn's disease. Nat Genet, 40: 710-712, 2008.

129. Edwards, AW: G. H. Hardy (1908) and Hardy-Weinberg equilibrium. Genetics, 179: 1143-1150, 2008.

130. Wittke-Thompson, JK, Pluzhnikov, A, Cox, NJ: Rational inferences about departures from Hardy-Weinberg equilibrium. Am J Hum Genet, 76: 967-986, 2005.

131. Anderson, CA, Pettersson, FH, Clarke, GM, Cardon, LR, Morris, AP, Zondervan, KT: Data quality control in genetic case-control association studies. Nat Protoc, 5: 1564-1573, 2010.

132. Morris, AP, Zeggini, E: An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol, 34: 188-193, 2010.

133. Consortium, WTCC: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447: 661-678, 2007.

134. Freedman, ML, Reich, D, Penney, KL, McDonald, GJ, Mignault, AA, Patterson, N, Gabriel, SB, Topol, EJ, Smoller, JW, Pato, CN, Pato, MT, Petryshen, TL, Kolonel, LN, Lander, ES, Sklar, P, Henderson, B, Hirschhorn, JN, Altshuler, D: Assessing the impact of population stratification on genetic association studies. Nat Genet, 36: 388-393, 2004.

135. Marchini, J, Cardon, LR, Phillips, MS, Donnelly, P: The effects of human population structure on large genetic association studies. Nat Genet, 36: 512-517, 2004.

136. Cardon, LR, Palmer, LJ: Population stratification and spurious allelic association. Lancet, 361: 598-604, 2003.

137. Patterson, N, Price, AL, Reich, D: Population structure and eigenanalysis. PLoS Genet, 2: e190, 2006.

138. Wang, D, Sun, Y, Stang, P, Berlin, JA, Wilcox, MA, Li, Q: Comparison of methods for correcting population stratification in a genome-wide association study of rheumatoid arthritis: principal-component analysis versus multidimensional scaling. BMC Proc, 3 Suppl 7: S109, 2009.

139. Verma, SS, de Andrade, M, Tromp, G, Kuivaniemi, H, Pugh, E, Namjou-Khales, B, Mukherjee, S, Jarvik, GP, Kottyan, LC, Burt, A, Bradford, Y, Armstrong, GD, Derr, K, Crawford, DC, Haines, JL, Li, R, Crosslin, D, Ritchie, MD: Imputation and quality control steps for combining multiple genome-wide datasets. Front Genet, 5: 370, 2014.

140. van Leeuwen, EM, Kanterakis, A, Deelen, P, Kattenberg, MV, Slagboom, PE, de Bakker, PI, Wijmenga, C, Swertz, MA, Boomsma, DI, van Duijn, CM, Karssen, LC, Hottenga, JJ, Consortium, GotN: Population-specific genotype imputations using minimac or IMPUTE2. Nat Protoc, 10: 1285-1296, 2015.

141. Delaneau, O, Zagury, JF, Marchini, J: Improved whole-chromosome phasing for disease and population genetic studies. Nat Methods, 10: 5-6, 2013.

142. Delaneau, O, Marchini, J, Zagury, JF: A linear complexity phasing method for thousands of genomes. Nat Methods, 9: 179-181, 2011.

143. Browning, BL, Browning, SR: Genotype Imputation with Millions of Reference Samples. Am J Hum Genet, 98: 116-126, 2016.

144. Willer, CJ, Li, Y, Abecasis, GR: METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics, 26: 2190-2191, 2010.

145. Blacher, J, Levy, BI, Mourad, JJ, Safar, ME, Bakris, G: From epidemiological transition to modern cardiovascular epidemiology: hypertension in the 21st century. Lancet, 2016.

146. Calhoun, DA, Jones, D, Textor, S, Goff, DC, Murphy, TP, Toto, RD, White, A, Cushman, WC, White, W, Sica, D, Ferdinand, K, Giles, TD, Falkner, B, Carey, RM, American Heart Association Professional Education, C: Resistant hypertension: diagnosis, evaluation, and treatment: a scientific statement from the American Heart Association Professional Education Committee of the Council for High Blood Pressure Research. Circulation, 117: e510-526, 2008.

147. Muntner, P, Davis, BR, Cushman, WC, Bangalore, S, Calhoun, DA, Pressel, SL, Black, HR, Kostis, JB, Probstfield, JL, Whelton, PK, Rahman, M, Group, ACR: Treatment-resistant hypertension and the incidence of cardiovascular disease and end-stage renal disease: results from the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT). Hypertension, 64: 1012-1021, 2014.

148. Kumbhani, DJ, Steg, PG, Cannon, CP, Eagle, KA, Smith, SC, Jr., Crowley, K, Goto, S, Ohman, EM, Bakris, GL, Perlstein, TS, Kinlay, S, Bhatt, DL, Investigators, RR: Resistant hypertension: a frequent and ominous finding among hypertensive patients with atherothrombosis. Eur Heart J, 34: 1204-1214, 2013.

149. Bangalore, S, Fayyad, R, Laskey, R, Demicco, DA, Deedwania, P, Kostis, JB, Messerli, FH, Treating to New Targets Steering, C, Investigators: Prevalence, predictors, and outcomes in treatment-resistant hypertension in patients with coronary disease. Am J Med, 127: 71-81 e71, 2014.

150. Daugherty, SL, Powers, JD, Magid, DJ, Tavel, HM, Masoudi, FA, Margolis, KL, O'Connor, PJ, Selby, JV, Ho, PM: Incidence and prognosis of resistant hypertension in hypertensive patients. Circulation, 125: 1635-1642, 2012.

151. Sim, JJ, Bhandari, SK, Shi, J, Reynolds, K, Calhoun, DA, Kalantar-Zadeh, K, Jacobsen, SJ: Comparative risk of renal, cardiovascular, and mortality outcomes in controlled, uncontrolled resistant, and nonresistant hypertension. Kidney Int, 88: 622-632, 2015.

152. Tanner, RM, Calhoun, DA, Bell, EK, Bowling, CB, Gutiérrez, OM, Irvin, MR, Lackland, DT, Oparil, S, McClellan, W, Warnock, DG, Muntner, P: Incident ESRD and treatment-resistant hypertension: the reasons for geographic and racial differences in stroke (REGARDS) study. Am J Kidney Dis, 63: 781-788, 2014.

153. Thomas, G, Xie, D, Chen, HY, Anderson, AH, Appel, LJ, Bodana, S, Brecklin, CS, Drawz, P, Flack, JM, Miller, ER, Steigerwalt, SP, Townsend, RR, Weir, MR, Wright, JT, Rahman, M, Investigators, CS: Prevalence and Prognostic Significance of Apparent Treatment Resistant Hypertension in Chronic Kidney Disease: Report From the Chronic Renal Insufficiency Cohort Study. Hypertension, 67: 387-396, 2016.

154. Tanner, RM, Calhoun, DA, Bell, EK, Bowling, CB, Gutiérrez, OM, Irvin, MR, Lackland, DT, Oparil, S, Warnock, D, Muntner, P: Prevalence of apparent treatment-resistant hypertension among individuals with CKD. Clin J Am Soc Nephrol, 8: 1583-1590, 2013.

155. Sarafidis, PA, Georgianos, P, Bakris, GL: Resistant hypertension--its identification and epidemiology. Nat Rev Nephrol, 9: 51-58, 2013.

156. El Rouby, N, Cooper-DeHoff, RM: Genetics of resistant hypertension: a novel pharmacogenomics phenotype. Current hypertension reports, 17: 583, 2015.

157. Lacchini, R, Figueiredo, VN, Demacq, C, Coeli-Lacchini, FB, Martins, LC, Yugar-Toledo, J, Coca, A, Tanus-Santos, JE, Moreno, H, Jr.: MDR-1 C3435T polymorphism may affect blood pressure in resistant hypertensive patients independently of its effects on aldosterone release. J Renin Angiotensin Aldosterone Syst, 15: 170-176, 2014.

158. Kho, AN, Pacheco, JA, Peissig, PL, Rasmussen, L, Newton, KM, Weston, N, Crane, PK, Pathak, J, Chute, CG, Bielinski, SJ, Kullo, IJ, Li, R, Manolio, TA, Chisholm, RL, Denny, JC: Electronic medical records for genetic research: results of the eMERGE consortium. Sci Transl Med, 3: 79re71, 2011.

159. Gottesman, O, Kuivaniemi, H, Tromp, G, Faucett, WA, Li, R, Manolio, TA, Sanderson, SC, Kannry, J, Zinberg, R, Basford, MA, Brilliant, M, Carey, DJ, Chisholm, RL, Chute, CG, Connolly, JJ, Crosslin, D, Denny, JC, Gallego, CJ, Haines, JL, Hakonarson, H, Harley, J, Jarvik, GP, Kohane, I, Kullo, IJ, Larson, EB, McCarty, C, Ritchie, MD, Roden, DM, Smith, ME, Bottinger, EP, Williams, MS, e, MN: The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genet Med, 15: 761-771, 2013.

160. Replication, DIG, Meta-analysis, C, Asian Genetic Epidemiology Network Type 2 Diabetes, C, South Asian Type 2 Diabetes, C, Mexican American Type 2 Diabetes, C, Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples, C, Mahajan, A, Go, MJ, Zhang, W, Below, JE, Gaulton, KJ, Ferreira, T, Horikoshi, M, Johnson, AD, Ng, MC, Prokopenko, I, Saleheen, D, Wang, X, Zeggini, E, Abecasis, GR, Adair, LS, Almgren, P, Atalay, M, Aung, T, Baldassarre, D, Balkau, B, Bao, Y, Barnett, AH, Barroso, I, Basit, A, Been, LF, Beilby, J, Bell, GI, Benediktsson, R, Bergman, RN, Boehm, BO, Boerwinkle, E, Bonnycastle, LL, Burtt, N, Cai, Q, Campbell, H, Carey, J, Cauchi, S, Caulfield, M, Chan, JC, Chang, LC, Chang, TJ, Chang, YC, Charpentier, G, Chen, CH, Chen, H, Chen, YT, Chia, KS, Chidambaram, M, Chines, PS, Cho, NH, Cho, YM, Chuang, LM, Collins, FS, Cornelis, MC, Couper, DJ, Crenshaw, AT, van Dam, RM, Danesh, J, Das, D, de Faire, U, Dedoussis, G, Deloukas, P, Dimas, AS, Dina, C, Doney, AS, Donnelly, PJ, Dorkhan, M, van Duijn, C, Dupuis, J, Edkins, S, Elliott, P, Emilsson, V, Erbel, R, Eriksson, JG, Escobedo, J, Esko, T, Eury, E, Florez, JC, Fontanillas, P, Forouhi, NG, Forsen, T, Fox, C, Fraser, RM, Frayling, TM, Froguel, P, Frossard, P, Gao, Y, Gertow, K, Gieger, C, Gigante, B, Grallert, H, Grant, GB, Groop, LC, Groves, CJ, Grundberg, E, Guiducci, C, Hamsten, A, Han, BG, Hara, K, Hassanali, N, Hattersley, AT, Hayward, C, Hedman, AK, Herder, C, Hofman, A, Holmen, OL, Hovingh, K, Hreidarsson, AB, Hu, C, Hu, FB, Hui, J, Humphries, SE, Hunt, SE, Hunter, DJ, Hveem, K, Hydrie, ZI, Ikegami, H, Illig, T, Ingelsson, E, Islam, M, Isomaa, B, Jackson, AU, Jafar, T, James, A, Jia, W, Jockel, KH, Jonsson, A, Jowett, JB, Kadowaki, T, Kang, HM, Kanoni, S, Kao, WH, Kathiresan, S, Kato, N, Katulanda, P, Keinanen-Kiukaanniemi, KM, Kelly, AM, Khan, H, Khaw, KT, Khor, CC, Kim, HL, Kim, S, Kim, YJ, Kinnunen, L, Klopp, N, Kong, A, Korpi-Hyovalti, E, Kowlessur, S, Kraft, P, Kravic, J, Kristensen, MM, Krithika, S, Kumar, A, Kumate, J, Kuusisto, J, Kwak, SH, Laakso, M, Lagou, V, Lakka, TA, Langenberg, C, Langford, C, Lawrence, R, Leander, K, Lee, JM, Lee, NR, Li, M, Li, X, Li, Y, Liang, J, Liju, S, Lim, WY, Lind, L, Lindgren, CM, Lindholm, E, Liu, CT, Liu, JJ, Lobbens, S, Long, J, Loos, RJ, Lu, W, Luan, J, Lyssenko, V, Ma, RC, Maeda, S, Magi, R, Mannisto, S, Matthews, DR, Meigs, JB, Melander, O, Metspalu, A, Meyer, J, Mirza, G, Mihailov, E, Moebus, S, Mohan, V, Mohlke, KL, Morris, AD, Muhleisen, TW, Muller-Nurasyid, M, Musk, B, Nakamura, J, Nakashima, E, Navarro, P, Ng, PK, Nica, AC, Nilsson, PM, Njolstad, I, Nothen, MM, Ohnaka, K, Ong, TH, Owen, KR, Palmer, CN, Pankow, JS, Park, KS, Parkin, M, Pechlivanis, S, Pedersen, NL, Peltonen, L, Perry, JR, Peters, A, Pinidiyapathirage, JM, Platou, CG, Potter, S, Price, JF, Qi, L, Radha, V, Rallidis, L, Rasheed, A, Rathman, W, Rauramaa, R, Raychaudhuri, S, Rayner, NW, Rees, SD, Rehnberg, E, Ripatti, S, Robertson, N, Roden, M, Rossin, EJ, Rudan, I, Rybin, D, Saaristo, TE, Salomaa, V, Saltevo, J, Samuel, M, Sanghera, DK, Saramies, J, Scott, J, Scott, LJ, Scott, RA, Segre, AV, Sehmi, J, Sennblad, B, Shah, N, Shah, S, Shera, AS, Shu, XO, Shuldiner, AR, Sigurdsson, G, Sijbrands, E, Silveira, A, Sim, X, Sivapalaratnam, S, Small, KS, So, WY, Stancakova, A, Stefansson, K, Steinbach, G, Steinthorsdottir, V, Stirrups, K, Strawbridge, RJ, Stringham, HM, Sun, Q, Suo, C, Syvanen, AC, Takayanagi, R, Takeuchi, F, Tay, WT, Teslovich, TM, Thorand, B, Thorleifsson, G, Thorsteinsdottir, U, Tikkanen, E, Trakalo, J, Tremoli, E, Trip, MD, Tsai, FJ, Tuomi, T, Tuomilehto, J, Uitterlinden, AG, Valladares-Salgado, A, Vedantam, S, Veglia, F, Voight, BF, Wang, C, Wareham, NJ, Wennauer, R, Wickremasinghe, AR, Wilsgaard, T, Wilson, JF, Wiltshire, S, Winckler, W, Wong, TY, Wood, AR, Wu, JY, Wu, Y, Yamamoto, K, Yamauchi, T, Yang, M, Yengo, L, Yokota, M, Young, R, Zabaneh, D, Zhang, F, Zhang, R, Zheng, W, Zimmet, PZ, Altshuler, D, Bowden, DW, Cho, YS, Cox, NJ, Cruz, M, Hanis, CL, Kooner, J, Lee, JY, Seielstad, M, Teo, YY, Boehnke, M, Parra, EJ, Chambers, JC, Tai, ES, McCarthy, MI, Morris, AP: Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat Genet, 46: 234-244, 2014.

161. Pe'er, I, Yelensky, R, Altshuler, D, Daly, MJ: Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol, 32: 381-385, 2008.

162. Gorlov, IP, Moore, JH, Peng, B, Jin, JL, Gorlova, OY, Amos, CI: SNP characteristics predict replication success in association studies. Hum Genet, 133: 1477-1486, 2014.

163. Hou, L, Zhao, H: A review of post-GWAS prioritization approaches. Front Genet, 4: 280, 2013.

164. Ward, LD, Kellis, M: HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res, 40: D930-934, 2012.

165. Boyle, AP, Hong, EL, Hariharan, M, Cheng, Y, Schaub, MA, Kasowski, M, Karczewski, KJ, Park, J, Hitz, BC, Weng, S, Cherry, JM, Snyder, M: Annotation of functional variation in personal genomes using RegulomeDB. Genome Res, 22: 1790-1797, 2012.

166. Welter, D, MacArthur, J, Morales, J, Burdett, T, Hall, P, Junkins, H, Klemm, A, Flicek, P, Manolio, T, Hindorff, L, Parkinson, H: The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res, 42: D1001-1006, 2014.

167. McDonough, CW, Gong, Y, Padmanabhan, S, Burkley, B, Langaee, TY, Melander, O, Pepine, CJ, Dominiczak, AF, Cooper-DeHoff, RM, Johnson, JA: Pharmacogenomic association of nonsynonymous SNPs in SIGLEC12, A1BG, and the selectin region and cardiovascular outcomes. Hypertension, 62: 48-54, 2013.

168. Vandell, AG, McDonough, CW, Gong, Y, Langaee, TY, Lucas, AM, Chapman, AB, Gums, JG, Beitelshees, AL, Bailey, KR, Johnson, RJ, Boerwinkle, E, Turner, ST, Cooper-DeHoff, RM, Johnson, JA: Hydrochlorothiazide-induced hyperuricaemia in the pharmacogenomic evaluation of antihypertensive responses study. J Intern Med, 276: 486-497, 2014.

169. Cohen, MM, Jr.: Craniofacial disorders caused by mutations in homeobox genes MSX1 and MSX2. J Craniofac Genet Dev Biol, 20: 19-25, 2000.

170. Towler, DA, Bidder, M, Latifi, T, Coleman, T, Semenkovich, CF: Diet-induced diabetes activates an osteogenic gene regulatory program in the aortas of low density lipoprotein receptor-deficient mice. J Biol Chem, 273: 30427-30434, 1998.

171. Shao, JS, Cheng, SL, Pingsterhaus, JM, Charlton-Kachigian, N, Loewy, AP, Towler, DA: Msx2 promotes cardiovascular calcification by activating paracrine Wnt signals. J Clin Invest, 115: 1210-1220, 2005.

172. Cheng, SL, Behrmann, A, Shao, JS, Ramachandran, B, Krchma, K, Bello Arredondo, Y, Kovacs, A, Mead, M, Maxson, R, Towler, DA: Targeted reduction of vascular Msx1 and Msx2 mitigates arteriosclerotic calcification and aortic stiffness in LDLR-deficient mice fed diabetogenic diets. Diabetes, 63: 4326-4337, 2014.

173. Shimizu, T, Tanaka, T, Iso, T, Kawai-Kowase, K, Kurabayashi, M: Azelnidipine inhibits Msx2-dependent osteogenic differentiation and matrix mineralization of vascular smooth muscle cells. Int Heart J, 53: 331-335, 2012.

174. Consortium, G: The Genotype-Tissue Expression (GTEx) project. Nat Genet, 45: 580-585, 2013.

175. Consortium, G: Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science, 348: 648-660, 2015.

176. Fox, CS, Heard-Costa, N, Cupples, LA, Dupuis, J, Vasan, RS, Atwood, LD: Genome-wide association to body mass index and waist circumference: the Framingham Heart Study 100K project. BMC Med Genet, 8 Suppl 1: S18, 2007.

177. Levy, D, Larson, MG, Benjamin, EJ, Newton-Cheh, C, Wang, TJ, Hwang, SJ, Vasan, RS, Mitchell, GF: Framingham Heart Study 100K Project: genome-wide associations for blood pressure and arterial stiffness. BMC Med Genet, 8 Suppl 1: S3, 2007.

178. Newton-Cheh, C, Guo, CY, Wang, TJ, O'Donnell, CJ, Levy, D, Larson, MG: Genome-wide association study of electrocardiographic and heart rate variability traits: the Framingham Heart Study. BMC Med Genet, 8 Suppl 1: S7, 2007.

179. Below, JE, Gamazon, ER, Morrison, JV, Konkashbaev, A, Pluzhnikov, A, McKeigue, PM, Parra, EJ, Elbein, SC, Hallman, DM, Nicolae, DL, Bell, GI, Cruz, M, Cox, NJ, Hanis, CL: Genome-wide association and meta-analysis in populations from Starr County, Texas, and Mexico City identify type 2 diabetes susceptibility loci and enrichment for expression quantitative trait loci in top signals. Diabetologia, 54: 2047-2055, 2011.

180. Pei, Q, Huang, Q, Yang, GP, Zhao, YC, Yin, JY, Song, M, Zheng, Y, Mo, ZH, Zhou, HH, Liu, ZQ: PPAR-γ2 and PTPRD gene polymorphisms influence type 2 diabetes patients' response to pioglitazone in China. Acta Pharmacol Sin, 34: 255-261, 2013.

181. Saade, S, Cazier, JB, Ghassibe-Sabbagh, M, Youhanna, S, Badro, DA, Kamatani, Y, Hager, J, Yeretzian, JS, El-Khazen, G, Haber, M, Salloum, AK, Douaihy, B, Othman, R, Shasha, N, Kabbani, S, Bayeh, HE, Chammas, E, Farrall, M, Gauguier, D, Platt, DE, Zalloua, PA, consortium, F: Large scale association analysis identifies three susceptibility loci for coronary artery disease. PLoS One, 6: e29427, 2011.

182. Gong, Y, McDonough, CW, Beitelshees, AL, El Rouby, N, Hiltunen, TP, O'Connell, JR, Padmanabhan, S, Langaee, TY, Hall, K, Schmidt, SO, Curry, RW, Jr., Gums, JG, Donner, KM, Kontula, KK, Bailey, KR, Boerwinkle, E, Takahashi, A, Tanaka, T, Kubo, M, Chapman, AB, Turner, ST, Pepine, CJ, Cooper-DeHoff, RM, Johnson, JA: PTPRD gene associated with blood pressure response to atenolol and resistant hypertension. J Hypertens, 33: 2278-2285, 2015.

183. Visser, M, Palstra, RJ, Kayser, M: Human skin color is influenced by an intergenic DNA polymorphism regulating transcription of the nearby BNC2 pigmentation gene. Hum Mol Genet, 23: 5750-5762, 2014.

184. Paterson, AD, Waggott, D, Boright, AP, Hosseini, SM, Shen, E, Sylvestre, MP, Wong, I, Bharaj, B, Cleary, PA, Lachin, JM, Magic, Below, JE, Nicolae, D, Cox, NJ, Canty, AJ, Sun, L, Bull, SB, Diabetes, C, Complications Trial/Epidemiology of Diabetes, I, Complications Research, G: A genome-wide association study identifies a novel major locus for glycemic control in type 1 diabetes, as measured by both A1C and glucose. Diabetes, 59: 539-549, 2010.

185. Martínez García, F, Chaves Martínez, FJ, Redón i Mas, J, Universidad de Valencia. Departament de Medicina: Elucidating the genetic susceptibility of hypertension associated microalbuminuria: genome wide scan [doctoral thesis]. València, Universitat de València, Servei de Publicacions, 2010, 1 optical disc (CD-ROM).

186. Li, C, He, J, Hixson, JE, Gu, D, Rao, DC, Shimmin, LC, Huang, J, Gu, CC, Chen, J, Li, J: Abstract P253: Genomewide Gene-potassium Interaction Analyses on Blood Pressure: The GenSalt Study. Circulation, 133: AP253-AP253, 2016.

187. Williams, B, MacDonald, TM, Morant, S, Webb, DJ, Sever, P, McInnes, G, Ford, I, Cruickshank, JK, Caulfield, MJ, Salsbury, J, Mackenzie, I, Padmanabhan, S, Brown, MJ, Group, BHSsPS: Spironolactone versus placebo, bisoprolol, and doxazosin to determine the optimal treatment for drug-resistant hypertension (PATHWAY-2): a randomised, double-blind, crossover trial. Lancet, 386: 2059-2068, 2015.

188. Cooper-DeHoff, R, Handberg, E, Heissenberg, C, Johnson, K: Electronic prescribing via the internet for a coronary artery disease and hypertension megatrial. Clin Cardiol, 24: V14-16, 2001.

189. Stranger, BE, Stahl, EA, Raj, T: Progress and promise of genome-wide association studies for human complex trait genetics. Genetics, 187: 367-383, 2011.

190. Edwards, SL, Beesley, J, French, JD, Dunning, AM: Beyond GWASs: illuminating the dark road from association to function. Am J Hum Genet, 93: 779-797, 2013.

191. Zhang, X, Bailey, SD, Lupien, M: Laying a solid foundation for Manhattan--'setting the functional basis for the post-GWAS era'. Trends Genet, 30: 140-149, 2014.

192. Avior, Y, Sagi, I, Benvenisty, N: Pluripotent stem cells in disease modelling and drug discovery. Nat Rev Mol Cell Biol, 17: 170-182, 2016.

193. Waterston, RH, Lindblad-Toh, K, Birney, E, Rogers, J, Abril, JF, Agarwal, P, Agarwala, R, Ainscough, R, Alexandersson, M, An, P, Antonarakis, SE, Attwood, J, Baertsch, R, Bailey, J, Barlow, K, Beck, S, Berry, E, Birren, B, Bloom, T, Bork, P, Botcherby, M, Bray, N, Brent, MR, Brown, DG, Brown, SD, Bult, C, Burton, J, Butler, J, Campbell, RD, Carninci, P, Cawley, S, Chiaromonte, F, Chinwalla, AT, Church, DM, Clamp, M, Clee, C, Collins, FS, Cook, LL, Copley, RR, Coulson, A, Couronne, O, Cuff, J, Curwen, V, Cutts, T, Daly, M, David, R, Davies, J, Delehaunty, KD, Deri, J, Dermitzakis, ET, Dewey, C, Dickens, NJ, Diekhans, M, Dodge, S, Dubchak, I, Dunn, DM, Eddy, SR, Elnitski, L, Emes, RD, Eswara, P, Eyras, E, Felsenfeld, A, Fewell, GA, Flicek, P, Foley, K, Frankel, WN, Fulton, LA, Fulton, RS, Furey, TS, Gage, D, Gibbs, RA, Glusman, G, Gnerre, S, Goldman, N, Goodstadt, L, Grafham, D, Graves, TA, Green, ED, Gregory, S, Guigó, R, Guyer, M, Hardison, RC, Haussler, D, Hayashizaki, Y, Hillier, LW, Hinrichs, A, Hlavina, W, Holzer, T, Hsu, F, Hua, A, Hubbard, T, Hunt, A, Jackson, I, Jaffe, DB, Johnson, LS, Jones, M, Jones, TA, Joy, A, Kamal, M, Karlsson, EK, Karolchik, D, Kasprzyk, A, Kawai, J, Keibler, E, Kells, C, Kent, WJ, Kirby, A, Kolbe, DL, Korf, I, Kucherlapati, RS, Kulbokas, EJ, Kulp, D, Landers, T, Leger, JP, Leonard, S, Letunic, I, Levine, R, Li, J, Li, M, Lloyd, C, Lucas, S, Ma, B, Maglott, DR, Mardis, ER, Matthews, L, Mauceli, E, Mayer, JH, McCarthy, M, McCombie, WR, McLaren, S, McLay, K, McPherson, JD, Meldrim, J, Meredith, B, Mesirov, JP, Miller, W, Miner, TL, Mongin, E, Montgomery, KT, Morgan, M, Mott, R, Mullikin, JC, Muzny, DM, Nash, WE, Nelson, JO, Nhan, MN, Nicol, R, Ning, Z, Nusbaum, C, O'Connor, MJ, Okazaki, Y, Oliver, K, Overton-Larty, E, Pachter, L, Parra, G, Pepin, KH, Peterson, J, Pevzner, P, Plumb, R, Pohl, CS, Poliakov, A, Ponce, TC, Ponting, CP, Potter, S, Quail, M, Reymond, A, Roe, BA, Roskin, KM, Rubin, EM, Rust, AG, Santos, R, Sapojnikov, V, 
Schultz, B, Schultz, J, Schwartz, MS, Schwartz, S, Scott, C, Seaman, S, Searle, S, Sharpe, T, Sheridan, A, Shownkeen, R, Sims, S, Singer, JB, Slater, G, Smit, A, Smith, DR, Spencer, B, Stabenau, A, Stange-Thomann, N, Sugnet, C, Suyama, M, Tesler, G, Thompson, J, Torrents, D, Trevaskis, E, Tromp, J, Ucla, C, Ureta-Vidal, A, Vinson, JP, Von Niederhausern, AC, Wade, CM, Wall, M, Weber, RJ, Weiss, RB, Wendl, MC, West, AP, Wetterstrand, K, Wheeler, R, Whelan, S, Wierzbowski, J, Willey, D, Williams, S, Wilson, RK, Winter, E, Worley, KC, Wyman, D, Yang, S, Yang, SP, Zdobnov, EM, Zody, MC, Lander, ES, Consortium, MGS: Initial sequencing and comparative analysis of the mouse genome. Nature, 420: 520-562, 2002.

194. Hamlin, RL, Altschuld, RA: Extrapolation from mouse to man. Circ Cardiovasc Imaging, 4: 2-4, 2011.

195. Takahashi, K, Tanabe, K, Ohnuki, M, Narita, M, Ichisaka, T, Tomoda, K, Yamanaka, S: Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell, 131: 861-872, 2007.

196. Wilson, KD, Wu, JC: Induced pluripotent stem cells. JAMA, 313: 1613-1614, 2015.

197. Yamanaka, S: Induced pluripotent stem cells: past, present, and future. Cell Stem Cell, 10: 678-684, 2012.

198. Studer, L, Vera, E, Cornacchia, D: Programming and Reprogramming Cellular Age in the Era of Induced Pluripotency. Cell Stem Cell, 16: 591-600, 2015.

199. Merkle, FT, Eggan, K: Modeling human disease with pluripotent stem cells: from genome association to function. Cell Stem Cell, 12: 656-668, 2013.

200. Sterneckert, JL, Reinhardt, P, Schöler, HR: Investigating human disease using stem cell models. Nat Rev Genet, 15: 625-639, 2014.

201. Hockemeyer, D, Soldner, F, Beard, C, Gao, Q, Mitalipova, M, DeKelver, RC, Katibah, GE, Amora, R, Boydston, EA, Zeitler, B, Meng, X, Miller, JC, Zhang, L, Rebar, EJ, Gregory, PD, Urnov, FD, Jaenisch, R: Efficient targeting of expressed and silent genes in human ESCs and iPSCs using zinc-finger nucleases. Nat Biotechnol, 27: 851-857, 2009.

202. Hockemeyer, D, Wang, H, Kiani, S, Lai, CS, Gao, Q, Cassady, JP, Cost, GJ, Zhang, L, Santiago, Y, Miller, JC, Zeitler, B, Cherone, JM, Meng, X, Hinkley, SJ, Rebar, EJ, Gregory, PD, Urnov, FD, Jaenisch, R: Genetic engineering of human pluripotent cells using TALE nucleases. Nat Biotechnol, 29: 731-734, 2011.

203. Jinek, M, Chylinski, K, Fonfara, I, Hauer, M, Doudna, JA, Charpentier, E: A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science, 337: 816-821, 2012.

204. Pennisi, E: The CRISPR craze. Science, 341: 833-836, 2013.

205. Heyer, WD, Ehmsen, KT, Liu, J: Regulation of homologous recombination in eukaryotes. Annu Rev Genet, 44: 113-139, 2010.

206. Jasin, M, Rothstein, R: Repair of strand breaks by homologous recombination. Cold Spring Harb Perspect Biol, 5: a012740, 2013.

207. Rouet, P, Smih, F, Jasin, M: Introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease. Mol Cell Biol, 14: 8096-8106, 1994.

208. Burridge, PW, Li, YF, Matsa, E, Wu, H, Ong, SG, Sharma, A, Holmström, A, Chang, AC, Coronado, MJ, Ebert, AD, Knowles, JW, Telli, ML, Witteles, RM, Blau, HM, Bernstein, D, Altman, RB, Wu, JC: Human induced pluripotent stem cell-derived cardiomyocytes recapitulate the predilection of breast cancer patients to doxorubicin-induced cardiotoxicity. Nat Med, 22: 547-556, 2016.

209. Matsa, E, Burridge, PW, Yu, KH, Ahrens, JH, Termglinchan, V, Wu, H, Liu, C, Shukla, P, Sayed, N, Churko, JM, Shao, N, Woo, NA, Chao, AS, Gold, JD, Karakikes, I, Snyder, MP, Wu, JC: Transcriptome Profiling of Patient-Specific Human iPSC-Cardiomyocytes Predicts Individual Drug Safety and Efficacy Responses In Vitro. Cell Stem Cell, 19: 311-325, 2016.

210. Wheeler, HE, Wing, C, Delaney, SM, Komatsu, M, Dolan, ME: Modeling chemotherapeutic neurotoxicity with human induced pluripotent stem cell-derived neuronal cells. PLoS One, 10: e0118020, 2015.

211. Dong, L, Nordlohne, J, Ge, S, Hertel, B, Melk, A, Rong, S, Haller, H, von Vietinghoff, S: T Cell CX3CR1 Mediates Excess Atherosclerotic Inflammation in Renal Impairment. J Am Soc Nephrol, 27: 1753-1764, 2016.

212. Doran, AC, Meller, N, McNamara, CA: Role of smooth muscle cells in the initiation and early progression of atherosclerosis. Arterioscler Thromb Vasc Biol, 28: 812-819, 2008.

213. Apostolakis, S, Amanatidou, V, Spandidos, DA: Therapeutic implications of chemokine-mediated pathways in atherosclerosis: realistic perspectives and utopias. Acta Pharmacol Sin, 31: 1103-1110, 2010.

214. Wong, BW, Wong, D, McManus, BM: Characterization of fractalkine (CX3CL1) and CX3CR1 in human coronary arteries with native atherosclerosis, diabetes mellitus, and transplant vascular disease. Cardiovasc Pathol, 11: 332-338, 2002.

215. Lucas, AD, Bursill, C, Guzik, TJ, Sadowski, J, Channon, KM, Greaves, DR: Smooth muscle cells in human atherosclerotic plaques express the fractalkine receptor CX3CR1 and undergo chemotaxis to the CX3C chemokine fractalkine (CX3CL1). Circulation, 108: 2498-2504, 2003.

216. Apostolakis, S, Amanatidou, V, Papadakis, EG, Spandidos, DA: Genetic diversity of CX3CR1 gene and coronary artery disease: new insights through a meta-analysis. Atherosclerosis, 207: 8-15, 2009.

217. Kimouli, M, Miyakis, S, Georgakopoulos, P, Neofytou, E, Achimastos, AD, Spandidos, DA: Polymorphisms of fractalkine receptor CX3CR1 gene in patients with symptomatic and asymptomatic carotid artery stenosis. J Atheroscler Thromb, 16: 604-610, 2009.

218. McDermott, DH, Fong, AM, Yang, Q, Sechler, JM, Cupples, LA, Merrell, MN, Wilson, PW, D'Agostino, RB, O'Donnell, CJ, Patel, DD, Murphy, PM: Chemokine receptor mutant CX3CR1-M280 has impaired adhesive function and correlates with protection from cardiovascular disease in humans. J Clin Invest, 111: 1241-1250, 2003.

219. Fong, AM, Robinson, LA, Steeber, DA, Tedder, TF, Yoshie, O, Imai, T, Patel, DD: Fractalkine and CX3CR1 mediate a novel mechanism of leukocyte capture, firm adhesion, and activation under physiologic flow. J Exp Med, 188: 1413-1419, 1998.

220. Daoudi, M, Lavergne, E, Garin, A, Tarantino, N, Debré, P, Pincet, F, Combadière, C, Deterre, P: Enhanced adhesive capacities of the naturally occurring Ile249-Met280 variant of the chemokine receptor CX3CR1. J Biol Chem, 279: 19649-19657, 2004.

221. Davis, CN, Harrison, JK: Proline 326 in the C terminus of murine CX3CR1 prevents G-protein and phosphatidylinositol 3-kinase-dependent stimulation of Akt and extracellular signal-regulated kinase in Chinese hamster ovary cells. J Pharmacol Exp Ther, 316: 356-363, 2006.

222. Johnson, JA, Boerwinkle, E, Zineh, I, Chapman, AB, Bailey, K, Cooper-DeHoff, RM, Gums, J, Curry, RW, Gong, Y, Beitelshees, AL, Schwartz, G, Turner, ST: Pharmacogenomics of antihypertensive drugs: rationale and design of the Pharmacogenomic Evaluation of Antihypertensive Responses (PEAR) study. Am Heart J, 157: 442-449, 2009.

223. Hamadeh, IS, Langaee, TY, Dwivedi, R, Garcia, S, Burkley, BM, Skaar, TC, Chapman, AB, Gums, JG, Turner, ST, Gong, Y, Cooper-DeHoff, RM, Johnson, JA: Impact of CYP2D6 polymorphisms on clinical efficacy and tolerability of metoprolol tartrate. Clin Pharmacol Ther, 96: 175-181, 2014.

224. Biel, NM, Santostefano, KE, DiVita, BB, El Rouby, N, Carrasquilla, SD, Simmons, C, Nakanishi, M, Cooper-DeHoff, RM, Johnson, JA, Terada, N: Vascular Smooth Muscle Cells From Hypertensive Patient-Derived Induced Pluripotent Stem Cells to Advance Hypertension Pharmacogenomics. Stem Cells Transl Med, 4: 1380-1390, 2015.

225. van Wilgenburg, B, Browne, C, Vowles, J, Cowley, SA: Efficient, long term production of monocyte-derived macrophages from human pluripotent stem cells under partly-defined and fully-defined conditions. PLoS One, 8: e71098, 2013.

226. Smith, C, Ye, Z, Cheng, L: A Method for Genome Editing in Human Pluripotent Stem Cells. Cold Spring Harb Protoc, 2016: pdb.prot090217, 2016.

227. Tsuchiya, S, Yamabe, M, Yamaguchi, Y, Kobayashi, Y, Konno, T, Tada, K: Establishment and characterization of a human acute monocytic leukemia cell line (THP-1). Int J Cancer, 26: 171-176, 1980.

228. Hegde, SP, Zhao, J, Ashmun, RA, Shapiro, LH: c-Maf induces monocytic differentiation and apoptosis in bipotent myeloid progenitors. Blood, 94: 1578-1589, 1999.

229. Cambien, B, Pomeranz, M, Schmid-Antomarchi, H, Millet, MA, Breittmayer, V, Rossi, B, Schmid-Alliana, A: Signal transduction pathways involved in soluble fractalkine-induced monocytic cell adhesion. Blood, 97: 2031-2037, 2001.

230. Landsman, L, Bar-On, L, Zernecke, A, Kim, KW, Krauthgamer, R, Shagdarsuren, E, Lira, SA, Weissman, IL, Weber, C, Jung, S: CX3CR1 is required for monocyte homeostasis and atherogenesis by promoting cell survival. Blood, 113: 963-972, 2009.

231. Gevrey, JC, Isaac, BM, Cox, D: Syk is required for monocyte/macrophage chemotaxis to CX3CL1 (Fractalkine). J Immunol, 175: 3737-3745, 2005.

232. Ghattas, A, Griffiths, HR, Devitt, A, Lip, GY, Shantsila, E: Monocytes in coronary artery disease and atherosclerosis: where are we now? J Am Coll Cardiol, 62: 1541-1551, 2013.

233. Fenyo, IM, Gafencu, AV: The involvement of the monocytes/macrophages in chronic inflammation associated with atherosclerosis. Immunobiology, 218: 1376-1384, 2013.

234. Rogacev, KS, Seiler, S, Zawada, AM, Reichart, B, Herath, E, Roth, D, Ulrich, C, Fliser, D, Heine, GH: CD14++CD16+ monocytes and cardiovascular outcome in patients with chronic kidney disease. Eur Heart J, 32: 84-92, 2011.

235. Yanagimachi, MD, Niwa, A, Tanaka, T, Honda-Ozaki, F, Nishimoto, S, Murata, Y, Yasumi, T, Ito, J, Tomida, S, Oshima, K, Asaka, I, Goto, H, Heike, T, Nakahata, T, Saito, MK: Robust and highly-efficient differentiation of functional monocytic cells from human pluripotent stem cells under serum- and feeder cell-free conditions. PLoS One, 8: e59243, 2013.

236. Hendriks, WT, Warren, CR, Cowan, CA: Genome Editing in Human Pluripotent Stem Cells: Approaches, Pitfalls, and Solutions. Cell Stem Cell, 18: 53-65, 2016.

237. Vouillot, L, Thélie, A, Pollet, N: Comparison of T7E1 and surveyor mismatch cleavage assays to detect mutations triggered by engineered nucleases. G3 (Bethesda), 5: 407-415, 2015.

238. Veeriah, S, Brennan, C, Meng, S, Singh, B, Fagin, JA, Solit, DB, Paty, PB, Rohle, D, Vivanco, I, Chmielecki, J, Pao, W, Ladanyi, M, Gerald, WL, Liau, L, Cloughesy, TC, Mischel, PS, Sander, C, Taylor, B, Schultz, N, Major, J, Heguy, A, Fang, F, Mellinghoff, IK, Chan, TA: The tyrosine phosphatase PTPRD is a tumor suppressor that is frequently inactivated and mutated in glioblastoma and other human cancers. Proc Natl Acad Sci U S A, 106: 9435-9440, 2009.

239. Satou, R, Gonzalez-Villalobos, RA: JAK-STAT and the renin-angiotensin system: The role of the JAK-STAT pathway in blood pressure and intrarenal renin-angiotensin system regulation. JAKSTAT, 1: 250-256, 2012.

BIOGRAPHICAL SKETCH

Nihal El Rouby entered Ain Shams University, Cairo, Egypt, in 1998 and graduated in 2003 with a Bachelor of Science degree in Pharmacy. After graduation, she worked as a lab instructor in the Department of Analytical Chemistry at the same institution. She then moved to the US and practiced as a pharmacist after meeting the requirements for pharmacy licensure. Nihal started the Working Professional Pharm.D. (WPPD) program at the University of Florida in 2008 and earned her Pharm.D. in 2011. She began the Doctor of Philosophy program in Clinical Pharmaceutical Science at the University of Florida, Center for Pharmacogenomics, in 2012. During her PhD, Nihal co-authored multiple peer-reviewed manuscripts and presented her research at multiple national meetings. She received her Ph.D. from the University of Florida in the spring of 2017.
