Dissecting the Genetic Etiology of at ETS1

A dissertation submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy

in the Department of Immunobiology

of the College of Medicine 2017

by

Xiaoming Lu

B.S. Sun Yat-sen University, P.R. China

June 2011

Dissertation Committee: John B. Harley, MD, PhD Harinder Singh, PhD Leah C. Kottyan, PhD Matthew T. Weirauch, PhD Kasper Hoebe, PhD Lili Ding, PhD

i

Abstract

Systemic lupus erythematosus (SLE) is a complex autoimmune disease with strong evidence for genetics factor involvement. Genome-wide association studies have identified 84 risk loci associated with SLE. However, the specific genotype-dependent

(allelic) molecular mechanisms connecting these lupus-genetic risk loci to immunological dysregulation are mostly still unidentified. ~ 90% of these loci contain variants that are non-coding, and are thus likely to act by impacting subtle, comparatively hard to predict mechanisms controlling expression. Here, we developed a strategic approach to prioritize non-coding variants, and screen them for their function. This approach involves computational prioritization using functional genomic databases followed by experimental analysis of differential binding of factors (TFs) to risk and non-risk alleles.

For both electrophoretic mobility shift assay (EMSA) and DNA affinity precipitation assay

(DAPA) analysis of genetic variants, a synthetic DNA oligonucleotide (oligo) is used to identify factors in the nuclear lysate of disease or phenotype-relevant cells. This strategic approach was then used for investigating SLE association at ETS1 locus. Genetic variants at chromosomal region 11q23.3, near the gene ETS1, have been associated with systemic lupus erythematosus (SLE), or lupus, in independent cohorts of Asian ancestry.

Several recent studies have implicated Ets1 as a critical driver of immune cell function and differentiation, and mice deficient in Ets1 develop an SLE-like autoimmunity. We performed a fine-mapping study of 14,551 subjects from multi-ancestral cohorts by starting with genotyped variants and imputing to all common variants spanning ETS1. By constructing genetic models via frequentist and Bayesian association methods, we identified 16 variants that are statistically likely to be causal. We functionally assessed

ii each of these variants based on their likelihood of affecting binding, miRNA binding, or chromatin state. Of the four variants that we have experimentally examined, only rs6590330 differentially binds lysate from B cells in EMSA. Using DAPA- mass spectrometry, we found more binding of the transcription factor signal transducer and activator of transcription 1 (STAT1) to DNA near the risk allele of rs6590330 than near the non-risk allele. Immunoblot analysis and chromatin immunoprecipitation of pSTAT1 in B cells heterozygous for rs6590330 confirmed that the risk allele increased binding to the active form of STAT1. Analysis with expression quantitative trait loci (eQTL) indicated that the risk allele of rs6590330 is associated with decreased ETS1 expression in Han Chinese, but not other ancestral cohorts. We propose a model in which the risk allele of rs6590330 is associated with decreased ETS1 expression and increases SLE risk by enhancing the binding of pSTAT1. However, whether this differential caused by rs6590330 is biologically relevant or not has not been answered.

We then developed a system combining the Clustered regularly interspaced short palindromic repeats (CRISPR) Homology-directed Repair technique with Tet-inducible

Controlled Gene Expression technique to precisely control gene expression, which can mimic this differential gene expression caused by genetic variants. The investigation of this small differential gene expression may make important progress in the field of genetics and especially the complex genetic disease etiology of SLE.

iii

iv

Acknowledgements

This work was supported in part by grants from Dr. John Harley and Dr. Leah

Kottyan, and Ryan Fellowship granted to XL. I would also like to thank the Center for

Autoimmune Genomics and Etiology (CAGE) department and the Immunobiology graduate program at CCHMC for stipend support, technical support, intellectual training, experimental reagents and research space.

I would like to thank my mentor, Dr. John Harley, for dedicating his time, energy and resources for my scientific training. He always came up with amazing ideas for research. He taught me about how to think the science in a big way and how to develop a good grant. Meanwhile, he tried to make me think about what kind of researcher I want to be in the future and what kind of risk I would like to take for the research. Importantly, he allowed me to explore myself to find my own interest, which gives me the great confidence to be an independent researcher.

I would also like to thank Dr. Leah Kottyan for her mentoring. She is extraordinary nice, patient and responsible. As a graduate student from non-coding background, she patiently taught me about how to do the genetic analysis. As an international student, she taught me the meaning of presentation and how to prepare that. As a trainee, she helped me to plan and interpret experiments. Importantly, she facilitated my success through promoting my understanding of professional development. We worked very closely to tackle scientific questions and make research plans. Apart from our work in the lab, Dr.

Leah Kottyan has been an amazing life mentor. She has been an amazing model for how to balance life and science. I am truly blessed to have her as a mentor, and I do not plan to leave her mentorship as I move on professionally.

v

In addition to Dr. John Harley and Dr. Leah Kottyan, I really thank the rest of my dissertation and qualifier committee members: Dr. Harinder Singh, Dr. Matthew T.

Weirauch, Dr. Kasper Hoebe, Dr. Lili Ding, Dr. Ken Greis and Dr. Iouri Chepelev.

Friends and colleagues have all been invaluable for my graduate study. They helped me to keep strong and confident as international students. Guansheng Liu is especially important to me. He treated me as a brother and helped me to settle down everything in the USA. I deeply appreciated his help during the way. Also, Bo Liu, Coco

Liu, Yuan Li, Shiyu Zhou, Erin Zoller and Carolyn Rydyznski are important support to me.

They helped me enjoy the scientific life and social life in the USA. Meanwhile, Zubin Patel and Erin Zoller are my close research friends. Since we worked on similar projects, it was really good to discuss science with them.

In the end, I really need to thank my family for their sacrifice, understanding, and support during this process. My wife, Chen, has great confidence in my ability to become a successful scientist. She is nice and beautiful. She always has a way to help me live a happy life. We together have a wonderful time in the USA. I would also appreciate the support from my family in China. They worked so hard to provide me with the opportunity to explore myself. They taught me about how to be a person valuable to the society. They are always willing to help me with their best. I really miss them and am sorry for not being able to accompany them for these five years. I hope the best for my family.

vi

Table of Contents

Title ...... i

Abstract ...... ii

Acknowledgements ...... v

Table of Contents ...... vii

Chapter 1. Background and Introduction...... 1

The Immune System in Health and Disease ...... 2

Systemic Lupus Erythematosus Introduction ...... 4

Clinical Features of SLE ...... 4

Diagnostic Criteria/Criteria for Inclusion in Research ...... 5

Pathoetiology of SLE ...... 6

Role of B-Cells in SLE Pathogenesis ...... 8

Cytokines in SLE ...... 9

Immune Cell Dysfunction in SLE ...... 14

Toll-Like Receptors (TLRs) in SLE ...... 19

Complement System in SLE ...... 21

SLE Animal Models ...... 22

Identifying the Genetic Component of SLE ...... 24

Monogenic SLE ...... 26

Linkage Studies Undertaken in SLE ...... 28

Coding Variants Associated with SLE ...... 29

Genome-Wide Association Study (GWAS) Undertaken in SLE ...... 30

vii

Gene Specific Targeted Studies in SLE...... 32

Statistical Methods to Identify Lupus Risk Variants and Expression Quantitative Trait Loci (eQTLs) 32

Common Variants Analysis ...... 32

Rare Variants Analysis ...... 35

Expression Quantitative Trait Loci (eQTLs) ...... 37

Statistical Power Considerations for Genetic Analysis ...... 38

Gene Regulation Introduction ...... 40

Transcription Factors ...... 41

RNA Binding (RBPs) ...... 42

Non-Coding RNAs ...... 44

Alternative Pre-mRNA Splicing ...... 45

ETS1 Function and Its Association with SLE ...... 46

Function of ETS1 in the Immune System ...... 47

Association of ETS1 with SLE ...... 52

Role of ETS1 in Lupus Pathogenesis ...... 54

Published eQTLs for ETS1 In Healthy Individuals ...... 57

CRISPR Introduction and Its Application in Genetic Studies ...... 58

Tetracycline Inducible (Tet-inducible) System ...... 59

Chapter 2. Protocol for Electrophoretic Mobility Shift Assay (EMSA) and DNA-affinity

Precipitation Assay (DAPA) ...... 60

Chapter 3. Lupus Risk Variant Increases pSTAT1 Binding and Decreases ETS1 Expression ...... 70

Chapter 4. Precisely Controlled Differential Gene Expression System ...... 91

Chapter 5. Discussions ...... 105

viii

Summary ...... 106

Summary of “Protocol for electrophoretic mobility shift assays (EMSA) and DNA-affinity

precipitation assays (DAPA)” ...... 106

Summary of “Lupus risk-variant increases pSTAT1 binding and decreases ETS1 expression” ...... 109

Summary of “Precisely Controlled Differential Gene Expression System” ...... 113

Unexplained Questions ...... 117

The relationship of ETS1 and STAT1 in disease pathogenesis ...... 117

Why is ETS1 association specific to the Asian ancestry? ...... 121

The association with other immune-mediated diseases at ETS1 locus ...... 124

The possibility of causality in other immune cell type ...... 128

Future Experiments ...... 130

Assessing the necessity and sufficiency of rs6590330 in the reduction of ETS1 expression ...... 130

Further validating the role of STAT1 in modulating ETS1 expression in a genotype-dependent

manner ...... 131

Identifying the mechanism through which pSTAT1 binds to non-overlapping risk allele of rs6590330

...... 132

Bibliography ...... 135

ix

Chapter 1. Background and Introduction

1

The Immune System in Health and Disease

The immune system of our body is a balanced system. Firstly, it acts as our defense system. Its function is to protect us from the potential harmful microorganisms living in the outside environment, including and archaea, viruses, fungi and parasites [1]. Meanwhile, the immune system needs to kill the malignant cells generated by ourselves from the inside environment, such as cancer cells [2]. On the other hand, the immune system should not attack the non-harmful microorganisms and common antigens in the outside environment, such as the commensal microbiome colonizing our body, harmless pollens, and the food we eat every day. Moreover, the system should tolerate our own normal cells inside our body. Therefore, it is very important for the immune system to establish tolerance to the common foreign antigens and our own antigens. Overall, the balance of the immune system, the equilibrium between the defense and the tolerance is critical for our survival and health in this world.

When the balance of the immune system is broken, the disease will happen. If the defense aspect of the immune system is crashed due to the immune defect of the abnormal system, we call it immunodeficiency diseases. These diseases could be either the result of inherited gene defects disrupting immune system, which are called primary immunodeficiency, or the result of other factors collapsing the immune system, defined as secondary immunodeficiency [1]. Primary immunodeficiency may happen with any defect in the process of immune system, such as the X-linked severe combined immunodeficiency (XSCID) caused by mutations in the gene IL2RG on X , which affects all IL2 family cytokines [3]. Secondary immunodeficiency may be the consequence of certain infections or environmental factors such as starvation and

2 medication. The most extreme example affecting the whole world is the acquired immune deficiency syndrome (AIDS), caused by the infection of human immunodeficiency virus

(HIV). This infection will lead to the progressive loss of CD4+ T cells, which eventually results in the overall failure of the immune system [4].

Besides defense aspect, the disease may also be the result of tolerance loss. The hypersensitivity reactions happen when the tolerance to the foreign antigens and self- antigens is breached. Moreover, when the tolerance to self-antigens is breached, it is also called autoimmunity diseases. The hypersensitivity reactions can be classified into four categories defined by Gell and Coombs, based on the defensive strategy employed by the immune system [5]. With Gell-Coombs classification, type I hypersensitivity reactions are caused by IgE bounding to mast cells, such as allergic asthma; type II hypersensitivity reactions are mediated through antigen-antibody interactions, leading to the local production of C5a, formation of membrane attack complex and subsequent tissue injury, such as vasculitides; type III hypersensitivity reactions are characterized by the formation of antigen-antibody complex, such as rheumatoid arthritis and SLE; type IV hypersensitivity reactions are also called delayed-type hypersensitivity, indicating the reaction caused by T cell mediated immune response, such as insulitis [5]. On the other hand, autoimmunity is referred as the result of breaking self-tolerance or the loss of self- tolerance, which affects around 5% of the population in Western countries [1]. It may only affect a specific cell type, such as Type 1 diabetes, in which the autoreactive T cells mainly attack pancreatic islet cell [6]. Autoimmunity may also be a system-wide phenomenon, such as systemic lupus erythematosus (SLE), which has autoantibodies and autoreactive T cells against DNA, chromatin proteins, etc. These autoantibodies and

3 autoreactive T cells may distribute throughout the entire body and affect the whole body as a system-wide phenomenon [7].

Systemic Lupus Erythematosus Introduction

Clinical Features of SLE

Systemic lupus erythematosus (SLE) is a multisystem autoimmune disorder affecting many critical organs and tissues, including brain, blood and kidney mediated by significant autoantibody production. It can lead to severe organ damage and even death.

The manifestations of SLE are very diverse, ranging from cutaneous involvement to end- stage renal failure [8]. The incidence and prevalence of SLE vary widely across the world with estimated incidence from 0.3 to 31.5 cases per 100,000 individuals per year [9, 10], and estimated prevalence from 3.2 to 517.5 cases per 100,000 individuals [8, 10, 11]. In the USA, an increased trend was observed for the incidence of SLE based on a study from 1950 to 1992 [12], although this may be the result of technological advances in the diagnosis and increased awareness of SLE. On the other hand, the mortality of SLE within

10 years of diagnosis is around 10% now, which is improved significantly from 50% mortality within 3 years in the 1960s [13, 14].

The incidence of SLE is quite different for female and male patients from different ethnic backgrounds. In general, SLE is a female predominant disease with 9:1 ratio for female to male patients [15]. Usually, the non-Caucasian population has a higher incidence and prevalence of SLE than the Caucasian population. For example, several studies have shown that African Americans have around two-fold higher incidence and prevalence than European Americans in the USA [8, 9, 16, 17]. Meanwhile, patients from

4 non-Caucasian backgrounds, including African, Asian, Hispanic and Aboriginal origins will develop disease earlier and show a more progressive development [8]. Also, several lines of evidence indicate that lupus nephritis is more prevalent in patients from non-

Caucasian backgrounds than patients from Caucasian background [8, 18]. One hypothesis is that the genetic and environmental factors associated with the ethnic background may explain these differences.

Diagnostic Criteria/Criteria for Inclusion in Research

Since SLE is a very heterogeneous disease with diverse clinical manifestations, the diagnosis of SLE is very since we need to consider all the syndromes. Currently, the diagnosis is mainly based on the combination of clinical manifestations and the laboratory tests, such as the autoantibody detection test [7]. For inclusion in research studies, two criteria for the classification of SLE are used now. One is the 1997 American

College of Rheumatology (ACR) criteria specifying 11 criteria for classification of SLE, of which at least four criteria must be present in order to be classified as SLE [19]. The current sensitivity and specificity of ACR criteria is 86% and 93%, respectively [7, 20].

The other criteria is the 2012 revised Systemic Lupus International Collaborating Clinics

(SLICC) criteria, specifying 17 criteria of SLE. SLICC requires at least four of 17 criteria to be present to be classified as SLE, including at least one of the 11 clinical criteria and one of the 6 immunological criteria [19]. The new SLICC has sensitivity of 97% and specificity of 84% [7, 21]. Although these two criteria are now widely used in a variety of settings, they were originally developed for research purpose.

5

Pathoetiology of SLE

Currently, the exact etiology of SLE is still unknown. People tend to believe that the interaction of several factors including genetic factor, environment factor and hormones leads to the perturbation of the immune system and the breakdown of tolerance to self-antigens, resulting autoantibody production, immune complex deposition and multi-organ and tissue damages [22]. Both innate and adaptive immune system are involved in the disease pathogenesis. The breakdown of tolerance to self-antigens and subsequent great production of anti-nuclear autoantibodies are proposed to be the critical step in the disease development [23]. Since the production of anti-nuclear autoantibodies is common in healthy subjects, other factors must play important roles in promoting the development of SLE [7].

The current available evidence supports the involvement of both the genetic and environmental factors. The strongest evidence for the genetic factor involvement comes from lupus family studies and lupus twin studies. The sibling recurrence ratio(휆푠) , a measurement of disease family aggregation, is 5.8 to 29 [24]. Meanwhile, the monozygotic twin concordance rate (25%-70%) is consistently higher than the dizygotic twin concordance rate (2-9%) [25-27]. All these results support the importance of the genetic factor involvement. For the genetic components related with SLE, it can be investigated through genetic studies, such as linkage analysis and genome-wide association study (GWAS).

The monozygotic twin concordance rate is less than 100%, suggesting the involvement of environmental factors in the pathogenesis of SLE. Smoking has long been implicated as a potential factor. However, the data about the relationship between

6 smoking and SLE is conflicting [8]. Ultraviolet has also been shown to exacerbate

SLE development, possibly due to its effect on the cell apoptosis [28, 29]. Certain infections, such as Epstein–Barr virus (EBV) and cytomegalovirus, are also thought to trigger SLE [13, 30]. SLE patients have higher EBV viral load compared with the healthy subjects [28, 31]. Also, the antibodies targeting Epstein–Barr nuclear antigen 1 (EBNA1) of EBV can cross-react with double stranded DNA (dsDNA) [7, 32], which may induce the production of autoantibody. This suggests a potential role of EBV in the pathogenesis of SLE. Meanwhile, certain drugs are known to cause SLE possibly due to their effects on DNA [7]. One possible mechanism is through affecting DNA methylation, which can modify gene expression and may expose potential antigen for autoantibody.

One example is hydralazine, which is used as vasodilator. It can decrease the expression of DNA methyltransferase 1 (DNMT1) and DNMT3A which regulate DNA methylation [33].

Besides the genetic and environmental factors alone, gene-environment interactions may also contribute to the pathogenesis of SLE. These interactions have been found for both the smoking and ultraviolet light [28]. In one Carolina Lupus Study, it was found that women carrying null genotype of GSTM1 with over two years of sun exposure have 3-fold increased odds to develop SLE. However, there is no independent association between SLE risk and genotype of GSTM1 or sun exposure [34]. The presence of genetic and environmental factor together induces the SLE risk.

7

Role of B-Cells in SLE Pathogenesis

B cells play an important role in the pathogenesis of SLE. SLE is well known to be associated with polyclonal B-cell hyperactivity, which involves abnormal B cell activation and differentiation to memory or plasma effector cells. B cell lymphopenia and overactivity are among the most striking abnormalities encountered in SLE. Also, most murine models of SLE are associated with diffuse B-cell hyperactivity [35]. Meanwhile, B lymphocytes from patients with lupus produce an array of autoantibodies against soluble and cellular constituents, but most commonly against intranuclear antigens (ANAs), including anti- dsDNA autoantibodies, which is very important in the autoimmune reaction. Moreover, in patients with active SLE, there is a marked reduction in the numbers of naïve B cells

(CD19+CD27–) and enhanced numbers of plasma cells

(CD27highCD38+CD19dimsIglowCD20–CD138+) in the periphery, the frequency of which is correlated with the disease activity in SLE [36]. Also, the fact that B cells from active SLE patients and lupus prone mice both have an intrinsic tendency to over-react to immunologic stimulation suggests that genetic factors affect intrinsic B cell functions in the pathogenesis of SLE. In another study, Raychaudhuri et al demonstrated that B cells were the primary candidates for functional studies to investigate the role of genetic variants in SLE through a statistical approach in which he found an enrichment of gene that were expressed by B cells in loci associated with lupus using a gene-expression dataset of immune cells from the Immunological Genome Consortium [37].

8

Cytokines in SLE

Dysregulated cytokines play an important role in the pathogenesis of SLE, contributing to the tissue and organ damage. Inflammatory cytokines are well known to affect SLE, including type I and type II interferons, interleukin-1 (IL-1), IL-6, IL-17 and tumor necrosis factor-alpha (TNF-α). Other non-inflammatory cytokines such as immunomodulatory cytokines also have a role in SLE pathogenesis, such as IL-2, IL-21,

IL-10 and transforming growth factor (TGF-β) [38].

Type I Interferons

Type I interferons (IFNs) regulate the early response to viral infections, mainly IFN-

α and IFN-β. They are mainly produced by plasmacytoid dendritic cells (pDCs) [39]. SLE patients have a higher IFN-α level in their serum [40]. This increased IFN-α level is also associated with anti-dsDNA antibody production and disease activity [41]. IFN-α is known to upregulate the expression of a group of in patients, these mRNA transcripts are collectively referred as the “interferon signature” [38, 42]. For SLE patients, the immune complexes formed by autoantibody can induce pDCs to produce IFN-α [43]. IFN-α can in turn promote DC maturation, which can disrupt the peripheral tolerance [44]. Then, the escaped self-activated T cell from breakdown of tolerance can help sustain the autoreactive B cell in turn. Meanwhile, the maturated DCs can also enhance the survival of the autoreactive B cell by secreting B cell activating factor (BAFF) [38, 45]. Therefore,

IFN-α induced by immune complex can lead to a self-reinforcing circle to promote the disease progress.

9

Interferon Gamma

Interferon Gamma (IFN-γ) induces the differentiation of T cell to Th1 cell [46]. It also induces the production of IgG2a and IgG3 antibodies for B cell, which activates complement system and promotes inflammation in mice [38, 47]. Its relationship with SLE has been revealed through lupus mice models. In NZB/W F1 mice, Th cells expressing

IFN-γ are associated with disease development [48]. Administration of recombinant IFN-

γ accelerates the disease development, while treatment with antibody against IFN-γ alleviates disease [49]. Also, IFN-γ -deficient mice have reduced glomerulonephritis and lower anti-dsDNA antibody levels [50]. In the pristine-induced mice model, IFN-γ deficiency protects mice from renal disease [51]. Although IFN-γ has been shown to play an important role in lupus mice models, the relationship of IFN-γ with human SLE is still controversial with conflicting finding on the correlation of IFN-γ with severity of lupus nephritis [38].

Interleukin-6 (IL-6)

IL-6 is produced by many cell types and is known to have a variety of different biological functions [52]. IL-6 can induce the differentiation of B cells to plasma cells and the production of IgG antibodies [53]. Also, IL-6 promotes the differentiation and proliferation of T cells [54]. In MRL/lpr lupus mice, the serum level of IL-6 and IL-6R has increased with age [55]. IL-6 deficiency will delay the onset of lupus nephritis and extend survival [56]. In NZB/W F1 mice, IL-6 monoclonal antibody blockade inhibits the autoreactive B cells and autoreactive T cells [57]. IL-6R blockade can inhibit IgG antibody production and delays the development of disease as well [58]. On the other hand, administration of IL-6 to mice accelerates the glomerulonephritis development [59].

10

In SLE patients, IL-6 has an increased serum level [60]. Increased serum IL-6 level is associated with hyperresponsive B cells and autoantibody production [61]. Meanwhile,

IL-6 is involved in lupus nephritis through inducing mesangial cell proliferation [61]. Also, patients with lupus nephritis have increased IL-6 secretion in their urine [62]. IL-6 level is also increased in SLE patients with cardiopulmonary syndromes and neuropsychiatric syndromes. Meanwhile, IL-6 may play a role in the joint damage of SLE patients [38].

Interleukin-17 (IL-17)

IL-17 is produced by several T cell subsets, natural killer (NK) cells and neutrophils

[63]. The CD4+ T cell subset secreting IL-17 is termed as Th17 cells. IL-17 induces the secretion of proinflammatory cytokines including IL-6, IL-1 and IL-21 [64]. Meanwhile,

IL-17 promotes the recruitment of T cell to local inflammatory tissues. IL-17 has also been shown to drive B cell differentiation to plasma cells and autoantibody productions [65].

In MRL/lpr mice, IL-17 expression is associated with lupus disease progression

[66]. Increased IL-17 mediated tissue injuries are observed after ischemic reperfusion of the gut [67]. IL-17 producing T cells are also observed in kidneys of mice with nephritis

[68]. In BXD2 mice, the IL-17 level is higher in the serum and IL-17+ cell number is increased in the spleen, where autoreactive germinal centers are formed with IL-17R expressing B cells [69]. IL-17R inhibition or deletion can block these phenotypes [69].

In SLE patients, the IL-17 level and IL-17 producing cell number are both increased in the serum, which are correlated with disease activity [70]. IL-17 producing cells can penetrate skin, lung and kidneys, which can contribute to the organ damage [71]. IL-17 can also induce peripheral blood mononuclear cells (PBMCs) to produce anti-double

11 stranded DNA (anti-dsDNA) antibody, which may indicate that IL-17 can promote B cell activation [72].

Interleukin-21 (IL-21)

IL-21 is secreted by several CD4+ cell subsets and natural killer T (NK T) cells. IL-

21 can promote CD8+ T cell proliferation [73]. IL-21 can also drive Th17 differentiation and negatively regulate induced regulatory T (iTreg) cells [74, 75]. IL-21 is shown to be involved in the follicular helper T (Tfh) cells development and is essential for the formation of germinal centers [76]. The effect of IL-21 on the B cell is context based. With antigen present, IL-21 can activate B cell and induce the differentiation of plasma cells. Also, IL-

21 can activate the apoptosis of resting and activated B cells [77]. However, in the context of no antigen or non-specific polyclonal signal, IL-21 will deplete the autoreactive B cells.

In MRL/lpr mice, the number of Tfh cells and extrafollicular T helper cells are increased [78]. IL-21R/Fc recombinant protein blockade leads to reduced autoantibody level and alleviates the manifestation of SLE in MRL/lpr mice [79]. In BXSB.B6-Yaa+ lupus mice model, the IL-21 mRNA level is increased as well. IL-21R depletion blocks the disease development [80]. For SLE patients, the IL-21 level is high in serum. Also, some patients may have increased number of Tfh cells related to elevated IL-21 level [81].

Interleukin-2 (IL-2)

IL-2 is mainly produced by and responded by T cells [38]. IL-2 was first identified as a driver of T cell clonal expansion. Later, IL-2 was found to maintain the suppressive function and the homeostasis of regulatory T (Treg) cells and inhibit IL-17 production, which indicates its potentially important role in preventing autoimmune diseases [38]. IL-

2 deficiency is associated with reduced Treg cell levels in mice, which may lead to

12 aberrant activation of T cells and B cells [82]. Also, IL-2 deficient mice have increased levels of IL-17 in the serum and elevated IL-17 producing T cells in the peripheral lymph nodes [38]. Mice deficient in IL-2 or IL-2R have enlarged peripheral lymphoid organs

(lymphadenopathy and splenomegaly); impaired activation-induced cell death (AICD), an apoptotic process of eliminating excess effector cells; and phenotypes, which are suggestive of autoimmune disorders [83]. In lupus mouse models, the production of IL-2 becomes defective with the onset of the disease [84]. By administering IL-2 to mice, the unbalance of Treg with T effector cells can be repaired and the progress toward SLE will be delayed [38].

For SLE patients, the production of IL-2 is reduced and the number of Treg cells is decreased [85]. Consequently, the ratio of Treg to Th17 cells is reduced and is correlated with the disease severity of SLE. T cells from SLE patients are more resistant to AICD, leading to the survival and endurance of autoreactive T cells. Therefore, reduced

IL-2 level in SLE patients may lead to decreased Treg cells, defective AICD and increased

IL17 production [86].

B-Cell Activating Factor (BAFF)

BAFF is critical for B cell homeostasis. The BAFF serum level is increased in SLE patients and is positively associated with autoantibody production [13, 87]. Increased

BAFF levels may also weaken the B-cell selection, leading autoreactive B cells to persist in the periphery [13, 88]. Overexpression of BAFF in lupus mice model can accelerate the disease progression [13, 89]. Moreover, Belimumab, the anti-BAFF monoclonal antibody, is the first FDA approved drugs for SLE treatment in many decades [13].

13

Immune Cell Dysfunction in SLE

B Cells

IgG anti-dsDNA autoantibodies are a hallmark of SLE, suggesting the importance of B cells in the pathogenesis of SLE [90]. Among these autoantibodies, the anti-Smith autoantibody is highly specific for SLE. Also, anti-dsDNA autoantibody is associated with lupus nephritis [91]. These autoantibodies can activate myeloid cells through Fc receptors.

Also, they can induce antibody dependent cell cytotoxicity and induce inflammation [90].

Besides the role of B cells in autoantibodies, B cells may also have an antibody- independent role in SLE. As we known, mice with normal numbers of B cells without circulating antibodies (mIgM MRL/MP-Fas-lpr mice) can still develop autoimmunity [91].

Therefore, B cell has a role in both the antibody-dependent and antibody-independent pathways in the pathogenesis of SLE.

B cells in SLE can be divided into three subsets: autoreactive B cells, protective B cells and regulatory B cells, which maintain homeostasis [91]. Although all have autoreactive B cells, the frequency of autoreactive B cells is higher in SLE patients.

CD27high plasmablasts are increased in SLE patients, and are associated with active SLE

high disease. Also, CD27 plasmablasts have highly mutated and clonally related VH gene rearrangements. Meanwhile, CD27+IgD- memory B cells are increased in SLE patients with decreased inhibitory IgG Fc receptor-FcRIIB expression. In human, regulatory B cells are defined as CD19+CD24highCD38high B cells producing IL-10. Reduced numbers and defective functions of regulatory B cells are observed in SLE patients. Regulatory B cells from SLE patients have shown decreased IL-10 secretion and reduced suppressive capacity [91].

14

Two main mechanisms checking B cells for autoreactivity is clonal deletion and receptor editing [91]. Precise mechanisms about how B cells breach the tolerance checkpoints are still unknown. Several surface receptors are suggested to be associated with the loss of tolerance. For example, CD40L expression on SLE B cells are increased

[91]. Interaction of CD40 with CD40L is important for germinal center (GC) reaction.

Blockade of CD40-CD40L interaction inhibits the Ig secretion of peripheral B cells from lupus patients. Also, engagement of FcRIIB with B-cell receptor (BCR) is an important negative regulatory signal for B cells [91]. Defective FcRIIB expression on B cells of patients may contribute to the overeactivity in SLE patients. Mice deficient in FcRIIB will develop SLE-like disease with autoantibody production and nephritis. Abnormal B cell receptor (BCR) down-stream signaling is also observed in SLE patients, which may also contribute to the breakdown of self-tolerance. Meanwhile, decreased RAG-2 transcription may contribute to the extended survival and decreased apoptosis of B cells.

Monocytes and

Macrophages are important phagocytic cells. Macrophages clear apoptotic cells, produce cytokines, including type I IFNs, and present antigens to T cells [92].

Macrophages of SLE patients have defects in apoptotic cell clearance, which may increase the exposure of auto-antigens. Meanwhile, in the SLE inflammatory environment, monocytes and macrophages will present self-antigens to autoreactive T cells [93]. Also,

SLE macrophages have increased production of proinflammatory cytokines, including IL-

23, type I IFNs and IL-6, which may promote the loss of immune tolerance by stimulating antibody production from B cells [94]. Moreover, monocytes may infiltrate into kidneys and blood vessels, causing renal damage and atherosclerosis [13].

15

Dendritic Cells (DCs)

DCs are very important antigen presenting cells in the immune system, which process and present peptide antigen to T cells. DCs also play a role in clearance of dying cell and tolerance maintenance when antigens are not present [95]. DCs are classified into several subsets: myeloid DCs (conventional or classical DCs or mDCs), plasmacytoid

DCs (pDCs), monocyte-associated DCs and tissue DCs. mDCs express CD11c and

CD11b, not CD14 and CD16 [95]. mDCs can express a variety of pattern recognition receptors, including TLR 1-8, and stimulate CD4+ T cell differentiation [95, 96]. pDCs express CD123, CD303 and CD304, but not CD11c or CD11b. pDCs are the strong producers of type I IFNs [95]. In the autoimmune diseases, DCs are also critical for CD4+ and CD8+ T cell activation [95].

DCs are thought to contribute to SLE pathogenesis through defective dying cell clearance, leading to the accumulation of dying cells and their debris [97]. Increased levels of apoptotic cells have been found in the serum of SLE patients [95, 98]. Self-RNA and self-DNA from these apoptotic cells may induce the production of type I IFNs by pDCs

[95, 99]. Increased type I IFNs may in turn promote B cell activation and T cell survival, leading to the antibody production and clonal expansion [95, 100].

SLE patients have reduced number of conventional DCs and increased number of pDCs [13]. pDCs have enhanced ability to stimulate T cells and reduced ability to induce

Foxp3 expression in T cells when cultured with apoptotic polymorphonuclear leukocytes

(PMNs) [95]. Meanwhile, mDCs of patients promote autoreactivity instead of tolerance

[13]. mDCs are also shown to be critical for lupus development in at least one mice model

[13].

16

Neutrophils

Neutrophils are important in the control of bacterial and fungal infection [101].

Neutrophils kill pathogens via secretion of degradative enzymes (e.g., proteases, hydrolases, and nucleases) and generation of reactive species (ROS) [102, 103].

Neutrophils also produce cytokines, chemokines to promote inflammation. Neutrophils may regulate the homeostasis and function of T and B cells as well [104]. Most importantly, neutrophils are suspected to be a source of the auto-antigens that promote autoimmunity

[102]. The potential mechanisms are neutrophil degranulation, which releases granule enzymes and changes the plasma membrane properties; apoptosis, which also changes the properties of the plasma membrane; and neutrophil extracellular trap (NET) formation

[102]. The neutrophil degranulation may convert the serum proteins into auto-antigens

[105]. And neutrophil apoptosis may expose the auto-antigens, such as DNA and proteins on their cell surfaces [106]. Moreover, NET extrudes chromatin including DNA and associated histones into the extracellular environment, which may also serve as a source of auto-antigens [107].

SLE neutrophils can form NETs upon type I IFN stimulation. NETs expose large amount of DNA to extracellular environment, which stimulates pDCs to produce type I

IFNs. Type I IFNs can further promote the formation of NETs by neutrophils, thereby forming a cycle to promote SLE development [108].

In SLE patients, neutrophils have several abnormal phenotypes, including impaired phagocytosis, abnormal ROS generation and increased apoptosis [102]. Also,

SLE patients have increased defensin in their serum, suggesting dysregulated neutrophil function. The increased apoptosis of neutrophils may promote auto-antigen exposure,

17 which contributes to autoimmunity. Neutrophils can also enhance the immunoglobulin production by B cells [102].

T cells

T cells are vital for SLE pathogenesis through secreting pro-inflammatory cytokines, providing help for B cell autoantibody production and generating memory T cell to sustain the disease [109]. SLE T cells have extensive inflammatory gene expressions, which includes the IFN gene signature. T cells have many different subgroups, which can be broadly classified into pro-inflammatory and anti-inflammatory subsets. Their prevalence and functions in SLE are all altered.

CD8 T cells are responsible for controlling infection and malignancy through secreting cytotoxic proteins including perforin and granzymes. SLE CD8 T cells have reduced cytotoxic activity, leading to an increased risk of infection [110].

Double-negative (DN) T cells (CD4-CD8-) normally have an immunosuppressive function through antigen competition and T cell killing [111]. However, DN T cells of SLE patients are pro-inflammatory through producing IL-17 [112]. DN T cells in SLE can also originate from exhausted autoreactive or continuously stimulated CD8 T cells [112].

CD4 T cells are important contributors of autoantibody production and inflammation in SLE. SLE patients will have increased effector/memory T cells and skewed Th17 cell production [113]. Th17 cells are positively correlated with lupus nephritis development. Meanwhile, SLE patients have increased extrafollicular T helper cells, which are the cell sharing follicular T helper (Tfh) cell phenotype in the peripheral blood. Tfh cells can interact with B cells to produce IgG antibody in germinal centers. Also,

Tfh cells can promote the differentiation of B cells to memory B cells and plasmablasts

18 through IL-21 and IL-10 [114]. The extrafollicular T helper cells sharing similar function are also correlates with autoantibody production and plasmablasts in the peripheral blood of SLE patients [115].

T regulatory cells (Treg) are important suppressive cells in the control of immune system. CD4+ Treg cells have high expression CD25 and FoxP3, with low levels of CD127.

Their suppressive function is impaired in SLE patients. There is a dysregulation between effector T cell and CD4+ Treg cells in SLE patients. CD8+ Treg cells can also express

FoxP3 with high levels of CD25 and low levels of CD127. CD8+ Treg cells can suppress the immune reaction through cytokine production or direct contact [116]. CD8+ Treg cells may also control immune response through the inhibition of Tfh cells. It has been shown that CD8+ Treg cells from patients have impaired suppression capacity for effector T cells

[109, 117].

Toll-Like Receptors (TLRs) in SLE

Recent studies have shown that TLRs play an important role in the pathogenesis of SLE through recognition of auto-antigens [118]. TLRs are innate immune receptors recognizing pathogen-associated molecular patterns (PAMP). TLRs are expressed on a variety of immune cells including B cells, DCs, and macrophages. TLRs could either be cell surface receptors or intracellular receptors. Those belongs to cell surface receptors include TLR2 recognizing lipopeptide, TLR4 recognizing lipopolysaccharide, and TLR5 recognizing flagellin. TLRs of intracellular receptors consist of TLR3 recognizing double- stranded RNA (synthetic mimic Poly I:C), TLR7 and TLR8 recognizing single-stranded

RNA, and TLR9 recognizing unmethylated single-stranded DNA containing cytosine–

19 phosphate–guanine (CpG) motifs [119]. TLRs are thought to bind self-antigens in the autoimmune conditions, which may contribute to autoantibody production [120].

TLR2, TLR4 and TLR5

In SLE patients, TLR2 and TLR4 mRNAs are increased in PBMCs. Also, TLR5 mRNA level is associated with the IFN-α mRNA level in PBMCs [121]. TLR4 deficiency and TLR2 deficiency can decrease the autoantibody production and attenuate nephritis development in lpr mutation-induced lupus mice model [122]. TLR4 deficiency also reduces the production of pathogenic cytokines, anti-dsDNA and anti-RNP antibodies, and decreases the renal damage in the pristane-induced lupus mice model [123].

TLR3

In MRL/lpr mice, TLR3 mRNA is increased as nephritis develops. Administration of Poly I:C exacerbates lupus nephritis. However, TLR3 deficiency has no effect on the autoantibody formation, and Poly I:C administration does not promote autoantibody production, indicating TLR3 may advance the SLE disease through a B cell independent pathway [124].

TLR7, TLR9 and TLR8

Autoantibody production is the main phenotype of SLE, including anti-dsDNA and anti-Sm/RNP antibody. In SLE patients, TLR7 and TLR9 mRNAs are increased in PBMCs and are correlated with IFN-α production. Mice studies have shown that immune complexes containing RNA and DNA activate the TLR7 and TLR9 of B cells through BCR internalization, which increases antibody production [125, 126]. TLRs can also induce pDCs to produce type I IFNs, which can lead mDCs to secret BAFF and activate autoreactive B cells [118].

20

In BXSB mice, TLR7 overexpression leads to increased autoreactive B cells.

Disrupted TLR7 signaling decreases autoantibody production [127]. In pristane-induced mice, TLR7 deficient pDCs have remarkably reduction of IFN-α production [128]. In

MRL/lpr mice, TLR7 deficiency ameliorates the SLE disease phenotype [129]. For TLR9, mice studies show that it is involved in the autoantibody production. However, TLR9 deletion leads to exacerbation of disease, indicating an unanticipated protective role of

TLR9 [129]. For TLR8, the limited studies about its function in SLE preclude the understanding of its role in SLE.

Complement System in SLE

The complement system is a component of the innate immune system. This system is a collection of more than 30 proteins. These proteins can be classified into three categories: proteins participating in the complement activation; proteins participating in the control of complement activation; and cell membrane proteins cooperating with the other complement protein fragments during activation. The complement cascade may be activated through one of the three pathways: the classical, lectin and alternative pathways

[130]. The activation of complement cascade will lead to the formation of membrane- attack complex which can result a hole in the target cell and kill the cell through cell lysis.

The complement cascade can kill the microbe and clean the damaged cell, which is an important part of the host protection.

Autoantibody production is the main feature of SLE, which can promote complement activation [131]. Moreover, immune complexes formed by autoantibody can deposit in the kidney glomeruli, which will then activate the classical pathway of

21 complement and lead to tissue damage and nephritis. Most importantly, C1q deficiency virtually always leads to SLE. Deficiency in the other complement proteins may also increase SLE risk with a lesser extent. Supplementing these patients with C1q through plasma transfusion can relieve the disease pathology, suggesting that C1q has an immune inhibitory role in the context [132].

SLE patients often have autoantibody against C1q, which leads to decreased C1q level and a phenotype similar to C1q deficiency [133]. Meanwhile, C1q autoantibody can exacerbate the effect of C1q containing immune complex deposition in the kidney [134].

NETs can also activate complement system through C1q [135]. C1q will prevent the degradation of NETs and promote the clearance of NETs by macrophages [136].

Therefore, autoantibodies against complement protein may affect the opsonisation and phagocytosis of apoptotic cells, which leads to impaired apoptotic cell clearance [131].

Complement may also contribute to the negative selection of B cells. Complement deficiency may lead to less efficient B cell selection and increase the risk of developing autoreactive B cells [137]. C4 deficient mice have increased number of autoreactive B cells. C3d is found to induce the proinflammatory cytokine production of T cells, which may contribute to the FcRγ-activated T cells in SLE [138].

SLE Animal Models

Mice models of SLE are valuable tools for studying the pathogenesis of SLE and assessing the development of drugs. Mice models provide a faster system of SLE with around 50% mortality at 5-9 months [139]. Each mice model can only capture certain

22 phenotypes reminiscent of human SLE. Therefore, it is important to select the right model depending on the aim of the study.

Several spontaneous mice strains have been developed to study the immunological mechanism of SLE, including New Zealand black and white F1 (NZB/W),

Murphy Roths Large/lymphoproliferative (MRL/lpr), and BXSB mouse models. They all produce autoantibodies and develop nephritis [140]. IFN-α dependent models and Toll- like receptor 7 (TLR7) -associated strains are also important tools for deciphering the related pathways of SLE [141].

MRL/lpr Mice Model

MRL/lpr mice carry a loss-of-function mutation (lpr) within gene Fas, a protein regulating apoptosis. They are characterized by lymphoproliferation, enlarged lymph nodes (lymphadenopathy) and nephritis [142]. Around 25 to 75% of the mice will develop arthritis. Mice have high anti-dsDNA and anti-sn-RNP autoantibody production. However, there is no female predominance in MRL/lpr mice [143]. Meanwhile, the lymphadenopathy is not a common manifestation of human SLE. Besides, these mice have no IFN signature phenotype [144]. Therefore, MRL/lpr mice are not good model for the study of IFN in SLE.

NZB/W Mice Model

NZB/W mice have lymphadenopathy, splenomegaly, autoantibody production and immune complex (IC)-mediated nephritis [143]. These mice have strong female bias and

IFN signature in the spleen [145]. The main contribution from these mice has been the identification of a sle1 locus, which is partially responsible for the autoantibody production

[146]. This sle1 locus encodes genes from FcγR, SLAM and IFN-inducible receptor families.

23

BXSB and Associated Strains with TLR7 Upregulation

The BXSB mice model is a rapid-onset lupus model with more severe disease in males [143]. This male bias is the result of the presence of the Y-autoimmune accelerator

(Yaa) locus, which duplicates several genes and increases their expressions, especially

TLR7. Also, Yaa locus can expedite disease development of other models, including MRL,

NZW and NZB lupus-prone mouse models. However, Yaa locus itself cannot induce the disease [147]. Therefore, the autoimmune genetic predisposition is still required for the disease development. BXSB mice have the IFN signature. Administration of anti-IFNαR antibody is especially useful for preventing SLE at early stages of the disease for BXSB mice model [148]. Therefore, BXSB mice can be used for studying the effect of type I IFN at the early stages of SLE.

IFNα-Dependent/Driven Mice Models

The IFN signature can be induced through chemicals in the non-autoimmune predisposition mice, including pristane and other hydrocarbons [149]. The induced disease is TLR7 and IFN dependent [149]. The mice can develop arthritis, nephritis, anti- dsDNA and anti-sn-RNP autoantibodies. Also, this chemical inducing method can be used on spontaneous lupus mouse models to induce the rapid-onset and more severe

SLE phenotypes.

Identifying the Genetic Component of SLE

Currently, the exact etiology of SLE is unknown. The available evidence suggests the involvement of both genetic and environmental factors. The strongest evidence for genetic factor involvement comes from the lupus family studies and the lupus twin studies.

24

The sibling recurrence ratio (휆푠) , a measurement of the disease family aggregation, is around 5.8 to 29 [24]. Meanwhile, monozygotic twin concordance rate (25%-70%) is consistently higher than dizygotic twin concordance rate (2-9%) [25-27]. These data supports the importance of the genetic involvement. Through the genetic studies in SLE, we can obtain a better understanding of SLE pathogenesis and improve the treatment for

SLE patients. Especially when the cellular and molecular mechanisms that underlie the genetic studies are understood, the personalized medicine is anticipated to greatly improve the management of SLE patients’ life.

The contribution of genetic factors in the SLE inheritance is around 66%, measured by heritability [150]. Therefore, identification of these genetic factors is anticipated to greatly advance our understanding of the disease etiology and promote the development of treatment for SLE patients.

Several theoretical models have been proposed for the susceptibility of the disease.

Chakravarti [151] suggests that the disease is a result of mixture of rare variants and common variants. Patients could develop disease with a rare mutation at single locus that has a large effect size, or with many common variants of small to modest effects, or somewhere between these two extreme scenarios. This model fits quite well with the actual research and clinical findings. Some SLE patients will develop disease with monogenetic mutation in the mode of Mendelian inheritance [152], while other patients are more likely to develop SLE with carrying many common variants.

These common variants have relatively small risk effects, which are presumed to act together to raise the risk of disease. This feature of common varinats makes the disease inherit in a complex mode instead of Mendelian mode [153]. Also, this feature

25 makes the identification of the causal variant very difficult. Currently, several methods have been well developed to tackle the common variants problem. Linkage analysis, mapping the disease associated regions with typing makers in families using recombination, has been commonly used in the genetic study before the genome-wide association study (GWAS) era. The disadvantage of linkage analysis is that it cannot detect variants of small or modest effect with reasonable sample size, which means a low power [154]. Meanwhile, linkage analysis can only narrow the associated region down to around 10cM in genetic distance [155]. However, these limitations can be overcome by the association studies, which maps the disease associated regions through typing makers using linkage disequilibrium (LD) instead of recombination [154]. The association studies have better power to detect the common variants with reasonable sample size and can narrow the region down to as close as less than 20bp in some areas of the [155], which allows further fine mapping of the target region with dense markers to identify the actual causal genetic variants leading to the disease.

Monogenic SLE

Monogenic SLE is the SLE caused by single gene mutation, which usually has strong effect size. The inheritance of monogenic lupus often fits Mendelian model. Study of the monogenic lupus provides important insights about the pathogenesis of lupus.

Current identified monogenic mutations causing lupus are reviewed in Table 1, modified from Lo, M.S et al [152], The affected pathways are complement system, apoptosis, immune clearance, nucleic acid sensing, DNA damage, type I IFN, tolerance, Ras/MAPK signaling and collagen degradation.

26

Table 1 Monogenic conditions causing SLE-like disease[152]

Gene Protein Inheritance Pathway Phenotype Comments C1QA, C1QB, C1q AR Complement/ SLE with rashes, also GN, C1QC Immune Clearance CNS; anti-dsDNA often negative; recurrent infections C1R C1r AR Complement/ SLE with rashes, also GN; Immune Clearance recurrent infections C1S C1s AR Complement/ SLE with rashes, also GN; Immune Clearance recurrent infections C2 C2 AR Complement/ SLE with rashes, arthritis; Immune Clearance autoantibodies often negative; recurrent infections C4A, C4B C4 AR Complement/ SLE with variable Immune Clearance phenotype; rashes, proliferative GN; recurrent infections TNFRSF6 Fas AD Apoptosis Autoimmune lymphoproliferative syndrome (ALPS); autoimmune cytopenias FASLG FasL AD Apoptosis ALPS; SLE with lymphadenopathy DNASE1 DNase1 AD Immune Clearance SLE; high titers of anti- nucleosomal antigens DNASE1L3 DNase1L3 AR Immune Clearance Anti-dsDNA antibodies, low C3/C4; GN and positive ANCA also common; very young age of onset TREX1 TREX1 AR/AD Nucleic Acid AGS; FCL; SLE with Sensing/DNA severe CNS disease Damage/Type I IFN TMEM173 STING AD Nucleic Acid SAVI; FCL Sensing/DNA Damage/Type I IFN SAMHD1 SAMHD1 AR Nucleic Acid AGS; SLE Sensing/DNA Damage/Type I IFN ADAR1 ADAR1 AR/AD Nucleic Acid AGS Sensing/DNA Damage/Type I IFN IFIH1 IFIH1, AD Nucleic Acid AGS; Singleton-Merton MDA5 Sensing/DNA syndrome Damage/Type I IFN

27

RNASEH2A, RNaseH2 AR Nucleic Acid AGS RNASEH2B, complex Sensing/DNA RNASEH2C Damage/Type I IFN NEIL3 NEIL3 AR Nucleic Acid Recurrent infections; Sensing/DNA autoimmune cytopenias Damage ACP5 TRAP AR Type I IFN Spondyloenchondrodyspl asia; autoimmune cytopenias; recurrent infections PRKCD PKCδ AR Apoptosis/ SLE Tolerance RAG2 RAG2 AR/AD Tolerance Leaky SCID; Omenn syndrome; autoimmune cytopenias; granulomas SHOC2 SHOC2 AD Ras/MARK Noonan syndrome; SLE with polyarthritis KRAS K-Ras AD Ras/MARK Noonan syndrome; SLE PEPD PEPD AR Collagen Prolidase deficiency; degradation severe leg ulcers; SLE AR autosomal recessive, AD autosomal dominant, GN glomerulonephritis, CNS central nervous system disease, AGS Aicardi-Goutieres syndrome, FCL familial chilblain lupus, , SAVI stimulator of IFN genes (STING)-associated vasculopathy with onset in infancy, SCID severe combined immunodeficiency [152].

Linkage Studies Undertaken in SLE

Linkage analysis is a tradition way to identify the location of gene causing disease.

It is designed based on the genetic recombination structure obtained from a family-based pedigree tree [154]. The marker used in the linkage study are simple tandem repeats or microsatellites flanking by unique sequences. The resolution of linkage analysis is around

10cM given human recombination structure [153]. The aim of linkage analysis is to identify chromosome regions that are inherited by affected children more often than expected from Mendelian random segregation [153]. Since SLE is a complex disease, the appropriate method should be nonparametric linkage analysis based on allele- or identity- by-descent (IBD)-sharing. The two most common strategies used in the design are

28 affected sib-pair (ASP) design and the affected relative design [154]. The ASP design method requires families with two or more affected siblings to calculate the linkage using the likelihood-ratio method. The affected relative design method requires extend families with two or more affected relatives (not just parent-child only) to calculate the linkage using nonparametric linkage score. In both designs, only the affected siblings or relatives are used in the analysis [154].

Six loci have been suggested to contain the SLE susceptibility genes through linkage analysis [150]. They are 1q23-24, 1q41-42, 2q37, 4p16-15, 6p11-22, and 16q12-

13 [150]. Now linkage analysis is no longer commonly employed due to the development of Genome-Wide Association Study (GWAS), which has a resolution about every 60 ~

200kb [156]. 1q23-24 contains candidate gene FCGR2A, which is also identified by

GWAS. 1q41-42 contains candidate gene PARP. 2q37 region seems to be

Icelandic/Swedish ancestry unique. 4p16-15 contains candidate genes CD38, BST1,

ZNF36, ZNF134 and ZNF136. 6p11-22 contains the MHC region and candidate genes

DR, DQ, C2, C4, TNFA, which are also identified by GWAS. 16q12-13 contains candidate gene NOD2, which is also identified by GWAS [150].

Coding Variants Associated with SLE

Before GWAS era, some candidate genes are tested by association study using case-control design. These genes are MHC loci, TNF, complement, mannose-binding lectin (MBL), Fcγ receptors, IL-10, and the angiotensin-converting (ACE) [150].

The identified coding variants are missense mutation in C1q, TNFR2, FCGR2A, FCGR3A,

DRB, DQA, MBL; deletion in C2 and C4; and promoter mutation in IL10 and TNFA [150].

29

Genome-Wide Association Study (GWAS) Undertaken in SLE

Based on the paper of Risch and Merikangas, the association study is better for detecting common variants of small to modest effect size compared with linkage analysis

[157]. The idea of association study is to test the correlation between the genetic variants and disease phenotype [154]. Meanwhile, based on the common disease/common variant (CDCV) hypothesis, Becker proposed that complex disease may be caused by the combination of common disease-specific alleles and non-disease-specific alleles together with environmental factors [158]. Therefore, it is reasonable to detect the disease related common variants through genotyping a dense set of genetic variants, usually

SNPs, across genome and performing large scale association tests, which is GWAS.

GWAS employs linkage disequilibrium (LD) structure, the non-random association for a set of alleles in the population, to select a set of TagSNPs to represent all the variants in mutual high LD region. On average, LD region breaks around every 60 ~ 200kb [156].

In order to capture the whole genome, LD patterns of the whole genome need first to be known. By year 2005, with the completion of LD structure from HapMap project, it is possible to select TagSNPs for the whole genome to perform association analysis.

The two most common designs for GWAS are population based studies, which compare affected cases to unaffected controls for the distribution of SNPs, and family based studies, which use transmission disequilibrium test (TDT) to measure the transmission of SNP from parents to children [154]. There is no remarkable difference between these two designs for the performance [154]. Population based studies are easy to collect samples but will suffer the problem of population stratification. However, family based studies do not suffer the population stratification problem but are hard to collect

30 enough samples, often having lower power. The problem of population stratification can also be adjusted using ancestry-informative-markers (AIMs), which have different frequencies for different ancestries, and genomic control, a method for correcting the statistic inflation caused by population sub-structure through adjusting the chi-square statistical significance threshold [156].

Currently, at least nine SLE GWASs have been done with some of the studies having at least one locus reaching genome-wide significance (p < 5 X 10-8). The first two

GWASs are performed in European populations in 2008 [159, 160]. The most associated region is the HLA region, which is usually found in the GWAS. In one study, the non-HLA loci identified in the study are ITGAM, PXK, KIAA1542 (later recognized as IRF7). Also, supporting evidence for FCGR2A, IRF5/TNPO3, PTPN22 and STAT4 are found. Several loci with suggestive support for association are detected including NMNAT2, ATG5, ICA1,

XKR6, C8orf12, BLK, LYN, UBE2L3 and SCUBE1. Meanwhile, the other study identified the non-HLA loci ITGAX-ITGAM locus and the C8orf13-BLK locus, and provided supporting evidence for IRF5 and STAT4.

Later, there are two Chinese GWASs and one Korean GWAS performed for Asians

[161-163]. The first Chinese GWAS identified 9 novel susceptibility loci for SLE, including

ETS1, IKZF1, RASGRP3, SLC15A4, TNIP1, and loci at 7q11.23, 10q11.22, 11q23.3,

16p11.2. The second Chinese GWAS identified ETS1 and WDFY4 with several European loci replicated, including BANK1, STAT4, BLK, TNFAIP3, TNFSF4, IRF5 and the HLA.

For the Korean GWAS, the new loci are the intergenic SNP between FCHSD2 and P2RY2, and ATG16L2. The GWASs for the other ancestries are still undergoing now. In total,

31 there are 84 risk loci having been identified to be associated with lupus [164]. The details of the association result are reviewed in [165].

Gene Specific Targeted Studies in SLE

What GWAS really identifies is that the high LD region represented by the associated TagSNP is associated with the disease. In order to find the real causal variant, the one that leads to disease, gene specific targeted studies are performed, which is also called fine-mapping studies. This kind of study mainly employs custom genotyping array that provides data with a much denser variants than array used in GWAS to dissect the associated LD region through association test. With direct genotyping and imputation

[166], the genetic variants within the LD region are captured to the extent. Then, the associated genetic variants are ranked for functional testing. After that, the functional genetic variants are tested for plausible causality [167].

So far, only a few gene associations have been followed by functional studies.

They have been reviewed and summarized in [165]. The major investigated loci shared by all populations are IRF5, STAT4, BANK1, ITGAM and the HLA region.

Statistical Methods to Identify Lupus Risk Variants and Expression

Quantitative Trait Loci (eQTLs)

Common Variants Analysis

Common variants are typically defined as genetic variants with a minor allele frequency (MAF) > 5% in the population. For most GWASs, the case-control study design is employed. Currently, there are two main methods used for analyzing the genetic association with disease. This two methods are frequentist approach and Bayesian

32 approach. Although these two methods are based on different assumptions, they usually arrive at the same conclusion for the most correlated genetic variants [168].

Frequentist Approach

Frequentist approach makes inference about unknowns based on measures of performance under imaginary repetitions of the procedure that is used to make the inference. It assesses evidence for a genetic association between genetic variants and a phenotype of interest through quantifying the hypothetical frequencies of the observed data under the null hypothesis (H0) of no association. The p-value measures the probability that the statistical summary of the data is equal or more extreme than the observed statistical summary under null hypothesis. Therefore, the lower the p-value, the more likely the association exists [168, 169].

The genetic association with a binary trait could either be done through chi- squared test or logistic regression. Although both tests can generate a p-value, logistic regression can be used to adjust the covariate effect, such as population stratification.

Therefore, the logistic regression will be illustrated below.

Let us define 푦𝑖(푖 = 1, … , 푛) to be the disease status for 푛 samples with 푦𝑖 = 1 indicating cases and 푦𝑖 = 0 indicating controls. Then the logistic regression model for given genetic information (퐺𝑖) can be shown as

퐿 푃푟(푦𝑖 = 1|퐺𝑖) log = 훽0 + 푥𝑖훽1 + ∑ 푥𝑖푙훾푙 푃푟(푦𝑖 = 0|퐺𝑖) 푙=1 where 푥𝑖 denotes the number of risk allele A. 푥𝑖 can take (0,1,2), (0,0,1) or (0,1,1) for genotype (BB,BA,AA) for additive, recessive or dominant effect model [154]. The exp⁡(훽1)

33 can be regarded as the odds ratio for the disease comparing those who carry the risk allele to those who do not carry the risk allele. 훾푙(푙 = 1, … , 퐿) is the covariate effects.

Bayesian Approach

Bayesian approach makes inference about an unknown parameter or a hypothesis based on the observed data and prior knowledge. This approach quantifies the evidence for two competitive hypotheses. It uses Bayes’s theorem to compute the posterior probability of the hypotheses. One advantage of the Bayesian approach compared with frequentist approach is that prior knowledge about the genetic variants can be incorporated, which can be represented by prior probability 휋 [170]. The prior knowledge can be any kind of biological evidence, such as the presence of functional genomic element and the proximity to the disease-relevant gene.

Calculation of the posterior probability of the hypotheses given data can be split into three steps. First, choose a value 휋 for the prior probability of 퐻1, the alternative hypothesis. Then, calculate the Bayes factor (BF), which is the ratio of probability of data under 퐻1 and 퐻0. Finally, calculate the posterior probability. The formula is shown as below 휋 푃푂 = 퐵퐹 ∗ 1 − 휋 where PO is posterior odds of 퐻1.

푃푂 푃푃퐴 = 1 + 푃푂 where PPA is the posterior probability of association.

Although PPA combines BF, the observation association data, with 휋, the prior probability, BF is often used as the primary summary for association evidence because

34

휋 is very small for most of the time and PPA can be calculated for any given 휋. Therefore,

BF calculation is very important. The calculation will be illustrated below.

Let 휃ℎ푒푡 denote the log of OR between the heterozygote and the common homozygote. Let 휃ℎ표푚 denotes the log of OR between the risk homozygote and the common homozygote. The null hypothesis is:

퐻0: 휃ℎ푒푡 = 휃hom = 0

The alternative hypothesis is at least one of the 휃ℎ푒푡 and 휃ℎ표푚 is not zero:

퐻1: 휃ℎ푒푡 = 푡1, 휃hom = 푡2 where 푡1, 푡2 are known. Then:

푃(푑푎푡푎|휃 = 푡 , 휃 = 푡 ) 퐵퐹 = ℎ푒푡 1 hom 2 푃(푑푎푡푎|휃ℎ푒푡 = 0, 휃hom = 0)

However, 휃ℎ푒푡 and 휃hom are unknown under 퐻1 in real situations. Therefore, the integration over all the possible values for 푡1 and 푡2 are used to calculate the numerator, which needs an optimal genetic model. The genetic models can be used are usually strictly additive model, strictly dominant or recessive models, general models centered on additivity, general models not centered on additivity [168].

Rare Variants Analysis

Rare variants are defined as variants with minor allele frequency (MAF) <1% in the population. Rare variant analysis becomes possible with the development of next generation sequencing (NGS) techniques. Since rare variant analysis usually groups genetic variants by gene, these tests are called gene-based test and use α = 2.5 × 10−6 as a significance threshold in a genome-wide association based on the assumption of

35 around 20,000 genes in the human genome [171]. Many methods have been developed for rare variants analysis. They are mainly classified into two groups: burden tests and variance-component tests.

Burden Tests

Burden tests are methods to collapse all the information of rare variants into genetic scores. These tests work well for the rare case when most variants are causal and influence the phenotype in the same direction. However, these tests are not very useful for the case when only a small proportion of the variants are causal and the variants work in the different directions [172]. The methods developed based on burden tests are

ARIEL test (accumulation of rare variants integrated and extended locus-specific), CAST

(cohort allelic sums test), CMC method (combined multivariate and collapsing method),

MZ test (Morris and Zeggini test) and WSS (weighted-sum statistic) [172].

Adaptive Burden Tests

Compared with burden tests, adaptive burden tests use data-adaptive weights or thresholds. The advantage is that it is more robust than burden tests. But they are usually computationally expensive. The methods developed based on adaptive burden tests are aSum (data-adaptive sum test), Step-up, EREC test (estimated regression coefficient test), VT (variable threshold), KBAC method (kernel-based adaptive cluster),

RBT (replication-based test) [172].

Variance-component Tests

Variance-component tests are based on the variance of genetic effects. They are useful when only a small proportion of variants are causal and the variants work in different directions. However, they are not powerful when most variants are causal and

36 all the variants work in the same direction, the situation for which burden tests are preferred. The methods developed based on variance-component tests are SKAT

(sequence kernel association test), SSU test (sum of squared score), C-alpha test [172].

Combined Tests

Combined tests combine burden tests and variance-component tests together.

This approach is very powerful for a general situation, but can be a little less powerful when the genetic mechanism underlying the phenotype fits burden test or variance- component test better. The methods developed based on combined tests are SKAT-O,

Fisher method and MiST (mixed-effects score test for continuous outcomes) [172].

EC Test

The EC (exponential combination) test is a method to exponentially combine genetic score statistics derived from genetic variants. It is very powerful when only a small fraction of variants are causal. However, this method is very computationally intensive and is not very robust for the cases when medium or high fraction of variants are causal

[172].

Expression Quantitative Trait Loci (eQTLs)

eQTLs are defined as genomic regions containing genetic variants that are associated with the expression of one or more genes. To identify eQTLs, genome-wide genotyping data is analyzed together with gene expression profiles from specific tissues or cells [165]. eQTLs have the potential to greatly advance our understanding of the disease etiology through identifying potential functional variants.

37

Most eQTL identifications employ the traditional QTL mapping techniques through considering each gene expression profile as the trait and repeating the process for all genes in the genome [154].

The simplest model for eQTL mapping is accomplished through a linear regression approach, as illustrated below.

푦 = 푎 + 푏푥 + 휖 where 푦 is the gene expression trait, which is a continuous variable. 푥 is the additive genetic factor, which can be represented by the number of risk allele in the sample. 푎 is a constant. ⁡푏 is the effect of the additive genetic factor, 휖 is the random error of the model, which is usually assumed to be a standard normal distribution with 휖~⁡푁(0,1).

Statistical Power Considerations for Genetic Analysis

Multiple Testing Correction

Although GWAS is a very powerful tool scanning the whole genome for disease association, a big problem in the interpretation of the association result is to determine the significance level that well eliminates Type I statistical error. Testing association for a genome-wide dataset creates a great burden of multiple testing. With multiple testing, the expected number of false positive association will increase [168]. Therefore, we need a more stringent threshold for claiming significance.

A commonly used approach for controlling multiple testing problem is the

Bonferroni correction. In order to get the overall significance (훼), Bonferroni correction adjusts p-value to be 푝 − 푣푎푙푢푒 < 훼/푘 , where 푘 is the number of independent tests performed [168]. From the LD structure from human HapMap data, it has been estimated

38 that there are approximately a million independent tests genome-wide for Europeans and two million independent tests genome-wide for Africans [173]. Therefore, the genome- wide significant level for European GWAS is p-value less than 5x10-8. This level has then become a standard since many of the genetic variant association reaching this level are robust with replication.

Power Analysis

The power of a statistical test is the probability that the null hypothesis is rejected under the imaginary repetitions of the study. The usually recommended power is at least

80%. GWAS is only feasible when the test has sufficient power to find the association with reasonable sample size. The number of sample size required to reach sufficient power can be affected by minor allele frequency (MAF) and effect size [156]. The power calculation for a GWAS is illustrated below.

Assume we have N SNPs, let Ci represent SNP i being causative, and Di represents SNP i being detected. Assume that one of these SNPs is the causative SNP, but it is unknown which one is the causative SNP. Then the overall power can be calculated as

Pr(퐶𝑖, 퐷𝑖) = Pr(퐷𝑖| 퐶𝑖) Pr(퐶𝑖) = 푃𝑖 Pr(퐶𝑖)

푃표푤푒푟 = ⁡ ∑ 푃𝑖Pr⁡(퐶𝑖) 𝑖=1 where Pi is the power calculated for SNP i.

Table 2, modified from Hong, E.P. et al [174], shows the sample size required to get 80% power for different number of genetic variants. For a GWAS, around 4000 cases are required to detect the association with MAF of 0.05 and OR of 1.5, which is usually a

39 reasonable sample size for most diseases. Moreover, with the accumulation of cases, some GWASs with more than 10000 cases have been performed, which can help us identify genetic variants with smaller OR.

Table 2 Sample sizes with 80% power in case-control study [174]

ORA No. of Genetic Variants 1.3 1.5 2 2.5 Single p < 0.05 1974 789 248 134 1M p < 5 x 10-8 9962 3983 1255 680 Assumptions: 5% minor allele frequency, 5% disease prevalence, complete linkage disequilibrium (D’=1), and 5% type I error rates for allelic test. ORA, odds ratio of heterozygotes under an additive model; CC, case-control study [174].

Gene Regulation Introduction

Although the human cell carries the same genomic DNA, the function and structure of different cell types can differ dramatically. This is because different cell types will use different sets and amounts of RNA and protein through the regulation of gene expression

[175]. We know that the genetic information flows from DNA to RNA to Protein, any step of which may be regulated in the process. These gene regulations can be classified into the following categories: transcriptional control, which controls when and how the gene is transcribed; RNA processing control, which controls the splicing and processing of RNA;

RNA transportation and localization control, which controls the transportation of RNA from nucleus to cytosol; translational control, which controls the translation of the mRNA to protein; mRNA degradation control, which controls the stability of mRNA; protein activity control, which controls the activity of protein, including activation, deactivation, degradation and localization [175]. All these steps are finely tuned in the cell to sustain

40 the health. If any step is affected, disease phenotype may appear. We will briefly present the steps which GWAS associated genetic variants may affect.

90% of the genetic variants identified in SLE GWAS are located in the regulatory regions of the genome [176], which may affect the following potential mechanisms: transcription factor binding, enhancer activity, post-translational histone modifications, long-range enhancer–promoter interactions, RNA polymerase recruitment [177], non- coding RNAs, the binding of RNA binding protein (RBP), and the processes affected by

RBP, such as RNA splicing. Any of these changes may disrupt the gene regulation, the downstream traits, which affects the likelihood of a disease phenotype.

Transcription Factors

Transcription factors (TFs) are regulatory proteins that control the rate of transcription through binding to specific DNA sequences. They often bind to DNA through

DNA-binding domains with up to 106-fold higher affinity for their target sequences than for the remainder of the DNA [178]. Their binding sequences are highly conserved and can be classified into different families, such as the STAT family. Also, TFs can be classified with their three-dimensional protein structure, including basic helix-turn-helix, helix-loop- helix, and proteins [178]. Meanwhile, TFs need to work cooperatively to control the gene expression through coordinated interaction of multiple proteins.

For eukaryotic cells, including human cells, gene expression is controlled by the promoter and other regulatory regions. The promoter region is where the general transcription factors and polymerases bind and start transcription. The other regulatory regions include all the genomic places where transcription factors bind and together with chromatin structure control the rate of transcription. These regions include enhancers,

41 repressors, barriers and insulators. For controlling the accessibility of these regulatory regions, histone modifying enzymes play a critical role in the regulation of chromatin structure and accessibility to transcription factors and transcriptional apparatus [179].

An enhancer is a region of DNA that leads to increased rate of transcription through interaction with promoter. It is known that transcription factor plays a role in this process.

Among all the regulatory regions, the enhancer is the one having the most functional diversity in the tissues. Studies have shown enhancers may be tens of thousands , even megabse away from its regulated promoter [175]. Although several models have been proposed for the enhancer-promoter interactions, the most accepted one is the looping model, for which the enhancer and promoter interact through looping out the intervening region. This enhancer-promoter will form a discrete region in the genome with high interaction frequency, defined as topological associated domains

(TADs), which are very important in the control of gene expression [179].

A repressor is a region of DNA that inhibits transcription. Insulator is where the binding of transcription factor can block the interaction of enhancer and promoter, the most common transcription factor for insulator is CTCF. Barrier is where the sequence stops heterochromatin spreading and maintains the border between euchromatin and heterochromatin regions [179]. All these regions are important in the regulation of transcription and require transcription factor binding to perform the functions required for life.

RNA Binding Proteins (RBPs)

RNA binding proteins (RBPs) form ribonucleoprotein (RNP) complexes to regulate the structure and function of RNAs, including the biogenesis, stability, transportation and

42 cellular localization [180]. RBPs contain one or multiple RNA-binding domains to interact with RNA, including RNA recognition motif (RRM), K-homology (KH) domain (type I and type II), RGG (Arg-Gly-Gly) box, Sm domain, DEAD/DEAH box, zinc finger, double- stranded RNA binding domain, cold-shock domain, Pumilio/FBF domain and the Piwi/

Argonaute/Zwille domain [180].

RBPs can play a role in every aspect of mRNA biology, including transcription, pre- mRNA splicing, RNA editing, polyadenylation of messenger RNA, transportation, localization, translation and turnover [180]. More than 74% of human genes have multiple mRNAs through alternative splicing, which can be controlled by RBPs. For example, Nova proteins containing KH domains control the alternative splicing of pre-mRNA through recognizing YCAY elements. Moreover, RNA editing is the most frequent RNA modification, adding the complexity of gene products. One example is the conversion of adenosine (A) to inosine (I), catalyzed by ADAR proteins. Polyadenylation is a process of adding 3’ poly (A) tails often of 200 nucleotides, which has a strong effect on its nuclear transport, translation efficiency and stability. CPSF is the key protein involved in this polyadenylation process. mRNA export is the process of transporting translation-ready mRNA from nucleus to cytoplasm. It involves three steps, from the generation of cargo- carrier complex in the nucleus, to transportation of the complex through nuclear pore, and finally the release of the cargo in the cytoplasm and the recycling of the carrier. The

TAP/NXF1:p15 heterodimer has a key role in this process. mRNA localization is another way of spatially regulating protein production. For example, in S. cerevisiae, ASH1 mRNA is located to the daughter cell through She2 and She3 proteins. Translation can be a rapid process of controlling gene expression. ZBP1 can repress the beta- translation by

43 blocking translation initiation. mRNA turnover can regulate the stability of mRNA. For example, ELAV/Hu proteins stabilize many AU-rich mRNAs in neurons.

Non-Coding RNAs

Over the past decades, non-coding RNAs have been appreciated as an important source of gene regulation. Non-coding RNAs include microRNAs (miRNAs), promoter- associated RNAs, short interfering RNAs, piwi-interacting RNAs, small nuclear RNAs, natural antisense RNAs and long non-coding RNAs [181]. Non-coding RNAs could either be sense or anti-sense in direction, which are originated from intergenic or intronic regions

[181]. They play an important role in the transcriptional and translational control of gene expression, increasing genomic complexity.

Among non-coding RNAs, miRNAs are the most extensively investigated ones. miRNAs are a class of approximately 22nt short non-coding RNAs regulating gene expression through the sequence specific recognition of target mRNAs. About 3-4% of human genome codes miRNA [182]. miRNAs are transcribed by either RNApol III or

RNApol II depending on their specific genomic location. The primary transcript (pri-mRNA) can generate one or more hairpin-like precursor miRNAs (pre-miRNAs) through Drosha processing. Pre-miRNAs are then transported to cytoplasm through Exportin 5. Pre- miRNAs are then processed to 22nt duplex through Dicer in the cytoplasm. This duplex is then loaded into Ago protein to form miRNA-induced silencing complex (miRISC) to perform function [183]. Mostly, miRNA affects gene expression through targeting 3’UTR region of the mRNAs. However, there are some reports about targeting the 5’UTR and coding region of the mRNAs [182].

44

Mostly, miRNAs mediate the gene downregulation through the match based on miRNA seed region with target mRNAs. miRNAs can regulate gene silencing through three mechanisms: pretranslational, cotranslational, posttranslational. On the other hand, miRNA has also been reported to mediate gene upregulation. It may form a complex to perform this function or work through affecting the repressive miRNAs [182].

Alternative Pre-mRNA Splicing

A feature of eukaryotic genes is the existence of introns, which separate the exon protein coding element. Then, the precursor mRNAs (pre-mRNAs) must be spliced in order to generate the fully mature transcript to a functional protein. During this splicing process, one gene can generate multiple splicing isoforms, which is defined as alternative splicing. It has been estimated nearly all the human genes have at least 2 mRNA isoforms [184], which greatly increases the human proteome complexity. The RNA splicing process has two steps to remove the intron, including RNA cleavage and ligation.

This process requires the formation of splicesome, which contains five ribonucleoprotein complexes, U1,, U4/U6 and U5. Alternative splicing process requires the interaction of cis-regulatory sequences and RNA splicing protein factors. These cis-regulatory sequences could be either exonic or intronic, acting as positive splicing enhancers and negative splicing silencers. The mechanisms of alternative splicing including RNA-protein interaction between cis-regulatory sequences and splicing factor, RNA-RNA base pair interactions or chromatin-based mechanism changing splicing patterns [184]. Errors in the mRNA splicing can lead to disease. One example is β-globin thalassemia mutations caused by mutations in the SMN-2 gene, which affects splice sites and leads to aberrant

45 splicing patterns. The splicing factor hnRNPA1, which binds to the regulatory site in the

SMN-2 transcript, is very important in this process.

ETS1 Function and Its Association with SLE

ETS1 (v-ets avian erythroblastosis virus E26 oncogene homolog 1) is a founding member of the ETS transcription factor family, defined by the presence of a conserved

ETS DNA-binding domain [185]. ETS1 is high conserved in human and mouse with similar length and only 12 amino acid difference. There are several isoforms existing for

ETS1. The major isoform is referred as p51 or p54, which is 440 amino acids long for mouse and 441 amino acids long for human. The second isoform is derived through alternative splicing of exon 7 to produce 353 amino acids long protein in mouse and 354 amino acids long protein in human, referred as p42. It represents about 10% of ETS1 expression in lymphocytes. Meanwhile, p42 is suggested to have better DNA binding than p51/p54 due to the splicing out of auto-inhibitory domain. Besides these two isoforms, there are some other minor isoforms, which doesn’t have much expression [186].

Although the expression patterns of different isoforms may be not the same, the ETS1 expression is measured using probes or antibodies recognizing all the isoforms for most studies [186].

ETS1 binds to DNA as a monomer [187]. It mainly contains four domains. They are ETS domain, auto-inhibitory domain, pointed domain and acidic domain. ETS domain is an around 85 amino acids long, including a winged-helix-turn-helix region required for

DNA binding [188]. The auto-inhibitory domain contains two alpha helices, which modulates the function of ETS domain. The phosphorylation of serines in the auto-

46 inhibitory region can prevent ETS1 binding [189]. The effect of auto-inhibitory region can be abolished through interaction with other transcription factors, such as Runx1, Pax5

[190]. Also, ETS1 can interact with other ETS1 to abrogate inhibitory effect if they are in the right orientation and position [191]. The pointed domain is a protein-protein interaction region, which is around 80 amino acid long. It can be served as the docking site for ERK2 in the protein interaction [192].

Both genetic studies and biological studies showed the region around ETS1 is associated with the susceptibility to develop SLE. Moreover, Ets1-deficient (Ets1p/p) mice develop a lupus-like disease characterized by high titers of IgM and IgG autoantibody and immune complex deposition in the kidney [193]. In conclusion, both genetic and biological studies in mice and humans suggest that reduced or depleted ETS1 expression is associated with susceptibility to SLE.

Function of ETS1 in the Immune System

ETS1 is highly expressed in B cells, T cells, NK cells and NK T cells for the immune system [186], indicating its potential role in regulating differentiation and function of these cells. The effects of ETS1 in these cells are illustrated below.

ETS1 in B Cells

Ets1 is highly expressed in resting B cells, which are naïve B cells and memory B cells, but lowly expressed in activated B cells. Meanwhile, the downregulation of Ets1 is necessary for the terminal differentiation of B cells [186]. In a Ets1-/-Rag2-/- chimaeric mice with mixed C57BL/6 and 129Sv background, in which all cells are Ets1+/+-Rag2-/- except lymphocytes which are Ets1-/--Rag2-/-. The most obvious phenotype for this mice is the increased number of IgM and IgG secreting plasma cells in the peripheral and bone

47 marrow due to enhanced differentiation [194]. Meanwhile, IgM, IgG1 and IgE levels in the serum are increased with IgG2a level decreased [194, 195]. What is more, the production of IgM and IgG autoantibody, against dsDNA, histones, cardiolipin, MBP and IgG, is increased as well, leading to the immune complex deposition in the kidney [193]. Both B cell intrinsic and extrinsic mechanisms may play a role in these alterations. Through adoptively transferring the Ets1-/- splenic B cells into non-irradiated wild-type hosts, the activated phenotype and increased terminal differentiation of B cells can be reversed, suggesting an B cell extrinsic factor [194]. For the extrinsic factor, this may be associated with decreased number and function of Tregs in Ets1 deficient mice [196]. Meanwhile, there are B cell intrinsic factors. This was shown by the increased differentiation to IgM plasma cell of Ets1-/- splenic B cells through TLR9 stimulation with CpG ODN in vitro [194].

For the intrinsic mechanism, this may be partially explained by the physical interaction of

Ets1 with Blimp1, a master regulator of plasma B cell differentiation via its ability to suppress expression of genes involved in the mature B cell program. The Ets1 and Blimp1 interaction was shown through co-immunoprecipitation (Co-IP). Ets1 can block Blimp1

DNA binding activity and reduce the ability of Blimp1 to suppress its target genes without affecting Blimp1 protein levels [197]. Blimp1 can repress target genes such as the B cell specific promoter PIII of the major histocompatibility complex class II transactivator CIITA, the pax-5 promoter and the c- promoter [197]. Ets1 can also induce the expression of Blimp1 suppressed genes by binding to their promoter [197]. This model hypothesized that in stimulated B cells Ets1 physically interacts with Blimp1 to inhibit Blimp1 DNA binding and thereby blocks repression of key Blimp1 targets, thus preventing premature

B cell differentiation [193].

48

In the bone marrow of whole body Ets1-/- mice, Ets1 deficient B cells have partial defects in transitioning from pro-B cell to pre-B cell stage. In Ets1-/- mice, the IgM–B220+ bone marrow cells had 45% CD43+ pro-B cells compared with 20% in the control mice, indicating defects in transitioning from pro-B cell to pre-B cell [198]. In the spleen of whole body Ets1-/- mice, there are no marginal zone B cells and reduced number of transitional type 2 B cells [198]. In the peripheral lymphoid organs of whole body Ets1-/- mice, the follicular B cells have higher level of CD23, CD80 and CD86, indicating an activation phenotype [198].

ETS1 in T cells

Ets1 expression is high in resting T cells and gets downregulated upon T cell activation after crosslinking TCR-CD3 complex [186, 199]. For T cells, Ets1 deficient T cells have abnormal thymic differentiation, reduced T cell numbers in the periphery, reduced IL-2 production and reduced Th1, Th2 cytokines production [186]. In the whole body Ets1-/- mice with C57BL/6 background, Ets1 deficiency severely impaired the development of immature CD4-CD8-CD25+CD44- (double-negative (DN3)) thymocytes to

CD4+CD8+ (double-positive (DP)) cells. Meanwhile, CD4-CD8-CD25+CD44- (DN4) cells have increased susceptibility to cell death, which is specific to β lineage. Also, Ets1 deficiency leads to inefficient allelic exclusion at the TCRβ locus since around 10% cell have more than one TCRβ chain. All these affected processes are controlled by the pre-

TCR signaling. It appears that Ets1 has an effect on the pre-TCR signaling [200].

Besides the defects in the thymic development, Ets1 deficiency affects the peripheral T cells as well. In Ets1 deficient mice, Th1, Th2 and Treg developments are impaired, while Th17 cells are increased [201]. In the whole body Ets1-/- mice with mixed

49

C57BL/6 and 129SV background, Ets1 deficiency leads Th1 cells to decrease IFN production and increase Th2 cytokines production, such as IL-10 and IL-24. This may be due to the effect of Ets1 on Tet for controlling IFN production through affecting IFN promoter and the effect of Ets1 on HDAC enzyme binding to the IL10 regulatory region, which represses the Th2 specific gene expression, shown by luciferase assay and chromatin immunoprecipitation (ChIP), [202, 203]. The defects of Treg development is likely due to the effects of Ets1 in Foxp3 expression, which is the key transcription factor for Treg development and function [204]. It has been shown the epigenetic modifications at the CpG-rich Treg-specific demethylated region (TSDR) of the Foxp3 locus can stabilize Foxp3 expression. Ets1 only binds to TSDR at demethylation state. Disruption of Ets1 binding can disrupt Foxp3 expression, which may affect Treg development and function [204] . The increase of Th17 cells can be due to reduced production of IL-2, which inhibits Th17 differentiation [205]. In Ets1 deficient mice, there is an increased differentiation of Th17 cells. Meanwhile, Ets1 deficient Th17 cell has increased IL-17 production, which can be reversed through Ets1 p54 transduction. The increased IL-17 can also be observed in the lung and thymus of the mice. This may be due to the decreased IL-2 production and increased resistance to IL-2 for the Ets1 deficient cell. This increased resistance to IL-2 seems to be correlated with defect in the downstream of

STAT5 phosphorylation, not RORγt expression. The details of this mechanism is still unclear [201]. Another proposed mechanism is the effect of Ets1 on the expression of

2+ IP3 (inositol 1,4,5-trisphosphate) receptor 3 (IP3R3), which controls the follow of Ca from

+ intracellular stores in response to IP3. Ets1 deficient CD4 cells lack IP3R3, which disrupts the TCR-induced calcium flux and leads to the differentiation of Th17 cells [206].

50

Ets1 deficient mice also have defects in CD8+ T cell development and function.

CD8+ T cells from deficient mice have less surface CD8 expression [207]. This may be due to the enhancer of CD8 genes regulated by Ets1 [208]. Also, the deficient mice have fewer mature CD8 single positive thymocytes [209]. This defect is intrinsic because Ets-

1 deficient thymocytes cannot efficiently develop CD8 single-positive thymocytes in the mixed bone marrow chimeric mice [209]. This defect may be due to impaired RUNX3 expression, as Ets1 promotes RUNX3 expression through binding to at least two regions of RUNX3 gene [210]. RUNX3 is necessary for maintaining CD8 expression and silencing

CD4 expression. Besides affecting development, CD8+ T cells from Ets1 deficient mice have decreased level of effector gene expression, such as IFNγ, granzyme B, and perforin, when stimulating with anti-CD3 and anti-CD28 in vitro. Meanwhile, the survival and recovery of these Ets1 deficient CD8+ T cells are also impaired [211]. This may be due to the effect of Ets1 on the expression of IL-12, which can enhance CD8+ T cell survival [212].

ETS1 in other Immune Cells

In Ets1 deficient mice, NK cell number is reduced in the bone marrow and NK T cells are decreased in the thymus, spleen and liver [213-215]. These defects are intrinsic due to the result from mixed bone marrow chimera, in which Ets1 deficient bone marrow is transferred to irradiated WT mice. More than 80% declining in the NK lineage cells from

Ets1 deficient cells suggests the defects are cell autonomous [214]. Meanwhile, there is reduced cytolytic activity for Ets1 deficient NK cells [186, 214]. What’s more, Ets1 deficient mast cells have reduced GM-CSF production due to the effect of Ets1 on the

GM-CSF promoter [216]. f

51

Association of ETS1 with SLE

For genetic studies, several independent GWASs in Asian populations have confirmed that genetic variants at the ETS1 locus are associated with susceptibility to

SLE [217-221]. The top associated single polymorphisms (SNPs) at the ETS1 locus are rs6590330 (OR=1.37, P= 1.77x10-25) and rs1128334 (OR=1.29, P=2.33x10-11).

Moreover, genetic variants at the ETS1 locus have been shown to be associated with different clinical phenotypes of SLE. Sullivan et al. reported that one 3’ polymorphisms of

ETS1 are associated with different clinical phenotypes in SLE [222]. Another recent study showed that rs6590330 is associated with SLE of age at diagnosis <20 years (OR=1.24,

P=8.91x10-5) in a Chinese Han population [223].

Recently, two association studies claimed ETS1 locus is associated with SLE in

European population. One is a candidate genotyping study [224], and the other one is a meta-analysis study [225]. However, both studies use a much less stringent approach to define the significance instead of the commonly used genome-wide significance. The candidate genotyping study selected rs6590330 as the tagging SNP for ETS1. This study included three cohorts, which is the Swedish cohort containing 1129 SLE patients and

2060 controls, the Finnish cohort containing 270 SLE patients and 343 controls and the

US cohort containing 1310 SLE patients and 7859 controls. The meta-analysis p-value of all three cohorts for the association of ETS1 with the risk of SLE is 9.8 x 10-5 [224]. For the other meta-analysis study, it combined the data from past three GWAS data sets from

Chinese (1,659 cases and 3,398 controls) and European (4,036 cases and 6,959 controls) populations. rs7941765 and rs61432431 are the SNPs associated with ETS1 locus. This

52 study used four criteria to define the association signals as 'shared' between the Chinese and European populations. These four criteria are: 1) The locus had a published association signals in both Chinese and European studies at a genome-wide level of significance (P < 5 × 10−8). 2) The SNP was published only for Europeans but the association P value in the Chinese meta-analysis was significant (FDR < 0.01 across all

SNPs in this group) and the direction of effect in all three GWASs was the same. 3) The

SNP was published only in a Chinese study but the association P value in the European

GWAS was significant (FDR < 0.01) and the direction of effect in all three GWASs was the same. 4) If the SNP failed to meet the requirements for either (2) or (3), a gene-based test on genes within ±1 Mb of the published SNP will be performed. The locus will be deemed shared if the gene-based P value was significant at the 0.01 FDR level after adjustment for multiple testing across all genes tested. From the strength of the evidence, these criteria can be ranked as 1)>2)=3)>4). In this study, rs7941765 falls into the criteria

4) and rs61432431falls into criteria 3). Therefore, both SNPs don’t have the strongest evidence for supporting the association in European population. Therefore, these results need to be interpret cautiously. If the association is defined using genome-wide significance, which is p < 5 x 10-8, the association of ETS1 with SLE is still only exits in

Asian population. Meanwhile, both studies showed that the associated structures at ETS1 locus are different with Asian population even if the association exists.

All these genetic studies suggest that genetic variants at the ETS1 locus might not only play important roles in the development of SLE, but also contribute to the variable phenotypic manifestations of SLE.

53

Role of ETS1 in Lupus Pathogenesis

Ets1 Knock-out in Mice Models

Ets1-deficient (Ets1p/p) mice can develop a lupus-like disease characterized by high titers of IgM and IgG autoantibody and immune complex deposition in the kidney

[193]. However, only mild inflammatory infiltration is observed in the affected organs [226].

ETS1 Role in the B Cell Differentiation of SLE

As discussed in the Immune Cell Dysfunction part, B cells are very important in the pathogenesis of SLE. Meanwhile, Ets1 is found to be critical for B cell development and function. In one study using Ets1-/--Rag2-/- chimeric mice in which all cells are Ets1+/+-

Rag2-/- except lymphocytes which are Ets1-/--Rag2-/-, Bories et al showed that B cells

(Ets1-/-) were present in normal numbers but with a larger proportion of IgM plasma cells

[194]. Furthermore, Ets1 deficiency has been shown to drive terminal differentiation of B cells into IgM-secreting plasma cells in B cell intrinsic fashion. This was shown by the hyperresponsiveness of B cells (Ets1-/-) to TLR9 stimulation [193, 198]. This hyperresponsiveness can at least be partially explained by the physical interaction of Ets1 with Blimp1, a master regulator of plasma B cell differentiation via its ability to suppress expression of genes involved in the mature B cell program. Ets1 can block Blimp1 DNA binding activity and reduce the ability of Blimp1 to suppress its target genes without affecting Blimp1 protein levels [197]. Blimp1 can repress target genes such as the B cell specific promoter PIII of the major histocompatibility complex class II transactivator CIITA, the pax-5 promoter and the c-myc promoter [197]. Ets1 can also induce the expression of Blimp1 suppressed genes by binding to their promoter [197]. The model hypothesized

54 that in stimulated B cells Ets1 physically interacts with Blimp1 to inhibit Blimp1 DNA binding and thereby blocks repression of key Blimp1 targets, thus preventing premature

B cell differentiation. In addition, Ets1 directly binds to the promoter segments of known

Blimp1 target genes (such as Pax5) to up-regulated them (thus counterbalancing the repressive influence of Blimp1). Upon stimulation via TLR9, Ets1 levels decrease relieving

Blimp1 of its inhibitory effects. Meanwhile, Blimp1 levels increase, leading to efficient repression of its target genes and favoring terminal differentiation of the B cell [197]. A recent Ets1 ChIP-Seq study in primary B cells found a group genes regulated by Ets1, which are associated with autoimmune disease susceptibility [227].

Meanwhile, since B cells deficient in Ets1 (from Ets1-/- mice) are hyperresponsive to TLR9 stimulation [193, 198], it is possible that Ets1 can affect the TLR9 signaling pathway. Ets1 can interact with nuclear factor of activated T cells (NFAT) to regulate IL-

2 expression [228], which also gives us the rationale to test the effect of ETS1 on the activation of MAPK and NF-κB, which are downstream of TLR9 signaling pathway [229].

ETS1 Role in the Increased Th17 Cells in SLE

As discussed in the Immune Cell Dysfunction section, Th17 cells play an important role in the pathogenesis of SLE. Th17 cells are positively correlated with lupus nephritis development. Also, IL-17 is mainly produced by Th17 cells. In SLE patients, the IL-17 level and IL-17 producing cell number are increased in the serum, which are correlated with disease activity [70]. IL-17 producing cells can penetrate skin, lung and kidneys, which contributed to the organ damage [71]. IL-17 can also induce patient peripheral

55 blood mononuclear cells (PBMCs) to produce anti-dsDNA antibody, which may indicate that IL-17 promotes B cell activation [72].

Ets1 is shown to play a negative role in the development and function of Th17 cells

[230]. Ets1 deficient mice have increased Th17 cells. This was due to the decreased IL-

2 production and increased resistance to the IL-2 inhibition effect, which may be mediated through STAT5 [201]. In Ets1 deficient mice, there is an increased differentiation of Th17 cells. Meanwhile, Ets1 deficient Th17 cell has increased IL-17 production, which can be reversed through Ets1 p54 transduction. The increased IL-17 can also be observed in the lung and thymus of the mouse. This may be due to the decreased IL-2 production and increased resistance to IL-2 for the Ets1 deficient cell. This increased resistance to IL-2 seems to be correlated with defect of downstream of STAT5 phosphorylation, not RORγt expression. The details of the mechanism is still unclear [201]. Another proposed mechanism is the effect of Ets1 on the expression of IP3 (inositol 1,4,5-trisphosphate)

2+ receptor 3 (IP3R3), which controls the follow of Ca from intracellular stores in response

+ to IP3. Ets1 deficient CD4 cells lack IP3R3, which disrupts the TCR-induced calcium flux and leads to the differentiation of Th17 cells [206].

ETS1 Expression in SLE Patients

In one early SLE study of 1996, measuring 27 SLE patients and 10 healthy controls from Athens, Greece, ETS1 expression was around 2-fold higher in the peripheral blood mononuclear cells (PBMCs) from SLE patients than healthy controls using PCR, no ancestry information about the patient is disclosed [231]. In one recent study of 2010, measuring 61 Chinese Han SLE patients, ETS1 mRNA levels in peripheral blood

56 mononuclear cells (PBMCs) from SLE patients were considerably lower compared with healthy subjects using qRT-PCR [232]. Besides qRT-PCR measurement, there are three microarray studies analyzing the gene expression of SLE patients [233-235]. Emily C.

Baechler group utilized Affymetrix U95A GeneChips to analyze gene expression in PBMC of 48 SLE patients and 42 healthy controls, no ancestry information about the patient is disclosed in this study [233]. Lynda Bennett group analyzed gene expression of PBMC from 30 pediatric lupus patients and 9 healthy control children with Affymetrix U95AV2 microarrays. Of these patients, 13 are Hispanic, 8 are African-American, 6 are Asian, and

3 are Caucasian [234]. G-M Han et al analyzed gene expression of PBMC from 10 patients and 18 controls using Mergen ExpressChipTM DNA Microarray. All the patients are Chinese in this study[235]. Due to the publication date of these papers, it is hard to get the raw data of these studies. Based on a review summarizing these microarray data

[236], ETS1 expression is only upregulated in PBMC of patient in the study from Emily C.

Baechler group with no ancestry information disclosed.

Therefore, it seems that ETS1 mRNA levels in peripheral blood mononuclear cells

(PBMCs) from SLE patients is only decreased in Chinese population.

Published eQTLs for ETS1 In Healthy Individuals

Depressed ETS1 mRNA expression has been shown to be associated with genetic variants at the ETS1 locus. In one study, pyrosequencing of DNA and cDNA from subjects heterozygous for rs1128334 demonstrated that ETS1 mRNA expression from with SLE-risk alleles was significantly reduced compared to that from the

57 chromosome with non-risk alleles in PBMCs of healthy subjects [219]. However, which genetic variant(s) actually cause this depressed ETS1 expression remains unclear.

CRISPR Introduction and Its Application in Genetic Studies

Clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR- associated (Cas) system is a bacterial adaptive immune defense system that uses short

RNA for direct degradation of foreign nucleic acids. The type II bacterial CRISPR system may be engineered to direct Cas9 protein to specifically cleave complementary target-

DNA sequence matching custom guide RNA (gRNA) in human cells, if the sequence is adjacent to short sequences known as protospacer adjacent motifs (PAMs) [237]. For the simplest and most widely used form of this technology, two components must be introduced into and/or expressed in cells to perform genome editing: the Cas9 nuclease and a single-guide RNA (sgRNA) consisting a fusion of a CRISPR RNA (crRNA) and a fixed transactivating CRISPR RNA (tracrRNA). 20 nucleotides at the 5’ end of the sgRNA directs Cas9 to a specific target DNA site using standard RNA-DNA complementarity base pairing rules. These target sites must lie immediately at 5’of a PAM sequence that matches the canonical form 5’-NGG. Cas9-induced DNA double-stranded breaks (DSBs) can then be used to introduce precise single point mutation from single-stranded oligonucleotide (ssODN) donor templates through homology-directed repair (HDR) pathways for precise genome editing [238].

CRISPR can be used to create cells that only single base pair different for proving the causality of genetic variants. A recent study utilized this strategy to prove the causality of rs1421085 for the genetic association at FTO locus [239].

58

Tetracycline Inducible (Tet-inducible) System

Tet-inducible system is one of the most commonly used gene regulation systems, originally developed by Gossen and Bujard [240]. The advantage of the system is that it allows a dose-dependent regulation of the gene of interest. This system includes two components, a tetracycline controlled transactivator (tTA) and a tet-responsive promoter

(TRP) regulating the gene of interest. The tTA binds to the tetR-moiety of the TRP, a minimal promoter linked by a tandem of tet-operator sequences, with high affinity in the presence of Doxycycline (Dox). Two transactivator variants have been developed based on their response to Dox, which is Tet-off and Tet-on systems. In the Tet-off system, the tTA is released from its DNA binding site in the presence of Dox, thus stopping gene expression. In the Tet-on system, tTA will binds to its DNA binding site in the presence of

Dox, thus starting gene expression [240].

59

Chapter 2. Protocol for Electrophoretic Mobility Shift Assay (EMSA) and DNA-affinity Precipitation Assay (DAPA)

60

Protocol for Electrophoretic Mobility Shift Assay (EMSA) and DNA-affinity Precipitation Assay (DAPA)

This protocol is adopted from the following paper we published. “Miller, D.E., Patel, Z.H., Lu, X., Lynch, A.T., Weirauch, M.T., and Kottyan, L.C. (2016). Screening for Functional Non-coding Genetic Variants Using Electrophoretic Mobility Shift Assay (EMSA) and DNA-affinity Precipitation Assay (DAPA). J Vis Exp."

Preparation of Solutions and Reagents

1. Order custom DNA oligonucleotide for use in EMSA and DAPA.

1.1. To reduce the possibility of non-specific protein binding, design short oligos

(between 35-45 base pairs (bp) in length), and place the variant of interest in the

center with 17bp endogenous genomic sequence on both side. For EMSA oligos,

a 5' fluorophore is needed. For DAPA oligos, a 5' biotin tag is needed.

1.2. Order the sense strand and its reverse complement strand for the oligos.

Alternatively, duplex (pre-annealed) oligos can be ordered. Name the oligos

based the nomenclature on an established reference genome.

Note: "Risk" and "non-risk" designation can be related with disease and specific

project, while "reference" and "non-reference" can be more universally relevant.

1.3. Upon the arrival of oligos, briefly spin down the tube and resuspend the oligo with

nuclease-free water to a final concentration of 100 μM. Store at -20 °C. Protect

the fluorophore tagged oligos from light by wrapping with aluminum foil.

Table 1 Example of EMSA/DAPA oligonucleotide design for testing a SNP for differential binding. Name Sequence GTAATGCCTTAATGAGAGAGAGTTAGTCATCTTCTCA rs76562819_REF_FOR CTTC rs76562819_REF_REV GAAGTGAGAAGATGACTAACTCTCTCTCATTAAGGCA

61

TTAC rs76562819_NONREF_F GTAATGCCTTAATGAGAGAGGGTTAGTCATCTTCTCA OR CTTC rs76562819_NONREF_R GAAGTGAGAAGATGACTAACCCTCTCTCATTAAGGCA EV TTAC "REF" stands for the reference allele, while "NONREF" stands for the non-reference allele. "FOR" stands for the forward strand, while "REV" indicates its complement. The SNP is seen in red.

2. Prepare the cytoplasmic extraction (CE) buffer with a final concentration of 10 mM

HEPES (pH 7.9), 10 mM KCl, and 0.1 mM EDTA in deionized water.

3. Prepare the nuclear extraction (NE) buffer with a final concentration of 20 mM HEPES

(pH 7.9), 0.4 M NaCl, and 1 mM EDTA in deionized water.

4. Prepare the annealing buffer with a final concentration of 10 mM Tris (pH 7.5-8.0), 50

mM NaCl, and 1 mM EDTA in deionized water.

Preparation of Nuclear Lysate from Cultured Cells

Note: This protocol was optimized using B-lymphoblastoid cell lines, but it has been tested in several other unrelated adherent/suspension cell lines and can work universally.

1. Culture B-lymphoblastoid cells in Roswell Park Memorial Institute (RPMI) 1640 with 2

mM L-glutamine, 10% fetal bovine serum, and 1x antibiotic-antimycotic containing 100

units/ml of penicillin, 100 μg/ml of streptomycin, and 250 ng/ml of amphotericin B.

1.1. Seed the cell at the density of 2 x105 - 5 x105 viable cells/ml and incubate flasks

at 37 °C with 5% carbon dioxide in an upright position with vented caps.

2. Collect cultured cells into 50ml conical tubes and spin down at 4 °C, 300 x g for 5 min

and remove culture media via aspiration

62

3. Wash cultured cells twice with 10 ml ice cold phosphate buffered saline (PBS), spin

down at 4 °C, 300 x g for 5 min and remove PBS via aspiration.

4. Count cell number using a hemocytometer and resuspend the cell pellet as 1 ml ice

cold PBS per 107 cells.

5. Aliquot 1 ml to 1.5 ml microcentrifuge tubes so that each tube contains 107 cells in

PBS. Centrifuge at 3,300 x g for 2 min at 4 °C and aspirate off PBS.

6. Prior to immediate use, add 1 mM dithiothreitol (DTT), 1x phosphatase inhibitor, and

1x protease inhibitor to the working stock of CE buffer. Then resuspend the cell pellet

with 400 μl of CE buffer and incubate on ice for 15 min.

7. Add 25 μl of 10% Nonidet P-40 and mix the solution by pipetting. Centrifuge at 4 °C,

speed for 3 min. Decant and discard the supernatant.

8. Prior to immediate use, add 1 mM DTT, 1x phosphatase inhibitor, and 1x protease

inhibitor to the working stock of NE buffer. Then resuspend the cell pellet with 30 μl of

NE buffer and mix by vortexing.

9. Incubate at 4 °C in a tube rotator or on ice for 10 min. Centrifuge at 4 °C, 3,300 x g for

2 min.

10. Collect the clear supernatant (nuclear lysate) and aliquot into 1.5 ml microcentrifuge

tubes or PCR tubes before storing at -80 °C to avoid multiple freeze-thaw cycles which

may degrade the protein. Leave a 10 μl aliquot to measure protein concentration using

the bichoninic acid assay (BCA).

Electrophoretic Mobility Shift Assay (EMSA)

1. Prepare the oligo working stock and EMSA gel.

1.1. If oligos were ordered in duplex, thaw the 100 μM stock and dilute at 1:2,000 with

63

annealing buffer to achieve a 50 nM working stock.

1.2. If oligos were ordered single-stranded, thaw the 100 μM stocks and dilute at 1:10

with annealing buffer to achieve 100 nM working stocks. Then combine 100 μl of

the 100 nM working stocks for both strands in a microcentrifuge tube.

1.2.1. Place the tube in a heat block at 95 °C for 5 min. Turn off the heat block and

allow the oligos to slowly cool down to room temperature for at least one hour

prior to use.

1.3. Pre-run the EMSA Gel.

1.3.1. Remove the slide from a pre-cast 6% TBE gel and rinse under deionized

water several times to remove any buffer from the wells. Prepare 1 Liter of

0.5x TBE buffer by adding 50 mL of 10x TBE to 950 mL of deionized water.

1.3.2. Assemble the gel electrophoresis apparatus and check for leaks through

filling the inner chamber with 0.5x TBE buffer. If no buffer leaks into the outer

chamber, fill the outer chamber roughly two thirds of the way.

1.3.3. Pre-run the gel at 100 V for 60 min.

1.3.4. Flush each well with 200 μl of 0.5x TBE buffer.

2. Prepare Binding Buffer Master Mix.

2.1. Prepare 10x binding buffer with a final concentration of 100 mM Tris, 500 mM KCl,

10 mM DTT; pH 7.5 in deionized water.

2.2. In a microcentrifuge tube, create a master mix containing the reagents common

to all reactions (10 μl 10x binding buffer, 10 μl DTT/ polysorbate, 5 μl Poly d(I-C),

and 2.5 μl salmon sperm DNA; Table 2). Add an additional 10% of the volume to

account for solution loss due to pipetting.

64

Table 2 Example EMSA reaction setup. Reagent Final Conc. Rxn #1 Rxn #2 Rxn #3 Rxn #4 Ultrapure Water to 20 μl vol. 13.5 μl 11.98 μl 13.5μl 11.98 μl 10x Binding Buffer 1x 2 μl 2 μl 2 μl 2 μl DTT/TW-20 1x 2 μl 2 μl 2 μl 2 μl Salmon Sperm DNA 500 ng/μl 0.5 μl 0.5 μl 0.5 μl 0.5 μl 1μg/μl Poly d(I-C) 1 μg 1 μl 1 μl 1 μl 1 μl Nuclear Extract (5.26 ug/μl) 8 μg - 1.52 μl - 1.52 μl NE Buffer 1.52 μl - 1.52 μl - Reference allele oligo 50 fmol 1 μl 1 μl - - Non-Reference allele oligo 50 fmol - - 1 μl 1 μl The table illustrates an example EMSA to test the hypothesis that there is genotype- dependent binding of TFs to a specific SNP.

3. Add appropriate nuclease-free water to each microcentrifuge tube such that the final

volume following addition of all reagents will be 20 μl.

4. Add the appropriate amount (5.5 μl) of master mix to each microcentrifuge tube.

5. Add 8 μg of nuclear lysate to the appropriate microcentrifuge tubes. Include tubes

containing the oligo without nuclear extract as negative controls (e.g. Table 2, Rxn #1

and Rxn #3).

Note: The optimal amount of lysate per reaction must be determined experimentally

by titration. Generally, titrating a range of 2-10 μg of lysate is sufficient.

6. Add 50 fmol of oligo to the appropriate microcentrifuge tubes. Flick to mix and briefly

spin the contents to the bottom of the tube. Incubate for 20 min at room temperature.

Note: If performing a supershift experiment, incubate the lysate mixture with antibody

for 20 min at room temperature prior to the addition of oligos. 1 μg of a ChIP-grade

antibody is a good start for titrating the optimal antibody.

7. Add 2 μl of 10x Orange Loading Dye to each microcentrifuge tube. Pipette to mix up.

8. Load the samples into the pre-run 6% TBE gel by pipetting up and down to mix and

then expelling each sample into a separate well. Run the gel at 80 V until the orange

65

dye has migrated 2/3 to 3/4 of the way down the gel. This should usually take

approximately 60-75 min.

9. Remove the gel from the plastic cassette by prying it open with a gel knife and place

the gel in a container with 0.5% TBE buffer to keep it from drying out.

10. Place the gel on the surface of an infrared and chemiluminescence imaging system,

being sure to eliminate any bubbles or contaminants that will disrupt the image.

11. Using the scanning system software, click the "Acquire" tab and then select "Draw

New" to draw a box around the area corresponding to where the gel is located on the

surface of the scanner.

12. In the "Channels" section of the "Acquire" tab, select the channel corresponding to the

wavelength of the fluorophore tag on the oligo. In the "Scanner" section, click

"preview" to get a low-quality preview scan. Adjust the scan area by dragging the blue

box surrounding the acquired preview image down to the portion of the gel to be

imaged.

Note: For example, if using oligos labeled with a 700 nm fluorophore, make sure the

"700 nm" channel is selected before scanning.

13. In the "Scan Controls" section, select the "84 μM" resolution option and the "Medium"

quality option. Set the focus to offset to half the thickness of the gel.

Note: For example, a 1 mm gel would use a 0.5 mm focus offset.

14. In the "Scanner" section, click "Start" to begin the scan.

Note: During the scan, the brightness, contrast, and color scheme can often be

adjusted manually depending on the manufacturer of the scanning system.

15. After the scan has finished, select the "Image" tab and click "Rotate or Flip" in the

66

"Create" section to correct the orientation. Save the image file by clicking "Export" in

the main menu and then select "Single Image View."

DNA Affinity Purification Assay (DAPA)

1. Preparation the 5 μM Oligo Working Stock.

1.1. If oligos were ordered in duplex, thaw the 100 μM stock and dilute at 1:2,000 with

annealing buffer to achieve a 50 nM working stock.

1.2. If oligos were ordered single-stranded, thaw the 100 μM stocks and dilute at 1:10

with annealing buffer to achieve 100 nM working stocks. Then combine 100 μl of

the 100 nM working stocks for both strands in a microcentrifuge tube.

1.2.1. Place the tube in a heat block at 95 °C for 5 min. Turn off the heat block and

allow the oligos to slowly cool down to room temperature for at least one hour

prior to use.

2. Before starting, warm the binding buffer, low stringency wash buffer, high stringency

wash buffer, and elution buffer to room temperature.

Note: A final concentration of 50 ng/mL Poly d(I-C) can be added to the binding buffer,

low stringency wash buffer, and high stringency wash buffer to reduce potential non-

specific binding of proteins to the oligos.

3. Prepare the binding mixtures for each variant.

3.1. Mix 1 volume of nuclear lysate with 2 volumes of binding buffer.

Note: The required amount of lysate must be determined experimentally due to

varying abundance of TFs. Using 100-250 μg of nuclear lysate per column is

sufficient in most cases.

3.2. Add 1x phosphatase inhibitor, 1x protease inhibitor, and 1x binding enhancer

67

(optional) and mix the solution by flicking the tube several times.

Note: 100x binding enhancer consists of 750 mM MgCl2 and 300mM ZnCl2. Add

binding enhancer if the binding of the TF to DNA depends on cofactors or reducing

agents. If this information is not known, it is recommended to add the binding

enhancer.

3.3. Add 10 μl of 5 μM biotinylated capture DNA (50 pmol) to each respective binding

mixture. Incubate for 20 min at room temperature.

Note: Incubation time and temperature may vary depending on the TF. The

optimal condition may need to be determined experimentally.

4. Add 100 μl of streptavidin microbeads. Incubate for 10 min at room temperature.

5. For each oligo probe being tested, place a binding column in the magnetic separator.

Place a microcentrifuge tube directly under each binding column and apply 100 μl of

binding buffer to rinse the column.

6. Pipette the contents of each binding mixture into separate columns and allow the liquid

to flow completely through the column into the microcentrifuge tube. Before proceeding,

make sure to label the columns with the variant oligo that was used in the binding

mixture. Then label the flow-through samples and replace with new microcentrifuge

tubes to collect the low-stringency washes.

7. Apply 100 μl of low-stringency wash buffer to the column; wait until the column reservoir

is empty. Repeat wash 4 times. Label the low-stringency wash samples and replace

with new microcentrifuge tubes to collect the high-stringency washes.

8. Apply 100 μl of high-stringency wash buffer to the column; wait until the column

reservoir is empty. Repeat wash 4 times. Label the high-stringency wash samples and

68

replace with new microcentrifuge tubes to collect the pre-elution.

9. Add 30 μl of native elution buffer to the column and let stand for 5 min. Label the pre-

elution samples and replace with new microcentrifuge tubes to collect the elution.

Note: This does not elute the bound protein; it washes the remaining high-stringency

buffer out of the column and replaces it with elution buffer to maximize the efficiency of

the elution.

10. Add an additional 50 μl native elution buffer to elute the bound TFs. For a higher yield

but less concentrated eluate, add an additional 50 μl of native elution buffer and collect

the flow-through.

Note: Analyze elution samples through mass spectrometry to determine the identity

of the bound TFs. Afterwards, verify the proteomic results through sodium dodecyl

sulfite polyacrylamide gel electrophoresis (SDS-PAGE) followed by a Western blot. If

mass spectrometry is not available, run a silver stain using standard technique instead

of a Western blot to determine the size of the protein(s) showing genotype-dependent

binding.

69

Chapter 3. Lupus Risk Variant Increases pSTAT1 Binding and Decreases ETS1 Expression

70

ARTICLE

Lupus Risk Variant Increases pSTAT1 Binding and Decreases ETS1 Expression

Xiaoming Lu,1,2 Erin E. Zoller,2 Matthew T. Weirauch,2,3 Zhiguo Wu,4 Bahram Namjou,2 Adrienne H. Williams,5 Julie T. Ziegler,5 Mary E. Comeau,5 Miranda C. Marion,5 Stuart B. Glenn,6 Adam Adler,6 Nan Shen,2,7 Swapan K. Nath,6 Anne M. Stevens,8,9 Barry I. Freedman,10 Betty P. Tsao,11 Chaim O. Jacob,12 Diane L. Kamen,13 Elizabeth E. Brown,14,15 Gary S. Gilkeson,13 Graciela S. Alarco´n,15 John D. Reveille,16 Juan-Manuel Anaya,17 Judith A. James,6,18 Kathy L. Sivils,6 Lindsey A. Criswell,19 Luis M. Vila´,20 Marta E. Alarco´n-Riquelme,6,21 Michelle Petri,22 R. Hal Scofield,6,18,23 Robert P. Kimberly,15 Rosalind Ramsey-Goldman,24 Young Bin Joo,25 Jeongim Choi,25 Sang-Cheol Bae,25 Susan A. Boackle,26 Deborah Cunninghame Graham,27 Timothy J. Vyse,27 Joel M. Guthridge,6 Patrick M. Gaffney,6 Carl D. Langefeld,5 Jennifer A. Kelly,6 Kenneth D. Greis,28 Kenneth M. Kaufman,2,29 John B. Harley,2,29,30 and Leah C. Kottyan2,29,30,*

Genetic variants at chromosomal region 11q23.3, near the gene ETS1, have been associated with systemic lupus erythematosus (SLE), or lupus, in independent cohorts of Asian ancestry. Several recent studies have implicated ETS1 as a critical driver of immune cell function and differentiation, and mice deficient in ETS1 develop an SLE-like autoimmunity. We performed a fine-mapping study of 14,551 sub- jects from multi-ancestral cohorts by starting with genotyped variants and imputing to all common variants spanning ETS1. By con- structing genetic models via frequentist and Bayesian association methods, we identified 16 variants that are statistically likely to be causal. We functionally assessed each of these variants on the basis of their likelihood of affecting transcription factor binding, miRNA binding, or chromatin state. Of the four variants that we experimentally examined, only rs6590330 differentially binds lysate from B cells. Using mass spectrometry, we found more binding of the transcription factor signal transducer and activator of transcription 1 (STAT1) to DNA near the risk allele of rs6590330 than near the non-risk allele. Immunoblot analysis and chromatin immunoprecipita- tion of pSTAT1 in B cells heterozygous for rs6590330 confirmed that the risk allele increased binding to the active form of STAT1. Anal- ysis with expression quantitative trait loci indicated that the risk allele of rs6590330 is associated with decreased ETS1 expression in Han Chinese, but not other ancestral cohorts. We propose a model in which the risk allele of rs6590330 is associated with decreased ETS1 expression and increases SLE risk by enhancing the binding of pSTAT1.

Introduction plex deposition, and multi-organ tissue damage.1 The prevalence of SLE is 20–150 cases per 100,000 individ- Systemic lupus erythematosus (SLE [MIM 152700]) is a het- uals.2–4 Although the precise etiology of SLE remains un- erogeneous autoimmune disease characterized by hyperac- clear, the contribution of genetic factors has been shown tive T and B cells, autoantibody production, immune com- by association and family studies;5 the concordance rate

1Immunology Graduate Program, College of Medicine, University of Cincinnati, Cincinnati, OH 45229, USA; 2Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA; 3Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA; 4Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA; 5Center for Public Health Genomics and Department of Biostatistical Sciences, Wake Forest School of Medicine, Wake Forest Baptist Medical Center, Winston-Salem, NC 27157, USA; 6Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA; 7Joint Molecular Rheumatology Laboratory, Institute of Health Sciences and Shanghai Renji Hospital, Shanghai Jiao Tong University School of Medicine in collaboration with Shanghai Institutes for Biological Sciences and Chinese Academy of Sciences, Shanghai 200031, PRC; 8Center for Immunity and Immunotherapies, Seattle Children’s Research Institute, Seattle, WA 98101, USA; 9Division of Rheumatology, Department of Pediatrics, University of Washington, Seattle, WA 98195, USA; 10Section on Nephrology, Department of Internal Medicine, Wake Forest School of Medicine, Wake Forest Baptist Medical Center, Winston-Salem, NC 27157, USA; 11Division of Rheumatology, Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA; 12Department of Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA; 13Division of Rheumatology, Medical University of South Carolina, Charleston, SC 29425, USA; 14Division of Molecular and Cellular Pathology, Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA; 15Department of Med- icine, University of Alabama at Birmingham, Birmingham, AL 35294, USA; 16Rheumatology and Clinical Immunogenetics, University of Texas Health Sci- ence Center at Houston, Houston, TX 77030, USA; 17Center for Autoimmune Diseases Research, Universidad del Rosario, 110010 Bogota, Colombia; 18Department of Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA; 19Rosalind Russell / Ephraim P. Engleman Rheumatology Research Center, Department of Medicine, University of California, San Francisco, San Francisco, CA 94117, USA; 20Division of Rheuma- tology, Department of Medicine, School of Medicine, University of Puerto Rico, Medical Sciences Campus, San Juan, PR 00936, USA; 21Center for Genomics and Oncological Research, Pfizer-University of Granada-Junta de Andalucia, Granada 18016, Spain; 22Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA; 23Oklahoma City Veterans Affairs Medical Center, Oklahoma City, OK 73104, USA; 24Division of Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; 25Department of Rheumatology, Hanyang University Hospital for Rheumatic Diseases, Seoul 133-792, Korea; 26Division of Rheumatology, School of Medicine, University of Colorado, Aurora, CO 80045, USA; 27Divisions of Genetics and Molecular Medicine and Immunology, King’s College London, Strand, London WC2R 2LS, UK; 28Cancer Biology, Uni- versity of Cincinnati Medical Center, Cincinnati, OH 45267, USA; 29Cincinnati Veterans Affairs Medical Center, Cincinnati, OH 45220, USA 30These authors contributed equally to this work *Correspondence: [email protected] http://dx.doi.org/10.1016/j.ajhg.2015.03.002. Ó2015 by The American Society of Human Genetics. All rights reserved.

The American Journal of Human Genetics 96, 731–739, May 7, 2015 731 71 among monozygotic twins is between 20% and 59%, and informed-consent process approved by the local institutional re- siblings of affected subjects have an 8- to 30-fold higher view boards. risk of developing the disease than the general population. Recently, several independent genome-wide association Genotyping of Genetic Variants and Sample Quality studies (GWASs) in Asian populations have confirmed Control that genetic variants in v-ets avian erythroblastosis virus We genotyped 69 SNPs covering the ETS1 region (spanning 128.2– E26 oncogene homolog 1 (ETS1 [MIM 164720]) are associ- 128.4 Mb on ; GRCh37, UCSC Genome Browser ated with susceptibility to SLE.6–10 These studies have hg19; Table S1) as part of a larger custom genotyping study. Specif- established that the most strongly associated SNPs in ically, the variants were chosen to span the association interval identified with the Infinium HumanHap330 array of the original ETS1 are rs6590330 and rs1128334. GWAS that identified significant association at this locus. Geno- ETS1 is known to play an important role in regulating 11 typing of SNPs was completed with Infinium chemistry on an immune cell proliferation and differentiation. Moreover, Illumina iSelect custom array according to the manufacturer’s Ets1 -deficient mice develop a lupus-like disease character- protocol. The following quality-control procedures were imple- ized by high titers of immunoglobulin M (IgM) and IgG mented for identifying SNPs for analysis: well-defined clusters autoantibodies and immune-complex deposition in the for genotype calling, call rate > 90% across all samples genotyped, kidneys.12 Meanwhile, ETS1 mRNA expression levels in minor allele frequency (MAF) > 0.1%, and p < 0.05 for differential peripheral-blood mononuclear cells (PBMCs) from SLE- missingness between case and control subjects. Markers with evi- affected individuals are considerably lower than those in dence of a departure from Hardy-Weinberg proportion expecta- < healthy subjects.8 Further, ETS1 mRNA expression in tion (p 0.0001 in control subjects) were removed from the initial PBMCs from chromosomes harboring lupus risk alleles is analysis. For LLAS2, we removed samples with a call rate < 90% or excess significantly lower than that in non-risk alleles of healthy heterozygosity (the average call rate for ETS1 was 99.3%). The re- subjects,8 indicating that the risk variants at this locus are ETS1 maining individuals were examined for excessive allele sharing associated with reduced expression. as estimated by identity by descent (IBD). In sample pairs with Previous studies have identified genetic association at excessive relatedness (IBD > 0.4), one individual was removed ETS1 , but no molecular mechanism has been presented from the analysis on the basis of the following criteria: (1) remove to explain the association. Using a comprehensive dataset the sample with the lower call rate, (2) remove the control sample of genetic variants in this region in subjects from multiple and retain the case sample, (3) remove the male sample before the ancestries, we employed frequentist and Bayesian fine- female sample, (4) remove the younger control sample before the mapping strategies13 to identify a set of variants that are older control sample, and (5) in a situation with two case samples, likely to be causal.14 This approach resulted in a total of remove the sample whose available phenotype data are less 16 genetic variants comprising our credible set of most complete. likely causal genetic variants. Importantly, we demon- strated that the minor allele of rs6590330, the most Ascertainment of Population Stratification strongly associated genetic variant, leads to increased bind- Genetic outliers from each ethnic and/or racial group were ing of the activated transcription factor signal transducer removed from further analysis as determined by principal-compo- nent (PC) analysis and admixture estimates, as previously and activator of transcription 1 (pSTAT1), encoded by 16 18 STAT1 described (Figure 1 in Lessard et al. and McKeigue et al. and (MIM 600555), and is correlated with decreased 19 ETS1 Price et al. ). To distinguish the four continental ancestral popu- expression. Altogether, our study provides insight lations, we used 347 ancestry-informative markers (AIMs) that into the mechanism driving the increased lupus risk at were from the same custom genotyping study and that passed this locus in subjects of Asian ancestry. quality control in both EIGENSTRAT19 and ADMIXMAP,20,21 allowing identification of the substructure within the sample set.22,23 The AIMs were selected to distinguish four continental Material and Methods ancestral populations: Africans, Europeans, American Indians, and East Asians. We utilized PCs from EIGENSTRAT outputs to Subjects and Study Design identify outliers of each of the first three PCs for the individual We used a large collection of samples from case and control sub- population clusters with visual inspection. jects from multiple ethnic groups (Table S1). These samples were from the collaborative Large Lupus Association Study 2 Statistical Analysis: Workflow (LLAS2)15 and were contributed by participating institutions in We initiated the analysis by assessing the association of genotyped the United States, Asia, and Europe. LLAS2, an SLE genetic-associ- variants in each of the four ancestral cohorts individually. Strategi- ation study, used a candidate-gene approach to genotype 347 cally, we analyzed the genotyped variants and then the imputed ancestral-informative markers and 31,851 candidate markers variants, performed full haplotype analysis, executed an analysis throughout the genome.16 According to genetic ancestry, subjects of linkage disequilibrium (LD), and finally built statistical models were grouped into four ethnic groups, including European and to account for the lupus-associated variability in each ancestry European American (EU), African American (AA), Asian and Asian with genome-wide statistical association. In building the one- American (AS), and Hispanic American (HA). All SLE subjects met SNP models of association in the AS ancestry cohort, we compar- the American College of Rheumatology criteria for the classifica- atively evaluated every possible variant for its ability to account tion of SLE17 and were enrolled in this study through an for the lupus-associated genetic variation.

732 The American Journal of Human Genetics 96, 731–739, May 7, 2015 72 Statistical Analysis: Frequentist Approach used SNPTEST to incorporate the probability threshold from We tested each SNP for its association with SLE by using logistic each imputed value into the statistical analysis. regression models that included three estimates of admixture pro- 24 portion as covariates as implemented in PLINK v.1.07 and Electrophoretic Mobility Shift Assay 25 SNPTEST v.2.5. The additive genetic model was assessed as the We annealed pairs of single-stranded 50 IRDye-700-infrared-dye- initially tested model of inheritance for this locus. Other models labeled 35-bp oligonucleotides (obtained from IDT) to generate were subsequently considered, but these were not found to be sub- double-stranded probes. We incubated 50 fmol of labeled probes stantially superior. with nuclear extract prepared from Epstein-Barr-virus-transformed We performed stepwise logistic regression to identify those sin- B cell lines, poly dI-dC (a sodium salt complex of two strands, each gle-nucleotide variants (SNVs) independently associated with the with an alternating sequence of deoxyinosinic acid and deoxycy- development of lupus in PLINK and SNPTEST. For these analyses, tidylic acid), and buffers supplied by the Odyssey Infrared EMSA the allelic dosage(s) of specific variant(s), in addition to the admix- Kit (LI-COR Biosciences) according to LI-COR’s recommended pro- ture estimates, were added to the logistic model as covariates. LD tocols. The binding reactions were analyzed by electrophoresis on 26,27 and haplotypes were determined with Haploview v.4.2. We 6% Tris-borate-EDTA polyacrylamide gels and detected by an > calculated haplotype blocks for those haplotypes present at 3% infrared fluorescent procedure with the Odyssey Infrared Imaging frequency by using the four-gamete-rule algorithms with a mini- System (LI-COR Biosciences). The oligonucleotides sequences are 2 mum r value of 0.8. We performed haplotypic associations in included in Table S2. PLINK by using a sliding-window approach and assessing the asso- ciation of haplotypes defined by logistic regression, as described earlier. DNA Affinity Precipitation Assay We annealed pairs of single-stranded 50-biotinylated 35-bp oligo- nucleotides (obtained from IDT) to generate double-stranded Statistical Analysis: Bayesian Approach probes. Cell lysates were prepared from Epstein-Barr-virus-trans- Using SNPTEST, we calculated the Bayesian factor (BF) for each formed B cell lines with cell-lysis buffer supplied by the mMACS genetic variant: we divided the probability of the genotype Factor Finder Kit (Miltenyi Biotec) and the addition of protease configuration at that genetic variant in case and control subjects inhibitor and phosphatase inhibitors (Pierce ). under the alternative hypothesis that the genetic variant was Binding reactions were then performed with biotinylated probes, associated with disease status by the probability of the genotype cell lysate, binding buffer, binding enhancer, protease inhibitor, configuration at that genetic variant in case and control subjects phosphatase inhibitor, and 0.1 mg poly(dI-dC) according to proto- under the null hypothesis that disease status was independent of cols supplied by the mMACS Factor Finder Kit. Eluted probe- genotype at that SNP (we used the methods developed and intro- bounded proteins were identified by Nano liquid chromatography 13 14 duced in Maller et al. and implemented in Kottyan et al. ). We followed by tandem mass spectrometry analysis30 and immuno- used three admixture estimates as covariates, as we did for the fre- blotting with anti-STAT1 or anti-pSTAT1 antibodies. The oligonu- quentist approach. Large BF values correlate with robust evidence cleotide sequences are included in Table S2. of association, given that small probabilities provide strong evi- dence in a frequentist approach. For well-powered studies, the Chromatin Immunoprecipitation qPCR BFs of relatively common variants are highly correlated with Crosslinking of protein-chromatin complexes was achieved by in- the p values (reviewed in Stephens and Balding28). We used the cubation of Epstein-Barr-virus-transformed B cells in crosslinking additive model. The linear predictor is log(pi /(1 pi)) ¼ m þ 2 2 solution (1% formaldehyde, 5 mM HEPES [pH 8.0], 10 mM sodium ßGi, and the prior is m ~N(0,1 ), b ~ N(0,0.2 ) (variables are chloride [NaCl], 0.1 mM EDTA, and 0.05 mM EGTA) and shaking defined in SNPTEST and the supplemental note in Maller at room temperature for 10 min. Glycine was added to a final con- et al.13). centration of 0.125 M to quench the crosslinking. Cells were To identify the variants most likely to be driving the statistical washed twice with ice-cold PBS, resuspended in lysis buffer association, we calculated a posterior probability under the (50 mM HEPES [pH 8.0], 140 mM NaCl, 1 mM EDTA, 10% glycerol, assumption that any of the variants within a single genetic effect 0.25% Triton X-100, and 0.5% NP-40), and incubated for 10 min could be causal and that only one of these variants was causal for on ice. Protease and phosphatase inhibitors were added to all each genetic effect. Variants with a low posterior probability are buffers. Nuclei were harvested after centrifugation at 5,000 rpm highly unlikely to be causal regardless of the allele frequency or for 10 min, resuspended in lysis buffer L2 (10 mM Tris-HCl presence of the actual causal variant in the analysis, according to [pH 8.0], 1 mM EDTA, 200 mM NaCl, and 0.5 mM EGTA), and the procedure as presented.13 Regardless of whether the causal var- incubated at room temperature for 10 min. Nuclei were resus- iants have been genotyped in this experiment, variants with a low pended in sonication buffer (10 mM Tris [pH 8.0], 1 mM EDTA, posterior probability are unlikely to be causal.13 and 0.1% SDS) after centrifuging. A S220 focused ultrasonicator (COVARIS) was used to shear genomic DNA (150- to 500-bp frag- Imputation to Composite 1000 Genomes Reference ments) with 10% duty cycle, 175 peak power, and 200 bursts per Panel cycle for 7 min. Sheared chromatin was precleared with 10 ml To detect associated variants that were not directly genotyped, we Dynabeads Protein G (Life Technologies) at 4C for 1 hr. Antibody imputed the ETS1 region with IMPUTE2 by using a composite (anti-STAT1, Santa Cruz Biotechnology) was incubated with 20 ml imputation reference panel based on 1000 Genomes Project Dynabeads Protein G at room temperature for 1 hr and then sequence data from March 2012.25,29 Imputed genotypes were washed with PBS once. The antibody-coated beads were incubated included in the analysis if they had or exceeded a probability with sheared chromatin at 4C overnight. A volume of 1% sheared threshold of 0.5, an information measure of >0.4, and the same chromatin was used as input control. After immunoprecipitation, quality-control criteria described for the genotyped markers. We the beads were washed consecutively with low-salt wash buffer

The American Journal of Human Genetics 96, 731–739, May 7, 2015 733 73 Figure 1. ETS1-Imputed Genetic Variants Demonstrate Genome-wide Lupus Associ- ation in a Cohort of Asian Ancestry Each variant is represented as a data point in the context of its genomic location and is colored on the basis of whether it was directly genotyped (red) or only imputed (blue). Genomic position is given with GRCh37 coordinates. rs6590330 was directly genotyped. The SLE association of genotyped and imputed variants in co- horts of Asian and Asian-American (AS) ancestry (12,57 case and 1,258 control sub- jects), Hispanic-American (HA) ancestry (952 case and 335 control subjects), Afri- can-American (AA) ancestry (1,524 case and 1,809 control subjects), and European and European-American (EU) ancestry (3,926 case and 3,490 control subjects) were assessed in a logistic regression with adjustment for admixture estimates. Genome-wide association was defined as p < 5 3 10 8 and is indicated by a dashed red line in each figure panel.

(0.1% SDS, 1% Triton X-100, 0.1% sodium deoxycholate, 1 mM We tested each genetic variant individually for its associ- EDTA, 50 mM Tris-HCl [pH 8], and 150 mM NaCl) twice, high- ation with SLE in each ancestry by using admixture esti- salt wash buffer (as above with 500 mM NaCl) twice, LiCl wash mates for subjects as covariates (Figure 1). As described in buffer (0.5 M LiCl, 1% NP-40, 0.7% sodium deoxycholate, 1 mM previous GWASs, the AS ancestry was the only one in EDTA, and 50 mM Tris–HCl [pH 8]) twice, and 1 mM EDTA and which we identified variants with a probability (p) reach- 10 mM Tris-HCl (pH 8) twice. Purified chromatin fragments were ing genome-wide significance (p < 5 3 10 8)(Figure 1). eluted from the beads with elution buffer (340 mM NaCl, 1 mM Apart from the AS ancestry, we also observed some sugges- EDTA, and 10 mM Tris-HCl) and 1 mg/ml proteinase K and incu- bated at 37C for 1 hr. DNA crosslinks were reversed by incubation tive association in other ancestries (Figure 1). Before per- of precipitates at 65C for 5 hr. DNA was purified by the PureLink forming this association study, we had enough power to

PCR Micro Kit (Life Technologies) and resuspended in H2O. DNA detect associations with both the AA and EU ancestries was then analyzed by qPCR with a single set of genotyping primers (a prior power of 92% for AA and 99% for EU), whereas and differentially tagged fluorescent probes for the risk and non- we only had 25% power to detect an association with HA risk alleles of rs6590330. This qPCR was performed by TaqMan ancestry. However, HA ancestry had variants with associa- assay on an ABI 7500 PCR system. tion at p < 10 7, which was reduced to 10 4 after For calculating chromatin immunoprecipitation (ChIP) results admixture adjustment (Figure S1). This suggests that the as enrichment folds, the amount of immunoprecipitated DNA association seen in HA ancestry might be due to a mixed from the negative control site and the amount of immunoprecip- Asian population structure. Therefore, we conclude that itated DNA from the target site were first normalized against the genetic variants in ETS1 are associated with SLE only in amount of input DNA. The enrichment folds were then calculated as the amount of immunoprecipitated DNA from the target site the AS ancestry cohort. divided by the amount of immunoprecipitated DNA from the After performing a haplotype association analysis in the negative control site. The sequences of primer pairs and probes AS ancestry cohort (Table S3), we found that no haplotypic 9 are included in Table S2. model of association (plowest ¼ 7.38 3 10 ) was more sig- nificant than the SNP rs6590330 (the most significant SNV; p value ¼ 1.80 3 10 11, odds ratio ¼ 1.407 [1.2585– Results 1.573]), suggesting a single-variant model for lupus risk at this locus. We then performed a stepwise logistic regression We genotyped a total of 69 genetic variants in ETS1 in 7,659 analysis to identify independent genetic loci in the AS SLE subjects and 6,892 control subjects (Table S1). Our cohort; we started with adjustment for rs6590330. In the trans-ancestral cohort included AS, HA, EU, and AA partic- dataset containing only directly genotyped variants, no ipants. We then imputed against 1000 Genomes data, SNP retained a p value < 10 2 after adjustment for acquiring 1,333 genetic variants with MAFs > 0.01. A total rs6590330 (Figure 2). Four SNPs retained p values in the of 1,402 genetic variants were used for a fine-mapping and range of 0.001–0.01 after adjustment for rs6590330 with model-building study aimed at identifying the causal ge- the full dataset containing both genotyped and imputed netic variants driving the lupus association at this locus. variants. In summary, the great majority of the lupus

734 The American Journal of Human Genetics 96, 731–739, May 7, 2015 74 AB Figure 2. A Single Genetic Effect Marked by Genotyped SNV rs6590330 Contrib- utes to Lupus Risk in the AS Cohort Genomic position is given with GRCh37 coordinates. (A) The logistic regression association of genotyped variants in an AS cohort with an adjustment for admixture. (B) The logistic association of genotyped variants in an AS cohort with an adjust- ment for admixture and rs6590330. (C) The logistic association of genotyped and imputed variants in an AS cohort CD with an adjustment for admixture. (D) The logistic association of genotyped and imputed variants in an AS cohort with an adjustment for admixture and rs6590330.

On the basis of our analysis, we hy- pothesized that the causal variant might reduce ETS1 expression through differential miRNA bind- ing, differential transcriptional factor binding, and/or changing chromatin association at this locus was explained by rs6590330. interaction(s) or state(s). We used datasets and tools avail- Therefore, the frequentist approach is consistent with a able from TargetScan,31 Cis-BP,32 Roadmap Epigenomics,33 model in which a single genetic effect marked by genotyped ENCODE,34 and other sources to assess these possibilities SNV rs6590330 contributes to lupus risk in our AS cohort. for each of the SNVs in the credible set. The non-risk allele To complement our frequentist approach, we used a of rs1128334 was predicted to be bound by miR-300;31,35 Bayesian fine-mapping strategy to identify the set of however, miR-300 is not expressed in cells of hematopoi- genetic variants that account for 95% of the total posterior etic origin. The other 15 variants were not located within probability in the region. In total, this procedure was the ETS1 transcript and thus were not predicted to highly consistent with the frequentist approach and iden- disrupt miRNA binding. Altogether, we identified four tified 16 genetic variants (3 genotyped and 13 imputed) as promising functional variants with experimental evidence the 95% credible set that could be responsible for the ETS1 suggesting that they might affect transcription factor association (Figure 3). These same 16 variants were also binding by locating to active chromatin regions. These the most highly associated variants in the frequentist four variants were identified according to the chromatin- approach. Of the variants evaluated, rs6590330 made a state-segmentation hidden Markov model from the Road- 125-fold greater contribution than any other variant. map Project,33 chromatin looping to RNA polymerase II,36 Most SNPs in the credible set are in a non-coding region and DNase hypersensitivity clusters (Figure S3). We then of the genome, so the way they might act is by affecting experimentally analyzed these four most promising vari- ETS1 expression levels. Yang et al. showed that the risk ants for differential transcriptional factor binding between allele of rs1128334 is associated with reduced ETS1 mRNA risk and non-risk alleles with electrophoretic mobility expression.8 Because rs1128334 is an SNP of the credible shift assays using nuclear lysate from B cells (Figure S4). set, and is in high LD with the other 15 SNPs, we hypothe- Of these variants, only rs6590330 exhibited differential sized that each of the SNPs in the credible set was associated binding of the risk and non-risk oligonucleotides to nu- with reduced ETS1 expression. We confirmed this hypoth- clear factors (Figure S4). For this variant, we identified esis that members of the candidate credible set were associ- the specific differentially bound protein by using a DNA ated with ETS1 expression by using publically available affinity precipitation assay (DAPA) followed by mass spec- SNP-mRNA expression data. For example, we found that trometry. The mass spectrometry results indicated that rs6590330 is a strong expression quantitative trait locus STAT1 binds to the risk allele but not the non-risk allele (eQTL) for ETS1, given that the risk allele is associated for rs6590330 (Table S4). We confirmed this finding by with reduced ETS1 expression in HapMap subjects from DAPA followed by immunoblotting for phosphorylated the China-Beijing cohort (Figure S2). Meanwhile, this SNP STAT1 and total STAT1 (Figure 4). Critically, STAT1 ChIP- is also found to be a significant eQTL for nearby gene qPCR analysis also confirmed that the risk allele of TP53AIP1 (MIM 605426), suggesting the potential for mul- rs6590330 has more STAT1 binding enrichment than the tiple genotype-dependent changes in gene expression at non-risk allele in B cells heterozygous for rs6590330 this locus (Figure S2). (Figure 4).

The American Journal of Human Genetics 96, 731–739, May 7, 2015 735 75 A

B

Figure 4. The Lupus Risk Allele of rs6590330 Increases STAT1 Binding (A) STAT1 and pSTAT1 exhibit higher binding to oligonucleotides containing the rs6590330 risk allele than to oligonucleotides con- taining the non-risk allele. Biotin-labeled oligonucleotides were incubated with nuclear extract from Epstein-Barr-virus-trans- formed B cells. Proteins bound to the oligonucleotides were captured with the mMACs Factor Finder Kit. Proteins were then Figure 3. Bayesian Association Plot Showing the Signal separated by SDS-PAGE and detected with anti-pSTAT1 (top) and ETS1 Strength in as the Posterior Probability of Each SNV anti-STAT1 (bottom). Abbreviations are as follows: M, marker; Genomic position is given with GRCh37 coordinates; AS data are NR, oligonucleotide containing the non-risk allele of rs6590330; shown. SNVs are colored according to their origin: genotyped var- R, oligonucleotide containing the risk allele of rs6590330 (see iants are in red, and imputed variants are in blue. Variants in the Figure S5); mutant, oligonucleotide containing a disrupted puta- 95% credible set are marked by diamonds. Variants with larger pos- tive STAT binding site downstream of rs6590330; and cell lysate, > terior probabilities ( 0.01) represent those most likely to be causal. nuclear extract from B cells. The relative intensities of the bands are given above each band. Results are representative of four exper- iments; although all experiments demonstrated increased STAT1 rs6590330 is located 11 bases away from a putative binding to the probes with the risk allele, in two of four experi- STAT1 binding site predicted by a binding model identified ments, no STAT1 or pSTAT1 was detected in the immunoprecipi- in Factorbook37 (Figure S5). When this STAT binding site tate from the non-risk oligonucleotide, whereas both were was disrupted in the oligonucleotide used for the DAPA detected in the immunoprecipitate from the risk oligonucleotide. (B) rs6590330-heterozygous Epstein-Barr-virus-transformed B cells analysis containing the risk variant, the binding of STAT1 were used for ChIP-qPCR assessment of the differential binding of was also disrupted (Figure 4). STAT1 can be activated by STAT1 to the risk and non-risk alleles. Crosslinked and sonicated type I interferons, including interferon alpha. In the chromatin was immunoprecipitated with an anti-STAT1 antibody. context of data supporting chromatin looping of DNA sur- Site-specific primers and probes specific to the rs6590330 risk and non-risk alleles were used for determining STAT1 binding to rounding rs650330 to the ETS1 transcription start site, it is immunoprecipitated DNA. possible that this variant enhances activated STAT1 bind- ing to the putative STAT1 binding site near rs6590330 and thus results in the repression of ETS1 transcription Discussion initiated by RNA polymerase II. Taken together, these re- sults strongly support a model in which the variant The genotyped and imputed data from 14,551 subjects rs6590330 is highly associated with decreased expression facilitated fine mapping of ETS1 and its association with of ETS1 and increased lupus risk through pSTAT1 binding SLE risk. A model consisting of a single genetic effect of a DNA sequence located next to the risk allele in B cells. was identified by stepwise logistic regression analyses.

736 The American Journal of Human Genetics 96, 731–739, May 7, 2015 76 Importantly, a set of 16 variants that are likely to be causal tion to IgM plasma cells and thus subsequently contribute were identified through frequentist and Bayesian analyses. to SLE pathogenesis. Meanwhile, even though we present Our data are consistent with a mechanistic model in which rs6590330 as a candidate causal variant responsible for dis- the risk allele of rs6590330 at ETS1 contributes to increased ease risk, more experiments are necessary to establish cau- SLE risk by facilitating binding of pSTAT1 to a nearby puta- sality and define the genotype-dependent immunological tive STAT binding site and subsequent repression of the mechanisms driving lupus at this locus. expression of ETS1. Our fine-mapping results are corrobo- In conclusion, we conducted a large trans-ancestral fine- rated by a recent study that used a different analytical mapping study of ETS1 to identify genetic variants that method to estimate the probability that each variant increase lupus risk. After determining a model of single ge- within ETS1 is a causal variant; this study concluded that netic effect, we used frequentist and Bayesian association rs6590330 is among the most likely causal variants pro- methods to identify a set of 16 variants that are most likely posed for the ETS1 association with SLE,38 further support- causal. Of these, we identified an allele-specific function ing the results of our genetic analysis. for rs6590330. Altogether, we propose a model in which The biological mechanism we propose herein is specific the risk allele of rs6590330 increases SLE risk through to the association of ETS1 variants in the AS cohort. We increased binding of pSTAT1 and depressed expression performed the functional validation in B cells on the basis of ETS1. of the evidence that B cells play a critical role in the etiol- ogy of SLE. Previous studies have demonstrated that B cells Supplemental Data are significantly enriched with the expression of genes near lupus risk loci.39 Meanwhile, B cells from lupus- Supplemental Data include five figures and four tables and can be affected individuals produce autoantibodies, are hyperacti- found with this article online at http://dx.doi.org/10.1016/j.ajhg. 2015.03.002. vated, and have an exaggerated response to Toll-like receptor ligands and immune complexes.40 ETS1-hypo- morphic mice (producing ETS1 lacking the Pointed Acknowledgments Ets1p/p domain [ ]) develop a strikingly similar B cell pheno- We are grateful for support from the NIH (AI024717, AI063274, 12,41 type. This ETS1 deficiency has been shown to drive AI082714, AI083194, AR043274, AR043727, AR043814, intrinsic terminal differentiation of B cells into IgM- AR051545, AR053483, AR056360, AR057172, AR058959, secreting plasma cells in a B-cell-intrinsic fashion.42 AR060366, AR063124, AR065626, AR62277, GM103456, Although we limited our functional analysis to B cells, it re- GM104938, HG006828, K24AI078004, K24AR02318, mains possible that the same genetic risk mechanism MD007909, P01AR49084, P30AR053483, P30AR055385, might also operate in other cell types, such as T cells.42,43 P30GM103510, P60AR053308, P60AR062755, P60AR064464, ¼ R01AR44804, R21AI070304, S10RR027015, TR000077, In all four ancestries, rs6590330 is polymorphic (MAFAS ¼ ¼ ¼ U01AI101934, U01HG006828, U19AI082714, U54GM104938, 45%, MAFAA 31%, MAFHA 28%, and MAFEU 26%). UL1RR029882, UL1TR000004, and UL1TR000150). Support for The ETS1 association has only been observed in AS ancestry this project was also provided by the United States Departments cohorts in our study and others before us. Perhaps ETS1 var- of Veteran Affairs and Defense (PR094002), a Kirkland Scholar iants do not increase SLE risk in AA and EU populations, a Award, the National Basic Research Program of China (973 pro- possibility consistent with epistasis or environmental fac- gram, 2014CB541901), the National Natural Science Foundation 44 tors (gene-environment interactions). Asian-specific vari- of China (81230072 and 81421001), the State Key Laboratory of ants have been identified in previous studies,1,10 and it is Oncogenes and Related Genes (grant 91-14-05), the Key Research not surprising that ancestry-specific genetic factors affect Program of the Chinese Academy of Sciences (KJZD-EW-L01-3), the risk of SLE. Subjects of Asians ancestry have a higher the Program of the Shanghai Commission of Science and Technol- SLE incidence than do those of European ancestry, in ogy (12JC1406000 and 12431900703), the Instituto de Salud Car- addition to an ancestry-specific distribution of clinical los III (partly financed by Fonds Europe´en de De´veloppement ´ manifestations.45 Regional funds from the European Union [02558]), the Proyecto de Excelencia of the Junta de Andalucı´a (CTS2548), the Arthritis We found that the risk allele of rs6590330 results in pref- Foundation, the Alliance for Lupus Research, and the Korea erential binding of pSTAT1 to a nearby putative STAT bind- Healthcare Technology R&D Project of the Ministry for Health ETS1 ing site and is associated with reduced expression of . and Welfare, Republic of Korea (HI13C2124). STAT1 plays a complex role in regulating gene expression and is capable of acting as either an activator or a repressor, Received: December 18, 2014 depending on the cellular context.46 This is an intriguing Accepted: March 5, 2015 candidate causal mechanism in which disease risk is medi- Published: April 9, 2015 ated through the disruption of transcription factor binding to a nearby site that does not contain the risk variant. Web Resources Because the risk allele of rs6590330 is associated with reduced ETS1 expression (Yang et al.8 and Figure S2), it is The URLs for data presented herein are as follows: possible that reduced ETS1 expression associated with ADMIXMAP, http://homepages.ed.ac.uk/pmckeigu/admixmap/ the risk allele of rs6590330 might skew B cell differentia- CIS-BP, http://cisbp.ccbr.utoronto.ca/

The American Journal of Human Genetics 96, 731–739, May 7, 2015 737 77 EIGENSTRAT, http://www.hsph.harvard.edu/alkes-price/software/ association study in Korean women. Ann. Rheum. Dis. 73, Factorbook, http://www.factorbook.org/mediawiki/index.php/ 1240–1245. STAT1 11. Leng, R.-X., Pan, H.-F., Chen, G.-M., Feng, C.-C., Fan, Y.-G., Ye, Haploview, http://www.broadinstitute.org/scientific-community/ D.-Q., and Li, X.-P. (2011). The dual nature of Ets-1: focus to science/programs/medical-and-population-genetics/haploview/ the pathogenesis of systemic lupus erythematosus. Autoim- haploview mun. Rev. 10, 439–443. OMIM, http://www.omim.org/ 12. Wang, D., John, S.A., Clements, J.L., Percy, D.H., Barton, K.P., PLINK, http://pngu.mgh.harvard.edu/~purcell/plink/ and Garrett-Sinha, L.A. (2005). Ets-1 deficiency leads to altered SNPTEST, https://mathgen.stats.ox.ac.uk/genetics_software/ B cell differentiation, hyperresponsiveness to TLR9 and auto- snptest/snptest.html immune disease. Int. Immunol. 17, 1179–1191. TargetScan, http://www.targetscan.org/ 13. Maller, J.B., McVean, G., Byrnes, J., Vukcevic, D., Palin, K., Su, UCSC Genome Browser, http://genome.ucsc.edu/ Z., Howson, J.M.M., Auton, A., Myers, S., Morris, A., et al.; Wellcome Trust Case Control Consortium (2012). Bayesian refinement of association signals for 14 loci in 3 common dis- References eases. Nat. Genet. 44, 1294–1301. 1. Cui, Y., Sheng, Y., and Zhang, X. (2013). Genetic susceptibility 14. Kottyan, L.C., Zoller, E.E., Bene, J., Lu, X., Kelly, J.A., Rupert, to SLE: recent progress from GWAS. J. Autoimmun. 41, 25–33. A.M., Lessard, C.J., Vaughn, S.E., Marion, M., Weirauch, 2. Lawrence, R.C., Helmick, C.G., Arnett, F.C., Deyo, R.A., Felson, M.T., et al.; UK Primary Sjo¨gren’s Syndrome Registry; UK Pri- D.T., Giannini, E.H., Heyse, S.P., Hirsch, R., Hochberg, M.C., mary Sjo¨gren’s Syndrome Registry (2015). The IRF5-TNPO3 Hunder, G.G., et al. (1998). Estimates of the prevalence of association with systemic lupus erythematosus has two com- arthritis and selected musculoskeletal disorders in the United ponents that other autoimmune disorders variably share. States. Arthritis Rheum. 41, 778–799. Hum. Mol. Genet. 24, 582–596. 3. Pons-Estel, G.J., Alarco´n, G.S., Scofield, L., Reinlib, L., and 15. Rasmussen, A., Sevier, S., Kelly, J.A., Glenn, S.B., Aberle, T., Cooper, G.S. (2010). Understanding the epidemiology and Cooney, C.M., Grether, A., James, E., Ning, J., Tesiram, J., progression of systemic lupus erythematosus. Semin. Arthritis et al. (2011). The lupus family registry and repository. Rheu- Rheum. 39, 257–268. matology (Oxford) 50, 47–59. 4. Chakravarty, E.F., Bush, T.M., Manzi, S., Clarke, A.E., and 16. Lessard, C.J., Adrianto, I., Kelly, J.A., Kaufman, K.M., Grun- Ward, M.M. (2007). Prevalence of adult systemic lupus erythe- dahl, K.M., Adler, A., Williams, A.H., Gallant, C.J., Anaya, matosus in California and Pennsylvania in 2000: estimates J.M., Bae, S.C., et al.; Marta E. Alarco´n-Riquelme on behalf obtained using hospitalization data. Arthritis Rheum. 56, of the BIOLUPUS and GENLES Networks (2011). Identifica- 2092–2094. tion of a systemic lupus erythematosus susceptibility locus 5. Alarco´n-Segovia, D., Alarco´n-Riquelme, M.E., Cardiel, M.H., at 11p13 between PDHX and CD44 in a multiethnic study. Caeiro, F., Massardo, L., Villa, A.R., and Pons-Estel, B.A.; Grupo Am. J. Hum. Genet. 88, 83–91. Latinoamericano de Estudio del Lupus Eritematoso (GLADEL) 17. Hochberg, M.C. (1997). Updating the American College of (2005). Familial aggregation of systemic lupus erythematosus, Rheumatology revised criteria for the classification of systemic rheumatoid arthritis, and other autoimmune diseases in 1,177 lupus erythematosus. Arthritis Rheum. 40, 1725. lupus patients from the GLADEL cohort. Arthritis Rheum. 52, 18. McKeigue, P.M., Carpenter, J.R., Parra, E.J., and Shriver, M.D. 1138–1147. (2000). Estimation of admixture and detection of linkage in 6. Han, J.-W., Zheng, H.-F., Cui, Y., Sun, L.-D., Ye, D.-Q., Hu, Z., admixed populations by a Bayesian approach: application Xu, J.-H., Cai, Z.-M., Huang, W., Zhao, G.-P., et al. (2009). to African-American populations. Ann. Hum. Genet. 64, Genome-wide association study in a Chinese Han population 171–186. identifies nine new susceptibility loci for systemic lupus ery- 19. Price, A.L., Patterson, N.J., Plenge, R.M., Weinblatt, M.E., thematosus. Nat. Genet. 41, 1234–1237. Shadick, N.A., and Reich, D. (2006). Principal components 7. He, C.F., Liu, Y.S., Cheng, Y.L., Gao, J.P., Pan, T.M., Han, J.W., analysis corrects for stratification in genome-wide association Quan, C., Sun, L.D., Zheng, H.F., Zuo, X.B., et al. (2010). studies. Nat. Genet. 38, 904–909. TNIP1, SLC15A4, ETS1, RasGRP3 and IKZF1 are associated 20. Hoggart, C.J., Parra, E.J., Shriver, M.D., Bonilla, C., Kittles, with clinical features of systemic lupus erythematosus in a R.A., Clayton, D.G., and McKeigue, P.M. (2003). Control of Chinese Han population. Lupus 19, 1181–1186. confounding of genetic associations in stratified populations. 8. Yang, W., Shen, N., Ye, D.-Q., Liu, Q., Zhang, Y., Qian, X.-X., Am. J. Hum. Genet. 72, 1492–1504. Hirankarn, N., Ying, D., Pan, H.-F., Mok, C.C., et al.; Asian 21. Hoggart, C.J., Shriver, M.D., Kittles, R.A., Clayton, D.G., and Lupus Genetics Consortium (2010). Genome-wide association McKeigue, P.M. (2004). Design and analysis of admixture study in Asian populations identifies variants in ETS1 and mapping studies. Am. J. Hum. Genet. 74, 965–978. WDFY4 associated with systemic lupus erythematosus. PLoS 22. Smith, M.W., Patterson, N., Lautenberger, J.A., Truelove, A.L., Genet. 6, e1000841. McDonald, G.J., Waliszewska, A., Kessing, B.D., Malasky, M.J., 9. Zhong, H., Li, X.L., Li, M., Hao, L.X., Chen, R.W., Xiang, K., Scafe, C., Le, E., et al. (2004). A high-density admixture map Qi, X.B., Ma, R.Z., and Su, B. (2011). Replicated associations for disease gene discovery in african americans. Am. J. Hum. of TNFAIP3, TNIP1 and ETS1 with systemic lupus erythemato- Genet. 74, 1001–1013. sus in a southwestern Chinese population. Arthritis Res. Ther. 23. Halder, I., Shriver, M., Thomas, M., Fernandez, J.R., and Fruda- 13, R186. kis, T. (2008). A panel of ancestry informative markers for esti- 10. Lee, H.-S., Kim, T., Bang, S.Y., Na, Y.J., Kim, I., Kim, K., Kim, mating individual biogeographical ancestry and admixture J.-H., Chung, Y.-J., Shin, H.D., Kang, Y.M., et al. (2014). Ethnic from four continents: utility and applications. Hum. Mutat. specificity of lupus-associated loci identified in a genome-wide 29, 648–658.

738 The American Journal of Human Genetics 96, 731–739, May 7, 2015 78 24. Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, 35. Betel, D., Wilson, M., Gabow, A., Marks, D.S., and Sander, C. M.A., Bender, D., Maller, J., Sklar, P., de Bakker, P.I., Daly, (2008). The microRNA.org resource: targets and expression. M.J., and Sham, P.C. (2007). PLINK: a tool set for whole- Nucleic Acids Res. 36, D149–D153. genome association and population-based linkage analyses. 36. Li, G., Fullwood, M.J., Xu, H., Mulawadi, F.H., Velkov, S., Vega, Am. J. Hum. Genet. 81, 559–575. V., Ariyaratne, P.N., Mohamed, Y.B., Ooi, H.S., Tennakoon, C., 25. Marchini, J., Howie, B., Myers, S., McVean, G., and Donnelly, et al. (2010). ChIA-PET tool for comprehensive chromatin P. (2007). A new multipoint method for genome-wide associ- interaction analysis with paired-end tag sequencing. Genome ation studies by imputation of genotypes. Nat. Genet. 39, Biol. 11, R22. 906–913. 37. Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T.W., Greven, 26. Barrett, J.C., Fry, B., Maller, J., and Daly, M.J. (2005). Haplo- M.C., Pierce, B.G., Dong, X., Kundaje, A., Cheng, Y., et al. view: analysis and visualization of LD and haplotype maps. (2012). Sequence features and chromatin structure around Bioinformatics 21, 263–265. the genomic regions bound by 119 human transcription fac- 27. Barrett, J.C. (2009). Haploview: Visualization and analysis of tors. Genome Res. 22, 1798–1812. SNP genotype data. Cold Spring Harb Protoc 2009, ip71. 38. Farh, K.K., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, 28. Stephens, M., and Balding, D.J. (2009). Bayesian statistical W.J., Beik, S., Shoresh, N., Whitton, H., Ryan, R.J., Shishkin, methods for genetic association studies. Nat. Rev. Genet. 10, A.A., et al. (2015). Genetic and epigenetic fine mapping of 681–690. causal autoimmune disease variants. Nature 518, 337–343. 29. Altshuler, D.M., Gibbs, R.A., Peltonen, L., Altshuler, D.M., 39. Hu, X., Kim, H., Stahl, E., Plenge, R., Daly, M., and Raychaud- Gibbs, R.A., Peltonen, L., Dermitzakis, E., Schaffner, S.F., Yu, huri, S. (2011). Integrating autoimmune risk loci with gene- F., Peltonen, L., et al.; International HapMap 3 Consortium expression data identifies specific pathogenic immune cell (2010). Integrating common and rare genetic variation in subsets. Am. J. Hum. Genet. 89, 496–506. diverse human populations. Nature 467, 52–58. 40. Do¨rner, T., Giesecke, C., and Lipsky, P.E. (2011). Mechanisms 30. Wijeratne, A.B., Manning, J.R., Schultz, Jel.J., and Greis, K.D. of B cell autoimmunity in SLE. Arthritis Res. Ther. 13, 243. (2013). Quantitative phosphoproteomics using acetone-based 41. Wang, H., Xu, J., Ji, X., Yang, X., Sun, K., Liu, X., and Shen, Y. peptide labeling: method evaluation and application to a car- (2005). The abnormal apoptosis of T cell subsets and possible diac ischemia/reperfusion model. J. Proteome Res. 12, 4268– involvement of IL-10 in systemic lupus erythematosus. Cell. 4279. Immunol. 235, 117–121. 31. Lewis, B.P., Burge, C.B., and Bartel, D.P. (2005). Conserved 42. John, S.A., Clements, J.L., Russell, L.M., and Garrett-Sinha, seed pairing, often flanked by adenosines, indicates that thou- L.A. (2008). Ets-1 regulates plasma cell differentiation by inter- sands of human genes are microRNA targets. Cell 120, 15–20. fering with the activity of the transcription factor Blimp-1. 32. Weirauch, M.T., Yang, A., Albu, M., Cote, A.G., Montenegro- J. Biol. Chem. 283, 951–962. Montero, A., Drewe, P., Najafabadi, H.S., Lambert, S.A., 43. Moisan, J., Grenningloh, R., Bettelli, E., Oukka, M., and Ho, Mann, I., Cook, K., et al. (2014). Determination and inference I.C. (2007). Ets-1 is a negative regulator of Th17 differentia- of eukaryotic transcription factor sequence specificity. Cell tion. J. Exp. Med. 204, 2825–2835. 158, 1431–1443. 44. Hirschhorn, J.N., Lohmueller, K., Byrne, E., and Hirschhorn, 33. Bernstein, B.E., Stamatoyannopoulos, J.A., Costello, J.F., Ren, K. (2002). A comprehensive review of genetic association B., Milosavljevic, A., Meissner, A., Kellis, M., Marra, M.A., studies. Genet. Med. 4, 45–61. Beaudet, A.L., Ecker, J.R., et al. (2010). The NIH Roadmap Epi- 45. Jakes, R.W., Bae, S.C., Louthrenoo, W., Mok, C.C., Navarra, genomics Mapping Consortium. Nat. Biotechnol. 28, 1045– S.V., and Kwon, N. (2012). Systematic review of the epidemi- 1048. ology of systemic lupus erythematosus in the Asia-Pacific re- 34. Rosenbloom, K.R., Sloan, C.A., Malladi, V.S., Dreszer, T.R., gion: prevalence, incidence, clinical features, and mortality. Learned, K., Kirkup, V.M., Wong, M.C., Maddren, M., Fang, Arthritis Care Res. (Hoboken) 64, 159–168. R., Heitner, S.G., et al. (2013). ENCODE data in the UCSC 46. Ramana, C.V., Chatterjee-Kishore, M., Nguyen, H., and Stark, Genome Browser: year 5 update. Nucleic Acids Res. 41, G.R. (2000). Complex roles of Stat1 in regulating gene expres- D56–D63. sion. Oncogene 19, 2619–2627.

The American Journal of Human Genetics 96, 731–739, May 7, 2015 739 79 The American Journal of Human Genetics Supplemental Data Lupus Risk Variant Increases pSTAT1 Binding and Decreases ETS1 Expression Xiaoming Lu, Erin E. Zoller, Matthew T. Weirauch, Zhiguo Wu, Bahram Namjou, Adrienne H. Williams, Julie T. Ziegler, Mary E. Comeau, Miranda C. Marion, Stuart B. Glenn, Adam Adler, Nan Shen, Swapan K. Nath, Anne M. Stevens, Barry I. Freedman, Betty P. Tsao, Chaim O. Jacob, Diane L. Kamen, Elizabeth E. Brown, Gary S. Gilkeson, Graciela S. Alarcón, John D. Reveille, Juan-Manuel Anaya, Judith A. James, Kathy L. Sivils, Lindsey A. Criswell, Luis M. Vilá, Marta E. Alarcón-Riquelme, Michelle Petri, R. Hal Scofield, Robert P. Kimberly, Rosalind Ramsey-Goldman, Young Bin Joo, Jeongim Choi, Sang-Cheol Bae, Susan A. Boackle, Deborah Cunninghame Graham, Timothy J. Vyse, Joel M. Guthridge, Patrick M. Gaffney, Carl D. Langefeld, Jennifer A. Kelly, Kenneth D. Greis, Kenneth M. Kaufman, John B. Harley, and Leah C. Kottyan

80 A Without Admixture Adjustment 100 12 ● Genotyped FLI1 ● Imputed ETS1 10 80

8 60

( p -value) 6 ● ● 10 ● ● ●● 40 ●●●● ●●

-log ● 4 ●●● ● ●● ●● ● ●●●● ●●●● ● ●● ●● ●● ● ●●●●●●●●● ● ● 20 2 ● ●●●●●● ●●●●●● ● ●● ●●●●● ●● ●●● ● Recombination Rate (cM/Mb) ● ● ●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ●●●● ●●●●● ●●●●● ● ●●●●● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●● ●●●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● 0 ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 0 128.1 128.2 128.3 128.4 128.5 128.6 Chromosome 11(Mb)

B With Admixture Adjustment 100 12 ● Genotyped FLI1 ● Imputed ETS1 10 80

8 60

( p -value) 6 10 40

-log 4 ● ● ● ●● ● ●●● 20 2 ●●● ●● ● ●●●●●●●●●● ● Recombination Rate (cM/Mb) ●●●●●●●●●● ●●●●●● ●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ● ● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 0 ●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● 0 128.1 128.2 128.3 128.4 128.5 128.6 Chromosome 11(Mb)

Figure S1. Association of variants at the ETS1 locus in Hispanic ancestry cohort is accounted for by admixture in the cases and controls. A) The logistic association of genotyped and imputed variants in cohort of Hispanic American (HA) without an adjustment for admixture. B) The logistic association of genotyped and imputed variants in cohort of Hispanic American (HA) with an adjustment for admixture estimates of each subject.

81 Figure S2. eQTL analysis of rs6590330. A) The risk allele of rs6590330 is correlated with decreased ETS1 expression in Han Chinese but not other ancestral cohorts. B) and C) The risk allele of rs6590330 is correlated with TP53AIP1 expression in Han Chinese but not other ancestral cohorts. ILMN_1763374 and ILMN_1766388 are two probes targeting gene TP53AIP1. GeneVar 3.1.1 was used to identify expression quantitative loci (eQTLs) using publicly available genotyping and gene expression in LCL database from Hapmap (Stranger et al. 2012). The Box Plots were created using the analysis of raw data downloaded from the link provided in the manuscript (Stranger et al. 2012). The sample size of each genotype (AA, AG, GG) for each cohort is CHB (10, 35, 35), GIH (3, 20, 59), JPT (20, 35, 27), LWK (3, 24, 55), MEX (0, 7, 38), MKK (5, 42, 89), YRI (6, 40, 61). CHB: Han Chinese in Bejing, China; GIH: Gujarati Indian from Houston, Texas, USA; LWK: Luhya in Webuye, Kenya; MEX: Mexican ancestry in Los Angeles, CA,USA; MKK: Maasai in Kinyawa, Kenya; YRI:Yoruba in Ibadan, Nigeria.

82 128,335,000 ETS1 128,330,000 rs1128334 128,325,000 hg19 rs114534998 128,320,000 rs12574073 rs140066073 128,315,000 rs11501246 rs61909063 rs11221307 10 kb rs11602895 rs6590330 STAT1 motif Chromatin State Segmentation by HMM from ENCODE/Broad 128,310,000 DNaseI Hypersensitivity Clusters in 125 cell types from ENCODE (V3) UCSC Genes (RefSeq, GenBank, CCDS, Rfam, tRNAs & Comparative Genomics) rs12293665 Chromatin Interaction Analysis Paired-End Tags (ChIA-PET) from ENCODE/GIS-Ruan H3K4Me1 Mark (Often Found Near Regulatory Elements) on 7 cell lines from ENCODE H3K27Ac Mark (Often Found Near Active Regulatory Elements) on 7 cell lines from ENCODE chr11:128308991:D rs9736939 rs9736922 rs77989778 128,305,000 rs12576753 128,300,000 rs12576990 128,295,000 Scale chr11: Bayesian Bayesian credible set DNase Clusters K562 Pol2 Sig 1 16 SNPs in 95% 16 SNPs GM12878 ChromHMM

Figure S3. In silico analysis of SNVs in the 95% Bayesian credible set identifies four SNVs in potential regulatory regions. Red variants were selected for additional experimental analysis.

83 A rs6590330 B rs1128334

Oligo R NR R NR Oligo R NR R NR Nuclear Extract - - + + Nuclear Extract - - + +

C rs12574073 rs114534998

Oligo R NR R NR R NR R NR Nuclear Extract - - + + - - + +

Figure S4. EMSA analysis of rs6590330, rs12574073, rs114534998, and rs1128334. Labeled oligonucleotides corresponding to the non-risk (NR) and risk (R) variants at the indicated SNV. A) Oligonucleotides NR and R at rs6590330 were used to probe nuclear extract from B cells. B) Oligonucleotides NR and R at rs1128334 were used to probe nuclear extract from B cells. C) Oligonucleotides NR and R at rs12574073 and rs114534998 were used to probe nuclear extract from B cells.

84 Figure S5. rs6590330 is located near a putative STAT1 binding site. *The putative STAT1 binding sequence is enriched in the top 500 ChIP-seq peaks of a STAT1 ChIPseq analysis of 30min IFNα-stimulated K562 cells. The sequences of the top 500 transcription factor ChIPseq peaks were used to identify enriched motifs de novo, using the MEME-ChIP suite of tools. The motifs are then used to scan the entire set of ChIPseq peaks and the two flanking regions using the FIMO tool. This motif is enriched in the 486 of 500 top ChIP-seq peaks.

85 Table S1. Subjects and genetic variants included in the current study.

Subjects Genetic Variants Cases Controls SNPs Indels Males Females Males Females Genotyped Imputed Imputed EU 344 3582 1151 2339 69 535 36 AA 121 1403 574 1235 69 1125 71 AS 98 1159 154 1104 69 438 40 HA 94 858 55 280 69 582 42 Total 657 7002 1934 4958 1402 variants (MAF>0.01)

86 Table S2. Probes and primers used in this study.

NAME Modification Sequence Comment DNA affinity precipitation assay (DAPA) AATGTGTTGGGAGGGGCGGGGGGTATCT Non-risk-rs6590330-F 5' Biotin Non-risk rs6590330 oligo forward ATTCTTG CAAGAATAGATACCCCCCGCCCCTCCCA Non-risk-rs6590330-R 5' Biotin Non-risk rs6590330 oligo reverse ACACATT AATGTGTTGGGAGGGGCAGGGGGTATCT Risk-rs6590330-F 5' Biotin Risk rs6590330 oligo forward ATTCTTG CAAGAATAGATACCCCCTGCCCCTCCCA Risk-rs6590330-R 5' Biotin Risk rs6590330 oligo reverse ACACATT Risk rs6590330 oligo with mutating Risk-Mutant- AATGTGTTGGGAGGGGCAGGGGGTATCT the putative STAT1 binding site 5' Biotin rs6590330-F ACCTCCA forward

Risk rs6590330 oligo with mutating Risk-Mutant- TGGAGGTAGATACCCCCTGCCCCTCCCA 5' Biotin the putative STAT1 binding site rs6590330-R ACACATT reverse Chromatin Immunoprecipitation-qPCR STAT1-ChIP-Postive-F None GACAGTGGCATCCTGATAGTG Positive control for STAT1 binding 5' 6-FAM™ - STAT1-ChIP-Postive- ZEN - CTCAGATGCTGCTGCCCTCTAGTG Positive control for STAT1 binding Probe 3' Iowa Black® FQ STAT1-ChIP-Postive-R None CCCACCCTAAGTTTCAGTTCTC Positive control for STAT1 binding STAT1-ChIP-Negative- None GTTCAGGCCCGGCTTAG Negative control for STAT1 binding F 5' 6-FAM™ - STAT1-ChIP-Negative- ZEN - CCGCAGGGCTAATGCTGTGTGA Negative control for STAT1 binding Probe 3' Iowa Black® FQ STAT1-ChIP-Negative- None CGGGATGAAGCCAGTTGT Negative control for STAT1 binding R rs6590330-ChIP-F None TACGTGATTGTTCATAGCCAA rs650330 ChIP-qPCR primer forward rs6590330-ChIP-R None AAGTGCCTTTATAAATGATTTGTTCC rs650330 ChIP-qPCR primer forward 5' 6-FAM™ - rs650330 ChIP-qPCR probe non-risk rs6590330-WT-Probe 3' Iowa C+C+T+GCCCC+TC forward Black® FQ 5' HEX™ - 3' rs6590330-Mutant- Iowa Black® TACCCC+C+C+GCC rs650330 ChIP-qPCR probe risk Probe FQ Electrophoretic mobility shift assay AATGTGTTGGGAGGGGCGGGGGGTATCT Non-risk-rs6590330-F 5'-IRDye 700 Non-risk rs6590330 oligo forward ATTCTTG CAAGAATAGATACCCCCCGCCCCTCCCA Non-risk-rs6590330-R 5'-IRDye 700 Non-risk rs6590330 oligo reverse ACACATT AATGTGTTGGGAGGGGCAGGGGGTATCT Risk-rs6590330-F 5'-IRDye 700 Risk rs6590330 oligo forward ATTCTTG CAAGAATAGATACCCCCTGCCCCTCCCA Risk-rs6590330-R 5'-IRDye 700 Risk rs6590330 oligo reverse ACACATT AAAAGCTGGGAGTGTCACACTAATGGAG Non-risk-rs12576753-F 5'-IRDye 700 Non-risk rs12576753 oligo forward CTCCTTC GAAGGAGCTCCATTAGTGTGACACTCCC Non-risk-rs12576753-R 5'-IRDye 700 Non-risk rs12576753 oligo reverse AGCTTTT AAAAGCTGGGAGTGTCAAACTAATGGAG Risk-rs12576753-F 5'-IRDye 700 Risk rs12576753 oligo forward CTCCTTC

GAAGGAGCTCCATTAGTTTGACACTCCC Risk-rs12576753-R 5'-IRDye 700 Risk rs12576753 oligo reverse AGCTTTT

87 NAME Modification Sequence Comment Electrophoretic mobility shift assay (continued) Non-risk-rs114534998- ATATATATATATATATAAAACACCATAA 5'-IRDye 700 Non-risk rs12576753 oligo forward F TATGCTA Non-risk-rs114534998- TAGCATATTATGGTGTTTTATATATATAT 5'-IRDye 700 Non-risk rs12576753 oligo reverse R ATATAT ATATATATATATATATATAACACCATAAT Risk-rs114534998-F 5'-IRDye 700 Risk rs12576753 oligo forward ATGCTA TAGCATATTATGGTGTTATATATATATAT Risk-rs114534998-R 5'-IRDye 700 Risk rs12576753 oligo reverse ATATAT TAAATTATATGAAAATACAAGTTGGAAA Non-risk-rs1128334-F 5'-IRDye 700 Non-risk rs1128334 oligo forward ATAGTCA TGACTATTTTCCAACTTGTATTTTCATAT Non-risk-rs1128334-R 5'-IRDye 700 Non-risk rs1128334 oligo reverse AATTTA TAAATTATATGAAAATATAAGTTGGAAA Risk-rs1128334-F 5'-IRDye 700 Risk rs1128334 oligo forward ATAGTCA TGACTATTTTCCAACTTATATTTTCATAT Risk-rs1128334-R 5'-IRDye 700 Risk rs1128334 oligo reverse AATTTA

For probes, the red bases indicate the allele that is different between the risk and non-risk sequences. For the probe with a mutated STAT binding site, the mutated sequence is indicated as blue.. ChIP-qPCR probes use locked nucleic acid bases (LNATM) and are given as ‘+_’.

88 Table S3. Haplotype block analysis in cohort of Asian and Asian American ancestry

Haplotype Block Frequency P-value All imputed genetic variants AAGAGAGAGAA 0.389 2.70x10-9 Haplotypes built using variants CCCCCCCCCCCCCCCCCA 0.373 7.38x10-9 with individual p-values less than 10-5 CCCA 0.392 7.38x10-9

89 Table S4. Peptides from STAT1 identified by mass spectrometry following DAPA with oligos containing the risk and non-risk allele of rs6590030.

Peptides from Transcription factor found in Risk pull down: Confidence Sequence Human signal transducer and activator of transcription 1- alpha/beta 99 ELSAVTFPDIIR Human signal transducer and activator of transcription 1- alpha/beta 99 SQWYELQQLDSK Human signal transducer and activator of transcription 1- alpha/beta 99 VMAAENIPENPLK Human signal transducer and activator of transcription 1- alpha/beta 99 GLNVDQLNMLGEK

Identification of the transcription factor bounded to oligoes containing rs6590330 non-risk allele or risk allele. The biotin- labeled oligo was incubated with EBV-transformed B cell line cell nuclear extracts and the precipitate was sent for analysis using mass spectroscopy. Peptides bound to the oligo were captured using µMACs Factor Finder , then were identified by NanoLC-MS/MS. No peptides from STAT1 were identified from the non-risk pull down with confidence >50.

90 Chapter 4. Precisely Controlled Differential Gene Expression System

91

Precisely Controlled Differential Gene Expression System

Xiaoming Lu1,2, Connor Schroeder1, Carmy Forney1, Matthew T. Weirauch1,3,

John B. Harley1,2,4, Leah C. Kottyan1,2

1Center for Autoimmune Genetics and Etiology, Cincinnati Children’s

Hospital Medical Center, Cincinnati, OH

2Immunobiology Graduate Program, University of Cincinnati College of

Medicine, Cincinnati, OH

3Divisions of Biomedical Informatics and Developmental Biology, Cincinnati

Children’s Hospital Medical Center, Cincinnati, OH

4US Department of Veterans Affairs Medical Center, Cincinnati, OH

Correspondence to: Leah C. Kottyan at [email protected]

92

Abstract

Many non-coding genetic variants that increase disease risk have been identified through genome-wide association studies and large-scale sequencing studies. More than

60% of these loci have been associated with genotype-dependent expression of nearby genes. However, it remains to be determined that these 10% ~ 75% expression differences are biologically relevant. Here, we developed a system combining the

Clustered regularly interspaced short palindromic repeats (CRISPR) Homology-directed

Repair technique with Tet-inducible Controlled Gene Expression technique to precisely control gene expression. This system allows us to mimic these differential gene expressions caused by genetic variants. We focus on the expression quantitative trait loci

(eQTL) at the ETS1 lupus-risk locus. Patients with lupus have 50% less peripheral blood mononuclear cell mRNA expression of ETS1 than people without SLE, and people with the risk haplotype at the ETS1 locus have 50% less mRNA expression than people with the non-risk haplotype. Since ETS1 is a transcription factor, differential ETS1 expression could have easily measurable consequences on transcription factor binding and downstream gene expression. The investigation of these small differential gene expressions may make important progress in the field of human genetics and especially complex genetic disease etiology.

93

Introduction

Genome-wide association studies and large-scale sequencing studies identify many non-coding genetic variants that increase disease risk. Along with epigenetic and expression studies, these findings implicate gene regulation as the mechanistic key to disease initiation at many loci. Recent studies have shown that these non-coding risk loci are enriched for regulatory elements, and at least 60% of these loci have been associated with genotype-dependent expression of nearby genes (i.e. characterized as expression quantitative trait loci (eQTLs)) in at least one disease-relevant tissue [1-6]. Most of these differential gene expressions are between 10% ~ 75% [7]. One fundamental unanswered question about these differential gene expressions is whether they are biologically relevant. In order to answer this question, a precisely controlled differential gene expression system is needed to investigate these small gene expression differences.

Here, we developed a system combining the Clustered regularly interspaced short palindromic repeats (CRISPR) Homology-directed Repair technique with Tet-inducible

Controlled Gene Expression technique to mimic these small gene expressions.

CRISPR/CRISPR-associated (Cas) system is a bacterial adaptive immune defense system that uses short RNA for direct degradation of foreign nucleic acids. The type II bacterial CRISPR system may be engineered to direct Cas9 protein to specifically cleave complementary target-DNA sequence matching custom guide RNA (gRNA) in human cells, if the sequence is adjacent to short sequences known as protospacer adjacent motifs (PAMs) [8]. For the simplest and most widely used form of this technology, two components must be introduced into and/or expressed in cells to perform genome editing: the Cas9 nuclease and a single-guide RNA (sgRNA) consisting a fusion of a

94

CRISPR RNA (crRNA) and a fixed transactivating CRISPR RNA (tracrRNA). 20 nucleotides at the 5’ end of the sgRNA directs Cas9 to a specific target DNA site using standard RNA-DNA complementarity base pairing rules. These target sites must lie immediately at 5’of a PAM sequence that matches the canonical form 5’-NGG. Cas9- induced DNA double-stranded breaks (DSBs) are used to introduce precise repair from single-stranded oligonucleotide (ssODN) or plasmid donor templates through homology- directed repair (HDR) pathways [9].

Tet-inducible system is one of the most commonly used gene regulation systems, originally developed by Gossen and Bujard [10]. Its advantage is allowing a dose- dependent regulation of the gene of interest. This system includes two components, a tetracycline controlled transactivator (tTA) and a tet-responsive promoter (TRP) regulating the gene of interest. The tTA binds to the tetR-moiety of the TRP, a minimal promoter linked by a tandem of tet-operator sequences, with high affinity in the presence of Doxycycline (Dox). Two transactivator variants have been developed based on their response to Dox, which is Tet-off and Tet-on systems. In the Tet-off system, the tTA is released from its DNA binding site in the presence of Dox, thus stopping gene expression.

In the Tet-on system, tTA will binds to its DNA binding site in the presence of Dox, thus starting gene expression [10].

Through combining the CRISPR Homology-directed Repair with Tet-inducible

Controlled Gene Expression, we developed a system to mimic small gene expression changes. We utilized gene ETS1 as our target gene and cell line HEK293T as our target cell to optimize the system. Genetic variants at the ETS1 locus were found to be associated with susceptibility to SLE in several independent GWASs in Asian populations

95

[11-15]. In one recent study, measuring 61 Chinese Han SLE patients, ETS1 mRNA levels in peripheral blood mononuclear cells (PBMCs) from SLE patients were considerably lower compared with healthy subjects using qRT-PCR [16]. Meanwhile,

Ets1-deficient (Ets1p/p) mice could develop a lupus-like disease characterized by high titers of IgM and IgG autoantibody and immune complex deposition in the kidney [17]. In another following study, it was demonstrated that ETS1 mRNA expression from chromosomes with SLE-risk alleles was significantly reduced compared to that from the chromosome with non-risk alleles in PBMCs of healthy subjects heterozygous for rs1128334 through pyrosequencing of DNA and cDNA from subjects [13]. Since ETS1 is a transcription factor, differential ETS1 expression could have easily measurable consequences on transcription factor binding and downstream gene expression. The investigation of these small differential gene expressions may make important progress in the field of human genetics and especially complex genetic disease etiology.

Methods

Cell lines and cell culture

DMEM media with 4.5 g/L Glucose and L-Glutamine were purchased from VWR,

Radnor, Pennsylvania. HEK293T cells were maintained in DMEM media supplemented with 10% fetal bovine serum and penicillin-streptomycin.

Preparation of CRIPSR Vector

gRNA: GCATTAAAAGCTACTTTCAG targeting 2nd exon of ETS1 was cloned into pSpCas9(BB)-2A-GFP vector using BbsI sites according to protocol described in [18] by transgenic core of Cincinnati Children’s Hospital Medical Center. DNA templates for HDR

96 were generated by cloning donor cassette into pBlueScript vector without promoter. The donor cassette contains (2kb left homology arm)-P2A-mCherry/TagBFP-(2kb right homology arm).

Transfection and purification

3ug pSpCas9 vector, 3ug mCherry donor vector and 3ug TagBFP donor vector were transfected into 6.5 x 105 HEK293T cells in 6-well plate using Lipofectamine® 2000 transfection reagent according to the manufacture protocol. Three days after transfection, mCherry+TagBFP+ HEK293T cells were purified using FACS sorting. Sorted cells were then cultured for expansion.

Fluorescent tag knock-in verification

5’ PCR primer: CTTTTGGGATCCCCAGTCGTT and 3’ PCR primer:

CTTTTGGGATCCCCAGTCGTT flanking the gRNA targeting region were used to amply the mCherry/TagBFP knockin using PCR. The PCR products were analyzed on 2% agarose gel containing Ethidium Bromide.

Preparation of lentivirus for Tet-inducible vector

ETS1 variant 2 tagging with TagGFP were cloned into pLVX-Tetone-Puro vector through BamHI and EcoRI sites using In-Fusion® HD Cloning Kit from Clontech®

Laboratories, Inc. This generated pLVX- ETS1-GFP-Tetone-Puro vector were used for generation of lentivirus through viral vector core of Cincinnati Children’s Hospital Medical

Center.

Lentivirus transduction and purification

2ml of 1.25ug/ml Retronectin solution from Clontech® Laboratories, Inc. was used to treat each well for 24-well plate on the day before experiments. pLVX- ETS1-GFP-

97

Tetone-Puro lentivirus was used to transduce HEK293T cells using MOI of 2. Two day after transduction, 100ng/ml Doxycycline (Dox) was added to the culture medium. Three days after transduction, cells were collected for FACS sorting. GFP+ cells were collected for expansion.

Doxycycline Titration

6.5 x 105 GFP+ HEK293T cells from pLVX-ETS1-Tetone-Puro lentiviral transduction were seeded in 6-well plate the day before experiments. Different concentrations of Dox were added to the culture medium. 2 days after Dox induction, cells were collected for assessing ETS1 mRNA and protein expression.

Results and Discussion

ETS1 knockout using CRISPR system

gRNA was designed to target the 2nd exon of ETS1 gene, which is the first common exon for all major isoforms. Since mCherry or TagBFP without promoter was inserted into the ETS1 gene, all major isoforms of ETS1 are disrupted and the shortest fragment of

ETS1 is expressed through targeting the 2nd exon. By transfecting Cas9-gRNA vector and two donor vectors containing either mCherry or TagBFP into HEK293T cells, mCherry or

TagBFP would be incorporated into the gRNA targeting site through HDR pathway, which disrupts ETS1 gene. Then, if cells express mCherry and TagBFP together, both ETS1 copies will be disrupted.

53% of the transfected cells are mCherry+TagBFP+ cells, indicating a very high gene knockout efficiency. The ETS1 double knockout of mCherry+TagBFP+ cells were validated through amplifying gene insertion region. Since the length of mCherry or

98

TagBFP gene is around 1kb, pure double knockout cells should only have the 1kb band, no other bands. As shown in Figure 1, mCherry+TagBFP+ cells only have the 1kb band, indicating a pure ETS1 double knockout population.

Graded ETS1 expression using Tet-inducible system

mCherry+TagBFP+ cells were then transduced with pLVX-ETS1-GFP-Tetone-Puro lentivirus. ETS1 variant 2 tagged with GFP is under the control of Tet-inducible promoter in the lentivirus. Through adding 100ng/ml Dox in the culture medium, ETS1-GFP expression will be induced and GFP+ cells will be successfully transduced cells. These

GFP+ cells were then used for checking Dox titration.

Different levels of Dox have been investigated for the ability to induce gene expression. As shown in Figure 2, the gene expression changed very little from 0ng/ml to

10ng/ml. But the gene expression changed more significant from 20ng/ml to 80ng/ml. And the gene expression saturated around 100ng/ml. Also, the gene expression difference could be as big as 10-fold. What is more, Dox could induced the differential gene expression on both the mRNA and protein levels. On the other hand, there are some leaking ETS1 expression from the vector when no Dox in the culture medium.

Conclusion

We developed a system combining the CRISPR Homology-directed Repair technique with Tet-inducible Controlled Gene Expression technique to mimic these small gene expression differences caused by genetic variants. Our result shows that sorted mCherry+TagBFP+ HEK293T cells all had both copies of ETS1 genes disrupted, indicating a pure population. Therefore, there is no endogenous ETS1 expression.

99

Meanwhile, the Tet-inducible Controlled Gene Expression Vector lenti-transduction had very high transduction efficiency due to the nature of HEK293T cells. In the end, the Dox titration experiments on the sorted GFP+ cells confirmed that the graded response of the

Tet-inducible Controlled Gene Expression system. For Dox in the range of concentration from 0 to 100 ng/ml, we observed significant graded mRNA and protein expressions that distinguish Dox concentration difference of 10ng/ml difference. The expression difference will be very small when the concentration change is less than 10ng/ml. What is more, the differential ETS1 expression may be titrated from 1-fold to 10-fold difference, indicating the great flexibility and responsiveness of this system.

Reference:

1. Peloquin, J., et al., O-002 Genes in IBD-Associated Risk Loci Demonstrate Genotype-, Tissue-, and Inflammation-Specific Patterns of Expression in Terminal Ileum and Colon Mucosal Tissue. Inflamm Bowel Dis, 2016. 22 Suppl 1: p. S1. 2. McGovern, D.P., S. Kugathasan, and J.H. Cho, Genetics of Inflammatory Bowel Diseases. Gastroenterology, 2015. 149(5): p. 1163-1176 e2. 3. Hulur, I., et al., Enrichment of inflammatory bowel disease and colorectal cancer risk variants in colon expression quantitative trait loci. BMC Genomics, 2015. 16: p. 138. 4. Julia, A., et al., A genome-wide association study identifies a novel locus at 6q22.1 associated with ulcerative colitis. Hum Mol Genet, 2014. 23(25): p. 6927-34. 5. Chepelev, N.L., et al., Competition of nuclear factor-erythroid 2 factors related transcription factor isoforms, Nrf1 and Nrf2, in antioxidant enzyme induction. Redox Biol, 2013. 1: p. 183-9.

100

6. Jostins, L., et al., Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature, 2012. 491(7422): p. 119-24. 7. Zhu, Z., et al., Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet, 2016. 48(5): p. 481-7. 8. Mali, P., et al., RNA-guided human genome engineering via Cas9. Science, 2013. 339(6121): p. 823-6. 9. Sander, J.D. and J.K. Joung, CRISPR-Cas systems for editing, regulating and targeting genomes. Nat Biotechnol, 2014. 32(4): p. 347-55. 10. Heinz, N., K. Hennig, and R. Loew, Graded or threshold response of the tet-controlled gene expression: all depends on the concentration of the transactivator. BMC Biotechnol, 2013. 13: p. 5. 11. Han, J.-W., et al., Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus. Nature genetics, 2009. 41(11): p. 1234-7. 12. He, C.F., et al., TNIP1, SLC15A4, ETS1, RasGRP3 and IKZF1 are associated with clinical features of systemic lupus erythematosus in a Chinese Han population. Lupus, 2010. 19(10): p. 1181-6. 13. Yang, W., et al., Genome-wide association study in Asian populations identifies variants in ETS1 and WDFY4 associated with systemic lupus erythematosus. PLoS genetics, 2010. 6(2): p. e1000841-e1000841. 14. Zhong, H., et al., Replicated associations of TNFAIP3, TNIP1 and ETS1 with systemic lupus erythematosus in a southwestern Chinese population. Arthritis research & therapy, 2011. 13(6): p. R186- R186. 15. Lee, H.-S., et al., Ethnic specificity of lupus-associated loci identified in a genome-wide association study in Korean women. Annals of the rheumatic diseases, 2013.

101

16. Li, Y., et al., Expression analysis of ETS1 gene in peripheral blood mononuclear cells with systemic lupus erythematosus by real-time reverse transcription PCR. Chinese medical journal, 2010. 123(16): p. 2287-8. 17. Wang, D., et al., Ets-1 deficiency leads to altered B cell differentiation, hyperresponsiveness to TLR9 and autoimmune disease. International immunology, 2005. 17(9): p. 1179-91. 18. Ran, F.A., et al., Genome engineering using the CRISPR-Cas9 system. Nat Protoc, 2013. 8(11): p. 2281-308.

102

Marker DP Ctrl

1500

1000

500

Figure 1. ETS1 double knockout validation. Primers flanking the gRNA targeting region were used to amply the mCherry or TagBFP knock-in using PCR. The

PCR products were analyzed on 2% agarose gel containing Ethionamide Bride. Marker represents the molecular marker for DNA. DP represents PCR products from mCherry+TagBFP+ cells. Ctrl represents PCR products from non-CRIPSR modified cells.

103

ETS1 Relative Expression A ETS1 Relative Expression 12 12 10 10 8 8 6 6 4 4

2 2

ETS1 Relative Expression Relative ETS1 ETS1 Relative Expression Relative ETS1 0 0 0 0.1 1 2 3 4 5 6 7 8 10 100 0 20 40 80 100 200 400 800 10001200 Dox Concentration(ng/ml) Dox Concentration(ng/ml)

B

Figure 2. Graded ETS1 expression with different Dox concentration. A) ETS1mRNA

relative expression in GFP+ cell stimulated with different Dox concentration. The left figure

showed expression from cell stimulated with 0, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 10, 100 ng/ml.

The right figure showed expression from cell stimulated with 0, 20, 40, 80, 100, 200, 400,

800, 1000, 1200 ng/ml. All mRNA ETS1 expression are first normalized to HPRT1 and

then normalized to the sample with 0 ng/ml. B) ETS1 protein expression in GFP+ cell

stimulated with different Dox concentration. This figures shows ETS1 protein expression

from cell stimulated with 0, 20, 40, 80, 100, 200, 400, 800, 1000, 1200 ng/ml. beta-actin

was used as internal control.

104

Chapter 5. Discussions

105

Summary

Summary of “Protocol for electrophoretic mobility shift assays (EMSA) and DNA-affinity precipitation assays (DAPA)”

The understanding of disease pathogenesis has been greatly advanced through the identification of disease associated genetic variants with development of sequencing and genotyping technologies. However, our ability to understand the functional mechanisms impacted by these variants are still limited. This is mainly because most of these associated genetic variants are located in the non-coding regions of the genome, which makes it hard to predict the mechanism affecting disease. One most common function of the non-coding regions of genome is to affect the nearby gene expression through TF binding, which is possibly disturbed by genetic variants. Here, we present a protocol based on EMSA and DAPA techniques to investigate these genotype-dependent

TF binding, which may explain the mechanisms impacted by these variants. Although these two techniques have been existed for a while, they have only been used in the genetic analysis of TF binding recently. Moreover, EMSA can be used to investigate the effect of genetic variants on the binding of RNA binding proteins with minor adjustment to the protocol [241].

These two experiments are simple and easy to perform; nonetheless, some questions need to be carefully examined before experimentation. First, it is critical to produce an initial list of candidate variants through rigorous statistical analyses. The mistake in this step can ruin the whole downstream analyses, as some variants may affect

TF binding without affecting disease risk. The next step is to these candidate variants using functional genomic resources, such as ENCODE [242] and Roadmap Epigenomics

106

[243]. Other information regarding gene expression levels can also be incorporated into the analyses, such as ImmGen [244], BioGPS [245, 246], and SNPsea [247]. Also, it is really important to assess these variants in the pertinent cell type. Irrelevant cell types may produce false positive result. Finally, when performing these experiments, it also needs to be carried out in the disease relevant cell.

There are certain caveats of these experiments need to be considered. Especially, repeated experiments with different batches of nuclear lysate is needed to confirm the actual differential TF binding. Furthermore, an EMSA supershift can be shown as a further shift with respect to the original band, in which antibody binds to TF, or a loss of the original band, in which antibody blocks the DNA-TF binding. In either scenario, an isotype control is very important in excluding the possibility of non-specific binding, serving as a negative control. Another useful negative control, the scrambled oligo, can be used to demonstrate that any observed binding is specific to the oligos of interest. As for the positive control, a consensus sequence with well characterized binding motif of the TF

(obtained from databases such as CIS-BP [248] or Factorbook [249]) can be used.

Both two techniques have certain limitations. For example, the binding between

DNA and TF is largely affected by the buffer conditions. Ideally, the buffer should mimic the real condition of the nucleus for optimal binding. However, this is not always the case.

Improper buffer conditions may lead to weak binding between DNA and TF. Therefore, these two techniques are only effective when the conditions are optimal for DNA and TF binding. Here, we present the most common used buffer condition. However, for specific project, the buffer may need to be optimized. Another limitation is that the experimental results doesn’t provide direct explanation about the actual mechanism through which TF

107 binds to DNA. TF may binds to DNA directly or indirectly through other TF. Therefore, it is very important to investigate how TF binds to DNA. This can be predicted through computationally analyzing the oligo sequences and then be tested through experiments.

For example, the predicted binding site can be mutated or the binding partner can be depleted from the lysate. Finally, the amount of the lysate used in the experiment needs to be optimized for each experiment. Too much lysate may saturate the DNA-TF binding, which covers the real differential binding between risk and non-risk alleles. For other troubleshooting problem, the following reviews and method papers can be a good resource [250-252].

In addition to these two assays described here, there are other in vivo experiments for further studying the role of variants within living cells. Chromatin immunoprecipitation followed by allele-specific quantitative polymerase chain reactions (ChIP-qPCR) can be used to determine if the differential binding observed in the in vitro experiments is actually happening in the living cells [253]. For example, using cells heterozygous for a variant of interest, ChIP-qPCR can identify the relative difference between two alleles in the TF immuneprecipitated chromatin. Usually, ChIP-grade antibody is needed for ChIP. If no

ChIP-grade antibody is available, transfecting the cell with a flag-tagged TF and then using antibody against flag to perform ChIP is an effective alternative [254]. Additionally, expression quantitative trait loci (eQTL) analysis in relevant cell types can provide insight about the target gene affected by the genetic variants. eQTL analysis can either be done through locally collecting expression data from genotyped cells or publically available dataset, such as those collected by Genevar [255]. Furthermore, luciferase reporter assays can be used to confirm the effect of eQTL. This assay can be modified to detect

108 the effect of variants located in promoter, enhancer or repressor [256-258]. Finally, genome-editing technologies, such as CRISPR/Cas9 [259], is necessary for proving the causation. Such technologies can greatly increase the sensitivity and reduce the noise through generating cell lines that only differ at the variant of interest. Functional readouts such as gene expression or other disease intermediate phenotype between edited cell line and non-edited cell line can be used to assess the effect of the variant of interest.

The main advantage of EMSA and DAPA is that they can easily and rapidly detect the genotype-dependent TF binding, which can be further validated through other experiments. Particularly, these two experiments can be applied to any disease or phenotype for assessing the function of associated genetic variants. Such knowledge is critical for understanding the molecular mechanisms driving genetic-based disease. This strategy has been successfully performed in the analysis of molecular mechanism of PXK locus and IRF5 locus in SLE [260, 261].

Summary of “Lupus risk-variant increases pSTAT1 binding and decreases ETS1 expression”

We have used the genotyped and imputed data from 14,551 subjects to carry out a fine mapping study of ETS1 association with SLE risk. A model consisting of a single genetic effect was identified using step-wise logistic regression analyses. Through frequentist and Bayesian analyses, a set of 16 variants that are most likely to be causal were identified. Our experimental data suggested a mechanistic model in which the risk allele of rs6590330 at the ETS1 locus contributes to increased SLE risk by facilitating binding of pSTAT1 to a nearby putative STAT binding site and subsequent repression of

109 the expression of ETS1. Further, our fine mapping results are corroborated by a recent study that used a different analytical method to estimate the probability of each variant within the ETS1 locus of being a causal variant, which concluded that rs6590330 is among the likely causal variants proposed for the ETS1-locus association with SLE [262], further supporting the results of our genetic analysis.

The biological mechanism we propose herein is specific to the association of variants at the ETS1 locus in the AS ancestry. We performed the functional validation in

B cells based on the evidence that B cells play a critical role in the etiology of SLE.

Previous studies have demonstrated that B cells are significantly enriched for the expression of genes near lupus risk-loci [263]. Meanwhile, B cells from lupus patients produce autoantibodies, are hyperactivated, and have an exaggerated response to TLR ligands and immune complexes [264]. Ets1-deficient (Ets1p/p) mice develop a strikingly similar B cell phenotype [193, 265]. This ETS1 deficiency has been shown to drive intrinsic terminal differentiation of B cells into IgM-secreting plasma cells in B cell intrinsic fashion [197]. While we limited our functional analysis to B cells, it remains possible that the same genetic risk mechanism might also operate in other cell types such as T cells

[197, 201].

In all four ancestries, rs6590330 is polymorphic (MAFAS= 45%; MAFAA= 31%;

MAFHA= 28%; MAFEU= 26%). The ETS1 association has only been observed in AS ancestry cohorts in our study and others before us. Perhaps variants at the ETS1 locus do not increase SLE risk in AA and EU populations, a possibility consistent with epistasis or environmental factors (gene-environment interactions) [266]. Asian-specific variants have been identified in previous studies [221, 267] and it is not surprising that ancestry-

110 specific genetic factors impact the risk of SLE. Patients of Asians ancestry have a higher

SLE incidence than those of European ancestry in addition to the ancestry-specific distribution of clinical manifestations [268].

We found that the risk allele of rs6590330 results in preferential binding of pSTAT1 to a nearby putative STAT binding site and is associated with reduced expression of ETS1.

STAT1 plays a complex role in regulating gene expression, and is capable of acting as either an activator or a repressor, depending on the cellular context [269]. This is an intriguing candidate causal mechanism in which disease risk is mediated through the disruption of transcription factor binding to a nearby site that does not contain the risk variant, as most SNPs affect transcription factor binding through affecting TF binding site directly [270]. There is little report that SNP can disrupt transcription factor binding to a nearby site that does not contain SNP. To tackle this question, we first need to make sure whether pSTAT1 can direct bind to DNA itself, which can be verified through EMSA and

DAPA using purified pSTAT1 protein. If pSTAT1 cannot bind to DNA itself, it means a co- transcription factor is necessary for pSTAT1 binding. Therefore, it is possible that rs6590330 affects the co-transcription factor directly, which will disrupt pSTAT1 binding then. If pSTAT1 can bind to DNA itself, this means rs6590330 may affect pSTAT1 binding through affecting chromatin state, such as the histone modification and chromatin structure [271-273]. For example, rs7903146 associated with type 2 diabetes can differentially affects local open chromatin in the pancreatic islet cells as detected by the

FAIRE-Seq assay [271]. However, since differential pSTAT1 binding is also observed in the in vitro EMSA and DAPA with synthesized oligo, which usually doesn’t preserve the chromatin state information, the probability that rs6590330 affects chromatin state is

111 lower compared with affecting co-transcription factor. But the probability that rs6590330 affects chromatin state still exists as the in vitro system cannot fully capture the in vivo situation.

For question about how the pSTAT1 differential binding affected by rs6590330 is associated with depressed ETS1 expression, it is possible that the region around rs6590330 plays a regulatory role in the ETS1 expression since rs6590330 is an intergenic SNP located 17.5 kb upstream of ETS1 gene transcription start site. It is possible that this associated region is an enhancer for ETS1 gene, which is disrupted by the risk allele or is a repressor for ETS1 gene, which risk allele enhances the repressed function. Meanwhile, this region has recently been annotated as an uncharacterized long non-coding RNA LOC105369566 by GeneCards [274], which is highly expressed in , thymus and lymph node [275, 276]. rs6590030 is in the middle of this long non- coding RNA. Nothing is known about the function of LOC105369566. As we known, long non-coding RNA are proposed to have four functions: signal, which indicates gene regulation in space and time; decoy, which can titrate away transcription factor and other protein from chromatin; guide, which can recruit chromatin modifying enzyme to the target site; and scaffold, which can bring multiple proteins to form the ribonucleoprotein complexes [277]. It has been shown genetic variation can affect the function of long non- coding RNA [278]. Therefore, the STAT1 differential binding affected by rs6590330 may depress ETS1 expression through acting through enhancer or repressor or affecting long non-coding RNA function.

Because the risk allele of rs6590330 is associated with reduced ETS1 expression

(reference [162] and Figure S2), it is possible that the reduced ETS1 expression

112 associated with the risk allele of rs6590330 may skew B cell differentiation to IgM-plasma cells, which can subsequently contribute to SLE pathogenesis. Meanwhile, even though we present rs6590330 as a candidate causal variant responsible for disease risk, more experiments are necessary to establish causality and define the genotype-dependent immunological mechanisms driving lupus at this locus.

In conclusion, we conducted a large trans-ancestral fine mapping study at the

ETS1 locus to identify genetic variants that increase lupus risk. After determining a model of single genetic effect, we used frequentist and Bayesian association methods to identify a set of 16 variants that are most likely to be causal. Of these we identified an allele specific function for rs6590330. Taken together, we propose a model in which the risk allele of rs6590330 increases SLE risk through increased binding of pSTAT1 and thereby depressing expression of ETS1.

Summary of “Precisely Controlled Differential Gene Expression

System”

After the identification of causal genetic variants, whose effects are affecting gene expression mostly, the next question is about how these differential gene expressions caused by genetic variants affect disease etiology. Since these differential gene expressions are around 10% ~ 75% [279], a precise controlled gene expression system is needed to investigate these small gene expression differences. We developed a system combining the CRISPR Homology-directed Repair technique with Tet-inducible

Controlled Gene Expression technique to mimic these small gene expression differences.

113

We utilized gene ETS1 as our target gene and cell line HEK293T as our target cell to optimize the system. Firstly, we eliminated the endogenous ETS1 expression. A Cas9 plasmid containing the sgRNA targeting the 2nd exon of the ETS1 gene and two donor plasmids containing either mCherry or TagBFP with about 2kb of sequence homologous to both the upstream and downstream sequence surrounding ETS1 are transiently transfected into HEK293T cells. Then, 2nd exon of ETS1 gene is cut by the Cas9 protein, creating a double strand break. mCherry or TagBFP is inserted into the break point through homology-directed repair with two donor plasmids. The ETS1 gene is disrupted and mCherry or TagBFP is expressed instead. Therefore, when the cell expresses both mCherry and TagBFP, both copies of ETS1 will be disrupted, which we accept as evidenc that the endogenous ETS1 have been eliminated. Meanwhile, these cells can be easily

-/- purified through flow sorting. Secondly, the ETS1 cell is then supplemented with Tet- inducible Controlled Gene Expression Vector through lentiviral transduction. Tet-inducible

Controlled Gene Expression Vector contains the ETS1 gene tagged with TagGFP under the control of Tet-inducible promoter. Through lentiviral transduction, the Tet-inducible

Controlled Gene Expression Vector is inserted into genome. In the presence of Dox, the expression of TagGFP will indicated cell transduced with Tet-inducible Controlled Gene

Expression Vector. These cells can be easily purified through flow sorting, which are the cells to be used for Dox experiments. Thirdly, Dox titration experiments are performed with these final cells. For titration experiments, we need to determine the optimal Dox concentration mimicking the small differential gene expression.

Our result shows that sorted mCherry+TagBFP+ HEK293T cells all had both copies of ETS1 genes disrupted, indicating a pure population. Therefore, there is no endogenous

114

ETS1 expression. Meanwhile, the Tet-inducible Controlled Gene Expression Vector lenti- transduction had very high transduction efficiency due to the nature of HEK293T cells. In the end, the Dox titration experiments on the sorted GFP+ cells confirmed that the graded response of the Tet-inducible Controlled Gene Expression system. For Dox in the range of concentration from 0 to 100 ng/ml, we observed significant graded mRNA and protein expressions that distinguish Dox concentration difference of 10ng/ml difference. The expression difference will be very small when the concentration change is less than

10ng/ml. What is more, the differential ETS1 expression may be titrated from 1-fold to 10- fold difference, indicating the great flexibility and responsiveness of this system.

This system has several advantages. First, these is no need for single cell cloning in the gene knockout process. Normally, single cell cloning and sequence confirmation is needed when CRISPR system is used for gene knockout. However, this process is very time consuming and may not be feasible for some cell lines, including EBV-transformed

B cell lines, which do not survive low cell densities. Our approach provides a convenient way to obtain a pure knockout population, which saves time and is feasible for all cells.

Second, eGFP sorting is used instead of puromycin selection in the lentiviral transduction, which saves time and increases the purity. The most important feature is the graded gene expression induced by Dox. This provides a system for studying the function of small differential gene expression. Traditionally, gene knockout or overexpression approaches are used to study the gene function and disease development. However, this may not really reflect the physiological situation. In fact, complex diseases are rarely caused by complete gene knockout or extreme gene overexpression. Rather, it is believed that complex genetic diseases are caused by an aggregation of many small differential gene

115 expressions, which disrupt the homeostasis in the end. Therefore, our system provides a more physiologically relevant way to study the gene function and disease development.

Although this system is originally developed to study the downstream consequences of small differential gene expression of genetic variants, focusing on gene expression and transcription factor binding, this system has great potential to explore other questions. For example, it can be used to study cell function with small differential gene expression. Also, it can be used to study cell differentiation of small differential gene expression through combination with ES cells. Moreover, although it is very complicated, this system has the potential to fully mimic the disease development through precise control of several genes together in the mouse model with combination of Tet-On and

Tet-Off system.

Besides the Tet-inducible gene expression control part, the technique to achieve homozygous gene knockout through mCherry and TagBFP using CRISPR system has great potential as well. Although it is designed to quickly generate gene knockout population, it can be used as reporter system. This system still uses the endogenous gene promoter through insertion of mCherry or TagBFP into the gene body. Therefore, it can be used to measure the effect of different stimuli on the targeted gene through measuring mCherry or TagBFP activities using flow cytometry or qPCR.

A possible next step is to utilize this system in the EBV-transformed B cell line, which is a more relevant model for SLE. This system would allow the decoding of the downstream effects of small differential gene expression caused by genetic variants.

Employing RNA-Seq and ChIP-Seq would allow the exploration of the gene expression change and the transcription factor binding change. By combining RNA-Seq and ChIP-

116

Seq, we may describe the downstream pathway affected by this small differential gene expression. This can greatly advance our understanding of the function of genetic variants and help determine how the genetic variants connect with disease etiology.

Unexplained Questions

The relationship of ETS1 and STAT1 in disease pathogenesis

STAT1 and STAT4 genes are members of STAT transcription factor family, originally defined by their binding to cytokine-inducible genes [280]. These two genes are in tandem on and are expressed in hematopoietic cells (including T cells,

B cells, and myeloid cells) and testis [281]. Both genetic studies and biological studies showed the STAT1-STAT4 locus is associated with the susceptibility to develop SLE.

This STAT1-STAT4 locus has been confirmed to be associated with SLE disease risk in all major ancestries [159, 282-289]. Currently, most studies used the variants within intron 3 of STAT4 gene to represent this associated region, which is confined to be 55.5 kb in genome. Now, little is known about how these genetic variants affects disease risk.

One recent study suggested this locus may affect disease risk through affecting STAT1 and STAT4 expression. This study showed that the haplotype of lupus-risk variants at this locus are associated with increased STAT1 and STAT4 expression in monocytes [290].

Meanwhile, the data from our group also found the genetic variants at this locus are associated with increased STAT1 expression in B cells (unpublished data).

Importantly, STAT1 and STAT4 are key transcription factors activated in the Type

1 & Type 2 interferon signaling pathway and IL-12 signaling pathway [291]. The IL-12 receptor is a heterodimer. It consists of IL-12Rβ2 and IL-12Rβ1, which are coupled to the

117

Janus kinases JAK2 and TYK2 [292]. After phosphorylation by JAK2 or TYK2, STAT4 or

STAT1 will dimerize and translocate to the nucleus, where STAT4/STAT1 dimer binds to

DNA to induce gene transcription [293]. This activation mode of STAT4/STAT1 can launch a rapid response to cytokine stimuli, which can turn on the downstream signaling quickly. Therefore, the levels of STAT proteins can drastically affect the magnitude of the transcriptional response.

In lupus pathogenesis, Type 1 interferon signaling pathway is known to play an important role. In the serum, SLE patients have higher levels of Type 1 interferon than healthy controls. Meanwhile, peripheral blood mononuclear cells (PBMCs) from patients have higher levels of Type 1 interferon gene expression signature than PBMCs from healthy controls [294][295]. Moreover, PBMCs from SLE patients are more sensitive to

Type 1 interferon stimulation compared with PBMCs from healthy controls [296, 297].

Furthermore, although very rare, sometimes the injection of Interferon-β will develop a

SLE-like disease characterized by the production of anti-nuclear autoantibodies and multisystem pathology [298, 299]. Finally, Type 2 Interferon is important in initiating the germinal center of autoreactive B-cells and leading Type 1 interferon signature [300, 301].

All these data suggest increased STAT1 expression may affect SLE risk.

Our data suggested that enhanced STAT1 binding to the risk allele of rs6590330 is associated with decreased ETS1 expression, which is related with increased SLE risk

(see Chapter 3). Past studies have shown ETS1 has close relationship with STAT1 protein. Some studies show ETS1 can regulate STAT1 expression. For example, in MCF-

7 human breast cancer cell line, STAT1 is the downstream target of ETS1 [302]. However, there is limit report about whether STAT1 can regulate ETS1 expression.

118

Meanwhile, ETS1 and STAT1 can work coordinately or competitively to regulate downstream gene expression. First, they may work synchronically. ETS1 can physically interact with STAT1 protein to form a complex to regulate gene expression. In mouse B cells, T-bet, a T-box transcription factor, will be up-regulated in lymphocytes after stimulation by IFN-γ or IL-27. T-bet expression is required for inducing the class-switch recombination (CSR) to IgG2a. Mechanistically, ETS1 and STAT1 forms a complex to bind on a T-bet enhancer to induce T-bet expression [195]. Similarly, ETS1 and STAT1 forms a complex to regulate the expression of intercellular adhesion molecule-1 (ICAM-

1) in the immune cell after stimulation by proinflammatory cytokines, including IFN-γ.

ICAM-1 plays an important role in the interactions between effector cells and target cells

[303]. On the other hand, ETS1 and STAT1 may have competitive roles in the transcriptional gene regulation. For example, cyclooxygenase-2 (COX-2) is very important in some pathological problems, such as pancreatic beta-cell dysfunction. In the regulation of COX-2, ETS1 can positively regulate its promoter activity, however STAT1 will negatively affect its promoter [304].

Our data showed enhanced STAT1 binding is associated with decreased ETS1 expression. Although STAT1 is generally involved in the positive transcriptional regulation, past studies have shown its binding has a role in the negative regulation as well. For example, transcriptional silencing of perlecan gene by IFNγ is Stat1-dependent. This suppression needs a distal promoter region which contains multiple gamma activated sequence (GAS) elements, where Stat1 can bind [305]. IFNγ mediated c-myc suppression is absent in Stat1-null mouse embryo fibroblasts [306]. It turns out this

119 suppression needs a GAS element in the c-myc promoter and the phosphorylation of both tyrosine and serine residues of Stat1 [306].

Taken together, the finding that enhanced STAT1 binding to the risk allele of rs6590330 is associated with decreased ETS1 expression is still unique now. Even though there are reports about STAT1 binding leads to decreased expression for other genes [305, 306], there are no other report about STAT1 binding can lead to decreased

ETS1 expression. This may be because this STAT1 mediated ETS1 suppression is through the risk allele of rs6590330. First, this SNP doesn’t exist in mouse, which makes this phenotype unlikely to be observed in mice. Second, the frequency of risk allele is around 49.6% in Asian and only 17.7% in Caucasian. Since most of the human cell lines are first derived from Caucasian, this makes it even harder to find this suppression effect in most current available cell lines. Third, the decreased ETS1 expression is only observed in Asian. Some ancestry specific mechanisms may play a role in this suppression effect, which will talk about later. All of these factors make this phenotype difficult to be observed in the current available models, including cell lines and mice, which may explain why this phenotype is only discovered by us now.

Although our finding that enhanced STAT1 binding to the risk allele of rs6590330 results in decreased ETS1 expression is still unique, this finding is very important in explaining the lupus pathogenesis, as this may indicate a synchronized effect between

STAT1 and ETS1 for lupus risk. It is possible that the higher levels of Type I interferon from SLE patients leads to increased STAT1 expression, which in turn results in decreased ETS1 expression. Then, this decreased ETS1 expression may disrupt downstream signaling pathway and B cell function, which promotes SLE development.

120

Why is ETS1 association specific to the Asian ancestry?

As discussed in the “Association of ETS1 with SLE” part of Chapter 1, recently there are two association studies claimed ETS1 locus is associated with SLE in

Caucasian population. One of the studies is a candidate genotyping study [224], and the other one is a meta-analysis studies [225]. Although both claimed ETS1 locus is associated with SLE in European population, they both used a much less stringent approach to define the significant association (p < 0.05) instead of the commonly used genome-wide significance (p < 5 x 10-8). Therefore, these results need to be interpreted cautiously. Hence, if based on the genome-wide significance criteria, ETS1 locus is still only associated in Asian ancestry population.

For our study, we have performed a power analysis before the association study.

The results confirmed we had enough power to detect the association in bother African

American (AA) and European (EU) ancestry populations with a prior power of 92% for AA and 99% for EU. However, we only had 25% power to detect the association in Hispanic

(HA) ancestry population. But, HA ancestry had variants with association at p<10-7, which are reduce to 10-4 after admixture adjustment (Chapter 3 Figure S1). This suggests that the association seen in HA ancestry might be due to its mixed Asian population structure.

Therefore, we conclude that genetic variants in ETS1 are associated with SLE only in the

AS ancestry population.

There are two possible mechanisms can explain this Asian specific ETS1 association in SLE: ancestry specific epigenetic modification and gene epistasis interaction. For the epigenetic modification mechanism, the ancestry specific epigenetic

121 modification may either activate this associated region and the risk allele disrupts the active state or repress the associated region and the risk allele enhances the repressed function. It is already known that the ancestry specific epigenetic modifications can be inheritable [307]. Beside the inheritable ancestry specific epigenetic modification, epigenetic modification may also be influenced by and environmental exposures

[308, 309]. It is well known that the dietary structures differ between Asian and European population. In a study conducted in adolescents, Asians have a diet containing more eggs, chicken, fish, than Europeans, while Europeans have more milk, cereal than

Asians [310]. These dietary differences may also contribute to the epigenetic modification differences. Meanwhile, environmental exposures are known to alter the epigenetic modifications, including the air pollution, which repressed the global methylation content

[311]. Both mice and human studies have shown methylation plays an important role in the pathogenesis of lupus [312]. Methylation of DNA can stabilize chromatin in an inactive state and suppress gene transcription. Change of methylation status can affect gene expression and contribute to the development of disease. Both hypomethylation and hypermethylation have been shown to affect the function of T cell and B cell in SLE [312].

It is also suggested that the hypomethylated plasma DNA and CpG motif may induce several SLE-related manifestation [312]. For DNA that is rich in CpG motifs and hypomethylated, such as bacterial DNA, viral DNA and synthetic oligos, can induce the polyclonal B cell activation and production of autoantibodies in mice and human. Since the environmental pollution in Asian areas is much more severe than the North American and European areas [313], this different environmental exposure may also contribute to

122 the ancestry different epigenetic modification, which may explain why ETS1 association is only observed in Asian ancestry population.

For exploring epigenetic modifications, we have investigated the epigenetic modification at rs6590330 in EBV-transformed B cell line (GM12878) from a public database including ENCODE [242] and Roadmap Epigenomics [243]. No epigenetic modification is found at rs6590330. This maybe because GM12878 is derived from

European population and is homozygous non-risk for rs6590330, which means GM12878 would not carry the Asian specific epigenetic modification. Therefore, it is necessary to investigate the epigenetic modifications in EBV-transformed B cell line derived from Asian ancestry containing the risk allele.

The other possible mechanism is the gene epistasis interaction [314]. One way is that ETS1 may interact with other ancestry specific genes to affect disease pathogenesis.

Both associated genes need to be affected together to induce disease. Besides ETS1

locus, DNASE1, ELF1, CD80, ATG16L2/FCHSD2/P2RY2, GPR19 and DRAM1 loci are

Asian specific lupus association loci [165]. The other possible way is that ancestry specific epigenetic modification may lead to ancestry specific gene expression. This ancestry specific expressed gene may interact with ETS1 to affect disease pathogenesis. However, it is very hard to find the epistatic interaction due to the low power caused by the small effect size of GWAS associated genetic variants and the incredible number of possible combinations.

We have recently performed a primary investigation of the genomic function for the associated region through a luciferase assay. Around ~ 1kb genomic region with rs6590330 in the center from Asian ancestry cell line was cloned into a luciferase vector

123 upstream of the mini promoter. The result showed the genomic segment has a weak enhancer activity with non-risk allele, which will be disrupted by risk allele. The problem is that the observed decreased activity caused by risk allele is not very reproducible, which may be due to the small effect size. In order to significantly detect the decreased activity caused by risk allele, a sample size of larger than 25 may be needed.

This Asian specific ETS1 association may also partially explain the higher incidence and prevalence of SLE in Asian population [8, 9, 16, 17]. As we know, patients from Asian origins will develop disease earlier and show a more progressive development

[8]. Also, several lines of evidence indicate that lupus nephritis is more prevalent in patients from Asian [8, 18]. The pathogenesis of SLE in Asian ancestry must be different from that in other ancestries in some part. It has been shown that Ets1-deficient (Ets1p/p) mice will develop a lupus-like disease characterized by high titers of IgM and IgG autoantibody and immune complex deposition in the kidney [193]. Meanwhile, this ETS1 deficiency has been shown to drive intrinsic terminal differentiation of B cells into IgM- secreting plasma cells in B cell intrinsic fashion [197]. Therefore, this Asian specific ETS1 association may explain why Asian population has higher incidence and prevalence of

SLE and more prevalent lupus nephritis.

The association with other immune-mediated diseases at ETS1 locus

Beside the association with SLE, ETS1 locus has been reported to be associated with other immune-mediated diseases, including atopic dermatitis, celiac disease, bipolar disorder, follicular lymphoma, rheumatoid arthritis, psoriasis vulgaris, psoriasis and allergy through GWAS Catalog [315]. For each disease, the associated SNP, p-value,

124

OR and 95% confidence interval (CI) is summarized in Table 3. The LD relationship of all these associated SNPs in Asian and European populations are represented in Figure 1 using LDlink tool [316]. The LD structure of these SNPs in Asian and European populations are very similar with some small difference. However, rs6590330, the potential causal genetic variants identified in our study, has only been reported to be associated with SLE, no other diseases. Moreover, this SNP is only in high LD with rs1128334, another SLE associated SNP, no other disease associated SNPs. This may indicate that the mechanisms of ETS1 association in the other immune-mediated diseases are likely to be different from the mechanism of association in SLE.

Table 3 ETS1 association in the other immune-mediated diseases [315]

Disease/Trait SNP p-value OR 95% CI Ref Rheumatoid arthritis rs4937362 8.00E-07 1.09 [1.06-1.13] [317] Rheumatoid arthritis rs73013527 1.00E-10 1.09 [1.06-1.12] [318] Psoriasis vulgaris rs3802826 1.00E-07 1.124 NR [319] Psoriasis rs6590334 2.00E-09 1.14 [1.09-1.18] [320] Psoriasis rs7933433 3.00E-06 1.11 [1.06-1.16] [320] Psoriasis rs55974252 3.00E-06 1.12 [1.07-1.18] [320] [NR] pg/mL Obesity-related traits rs7117932 8.00E-06 0.03 [321] increase chr11:1274 [0.69-1.65] Manganese levels 2.00E-06 1.17 [322] 16532 unit decrease Follicular lymphoma rs4937362 7.00E-11 1.19 [1.13-1.25] [323] Depressive episodes [NR] unit rs7123770 3.00E-06 0.14 [324] in bipolar disorder increase Celiac disease rs11221332 5.00E-16 1.21 [1.16-1.27] [325] Atopic dermatitis rs7127307 1.00E-20 NR [326] [0.036-0.082] Self-reported allergy rs10893845 6.00E-07 0.059 [327] unit decrease

125

A

126

B

127

Figure 1 LD relationship of associated ETS1 SNPs in the all immune-mediated diseases. (A) represents the LD relationship of associated SNPs in Asian population. (B) represents the LD relationship of associated SNPs in European population. The LD relationship is represented by R2. The bigger the R2 is, the stronger the LD relationship is [316].

The possibility of causality in other immune cell type

As discussed in Chapter 1, in SLE, several immune cells are affected, including B cells, macrophages, DCs, neutrophils, T cells. For ETS1, it has important functions in B cells, T cells, NK cells and NK T cells. Based on the manifestations of SLE and immune cell dysfunction caused by depressed ETS1 expression, B cells and Th17 cells are the most possible cell type for causality in SLE.

As discussed in the Immune Cell Dysfunction section of Chapter 1, Th17 cells play an important role in the pathogenesis of SLE. There is an increased Th17 cells in SLE patients [205]. Th17 cells are positively correlated with lupus nephritis development. Also,

IL-17 is mainly produced by Th17 cells. In SLE patients, the IL-17 level and IL-17 producing cell number are increased in the serum, which are correlated with disease activity [70]. IL-17 producing cells can penetrate skin, lung and kidneys, which contributes to the organ damage [71]. IL-17 can also induce patient peripheral blood mononuclear cells (PBMCs) to produce anti-dsDNA antibody, which may indicate that IL-17 promotes

B cell activation [72].

Ets1 is shown to play a negative role in the development and function of Th17 cells

[230]. Ets1 deficient mice have increased Th17 cells. This was due to the decreased IL-

2 production and increased resistance to the IL-2 inhibition effect, which may be mediated through STAT5 [201]. In Ets1 deficient mice, there is an increased differentiation of Th17 cells [230]. Meanwhile, Ets1 deficient Th17 cell has increased IL-17 production, which

128 can be reversed through Ets1 p54 transduction. The increased IL-17 can be observed in the lung and thymus of the mouse. This may be due to the decreased IL-2 production and increased resistance to IL-2 for the Ets1 deficient cell. This increased resistance to IL-2 seems to be correlated with defect of downstream of STAT5 phosphorylation, not RORγt expression. The details of the mechanism is still unclear [201]. Another proposed mechanism is the effect of Ets1 on the expression of IP3 (inositol 1,4,5-trisphosphate)

2+ receptor 3 (IP3R3), which controls the follow of Ca from intracellular stores in response

+ to IP3. Ets1 deficient CD4 cells lack IP3R3, which disrupts the TCR-induced calcium flux and leads to the differentiation of Th17 cells [206].

IL-17 can induce the secretion of proinflammatory cytokines including IL-6, IL-1 and IL-21 [64]. Meanwhile, IL-17 can promote the recruitment of T cells to local inflammatory tissues. IL-17 has also been shown to drive B cell differentiation to plasma cells and autoantibody production[65].

In MRL/lpr mice, IL-17 expression is associated with lupus disease progression

[66]. Increased IL-17 mediated tissue injuries are observed after ischemic reperfusion of the gut [67]. IL-17 producing T cells are also observed in kidneys for mice with nephritis

[68]. In BXD2 mice, IL-17 level is high in serum and IL-17+ cell number is increased in spleen, where autoreactive germinal center is formed with IL-17R expressing B cells [69].

IL-17R inhibition or deletion can block these phenotypes [69].

In summary, as SLE patients have decreased ETS1 expression and rs6590330 is associated with decreased ETS1 expression, it is possible ETS1 affects SLE through increased numbers and function of Th17 caused by decreased ETS1 expression.

129

Future Experiments

Assessing the necessity and sufficiency of rs6590330 in the reduction of ETS1 expression

Our current hypothesis is the risk allele of rs6590330 leads to reduced ETS1 expression in EBV-transformed B cell lines. Although rs6590330 has the highest possibility to be the causal genetic variants according to our current evidence, all the other genetic variants still have the possibility to be the causal, especially for those we have never tested by experiments. Therefore, we need more experiments to demonstrate the causality of rs6590330.

To tackle this question, we would employ the strategy used in the FTO paper [239].

We would use an Asian derived EBV-transformed B cell line carrying homozygous non- risk for rs6590330 to create a cell line that is totally identical except homozygous risk for rs6590330 using CRISPR technology. Then this cell line carrying homozygous risk for rs6590330 would be re-edited to homozygous non-risk for rs6590330 using CRISPR technology. Then, these three generated cell lines will be assessed for ETS1 expression.

gRNA targeting risk allele or non-risk allele of rs6590330 will be designed using

CRISPR design tool [328]. Specific gRNA will be cloned into vector pCas9-GFP containing Cas9 and GFP selection marker. The homology vector will contain 4 kb homology regions with only the rs6590330 single nucleotide being different. pCas9-GFP and homology vector will be co-transfected into EBV-transformed B cell lines using Neon transfection system. Single cell colonies would be obtained through FACS and serial dilutions, followed by an expansion period to establish a new clonal cell line. Single cell colonies will be genotyped with PCR and sequencing.

130

In the end, we would have two EBV-transformed B cell lines with homozygous risk allele or non-risk allele for rs6590330. Then, these three cell lines would provide the capacity to assess ETS1 mRNA expression with Taq-man qPCR and ETS1 protein expression by western blotting, under the assumption that the only difference between them was at the rs6590330 variant.

We expect to observe decreased ETS1 expression with changing from non-risk allele to risk allele. Moreover, the ETS1 expression would return back to the non-risk level when re-editing back to non-risk allele from risk allele.

Further validating the role of STAT1 in modulating ETS1 expression in a genotype-dependent manner

Our hypothesis is the abundance of pSTAT1 is correlated with ETS1 expression.

We can tackle this question through overexpression and knockdown of STAT1 in the previous CRISPR engineered EBV-transformed B cell lines carrying risk or non-risk allele of rs6590330. To assess the effect of STAT1 overexpression on ETS1 expression, we can employ the Tet-inducible Controlled Gene Expression system to have STAT1 under the control of Tet-inducible promoter. The vector will then be transfected into

CRISPR engineered B cell lines using Neon transfection system. Through providing cell with different Dox concentration, we can control the level of STAT1 overexpression.

Moreover, to assess the effect of STAT1 knockdown on ETS1 expression, different amount of STAT1 shRNA will be transfected into CRISPR engineered B cell lines using

Neon transfection system.

131

STAT1 overexpression and STAT1 knockdown would be validated using Taq-man qPCR and western blotting. Then, the ETS1 mRNA expression would be assessed using

Taq-man qPCR. Also, we would detect ETS1 protein expression by western blotting.

We would only expect to see decreased ETS1 mRNA and protein expression in cell lines transfected with STAT1 overexpression plasmid in EBV-transformed B cell line carrying risk allele. Also, we would expect to see increased ETS1 mRNA and protein expression in cell lines transfected with STAT1 shRNA vector in EBV-transformed B cell line carrying risk allele. Moreover, we would not expect to see changes in ETS1 mRNA and protein expression for STAT1 overexpression or knockdown for in EBV-transformed

B cell line carrying non-risk allele or re-editing non-risk allele.

Identifying the mechanism through which pSTAT1 binds to non- overlapping risk allele of rs6590330

As discussed in the summary of “Lupus risk-variant increases pSTAT1 binding and decreases ETS1 expression” part, we first need to make sure whether pSTAT1 can direct bind to DNA itself, which can be verified through EMSA and DAPA with purified pSTAT1 protein. If pSTAT1 cannot bind to DNA itself, it means a co-transcription factor is necessary for pSTAT1 binding. Therefore, it is possible that rs6590330 affects the co- transcription factor directly, which will disrupts pSTAT1 binding. If pSTAT1 can bind to

DNA itself, this means rs6590330 may affect pSTAT1 binding through affecting chromatin state, such as the histone modification and chromatin structure [271-273]. However, since differential pSTAT1 binding is observed in in vitro EMSA and DAPA with synthesized oligo, which doesn’t preserve the chromatin state information, the probability that rs6590330

132 affects chromatin state is lower compared with affecting co-transcription factor. But the probability that rs6590330 affects chromatin state still exists as the in vitro system cannot fully capture the in vivo situation.

To verify whether pSTAT1 can directly bind to DNA itself, we can incubate synthesize biotinylated oligos to perform EMSA or DAPA with purified pSTAT1 protein.

Normal oligo will include 35 bp sequence around rs6590330 containing risk allele or non- risk allele. STAT1 scramble oligo will use the same sequence but scramble the non- perfect STAT1 binding site. Positive control oligo will use the same sequence but replace rs6590330 risk allele with perfect pSTAT1 binding site and negative control oligo will use totally scrambled sequence. Purified pSTAT1 protein will be incubated with these oligos to test whether STAT1 protein can bind to these oligos.

If the result of EMSA or DAPA shows that pSTAT1 cannot directly bind to DNA itself, we will hypothesize that pSTAT1 binds to a non-overlapping region near risk allele of rs6590330 through another co-transcription factor. Sp1 is identified through motif analysis. Therefore, we have recently explored this hypothesis through DAPA coupled with Mass Spectrometry and DAPA coupled with western blotting for Sp1. So far, we haven’t found strong candidate through data mining of the mass spectrometry result and found no difference in the DAPA with western blotting for Sp1. This may be due to the technical limits of DAPA, as DAPA cannot fully capture the in vivo condition and the sensitivity of the mass spectrometry and western blotting may be limited in detecting Sp1 difference. Other techniques liking ChIP-ChIP and Sp1 ChIP-qPCR can be used to further explore this question.

133

If the result of EMSA or DAPA shows that pSTAT1 can directly bind to DNA itself, we will hypothesize that risk allele may change the local chromatin state. One way is to change the nearby histone modification, which opens the chromatin for pSTAT1 binding.

To test this hypothesis, we need to explore the histone modification profile in the region near rs6590330 with CRISPR generated EBV cell lines carrying risk or non-risk allele.

ChIP-Seqs of different histone modifications are needed to investigate this problem. We expect to see increased histone modification indicating chromatin openness with the presence of risk allele, which will decrease with the presence of non-risk allele.

Another possible mechanism is that the risk allele may affect the local structure of the chromatin, which creates a binding site for STAT1. However, pSTAT1 binding is only around 10bp away from rs6590330, which crystallization of the two alleles from in vivo generated genomic structure would be fantastic. However, this will be very difficult and expensive experiment due to the technical limits.

134

Bibliography

1. Duan, L. and E. Mukherjee, Janeway’s Immunobiology, Ninth Edition. The Yale Journal of Biology and Medicine, 2016. 89(3): p. 424-425. 2. Paul, W.E., Fundamental Immunology. 2012: Wolters Kluwer Health. 3. Al-Herz, W., et al., Primary immunodeficiency diseases: an update on the classification from the international union of immunological societies expert committee for primary immunodeficiency. Front Immunol, 2014. 5: p. 162. 4. Chinen, J. and W.T. Shearer, Secondary immunodeficiencies, including HIV infection. J Allergy Clin Immunol, 2010. 125(2 Suppl 2): p. S195-203. 5. Rajan, T.V., The Gell–Coombs classification of hypersensitivity reactions: a re-interpretation. Trends in Immunology, 2003. 24(7): p. 376-379. 6. Aghazadeh, Y. and M.C. Nostro, Cell Therapy for Type 1 Diabetes: Current and Future Strategies. Curr Diab Rep, 2017. 17(6): p. 37. 7. Kaul, A., et al., Systemic lupus erythematosus. Nat Rev Dis Primers, 2016. 2: p. 16039. 8. Carter, E.E., S.G. Barr, and A.E. Clarke, The global burden of SLE: prevalence, health disparities and socioeconomic impact. Nat Rev Rheumatol, 2016. 12(10): p. 605-20. 9. Nasonov, E., et al., The prevalence and incidence of systemic lupus erythematosus (SLE) in selected cities from three Commonwealth of Independent States countries (the Russian Federation, Ukraine and Kazakhstan). Lupus, 2014. 23(2): p. 213-9. 10. Rees, F., et al., The incidence and prevalence of systemic lupus erythematosus in the UK, 1999-2012. Ann Rheum Dis, 2016. 75(1): p. 136-41. 11. Malaviya, A.N., et al., Prevalence of systemic lupus erythematosus in India. Lupus, 1993. 2(2): p. 115-8.

135

12. Uramoto, K.M., et al., Trends in the incidence and mortality of systemic lupus erythematosus, 1950-1992. Arthritis Rheum, 1999. 42(1): p. 46-50. 13. Tsokos, G.C., et al., New insights into the immunopathogenesis of systemic lupus erythematosus. Nat Rev Rheumatol, 2016. 12(12): p. 716-730. 14. Bernatsky, S., et al., Mortality in systemic lupus erythematosus. Arthritis Rheum, 2006. 54(8): p. 2550-7. 15. McMurray, R.W. and W. May, Sex hormones and systemic lupus erythematosus: review and meta-analysis. Arthritis Rheum, 2003. 48(8): p. 2100-10. 16. Lim, S.S., et al., The incidence and prevalence of systemic lupus erythematosus, 2002-2004: The Georgia Lupus Registry. Arthritis Rheumatol, 2014. 66(2): p. 357-68. 17. Somers, E.C., et al., Population-based incidence and prevalence of systemic lupus erythematosus: the Michigan Lupus Epidemiology and Surveillance program. Arthritis Rheumatol, 2014. 66(2): p. 369-78. 18. Lau, C.S., G. Yin, and M.Y. Mok, Ethnic and geographical differences in systemic lupus erythematosus: an overview. Lupus, 2006. 15(11): p. 715-9. 19. Shaikh, M.F., N. Jordan, and D.P. D'Cruz, Systemic lupus erythematosus. Clin Med (Lond), 2017. 17(1): p. 78-83. 20. Hochberg, M.C., Updating the American College of Rheumatology revised criteria for the classification of systemic lupus erythematosus. Arthritis Rheum, 1997. 40(9): p. 1725. 21. Petri, M., et al., Derivation and validation of the Systemic Lupus International Collaborating Clinics classification criteria for systemic lupus erythematosus. Arthritis Rheum, 2012. 64(8): p. 2677-86. 22. Moulton, V.R., et al., Pathogenesis of Human Systemic Lupus Erythematosus: A Cellular Perspective. Trends Mol Med, 2017. 23. Zharkova, O., et al., Pathways leading to an immunological disease: systemic lupus erythematosus. Rheumatology (Oxford), 2017. 56(suppl_1): p. i55-i66.

136

24. Alarcon-Segovia, D., et al., Familial aggregation of systemic lupus erythematosus, rheumatoid arthritis, and other autoimmune diseases in 1,177 lupus patients from the GLADEL cohort. Arthritis Rheum, 2005. 52(4): p. 1138-47. 25. Eroglu, G.E. and P.F. Kohler, Familial systemic lupus erythematosus: the role of genetic and environmental factors. Ann Rheum Dis, 2002. 61(1): p. 29-31. 26. Block, S.R., et al., Studies of twins with systemic lupus erythematosus. A review of the literature and presentation of 12 additional sets. Am J Med, 1975. 59(4): p. 533-52. 27. Grennan, D.M., et al., Family and twin studies in systemic lupus erythematosus. Dis Markers, 1997. 13(2): p. 93-8. 28. Sparks, J.A. and K.H. Costenbader, Genetics, environment, and gene-environment interactions in the development of systemic rheumatic diseases. Rheum Dis Clin North Am, 2014. 40(4): p. 637- 57. 29. Barbhaiya, M. and K.H. Costenbader, Ultraviolet radiation and systemic lupus erythematosus. Lupus, 2014. 23(6): p. 588-95. 30. Nelson, P., et al., Viruses as potential pathogenic agents in systemic lupus erythematosus. Lupus, 2014. 23(6): p. 596-605. 31. Poole, B.D., et al., Epstein-Barr virus and molecular mimicry in systemic lupus erythematosus. Autoimmunity, 2006. 39(1): p. 63-70. 32. Yadav, P., et al., Antibodies elicited in response to EBNA-1 may cross-react with dsDNA. PLoS One, 2011. 6(1): p. e14488. 33. Du, J., et al., DNA methylation pathways and their crosstalk with histone methylation. Nat Rev Mol Cell Biol, 2015. 16(9): p. 519-32. 34. Fraser, P.A., et al., Glutathione S-transferase M null homozygosity and risk of systemic lupus erythematosus associated with sun exposure: a possible gene-environment interaction for autoimmunity. (0315-162X (Print)). 35. Ran, F.A., et al., Genome engineering using the CRISPR-Cas9 system. (1750-2799 (Electronic)). 36. Jacobi, A.M., et al., Correlation between circulating CD27high plasma cells and disease activity in patients with systemic lupus erythematosus. Arthritis Rheum, 2003. 48(5): p. 1332-42.

137

37. Li, R., D.K. Pei H Fau - Watson, and D.K. Watson, Regulation of Ets function by protein - protein interactions. 2000(0950-9232 (Print)). 38. Ohl, K. and K. Tenbrock, Inflammatory cytokines in systemic lupus erythematosus. J Biomed Biotechnol, 2011. 2011: p. 432595. 39. Fitzgerald-Bocarsly, P., J. Dai, and S. Singh, Plasmacytoid dendritic cells and type I IFN: 50 years of convergent history. Cytokine Growth Factor Rev, 2008. 19(1): p. 3-19. 40. Ytterberg, S.R. and T.J. Schnitzer, Serum interferon levels in patients with systemic lupus erythematosus. Arthritis Rheum, 1982. 25(4): p. 401-6. 41. Bengtsson, A.A., et al., Activation of type I interferon system in systemic lupus erythematosus correlates with disease activity but not with antiretroviral antibodies. Lupus, 2000. 9(9): p. 664-71. 42. Obermoser, G. and V. Pascual, The interferon-alpha signature of systemic lupus erythematosus. Lupus, 2010. 19(9): p. 1012-9. 43. Vallin, H., et al., Patients with systemic lupus erythematosus (SLE) have a circulating inducer of interferon-alpha (IFN-alpha) production acting on leucocytes resembling immature dendritic cells. Clin Exp Immunol, 1999. 115(1): p. 196-202. 44. Banchereau, J., et al., Immunobiology of dendritic cells. Annu Rev Immunol, 2000. 18: p. 767-811. 45. Boule, M.W., et al., Toll-like receptor 9-dependent and - independent activation by chromatin-immunoglobulin G complexes. J Exp Med, 2004. 199(12): p. 1631-40. 46. Yang, J., et al., Induction of interferon-gamma production in Th1 CD4+ T cells: evidence for two distinct pathways for promoter activation. Eur J Immunol, 1999. 29(2): p. 548-55. 47. Ferber, I.A., et al., Mice with a disrupted IFN-gamma gene are susceptible to the induction of experimental autoimmune encephalomyelitis (EAE). J Immunol, 1996. 156(1): p. 5-7. 48. Enghard, P., D. Langnickel, and G. Riemekasten, T cell cytokine imbalance towards production of IFN-gamma and IL-10 in NZB/W F1 lupus-prone mice is associated with autoantibody levels and nephritis. Scand J Rheumatol, 2006. 35(3): p. 209-16.

138

49. Jacob, C.O., P.H. van der Meide, and H.O. McDevitt, In vivo treatment of (NZB X NZW)F1 lupus-like nephritis with monoclonal antibody to gamma interferon. J Exp Med, 1987. 166(3): p. 798-803. 50. Haas, C., B. Ryffel, and M. Le Hir, IFN-gamma receptor deletion prevents autoantibody production and glomerulonephritis in lupus- prone (NZB x NZW)F1 mice. J Immunol, 1998. 160(8): p. 3713-8. 51. Richards, H.B., et al., Interferon-gamma is required for lupus nephritis in mice treated with the hydrocarbon oil pristane. Kidney Int, 2001. 60(6): p. 2173-80. 52. Hirano, T., Interleukin 6 and its receptor: ten years later. Int Rev Immunol, 1998. 16(3-4): p. 249-84. 53. Muraguchi, A., et al., The essential role of B cell stimulatory factor 2 (BSF-2/IL-6) for the terminal differentiation of B cells. J Exp Med, 1988. 167(2): p. 332-44. 54. Lotz, M., et al., B cell stimulating factor 2/interleukin 6 is a costimulant for human thymocytes and T lymphocytes. J Exp Med, 1988. 167(3): p. 1253-8. 55. Tang, B., et al., Age-associated increase in interleukin 6 in MRL/lpr mice. Int Immunol, 1991. 3(3): p. 273-8. 56. Cash, H., et al., Interleukin 6 (IL-6) deficiency delays lupus nephritis in MRL-Faslpr mice: the IL-6 pathway as a new therapeutic target in treatment of autoimmune kidney disease in systemic lupus erythematosus. J Rheumatol, 2010. 37(1): p. 60-70. 57. Liang, B., et al., Anti-interleukin-6 monoclonal antibody inhibits autoimmune responses in a murine model of systemic lupus erythematosus. Immunology, 2006. 119(3): p. 296-305. 58. Mihara, M., et al., IL-6 receptor blockage inhibits the onset of autoimmune kidney disease in NZB/W F1 mice. Clin Exp Immunol, 1998. 112(3): p. 397-402. 59. Finck, B.K., B. Chan, and D. Wofsy, Interleukin 6 promotes murine lupus in NZB/NZW F1 mice. J Clin Invest, 1994. 94(2): p. 585-91. 60. Linker-Israeli, M., et al., Elevated levels of endogenous IL-6 in systemic lupus erythematosus. A putative role in pathogenesis. J Immunol, 1991. 147(1): p. 117-23.

139

61. Tackey, E., P.E. Lipsky, and G.G. Illei, Rationale for interleukin-6 blockade in systemic lupus erythematosus. Lupus, 2004. 13(5): p. 339-43. 62. Iwano, M., et al., Urinary levels of IL-6 in patients with active lupus nephritis. Clin Nephrol, 1993. 40(1): p. 16-21. 63. Korn, T., et al., IL-17 and Th17 Cells. Annu Rev Immunol, 2009. 27: p. 485-517. 64. Miossec, P., T. Korn, and V.K. Kuchroo, Interleukin-17 and type 17 helper T cells. N Engl J Med, 2009. 361(9): p. 888-98. 65. Mitsdoerffer, M., et al., Proinflammatory T helper type 17 cells are effective B-cell helpers. Proc Natl Acad Sci U S A, 2010. 107(32): p. 14292-7. 66. Zhang, Z., V.C. Kyttaris, and G.C. Tsokos, The role of IL-23/IL-17 axis in lupus nephritis. J Immunol, 2009. 183(5): p. 3160-9. 67. Edgerton, C., et al., IL-17 producing CD4+ T cells mediate accelerated ischemia/reperfusion-induced injury in autoimmunity- prone mice. Clin Immunol, 2009. 130(3): p. 313-21. 68. Kang, H.K., M. Liu, and S.K. Datta, Low-dose peptide tolerance therapy of lupus generates plasmacytoid dendritic cells that cause expansion of autoantigen-specific regulatory T cells and contraction of inflammatory Th17 cells. J Immunol, 2007. 178(12): p. 7849-58. 69. Hsu, H.C., et al., Interleukin 17-producing T helper cells and interleukin 17 orchestrate autoreactive germinal center development in autoimmune BXD2 mice. Nat Immunol, 2008. 9(2): p. 166-75. 70. Wong, C.K., et al., Hyperproduction of IL-23 and IL-17 in patients with systemic lupus erythematosus: implications for Th17-mediated inflammation in auto-immunity. Clin Immunol, 2008. 127(3): p. 385-93. 71. Yang, J., et al., Th17 and natural Treg cell population dynamics in systemic lupus erythematosus. Arthritis Rheum, 2009. 60(5): p. 1472-83.

140

72. Dong, G., et al., IL-17 induces autoantibody overproduction and peripheral blood mononuclear cell overexpression of IL-6 in lupus nephritis patients. Chin Med J (Engl), 2003. 116(4): p. 543-8. 73. Leonard, W.J., R. Zeng, and R. Spolski, Interleukin 21: a cytokine/cytokine receptor system that has come of age. J Leukoc Biol, 2008. 84(2): p. 348-56. 74. Korn, T., et al., IL-21 initiates an alternative pathway to induce proinflammatory T(H)17 cells. Nature, 2007. 448(7152): p. 484-7. 75. Peluso, I., et al., IL-21 counteracts the regulatory T cell-mediated suppression of human CD4+ T lymphocytes. J Immunol, 2007. 178(2): p. 732-9. 76. Vogelzang, A., et al., A fundamental role for interleukin-21 in the generation of T follicular helper cells. Immunity, 2008. 29(1): p. 127-37. 77. Kuchen, S., et al., Essential role of IL-21 in B cell activation, expansion, and plasma cell generation during CD4+ T cell-B cell collaboration. J Immunol, 2007. 179(9): p. 5886-96. 78. Odegard, J.M., et al., ICOS-dependent extrafollicular helper T cells elicit IgG production via IL-21 in systemic autoimmunity. J Exp Med, 2008. 205(12): p. 2873-86. 79. Herber, D., et al., IL-21 has a pathogenic role in a lupus-prone mouse model and its blockade with IL-21R.Fc reduces disease progression. J Immunol, 2007. 178(6): p. 3822-30. 80. Bubier, J.A., et al., A critical role for IL-21 receptor signaling in the pathogenesis of systemic lupus erythematosus in BXSB-Yaa mice. Proc Natl Acad Sci U S A, 2009. 106(5): p. 1518-23. 81. Simpson, N., et al., Expansion of circulating T cells resembling follicular helper T cells is a fixed phenotype that identifies a subset of severe systemic lupus erythematosus. Arthritis Rheum, 2010. 62(1): p. 234-44. 82. Setoguchi, R., et al., Homeostatic maintenance of natural Foxp3(+) CD25(+) CD4(+) regulatory T cells by interleukin (IL)-2 and induction of autoimmune disease by IL-2 neutralization. J Exp Med, 2005. 201(5): p. 723-35.

141

83. Sadlack, B., et al., Ulcerative colitis-like disease in mice with a disrupted interleukin-2 gene. Cell, 1993. 75(2): p. 253-61. 84. Altman, A., et al., Analysis of T cell function in autoimmune murine strains. Defects in production and responsiveness to interleukin 2. J Exp Med, 1981. 154(3): p. 791-808. 85. Alcocer-Varela, J. and D. Alarcon-Segovia, Decreased production of and response to interleukin-2 by cultured lymphocytes from patients with systemic lupus erythematosus. J Clin Invest, 1982. 69(6): p. 1388-92. 86. Lieberman, L.A. and G.C. Tsokos, The IL-2 defect in systemic lupus erythematosus disease has an expansive effect on host immunity. J Biomed Biotechnol, 2010. 2010: p. 740619. 87. Pers, J.O., et al., BAFF overexpression is associated with autoantibody production in autoimmune diseases. Ann N Y Acad Sci, 2005. 1050: p. 34-9. 88. Liu, Z. and A. Davidson, BAFF and selection of autoreactive B cells. Trends Immunol, 2011. 32(8): p. 388-94. 89. Stohl, W., et al., BAFF overexpression and accelerated glomerular disease in mice with an incomplete genetic predisposition to systemic lupus erythematosus. Arthritis Rheum, 2005. 52(7): p. 2080-91. 90. Suurmond, J., et al., DNA-reactive B cells in lupus. Curr Opin Immunol, 2016. 43: p. 1-7. 91. Zhao, L., Y. Ye, and X. Zhang, B cells biology in systemic lupus erythematosus-from bench to bedside. Sci China Life Sci, 2015. 58(11): p. 1111-25. 92. Byrne, J.C., et al., Genetics of SLE: functional relevance for monocytes/macrophages in disease. Clin Dev Immunol, 2012. 2012: p. 582352. 93. Kavai, M. and G. Szegedi, Immune complex clearance by monocytes and macrophages in systemic lupus erythematosus. Autoimmun Rev, 2007. 6(7): p. 497-502. 94. Sestak, A.L., et al., The genetics of systemic lupus erythematosus and implications for targeted therapy. Ann Rheum Dis, 2011. 70 Suppl 1: p. i37-43.

142

95. Klarquist, J., et al., Dendritic Cells in Systemic Lupus Erythematosus: From Pathogenic Players to Therapeutic Tools. Mediators Inflamm, 2016. 2016: p. 5045248. 96. Waldheim, E., et al., Health-related quality of life, fatigue and mood in patients with SLE and high levels of pain compared to controls and patients with low levels of pain. Lupus, 2013. 22(11): p. 1118- 27. 97. Elliott, M.R. and K.S. Ravichandran, Clearance of apoptotic cells: implications in health and disease. J Cell Biol, 2010. 189(7): p. 1059-70. 98. Shao, W.H. and P.L. Cohen, Disturbances of apoptotic cell clearance in systemic lupus erythematosus. Arthritis Res Ther, 2011. 13(1): p. 202. 99. Elkon, K.B. and A. Wiedeman, Type I IFN system in the development and manifestations of SLE. Curr Opin Rheumatol, 2012. 24(5): p. 499-505. 100. Hervas-Stubbs, S., et al., Direct effects of type I interferons on cells of the immune system. Clin Cancer Res, 2011. 17(9): p. 2619-27. 101. Nauseef, W.M. and N. Borregaard, Neutrophils at work. Nat Immunol, 2014. 15(7): p. 602-11. 102. Thieblemont, N., et al., Human neutrophils in auto-immunity. Semin Immunol, 2016. 28(2): p. 159-73. 103. Roos, D., R. van Bruggen, and C. Meischl, Oxidative killing of microbes by neutrophils. Microbes Infect, 2003. 5(14): p. 1307-15. 104. Mantovani, A., et al., Neutrophils in the activation and regulation of innate and adaptive immunity. Nat Rev Immunol, 2011. 11(8): p. 519-31. 105. Rasheed, Z., Hydroxyl radical damaged immunoglobulin G in patients with rheumatoid arthritis: biochemical and immunological studies. Clin Biochem, 2008. 41(9): p. 663-9. 106. Midgley, A. and M.W. Beresford, Cellular localization of nuclear antigen during neutrophil apoptosis: mechanism for autoantigen exposure? Lupus, 2011. 20(6): p. 641-6. 107. Brinkmann, V., et al., Neutrophil extracellular traps kill bacteria. Science, 2004. 303(5663): p. 1532-5.

143

108. Lande, R., et al., Neutrophils activate plasmacytoid dendritic cells by releasing self-DNA-peptide complexes in systemic lupus erythematosus. Sci Transl Med, 2011. 3(73): p. 73ra19. 109. Suarez-Fueyo, A., S.J. Bradley, and G.C. Tsokos, T cells in Systemic Lupus Erythematosus. Curr Opin Immunol, 2016. 43: p. 32-38. 110. Gravano, D.M. and K.K. Hoyer, Promotion and prevention of autoimmune disease by CD8+ T cells. J Autoimmun, 2013. 45: p. 68-79. 111. Thomson, C.W., B.P. Lee, and L. Zhang, Double-negative regulatory T cells: non-conventional regulators. Immunol Res, 2006. 35(1-2): p. 163-78. 112. Crispin, J.C., et al., Expanded double negative T cells in patients with systemic lupus erythematosus produce IL-17 and infiltrate the kidneys. J Immunol, 2008. 181(12): p. 8761-6. 113. Talaat, R.M., et al., Th1/Th2/Th17/Treg cytokine imbalance in systemic lupus erythematosus (SLE) patients: Correlation with disease activity. Cytokine, 2015. 72(2): p. 146-53. 114. Craft, J.E., Follicular helper T cells in immunity and systemic autoimmunity. Nat Rev Rheumatol, 2012. 8(6): p. 337-47. 115. Choi, J.Y., et al., Circulating follicular helper-like T cells in systemic lupus erythematosus: association with disease activity. Arthritis Rheumatol, 2015. 67(4): p. 988-99. 116. Kim, H.J., et al., Inhibition of follicular T-helper cells by CD8(+) regulatory T cells is essential for self tolerance. Nature, 2010. 467(7313): p. 328-32. 117. Alvarado-Sanchez, B., et al., Regulatory T cells in patients with systemic lupus erythematosus. J Autoimmun, 2006. 27(2): p. 110-8. 118. Wu, Y.W., W. Tang, and J.P. Zuo, Toll-like receptors: potential targets for lupus treatment. Acta Pharmacol Sin, 2015. 36(12): p. 1395-407. 119. Barton, G.M. and R. Medzhitov, Toll-like receptors and their ligands. Curr Top Microbiol Immunol, 2002. 270: p. 81-92. 120. Kruse, K., et al., Inefficient clearance of dying cells in patients with SLE: anti-dsDNA autoantibodies, MFG-E8, HMGB-1 and other players. Apoptosis, 2010. 15(9): p. 1098-113.

144

121. Komatsuda, A., et al., Up-regulated expression of Toll-like receptors mRNAs in peripheral blood mononuclear cells from patients with systemic lupus erythematosus. Clin Exp Immunol, 2008. 152(3): p. 482-7. 122. Lartigue, A., et al., Critical role of TLR2 and TLR4 in autoantibody production and glomerulonephritis in lpr mutation-induced mouse lupus. J Immunol, 2009. 183(10): p. 6207-16. 123. Summers, S.A., et al., TLR9 and TLR4 are required for the development of autoimmunity and lupus nephritis in pristane nephropathy. J Autoimmun, 2010. 35(4): p. 291-8. 124. Patole, P.S., et al., Viral double-stranded RNA aggravates lupus nephritis through Toll-like receptor 3 on glomerular mesangial cells and antigen-presenting cells. J Am Soc Nephrol, 2005. 16(5): p. 1326-38. 125. Viglianti, G.A., et al., Activation of autoreactive B cells by CpG dsDNA. Immunity, 2003. 19(6): p. 837-47. 126. Lau, C.M., et al., RNA-associated autoantigens activate B cells by combined B cell antigen receptor/Toll-like receptor 7 engagement. J Exp Med, 2005. 202(9): p. 1171-7. 127. Pisitkun, P., et al., Autoreactive B cell responses to RNA-related antigens due to TLR7 . Science, 2006. 312(5780): p. 1669-72. 128. Lee, P.Y., et al., TLR7-dependent and FcgammaR-independent production of type I interferon in experimental mouse lupus. J Exp Med, 2008. 205(13): p. 2995-3006. 129. Christensen, S.R., et al., Toll-like receptor 7 and TLR9 dictate autoantibody specificity and have opposing inflammatory and regulatory roles in a murine model of lupus. Immunity, 2006. 25(3): p. 417-28. 130. Sturfelt, G. and L. Truedsson, Complement in the immunopathogenesis of rheumatic disease. Nat Rev Rheumatol, 2012. 8(8): p. 458-68. 131. Leffler, J., A.A. Bengtsson, and A.M. Blom, The complement system in systemic lupus erythematosus: an update. Ann Rheum Dis, 2014. 73(9): p. 1601-6.

145

132. Mehta, P., et al., SLE with C1q deficiency treated with fresh frozen plasma: a 10-year experience. Rheumatology (Oxford), 2010. 49(4): p. 823-4. 133. Gullstrand, B., et al., Complement classical pathway components are all important in clearance of apoptotic and secondary necrotic cells. Clin Exp Immunol, 2009. 156(2): p. 303-11. 134. Mahler, M., R.A. van Schaarenburg, and L.A. Trouw, Anti-C1q autoantibodies, novel tests, and clinical consequences. Front Immunol, 2013. 4: p. 117. 135. Leffler, J., et al., Neutrophil extracellular traps that are not degraded in systemic lupus erythematosus activate complement exacerbating the disease. J Immunol, 2012. 188(7): p. 3522-31. 136. Farrera, C. and B. Fadeel, clearance of neutrophil extracellular traps is a silent process. J Immunol, 2013. 191(5): p. 2647-56. 137. Carroll, M.C., A protective role for innate immunity in systemic lupus erythematosus. Nat Rev Immunol, 2004. 4(10): p. 825-31. 138. Chauhan, A.K. and T.L. Moore, Immune complexes and late complement proteins trigger activation of Syk tyrosine kinase in human CD4(+) T cells. Clin Exp Immunol, 2012. 167(2): p. 235-45. 139. Celhar, T. and A.M. Fairhurst, Modelling clinical systemic lupus erythematosus: similarities, differences and success stories. Rheumatology (Oxford), 2017. 56(suppl_1): p. i88-i99. 140. Du, Y., et al., Animal models of lupus and lupus nephritis. Curr Pharm Des, 2015. 21(18): p. 2320-49. 141. Crow, M.K., Type I interferon in the pathogenesis of lupus. J Immunol, 2014. 192(12): p. 5459-68. 142. Watanabe-Fukunaga, R., et al., Lymphoproliferation disorder in mice explained by defects in Fas antigen that mediates apoptosis. Nature, 1992. 356(6367): p. 314-7. 143. Andrews, B.S., et al., Spontaneous murine lupus-like syndromes. Clinical and immunopathological manifestations in several strains. J Exp Med, 1978. 148(5): p. 1198-215. 144. Hron, J.D. and S.L. Peng, Type I IFN protects against murine lupus. J Immunol, 2004. 173(3): p. 2134-42.

146

145. Lu, Q., et al., Genomic view of IFN-alpha response in pre- autoimmune NZB/W and MRL/lpr mice. Genes Immun, 2007. 8(7): p. 590-603. 146. Morel, L., et al., Functional dissection of systemic lupus erythematosus using congenic mouse strains. J Immunol, 1997. 158(12): p. 6019-28. 147. Hudgins, C.C., et al., Studies of consomic mice bearing the Y chromosome of the BXSB mouse. J Immunol, 1985. 134(6): p. 3849- 54. 148. Baccala, R., et al., Anti-IFN-alpha/beta receptor antibody treatment ameliorates disease in lupus-predisposed mice. J Immunol, 2012. 189(12): p. 5976-84. 149. Reeves, W.H., et al., Induction of autoimmunity by pristane and other naturally occurring hydrocarbons. Trends Immunol, 2009. 30(9): p. 455-64. 150. Tsao, B.P., An update on genetic studies of systemic lupus erythematosus. Curr Rheumatol Rep, 2002. 4(4): p. 359-67. 151. Chakravarti, A., Population genetics--making sense out of sequence. Nat Genet, 1999. 21(1 Suppl): p. 56-60. 152. Lo, M.S., Monogenic Lupus. Curr Rheumatol Rep, 2016. 18(12): p. 71. 153. Vyse, T.J. and J.A. Todd, Genetic Analysis of Autoimmune Disease. Cell, 1996. 85(3): p. 311-318. 154. Cui, Y., et al., Designs for linkage analysis and association studies of complex diseases. Methods Mol Biol, 2010. 620: p. 219-42. 155. Harley, I.T., et al., Genetic susceptibility to SLE: new insights from fine mapping and genome-wide association studies. Nat Rev Genet, 2009. 10(5): p. 285-90. 156. Wang, W.Y., et al., Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet, 2005. 6(2): p. 109-18. 157. Risch, N. and K. Merikangas, The future of genetic studies of complex human diseases. Science, 1996. 273(5281): p. 1516-7. 158. Becker, K.G., The common variants/multiple disease hypothesis of common complex genetic disorders. Med Hypotheses, 2004. 62(2): p. 309-17.

147

159. International Consortium for Systemic Lupus Erythematosus, G., et al., Genome-wide association scan in women with systemic lupus erythematosus identifies susceptibility variants in ITGAM, PXK, KIAA1542 and other loci. Nat Genet, 2008. 40(2): p. 204-10. 160. Crow, M.K., Collaboration, genetic associations, and lupus erythematosus. N Engl J Med, 2008. 358(9): p. 956-61. 161. Han, J.W., et al., Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus. Nat Genet, 2009. 41(11): p. 1234-7. 162. Yang, W., et al., Genome-wide association study in Asian populations identifies variants in ETS1 and WDFY4 associated with systemic lupus erythematosus. PLoS Genet, 2010. 6(2): p. e1000841. 163. Lessard, C.J., et al., Identification of a Systemic Lupus Erythematosus Risk Locus Spanning ATG16L2, FCHSD2, and P2RY2 in Koreans. Arthritis Rheumatol, 2016. 68(5): p. 1197-209. 164. Chen, L., D.L. Morris, and T.J. Vyse, Genetic advances in systemic lupus erythematosus: an update. Curr Opin Rheumatol, 2017. 165. Teruel, M. and M.E. Alarcon-Riquelme, The genetic basis of systemic lupus erythematosus: What are the risk factors and what have we learned. J Autoimmun, 2016. 74: p. 161-175. 166. Howie, B.N., P. Donnelly, and J. Marchini, A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet, 2009. 5(6): p. e1000529. 167. Wang, S., et al., Disease mechanisms in rheumatology--tools and pathways: defining functional genetic variants in autoimmune diseases. Arthritis Rheumatol, 2015. 67(1): p. 1-10. 168. Stephens, M. and D.J. Balding, Bayesian statistical methods for genetic association studies. Nat Rev Genet, 2009. 10(10): p. 681-90. 169. Wasserstein, R.L. and N.A. Lazar, The ASA's Statement on p-Values: Context, Process, and Purpose. The American Statistician, 2016. 70(2): p. 129-133. 170. Kass, R.E. and A.E. Raftery, Bayes Factors. Journal of the American Statistical Association, 1995. 90(430): p. 773-795.

148

171. Auer, P.L. and G. Lettre, Rare variant association studies: considerations, challenges and opportunities. Genome Med, 2015. 7(1): p. 16. 172. Lee, S., et al., Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet, 2014. 95(1): p. 5-23. 173. Pe'er, I., et al., Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol, 2008. 32(4): p. 381-5. 174. Hong, E.P. and J.W. Park, Sample size and statistical power calculation in genetic association studies. Genomics Inform, 2012. 10(2): p. 117-22. 175. Alberts, B. and J.H. Wilson, Molecular Biology of the Cell. 2008: Garland Science. 176. Maurano, M.T., et al., Systematic localization of common disease- associated variation in regulatory DNA. Science, 2012. 337(6099): p. 1190-5. 177. Vockley, C.M., A. Barrera, and T.E. Reddy, Decoding the role of regulatory element polymorphisms in complex disease. Curr Opin Genet Dev, 2016. 43: p. 38-45. 178. Phillips, T., Regulation of transcription and gene expression in eukaryotes. Nature Education, 2008. 1(1): p. 199. 179. Matharu, N. and N. Ahituv, Minor Loops in Major Folds: Enhancer- Promoter Looping, Chromatin Restructuring, and Their Association with Transcriptional Regulation and Disease. PLoS Genet, 2015. 11(12): p. e1005640. 180. Glisovic, T., et al., RNA-binding proteins and post-transcriptional gene regulation. FEBS Lett, 2008. 582(14): p. 1977-86. 181. Barrett, L.W., S. Fletcher, and S.D. Wilton, Regulation of eukaryotic gene expression by the untranslated gene regions and other non- coding elements. Cell Mol Life Sci, 2012. 69(21): p. 3613-34. 182. Valinezhad Orang, A., R. Safaralizadeh, and M. Kazemzadeh-Bavili, Mechanisms of miRNA-Mediated Gene Regulation from Common Downregulation to mRNA-Specific Upregulation. Int J Genomics, 2014. 2014: p. 970607.

149

183. Leung, A.K., The Whereabouts of microRNA Actions: Cytoplasm and Beyond. Trends Cell Biol, 2015. 25(10): p. 601-10. 184. Lee, Y. and D.C. Rio, Mechanisms and Regulation of Alternative Pre-mRNA Splicing. Annu Rev Biochem, 2015. 84: p. 291-323. 185. Gallant, S. and G. Gilkeson, ETS transcription factors and regulation of immunity. Archivum immunologiae et therapiae experimentalis, 2006. 54(3): p. 149-63. 186. Garrett-Sinha, L.A., Review of Ets1 structure, function, and roles in immunity. Cellular and molecular life sciences : CMLS, 2013. 2. 187. Fisher, R.J., et al., Human ETS1 oncoprotein. Purification, isoforms, -SH modification, and DNA sequence-specific binding. J Biol Chem, 1992. 267(25): p. 17957-65. 188. Donaldson, L.W., et al., Secondary structure of the ETS domain places murine Ets-1 in the superfamily of winged helix-turn-helix DNA-binding proteins. Biochemistry, 1994. 33(46): p. 13509-16. 189. Rabault, B. and J. Ghysdael, Calcium-induced phosphorylation of ETS1 inhibits its specific DNA binding activity. J Biol Chem, 1994. 269(45): p. 28143-51. 190. Fitzsimmons, D., et al., Highly cooperative recruitment of Ets-1 and release of autoinhibition by Pax5. J Mol Biol, 2009. 392(2): p. 452- 64. 191. Baillat, D., et al., ETS-1 transcription factor binds cooperatively to the palindromic head to head ETS-binding sites of the stromelysin- 1 promoter by counteracting autoinhibition. J Biol Chem, 2002. 277(33): p. 29386-98. 192. Seidel, J.J. and B.J. Graves, An ERK2 docking site in the Pointed domain distinguishes a subset of ETS transcription factors. Genes Dev, 2002. 16(1): p. 127-37. 193. Wang, D., et al., Ets-1 deficiency leads to altered B cell differentiation, hyperresponsiveness to TLR9 and autoimmune disease. International immunology, 2005. 17(9): p. 1179-91. 194. Bories, J.C., et al., Increased T-cell apoptosis and terminal B-cell differentiation induced by inactivation of the Ets-1 proto-oncogene. Nature, 1995. 377(6550): p. 635-8.

150

195. Nguyen, H.V., et al., The Ets-1 transcription factor is required for Stat1-mediated T-bet expression and IgG2a class switching in mouse B cells. Blood, 2012. 119(18): p. 4174-81. 196. Mouly, E., et al., The Ets-1 transcription factor controls the development and function of natural regulatory T cells. J Exp Med, 2010. 207(10): p. 2113-25. 197. John, S.A., et al., Ets-1 regulates plasma cell differentiation by interfering with the activity of the transcription factor Blimp-1. J Biol Chem, 2008. 283(2): p. 951-62. 198. Eyquem, S., et al., The development of early and mature B cells is impaired in mice deficient for the Ets-1 transcription factor. Eur J Immunol, 2004. 34(11): p. 3187-96. 199. Bhat, N.K., et al., Reciprocal expression of human ETS1 and ETS2 genes during T-cell activation: regulatory role for the protooncogene ETS1. Proceedings of the National Academy of Sciences of the United States of America, 1990. 87(10): p. 3723- 3727. 200. Eyquem, S., et al., The Ets-1 transcription factor is required for complete pre-T cell receptor function and allelic exclusion at the T cell receptor beta locus. (0027-8424 (Print)). 201. Moisan, J., et al., Ets-1 is a negative regulator of Th17 differentiation. J Exp Med, 2007. 204(12): p. 2825-35. 202. Grenningloh, R., B.Y. Kang, and I.C. Ho, Ets-1, a functional cofactor of T-bet, is essential for Th1 inflammatory responses. J Exp Med, 2005. 201(4): p. 615-26. 203. Lee, C.G., et al., Interaction of Ets-1 with HDAC1 represses IL-10 expression in Th1 cells. J Immunol, 2012. 188(5): p. 2244-53. 204. Polansky, J.K., et al., Methylation matters: binding of Ets-1 to the demethylated Foxp3 gene contributes to the stabilization of Foxp3 expression in regulatory T cells. J Mol Med (Berl), 2010. 88(10): p. 1029-40. 205. Boyman, O. and J. Sprent, The role of interleukin-2 during homeostasis and activation of the immune system. Nat Rev Immunol, 2012. 12(3): p. 180-90.

151

206. Nagaleekar, V.K., et al., IP3 Receptor-Mediated Ca2+ Release in Naive CD4 T Cells Dictates Their Cytokine Program. The Journal of Immunology, 2008. 181(12): p. 8315-8322. 207. Muthusamy, N., K. Barton, and J.M. Leiden, Defective activation and survival of T cells lacking the Ets-1 transcription factor. Nature, 1995. 377(6550): p. 639-42. 208. Hambor, J.E., et al., Identification and characterization of an Alu- containing, T-cell-specific enhancer located in the last intron of the human CD8 alpha gene. Mol Cell Biol, 1993. 13(11): p. 7056-70. 209. Clements, J.L., S.A. John, and L.A. Garrett-Sinha, Impaired generation of CD8+ thymocytes in Ets-1-deficient mice. J Immunol, 2006. 177(2): p. 905-12. 210. Zamisch, M., et al., The transcription factor Ets1 is important for CD4 repression and Runx3 up-regulation during CD8 T cell differentiation in the thymus. J Exp Med, 2009. 206(12): p. 2685-99. 211. Li, Q., et al., Antigen-induced Erk1/2 activation regulates Ets-1- mediated sensitization of CD8+ T cells for IL-12 responses. J Leukoc Biol, 2010. 87(2): p. 257-63. 212. Kieper, W.C., et al., IL-12 Enhances CD8 T Cell Homeostatic Expansion. The Journal of Immunology, 2001. 166(9): p. 5515-5521. 213. Barton, K., et al., The Ets-1 transcription factor is required for the development of natural killer cells in mice. Immunity, 1998. 9(4): p. 555-63. 214. Ramirez, K., et al., Gene deregulation and chronic activation in natural killer cells deficient in the transcription factor ETS1. Immunity, 2012. 36(6): p. 921-32. 215. Walunas, T.L., et al., Cutting edge: the Ets1 transcription factor is required for the development of NK T cells in mice. J Immunol, 2000. 164(6): p. 2857-60. 216. McKinlay, L.H., et al., The role of Ets-1 in mast cell granulocyte- macrophage colony-stimulating factor expression and activation. J Immunol, 1998. 161(8): p. 4098-105. 217. Han, J.-W., et al., Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus. Nature genetics, 2009. 41(11): p. 1234-7.

152

218. He, C.F., et al., TNIP1, SLC15A4, ETS1, RasGRP3 and IKZF1 are associated with clinical features of systemic lupus erythematosus in a Chinese Han population. Lupus, 2010. 19(10): p. 1181-6. 219. Yang, W., et al., Genome-wide association study in Asian populations identifies variants in ETS1 and WDFY4 associated with systemic lupus erythematosus. PLoS genetics, 2010. 6(2): p. e1000841-e1000841. 220. Zhong, H., et al., Replicated associations of TNFAIP3, TNIP1 and ETS1 with systemic lupus erythematosus in a southwestern Chinese population. Arthritis research & therapy, 2011. 13(6): p. R186-R186. 221. Lee, H.-S., et al., Ethnic specificity of lupus-associated loci identified in a genome-wide association study in Korean women. Annals of the rheumatic diseases, 2013. 222. Sullivan, K.E., et al., 3' polymorphisms of ETS1 are associated with different clinical phenotypes in SLE. (1059-7794 (Print)). 223. He, C.F., et al., TNIP1, SLC15A4, ETS1, RasGRP3 and IKZF1 are associated with clinical features of systemic lupus erythematosus in a Chinese Han population. (1477-0962 (Electronic)). 224. Wang, C., et al., Genes identified in Asian SLE GWASs are also associated with SLE in Caucasian populations. Eur J Hum Genet, 2013. 21(9): p. 994-9. 225. Morris, D.L., et al., Genome-wide association meta-analysis in Chinese and European individuals identifies ten new loci associated with systemic lupus erythematosus. Nat Genet, 2016. 48(8): p. 940- 6. 226. Leng, R.X., et al., The dual nature of Ets-1: focus to the pathogenesis of systemic lupus erythematosus. Autoimmun Rev, 2011. 10(8): p. 439-43. 227. Saelee, P., et al., Genome-Wide Identification of Target Genes for the Key B Cell Transcription Factor Ets1. Front Immunol, 2017. 8: p. 383. 228. Tsao, H.-W., et al., Ets-1 facilitates nuclear entry of NFAT proteins and their recruitment to the IL-2 promoter. Proceedings of the National Academy of Sciences, 2013. 110(39): p. 15776-15781.

153

229. Lim, E.-J., et al., Toll-like receptor 9 dependent activation of MAPK and NF-kB is required for the CpG ODN-induced matrix metalloproteinase-9 expression. Exp Mol Med, 2007. 39: p. 239- 245. 230. Pan, H.F., et al., Ets-1: a new player in the pathogenesis of systemic lupus erythematosus? Lupus, 2011. 20(3): p. 227-30. 231. Georgiou, P., et al., Expression of ets family of genes in systemic lupus erythematosus and Sjogren's syndrome. Int J Oncol, 1996. 9(1): p. 9-18. 232. Li, Y., et al., Expression analysis of ETS1 gene in peripheral blood mononuclear cells with systemic lupus erythematosus by real-time reverse transcription PCR. Chinese medical journal, 2010. 123(16): p. 2287-8. 233. Baechler, E.C., et al., Interferon-inducible gene expression signature in peripheral blood cells of patients with severe lupus. Proceedings of the National Academy of Sciences of the United States of America, 2003. 100(5): p. 2610-2615. 234. Bennett, L., et al., Interferon and granulopoiesis signatures in systemic lupus erythematosus blood. J Exp Med, 2003. 197(6): p. 711-23. 235. Han, G.M., et al., Analysis of gene expression profiles in human systemic lupus erythematosus using oligonucleotide microarray. Genes Immun, 2003. 4(3): p. 177-86. 236. Crow, M.K. and J. Wohlgemuth, Microarray analysis of gene expression in lupus. Arthritis Res Ther, 2003. 5(6): p. 279-87. 237. Mali, P., et al., RNA-guided human genome engineering via Cas9. Science, 2013. 339(6121): p. 823-6. 238. Sander, J.D. and J.K. Joung, CRISPR-Cas systems for editing, regulating and targeting genomes. Nat Biotechnol, 2014. 32(4): p. 347-55. 239. Claussnitzer, M., et al., FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N Engl J Med, 2015. 373(10): p. 895-907. 240. Heinz, N., K. Hennig, and R. Loew, Graded or threshold response of the tet-controlled gene expression: all depends on the

154

concentration of the transactivator. BMC Biotechnol, 2013. 13: p. 5. 241. Fillebeen, C., N. Wilkinson, and K. Pantopoulos, Electrophoretic Mobility Shift Assay (EMSA) for the Study of RNA-Protein Interactions: The IRE/IRP Example. 2014(94): p. e52230. 242. Consortium, E.P., An integrated encyclopedia of DNA elements in the human genome. Nature, 2012. 489(7414): p. 57-74. 243. Roadmap Epigenomics, C., et al., Integrative analysis of 111 reference human epigenomes. Nature, 2015. 518(7539): p. 317-30. 244. Heng, T.S., M.W. Painter, and C. Immunological Genome Project, The Immunological Genome Project: networks of gene expression in immune cells. Nat Immunol, 2008. 9(10): p. 1091-4. 245. Wu, C., et al., BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol, 2009. 10(11): p. R130. 246. Wu, C., I. Macleod, and A.I. Su, BioGPS and MyGene.info: organizing online, gene-centric information. Nucleic Acids Res, 2013. 41(Database issue): p. D561-5. 247. Griffon, A., et al., Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory landscape. Nucleic Acids Res, 2015. 43(4): p. e27. 248. Weirauch, M.T., et al., Determination and inference of eukaryotic transcription factor sequence specificity. Cell, 2014. 158(6): p. 1431-43. 249. Wang, J., et al., Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res, 2012. 22(9): p. 1798-812. 250. Holden, N.S. and C.E. Tacon, Principles and problems of the electrophoretic mobility shift assay. J Pharmacol Toxicol Methods, 2011. 63(1): p. 7-14. 251. LI-COR Odyssey® Infrared EMSA Kit: Instruction Manual. 2014. 252. Inc., M.B. µMACS FactorFinder Kit: Data sheet. 2014. 253. Lu, X., et al., Lupus Risk Variant Increases pSTAT1 Binding and Decreases ETS1 Expression. Am J Hum Genet, 2015. 96(5): p. 731- 9.

155

254. Xu, J., et al., Osr1 acts downstream of and interacts synergistically with Six2 to maintain nephron progenitor cells during kidney organogenesis. Development, 2014. 141(7): p. 1442-52. 255. Yang, T.-P., et al., Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies. Bioinformatics (Oxford, England), 2010. 26(19): p. 2474- 2476. 256. Fort, A., et al., A liver enhancer in the fibrinogen gene cluster. Vol. 117. 2011. 276-282. 257. Solberg, N. and S. Krauss, Luciferase assay to study the activity of a cloned promoter DNA fragment. Methods Mol Biol, 2013. 977: p. 65-78. 258. Rahman, M., et al., A repressor element in the 5'-untranslated region of human Pax5 exon 1A. Gene, 2001. 263(1-2): p. 59-66. 259. Mali, P., et al., RNA-Guided Human Genome Engineering via Cas9. Science, 2013. 339(6121): p. 823-826. 260. Kottyan, L.C., et al., The IRF5-TNPO3 association with systemic lupus erythematosus has two components that other autoimmune disorders variably share. Hum Mol Genet, 2014. 261. Vaughn, S.E., et al., Lupus risk variants in the PXK locus alter B- cell receptor internalization. Frontiers in Genetics, 2015. 5. 262. Farh, K.K., et al., Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 2014. 263. Hu, X., et al., Integrating autoimmune risk loci with gene-expression data identifies specific pathogenic immune cell subsets. Am J Hum Genet, 2011. 89(4): p. 496-506. 264. Dorner, T., C. Giesecke, and P. Lipsky, Mechanisms of B cell autoimmunity in SLE. Arthritis research & therapy, 2011. 13(5): p. 243. 265. Wang, H., et al., The abnormal apoptosis of T cell subsets and possible involvement of IL-10 in systemic lupus erythematosus. Cell Immunol, 2005. 235(2): p. 117-21. 266. Hirschhorn, J.N., et al., A comprehensive review of genetic association studies. Genet Med, 2002. 4(2): p. 45-61.

156

267. Cui, Y., Y. Sheng, and X. Zhang, Genetic susceptibility to SLE: recent progress from GWAS. Journal of autoimmunity, 2013. 41: p. 25-33. 268. Jakes, R.W., et al., Systematic review of the epidemiology of systemic lupus erythematosus in the Asia-Pacific region: prevalence, incidence, clinical features, and mortality. Arthritis Care Res (Hoboken), 2012. 64(2): p. 159-68. 269. Ramana, C.V., et al., Complex roles of Stat1 in regulating gene expression. Oncogene, 2000. 19(21): p. 2619-27. 270. Guo, Y. and D.C. Jamison, The distribution of SNPs in human gene regulatory regions. BMC Genomics, 2005. 6: p. 140. 271. Gaulton, K.J., et al., A map of open chromatin in human pancreatic islets. Nat Genet, 2010. 42(3): p. 255-9. 272. McDaniell, R., et al., Heritable Individual-Specific and Allele- Specific Chromatin Signatures in Humans. Science, 2010. 328(5975): p. 235. 273. Birney, E., et al., Allele-specific and heritable chromatin signatures in humans. Human Molecular Genetics, 2010. 19(R2): p. R204- R209. 274. Rebhan, M., et al., GeneCards: integrating information about genes, proteins and diseases. Trends Genet, 1997. 13(4): p. 163. 275. Fagerberg, L., et al., Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics, 2014. 13(2): p. 397-406. 276. Duff, M.O., et al., Genome-wide identification of zero nucleotide recursive splicing in Drosophila. Nature, 2015. 521(7552): p. 376- 9. 277. Wang, K.C. and H.Y. Chang, Molecular mechanisms of long noncoding RNAs. Molecular cell, 2011. 43(6): p. 904-914. 278. Bhartiya, D. and V. Scaria, Genomic variations in non-coding RNAs: Structure, function and regulation. Genomics, 2016. 107(2-3): p. 59-68.

157

279. Zhu, Z., et al., Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet, 2016. 48(5): p. 481-7. 280. Darnell, J.E., Jr., I.M. Kerr, and G.R. Stark, Jak-STAT pathways and transcriptional activation in response to IFNs and other extracellular signaling proteins. Science, 1994. 264(5164): p. 1415- 21. 281. Zhong, Z., Z. Wen, and J.E. Darnell, Jr., Stat3 and Stat4: members of the family of signal transducers and activators of transcription. Proc Natl Acad Sci U S A, 1994. 91(11): p. 4806-10. 282. Abelson, A.K., et al., STAT4 associates with systemic lupus erythematosus through two independent effects that correlate with gene expression and act additively with IRF5 to increase risk. Ann Rheum Dis, 2009. 68(11): p. 1746-53. 283. Graham, R.R., et al., Genetic variants near TNFAIP3 on 6q23 are associated with systemic lupus erythematosus. Nat Genet, 2008. 40(9): p. 1059-61. 284. Remmers, E.F., et al., STAT4 and the risk of rheumatoid arthritis and systemic lupus erythematosus. N Engl J Med, 2007. 357(10): p. 977-86. 285. Sanchez, E., et al., Identification of novel genetic susceptibility loci in African American lupus patients in a candidate gene association study. Arthritis Rheum, 2011. 63(11): p. 3493-501. 286. Yuan, H., et al., A meta-analysis of the association of STAT4 polymorphism with systemic lupus erythematosus. Mod Rheumatol, 2010. 20(3): p. 257-62. 287. Namjou, B., et al., High-density genotyping of STAT4 reveals multiple haplotypic associations with systemic lupus erythematosus in different racial groups. Arthritis Rheum, 2009. 60(4): p. 1085-95. 288. Suarez-Gestal, M., et al., Replication of recently identified systemic lupus erythematosus genetic associations: a case-control study. Arthritis Res Ther, 2009. 11(3): p. R69. 289. Okada, Y., et al., A genome-wide association study identified AFF1 as a susceptibility locus for systemic lupus eyrthematosus in Japanese. PLoS Genet, 2012. 8(1): p. e1002455.

158

290. Raj, P., et al., Regulatory polymorphisms modulate the expression of HLA class II molecules and promote autoimmunity. Elife, 2016. 5. 291. Platanias, L.C., Mechanisms of type-I- and type-II-interferon- mediated signalling. Nat Rev Immunol, 2005. 5(5): p. 375-86. 292. Bacon, C.M., et al., Interleukin 12 induces tyrosine phosphorylation and activation of STAT4 in human lymphocytes. Proc Natl Acad Sci U S A, 1995. 92(16): p. 7307-11. 293. Yamamoto, K., et al., Binding sequence of STAT4: STAT4 complex recognizes the IFN-gamma activation site (GAS)-like sequence (T/A)TTCC(C/G)GGAA(T/A). Biochem Biophys Res Commun, 1997. 233(1): p. 126-32. 294. Kariuki, S.N., et al., Cutting edge: autoimmune disease risk variant of STAT4 confers increased sensitivity to IFN-alpha in lupus patients in vivo. J Immunol, 2009. 182(1): p. 34-8. 295. Niewold, T.B., Type I interferon in human autoimmunity. Front Immunol, 2014. 5: p. 306. 296. Niewold, T.B., et al., High serum IFN-alpha activity is a heritable risk factor for systemic lupus erythematosus. Genes Immun, 2007. 8(6): p. 492-502. 297. Weckerle, C.E., et al., Network analysis of associations between serum interferon-alpha activity, autoantibodies, and clinical features in systemic lupus erythematosus. Arthritis Rheum, 2011. 63(4): p. 1044-53. 298. Ronnblom, L.E., G.V. Alm, and K.E. Oberg, Possible induction of systemic lupus erythematosus by interferon-alpha treatment in a patient with a malignant carcinoid tumour. J Intern Med, 1990. 227(3): p. 207-10. 299. Ronnblom, L.E., G.V. Alm, and K.E. Oberg, Autoimmunity after alpha-interferon therapy for malignant carcinoid tumors. Ann Intern Med, 1991. 115(3): p. 178-83. 300. Munroe, M.E., et al., Altered type II interferon precedes autoantibody accrual and elevated type I interferon activity prior to systemic lupus erythematosus classification. Ann Rheum Dis, 2016. 75(11): p. 2014-2021.

159

301. Jackson, S.W., et al., B cell IFN-gamma receptor signaling promotes autoimmune germinal centers via cell-intrinsic induction of BCL-6. J Exp Med, 2016. 213(5): p. 733-50. 302. Jung, H.H., et al., STAT1 and Nmi are downstream targets of Ets-1 transcription factor in MCF-7 human breast cancer cell. FEBS Lett, 2005. 579(18): p. 3941-6. 303. Yockell-Lelievre, J., et al., Functional cooperation between Stat-1 and ets-1 to optimize icam-1 gene transcription. Biochem Cell Biol, 2009. 87(6): p. 905-18. 304. Zhang, X., et al., Several transcription factors regulate COX-2 gene expression in pancreatic beta-cells. Mol Biol Rep, 2007. 34(3): p. 199-206. 305. Sharma, B. and R.V. Iozzo, Transcriptional silencing of perlecan gene expression by interferon-gamma. J Biol Chem, 1998. 273(8): p. 4642-6. 306. Ramana, C.V., et al., Regulation of c-myc expression by IFN-gamma through Stat1-dependent and -independent pathways. (0261-4189 (Print)). 307. McDaniell, R., et al., Heritable individual-specific and allele- specific chromatin signatures in humans. Science, 2010. 328(5975): p. 235-9. 308. Choi, S.W. and S. Friso, : A New Bridge between Nutrition and Health. Adv Nutr, 2010. 1(1): p. 8-16. 309. Marsit, C.J., Influence of environmental exposure on human epigenetic regulation. J Exp Biol, 2015. 218(Pt 1): p. 71-9. 310. Sluyter, J.D., et al., Dietary intakes of Pacific, Maori, Asian and European adolescents: the Auckland High School Heart Survey. (1753-6405 (Electronic)). 311. Baccarelli, A. and V. Bollati, Epigenetics and environmental chemicals. Current opinion in pediatrics, 2009. 21(2): p. 243-251. 312. Sekigawa, I., et al., DNA methylation in systemic lupus erythematosus. Lupus, 2003. 12(2): p. 79-85. 313. Akimoto, H., Global Air Quality and Pollution. Science, 2003. 302(5651): p. 1716.

160

314. Phillips, P.C., Epistasis--the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet, 2008. 9(11): p. 855-67. 315. MacArthur, J., et al., The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic acids research, 2016: p. gkw1133. 316. Machiela, M.J. and S.J. Chanock, LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics, 2015. 31(21): p. 3555-7. 317. Okada, Y., et al., Meta-analysis identifies nine new loci associated with rheumatoid arthritis in the Japanese population. Nat Genet, 2012. 44(5): p. 511-6. 318. Okada, Y., et al., Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature, 2014. 506(7488): p. 376-81. 319. Stuart, P.E., et al., Genome-wide Association Analysis of Psoriatic Arthritis and Cutaneous Psoriasis Reveals Differences in Their Genetic Architecture. Am J Hum Genet, 2015. 97(6): p. 816-36. 320. Yin, X., et al., Genome-wide meta-analysis identifies multiple novel associations and ethnic heterogeneity of psoriasis susceptibility. Nat Commun, 2015. 6: p. 6916. 321. Comuzzie, A.G., et al., Novel genetic loci identified for the pathophysiology of childhood obesity in the Hispanic population. PLoS One, 2012. 7(12): p. e51954. 322. Ng, E., et al., Genome-wide association study of toxic metals and trace elements reveals novel associations. Hum Mol Genet, 2015. 24(16): p. 4739-45. 323. Skibola, C.F., et al., Genome-wide association study identifies five susceptibility loci for follicular lymphoma outside the HLA region. Am J Hum Genet, 2014. 95(4): p. 462-71. 324. Fabbri, C. and A. Serretti, Genetics of long-term treatment outcome in bipolar disorder. Prog Neuropsychopharmacol Biol Psychiatry, 2016. 65: p. 17-24.

161

325. Dubois, P.C., et al., Multiple common variants for celiac disease influencing immune gene expression. Nat Genet, 2010. 42(4): p. 295-302. 326. Paternoster, L., et al., Multi-ancestry genome-wide association study of 21,000 cases and 95,000 controls identifies new risk loci for atopic dermatitis. Nat Genet, 2015. 47(12): p. 1449-56. 327. Hinds, D.A., et al., A genome-wide association meta-analysis of self-reported allergy identifies shared and allergy-specific susceptibility loci. Nat Genet, 2013. 45(8): p. 907-11. 328. Hsu, P.D., et al., DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotech, 2013. 31(9): p. 827-832.

162