Submitted Manuscript: Confidential template updated: February 28 2012

Title: Functional clusters of autoantibodies targeting TLR and SMAD pathways stratify novel subgroups of Systemic Lupus Erythematosus

Authors: M. J. Lewis1, M. B. McAndrew2, C. Wheeler2, N. Workman2, P. Agashe2, J. Koopmann2, E. Uddin2, L. Zou1, R. Stark2, J. Anson2, A. P. Cope3, T. J. Vyse4*

Affiliations: 1Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK. 2Oxford Gene Technology, Unit 15, Oxford Industrial Park, Yarnton, Oxfordshire, OX5 1QU, UK. 3Academic Department of Rheumatology, Centre for Molecular and Cellular Biology of Inflammation, King’s College London, London, SE1 9RT, UK. 4Department of Medical and Molecular Genetics, King’s College London, London, SE1 9RT, UK. *To whom correspondence should be addressed: [email protected] Tel. +44 20 7188 8431 Fax +44 20 7188 2585

One Sentence Summary: Using an advanced design of protein microarray, we reveal the identity of over 100 proteins targeted by autoantibodies in Systemic Lupus Erythematosus, and show that these novel autoantigens cluster into functionally related groups involved in toll-like receptor pathways and SMAD signaling.

Abstract: The molecular targets of the vast majority of autoantibodies in systemic lupus erythematosus (SLE) are unknown. Using a baculovirus-insect cell expression system to create an advanced protein microarray with improved protein folding and epitope conservation, we assayed sera from 277 SLE individuals and 280 age, gender and ethnicity matched controls. We identified 103 novel autoantigens in SLE sera and show that SLE autoantigens are distinctly clustered into four functionally related groups. The original SLE autoantigens Ro60, La, and SMN1/Sm complex formed the first distinct antigen cluster (SLE1a). A further set of antigens (SLE1b) extended this group with other proteins involved in RNA, DNA and chromatin processing. The largest two subgroups of novel autoantigens comprised interconnected proteins involved in receptor-regulated SMAD signalling (SLE2) and in toll-like receptor pathways and lymphocyte development (SLE3). SLE individuals clustered into four subgroups matching the four functional clusters of autoantigens, which suggests that autoantibody profiles of SLE patients identify subgroups with different underlying pathogenic immune mechanisms. The microarray revealed a significant number of ethnicity-sensitive autoantibodies, in keeping with the strong influence of ethnicity on SLE prevalence, severity and organ involvement. Multiple recent failures of new SLE therapies in clinical trials demonstrate that there is a need for improved classification of SLE and related connective tissue diseases. The novel autoantibody clusters identified in this study represent a major advance in SLE diagnostics, and enable identification of new subgroups of SLE, each of which may require different therapeutic strategies.

Main Text:
Introduction
Although first described in 1957, anti-nuclear antibodies (ANA) and anti-double-stranded DNA (dsDNA) antibodies remain the main diagnostic tests for systemic lupus erythematosus (SLE) (1, 2). Following the development of assays for the extractable nuclear antigens Ro, La, Sm and U1-RNP, there have been no significant improvements in diagnostic assays for SLE for many years (3). In contrast, the identification of citrullinated proteins as autoantigen epitopes in rheumatoid arthritis (RA) led to a marked improvement in RA diagnostic tests with the development of anti-cyclic citrullinated peptide (CCP) assays. Although numerous SLE-associated autoantibodies have been described (4), they have not significantly improved upon the diagnostic and biomarker abilities of conventional SLE tests, and all too often the true molecular target remains undefined. Protein microarrays using spots of autoantigens applied to glass slides for detecting autoantibodies in sera of SLE patients were first reported in 2002 by Robinson et al. (5). This initial study was largely based on existing autoantigens, as were subsequent similar studies, which identified several glomerular antigens and serum factors including B cell-activating factor (BAFF) as autoantibody targets in SLE individuals (6-10). Microarrays utilising large scale de novo synthesis of thousands of proteins have detected autoantibodies in cancer and other diseases (11, 12), but were only able to identify one novel SLE autoantigen, a chloride intracellular channel, CLIC2 (13). Older, first generation protein microarrays may have failed to identify autoantibodies for a number of reasons: the bacterial and yeast protein expression systems employed in older studies are prone to generate proteins of poor conformation, due to misfolding and the lack of post-translational modifications found in higher species; furthermore, early protein microarrays were not designed to preserve protein epitopes in their native conformation.
In this study, we describe a next generation protein microarray utilising 1545 distinct proteins chosen from multiple functional and disease pathways, to probe for the presence of autoantibodies to novel autoantigens in SLE. Our overall aim was to identify previously undiscovered autoantibodies that in combination might act as SLE biomarkers, substantially improving diagnostic (and potentially prognostic) performance over existing clinical assays. The protein microarray platform used in this study has been designed to display full-length, correctly folded proteins. Proteins bound to the array are expressed with a biotin carboxyl carrier protein (BCCP) folding tag using a baculovirus insect cell expression system and screened for correct folding before being arrayed in a specific, oriented manner designed to conserve native, discontinuous epitopes (Fig. 1A) (14, 15). This newer design of protein microarray has never before been applied to autoimmune disease. In this study, we set out to validate this novel protein microarray as a platform for improving antibody-based diagnostic tests and to identify the underlying nature of autoantigens in SLE.

Results
Identification of novel SLE-associated autoantigens
During the discovery phase of the study, serum samples from 85 SLE patients and 90 age, gender and ethnicity matched healthy controls were analysed for IgG autoantibody levels against 1545 correctly folded, full-length human proteins using the Discovery array v.3.0 protein array (Oxford Gene Technology, UK) (Fig. 1A). Samples were assayed by ELISA for ANA and anti-dsDNA antibodies for comparison. Candidate autoantigens were identified for validation in two further cohorts of UK SLE patients and tested against a confounding autoimmune disease cohort including individuals with the following conditions: rheumatoid arthritis (n=68), systemic sclerosis (n=12), primary Sjögren’s syndrome (n=6), polymyositis (n=3) and mixed connective tissue disease (n=3). After correction for multiple testing, the array identified a total of 116 autoantibodies (Fig. 1B, Table 1) with higher antibody levels in SLE patients compared to controls, at a false discovery rate (FDR) of <0.01. The well-known SLE autoantigens TROVE2 (Ro60) and SSB (La) showed the most significant difference between SLE and control groups. The array validated a further 11 previously reported SLE autoantigens (Fig. 1C): PABPC1 (16), ANXA1 (Annexin A1) (17), HMGB2 (18), HNRNPA2B1 (19, 20), PSME3 (also known as Ki) (21), ARAF (22), DEK (23), MAP3K14 (NIK) (23), APEX1 (24), MAGEB1 and MAGEB2 (25), RPL10 (26) and PSIP1 (27). A total of 103 novel autoantigens were identified by the microarray, with the six most statistically significant novel autoantibodies shown in Fig. 1D. Ten of the autoantigens have been implicated in SLE pathogenesis through immunological studies, but were not previously known to be autoantigens: CREB1, ZAP70, VAV1, PPP2CB, TEK (Tie2 receptor), IRF4, IRF5, EGR2, PPP2R5A and LYN (28-33). Five novel autoantigens are the products of SLE susceptibility genes, including the well-known genes IRF5 and LYN, as well as PIK3C3, NFKBIA and DNAJA1 (34-38). In summary, 26 of the 116 autoantigens have a previously identified link to SLE, either as known autoantibody targets or as proteins directly implicated in SLE pathogenesis. In a secondary analysis of all three cohorts pooled, autoantibodies from the array were ranked by positivity in SLE patients, defined as autoantibody levels >2 SD above the mean of the control population, and tested for statistical significance using Fisher’s exact test, corrected for multiple testing.
The most prevalent autoantibodies were the known SLE autoantigens Ro60 (37.5%), SSB/La (35.4%), HNRNPA2B1 (29.6%) and PSME3/Ki (23.8%) (Fig. 2A). The most prevalent novel autoantibodies were LIN28A (22.4%), IGF2BP3 (21.7%) and HNRNPUL1 (21.3%). SLE patients tended to be simultaneously positive for multiple autoantibodies in contrast to the control group (Kruskal-Wallis test with post-hoc Nemenyi test, P<2 x 10−16) and confounding disease group (P=6.7 x 10−9), with some individuals producing antibodies against over 60 antigens (Fig. 2B).
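The positivity analysis described above can be illustrated with a minimal sketch. It assumes the cut-off is the control mean plus two sample standard deviations and that Fisher’s exact test is applied one-sided to a 2x2 table of positive/negative counts in SLE versus controls; the text does not state the sidedness of the test, and the function names and toy antibody levels below are hypothetical.

```python
from math import comb
from statistics import mean, stdev

def positivity_threshold(control_levels, n_sd=2.0):
    """Cut-off = control mean + n_sd * control sample SD (assumed definition)."""
    return mean(control_levels) + n_sd * stdev(control_levels)

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows are SLE / control, columns are positive / negative.
    Returns P(X >= a) under the hypergeometric null.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)

def positivity_test(sle_levels, control_levels):
    """Count positives in each group and test for enrichment in SLE."""
    cut = positivity_threshold(control_levels)
    a = sum(x > cut for x in sle_levels)
    c = sum(x > cut for x in control_levels)
    p = fisher_exact_greater(a, len(sle_levels) - a, c, len(control_levels) - c)
    return cut, a, c, p

# Toy levels for a single hypothetical antigen (not data from the study).
controls = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7]
sle = [1.0, 3.0, 2.5, 0.9, 4.0, 1.1, 2.8, 1.2]
cut, n_sle_pos, n_ctrl_pos, p_value = positivity_test(sle, controls)
```

In the study this test would be repeated for each antigen on the array, and the resulting P values then corrected for multiple testing.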

SLE autoantibodies cluster into distinct subgroups
Since we observed that groups of autoantibodies showed strong cross-correlation, we performed unsupervised hierarchical clustering of autoantibody levels in SLE individuals (n=277). Four distinct clusters of autoantibodies (Clusters 1a, 1b, 2 and 3) could be observed (Fig. 3A), which were able to delineate SLE individuals into four corresponding subgroups SLE 1a, 1b, 2 and 3. SLE subgroup 1a individuals were characterised by being strongly anti-Ro60 and anti-La positive. SLE subgroup 2 showed the broadest range of autoantibody positivity, and individuals were often positive for large numbers of antibodies across all four antibody clusters. SLE subgroup 3 were mainly positive for cluster 3 antibodies, while SLE group 1b showed a more mixed pattern. A correlation plot (Fig. 3B) confirmed strong internal correlation within each antibody cluster, validating the grouping of the autoantibodies. Cluster 1a autoantibodies, which include Ro60, La, PSME3 (Ki) and two novel antigens SMN1 and RQCD1, did not show correlation with other autoantibodies, and anti-SMN1 and anti-RQCD1 showed a tendency to slight inverse correlation with other autoantibodies. Similarly, antibodies to SUB1, TGIF2 and NHLH1 from cluster 1b also showed evidence of inverse correlation with other clusters. Clusters 2 and 3 showed some evidence of overlap, with a small number of cluster 2 autoantibodies such as those targeting SMAD2, SMAD5 and PPP2CB being correlated with large numbers of cluster 3 autoantibodies. Anti-ZAP70, -CLK1 and -BIRC3 (cIAP-2) from cluster 3 showed a tendency to inverse correlation with most cluster 1b and 2 autoantibodies. These groupings are borne out in the heatmap (Fig. 3A) and a condensed heatmap of the SLE subgroups (Fig. S1), which show that SLE group 2 tended to be positive for large numbers of antibodies across all 4 clusters, including cluster 3 antibodies. SLE group 3 showed stronger positivity for cluster 3 antibodies, with lower levels of positivity for cluster 2 antibodies. Cluster 2 autoantigens SMAD2, SMAD5, CSNK2A1 and CSNK2A2 showed the strongest positivity in SLE group 2, but also showed some degree of positivity across all four SLE groups. The existence of subgroups of autoantibody response was further confirmed by clustering of all four SLE subgroups when data were re-analysed with inclusion of controls (n=280) using principal component analysis (PCA) as an alternative unsupervised method (Fig. 4A, Movie S1 and Fig. S2A). PCA showed delineation between SLE patient clusters 1a, 2 and 3 on PC2 and PC3, with PC1 aiding delineation between control and SLE individuals as well as SLE cluster 1b. Component loadings plots showed separation of cluster 1a and 1b from clusters 2 and 3 on PC1 and PC2, with division of clusters 2 and 3 best seen on PC3 and separation of cluster 1b on PC5 (Fig. 4B, Movie S2 and Fig. S2B). These analyses lend weight to the clustering of both the autoantigens and the subgroups of SLE individuals. To probe whether the autoantibody clusters were linked to differential SLE phenotype, we compared ANA and dsDNA antibody levels in the four SLE subgroups. SLE group 1a, whose individuals are strongly Ro60 and La positive, showed very high levels of ANA positivity (P<0.01, Kruskal-Wallis test), while SLE group 2 showed the highest levels of anti-dsDNA antibodies (P<0.01) (Fig. 4C).
Groups 1b and 3 showed significantly lower levels of both ANA and dsDNA antibodies, consistent with these groups being distinct entities. Furthermore, analysis of ANA negative individuals showed that these were almost exclusively in subgroups 1b and 3, which constituted 90% of ANA negative individuals (P=1.3 x 10−9, Chi square test) (Fig. 4D). Similarly subgroups 1b and 3 made up 69% of dsDNA negative SLE individuals (P=8.4 x 10−5). ANA and dsDNA antibody levels were measured by ELISA and were not based on patient clinical records. SLE subphenotype clinical data was available on the UK SLE cohorts (184/277). Generally, SLE organ involvement and clinical features were not significantly different between SLE subgroups as defined by the novel autoantibodies, with the exception of arthritis, which had the highest frequency in SLE group 2 (P=0.022, Fisher’s exact test) (Fig. 4E). There were significant differences in ethnicity between SLE subgroups (Fig. 4F) (P=0.032, Fisher’s exact test), so the effect of ethnicity on autoantibodies was analysed in more depth.
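The idea behind grouping autoantibodies by their cross-correlation can be sketched as follows. This is an illustration only: the distance measure (1 minus Pearson correlation) and average linkage are assumptions, since the text does not specify which distance or linkage was used, and the antibody levels are invented toy values attached to antigen names from the study.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of antibodies by correlation distance.

    profiles maps an antibody name to its levels across patients.
    Distance is 1 - Pearson r; clusters are merged by average linkage
    until n_clusters remain.
    """
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b]) for a in names for b in names}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Toy antibody levels across six hypothetical patients.
toy_profiles = {
    "Ro60": [9, 8, 1, 1, 2, 1],
    "La": [8, 9, 2, 1, 1, 1],
    "SMAD2": [1, 2, 7, 8, 1, 2],
    "SMAD5": [2, 1, 8, 7, 2, 1],
}
toy_clusters = average_linkage(toy_profiles, n_clusters=2)
```

In this toy example the two strongly correlated pairs end up in separate clusters, mirroring the way correlated autoantibodies define clusters 1a and 2 in the study.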

The effect of ethnicity on SLE autoantibody levels
Further analysis identified SLE autoantibodies with significantly different autoantibody levels in the three SLE ethnic groups at FDR<0.01 (one-way ANOVA with FDR correction) (Fig. 5A): HNRNPA2B1 showed the highest antibody positivity in African ancestry SLE, APOBEC3G and HNRNPUL1 showed the highest positivity in Hispanic SLE individuals, and ARAF showed the strongest pro-European SLE bias (Fig. 5B). The highest number of ethnicity-sensitive autoantibodies was seen for African ancestry SLE, in comparison to lower numbers of European SLE-sensitive autoantibodies. Linear discriminant analysis (LDA) was applied to determine whether SLE patients could be separated into different ethnic groups based on autoantibody data alone. LDA was able to reliably distinguish the three main SLE ethnic groups from the control population (Fig. 5C and Movie S3). The first linear discriminant (LD1) delineated African/Afro-Caribbean SLE patients, LD2 delineated both European and Hispanic SLE individuals and LD3 separated Hispanic from European SLE (P=4.3 x 10−10, one-way ANOVA with Tukey post-hoc tests). Overall, LDA correctly classified over 83% of individuals into the three main SLE ethnic groups or the control population (Table D in Fig. 5). These results suggest that ethnicity is an important modulator of autoantibody response.
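The FDR thresholds used in this and the preceding analyses rely on adjusting per-antibody P values for multiple testing. The text does not name the procedure; the sketch below assumes the widely used Benjamini-Hochberg step-up method, with invented P values for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy P values for five hypothetical antibodies.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.9]
q_values = benjamini_hochberg(raw_p)
significant = [q < 0.01 for q in q_values]  # FDR < 0.01 as in the text
```

With these toy values only the first antibody would pass an FDR threshold of 0.01, even though two raw P values are below 0.01.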

Functional annotation of novel autoantigens
To examine whether the clusters of autoantigens associated with different SLE subgroups showed themes of molecular or functional categorisation, each cluster of autoantigens was investigated using the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database, and cross-referenced against Ingenuity Pathway Analysis and the PANTHER (protein annotation through evolutionary relationship) classification system. Protein interaction networks identified using STRING were discernible in cluster 2 and cluster 3 autoantigens. In cluster 2, the largest network of interacting proteins was centred around SMAD2 and SMAD5 and included proteins associated with TGF-β, Wnt and bone morphogenetic protein (BMP) signalling such as PPP2CB, ID2, TWIST2, CSNK2A1 and CSNK2A2 (Fig. 6A). In cluster 3, a protein network of genes involved in TLR signalling and NF-κB activation including MYD88, BIRC3 (cIAP-2), NFKBIA (IκBα), MAP3K7 (TAK1) and MAP3K14 (NIK) was observed, interlinked with genes involved in apoptosis regulation such as BIRC3 (cIAP-2), ANXA1 (Annexin A1), CASP9 (caspase-9), ZMYND11 and BCL2A1 (Fig. 6B). The cluster 3 network also incorporated key proteins involved in lymphocyte differentiation such as VAV1, EGR2, ZAP70 and SH2B1. STRING identified TGFBR1 (TGF-β receptor 1) and RELA (NF-κB p65) as predicted functional nodes for clusters 2 and 3 respectively (prediction score 0.999). Nearly all (22/23) of the functional nodes predicted by the STRING algorithm were absent from the protein array. The only predicted node already included in the protein array was GTF2F1. Although GTF2F1 did not reach statistical significance in the pooled analysis, it almost reached statistical significance in the USA SLE cohort (Padjusted=0.055), with 4.7% of the USA SLE cohort being positive for antibodies against GTF2F1.
Interestingly, GTF2F1 has been identified as a novel SLE susceptibility gene in a recent meta-analysis of Chinese SLE individuals (T. Vyse, unpublished data). In cluster 1b, autoantigens were mainly involved in RNA processing, including mRNA editing (APOBEC3G, PABPC1, IGF2BP3), or they were DNA binding proteins (ZNF207, HMGB2, HMG20B, TGIF2) or transcription factors (HOXB6, NHLH1). The functional themes of the autoantigen clusters are summarised in Fig. 6C.

Improved diagnostic autoantibody panels
Logistic regression using a Bayesian generalised linear model was employed to identify optimal antibody panels for SLE diagnosis. Increasing the number of autoantibodies allowed in the model gave clear benefits up to approximately 50 autoantibodies (Fig. 7A), as demonstrated by increasing area under the curve (AUC) on Receiver Operating Characteristic (ROC) curve analysis. Backward stepwise selection identified more accurate and efficient models than forward selection, and the optimal model comprised 56 autoantibodies when using the Akaike information criterion (k=2). Using the Bayesian information criterion (k=log n), which applies a greater penalty for model variables, logistic regression with backward stepwise selection identified a 24-autoantibody panel. Both the 56- and 24-autoantibody panels were compared against the standard ANA and dsDNA antibody diagnostic tests for SLE and a combined logistic regression model of ANA+dsDNA, using ROC curve analysis (Fig. 7B, Table S2). Both panels of novel autoantibodies showed higher sensitivity at given specificities than ANA or dsDNA alone and ANA+dsDNA combined (Table S3). To mimic the real world situation in a typical rheumatology clinic, we added the confounding group individuals (mostly RA individuals) to the control group, to determine whether the novel autoantibody models were able to delineate individuals with SLE from non-SLE (Fig. 7C). Low level ANA is commonly observed in pregnant or elderly individuals, as well as being associated with autoimmune thyroid or liver disease and inflammatory bowel disease. Thus in clinical practice ANA performs more poorly, as it is significantly less specific than dsDNA antibodies, especially at the lower titres required for higher assay sensitivity. At a specificity of 92%, the 56-antibody panel reached a sensitivity of 74.4% compared to 65.2% for the ANA test. At a stringent specificity of 96%, the 56- and 24-autoantibody panels demonstrated substantially better sensitivity (59.2% and 57.0% respectively) than the conventional tests ANA (27.7%), dsDNA (41.2%) or combined ANA+dsDNA (42.4%) (Table D in Fig. 7). Thus both the 56- and 24-autoantibody panels showed superior diagnostic performance compared to conventional assays. Analysis of the 56-autoantibody panel (Fig. 7E) showed that it contained autoantibodies from all four autoantibody clusters identified earlier in Fig. 3A, with enrichment for autoantibody clusters 1b and 3, and the majority of these showed specificity for particular SLE patient subgroups (statistical analysis by one-way ANOVA). Similarly, the 56-autoantibody panel elements showed specificity for SLE according to ethnicity (Fig. 7F). This suggests that the autoantibodies selected in the optimal model provided sufficient coverage for identifying SLE in all three ethnic groups tested in this study.
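The performance measures quoted above (sensitivity at a fixed specificity, AUC) and the difference between the two selection penalties can be sketched as follows. This is a minimal illustration under stated assumptions: the scores are invented model outputs, the cut-off rule (the lowest cut-off at which the requested fraction of controls is called negative) is one common convention and not necessarily the one used in the study, and the sample size of 557 (277 SLE plus 280 controls) is assumed for the penalty calculation.

```python
import math

def sensitivity_at_specificity(case_scores, control_scores, specificity):
    """Sensitivity when at least `specificity` of controls fall at or below the cut-off."""
    ranked = sorted(control_scores)
    n_neg = math.ceil(specificity * len(ranked) - 1e-9)  # controls that must test negative
    cutoff = ranked[n_neg - 1] if n_neg > 0 else float("-inf")
    sensitivity = sum(s > cutoff for s in case_scores) / len(case_scores)
    return sensitivity, cutoff

def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = sum((c > k) + 0.5 * (c == k) for c in case_scores for k in control_scores)
    return wins / (len(case_scores) * len(control_scores))

def penalty_per_variable(n_samples, criterion):
    """Penalty k applied per model variable: 2 for AIC, ln(n) for BIC."""
    return 2.0 if criterion == "AIC" else math.log(n_samples)

# Toy model scores (hypothetical, not study data).
cases = [0.9, 0.8, 0.7, 0.4, 0.3]
controls = [0.1, 0.2, 0.35, 0.5, 0.6]
sens, cutoff = sensitivity_at_specificity(cases, controls, specificity=0.8)
area = auc(cases, controls)
bic_k = penalty_per_variable(557, "BIC")  # assumed n = 277 SLE + 280 controls
```

Under that assumed sample size the BIC penalty per variable is roughly 6.3 compared with 2 for AIC, which is consistent with the BIC-selected panel being considerably smaller.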

Discussion
Using a novel protein microarray design optimised to enhance presentation of correctly folded proteins, we have identified 103 proteins as novel autoantigens in SLE, and confirmed 13 known SLE autoantibody targets. The new design of microarray used in our study found a large number of novel autoantigens, in contrast to previous proteomic approaches to autoantigen discovery. Mass spectrometry based techniques have only identified a handful of new autoantigens due to the need to pool SLE serum samples to produce sufficient quantities of protein for analyses (24). A previous study, which profiled SLE patients’ sera against a microarray of 5011 proteins synthesised in yeast, was only able to validate a single autoantigen (13). In contrast, a striking feature of the novel SLE autoantigens found in our study is that large numbers are clearly identifiable as important immune system regulators, in some cases already implicated in SLE pathogenesis. Thirteen of the 103 novel autoantigens have been directly implicated in studies of SLE pathogenesis or as SLE susceptibility genes. This helps to confirm the validity of this new protein microarray approach for identification of novel autoantibodies. We also identified important differences in SLE autoantigen responses between ethnicities, with groups of autoantibodies showing greater positivity in SLE patients of different ethnicities (Fig. 5). It is well known that ethnicity plays an important role in SLE susceptibility, since SLE prevalence differs significantly between ethnic groups, with the highest prevalence in individuals of African ancestry (39) and more severe disease in Hispanic and African ancestry individuals (40). Underlying genetic differences are likely to be an important factor in the altered autoantibody response between ethnicities. Unsupervised hierarchical clustering of the novel autoantigens revealed four SLE subgroups associated with four clusters of autoantigens (Fig. 3A).
The clustering designation of both the SLE subgroups and autoantigen clusters was strongly supported by principal component analysis (Fig. 4A, B). The most well known autoantigens form their own cluster 1a, which includes TROVE2 (Ro60), SSB (La), the proteasome subunit PSME3 (Ki, PA28 gamma) and SMN1, which forms a complex with the Sm autoantigen, and in turn associates with the U1-RNP autoantigen as part of the spliceosome complex. Cluster 1b includes the known autoantigens HNRNPA2B1, PABPC1 and HMGB2. Autoantigens in clusters 1a and 1b are distinguished by a theme of involvement in RNA processing including mRNA decay (RQCD1), mRNA splicing (SMN1), mRNA editing (APOBEC3G, PABPC1, IGF2BP3), nucleocytoplasmic RNA transport (Ro60, SSB/La and HNRNPA2B1) and microRNA binding (LIN28A). Other 1b antigens are involved in chromatin remodelling and DNA binding. Comparison with ANA and dsDNA antibody levels showed that group 1a were strongly ANA positive and group 2 were strongly dsDNA positive. Groups 1b and 3 showed lower levels of ANA and dsDNA antibodies, and 90% of the ANA negative individuals were from SLE1b and SLE3. Thus, the cluster 1b and 3 autoantigens are of clinical importance for detecting ANA negative and dsDNA negative SLE individuals. We analysed the clustered autoantigens using the STRING database to search for protein-protein interactions (Fig. 6). Two key themes emerged: in cluster 2, a group of autoantigens centred around SMAD2 and SMAD5 are linked to TGF-β/Wnt/BMP signalling; cluster 3 autoantigens are implicated in TLR signalling, NF-κB activation and apoptosis regulation, as well as B and T lymphocyte development. STRING protein-protein interaction analysis identified TGF-β receptor 1 as a predicted node for cluster 2 and RELA (p65 NF-κB) as a predicted cluster 3 node. The SLE2 subgroup showed the highest levels of arthritis and Raynaud’s, consistent with TGF-β pathway overactivity. Intriguingly, the autoantigens involved in TLR/NF-κB signalling and apoptosis regulation clustered as autoantigens for a distinct subgroup of SLE patients, SLE3.
Excess TLR7 activity is linked to development of SLE (41), and we identified autoantigens involved in toll-like receptor (TLR) signalling that converge on NF-κB: MYD88, BIRC3 (cIAP-2), MAP3K7 (TAK1) and MAP3K14 (NIK). cIAP-2 acts downstream of CD40, BAFF receptor and lymphotoxin-β receptor to induce non-canonical activation of NF-κB by NIK (42). TAK1 and cIAP-2 are responsible for regulating apoptosis mediated by the RIPK1 ripoptosome (42). Cluster 3 autoantigens also included ANXA1 (annexin A1), which has pleiotropic effects including induction of neutrophil apoptosis (43). TLR7 is upregulated in SLE neutrophils and primes neutrophils for production of neutrophil extracellular traps (NETs), leading to activation of plasmacytoid dendritic cells through activation of TLR9 (44, 45). Thus one possible explanation for the cluster 3 antigen grouping is that these antigens may be derived from neutrophils primed for NET formation, which is thought to be a source of antigen for anti-dsDNA antibody development. SLE3 cluster autoantigens also included transcription factors and adaptors important for regulating T and B cell development including ZAP70, EGR2, CREB1 and VAV1. Egr-2 controls T cell self tolerance, and Egr2 deficient mice develop lupus-like autoimmune disease with T cell activation, proliferation and organ infiltration associated with glomerulonephritis (31). The protooncogene VAV1 and the DNA methylation enzyme EZH2 are important for haematopoiesis. Type I interferon production plays an important role in plasma cell differentiation and SLE pathogenesis, and consistent with this, IRF5 was identified as a cluster 3 autoantigen. Although the unsupervised clustering of the antigens matched protein-protein networks with identifiable functional roles, a few exceptions are apparent.
For example IRF4 did not cluster with the related type 1 interferon response gene IRF5 in cluster 3, even though IRF4 is important for germinal centre formation and plasma cell differentiation (46). TGIF2, which acts as a corepressor for TGF-β activated Smad3 (47), was not observed in cluster 2.

A conundrum in SLE pathogenesis is the cellular source of autoantigens. The demonstration that anti-Ro and anti-La antibodies in SLE sera bound apoptotic cell blebs (48), and association of complement deficiency with SLE led to the ‘waste disposal hypothesis’, which proposed that defective clearance of dying cells is the source of autoantigen exposure (49). The discovery of autoantigens involved in apoptosis regulation is consistent with defective apoptosis being an important factor in SLE pathogenesis. The original SLE autoantigens Ro, La, Sm and U1-RNP do not display any discernible tissue expression signature, but are ubiquitously expressed. The novel antigens described in our study suggest that immune system lineage cells are the most likely source of cluster 3 autoantigens. The clustering of antigens into functional groups hints at different underlying pathogenic mechanisms defining SLE subgroups, even though the clinical outcome appears to show substantial overlap between the SLE subgroups. Different patterns of immune system dysregulation due to genetic or environmental factors such as separate viruses may drive differential autoantigen exposure between SLE subgroups. If the new classes of autoantibodies represent different underlying pathogenic mechanisms, this would have important clinical ramifications, with the prediction that the SLE subgroups defined by this study might require different treatment strategies. For example, patients with NF-κB and B cell differentiation genes as antigens may be a subgroup which are more likely to respond to B cell therapies such as rituximab and belimumab, while patients with TGF-β/ Wnt signalling pathways may be at higher risk of organ fibrosis and manifestations overlapping with systemic sclerosis, and thus require alternative therapies and disease monitoring. Some of the new autoantibodies may also have a role for monitoring disease activity in conjunction with anti-dsDNA antibodies. 
Positivity for cluster 2 autoantibodies was associated with a significantly higher incidence of arthritis, suggesting that some autoantibodies may have prognostic utility. Further studies are required to determine which autoantibodies are most useful for prognostic and monitoring purposes. Using multivariate statistical methods, we identified 56-autoantibody and 24-autoantibody biomarker panels which showed improved sensitivity and specificity for diagnosis of SLE in comparison to standard ANA and anti-dsDNA antibody diagnostic tests (Fig. 7D). The use of a repertoire of autoantibodies for SLE diagnosis has parallels with the large peptide libraries employed by anti-CCP assays for diagnosis of RA, and the 56-autoantibody biomarker panel brings the sensitivity and specificity for SLE diagnosis up to a level similar to that seen with anti-CCP assays in RA (50). The 56-autoantibody biomarker panel includes a range of autoantigens across all four clusters and thus is able to detect all four SLE subgroups. Autoantigens in groups 1b and 3 in particular are well represented in the 56-autoantibody panel, which allows better detection of ANA or dsDNA negative SLE individuals. The biomarker panel autoantigens also show good coverage for identifying SLE in different ethnic groups. Our data suggest that these factors are important considerations for optimising future designs for SLE diagnostic assays. These results redefine SLE as divided into key subgroups based on autoantibody patterns. Surprisingly, the autoantibodies not only cluster according to correlations between patients, but appear to be loosely grouped by functional pathways, with the two largest clusters of novel autoantigens involved in TGF-β/Wnt signalling (autoantibody cluster 2) and TLR/NF-κB and lymphocyte differentiation pathways (autoantibody cluster 3).
The existence of the Toll pathway in flies as the homologue to mammalian TLR pathways may have aided our ability to detect TLR pathway autoantigens, since the insect cell expression system possesses post-translational machinery (e.g. glycosylation, ubiquitination) closer to that found in mammalian immune signalling. In summary, our study shows that through technological improvements to protein microarray design, with attention to optimal protein folding and synthesis, we have identified a large number of novel SLE autoantigens. Our study suggests that SLE can be subgrouped not on the basis of clinical phenotype, but on molecular signature identified through four distinct autoantibody patterns. We propose that each SLE subgroup may have distinct pathogenic and/or genetic mechanisms underlying the differential autoantigen response.

Materials and Methods
Study population
Serum samples were obtained from several groups of subjects: Discovery cohort 1 consisted of SLE samples sourced from USA (n=85) plus age, gender and ethnicity matched controls (n=90) (Seralabs); Validation cohort 1 consisted of UK SLE samples of European ancestry (n=96) and age/sex/ethnicity matched controls (n=98); Validation cohort 2 consisted of UK Afro-Caribbean SLE samples (n=96) and age/sex/ethnicity matched controls (n=92); the Confounding/interfering disease cohort included patients with the following conditions: systemic sclerosis (n=12), primary Sjögren’s syndrome (n=6), polymyositis (n=3) and mixed connective tissue disease (n=3) sourced from USA (Seralabs), and rheumatoid arthritis (RA) (n=68) obtained from multiple UK institutions. For validation cohorts 1 and 2, serum samples were previously obtained from SLE patients from multiple UK institutions. All SLE patients fulfilled the 1997 revised American College of Rheumatology (ACR) criteria for classification of SLE. RA patients fulfilled the 2010 ACR-EULAR (European League against Rheumatism) criteria for diagnosis of RA. For ethnicity analyses, the Discovery cohort and two validation cohorts were combined and analysed as three ethnic groups: African ancestry SLE (n=117 from Discovery cohort 1 and Validation cohort 2) and controls (n=111), Hispanic SLE (n=30 from Discovery cohort 1) and controls (n=33), and European ancestry SLE (n=130 from Discovery cohort 1 and Validation cohort 1) and controls (n=136).
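As a quick arithmetic check, the cohort sizes listed above can be summed to confirm that the cohort-based and ethnicity-based breakdowns both reproduce the totals of 277 SLE individuals and 280 controls. The dictionary keys are shorthand labels chosen for this sketch.

```python
# Cohort sizes as stated in the study population description.
sle_by_cohort = {"discovery_usa": 85, "validation_uk_european": 96, "validation_uk_afro_caribbean": 96}
controls_by_cohort = {"discovery_usa": 90, "validation_uk_european": 98, "validation_uk_afro_caribbean": 92}
confounding = {"RA": 68, "systemic_sclerosis": 12, "sjogren": 6, "polymyositis": 3, "MCTD": 3}

# The same individuals regrouped by ethnicity.
sle_by_ethnicity = {"african": 117, "hispanic": 30, "european": 130}
controls_by_ethnicity = {"african": 111, "hispanic": 33, "european": 136}

total_sle = sum(sle_by_cohort.values())
total_controls = sum(controls_by_cohort.values())
total_confounding = sum(confounding.values())
```

Both groupings give 277 SLE individuals and 280 controls, and the confounding disease cohort totals 92 individuals.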

Serum autoantibody measurement Anti-dsDNA antibody and anti-nuclear antibody (ANA) titres were measured in all serum samples by ELISA (QUANTA Lite cat no: 704650 & 708750; Inova Diagnostics, San Diego, USA), according to the manufacturer’s instructions. ANA and dsDNA positive and negative results were defined using the assay thresholds determined by the manufacturer, and were not based on historical case record results.

Protein array analysis The Discovery Array version 3.0 protein microarray employed in this study has been developed to display native, discontinuous epitopes using previously described technology (14, 15). Full-length human proteins were expressed with a biotin carboxyl carrier protein (BCCP) folding tag via baculovirus vector in Sf9 insect cells. Each cell line was Sanger sequenced to ensure correct subcloning. Biotinylation of the BCCP tag, which requires correct folding of the BCCP sequence, was used to screen for correct protein folding. All cell lines were subject to rigorous quality control measures including being tested by Western blot to confirm synthesis of a single biotinylated protein product of correct molecular weight, reliable and consistent protein expression over time and consistent plating density across arrays. Proteins were thus arrayed on streptavidin-coated, HydroGel-derivatized Schott Nexterion glass slides (Schott, Jena, Germany) in a specific, oriented manner designed to conserve native epitopes. Each microarray contained approximately 1550 human proteins representing 1545 distinct genes chosen from multiple functional and disease pathways printed in quadruplicate together with control proteins: four control proteins for the BCCP-myc tag (BCCP, BCCP-myc, β-galactosidase-BCCP-myc and β-galactosidase-BCCP) and additional controls including Cy3-labelled biotin-BSA, dilution series of biotinylated IgG and biotinylated IgM and buffer-only spots. Microarrays were incubated with serum samples to allow detection of binding of serum immunoglobulins to specific proteins on the arrays, enabling the identification of both autoantibodies and their cognate antigens, as previously described (51). In brief, following clarification by centrifugation, samples were diluted 200-fold in 0.1% v/v Triton/0.1% v/v BSA in 1x PBS (Triton-BSA buffer) and then applied to the arrays. Following incubation for 2 hours at 20°C with gentle shaking (50 rpm), arrays were washed three times in fresh Triton-BSA buffer at 20°C for 20 minutes with gentle shaking. The washed arrays were then incubated with labelled anti-human IgG antibody at 20°C for 2 hours. Arrays were washed three times in Triton-BSA buffer for 5 minutes at 20°C, rinsed briefly (5-10 seconds) in distilled water, and centrifuged for 2 minutes at 240g. Each batch of microarrays was also tested with a pooled normal human serum control as an internal check of background reproducibility across different experiments over time. The probed and dried arrays were scanned using a High-Resolution microarray scanner (Agilent Technologies, Santa Clara, CA, USA) at 10µm resolution. The resulting 20-bit TIFF images were feature extracted using Feature Extraction software version 10.5 or 10.7.3.1 (Agilent Technologies) to determine the intensity of fluorescence at each protein spot.

Statistical analysis The local median background intensity was subtracted from the raw median signal intensity of each protein feature on the array, and fluorescence intensity for each autoantigen was normalised by consolidating the replicates (median consolidation), followed by normal transformation and then global median normalisation. Outliers were identified and removed. Normalised data were used for identification of individual SLE autoantibodies and incorporation into combined biomarker panels. Statistical analysis was performed using R statistics package version 3.1 (R Foundation for Statistical Computing, Vienna, Austria), MATLAB version 8 (MathWorks, Natick, MA, USA) and Prism (GraphPad Software, La Jolla, CA, USA). No significant underlying structure between cohorts was observed using principal component analysis, so data from the three SLE and matched control cohorts were merged. Autoantibodies were tested by comparing SLE patients and controls using Student's t-test, with P values adjusted for false discovery rate (FDR) using the Benjamini-Hochberg method. Autoantibody positivity was defined as autoantibody levels >2 SD of the control population, tested for statistical significance using Fisher’s exact test, corrected using FDR. Autoantibodies were checked for non-significant levels of positivity in the confounding group. Unsupervised hierarchical clustering of autoantibody levels in SLE patients (n=277) was performed using 116 autoantibodies shown to be significantly higher in SLE patients compared to controls (t-test, FDR<0.01). Correlation was used as the distance metric with Ward’s method for clustering and plotted using the heatmap.2 function in R package gplots. Principal component analysis was performed with the same 116 autoantibodies using both SLE individuals (n=277) and controls (n=280). Correlation between antibody clusters was confirmed using the corrplot function. For linear discriminant analysis (LDA) and logistic regression, correlated autoantibody groupings with Pearson r>0.8 were rationalised to a single autoantibody to prevent multicollinearity. LDA was performed on SLE patients (n=277) and controls (n=280) using R package MASS and 3D plots visualised using R packages plot3d and rgl. Logistic regression was performed on 116 autoantibodies in SLE patients (n=277) and controls (n=280) using a Bayesian generalised linear model with the R package arm, using both forward and backward stepwise selection with Akaike (k=2) or Bayesian information criteria (k=log n, where n=557). Receiver operating characteristic (ROC) curves on logistic regression models were plotted and parameters calculated using R package pROC. The performance of classification models was compared to a randomized set of case and control status samples (permutation test). Analyses were performed both with the inclusion and exclusion of ANA and anti-dsDNA data generated using ELISA assays as described above to identify biomarkers from the protein array data which were additive to, correlated or decorrelated with ANA and anti-dsDNA activity. The results from each classifier were compared to identify individual biomarkers and biomarker combinations which were independently selected using multiple approaches. The analysis parameters were specifically adjusted in order to create larger biomarker panels which could include low penetrance biomarkers. Biomarkers that were used in this analysis were preselected, and poorly performing biomarkers which appeared more frequently in the healthy cohort were excluded from the biomarker selection process. Nested cross-validation was applied to the classification procedures in order to assess the classification prediction accuracy. The statistical analysis was verified by an independent statistician (L.Z.).
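As an illustration of the false discovery rate adjustment named in the statistical analysis, the following Python sketch implements the Benjamini-Hochberg step-up procedure. It is not the code used in this study (the analysis was run in R); the function name `benjamini_hochberg` and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order.

    Each P value is multiplied by m / rank (rank 1 = smallest P value),
    then monotonicity is enforced from the largest rank downwards.
    """
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values for four antigens (not data from this study).
example_p = [0.01, 0.04, 0.03, 0.005]
example_fdr = benjamini_hochberg(example_p)
for p, q in zip(example_p, example_fdr):
    print(f"P = {p:.3f}  FDR-adjusted P = {q:.3f}")
```

Because the adjustment scales each P value by the number of tests divided by its rank, the most significant antigen on an array of roughly 1550 proteins has its P value multiplied by about 1550.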

Protein-protein interaction analysis Autoantigens identified by the array were imported into STRING database version 10 (Search Tool for the Retrieval of Interacting Genes/Proteins) and human protein-protein interactions retrieved with predicted nodes identified by STRING. Data was exported and protein-protein interaction networks were visualised in Cytoscape version 3.2.1.

Supplementary Materials
Table S1. List of autoantibodies by cluster detected by Discovery protein microarray
Figure S1. Heatmap showing differences in autoantibody levels across four SLE subgroups
Figure S2. Principal component analysis of autoantibody levels in SLE individuals and controls
Table S2. List of proteins in 56- and 24-autoantigen biomarker models
Table S3. Receiver operating characteristic curve analysis of biomarker panels comparing SLE vs healthy controls
Movie S1. 3D plot of principal component scores for SLE individuals and controls
Movie S2. 3D plot of principal component loading scores for SLE autoantigens
Movie S3. 3D plot of linear discriminant analysis of autoantibody levels in SLE ethnic groups


References and Notes: 1. G. J. Friou, Clinical application of a test for lupus globulin-nucleohistone interaction using fluorescent antibody. Yale J. Biol. Med. 31, 40 (1958). 2. P. Miescher, R. Strassle, New serological methods for the detection of the L.E. factor. Vox Sang. 2, 283 (1957). 3. D. A. Isenberg, J. J. Manson, M. R. Ehrenstein, A. Rahman, Fifty years of anti-ds DNA antibodies: are we approaching journey's end? Rheumatology (Oxford). 46, 1052 (2007). 4. G. Yaniv, G. Twig, D. B. Shor, A. Furer, Y. Sherer, O. Mozes, O. Komisar, E. Slonimsky, E. Klang, E. Lotan, M. Welt, I. Marai, A. Shina, H. Amital, Y. Shoenfeld, A volcanic explosion of autoantibodies in systemic lupus erythematosus: a diversity of 180 different antibodies found in SLE patients. Autoimmunity reviews 14, 75 (2015). 5. W. H. Robinson, C. DiGennaro, W. Hueber, B. B. Haab, M. Kamachi, E. J. Dean, S. Fournel, D. Fong, M. C. Genovese, H. E. de Vegvar, K. Skriner, D. L. Hirschberg, R. I. Morris, S. Muller, G. J. Pruijn, W. J. van Venrooij, J. S. Smolen, P. O. Brown, L. Steinman, P. J. Utz, Autoantigen microarrays for multiplex characterization of autoantibody responses. Nat. Med. 8, 295 (2002). 6. Q. Z. Li, C. Xie, T. Wu, M. Mackay, C. Aranow, C. Putterman, C. Mohan, Identification of autoantibody clusters that best predict lupus disease activity using glomerular proteome arrays. J. Clin. Invest. 115, 3428 (2005). 7. Q. Z. Li, J. Zhou, A. E. Wandstrat, F. Carr-Johnson, V. Branch, D. R. Karp, C. Mohan, E. K. Wakeland, N. J. Olsen, Protein array autoantibody profiles for insights into systemic lupus erythematosus and incomplete lupus syndromes. Clin. Exp. Immunol. 147, 60 (2007). 8. B. F. Chong, L. C. Tseng, T. Lee, R. Vasquez, Q. Z. Li, S. Zhang, D. R. Karp, N. J. Olsen, C. Mohan, IgG and IgM autoantibody differences in discoid and systemic lupus patients. J. Invest. Dermatol. 132, 2770 (2012). 9. K. Papp, P. Vegh, R. Hobor, Z. Szittner, Z. Voko, J. Podani, L. Czirjak, J. 
Prechl, Immune complex signatures of patients with active and inactive SLE revealed by multiplex protein binding analysis on antigen microarrays. PloS one 7, e44824 (2012). 10. J. V. Price, D. J. Haddon, D. Kemmer, G. Delepine, G. Mandelbaum, J. A. Jarrell, R. Gupta, I. Balboni, E. F. Chakravarty, J. Sokolove, A. K. Shum, M. S. Anderson, M. H. Cheng, W. H. Robinson, S. K. Browne, S. M. Holland, E. C. Baechler, P. J. Utz, Protein microarray analysis reveals BAFF-binding autoantibodies in systemic lupus erythematosus. J. Clin. Invest. 123, 5135 (2013). 11. C. Desmetz, A. Mange, T. Maudelonde, J. Solassol, Autoantibody signatures: progress and perspectives for early cancer detection. Journal of cellular and molecular medicine 15, 2013 (2011). 12. L. Abel, S. Kutschki, M. Turewicz, M. Eisenacher, J. Stoutjesdijk, H. E. Meyer, D. Woitalla, C. May, Autoimmune profiling with protein microarrays in clinical applications. Biochim. Biophys. Acta 1844, 977 (2014). 13. W. Huang, C. Hu, H. Zeng, P. Li, L. Guo, X. Zeng, G. Liu, F. Zhang, Y. Li, L. Wu, Novel systemic lupus erythematosus autoantigens identified by human protein microarray technology. Biochem. Biophys. Res. Commun. 418, 241 (2012). 14. J. M. Boutell, D. J. Hart, B. L. Godber, R. Z. Kozlowski, J. M. Blackburn, Functional protein microarrays for parallel characterisation of mutants. Proteomics 4, 1950 (2004). 15. J. O. Koopmann, M. B. McAndrew, J. M. Blackburn, in Protein Microarrays, M. Schena, Ed. (Jones and Bartlett Publishers, Sudbury, MA, USA, 2005), chap. 22, pp. 401-420.


16. D. A. Stetler, M. Reichlin, C. M. Berlin, S. T. Jacob, Antibodies against nuclear poly(A) polymerases in rheumatic autoimmune diseases. J. Clin. Immunol. 7, 24 (1987). 17. N. J. Goulding, M. R. Podgorski, N. D. Hall, R. J. Flower, J. L. Browning, R. B. Pepinsky, P. J. Maddison, Autoantibodies to recombinant lipocortin-1 in rheumatoid arthritis and systemic lupus erythematosus. Ann. Rheum. Dis. 48, 843 (1989). 18. H. Uesugi, S. Ozaki, J. Sobajima, F. Osakada, H. Shirakawa, M. Yoshida, K. Nakao, Prevalence and characterization of novel pANCA, antibodies to the high mobility group non-histone chromosomal proteins HMG1 and HMG2, in systemic rheumatic diseases. J. Rheumatol. 25, 703 (1998). 19. R. Fritsch-Stork, D. Mullegger, K. Skriner, B. Jahn-Schmid, J. S. Smolen, G. Steiner, The spliceosomal autoantigen heterogeneous nuclear ribonucleoprotein A2 (hnRNP-A2) is a major T cell autoantigen in patients with systemic lupus erythematosus. Arthritis research & therapy 8, R118 (2006). 20. K. Skriner, W. H. Sommergruber, V. Tremmel, I. Fischer, A. Barta, J. S. Smolen, G. Steiner, Anti-A2/RA33 autoantibodies are directed to the RNA binding region of the A2 protein of the heterogeneous nuclear ribonucleoprotein complex. Differential epitope recognition in rheumatoid arthritis, systemic lupus erythematosus, and mixed connective tissue disease. J. Clin. Invest. 100, 127 (1997). 21. M. Matsushita, Y. Takasaki, K. Takeuchi, H. Yamada, R. Matsudaira, H. Hashimoto, Autoimmune response to proteasome activator 28alpha in patients with connective tissue diseases. J. Rheumatol. 31, 252 (2004). 22. W. Li, W. Wang, S. Sun, Y. Sun, Y. Pan, L. Wang, R. Zhang, K. Zhang, J. Li, Autoantibodies against the catalytic domain of BRAF are not specific serum markers for rheumatoid arthritis. PloS one 6, e28975 (2011). 23. I. Wichmann, J. R. Garcia-Lozano, N. Respaldiza, M. F. Gonzalez-Escribano, A. 
Nunez-Roldan, Autoantibodies to transcriptional regulation proteins DEK and ALY in a patient with systemic lupus erythematosus. Hum. Immunol. 60, 57 (1999). 24. Y. Katsumata, Y. Kawaguchi, S. Baba, S. Hattori, K. Tahara, K. Ito, T. Iwasaki, N. Yamaguchi, M. Oyama, H. Kozuka-Hata, H. Hattori, K. Nagata, H. Yamanaka, M. Hara, Identification of three new autoantibodies associated with systemic lupus erythematosus using two proteomic approaches. Molecular & cellular proteomics : MCP 10, M110 005330 (2011). 25. D. K. McCurdy, L. Q. Tai, J. Nguyen, Z. Wang, H. M. Yang, N. Udar, F. Naiem, P. Concannon, R. A. Gatti, MAGE Xp-2: a member of the MAGE gene family isolated from an expression library using systemic lupus erythematosus sera. Mol. Genet. Metab. 63, 3 (1998). 26. K. Elkon, S. Skelly, A. Parnassa, W. Moller, W. Danho, H. Weissbach, N. Brot, Identification and chemical synthesis of a ribosomal protein antigenic determinant in systemic lupus erythematosus. Proc. Natl. Acad. Sci. U. S. A. 83, 7419 (1986). 27. M. Mahler, J. G. Hanly, M. J. Fritzler, Importance of the dense fine speckled pattern on HEp-2 cells and anti-DFS70 antibodies for the diagnosis of systemic autoimmune diseases. Autoimmunity reviews 11, 642 (2012). 28. C. G. Katsiari, V. C. Kyttaris, Y. T. Juang, G. C. Tsokos, Protein phosphatase 2A is a negative regulator of IL-2 production in patients with systemic lupus erythematosus. J. Clin. Invest. 115, 3193 (2005).


29. S. Krishnan, Y. T. Juang, B. Chowdhury, A. Magilavy, C. U. Fisher, H. Nguyen, M. P. Nambiar, V. Kyttaris, A. Weinstein, R. Bahjat, P. Pine, V. Rus, G. C. Tsokos, Differential expression and molecular associations of Syk in systemic lupus erythematosus T cells. J. Immunol. 181, 8145 (2008). 30. P. S. Biswas, S. Gupta, R. A. Stirzaker, V. Kumar, R. Jessberger, T. T. Lu, G. Bhagat, A. B. Pernis, Dual regulation of IRF4 function in T and B cells is required for the coordination of T-B cell interactions and the prevention of autoimmunity. J. Exp. Med. 209, 581 (2012). 31. B. Zhu, A. L. Symonds, J. E. Martin, D. Kioussis, D. C. Wraith, S. Li, P. Wang, Early growth response gene 2 (Egr-2) controls the self-tolerance of T cells and prevents the development of lupuslike autoimmune disease. J. Exp. Med. 205, 2295 (2008). 32. P. Kumpers, S. David, M. Haubitz, J. Hellpap, R. Horn, V. Brocker, M. Schiffer, H. Haller, T. Witte, The Tie2 receptor antagonist angiopoietin 2 facilitates vascular inflammation in systemic lupus erythematosus. Ann. Rheum. Dis. 68, 1638 (2009). 33. K. L. Silver, T. L. Crockford, T. Bouriez-Jones, S. Milling, T. Lambe, R. J. Cornall, MyD88-dependent autoimmune disease in Lyn-deficient mice. Eur. J. Immunol. 37, 2734 (2007). 34. R. R. Graham, C. Kyogoku, S. Sigurdsson, I. A. Vlasova, L. R. Davies, E. C. Baechler, R. M. Plenge, T. Koeuth, W. A. Ortmann, G. Hom, J. W. Bauer, C. Gillett, N. Burtt, D. S. Cunninghame Graham, R. Onofrio, M. Petri, I. Gunnarsson, E. Svenungsson, L. Ronnblom, G. Nordmark, P. K. Gregersen, K. Moser, P. M. Gaffney, L. A. Criswell, T. J. Vyse, A. C. Syvanen, P. R. Bohjanen, M. J. Daly, T. W. Behrens, D. Altshuler, Three functional variants of IFN regulatory factor 5 (IRF5) define risk and protective haplotypes for human lupus. Proc. Natl. Acad. Sci. U. S. A. 104, 6758 (2007). 35. S. N. Kariuki, B. S. Franek, R. A. Mikolaitis, T. O. Utset, M. Jolly, A. D. Skol, T. B. 
Niewold, Promoter variant of PIK3C3 is associated with autoimmunity against Ro and Sm epitopes in African-American lupus patients. Journal of biomedicine & biotechnology 2010, 826434 (2010). 36. R. Lu, G. S. Vidal, J. A. Kelly, A. M. Delgado-Vega, X. K. Howard, S. R. Macwana, N. Dominguez, W. Klein, C. Burrell, I. T. Harley, K. M. Kaufman, G. R. Bruner, K. L. Moser, P. M. Gaffney, G. S. Gilkeson, E. K. Wakeland, Q. Z. Li, C. D. Langefeld, M. C. Marion, J. Divers, G. S. Alarcon, E. E. Brown, R. P. Kimberly, J. C. Edberg, R. Ramsey-Goldman, J. D. Reveille, G. McGwin, Jr., L. M. Vila, M. A. Petri, S. C. Bae, S. K. Cho, S. Y. Bang, I. Kim, C. B. Choi, J. Martin, T. J. Vyse, J. T. Merrill, J. B. Harley, M. E. Alarcon-Riquelme, Biolupus, G. M. Collaborations, S. K. Nath, J. A. James, J. M. Guthridge, Genetic associations of LYN with systemic lupus erythematosus. Genes and immunity 10, 397 (2009). 37. Y. Li, H. Cheng, X. B. Zuo, Y. J. Sheng, F. S. Zhou, X. F. Tang, H. Y. Tang, J. P. Gao, Z. Zhang, S. M. He, Y. M. Lv, K. J. Zhu, D. Y. Hu, B. Liang, J. Zhu, X. D. Zheng, L. D. Sun, S. Yang, Y. Cui, J. J. Liu, X. J. Zhang, Association analyses identifying two common susceptibility loci shared by psoriasis and systemic lupus erythematosus in the Chinese Han population. J. Med. Genet. 50, 812 (2013). 38. P. S. Ramos, A. H. Williams, J. T. Ziegler, M. E. Comeau, R. T. Guy, C. J. Lessard, H. Li, J. C. Edberg, R. Zidovetzki, L. A. Criswell, P. M. Gaffney, D. C. Graham, R. R. Graham, J. A. Kelly, K. M. Kaufman, E. E. Brown, G. S. Alarcon, M. A. Petri, J. D. Reveille, G. McGwin, L. M. Vila, R. Ramsey-Goldman, C. O. Jacob, T. J. Vyse, B. P. Tsao, J. B. Harley, R. P. Kimberly, M. E. Alarcon-Riquelme, C. D. Langefeld, K. L. Moser, Genetic analyses of


interferon pathway-related genes reveal multiple new loci associated with systemic lupus erythematosus. Arthritis Rheum. 63, 2049 (2011). 39. A. T. Borchers, S. M. Naguwa, Y. Shoenfeld, M. E. Gershwin, The geoepidemiology of systemic lupus erythematosus. Autoimmunity reviews 9, A277 (2010). 40. P. I. Burgos, G. McGwin, Jr., G. J. Pons-Estel, J. D. Reveille, G. S. Alarcon, L. M. Vila, US patients of Hispanic and African ancestry develop lupus nephritis early in the disease course: data from LUMINA, a multiethnic US cohort (LUMINA LXXIV). Ann. Rheum. Dis. 70, 393 (2011). 41. P. Pisitkun, J. A. Deane, M. J. Difilippantonio, T. Tarasenko, A. B. Satterthwaite, S. Bolland, Autoreactive B cell responses to RNA-related antigens due to TLR7 gene duplication. Science 312, 1669 (2006). 42. J. Silke, R. Brink, Regulation of TNFRSF and innate immune signalling complexes by TRAFs and cIAPs. Cell Death Differ. 17, 35 (2010). 43. M. Perretti, F. D'Acquisto, Annexin A1 and glucocorticoids as effectors of the resolution of inflammation. Nat. Rev. Immunol. 9, 62 (2009). 44. G. S. Garcia-Romo, S. Caielli, B. Vega, J. Connolly, F. Allantaz, Z. Xu, M. Punaro, J. Baisch, C. Guiducci, R. L. Coffman, F. J. Barrat, J. Banchereau, V. Pascual, Netting neutrophils are major inducers of type I IFN production in pediatric systemic lupus erythematosus. Science translational medicine 3, 73ra20 (2011). 45. R. Lande, D. Ganguly, V. Facchinetti, L. Frasca, C. Conrad, J. Gregorio, S. Meller, G. Chamilos, R. Sebasigari, V. Riccieri, R. Bassett, H. Amuro, S. Fukuhara, T. Ito, Y. J. Liu, M. Gilliet, Neutrophils activate plasmacytoid dendritic cells by releasing self-DNA-peptide complexes in systemic lupus erythematosus. Science translational medicine 3, 73ra19 (2011). 46. S. N. Willis, K. L. Good-Jacobson, J. Curtis, A. Light, J. Tellier, W. Shi, G. K. Smyth, D. M. Tarlinton, G. T. Belz, L. M. Corcoran, A. Kallies, S. L. 
Nutt, IRF4 regulates germinal center cell formation through a B cell-intrinsic mechanism. J. Immunol. 192, 3200 (2014). 47. T. A. Melhuish, C. M. Gallo, D. Wotton, TGIF2 interacts with histone deacetylase 1 and represses transcription. J. Biol. Chem. 276, 32109 (2001). 48. L. A. Casciola-Rosen, G. Anhalt, A. Rosen, Autoantigens targeted in systemic lupus erythematosus are clustered in two populations of surface structures on apoptotic keratinocytes. J. Exp. Med. 179, 1317 (1994). 49. M. J. Walport, Complement. Second of two parts. N. Engl. J. Med. 344, 1140 (2001). 50. P. Taylor, J. Gartemann, J. Hsieh, J. Creeden, A systematic review of serum biomarkers anti-cyclic citrullinated Peptide and rheumatoid factor as tests for rheumatoid arthritis. Autoimmune diseases 2011, 815038 (2011). 51. S. Gnjatic, C. Wheeler, M. Ebner, E. Ritter, A. Murray, N. K. Altorki, C. A. Ferrara, H. Hepburne-Scott, S. Joyce, J. Koopman, M. B. McAndrew, N. Workman, G. Ritter, R. Fallon, L. J. Old, Seromic analysis of antibody responses in non-small cell lung cancer patients and healthy donors using conformational protein arrays. J. Immunol. Methods 341, 50 (2009).


Acknowledgments: Funding: The protein microarray, serum assays and initial data analysis were funded by Oxford Gene Technology. Sample collection and secondary data analysis as carried out by M.J.L., L.Z., A.P.C. and T.J.V. was funded by Arthritis Research UK, Medical Research Council, Wellcome Trust and George Koukis. M.J.L. was funded by an Arthritis Research UK Clinician Scientist Fellowship (19631). Author Contributions: M.B.M., C.W., N.W., P.A., J.K., E.U., R.S. and J.A. designed, produced and conducted assays with the protein microarray. T.J.V., M.B.M. and J.A. supervised the study. A.P.C. provided confounding cohort RA samples. T.J.V. provided European SLE samples for SLE cohorts 2 and 3 and matched controls. M.J.L. and L.Z. performed data analysis. M.J.L. wrote the manuscript with comments from co-authors. Competing Interests: M.B.M., C.W., N.W., P.A., J.K., E.U., R.S. and J.A. are or were full-time employees of Oxford Gene Technology. Oxford Gene Technology and King’s College London have filed patent application(s) related to this work on diagnostic biomarkers in Systemic Lupus Erythematosus.



Figure 1. Novel autoantigens identified by protein microarray in Systemic Lupus Erythematosus (SLE) (A) Next generation protein microarray technology used in this study. (B) Volcano plot displaying each autoantigen on the microarray as a single point with the P value on the y-axis (unadjusted t test) versus the fold change in antibody levels between SLE and matched controls on the x-axis. Red dots signify FDR<0.01. (C & D) Tukey boxplots of median normalised IgG binding data showing IgG autoantibody reactivity against specific antigens on the protein microarray. (C) Top six previously identified autoantigens confirming validation of lesser known antigens HNRNPA2B1, PSME3 (Ki), PABPC1 and HMGB2. (D) Top six novel autoantigens identified by microarray. Box plots show median, upper and lower quartiles, with whiskers denoting maximal and minimal data within 1.5 × interquartile range (IQR). Dark blue dots represent antibody positivity defined as >2 SD of control population. Confounding group includes individuals with rheumatoid arthritis, Sjögren’s syndrome and other connective tissue diseases.



Figure 2. Ranking of autoantibody positivity in microarray antigens (A) Autoantigens ranked by positivity in SLE patients, showing distribution across the three cohorts used for the study (UK European, UK Afro-Caribbean and USA mixed cohort). P values were calculated by chi-square test with adjustment for multiple testing. Adjusted P<0.05 was considered significant. (B) Distribution histogram showing total number of positive autoantibodies for each individual showing that sera from SLE patients can recognise over 60 discrete autoantigens.


[Figure 3 image, panels A and B. Recoverable labels: "Ab positivity threshold"; Cluster 1a, Cluster 1b, Cluster 2, Cluster 3.]

Figure 3. Hierarchical clustering of autoantibody profiles in SLE (A) Heatmap of unsupervised hierarchical clustering of top 116 autoantibody levels in SLE patients (n=277), using correlation as distance metric and Ward’s clustering method. Autoantibody levels were Z score normalised against control population mean and SD, with Z scores >2 corresponding to positive autoantibody levels. Autoantibodies cluster into four major clusters, with four matching clusters SLE1a, SLE1b, SLE2 and SLE3 identified in SLE individuals. (B) Correlation plot of Pearson r values shows significant cross-correlation of autoantibodies within each cluster.
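The Z score normalisation against the control population described in this caption can be sketched in a few lines of Python. This is an illustrative example only, assuming the sample standard deviation (n - 1 denominator) is used, which the text does not specify; the function names and the toy values are hypothetical.

```python
from statistics import mean, stdev


def z_scores_vs_controls(patient_levels, control_levels):
    """Z scores of patient values relative to the control mean and SD."""
    mu = mean(control_levels)
    # Sample SD (n - 1 denominator); an assumption, not stated in the text.
    sd = stdev(control_levels)
    return [(x - mu) / sd for x in patient_levels]


def positivity(patient_levels, control_levels, threshold=2.0):
    """Flag values more than `threshold` control SDs above the control mean."""
    return [z > threshold for z in z_scores_vs_controls(patient_levels, control_levels)]


# Toy normalised intensities for one antigen (not data from this study).
toy_controls = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_patients = [4.0, 7.0]

toy_z = z_scores_vs_controls(toy_patients, toy_controls)
toy_flags = positivity(toy_patients, toy_controls)
for level, z, flag in zip(toy_patients, toy_z, toy_flags):
    print(f"level = {level:.1f}  Z = {z:.2f}  positive = {flag}")
```

Scaling every antigen by its own control distribution puts all 116 autoantibodies on a common scale, so a single threshold (Z > 2) defines positivity across the heatmap.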


[Figure 4 image, panels A-F. Recoverable labels: y-axes "ANA titre" and "dsDNA titre" plotted by cluster (SLE1a, SLE1b, SLE2, SLE3) with significance asterisks; pie charts titled "ANA negative SLE" and "dsDNA negative SLE"; y-axes "Frequency (%)" and "Cluster frequency (%)"; ancestry groups AA, HA and EA; annotated P values P=0.022 and P=0.032.]

Figure 4. Relationship of SLE clusters to principal component analysis and SLE disease characteristics (A) Principal component analysis of 116 autoantibody levels in SLE individuals (n=277) and controls (n=280). Principal component (PC) scores for PC1-3 showing clusters of SLE individuals identified by hierarchical clustering. (B) Plots of PC loading scores for PC3 and PC5 (2D plot) or PC2, PC3 and PC5 (3D plot) showing clustering of autoantigens identified by hierarchical clustering. (C) Anti-nuclear and anti-double-stranded DNA antibody results according to SLE cluster groups SLE1a, SLE1b, SLE2 and SLE3. ANA and dsDNA sera levels were measured by ELISA, and are not based on patient clinical records. Statistical analysis by Kruskal-Wallis test. (D) Comparison of SLE subgroups among ANA and dsDNA negative individuals, showing that ANA negative individuals are predominantly from subgroups SLE1b and SLE3. (E) Subphenotype analysis among SLE patient clusters. (F) SLE subgroup distribution varies by ethnicity in African (AA), Hispanic (HA) and European (EA) ancestry SLE individuals. Statistical analysis by Fisher’s exact test.


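[Figure 5 image, panels A-C. Recoverable labels in panel A: Afro-Caribbean, Hispanic, European.]

Panel D. LDA classification (rows: observed group; columns: predicted group)
Observed | Control | SLE AA | SLE HA | SLE EA
Control | 262 | 8 | 0 | 10
SLE AA | 22 | 89 | 1 | 5
SLE HA | 5 | 0 | 24 | 1
SLE EA | 33 | 4 | 4 | 89
Misclassified: 16.7%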

Figure 5. Distinctive SLE autoantibody profiles according to ethnicity (A) Differential antibody positivity according to ethnicity, showing autoantibodies with increased positivity in Afro-Caribbean (AA) SLE, Hispanic (HA) SLE and European (EA) ancestry SLE. Autoantibodies were selected as statistically different between SLE ethnic groups by one-way ANOVA (FDR adjusted P<0.01). (B) Boxplots of key autoantigens specific for different ethnicities in SLE. (C) Linear discriminant analysis plots showing separation of Afro-Caribbean, Hispanic and European ancestry SLE individuals according to linear discriminant factors LD1, LD2 and LD3. Right hand panel, 3-dimensional plot of LD1, LD2 and LD3. (D) Table of LDA predicted categorisation of SLE ethnic groups and control individuals compared to observed groupings.
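The misclassification rate quoted in panel D follows directly from the observed versus predicted counts. The short Python sketch below reproduces that arithmetic from the counts in the table; the variable names are hypothetical, and the per-group figures it derives are simple ratios of the published counts rather than additional results.

```python
# Rows: observed group; columns: predicted group (counts from Figure 5D).
labels = ["Control", "SLE AA", "SLE HA", "SLE EA"]
confusion = [
    [262, 8, 0, 10],   # observed Control
    [22, 89, 1, 5],    # observed SLE AA
    [5, 0, 24, 1],     # observed SLE HA
    [33, 4, 4, 89],    # observed SLE EA
]

total = sum(sum(row) for row in confusion)
# Correct predictions lie on the diagonal.
correct = sum(confusion[i][i] for i in range(len(labels)))
misclassified = total - correct
rate = misclassified / total

print(f"Misclassified: {misclassified}/{total} = {rate:.1%}")

# Fraction of each observed group assigned to its own class.
for i, name in enumerate(labels):
    row_total = sum(confusion[i])
    print(f"{name}: {confusion[i][i]}/{row_total} correctly classified")
```

The row totals (280, 117, 30 and 130) match the control and ancestry group sizes given in the study population, which confirms that rows are the observed groups.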


[Figure 6 image, panels A-C. Functional annotations recovered from the figure: TLR signalling; NF-κB activation; Apoptosis regulation; B and T cell development; RNA processing; Spliceosome; mRNA editing; Nucleocytoplasmic transport; Chromatin remodelling; DNA repair; Transcription factors; TGF-β, Wnt signalling. Protein labels recovered from the figure: cIAP2, MYD88, IκBα, TAK1, NIK, CASP9, BCL2A1, Annexin-A1, ZMYND11, ZAP70, VAV1, CREB1, EGR2, Ro60, PIK3C3, La, SH2B1, SMN1/Sm, WAS, RQCD1, EZH2, APOBEC3G, IRF5, PABPC1, HNRNPA2B1, IGF2BP3, HMGB2, HMG20B, MAGEB1, SMAD2, MAGEB2, SMAD5, HOXB6, PPP2CB, APEX1, PPM1A, IRF4, ID2, TGIF2, CSNK2A1, CSNK2A2, TBX6, MYF6, TWIST2, LYN.]

Figure 6. Protein-protein interactions and functional annotation of novel autoantigens (A & B) Protein-protein interaction networks were derived from STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database and plotted using Cytoscape for (A) cluster 2 and (B) cluster 3 autoantigens. Line thickness represents strength of interaction confidence. Predicted nodes are shown in orange. (C) Circular dendrogram from hierarchical clustering of autoantigens summarising key protein functions for autoantigens in each cluster.


[Figure 7 image, panels A-C, E and F. Recoverable panel titles: "SLE vs healthy controls"; "SLE vs non-SLE".]

Panel D.
Measure | 56 Ab panel | 24 Ab panel | ANA+dsDNA | ANA | dsDNA
AUC | 0.931 | 0.907 | 0.858 | 0.839 | 0.701
AUC 95% CI | 0.911 - 0.950 | 0.884 - 0.930 | 0.827 - 0.890 | 0.806 - 0.873 | 0.658 - 0.743
Sensitivity at specificity 92% | 74.4% | 72.6% | 66.2% | 65.2% | 49.6%
Sensitivity at specificity 94% | 68.6% | 67.9% | 46.0% | 53.8% | 46.0%
Sensitivity at specificity 96% | 59.2% | 57.0% | 42.4% | 27.7% | 41.2%

Figure 7. Performance characteristics of new autoantibody biomarker panels Logistic regression was performed using 116 autoantibodies to identify biomarker panels for diagnosis of SLE in 277 SLE patients and 280 healthy controls. (A) Receiver Operating Characteristic (ROC) curve analysis shows the relationship between the number of autoantibodies included in the biomarker panel and sensitivity, specificity and area under curve (AUC). (B) Optimal diagnostic biomarker panel of 56 autoantibodies was derived by logistic regression using Akaike information criteria (k=2) and stepwise backward selection. Using Bayesian information criteria (k=log n, where n=557), logistic regression identified a 24-autoantibody biomarker panel. ROC curves are shown for both biomarker panels compared with ANA, dsDNA and combined ANA + dsDNA model. (C) ROC curve comparison of biomarker panels, ANA and dsDNA assays with addition of confounding cohort (n=92) to the control group, and analysis of SLE (n=277) vs non-SLE (n=372) shows that ANA sensitivity at high specificity is reduced. (D) Table of area under curve (AUC), sensitivity and specificity of biomarker panels compared to ANA, dsDNA and ANA + dsDNA model. (E) Heatmap of Z scores of mean autoantibody levels in each SLE subgroup for 56-autoantibody biomarker panel. (F) Heatmap of Z scores of mean autoantibody levels in SLE individuals grouped by ethnicity for 56-autoantibody biomarker panel. Statistical analysis by one-way ANOVA with FDR correction.
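The sensitivities at fixed specificity reported in panel D are read off a ROC curve. The Python sketch below shows one way such a value can be computed from classifier scores, assuming that a higher score indicates SLE and that a sample is called positive when its score exceeds the threshold. It is not the procedure used in this study, where ROC parameters were calculated with the R package pROC; the function name and the toy scores are hypothetical.

```python
def sensitivity_at_specificity(case_scores, control_scores, target_specificity):
    """Highest sensitivity achievable with specificity >= target_specificity.

    A sample is called positive when its score is strictly greater than
    the threshold; candidate thresholds are the observed scores.
    """
    thresholds = sorted(set(case_scores) | set(control_scores))
    for t in thresholds:
        # Specificity: fraction of controls correctly called negative.
        specificity = sum(s <= t for s in control_scores) / len(control_scores)
        if specificity >= target_specificity:
            # Sensitivity falls as the threshold rises, so the first
            # threshold meeting the target gives the best sensitivity.
            return sum(s > t for s in case_scores) / len(case_scores)
    return 0.0


# Toy model scores (not data from this study).
toy_control_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
toy_case_scores = [0.3, 0.85, 0.95, 1.5, 2.0]

toy_sensitivity = sensitivity_at_specificity(toy_case_scores, toy_control_scores, 0.9)
print(f"Sensitivity at 90% specificity: {toy_sensitivity:.0%}")
```

Fixing specificity first and then reporting sensitivity makes panels with different score scales directly comparable, which is how the 56- and 24-autoantibody panels are set against ANA and dsDNA in panel D.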



Supplementary Materials:

Table S1. List of autoantibodies by cluster detected by Discovery protein microarray

Gene name | Aliases | Full name | Gene ID (NCBI) | Chromosomal location | P_USA | P_UK-EU | P_UK-AC | P_combined | FDR adjusted P | Novel

Cluster 1a
TROVE2 | RO60, SSA2 | TROVE domain family, member 2 | 6738 | 1q31 | 6.18E-07 | 3.41E-09 | 6.50E-12 | 9.11E-24 | 1.41E-20 |
SSB/La | | Sjogren syndrome antigen B (autoantigen La) | 6741 | 2q31.1 | 1.39E-06 | 3.00E-09 | 8.91E-11 | 8.34E-23 | 6.43E-20 |
PSME3 | Ki, PA28-gamma | proteasome (prosome, macropain) activator subunit 3 (PA28 gamma; Ki) | 10197 | 17q21 | 9.57E-04 | 1.04E-06 | 4.94E-05 | 2.59E-11 | 3.34E-09 |
SMN1 | | survival of motor neuron 1, telomeric | 6606 | 5q13.2 | 1.74E-03 | 5.23E-03 | 2.40E-02 | 2.06E-05 | 3.83E-04 | *
RQCD1 | CNOT9 | RCD1 required for cell differentiation1 homolog (S. pombe) | 9125 | 2q35 | 1.20E-01 | 3.26E-03 | 1.45E-03 | 9.62E-04 | 6.75E-03 | *

Cluster 1b
PABPC1 | | poly(A) binding protein, cytoplasmic 1 | 26986 | 8q22.2-q23 | 6.16E-06 | 8.43E-06 | 3.47E-10 | 3.86E-17 | 1.98E-14 |
HMGB2 | HMG2 | high mobility group box 2 | 3148 | 4q31 | 2.80E-04 | 2.30E-04 | 7.81E-08 | 1.94E-14 | 7.48E-12 |
LIN28A | | lin-28 homolog A (C. elegans) | 79727 | 1p36.11 | 4.13E-05 | 1.58E-03 | 1.32E-09 | 8.18E-14 | 2.10E-11 | *
HNRNPA2B1 | HNRNPA2, SNRPB1 | heterogeneous nuclear ribonucleoprotein A2/B1 | 3181 | 7p15 | 1.27E-03 | 2.22E-04 | 2.81E-08 | 8.06E-14 | 2.10E-11 |
IGF2BP3 | | insulin-like growth factor 2 mRNA binding protein 3 | 10643 | 7p11 | 2.25E-05 | 5.85E-05 | 5.48E-08 | 1.47E-13 | 2.84E-11 | *
APOBEC3G | ARP-9 | apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3G | 60489 | 22q13.1-q13.2 | 3.07E-05 | 1.84E-03 | 1.01E-07 | 1.54E-11 | 2.16E-09 | *
HMG20B | | high mobility group 20B | 10362 | 19p13.3 | 7.39E-03 | 2.43E-03 | 3.23E-09 | 7.30E-11 | 8.67E-09 | *
CDC25B | | cell division cycle 25B | 994 | 20p13 | 5.93E-06 | 2.93E-04 | 6.07E-02 | 8.89E-10 | 9.80E-08 | *
PRM1 | | protamine 1 | 5619 | 16p13.2 | 5.25E-07 | 1.79E-03 | 5.11E-05 | 4.47E-09 | 3.84E-07 | *
HOXB6 | | homeobox B6 | 3216 | 17q21.3 | 2.75E-03 | 1.64E-04 | 1.14E-04 | 6.03E-09 | 4.65E-07 | *
PRM2 | | protamine 2 | 5620 | 16p13.2 | 4.49E-05 | 2.80E-03 | 5.31E-04 | 2.62E-06 | 8.42E-05 | *
APEX1 | | APEX nuclease (multifunctional DNA repair enzyme) 1 | 328 | 14q11.2 | 8.99E-02 | 3.66E-01 | 5.57E-07 | 1.92E-05 | 3.61E-04 |
IRF4 | | interferon regulatory factor 4 | 3662 | 6p25-p23 | 1.06E-03 | 2.99E-01 | 3.97E-06 | 2.09E-05 | 3.84E-04 | *
MAGEB2 | MAGE-XP-2 | melanoma antigen family B2 | 4113 | Xp21.3 | 2.31E-01 | 7.56E-02 | 9.90E-06 | 5.73E-05 | 8.41E-04 |
SUB1 | p14 | SUB1 homolog (S. cerevisiae) | 10923 | 5p13.3 | 6.96E-03 | 8.53E-04 | 2.13E-02 | 1.32E-04 | 1.58E-03 | *
MAGEB1 | MAGE-Xp | melanoma antigen family B1 | 4112 | Xp21.3 | 1.41E-01 | 1.05E-01 | 1.91E-04 | 2.27E-04 | 2.31E-03 |
TGIF2 | | TGFB-induced factor homeobox 2 | 60436 | 20q11.23 | 2.63E-01 | 1.52E-01 | 3.30E-04 | 2.62E-04 | 2.56E-03 | *
NHLH1 | | nescient helix loop helix 1 | 4807 | 1q22 | 1.83E-01 | 2.61E-02 | 6.65E-02 | 3.02E-04 | 2.77E-03 | *
ZNF207 | | zinc finger protein 207 | 7756 | 17q11.2 | 1.18E-03 | 6.62E-02 | 2.65E-04 | 4.06E-04 | 3.56E-03 | *

Cluster 2
HNRNPUL1 | E1B-AP5 | heterogeneous nuclear ribonucleoprotein U-like 1 | 11100 | 19q13.2 | 2.24E-10 | 1.56E-03 | 4.30E-07 | 1.22E-13 | 2.68E-11 | *
RRAS | | related RAS viral (r-ras) oncogene homolog | 6237 | 19q13.33 | 1.26E-01 | 2.14E-02 | 7.33E-06 | 1.31E-09 | 1.26E-07 | *
FUS | HNRNPP2 | FUS RNA binding protein | 2521 | 16p11.2 | 3.42E-03 | 2.34E-03 | 1.97E-06 | 1.50E-09 | 1.36E-07 | *
SMAD5 | | SMAD family member 5 | 4090 | 5q31 | 8.06E-04 | 3.84E-02 | 2.14E-05 | 6.87E-08 | 4.61E-06 | *
SMAD2 | | SMAD family member 2 | 4087 | 18q21.1 | 3.18E-02 | 2.57E-01 | 5.01E-04 | 5.99E-07 | 2.80E-05 | *
CARHSP1 | | calcium regulated heat stable protein 1, 24kDa | 23589 | 16p13.2 | 3.02E-01 | 2.24E-03 | 5.80E-04 | 9.53E-07 | 4.02E-05 | *
LYN | | LYN proto-oncogene, Src family tyrosine kinase | 4067 | 8q13 | 6.61E-06 | 1.40E-01 | 1.16E-03 | 4.49E-06 | 1.33E-04 | *
CSNK2A1 | CK2A1, CK2 | casein kinase 2, alpha 1 polypeptide | 1457 | 20p13 | 7.16E-02 | 5.20E-01 | 2.25E-07 | 4.80E-06 | 1.37E-04 | *
GNG4 | | guanine nucleotide binding protein (G protein), gamma 4 | 2786 | 1q42.3 | 1.67E-01 | 5.14E-01 | 5.41E-03 | 5.09E-06 | 1.43E-04 | *
PPP2CB | | protein phosphatase 2, catalytic subunit, beta isozyme | 5516 | 8p12 | 4.89E-03 | 2.02E-01 | 5.54E-03 | 6.08E-06 | 1.56E-04 | *
DUSP12 | | dual specificity phosphatase 12 | 11266 | 1q21-q22 | 7.62E-02 | 4.41E-03 | 2.78E-10 | 7.27E-06 | 1.73E-04 | *
RPL10 | | ribosomal protein L10 | 6134 | Xq28 | 3.11E-01 | 7.89E-01 | 3.28E-05 | 9.14E-06 | 2.04E-04 |
MARK4 | | MAP/microtubule affinity-regulating kinase 4 | 57787 | 19q13.3 | 1.36E-04 | 5.26E-01 | 2.26E-03 | 1.35E-04 | 1.59E-03 | *
EFS | | embryonal Fyn-associated substrate | 10278 | 14q11.2 | 8.79E-01 | 7.84E-01 | 7.49E-04 | 1.45E-04 | 1.66E-03 | *
PDCD6 | | programmed cell death 6 | 10016 | 5p15.33 | 3.36E-03 | 5.24E-01 | 3.30E-04 | 1.47E-04 | 1.67E-03 | *
TWIST2 | | twist family bHLH transcription factor 2 | 117581 | 2q37.3 | 1.90E-03 | 1.71E-02 | 5.43E-02 | 1.91E-04 | 2.05E-03 | *
DSTYK | RIPK5 | dual serine/threonine and tyrosine protein kinase | 25778 | 1q32.1 | 6.14E-01 | 6.40E-01 | 3.02E-03 | 1.97E-04 | 2.08E-03 | *
MYF6 | | myogenic factor 6 (herculin) | 4618 | 12q21 | 6.31E-02 | 1.45E-01 | 6.13E-03 | 2.19E-04 | 2.28E-03 | *
ME2 | | malic enzyme 2, NAD(+)-dependent, mitochondrial | 4200 | 18q21 | 5.26E-01 | 9.50E-01 | 2.69E-04 | 2.27E-04 | 2.31E-03 | *
PRKRA | | protein kinase, interferon-inducible double stranded RNA dependent activator | 8575 | 2q31.2 | 3.89E-01 | 9.78E-01 | 6.59E-04 | 2.35E-04 | 2.37E-03 | *
TBX6 | | T-box 6 | 6911 | 16p11.2 | 3.02E-01 | 6.11E-01 | 2.63E-04 | 2.80E-04 | 2.65E-03 | *
PPM1A | PP2Calpha | protein phosphatase, Mg2+/Mn2+ dependent, 1A | 5494 | 14q23.1 | 4.78E-01 | 7.82E-01 | 2.59E-03 | 2.83E-04 | 2.66E-03 | *
MYOZ2 | | myozenin 2 | 51778 | 4q26-q27 | 1.12E-01 | 6.95E-01 | 1.68E-05 | 2.98E-04 | 2.75E-03 | *
ID2 | | inhibitor of DNA binding 2, dominant negative helix-loop-helix protein | 3398 | 2p25 | 9.70E-04 | 1.80E-01 | 4.62E-02 | 3.06E-04 | 2.79E-03 | *
VAX2 | | ventral anterior homeobox 2 | 25806 | 2p13 | 4.88E-01 | 6.58E-01 | 1.88E-05 | 3.39E-04 | 3.02E-03 | *
RPS7 | | ribosomal protein S7 | 6201 | 2p25 | 6.39E-01 | 3.68E-01 | 1.68E-03 | 4.10E-04 | 3.56E-03 | *
ZNF394 | | zinc finger protein 394 | 84124 | 7q22.1 | 1.54E-02 | 9.36E-01 | 1.88E-03 | 4.37E-04 | 3.73E-03 | *
MRTO4 | | MRT4 homolog, ribosome maturation factor | 51154 | 1p36.13 | 5.16E-03 | 4.86E-01 | 1.71E-05 | 4.40E-04 | 3.73E-03 | *
FOXM1 | | forkhead box M1 | 2305 | 12p13 | 4.04E-01 | 9.91E-01 | 1.26E-04 | 4.81E-04 | 3.94E-03 | *
GTF2E2 | | general transcription factor IIE, polypeptide 2, beta 34kDa | 2961 | 8p12 | 2.70E-04 | 3.72E-01 | 1.06E-03 | 5.48E-04 | 4.35E-03 | *
BRD2 | RNF3 | bromodomain containing 2 | 6046 | 6p21.3 | 1.48E-01 | 8.82E-01 | 7.39E-03 | 5.58E-04 | 4.37E-03 | *
RAPGEF4 | | Rap guanine nucleotide exchange factor (GEF) 4 | 11069 | 2q31-q32 | 7.66E-02 | 6.60E-01 | 5.16E-03 | 5.81E-04 | 4.53E-03 | *
SRSF5 | SFRS5, SRP40 | serine/arginine-rich splicing factor 5 | 6430 | 14q24 | 8.89E-01 | 7.51E-01 | 3.88E-04 | 9.16E-04 | 6.53E-03 | *
GTF2B | | general transcription factor IIB | 2959 | 1p22-p21 | 3.25E-02 | 4.87E-01 | 5.10E-03 | 1.05E-03 | 7.11E-03 | *
DUSP14 | | dual specificity phosphatase 14 | 11072 | 17q12 | 1.99E-01 | 9.37E-01 | 1.54E-03 | 1.15E-03 | 7.68E-03 | *
MKNK2 | | MAP kinase interacting serine/threonine kinase 2 | 2872 | 19p13.3 | 3.84E-03 | 9.46E-02 | 2.97E-01 | 1.18E-03 | 7.82E-03 | *
DNAJA1 | NEDD7 | DnaJ (Hsp40) homolog, subfamily A, member 1 | 3301 | 9p13.3 | 4.56E-01 | 9.57E-01 | 7.53E-03 | 1.21E-03 | 7.94E-03 | *
CDKN2C | | cyclin-dependent kinase inhibitor 2C (p18, inhibits CDK4) | 1031 | 1p32 | 7.15E-01 | 7.64E-01 | 9.91E-02 | 1.31E-03 | 8.43E-03 | *
CSNK2A2 | CK2A2 | casein kinase 2, alpha prime polypeptide | 1459 | 16q21 | 2.09E-02 | 7.01E-01 | 6.80E-04 | 1.48E-03 | 9.12E-03 | *

Cluster 3
DLX4 | | distal-less homeobox 4 | 1748 | 17q21.33 | 5.08E-03 | 9.32E-03 | 2.29E-07 | 4.13E-13 | 7.09E-11 | *
ANXA1 | LPC1 | annexin A1 | 301 | 9q21.13 | 3.33E-08 | 4.06E-03 | 2.55E-07 | 5.65E-13 | 8.72E-11 |
NFIL3 | E4BP4, IL3BP1 | nuclear factor, interleukin 3 regulated | 4783 | 9q22 | 3.02E-08 | 5.63E-02 | 6.29E-05 | 5.88E-09 | 4.65E-07 | *
PYGB | | phosphorylase, glycogen; brain | 5834 | 20p11.21 | 3.52E-04 | 2.30E-01 | 2.88E-05 | 9.09E-08 | 5.75E-06 | *
ATF4 | CREB-2 | activating transcription factor 4 | 468 | 22q13.1 | 1.52E-03 | 1.39E-02 | 1.18E-04 | 3.73E-07 | 1.92E-05 | *
ARAF | | A-Raf proto-oncogene, serine/threonine kinase | 369 | Xp11.4-p11.2 | 6.47E-04 | 4.38E-02 | 6.04E-04 | 9.64E-07 | 4.02E-05 |
BIRC3 | c-IAP2 | baculoviral IAP repeat containing 3 | 330 | 11q22 | 3.30E-03 | 1.70E-05 | 3.75E-02 | 1.04E-06 | 4.22E-05 | *
NEUROD4 | | neuronal differentiation 4 | 58158 | 12q13.2 | 4.48E-06 | 3.19E-02 | 3.83E-05 | 1.68E-06 | 6.17E-05 | *
TGIF1 | | TGFB-induced factor homeobox 1 | 7050 | 18p11.3 | 1.98E-02 | 1.35E-02 | 1.34E-04 | 1.91E-06 | 6.69E-05 | *
PLD2 | | phospholipase D2 | 5338 | 17p13.1 | 4.89E-01 | 4.78E-01 | 1.02E-05 | 2.01E-06 | 6.89E-05 | *
IRF5 | | interferon regulatory factor 5 | 3663 | 7q32 | 7.32E-08 | 8.79E-02 | 1.56E-05 | 2.32E-06 | 7.62E-05 | *
EZH2 | | enhancer of zeste 2 polycomb repressive complex 2 subunit | 2146 | 7q35-q36 | 6.45E-03 | 5.37E-01 | 2.34E-04 | 3.21E-06 | 1.01E-04 | *
MLF1 | | myeloid leukemia factor 1 | 4291 | 3q25.1 | 7.56E-02 | 3.12E-02 | 6.75E-04 | 3.94E-06 | 1.19E-04 | *
DLX3 | | distal-less homeobox 3 | 1747 | 17q21 | 1.55E-02 | 1.17E-01 | 5.27E-06 | 5.81E-06 | 1.55E-04 | *
VAV1 | VAV | vav 1 guanine nucleotide exchange factor | 7409 | 19p13.2 | 2.16E-05 | 1.98E-01 | 4.12E-03 | 6.02E-06 | 1.56E-04 | *
ZAP70 | | zeta-chain (TCR) associated protein kinase 70kDa | 7535 | 2q12 | 1.91E-04 | 1.46E-01 | 1.20E-03 | 6.53E-06 | 1.60E-04 |
PPP2R5A | | protein phosphatase 2, regulatory subunit B', alpha | 5525 | 1q32.2-q32.3 | 8.49E-03 | 5.04E-01 | 7.88E-05 | 6.43E-06 | 1.60E-04 | *
CREB1 | CREB | cAMP responsive element binding protein 1 | 1385 | 2q34 | 2.07E-01 | 1.07E-01 | 6.35E-03 | 6.35E-06 | 1.60E-04 | *
MLLT3 | | myeloid/lymphoid or mixed-lineage leukemia; translocated to, 3 | 4300 | 9p22 | 4.11E-03 | 5.38E-01 | 7.57E-03 | 6.92E-06 | 1.67E-04 | *
CRX | | cone-rod homeobox | 1406 | 19q13.3 | 1.79E-01 | 5.90E-02 | 3.86E-05 | 7.94E-06 | 1.86E-04 | *
SSX4 | | synovial sarcoma, X breakpoint 4 | 6759 | Xp11.23 | 1.90E-03 | 2.36E-02 | 9.61E-04 | 1.14E-05 | 2.44E-04 | *
ZMYND11 | | zinc finger, MYND-type containing 11 | 10771 | 10p14 | 1.08E-04 | 5.90E-02 | 6.93E-05 | 1.36E-05 | 2.83E-04 | *
PATZ1 | ZNF278 | POZ (BTB) and AT hook containing zinc finger 1 | 23598 | 22q12.2 | 1.49E-07 | 6.79E-03 | 2.07E-02 | 1.50E-05 | 3.05E-04 | *
RPL18A | | ribosomal protein L18a | 6142 | 19p13 | 7.75E-03 | 1.62E-01 | 1.43E-04 | 1.57E-05 | 3.16E-04 | *
PIK3C3 | | phosphatidylinositol 3-kinase, catalytic subunit type 3 | 5289 | 18q12.3 | 2.11E-03 | 9.87E-01 | 1.79E-04 | 2.16E-05 | 3.91E-04 | *
WAS | | Wiskott-Aldrich syndrome | 7454 | Xp11.4-p11.21 | 1.34E-01 | 7.65E-03 | 2.06E-02 | 3.74E-05 | 5.90E-04 | *
HIST1H4I | | histone cluster 1, H4i | 8294 | 6p21.33 | 1.61E-05 | 5.57E-01 | 7.41E-04 | 3.75E-05 | 5.90E-04 | *
HOXC10 | | homeobox C10 | 3226 | 12q13.3 | 2.37E-03 | 1.18E-02 | 9.23E-03 | 5.28E-05 | 7.91E-04 | *
MAP3K7 | MEKK7, TAK1 | mitogen-activated protein kinase kinase kinase 7 | 6885 | 6q15 | 3.15E-04 | 8.27E-01 | 6.01E-04 | 6.61E-05 | 9.28E-04 | *
MAFG | | v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog G | 4097 | 17q25.3 | 3.02E-02 | 1.44E-01 | 9.84E-04 | 6.53E-05 | 9.28E-04 | *
MYD88 | | myeloid differentiation primary response 88 | 4615 | 3p22 | 1.73E-02 | 8.04E-01 | 6.88E-04 | 7.77E-05 | 1.06E-03 | *
BCL2A1 | | BCL2-related protein A1 | 597 | 15q24.3 | 2.44E-01 | 4.47E-01 | 1.70E-02 | 9.46E-05 | 1.22E-03 | *
EGR2 | KROX20 | early growth response 2 | 1959 | 10q21.1 | 1.52E-05 | 7.86E-01 | 2.46E-03 | 1.04E-04 | 1.31E-03 | *
SERPINB5 | | serpin peptidase inhibitor, clade B (ovalbumin), member 5 | 5268 | 18q21.33 | 8.66E-03 | 1.47E-01 | 6.11E-04 | 1.14E-04 | 1.39E-03 | *
TFE3 | | transcription factor binding to IGHM enhancer 3 | 7030 | Xp11.22 | 2.07E-01 | 5.60E-01 | 2.41E-03 | 1.54E-04 | 1.71E-03 | *
NFKBIA | IkappaBalpha, IκBα | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha | 4792 | 14q13 | 1.10E-04 | 3.13E-01 | 1.77E-03 | 1.61E-04 | 1.77E-03 | *
RPS6KA6 | | ribosomal protein S6 kinase, 90kDa, polypeptide 6 | 27330 | Xq21 | 6.83E-03 | 4.16E-01 | 5.74E-04 | 1.89E-04 | 2.04E-03 | *
RXRG | | retinoid X receptor, gamma | 6258 | 1q22-q23 | 1.21E-03 | 7.84E-04 | 6.29E-02 | 2.07E-04 | 2.18E-03 | *
WT1 | | Wilms tumor 1 | 7490 | 11p13 | 8.28E-10 | 9.19E-02 | 3.34E-02 | 2.20E-04 | 2.28E-03 | *
MAP3K14 | NIK | mitogen-activated protein kinase kinase kinase 14 | 9020 | 17q21 | 1.31E-06 | 8.03E-01 | 2.28E-03 | 2.51E-04 | 2.51E-03 |
PIAS2 | | protein inhibitor of activated STAT, 2 | 9063 | 18q21.1 | 2.98E-01 | 6.76E-01 | 6.28E-05 | 2.77E-04 | 2.65E-03 | *
CASP9 | | caspase 9, apoptosis-related cysteine peptidase | 842 | 1p36.21 | 2.34E-01 | 3.99E-01 | 7.91E-04 | 2.92E-04 | 2.72E-03 | *
CLK1 | | CDC-like kinase 1 | 1195 | 2q33 | 1.78E-03 | 2.29E-01 | 2.29E-02 | 6.62E-04 | 5.03E-03 | *
HOXB7 | | homeobox B7 | 3217 | 17q21.3 | 4.13E-03 | 3.18E-01 | 3.06E-02 | 8.08E-04 | 6.03E-03 | *
ZNF35 | | zinc finger protein 35 | 7584 | 3p21.32 | 1.80E-01 | 7.03E-01 | 3.17E-03 | 8.32E-04 | 6.10E-03 | *
FOSL1 | | FOS-like antigen 1 | 8061 | 11q13 | 7.50E-05 | 2.38E-01 | 1.06E-02 | 1.01E-03 | 6.96E-03 | *
JUNB | AP-1 | jun B proto-oncogene | 3726 | 19p13.2 | 1.43E-02 | 8.35E-01 | 7.52E-02 | 1.05E-03 | 7.11E-03 | *
RPS2 | | ribosomal protein S2 | 6187 | 16p13.3 | 1.25E-05 | 7.88E-01 | 1.10E-03 | 1.06E-03 | 7.15E-03 | *
DEK | | DEK proto-oncogene | 7913 | 6p22.3 | 7.91E-05 | 3.60E-01 | 4.74E-02 | 1.32E-03 | 8.47E-03 | *
SH2B1 | | SH2B adaptor protein 1 | 25970 | 16p11.2 | 6.94E-02 | 3.06E-01 | 2.68E-01 | 1.39E-03 | 8.82E-03 | *
SECISBP2 | | SECIS binding protein 2 | 79048 | 9q22.2 | 1.81E-02 | 1.80E-01 | 3.29E-04 | 1.47E-03 | 9.11E-03 | *
TEK | TIE-2 | TEK tyrosine kinase, endothelial | 7010 | 9p21 | 5.61E-01 | 5.25E-01 | 2.01E-02 | 1.50E-03 | 9.13E-03 | *
DAPK2 | DRP-1 | death-associated protein kinase 2 | 23604 | 15q22.31 | 6.79E-02 | 4.87E-01 | 6.25E-03 | 1.55E-03 | 9.29E-03 | *
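
The FDR adjusted P values in Table S1 are derived from the combined P values, but the supplementary text does not state which procedure was used. As a minimal sketch, assuming a Benjamini-Hochberg step-up correction, the adjustment could be computed as follows. The function name and the toy P values are illustrative and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: the table's FDR column uses a BH-style correction;
    the source does not confirm this.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy P values for illustration only (not study data).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(toy_p)
for p, q in zip(toy_p, adjusted):
    print(f"P = {p:.3g}, adjusted = {q:.3g}")
```

Note that the adjustment depends on the total number of tests, so reproducing the table would require the P values for every protein on the array, not only the rows listed.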


Figure S1. Heatmap showing differences in autoantibody levels across four SLE subgroups. Heatmap shows row Z scores for mean autoantibody levels in each SLE subgroup. P values were calculated by one-way ANOVA with FDR correction.
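
The row Z scores mentioned in the Figure S1 legend can be illustrated with a short sketch. It assumes each autoantibody row is standardised using its own mean and sample standard deviation across the four subgroup means; the legend does not state which standard deviation was used, and the antigen names and values below are hypothetical.

```python
from statistics import mean, stdev


def row_z_scores(row):
    """Standardise one heatmap row to mean 0 and unit sample SD."""
    mu = mean(row)
    sd = stdev(row)  # sample SD (n - 1); an assumption, see lead-in
    if sd == 0:
        # A flat row carries no contrast between subgroups.
        return [0.0 for _ in row]
    return [(x - mu) / sd for x in row]


# Hypothetical mean autoantibody levels per subgroup
# (order: SLE1a, SLE1b, SLE2, SLE3).
subgroup_means = {
    "antigen_A": [2.0, 4.0, 6.0, 8.0],
    "antigen_B": [5.0, 5.0, 5.0, 9.0],
}

for name, row in subgroup_means.items():
    print(name, [round(z, 3) for z in row_z_scores(row)])
```

Scaling by row puts antigens with very different absolute signal on a common colour scale, so the heatmap highlights which subgroup is relatively high or low for each antigen.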

Figure S2A. Principal component analysis of autoantibody levels in SLE individuals and controls. (A) Principal component analysis was performed on SLE individuals (n=277) and controls (n=280) for 116 autoantibodies. Principal component (PC) scores 1-3 show clustering of SLE individuals by subgroups SLE1a, SLE1b, SLE2 and SLE3 as defined by hierarchical clustering.


Figure S2B. Principal component analysis of autoantibody levels in SLE individuals and controls. (B) Component loadings for PC1-3 and PC5 for autoantigens, showing clustering of autoantigen groups 1a, 1b, 2 and 3 as defined by hierarchical clustering.


Table S2. List of proteins in 56- and 24-autoantigen biomarker models

Protein | 56-autoantigen model (k=2) | 24-autoantigen model (k=log n)
ANXA1 | 1 | 1
APOBEC3G | 1 | 0
ARAF | 1 | 0
ATF4 | 1 | 1
BIRC3 | 1 | 1
CARHSP1 | 1 | 0
CDC25B | 1 | 1
CLK1 | 1 | 0
CREB1 | 1 | 0
CRX | 1 | 0
CSNK2A1 | 1 | 0
CSNK2A2 | 1 | 0
DEK | 1 | 1
DLX4 | 1 | 1
DUSP14 | 1 | 1
EFS | 1 | 0
FUS | 1 | 1
GTF2B | 1 | 0
GTF2E2 | 1 | 0
HIST1H4I | 1 | 1
HMG20B | 1 | 0
HOXB7 | 1 | 1
HOXC10 | 1 | 0
JUNB | 1 | 1
LIN28A | 1 | 0
MAP3K14 | 1 | 1
MAP3K7 | 1 | 0
MARK4 | 1 | 0
MYOZ2 | 1 | 0
NEUROD4 | 1 | 0
NHLH1 | 1 | 0
PABPC1 | 1 | 1
PLD2 | 1 | 0
PPP2CB | 1 | 0
PRM1 | 1 | 0
PRM2 | 1 | 0
PSME3 | 1 | 1
PYGB | 1 | 0
RPL18A | 1 | 0
RPS2 | 1 | 1
RPS6KA6 | 1 | 0
RQCD1 | 1 | 1
RRAS | 1 | 1
RXRG | 1 | 0
SECISBP2 | 1 | 0
SERPINB5 | 1 | 1
SH2B1 | 1 | 0
SSB/La | 1 | 0
SSX4 | 1 | 1
SUB1 | 1 | 0
TEK | 1 | 0
TROVE2 | 1 | 1
VAV1 | 1 | 1
VAX2 | 1 | 1
ZAP70 | 1 | 1
ZNF35 | 1 | 1
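
The labels k=2 and k=log n in Table S2 match the penalty multipliers of AIC-style and BIC-style model selection, although the supplementary text does not name the selection procedure. As a minimal sketch under that assumption, the following shows why the larger penalty tends to favour a smaller panel. The log-likelihood values are hypothetical, and taking n as all 557 participants (277 SLE plus 280 controls) is also an assumption.

```python
import math


def penalised_criterion(log_likelihood, n_params, k):
    """Generic information criterion: -2 * logL + k * (number of parameters)."""
    return -2.0 * log_likelihood + k * n_params


n = 277 + 280           # assumed sample size for the k = log n penalty
k_aic = 2.0             # AIC-style penalty
k_bic = math.log(n)     # BIC-style penalty, about 6.32 for n = 557

# Hypothetical fits: the larger model fits better but uses more terms.
large = {"log_likelihood": -60.0, "n_params": 56}
small = {"log_likelihood": -100.0, "n_params": 24}

for label, k in (("k=2", k_aic), ("k=log n", k_bic)):
    score_large = penalised_criterion(large["log_likelihood"], large["n_params"], k)
    score_small = penalised_criterion(small["log_likelihood"], small["n_params"], k)
    preferred = "56-term" if score_large < score_small else "24-term"
    print(f"{label}: large = {score_large:.1f}, small = {score_small:.1f}, "
          f"preferred = {preferred}")
```

With these illustrative numbers the k=2 penalty prefers the larger model and the k=log n penalty prefers the smaller one, which mirrors the nesting of the 24-autoantigen panel inside the 56-autoantigen panel.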


Table S3. Receiver operating characteristic curve analysis of biomarker panels comparing SLE vs healthy controls

Measure | 56 Ab panel | 24 Ab panel | ANA+dsDNA | ANA | dsDNA
AUC | 0.967 | 0.939 | 0.905 | 0.893 | 0.731
AUC 95% CI | 0.953 - 0.978 | 0.919 - 0.957 | 0.878 - 0.932 | 0.863 - 0.920 | 0.686 - 0.772
Sensitivity at specificity 92% | 88.1% | 82.3% | 80.9% | 78.0% | 55.1%
Sensitivity at specificity 94% | 85.9% | 80.5% | 79.4% | 76.9% | 53.1%
Sensitivity at specificity 96% | 84.5% | 78.3% | 77.9% | 75.5% | 46.7%
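
The two kinds of measure reported in Table S3 can be sketched as follows: the AUC as the probability that a randomly chosen SLE score exceeds a randomly chosen control score, and sensitivity read off at a threshold chosen to meet a target specificity. This is a generic illustration with made-up scores; the source does not describe how its ROC analysis was implemented, and the function names are hypothetical.

```python
import math


def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_at_specificity(case_scores, control_scores, specificity):
    """Sensitivity at the lowest threshold that achieves the target specificity."""
    ordered = sorted(control_scores)
    # Number of controls that must score at or below the threshold.
    needed = max(1, math.ceil(specificity * len(ordered) - 1e-9))
    threshold = ordered[needed - 1]
    positives = sum(1 for c in case_scores if c > threshold)
    return positives / len(case_scores)


# Made-up panel scores for illustration only (not study data).
controls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cases = [6, 8.5, 9.5, 11, 12]

print("AUC:", auc(cases, controls))
print("Sensitivity at 90% specificity:",
      sensitivity_at_specificity(cases, controls, 0.90))
```

Fixing specificity first, as the table does, allows panels with different score scales to be compared at the same false positive rate.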


Movies S1-S3.

Movie S1. 3D plot of principal component scores for SLE individuals and controls. Three-dimensional plot of principal component scores 1-3 for SLE individuals (n=277) and controls (n=280, grey spheres). SLE individuals are coloured according to subgroup SLE1a (red), SLE1b (purple), SLE2 (green), SLE3 (blue) as determined by hierarchical clustering.

Movie S2. 3D plot of principal component loading scores for SLE autoantigens. Three-dimensional plot of principal component loading scores for SLE autoantigens, along axes PC2, PC3 and PC5. Autoantigens are coloured according to antigen cluster 1a (red), cluster 1b (purple), cluster 2 (green), cluster 3 (blue) as determined by hierarchical clustering.

Movie S3. 3D plot of linear discriminant analysis of autoantibody levels in SLE ethnic groups. Three-dimensional plot of linear discriminant analysis of autoantibody levels in SLE individuals using ethnicity as the discriminant factor, with colours for individuals as follows: controls (grey), Afro-Caribbean SLE (red), Hispanic SLE (green) and European SLE (blue).
