HLA and autoantibodies define scleroderma subtypes and risk in African and European Americans and suggest a role for molecular mimicry

Pravitt Gourha,b,1 , Sarah A. Safrana, Theresa Alexandera, Steven E. Boydenb, Nadia D. Morganc,2, Ami A. Shahc, Maureen D. Mayesd, Ayo Doumateye, Amy R. Bentleye, Daniel Shrinere, Robyn T. Domsicf, Thomas A. Medsger Jr.f, Paula S. Ramosg , Richard M. Silverg, Virginia D. Steenh, John Vargai, Vivien Hsuj, Lesley Ann Saketkook, Elena Schiopul, Dinesh Khannal, Jessica K. Gordonm, Brynn Kronn, Lindsey A. Criswelln, Heather Gladueo, Chris T. Derkp, Elana J. Bernsteinq, S. Louis Bridges Jr.r, Victoria K. Shanmugams, Kathleen D. Kolstadt, Lorinda Chungt,u, Suzanne Kafajav, Reem Janw, Marcin Trojanowskix, Avram Goldbergy, Benjamin D. Kormanz, Peter J. Steinbachaa, Settara C. Chandrasekharappabb, James C. Mullikincc, Adebowale Adeyemoe, Charles Rotimie, Fredrick M. Wigleyc,3, Daniel L. Kastnerb,1,3, Francesco Boinn,3, and Elaine F. Remmersb,3

aNational Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD 20892; bInflammatory Disease Section, National Research Institute, National Institutes of Health, Bethesda, MD 20892; cDivision of Rheumatology, Johns Hopkins University School of Medicine, Baltimore, MD 21224; dDivision of Rheumatology and Clinical Immunogenetics, University of Texas McGovern Medical School, Houston, TX 77030; eCenter for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892; fDivision of Rheumatology & Clinical Immunology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261; gDivision of Rheumatology, Medical University of South Carolina, Charleston, SC 29425; hDivision of Rheumatology, Georgetown University School of Medicine, Washington, DC 20007; iDivision of Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611; jDivision of Rheumatology, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ 08903; kScleroderma Patient Care and Research Center, Tulane University, New Orleans, LA 70112; lDivision of Rheumatology, University of Michigan, Ann Arbor, MI 48109; mDepartment of Rheumatology, Hospital for Special Surgery, New York, NY 10021; nRosalind Russell/Ephraim P. 
Engleman Rheumatology Research Center, University of California, San Francisco, CA 94115; oDepartment of Rheumatology, Arthritis and Osteoporosis Consultants of the Carolinas, Charlotte, NC 28207; pDivision of Rheumatology, University of Pennsylvania, Philadelphia, PA 19104; qDivision of Rheumatology, New York Presbyterian Hospital, Columbia University, New York, NY 10032; rDivision of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL 35233; sDivision of Rheumatology, School of Medicine and Health Sciences, The George Washington University, Washington, DC 20052; tDivision of Immunology and Rheumatology, Stanford University School of Medicine, Stanford, CA 94305; uDepartment of Medicine, Palo Alto VA Health Care System, Palo Alto, CA 94304; vDivision of Rheumatology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095; wDivision of Rheumatology, Pritzker School of Medicine, University of Chicago, Chicago, IL 60637; xDivision of Rheumatology, Boston University Medical Center, Boston, MA 02118; yDivision of Rheumatology, NYU Langone Medical Center, New York, NY 10003; zDivision of Allergy, Immunology and Rheumatology, University of Rochester Medical Center, Rochester, NY 14642; aaCenter for Molecular Modeling, Center for Information Technology, National Institutes of Health, Bethesda, MD 20892; bbGenomics Core, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892; and ccNIH Intramural Sequencing Center, National Human Genome Research Institute, Rockville, MD 20852

Contributed by Daniel L. Kastner, November 11, 2019 (sent for review April 17, 2019; reviewed by Mark J. Daly, Steffen Gay, and Robert D. Inman)

Systemic sclerosis (SSc) is a clinically heterogeneous autoimmune disease characterized by mutually exclusive autoantibodies directed against distinct nuclear antigens. We examined HLA associations in SSc and its autoantibody subsets in a large, newly recruited African American (AA) cohort and among European Americans (EA). In the AA population, the African ancestry-predominant HLA-DRB1*08:04 and HLA-DRB1*11:02 alleles were associated with overall SSc risk, and the HLA-DRB1*08:04 allele was strongly associated with the severe antifibrillarin (AFA) antibody subset of SSc (odds ratio = 7.4). These African ancestry-predominant alleles may help explain the increased frequency and severity of SSc among the AA population. In the EA population, the HLA-DPB1*13:01 and HLA-DRB1*07:01 alleles were more strongly associated with antitopoisomerase (ATA) and anticentromere antibody-positive subsets of SSc, respectively, than with overall SSc risk, emphasizing the importance of HLA in defining autoantibody subtypes. The association of the HLA-DPB1*13:01 allele with the ATA+ subset of SSc in both AA and EA patients demonstrated a transancestry effect. A direct correlation between SSc prevalence and HLA-DPB1*13:01 allele frequency in multiple populations was observed (r = 0.98, P = 3 × 10^−6). Conditional analysis in the autoantibody subsets of SSc revealed several associated amino acid residues, mostly in the peptide-binding groove of the class II HLA molecules. Using HLA α/β allelic heterodimers, we bioinformatically predicted immunodominant peptides of topoisomerase 1, fibrillarin, and centromere protein A and discovered that they are homologous to viral protein sequences from the Mimiviridae and Phycodnaviridae families. Taken together, these data suggest a possible link between HLA alleles, autoantibodies, and environmental triggers in the pathogenesis of SSc.

scleroderma | HLA | autoantibodies | molecular mimicry

Author contributions: P.G., D.L.K., and E.F.R. designed research; P.G., S.A.S., T.A., S.E.B., D.S., P.J.S., S.C.C., J.C.M., and E.F.R. performed research; P.G., T.A., N.D.M., A.A.S., M.D.M., A.D., A.R.B., R.T.D., T.A.M., P.S.R., R.M.S., V.D.S., J.V., V.H., L.A.S., E.S., D.K., J.K.G., B.K., L.A.C., H.G., C.T.D., E.J.B., S.L.B., V.K.S., K.D.K., L.C., S.K., R.J., M.T., A.G., B.D.K., A.A., C.R., F.M.W., and F.B. contributed new reagents/analytic tools; P.G., S.A.S., T.A., S.E.B., and E.F.R. analyzed data; and P.G., S.A.S., A.A.S., M.D.M., A.D., A.R.B., D.S., R.T.D., P.S.R., R.M.S., V.D.S., J.V., V.H., L.A.S., E.S., D.K., J.K.G., B.K., L.A.C., H.G., C.T.D., E.J.B., S.L.B., V.K.S., K.D.K., L.C., S.K., R.J., M.T., A.G., B.D.K., P.J.S., S.C.C., J.C.M., A.A., C.R., F.M.W., D.L.K., F.B., and E.F.R. wrote the paper.

Reviewers: M.J.D., Massachusetts General Hospital Center for Human Genetic Research; S.G., Center of Experimental Rheumatology; and R.D.I., University of Toronto.

Competing interest statement: M.D.M. received consulting fees from Mitsubishi Tanabe, Astellas, Boehringer Ingelheim, Gerson Lehrman Group, Smart Analyst, and Guidepoint Global. R.T.D. received consulting fees from Eicos Sciences. D.K. received consulting fees from Actelion, Bristol-Myers Squibb, CSL Behring, Inventiva, EMD Merck-Serono, Sanofi-Aventis, GlaxoSmithKline, Corbus, Cytori, UCB, Bayer, Boehringer Ingelheim, and Genentech-Roche, and research support from Bayer, Bristol-Myers Squibb, and Pfizer, and owns stock or stock options in Eicos Sciences, Inc. (CiViBioPharma, Inc.). L.A.S. received consulting fees from Axon Pharma. H.G. received consulting fees from Pfizer, AbbVie, Actelion, and Horizon Pharmaceuticals. C.T.D. received research support from Gilead, Actelion, and Cytori. V.K.S. received research support from AbbVie. E.J.B. received consulting fees from Genentech. L.C. received consulting fees or served on the board of Reata, Bristol-Myers Squibb, Boehringer-Ingelheim, Eicos, and Mitsubishi Tanabe. S.L.B. and M.J.D. are coauthors on a 2015 research article.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

1 To whom correspondence may be addressed. Email: [email protected] or pravitt.[email protected]

2 Deceased December 15, 2018.

3 F.M.W., D.L.K., F.B., and E.F.R. contributed equally to this work.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1906593116/-/DCSupplemental.

First published December 23, 2019.

552–562 | PNAS | January 7, 2020 | vol. 117 | no. 1 www.pnas.org/cgi/doi/10.1073/pnas.1906593116

Significance

HLA alleles have previously been implicated with scleroderma risk, but, in this study, using a newly recruited large cohort of African Americans, and a European American cohort, we comprehensively define the ancestral HLA alleles and amino acid residues associated with scleroderma. Scleroderma is characterized by mutually exclusive and specific autoantibodies. We demonstrated that ancestry-predominant HLA alleles were much more strongly associated with autoantibody subsets of scleroderma than with the overall risk of SSc. We bioinformatically predicted immunodominant peptides of self-antigens and demonstrated homology of these peptides with viral protein sequences from the Mimiviridae and Phycodnaviridae families. Our findings suggest the hypothesis that scleroderma-specific autoantibodies may arise through molecular mimicry, driven by the interaction of specific viral antigens with corresponding HLA α/β heterodimers.

Systemic sclerosis (scleroderma, SSc) is a systemic autoimmune disease that is clinically heterogeneous and is characterized by progressive thickening of the skin and internal organs, leading to morbidity and mortality. A hallmark of SSc is the presence of circulating antinuclear antibodies (ANA), which are observed in 90 to 95% of patients (1). Anticentromere antibody (ACA), antitopoisomerase I antibody (ATA), anti-U3-ribonucleoprotein antibody (fibrillarin, AFA), and anti-RNA polymerase III antibody (ARA) are the common autoantibodies reported in SSc (2, 3). These autoantibodies are mutually exclusive and specific for SSc, are associated with distinct patterns of skin and internal organ involvement, and are markers of prognosis and survival (4–6). Compared with European Americans (EA), African Americans (AA) have a higher prevalence of ATA and AFA and also tend to have a more severe phenotype comprising greater diffuse skin involvement and interstitial lung disease, leading to increased mortality (5, 7–10).

Genetic factors, along with environmental factors, contribute to the risk of SSc, with HLA genes reported to have the strongest influence on SSc susceptibility, and these alleles have an even stronger effect within the SSc-specific autoantibody subsets (11–24).

HLA alleles encode variations in the antigen-binding grooves of the HLA molecules that determine their binding affinity for specific antigens presented to T helper cells (25). Aberrant self-peptide or foreign peptide presentation via the class II HLA molecules on the antigen-presenting cells (APCs) leads to activation of autoreactive T helper cells that play a crucial role in autoimmunity induction, activation of B cells, and autoantibody formation. Thus, HLA alleles coding for a specific antigen-binding groove sequence on the APCs recognize a specific self-peptide, causing activation of T helper cells and production of autoantibodies. HLA-mediated presentation of foreign antigens can activate autoantigen-specific T cells either by presenting peptides derived from self-proteins or by presenting peptides that are homologous to self-antigens but derived from microbial proteins. This mechanism, whereby the amino acid sequence of a microbial peptide is homologous to the peptide sequence from a self-protein, causing activation of T helper cells and leading to B cell activation and thus autoantibody production against the self-protein, is called molecular mimicry (26). Molecular mimicry has been proposed in the pathogenesis of several autoimmune diseases, including multiple sclerosis, type 1 diabetes mellitus, Graves' disease, spondyloarthropathies, systemic lupus erythematosus (SLE), and SSc (27–33). For instance, Epstein–Barr virus, a ubiquitous human DNA virus, exhibits molecular mimicry with common SLE self-antigens (30). Presence of viruses in SSc tissues and viral infections acting as a trigger for autoimmunity have been proposed as environmental risk factors in the pathogenesis of SSc (33–35).

Given the potential importance of the HLA region and the strong genetic risk it confers in SSc, we investigated the relationships between genetic HLA associations, autoantibodies, and autoantigens, and identified candidate foreign peptides with homology to autoantigens, which might promote SSc pathogenesis through molecular mimicry. AA patients with SSc were obtained from the Genome Research in African American Scleroderma Patients (GRASP) cohort that was created to enroll a large number of AA patients for conducting systematic and comprehensive genetic studies (8, 36). Genotypes from cohorts of AA and EA ancestries were used to impute classical HLA alleles of 3 class I and 5 class II genes, which were evaluated for association with SSc and the common SSc-specific autoantibodies. In the AA SSc cohort, we identified 2 African ancestry-predominant alleles that could explain, at least in part, the increased SSc frequency and severity observed in AAs. Upon analyzing the autoantibody subsets of SSc, an even stronger and specific HLA allele association was observed in both the AA and EA cohorts. Pairs of alpha and beta chains of the HLA molecule (HLA α/β heterodimers) were used to bioinformatically predict immunodominant peptides derived from the specific self-antigens that bind these HLA molecules. Remarkably, these immunodominant peptides demonstrated significant sequence homology with peptides derived from viruses in the Mimiviridae and Phycodnaviridae families, suggesting potential molecular mimicry.

Results

SSc Patients and Population Stratification. To identify HLA alleles associated with SSc, we examined 662 AA SSc patients and 946 AA controls enrolled in the GRASP cohort along with 723 EA SSc patients and 5,347 EA controls obtained from the database of Genotypes and Phenotypes (dbGaP) (SI Appendix, Table S1). The gender and autoantibody information for the SSc patients is shown in SI Appendix, Table S2. Principal component (PC) analyses (PCAs) were performed, and the top 10 PCs were used as covariates in the association analyses to correct for population stratification individually for both the AA and EA populations (SI Appendix, Fig. S1).

Classical HLA Allele Imputation. To assess imputation accuracy in the AA cohort, we compared the HLA*IMP:03 allele concordance with exome sequence-based HLA types from 763 GRASP individuals, both SSc and controls, using the HLA*PRG:LA software (37). The concordance rates for HLA class I and class II alleles were 96% or higher (SI Appendix, Table S3). The allele frequencies for the HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1 genes in both AA and EA SSc cases and controls are presented in SI Appendix, Tables S4 and S5.

Classical HLA Allele Associations with SSc. In 662 AA SSc cases and 946 controls, 2 independent class II HLA alleles present at minor allele frequency of >1% were significantly associated with SSc, after correcting for population stratification (Table 1 and SI Appendix, Fig. S2). In AA SSc, HLA-DRB1*08:04, a predominantly African ancestral allele, was the most strongly SSc-associated, with an odds ratio (OR) of 3.2. A conditional regression analysis accounting for the effect of the HLA-DRB1*08:04 allele identified a second African-predominant allele, HLA-DRB1*11:02, which was independently associated with SSc, with an OR of 2.3 (Table 1).

In 723 EA SSc cases and 5,437 controls, 3 independent HLA alleles, HLA-DQB1*02:02, HLA-DPB1*13:01,

MEDICAL SCIENCES

Table 1. Logistic regression and conditional analysis of HLA classical alleles in AA SSc

HLA allele | Frequency (%)† (SSc/Ctrls) | Unconditioned OR (95% CI) | Unconditioned P value | Conditioned OR (95% CI) | Conditioned P value

All SSc vs. controls (SSc = 662; control = 946)
HLA-DRB1*08:04 | 24.3/9.3 | 3.2 (2.4-4.2) | 3.26 × 10^−16 | 3.2 (2.4-4.2) | 3.26 × 10^−16‡
HLA-DQB1*03:19 | 18.4/8.8 | 2.4 (1.8-3.2) | 2.45 × 10^−8 | |
HLA-DQB1*03:01 | 37/25.8 | 1.8 (1.4-2.2) | 1.41 × 10^−6 | |
HLA-DRB1*07:01 | 11.5/20.0 | 0.6 (0.4-0.7) | 2.72 × 10^−6 | |
HLA-DQA1*02:01 | 11.5/20.0 | 0.6 (0.4-0.7) | 3.18 × 10^−6 | |
HLA-DRB1*11:02 | 13.6/7.1 | 2.2 (1.6-3.0) | 9.39 × 10^−6 | 2.3 (1.6-3.2) | 1.84 × 10^−6
HLA-DPA1*02:01 | 62.1/51.5 | 1.6 (1.3-1.9) | 3.20 × 10^−5 | |
HLA-DPB1*13:01 | 16.9/9.7 | 1.9 (1.4-2.6) | 3.21 × 10^−5 | |

AFA+ SSc vs. controls (SSc = 129; control = 946)
HLA-DRB1*08:04 | 42.6/9.3 | 7.4 (4.9-11.3) | 2.61 × 10^−19 | 7.4 (4.9-11.3) | 2.61 × 10^−19‡
HLA-DQB1*06:09 | 20.9/6.6 | 3.8 (2.3-6.3) | 1.37 × 10^−6 | 4.1 (2.4-7.0) | 2.04 × 10^−6
HLA-DQB1*03:01 | 45.0/25.8 | 2.5 (1.7-3.6) | 9.16 × 10^−6 | |
HLA-DQB1*03:19 | 22.5/8.8 | 3.0 (1.9-4.9) | 2.53 × 10^−5 | |
HLA-DRB1*13:02 | 29.5/13.8 | 2.6 (1.7-4.0) | 3.70 × 10^−5 | |

ATA+ SSc vs. controls (SSc = 183; control = 946)
HLA-DPB1*13:01 | 30.6/9.7 | 4.3 (2.9-6.3) | 2.35 × 10^−12 | 4.3 (2.9-6.3) | 2.35 × 10^−12‡
HLA-DQB1*02:01 | 7.7/25.1 | 0.3 (0.2-0.5) | 1.10 × 10^−8 | 0.3 (0.2-0.5) | 2.17 × 10^−7
HLA-DRB1*03:01 | 3.3/14.9 | 0.2 (0.1-0.5) | 4.92 × 10^−7 | |
HLA-DQB1*03:01 | 43.7/25.8 | 2.3 (1.7-3.1) | 2.64 × 10^−6 | |
HLA-DPA1*02:01 | 69.9/51.5 | 2.3 (1.6-3.2) | 2.73 × 10^−6 | |
HLA-DRB1*08:04 | 21.3/9.3 | 2.8 (1.8-4.2) | 8.13 × 10^−6 | |
HLA-DQA1*05:01 | 51.4/39.1 | 1.6 (1.2-2.2) | 2.90 × 10^−3 | 2.1 (1.5-3.0) | 2.21 × 10^−5

ARA+ SSc vs. controls (SSc = 119; control = 946)
None significant

ACA+ SSc vs. controls (SSc = 64; control = 946)
None significant

Independent associations by conditional regression analyses are shown in bold. †Frequency of individuals with 1 or 2 alleles. ‡Unconditioned; common AA haplotype: HLA-DRB1*08:04/DQA1*05:01/DQB1*03:01.
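As a rough cross-check on how the carrier frequencies in Table 1 relate to the reported odds ratios, the following minimal sketch computes a crude (unadjusted) odds ratio with a Woolf confidence interval for HLA-DRB1*08:04 in the AFA+ subset. It is not the authors' analysis: the published ORs come from logistic regression with the top 10 PCs as covariates, and the carrier counts here are assumptions back-calculated from the rounded percentages (42.6% of 129 cases, 9.3% of 946 controls). The function name `carrier_odds_ratio` is hypothetical.

```python
import math


def carrier_odds_ratio(case_carriers, case_total, ctrl_carriers, ctrl_total, z=1.96):
    """Return the crude odds ratio and a Woolf (log-scale) confidence interval."""
    a = case_carriers                 # cases carrying the allele
    b = case_total - case_carriers    # cases not carrying it
    c = ctrl_carriers                 # controls carrying the allele
    d = ctrl_total - ctrl_carriers    # controls not carrying it
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Assumed counts: back-calculated from the rounded Table 1 percentages
# (42.6% of 129 AFA+ cases, 9.3% of 946 controls), not taken from raw data.
afa_carriers = round(0.426 * 129)
ctrl_carriers = round(0.093 * 946)

odds_ratio, lower, upper = carrier_odds_ratio(afa_carriers, 129, ctrl_carriers, 946)
print(f"crude OR = {odds_ratio:.1f} (95% CI {lower:.1f}-{upper:.1f})")
```

Under these assumed counts the crude estimate comes out near 7.2, close to the covariate-adjusted 7.4 (4.9-11.3) in Table 1; a small gap is expected because the sketch ignores the population-stratification covariates.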

and HLA-DRB1*11:04, were significantly associated with SSc (Table 2 and SI Appendix, Fig. S2). The most significantly associated allele, HLA-DQB1*02:02, was disease-protective, with an OR of 0.5, whereas HLA-DPB1*13:01 and HLA-DRB1*11:04 were disease risk alleles, with ORs of 2.6 and 2, respectively (Table 2). Conditioning on the final HLA model for each ancestral population, neither the classical HLA alleles nor the HLA region single nucleotide variants remained significant at the statistical threshold (SI Appendix, Figs. S3 and S4). None of the class I HLA alleles showed any statistically significant independent association with SSc in either the AA or EA cohort.

Classical HLA Allele Associations within SSc Autoantibody-Positive Subsets. SSc has highly specific and mutually exclusive autoantibodies, and thus we hypothesized that the HLA allelic associations would be stronger within the SSc-specific autoantibody subsets. We tested this hypothesis by stratifying the SSc samples into subsets of SSc-specific autoantibody-positive patients and evaluating association of HLA alleles. In 129 AFA+ AA SSc patients, the OR for HLA-DRB1*08:04 increased from 3.2 (P = 3.26 × 10^−16) in overall SSc to 7.4 (P = 2.61 × 10^−19) in the AFA+ subset (Table 1). Although not detected in overall SSc, HLA-DQB1*06:09 was independently associated in the AFA+ SSc subset. In 183 ATA+ AA SSc, HLA-DPB1*13:01, HLA-DQB1*02:01, and HLA-DQA1*05:01 were independently associated with ATA+ SSc (Table 1). The association of HLA-DPB1*13:01 with ATA+ SSc was particularly strong (OR = 4.3) as compared to overall SSc (OR = 1.9). None of the HLA classical alleles were statistically significantly associated with SSc in 119 ARA+ AA SSc patients, nor in 64 ACA+ AA SSc.

AFA data were not reported for the EA SSc patients in dbGaP, so we were unable to evaluate the AFA+ subset in the EA SSc patients. In 115 ATA+ EA SSc patients, HLA-DPB1*13:01 and HLA-DRB1*11:04 were independently associated in the EA ATA+ SSc patients (Table 2). Also, as seen in AAs, the association of HLA-DPB1*13:01 in ATA+ SSc subset was much stronger (OR = 13.7, P = 1.47 × 10^−24) than its association with overall SSc in EAs (OR = 2.6, P = 1.75 × 10^−8; Table 2). Interestingly, as in the ARA+ SSc subset in AAs, none of the classical HLA alleles were statistically significantly associated with the ARA+ SSc subset in EAs. In 238 ACA+ EA SSc, HLA-DRB1*07:01, associated with SSc protection, was the most statistically significantly associated allele, with OR = 0.1 and P = 4.79 × 10^−20 (Table 2 and SI Appendix, Fig. S2). HLA-DRB1*07:01, part of the HLA-DRB1*07:01/DQA1*02:01/DQB1*02:02 haplotype, is in strong linkage disequilibrium (LD) with HLA-DQB1*02:02 (r^2 = 0.95), and likely explains the association of HLA-DQB1*02:02 with overall SSc.

HLA-DPB1*13:01 and SSc. HLA-DPB1*13:01 was identified as a strong risk allele in both the AA and EA ATA+ SSc; therefore, we examined whether HLA-DPB1*13:01 was enriched in the ATA+ SSc subset exclusively. In AAs, HLA-DPB1*13:01 was present in 11.7% of ATA− SSc and 9.7% of controls, which was

not statistically different, whereas 30.6% of ATA+ SSc carried it. Similarly in EAs, HLA-DPB1*13:01 was present in 32.3% of ATA+ SSc and only 3.8% of ATA− SSc and 3.3% of controls. HLA-DPB1*13:01 was not only enriched in the ATA+ SSc subset but also had a higher frequency in the AA control population as compared to the EA control population. Given the fact that AAs have a higher incidence and prevalence of SSc, we next explored SSc prevalence and HLA-DPB1*13:01 allele frequency in several populations around the world. We observed a direct correlation between SSc prevalence in any given population and the HLA-DPB1*13:01 frequency, with a correlation coefficient of 0.98 and P = 1.8 × 10^−6 (Fig. 1 and SI Appendix, Table S6). Even after removing the Choctaw population with the highest prevalence of SSc, the correlation coefficient remained 0.81 (SI Appendix, Fig. S5).

Fig. 1. Population frequency of HLA-DPB1*13:01 allele and SSc prevalence.

Table 2. Logistic regression and conditional analysis of HLA classical alleles in EA SSc

HLA allele | Frequency (%)† (SSc/Ctrls) | Unconditioned OR (95% CI) | Unconditioned P value | Conditioned OR (95% CI) | Conditioned P value

All SSc vs. controls (SSc = 723; control = 5,437)
HLA-DPB1*13:01 | 8.3/3.3 | 2.6 (1.9-3.5) | 1.75 × 10^−8 | |

AFA+ SSc vs. controls (SSc = 0; control = 5,437)
Not tested

ATA+ SSc vs. controls (SSc = 115; control = 5,437)
HLA-DPB1*13:01 | 32.2/3.3 | 13.7 (8.9-21.0) | 1.47 × 10^−24 | 13.7 (8.9-21.0) | 1.47 × 10^−24‡

ARA+ SSc vs. controls (SSc = 123; control = 5,437)
None significant

ACA+ SSc vs. controls (SSc = 238; control = 5,437)
HLA-DRB1*07:01 | 3.4/23.7 | 0.1 (0.05-0.2) | 4.79 × 10^−20 | 0.1 (0.05-0.2) | 4.79 × 10^−20‡

[Only the rows that can be matched unambiguously to an allele are shown; the remaining rows of Table 2 are not recoverable from the extracted text.]

Independent associations by conditional regression analyses are shown in bold. †Frequency of individuals with 1 or 2 alleles. ‡Unconditioned; common EA haplotypes: HLA-DRB1*11:04/DQA1*05:01/DQB1*03:01 and HLA-DRB1*07:01/DQA1*02:01/DQB1*02:02.

Classical HLA Allele Associations within SSc Autoantibody-Negative Subsets. On observing the enrichment of HLA-DPB1*13:01 in the ATA+ SSc subset, we systematically examined the autoantibody-negative subsets for association with the subset-specific independent HLA alleles. HLA-DRB1*08:04 was statistically significantly associated in both the AFA+ and AFA− subsets of the AA SSc. This was consistent with the strong independent association of the HLA-DRB1*08:04 allele in overall AA SSc patients. The other HLA associations identified in the autoantibody-positive SSc subsets were not observed in the autoantibody-negative SSc subsets, highlighting the specificity of these associations with these SSc-specific autoantibodies (SI Appendix, Tables S7 and S8).

Amino Acid Residue Associations with SSc Autoantibody Subsets. We performed amino acid association analysis for each of the class II HLA genes in the AA and EA SSc autoantibody subsets. In the AFA+ AA SSc subset, HLA-DRB1 amino acid (aa) positions 74 and 189, which are in tight LD, showed the strongest association, followed by aa position 71 (SI Appendix, Fig. S6A). In HLA-DQB1, aa positions 45 and 86 were independently associated with SSc risk in the AFA+ subset (SI Appendix, Fig. S6A). In the ATA+ AA SSc subset,

HLA-DPB1 aa position 76, HLA-DQB1 aa positions 45 and 57, and HLA-DQA1 aa position 34 were independently contributing toward SSc risk (SI Appendix, Fig. S6A). Interestingly, aa position 45 on HLA-DQB1 was important for both the AFA+ and ATA+ SSc subsets in the AA population (SI Appendix, Fig. S6A). In the ACA+ EA SSc subset, HLA-DRB1 aa positions 60, 16, 13, and 180, HLA-DQB1 aa positions 135 and 74, and HLA-DQA1 aa position 47 were independently associated with SSc (SI Appendix, Fig. S6B). In the ATA+ EA SSc subset, HLA-DPB1 aa position 76 was strongly associated with SSc risk, as in the AA population. HLA-DRB1 aa positions 58 and 67 were also independently contributing toward SSc risk in the ATA+ subset in EAs (SI Appendix, Fig. S6B). We highlighted these independently associated aa residues in 3-dimensional (3D) ribbon models of HLA-DRβ, HLA-DQα, HLA-DQβ, and HLA-DPβ with a direct view of the peptide-binding groove. All of the above-mentioned SSc-associated amino acids were part of the peptide-binding groove of class II HLA molecules, except for HLA-DRβ aa positions 180 and 189 and HLA-DQβ aa position 135 (Fig. 2 and SI Appendix, Fig. S7).

Classification and Regression Tree. We used an established exploratory method (Classification and Regression Tree [CART]) as an alternative approach to identify interactions among the classical HLA alleles in the autoantibody subsets in the AA and EA populations. The alleles partitioning out higher in the decision tree suggest greater importance than the ones lower in the tree. In the ATA+ AA SSc subset, 30.6% carried HLA-DPB1*13:01, and, furthermore, 2 higher-order HLA allelic interactions were identified. HLA-DPB1*13:01−/HLA-DQB1*02:01+ was seen in 6% of patients, and HLA-DPB1*13:01−/HLA-DQB1*02:01−/HLA-DQA1*05:01+ was seen in 33.3% of patients (Fig. 3A and SI Appendix, Fig. S8A). In the AFA+ AA SSc subset, 42.6% carried HLA-DRB1*08:04; 14% carried HLA-DRB1*08:04−/HLA-DQB1*06:09+, and 5.4% carried HLA-DRB1*08:04−/HLA-DQB1*06:09−/HLA-DRB1*13:04+ (Fig. 3A and SI Appendix, Fig. S8B). Taken together, the susceptibility HLA alleles account for 64% of ATA+ SSc patients and 62% of AFA+ SSc patients in the AA population. Similar analysis performed in the EA ATA+ SSc subset identified HLA-DPB1*13:01+ (32.2%), HLA-DPB1*13:01−/HLA-DRB1*11:04+ (20.9%), and HLA-DPB1*13:01−/HLA-DRB1*11:04−/HLA-DQA1*03:01+ (4.3%), accounting for 53% of patients with risk alleles (Fig. 3B and SI Appendix, Fig. S9A). Fifty-four percent of EA ACA+ SSc patients were accounted for by HLA-DRB1*07:01−/HLA-DQB1*05:01+ (42%) and HLA-DRB1*07:01−/HLA-DQB1*05:01−/HLA-DQA1*04:01+ (11.7%) (Fig. 3B and SI Appendix, Fig. S9B). The ORs from the CART analysis were comparable to the multivariate logistic regression analyses shown in Tables 1 and 2.

HLA Molecule α and β Chain-Pair Associations with SSc Autoantibody Subsets. The HLA alleles encode class II HLA molecules that are composed of an alpha and a beta chain, and the resulting 3D structure defines the nature of the peptides that are effectively bound. We performed association analysis of the HLA haplotypes for HLA-DQA1/DQB1, HLA-DPA1/DPB1, and HLA-DRA1/DRB1 pairs within SSc autoantibody subsets to identify HLA α/β heterodimers. In the AAs, conditional regression analysis identified 2 HLA α/β heterodimers independently associated with the AFA+ SSc subset and 3 HLA α/β heterodimers associated with the ATA+ SSc subset (Table 3). In the EAs, conditional regression analysis identified 2 HLA α/β heterodimers associated with the ATA+ SSc subset and 2 α/β heterodimers associated with the ACA+ SSc subset (Table 3). HLA-DRA1*01:01/DRB1*07:01 was protective for the ACA+ SSc subset in the EAs, and HLA-DQA1*05:01/DQB1*02:01 was protective for the ATA+ SSc subset in the AAs. The other 7 independently associated HLA α/β heterodimers were all associated with increased SSc risk.

[Figure 2 graphic: panels A and B show ribbon models of the HLA-DR, HLA-DQ, and HLA-DP α and β chains with the associated amino acid positions labeled, alongside −log10 P plots by amino acid position.]

Fig. 2. Ribbon model of the HLA-DR, HLA-DQ, and HLA-DP proteins with independently associated amino acid residues, based on PDB ID codes 6atf, 1s9v, and 3lqz, respectively. (A) Scleroderma-associated aa positions in AAs; (B) Scleroderma-associated aa positions in EAs.

[Figure 3 graphic: pie charts for AA ATA+ SSc (total = 183), AA AFA+ SSc (total = 129), AA controls (total = 946), EA ATA+ SSc (total = 115), EA ACA+ SSc (total = 238), and EA controls (total = 5,437).]

Fig. 3. Pie charts of independently associated classical HLA alleles. Data from ATA+, AFA+, and ACA+ subsets in (A) AAs and (B) EAs.

Table 3. Logistic regression and conditional analysis of HLA α/β heterodimers in SSc autoantibody subsets

SSc case group (n) | HLA α/β heterodimers | OR (95% CI) | P value

AA AFA+ SSc vs. controls (SSc = 129; control = 946)
DRA1*01:01/DRB1*08:04 | 7.4 (4.9-11.3) | 2.6 × 10^−19

AA ATA+ SSc vs. controls (SSc = 183; control = 946)

EA ATA+ SSc vs. controls (SSc = 115; control = 5437)
DPA1*02:01/DPB1*13:01 | 15.7 (10.1-24.2) | 1.2 × 10^−25

EA ACA+ SSc vs. controls (SSc = 239; control = 5,437)
DRA1*01:01/DRB1*07:01 | 0.1 (0.05-0.2) | 4.8 × 10^−20

[Only the rows that can be matched unambiguously to a heterodimer are shown; the remaining rows of Table 3 are not recoverable from the extracted text.]

†Significance upon conditioning on top associated HLA α/β heterodimer(s).

Immunodominant Peptide Prediction. We explored the possibility of whether nuclear self-antigens act as the source of peptides that are being recognized and presented by these HLA molecules. We utilized NetMHCIIpan 3.2 to identify class II HLA restricted peptides using the associated HLA α/β heterodimers and the correlated SSc self-antigens (topoisomerase I, fibrillarin, and centromere protein A or B [CENPA/CENPB]) (38). Peptide sequences that had a binding affinity of <500 nM to 2 associated HLA α/β heterodimers for the respective autoantibody subsets were selected. We bioinformatically identified immunodominant peptides on 3 regions of topoisomerase I that bind multiple HLA α/β heterodimers in ATA+ SSc in the 2 ancestral populations (SI Appendix, Table S9). A similar search for immunodominant peptides in the AFA+ subset identified 5 regions on fibrillarin and, in the ACA+ subset, yielded 1 region on CENPA and 7 regions on CENPB that were bioinformatically predicted to bind multiple HLA α/β heterodimers (SI Appendix, Tables S10, S11A, and S11B).

Molecular Mimicry. Next, we explored homology between the bioinformatically predicted immunodominant peptide sequences and microbial protein sequences to assess whether the autoantibodies observed in SSc may be induced by molecular mimicry. The bioinformatically predicted immunodominant peptide sequences from topoisomerase I (SI Appendix, Table S9) were compared for homology with microbial protein sequence databases. Several hundreds of homologous sequences were identified in fungi (E value < 0.05) due to extensive similarities between human and fungi topoisomerase I proteins. There was no homology observed with bacterial sequences even at an E value of <1. Remarkably, only one sequence in topoisomerase I, "RQRAVALYFIDKLAL," had high-quality matches in the viral database at an E value of <0.05. These homologous peptides were from viruses in the Mimiviridae family, part of the nucleocytoplasmic large DNA virus (NCLDV) clade, and Hokovirus had an extremely significant homology, with an E value of 3.0 × 10^−6 (Fig. 4 and SI Appendix, Table S12). Given these findings, we examined the bioinformatically predicted immunodominant peptides from fibrillarin, CENPA, and CENPB for homology within the viral protein sequence database. On comparing several dozens of bioinformatically predicted immunodominant peptides, only one peptide sequence in fibrillarin, and another one in CENPA, had high-quality matches (E value < 0.05) in the viral database (SI Appendix, Tables S10 and S11A). Fibrillarin sequence "GRDLINLAKKRTNII" and CENPA sequence "LQEAAEAFLVHLFED" were homologous to protein sequences from NCLDV in the Mimiviridae and Phycodnaviridae families with E values of 0.004 and 0.01, respectively (Fig. 4 and SI Appendix, Table S12). No high-quality matches were found for any of the CENPB sequences (SI Appendix, Table S11B).

These highly significant E values suggest that the homology was unlikely to occur by chance. To test this hypothesis even
E of and homology Hokovirus part clade, significant family, extremely (NCLDV) Mimiviridae an virus the DNA in large viruses nucleocytoplasmic of from value were E the tides an in at matches database high-quality viral had an “RQRAVALYFIDKLAL,” at even I, sequences bacterial of There value with proteins. E observed I homology topoisomerase no fungi was and human value between (E ties fungi in identified B * ∗ * * 13:01 13:01 08:04 07:01 11:04 06:09 03:19 02:01 05:01 hs ihysgicn aussgetta h homology the that suggest values E significant highly These 4.3% HLA < AAA SSc ACA+ EA AAA SSc ATA+ EA alsS10 Tables Appendix, (SI database viral the in 0.05) Total=238 Total=115 11.7% .Rmral,ol n eunei topoisomerase in sequence one only Remarkably, <1. α/β + 3.4% R(5 CI) (95% OR . (3.2-7.1) 4.8 . 491.)2.6 (4.9-11.3) 7.4 . (2.6-7.9) 4.6 57(012.)1.2 (10.1-24.2) 15.7 . 00-.)4.8 2.9 (0.05-0.2) 0.1 (3.9-10.4) 6.4 . (2.0-5.5) 3.3 (0.1-0.5) 0.2 . (1.5-2.6) 2.0 AFA , eeoiesi SSc in heterodimers + 42.0% n ACA and , 32.2% 20.9% .N high-quality No ). S12 Table Appendix, SI PNAS 53.1% 61.7% .Gvnthese Given S12). Table Appendix, SI | + aur ,2020 7, January EAs. (B) and AAs (A) in subsets < AControls EA .5 u oetniesimilari- extensive to due 0.05) .5 hs oooospep- homologous These <0.05. Total=5437 AControls EA Total=5437 P 8.4 4.0 1.8 1.6 5.3 value 3.3% × × × × × × × × × 10 10 10 10 10 10 10 10 10 4.5% − − − − − − − − − 23.6% | 19 7† 25 20 11† 14 6† 5† 6† o.117 vol. 4.1% 19.2% 30.5% × | o 1 no. 10 − 6 | with 557

(A) ATA+ SSc subset: predicted immunodominant peptide
Ancestry  Position  Sequence         HLA α/β heterodimer      Peptide affinity (nM)
EA        473       RQRAVALYFIDKLAL  DPA1*02:01/DPB1*13:01    264.2
EA        473       RQRAVALYFIDKLAL  DRA1*01:01/DRB1*11:04    340.3
AA        473       RQRAVALYFIDKLAL  DPA1*02:01/DPB1*13:01    264.2
AA        473       RQRAVALYFIDKLAL  DQA1*05:01/DQB1*02:01    446.4

(B) Topoisomerase I
Source                  % Homology  Sequence
Immunodominant epitope  100         RQRAVALYFIDKLAL
Hokovirus HKV1          93          RQIAVALYFIDKLAL
Megavirus chiliensis    80          RQIATALYFIDKFAL
Megavirus vitis         80          RQIATALYFIDKFAL
Powai lake megavirus    80          RQIATALYFIDKFAL
Klosneuvirus KNV1       80          RQIATALYFIDKFAL
Catovirus CTV1          80          RQIATALYFIDKFAL

(D) AFA+ SSc subset: predicted immunodominant peptide
Ancestry  Position  Sequence         HLA α/β heterodimer      Peptide affinity (nM)
AA        197       GRDLINLAKKRTNII  DRA1*01:01/DRB1*08:04    81.2
AA        197       GRDLINLAKKRTNII  DRA1*01:01/DRB1*13:02@   304.3

(E) Fibrillarin
Source                               % Homology  Sequence
Immunodominant epitope               100         GRDLINLAKKRTNII
Acanthamoeba polyphaga moumouvirus   65          DLINLAKKI-- --NNII
Saudi moumouvirus                    65          DLINLAKKI-- --NNII
Moumouvirus Monve                    65          DLINLAKKI-- --NNII

(G) ACA+ SSc subset (CENPA): predicted immunodominant peptide
Ancestry  Position  Sequence         HLA α/β heterodimer      Peptide affinity (nM)
EA        94        LQEAAEAFLVHLFED  DRA1*01:01/DRB1*07:01    445.81
EA        94        LQEAAEAFLVHLFED  DRA1*01:01/DRB1*01:01    299.55

(H) CENPA
Source                       % Homology  Sequence
Immunodominant epitope       100         LQEAAEAFLVHLFED
Dishui lake phycodnavirus 1  80          LQEAAEAYLTSLFED

Legend: exact match; conservative amino acid substitution.

Fig. 4. Bioinformatically derived immunodominant peptides and homologous viral protein identification. (A) Predicted immunodominant peptides in topoisomerase I protein, (B) peptide sequences from microbial proteins homologous to topoisomerase I sequence, (C) 3D ribbon model of topoisomerase I with the identified immunodominant peptide in pink, (D) predicted immunodominant peptides in fibrillarin protein, (E) peptide sequences from microbial proteins homologous to fibrillarin sequence, (F) 3D ribbon model of fibrillarin protein with the identified immunodominant peptide in pink, (G) predicted immunodominant peptides in CENPA, (H) peptide sequences from microbial proteins homologous to CENPA sequence, and (I) 3D ribbon model of CENPA protein with the identified immunodominant peptide in pink. (These structures are based on PDB ID codes 1a35 for topoisomerase I, 2ipx for fibrillarin, and 3nqu for CENPA; @ in LD with DQA1*01:02/DQB1*06:09.)
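The percent homology values shown for the ungapped viral peptides in Fig. 4 B and H can be checked with a position-by-position comparison against the immunodominant epitope. The following is a minimal sketch, assuming that "% homology" for these 15-mers is simple ungapped percent identity; this is an assumption, since the figure does not state how the value was computed, and the function and variable names are illustrative only. The gapped fibrillarin alignments in Fig. 4E are not handled.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Percentage of positions with identical residues in two ungapped, equal-length peptides."""
    if len(reference) != len(candidate):
        raise ValueError("peptides must have equal length (gapped alignments are not handled)")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return 100.0 * matches / len(reference)


# Sequences copied from Fig. 4 B and H.
TOPO1_EPITOPE = "RQRAVALYFIDKLAL"
CENPA_EPITOPE = "LQEAAEAFLVHLFED"

# Hypothetical container: source name -> (human epitope, viral peptide).
VIRAL_PEPTIDES = {
    "Hokovirus HKV1": (TOPO1_EPITOPE, "RQIAVALYFIDKLAL"),
    "Megavirus chiliensis": (TOPO1_EPITOPE, "RQIATALYFIDKFAL"),
    "Dishui lake phycodnavirus 1": (CENPA_EPITOPE, "LQEAAEAYLTSLFED"),
}

for source, (epitope, viral) in VIRAL_PEPTIDES.items():
    print(f"{source}: {percent_identity(epitope, viral):.0f}% identity")
```

Under this assumption, the Hokovirus peptide differs from the topoisomerase I epitope at 1 of 15 positions (14/15, about 93%), and the Megavirus chiliensis and Dishui lake phycodnavirus peptides each differ at 3 of 15 positions (12/15, 80%), in agreement with the values in the figure.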

more rigorously, we contrasted the E values from randomly generated peptides to act as negative comparators. One hundred randomly generated 15-mer peptide sequences were compared to the viral sequence database for homology. None of these sequences matched any viruses from the Mimiviridae or Phycodnaviridae families (SI Appendix, Tables S13A and S13B). Additionally, an arbitrary 15-mer peptide selected from serum albumin residues 152 to 166, to act as a negative control for self-antigen peptides (39), did not show any homology with any viruses from the Mimiviridae or Phycodnaviridae families. The topoisomerase I immunodominant peptide "RQRAVALYFIDKLAL" is part of the catalytic domain of the topoisomerase I enzyme and is highlighted in pink on the 3D structure (Fig. 4C). The fibrillarin and CENPA immunodominant peptides are also highlighted on their respective protein structures (Fig. 4 F and I). Next, we compared the "RQRAVALYFIDKLAL" sequence in topoisomerase I, "GRDLINLAKKRTNII" sequence in fibrillarin, and "LQEAAEAFLVHLFED" sequence

558 | www.pnas.org/cgi/doi/10.1073/pnas.1906593116 Gourh et al. Downloaded by guest on September 27, 2021 Downloaded by guest on September 27, 2021 or tal. et absence Gourh The explored. its further be and to pattern, need staining will nuclear SSc speckled to the a relevance of with any of individuals association in in the identified that not seems analysis, but it further population On subsets. AA autoantibody SSc-specific the examined in SSc this of all to association exception strong An the the ORs. in was significant increased were with SSc, subsets overall autoantibody we in significant hypothesis, statistically this not subset-specific Supporting autoantibody severalfold. identified risk the increased the of SSc in link environmental Phycodnaviri- potential and a Mimiviridae pathogenesis. homology suggesting the significant in families, had viruses dae from CENPA proteins and to fibrillarin, I, somerase topoi- on peptides immunodominant predicted bioinformatically identify- ancestry African patients an SSc AA of HLA-DRB1 study alleles, genetic ancestry-predominant African largest ing the is This Discussion the are 45). (40, ACA which S15B) Table the population, for Japanese les and the populations Chinese in to and affinity Japanese significant pre- the with was bind CENPA to in dicted sequence “LQEAAEAFLVHLFED” Likewise, (40–44). the A) S15 Table Appendix , (SI populations Indian and populations, Iranian and sig- tion, with bind populations, to to Thai predicted “RQRAVALYFIDKLAL” affinity was the I nificant topoisomerase populations, in these sequence in subset S rl ntectgr fcasII class of category the in II firmly class SSc I of class no independently were tified there diseases Notably, autoimmune (46–50). 
other well several and as for 0.1, protective of be ACA to OR the reported an in with effect subset protective SSc extremely an confers lotype the that show the of ATA association the strong SSc, very con- in a of and allele effect the patients unreported of previously SSc independent A AA 7.4-fold. the of HLA-DQB1 in risk common a is fers that AFA with (40– the manuscripts examining published from Upon Indian selected 45). Choctaw and were Mexican, populations Iranian, Turkish, SSc-associated Thai, Chinese, by AA recognized the also HLA in were identified patients HLA were SSc EA that and viruses Phycodnaviridae Mimiviridae and to Other homology with peptides in immunodominant dicted Peptides Immunodominant Ancestries. Predicted Bioinformatically of to S14C). values unique and respectively, were proteins, E database CENPA sequences with and protein peptide fibrillarin, human I, these the topoisomerase that within discovered homology and for CENPA in nitrsigosraini u td a h enrichment the was study our in observation interesting An HLA-DRB1 lee nScptet fohracsre.SSc-associated ancestries. other of patients SSc in alleles lee nteACA the in alleles HLA + usto S htdsly rnacsr fet We effect. transancestry a displays that SSc of subset eeaie hte h iifraial pre- bioinformatically the whether examined We ∗ ∗ htices S ik edmntaethat demonstrate We risk. 
SSc increase that 11:02, ofr iko .-odi h AFA the in 4.1-fold of risk a confers 06:09, HLA-DRB1 lee nScseicatatbd ust that subsets autoantibody SSc-specific in alleles ∗ + 11:04 HLA-DRB1 S14B, S14 A, Tables , Appendix (SI <0.05 HLA , Appendix (SI populations these in subset HLA-DRB1 and + ∗ allele, 07:01/DQA1 n ATA and HLA HLA-DRB1 HLA-DPB1 HLA-DRB1 HLA-DRB1 ∗ HLA-DRB1 HLA-DRB1 HLA 08:02 ∗ HLA-DPB1 15:02 ikallsfrteATA the for alleles risk HLA-DRB1 + leeascain placing association, allele HLA nteMxcnpopula- Mexican the in ∗ ust nteJapanese, the in subsets HLA 02:01/DQB1 ∗ nteJpns and Japanese the in ∗ HLA-DRB1 ∗ ∗ eas report also We 08:04. HLA-DQB1 13:01 16:02 11:02 ∗ ∗ ies.Lsl,the Lastly, disease. sassociated is 08:04, 11:02 HLA-DQB1 ∗ lee ht while that, alleles HLA 13:01 ∗ HLA nteChoctaw the in 07:01 nteTurkish the in on nover- in found a stronger was lee iden- alleles ∗ ∗ leewith allele 02:02 08:04 ikalle- risk ∗ a been has + 05:01 subset, ∗ + 03:01 hap- and EA in + n,adteall speeti hr fteATA the transances- of This third of ancestry. a effect European try in or present African is of allele irrespective the and ing, ARA of pathogenesis the in unlikely. whatsoever is role no playing rvlneo S 5,5) nbt h AadE popu- EA and AA the both In 52). (51, lations, SSc of frequency of have carrier prevalence patients population SSc reported American the highest native the Choctaw only The not positivity 52). ATA (51, 95% patients, set with SSc phenotype, American 65% uniform and native very a Choctaw have the who in reported been ARA non-HLA only the that in possibility risk a SSc still increase is of aggres- There stratification or ciations. further crisis, Perhaps renal ARA involvement. the SSc skin of diffuse characterized presence sive phenotypes association, SSc cancer of by collection diverse ARA a the ARA represent Instead, that ATA entity. possible the neous than is size ARA it sample the thus inadequate larger However, could to a association. 
leading had This an size, subset detect populations. sample to small EA power the or statistical to AA due be the that possibly AA intriguing either is the in It in S2). associations ACA Table of of Appendix, ARA because (SI frequency the power low the inadequate is since to patients size, SSc due sample likely small was ACA the AAs the the in in association set significant statistically any of atppie htwr eonzdb multiple by recognized immunodomi- were predicted that bioinformatically peptides presented peptides We are nant of APCs, cells. on source helper molecules T likely HLA to the we a to study, as bound once this self-antigens II that, In proposed class (1). thus the patients of and SSc role of the 95% explored in seen are antigens SSc the of subsets. 53–64% these for of each account in which cases populations, EA and AA II analysis, (AFA CART class subsets Using the class autoantibody receptor. the of multiple cell identified of interaction T we structure the the with the modifying molecule altering or HLA in molecule role HLA peptide-binding a result- II helper the play ultimately T outside might activation, residues to cell groove The lead B autoimmunity. cells, turn, in pep- helper ing in specific T and, of to activation recognition presentation cell APC on to that, leading the tides residue of be outside groove might 189 peptide-binding position changes at These serine groove. with to LD peptide-binding perfect specific in in was was 74, and position groove at peptide-binding leucine, the acid Amino groove. increase peptide-binding an II to class leading the cells, helper T are risk. to SSc that in APCs peptides possi- specific by is recognize It presented to populations. EA groove HLA-DPβ and peptide-binding in the AA statistically that the playing only both ble the be in was isoleucine may aa HLA-DPβ 76 significant allele position the aa this analyses, in tion role Interestingly, the distinct pathogenesis. 
around populations SSc a various suggests in prevalence world, significant. SSc of statistically with correlation not direct quency and the with controls along the This, to similar was sets nioisdrce oaddfeetncero uloa self- nucleolar or nuclear different toward directed Antibodies HLA-DPB1 eietfidsvrlScascae arsde nalof all in residues aa SSc-associated several identified We HLA-DPB1 HLA-DPB1 + + HLA-DPB1 ustwudyedsaitclysignificant statistically yield would subset S ustddntyedaysaitclysignificant statistically any yield not did subset SSc HLA-DPB1 HLA ∗ 13:01 ∗ 13:01 ee,adms fte eepr fthe of part were them of most and genes, ∗ PNAS soito ihATA with association ∗ 13:01 13:01 lee(5)btas h ihs reported highest the also but (45%) allele HLA ∗ apsto 6ioecn oie the modifies isoleucine 76 position aa 13:01 | rqec nteATA the in frequency aur ,2020 7, January + are rqec nteATA the in frequency carrier lee o aho h SSc-specific the of each for alleles ATA , + nAsadEshspreviously has EAs and AAs in + S a o eoehomoge- one be not may SSc + HLA ust u the but subset, S ustcudpotentially could subset SSc + n ACA and , lee n autoantigens and alleles HLA-DPB1 | + o.117 vol. S svr interest- very is SSc HLA-DRB1 + HLA + usti EAs; in subset nbt the both in ) | − HLA aassocia- aa ∗ HLA + ikalleles risk o 1 no. + 13:01 S sub- SSc S sub- SSc patients ∗ + + genes genes 08:04 asso- | sub- fre- SSc 559 +

for each of the SSc-specific autoantibody subsets. The topoisomerase I peptide sequence "RQRAVALYFIDKLAL" that was bioinformatically predicted as an immunodominant peptide in both AA and EA ATA+ subsets is part of the catalytic domain of the molecule, unique to this protein and evolutionarily conserved. Hu et al. (53) have identified peripheral T cell lines from SSc patients recognizing the "RAVALYFIDKLA" peptide on topoisomerase I.

Molecular mimicry has been invoked previously as a potential mechanism driving autoimmunity in several diseases, including in multiple sclerosis and Epstein–Barr virus in lupus (27–30). We examined whether the bioinformatically predicted immunodominant peptide sequences identified in this study had homology to microbial protein sequences. Remarkably, the "RQRAVALYFIDKLAL" sequence in topoisomerase I, the "GRDLINLAKKRTNII" sequence in fibrillarin, and the "LQEAAEAFLVHLFED" sequence in CENPA matched sequences from viruses of the Mimiviridae and Phycodnaviridae families that belong to the NCLDV clade, with an extremely high confidence level (54). Mimiviruses and phycodnaviruses are ubiquitous in aquatic environments, and humans are constantly being exposed to these viruses (55, 56). These viruses cannot infect human cells or replicate in them, but rather mimiviruses infect amoeba and phycodnaviruses infect algae (57, 58). Even though humans do not get infected with these viruses, phagocytosis by macrophages of virus-infected amoeba or algae can lead to processing of viral antigens and presentation via the class II HLA receptor to T helper cells (57). Activated T cells recognizing self-antigens with homology to viruses could arise from activation of quiescent T cells with receptors specific for host antigens resulting in autoreactive T cells. Alternatively, autoreactive T cells could arise by T cell receptor poly-specificity. Peptide recognition by T cell receptors is based on amino acid properties, and the binding motifs are degenerate, with only a small sequence needed for recognition. Similarities in peptides at critical residues that bind to the class II HLA molecules could lead to T cell cross-reactivity, ultimately leading to autoreactive T cells (59–62). Mimivirus and phycodnavirus peptides that have homology to topoisomerase I, fibrillarin, or CENPA could activate T helper cells, which, in turn, activate B cells with receptors that specifically recognize and target these nuclear antigens (topoisomerase I, fibrillarin, and CENPA, respectively). This could lead to the formation of ATA, AFA, and ACA observed in 74.8% of AA and 65.8% of EA SSc. Our data indicate that generation of a particular autoantibody has a strong relationship to class II HLA alleles, but the pathogenic potential of SSc-specific autoantibodies is unclear, and their presence could be an epiphenomenon. Constant exposure of the immune system to these nuclear antigens could lead to chronic autoimmunity. This raises an interesting hypothesis for a possible environmental link in SSc pathogenesis. An increased occurrence of antibodies against mimivirus collagen has been demonstrated in rheumatoid arthritis patients, along with antibodies against the mimivirus capsid protein L425 in a third of the rheumatoid arthritis patients (63). It is also possible that molecular mimicry to mimiviruses may just be an epiphenomenon not playing a direct role in SSc pathogenesis (64). Testing SSc samples for antibodies against mimivirus capsid protein would be an important step to demonstrate patient exposure to these viruses.

These HLA findings validate our understanding of SSc as an autoimmune disease and emphasize the relevance of class II HLA genes in SSc pathogenesis. The heterogeneity observed in SSc is best characterized by the robust HLA allelic associations demonstrated in the SSc-specific autoantibody subsets. These SSc-specific autoantibodies correlate not only with specific HLA alleles but also with distinct clinical phenotypes and disease outcomes (4–6). Stratifying SSc on the basis of autoantibodies and HLA alleles together for research and clinical trials may yield beneficial results. In the future, screening HLA-DPB1*13:01+ individuals for ANA and ATA could result in early identification and therapeutic intervention to block the development of SSc.

Materials and Methods

Patients and Controls. This study included 662 AA SSc patients enrolled from 23 academic centers throughout the United States under the GRASP consortium with available genotype (Dataset S1) and serum SSc-specific autoantibody data (Dataset S2) (8, 36). The study was conducted in accordance with the Declaration of Helsinki, and participating centers secured local ethics committee approval prior to participant enrollment. All patients met the 1980 American College of Rheumatology (ACR) or the 2013 ACR/European League Against Rheumatism classification criteria for SSc or had at least 3 of the 5 features of CREST syndrome (calcinosis, Raynaud's phenomenon, esophageal dysmotility, sclerodactyly, telangiectasias); 946 genetically similar unrelated controls were obtained from the Howard University Family Study, a population-based study of AA families and unrelated individuals (65). All cases and controls provided written informed consent. Genotype and phenotype data of 723 European ancestry SSc patients and 5,437 controls genotyped on the same platform were extracted from dbGaP (SI Appendix, Table S1). European ancestry SSc patients were a subset of those reported by Radstake et al. (14).

Autoantibody Testing. Sera from the AA SSc patients were tested by a line immunoassay for SSc profile autoantibodies (Euroimmun Euroline profile kit). For the European ancestry SSc patients, reported autoantibody data were extracted from dbGaP accession phs000357.v1.p1.

Genotyping. The AA SSc cases and controls were genotyped with the Illumina Infinium Multi-Ethnic Global Array kit. High-quality genotypes were imputed using the Michigan Imputation Server, and the required 6,114 markers were submitted to the HLA*IMP:03 server for HLA imputation. The European ancestry samples were genotyped on the Illumina Human610-Quadv1 B chip (SI Appendix).

PCA. For the 2 ancestral populations, PCA was used to evaluate the genetic similarity of the cases with the controls, to remove outliers, and to correct for residual dissimilarity separately (SI Appendix). Two-dimensional plots of the first 2 principal components of the cases and controls in each study are shown in SI Appendix, Fig. S1.

HLA Imputation. We selected the HLA*IMP:03 tool to perform HLA imputation in the AA samples because it has a multiethnic reference panel of 10,561 individuals that includes 568 of African ancestry (SI Appendix) (66). We used available whole-exome sequence data for 763 of the AA samples to determine their HLA alleles using HLA*PRG:LA, allowing for comparison of HLA*IMP:03 imputed alleles with HLA*PRG:LA sequence-based alleles (37). For EA samples, SNP2HLA software and a mainly European ancestry reference of 5,225 individuals were used to impute classical HLA alleles and polymorphic HLA amino acids (67).

HLA Association and Conditional Analysis. HLA alleles with frequency less than 0.01 were omitted, and a logistic regression association analysis was performed under a dominant model classically used for identifying HLA alleles associated with diseases (68) (SI Appendix). Regressions were corrected for genetic dissimilarity between the cases and the controls by including the top 10 PCs as covariates. To account for strong LD in the region, independent associations were identified by recursively including independent alleles as covariates. The total number of classical HLA alleles tested for association was 138 in both the populations, and there were 5 analyses conducted. Thus, a Bonferroni multiple-test corrected significance threshold of P < 0.000072 was used for association analysis.

Amino Acid Analysis. Amino acid associations with SSc were evaluated with a dominant model logistic regression analysis. Amino acids with frequency less than 0.01 were omitted. The P value threshold was set as P < 0.000013 based on 800 amino acids tested across both population samples, multiplied by 5 sets of analysis.

The 3D Protein Modeling. Protein Data Bank (PDB) ID codes 1a35, 2ipx, 3nqu, 6atf, 1s9v, and 3lqz were obtained for topoisomerase I, fibrillarin, CENPA, HLA-DR, HLA-DQ, and HLA-DP, respectively. UCSF Chimera was used to highlight individual aa positions (69).

560 | www.pnas.org/cgi/doi/10.1073/pnas.1906593116 Gourh et al. Downloaded by guest on September 27, 2021 Downloaded by guest on September 27, 2021 3 .Kwn,J auai .A ese r .M rgt nimndmnn epitope immunodominant An Wright, M. T. Jr, Medsger A. T. Kaburaki, J. Kuwana, M. 23. 2 .J McHugh J. N. 22. Reveille D. J. 21. 0 .D Gladman D. D. 20. Radstake R. T. 14. 9 .C Arnett C. F. 19. Gourh P. 17. Gourh P. 16. Rueda B. 15. Agarwal K. S. 13. Alkassab F. 12. 1979-1998. States: United Gourh the in P. mortality 11. sclerosis Systemic Furst, E. D. Krishnan, E. 10. 8 .Gorlova O. 18. or tal. et Gourh te ua rtis n ofni(ai:45) atra(ai:2,and 2), (taxid: bacteria 4751), (taxid: in fungi sequences to homologous and identify Information proteins, the to with human Biotechnology (taxid:9606) BLAST other human for Protein to Standard Center Tool set Search organism National Alignment Local the Basic (NCBI) into entered were Mimicry. Molecular SSc-associated the (38). peptides of of immunodominant affinity binding 2 a in with Peptides subsets. II body histocompatibility class major I, SSc-associated (MHC) (topoisomerase the complex interest to of of CENPA/CENPB) binding protein the or the predict fibrillarin, within to used sequences was peptide server, 15-mer 3.2 NetMHCIIpan the tool, Peptides. tional Immunodominant of Prediction Bioinformatic the (70). tems among interactions Analysis. CART .A .Gelber C. A. 9. sero- and clinical A Jr, Medsger Morgan A. D. T. N. Fertig, 8. N. Lucas, M. anti-DNA Domsic, serum T. of R. Correlation Steen, Wright, V. M. T. 7. Jr, Medsger A. T. Fertig, N. scleroderma. Hu, in Q. P. autoantibodies sclerosis. 6. of systemic in relevance Autoantibodies clinical Steen, mosaic. D. The V. autoantibody Reveille, 5. An D. J. sclerosis: Ho, Systemic T. K. Black, 4. M. C. Bunn, C. C. precip- 3. and Antinuclear Committee Hoc Ad Rowell, Rheumatology of College R. American N. Solomon, H. D. Gray, Reveille, D. J. G. K. 2. Anderson, R. J. Beck, S. J. 1. 
nDAtpioeaeIi ofrainli aue eeoeet nisrecognition its in sera. Heterogeneity sclerosis nature: systemic in by conformational is I topoisomerase DNA on study. HLA and serological A (1994). relatives: their and systemic (progressive scleroderma sclerosis). in response autoantibody I antitopoisomerase with Rheum. 30Cuain fia-mrcnadhsai ae n 00controls. 1000 and Dis. cases in hispanic Analyses and sclerosis: African-American systemic Caucasian, in 1300 protection or susceptibility confer which epitopes and populations. European and American locus. susceptibility new a as CD247 sclerosis. systemic in hypertension Rheumatol. pulmonary and positivity anti-topoisomerase-I hntpso ytmcslrsstruhagnm-ieascainstrategy. association genome-wide a through Genet. sclerosis systemic of phenotypes sclerosis. systemic Caucasians. in sclerosis systemic sclerosis. systemic positive antibody anticentromere with Rheumatology associated is (SNP) phism sclerosis. systemic antibody-positive anticentromere Rheum. and I- topoisomerase Epidemiol. J. Eur. and Center Scleroderma Hopkins Johns literature. the the at of experience review 20-year A scleroderma: in American African database. in clinical research genome patients the scleroderma of Analysis cohort: American African center sclerosis. systemic with patients Caucasian Rheum. and Arthritis American African of comparison logic sclerosis. systemic in activity and Rheum. severity Arthritis disease with levels antibody I topoisomerase (2005). Ther. Res. Arthritis Immunol. antibodies. nucleolar and (2003). Scl-70, 399–412 Anticentromere, immuno- of tests: use the logic for guidelines Evidence-based Guidelines, Testing Immunologic of sclerosis. systemic progressive in (1963). autoantibodies itating 2–2 (2010). 822–827 69, 1018(2011). e1002178 7, 5–5 (1981). 854–856 24, 9535 (2006). 
3945–3953 54, soito fteCof3BKrgo ihssei ceoi nNorth- in sclerosis systemic with region C8orf13-BLK the of Association al., et AK ucinlvrat r soitdwt ucpiiiyt diffuse to susceptibility with associated are variants functional BANK1 al., et soito fTFF O4L oyopim ihssetblt to susceptibility with polymorphisms (OX40L) TNFSF4 of Association al., et .Ci.Investig. Clin. J. 0–0 (1999). 207–208 117, soito ftePP2 60 oyopimwt anti- with polymorphism R620W PTPN22 the of Association al., et nalgatiflmaoyfco AF)snl uloiepolymor- nucleotide single (AIF1) 1 factor inflammatory allograft An al., et tal. et 7522 (2009). 2715–2723 36, dnicto fnvlgntcmresascae ihclinical with associated markers genetic novel of Identification al., et soito faioai eune nteHADB rtdomain first HLA-DQB1 the in sequences acid amino of Association al., et aeadascainwt ies aiettosadmortality and manifestations disease with association and Race al., et nicnrmr niois(C)i ytmcslrsspatients sclerosis systemic in (ACA) antibodies Anti-centromere al., et lncladsrlgclfaue fssei ceoi namulti- a in sclerosis systemic of features serological and Clinical al., et eoewd soito td fssei ceoi identifies sclerosis systemic of study association Genome-wide al., et soito fitrekn2 eetrplmrhsswith polymorphisms receptor 23 interleukin of Association al., et ATaayi a efre oepoehigher-order explore to performed was analysis CART 2815 (2007). 1248–1251 46, nrae rqec fHAD5i scleroderma. in HLA-DR5 of frequency Increased al., et ao itcmaiiiycmlx(H)casI lee,haplotypes alleles, II class (MHC) complex histocompatibility Major , 9629 (2012). 2986–2994 64, (2003). 1363–1373 48, 5–6 (2005). 855–861 20, n.Rem Dis. Rheum. Ann. 09 (2003). 80–93 5, h roiie muooiatppiesequences peptide immunodominant prioritized The Medicine α/β rhii Rheum. Arthritis 7–8 (1992). 973–980 90, HLA eeoieswti h epcieautoanti- respective the within heterodimers HLA n.Rem Dis. Rheum. Ann. 9–0 (2013). 191–205 92, 5–5 (2010). 550–555 69, a.Genet. Nat. 
virus (taxid: 10239) to find homologous microbial sequences. Significant homology was defined by an E value of <0.05 (71).

Data Availability. The AA genotype data are available as Dataset S1, and the corresponding phenotypic information is available as Dataset S2. The EA dataset is available from dbGaP (SI Appendix, Table S1).

ACKNOWLEDGMENTS. This study was supported by research funding from the Scleroderma Research Foundation and the Intramural Research Programs of the National Institute of Arthritis and Musculoskeletal and Skin Diseases, the National Human Genome Research Institute, and the Center for Information Technology of the NIH. Data were analyzed using the computational resources of the NIH high-performance computing Biowulf cluster (http://hpc.nih.gov). This work was supported, in part, by a Rheumatology Research Foundation Scientist Development award (N.D.M.); NIH Grant T32-AR-048522 (P.G. and N.D.M.); Chresanthe Staurulakis Memorial Discovery Fund and NIH Grant P30-AR-070254 (N.D.M., A.A.S., and F.M.W.); NIH Grant K01-AR-067280 (P.S.R.); NIH Grant P60-AR-062755 (P.S.R. and R.M.S.); and Nina Ireland Program for Lung Health (F.B.). We thank Dr. Daniella Schwartz for critical reading of the manuscript.

24. P. S. Ramos, R. M. Silver, C. A. Feghali-Bostwick, Genetics of systemic sclerosis: Recent advances. Curr. Opin. Rheumatol. 27, 521–529 (2015).
25. J. Neefjes, M. L. Jongsma, P. Paul, O. Bakke, Towards a systems understanding of MHC class I and MHC class II antigen presentation. Nat. Rev. Immunol. 11, 823–836 (2011).
26. R. S. Fujinami, M. B. Oldstone, Z. Wroblewska, M. E. Frankel, H. Koprowski, Molecular mimicry in virus infection: Crossreaction of measles virus phosphoprotein or of herpes simplex virus protein with human intermediate filaments. Proc. Natl. Acad. Sci. U.S.A. 80, 2346–2350 (1983).
27. R. S. Fujinami, M. B. Oldstone, Amino acid homology between the encephalitogenic site of myelin basic protein and virus: Mechanism for autoimmunity. Science 230, 1043–1045 (1985).
28. K. Wandinger et al., Association between clinical disease activity and Epstein-Barr virus reactivation in MS. Neurology 55, 178–184 (2000).
29. A. Ebringer, M. Baines, T. Ptaszynska, Spondyloarthritis, uveitis, HLA-B27 and Klebsiella. Immunol. Rev. 86, 101–116 (1985).
30. J. A. James et al., Lupus and Epstein-Barr. Curr. Opin. Rheumatol. 24, 383–388 (2012).
31. G. Moroncini, S. Mori, C. Tonnini, A. Gabrielli, Role of viral infections in the etiopathogenesis of systemic sclerosis. Clin. Exp. Rheumatol. 31(2 Suppl 76), 3–7 (2013).
32. K. N. Kasturi, A. Hatakeyama, H. Spiera, C. A. Bona, Antifibrillarin autoantibodies present in systemic sclerosis and other connective tissue diseases interact with similar epitopes. J. Exp. Med. 181, 1027–1036 (1995).
33. G. G. Maul et al., Determination of an epitope of the diffuse systemic sclerosis marker antigen DNA topoisomerase I: Sequence similarity with retroviral p30 gag protein suggests a possible cause for autoimmunity in systemic sclerosis. Proc. Natl. Acad. Sci. U.S.A. 86, 8492–8496 (1989).
34. D. Hamamdzic, R. A. Harley, D. Hazen-Martin, E. C. LeRoy, MCMV induces neointima in IFN-gammaR-/- mice: Intimal cell apoptosis and persistent proliferation of myofibroblasts. BMC Musculoskelet. Disord. 2, 3 (2001).
35. C. Lunardi et al., Systemic sclerosis immunoglobulin G autoantibodies bind the human cytomegalovirus late protein UL94 and induce apoptosis in human endothelial cells. Nat. Med. 6, 1183–1186 (2000).
36. P. Gourh et al., Brief report: Whole-exome sequencing to identify rare variants and gene networks that increase susceptibility to scleroderma in African Americans. Arthritis Rheum. 70, 1654–1660 (2018).
37. A. T. Dilthey et al., HLA*LA—HLA typing from linearly projected graph alignments. Bioinformatics 35, 4394–4396 (2019).
38. K. K. Jensen et al., Improved methods for predicting peptide binding affinity to MHC class II molecules. Immunology 154, 394–406 (2018).
39. C. Hogeboom et al., Peptide motif analysis predicts lymphocytic choriomeningitis virus as trigger for multiple sclerosis. Mol. Immunol. 67, 625–635 (2015).
40. H. Furukawa et al., Human leukocyte antigen and systemic sclerosis in Japanese: The sign of the four independent protective alleles, DRB1*13:02, DRB1*14:06, DQB1*03:01, and DPB1*02:01. PLoS One 11, e0154255 (2016).
41. W. Louthrenoo et al., Association of HLA-DRB1*15:02 and DRB5*01:02 allele with the susceptibility to systemic sclerosis in Thai patients. Rheumatol. Int. 33, 2069–2077 (2013).
42. T. S. Rodriguez-Reyna et al., HLA class I and II blocks are associated to susceptibility, clinical subtypes and autoantibodies in Mexican systemic sclerosis (SSc) patients. PLoS One 10, e0126727 (2015).
43. D. Gonzalez-Serna et al., Analysis of the genetic component of systemic sclerosis in Iranian and Turkish populations through a genome-wide association study. Rheumatology 58, 289–298 (2018).
44. M. Kuwana et al., Association of human leukocyte antigen class II genes with autoantibody profiles, but not with disease susceptibility in Japanese patients with systemic sclerosis. Intern. Med. 38, 336–344 (1999).
45. X. D. Zhou et al., Association of HLA-DQB1*0501 with scleroderma and its clinical features in Chinese population. Int. J. Immunopathol. Pharmacol. 26, 747–751 (2013).
46. J. A. Noble, A. M. Valdes, Genetics of the HLA region in the prediction of type 1 diabetes. Curr. Diabetes Rep. 11, 533–542 (2011).
47. B. Newman et al., CARD15 and HLA DRB1 alleles influence susceptibility and disease localization in Crohn's disease. Am. J. Gastroenterol. 99, 306–315 (2004).

PNAS | January 7, 2020 | vol. 117 | no. 1 | 561

48. P. Goyette et al., High-density mapping of the MHC identifies a shared role for HLA-DRB1*01:03 in inflammatory bowel diseases and heterozygous advantage in ulcerative colitis. Nat. Genet. 47, 172–179 (2015).
49. I. L. Mero et al., Oligoclonal band status in Scandinavian multiple sclerosis patients is associated with specific genetic risk alleles. PLoS One 8, e58352 (2013).
50. A. Paradowska-Gorycka et al., Association of HLA-DRB1 alleles with susceptibility to mixed connective tissue disease in Polish patients. HLA 87, 13–18 (2016).
51. F. K. Tan et al., HLA haplotypes and microsatellite polymorphisms in and around the major histocompatibility complex region in a Native American population with a high prevalence of scleroderma (systemic sclerosis). Tissue Antigens 53, 74–80 (1999).
52. F. C. Arnett et al., Increased prevalence of systemic sclerosis in a Native American tribe in Oklahoma. Association with an Amerindian HLA haplotype. Arthritis Rheum. 39, 1362–1370 (1996).
53. P. Q. Hu, J. J. Oppenheim, T. A. Medsger Jr, T. M. Wright, T cell lines from systemic sclerosis patients and healthy controls recognize multiple epitopes on DNA topoisomerase I. J. Autoimmun. 26, 258–267 (2006).
54. L. M. Iyer, S. Balaji, E. V. Koonin, L. Aravind, Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res. 117, 156–184 (2006).
55. M. Boughalmi et al., High-throughput isolation of giant viruses of the Mimiviridae and Marseilleviridae families in the Tunisian environment. Environ. Microbiol. 15, 2000–2007 (2013).
56. B. La Scola et al., Tentative characterization of new environmental giant viruses by MALDI-TOF mass spectrometry. Intervirology 53, 344–353 (2010).
57. E. Ghigo et al., Ameobal pathogen mimivirus infects macrophages through phagocytosis. PLoS Pathog. 4, e1000087 (2008).
58. H. Chen et al., The genome of a prasinoviruses-related freshwater virus reveals unusual diversity of phycodnaviruses. BMC Genomics 19, 49 (2018).
59. K. W. Wucherpfennig et al., Polyspecificity of T cell and B cell receptor recognition. Semin. Immunol. 19, 216–224 (2007).
60. K. W. Wucherpfennig et al., Structural requirements for binding of an immunodominant myelin basic protein peptide to DR2 isotypes and for its recognition by human T cell clones. J. Exp. Med. 179, 279–290 (1994).
61. F. Sinigaglia, J. Hammer, Defining rules for the peptide-MHC class II interaction. Curr. Opin. Immunol. 6, 52–56 (1994).
62. P. A. Reay, R. M. Kantor, M. M. Davis, Use of global amino acid replacements to define the requirements for MHC binding and T cell recognition of moth cytochrome c (93-103). J. Immunol. 152, 3946–3957 (1994).
63. N. Shah et al., Exposure to mimivirus collagen promotes arthritis. J. Virol. 88, 838–845 (2014).
64. L. I. Albert, R. D. Inman, Molecular mimicry and autoimmunity. N. Engl. J. Med. 341, 2068–2074 (1999).
65. A. Adeyemo et al., A genome-wide association study of hypertension and blood pressure in African Americans. PLoS Genet. 5, e1000564 (2009).
66. A. Motyer et al., Practical use of methods for imputation of HLA alleles from SNP genotype data. https://doi.org/10.1101/091009. (9 December 2016).
67. X. Jia et al., Imputing amino acid polymorphisms in human leukocyte antigens. PLoS One 8, e64683 (2013).
68. G. Thomson, HLA disease associations: Models for the study of complex human genetic disorders. Crit. Rev. Clin. Lab. Sci. 32, 183–219 (1995).
69. E. F. Pettersen et al., UCSF Chimera—A visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
70. H. Zhang, G. Bonney, Use of classification trees for association studies. Genet. Epidemiol. 19, 323–332 (2000).
71. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, D. J. Lipman, Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

www.pnas.org/cgi/doi/10.1073/pnas.1906593116 | Gourh et al.