
Genes and Immunity (2014) 15, 466–476 © 2014 Macmillan Publishers Limited All rights reserved 1466-4879/14 www.nature.com/gene

ORIGINAL ARTICLE Annotation of functional variation within non-MHC MS susceptibility loci through bioinformatics analysis

FBS Briggs, LJ Leung and LF Barcellos

There is a strong and complex genetic component to multiple sclerosis (MS). In addition to variation in the major histocompatibility complex (MHC) region on 6p21.3, 110 non-MHC susceptibility variants have been identified in Northern Europeans, thus far. The majority of the MS-associated genes are immune related; however, similar to most other complex genetic diseases, the causal variants and biological processes underlying pathogenesis remain largely unknown. We created a comprehensive catalog of putative functional variants that reside within linkage disequilibrium regions of the MS-associated genic variants to guide future studies. Bioinformatics analyses were also conducted using publicly available resources to identify plausible pathological processes relevant to MS and functional hypotheses for established MS-associated variants.

Genes and Immunity (2014) 15, 466–476; doi:10.1038/.2014.37; published online 17 July 2014

INTRODUCTION

Multiple sclerosis (MS) is a clinically heterogeneous autoimmune disease of the central nervous system with a complex etiology, primarily characterized by demyelination and the formation of neurological lesions.1 The prevalence of MS is greatest among Northern Europeans (0.1–0.2%).1 There is a prominent genetic component illustrated by consistently higher disease concordance in monozygotic twins compared with dizygotic twins across several populations (30% and 5%, respectively), similar to other autoimmune diseases.2 The relative recurrence risk (λs) is ~6.3 among siblings.3 For several decades, variation within the major histocompatibility complex (MHC) on chromosome 6p21.3 has been known to confer the strongest genetic risk for MS. The primary susceptibility is a human leukocyte antigen (HLA) class II allele, HLA-DRB1*15:01.1,4,5 Recent fine-mapping of the MHC demonstrated the presence of at least 10 additional independent HLA and non-HLA risk alleles for MS.6

Full characterization of the non-MHC genetic component in MS has been challenging; however, 110 non-MHC risk variants have been identified within the last decade through both genome-wide association studies (GWASs) and follow-up candidate gene studies facilitated by international research initiatives (International Multiple Sclerosis Genetics Consortium and collaborative construction of the ImmunoChip).4,5 These variants explain 20% of λs, and with the inclusion of the four most prominent HLA risk alleles as much as 28% of λs.5 These risk variants are both genic (63 single-nucleotide polymorphisms (SNPs) in 63 genes) and intergenic, similar to most complex diseases. For most of the 63 MS-associated genes, their contribution to pathogenesis is unclear; however, they appear to be primarily involved in immune response.

In the largest MS association study, high-resolution mapping was performed for 8 risk loci; for 5 of them, TNFSF14, IL2RA, TNFRSF1A, IL12A and STAT4, 50% of the posterior probability of association was explained by an intronic variant; 3 of the 5 variants are associated with protein levels.5 Functional experiments have revealed the primary causal variants affecting structure through alternative splicing within IL7R7 and TNFRSF1A.8 However, the causal variants and the pathological biological processes mediated by the remaining 103 loci have not been fully explored.

Here, we present a comprehensive catalog of candidate functional variants within linkage disequilibrium (LD) blocks encompassing the non-MHC MS susceptibility loci to expedite the prioritization of SNPs for future fine-mapping analyses. We also conducted several bioinformatics enrichment analyses that integrated multiple functional (-omic) data for both MS-associated genes and the individual non-MHC risk variants. These results provide thorough insight into underlying disease mechanisms in MS based on current knowledge.

RESULTS

Of the 110 non-MHC MS risk variants, 63 SNPs are within genes, 1 SNP is within a noncoding RNA (PVT1) and the remaining 46 SNPs are intergenic (Table 1). Using Haploreg v2 (2013.02.14; http://www.broadinstitute.org/mammals/haploreg/haploreg.php),9 features of the genic risk SNPs were assessed. Among the genic risk SNPs, there were three missense SNPs (TYK2, SLC44A2 and IFI30), two 3′ untranslated region (UTR) SNPs (EVI5 and CD69), one 5′ UTR SNP (CD86), one synonymous SNP (TIMMDC1) and one downstream SNP (EOMES); the remaining 55 SNPs are intronic (Table 1). With the exception of the TYK2 missense variant, these variants are fairly common with minor allele frequencies >5%. A total of 92 SNPs reside within regulatory motifs, and 27 SNPs reside within transcription factor binding sites. Interestingly, the MS risk variant rs4796791 is an intronic STAT3 (a transcription factor) SNP, and three other MS risk variants exist within signal transducer and activator of transcription 3 (STAT3) binding sites within their respective genes/noncoding RNA: rs4410871 (PVT1), rs917116 (JAZF1) and rs2236262 (ZFP36L1). Among all variants, there was a significant enrichment of strong enhancers for several cell lines, including human embryonic stem cells and leukemia

Genetic Epidemiology and Genomics Laboratory, Division of Epidemiology, School of Public Health, University of California, Berkeley, CA, USA. Correspondence: Professor LF Barcellos, Genetic Epidemiology and Genomics Laboratory, Division of Epidemiology, School of Public Health, 324 Stanley Hall, University of California, Berkeley, CA 94720, USA. E-mail: [email protected]
Received 20 February 2014; revised 2 May 2014; accepted 29 May 2014; published online 17 July 2014

Table 1. Annotation for the 110 non-MHC MS risk SNPs

Chr; Base pair location; SNP ID; Ref allele; Alt allele; EUR freq; Published odds ratio (95% CI)a; Proteins bound (transcription factor binding site); Motifs changed; Gene; Gene location

1 2525665 rs3748817 T C 0.33 1.14 (1.10–1.18) ZNF263 GR, p300 MMEL1 Intronic
1 6530189 rs3007421 G A 0.11 1.12 (1.07–1.18) ERa-a, Nkx2, RAR PLEKHG5 Intronic
1 85746993 rs12087340 C T 0.08 1.22 (1.15–1.29) 4.4 kb 5′ of BCL10
1 85915183 rs11587876 T C 0.21 1.12 (1.07–1.17) P300 DDAH1 Intronic
1 92975464 rs41286801 C T 0.17 1.20 (1.15–1.25) Arid5a, CEBPA, CEBPB, CTCF, Pou2f2, STAT EVI5 3′-UTR
1 101240893 rs7552544 T C 0.44 1.08 (1.05–1.12) Pax-4, Zbtb3 36 kb 3′ of VCAM1
1 101407519 rs11581062 A G 0.29 1.05 (1.01–1.09) SEF-1, YY1 SLC30A7 Intronic
1 117080166 rs6677309 A C 0.14 1.34 (1.27–1.41) POL24H8, AP2ALPHA, AP2GAMMA, E2F6, CMYC Mtf1, Rad21 CD58 Intronic
1 120258970 rs666930 T C 0.54 1.09 (1.06–1.13) EBF, Pou1f1 PHGDH Intronic
1 157770241 rs2050568 C T 0.47 1.08 (1.05–1.12) FCRL1 Intronic
1 160711804 rs35967351 A T 0.3 1.09 (1.05–1.13) GATA2 ATF3, CCNT2, INSM1, SP1, ZBTB33 SLAMF7 Intronic
1 192541472 rs1359062 C G 0.81 1.18 (1.13–1.23) 3.4 kb 5′ of RGS1
1 200874728 rs55838263 A G 0.26 1.12 (1.08–1.17) FOXA1, HNF4G, RXRA, TCF4 Ik-2, NF-AT1, Pou2f2 C1orf106 Intronic
2 25017860 rs4665719 C T 0.75 1.09 (1.05–1.13) CENPO Intronic
2 43361256 rs2163226 T C 0.27 1.10 (1.07–1.15) CMYC Irf, Nanog, Spz1 88 kb 3′ of ZFP36L2
2 61095245 rs842639 G A 0.68 1.11 (1.08–1.15) PU1, YY1, ELF1 FLJ16341 Intronic
2 68587477 rs7595717 C T 0.28 1.10 (1.06–1.14) 4.8 kb 5′ of PLEK
2 112665201 rs17174870 C T 0.29 1.03 (1.00–1.07) Maf MERTK Intronic
2 191974435 rs9967792 T C 0.66 1.11 (1.07–1.15) TCF12 STAT4 Intronic
2 231115454 rs9989735 G C 0.18 1.17 (1.12–1.22) POL24H8 AP-1, CAC-binding-protein, CCNT2, , EWSR1-FLI1, Egr-1, MAZ, MAZR, MZF1::1-4, , PU.1, Pou2f2, SP1, STAT, Sp4, TATA, TFII-I, WT1, ZNF263, Zfp281 SP140 Intronic
3 18785585 rs11719975 G C 0.25 1.09 (1.05–1.13) 305 kb 5′ of SATB1
3 27757018 rs2371108 G T 0.4 1.08 (1.05–1.12) Zic 866 bp 3′ of EOMES Downstream
3 28078571 rs1813375 G T 0.49 1.15 (1.12–1.19) EBF1 CCNT2, , Foxc1, Foxj1, GATA, Nkx3, Sox, TAL1 205 kb 5′ of CMC1
3 33013483 rs4679081 T C 0.54 1.08 (1.04–1.11) CEBPG, DMRT7, Evi-1, Foxo, HNF1, Hoxa9, Hoxc10, Hoxc9, Irf, 17 kb 3′ of CCR4
3 71530346 rs9828629 C T 0.41 1.08 (1.05–1.12) CEBPB, HMG-IY, Myc, Zfp187 FOXP1 Intronic
3 105558837 rs2028597 G A 0.08 1.04 (0.98–1.11) AIRE, AP-4, LBP-1 CBLB Intronic
3 119222456 rs1131265 G C 0.16 1.19 (1.14–1.24) TIMMDC1 Synonymous
3 121543577 rs1920296 A C 0.61 1.14 (1.11–1.18) FAC1, GR IQCB1 Intronic
3 121770539 rs2255214 G T 0.51 1.11 (1.08–1.15) BRCA1, Foxi1, Pou2f2, Pou3f3, TEF 3.7 kb 5′ of CD86
3 121796768 rs9282641 G A 0.09 1.12 (1.05–1.19) NF-kB, POL2, POL24H8, PU1 CD86 5′-UTR
3 159691112 rs1014486 T C 0.47 1.11 (1.07–1.14) T3R 16 kb 5′ of IL12A
4 103551603 rs7665090 A G 0.49 1.08 (1.05–1.12) Pou1f1 1 kb 3′ of MANBA
4 106173199 rs2726518 A C 0.6 1.09 (1.05–1.13) PPAR, RAR TET2 Intronic
5 35879156 rs6881706 G T 0.28 1.12 (1.08–1.16) Barhl1, Barx1, Barx2, En-1, Gbx1, Gbx2, Hlxb9, Ik-1, Ik-2, Isl2, Lhx4, Lhx8, Msx-1, Msx2, Pax-6, Pax7, Phox2a, Pou3f2, Prrx1, Prrx2 2.2 kb 3′ of IL7R
5 40399096 rs6880778 A G 0.58 1.10 (1.06–1.14) AP-1, Elf5, HMG-IY, Maf, STAT 281 kb 5′ of PTGER4
5 55440730 rs71624119 G A 0.25 1.12 (1.08–1.17) Bbx, DMRT1, STAT ANKRD55 Intronic
5 133446575 rs756699 C T 0.85 1.12 (1.07–1.18) PLZF, Pax-5 3.8 kb 5′ of TCF7
5 141506564 NA T G 0.61 1.07 (1.04–1.11) NDFIP1 Intronic
5 158759900 rs2546890 A G 0.5 1.06 (1.02–1.09) EBF1, MEF2A, PU1 Ik-2, RBP-Jk LOC285626 Intronic
5 176788570 rs4976646 T C 0.34 1.13 (1.09–1.17) BCL, GATA, PU.1, Pax-5 RGS14 Intronic
6 14719496 rs17119 G A 0.77 1.11 (1.06–1.15) AP-2rep, Cdx2, Crx, Hoxc9, Obox3, Pitx2, Pitx3 527 kb 5′ of JARID2
6 36375304 rs941816 G A 0.82 1.13 (1.08–1.18) Hdx PXT1 Intronic
6 90976768 rs72928038 G A 0.18 1.11 (1.07–1.16) Ets, FEV, STAT BACH2 Intronic
6 128278798 rs802734 A G 0.28 1.03 (0.99–1.06) 11 kb 3′ of PTPRK
6 135739355 rs11154801 C A 0.38 1.11 (1.07–1.15) Nkx2 AHI1 Intronic
6 137452908 rs17066096 A G 0.23 1.14 (1.10–1.18) Irx, Myc 12 kb 3′ of IL22RA2
6 137962655 rs7769192 G A 0.48 1.08 (1.04–1.12) CTCF HNF1, Irf, Pax-6 147 kb 5′ of OLIG3
6 138244816 rs67297943 T C 0.19 1.12 (1.07–1.16) Znf143, p300 40 kb 3′ of TNFAIP3
6 159470559 rs212405 A T 0.61 1.15 (1.11–1.19) Ik-3, NF-kB 4.4 kb 5′ of TAGAP
7 3113034 rs1843938 G A 0.38 1.08 (1.05–1.12) Gfi1, Gfi1b, RREB-1 30 kb 5′ of CARD11
7 27014988 rs706015 T G 0.18 1.14 (1.09–1.19) Evi-1, HNF1, Sox, TATA 111 kb 5′ of SKAP2
7 28172739 rs917116 T G 0.22 1.12 (1.07–1.16) STAT3, P300 Rad21 JAZF1 Intronic
7 37382465 rs60600003 T G 0.1 1.16 (1.10–1.22) CTCF, BATF, EBF1, ELF1, POL2, POL24H8, PU1, RAD21, SIN3AK20, SMC3, TAF1, TBP, YY1, NF-kB, CMYC, MAFK, MAX ELMO1 Intronic
7 50325567 rs201847125 C T NA 1.11 (1.07–1.15) HDAC2, Nanog, p300 19 kb 5′ of IKZF1
7 149289464 rs354033 G A 0.26 1.03 (1.00–1.07) STAT ZNF767 Intronic
8 79575804 rs1021156 T C 0.72 1.12 (1.08–1.16) Cdx2, Hoxa10, Hoxb9, Nanog 2.5 kb 5′ of FAM164A
8 128192981 rs2456449 A G 0.31 1.10 (1.06–1.14) Evi-1, Foxa, Foxd3, Foxp1, HDAC2, Irf, Sox, Zfp105, p300 235 kb 5′ of POU5F1B
8 128815029 rs4410871 T C 0.7 1.12 (1.08–1.16) STAT3 Hsf, NF-AT, STAT PVT1 (non-coding RNA)
8 129158945 rs759648 A C 0.31 1.09 (1.05–1.13) Ets 3.4 kb 5′ of MIR1208
9 5893861 rs2150702 G A, T 0.48 1.16 (1.10–1.22) MLANA Intronic
10 6099045 rs2104286 T C 0.22 1.21 (1.16–1.26) CEBPB IL2RA Intronic
10 31415106 rs793108 C T 0.48 1.09 (1.06–1.13) Foxd3, Foxf1, Foxi1, Foxq1 94 kb 5′ of ZNF438
10 75658349 rs2688608 G T 0.55 1.07 (1.03–1.10) HNF4, Mrg1::Hoxa9, PPAR, RAR, RXR::LXR, RXRA, TR4 11 kb 3′ of C10orf55
10 81048611 rs1782645 C T 0.37 1.09 (1.05–1.13) NRSF, p300 ZMIZ1 Intronic
10 94481917 rs7923837 G A 0.39 1.11 (1.07–1.14) Foxd1, Foxf2, Foxj1, Foxk1, Foxo, Hmbox1 27 kb 3′ of HHEX
11 47702395 rs7120737 A G 0.15 1.13 (1.08–1.18) CEBPB Zbtb12 AGBL2 Intronic
11 60793330 rs34383631 C T 0.39 1.11 (1.07–1.15) AP-4, CTCF, LBP-1, Lmo2-complex, Nanog 5.5 kb 3′ of CD6
11 64097233 rs694739 A G 0.4 1.08 (1.04–1.11) AP-1, ERa-a, Egr-1, Myf, Pou2f2, TBX5 7.9 kb 3′ of PRDX5
11 118566746 rs533646 C G 0.33 1.10 (1.06–1.14) BATF, BCL11A, EBF1, IRF4, MEF2C, P300, PAX5N19, SP1, TCF12, NF-kB, YY1 STAT, TCF12 16 kb 5′ of TREH
11 118724894 rs9736016 T A 0.38 1.10 (1.07–1.14) CCNT2, Cart1, GR, Nanog, Nkx2 30 kb 5′ of CXCR5
11 118755738 rs523604 A G 0.5 1.09 (1.05–1.13) HNF1, Maf CXCR5 Intronic
12 6440009 rs1800693 T C 0.43 1.14 (1.11–1.18) ERa-a TNFRSF1A Intronic
12 6503500 rs12296430 G C 0.2 1.14 (1.09–1.18) POL2 Ncx, Spz1, Zfp410 2.8 kb 3′ of LTBR
12 9905690 rs11052877 A G 0.38 1.10 (1.07–1.14) POL2 CD69 3′-UTR
12 58182062 rs201202118 A T NA 1.14 (1.10–1.18) Evi-1, FAC1, Foxa, Foxk1, Foxp1, HDAC2, Irf, Mef2, Nanog, Sox, TATA, Zfp105, p300 TSFM Intronic
12 123593382 rs7132277 C T 0.2 1.10 (1.06–1.15) Hic1 PITPNM2 Intronic
13 100086259 rs4772201 A G 0.18 1.12 (1.07–1.17) HNF1, NF-AT, Pax-4 28 kb 5′ of MIR548AN
14 69261472 rs2236262 A G 0.5 1.08 (1.04–1.11) POL2, TAF1, JUND, STAT3, TBP, CMYC, POL24H8, POL2B GATA, Pou1f1, SIX5, STAT, Zfp105, Znf143 ZFP36L1 Intronic
14 75961511 rs4903324 C T 0.21 1.10 (1.05–1.14) SRF, YY1, CFOS, USF2, USF1 Gm397, Pax-6 22 kb 3′ of JDP2
14 88432328 rs74796499 C A 0.05 1.31 (1.21–1.42) AP-1, EWSR1-FLI1, Elf3, Elf5, GATA, HMG-IY, Irf, Lhx4 GALC Intronic
14 103263788 rs12148050 A G 0.65 1.08 (1.04–1.11) AP-4, Rad21 TRAF3 Intronic
15 79207466 rs59772922 T C 0.19 1.11 (1.06–1.15) TCF12 6.6 kb 3′ of CTSH
15 90977333 rs8042861 G T 0.46 1.08 (1.05–1.12) Foxa, Foxd1 IQGAP1 Intronic
16 1073552 rs2744148 A G 0.16 1.09 (1.04–1.13) Egr-1, Myc, TATA 37 kb 3′ of SOX8
16 11194771 rs12927355 C T 0.33 1.21 (1.17–1.26) Mef2 CLEC16A Intronic
16 11288806 rs4780346 G A 0.28 1.09 (1.05–1.13) AP-1, COMP1, NF-E2 13 kb 3′ of CLEC16A
16 11435990 rs6498184 T C 0.85 1.15 (1.10–1.21) 3.3 kb 5′ of RMI2
16 30156963 rs7204270 C T 0.56 1.09 (1.06–1.13) DMRT5, HP1-site-factor, Pax-6, Pou6f1 22 kb 5′ of MAPK3
16 68685905 rs1886700 C T 0.16 1.11 (1.06–1.16) AP-1, NF-Y, Pbx3, RFX5, SP2 CDH3 Intronic
16 79110596 rs12149527 C T 0.45 1.08 (1.05–1.12) , MAZR, ZBTB7A, Zfp740 WWOX Intronic
16 79649394 rs7196953 A G 0.69 1.08 (1.04–1.12) SRF 15 kb 5′ of MAF
16 85994484 rs35929052 C T 0.11 1.14 (1.09–1.20) SIX5 38 kb 3′ of IRF8
17 37912377 rs12946510 C T 0.47 1.08 (1.04–1.11) NF-kB, BCL11A, EBF1, MEF2A, MEF2C, P300, PAX5C20, POL2, POL24H8, SP1, TBP, YY1, TAF1 Foxo, Foxp1, HNF1, Irf, Mef2, Pax-4, Sox 8.8 kb 3′ of IKZF3
17 40530763 rs4796791 T C 0.62 1.10 (1.06–1.14) Foxa, Sox STAT3 Intronic
17 45597098 rs4794058 C T 0.53 1.07 (1.04–1.11) Glis2 11 kb 5′ of NPEPPS
17 57816757 rs8070345 T C 0.55 1.14 (1.11–1.18) Barhl1, GR VMP1 Intronic
18 56384192 rs7238078 T G 0.21 1.05 (1.02–1.10) COMP1, Irf, Pou5f1, Sox MALT1 Intronic
19 6668972 rs1077667 C T 0.24 1.16 (1.12–1.21) POL2, CMYC, MAX TNFSF14 Intronic
19 10463118 rs34536443 G C 0.03 1.28 (1.18–1.40) BDP1, Glis2, HEY1, NRSF, Sin3Ak-20, ZBTB7A, Zfp161, Zic TYK2 Missense
19 10742170 rs2288904 A G 0.77 1.14 (1.09–1.19) GR, NRSF, YY1 SLC44A2 Missense
19 16505106 rs1870071 T C 0.31 1.12 (1.08–1.16) EWSR1-FLI1 EPS15L1 Intronic
19 18285944 rs11554159 G A 0.27 1.15 (1.11–1.20) POL2 NF-I IFI30 Missense
19 49870643 rs8107548 T C 0.23 1.09 (1.05–1.13) CHOP::CEBPa DKKL1 Intronic
20 44747947 rs4810485 T G 0.75 1.08 (1.04–1.12) NF-kB, MEF2A, MEF2C, PU1 STAT CD40 Intronic
20 48438761 rs17785991 T A 0.34 1.09 (1.05–1.13) SLC9A8 Intronic
20 52791518 rs2248359 C T 0.4 1.07 (1.03–1.10) GR, TCF4 1 kb 5′ of CYP24A1
20 62373983 rs2256814 G A 0.18 1.11 (1.07–1.16) Ascl2, CTCFL, CTCF, E2A, HEN1, Myf, Rad21, SMC3 SLC2A4RG Intronic
20 62409713 rs6062314 C T 0.92 1.10 (1.03–1.16) GR ZBTB46 Intronic
22 22131125 rs2283792 T G 0.51 1.08 (1.05–1.12) MAPK1 Intronic
22 50966914 rs470119 T C 0.6 1.07 (1.03–1.10) POL2 Hbp1, Hoxc9, Nrf1, Obox6, Pdx1, Vax2 TYMP Intronic

Abbreviations: Chr, chromosome; CI, confidence interval; MHC, major histocompatibility complex; MS, multiple sclerosis; NA, not available; SNP, single-nucleotide polymorphism; UTR, untranslated region. aOdds ratio and 95% confidence intervals reported in 14 498 MS cases and 24 091 healthy controls.5
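The annotation counts summarized from Table 1 can be cross-checked with simple arithmetic. The sketch below is illustrative only: the dictionary keys and the `percent` helper are names chosen here, the intronic count is derived as the remainder of the 63 genic SNPs once the other annotated categories are subtracted, and the percentages are plain proportions of the 110 variants.

```python
# Illustrative cross-check of the Table 1 summary counts.
# Keys and helper names are hypothetical labels, not taken from the paper.
TOTAL_NON_MHC = 110

location_counts = {"genic": 63, "noncoding_rna": 1, "intergenic": 46}
assert sum(location_counts.values()) == TOTAL_NON_MHC

# Genic SNPs with a non-intronic annotation, as listed in the Results.
genic_non_intronic = {
    "missense": 3,
    "3_utr": 2,
    "5_utr": 1,
    "synonymous": 1,
    "downstream": 1,
}

# Whatever is left of the 63 genic SNPs is intronic.
intronic = location_counts["genic"] - sum(genic_non_intronic.values())


def percent(count, total=TOTAL_NON_MHC):
    """Share of the 110 variants, rounded to the nearest whole percent."""
    return round(100 * count / total)


motif_pct = percent(92)  # SNPs within regulatory motifs
tfbs_pct = percent(27)   # SNPs within transcription factor binding sites
print(intronic, motif_pct, tfbs_pct)
```

Deriving the intronic count as a remainder, rather than hard-coding it, makes any mismatch between the category counts and the genic total visible immediately.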

Table 2. Functional assignments within the LD block boundaries for the 63 genic non-MHC MS-risk SNPs

Chr; SNP ID; Base position; Gene; SNP used; Block start position; Block end position; Block width (kb); 3′ UTR variant; 5′ UTR variant; Coding sequence variant; Feature elongation; Feature truncation; Frameshift variant; Inframe deletion; Inframe insertion; Missense variant; NMD transcript variant; Noncoding exon variant; Splice acceptor variant; Splice donor variant; Splice region variant; Stop gained; Stop lost; Synonymous variant; Total

1 rs3748817 2525665 MMEL1 rs3748817 2524205 2566482 42.28 12 5 0 0 1 0 0 0 92 4 70 1 1 16 4 0 39 245
1 rs3007421 6530189 PLEKHG5 rs3007421 6529443 6533393 3.95 0 0 0 0 0 0 0 0 56 0 78 0 0 9 1 0 28 172
1 rs11587876 85915183 DDAH1 rs11587876 85915149 85918101 2.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 rs41286801 92975464 EVI5 rs41286801 92972933 92977275 4.34 189 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 191
1 rs11581062 101407519 SLC30A7 rs11581062 101391432 101418569 27.14 0 0 0 35 12 0 0 0 0 0 0 0 0 0 0 0 0 47
1 rs6677309 117080166 CD58 rs6677309 117079262 117084410 5.15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 rs666930 120258970 PHGDH rs666930 120255578 120258970 3.39 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 rs2050568 157770241 FCRL1 rs2050568 157736437 157775921 39.49 33 0 0 0 0 0 0 0 69 0 148 1 3 9 2 0 35 300
1 rs35967351 160711804 SLAMF7 rs35967351 160708150 160719238 11.09 0 1 0 0 0 0 0 0 14 0 22 0 0 2 0 0 6 45
1 rs55838263 200874728 C1orf106 rs55838263 200871843 200878727 6.89 0 0 0 0 0 0 0 0 14 27 0 0 0 6 0 0 6 53
2 rs4665719 25017860 CENPO rs4665719 25015074 25019001 3.93 0 28 0 0 0 0 0 0 11 0 21 0 0 5 0 0 0 65
2 rs842639 61095245 FLJ16341 rs842639 61093035 61133793 40.76 0 0 0 13 13 0 0 0 0 0 0 0 0 0 0 0 0 26
2 rs17174870 112665201 MERTK rs17174870 112660424 112668005 7.58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 rs9967792 191974435 STAT4 rs9967792 191974332 191997946 23.62 9 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 12
2 rs9989735 231115454 SP140 rs9989735 231110737 231120719 9.98 0 0 0 1 0 1 0 0 22 0 0 1 1 1 0 0 4 31
3 rs2371108 27757018 EOMES rs2371108 27756233 27757018 0.79 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 rs9828629 71530346 FOXP1 rs9828629 71526152 71538137 11.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 rs2028597 105558837 CBLB rs2028597 105554066 105601615 47.55 0 12 0 0 0 0 0 0 10 0 0 1 0 0 0 0 2 25
3 rs1131265 119222456 TIMMDC1 rs1131265 119203587 119228508 24.92 14 3 0 0 0 0 0 0 20 2 17 0 1 4 1 1 14 77
3 rs1920296 121543577 IQCB1 rs1920296 121494142 121572252 78.11 33 23 7 2 2 0 0 0 53 1 4 1 1 8 3 0 15 153
3 rs9282641 121796768 CD86 rs9282641 121786069 121803149 17.08 0 7 0 0 0 0 0 0 0 0 7 0 0 1 0 0 0 15
4 rs2726518 106173199 TET2 rs2726518 106151843 106201684 49.84 406 3 7 115 185 265 3 2 646 0 0 3 10 12 138 1 72 1868
5 rs71624119 55440730 ANKRD55 rs71624119 55435796 55443479 7.68 0 10 0 0 0 0 0 0 4 0 0 0 0 2 0 0 6 22
5 none 141506564 NDFIP1 rs7737631 141495917 141529469 33.55 0 0 0 2 8 0 0 0 5 0 59 0 0 3 0 0 17 94
5 rs2546890 158759900 LOC285626 rs2546890 158759531 158759900 0.37 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1
5 rs4976646 176788570 RGS14 rs4976646 176788570 176797343 8.77 0 0 0 1 1 0 0 0 17 0 51 0 0 6 2 0 13 91
6 rs941816 36375304 PXT1 rs941816 36359879 36391236 31.36 0 0 0 0 0 0 0 0 3 0 0 0 0 2 1 0 1 7
6 rs72928038 90976768 BACH2 rs72928038 90951239 90999652 48.41 0 6 0 0 0 0 0 0 0 0 6 0 0 1 0 0 0 13
6 rs11154801 135739355 AHI1 rs11154801 135736276 135741087 4.81 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 rs917116 28172739 JAZF1 rs917116 28171832 28172739 0.91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 rs60600003 37382465 ELMO1 rs60600003 37372614 37382520 9.91 0 6 0 0 0 0 0 0 2 0 10 0 0 0 0 0 2 20
7 rs354033 149289464 ZNF767 rs354033 149287037 149307098 20.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 rs2150702 5893861 MLANA rs2150702 no block 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 rs2104286 6099045 IL2RA rs2104286 6098949 6101129 2.18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 rs1782645 81048611 ZMIZ1 rs1782645 81048280 81048611 0.33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 rs7120737 47702395 AGBL2 rs7120737 47700540 47707156 6.62 9 0 0 10 9 0 0 0 6 2 12 0 1 1 1 0 7 58
11 rs523604 118755738 CXCR5 rs523604 no block 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 rs1800693 6440009 TNFRSF1A rs1800693 6439470 6440009 0.54 10 0 0 0 0 0 0 0 7 0 13 0 0 1 0 0 6 37
12 rs11052877 9905690 CD69 rs11052877 9904075 9909956 5.88 15 0 0 1 3 0 0 0 19 0 34 0 0 0 0 0 9 81
12 rs201202118 58182062 TSFM rs12368653 58151295 58213485 62.19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 rs7132277 123593382 PITPNM2 rs7132277 123554604 123597877 43.27 0 13 0 2 2 0 0 0 0 0 5 0 0 1 0 0 0 23
14 rs2236262 69261472 ZFP36L1 rs2236262 69260290 69273090 12.80 12 33 0 0 3 1 0 0 15 0 0 0 0 0 0 0 5 69
14 rs74796499 88432328 GALC rs74796499 88415845 88453278 37.43 93 3 36 5 3 0 0 0 84 1 104 1 1 14 4 0 19 368
14 rs12148050 103263788 TRAF3 rs12148050 103244990 103276855 31.87 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4
15 rs8042861 90977333 IQGAP1 rs8042861 90966084 90986403 20.32 0 11 0 5 2 1 0 0 23 0 12 1 0 7 0 0 5 67
16 rs12927355 11194771 CLEC16A rs12927355 11191572 11199447 7.88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 rs1886700 68685905 CDH3 rs1886700 no block 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 rs12149527 79110596 WWOX rs12149527 79109071 79111043 1.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 rs4796791 40530763 STAT3 rs4796791 40529835 40535845 6.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 rs8070345 57816757 VMP1 rs8070345 57813824 57820356 6.53 6 8 0 0 0 0 0 0 5 2 0 0 0 3 0 0 4 28
18 rs7238078 56384192 MALT1 rs7238078 56382716 56387260 4.55 0 0 0 0 0 0 0 0 2 0 0 0 0 1 0 0 0 3
19 rs1077667 6668972 TNFSF14 rs1077667 6668545 6669934 1.39 0 0 0 0 0 0 0 0 5 0 0 0 0 2 0 0 4 11
19 rs34536443 10463118 TYK2 rs34536443 10458633 10468668 10.04 33 20 0 3 1 3 0 0 59 1 97 0 1 10 1 0 25 254
19 rs2288904 10742170 SLC44A2 rs2288904 10737988 10743161 5.17 39 29 0 11 1 0 0 0 28 2 33 2 0 10 0 0 18 173
19 rs1870071 16505106 EPS15L1 rs1870071 16504516 16510279 5.76 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 3 10
19 rs11554159 18285944 IFI30 rs11554159 18282590 18294923 12.33 9 2 0 1 0 0 0 0 42 0 0 0 2 8 0 1 13 78
19 rs8107548 49870643 DKKL1 rs8107548 49864497 49889798 25.30 5 6 0 0 1 0 0 0 42 0 0 0 1 7 2 0 11 75
20 rs4810485 44747947 CD40 rs4810485 44746403 44749251 2.85 0 8 0 0 1 0 0 0 5 0 1 0 0 0 0 0 0 15
20 rs17785991 48438761 SLC9A8 rs17785991 48429682 48449940 20.26 0 8 0 2 1 0 0 0 6 0 10 0 0 1 0 0 2 30
20 rs2256814 62373983 SLC2A4RG rs2256814 62372005 62374389 2.39 5 0 0 1 4 1 0 0 42 0 94 0 0 11 0 0 23 181
20 rs6062314 62409713 ZBTB46 rs6062314 62408472 62414424 5.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22 rs2283792 22131125 MAPK1 rs2283792 22128678 22131125 2.45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22 rs470119 50966914 TYMP rs470119 50964862 50966914 2.05 0 0 9 0 1 0 0 0 29 0 48 1 1 8 0 0 9 106

Abbreviations: Chr, chromosome; LD, linkage disequilibrium; MHC, major histocompatibility complex; MS, multiple sclerosis; NMD, nonsense-mediated mRNA decay; SNP, single-nucleotide polymorphism; UTR, untranslated region.

A role for T-cell dysregulation in MS pathogenesis FBS Briggs et al

Table 3. Overrepresented pathways among the 63 non-MHC MS-associated genes

Pathway; ID; C; O; E; R; P-value; Genes

KEGG
Toxoplasmosis 5145 132 5 0.19 25.93 2.61E-05 CD40, STAT3, MAPK1, TYK2, TNFRSF1A
Hepatitis C 5160 134 5 0.2 25.54 2.61E-05 TRAF3, STAT3, MAPK1, TYK2, TNFRSF1A
Jak–STAT signaling 4630 155 5 0.23 22.08 3.56E-05 IL2RA, STAT3, TYK2, STAT4, CBLB
Toll-like receptor signaling 4620 102 4 0.15 26.85 0.0001 TRAF3, CD40, MAPK1, CD86
Cell adhesion molecules 4514 133 4 0.19 20.59 0.0002 CDH3, CD40, CD86, CD58
Cytokine–cytokine receptor interaction 4060 265 5 0.39 12.92 0.0002 CXCR5, IL2RA, TNFSF14, CD40, TNFRSF1A
Chemokine signaling 4062 189 4 0.28 14.49 0.0009 CXCR5, STAT3, MAPK1, ELMO1
T-cell receptor signaling 4660 108 3 0.16 19.02 0.002 MAPK1, CBLB, MALT1
Osteoclast differentiation 4380 128 3 0.19 16.04 0.0032 MAPK1, TYK2, TNFRSF1A
Pathways in cancer 5200 326 4 0.48 8.4 0.0041 TRAF3, STAT3, MAPK1, CBLB

Wikipathways pathway
Inflammatory Response WP453 32 4 0.05 85.57 5.90E-06 IL2RA, CD40, TNFRSF1A
IL-12 signaling WP2111 11 3 0.02 186.7 9.96E-06 STAT3, TYK2, STAT4
IL-3 signaling WP286 54 4 0.08 50.71 1.02E-05 STAT3, MAPK1, CD86, CD69
Type III interferon signaling WP2113 14 3 0.02 146.69 1.02E-05 STAT3, TYK2, STAT4
TSLP signaling WP2203 49 4 0.07 55.88 1.02E-05 IL2RA, STAT3, MAPK1, STAT4
IL-7 signaling WP205 25 3 0.04 82.15 4.56E-05 IL2RA, STAT3, MAPK1
IL-17 signaling WP2112 38 3 0.06 54.04 0.0001 TRAF3, STAT3, MAPK1
Toll-like receptor signaling WP75 116 4 0.17 23.61 0.0001 TRAF3, CD40, MAPK1, CD86
Interleukin-11 signaling WP2332 49 3 0.07 41.91 0.0002 STAT3, MAPK1, TYK2
IL-2 signaling WP49 53 3 0.08 38.75 0.0003 IL2RA, STAT3, MAPK1

Pathway Commons pathway
IL-12-mediated signaling events DB_ID:1633 113 8 0.17 48.46 7.23E-10 EOMES, CD86, STAT4, TNFRSF1A, MALT1, IL2RA, STAT3, TYK2
TCR signaling in CD8+ T cells DB_ID:1521 129 7 0.19 37.15 5.34E-08 IL2RA, EOMES, MAPK1, CD86, STAT4, TNFRSF1A, MALT1
Insulin pathway DB_ID:1466 1288 13 1.88 6.91 1.46E-07 EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
CD40/CD40L signaling DB_ID:1479 58 5 0.08 59.01 1.46E-07 TRAF3, CD40, TNFRSF1A, CBLB, MALT1
Nectin adhesion DB_ID:1472 1295 13 1.89 6.87 1.46E-07 EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
ErbB1 downstream signaling DB_ID:1602 1288 13 1.88 6.91 1.46E-07 EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
Syndecan-1-mediated signaling events DB_ID:1454 1300 13 1.9 6.85 1.46E-07 EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
Signaling events mediated by VEGFR1 and VEGFR2 DB_ID:1516 1296 13 1.89 6.87 1.46E-07 EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
Class I PI3K signaling events DB_ID:1553 1288 13 1.88 6.91 1.46E-07 EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
GMCSF-mediated signaling events DB_ID:1461 1292 13 1.89 6.89 1.46E-07 EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2

Abbreviations: C, number of reference genes in the category; E, expected number in the category; GMCSF, granulocyte macrophage colony-stimulating factor; IL, interleukin; Jak, Janus kinase; KEGG, Kyoto Encyclopedia of Genes and Genomes; MHC, major histocompatibility complex; MS, multiple sclerosis; O, number of genes in the candidate gene list and within the category; PI3K, phosphatidylinositol 3-kinase; R, ratio of enrichment; SNP, single-nucleotide polymorphism; STAT, signal transducer and activator of transcription; TCR, T-cell receptor; VEGFR, vascular endothelial growth factor receptor.
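The relationship between the O, E and R columns of Table 3 can be sketched as follows. This is a minimal illustration rather than the enrichment tool's actual implementation: the function names are hypothetical, and the formula for the expected count (category size times list size divided by reference size) is the standard over-representation assumption, with the reference size not stated in the table.

```python
# Sketch of the enrichment ratio in Table 3: R = O / E.
# Function names are hypothetical; they are not part of any tool's API.


def expected_count(category_size, list_size, reference_size):
    """Expected overlap E if the gene list were drawn at random from the reference.

    reference_size is an assumption supplied by the caller; Table 3 does not report it.
    """
    return category_size * list_size / reference_size


def enrichment_ratio(observed, expected):
    """Ratio of enrichment R = O / E."""
    return observed / expected


# Toxoplasmosis row of Table 3: O = 5, E = 0.19 (as printed), tabulated R = 25.93.
r_toxo = enrichment_ratio(5, 0.19)
print(round(r_toxo, 2))
```

Because E is printed to two decimal places, recomputing R from the tabulated O and E gives a value close to, but not identical with, the tabulated ratio.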

cells (H1 and K562, P < 1 × 10^-8); B lymphoblastocytes (GM12878; P = 2.4 × 10^-5) and epidermal keratinocytes (NHEK; P = 5.3 × 10^-5) (Supplementary Table 1). These variants were also more likely to reside within DNase sites in multiple cell lines including CD20+ B cells (P = 6.4 × 10^-4), T helper cells type 1 (P = 2.5 × 10^-4), CD4+ T helper cells type 0 (P = 0.0023) and several B lymphoblastocytes (Supplementary Table 1).

To explore candidate functional variants within the 63 MS-associated genes, we cataloged all functional assignments (1000 Genomes release 13 December 2012 EBI; Supplementary Table 2) within the LD block boundaries containing the associated risk variant. LD blocks were constructed using the genotypes for Northern Europeans available in the 1000 Genomes Phase 3 integrated variant set (March 2012)10 (see Materials and methods). To start, there were over 208 000 functional assignments within the 63 MS risk genes (Supplementary Table 3); when excluding intronic and noncoding transcript annotations, there were 81 311 functional assignments. Many SNPs possess multiple functional assignments because of transcript variation; for example, the TYK2 variant rs140324156 is annotated as synonymous, splice region, noncoding exon and 3′ UTR variant (Supplementary Table 4). There were no LD blocks spanning the risk variants within CDH3, CXCR5 and MLANA (Table 2). For the remaining 60 SNPs, the block sizes ranged from 332 base pairs (ZMIZ1) to 78.1 kb (IQCB1), with a median size of 7.8 kb and a mean size of 16.2 kb (s.d. = 17.6; Table 2).
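The block widths quoted above follow directly from the block start and end coordinates in Table 2. The sketch below shows the calculation for three of the blocks; the function name is hypothetical, and counting both boundary positions (end minus start plus one) is an assumption made here because it reproduces the 332 base pairs quoted for ZMIZ1.

```python
# Block width from LD block boundaries, using coordinates copied from Table 2.
# block_span_bp is a hypothetical helper; inclusive counting is an assumption.


def block_span_bp(start_bp, end_bp):
    """Number of base pairs spanned, counting both boundary positions."""
    return end_bp - start_bp + 1


blocks = {
    "ZMIZ1": (81048280, 81048611),
    "IQCB1": (121494142, 121572252),
    "TET2": (106151843, 106201684),
}

# Width in kb, rounded to two decimals as in the Block width column.
widths_kb = {
    gene: round(block_span_bp(start, end) / 1000, 2)
    for gene, (start, end) in blocks.items()
}
print(widths_kb)
```

The same helper can be applied to every row of Table 2 that has a block before computing summary statistics such as the median or mean width.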

There were functional variants within the LD block boundaries encompassing 43 risk SNPs; for 17 risk SNPs there were no functional variants within the LD block boundaries (Table 2). The total number of functional assignments across these 43 genes was almost 116 000 annotations; when removing intronic and noncoding transcript annotations, there were 52 439 functional assignments (Figure 1a). However, when focusing on variation within the LD blocks alone, there were 2816 unique SNPs (Supplementary Table 4) with 5244 functional assignments (Figure 1b), including: 932 3′ UTR SNPs, 249 5′ UTR SNPs, 59 coding sequence SNPs, 213 feature elongation variants, 254 feature truncation variants, 272 frameshift variants, 3 inframe deletions, 2 inframe insertions, 42 nonsense-mediated mRNA decay transcript variants, 957 noncoding exon variants, 13 splice acceptor variants, 24 splice donor variants, 172 splice region variants, 161 stop gained variants, 3 stop lost variants, 423 synonymous SNPs and 1465 missense SNPs (Table 2). Over 35% of all functional assignments were located within the 49.8-kb LD block encompassing TET2 rs2726518. There was a greater percentage of missense assignments among the 2816 unique SNPs (28%), compared with 21% among all assignments (not including intronic and noncoding transcript annotations). A similar increase was observed for 3′ UTR assignments (from 10% to 18%) and frameshifts (from 3% to 5%), when comparing the 52 439 assignments across the 43 genes to the 5244 assignments within the 43 LD blocks (Figure 1).

A major interest in human genetics is to distinguish missense polymorphisms that are functionally neutral from those that may contribute to disease. Using protein prediction algorithms, 816 missense variants were predicted to be ‘deleterious’ and/or ‘probably damaging’ for at least one gene transcript by SIFT (sorting intolerant from tolerant)11 and PolyPhen (polymorphism phenotyping),12 respectively, of which 373 missense variants were both ‘deleterious’ and ‘probably damaging’ (Supplementary Table 4; see Materials and methods). For the three missense MS risk variants, the predicted impact on protein function was: TYK2 rs34536443: ‘deleterious’ and ‘probably damaging’; SLC44A2 rs2288904: ‘tolerated’ and ‘benign’; and IFI30 rs11554159: ‘deleterious’ and ‘possibly damaging’.

Functional classification of the 63 MS-associated genes was conducted using Web-based Gene Set Analysis Toolkit (WebGestalt; updated January 2013; http://bioinfo.vanderbilt.edu/webgestalt/).13 Compared with all human genes, there was an enrichment of several KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways (Table 3), including toxoplasmosis (P = 2.6 × 10^-5), hepatitis C (P = 2.6 × 10^-5), Janus kinase (JAK)–STAT signaling pathway (P = 3.6 × 10^-5), Toll-like receptor signaling pathway (P = 1 × 10^-4), cell adhesion molecules (P = 2 × 10^-4) and T-cell receptor (TCR) signaling (P = 5 × 10^-4). The Pathway Commons analysis determined that interleukin (IL)-12-mediated signaling (P = 7.2 × 10^-10), TCR signaling (P = 5.3 × 10^-8), insulin pathway (P = 1.5 × 10^-7), CD40/CD40L signaling (P = 1.5 × 10^-7) and several pathways that shared a common gene set (P = 1.5 × 10^-7; nectin adhesion, ErbB1 downstream, syndecan-1-mediated, vascular endothelial growth factor receptor 1/2-mediated, class I phosphatidylinositol 3-kinase and granulocyte macrophage colony-stimulating factor-mediated signaling pathways) were enriched among the MS genes (Table 3). Using the WikiPathways database, genes involved in inflammatory response (P = 5.9 × 10^-6) and IL-2, IL-3, IL-7, IL-11, IL-12 and IL-17 signaling pathways (P < 5 × 10^-3) were

PAX4, Oct-1, NF-kB, Chx10, EST1, LEF1, Oct, ELK1 and AP-4 (P < 5 × 10^-3; Table 4). Prednisone and immune globulins were among the drugs more strongly associated with these genes (P < 5 × 10^-3; Table 4). Furthermore, there was an overrepresentation of several protein–protein interactions; the genes involved are largely reflective of the genes identified in enriched pathways (Table 4).

DISCUSSION

Characterization of the genetic component of MS has greatly advanced within the past decade, primarily because of international efforts and the completion of recent GWASs followed by replication and meta-analyses. The primary genetic risk locus for MS is within the MHC, and recent efforts have finally begun to unravel the complexities within the region.6 Thus far, 110 non-MHC disease-associated loci have been identified, although the functional pathological variants within these loci are largely unknown.4,5 We conducted a two-part bioinformatics analysis of these risk loci. First, all likely functional candidates within genes established through GWAS and follow-up studies were identified. Second, enrichment analyses were conducted to explore the underlying biology. Our results underscore the importance of cataloging likely functional candidates and assessing enrichment of various biological features as they relate to function and regulatory capacity of coding and noncoding variants.

GWASs are hypothesis free and do not provide any explicit biological evidence relating associated SNPs to disease; only a few SNPs per LD region are usually interrogated. As a result, it is unclear whether MS-associated SNPs are the causal variants; however, it is expected that the adjacent genomic regions in LD with the associated variant are likely to contain the biologically relevant polymorphisms.14 Elucidating the causal variants within these susceptibility loci requires extensive functional experiments in conjunction with comprehensive fine-mapping analyses in larger association studies; both are time intensive and expensive.15,16 The utilization of bioinformatics resources to identify and prioritize candidate SNPs that may be phenotypically relevant is an efficient and cost-effective approach. Available bioinformatics tools can also reveal putative biological processes and pathways not previously known to contribute to pathogenicity. Until recently, determining functionality was largely restricted to nonsynonymous SNPs, and whether the encoded amino acid change was a neutral or deleterious substitution in the resulting protein. However, currently available resources can be used to interrogate a role for both genic and intergenic variations in protein regulation, expressivity and function. Furthermore, enrichment analyses can identify new experimental targets, including microRNA and transcription factor binding sites. Results from bioinformatics analyses presented here emphasize that many MS risk variants are not located within coding regions of genes, and are likely to participate in gene regulation and expressivity. A total of 84% of the 110 established MS risk variants are located within a regulatory motif, and 25% are within a transcription factor binding site. The results also suggest complex interactions among MS risk genes, including STAT3 with PVT1, JAZF1 and ZFP36L1.

By focusing on variation within the LD block boundaries spanning 43 of the 63 genic MS risk SNPs, we identified 2816 SNPs as primary candidates for functional and fine-mapping investigations.
Within these genes, there were 52 439 functional assign- overrepresented (Table 3). ments; however, there was a 90% reduction in assignments when There was an enrichment of microRNA targets within the 63 restricting to the LD blocks (5244 assignments). Within this MS-associated genes, including targets for miR-182, miR-524, catalog, 28% of these assignments were missense annotations; miR-122A, miR-506, miR-29A, miR-29B, miR-29C, miR-181A, there were also 3 stop lost variants, 161 stop gain variants, 209 miR-181B, miR-181C, miR-181D and miR-124A (Po5  10 À 3; splice variants and 277 structural insertions, deletions and Table 4). There were several transcription factor binding targets frameshifts. Based on these results, fine-mapping and functional enriched among these genes; the most common were for NFAT, studies can be focused; for example, MALT1 contains two missense

Table 4. Overrepresented biological features, drug associations and protein–protein interactions among the 63 MS-associated genes

| Biological feature | ID | C | O | E | R | P-value | Genes |
|---|---|---|---|---|---|---|---|
| Transcription target | | | | | | | |
| hsa_TGGAAA_V$NFAT_Q4_01 | DB_ID:2437 | 1871 | 14 | 2.73 | 5.12 | 6.77E-05 | CXCR5, TYMP, EOMES, ELMO1, WWOX, IQCB1, CD86, TNFRSF1A, ZFP36L1, ZMIZ1, FOXP1, PHGDH, STAT3, SLC30A7 |
| hsa_V$PAX4_02 | DB_ID:2065 | 233 | 6 | 0.34 | 17.63 | 0.0001 | PITPNM2, ZMIZ1, EOMES, ELMO1, STAT3, FOXP1 |
| hsa_V$OCT1_Q5_01 | DB_ID:2306 | 265 | 5 | 0.39 | 12.92 | 0.0008 | TRAF3, IQCB1, FOXP1, CD86, STAT4 |
| hsa_V$NFKB_Q6_01 | DB_ID:2259 | 231 | 5 | 0.34 | 14.82 | 0.0008 | CXCR5, IL2RA, CD40, ELMO1, CD69 |
| hsa_TAATTA_V$CHX10_01 | DB_ID:2408 | 797 | 8 | 1.16 | 6.87 | 0.0008 | PITPNM2, BACH2, EOMES, ELMO1, CBLB, ZFP36L1, ZMIZ1, STAT3 |
| hsa_V$ETS1_B | DB_ID:2057 | 255 | 5 | 0.37 | 13.42 | 0.0008 | RGS14, ELMO1, FOXP1, CBLB, SLC30A7 |
| hsa_CTTTGT_V$LEF1_Q2 | DB_ID:2428 | 1939 | 12 | 2.83 | 4.24 | 0.0008 | TET2, BACH2, EOMES, EPS15L1, CDH3, C1orf106, ZFP36L1, ZMIZ1, NDFIP1, FOXP1, STAT3, SLC30A7 |
| hsa_V$OCT_Q6 | DB_ID:2271 | 257 | 5 | 0.38 | 13.32 | 0.0008 | TRAF3, IQCB1, FOXP1, CD86, STAT4 |
| hsa_V$ELK1_01 | DB_ID:1853 | 266 | 5 | 0.39 | 12.87 | 0.0008 | IFI30, ELMO1, TIMMDC1, STAT4, CBLB |
| hsa_CAGCTG_V$AP4_Q5 | DB_ID:2403 | 1502 | 10 | 2.19 | 4.56 | 0.001 | CXCR5, BACH2, EOMES, SLC44A2, CDH3, TRAF3, ZFP36L1, ZMIZ1, FOXP1, STAT3 |
| MicroRNA target | | | | | | | |
| hsa_TTGCCAA,MIR-182 | DB_ID:757 | 324 | 6 | 0.47 | 12.68 | 0.0004 | SLC44A2, ZFP36L1, BACH2, ELMO1, EVI5, JAZF1 |
| hsa_CTTTGTA,MIR-524 | DB_ID:802 | 431 | 6 | 0.63 | 9.53 | 0.0007 | ZMIZ1, FOXP1, STAT4, JAZF1, IQGAP1, SLC30A7 |
| hsa_ACACTCC,MIR-122A | DB_ID:719 | 69 | 3 | 0.1 | 29.76 | 0.0007 | BACH2, GALC, IQGAP1 |
| hsa_GTGCCTT,MIR-506 | DB_ID:712 | 712 | 7 | 1.04 | 6.73 | 0.0007 | SLC44A2, ZFP36L1, BACH2, EVI5, STAT3, IQGAP1, SLC30A7 |
| hsa_TGGTGCT,MIR-29A,MIR-29B,MIR-29C | DB_ID:671 | 515 | 6 | 0.75 | 7.98 | 0.0007 | IFI30, ZFP36L1, PITPNM2, BACH2, TNFRSF1A, ZBTB46 |
| hsa_TGAATGT,MIR-181A,MIR-181B,MIR-181C,MIR-181D | DB_ID:669 | 479 | 6 | 0.7 | 8.57 | 0.0007 | ZFP36L1, BACH2, MAPK1, FOXP1, JAZF1, CBLB |
| hsa_TGCCTTA,MIR-124A | DB_ID:811 | 542 | 6 | 0.79 | 7.58 | 0.0007 | BACH2, NDFIP1, STAT3, VMP1, JAZF1, IQGAP1 |
| hsa_ACTGCCT,MIR-34B | DB_ID:831 | 215 | 4 | 0.31 | 12.74 | 0.0019 | PLEKHG5, MAPK1, FOXP1, JAZF1 |
| hsa_TAGCTTT,MIR-9 | DB_ID:759 | 234 | 4 | 0.34 | 11.7 | 0.0023 | ZFP36L1, PITPNM2, BACH2, JAZF1 |
| hsa_CACTGCC,MIR-34A,MIR-34C,MIR-449 | DB_ID:673 | 277 | 4 | 0.4 | 9.89 | 0.0036 | SLC44A2, PLEKHG5, FOXP1, SLC2A4RG |
| Drug associations | | | | | | | |
| Prednisone | DB_ID:PA451100 | 33 | 3 | 0.05 | 62.23 | 0.0002 | IL2RA, CD40, CD86 |
| Immune globulin | DB_ID:PA164754884 | 624 | 7 | 0.91 | 7.68 | 0.0002 | SLAMF7, IL2RA, CD40, CD86, STAT4, FCRL1, TNFRSF1A |
| Nilotinib | DB_ID:PA165958345 | 21 | 2 | 0.03 | 65.2 | 0.0014 | AHI1, CD69 |
| Dasatinib | DB_ID:PA162372878 | 18 | 2 | 0.03 | 76.06 | 0.0014 | CD40, STAT3 |
| Sorafenib | DB_ID:PA7000 | 25 | 2 | 0.04 | 54.76 | 0.0016 | STAT3, MAPK1 |
| Prednisolone | DB_ID:PA451096 | 27 | 2 | 0.04 | 50.71 | 0.0016 | IL2RA, FOXP1 |
| Phenylephrine | DB_ID:PA450935 | 36 | 2 | 0.05 | 38.03 | 0.0026 | DDAH1, MAPK1 |
| Alseroxylon | DB_ID:PA164746411 | 43 | 2 | 0.06 | 31.84 | 0.0031 | CD40, FOXP1 |
| Vincristine | DB_ID:PA451879 | 45 | 2 | 0.07 | 30.42 | 0.0031 | CD40, FOXP1 |
| Cyclophosphamide | DB_ID:PA449165 | 52 | 2 | 0.08 | 26.33 | 0.0038 | CD40, FOXP1 |
| Protein–protein interaction | | | | | | | |
| Hsapiens_Module_531 | DB_ID:531 | 52 | 4 | 0.18 | 21.95 | 0.0009 | TRAF3, TNFSF14, CD40, TNFRSF1A |
| Hsapiens_Module_540 | DB_ID:540 | 7 | 2 | 0.02 | 81.51 | 0.0029 | DDAH1, IQCB1 |
| Hsapiens_Module_863 | DB_ID:863 | 7 | 2 | 0.02 | 81.51 | 0.0029 | STAT3, STAT4 |
| Hsapiens_Module_594 | DB_ID:594 | 42 | 3 | 0.15 | 20.38 | 0.0029 | STAT3, TYK2, STAT4 |
| Hsapiens_Module_408 | DB_ID:408 | 103 | 4 | 0.36 | 11.08 | 0.0029 | NDFIP1, MLANA, TIMMDC1, EPS15L1 |
| Hsapiens_Module_287 | DB_ID:287 | 115 | 4 | 0.4 | 9.92 | 0.0034 | IL2RA, STAT3, TYK2, STAT4 |
| Hsapiens_Module_110 | DB_ID:110 | 212 | 5 | 0.74 | 6.73 | 0.0037 | IL2RA, MAPK1, STAT3, TYK2, STAT4 |
| Hsapiens_Module_203 | DB_ID:203 | 341 | 6 | 1.2 | 5.02 | 0.0043 | TRAF3, TNFSF14, CD40, JAZF1, TNFRSF1A, MALT1 |
| Hsapiens_Module_25 | DB_ID:25 | 1871 | 15 | 6.56 | 2.29 | 0.0052 | AHI1, ELMO1, CD86, STAT4, IQGAP1, CBLB, IFI30, SLC44A2, CDH3, IL2RA, DKKL1, MAPK1, MERTK, STAT3, TYK2 |
| Hsapiens_Module_299 | DB_ID:299 | 257 | 4 | 0.9 | 4.44 | 0.0362 | DKKL1, CD86, IQGAP1, CBLB |

Abbreviations: C, number of reference genes in the category; E, expected number in the category; MS, multiple sclerosis; O, number of genes in the candidate gene list and within the category; R, ratio of enrichment.
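The columns of Table 4 can be related to one another with a short calculation. The following Python sketch is illustrative only and is not the WebGestalt implementation: the function names are hypothetical, and the reference set size and candidate list size are left as parameters because neither is reported in the table. It computes the expected count E, the enrichment ratio R = O/E, an upper-tail hypergeometric P-value, and Benjamini and Hochberg adjusted P-values.

```python
from math import comb


def expected_count(c, n_candidates, n_reference):
    """E: expected number of candidate genes in a category of size c.

    n_candidates and n_reference are assumptions supplied by the caller;
    Table 4 does not report them.
    """
    return c * n_candidates / n_reference


def enrichment_ratio(o, e):
    """R: observed count divided by expected count."""
    return o / e


def hypergeom_upper_tail(o, c, n_candidates, n_reference):
    """P(X >= o) for n_candidates genes drawn without replacement from
    n_reference genes, of which c belong to the category."""
    upper = min(c, n_candidates)
    favourable = sum(
        comb(c, k) * comb(n_reference - c, n_candidates - k)
        for k in range(o, upper + 1)
    )
    return favourable / comb(n_reference, n_candidates)


def benjamini_hochberg(p_values):
    """Benjamini and Hochberg adjusted P-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # NFAT row of Table 4: O = 14, E = 2.73 (E is rounded in the table,
    # so the ratio only approximately matches the reported R of 5.12).
    print(enrichment_ratio(14, 2.73))
```

Because E is rounded to two decimals in the table, ratios recomputed from O and E will agree with the reported R only approximately, most visibly for categories with very small E.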

Figure 1. Graphical distribution of all functional assignments within 43 MS-associated genes (a; 52 439 assignments), and within the nested 43 LD block boundaries (b; 5244 assignments). Functional assignment categories in the legend: 3 prime UTR variant, 5 prime UTR variant, coding sequence variant, feature elongation, feature truncation, frameshift variant, incomplete terminal codon variant, inframe deletion, inframe insertion, initiator codon variant, missense variant, NMD transcript variant, non coding exon variant, splice acceptor variant, splice donor variant, splice region variant, stop gained, stop lost, stop retained variant and synonymous variant.
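The proportions quoted for the LD-block catalog (Figure 1b) follow directly from the per-category counts given in the Results text. The Python sketch below simply tallies those reported counts; the dictionary keys and variable names are hypothetical labels chosen for illustration, and the gene-level total of 52 439 assignments is taken from the text.

```python
# Functional assignments within the 43 LD blocks, as reported in the Results text.
ld_block_counts = {
    "3_prime_UTR": 932,
    "5_prime_UTR": 249,
    "coding_sequence": 59,
    "feature_elongation": 213,
    "feature_truncation": 254,
    "frameshift": 272,
    "inframe_deletion": 3,
    "inframe_insertion": 2,
    "NMD_transcript": 42,
    "noncoding_exon": 957,
    "splice_acceptor": 13,
    "splice_donor": 24,
    "splice_region": 172,
    "stop_gained": 161,
    "stop_lost": 3,
    "synonymous": 423,
    "missense": 1465,
}

# Assignments across the 43 genes (Figure 1a), excluding intronic and
# noncoding transcript annotations, as stated in the text.
GENE_LEVEL_TOTAL = 52439


def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


ld_total = sum(ld_block_counts.values())
splice_total = sum(
    ld_block_counts[k] for k in ("splice_acceptor", "splice_donor", "splice_region")
)
structural_total = sum(
    ld_block_counts[k] for k in ("frameshift", "inframe_deletion", "inframe_insertion")
)
reduction = 100.0 * (1 - ld_total / GENE_LEVEL_TOTAL)

if __name__ == "__main__":
    print("total assignments in LD blocks:", ld_total)
    print("missense %:", round(percent(ld_block_counts["missense"], ld_total), 1))
    print("3 prime UTR %:", round(percent(ld_block_counts["3_prime_UTR"], ld_total), 1))
    print("frameshift %:", round(percent(ld_block_counts["frameshift"], ld_total), 1))
    print("splice variants:", splice_total)
    print("insertions, deletions and frameshifts:", structural_total)
    print("reduction vs gene-level total %:", round(reduction, 1))
```

Summing the categories reproduces the 5244 assignments stated for the LD blocks, and the grouped splice and structural totals correspond to the 209 and 277 quoted in the Discussion.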

SNPs within the LD block boundaries (Supplementary Table 4). The rs201310286 missense variant is rare (global minor allele frequency <0.001) and is predicted to be ‘deleterious’ (SIFT) and ‘possibly damaging’ (PolyPhen). MALT1 regulates the activation of nuclear factor (NF)-κB, a transcription factor for which binding targets are enriched among the MS-associated genes (Table 4). Results suggest the presence of a hierarchical biological relationship between MALT1 and other genes with NF-κB binding sites. In Malt1⁻/⁻ mice, T helper 17 cells demonstrated strong infiltration of the central nervous system.¹⁷ Unfortunately, for 20 MS-associated genes there were either no LD blocks encompassing the risk variant (for example, CDH3, CXCR5 and MLANA) or the blocks contained no assignments other than intronic or noncoding transcript annotations (for example, the 332-bp block spanning ZMIZ1 rs1782645) (Table 2). These 20 genic risk SNPs require further attention.

Several bioinformatics enrichment analyses were conducted in the current study. The primary biological mechanism highlighted by the 63 MS-associated genes is T-cell development, activation and signaling. We investigated three databases, each with distinct pathways (number of pathways: KEGG = 390; Pathway Commons = 1651; WikiPathways = 1018) and pathway names (Supplementary Table 5). Therefore, we utilized multiple databases for analysis and reviewed the genes identified within each enriched pathway. We observed common biological themes among the enriched pathways, for example, TCR signaling in KEGG (genes: MAPK1, CBLB and MALT1) and TCR signaling in CD8⁺ T cells in Pathway Commons (genes: IL2RA, EOMES, MAPK1, CD86, STAT4, TNFRSF1A and MALT1). Across the three pathway databases, TCR signaling processes were enriched. Similarly, the transcription factors overrepresented among the binding targets are also involved in T-cell development (nuclear factor of activated T-cells (NFAT)¹⁸) and activation (NF-κB¹⁷). Furthermore, there was an enrichment of microRNA targets for several microRNAs involved in T-cell clonal expansion (miR-182), activation (miR-181a) and differentiation (miR-29).¹⁹,²⁰ These results also encourage the exploration of other candidate genes with similar transcription factor and microRNA binding targets.

Interestingly, the enrichment analyses suggest an overlap between MS-associated genes and genes contributing to susceptibility for toxoplasmosis and cancer. Toxoplasmosis is a leading cause of death attributable to food-borne illnesses in the United States, but most infected individuals are asymptomatic (www.cdc.gov). A comparison of MS patients with (N = 12) and without (N = 12) toxoplasmosis suggested that infected cases had a less severe disease progression, determined by number of relapses and disability measures, which may be related to increased secretion of IL-10 and transforming growth factor-β and the presence of CD25⁺CD4⁺FoxP3⁺ T cells.²¹ A recent nationwide population-based cohort study in Taiwan observed an increased risk of developing cancer (85%) among MS cases, particularly breast cancer (2.2-fold increased risk).²²

These results complement prior bioinformatics analyses of non-MHC MS risk variants. Assessment of genes near the 52 risk variants identified in 2011 and of the 110 risk variants identified in 2013 suggested that processes involved in T-cell differentiation were overrepresented.⁴,⁵ A protein-interaction-network-based pathway analysis of 137 457 SNPs that were nonsynonymous and potentially deleterious (PolyPhen), located in 5′ or 3′ UTRs, or within transcription factor or histone binding sites in two large genetic MS studies demonstrated that encoded proteins are more likely to be connected within the protein-interaction network than by chance alone.²³ Based on analyses of 79 nominally significant genes within these two data sets, there was an overrepresentation of genes involved in leukocyte activation, apoptosis and positive regulation of macromolecule metabolic process, JAK–STAT signaling, acute myeloid leukemia and TCR signaling using Gene Ontology and KEGG pathways databases.²³ Our bioinformatics and enrichment analyses differ in approach; however, similar immunological processes were highlighted across all enrichment analyses.

Functional variants are likely to be adjacent to the GWAS signal, existing within LD blocks encompassing the risk SNPs. However, regions of strong LD can be large, and an associated SNP can be in perfect LD with SNPs several hundred kilobases away.¹⁴ We utilized LD block boundaries as defined by Gabriel et al.²⁴ and based on data from 267 Northern Europeans. Haplotype blocks inferred by different algorithms are likely to differ in size and coverage given SNP density and allele frequency distributions; however, shared chromosomal regions can increase with higher SNP density and the inclusion of less common variants.²⁵

In summary, we present the first comprehensive catalog of functional candidates for MS to expedite prioritization of SNP

selection for fine-mapping and molecular studies. Application of bioinformatics and computational tools provided insight into the underlying biological pathways and regulatory processes that may contribute to MS onset. The in silico methods employed can be adapted by any investigator to a priori SNP selection or post hoc evaluation of variants established through GWASs. Investigators must keep in mind that databases for bioinformatics analysis are not necessarily complete, and the algorithms employed by various tools may differ. Thus, we recommend the use of multiple bioinformatics tools and available resources. Our results emphasize a role for T-cell dysregulation in MS pathogenesis, and complex gene–gene interactions between STAT3 and PVT1, JAZF1 and ZFP36L1, and between MALT1 and NF-κB target genes.

MATERIALS AND METHODS

Characterization of MS risk variants

The physical location and functional annotation for each of the 110 non-MHC risk variants were characterized using the Genome Reference Consortium GRCh37 (hg19) available in the UCSC Genome Browser²⁶ and Haploreg v2 (2013.02.14; http://www.broadinstitute.org/mammals/haploreg/haploreg.php).⁹ The regulatory capacity of all coding and noncoding variants was also assessed using Haploreg. Haploreg is a bioinformatics resource that includes an extensive library of SNPs (dbSNP 137), motif instances (including position weight matrices collected from TRANSFAC, JASPAR, protein-binding microarray, and ENCODE ChIP-seq experiments),²⁷⁻³⁰ expression quantitative trait loci (GTEx eQTL browser)³¹ and enhancer annotation (Roadmap Epigenome Mapping Consortium).³²,³³ Allele frequencies are based on 1000 Genomes Phase 1 low-coverage data.¹⁰ Haploreg also performs enhancer and DNase enrichment analyses using a binomial test, comparing the coverage of enhancers, strong enhancers and DNase sites of the query list of variants to all variants.

Identification of functional candidates within MS-associated genes

For the 63 genic risk SNPs, all variation within 50 kb upstream or downstream of each gene boundary was extracted from the 1000 Genomes Phase 3 integrated variant set (March 2012). This data set is based on 1000 Genomes Project sequence data freezes from November 2010 (low-coverage whole-genome) and May 2011 (high-coverage exome) that was revised in February and March 2012 to remove sequencing artifacts for 1092 individuals (246 Africans, 181 Asians and 379 Europeans).¹⁰ This autosomal variant set (36 648 992 SNPs; 1 380 736 INDELs; 13 805 structural variants) did not include sites that were monomorphic. LD blocks spanning the 63 genic risk SNPs were determined in 267 Northern Europeans (85 Utah residents with Northern and Western European ancestry, 93 Finnish in Finland and 89 British in England and Scotland). Blocks were constructed using the approach suggested by Gabriel et al.,²⁴ as implemented in PLINK v1.07.³⁴ Functional annotation for all variants within the MS-associated genes was downloaded from the 1000 Genomes Browser (1000 Genomes release 13 December 2012, EBI). Information for variation was not restricted, and thus included annotation for variants for which there were no genotypic data in the phase 3 integrated variant set. We included all functional assignments (Supplementary Table 2) for each SNP; for example, SNPs within genes with multiple transcripts may also have varying function dependent on the transcript. We did not consider SNPs that were only intronic or noncoding transcript variants across all gene transcripts. For missense variants, impact on protein function based on in silico tools was also extracted. Predictive values from both SIFT¹¹ (‘tolerated’ to ‘deleterious’ assignments) and PolyPhen¹² (‘benign’ to ‘probably damaging’ assignments) determined the potential pathogenicity of missense variants. SIFT values of ≤0.05 are considered ‘deleterious’, and PolyPhen values of >0.905 are ‘probably damaging’. We subsequently extracted all functional annotation within the LD block boundaries for the 63 genic MS risk SNPs.

Bioinformatics enrichment analyses of MS-associated genes

Comprehensive bioinformatics analyses for the 63 MS-associated genes were conducted using Web-based Gene Set Analysis Toolkit (WebGestalt; updated January 2013; http://bioinfo.vanderbilt.edu/webgestalt/) that incorporates publicly available biological and functional data for enrichment analysis.¹³ It includes several unique features compared with other bioinformatic tools (Supplementary Table 6).³⁵ WebGestalt includes biological pathways (KEGG,³⁶ updated 21 March 2011; Pathway Commons,³⁷ updated 11 November 2012; WikiPathways,³⁸ updated 11 November 2012), regulatory modules (Molecular signatures database;³⁹ updated 11 November 2012), and disease and drug association modules (Pharmacogenetics Knowledge Base,⁴⁰ updated 26 January 2013; Gene List Automatically Derived For You,⁴¹ updated 26 January 2013). Enrichment analyses were conducted using hypergeometric tests. P-values were corrected for multiple testing based on the false discovery rate using the Benjamini and Hochberg procedure.¹³ We considered all human genes as the appropriate reference for comparison. The top 10 enriched classes for each bioinformatics analysis were reported.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

ACKNOWLEDGEMENTS

We thank Dr Jing Wang and Gary Artim for technical assistance. This study was supported through NIH R01NS049510, NIH R01AI076544 and NIH R01ES017080.

REFERENCES

1 Oksenberg JR, Barcellos LF. Multiple sclerosis genetics: leaving no stone unturned. Genes Immun 2005; 6: 375–387.
2 Hawkes CH, Macgregor AJ. Twin studies and the heritability of MS: a conclusion. Mult Scler 2009; 15: 661–667.
3 Hemminki K, Li X, Sundquist J, Hillert J, Sundquist K. Risk for multiple sclerosis in relatives and spouses of patients diagnosed with autoimmune and related conditions. Neurogenetics 2009; 10: 5–11.
4 Sawcer S, Hellenthal G, Pirinen M, Spencer CC, Patsopoulos NA, Moutsianas L et al. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 2011; 476: 214–219.
5 Beecham AH, Patsopoulos NA, Xifara DK, Davis MF, Kemppinen A, Cotsapas C et al. Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat Genet 2013; 45: 1353–1360.
6 Patsopoulos NA, Barcellos LF, Hintzen RQ, Schaefer C, van Duijn CM, Noble JA et al. Fine-mapping the genetic association of the major histocompatibility complex in multiple sclerosis: HLA and non-HLA effects. PLoS Genet 2013; 9: e1003926.
7 Gregory SG, Schmidt S, Seth P, Oksenberg JR, Hart J, Prokop A et al. Interleukin 7 receptor alpha chain (IL7R) shows allelic and functional association with multiple sclerosis. Nat Genet 2007; 39: 1083–1091.
8 Gregory AP, Dendrou CA, Attfield KE, Haghikia A, Xifara DK, Butter F et al. TNF receptor 1 genetic risk mirrors outcome of anti-TNF therapy in multiple sclerosis. Nature 2012; 488: 508–511.
9 Ward LD, Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res 2011; 40(Database issue): D930–D934.
10 Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE et al. An integrated map of genetic variation from 1,092 human genomes. Nature 2012; 491: 56–65.
11 Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 2009; 4: 1073–1081.
12 Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res 2002; 30: 3894–3900.
13 Zhang B, Kirov S, Snoddy J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res 2005; 33(Web Server issue): W741–W748.
14 Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M. Linking disease associations with regulatory information in the human genome. Genome Res 2012; 22: 1748–1759.
15 Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ et al. Finding the missing heritability of complex diseases. Nature 2009; 461: 747–753.
16 Cooper GM, Shendure J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet 2011; 12: 628–640.
17 Brustle A, Brenner D, Knobbe CB, Lang PA, Virtanen C, Hershenfield BM et al. The NF-kappaB regulator MALT1 determines the encephalitogenic potential of Th17 cells. J Clin Invest 2012; 122: 4698–4709.
18 Macian F. NFAT proteins: key regulators of T-cell development and function. Nat Rev Immunol 2005; 5: 472–484.

19 Stittrich AB, Haftmann C, Sgouroudis E, Kuhl AA, Hegazy AN, Panse I et al. The microRNA miR-182 is induced by IL-2 and promotes clonal expansion of activated helper T lymphocytes. Nat Immunol 2010; 11: 1057–1062.
20 Baumjohann D, Ansel KM. MicroRNA-mediated regulation of T helper cell differentiation and plasticity. Nat Rev Immunol 2013; 13: 666–678.
21 Correale J, Farez M. Association between parasite infection and immune responses in multiple sclerosis. Ann Neurol 2007; 61: 97–108.
22 Sun LM, Lin CL, Chung CJ, Liang JA, Sung FC, Kao CH. Increased breast cancer risk for patients with multiple sclerosis: a nationwide population-based cohort study. Eur J Neurol 2014; 22: 238–244.
23 IMSGC. Network-based multiple sclerosis pathway analysis with GWAS data from 15,000 cases and 30,000 controls. Am J Hum Genet 2013; 92: 854–865.
24 Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B et al. The structure of haplotype blocks in the human genome. Science 2002; 296: 2225–2229.
25 Indap AR, Marth GT, Struble CA, Tonellato P, Olivier M. Analysis of concordance of different haplotype block partitioning algorithms. BMC Bioinformatics 2005; 6: 303.
26 Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM et al. The human genome browser at UCSC. Genome Res 2002; 12: 996–1006.
27 Berger MF, Philippakis AA, Qureshi AM, He FS, Estep 3rd PW, Bulyk ML. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat Biotechnol 2006; 24: 1429–1435.
28 Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Pena-Castillo L et al. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 2008; 133: 1266–1276.
29 Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA et al. Diversity and complexity in DNA recognition by transcription factors. Science 2009; 324: 1720–1723.
30 Kheradpour P, Kellis M. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res 2013; 42: 2976–2987.
31 GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet 2013; 45: 580–585.
32 Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol 2010; 28: 1045–1048.
33 Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 2011; 473: 43–49.
34 Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559–575.
35 Wang J, Duncan D, Shi Z, Zhang B. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res 2013; 41(Web Server issue): W77–W83.
36 Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res 2004; 32(Database issue): D277–D280.
37 Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O, Anwar N et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res 2011; 39(Database issue): D685–D690.
38 Kelder T, van Iersel MP, Hanspers K, Kutmon M, Conklin BR, Evelo CT et al. WikiPathways: building research communities on biological pathways. Nucleic Acids Res 2011; 40(Database issue): D1301–D1307.
39 Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics 2011; 27: 1739–1740.
40 Hewett M, Oliver DE, Rubin DL, Easton KL, Stuart JM, Altman RB et al. PharmGKB: the Pharmacogenetics Knowledge Base. Nucleic Acids Res 2002; 30: 163–165.
41 Jourquin J, Duncan D, Shi Z, Zhang B. GLAD4U: deriving and prioritizing gene lists from PubMed literature. BMC Genomics 2012; 13(Suppl 8): S20.

Supplementary Information accompanies this paper on Genes and Immunity website (http://www.nature.com/gene)
