and Immunity (2011) 12, 191–198 & 2011 Macmillan Publishers Limited All rights reserved 1466-4879/11 www.nature.com/gene

ORIGINAL ARTICLE Exploring the CLEC16A reveals a MS-associated variant with correlation to the relative expression of CLEC16A isoforms in thymus

I-L Mero1,2, M Ban3,A˚ R Lorentzen2,4, C Smestad1, EG Celius1, H Sæther2, H Saeedi2, MK Viken2, B Skinningsrud5, DE Undlien5, J Aarseth6, K-M Myhr6, S Granum7, A Spurkland7, S Sawcer3, A Compston3, BA Lie2,8 and HF Harbo1,4,8 1Department of Neurology, Oslo University Hospital, Oslo, Norway; 2Institute of Immunology, Oslo University Hospital, Oslo, Norway; 3Department of Clinical Neurosciences, University of Cambridge, Addenbrooke’s, Cambridge, UK; 4Institute of Clinical Medicine, University of Oslo, Oslo, Norway; 5Department and Institute of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway; 6The Norwegian registry and biobank, Department of Neurology, Haukeland University Hospital, Bergen, Norway and 7Institute of Basal Medical Science, University of Oslo, Oslo, Norway

Genomewide association studies have implicated the CLEC16A gene in several autoimmune diseases, including multiple sclerosis (MS) and . However, the most associated single-nucleotide polymorphism (SNP) varies, and causal variants are still to be defined. In MS, two SNPs in partial linkage disequilibrium with each other, rs6498169 and rs12708716, have been validated at genomewide significance level. To explore the CLEC16A association in MS in more detail, we genotyped 57 SNPs in 807 Norwegian MS patients and 1027 Norwegian controls. Six highly associated SNPs emerged and were then replicated in two large independent sample sets (Norwegian and British), together including 1153 MS trios, 2308 MS patients and 4044 healthy controls. In combined analyses, SNP rs12708716 gave the strongest association signal in MS (P ¼ 5.3 Â 10À8, odds ratio 1.18, 95% confidence interval ¼ 1.11–1.25), and was found to be superior to the other SNP associations in conditional logistic regression analyses. Expression analysis revealed that rs12708716 genotype was significantly associated with the relative expression levels of two different CLEC16A transcripts in thymus (P ¼ 0.004), but not in blood, possibly implying a thymus- or cell-specific splice regulation. Genes and Immunity (2011) 12, 191–198; doi:10.1038/gene.2010.59; published online 23 December 2010

Keywords: CLEC16A; MS; finemapping; gene expression

Introduction nity likely. The CLEC16A belongs to the C-type lectin family whose immune functions include differ- Genomewide association studies (GWAS) have demon- entiation of self versus non-self glycoproteins and the strated that single-nucleotide polymorphisms (SNPs) induction of the appropriate immune response.10 CLE- in the C-type lectin domain family 16, member A C16A is an atypical C-type lectin carrying only a short C- (CLEC16A) gene at 16p13, influence type lectin domain unable to bind carbohydrates, and the susceptibility in both multiple sclerosis (MS) and type 1 function of the protein is still unknown.9 Limited data diabetes (T1D).1–3 Multiple replication and candidate exists on CLEC16A isoforms, and despite its consider- gene studies have verified this region as being disease able length spanning 200 000 bp, only three transcripts associated in T1D and MS, and association with are annotated at the University of California Santa Cruz CLEC16A has also been found in anti-cyclic citrullinated (UCSC) Genome Browser on Human Feb. 2009 peptide (CCP) negative rheumatoid arthritis, juvenile (GRCh37/hg19) Assembly (http://genome.ucsc.edu). idiopathic arthritis, Addison’s disease and NOD2-nega- All the strongest SNP associations found so far are tive Crohn’s disease.4–9 The association of CLEC16A situated in seemingly noncoding regions, and their SNPs across multiple immune-mediated diseases and the biological consequences are largely unknown. observation that the gene is almost uniquely expressed The original CLEC16A SNP association in MS, found on immune cells, make a common effect on autoimmu- by the International MS Genetics Consortium (IMSGC) GWAS in 2007, was rs6498169 in intron 22.2 This SNP association has been replicated and confirmed at Correspondence: Dr I-L Mero, Department of Neurology, Oslo genomewide significance level in MS (Po5 Â 10À7).6,11 University Hospital, Ulleva˚l, N-0407 Oslo, Norway. In T1D, the strongest association so far has been found E-mail: [email protected] 9 8These authors contributed equally to this work. for four SNPs situated in intron 19 and 22; rs12708716, 1 Received 3 May 2010; revised 29 July 2010; accepted 3 August 2010; rs725613, rs2903692 and rs17673553, which are all in published online 23 December 2010 fairly strong linkage disequilibrium (LD) (r2 ¼ 0.66–1.00; Exploring the CLEC16A gene in MS I-L Mero et al 192 haploview v 4.1, hapmap release 21, CEU sample set MS-associated rs12708716, (ii) test SNPs previously (Utah residents with Northern and Western European reported in autoimmune disease and (iii) tag the entire ancestry from the Centre de’Etude du Polymorphism CLEC16A gene. The panel was genotyped in a Norwe- Human (CEPH) collection).12,13 The LD between the MS- gian sample set, which after removal of three individuals associated rs6498169 and the T1D SNPs is modest with more than 50% missing genotypes, included 804 MS (r2 ¼ 0.18–0.26),12,13 and association with rs6498169 has cases and 1027 controls. Five of 57 markers failed in not been found in T1D.8,14 In contrast, the T1D-associated the screening phase: rs794423 and rs12927773 did not SNPs rs12708716 and rs725613 have indeed also shown genotype because of assay failure, whereas rs45537935, strong association with MS in an IMSGC study including rs7204935 and rs3177383 were genotyped as mono- more than 2000 trio families, 5000 cases and 10 000 morphic. Genotype success rates were above 95% unrelated controls (P ¼ 1.6 Â 10À15), and in a Sardinian for all remaining markers, and all SNPs were in study (P ¼ 6.4 Â 10À5), respectively.4,15 Hardy–Weinberg Equilibrium (P40.001) in patients The reported CLEC16A SNP associations also vary for and controls. Association results and inclusion criteria other autoimmune diseases. The originally described MS- for all SNPs in the screening phase can be found associated SNP rs6498169 has been shown to be associated in Supplementary Table 1. with anti-CCP-negative rheumatoid arthritis and juvenile Five markers were associated with a Po0.001 in the idiopathic arthritis, but not anti-CCP-positive rheumatoid screening phase (Table 1). All were included in the study arthritis, Addison’s disease and NOD2-negative Crohn’s as part of the tagging SNP set, except rs6498169, disease.5,8,16 Vice versa the T1D SNPs rs2903692 and previously reported to be associated with MS and rs12708716 have shown association with NOD2-negative anti-CCP-negative rheumatoid arthritis.2,6,8,11 The most Crohn’s and Addison’s disease, respectively.5,8 significantly associated SNP, rs12923849 has previously Thus to date, MS is the only disease described to be showed some degree of association with T1D.1 The other associated with both rs6498169 and the LD block three top hits in our screening phase; rs9934231, containing the four T1D associated SNPs. The only rs6498168 and rs7206912 have not previously been fine-mapping study of CLEC16A performed in MS so reported to be associated in MS or other autoimmune far is an Australian study (n ¼ 1146 cases and n ¼ 1309 diseases. Because of failure of genotyping rs6498168 controls) of several MS risk genes, where 44 tagging in the replication phase, it was replaced by a SNP in SNPs in the CLEC16A gene pointed towards rs6498169, complete LD (r2 ¼ 1), rs8060411, and this SNP was for the most significantly associated CLEC16A SNP in the completeness, also tested in the screening panel. original IMSGC GWAS.2,17 It does not appear from the Neither the MS-associated rs12708716 (P ¼ 0.006) nor paper whether rs12708716 was tested in this study. any of the SNPs selected on the basis of LD with this In order to further explore the CLEC16A SNP associa- variant (that is, r240.6) were among these top hits. tion in MS, we conducted fine mapping in two phases, However, several of them showed significant association screening a 57 SNP panel in a Norwegian MS patient- in the screening (Po0.01) (Supplementary Table 1). control sample set in the first phase and replicating the As the magnitude of their association was negligibly top hit findings in two large independent Norwegian different from rs12708716, we chose to only include the and British sample sets in the second phase. Further- already confirmed associated SNP rs12708716 in our more, we studied the expression levels of different replication. CLEC16A transcripts in thymus and blood in order to To evaluate the correlation of the six SNPs chosen for correlate these with the genotyping results. replication, we looked at their LD pattern in the screen- ing sample set (Supplementary Figure 1). Rs12923849, rs9934231 and the previously reported rs12708716 were Results relatively uncorrelated with the other top hit SNPs from the screening (r2p0.29), whereas rs6498168, rs7206912 Screening phase and rs6498169 showed correlation with each other Our screening phase comprised a 57 SNP panel (r240.70). All six SNPs were in strong LD as measured specifically designed to (i) explore SNPs in LD with the by D0 (D040.65). In general, strong D0-LD, but limited

Table 1 Results of the top five MS-associated SNPs and the reference SNP rs12708716 in the screening phase

SNP id Risk allele RAF, n ¼ 804 cases, n ¼ 1027 P-value OR GSR HWE controls cases (%)/controls (%) (95% CI) (%) P-valuea

rs12923849 G 88.1/83.1 2.14EÀ05 1.51 (1.24–1.82) 99.0 0.65 rs7206912 G 46.8/40.5 0.0002 1.29 (1.13–1.47) 99.5 0.3 rs6498168 T 40.9/34.8 0.0002 1.29 (1.13–1.48) 98.6 0.68 rs9934231 C 61.0/55.0 0.0003 1.28 (1.12–1.46) 99.0 0.31 rs6498169 G 41.0/35.4 0.0006 1.27 (1.11–1.45) 99.4 0.12 rs8060411b G 41.4/35.9 0.0008 1.26 (1.10–1.45) 96.9 1.0 rs12708716 A 70.4/66.1 0.006 1.22 (1.06–1.40) 99.4 0.58

Abbreviations: GSR, genotype success rate; HWE, Hardy–Weinberg Equilibrium; OR (95% CI), odds ratio 95% confidence interval; RAF, risk allele frequency; SNP, single-nucleotide polymorphism. aThe measurements of deviation from the HWE are given for controls only. bSNP rs8060411 was typed in the screening sample set as a replacement for SNP rs6498168, assay of which failed in the replication phase.

Genes and Immunity Exploring the CLEC16A gene in MS I-L Mero et al 193 r2-LD was observed across the entire region screened confidence intervals ¼ 1.08–1.21). Significant population (Supplementary Figure 1). heterogeneity (Po0.05) was seen for all SNPs except the most associated rs12708716, and to adjust for this Replication and combined phase all combined analyses were calculated with nationality Six SNPs were genotyped in the Norwegian and the as confounder. British replication sample sets. All SNPs had a genotype success rates495% and were in Hardy–Weinberg Equili- LD, haplotype and logistic regression calculations brium (P40.001) in patients and controls. No Mendelian The correlation between the most significantly associated errors were found within the genotyped trios. Removal SNP rs12708716 and the other five replicated top hit of individuals (15 Norwegian and 19 British) and trios SNPs was modest (r2o0.4; Figure 1a), but LD at the 0 0 (n ¼ 40) with less than 50% genotype success rates, D level was high (D 40.65; Figure 1b). The relationship left 1201 Norwegian cases, 2054 Norwegian controls, between the six top hit SNPs was therefore further 1113 British trios, 1097 UK cases and 1966 UK controls for explored by haplotype analyses (Table 3), which inter- further analyses. The results from the replication and estingly showed that only six haplotypes were com- combined analyses of the screening and replication monly seen (45%) in the tested populations. Among phases are shown in Table 2. In the Norwegian these six haplotypes there was only one significant risk replication sample set, all SNPs except rs12923849 were haplotype carrying all the risk alleles of the six replicated replicated to be associated at the 5% significance level, SNPs, and one highly associated protective haplotype with rs7296012 being the most strongly associated harbouring all the protective alleles. One other haplotype (P ¼ 2.3 Â 10À4). In the UK replication sample set, only identical for all SNPs except carrying the risk allele of rs12708716 (P ¼ 4.5 Â 10À4) and rs12923849 (P ¼ 0.015) SNP rs12923849, showed a protective effect (P ¼ 0.02), were significantly associated in the trios and case which would speak against this SNP being a causal controls combined. The association with rs12708716 variant. Among the not significantly associated haplo- had previously been found for the entire UK sample types, were haplotypes that carry either the risk or set in an IMSGC study.4 However, for all the other SNPs protective allele for rs12708716 and rs9934231. They were included in the replication, the trends of association also seen to always carry all nonrisk alleles of the three in the British samples were in the same direction as for SNPs mapping to one LD block; rs7260912, rs806411 and the two Norwegian sample sets. rs6498169. Lastly, we performed logistic regression Combining the screening and replication data includ- analyses on all top hit SNPs to further explore the ing genotypes of 1113 trios, 3102 MS patients and 5047 dependency of these SNP associations. The other five top controls (Table 2) collected in Norway and the UK, the hit SNPs lost their significant association when con- strongest association was seen for SNP rs12708716 ditioning on rs12708716, whereas rs12708716 remained (P ¼ 5.3 Â 10À8, P corrected ¼ 3.2 Â 10À7, odds ratio ¼ 1.18, highly significant when conditioning on the other top hit 95% confidence intervals ¼ 1.11–1.25), which correlates SNPs (Table 4). These data suggest that rs12708716 is with earlier reports in MS and T1D.4,9 The other the best candidate for being the causal variant in MS. SNPs also reached highly significant P-values on their own, including SNP rs7206912, not previously Expression analyses demonstrated to be disease associated (P ¼ 3.4 Â 10À6, We analysed CLEC16A RNA expression in 42 thymic P corrected ¼ 2.1 Â10À5, odds ratio ¼ 1.14, 95% tissue samples, and in 24 peripheral whole-blood

Table 2 Association testing of the SNPs selected for replication shown in the individual replication cohorts and the total sample set including the screening and replication cohorts

SNP id Risk allele Norwegian cases and UK cases and UK trio families, Screening and replication controls, n ¼ 1201 cases, controls, n ¼ 1097 cases, n ¼ 1113 combined, n ¼ 3102 cases, n ¼ 2054 controls n ¼ 1966 controlsa n ¼ 5047 controls, n ¼ 1113 trios

RAF case/ P-value OR RAF (%) P-value OR RAF (%) P-value OR P-valueb OR controls (95% CI) case/controls (95% CI) trans/untrans (95% CI) (95% CI) (%) (%) (%) rs12708716c A 70.6/66.7 0.001 1.20 (1.07–1.34) 68.7/64.9 0.003 1.19 (1.06–1.33) 69.0/65.1 0.006 1.20 (1.05–1.36) 5.3EÀ08 1.18 (1.11–1.25) rs7206912 G 46.0/41.3 0.0002 1.21 (1.09–1.34) 39.1/36.7 0.07 1.11 (0.99–1.23) 40.5/38.5 0.19 1.09 (0.96–1.23) 3.4EÀ06 1.14 (1.08–1.21) rs6498169d G 40.6/36.7 0.002 1.18 (1.06–1.31) 36.0/33.8 0.09 1.10 (0.99–1.23) 37.4/35.2 0.13 1.10 (0.97–1.25) 2.4EÀ05 1.13 (1.07–1.20) rs9934231 C 58.0/55.2 0.03 1.12 (1.01–1.24) 55.0/52.1 0.03 1.12 (1.01–1.25) 55.0/53.3 0.3 1.07 (0.94–1.21) 4.6EÀ05 1.12 (1.06–1.19) rs8060411 G 40.8/37.4 0.008 1.15 (1.04–1.28) 36.9/34.4 0.04 1.12 (1.00–1.25) 37.6/35.8 0.25 1.08 (0.95–1.22) 5.6EÀ05 1.12 (1.06–1.19) rs12923849 G 85.9/85.2 0.49 1.05 (0.91–1.22) 84.0/82.4 0.12 1.12 (0.97–1.29) 85.3/82.3 0.007 1.26 (1.07–1.48) 0.0001 1.16 (1.08–1.26)

Abbreviations: CI, confidence interval; RAF, risk allele frequency in cases/controls and transmitted/untransmitted (trios); SNP, single- nucleotide polymorphism. aThe UK control frequencies for SNP rs12923849, rs9934231 and rs6498169 were obtained through the Wellcome Trust Case Control Consortium (WTCCC) website (www.wtccc.org.uk). bThe combined P-value accounts for significant population heterogeneity seen for SNPs rs 7206912, rs9934231, rs6498169, rs800411 and is calculated by setting nationality to confounder. cGenotype data for rs12708716 was already available and published for 1029 Norwegian controls8 and for the UK sample set.4 dGenotype data for rs6498169 was already available and published for 1350 UK cases and 1966 controls.2

Genes and Immunity Exploring the CLEC16A gene in MS I-L Mero et al 194 rs9934231 rs8060411 rs7206912 rs6498169 rs9934231 rs8060411 rs7206912 rs6498169 rs12923849 rs12708716 rs12923849 rs12708716

12 3456 123456

98 67 89 92 99 22 27 22 73 82 96 83 92 94 35 33 28 87 81 85 91 7 41 23 90 84 10 33 89 8

Figure 1 LD plots of the SNPs in the replication phase on the basis of genotypes from all sample sets included in the combined analysis. (a)D0-LD. (b)R2-LD.

Table 3 Haplotype analyses for SNPs rs12923849, rs9934231, homozygotes for the risk allele (peripheral blood n ¼ 14, rs12708716, rs8060411, rs7206912 and rs6498169 based on the thymus n ¼ 17) and carriers of the protective allele combined screening and replication genotype data (peripheral blood n ¼ 8, thymus n ¼ 21). The rationale for this division was to increase statistical power as few Haplotypea Haplotype frequencyb P-valuec OR thymus samples (n ¼ 6) and no blood samples were (95% CI) homozygotes for the protective allele. No association Cases/ Trios (%) with rs12708716 genotype was found looking at the controls trans/ expression levels of the assay comprising all isoforms nor (%) untrans for the short and long isoform separately. However, we found that the relative expression of the short and long G-C-A-G-G-G 35.3/32.0 34.7/31.7 7.9EÀ05 1.07 (0.96–1.19) A-G-G-C-C-A 12.2/14.6 13.5/16.2 1.8EÀ05 0.84 (0.75–0.96) isoform was significantly associated with rs12708716 G-G-G-C-C-A 10.6/11.8 11.6/12.0 0.02 0.90 (0.79–1.02) genotype in the thymus samples (P ¼ 0.004), where a G-C-A-C-C-Ad 10.4/10.6 9.9/10.1 0.81 1.0 higher relative proportion of the shorter isoform versus G-C-G-C-C-A 5.5/5.7 5.1/5.4 0.59 0.96 (0.82–1.14) the long isoform was found for the individuals harbour- G-G-A-C-C-A 15.2/16.0 17.3/15.7 0.69 0.97 (0.86–1.10) ing the risk genotype AA. As we compared the relative

CTs of the two isoform assays directly and not through a Abbreviations: OR (95% CI), odds ratio (95% confidence interval); standard curve, we cannot tell whether this relationship Trans/untrans, frequency of transmitted/untransmitted allele. is because of an increase in the short isoform or a aThe alleles are given for rs12923849, rs9934231, rs12708716, decrease in the long isoform. The same relationship was rs8060411, rs7206912 and rs6498169, respectively. not found for the peripheral blood samples (P ¼ 0.76) bHaplotype minor frequency threshold is set to 5%. (Figure 2). cThe P-values account for population heterogeneity and are calculated by setting nationality to confounder. All P-values are noncorrected for mulitple testing. dG-C-A-C-C-A was set to reference haplotype. Discussion The results of our fine-mapping analysis of CLEC16A in a large sample set including 1113 MS trios, 3102 MS samples from healthy control subjects. Not knowing patients and 5047 healthy controls from Norway and the which isoforms to expect in our thymus samples, we first UK, indicated that the strongest CLEC16A SNP associa- used an assay covering all known transcripts, and found tion in MS was the previously reported MS-associated that CLEC16A transcripts were abundantly expressed SNP rs12708716. Association at the other SNPs tested both in thymic tissue and in peripheral blood. We went seemed to be related to LD with rs12708716, with on to test the expression of two CLEC16A transcripts; the risk allele from each of the other SNPs occurring on comprising a short (21 exons) and a full-length protein the same haplotype as the risk allele of rs12708716. variant (24 exons) in the thymus tissue and peripheral Logistic regression showed that rs12708716 withstood blood samples in view of their rs12708716 genotype. conditioning on the five other top hit SNPs. Likewise the A third transcript is also described in Genome Browser other SNPs lost significance conditioning on rs12708716. (http://genome.ucsc.edu), however, this comprises Altogether, among the SNPs tested the MS-asso- only four exons, and was not further investigated. ciated SNP rs12708716 appeared to exert the superior Four thymus samples and two peripheral blood samples association. 0 were excluded because of cycle threshold (CT) s.d.40.1. Haplotype analysis and LD-D showed that there had The remaining samples were divided into two groups: been little historical recombination on the main haplo-

Genes and Immunity Exploring the CLEC16A gene in MS I-L Mero et al 195 Table 4 Conditional logistic regression analyses of the SNPs in the replication phase

SNP Combined P cond P cond P cond P cond P cond P cond P-value rs12708716 rs7206912 rs6498189 rs9934231 rs8060411 rs12923849 rs12708716 5.3EÀ08 NA 0.0009 0.0002 0.002 0.0002 0.004 rs7206912 3.4EÀ06 0.04 NA 1 0.07 0.11 0.0008 rs6498169 2.4EÀ05 0.06 1 NA 0.09 1 0.002 rs9934231 4.6EÀ05 0.17 0.06 0.02 NA 0.13 0.01 rs8060411 5.6EÀ05 0.14 0.6 1 0.03 NA 0.0008 rs12923849 0.0001 0.06 0.01 0.005 0.08 0.06 NA

Abbreviations: NA, not applicable; P cond, P-value of logistic regression analysis conditioning on the respective SNP; SNP, single-nucleotide polymorphism. Logistic regression testing was performed setting haplotype frequency threshold to 0.05. The P-values account for population heterogeneity and are calculated by setting nationality to confounder. All P-values are noncorrected for multiple testing.

strong LD with this SNP. Interestingly, Todd and colleagues have performed extensive sequencing on the CLEC16A gene region, and found neither non-synon- ymous or rare variants within the gene, nor any variant in the neighbouring genes that overcome the association of the CLEC16A SNP rs12708716 in T1D.9,18 Both up- and downstream of the CLEC16A gene are other interesting candidate genes for autoimmune disease, including the MHC2 transcription factor CIITA and the cytokine signalling suppressor SOCS1 gene (http://genome.ucsc.edu/). However, CLEC16A lies within its own LD block, and little D0 LD is seen with these two candidate genes on either side (www.hapmap. org). Thus, the disease association signal picked up by GWAS is likely to come from the CLEC16A gene itself. In our screening panel, we genotyped one SNP down- stream of SOCS1, rs243329 (P ¼ 0.03), which has been found associated with T1D, and one SNP in CIITA, rs8048002 (P ¼ 0.71), which has shown association with Addison’s disease.7,9 Both SNPs should based on low LD (D0o0.35, r2o0.1), be independent of the CLEC16A- associated SNPs found in MS (see Supplementary Figure 1). Still, evidence of a role for SOCS1 and CIITA in MS is emerging. Recently, Zuvich et al.19 published a study showing association of a polymorphism in the SOCS1 gene associated in MS. Another study by Bronson et al.20 showed association between MS susceptibility and variation in CIITA in patients with the MS predisposing HLA-DRB1*1501 allele. Even if these associations are apparently independent of CLEC16A because of very limited LD between the associated polymorphisms in Figure 2 Correlation between rs12708716 genotype and the relative 19,20 distribution of the short- and long isoform. (a) peripheral blood and either gene, polymorphisms within the CLEC16A

(b) Thymus. The y-axis represents (CT mean short isoform/CT mean gene could potentially affect expression levels of SOCS1 long isoform). CT is inversely related to quantity, thus, there is a or CIITA,orvice versa. Thus, a role of all three genes higher relative proportion of the short isoform in thymus samples in MS should be further investigated. with the risk genotype AA compared with AG and GG. To date, only limited expression data on CLEC16A has No peripheral blood samples harboured the rarest genotype GG. been published. CLEC16A encodes a suggested immune P-values were calculated by Mann–Whitney U-test. receptor, and shows association with multiple autoim- mune diseases. This could mean that mutations in type carrying the rs12708716 risk allele. However, the r2 the gene cause a general autoimmune dysfunction. was relatively low because of frequency differences As the thymus is relevant for the establishment of central between the associated alleles, providing sufficient tolerance, it is relevant to study the expression power to demonstrate primary association to of CLEC16A in thymic tissue. We found association rs12708716 by logistic regression analyses. The between rs12708716 genotype and the relative expression rs12708716 SNP and the four other SNPs comprising of a short (21 exons) and long (24 exons) CLEC16A the risk haplotype identified in our study are non-coding transcript in thymus. In addition to fewer exons (lacks SNPs located in intron 10, 19, 20 and 22. Among these, exon 22–24), the short transcript contains a 48 bp deletion rs12708716 is the best candidate for being the causal not found in the long transcript comprising the exon variant even though we cannot rule out variants in 10–11 boundary (http://genome.ucsc.edu). rs12708716 is

Genes and Immunity Exploring the CLEC16A gene in MS I-L Mero et al 196 situated in intron 19, whereas rs9934231, rs8060411, Table 5 Demographic information of patient sample sets rs7206912 and rs6498169, which are also on the described MS-associated haplotype are in intron 10, 20 or 22. Thus, Clinical variable Norway (n)UK(n) they cluster in some proximity to regions where differences are seen in the transcripts. Hakonarson M:F ratio 2.7:1 (1814) 2.9:1 (1691) et al.1 have previously studied CLEC16A expression in Mean age of onset 33.4 (1672) 29.2 (1539) PPMS (%) 14.4 (1742) 11.0 (1676) NK cells in view of the T1D-associated SNP rs2903692 Mean EDSS 4.3 (633) 4.7 (932) and found that homozygotes for the A allele showed a Mean disease durationa 15.0 (624) 11.0 (932) trend towards increased CLEC16A mRNA expression (KIAA0350/CLEC16A Affymetrix probe set 231221 Abbreviations: EDSS, expanded Disability Status Scale; M:F ratio, (Affymetrix, Santa Clara, CA, USA) ). Interestingly, this male: female ratio; n, total number of patients for whom the clinical 2 SNP rs2903692 is in strong LD (r ¼ 0.88) with variable is available; PPMS, primary progressive MS. rs12708716. We saw no such direct relationship with aMean disease duration is measured from the year of disease onset genotype and expression level of CLEC16A transcript to the last clinical assessment by EDSS. neither in thymus nor whole blood, nor did we find any effect on relative isoform expression in whole blood as seen in the thymus samples. Of note, thymus samples consist mainly of thymocytes, whereas peripheral whole- patient sample sets is given in Table 5. In the expression blood samples are very heterogenous in cellular compo- analyses, we used peripheral whole blood samples from sition. It is possible that with more detailed analysis of healthy bone marrow donors, and thymic tissue samples CLEC16A expression in subsets of white blood cells, the obtained from 42 Norwegian children under the age of same relationship between CLEC16A transcripts and 13 undergoing corrective cardiac surgery at Rikshospita- genotypes would be observed. Our results could there- let. Whole blood was collected on Tempus tubes fore speak for thymus-specific and or cell-specific (Applied Biosystems, Foster city, CA, USA), and thymic regulatory effect on splicing mechanisms being picked tissue samples were collected in RNAlater solution up by rs12708716. (Ambion, Austin, TX, USA) immediately during surgery. In conclusion, in the most extensive fine-mapping Informed consent was achieved from all participants study of the CLEC16A gene in MS performed to date, we or their parents (for the thymus samples), and the study find a significant association between MS susceptibility was approved by the appropriate local ethical autho- and rs12708716. Furthermore, the rs12708716 genotype is rities. associated with the relative expression of CLEC16A transcripts in thymus, but not in whole blood, suggesting SNP panel design and genotyping thymus-specific splicing regulation being involved in the The SNP selection for the initial screening phase was expression of this plausible immune receptor gene. done in three stages. The majority of the screening samples (644 cases and 1023 controls) had previously showed nominally significant association with 4 Materials and methods rs12708716 (P ¼ 0.005) in an IMSGC study. We therefore chose to further explore putative associations repre- Subjects sented by this known associated SNP rs12708716, and The Norwegian patient sample sets were gathered in selected a panel of SNPs in moderate-to-strong LD with the Oslo MS registry and biobank (n ¼ 619) and the rs12708716 (r240.6, MAF40.01). We then used a tagging Norwegian MS registry and biobank (n ¼ 1394), and were approach among these SNPs to remove redundant SNPs sampled in Oslo with surrounding areas and nationwide, in almost complete LD (r240.95). This left us with respectively. In the screening phase, we genotyped 807 14 SNPs representing 59 SNPs in moderate LD with the Norwegian MS patients from the Oslo- and the Norwe- reference SNP rs12708716. Second, we incorporated these gian MS registry and biobank, and 1027 healthy controls 14 SNPs by force inclusion into a general mapping panel from the Norwegian bone marrow registry. Of these, 644 of the gene (r24 ¼ 0.8 and MAF40.05, capturing patients and 1023 controls had previously been geno- 241 SNPs). In this general map, we also forced to include typed for rs12708716 in an IMSGC study.4 Norwegian SNPs of particular interest, like possible splice site SNPs replication samples consisted of 1206 MS patients from or SNPs reported to be associated with other auto- the Norwegian MS registry and biobank, and 2064 immune disease, including rs6498169. As the CLEC16A Norwegian blood donors as healthy controls. Addition- gene is fairly contained within its own LD block, making ally, 22 duplicates were genotyped within the screening the previously reported CLEC16A association signals phase and 245 duplicates were typed in both the likely to originate from genetic variant(s) within the gene screening phase and replication phase for quality control. itself (www.hapmap.org), we confined our SNP selection The British replication sample set consisted of 1153 area to either end of the longest flanking distance of the trios, 1102 UK MS cases, 1980 controls from the 1958 CLEC16A gene chr16p13:10945–11184 kb. (dbSNP build British birth cohort (www.b58cgene.sgul.ac.uk) and 277 129 SNP track for the Human March 2006 assembly duplicates. The whole UK sample set had previously (http://genome.ucsc.edu/). Finally, we added SNPs of been genotyped for rs12708716 as part of an IMSGC special interest from the neighbourhood genes SOCS1 study,4 whereas 1350 of the patients and all UK controls and CIITA or the intergenic region separating these had been included in the IMSGC GWAS in 2007, thereby genes. The inclusion criterion for each SNP included in being genotyped for rs6498169.2All patients included in the screening panel is given in Supplementary Table 1. the study were diagnosed according to Poser and/or For the SNP selection, we used the Tagger pro- McDonald criteria.21,22 Demographic information of the gramme23 incorporated in Haploview v 4.1,12 and SNP

Genes and Immunity Exploring the CLEC16A gene in MS I-L Mero et al 197 genotype data from HapMap release 21, CEU 0.1. As the sample input for the measurement of long- analysis panel (samples collected from people living and short isoform were the same, CT mean was in Utah with ancestry from northern and western measured directly and not through RNase P. Europe).13 The sequences for primer design were obtained from dbSNP build 129, SNP track for the Statistical analyses Human March 2006 (hg18) assembly at the UCSC The data quality control including PED CHECK of the website (http://genome.ucsc.edu/). Our final panel of trio samples, genotype success rates and Hardy–Wein- 57 SNPs was genotyped by Sequenom technology on the berg Equilibrium was performed in Plink v 1.06.24 Centre for interactive genetics; Cigene, Norwegian Haploview v 4.1 was used for LD plot analyses.12 University of Life Sciences (UMB) AAs. Association testing, conditional logistic regression Replication genotyping of our five top hits from the analyses and haplotype analyses were carried out in screening phase, along with the already described Unphased v 3.1.4.25 Haplotype frequency threshold was associated SNP rs12708716 was performed by TaqMan set to 5%. In the combined analyses, we found significant technology. (Applied Biosystems). As the TaqMan assay heterogeneity for several SNPs between the UK and did not work for one of the top hit SNP (rs6498168), we Norwegian population. This was corrected for in all the replaced this SNP with one in complete LD (r2 ¼ 1; combined analyses, by setting nationality as confounder. rs8060411), which was then genotyped in both the Two of the SNPs (rs12708716 and rs6498169) on the screening cohort and the replication cohorts. A total identified MS susceptibility haplotype have previously of 1029 Norwegian blood donors were previously been confirmed associated with MS at genomewide genotyped for rs12708716 by Skinningsrud et al.7,8 For significance level, and our primary goal for combined the 1958 birth controls, SNPs rs9934231, rs12708716, data analyses was to single out a primary association. rs6498169 and rs12923849, had already been genotyped Hence, the P-values are presented noncorrected in the by the Wellcome Trust Case-Control Consortium combined material. The comparison of CLEC16A expres- and, hence, their genotypes were obtained through the sion and genotype was done by comparing the CT mean Wellcome Trust Case-Control Consortium website distribution among the homozygotes for the risk allele (www.wtccc.org.uk). Further, SNP rs12708716 has pre- versus heterozygotes and noncarriers for the risk allele viously been genotyped in the complete UK cohort .and by Mann–Whitney U-test performed in GraphPad. Prism does not represent new data.4 v. 5.01 (GraphPad Software Inc., San Diego, CA, USA). Expression analyses Whole-blood RNA from healthy Norwegian bone mar- Conflict of interest row donors was isolated using the Tempus spin RNA kit and included a DNAse digestion step by AbsoluteRNA The authors declare no conflict of interest. Wash Solution (Applied Biosystems). Complementary DNA was produced by the reverse transcriptase kit provided by Eurogentec (Eurogentec, Liege Science Park, Acknowledgements Seraing, Belgium). Thymus RNA isolation was per- formed with TRIzol reagent (Invitrogen, Carlsbad, CA, We would like to thank all MS patients and healthy USA) according to the manufacturer’s protocols from controls for their participation in the study. The study is whole tissue biopsies from human thymi. The comple- funded by grants from The South-Eastern Norway mentary DNA synthesis was performed on total RNA by Regional Health Authority, the Research Council employing SuperScrip III reverse transcriptase (Invitro- of Norway; Scientific Advisory Council Ulleva˚l, Oslo gen, Cat. No: 18080-051) and random hexamers with University Hospital; Norwegian Foundation for Health B1000 ng input of total RNA. DNAse digestion was and Rehabilitation, the Norwegian Diabetes Association, performed with DNaseI (RNase-free) M0303S (BioLabs the Oslo MS Association and the Odd Fellow MS society. New England, Ipswich, MA, USA). We thank all contributors to the collection of samples and An assay comprising known CLEC16A RNA clinical data in the Norwegian MS Registry and Biobank. transcripts, and differential expression of short iso- The Norwegian MS Registry and Biobank is supported form BC112897 (21 exons) and full-length peptide by the Research Council of Norway, Haukeland NM_015226.1/AB002348.3 (24 exons) were measured University Hospital and Western Norway Regional by real-time PCR on an ABI7900HT system using Health Authority. The Norwegian Bone Marrow Donor predesigned gene expression assays available from Registry, Rikshospitalet, Oslo University Hospital are Applied Biosystems. All assays had primers covering acknowledged for providing Norwegian controls. exon–exon transition to avoid DNA contamination. Harald Lindberg is thanked for collection of thymus B-Rajii RNA was isolated using the RNeasy Mini Kit by samples. The Centre for interactive genetics; Cigene, Quiagen Inc. (Valencia, CA, USA) and used as positive Norwegian University of Life Sciences (UMB) AAs is control for the two isoforms. Expression of the two thanked for performing Sequenom analyses. The transcripts in B-Rajii cells was before the real-time University of Cambridge and colleagues at the Depart- experiment confirmed by PCR and gel electrophoresis. ment of Clinical Neuroscience Addenbrooke’s hospital RNase P was used as endogenous control/house keeping are acknowledged for laboratory assistance and much gene and corrected for when looking at either transcripts appreciated collaboration in connection with the research alone. Triplicates were used for all samples included in stay of IL Mero in this department. This study makes use the experiment. Where one replicate was clearly an of data generated by the Wellcome Trust Case-Control outlier, this was omitted to leave a comparison of the two Consortium. A full list of the investigators who closest replicates. CT standard deviation cut off was set to contributed to the generation of the data is available

Genes and Immunity Exploring the CLEC16A gene in MS I-L Mero et al 198 from www.wtccc.org.uk. Funding for the project was 12 Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and provided by the Wellcome Trust under award 076113. We visualization of LD and haplotype maps. Bioinformatics 2005; acknowledge use of DNA from the British 1958 Birth 21: 263–265. Cohort collection, funded by the Medical Research 13 International HapMap Consortium 2007. A second generation Council Grant G0000934 and the Wellcome Trust Grant human haplotype map of over 3.1 million SNPs. Nature 2007; 068545/Z/02. This work was also supported by the 449: 851–861. Medical Research Council (G0700061), the Wellcome 14 Martinez A, Perdigones N, Cenit M, Espino L, Varade J, Lamas JR et al. Chromosomal region 16p13: further evidence Trust (084702/Z/08/Z) and the Cambridge NIHR of increased predisposition to immune diseases. Ann Rheum Biomedical Research Centre. Dis 2010; 69: 309–311. 15 Zoledziewska M, Costa G, Pitzalis M, Cocco E, Melis C, Moi L et al. Variation within the CLEC16A gene shows consistent disease association with both multiple sclerosis and type 1 References diabetes in Sardinia. Genes Immun 2009; 10: 15–22. 16 Bronson PG, Ramsay PP, Seldin MF, Gregersen PK, Criswell 1 Hakonarson H, Grant SF, Bradfield JP, Marchand L, Kim CE, LA, Barcellos LF. A candidate gene study of CLEC16A Glessner JT et al. A genome-wide association study identifies does not provide evidence of association with risk for KIAA0350 as a type 1 diabetes gene. Nature 2007; 448: 591–594. anti-CCP-positive rheumatoid arthritis. Genes Immun 2010; 11: 2 International Multiple Sclerosis Genetics Consortium 504–508. (IMSGC). Risk alleles for multiple sclerosis identified by a 17 Perera D, Stankovich J, Butzkueven H, Taylor BV, Foote SJ, genomewide study. N Engl J Med 2007; 357: 851–862. Kilpatrick TJ et al. Fine mapping of multiple sclerosis 3 Wellcome Trust Case Control Consortium. Genome-wide susceptibility genes provides evidence of allelic hetero- association study of 14 000 cases of seven common diseases geneity at the IL2RA locus. J Neuroimmunol 2009; 211: and 3000 shared controls. Nature 2007; 447: 661–678. 105–109. 18 Nejentsev S, Walker N, Riches D, Egholm M, Todd JA. 4 International Multiple Sclerosis Genetics Consortium Rare variants of IFIH1, a gene implicated in antiviral (IMSGC). The expanding genetic overlap between multiple responses, protect against type 1 diabetes. Science 2009; 324: sclerosis and type I diabetes. Genes Immun 2009; 10: 11–14. 387–389. 5 Marquez A, Varade J, Robledo G, Martinez A, Mendoza JL, 19 Zuvich RL, McCauley JL, Oksenberg JR, Sawcer SJ, Taxonera C et al. Specific association of a CLEC16A/ De Jager PL, Aubin C et al. Genetic variation in the IL7RA/ KIAA0350 polymorphism with NOD2/CARD15(-) Crohn’s IL7 pathway increases multiple sclerosis susceptibility. disease patients. Eur J Hum Genet 2009; 17: 1304–1308. Hum Genet 2010; 127: 525–535. 6 Rubio JP, Stankovich J, Field J, Tubridy N, Marriott M, 20 Bronson PG, Caillier S, Ramsay PP, McCauley JL, Zuvich RL, Chapman C et al. Replication of KIAA0350, IL2RA, RPL5 De Jager PL et al. CIITA variation in the presence of and CD58 as multiple sclerosis susceptibility genes in HLA-DRB1*1501 increases risk for multiple sclerosis. Australians. Genes Immun 2008; 9: 624–630. Hum Mol Genet 2010; 19: 2331–2340. 7 Skinningsrud B, Husebye ES, Pearce SH, McDonald DO, 21 McDonald WI, Compston A, Edan G, Goodkin D, Brandal K, Boe WA et al. Polymorphisms in CLEC16A and Hartung HP, Lublin FD et al. Recommended diagnostic criteria CIITA at 16p13 are associated with primary adrenal insuffi- for multiple sclerosis: guidelines from the International Panel ciency. J Clin Endocrinol Metab 2008; 93: 3310–3317. on the diagnosis of multiple sclerosis. Ann Neurol 2001; 50: 8 Skinningsrud B, Lie BA, Husebye ES, Kvien TK, Forre O, Flato 121–127. B et al. A CLEC16A variant confers risk for juvenile idiopathic 22 Poser CM, Paty DW, Scheinberg L, McDonald WI, Davis FA, arthritis and anti-CCP negative rheumatoid arthritis. Ann Ebers GC et al. New diagnostic criteria for multiple sclerosis: Rheum Dis 2009; 69: 1471–1474. guidelines for research protocols. Ann Neurol 1983; 13: 9 Todd JA, Walker NM, Cooper JD, Smyth DJ, Downes K, 227–231. Plagnol V et al. Robust associations of four new chromosome 23 de Bakker PI, Yelensky R, Pe’er I, Gabriel SB, Daly MJ, regions from genome-wide analyses of type 1 diabetes. Altshuler D. Efficiency and power in genetic association Nat Genet 2007; 39: 857–864. studies. Nat Genet 2005; 37: 1217–1223. 10 Geijtenbeek TB, Gringhuis SI. Signalling through C-type lectin 24 Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, receptors: shaping immune responses. Nat Rev Immunol 2009; Bender D et al. PLINK: a tool set for whole-genome association 9: 465–479. and population-based linkage analyses. Am J Hum Genet 2007; 11 Hoppenbrouwers IA, Aulchenko YS, Janssens AC, 81: 559–575. Ramagopalan SV, Broer L, Kayser M et al. Replication of 25 Dudbridge F. Likelihood-based association analysis for CD58 and CLEC16A as genome-wide significant risk genes for nuclear families and unrelated subjects with missing genotype multiple sclerosis. J Hum Genet 2009; 54: 676–680. data. Hum Hered 2008; 66: 87–98.

Supplementary Information accompanies the paper on Genes and Immunity website (http://www.nature.com/gene)

Genes and Immunity