Genes and Immunity (2014) 15, 355–360 © 2014 Macmillan Publishers Limited All rights reserved 1466-4879/14 www.nature.com/gene

ORIGINAL ARTICLE Coeliac disease-associated polymorphisms influence thymic expression

SS Amundsen1, MK Viken2,3, LM Sollid1 and BA Lie2,3

Significant associations between coeliac disease (CD) and single nucleotide polymorphisms (SNPs) distributed over 40 genetic regions have been established. The majority of these SNPs are non-coding, and 20 SNPs were found, by expression quantitative trait loci (eQTL) analysis, to harbour cis regulatory potential in peripheral blood mononuclear cells (PBMC). Almost all regions contain genes with an immunologically relevant function, many of which act in the same biological pathways. One such pathway is T-cell development in the thymus, a pathway not previously explored in CD pathogenesis. The aim of our study was to explore the regulatory potential of the CD-associated SNPs (n = 50) by eQTL analysis in thymic tissue from 42 subjects. In total, 43 nominally significant (P < 0.05) eQTLs were found within 24 CD-associated chromosomal regions, corresponding to 27 expression-altering SNPs (eSNPs) and 40 probes (eProbes) that represent 39 unique genes (eGenes). Nine significant probe–SNP pairs (corresponding to 8 eSNPs and 7 eGenes) overlapped with previous findings in PBMC (rs12727642-PARK7, rs296547-DDX59, rs917997-IL18RAP, rs842647-AHSA2, rs13003464-AHSA2, rs6974491-ELMO1, rs2074404-NSF (two independent probes) and rs2298428-UBE2L3). When compared across more tissues, we found that 14 eQTLs could represent potentially novel thymus-specific eQTLs. This implies that CD risk polymorphisms could affect gene regulation in thymus.

Genes and Immunity (2014) 15, 355–360; doi:10.1038/gene.2014.26; published online 29 May 2014

INTRODUCTION

Coeliac disease (CD) is a chronic inflammatory disease of the small intestine that develops in genetically susceptible individuals as a result of an inappropriate immune response against dietary wheat gluten, as well as related proteins from rye and barley. There is an unambiguous genetic component in CD. Specific DQA1 and DQB1 alleles within the human leukocyte antigen (HLA) complex are strongly associated with CD; 90–95% of the patients carry the alleles DQA1*05:01/DQB1*02:01 (encoding the HLA-DQ2.5 molecule), whereas almost all the other patients either carry the alleles DQA1*03:01/DQB1*03:02 (encoding the HLA-DQ8 molecule) or DQA1*02:01/DQB1*02:02 (encoding the HLA-DQ2.2 molecule).1 These specific DQA1 and DQB1 alleles are needed, but are not sufficient for the disease to develop, nor do they explain the whole genetic predisposition. During the last couple of years, convincing evidence for association between CD and single nucleotide polymorphisms (SNPs) within 39 loci outside of the HLA complex has been revealed by genome-wide association (GWA) as well as follow-up studies.2–4

The vast majority of the CD-associated SNPs are non-coding variants (that is, intronic and intergenic SNPs). Such disease-associated non-coding risk variants are proposed to contribute to disease risk by influencing gene expression and regulation. Genetic variants that affect expression of a transcript mapping within the same locus (that is, cis associations) have been extensively reported.5,6 Thus, to address the regulatory potential of the CD-associated SNPs, cis expression–genotype correlation studies (so-called expression quantitative trait loci (eQTL) analyses) have been performed in peripheral blood mononuclear cells (PBMCs). These analyses revealed that the CD-associated regions were significantly enriched for eQTLs, as the number of SNPs displaying eQTLs within these regions was significantly higher than expected by chance.2 Hence, these observations strongly indicate that several of the CD risk variants contribute to disease risk through mechanisms altering gene expression and regulation.

As the above-mentioned eQTL studies were performed using PBMCs, the available expression data come from a limited number of cell types. Although whole blood constitutes a good surrogate tissue for gene expression analysis,2 genes with expression specific to other cell types are most likely also relevant for CD pathogenesis (for example, intestinal epithelial cells or thymic tissue), and would be missed when studying eQTLs only in PBMCs. Consequently, expression data from additional disease-relevant tissues are warranted in order to evaluate the relevance of such eQTL data in CD pathogenesis.

Almost all the novel CD risk variants map near genes with immunologic function, clearly pointing towards the involvement of the immune system in CD pathogenesis. Many of the genes also act together in the same biological pathways. T-cell development in the thymus was one pathway highlighted in these studies,2 a pathway that has not yet been explored in the context of CD development. Because CD is a T-cell-mediated disease characterised by T-cell tissue destruction, and because the T-cell repertoire is shaped in the thymus, we considered thymus to be a highly relevant organ in which to address the regulatory role of genetic risk variants for CD.

Therefore, our study aimed to explore the regulatory potential of the novel CD-associated SNPs within thymic tissue by cis expression–genotype correlation (that is, eQTL) analysis.

1Centre for Immune Regulation, Institute of Immunology, University of Oslo, Oslo University Hospital, Rikshospitalet, Oslo, Norway; 2Department of Medical Genetics, University of Oslo, Oslo University Hospital, Ullevål, Oslo, Norway and 3Department of Immunology, Oslo University Hospital, Rikshospitalet, Oslo, Norway. Correspondence: Dr SS Amundsen, Centre for Immune Regulation, Institute of Immunology, University of Oslo, Oslo University Hospital, Rikshospitalet, 0027 Oslo, Norway. E-mail: [email protected] Received 11 January 2014; revised 21 April 2014; accepted 22 April 2014; published online 29 May 2014

Coeliac disease-associated eQTLs in thymus SS Amundsen et al

RESULTS

Genotype quality control

All SNPs (N = 50) included in this study displayed a genotype success rate above 92% (that is, maximum three missing genotypes) when genotyped with the mass array technology. To validate the Sequenom mass array results, two randomly selected SNPs were also genotyped by TaqMan, and complete genotype correlation was seen for all samples successfully genotyped with both methods.

Gene expression in thymus

Expression data for 549 probes, covering 337 of the total 424 non-HLA genes located within the 39 candidate regions, were available and hence included in our analysis (Supplementary Table 1). Overall, the probe intensity signals in this data set were in the range 5–14, with the majority (66%) of probes in the lower end of the scale (that is, intensity of 5–7). This is in line with what we would expect, because all genes are assumed to be expressed in at least low amounts in thymic tissue. Few probes displayed signals in the upper end of the scale, implying that few of the genes within the risk loci were strongly expressed in the thymus (Figure 1).

In order to validate the array data, we also measured expression of nine randomly selected genes using TaqMan gene expression assays (see Materials and Methods). Good correlation was seen between the gene expression levels measured by the two methods, that is, genes with low array signals amplified with high Ct values (Ct ≥ 32), whereas genes with signals in the upper range amplified with low Ct values (Ct = 25–32) (data not shown). From these data, we concluded that genes with array signals as low as 5–5.5 were either not expressed or expressed at very low amounts in thymic tissue, whereas genes with signals ranging from 6 to 14 showed expression in thymus.

Figure 1. Distribution of the signal intensity for the 549 probes covering genes located within the 39 non-HLA CD-associated regions (x axis: probes; y axis: intensity). Each dot represents one probe and the intensity value is the mean intensity of the 42 thymic samples. The 20 most strongly expressed genes (array signals > 10, captured by 25 probes) within the CD-associated regions were TNFRSF14, EST1, AIRE, SRRM1, ARL6IP5, TM2D1, SYF2, PSMG2, BCL11A, ZFP36L1, CD247, CSTB, XPO1, LMOD3, VIL2, COX17, PARK7, ERP29, TMSB4X and TRIM13.

Expression quantitative trait loci (eQTL) associated with CD

An ongoing challenge in genetic research of complex human traits is the assignment of function to the detected disease-associated non-coding genetic variants. As eQTL analysis is one way to suggest causal genes within defined risk regions, we investigated the 39 non-HLA risk loci (50 SNPs) for correlation with cis gene expression (excluding the HLA region on chromosome 6 because the major causal gene within this region is known and also because a high degree of exonic polymorphism influences the binding efficiency of probes to various HLA alleles). In total, 43 nominally significant probe–SNP pairs (P < 0.05) distributed over 24 CD-associated chromosomal regions were detected (Table 1 and Supplementary Figure 1). The 43 eQTLs involved 27 expression-altering SNPs (eSNPs) and 40 probes (eProbes) representing 39 unique genes (eGenes). The most significant eQTLs (that is, P-value < 0.01) were rs2298428-UBE2L3 (P-value = 9.0 × 10⁻⁴), rs12727642-PARK7 (P-value = 1.4 × 10⁻³), rs3748816-HES5 (P-value = 1.7 × 10⁻³), rs917997-IL18RAP (P-value = 1.8 × 10⁻³), rs13003464-XPO1 (P-value = 5.8 × 10⁻³), rs4675374-RAPH1 (P-value = 1.4 × 10⁻³), rs6441961-CCR1 (P-value = 1.7 × 10⁻³), rs1738074-ILMN_1902604 (P-value = 3.2 × 10⁻³) and rs13003464-C2orf74 (P-value = 1.1 × 10⁻³) (Figure 2). The distance between the eSNP and the eGene ranged from 0 to 728 kb, with an average distance of 223 kb. Seven of the 27 eSNPs were intergenic, while 16 were located within known genes in the respective regions: exons (n = 4), UTRs (n = 2) and introns (n = 14). Two SNPs were located within the gene with whose differential expression they were associated, that is, rs17035378 in PLEK and rs6974491 in ELMO1. Both SNPs were located in introns, not close to exons.

Thymic eQTLs compared with PBMC

Of the 27 putative thymic eSNPs, 16 had previously been annotated as eSNPs in PBMC by Dubois et al.2 Eight of these eSNPs influenced expression of the same gene in both tissues with the same direction of allelic expression (rs12727642-PARK7, rs296547-DDX59, rs917997-IL18RAP, rs842647-AHSA2, rs13003464-AHSA2, rs6974491-ELMO1, rs2074404-NSF (two independent probes) and rs2298428-UBE2L3). The remaining eight eSNPs showed tissue-specific regulation and influenced expression of different genes in the two tissues (that is, discordant regulation; Table 1). Furthermore, 11 eSNPs were previously not annotated as eSNPs in PBMC and represent potentially thymus-specific eSNPs. The association between SNP genotype and gene expression for these 27 eSNPs is shown in Supplementary Figure 1.

eQTL comparisons across tissues

Encouraged by these findings, we wanted to explore the regulatory potential of the 27 identified eSNPs in more tissues using the Genevar software, particularly aiming to evaluate the 11 potentially thymus-specific eSNPs. The Genevar software contains eQTL results generated from five distinct tissues (B-lymphoblastoid cell lines, T cells, fibroblast, skin and adipose tissue), thus providing a tool for comparison of the regulatory potential of SNPs over several tissues. Using this software, all 27 eSNPs appeared to be expression-associated SNPs also in other tissues and cell types. However, in most cases, the genes under regulation, that is, the eGenes, differed between tissues (Supplementary Table 2). Hence, none of the eSNPs were found to exclusively influence gene expression in thymus. On the other hand, expression of 13 genes was only associated with eSNPs in thymic tissue, and was not found to be differentially expressed due to the CD-risk SNPs either in PBMC2 or in any of the other five tissues available in Genevar (Supplementary Table 3). This implies that expression of these 13 genes could be exclusively regulated by the CD-associated SNPs in thymus, and hence that they represent putative thymus-specific eQTLs. Notably, of these 13 genes, probes for 7 genes were not tested across all five tissues (that is, RGS13, INADL, PANK4, TSP50, ERP29, FAM109 and ZFP36L1). Therefore, only six eQTLs could be perceived to be thymus-specific based on current information; that is, rs864537-POU2F1 (P-value = 1.3 × 10⁻²), rs13098911-LRRC2 (P-value = 3.7 × 10⁻²), rs196432-GRHL3 (P-value = 1.4 × 10⁻²), rs196432-

Table 1. Overview of CD-associated SNPs being eQTLs in thymic tissue

Region | eSNP | Chr | Location | eGene | IlluminaArrayID | Intensity | P-value | Distance (kb) | eQTL in blood(a)
1 | rs12727642 | 1 | Intergenic | PARK7 | ILMN_1744713 | 10.67 | 1.4 × 10⁻³ (b) | 1.3 | PARK7
2 | rs2816316 | 1 | Intergenic | RGS13 | ILMN_2407775 | 5.33 | 2.7 × 10⁻² | 68 | -
 | rs2816316 | 1 | Intergenic | TROVE2 | ILMN_1738909 | 5.71 | 4.3 × 10⁻² | 492 | -
3 | rs296547 | 1 | Intergenic | GPR25 | ILMN_1761766 | 5.15 | 3.8 × 10⁻² | 49 | DDX59
 | rs296547 | 1 | Intergenic | DDX59 | ILMN_1748077 | 8.23 | 3.3 × 10⁻² | 253 | DDX59
 | rs296547 | 1 | Intergenic | KIF21B | ILMN_1800670 | 5.24 | 3.3 × 10⁻² | 46 | DDX59
4 | rs6691768 | 1 | Intron (NFIA) | INADL | ILMN_1773312 | 5.36 | 4.8 × 10⁻² | 416 | -
5 | rs864537 | 1 | Intron (CD247) | POU2F1 | ILMN_1794333 | 6.05 | 1.3 × 10⁻² | 15 | CD247
6 | rs3748816 | 1 | Exon (MMEL1) | HES5 | ILMN_1794742 | 5.52 | 1.7 × 10⁻³ | 65 | PLCH2, TNFRSF14, C1orf93, MMEL1
 | rs3748816 | 1 | Exon (MMEL1) | PANK4 | ILMN_1743910 | 6.28 | 2.3 × 10⁻² | 68 | PLCH2, TNFRSF14, C1orf93, MMEL1
7 | rs196432 | 1 | Exon (RCAN3) | GRHL3 | ILMN_1782141 | 5.59 | 1.4 × 10⁻² | 170 | -
 | rs196432 | 1 | Exon (RCAN3) | IL28RA | ILMN_1680805 | 5.34 | 1.7 × 10⁻² | 348 | -
 | rs196432 | 1 | Exon (RCAN3) | IL22RA1 | ILMN_1666175 | 5.23 | 2.5 × 10⁻² | 392 | -
8 | rs17035378 | 2 | Intron (PLEK) | PLEK | ILMN_1795762 | 8.94 | 4.2 × 10⁻² | - | -
 | rs17035378 | 2 | Intron (PLEK) | PPP3R1 | ILMN_1796962 | 7.71 | 1.4 × 10⁻² | 112 | -
9 | rs917997 | 2 | Intergenic | IL18RAP | ILMN_1721762 | 6.12 | 1.8 × 10⁻³ | 1.5 | IL18RAP
 | rs917997 | 2 | Intergenic | TMEM182 | ILMN_1806801 | 5.36 | 3.2 × 10⁻² | 308 | IL18RAP
10 | rs842647 | 2 | Intron (REL) | AHSA2 | ILMN_1798308 | 7.80 | 4.6 × 10⁻² | 285 | AHSA2
 | rs13003464 | 2 | Intron (PUS10) | AHSA2 | ILMN_1798308 | 7.80 | 3.1 × 10⁻² (b) | 217 | AHSA2
 | rs13003464 | 2 | Intron (PUS10) | C2orf74 | ILMN_1754501 | 6.67 | 1.1 × 10⁻³ | 185 | AHSA2
 | rs842647 | 2 | Intron (REL) | C2orf74 | ILMN_1754501 | 6.67 | 2.7 × 10⁻² | 252 | AHSA2
 | rs13003464 | 2 | Intron (PUS10) | XPO1 | ILMN_1725121 | 10.10 | 5.8 × 10⁻³ | 518 | AHSA2
11 | rs4675374 | 2 | Intron (ICOS) | RAPH1 | ILMN_1783846 | 6.60 | 1.4 × 10⁻³ (b) | 402 | -
12 | rs13314993 | 3 | Intergenic | FBXL2 | ILMN_1688639 | 5.57 | 2.8 × 10⁻² | 303 | -
13 | rs6441961 | 3 | Intergenic | TSP50 | ILMN_1775615 | 5.23 | 3.5 × 10⁻² | 401 | CCR3
 | rs6441961 | 3 | Intergenic | CCR1 | ILMN_1678833 | 5.82 | 1.7 × 10⁻³ | 106 | CCR3
 | rs13098911 | 3 | Intron (CCR3) | CCR1 | ILMN_1678833 | 5.82 | 6.5 × 10⁻³ (b) | 10 | CCR3, CXCR6
 | rs13098911 | 3 | Intergenic (CCR3) | LRRC2 | ILMN_1748530 | 5.38 | 3.7 × 10⁻² (b) | 322 | CCR3, CXCR6
14 | rs11712165 | 3 | Intron (CDGAP) | ARHGAP31 | ILMN_1915328 | 5.12 | 4.0 × 10⁻² | 20 | -
15 | rs13151961 | 4 | Intron (KIAA1109) | NUDT6 | ILMN_2366972 | 6.21 | 4.1 × 10⁻² | 728 | -
16 | rs1738074 | 6 | UTR (TAGAP) | EZR-AS1 | ILMN_1902604 | 5.41 | 3.2 × 10⁻³ | 224 | TAGAP
17 | rs6974491 | 7 | Intron (ELMO1) | ELMO1 | ILMN_1784320 | 7.53 | 1.3 × 10⁻² | - | ELMO1
18 | rs1250552 | 10 | Intron (ZMIZ1) | PPIF | ILMN_1809607 | 5.59 | 4.7 × 10⁻² | 48 | ZMIZ1
19 | rs3184504 and rs653178* | 12 | Exon (SH2B3) and Intron (ATXN2) | FAM109A | ILMN_1743316 | 5.37 | 1.2 × 10⁻² | 83/206 | SH2B3, ALDH2 and TMEM116
 | rs3184504 and rs653178* | 12 | Exon (SH2B3) and Intron (ATXN2) | ERP29 | ILMN_2323048 | 11.29 | 4.6 × 10⁻² | 443/566 | SH2B3, ALDH2 and TMEM116
20 | rs4899260 | 14 | Intergenic | ZFP36L1 | ILMN_1675448 | 10.16 | 1.2 × 10⁻² | 18 | -
21 | rs2074404 | 17 | Intron (WNT3) | NSF | ILMN_1680353 | 9.63 | 2.4 × 10⁻² | 31 | NSF, LRRC37A, WNT3, LOC388397
 | rs2074404 | 17 | Intron (WNT3) | NSF | ILMN_2251784 | 6.16 | 3.0 × 10⁻² | 31 | NSF, LRRC37A, WNT3, LOC388397
22 | rs1893217* | 18 | Intron (PTPN2) | SLMO1 | ILMN_2232157 | 6.06 | 8.4 × 10⁻³ | 377 | -
23 | rs4819388 | 21 | UTR (ICOSLG) | PDXK | ILMN_1672504 | 8.80 | 3.9 × 10⁻² | 465 | RRP1
24 | rs2298428 | 22 | Exon (YDJC) | UBE2L3 | ILMN_1677877 | 7.90 | 9.0 × 10⁻⁴ | 4.6 | UBE2L3
 | rs2298428 | 22 | Exon (YDJC) | TOP3B | ILMN_1912077 | 5.37 | 2.7 × 10⁻² | 328 | UBE2L3
 | rs2298428 | 22 | Exon (YDJC) | HIC2 | ILMN_1652762 | 7.60 | 3.7 × 10⁻³ | 177 | UBE2L3

Abbreviations: Chr, chromosome; ns, not significant. These results represent Kruskal–Wallis statistics. (a) Reference Dubois et al.2 eSNPs marked in bold do not overlap with eSNPs in PBMC. Intensity refers to the mean signal intensity of the probe. As rs3184504 and rs653178* are in complete LD, the results for these two SNPs were identical. Distance refers to the distance from the SNP to the gene under regulation. When performing the Mann–Whitney U-test, homozygous and heterozygous carriers of the risk alleles were grouped and compared with homozygous non-risk individuals. The Mann–Whitney U-test was performed only in the cases when a clear trend towards carrier status was seen. In a few cases, only two SNP genotype groups were available (due to low minor allele frequency), and in these cases the Mann–Whitney U-test was performed. (b) Mann–Whitney U-test.
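As an illustration of the Kruskal–Wallis statistics referred to in the Table 1 footnote, the following minimal sketch computes the tie-corrected H statistic for probe intensities split into three genotype groups. It is not the authors' analysis code: the function names and the intensity values are hypothetical, and the P-value relies on the chi-square approximation with two degrees of freedom, which may be rough for group sizes as small as those in a 42-sample study.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """Tie-corrected H and chi-square P-value for exactly three groups (df = 2)."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("expected three non-empty genotype groups")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    # With 2 degrees of freedom the chi-square survival function is exp(-h / 2).
    return h, math.exp(-h / 2)


# Hypothetical log2 probe intensities per SNP genotype (not data from the study).
intensity_by_genotype = {
    "AA": [7.1, 7.4, 7.3, 7.6],
    "AG": [7.8, 7.7, 8.0, 7.9],
    "GG": [8.3, 8.1, 8.4],
}
h_stat, p_value = kruskal_wallis_3(list(intensity_by_genotype.values()))
print(f"H = {h_stat:.2f}, P = {p_value:.4f}")
```

A probe–SNP pair would count as nominally significant under the paper's threshold when the returned P-value is below 0.05.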

IL22RA1 (P-value = 2.5 × 10⁻²), rs917997-TMEM182 (P-value = 3.2 × 10⁻²) and rs13314993-FBXL2 (P-value = 2.8 × 10⁻²) (Supplementary Table 3). As the six putatively thymus-specific regulated genes showed low expression (with array intensity in the range 5–6; Table 1 and Supplementary Figure 1) we further confirmed the expression of these genes with TaqMan analysis. All genes amplified with Ct values that correlated well with the low array signals (Ct values in the range 30–34) except for IL22RA1, where most samples were non-detectable (Ct values > 35).

Taken together, these data suggest that 27 of the 50 CD-associated SNPs investigated in this study are eSNPs in thymic tissue, and six of the identified eQTLs were exclusively found in

Figure 2. eQTLs with strongest statistical significance in thymus. Each panel plots probe intensity (y axis) against SNP genotype (x axis). Panels: HES5 (ILMN_1794742) by rs3748816; IL18RAP (ILMN_1721762) by rs917997; UBE2L3 (ILMN_1677877) by rs2298428; C2orf74 (ILMN_1754501) by rs13003464; PARK7 (ILMN_1744713) by rs12727642; XPO1 (ILMN_1725121) by rs13003464; ILMN_1902604 by rs1738074; RAPH1 (ILMN_1783846) by rs4675374; CCR1 (ILMN_1678833) by rs6441961 and by rs13098911.
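Several of the P-values in Table 1 come from a Mann–Whitney U-test in which homozygous and heterozygous carriers of the risk allele are pooled and compared with homozygous non-risk individuals. The sketch below illustrates that grouping step and a two-tailed test. It is an assumption-laden illustration rather than the authors' procedure: the helper names and sample values are hypothetical, and the P-value uses a tie-corrected normal approximation without continuity correction, whereas an exact test may be preferable for very small groups.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def split_by_carrier(samples, risk_allele):
    """Pool homo- and heterozygous risk-allele carriers versus non-carriers."""
    carriers = [x for genotype, x in samples if risk_allele in genotype]
    non_carriers = [x for genotype, x in samples if risk_allele not in genotype]
    return carriers, non_carriers


def mann_whitney_u(a, b):
    """Two-tailed Mann-Whitney U (normal approximation, tie-corrected variance)."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    if sigma == 0:
        raise ValueError("all values are identical")
    z = (u - n1 * n2 / 2) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical (genotype, log2 intensity) pairs; "T" is taken as the risk allele.
samples = [("TT", 6.9), ("CT", 6.7), ("CT", 6.8), ("TT", 7.0),
           ("CC", 6.1), ("CC", 6.3), ("CC", 6.0), ("CC", 6.2)]
carriers, non_carriers = split_by_carrier(samples, "T")
u_stat, p_value = mann_whitney_u(carriers, non_carriers)
print(f"U = {u_stat:.1f}, P = {p_value:.4f}")
```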

the thymus, indicating that expression of these six genes could be uniquely regulated in thymus by the CD-associated SNPs.

DISCUSSION

More than half of the CD-associated regions investigated, 24 out of 39 loci, were in this study suggested to be expression quantitative trait loci in thymus. The associations did not withstand correction for multiple testing, which is not surprising given the limited thymus sample size; hence, the results should be confirmed in future studies. Nevertheless, 16 of the 27 potential thymic eSNPs had previously also been identified as eSNPs in PBMC, and all showed association with the same allelic direction. Eight of these eSNPs showed concordant regulation of the same genes in PBMC and thymus, while the remaining eight SNPs appeared to regulate another nearby gene. In addition, at least 6 thymus-specific eQTLs were indicated, and 13 when including those that were not explored across the entire set of tissues. The expression levels of some of these genes were low, thus their in vivo relevance is unclear. The vast majority of cells in thymus are thymocytes. Hence, it is most likely that the expression profiles and differences observed in this study are due to expression

differences in this dominant cell type. However, as the expression analysis in this study was performed on total RNA, consisting of a pool of mixed cell types, we are unable to draw conclusions about the subpopulations of cells in which the eQTLs are likely to be important. The observation of low expression of many genes could therefore be explained by their being expressed in only a subpopulation of cells in thymus. Further investigations are needed to conclude whether and how the observed eQTLs specifically contribute to the aetiology of CD. At any rate, identification of eSNPs using blood as surrogate tissue seems to be good but not sufficient for tissue- and disease-specific eQTL mapping. This is noteworthy given that SNPs associated with complex traits have been found to more often exert a tissue-dependent effect on gene expression.7

SNPs located in transcriptional regulatory elements (including SNPs at 3′ and 5′ regions as well as non-synonymous coding SNPs) have been shown to be enriched for tissue-dependent regulation.7 It has been suggested that the majority of eQTLs display concordant association across tissues and that tissue-dependent eQTLs in most cases can be explained by different causal variants (that is, from more and independently associated eSNPs).7 One of the potential thymus-specific eSNPs was non-synonymous (rs196432), whereas the rest were intronic and intergenic SNPs. This may relate to the fact that the majority of the 50 CD-associated SNPs included in this study were intergenic (n = 26) and intronic (n = 14). Alternatively, it could imply that these eSNPs are markers for the true regulatory variant, giving significant eQTL signals through linkage disequilibrium with a causal regulatory variant yet to be discovered.

As thymus is an immunologically relevant organ involved in autoimmunity in general, one could envisage that thymus-specific eQTLs would be shared between autoimmune diseases. Of the putative thymus-specific eQTLs identified in this study, all eSNPs, except rs13314993, are located within regions found to associate with at least one additional disease that is related to the immune system. For instance, the coding SNP rs3184504 (located within the gene SH2B3) is one such common autoimmune risk variant. This exon SNP is positioned within a histone mark H3K27ac site, a genomic feature known to mark active enhancers. The same is true for the nearby intronic SNP rs653178. Both SNPs were found to regulate FAM109 in a putative thymus-specific manner. Another regulatory SNP positioned within a histone mark H3K27ac site was rs2298428. In terms of statistical significance, cis regulation of UBE2L3 by rs2298428 was among the most significant (P-value = 9.0 × 10⁻⁴). This gene was also strongly expressed in thymus (array signal of 7.9). Concordant regulation of UBE2L3 by rs2298428 was reported in PBMC,2 and rs2298428 is located within a genomic region shown to confer risk to various immune-related diseases (for example, systemic lupus erythematosus and rheumatoid arthritis).

More than half of the genes (53%) included in this study appeared to be either weakly or not expressed in thymus (array signal < 6). This may reflect reality, but it could also be due to poor probe coverage, as the majority of the genes within these regions were covered by only one probe on the microarray. More probes covering a gene would increase the likelihood of detecting the transcripts expressed by a particular gene (exemplified by the gene UBE2E3, where good expression was only seen for one of in total seven probes covering this gene). Of the 39 unique eGenes in this study, 19 displayed low expression (array intensity in the range 5–6) and only 7 genes were strongly expressed, that is, PARK7, PLEK, XPO1, PDXK, NSF, ERP29 and ZFP36L1, of which ERP29 and ZFP36L1 were among the genes displaying putative thymus-specific regulation. The gene ZFP36L1 (Zinc finger protein 36, C3H type-like 1) encodes an RNA-binding protein that regulates mRNA turnover by binding to AU-rich elements in the 3′-untranslated region of mRNAs (UUAUUUAU being the most optimal), thereby triggering their degradation. In mice, post-transcriptional regulation mediated by ZFP36L1 was found to affect thymocyte development as it contributed to aberrant β-selection.8 Studies of ZFP36L1 in mice have also shown that modulation of mRNA stability represents a promising therapeutic approach in cancer therapy.9 Our finding that ZFP36L1 is expressed in a genotype-dependent manner in thymus, influenced by the CD-risk SNP rs4899260, is therefore interesting.

Specific gene expression in thymus has been implicated in the pathogenesis of autoimmune diseases. In type 1 diabetes, disease-associated variants upstream of the insulin gene lead to altered insulin expression in thymus, and this was hypothesised to alter T-cell tolerance for insulin as a self-protein.10 The importance of thymic T-cell regulation and development in CD has so far not been investigated. Hence, to our knowledge this is the first study showing that CD-associated SNPs act as cis regulatory variants also in the thymus.

In conclusion, some of the CD-associated SNPs appear to be cis regulatory variants in thymus, indicating that regulation of gene expression in thymus could be of importance in CD aetiology. Hence, our results add to the knowledge of the CD-associated SNPs being regulatory SNPs and further underline the need for investigating eQTLs in multiple disease-relevant tissues.

MATERIALS AND METHODS

Sample material

Thymic tissue samples from 42 Norwegian children undergoing heart surgery (22 girls and 20 boys) were used in this study. All children were less than 13 years old (26 samples were collected from children less than 1 year old). This project was approved by the regional ethical committee and written informed consent was given by all parents. All tissue samples were kept anonymous.

SNP selection

We included in our study all 40 SNPs displaying genome-wide (P-value < 5 × 10⁻⁸) as well as suggestive association with CD (in the study by Dubois et al.,2 defined as 5 × 10⁻⁶ > P-value > 5 × 10⁻⁸ and/or P-valueGWAS < 10⁻⁴ and P-valuefollow-up < 0.01). In addition, we included the second SNP that displays independent and strong association with CD from the six loci showing evidence of multiple independent associations (in logistic regression analysis). The four non-synonymous SNPs (rs196432-RUNX3, rs3184504-SH2B3, rs3816281-PLEK and rs3748816-MMEL1) with evidence for association with CD (P-valueGWAS < 10⁻⁴) within these regions were also included. In total, 50 SNPs were investigated (Supplementary Table 1).

DNA and SNP genotyping

DNA was extracted from thymic tissue samples and whole-genome-amplified using GenomiPhi Amplification (GE Healthcare, Little Chalfont, UK). The whole-genome-amplified DNA samples were genotyped using the Sequenom MassArray technology (Sequenom, San Diego, CA, USA). Primer design and multiplexing were performed using the SEQUENOM software MassArray Assay Design (version 2.0.0.17). Size separation was performed in a MALDI-TOF mass spectrometer (integrated in the Sequenom equipment), and genotypes were called using the software SPECTRO ACQUIRE (version 3.0.1.14). As an additional quality control and method validation, we genotyped two randomly selected SNPs (rs802734-THEMIS and rs10903122-RUNX3) using the TaqMan technology. The selected TaqMan assays (Applied Biosystems, Foster City, CA, USA) were run in accordance with the manufacturer's recommended programme in a StepOne Plus RT-PCR instrument (Applied Biosystems) and in a total reaction volume of 5 µl.

RNA isolation and cDNA synthesis

The thymic tissue samples were collected as thin slices in tubes containing RNAlater solution (Ambion, Austin, TX, USA) immediately after surgical removal. Small pieces of ~50 mg were used for the extraction of DNA and total RNA using TRIzol Reagent (Invitrogen, Carlsbad, CA, USA). Hence, cDNA synthesis was performed with random hexamers and SuperScript III Reverse Transcriptase (Invitrogen) after treatment with RNase-free DNase I

(New England Biolabs, Ipswich, UK). We performed the cDNA synthesis according to the manufacturer's recommendation using ~1000 ng of total RNA as input.

RNA quality control

All RNA samples were run on a Bioanalyzer 2100 instrument (Agilent Technologies, Santa Clara, CA, USA) in order to measure the purity and RNA integrity (degradation). None of the samples showed signs of DNA contamination. The RNA concentrations were quantified using a NanoDrop spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA).

Determination of gene expression levels by microarray

A whole-genome expression array was performed utilising the Illumina HumanWG-6 v3 array (Illumina, San Diego, CA, USA) and raw probe intensity was extracted using BeadStudio software (Illumina). From the whole-genome expression data, only genes mapping within one megabase (1 Mb) surrounding the CD-associated SNP were included in our study (Supplementary Table 1). Two thymic tissue samples were removed before further analysis due to low RNA integrity number, leaving 40 cDNA samples for further analysis. Quantile-normalised data were log2-transformed by using J-Express software.11 After log2 transformation, the probe intensity values ranged from 5 to 14.

TaqMan real-time gene expression

In order to validate the microarray data, we measured gene expression for nine randomly selected genes (selected among the genes within the CD-associated regions) with predesigned TaqMan gene expression assays (Applied Biosystems). We selected assays designed to capture all or at least as many splice variants as possible: MMEL1 (Hs00364353_m1), TNFRSF14 (Hs00998600_g1), REL (Hs00231279_m1), CTLA4 (Hs03044418_m1), OLIG3 (Hs00703087_s1), TNFAIP3 (Hs01568119_m1), IL2 (Hs00174114_m1), IL21 (Hs00222327_m1) and CLEC16A (Hs00389799_m1). In addition, we quantified the expression of the six putative thymus-specific differentially expressed genes: GRHL3 (Hs00297962_m1), IL22RA1 (Hs00222035_m1), TMEM182 (Hs00288464_m1), FBXL2 (Hs00247211_m1), LRRC2 (Hs00225885_m1) and POU2F1 (Hs01552835_m1_M). Gene expression was measured by real-time PCR (RT-PCR). All samples were run in triplicate in an ABI7900HT instrument under standard conditions and analysed by the SDS v2.3 software (Applied Biosystems) after removal of samples with standard deviation above 0.167. All reactions were performed in a total volume of 10 µl: 4 µl cDNA (1 ng µl⁻¹), 0.5 µl TaqMan gene expression assay (20×), 5 µl TaqMan gene expression master mix (10×) and 0.5 µl sterile water.

eQTL analysis

SNP genotypes were correlated with expression data for all genes located within a window of 1 Mb surrounding the respective SNP (that is, 500 kb on each side of the SNP) within the 39 non-HLA CD-associated regions. The Illumina chip contained expression data for 337 genes (covered by 549 probes) of the total 424 genes located within these 39 regions (that is, 87 genes were not covered by probes on the array; Supplementary Table 1). The number of genes tested for each region ranged from 2 to 23, with an average of 8.6 (Supplementary Table 1).

The Genevar software

The gene expression variation (Genevar) software was used in order to compare the regulatory potential of the eSNPs across the cell types and tissues available in this database (http://www.sanger.ac.uk/resources/software/genevar/). eQTL data were available from (1) HapMap lymphoblastoid cell lines from several populations (CEU (n = 109), CHB (n = 80), GIH (n = 82), JPT (n = 82), LWK (n = 82), MEX (n = 45), MKK (n = 138) and YRI (n = 108));12 … female twins of the MuTHER resource.14 eQTLs with a P-value < 0.05 within any of the above-mentioned sample sets were considered significant and thus selected for comparison.

Statistics

For the eQTL analysis, we performed a nonparametric one-way ANOVA (Kruskal–Wallis) test. For a few SNPs, none of the individuals were homozygous for the minor allele, and we performed a two-tailed Mann–Whitney U-test. A P-value < 0.05 was considered to be statistically significant.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

ACKNOWLEDGEMENTS

We thank Harald Lindberg at Rikshospitalet for collecting the thymus tissues, Siri Flåm and Hege D Sollid for preparing the DNA and RNA samples, and Haleh Saeedi for TaqMan gene expression analyses. This study was supported by grants from the South-Eastern Norway Regional Health Authorities, the Norwegian Diabetes Association and Novo Nordisk.

REFERENCES

1 Sollid LM. Coeliac disease: dissecting a complex inflammatory disorder. Nat Rev Immunol 2002; 2: 647–655.
2 Dubois PC, Trynka G, Franke L, Hunt KA, Romanos J, Curtotti A et al. Multiple common variants for celiac disease influencing immune gene expression. Nat Genet 2010; 42: 295–302.
3 Hunt KA, Zhernakova A, Turner G, Heap GA, Franke L, Bruinenberg M et al. Newly identified genetic risk variants for celiac disease related to the immune response. Nat Genet 2008; 40: 395–402.
4 van Heel DA, Franke L, Hunt KA, Gwilliam R, Zhernakova A, Inouye M et al. A genome-wide association study for celiac disease identifies risk variants in the region harboring IL2 and IL21. Nat Genet 2007; 39: 827–829.
5 Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, Zhu J et al. Genetics of gene expression and its effect on disease. Nature 2008; 452: 423–428.
6 Stranger BE, Forrest MS, Clark AG, Minichiello MJ, Deutsch S, Lyle R et al. Genome-wide associations of gene expression variation in humans. PLoS Genet 2005; 1: e78.
7 Fu J, Wolfs MG, Deelen P, Westra HJ, Fehrmann RS, Te Meerman GJ et al. Unraveling the regulatory mechanisms underlying tissue-dependent genetic variation of gene expression. PLoS Genet 2012; 8: e1002431.
8 Hodson DJ, Janas ML, Galloway A, Bell SE, Andrews S, Li CM et al. Deletion of the RNA-binding proteins ZFP36L1 and ZFP36L2 leads to perturbed thymic development and T lymphoblastic leukemia. Nat Immunol 2010; 11: 717–724.
9 Planel S, Salomon A, Jalinot P, Feige JJ, Cherradi N. A novel concept in anti-angiogenic and antitumoral therapy: multitarget destabilization of short-lived mRNAs by the zinc finger protein ZFP36L1. Oncogene 2010; 29: 5989–6003.
10 Vafiadis P, Bennett ST, Todd JA, Nadeau J, Grabs R, Goodyer CG et al. Insulin expression in human thymus is modulated by INS VNTR alleles at the IDDM2 locus. Nat Genet 1997; 15: 289–292.
11 Dysvik B, Jonassen I. J-Express: exploring gene expression data using Java. Bioinformatics 2001; 17: 369–370.
12 Stranger BE, Montgomery SB, Dimas AS, Parts L, Stegle O, Ingle CE et al. Patterns of cis regulatory variation in diverse human populations. PLoS Genet 2012; 8: e1002639.
13 Dimas AS, Deutsch S, Stranger BE, Montgomery SB, Borel C, Attar-Cohen H et al. Common regulatory variation impacts gene expression in a cell type-dependent manner. Science 2009; 325: 1246–1250.
(2) T-cells, lymphoblastoid cell lines and fibroblast derived from umbilical 14 Nica AC, Parts L, Glass D, Nisbet J, Barrett A, Sekowska M et al. The architecture of cords of 75 Geneva GenCord individuals13 and (3) adipose (n ¼ 166), skin gene regulatory variation across multiple human tissues: the MuTHER study. PLoS (n ¼ 160) and lymphoblastoid cell lines (n ¼ 156) derived from healthy Genet 2011; 7: e1002003.

Supplementary Information accompanies this paper on Genes and Immunity website (http://www.nature.com/gene)
