Genes and Immunity (2006) 7, 359–365 © 2006 Nature Publishing Group All rights reserved 1466-4879/06 $30.00 www.nature.com/gene

ORIGINAL ARTICLE
Sequence variation, linkage disequilibrium and association with Crohn’s disease on 5q31

C Onnie1,4, SA Fisher1, K King1, M Mirza1, R Roberts1, A Forbes2, J Sanderson3, CM Lewis1 and CG Mathew1 1Department of Medical and Molecular Genetics, King’s College London School of Medicine, Guy’s Hospital, London, UK; 2Institute for Digestive Diseases, University College London Hospitals NHS Trust, London, UK and 3Department of Gastroenterology, Guy’s and St Thomas’ NHS Foundation Trust, St Thomas’ Hospital, London, UK

Chromosome 5q31 contains a cluster of immune response genes and a 250 kb risk haplotype associated with susceptibility to Crohn’s disease (CD). Recently, two functional variants in SLC22A4 and SLC22A5 (L503F and G-207C), which encode the cation transporters OCTN1 and OCTN2, were proposed as causal variants for CD, but the genetic evidence regarding their contribution is conflicting. We investigated this locus by resequencing the coding regions of 10 genes in 24 CD cases and deriving a linkage disequilibrium (LD) map of the 27 single nucleotide polymorphisms (SNPs) detected. Ten SNPs representative of the LD groups observed were tested for association with CD. L503F in SLC22A4 was the only nonsynonymous SNP significantly associated with CD (P = 0.003), but it was not associated with disease in the absence of other markers of the 250 kb risk haplotype. Two other SNPs lying outside the 250 kb risk haplotype, rs11242115 in IRF1 and rs17166050 in RAD50, also showed association with CD (P = 0.019 and P = 0.0080, respectively). RAD50 contains a locus control region regulating expression of the Th2 cytokine genes at this locus. Other as yet undiscovered SNPs in this region may therefore modulate gene expression and contribute to the risk of CD, and perhaps of other inflammatory phenotypes. Genes and Immunity (2006) 7, 359–365. doi:10.1038/sj.gene.6364307; published online 18 May 2006

Keywords: Crohn’s disease; chromosome 5q31; linkage disequilibrium

Correspondence: Professor CG Mathew, Department of Medical and Molecular Genetics, King’s College London School of Medicine, 8th Floor Guy’s Tower, Guy’s Hospital, London SE1 9RT, UK. E-mail: [email protected]
4Current address: St Mark’s Hospital, Northwick Park, Harrow, Middlesex, UK
Received 15 February 2006; revised 13 April 2006; accepted 13 April 2006; published online 18 May 2006

Introduction
Crohn’s disease (CD) [MIM266600] is one of the two common subtypes of inflammatory bowel disease (IBD), and is characterized by chronic inflammation that can affect any part of the gastrointestinal tract. The prevalence of CD in Western countries is 26–198 per 100 000, and this, together with its clinical course and high morbidity, represents a significant burden to health care.1 Attempts to localize CD susceptibility genes through genome-wide linkage studies have identified regions of significant linkage on several chromosomes.2 Subsequent positional cloning and candidate gene studies identified mutations within the CARD15 (NOD2) gene, at a susceptibility locus on chromosome 16, that were strongly associated with CD.3–5 A genome-wide search in Canadian families found evidence of linkage to CD on chromosome 5q31.6 Linkage disequilibrium (LD) mapping of this locus (known as IBD5) identified a 250 kb haplotype containing 11 single nucleotide polymorphisms (SNPs) that were in almost complete LD with each other and were strongly associated with CD,7 a finding that has since been replicated in several populations.8–10 Re-sequencing of the genes within the 250 kb risk haplotype in seven individuals failed to identify the susceptibility gene at the IBD5 locus.7 High-resolution analysis of the haplotype structure across 5q31 has shown discrete haplotype blocks, each with a limited diversity of between two and four haplotypes. Recombination occurs between blocks, but it is modest enough that long-range LD among blocks can still be detected.11

It has recently been reported that two SNPs within the adjacent SLC22A4 and SLC22A5 genes at the IBD5 locus, which alter the function of the OCTN1 and OCTN2 cation transporters that these genes encode, are associated with CD.12 These two SNPs (L503F in SLC22A4 and G-207C in SLC22A5), which are in strong LD, lie within the IBD5 risk haplotype, and it has been proposed that these variants, rather than other alleles on the existing 250 kb haplotype, confer the disease risk for CD. However, it remains unclear whether these two genes are the causative genes at the IBD5 locus, as some studies have shown that the variants are not independent of the general risk haplotype at this locus.13–16 In addition, mutations within SLC22A5 that have been associated with systemic carnitine deficiency do not show any evidence of gastrointestinal disturbance.17 It is therefore possible that the causal genes or mutations for CD at this locus are yet to be identified. This chromosomal region is also of considerable interest in other inflammatory disorders, as it contains a cluster of cytokine genes, and sequence variants within it have been associated with a variety of inflammatory phenotypes. An intronic SNP in a RUNX1 binding site of SLC22A4 is associated with rheumatoid arthritis,18 variants in the IL4 and IL13 genes are associated with asthma and atopy,19 and the SLC22A5 promoter SNP is associated with psoriatic arthritis.20

In view of the potential importance of these findings to the understanding of the pathogenesis of CD and their general relevance to inflammatory phenotypes, we have carried out a comprehensive mutation screen of the coding regions of nine known genes and one hypothetical gene in this region, refined the LD map and tested relevant variants for association with CD. We find evidence of association with CD for variants that lie outside the canonical 250 kb risk haplotype, close to sequence elements that affect expression of cytokine genes.

Results
We initially tested three SNPs that flank the disease risk haplotype defined by Rioux et al.7 for association with CD in a British case–control sample. The SNPs IGR2011 and IGR3236 are located at opposite ends of the region of association, in haplotype blocks 3 and 10 as previously defined by Daly et al.11 (Figure 1), and both show moderate association with CD in our sample (P = 0.019 and P = 0.030, respectively), with SNP IGR2063 in block 4 showing the strongest association (P = 0.0009). There was strong but incomplete LD across the 250 kb risk haplotype as represented by C2063G and G3236T (D′ = 0.68), in contrast to the Canadian sample, in which SNPs across this region were in complete LD.7 SNPs IGR2063 and IGR2011 are in strong LD (D′ = 0.95). A conditional association test, conditioning on the effect of IGR2063, showed that SNP IGR3236 does not significantly refine the disease haplotype; that is, there is no significant difference in the disease odds ratios due to IGR2063 between haplotypes with and without IGR3236.

The observation that SNP IGR2011, which lies outside the 250 kb risk haplotype, was associated with CD and was in strong LD with a SNP within the risk haplotype in our British sample led us to consider whether regions adjacent to the canonical risk haplotype might harbour causal susceptibility variants for CD. We therefore selected the four known genes from the 250 kb risk haplotype (P4HA2, PDLIM4, SLC22A4 and SLC22A5) and the next five genes telomeric to SLC22A5 (IRF1, IL5, RAD50, IL13 and IL4), which include several plausible candidates, for re-sequencing in a panel of 24 unrelated cases of CD. Cases were selected for disease onset before age 21 years, in view of the association of this locus with early-onset disease,6,8 and

[Figure 1: map of the 5q31 region showing the 10 re-sequenced genes (IL4, IL13, RAD50, IL5, IRF1, LOC441108, SLC22A5, SLC22A4, PDLIM4 and P4HA2), the 27 SNPs detected with their frequencies in the 24 CD cases, the markers IGR2011, IGR2063 and IGR3236, and LD haplotype blocks 1–10 as defined by Daly et al.11]

Figure 1 Map of the 5q31 region and SNPs identified by gene re-sequencing in 24 CD cases. Distinct LD groups (pairwise LD between SNPs r² > 0.8) are indicated by different colours. White boxes indicate SNPs that are not in strong LD with any other group. The frequency of each SNP in the 24 CD cases is shown, as are the positions of the genes relative to the block-like haplotype structure described by Daly et al.11 The SLC22A4 and SLC22A5 genes encode the cation transporters OCTN1 and OCTN2, respectively.
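The re-sequencing panel of 24 cases described in the Results is said to provide >90% power to detect SNPs with a frequency of 5% or more. That figure can be checked with simple binomial arithmetic. The sketch below is an approximation, not the authors’ calculation: it assumes the 24 diploid cases contribute 48 independently sampled chromosomes and that a SNP is detected if at least one copy is present, and it ignores the panel’s deliberate enrichment for the IGR2063 risk haplotype and any sequencing failures. The function names are illustrative only.

```python
def detection_probability(freq: float, n_individuals: int) -> float:
    """Chance that an allele of frequency `freq` is seen at least once among
    the 2 * n_individuals chromosomes of a randomly sampled diploid panel."""
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - freq) ** n_chromosomes


def min_detectable_freq(power: float, n_individuals: int) -> float:
    """Smallest allele frequency seen at least once with probability `power`,
    obtained by inverting detection_probability."""
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - power) ** (1.0 / n_chromosomes)


if __name__ == "__main__":
    # 24 CD cases in this study versus the 8 individuals (six cases, two
    # unaffected) of the earlier screen by Rioux et al.
    print(f"{detection_probability(0.05, 24):.3f}")
    print(f"{detection_probability(0.05, 8):.3f}")
    print(f"{min_detectable_freq(0.90, 24):.4f}")
```

Under these assumptions the script prints 0.915, 0.560 and 0.0468: a 5% SNP has about a 91% chance of being seen in 48 chromosomes but only about 56% in the 16 chromosomes of the earlier screen. The same 48-chromosome denominator explains most of the frequencies in Figure 1, for example 2/48 ≈ 4.2% and 9/48 ≈ 18.8%.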

Table 1 SNPs detected by re-sequencing the 10 genes(a) in 24 unrelated CD cases

Gene | rs number | Sequence change | Functional significance
IL4 | rs2070874 | 5′UTR C-33T | FTZ-F1 binding site
IL13 | rs20541 | Exon 4 G389A | R144Q
IL13 | rs847 | 3′UTR C+695T | –
RAD50 | rs17166050 | IVS4 G+19A | –
RAD50 | N/A | Exon 10 C1540T | L514L
IL5 | rs1800474 | Exon 1 A117G | R41R
IRF1 | rs11242115 | 5′UTR C-184G | –
IRF1 | rs960757 | 5′UTR T-79C | Sp1 binding site
IRF1 | rs2070724 | IVS5 C-7T | –
IRF1 | rs9282762 | Exon 6 A555G | P185P
SLC22A5 | rs2631367 | 5′UTR G-207C | Heat shock transcription factor binding site
SLC22A5 | rs13181069 | 5′UTR G-139T | –
SLC22A5 | rs13180186 | 5′UTR G-107T | –
SLC22A5 | rs13180043 | 5′UTR C-78T | AP-2a transcription factor binding site
SLC22A5 | rs2070874 | 5′UTR G-77A | FTZ-F1 binding site
SLC22A5 | rs274558 | Exon 4 C807T | L269L
SLC22A4 | rs3792876 | IVS1 C+6607T | RUNX1 binding site
SLC22A4 | rs3761659 | IVS3 G-93C | –
SLC22A4 | rs2304081 | IVS6 G+5A | Splice site
SLC22A4 | rs272879 | Exon 7 C1182G | T394T
SLC22A4 | rs1050152 | Exon 9 C1507T | L503F
SLC22A4 | rs4646204 | 3′UTR T+467A | –
PDLIM4 | rs1007602 | Exon 3 C255T | G85G
PDLIM4 | rs270619 | Exon 5 A591G | P197P
PDLIM4 | rs17165851 | Exon 6 G775T | G259T
PDLIM4 | rs9895 | 3′UTR G+20C | –
P4HA2 | N/A | 5′UTR G-140A | USF binding site
(a)No variants were detected in the putative RNA-encoding gene LOC441108.

enriched for the disease risk haplotype as defined by IGR2063 (18 homozygotes, six heterozygotes). This panel provides >90% power to detect SNPs with a frequency of 5% or more, compared with the previous re-sequencing screen for common variants in six CD cases and two unaffected individuals.7 We also investigated a hypothetical gene, LOC441108, which is located between SLC22A5 and IRF1.21 This gene is supported by 39 human expressed sequence tags and three full-length cDNA sequences in GenBank (including the cDNA FLJ46914). It comprises four exons spread over a 66 kb stretch of chromosome 5 (5:131 774 000–131 840 000 in Ensembl v36), taking up most of the 86 kb that separates the flanking SLC22A5 and IRF1 genes, and is currently annotated as Ensembl known transcript NP_001013739.1. Recognizable orthologous sequences, with good consensus splice signals, are found at syntenic sites in the genomes of chimpanzee, rhesus monkey, pig, dog, cow, rat and mouse. The gene is reasonably well conserved at the nucleotide level, particularly in exon 2, where there are two regions showing 90% identity between human and mouse; of note is a 15 bp sequence in the middle of exon 2 that is invariant in all nine species examined. Despite this, there are multiple reading frame shifts between the species and no conserved open reading frame. The assumption must be that this is a functional RNA that does not encode a protein.

Mutation screening of the coding regions of these 10 genes in our panel of 24 CD cases revealed 27 SNPs. The sequence changes and their potential functional effects are listed in Table 1, and their locations are shown in Figure 1. Three SNPs were non-synonymous, including L503F in SLC22A4. There were seven exonic synonymous SNPs, including a rare SNP, L514L, in exon 10 of RAD50. Twelve SNPs occurred in the 5′ or 3′ untranslated regions of genes, and five SNPs were intronic. Rs11242115 in the 5′UTR of IRF1 was identified as the previously genotyped IGR2011. The TRANSFAC database predicted that seven of these SNPs, including G-207C in SLC22A5 and the RUNX1 binding site SNP rs3792876, were located in potential regulatory sites.22 The SNP rs2304081 in intron 6 of SLC22A4 was identified as a potential splice site mutation.

LD was assessed across the SNPs, and the pairwise LD coefficients (r²), calculated using Haploview,34 are shown in Figure 2. Of the 27 SNPs detected, 21 could be classified into five distinct LD groups according to their values of r², as described in Patients and methods (Figure 1). The remaining six SNPs (unshaded in Figure 1) were not in strong LD with any other SNP, and therefore each represents an additional LD group. A maximum of two LD groups was observed for each gene, suggesting a limited diversity of haplotypes, consistent with the haplotype block structure proposed previously.11 A representative SNP from each LD group (with the exception of two rare SNPs, RAD50_C1540T and IL5_A117G) was selected to test for association with CD, with a preference for those that might be of functional importance (Table 1 and Figure 1). SNP IGR2011 in IRF1 (5′UTR C-184G) had already been typed as representative of haplotype block 3 (see above and Table 2). SNPs L503F in SLC22A4 and G-207C in SLC22A5 were in the same LD group, but both were genotyped in view of the proposed direct functional role of these SNPs.12 Therefore, a total of nine additional SNPs were genotyped in the British CD case–control sample. Thus, SNPs representing the observed LD

[Figure 2: triangular matrix of pairwise r² values for the 27 SNPs, each labelled by gene and SNP identifier (e.g., IL13_rs20541, IL13_rs847); printed values in the original include 0.44, 0.48, 0.50 and 0.88]

Figure 2 Pattern of linkage disequilibrium between the SNPs detected in the 10 genes across the IBD5 locus. Pairwise LD is shown for the 27 SNPs analysed. Each cell represents LD as measured by the coefficient r². Black cells denote r² = 1, indicating absolute LD, and white cells indicate no LD. Other values of r² greater than 0.4 are shown.
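Figure 2 reports r², and the Results also quote D′. For alleles A and B at two biallelic SNPs, with allele frequencies p_A and p_B and A–B haplotype frequency p_AB, D = p_AB − p_A·p_B; D′ divides |D| by its maximum possible value given the allele frequencies, and r² = D²/(p_A(1 − p_A)p_B(1 − p_B)). The sketch below computes both from known haplotype frequencies (in practice these must first be estimated from unphased genotypes, which Haploview does internally) and then groups SNPs under the rule stated in Patients and methods, r² > 0.8 for every pair within a group. The greedy, order-dependent grouping and all function names are assumptions for illustration; the paper states only the criterion, not how groups were assembled.

```python
from typing import Dict, List, Tuple


def ld_coefficients(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for alleles A and B at two biallelic loci.

    p_ab is the A-B haplotype frequency; p_a and p_b are allele frequencies.
    """
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalisation).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def ld_groups(snps: List[str], r2: Dict[Tuple[str, str], float],
              threshold: float = 0.8) -> List[List[str]]:
    """Greedily add each SNP to the first group in which its r^2 with every
    existing member exceeds `threshold`; otherwise start a new group.
    Pairs missing from `r2` are treated as r^2 = 0."""
    def pair_r2(a: str, b: str) -> float:
        return r2.get((a, b), r2.get((b, a), 0.0))

    groups: List[List[str]] = []
    for snp in snps:
        for group in groups:
            if all(pair_r2(snp, member) > threshold for member in group):
                group.append(snp)
                break
        else:  # no existing group accepted this SNP
            groups.append([snp])
    return groups


if __name__ == "__main__":
    # Hypothetical frequencies: complete LD (D' = 1) between SNPs of unequal frequency.
    print(ld_coefficients(0.10, 0.10, 0.50))
```

With these hypothetical frequencies D′ is 1 but r² is only 1/9, which is why, as noted in Patients and methods, SNPs sharing a high r² must also have similar allele frequencies. Because the grouping is greedy, presenting the same SNPs in a different order can produce different groups.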

Table 2 Allele frequencies of 12 SNPs in Crohn’s disease cases and controls(a)

SNP | Allele | CD frequency (%) | Control frequency (%) | Significant P-value
IL4_rs2070874 | T | 14.3 | 13.3 | –
IL13_R144Q | T | 16.0 | 18.5 | –
RAD50_rs17166050 | C | 83.6 | 78.2 | 0.0080
IGR2011 (IRF1_rs11242115) | G | 41.9 | 35.8 | 0.019
IRF1_rs960757 | T | 30.7 | 33.6 | –
IGR2063 | G | 48.4 | 39.9 | 0.00092
SLC22A5_G-207C | C | 51.3 | 45.5 | 0.028
SLC22A4_L503F | T | 48.0 | 40.5 | 0.003
SLC22A4_rs3792876 | T | 7.6 | 8.1 | –
PDLIM4_G259T | A | 9.7 | 10.2 | –
PDLIM4_rs1007602 | T | 33.5 | 32.7 | –
IGR3236 | T | 43.2 | 49.0 | 0.030

Abbreviations: CD, Crohn’s disease; SNPs, single nucleotide polymorphisms.
(a)Genotyping was carried out in 632 CD cases and 284 controls.
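Both the case–control comparisons in Table 2 and the Hardy–Weinberg checks described in Patients and methods reduce to χ² tests with one degree of freedom. The sketch below implements them in plain Python; it is not the S-Plus procedure used in the study. It applies Pearson’s χ² without a continuity correction (the S-Plus proportions test may apply one), and its worked example assumes that every one of the 632 cases and 284 controls was genotyped, whereas actual call rates averaged 95.5%, so the allele counts are approximate. Function names are illustrative.

```python
import math
from typing import Tuple


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def allelic_chi2(case_alt: int, case_total: int,
                 ctrl_alt: int, ctrl_total: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) on the 2 x 2 table of
    allele counts in cases versus controls; returns (chi2, two-sided P)."""
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, chi2_sf_1df(chi2)


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Goodness-of-fit chi-square for Hardy-Weinberg proportions (1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, chi2_sf_1df(chi2)


if __name__ == "__main__":
    # SLC22A4_L503F in Table 2: T allele 48.0% in cases, 40.5% in controls.
    # Assumes full genotyping: two alleles per individual.
    case_alleles, ctrl_alleles = 2 * 632, 2 * 284
    chi2, p = allelic_chi2(round(0.480 * case_alleles), case_alleles,
                           round(0.405 * ctrl_alleles), ctrl_alleles)
    print(f"L503F: chi2 = {chi2:.2f}, P = {p:.4f}")
```

For L503F this gives χ² ≈ 8.95 and P ≈ 0.003, in line with the value in Table 2. The allelic test counts the two alleles of each person as independent observations, an assumption that is reasonable when genotypes are in Hardy–Weinberg equilibrium, as all control genotypes were here.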

groups from all the genes in this region, with the exception of IL5 and LOC441108, were tested for association with CD. SNP G-140A in P4HA2 was in complete LD with L503F in SLC22A4 and G-207C in SLC22A5 (Figure 2); only one low-frequency synonymous SNP was detected in IL5, and no SNPs were detected in LOC441108. Allele frequencies are shown in Table 2. Two SNPs, L503F in SLC22A4 and the common allele of an intronic SNP, rs17166050 in RAD50, showed a significantly increased frequency in CD cases compared with controls (P = 0.003 and P = 0.008, respectively). In addition, a moderately significant association was observed for G-207C in SLC22A5 (P = 0.028) and for the IGR2011 SNP in the 5′UTR of IRF1 (P = 0.019). There were no significant differences in allele frequencies between cases and controls for the SNPs representing IL4, IL13, PDLIM4 and P4HA2.

In order to investigate the contribution of the L503F/G-207C risk haplotype proposed by Peltekova et al.12 relative to the background 250 kb risk haplotype (as
defined here by the IGR2063G allele), cases and controls were stratified according to the presence or absence of IGR2063G. In 278 individuals (177 cases, 101 controls) who did not carry the IGR2063G allele, the frequency of the L503F/G-207C TC haplotype was not significantly increased in cases (3.4%) compared with controls (2.8%, P = 0.78), indicating that the L503F/G-207C haplotype is associated only in the presence of the common risk haplotype as defined by Daly et al.11 The association of CD with the common allele of RAD50_rs17166050, which lies outside the canonical risk haplotype, was further investigated in the context of this background haplotype. The LD between RAD50_rs17166050 and IGR2063 (D′ = 0.54) is consistent with the risk haplotype block structure proposed by Daly et al.11 Two-SNP haplotype frequencies for these two markers showed that the risk allele C of RAD50_rs17166050 is increased in cases compared with controls only in the presence of the IGR2063G risk allele (RAD50_rs17166050–IGR2063 haplotypes: C-G, 44.6% in cases versus 35.9% in controls; C-C, 38.9% in cases versus 42.0% in controls); thus the RAD50_rs17166050C association appears to be secondary to the 250 kb risk haplotype and to result from LD between these SNPs.

Discussion
The association of sequence variants at the IBD5 locus on chromosome 5q31 with CD is strong and well replicated, but there is no general agreement as to the precise identity of the disease susceptibility genes and alleles involved.13–16 The original purpose of our study was to fine-map the region of association with CD and to try to identify functional sequence variants directly related to pathogenesis. Our analysis of LD across the region is consistent with the model proposed by Daly et al.11 of discrete blocks characterized by stretches of high LD interspersed with short intervals of LD breakdown and high levels of recombination. However, we did observe decay in LD across the 250 kb IBD5 risk haplotype, in contrast to the almost complete LD described in the Canadian population.7 The reasons for such differences are unclear but may reflect differences in population growth, admixture or migration, factors that could all influence LD.23 Indeed, a recent comparison of LD at this locus in West Africans and Europeans found evidence of long-range LD in the West African sample, which may be indicative of recent positive selection in this population.24

Importantly, we observed association with CD of SNPs outside the canonical risk haplotype: rs17166050, located in intron 4 of RAD50, and rs11242115 in the 5′UTR of IRF1 (IGR2011) were significantly associated with CD, and are located in blocks 1 and 3, respectively, as defined by Daly et al.11 The association of the RAD50 SNP with CD appears to depend on the presence of the IGR2063 risk allele and may therefore be a result of long-range LD, but it is also possible that the true causal variant or variants lie outside the previously defined 250 kb risk haplotype. Further haplotype-based studies are required to define the limits of the association with CD at this locus. Also, in individuals who were negative for the IBD5 risk haplotype, we did not find a significantly increased frequency of the L503F/G-207C TC haplotype in CD cases versus controls. Our findings are consistent with several very recent studies of the contribution of these variants to CD susceptibility13–15 that do not support a causal role for these variants, and are not consistent with the findings of Peltekova et al.12,25 Taken together, the weight of current evidence suggests that, despite the effect of the L503F and G-207C variants on the function of the cation transporters OCTN1 and OCTN2,12 they are unlikely to have a direct causal role in the pathogenesis of CD.

In an effort to identify functional sequence variants with a direct role in CD susceptibility, we re-sequenced, in 24 unrelated CD patients, all four protein-encoding genes that lie within the IBD5 risk haplotype and the five genes that lie telomeric to it, given that SNPs in haplotype blocks 1 and 3 of Daly et al.11 are also associated with CD. These included the functional candidates IL4, IL5 and IL13, which may be involved in the hyperactive immune response observed in CD and in other chronic inflammatory conditions. Only three non-synonymous SNPs were observed: L503F in SLC22A4,12 which was not independently associated with CD, and R144Q in IL13 and G259T in PDLIM4, neither of which was associated with CD. Thus, any causal sequence variants for CD located within this region are likely to be silent or non-coding changes that affect the expression of one or more genes at this locus. None of the non-coding SNPs that showed association with CD has any obvious functional significance other than G-207C, which we find to be associated only in the presence of the IBD5 risk haplotype. Interestingly, the IBD5 locus is replete with conserved non-coding sequences that are involved in the regulation of the TH2 cytokine gene cluster,26 and detailed sequence analysis of such regions may reveal variants that alter expression of these cytokines. Finally, the detailed characterization of sequence variation and LD relationships at this locus provided in this study may be usefully applied to the genetic analysis of its involvement in other inflammatory disease phenotypes.

Patients and methods
Patient ascertainment
A cohort of unrelated British patients with CD (N = 632) was recruited, after ethical review and with informed consent, from Guy’s and St Thomas’ Hospitals and St Mark’s Hospital, UK. The diagnosis of CD was made by established criteria of clinical, radiological and endoscopic analysis and from histology reports.27 Healthy controls (N = 284) with no reported history of immunological disorders were recruited from the Greater London region to match the recruitment area and ethnicity of the CD cases. The ethnicity of both cases and controls was approximately 94% white Caucasian and 6% non-Caucasian; there was no evidence of population stratification in these cohorts in previous case–control studies.28–31

Mutation screening
We designed primers to amplify the exons and flanking introns of the 10 genes shown in Figure 1 in 24 unrelated CD cases (primers and PCR conditions available on request). Conserved sequences of the hypothetical gene
LOC441108 most likely to represent exons were identified by comparing seven different orthologues. The RUNX1 binding site SNP rs3792876 was also sequenced in the same panel of 24 CD cases.

Genotyping
Genotyping was performed using the Pyrosequencing and TaqMan platforms. Pyrosequencing primers were designed for IGR2063 (haplotype block 4 as defined by Daly et al.11), rs960757 (IRF1), R144Q (rs20541, IL13), G259T (rs17165851, PDLIM4), rs17166050 (RAD50), L503F (rs1050152, SLC22A4) and rs3792876 (SLC22A4). TaqMan probes and primers were obtained as ABI Assays-on-Demand or Assays-by-Design for IGR3236 (haplotype block 10), IGR2011 (haplotype block 3; rs11242115, IRF1), rs2070874 (IL4), rs1007602 (PDLIM4) and G-207C (rs2631367, SLC22A5). Sequences and reaction conditions are available on request. All SNP assays were verified by comparison with the 24 CD samples that had been genotyped by DNA sequencing, and by checking for Hardy–Weinberg equilibrium in the control population. All control genotypes were in Hardy–Weinberg equilibrium, and the average genotyping call rate was 95.5%.

Statistical analysis
Tests for Hardy–Weinberg equilibrium and case–control association analyses (difference in allele frequency) were performed using χ² proportions tests (S-Plus v6.0); two-sided P-values are reported. Pairwise SNP LD coefficients (D′, r²)32,33 were calculated using Haploview.34 LD groups were defined such that the LD coefficient r² was greater than 0.8 for all pairs of SNPs within a group. SNPs with high values of r² will not only be in strong LD but will also be of similar frequency, and will therefore have equivalent power to detect association. Haplotype frequencies in cases and controls were estimated by an expectation–maximisation algorithm implemented in COCAPHASE,35 a module of UNPHASED. Significance levels for differences in haplotype frequencies between subgroups were obtained from unconditional logistic regression as previously described36 and verified using permutation tests with 10 000 replicates. Conditional analyses were carried out using COCAPHASE, which provides a test for equality of odds ratios for haplotypes that are identical at the conditioning loci.

Acknowledgements
This work was supported by CORE, the 5th Framework Programme of the European Commission, and the Wellcome Trust.

References
1 Loftus EV. Clinical epidemiology of inflammatory bowel disease: incidence, prevalence, and environmental influences. Gastroenterology 2004; 126: 1504–1517.
2 Mathew CG, Lewis CM. Genetics of inflammatory bowel disease: progress and prospects. Hum Mol Genet 2004; 13: R161–R168 (Special Issue 1).
3 Hugot JP, Chamaillard M, Zouali H, Lesage S, Cezard JP, Belaiche J et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn’s disease. Nature 2001; 411: 599–603.
4 Ogura Y, Bonen DK, Inohara N, Nicolae DL, Chen FF, Ramos R et al. A frameshift mutation in Nod2 associated with susceptibility to Crohn’s disease. Nature 2001; 411: 603–606.
5 Hampe J, Cuthbert A, Croucher P, Mirza MM, Mascheretti S, Fisher S et al. Association between insertion mutation in NOD2 gene and Crohn’s disease in German and British populations. Lancet 2001; 357: 1925–1928.
6 Rioux JD, Silverberg MS, Daly MJ, Steinhart AH, McLeod RS, Griffiths AM et al. Genomewide search in Canadian families with inflammatory bowel disease reveals two novel susceptibility loci. Am J Hum Genet 2000; 66: 1863–1870.
7 Rioux JD, Daly MJ, Silverberg MS, Lindblad K, Steinhart H, Cohen Z et al. Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nat Genet 2001; 29: 223–228.
8 Mirza MM, Fisher SA, King K, Cuthbert AP, Hampe J, Sanderson J et al. Genetic evidence for interaction of the 5q31 cytokine locus and the CARD15 gene in Crohn disease. Am J Hum Genet 2003; 72: 1018–1022.
9 Giallourakis C, Stoll M, Miller K, Hampe J, Lander E, Daly M et al. IBD5 is a general risk factor for inflammatory bowel disease: replication of association with Crohn’s disease and identification of a novel association with ulcerative colitis. Am J Hum Genet 2003; 73: 205–211.
10 Negoro K, McGovern DPB, Kinouchi Y, Takahashi S, Lench NJ, Shimosegawa T et al. Analysis of the IBD5 locus and potential gene–gene interactions in Crohn’s disease. Gut 2003; 52: 541–546.
11 Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES. High-resolution haplotype structure in the human genome. Nat Genet 2001; 29: 229–232.
12 Peltekova VD, Wintle RF, Rubin LA, Amos CI, Huang Q, Gu X et al. Functional variants of OCTN cation transporter genes are associated with Crohn disease. Nat Genet 2004; 36: 471–475.
13 Török HP, Glas J, Tonenchi L, Lohse P, Müller-Myhsok B, Limbersky O et al. Polymorphisms in the DLG5 and OCTN cation transporter genes in Crohn’s disease. Gut 2005; 54: 1354–1357.
14 Vermeire S, Pierik M, Hlavaty T, Claessens G, van Schuerbeeck N, Joossens S et al. Association of organic cation transporter risk haplotype with perianal penetrating Crohn’s disease but not with susceptibility to IBD. Gastroenterology 2005; 129: 1845–1853.
15 Noble CL, Nimmo ER, Drummond H, Ho GT, Tenesa A, Smith L et al. The contribution of OCTN1/2 variants within the IBD5 locus to disease susceptibility and severity in Crohn’s disease. Gastroenterology 2005; 129: 1854–1864.
16 Trinh TT, Rioux JD. Understanding association and causality in the genetic studies of inflammatory bowel disease. Gastroenterology 2005; 129: 2106–2110.
17 Nezu J, Tamai I, Oku A, Ohashi R, Yabuuchi H, Hashimoto N et al. Primary systemic carnitine deficiency is caused by mutations in a gene encoding sodium ion-dependent carnitine transporter. Nat Genet 1999; 21: 91–94.
18 Tokuhiro S, Yamada R, Chang X, Suzuki A, Kochi Y, Sawada T et al. An intronic SNP in a RUNX1 binding site of SLC22A4, encoding an organic cation transporter, is associated with rheumatoid arthritis. Nat Genet 2003; 35: 341–348.
19 Donfack J, Schneider DH, Tan Z, Kirz T, Dubchak I, Frazer KA et al. Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy. Resp Res 2005; 6: 145.
20 Ho P, Bruce IN, Silman A, Symmons D, Newman B, Young H et al. Evidence for common genetic control in pathways of inflammation for Crohn’s disease and psoriatic arthritis. Arthritis Rheum 2005; 52: 3596–3602.
21 Frazer KA, Ueda Y, Zhu Y, Gifford VR, Garofalo MR, Mohandas N et al. Computational and biological analysis of 680 kb of DNA sequence from the human 5q31 cytokine gene cluster region. Genome Res 1997; 7: 495–512.
22 Heinemeyer T, Wingender E, Reuter I, Hermjakob H, Kel AE, Kel OV et al. Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. Nucleic Acids Res 1998; 26: 362–367.
23 Ardlie KG, Kruglyak L, Seielstad M. Patterns of linkage disequilibrium in the human genome. Nat Rev Genet 2002; 3: 299–309.
24 Luoni G, Forton J, Jallow M, Akha ES, Sisay-Joof F, Pinder M et al. Population-specific patterns of linkage disequilibrium in the human 5q31 region. Genes Immun 2005; 6: 723–727.
25 Newman B, Gu X, Wintle R, Cescon D, Yazdanpanah M, Liu X et al. A risk haplotype in the Solute Carrier Family 22A4/22A5 gene cluster influences phenotypic expression of Crohn’s disease. Gastroenterology 2005; 128: 260–269.
26 Sallusto F, Reiner SL. Sliding doors in the immune response. Nat Immunol 2005; 6: 10–12.
27 Lennard-Jones JE. Classification of inflammatory bowel disease. Scand J Gastroenterol Suppl 1989; 170: 2–6.
28 King K, Moody A, Fisher SA, Mirza MM, Cuthbert AP, Hampe J et al. Genetic variation in the IGSF6 gene and lack of association with inflammatory bowel disease. Eur J Immunogenet 2003; 30: 187–190.
29 Fisher SA, Moody A, Mirza MM, Cuthbert AP, Hampe J, MacPherson A et al. Genetic variation at the chromosome 16 chemokine gene cluster: development of a strategy for association studies in complex disease. Ann Hum Genet 2003; 67: 377–390.
30 Cuthbert AP, Fisher SA, Sanderson J, Forbes A, Lewis CM, Mathew CG. Genetic association between EPHX1 and Crohn’s disease: population stratification, genotyping error or random chance? Gut 2004; 53: 1386.
31 Mirza MM, Fisher SA, Lewis CM, Mathew CG. Failure to replicate the association of a functional NFkB1 promoter polymorphism with ulcerative colitis in a British case control cohort. Gut 2005; 54: 1206.
32 Lewontin R. The interaction of selection and linkage. II. Optimum models. Genetics 1964; 50: 757–782.
33 Devlin B, Risch N. A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 1995; 29: 311–322.
34 Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2005; 21: 263–265.
35 Dudbridge F. Pedigree disequilibrium tests for multilocus haplotypes. Genet Epidemiol 2003; 25: 115–121.
36 Zhao JH, Curtis D, Sham PC. Model-free analysis and permutation tests for allelic associations. Hum Hered 2000; 50: 133–139.
