Genes and Immunity (2003) 4, 476–486 © 2003 Nature Publishing Group All rights reserved 1466-4879/03 $25.00 www.nature.com/gene

Complex haplotypic structure of the central MHC region flanking TNF in a West African population

HC Ackerman1,7, G Ribas2, M Jallow4, R Mott1, M Neville3, F Sisay-Joof5, M Pinder5, RD Campbell6 and DP Kwiatkowski1,7 1Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, UK; 2Fundacion Centro Nacional de Investigaciones Oncologicas Carlos III, Departmento de Patologia Molecular, Lab. Gennetica Humana, C/Melchor Fernandez Almagro, Madrid, Spain; 3Cancer Research UK, General Practice Research Group, Clinical Pharmacology, Radcliffe Infirmary, Oxford, UK; 4Royal Victoria Hospital, Banjul, The Gambia; 5MRC Laboratories, Fajara, The Gambia; 6MRC UK HGMP Resource Center, Genome Campus, Hinxton, Cambridge, UK; 7University Department of Paediatrics, John Radcliffe Hospital, Oxford, UK

TNF polymorphisms have been associated with susceptibility to malaria and other infectious and inflammatory conditions. We investigated a sample of 150 West African chromosomes to determine linkage disequilibrium (LD) between 25 SNP markers located in an 80 kb segment of the MHC Class III region encompassing TNF and eight neighbouring genes. We observed 45 haplotypes, and 22 of them comprise 80% of the sample. The pattern of LD is remarkably patchy, such that many markers show no LD with adjacent markers but high LD with markers that are much further away. We introduce a method of examining the implications of LD data for disease association studies based on sample size considerations: this shows that certain TNF polymorphisms would be likely to yield positive associations if the true disease allele resided in LTA or BAT1. We conclude that detailed marker maps are needed to resolve the causal origin of disease associations observed at the TNF locus. Genes and Immunity (2003) 4, 476–486. doi:10.1038/sj..6364008

Keywords: haplotype; power; association; haplotype-tagging; linkage; disequilibrium; htSNP

Introduction

The central major histocompatibility complex (MHC) has a very high gene density, and a large proportion of these genes are predicted to have a role in immunity and inflammation.1 This poses a considerable challenge for genetic association studies, because for a given disease there may be many plausible candidates in close physical and genetic proximity.

One candidate gene, TNF, has been associated with numerous infectious and inflammatory diseases.2–12 TNF is a proinflammatory cytokine that is an essential component of the immune response, but may be harmful in excess. Although there are no common polymorphisms in the amino-acid sequence of TNF, the gene promoter is rich with single nucleotide polymorphisms13 (SNPs), some of which modify the expression of the TNF gene in vitro.6,14 Several of these promoter SNPs have been associated with susceptibility to infectious disease in Africa, raising the question of whether these are independent associations or whether they share a common etiology.3,6,8,9

A recent analysis of the TNF gene at the DNA sequence haplotype level found that many of the SNPs in TNF are inefficient markers of each other, to such an extent that if one SNP in TNF were a true disease susceptibility locus, most of the other SNPs in TNF would appear neutral.15 This suggests that the disease associations with TNF SNPs are independent of each other, but the possibility remains that an untyped SNP some distance away may be responsible for the observed associations with disease.

It may be reasonable to expect that TNF SNPs, in weak linkage disequilibrium (LD) with each other, will also be in weak LD with more distant SNPs in the central MHC; however, the strength of LD is determined not only by the physical proximity of two SNPs, but also by their proximity in time and lineage.16 These factors are further modified by the effects of genetic drift, migration, and natural selection, and together they determine the pattern of LD in a population sample.17,18 So it is conceivable that TNF SNPs, while inefficient markers of each other, may be very good markers of SNPs in more distant genes. If such heterogeneity in LD existed, it would be important to account for it in the interpretation of disease association studies.

To begin to understand the relations between the SNPs in TNF and the SNPs in other genes of the central MHC, we focused on an 80 kb genomic segment containing nine genes, AIF1, NCR3, LST1, LTB, TNF, LTA, NFKBIL1, ATP6V1G2, and BAT1, each of which is involved in immunity or inflammation (see Table 1 and Figure 1). We developed a panel of 25 SNPs in six of these genes using DHPLC, DNA sequencing, and database interrogation.13,19 These 25 SNPs were genotyped in a sample of healthy Gambian adults to determine the haplotypic structure of this region. Analysis of this haplotypic structure reveals a heterogeneous pattern of allelic association between these genes of the central MHC.

Correspondence: Dr HC Ackerman, 64 Linnaean Street, Cambridge, MA 02138, USA. E-mail: [email protected]
Received 28 October 2002; revised 25 January 2003; accepted 18 February 2003

Table 1 Names and functions of nine genes in the 80 kb segment investigated

Gene | Full name; (Aliases) | Protein | References
AIF1 | Allograft inhibitory factor 1; interferon-gamma responsive transcript 1 (IRT1) | A leucine zipper protein containing a core nuclear localization sequence is induced by interferon-gamma and inhibits the proliferation of vascular smooth muscle cells | 33
NCR3 | Activating natural killer receptor p30; (1C7) | A 30 kDa triggering receptor selectively expressed by resting and activated human natural killer (NK) cells | 33, 34
LST1 | Leucocyte-specific transcript 1; lymphocyte antigen 117 (LY117) | An alternatively spliced cell-surface molecule expressed on mononuclear cells; inhibits lymphocyte proliferation | 35–37
LTB | Lymphotoxin-beta | A 33 kDa glycoprotein present on the surface of activated T and B cells which forms a heterotrimer with LTalpha, anchoring it to the cell surface | 38, 39
TNF | Tumour necrosis factor; tumour necrosis factor alpha (TNFA) | A multifunctional proinflammatory cytokine principally expressed by activated macrophages | 40, 41
LTA | Lymphotoxin-alpha; tumour necrosis factor beta (TNFB) | A 70 kDa homotrimer secreted by activated lymphocytes; LTA binds to the same receptors as TNF, and has similar functions | 42, 43
NFKBIL1 | Nuclear factor of kappa light-chain gene enhancer in B cells inhibitor-like 1; Inhibitor of kappa light-chain gene enhancer in B cells-like; (IKBL) | A 381-amino-acid protein containing ankyrin motifs similar to those of the I-kappa-B family of proteins which inhibit NF-kB | 44
ATP6V1G2 | ATPase, H+ transporting, lysosomal, subunit g, isoform 2 | A vacuolar H+ transporter predicted to be involved in glycosylation in the Golgi, degradation of cellular debris in lysosomes, and the processing of endocytosed receptor–ligand complexes | 34
BAT1 | U2AF65 associated protein, 56 kDa; HLA-B-associated transcript 1 (D6S81E) | A member of the DEAD-box family of ATP-dependent RNA helicases, involved in initiation of translation, RNA splicing, and ribosome assembly; a negative regulator of inflammation | 45, 46

Figure 1 Genomic organization of the region investigated (gene order along a 0–80 kb scale: AIF1, NCR3, LST1, LTB, TNF, LTA, NFKBIL1, ATP6V1G2, BAT1). SNPs were identified in six of the nine genes of this genomic segment. The direction of transcription is shown above the name of each gene.

Results

Haplotypic diversity
A total of 25 SNPs in an 80 kb segment were genotyped in a sample of healthy Gambian adults. The sample of SNPs showed a broad range of allele frequencies (Figure 2); those SNPs ascertained by database interrogation tended to be of greater frequency than SNPs ascertained by resequencing. These 25 SNPs describe 45 unique haplotypes (Figure 3) and we estimate gene diversity, the chance that two randomly sampled haplotypes are different, to be 0.96 (s.d. = 0.05). In Figure 4, the cumulative haplotype frequencies are shown; 22 unique haplotypes are required to comprise 80% of the sample. Inspection of Figure 3 reveals that a few common SNPs dominate the haplotype structure, for example, the minor alleles of LTA*251 and BAT1*7126 occur together in a block found on about 40% of chromosomes. TNF*-308 occurs only on a subset of these chromosomes. Another common block of alleles is defined by BAT1*1595 and BAT1*1715 which tend to occur on the background of the NFKBIL1*15811 SNP. At the other end of the haplotype, some 60–80 kb away, AIF1*825, also a common SNP, exists on the background of both common haplotype blocks, strong evidence of recombination. In the intervening regions containing densely spaced markers in NCR3 and TNF, we see a number of rare haplotypes that are deeply divergent from the common haplotypes.

Figure 2 The distribution of SNP frequencies (with 95% confidence intervals) used in this study. (Axes: SNP; Frequency, 0.000–0.500.)

Figure 4 Frequencies of the 45 haplotypes are indicated by black columns. The cumulative frequency is plotted as a line. (Axes: Haplotype; Frequency, 0.000–1.000.)
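The two summary statistics quoted in this subsection, gene diversity and the number of haplotypes needed to reach a given share of the sample, can be illustrated with a minimal sketch. It assumes the unbiased estimator H = n/(n − 1) × (1 − Σ p_i²) given in the Methods; the function names and the toy counts are hypothetical and are not the study data.

```python
def gene_diversity(counts):
    """Gene diversity H = n/(n-1) * (1 - sum(p_i^2)).

    counts: number of sampled chromosomes carrying each distinct haplotype.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


def haplotypes_to_cover(counts, fraction):
    """Smallest number of haplotypes, taken from most to least common,
    whose combined count reaches `fraction` of the sample."""
    n = sum(counts)
    running = 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if running >= fraction * n:
            return rank
    return len(counts)


# Hypothetical haplotype counts, for illustration only.
toy_counts = [5, 3, 1, 1]
print(gene_diversity(toy_counts))
print(haplotypes_to_cover(toy_counts, 0.75))
```

Applied to the observed haplotype counts, the second function would correspond to the cumulative curve of Figure 4.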

Dissecting the haplotypic structure
We applied the entropy maximization method (EMM) to identify those SNPs that most effectively dissect the underlying haplotypic structure of this 80 kb segment. The EMM searches for optimal combinations of SNPs that account for the greatest amount of haplotypic diversity as measured by entropy. In Figure 5, we plot the percentage of the full 25-SNP haplotypic diversity that is accounted for by the optimal subset of SNPs, as the number of SNPs in the subset increases from 1 to 25. The single SNP that explains the greatest proportion of the full haplotypic diversity is LTA*251. Additional SNPs (indicated on the curve) are added, explaining increasing proportions of haplotypic diversity. The optimal seven-SNP subset explains greater than 80% of the full 25-SNP haplotypic diversity.
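The idea of scoring SNP subsets by the entropy of the haplotype groups they define can be sketched as follows. This is not the authors' algorithm, whose details are not given here: the sketch assumes an exhaustive search over all subsets of a fixed size (feasible only for small panels), alleles coded 0/1, and hypothetical toy haplotype frequencies.

```python
from collections import Counter
from itertools import combinations
from math import log


def entropy(freqs):
    """Shannon entropy E = -sum(p_i * log(p_i)) over non-zero frequencies."""
    return -sum(p * log(p) for p in freqs if p > 0)


def subset_entropy(haplotypes, subset):
    """Entropy of the groups obtained when only the SNPs in `subset` are typed.

    haplotypes: dict mapping a tuple of 0/1 alleles to its frequency.
    subset: tuple of SNP positions (indices into each haplotype tuple).
    """
    collapsed = Counter()
    for hap, p in haplotypes.items():
        collapsed[tuple(hap[i] for i in subset)] += p
    return entropy(collapsed.values())


def best_subset(haplotypes, k):
    """Exhaustively find the k-SNP subset with the greatest entropy."""
    n_snps = len(next(iter(haplotypes)))
    return max(combinations(range(n_snps), k),
               key=lambda s: subset_entropy(haplotypes, s))


# Hypothetical three-SNP haplotypes, for illustration only.
toy = {(0, 0, 0): 0.4, (1, 0, 0): 0.3, (1, 1, 0): 0.2, (1, 1, 1): 0.1}
full = subset_entropy(toy, (0, 1, 2))
for k in (1, 2, 3):
    chosen = best_subset(toy, k)
    print(k, chosen, subset_entropy(toy, chosen) / full)
```

Because each subset size is searched independently, the best subset of size k + 1 need not contain the best subset of size k.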

Pairwise correlations between SNPs
In order to quantify the correlations between allelic states in each pair of SNPs, we calculated r² from the haplotypic data (Figure 6). The heterogeneity of allelic associations is evident. One would expect the strongest associations to be between those SNPs closest to each other, that is, along the diagonal of the figure. Only a few regions show strong correlations between adjacent SNPs: the five LTA, NFKBIL1, and BAT1 SNPs, and a few sets of three SNPs. We see predominantly weak associations punctuated by pockets of strong correlation, often between more distant SNPs. A small number of SNPs have significant correlations with many others. For example, the LTA*251 and BAT1*7126 SNPs are significantly correlated with 19/24 and 17/24 of the remaining SNPs, respectively. This implies that the haplotype bearing the less common alleles of LTA*251 and BAT1*7126 may have increased in frequency recently, with little opportunity for recombination. TNF*-1031 and NCR3*3918 also have many significant correlations with other SNPs.

Figure 3 Haplotypic structure defined by 25 SNPs in AIF1, NCR3, TNF, LTA, NFKBIL1, and BAT1. The minor allele at each SNP is indicated in black. Haplotypes were sorted so that high-frequency SNPs are given greater priority than low-frequency SNPs.

Figure 5 Percentage of the 25-SNP haplotype diversity accounted for as the number of genotyped SNPs increases from 1 to 25. (Axes: Number of SNPs, 0–25; Haplotypes Defined, 0–100%.)

Pairwise power analysis
LD or association statistics summarize a pairwise relation into a single statistic, when in fact there should be two, because the ability of SNP A to detect SNP B is not the same as the ability of SNP B to detect SNP A, unless they have the same frequency. Furthermore, the important role of allele frequency, which can (1) directly affect the measure of LD, (2) determine the power to reject LD, and (3) ultimately determine the power of an association study, is often unaccounted for. As an alternative to LD or association statistics, we propose using a different statistic, the power of a marker SNP to detect a hypothetical disease SNP. In calculating the power of one SNP to detect another, we can account for the strength of allelic association between a pair of SNPs, the frequencies of their alleles, and the asymmetry of their pairwise relation.

To construct the pairwise power map, we begin by assigning a hypothetical disease SNP a relative risk of 2. We then calculate the increased frequency of a marker allele, among cases drawn from the general population, by virtue of its association with the hypothetical disease SNP. The power to detect this change in allele frequency at the marker SNP is determined using a standard method.20 We calculate power for all N × N pairs of SNPs to generate the results shown in Figure 7. Each row represents a marker SNP, and each column represents a hypothetical disease SNP. The power when the disease SNP itself is typed is found along the diagonal. The results in Figure 7 represent the power to detect an association at the P < 0.01 level in 1000 cases and 1000 controls, when the hypothetical disease SNP in each pair is assigned a relative risk of 2.

In this sample of haplotypes, the prospect of detecting an association with a marker is bleak: many of the SNPs would be powerless to detect a relative risk (RR) of 2 at some other SNP in this region (average power to detect RR = 2, P < 0.01, in 1000 cases and 1000 controls is 16%, s.d. = 28%); however, there are a few SNP pairs that are powerful enough markers of each other to be relevant to disease association studies. In this data set, allele frequency had only a moderate effect on the power of one SNP to detect a disease association at another SNP. When analysed in allele frequency bins of 0–0.1, 0.1–0.3, and 0.3–0.5, average pairwise power in each bin increased with allele frequency: 13% (s.d. 24%), 18% (s.d. 29%), and 28% (s.d. 38%), respectively, although LD (as measured by r²) did not: 0.11 (s.d. 0.24), 0.15 (s.d. 0.24), 0.12 (s.d. 0.24). Although high-frequency SNPs have greater power on average, there is considerable variability in the power of low-, medium-, and high-frequency SNPs, and even some of the low-frequency SNPs have powerful relations with other SNPs.

The strong correlations between the LTA, NFKBIL1, and BAT1 SNPs, combined with their high frequencies, make these SNPs powerful but redundant markers of each other. Another cluster of powerful markers is found between NCR3*3008 and TNF*851. Outside of these two clusters, most SNPs would be powerless to detect a relative risk of 2 at their nearest neighbour. Moving away from the diagonal in Figure 7, where the disease allele itself is typed, we see a few scattered marker–disease pairs of relevant power, an indication of the heterogeneity of allelic association in this region. A number of TNF promoter SNPs show moderate power to detect disease SNPs in the NCR3 promoter and 3′ UTR even though they are powerless to detect associations at other SNPs in TNF. Finally, the TNF*-308 SNP, which has implicated the TNF gene in numerous diseases, is a powerful marker of SNPs in LTA, NFKBIL1, and BAT1 despite its low power to detect associations at other TNF promoter SNPs.
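A rough sketch of one cell of such a power map is given below. It is not the standard method cited in the text, which is not reproduced here. The assumptions are: the relative risk acts multiplicatively per chromosome, controls carry the general-population allele frequencies, 1000 individuals contribute 2000 chromosomes per group, and power comes from a normal approximation to a two-sided comparison of two proportions. All function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_NORMAL = NormalDist()


def marker_freqs(f11, f12, f21, f22, rr=2.0):
    """Return (population, case) frequency of the marker allele.

    f11: marker allele with disease allele   f12: marker allele with normal allele
    f21: other allele with disease allele    f22: other allele with normal allele
    Assumes each disease-allele chromosome is rr times as likely to be
    sampled among cases as a normal-allele chromosome.
    """
    population = f11 + f12
    cases = (rr * f11 + f12) / (rr * (f11 + f21) + f12 + f22)
    return population, cases


def power_two_proportions(p0, p1, n_chrom, alpha=0.01):
    """Normal-approximation power of a two-sided test of p0 versus p1,
    with n_chrom chromosomes in each group (frequencies strictly in (0, 1))."""
    z = _NORMAL.inv_cdf(1 - alpha / 2)
    pbar = (p0 + p1) / 2
    s_null = sqrt(2 * pbar * (1 - pbar))
    s_alt = sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    shift = abs(p1 - p0) * sqrt(n_chrom)
    upper = _NORMAL.cdf((shift - z * s_null) / s_alt)
    lower = _NORMAL.cdf((-shift - z * s_null) / s_alt)
    return upper + lower


def marker_power(f11, f12, f21, f22, rr=2.0, n_chrom=2000, alpha=0.01):
    """Power of the marker SNP to detect a disease SNP with relative risk rr."""
    p0, p1 = marker_freqs(f11, f12, f21, f22, rr)
    return power_two_proportions(p0, p1, n_chrom, alpha)


# Hypothetical pair: rare allele of SNP A always found with the common allele of SNP B.
a_marks_b = marker_power(0.05, 0.00, 0.35, 0.60)  # A as marker, B as disease SNP
b_marks_a = marker_power(0.05, 0.35, 0.00, 0.60)  # B as marker, A as disease SNP
print(a_marks_b, b_marks_a)
```

The two printed values differ, which is the asymmetry the text describes: a single r² value for the pair cannot capture both directions.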

AIF1*825

AIF1*206 .07

NCR3*-412 .03 .14

NCR3*-204 .01 .01 .01

NCR3*-172 .01 .01 .01 1.00

NCR3* 2708 .01 .00 .01 .00 .00

NCR3* 3008 .00 .01 .00 .00 .00 .00

NCR3* 3571 .00 .01 .01 .00 .00 .00 .66

NCR3* 3790 .01 .00 .01 .00 .00 .00 .00 .00

NCR3* 3918 .03 .01 .01 .22 .22 .18 .27 .16 .00

TNF*1304 .01 .00 .02 .00 .00 .21 .59 .52 .00 .36

TNF*851 .02 .01 .00 .00 .00 .14 .41 .35 .01 .23 .69

TNF*467 .04 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

TNF*-238 .01 .00 .01 .00 .00 .44 .00 .00 .00 .07 .16 .34 .00

TNF*-244 .01 .11 .09 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

TNF*-308 .04 .00 .06 .01 .01 .01 .02 .02 .01 .05 .02 .03 .00 .01 .01

TNF*-376 .01 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .13 .00 .39 .00 .00

TNF*-857 .01 .00 .01 .00 .00 .00 .00 .00 .63 .01 .00 .01 .00 .00 .00 .02 .00

TNF*-863 .02 .01 .01 .83 .83 .03 .00 .00 .00 .27 .00 .00 .00 .00 .00 .01 .00 .00

TNF*-1031 .03 .00 .02 .44 .44 .35 .00 .01 .00 .32 .05 .12 .00 .44 .00 .02 .17 .00 .53

LTA*251 .00 .11 .29 .03 .03 .02 .02 .05 .04 .07 .06 .04 .01 .03 .03 .38 .01 .04 .03 .06

NFKBIL1*15811 .03 .04 .10 .01 .01 .00 .02 .01 .04 .01 .03 .05 .02 .01 .01 .15 .01 .05 .01 .02 .36

BAT1* 1595 .05 .01 .02 .01 .01 .01 .01 .00 .01 .01 .02 .03 .06 .01 .01 .07 .00 .01 .01 .00 .13 .20

BAT1* 1715 .09 .03 .04 .00 .00 .01 .02 .00 .02 .02 .02 .03 .04 .01 .01 .10 .00 .02 .00 .00 .17 .28 .73

BAT1*7126 .00 .14 .22 .02 .02 .02 .04 .04 .03 .08 .05 .03 .01 .02 .03 .40 .01 .04 .03 .05 .87 .36 .16 .22

AIF1*825 AIF1*206 NCR3*-412 NCR3*-204 NCR3*-172 NCR3* 2708 NCR3* 3008 NCR3* 3571 NCR3* 3790 NCR3* 3918 TNF*1304 TNF*851 TNF*467 TNF*-238 TNF*-244 TNF*-308 TNF*-376 TNF*-857 TNF*-863 TNF*-1031 LTA*251 NFKBIL1*15811 BAT1* 1595 BAT1* 1715 BAT1* 7126

Figure 6 Correlations of SNP pairs. r², or χ² divided by the sample size, is calculated for each pair of SNPs. r² ranges from 0 to 1. r² values greater than 0.026 are significant at the P < 0.05 level, uncorrected for multiple tests.
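The r² statistic and the significance threshold in the caption can be checked with a short sketch. It assumes the usual definition r² = D² / (p1 p2 q1 q2) with D = f11 f22 − f12 f21, and that the threshold is the 1-d.f. χ² critical value divided by the 150 chromosomes sampled; the assignment of p and q to the two SNPs is an assumed convention, and the function names are hypothetical.

```python
from statistics import NormalDist


def r_squared(f11, f12, f21, f22):
    """r^2 between two biallelic SNPs from their four haplotype frequencies.

    fij is the frequency of the haplotype carrying allele i at the first SNP
    and allele j at the second SNP.
    """
    p1, p2 = f11 + f12, f21 + f22  # allele frequencies at the first SNP
    q1, q2 = f11 + f21, f12 + f22  # allele frequencies at the second SNP
    d = f11 * f22 - f12 * f21
    return d * d / (p1 * p2 * q1 * q2)


def r2_threshold(n_chromosomes, alpha=0.05):
    """Smallest r^2 that is significant at level alpha, using
    chi-square (1 d.f.) = n * r^2 and chi-square critical value = z^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z / n_chromosomes


print(r_squared(0.5, 0.0, 0.0, 0.5))   # complete association
print(round(r2_threshold(150), 3))     # threshold for 150 chromosomes
```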

Discussion

We analysed 25 SNPs on 150 West African chromosomes to determine the haplotypic structure of an 80 kb segment of the central MHC flanking the TNF locus. We found 45 haplotypes, and a minimum of 22 different haplotypes were needed to comprise 80% of the sample. LD is remarkably heterogeneous, such that many markers show no LD with adjacent markers but high LD with markers that are further away. We describe these pairwise associations using a conventional measure of LD, r², and introduce an alternative pairwise statistic, the power of a marker SNP to detect a given disease association at a hypothetical disease SNP. Our pairwise power map shows generally low power to detect associations with SNPs in this region, although some TNF polymorphisms would be likely to yield positive associations if the true disease allele resided in LTA or BAT1.

TNF associations
Polymorphisms in the TNF promoter have been associated with numerous infectious and inflammatory diseases.2–12 In West African populations, the TNF-308 polymorphism has been associated with both trachoma and cerebral malaria; the TNF-238 polymorphism has been associated with severe malarial anaemia; and the TNF-376 has been associated with cerebral malaria.3,6,8,9

AIF1*825 1.0 .23 .09 .01 .01 .01 .00 .01 .01 .06 .02 .04 .02 .01 .01 .12 .01 .01 .02 .05 .01 .09 .15 .35 .01

AIF1*206 .32 1.0 .49 .01 .01 .01 .02 .02 .00 .03 .01 .01 .01 .00 .05 .00 .01 .00 .01 .01 .43 .17 .04 .08 .54

NCR3*-412 .11 .46 1.0 .01 .01 .01 .01 .02 .01 .02 .02 .01 .01 .01 .04 .26 .01 .02 .01 .03 .89 .50 .06 .13 .79

NCR3*-204 .04 .02 .02 .71 .71 .00 .00 .00 .00 .51 .01 .01 .00 .00 .00 .03 .00 .00 .70 .63 .09 .02 .02 .01 .07

NCR3*-172 .04 .02 .02 .71 .71 .00 .00 .00 .00 .51 .01 .01 .00 .00 .00 .03 .00 .00 .70 .63 .09 .02 .02 .01 .07

NCR3*2708 .03 .01 .02 .00 .00 .58 .00 .00 .00 .39 .26 .23 .00 .31 .00 .02 .00 .00 .02 .49 .06 .01 .02 .02 .05

NCR3*3008 .00 .02 .01 .00 .00 .00 .91 .80 .01 .65 .79 .73 .00 .00 .00 .05 .00 .01 .01 .01 .07 .06 .04 .06 .14

NCR3*3571 .01 .03 .04 .00 .00 .00 .77 .94 .01 .40 .74 .68 .00 .00 .00 .06 .00 .01 .01 .01 .20 .03 .00 .00 .17

NCR3*3790 .02 .00 .03 .00 .00 .00 .01 .01 .87 .00 .01 .01 .00 .00 .00 .04 .00 .72 .00 .01 .14 .11 .03 .05 .12

NCR3*3918 .08 .03 .02 .19 .19 .11 .38 .23 .00 1.0 .62 .52 .00 .05 .01 .17 .00 .01 .29 .59 .31 .03 .03 .06 .37

TNF*1304 .04 .01 .04 .01 .01 .12 .72 .70 .01 .81 .96 .94 .00 .11 .00 .06 .00 .01 .01 .07 .23 .10 .05 .08 .20

TNF*851 .07 .02 .01 .01 .01 .08 .56 .54 .01 .62 .89 .99 .00 .29 .00 .10 .04 .01 .01 .19 .13 .20 .07 .12 .10

TNF*467 .07 .00 .01 .00 .00 .00 .00 .00 .00 .00 .00 .00 .25 .00 .00 .01 .00 .00 .00 .00 .02 .04 .11 .08 .02

TNF*-238 .04 .00 .02 .00 .00 .26 .00 .00 .00 .13 .20 .59 .00 .71 .00 .03 .11 .00 .00 .63 .09 .02 .02 .03 .07

TNF*-244 .02 .24 .21 .00 .00 .00 .00 .00 .00 .01 .00 .00 .00 .00 .43 .02 .00 .00 .00 .00 .06 .04 .01 .02 .07

TNF*-308 .13 .00 .23 .01 .01 .01 .02 .03 .02 .12 .03 .06 .01 .01 .01 1.0 .01 .02 .02 .04 .97 .71 .27 .44 .98

TNF*-376 .01 .00 .01 .00 .00 .00 .00 .00 .00 .00 .00 .17 .00 .23 .00 .01 .25 .00 .00 .19 .02 .02 .01 .01 .02

TNF*-857 .02 .00 .03 .00 .00 .00 .01 .01 .67 .02 .01 .01 .00 .00 .00 .05 .00 .91 .01 .01 .17 .14 .04 .06 .14

TNF*-863 .05 .02 .02 .64 .64 .02 .01 .01 .00 .62 .01 .01 .00 .00 .00 .03 .00 .01 .80 .74 .11 .03 .01 .01 .10

TNF*-1031 .11 .01 .05 .38 .38 .23 .01 .01 .01 .76 .07 .23 .00 .38 .00 .07 .05 .01 .54 .98 .27 .05 .00 .01 .23

LTA*251 .01 .39 .89 .02 .02 .02 .03 .06 .04 .20 .08 .07 .01 .02 .02 .98 .01 .05 .03 .10 1.0 .98 .53 .71 1.0

NFKBIL1*15811 .10 .12 .40 .01 .01 .01 .02 .02 .04 .02 .04 .10 .01 .01 .01 .65 .01 .06 .01 .03 .98 1.0 .74 .92 .99

BAT1*1595 .17 .03 .06 .01 .01 .01 .02 .00 .02 .02 .03 .05 .02 .01 .01 .29 .01 .02 .01 .00 .62 .75 1.0 1.0 .75

BAT1*1715 .36 .06 .11 .01 .01 .01 .02 .00 .02 .04 .03 .06 .02 .01 .01 .43 .01 .02 .01 .01 .75 .90 1.0 1.0 .89

BAT1*7126 .01 .47 .77 .02 .02 .02 .04 .06 .04 .23 .07 .05 .01 .02 .02 .98 .01 .04 .03 .09 1.0 .99 .65 .86 1.0

AIF1*825 AIF1*206 NCR3*-412 NCR3*-204 NCR3*-172 NCR3*2708 NCR3*3008 NCR3*3571 NCR3*3790 NCR3*3918 TNF*1304 TNF*851 TNF*467 TNF*-238 TNF*-244 TNF*-308 TNF*-376 TNF*-857 TNF*-863 TNF*-1031 LTA*251 NFKBIL1*15811 BAT1*1595 BAT1*1715 BAT1*7126

Figure 7 Pairwise power map of 625 SNP pairs. The SNPs indicated along the x-axis (columns) are designated as hypothetical disease SNPs where the minor allele has a relative risk of 2. The power to detect an association at the P < 0.01 level in a sample of 1000 cases and 1000 controls using a marker SNP listed on the y-axis (rows) is indicated in each cell. Cells with grey background indicate moderate power (0.5–0.8) and cells with black background indicate high power (0.8–~1).
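The map can also be read programmatically to ask how many disease SNPs a chosen set of markers covers. The sketch below copies three marker rows from Figure 7 (LTA*251, TNF*1304, and TNF*-863, the combination considered in the Discussion) and counts the disease SNPs at which at least one of them reaches moderate power. The threshold of 0.5 follows the figure legend; the function name and data layout are hypothetical.

```python
# Column order of Figure 7 (hypothetical disease SNPs, left to right).
SNPS = [
    "AIF1*825", "AIF1*206", "NCR3*-412", "NCR3*-204", "NCR3*-172",
    "NCR3*2708", "NCR3*3008", "NCR3*3571", "NCR3*3790", "NCR3*3918",
    "TNF*1304", "TNF*851", "TNF*467", "TNF*-238", "TNF*-244",
    "TNF*-308", "TNF*-376", "TNF*-857", "TNF*-863", "TNF*-1031",
    "LTA*251", "NFKBIL1*15811", "BAT1*1595", "BAT1*1715", "BAT1*7126",
]

# Marker rows transcribed from Figure 7.
POWER = {
    "LTA*251": [.01, .39, .89, .02, .02, .02, .03, .06, .04, .20, .08, .07, .01,
                .02, .02, .98, .01, .05, .03, .10, 1.0, .98, .53, .71, 1.0],
    "TNF*1304": [.04, .01, .04, .01, .01, .12, .72, .70, .01, .81, .96, .94, .00,
                 .11, .00, .06, .00, .01, .01, .07, .23, .10, .05, .08, .20],
    "TNF*-863": [.05, .02, .02, .64, .64, .02, .01, .01, .00, .62, .01, .01, .00,
                 .00, .00, .03, .00, .01, .80, .74, .11, .03, .01, .01, .10],
}


def covered_snps(markers, threshold=0.5):
    """Disease SNPs at which at least one marker has power >= threshold."""
    return [
        snp
        for j, snp in enumerate(SNPS)
        if any(POWER[m][j] >= threshold for m in markers)
    ]


covered = covered_snps(["LTA*251", "TNF*1304", "TNF*-863"])
print(len(covered), "of", len(SNPS))
print(covered)
```

With these three rows the count agrees with the 16/25 coverage reported in the Discussion, and no AIF1 SNP is covered.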

In both of these diseases TNF remains a strong SNP alleles are often found to occur together in blocks, a candidate, but it is important to appreciate the potential consequence of low haplotypic diversity.23,24 The obser- involvement of other neighbouring genes, many of vation that many of these SNPs are redundant markers of which could be strong candidates for disease association each other has inspired the development of methods for as well. We explored the hypothesis that disease choosing a subset of SNPs that most efficiently distin- associations with TNF SNPs could be attributed to SNPs guish the haplotypes in a population15,22,25. In this paper, in the genes flanking TNF, which were not typed in the we evaluate subsets of SNPs based on the proportion of association studies cited above. the full haplotypic diversity that they explain. The measure of diversity we use is entropy, which is Entropy maximization method maximized when all haplotypes are of equal frequency. Haplotypic studies in the have shown that, The application of this method allows us to choose the over short distances, alleles of SNPs tend to be correlated most informative SNPs even in the absence of a clear such that of the 2n haplotypes possible with n SNPs, only block-like haplotypic structure. a small fraction of them are actually observed.21,22 This Not surprisingly, the single SNP with the greatest is especially true among non-African populations, where entropy is LTA*251, which has the greatest allele

Genes and Immunity Central MHC region flanking TNF HC Ackerman et al 482 frequency. The most informative pair of SNPs is created facilitated by more powerful relations between SNPs, with the addition of AIF1*825, a common SNP that is not this increased probability of detection would come at the correlated with LTA*251 (r2 ¼ 0.003). These two SNPs cost of decreased resolution: it would be more difficult to describe four haplotypes of moderate frequency (0.41, identify the causal variant when many SNPs appear 0.30, 0.16, and 0.13) for a diversity that is about 40% of associated with disease. For example, a recent study in a the full 25-SNP diversity. As more SNPs are included in Japanese population found SNPs in LST1, NCR3, LTB, the subset, the proportion of diversity explained in- TNF, LTA, NFKBIL1, ATP6VK, and BAT1 to be asso- creases. Interestingly, to generate the optimal six-SNP ciated with susceptibility to myocardial infarction.26 subset, the NKR3*3918 SNP is added, and the LTA*251 Functional data supported the notion that LTA*251 was and BAT1*1715 SNPs are replaced by the TNF*À308 and the causal variant. Interestingly, even in the diverse West BAT1*1595 SNPs. With the addition of NCR3*À412, 80% African population studied here, LTA*251 is in signifi- of the haplotypic diversity can be explained by seven cant LD with SNPs in every gene studied. This makes it SNPs. Efforts to choose SNPs that distinguish the an excellent marker for the detection of disease associa- common haplotypes at a gene locus represent a rational tions in this region of the central MHC, and raises the approach to reducing the number of SNPs genotyped in question of what role LTA*251 has played in associations a disease association study. Unfortunately, dissecting the previously attributed to SNPs in neighbouring genes. haplotypic structure does not ensure that the genotyped SNPs will be able to detect a disease-modifying SNP Possibility of selection when one exists in the gene region. 
That ability is Under a neutral model of molecular evolution, high- determined by the distribution of marker and disease frequency SNPs are expected to be old SNPs. In general, alleles on the underlying haplotypic structure. To these older, high-frequency SNPs are expected to have determine how well our haplotype-tagging SNPs would lower LD with other SNPs, because there has been ample detect potential disease-modifying SNPs in the region time for recombination events to restore linkage equili- flanking TNF, we examine the pairwise power map. brium between them. In contrast, under a scenario of natural selection, a favourable allele will rise in Pairwise power analysis frequency over a relatively short period of time, and The first SNP chosen by the haplotype-based method LD will persist. This may be the case with BAT1*7126 or was LTA*251. Fortunately, this SNP has moderate to high LTA*251, which are high-frequency SNPs but are in power to detect a relative risk of two at six other SNPs significant LD with most of the SNPs analysed in this among the 25 analysed in this region. The addition of gene region. Closer examination of this gene region may AIF1*825, an excellent haplotype-tagging SNP, does little reveal a signature of natural selection;23,27 if so, it would to improve the power to detect an association at the other be interesting to determine whether the locus under SNPs. The third haplotype-tagging SNP was selection is found among the classical HLA Class I/II NFKBIL1*15811; in terms of power to detect an associa- loci, or whether a favourable variant exists in a gene of tion it is redundant: LTA*251 was already a good marker the central MHC. of those SNPs that NFKBIL1*15811 has power to detect. Likewise, BAT1*1715 and AIF1*206, priority haplotype- tagging SNPs, contribute little additional power to detect Conclusion associations caused by other SNPs in this region. 
The We have generated a map of allelic association across an addition of NCR3*3918, which has power to detect 80 kb segment of the central MHC. We find that allelic associations at SNPs not covered by LTA*251, is the first association is heterogeneous as a consequence of the haplotype-tagging SNP to improve the power to detect complex haplotypic structure at this locus. This impres- disease associations at other SNPs. sion is confirmed by analysis of pairwise power which What combination of SNPs would provide good shows generally low power to detect disease-modifying coverage as measured by statistical power to detect a SNPs even over short distances, punctuated by a few disease association? By inspection of the pairwise power marker–disease SNP pairs with the power to detect map in Figure 7, the complementary combination of associations over much greater distances. Interestingly, LTA*251, TNF*1304, and TNF*À863 together have those SNPs that distinguish the common haplotypes of moderate to high power to detect disease associations this region do not offer the best power to detect disease at 16/25 of the SNPs analysed in this region, including associations. While our analysis is limited to 80 kb, the SNPs in every gene except AIF1. Interestingly, TNF*1304 heterogeneity of allelic association found thus far and TNF*À863 have little utility in distinguishing the suggests that a comprehensive association map of the common haplotypes in this gene region and have low central MHC is required for the planning and interpreta- allele frequencies (0.07 and 0.04, respectively). These tion of disease association studies in this region. findings suggest that haplotype-tagging SNPs may not necessarily offer the best power to detect an association; it is important to consider the pairwise relations between Methods SNPs when choosing markers for a disease association study. 
Subjects The methods of entropy maximization and pairwise Healthy unrelated adults were recruited in Banjul, The power mapping could be applied to data sets from Gambia. All adults were parents of children who other global populations. In non-African populations, presented to hospital with severe malaria. A total of 55 we might expect to see lower haplotypic diversity and family trios (mother, father, child), six parent–child pairs, consequently more powerful relations between SNPs. and three adult singletons comprised the study sample. While the detection of a disease association would be Since the healthy adults were parents of children with

severe malaria, it was possible that the parents' transmitted chromosomes would have allele frequencies that differ from the general population. To test for this, we compared the allele frequencies of transmitted chromosomes against the allele frequencies of the untransmitted chromosomes and found them to be the same. We present data which include both transmitted and untransmitted chromosomes from the healthy unrelated parents. When available, the genotypes of offspring were used to help determine the phase of the parental haplotypes. The work was approved by the Gambia Government/Medical Research Council Joint Ethical Committee.

Ascertainment of SNP markers
A total of 10 SNPs in the TNF gene were previously identified by sequencing 36 healthy Gambians.13 To these 10 SNPs we added one SNP in LTA, an NcoI RFLP 251 bp from the start of transcription.28 Denaturing high-performance liquid chromatography and sequencing were used to identify five SNPs in NCR3, and two SNPs in AIF1 in 24 individuals of European descent.19 Interrogation of NCBI's database, dbSNP, identified 45 potential SNPs in a 100 kb segment centred on TNF. Of these 45, 12 were selected for genotyping, but only seven of them appeared polymorphic. This contributed three more SNPs in NCR3, one SNP in NFKBIL1, and three SNPs in BAT1, for a total of 25 SNPs over 85 kb (median distance between SNPs: 384 bp).

Genotyping
The amplification refractory mutation system (ARMS) method was used to genotype SNPs.13,29 Primer sequences are given in Table 2. PCR amplification was performed in a 15 µl reaction volume containing approximately 15 ng genomic DNA plus: 0.4 mM total dNTPs; 16.6 mM ammonium sulphate; 1.9 mM magnesium chloride; 67.9 mM Tris base (pH 8.9); 0.1% v/v Tween 20; 20 ng of each allele-specific and consensus primer; 5 ng of each positive control primer; 0.25 U of Bioline Taq. An MJ Tetrad thermocycler was programmed as follows: 96°C for 1 min; five cycles of 96°C for 35 s, 70°C for 45 s, 72°C for 35 s; 21 cycles of 96°C for 25 s, 65°C for 50 s, 72°C for 40 s; six cycles of 96°C for 35 s, 55°C for 1 min, 72°C for 1.5 min. The annealing temperatures for each set of cycles (indicated above as 70, 65, and 55°C) were changed to 68, 64, and 56°C for the genotyping of some SNPs. These two thermocycling conditions are indicated in Table 2 as temperatures A and B, respectively. The accuracy of the ARMS PCR method was tested on sequenced individuals before extending it to DNA samples of unknown genotype.

Haplotype construction
Family and population data were integrated to generate the most certain haplotype reconstructions from genotypic data. This was implemented using the program PHAMILY to parse pedigrees for phase information before sending the genotypes to the program PHASE to reconstruct haplotypes.30 Individuals with missing sites (because of genotyping failure) were not included in the analysis.

Haplotypic analysis
Gene diversity. Gene diversity, the probability that two randomly sampled haplotypes will be different, was calculated using

$$H = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^{2}\right),$$

where $n$ is the number of gene copies in the sample, $k$ is the number of different haplotypes, and $p_i$ is the frequency of the $i$th haplotype.31 The sampling variance of this estimate was estimated using the method of Nei and Roychoudhury.32 These calculations were performed using Arlequin.33

Linkage disequilibrium. Pairwise LDs were calculated using the $r^{2}$ statistic. Terms for allele and two-locus haplotype frequencies are given in Table 3. $r^{2}$ was calculated using

$$r^{2} = \left(\frac{D}{\sqrt{p_1 p_2 q_1 q_2}}\right)^{2},$$

where $D = f_{11} f_{22} - f_{12} f_{21}$.

Entropy maximization method. A number of methods for selecting SNPs based on their ability to distinguish common haplotypes have been proposed.15,22,23 Here we introduce a similar method that selects the subset of SNPs that accounts for the greatest proportion of the full haplotypic diversity at the locus of interest. We use entropy, $E$, as a measure of haplotypic diversity:

$$E = -\sum_{i=1}^{k} p_i \log p_i,$$

where $p_i$ is the frequency of the $i$th haplotype and there are $k$ unique haplotypes in the sample. Entropy is maximized when all haplotypes have equal frequencies. Richard Mott designed an algorithm that identifies subsets of SNPs that dissect the haplotypes into groups approaching equal size, thus maximizing entropy. The entropy of these optimal subsets of SNPs (as the number of SNPs in the subset increases from 1 to 25) is plotted in Figure 5.

Pairwise power map. As an alternative to an LD or association statistic, we propose calculating the power of one SNP to detect an association at another SNP. Consider a true disease-modifying SNP A that is in LD with a neighbouring nonfunctional SNP B. The respective allele and haplotype frequencies are given in Table 3. In a random sample of the population, the frequency of allele B2, $q_2$, is expected to be

$$q_2 = \frac{f_{12} + f_{22}}{f_{11} + f_{12} + f_{21} + f_{22}}.$$

We define $R_A$ as the relative risk associated with allele A2 compared to allele A1. Under a multiplicative mode of inheritance, the frequencies of the different haplotypes found in diseased individuals are expected to be $f_{11}$, $f_{12}$, $R_A f_{21}$, and $R_A f_{22}$. Given the risk associated with A2 and the frequency of each haplotype, we can

Table 2 Primer sequences and thermocycling conditions

SNP            Primer sequence                   Primer type      Thermocycling program

AIF1*825       gct cta ggt gag tct tgg g         Conserved        A
               atg gcg ata ttg gtg aga aac       Allele-specific
               atg gcg ata ttg gtg aga aat       Allele-specific
AIF1*206       agc atc tgc tga gct atg ag        Conserved        A
               ctg ggc ctt cag cag tcc           Allele-specific
               ctg ggc ctt cag cag tct           Allele-specific
NCR3*-412      agc ttc acc aat cag ctt gc        Conserved        A
               ctt tta ccc aga aca agc ctc       Allele-specific
               ctt tta ccc aga aca agc ctg       Allele-specific
NCR3*-204      ggg gat ctg agc agt gag gt        Conserved        B
               cct cca gca gca tct gtT ct        Allele-specific
               cct cca gca gca tct gtT cc        Allele-specific
NCR3*-172      cct tgg gtt agc agc cat ct        Conserved        A
               cac aac tgc cag ggG cct c         Allele-specific
               cac aac tgc cag ggG cct t         Allele-specific
NCR3*2708      tgg gat gtt cta ctc caa gc        Conserved        A
               atc tcg gaa cca cgt gac g         Allele-specific
               atc tcg gaa cca cgt gac a         Allele-specific
NCR3*3008      ccg gag aga gta gat ttg gc        Conserved        A
               caa tgt cct tgg gag gca g         Allele-specific
               caa tgt cct tgg gag gca a         Allele-specific
NCR3*3571      cag cac cgt cta tta cca gg        Conserved        A
               acc aca gcc ggc/t agc tgc         Allele-specific
               acc aca gcc ggc/t agc tga         Allele-specific
NCR3*3790      ccc act tct gtg tct cag tcc       Conserved        B
               tga aca ctg tca ttc acT caa t     Allele-specific
               tga aca ctg tca ttc acT caa c     Allele-specific
NCR3*3918      atg gga agg atc aga tat gac tc    Conserved        A
               ctc gag cct ccg ttc aaa t         Allele-specific
               ctc gag cct ccg ttc aaa a         Allele-specific
TNF*1304       gta agt gtc tcc aaa cct ctt       Conserved        A
               cca tca gcc ggg ctt caa t         Allele-specific
               cca tca gcc ggg ctt caa c         Allele-specific
TNF*851        aag aca cat cct cag agc tc        Conserved        A
               tgc tgg aag gtg aat aca cg        Allele-specific
               atg ctg gaa ggt gaa tac aca       Allele-specific
TNF*467        ctc ttt ccc tga gtg tct tc        Conserved        A
               gtg cgc tga tag gga ggg           Allele-specific
               gtg cgc tga tag gga gga           Allele-specific
TNF*-238       ggg gtc tgt gaa ttc ccg g         Conserved        A
               ccc cat cct ccc tgc tcc           Allele-specific
               ccc cat cct ccc tgc tct           Allele-specific
TNF*-244       ggc tgg gtg tgc caa caa c         Conserved        A
               cca gaa gac ccc cct cg            Allele-specific
               cca gaa gac ccc cct ca            Allele-specific
TNF*-308       ggc tgg gtg tgc caa caa c         Conserved        A
               ata ggt ttt gag ggg cat gg        Allele-specific
               tag gtt ttg agg ggc atg a         Allele-specific
TNF*-376       ggc tgg gtg tgc caa caa c         Conserved        B
               cct gca tcc tgt ctg gaa g         Allele-specific
               tcc tgc atc ctg tct gga aa        Allele-specific
TNF*-857       aag gat aag ggc tca gag ag        Conserved        B
               tct aca tgg ccc tgt ctt cg        Allele-specific
               tct aca tgg ccc tgt ctt ca        Allele-specific
TNF*-863       ccg gga att cac aga ccc c         Conserved        B
               cga gta tgg gga ccc ccc           Allele-specific
               gag tat ggg gac ccc ca            Allele-specific
TNF*-1031      ccg gga att cac aga ccc c         Conserved        B
               caa agg aga agc tga gaa gat       Allele-specific
               caa agg aga agc tga gaa gac       Allele-specific
LT*251         gca ggt gag gct ctc ctg           Conserved        B
               gga agg gaa cag aga gga at        Allele-specific
               gga agg gaa cag aga gga ac        Allele-specific
NFKBIL1*15811  gtc tca ttc ttg ggg ctt tg        Conserved        A
               gag att tag aac atc acg cac agt   Allele-specific
               gag att tag aac atc acg cac agg   Allele-specific
BAT1*1595      gaa gcg ctc ata ttc ctt gc        Conserved        A
               tgg ggt tca tga ttt aga tTa cat   Allele-specific
               tgg ggt tca tga ttt aga tTa cag   Allele-specific
BAT1*1715      acc ctt gac aac ctg aca gc        Conserved        A
               cga gtg tga cac atc aTc agt       Allele-specific
               cga gtg tga cac atc aTc agc       Allele-specific
BAT1*7126      tct caa agg gag agc aag ga        Conserved        A
               tcc att gca gca ttc tga tct       Allele-specific
               tcc att gca gca ttc tga tca       Allele-specific

Table 3 Terms for allele and two-locus haplotype frequencies

                                       Disease SNP
                          Allele       A1             A2
                          Frequency    p1             p2
Marker SNP    B1          q1           A1B1 (f11)     A2B1 (f21)
              B2          q2           A1B2 (f12)     A2B2 (f22)

derive the frequency of allele B2 among diseased individuals ($q'_2$):

$$q'_2 = \frac{f_{12} + R_A f_{22}}{f_{11} + f_{12} + R_A f_{21} + R_A f_{22}}.$$

We then calculate the power, the chance $1-\beta$ of correctly declaring that the proportions $q'_2$ and $q_2$ differ, using the following equation:20

$$c_{1-\beta} = \frac{c_{\alpha/2}\sqrt{2\bar{q}_2\bar{q}_1} - |q'_2 - q_2|\sqrt{N - \dfrac{2}{|q'_2 - q_2|}}}{\sqrt{q'_2 q'_1 + q_2 q_1}},$$

where $\alpha$ is the type I error rate or the significance level of the test, $\bar{q}_2 = |q'_2 + q_2|/2$, $\bar{q}_1 = 1 - \bar{q}_2$, and $N$ is the number of cases, which is equal to the number of controls. The probability $p$ associated with $c_{1-\beta}$ can be determined from $p = \Pr(z > c_{1-\beta})$, where $z$ has the standard normal distribution.

Acknowledgements
We wish to thank our colleagues at the Royal Victoria Hospital in Banjul. This work was funded by the Medical Research Council, UK, the Rhodes Trust, and the Harvard Medical School Office of Enrichment Programs.

References
1 The MHC Sequencing Consortium. Complete sequence and gene map of a human major histocompatibility complex. Nature 1999; 401: 921–923.
2 Cabrera M, Shaw MA, Sharples C et al. Polymorphism in tumor necrosis factor genes associated with mucocutaneous leishmaniasis. J Exp Med 1995; 182: 1259–1264.
3 Conway DJ, Holland MJ, Bailey RL et al. Scarring trachoma is associated with polymorphism in the tumor necrosis factor alpha (TNF-alpha) gene promoter and with elevated TNF-alpha levels in tear fluid. Infect Immun 1997; 65: 1003–1006.
4 Dunstan SJ, Stephens HA, Blackwell JM et al. Genes of the class II and class III major histocompatibility complex are associated with typhoid fever in Vietnam. J Infect Dis 2001; 183: 261–268.
5 Fernandez-Arquero M, Arroyo R, Rubio A et al. Primary association of a TNF gene polymorphism with susceptibility to multiple sclerosis. Neurology 1999; 53: 1361–1363.
6 Knight JC, Udalova I, Hill AV et al. A polymorphism that affects OCT-1 binding to the TNF promoter region is associated with severe malaria. Nat Genet 1999; 22: 145–150.
7 Knight JC, Kwiatkowski D. Inherited variability of tumor necrosis factor production and susceptibility to infectious disease. Proc Assoc Am Physicians 1999; 111: 290–298.
8 McGuire W, Hill AV, Allsopp CE, Greenwood BM, Kwiatkowski D. Variation in the TNF-alpha promoter region associated with susceptibility to cerebral malaria. Nature 1994; 371: 508–510.
9 McGuire W, Knight JC, Hill AV, Allsopp CE, Greenwood BM, Kwiatkowski D. Severe malarial anemia and cerebral malaria are associated with different tumor necrosis factor promoter alleles. J Infect Dis 1999; 179: 287–290.
10 Moffatt MF, Cookson WO. Tumour necrosis factor haplotypes and asthma. Hum Mol Genet 1997; 6: 551–554.
11 Nadel S, Newport MJ, Booy R, Levin M. Variation in the tumor necrosis factor-alpha gene promoter region may be associated with death from meningococcal disease. J Infect Dis 1996; 174: 878–880.
12 Negoro K, Kinouchi Y, Hiwatashi N et al. Crohn's disease is associated with novel polymorphisms in the 5′-flanking region of the tumor necrosis factor gene. Gastroenterology 1999; 117: 1062–1068.
13 Richardson A, Sisay-Joof F, Ackerman H et al. Nucleotide diversity of the TNF gene region in an African village. Genes Immun 2001; 2: 343–348.
14 Udalova IA, Richardson A, Denys A et al. Functional consequences of a polymorphism affecting NF-kappaB p50-p50 binding to the TNF promoter region. Mol Cell Biol 2000; 20: 9113–9119.
15 Ackerman H, Usen S, Mott R et al. Haplotypic analysis of the TNF locus by association efficiency and entropy. Genome Biology 2002; 4: R24 http://genomebiology.com/2003/4/4/R24
16 Nordborg M, Tavare S. Linkage disequilibrium: what history has to tell us. Trends Genet 2002; 18: 83–90.
17 Thompson EA, Neel JV. Allelic disequilibrium and allele frequency distribution as a function of social and demographic history. Am J Hum Genet 1997; 60: 197–204.
18 Reich DE, Cargill M, Bolk S et al. Linkage disequilibrium in the human genome. Nature 2001; 411: 199–204.
19 Ribas G, Neville MJ, Campbell RD. Single-nucleotide polymorphism detection by denaturing high-performance liquid chromatography and direct sequencing in genes in the MHC class III region encoding novel cell surface molecules. Immunogenetics 2001; 53: 369–381.
20 Fleiss JL. Statistical Methods for Rates and Proportions. Wiley: New York, 1981; pp. 38–42.
21 Rioux JD, Daly MJ, Silverberg MS et al. Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nat Genet 2001; 29: 223–228.
22 Johnson GC, Esposito L, Barratt BJ et al. Haplotype tagging for the identification of common disease genes. Nat Genet 2001; 29: 233–237.
23 Hull J, Ackerman H, Isles K et al. Unusual haplotypic structure of il8, a susceptibility locus for a common respiratory virus. Am J Hum Genet 2001; 69: 413–419.
24 Gabriel SB, Schaffner SF, Nguyen H et al. The structure of haplotype blocks in the human genome. Science 2002; 296: 2225–2229.
25 Zhang K, Deng M, Chen T, Waterman MS, Sun F. A dynamic programming algorithm for haplotype block partitioning. Proc Natl Acad Sci USA 2002; 99: 7335–7339.
26 Ozaki K, Ohnishi Y, Iida A et al. Functional SNPs in the lymphotoxin-alpha gene that are associated with susceptibility to myocardial infarction. Nat Genet 2002; 32: 650–654.
27 Sabeti PC, Reich DE, Higgins JM et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 2002; 419: 832–837.
28 Wilson AG, di Giovine FS, Blakemore AI, Duff GW. Single base polymorphism in the human tumour necrosis factor alpha (TNF alpha) gene detectable by NcoI restriction of PCR product. Hum Mol Genet 1992; 1: 353.
29 Newton CR, Graham A, Heptinstall LE et al. Analysis of any point mutation in DNA. The amplification refractory mutation system (ARMS). Nucleic Acids Res 1989; 17: 2503–2516.

30 Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 2001; 68: 978–989.
31 Nei M. Molecular Evolutionary Genetics. Columbia University Press: New York, 1987; 176–181.
32 Nei M, Roychoudhury AK. Sampling variances of heterozygosity and genetic distance. Genetics 1974; 76: 379–390.
33 Schneider S, Roessli D, Excoffier L. Arlequin ver. 2.000: A Software for Population Genetics Analysis. Genetics and Biometry Laboratory, University of Geneva: Switzerland, 2000.
34 Autieri MV, Agrawal N. IRT-1, a novel interferon-gamma-responsive transcript encoding a growth-suppressing basic leucine zipper protein. J Biol Chem 1998; 273: 14731–14737.
35 Neville MJ, Campbell RD. A new member of the Ig superfamily and a V-ATPase G subunit are among the predicted products of novel genes close to the TNF locus in the human MHC. J Immunol 1999; 162: 4745–4754.
36 Holzinger I, de Baey A, Messer G, Kick G, Zwierzina H, Weiss EH. Cloning and genomic characterization of LST1: a new gene in the human TNF region. Immunogenetics 1995; 42: 315–322.
37 Rollinger-Holzinger I, Eibl B, Pauly M et al. LST1: a gene with extensive alternative splicing and immunomodulatory function. J Immunol 2000; 164: 3169–3176.
38 Neville MJ, Campbell RD. Alternative splicing of the LST-1 gene located in the Major Histocompatibility Complex on human chromosome 6. DNA Seq 1997; 8: 155–160.
39 Browning JL, Ngam-ek A, Lawton P et al. Lymphotoxin beta, a novel member of the TNF family that forms a heteromeric complex with lymphotoxin on the cell surface. Cell 1993; 72: 847–856.
40 Nakamura T, Tashiro K, Nazarea M, Nakano T, Sasayama S, Honjo T. The murine lymphotoxin-beta receptor cDNA: isolation by the signal sequence trap and chromosomal mapping. Genomics 1995; 30: 312–319.
41 Nedospasov SA, Shakhov AN, Turetskaya RL et al. Tandem arrangement of genes coding for tumor necrosis factor (TNF-alpha) and lymphotoxin (TNF-beta) in the human genome. Cold Spring Harbor Symp Quant Biol 1986; 51(Part 1): 611–624.
42 Old LJ. Tumor necrosis factor (TNF). Science 1985; 230: 630–632.
43 Gray PW, Aggarwal BB, Benton CV et al. Cloning and expression of cDNA for human lymphotoxin, a lymphokine with tumour necrosis activity. Nature 1984; 312: 721–724.
44 Aggarwal BB, Eessalu TE, Hass PE. Characterization of receptors for human tumour necrosis factor and their regulation by gamma-interferon. Nature 1985; 318: 665–667.
45 Albertella MR, Campbell RD. Characterization of a novel gene in the human major histocompatibility complex that encodes a potential new member of the I kappa B family of proteins. Hum Mol Genet 1994; 3: 793–799.
46 Peelman LJ, Chardon P, Nunes M et al. The BAT1 gene in the MHC encodes an evolutionarily conserved putative nuclear RNA helicase of the DEAD family. Genomics 1995; 26: 210–218.
47 Fleckner J, Zhang M, Valcarcel J, Green MR. U2AF65 recruits a novel human DEAD box protein required for the U2 snRNP-branchpoint interaction. Genes Dev 1997; 11: 1864–1872.
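As an illustration of the statistics defined under Haplotypic analysis, the following Python sketch computes gene diversity H, entropy E, and pairwise r² from phased haplotype data. It is not the Arlequin implementation used in the study. The function names, the representation of haplotypes as strings, and the choice of the natural logarithm for E are assumptions made here for illustration only. The two-locus frequencies follow the Table 3 convention (f11 = A1B1, f12 = A1B2, f21 = A2B1, f22 = A2B2).

```python
import math
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Map each distinct haplotype to its sample frequency p_i."""
    n = len(haplotypes)
    return {h: count / n for h, count in Counter(haplotypes).items()}


def gene_diversity(haplotypes):
    """H = n/(n-1) * (1 - sum of p_i^2), with n the number of gene copies."""
    n = len(haplotypes)
    freqs = haplotype_frequencies(haplotypes)
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs.values()))


def entropy(haplotypes):
    """E = -sum of p_i log p_i (natural log is an assumption of this sketch)."""
    freqs = haplotype_frequencies(haplotypes)
    return -sum(p * math.log(p) for p in freqs.values())


def r_squared(f11, f12, f21, f22):
    """Pairwise r^2 from two-locus haplotype frequencies (Table 3 notation)."""
    p1, p2 = f11 + f12, f21 + f22  # allele frequencies at SNP A
    q1, q2 = f11 + f21, f12 + f22  # allele frequencies at SNP B
    d = f11 * f22 - f12 * f21      # disequilibrium coefficient D
    return d * d / (p1 * p2 * q1 * q2)
```

For example, `r_squared(0.5, 0.0, 0.0, 0.5)` describes two SNPs whose alleles always occur together, and `r_squared(0.25, 0.25, 0.25, 0.25)` describes two SNPs in linkage equilibrium.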

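The entropy maximization method in the Methods selects SNP subsets that split the haplotypes into groups of nearly equal size. The text does not specify the search procedure of the original algorithm, so the sketch below substitutes a simple greedy forward selection: at each step it adds the SNP that gives the largest entropy for the enlarged subset. The greedy strategy, the function names, and the string encoding of haplotypes are all assumptions of this example, and a greedy search is not guaranteed to find the optimal subset of each size.

```python
import math
from collections import Counter


def subset_entropy(haplotypes, snps):
    """Entropy of the haplotype groups distinguished by the SNP indices in `snps`."""
    n = len(haplotypes)
    groups = Counter(tuple(h[i] for i in snps) for h in haplotypes)
    return -sum((c / n) * math.log(c / n) for c in groups.values())


def greedy_entropy_path(haplotypes):
    """Grow a SNP subset one marker at a time, maximizing entropy at each step.

    Returns a list of (subset, entropy) pairs for subset sizes 1..number of SNPs.
    This is a hypothetical greedy heuristic, not the algorithm used in the study.
    """
    n_snps = len(haplotypes[0])
    chosen, path = [], []
    while len(chosen) < n_snps:
        remaining = [i for i in range(n_snps) if i not in chosen]
        # Pick the SNP whose addition yields the highest entropy.
        best = max(remaining, key=lambda i: subset_entropy(haplotypes, chosen + [i]))
        chosen.append(best)
        path.append((list(chosen), subset_entropy(haplotypes, chosen)))
    return path
```

Plotting the entropy values returned by `greedy_entropy_path` against subset size gives a curve of the same kind as the one described for Figure 5, under the assumptions stated above.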
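The pairwise power calculation described under Pairwise power map can be sketched in Python as follows. The code evaluates the expected marker allele frequencies in controls and cases under the multiplicative model, then applies the power formula for comparing two proportions with equal numbers of cases and controls. The function name, the parameter names, and the decision to return NaN when the formula is undefined (no frequency difference, or N not larger than 2/|q'2 − q2|) are assumptions of this sketch rather than details given in the text.

```python
import math
from statistics import NormalDist


def marker_power(f11, f12, f21, f22, relative_risk, n_cases, alpha=0.05):
    """Power of marker SNP B to detect an association caused by SNP A.

    Haplotype frequencies follow Table 3 (f11 = A1B1, f12 = A1B2,
    f21 = A2B1, f22 = A2B2); relative_risk is R_A for allele A2.
    """
    std_normal = NormalDist()
    # Frequency of marker allele B2 in a random population sample (controls).
    q2 = (f12 + f22) / (f11 + f12 + f21 + f22)
    # Expected frequency of B2 among cases: A2-bearing haplotypes are
    # weighted by the relative risk (multiplicative model).
    q2_case = (f12 + relative_risk * f22) / (
        f11 + f12 + relative_risk * f21 + relative_risk * f22
    )
    diff = abs(q2_case - q2)
    # Assumption of this sketch: return NaN where the formula is undefined.
    if diff == 0.0 or n_cases <= 2.0 / diff:
        return math.nan
    q_bar2 = (q2_case + q2) / 2.0
    q_bar1 = 1.0 - q_bar2
    c_alpha_half = std_normal.inv_cdf(1.0 - alpha / 2.0)
    numerator = c_alpha_half * math.sqrt(2.0 * q_bar2 * q_bar1) - diff * math.sqrt(
        n_cases - 2.0 / diff
    )
    denominator = math.sqrt(q2_case * (1.0 - q2_case) + q2 * (1.0 - q2))
    c_power = numerator / denominator
    # Power is the upper-tail probability Pr(z > c_{1-beta}).
    return 1.0 - std_normal.cdf(c_power)
```

Evaluating `marker_power` for every ordered pair of SNPs, with the haplotype frequencies estimated from the sample, would give a matrix of the kind the pairwise power map describes. A marker in complete LD with the disease SNP gives high power, and a marker in linkage equilibrium with it gives no usable frequency difference.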