<<

Genes and Immunity (1999) 1, 120–129  1999 Stockton Press All rights reserved 1466-4879/99 $15.00 http://www.stockton-press.co.uk Taxonomic hierarchy of HLA class I allele sequences

LM McKenzie1, J Pecon-Slattery1, M Carrington2 and SJ O’Brien1 1Laboratory of Genomic Diversity, Frederick Cancer Research and Development Center, National Cancer Institute, Frederick, MD, 21702, USA; 2Intramural Research Support Program, SAIC-Frederick, National Cancer Institute-Frederick Cancer Research and Development Center, Frederick, MD, 21702, USA

The markedly high levels of polymorphism present in classical class I loci of the human major histocompatibility complex have been implicated in infectious and immune disease recognition. The large numbers of alleles present at these loci have, however, limited efforts to verify associations between individual alleles and specific diseases. As an approach to reduce allele diversity to hierarchical evolutionarily related groups, we performed phylogenetic analyses of available HLA- A, B and C allele complete sequences (n = 216 alleles) using different approaches (maximum parsimony, distance-based minimum evolution and maximum likelihood). Full nucleotide and amino acid sequences were considered as well as abridged sequences from the hypervariable peptide binding region, known to interact in vivo, with HLA presented foreign peptide. The consensus analyses revealed robust clusters of 36 HLA-C alleles concordant for full and PBR sequence analyses. HLA-A alleles (n = 60) assorted into 12 groups based on full nucleotide and amino acid sequence which with few exceptions recapitulated serological groupings, however the patterns were largely discordant with clusters prescribed by PBR sequences. HLA-B which has the most alleles (n = 120) and which unlike HLA-A and -C is thought to be subject to frequent recombinational exchange, showed limited phylogenetic structure consistent with recent selection driven retention of maximum heterozygosity and population diversity. Those allele categories recognized offer an explicit phylogenetic criterion for grouping alleles potentially relevant for epidemiologic associations, for inferring the origin of MHC genome organization, and for comparing functional constraints in peptide presentation of HLA alleles.

Keywords: evolution; HLA; phylogenetic; peptide binding region

Introduction critical for peptide binding and are therefore thought to be critical for T recognition9 and represent the func- The human major histocompatibility complex (MHC) is tional part of the molecule. Further, patterns of a multi-gene family comprised of several loci which are mutational substitutions within the PBR have affirmed 1,2 marked by unusually high levels of polymorphism. the strong role of historic selection pressures in preserv- Class I genes (HLA-A, -B and -C) encode molecules which ing functional variation, likely with peptide binding present antigenic peptides on the cell surface for recog- consequences, for this region.8,11–13 + nition by CD8 T cells. Previous research with class I loci To date there are some 108 HLA-A, 223 HLA-B and 67 indicate a correlation between specific alleles and the HLA-C alleles which have been classified into groups or 3–6 genetic basis of resistance to particular diseases. How- families on the basis of serological reactivity.14 For ever, the relationships uniting each of the alleles at these example, HLA-A alleles separate into six serological fam- loci are less clear. Competing processes of selection, ilies; A1/3/11, A9, A2/28, A10, A1915 and A80.16,17 A recombination, functional constraints, and mutation rates small number of studies describe some possible mole- obscure efforts to identify groups of alleles involved with cular relationships among HLA-A, -B and -C alleles 2,7,8 specific diseases. respectively. A preliminary analysis of available cDNA The majority of polymorphic sites in class I alleles are sequences of alleles of the HLA- Aw19 complex (A29, found in exons 2 and 3 which code for the alpha 1 and A30, A31, A32 and A33) reveal that, with the exception alpha 2 domains of the mature protein. These domains of A30 (which is genetically related to the A1/A3/A11 9,10 form the peptide binding region (PBR). Fifty-seven group), seven nucleotides are specific to this group.15 residues within the PBR have been identified as being Further, these Aw19 specific positions were spread over all exons except for the highly variable exon 2. Similarly, an analysis of the nucleotide sequence relationships among 16 HLA-A alleles using maximum parsimony sug- Correspondence: Stephen J O’Brien, PhD, Laboratory of Genomic Diver- gest these alleles cluster according to serological typing, sity, Frederick Cancer Research and Development Center, Building 560, again with the exception of A30 which grouped with the Frederick, MD, 21702, USA. E-mail: [email protected] A3/11/1 family.18 However, this study is limited by The publisher or recipient acknowledges the right of the US assessment of a single allele from each serological group- Government to retain a non-exclusive, royalty-free license in and ing and a lack of statistical analysis of the phylogenetic to any copyright covering the article. This project has been funded in part with Federal funds from the clusters. Likewise, an analysis of complete cDNA National Cancer Institute, National Institutes of Health, under Con- sequences of 14 HLA-C alleles revealed that HLA-C alleles tract No. NO1-CO-56000. divide into two groups, one comprising the Cw7 sero- Taxonomic hierarchy of HLA class I allele sequences LM McKenzie et al 121 type, and the other group including all remaining bootstrap values (as an indication of confidence for the alleles.19 These and other studies have sought to elucidate divergent nodes) generated by the ME and MP analyses the sequence relationships of a subset of HLA class I and significant associations by the ML method. Only ME alleles by parsimony or distance-based phylogenetic trees of nucleotide data are presented as figures in this methods.20,21 manuscript. ME trees of amino acids and MP and ML The extensive polymorphism in HLA class I genes is trees can be viewed on the Laboratory of Genomic Diver- thought to increase the repertoire of immune responses sity (LGD) web site which is located at which an individual can mount. Generation and mainte- http://Igd.nci.nih.gov nance of polymorphism at these loci appear to be the result of selection, interallelic recombination or gene con- HLA-A version.22–24 Present allelic variation has been studied Complete sequence analyses: All five methods of phylogen- extensively to reveal possible associations between dis- etic analysis using complete nucleotide and amino acid ease susceptibility/resistance or disease progression and sequences of HLA-A alleles exhibited a consistent pattern specific HLA alleles. HLA associations have been ident- among trees (Figure 1a, Table 1, LGD web site). Phylo- ified for several infectious diseases including malaria and genetic analyses resulted in a basal bifurcation. Of the HLA-B53,3,4 chronic arthritis and HLA-B276 and various two major evolutionary groups formed, the first includes HLA alleles and the rate of disease progression in HIV.5 alleles of serological families A1, A36, A11, A3, A9, A- .(The large number of class I alleles is responsible, at 8001 and two of ten A19 alleles (herein designated A19ٙ least in part, for low levels of statistical confidence for Specific alleles included in each serological family are reported associations plus the failure to reproduce epide- indicated in Figure 1a after Bodmer et al.14 Within this miological efforts in parallel studies. Here we describe a HLA-A lineage, additional resolution occurred, highly series of phylogenetic analyses of all complete HLA class supported by bootstrap values, in accordance with recog- I allele sequences to identify alleles with a common nized serological terminology. Thus, all five complete evolutionary history. These analyses use patterns of sequence topologies depicted the A9 family as a separate mutations to establish a hierarchical system of grouping subgroup as well as a consistent close association alleles. Whilst any groups identified may reflect evolu- between A1 and A36 allele families (Table 1). Three of tionary relationships among alleles, they may not neces- five analyses indicated an additional association between sarily reflect functional relationships. Hughes et al25 sug- A1, A11 and A36 families. The single allele of the A80 gest that there is a relationship between the PBR region family (A-8001) was placed within this evolutionary clus- of HLA molecules and their role in disease processes. ter in all analyses except the distance-based phylogeny Since the majority of genetic polymorphism between of amino acids. class I alleles is found in the 57 functional residues of The second major evolutionary lineage within HLA-A the PBR, a region implicated in presentation and consisted of families A2, A28, A10, A4301, and the disease association, it was of interest to compare phylo- remainder of the A19 family identified as A19Љ (Figure genetic relationships among the functional residues of the 1a). Although the relative branching order uniting these PBR as well as the complete sequences. distinct families varied with the five methods employed, Here we report a comprehensive phylogenetic analysis the integrity of the family groupings was well-supported of currently available HLA-A, -B and -C alleles using by high bootstrap values (Table 1). The most consistent maximum parsimony, distance-based and maximum findings for higher-order clustering were the inclusion of likelihood phylogenetic methods. Analysis was perfor- A4301 with A10 alleles and the close association between med for each locus using the complete cDNA sequences, the A2 and A28 families. complete amino acid sequences, and the cDNA and amino acid sequences for the 57 functional residues of Analyses of PBR sequences: Comparisons among the five the PBR. The phylogenetic patterns seen provide a formal trees generated from sequence data for the 57 residues of basis for evolutionarily-based groupings of HLA class I the PBR using the same phylogenetic approaches indi- alleles plus identify some clusters of alleles that can be cated a less consistent pattern among themselves than the complete sequence data (Figure 1b, Table 1, LGD web assessed in disease associations based upon well sup- ٙ ported phylogenetic associations. site). In the five trees, with the exception of A19 , and to a lesser extent, A2, A3, and A11, only minimal measures of support for clustering alleles into previously estab- Results lished families implicated by the complete sequences Separate phylogenetic trees for HLA-A,-B and -C loci analyses were recapitulated. The remainder of the PBR were generated for nucleotide and amino acid alignment analyses resolved sporadically additional associations of alleles. A total of five methods of analysis consisting within and between families but were not consistent of minimum evolution (ME), maximum parsimony (MP) among the five methods using PBR residues (Table 1). and maximum likelihood (ML) with nucleotide and ME These consisted of the A1 family, and higher order affili- and MP with amino acid data were performed. Trees ations of A1, A36 and A11 families and the inclusion of A8001 within the major lineage of A1, A3, A36, A11, generated using complete sequence data were compared ٙ and alleles which consistently clustered together by all A19 and A9 families. No phylogenetic associations were methods were designated as being a member of a parti- apparent in the analyses of the PBR which were not cular group. Concordance between these groupings and already observed in analyses of complete sequence data. the results of analyses performed using the PBR HLA-B sequences only were then investigated. The results of these comparisons are listed in Tables 1, 2 and 3 for HLA- Complete sequences analyses: Results of the analysis of com- A, -B and -C alleles, respectively. These tables list the plete HLA-B sequences exhibited minimal statistically Taxonomic hierarchy of HLA class I allele sequences LM McKenzie et al 122 Table 1 Bootstrap values in support of phylogenetic groups with HLA-A sequences identified in distance (ME) and maximum parsimony (MP) analyses

Group No. Complete sequence PBR sequence alleles ME MP ML ME MP ME MP ML ME MP (nt) (nt) (nt) (aa) (aa) (nt) (nt) (nt) (aa) (aa)

A1 2 90 96 Y 93 89 89 – – 94 69 A3 2 93 89 Y 100 85 – 84 Y 100 82 A9 7 96 100 Y 99 91 N – Y 94 87 A11 2 91 94 Y 99 85 75 – Y 94 64 A19ٙ 2 100 100 Y 99 96 88 82 Y 76 61 A1, A36 3 98 92 Y 99 93 65 – – 91 82 A1, A11, A36 5 76 N Y 59 – – – – – – above plus A8001 17 59 74 Y N 64 – – – – – A2 16 82 99 Y 74 80 N 85 Y 70 69 A19Љ 8 85 93 Y 81 82 – – Y – N A2, 28 21 87 93 Y 80 84 – – – – N A10 plus A4301 14 79 63 – 79 59 – – – – – A10, 19Љ 22 66 – – 70 55 – – – – – A2, A28, A10, A19Љ 43 69 67 – N 58 – – – – –

N indicates topologies not supported by bootstrap values; Y indicates that the node is significant at P Ͻ 0.01 in the maximum likelihood (ML) analyses; – indicates that there was no association seen between alleles in these analyses; nt indicates nucleotide sequences and aa indicates amino acid sequences.

Table 2 Bootstrap values in support of phylogenetic groups with HLA-B sequences identified in distance (ME) and maximum parsimony (MP) analyses

Group No. Complete sequence PBR sequence alleles ME (nt) MP (nt) ME (aa) MP (aa) ME (nt) MP (nt) ME (aa) MP (aa)

B5 6 NNNN – – NN B7 6 N–52N –––N B8 2 85 94 87 92 – 64 59 72 B13 3 97989789N – 53N B14 2 86 96 94 87 N 67 100 92 B16 14 N N N – – – N N B18 2 98969997N N 9482 B22 7 N N – N – – – N B27 8 73 N 90 – N N 77 – B40 (61) 6 79 N 91 N – – – N B44 (12) 6 73 97 98 89 – 64 – N B57 2 100 100 98 98 52 N – 50 B58 2 98749563N N – – B60 3 73 N 98 N – N 75 68 B63 (15) 2 79 N 86 – – – 51 – B70 5 N–N– –––– B78 2 87 N 95 N N 53 81 62 B5,78 8 N61N– –––– B57, 58 4 74 – 52 N – 80 67 N B16, 67 15 N – – – – – – N B14, 16, 67 16 73 64 – N –––– B57, 58, 63 (15) 6 72 – 56 – 53 – 94 57 B50 (21), 49 (21), 45 (12) 4 – 97 83 87 – – – N

N indicates topologies not supported by bootstrap values; Y indicates that the node is significant at P Ͻ 0.01 in the maximum likelihood (ML) analyses; – indicates that there was no association seen between alleles in these analyses; nt indicates nucleotide sequences and aa indicates amino acid sequences.

supported phylogenetic structure (Figure 2a, Table 2, Seven groups (B8, B13, B44(12), B50(21)+B49(21)+B45(12), LGD web site). No consistent higher-order structure B57, B18 and B14) totaling 21 alleles formed well sup- among serological groupings across the four different ported clades in all four trees; one group (B58, two methods (ME and MP using nucleotide and amino acid alleles) was present in three of four trees and five groups sequences) was defined as nearly all internal nodes col- (B60, B27, B40(61), B63(15) and B78) totaling 21 alleles lapsed in the bootstrap analyses. However, some phylo- were derived in two of four analyses (Table 2). Associ- genetic associations were apparent at terminal nodes of ations were implied among a further nine groups of the trees that corroborate serological nomenclature. alleles (Table 2) in one or more of the analyses but boot- Taxonomic hierarchy of HLA class I allele sequences LM McKenzie et al 123 0.01 in the maximum likelihood (ML) analyses; – indicates that there was no Ͻ P ––––Y N978986Y ––––Y N93––Y ––––– ––8174– ME(nt) MP (nt) ML (nt) ME (aa) MP (aa) ME (nt) MP (nt) ME (aa) MP (aa) ML (nt) ME (nt) MP (nt) ME (aa) MP (aa) ML (nt) alleles Bootstrap values in support of phylogenetic groups with HLA-C sequences identified in distance (ME) and maximum parsimony (MP) analyses GroupCw1Cw14 No.Cw1, Cw14Cw4Cw18Cw18, Cw4Cw3Cw2 4Cw15 2 2Cw5, Cw8Cw16 Complete 5 sequenceall 3 the above 2plus chimp 65 92Cw7 85Cw7, 3 gorilla 4 2 63 100 N 5 81 88 61 2 69 Y Y 79 69 95 – 82 85 67 100 4 100 Y Y 97 80 88 Y 67 97 98 93 99 96 – Y 64 96 N 86 93 100 PBR 99 Y sequence Y 80 65 Y Y 99 – 60 Y 50 93 83 88 100 N Y 90 96 95 99 97 – 69 75 Y 86 97 69 69 – – 83 73 98 83 86 58 81 100 73 57 – – 98 N – 96 58 – 65 65 64 99 N 94 N – 88 83 – N no 56 – PBR – Y sequence N 88 62 N – – Y N N N Y 51 71 94 Y Y 67 77 68 N Y N 76 Y 96 – – 84 75 Y N N 76 Y 83 93 51 N 99 Y 61 – 61 56 100 Y 91 N 91 94 56 – N – – 90 99 Y 91 99 81 87 – – Y 92 Y 93 93 95 80 98 Y Y – 74 94 Y 99 80 Y Y 62 97 Y Y Y Table 3 N indicates topologiesassociation not seen supported between by alleles bootstrap in values; these Y analyses; indicates nt that indicates nucleotide the sequences node and is aa significant indicates at amino acid sequences. Taxonomic hierarchy of HLA class I allele sequences LM McKenzie et al 124

Figure 1 Minimum evolution analyses of HLA-A (a) complete nucleotide sequences and (b) peptide binding region nucleotide sequences. Branch lengths are proportional and represent estimated % sequence divergence. Numbers in italics are values from 100 bootstrap iterations. Numbers and letters in bold indicate proposed phylogenetic clades.

strap support for these clades was minimal. The clusters which, in general, were well supported by strong boot- of alleles, in general, reflected serological nomenclature. strap values. Cw14 and Cw1 alleles clustered together in However, 38 alleles from various serofamilies remained all trees but was only supported statistically in the ML unaligned. tree and by high bootstrap values in the ME analysis of amino acid sequences. Cw18 and Cw4 alleles formed a Analyses of PBR sequences: Phylogenetic structure was sys- clade in all trees but the association was only significantly tematically reduced in the four analyses of the PBR supported in the ML analysis. The five trees failed to sequences relative to that of complete sequence data produce a consistent pattern between group topological (Figure 2b, Table 2, LGD web site). With the possible hierarchy consistent with recent divergence of these exception of B8, B78 B14 and B18 families, there was no modern clades. consistent tree topology among the methods and little association was seen between alleles even at terminal Analyses of PBR sequences: Trees derived from analysis of nodes (Table 2). Moreover, three serological families (B60, sequences from the 57 residues of the PBR, in general, B18 and B14) exhibited a trend whereby translation into matched those from the complete sequences analyses amino acids improved phylogenetic associations based (Figure 3b, Table 3, LGD web site). The PBR analyses dif- on the PBR residues. Again, as with HLA-A, no additional fered by having lower bootstrap values, particularly for phylogenetic associations were observed in PBR analyses groups formed by Cw4 and Cw15 alleles (Table 3). Cw16 that were not apparent in the analyses of the complete alleles were only associated in the ME analysis based on sequences. amino acid sequences and the ML analysis. In addition, Cw14 and Cw1 alleles formed unsupported clusters in HLA-C both ME analyses and the ML analysis but were not asso- ciated in the MP analyses. The predicted association by Complete sequence analyses: The phylogenetic analyses of complete sequence analyses between Cw5 and Cw8 complete sequences of HLA-C alleles exhibited high fid- alleles was only detected in the ML analysis of PBR elity consistent across all methods between evolutionary sequences. associations and serological nomenclature (Figure 3a, Because of the high fidelity seen in comparisons of Table 3, LGD web site). Phylogenetic reconstruction analyses of complete sequence data with that of just the resulted in HLA-C alleles forming 12 significant clusters PBR we analyzed sequences with the PBR removed. Taxonomic hierarchy of HLA class I allele sequences LM McKenzie et al 125

Figure 2 Minimum evolution analyses of HLA-B (a) complete nucleotide sequences and (b) peptide binding region sequences. Branch lengths are proportional and represent estimated % sequence divergence. Numbers in italics are values from 100 bootstrap iterations. Numbers and letters in bold indicate proposed phylogenetic clades. Taxonomic hierarchy of HLA class I allele sequences LM McKenzie et al 126 s and letters in bold indicate proposed ) non-peptide binding region nucleotide sequences. c ) peptide binding region nucleotide sequences; and ( b ) complete nucleotide sequences; ( a ( HLA-C Neighbor joining analyses of Figure 3 Branch lengths are proportionalphylogenetic and clades. represent estimated % sequence divergence. Numbers in italics are values from 100 bootstrap iterations. Number Taxonomic hierarchy of HLA class I allele sequences LM McKenzie et al 127 These analyses also, in general, recapitulated strong between them. This was not surprising since interallelic associations between alleles of the same serotype (Figure recombination at the HLA-B locus occurs dispro- 3c, Table 3), suggesting that both PBR and non-PBR resi- portionately in tracts such as exon 3 that contain codons dues contribute to the overall phylogenetic association of which code for the PBR.22,23 A comparison of genetic vari- HLA-C alleles. ation in exons 2 and 3 between chimpanzees and humans indicates high sequence conservation in exon 2 but little similarity in exon 3.27 Additionally, neighbor-joining Discussion analyses of exon 2 and 3 sequences of HLA-B35 and HLA- B15 alleles revealed that alleles clustered together in HLA-A accordance with serotype for exon 2. In contrast, these On the basis of complete nucleotide sequences, previous relationships were not recapitulated in the exon 3 phy- reports indicate that HLA-A alleles can be grouped into logeny.28 Further, the HLA-B locus is known to have six families (A2/28, A1/3/11, A9, A10, A19 and A80) and undergone high levels of interallelic recombination for these families correlate with the cross-reacting groups of over 5 million years.22,23,29 This extensive evolutionary defined by .15 In this study we employ reassortment of variation among different alleles most rigorous phylogenetic analyses of complete molecular likely explains the lack of comprehensively identifiable sequences of HLA-A alleles and confirm that the A19 fam- evolutionary relationships. ily does not form one closely related clade. In fact, analy- ses of complete sequence data show that A19 alleles div- HLA-C ide into two divergent groups in support of Kato et al15 In general, phylogenetic analyses using both complete findings suggesting a more complex evolutionary history. sequence data, PBR sequences and sequences excluding The consistent pattern seen among the five phylogen- the PBR, showed that HLA-C alleles cluster together con- etic methods used based on complete sequences was not sistent with designated serological types. This pattern is reflected in those based on the 57 residues of the PBR. in contrast to that seen for HLA-A and particularly This discrepancy suggests that the variation seen in the HLA-B alleles. Sequence analysis suggests HLA-C is more functional part of HLA-A molecules proceeds under dif- closely related to HLA-B than HLA-A and indicates the ferent constraints or at different rates from those rep- HLA-C locus may have arisen as a duplication of HLA- resented in the complete sequence data. Possible expla- B.24 HLA-C homologues have been identified in the nations for the decrease in resolution of the phylogeny chimp30,31 and gorilla19 but are not found in genome or when the data from the 57 residues include both low transcript searches of Asian apes (Old World primates, levels of phylogenetically informative substitutions orangutans, gibbons or rhesus monkeys). The suspected within the PBR and the relative role of selection in main- recent origin of the HLA-C locus may explain the more taining allelic polymorphism. The lack of a consistent pat- conservative nature of the phylogenetic analyses tern in the topology of the five phylogenetic trees as sug- presented here. gested by low bootstrap values within the PBR derived HLA-C has lower cell-surface expression that HLA-A or trees, low levels of informative changes, combined with HLA-B32 and as such has not been the subject of intense a high number of sequences composed of only 171 interest in the past. Recently however it has been shown nucleotides may cause low phylogenetic resolution of that HLA-C products are involved in NK cell regulation.33 this region. In addition, the putative relationship between Therefore, an alternative explanation for the general con- functional residues may be due to sequences that are not cordance seen in the phylogenetic relationships of HLA- a consequence of shared ancestry. Recombinational gen- C alleles based on complete sequence and sequence of eration of alleles would obscure phylogenetic signal, only the 57 residues of the PBR may be that the HLA-C however evidence for appreciable interallelic exchange locus has a specific role and as such is not under the same has not been observed for HLA-A, as it has for HLA-B.22 selective pressure as the other classical class I loci, in Jakobsen et al26 have shown that whilst variable sites in particular the HLA-B locus. HLA-A alleles are distributed along the sequence, the majority of signature sites which identify families are located in exons 1 and 4–8 with few located in regions Conclusion which code for the PBR. This finding is reflected in the The phylogenetic analyses presented here of three class lack of phylogenetic resolution among PBR sequences I HLA loci each represent unique evolutionary properties observed in this study. of the MHC. Contrasting patterns of phylogenetically informative substitutions between complete sequences HLA-B and PBR sequences implicate the relative importance of The HLA-B locus is the most polymorphic class I MHC selection, recombination and evolutionary history in locus. ME and MP analyses of complete nucleotide and maintaining differential levels of polymorphism among amino acid sequences of HLA-B alleles reveal no consist- the three loci. In particular, HLA-B alleles are remarkably ent pattern of overall tree topology among the methods. polymorphic, with less evidence of patterns for uniting There was a general lack of bootstrap support for the alleles into families than for HLA-A and HLA-C alleles. structure of any of the trees. However, some consistent Recombination has seemingly partially obscured substi- relationships among some alleles are indicated. There are tutions which reflect shared historical patterns. In con- seven strongly supported clades representing 21 alleles trast, HLA-A and HLA-C are not undergoing such levels which can be interpreted as being phylogenetically of recombination, yet exhibit markedly different levels of related. concordance between serological typings, complete Analyses of PBR sequences of HLA-B alleles revealed sequence phylogeny and relationships derived from essentially no detectable phylogenetic relationships analyses of the PBR. Taxonomic hierarchy of HLA class I allele sequences LM McKenzie et al 128 The phylogenetic depiction of the discordance between and MP were performed to test for the reliability of the the PBR and the complete sequence data for HLA-A is data to recapitulate the same topology. Bootstrap values reminiscent of selection maintaining polymorphism in Ͼ70% were considered to indicate strong support for the PBR. In contrast, the concordance between the PBR that node.43,44 and the entire sequence of HLA-C may indicate recent ancestry or a specialized and specific role for the locus. It is clear from the results of this study that phylogen- Acknowledgements etic analyses of HLA-B alleles can provide only limited We thank Dr R Stevens for computer programming insight into possible groupings of alleles with common assistance. We acknowledge the National Cancer Institute evolutionary history. This is not unexpected, as previous for allocation of computer time and assistance at the studies have highlighted the high levels of recombination 22,23 Frederick Supercomputing Center. The content of this at this locus. However consistent clusters of HLA-C publication does not necessarily reflect the views or poli- and to a lesser extent HLA-A alleles identified by these cies of the Department of Human Health and Human phylogenetic analyses indicate evolutionarily-related Services, nor does mention of trade names, commercial groups which can now be used to investigate epidemiol- products, or organizations imply endorsement by the ogical associations. US Government.

Materials and methods References Sequences 1 Klein J (ed). Natural History of the Major Histocompatibility Com- HLA alleles used in the analyses were obtained from the plex. John Wiley and Sons, Inc: New York, 1986, p 775. American Society for Histocompatibility and Immuno- 2 Parham P, Ohta T. Population biology of antigen presentation 272 genetics (ASHI) web page located at by MHC class I molecules. Science 1996; : 67–74. F 3 Hill AVS, Allsopp CEM, Kwiatkowski D et al. Common West http://www.swmed.edu/home pages/ASHI/ African HLA antigens are associated with protection from severe sequences/seq3/htm malaria. Nature 1991; 352: 595–600. Sequences obtained from the ASHI web page were pre- 4 Hill AVS, Yates SNR, Allsopp CEM et al. Human leukocyte anti- viously aligned using the Genedoc Multiple Sequences gens and natural selection by malaria. Phil Trans Royal Soc Lond Alignment Editor by P Parham. Only alleles for which B 1994; 346: 379–385. the complete sequence was available in August 1998 were 5 Haynes BF, Pantaleo G, Fauci AS. Toward an understanding of included in this study, ie, 60, 120 and 36 alleles were protective immunity to HIV infection. Science 1996; 271: 324–328. retrieved for HLA-A, HLA-B and HLA-C, respectively. 6 Lopez de Castro JA. The pathogenetic role of HLA-B27 in Rhesus macaque sequences used as outgroups for HLA- chronic arthritis. Curr Opin Immunol 1998; 10: 59–66. 7 Hughes A, Yeager M. Natural selection at major histocompat- A and HLA-B analyses were obtained from Genbank with ibility complex loci of vertebrates. Annu Rev Genet 1998; 32: accession numbers U41832 and U41829 respectively. 415–435. Chimpanzee and gorilla sequences (Genbank accession 8 Klein J, Satta Y, O’hUigin C, Takahata N. The molecular descent numbers D11383 and X60251 respectively) were used as of the major histocompatibility complex. Annu Rev Immunol outgroups for HLA-C analyses as these are the only two 1993; 11: 269–295. species for which loci homologous to HLA-C have been 9 Bjorkman PJ, Saper MA, Samraoui B et al. Structure of the identified. human class I histocompatibility antigen, HLA-A2. Nature 1987; 329: 506–512. 10 Bjorkman PJ, Saper MA, Samraoui B et al. The foreign antigen Phylogenetic analyses binding site and T cell recognition regions of class I histocom- Phylogenetic trees were constructed using three methods: patibility antigens. Nature 1987; 329: 512–518. ME using the distance measures of Tajima-Nei for 11 Hughes AL, Nei M. Pattern of nucleotide substitution at major nucleotide data which assumes unequal base fre- histocompatibility complex class I loci reveals overdominant quencies34 and mean character difference for amino acid selection. Nature 1988; 335: 167–170. data as implemented by PAUP* (by permission of D 12 Hughes AL, Nei M. Nucleotide substitution at major histocom- Swofford), MP35 and ML (nucleotide sequences only)36 patibility complex class II loci: evidence for overdominant selec- for HLA-A, -B and -C alleles with one exception. ML tion. Proc Natl Acad Sci USA 1989; 86: 958–962. analysis was not performed using the HLA-B data set due 13 Hedrick PW, Whittam TS, Parham P. Heterozygosity at individ- ual amino acid sites: extremely high levels for HLA-A and HLA- to the extensive computational time required. These three B genes. Proc Natl Acad Sci USA 1991; 88: 5897–5901. methods, based on different models of evolution, do not 14 Bodmer JG, Marsh SGE, Albert ED et al. Nomenclature for fac- 37–41 necessarily derive the same topologies. Thus, we tors of the HLA system, 1996. Hum Immunol 1996; 53: 98–128. interpret congruent phylogenetic associations from the 15 Kato K, Trapani JA, Allopenna J et al. Molecular analysis of the three methods as the best estimates of the true phy- serologically defined HLA-Aw19 antigens: a genetically distinct logeny.42 ME and MP analyses of nucleotide and amino family of HLA-A antigens comprising A29, A31, A32 and Aw33, acid data were performed using PAUP* (by permission but probably not A30. J Immunol 1989; 143: 3371–3378. of D Swofford) MP analyses of nucleotide data were 16 Domena JD, Hildebrand WH, Bias WB, Parham P. A sixth family performed using conditions of simple addition of of HLA-A alleles defined by HLA-A*8001. Tissue Antigens 1993; sequences, general heuristic search, and branch swapping 42: 156–159. 17 Wagner AG, Hughes AL, Iandoli ML et al. HLA-A*8001 is a using the nearest neighbor interchange. ML analyses member of a newly discovered ancient family of HLA-A alleles*. were conducted using the DNAML program in PHYLIP Tissue Antigens 1993; 42: 522–529. version 3.5 using the empirical frequencies of nucleotide 18 Lawlor DA, Warren E, Ward, FE, Parham P. Comparison of class bases and the global search option.37 Bootstrap resam- I alleles in humans and apes. Immunol Rev 1990; 113: 147–185. pling analyses (100 iterations) in conjunction with ME 19 Lawlor DA, Warren E, Taylor P, Parham P. Gorilla class I major Taxonomic hierarchy of HLA class I allele sequences LM McKenzie et al 129 histocompatibility complex alleles: comparison to human and phism predate the divergence of humans and chimpanzees. chimpanzee class I. J Exp Med 1991; 174: 1491–1509. Nature 1988; 335: 268–271. 20 Lienert K, Parham P. Evolution of MHC class I genes in higher 31 Mayer WE, Jonker M, Klein D, Ivanyi P et al. Nucleotide primates. Immunol Cell Biol 1996; 74: 349–356. sequences of chimpanzee MHC class I alleles: evidence for a 21 Spencer Wells RS, Parham P. HLA class I genes: structure and trans-species mode of evolution. EMBO J 1988; 7: 2765–2774. diversity. In: Browning MJ, McMichael AJ (eds). HLA and MHC: 32 Snary D, Barnstable CJ, Bodmer WF, Crumpton MJ. Molecular genes, molecules and function. BIOS Scientific Publishers Ltd: structure of human histocompatibility antigens: the HLA-C ser- Oxford, 1996, pp 77–83. ies. Eur J Immunol 1977; 8: 580–585. 22 Hughes AL, Hughes MK, Watkins DI. Contrasting roles of inter- 33 Gumperz JE, Parham P. The enigma of the natural killer cell. allelic recombination at the HLA-A and HLA-B loci. Genetics Nature 1995; 378: 245–248. 1993; 133: 669–680. 34 Tajima F, Nei M. Estimation of evolutionary distance between 23 McAdam SN, Boyson JE, Liu X et al. A uniquely high level of nucleotide sequences. Mol Biol Evol 1984; 1: 269–285. recombination at the HLA-B locus. Proc Natl Acad Sci USA 1994; 35 Swofford DL. Phylogenetic analysis using parsimony (PAUP), ver- 91: 5893–5897. sion 3.1.1. University of Illinois: Champaign, 1993. 24 Yeager M, Hughes AL. Interallelic recombination has not played 36 Felsenstein J. Evolutionary trees from DNA sequences: a a major role in the history of the HLA-C locus. Immunogenetics maximum likelihood approach. J Mol Evol 1981; 17: 368–376. 37 Felsenstein J. Phylogeny Inference Package (PHYLIP). Version 3.5. 1996; 44: 128–133. University of Washington: Seattle, 1993. 25 Hughes AL, Yeager M, Carrington M. Peptide binding function 38 Huelsenbeck JP, Hillis DM. Success of phylogenetic methods in and the paradox of HLA disease associations. Immunol Cell Biol the four-taxon case. Syst Biol 1993; 42: 247–264. 1996; 74: 444–448. 39 Russo CAM, Takezaki N, Nei M. Efficiencies of different genes 26 Jakobsen IB, Gao X, Easteal S, Chelvanayagam G. Correlating and different tree-Building methods in recovering a known ver- sequence variation with HLA-A allelic families: Implications for tebrate phylogeny. Mol Biol Evol 1996; 13: 525–536. T cell receptor binding specificities. Immunol Cell Biol 1998; 76: 40 Sourdis J, Nei M. Relative efficiencies of the maximum parsi- 135–142. mony and distance-matrix methods in obtaining the correct tree. 27 Boyson JE, Shufflebotham C, Cadavid LF et al. The MHC class Mol Biol Evol 1988; 5: 298–311. I genes of the rhesus monkey. Different evolutionary histories 41 Zhang J, Nei M. Accuracies of ancestral amino acid sequences of MHC class I and II genes in primates. J Immunol 1996; 156: inferred by the parsimony, likelihood, and distance methods. 4656–4665. J Mol Evol 1997; 44: S139–146. 28 Cereb N, Kim C, Hughes AL, Yang SY. Molecular analysis of 42 Hillis DM, Moritz C, Mable BK. Molecular Systematics (2nd HLA-B35 alleles and their relationship to HLA-B15 alleles. Tissue edn). Sinauer Associates, Inc: Massachusetts, 1996. Antigens 1997; 49: 389–396. 43 Hillis DM, Bull JJ. An empirical test of bootstrapping as a 29 Jakobsen IB, Wilson SR, Easteal S. Patterns of reticulated evol- method for assessing confidence in phylogenetic analysis. Syst ution for the classical class I and II HLA loci. Immunogenetics Biol 1993; 42: 182–192. 1998; 48: 312–323. 44 Hillis DM. Approaches for assessing phylogenetic accuracy. Syst 30 Lawlor DA, Ward FE, Ennis PD et al. HLA-A and B polymor- Biol 1995; 44: 3–16.