Allelic Sequence Variation of the HLA-DQ Loci
Proc. Natl. Acad. Sci. USA
Vol. 85, pp. 6012-6016, August 1988
Genetics

Allelic sequence variation of the HLA-DQ loci: Relationship to serology and to insulin-dependent diabetes susceptibility
(molecular mimicry/HLA DNA typing/epitope mapping/DNA polymerase chain reaction)

GLENN T. HORN*, TEODORICA L. BUGAWAN*, CHRISTOPHER M. LONG†, AND HENRY A. ERLICH*‡
Departments of *Human Genetics and †Microbial Genetics, Cetus Corp., 1400 Fifty-Third Street, Emeryville, CA 94608
Communicated by Sherman M. Weissman, March 17, 1988

ABSTRACT Analysis of sequence variation in the polymorphic second exon of the major histocompatibility complex genes HLA-DQα and -DQβ has revealed 8 allelic variants at the α locus and 13 variants at the β locus. Correlation of sequence variation with serologic typing suggests that the DQw2, DQw3, and DQ(blank) types are determined by the DQβ subunit, while the DQw1 specificity is determined by DQα. The nature of the amino acid at position 57 in the DQβ subunit is correlated with susceptibility to insulin-dependent diabetes mellitus. This region of the DQβ chain contains shared peptides with Epstein-Barr virus and rubella virus.

Abbreviations: IDDM, insulin-dependent diabetes mellitus; MHC, major histocompatibility complex; RFLP, restriction fragment length polymorphism; PCR, (DNA) polymerase chain reaction; EBV, Epstein-Barr virus; ORF, open reading frame.
‡To whom reprint requests should be addressed.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Insulin-dependent diabetes mellitus (IDDM or type I diabetes) is an autoimmune disease in which dysfunctional regulation of glucose metabolism results from the immunologically mediated destruction of the insulin-producing islet cells of the pancreas (1). The 50% concordance rate of monozygotic twins indicates a significant genetic component for IDDM and suggests, as well, that an "environmental trigger" (e.g., viral infection) may be required to elicit the clinical disease in genetically susceptible individuals. Loci in the HLA (human leukocyte antigen) region of chromosome 6 contribute a major portion of the genetic predisposition, judged on the basis of the concordance rates (25%) of HLA-identical sibs, patterns of haplotype sharing among affected sib pairs, and linkage analysis in families (reviewed in ref. 2). In addition, population analysis has revealed that the frequency of certain serologically defined variants of the HLA class II antigens is different among patients and controls. These studies found that the serologic types DR3 and DR4 were positively associated with IDDM and DR2 was negatively associated (2). These disease associations may reflect the function either of the class II antigens themselves or of other disease susceptibility genes in linkage disequilibrium with specific class II alleles.

The class II loci of the human major histocompatibility complex (MHC) encode macrophage and B-cell transmembrane glycoproteins that present antigen to the helper T lymphocytes (for review, see ref. 3). Three related class II antigens, each composed of two distinct subunits, α and β, have been identified, and are designated HLA-DR, -DQ, and -DP. These proteins are highly polymorphic, possibly as a result of selective pressure arising from their role in the immunological defense against infectious pathogens (4). The polymorphism in class II antigens is localized to the NH2-terminal outer domain and is encoded by the second exon. These polymorphic class II residues have been postulated to interact with the T-cell antigen receptor and/or foreign antigen (5), with recognition of the antigen peptide fragments in association with a specific class II product leading to T-cell activation.

The haplotypes defined by serologic DR typing are genetically heterogeneous. Stronger associations with IDDM have been found when the serologic specificities are subdivided by restriction fragment length polymorphisms (RFLPs) by using DRβ and DQβ cDNA probes (6-10). In this report, we describe coding sequence polymorphism in the HLA-DQα and -DQβ loci, determined by amplification using the DNA polymerase chain reaction (PCR) (11-14). The overall pattern of allelic variation has been correlated with the major serologic specificities [DQw1, DQw2, DQw3, and DQ(blank)] and with susceptibility to IDDM. Since the development of autoimmune disease may involve a viral triggering event (15), we have also searched for homology between the HLA-DQ alleles and potentially immunogenic regions of two viral pathogens.

METHODS

PCR Amplification and M13 Cloning of DQα and DQβ Genomic Sequences. Genomic DNA (1 μg) from various cell lines was amplified by PCR (12) for 28 cycles with the Klenow fragment of Escherichia coli DNA polymerase. The DQα PCR primers GH26 and GH27 are described in ref. 14; the DQβ PCR primers GH28 and GH29 are described in ref. 16. The amplified DNA was digested with BamHI and Pst I, cloned in the M13mp10 vector by a modification of the PCR cloning procedure (14), and sequenced by the dideoxynucleotide primer extension method (41). The DXα genes from the HLA homozygous lines LG2, FPF, and TAB were sequenced and compared to published sequences of other alleles. Only one silent genetic difference was seen: at codon 36, TAC vs. TAT in refs. 17 and 18. Sequences for DXβ were determined from IDDM patients JOP (DR3,4), MZ (DR7,8), and JS (DR4,4), as well as from cell lines FPF and ARC; all had CGG at codon 25, a silent coding difference from the two published alleles (19).

Viral Sequence Homology. The sequence of the 172,282-base-pair (bp) genome of strain B95-8 of Epstein-Barr virus (EBV) (20) was translated by computer into protein sequence for all six of the forward and reverse phases. These translations were then searched for exact matches of five or more amino acids with polypeptide sequences centered at position 57 of the DQβ alleles (see Table 3). The location and size of the open reading frame (ORF) for each match were then determined and correlated with the major ORFs and transcriptional segments of the EBV genome (20). In the six translation phases of EBV, the amino acid residues (one-letter code) near position 57 of the DQβ alleles occur at the following fractions: A = 0.0804, D = 0.0258, E = 0.0337, G = 0.1089, L = 0.0890, P = 0.1089, R = 0.0962, and Y = 0.0141. The chances of random occurrence for the polypeptide epitopes are as follows: R P D A E = 1/1,370,000; G L P A A = 1/147,000; P A A E Y = 1/2,990,000; G P P A A = 1/120,000; and P P A A E Y = 1/27,400,000.

RESULTS

Allelic Sequence Variation. Allelic variation in the HLA-DQ region has previously been defined by the serological DQw1, DQw2, and DQw3 specificities and by DQ(blank) (21). In this study, we have determined 14 independent DQα sequences and 36 independent DQβ sequences. Thus far, 8 allelic DNA sequence variants have been identified at the DQα locus and 13 at DQβ.

Table 1. Pattern of residue sharing among HLA-DQ alleles

                         Number of shared              Frequency of shared
                         unique residues               variable residues, %
Serologic specificity    DQα            DQβ            DQα      DQβ
DQw1                     15 (n = 7)     1 (n = 15)     87       48
DQw2                     0 (n = 3)      9 (n = 9)      56       100
DQw3                     0 (n = 8)      1 (n = 16)     48       87
DQ(blank)                0 (n = 2)      2 (n = 4)      56       9
The term "shared unique residues" refers to amino acids common to a specific group of allele products and absent from all other allele products. The amplified regions of the DQ alleles shown in Figs. 1 and 2 were used for this analysis; 23 of these residues are polymorphic for both loci. Listed is the serologic type used to define each group, the number of uniquely shared residues, and the number (n) of sequences compared. The frequency of shared variable residues for each group is also shown.

We propose a nomenclature for the DNA-defined alleles that has its basis in the correspondence between current DQw serological specificities and sequence patterns. Specifically, the types DQB1, DQB2, and DQB3 designate sequences derived from the serologically defined DQw1, DQw2, and DQw3 haplotypes, respectively, while DQB4 designates those sequences derived from DQ(blank) haplotypes, an apparently homogeneous type (see below). Sequence variants that subdivide these types are designated by a subtype number (e.g., DQB1.2 or DQA1.3). The designation of DQα allelic variants, however, does not always correspond to the DQw specificity, since some of the DQw specificities appear to be determined by polymorphic epitopes on the β chain, independent of allelic variation on the α chain.

The PCR amplification procedure used here for the DQα and DQβ loci also amplifies the related DXα and DXβ genes. All were found to be identical in predicted protein sequence to the DX alleles reported earlier (18, 19). The lack of sequence variation in the unexpressed DX genes from vari-

DQ(blank) cells (50). A more detailed discussion of epitope mapping and of the mechanisms for generating DQ polymorphism will be given elsewhere.

Susceptibility to Autoimmune Disease. The availability of a large number of DQβ allelic sequences (Fig. 2), including 20 sequences from diabetic patients, allows the tentative localization of epitopes, including putative disease susceptibility epitopes. DR4-bearing haplotypes occur at increased frequency in IDDM patients (2). However, the DRβ variants, which correlate with the cellular Dw typing (29), are only weakly associated with IDDM (35).
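
As a check on the chance-of-random-occurrence figures quoted under Viral Sequence Homology in Methods, the short Python sketch below multiplies the per-residue fractions reported there for the six-phase EBV translation. It assumes that residues occur independently at each position, which appears to be the model behind the quoted values but is not stated explicitly in the text. The names FREQ and chance_of_match are illustrative and do not come from the original work.

```python
from math import prod

# Residue fractions in the six-phase EBV translation, as quoted in Methods.
FREQ = {
    "A": 0.0804, "D": 0.0258, "E": 0.0337, "G": 0.1089,
    "L": 0.0890, "P": 0.1089, "R": 0.0962, "Y": 0.0141,
}


def chance_of_match(peptide, freq=FREQ):
    """Chance that a given position spells `peptide`.

    Assumption (not stated in the text): residues are independent,
    so the chance is the product of the single-residue fractions.
    """
    return prod(freq[residue] for residue in peptide)


# The five peptides listed in Methods (one-letter code).
for peptide in ("RPDAE", "GLPAA", "PAAEY", "GPPAA", "PPAAEY"):
    odds = 1 / chance_of_match(peptide)
    print(f"{peptide}: about 1 in {odds:,.0f}")
```

Under this independence assumption, the products agree with the quoted odds to within about 1%; the small residual differences are consistent with rounding of the published fractions.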