Complete Article
Total Page:16
File Type:pdf, Size:1020Kb
The EMBO Journal Vol. I No. 12 pp. 1539-1544, 1982 Long terminal repeat-like elements flank a human immunoglobulin epsilon pseudogene that lacks introns Shintaro Ueda', Sumiko Nakai, Yasuyoshi Nishida, lack the entire IVS have been found in the gene families of the Hiroshi Hisajima, and Tasuku Honjo* mouse a-globin (Nishioka et al., 1980; Vanin et al., 1980), the lambda chain (Hollis et al., 1982), Department of Genetics, Osaka University Medical School, Osaka 530, human immunoglobulin Japan and the human ,B-tubulin (Wilde et al., 1982a, 1982b). The mouse a-globin processed gene is flanked by long terminal Communicated by K.Rajewsky Received on 30 September 1982 repeats (LTRs) of retrovirus-like intracisternal A particles on both sides, although their orientation is opposite to each There are at least three immunoglobulin epsilon genes (C,1, other (Lueders et al., 1982). The human processed genes CE2, and CE) in the human genome. The nucleotide sequences described above have poly(A)-like tails -20 bases 3' to the of the expressed epsilon gene (CE,) and one (CE) of the two putative poly(A) addition signal and are flanked by direct epsilon pseudogenes were compared. The results show that repeats of several bases on both sides (Hollis et al., 1982; the CE3 gene lacks the three intervening sequences entirely and Wilde et al., 1982a, 1982b). Such direct repeats, which were has a 31-base A-rich sequence 16 bases 3' to the putative also found in human small nuclear RNA pseudogenes poly(A) addition signal, indicating that the CE3 gene is a pro- (Arsdell et al., 1981), might have been formed by repair of cessed gene. The CE3 gene sequence is homologous to the five staggered chromosomal breaks following insertion of pseudo- separate DNA segments of the CE1 gene; namely, a segment in genes into chromosomes. Several models have been proposed the 5' -flanking region (100 bases) and four exons, which are to account for the generation of such pseudogenes, including interrupted by a spacer region or intervening sequences. Long gene conversion by mRNA-DNA heteroduplex formation terminal repeat (LTR)-like sequences which contain TATAAA (Nishioka et al., 1980), the insertion of cDNA into a stag- and AATAAA sequences as well as terminal inverted repeats gered chromosomal break following the reverse transcription are present in both 5'- and 3' -flanking regions. The 5' and 3' of mRNA (Hollis et al., 1982), and insertion via a retroviral LTR-like sequences do not, however, constitute a direct RNA intermediate with LTR (Lueders et al., 1982). repeat, unlike transposable elements of eukaryotes and retro- Here, we report the nucleotide sequences of the human C.1 viruses. The 3' LTR-like sequence is repetitive in the human and CE3 genes and their flanking regions. Comparison of the genome, but is not homologous to the Alu family DNA. C., and CE3 sequences indicates that the C13 gene lacks all of Models for the evolutionary origin of the processed gene the three intervening sequences and is flanked by a 31-base flanked by the LTR-like sequences are discussed. The CE3 A-rich sequence 16 bases 3' to the poly(A) addition signal, in- gene has a new open frame which codes potentially for an dicating that it is a processed pseudogene. In addition, the CE3 unknown protein of 292 amino acid residues. pseudogene is flanked by LTR-like elements with a direct Key words: heteroduplex/nucleotide sequence/processed repeat on both ends. Such a structure may suggest that the gene/LTR-like sequence/new open frame LTR-like sequence might have been involved for the genera- tion of the C13 pseudogene. Introduction Results Immunoglobulin heavy chain constant region (C) genes are Comparison of the CE1 and CE3 genes by heteroduplex arranged in the mouse genome in the order 5'-C,7-C-C.,3-C. 1- analyses and nucleotide sequence determination Cy2b-Cy2a-C-Ca,r3' (Liu et al., 1980; Nishida et al., 1981; Roeder et al., 1981; Shimizu et al., 1981, 1982; Takahashi et The human CES and CE3 genes (Ch4A * H *Igel1l and al., 1981). Structural comparison of these genes has shown WES H.Igc-3 1, respectively) were isolated previously from that they evolved through gene duplications and interveming DNA of an IgE-producing myeloma cell line 266B1 (Nishida sequence (IVS)-mediated domain transfer events (Miyata et et al., 1982). The germline CES gene (Ch4A H Ige-12) was al., 1980; Yamawaki-Kataoka et al., 1981, 1982). There is no subsequently isolated from a library containing partial AluI- extensively conserved pseudogene of the heavy chain genes in HaeIII digests of human fetal liver DNA. We have examined the mouse genome. In contrast, the human heavy chain gene the homology of the CES and C13 genes by heteroduplex of the analyses as shown in Figure 1. Heteroduplexes formed bet- family contains several pseudogenes gamma and ep- ween the CE3 and germline CE1 cloned DNAs showed five silon genes (Max et al., 1982; Nishida et al., 1982; Takahashi four of which seem to to et al., 1982). There are two epsilon pseudogenes (CE2 and CE) homologous segments, correspond in addition to the expressed gene (CE) (Flanagan and Rab- the exons of the CES gene from their locations and lengths. C, The fifth homologous segment (F in Figure 1) of - 100 bases bitts, 1982; Max et al., 1982; Nishida et al., 1982) indicating is located 0.7 kb upstream of the CHI exon of the CES gene. that the CE locus was a target of frequent genetic rearrange- The five segments seem to be separated in the C,1 gene but ments. that the first was contiguous in the CE3 gene. These observations suggest Several pseudogenes have been reported since was created the fusion of five described in the 5S RNA gene of Xenopus laevis (Jacq et al., the CE3 gene by separate that segments, probably a segment in the 5'-flanking region and 1977). Recently several so-called processed pseudogenes four exons of the prototype epsilon gene i.e., the ancestor of 'Present address: Department of Anthropology, Faculty of Science, The the CES gene. University of Tokyo, Tokyo, Japan. To compare the nucleotide sequences of the CE. and CE3 *To whom reprint requests should be sent. genes we have determined the nucleotide sequences of the (U IRL Press Limited, Oxford, England. 0261-4189/82/0112-1539$2.00/0. 1539 S.Ueda et al. 41, I ~~~~ * ~~~~~*a' 40 ~~~A~~~~~~ 4. ~~~~~~~. -'' * . #44±* .A '.-6 .. ~~4 ~b~"Al vC~±5 * ~~~~4a~~~~~~~~a -% o *'101, 640~~~~~~~~~~~ 100,~ ~ ~ ~ ~ ''? 4F~~~~/f40~~~~~%ok.ta-.1*44 - *"~ ~ ~ "a $a,~~~~~~a,a44 ~ ~ ~ 4 .4*a4` I* ..4 ,j' I . 1 ad~~~ S W-~W-4a -~~a;~~ 4k'- ~~a'~ A "I CH2 Fig. 1. Heteroduplex analyses between the CO3 and the germline C,1 genes. An electron-micrograph and diagram are shown. 31 molecules were measured. Lengths (kb) of duplexed regions are as follows; F, 0.11 A 0.02; CHI, 0.31 + 0.04; CH2, 0.34 + 0.05; CH3, 0.34 ± 0.02; and CH4, 0.47 + 0.05. The single-stranded loop between F and CH1 is 0.66 + 0.1 kb. structural and the flanking regions of the two genes according J segment. The nucleotide sequence of the 100-base region to the strategies shown in Figure 2. The nucleotide sequences 755 bases upstream of the CHI exon of the C,j gene is of the CE. and C43 genes and the amino acid sequence homologous (8507o) to that of the region immediately 5' to predicted from the nucleotide sequence of the C,1 gene are the pseudo-CHI domain of the C43 gene. The sequence shown in Figure 3. The nucleotide sequence of the C,1 gene (GTGGG) immediately 3' to this homology region of the C41 was derived from the rearranged CE1 gene except that the gene is similar to the consensus sequence for the 5' splicing germline gene sequence was used for the 5'-flanking region site (GTPuAG). Outside these homology regions the two se- where the rearrangement took place in the myeloma (Nishida quences are different from each other. These structural et at., 1982). The human CE, gene consists of four exons each features of the CE3 gene are reminiscent of processed gene encoding a domain and three intervening sequences like the pseudogenes previously described in several gene families mouse CE gene (Ishida et al., 1982). The 5' end of the CHI (Nishioka et at., 1980; Vanin et at., 1980; Hollis et al., 1982; domain of the C,, gene was tentatively determined according Wilde et al., 1982a, 1982b). to a consensus sequence for the 3' splicing site (PyN- The amino acid sequence predicted from the nucleotide se- PyPyPyNCAG) and because all the splicing sites fall between quence of the CEl gene is consistent with that determined the first and the second bases of a codon in the immuno- from the protein analysis except for 17 residues out of 427 globulin genes (Sharp, 1981) although the amino acid se- residues; eight substitutions, eight insertions, and one dele- quence of this region is not known (Bennich and von Bohr- tion (Bennich and von Bohr-Lindstrom, 1974). Our CEl se- Lindstrom, 1974). quence contains much longer flanking regions than those Comparison of the nucleotide sequences of the CE, and CE3 published recently (Flanagan and Rabbitts, 1982; Max et al., genes revealed the complete absence of the three IVSs from 1982). Our sequence differs from that of the germline Ce, the CE3 gene and the presence of a 31-base A-rich sequence 16 gene (Max et al., 1982) at 13 positions which are located in bases downstream of the poly(A) addition signal (AATAAA) flanking regions and IVS or at the silent positions except that of the CE3 gene.