Copyright © 2000 by the Genetics Society of America

Two Genes Become One: The Genes Encoding Heterochromatin Protein SU(VAR)3-9 and Translation Initiation Factor Subunit eIF-2γ Are Joined to a Dicistronic Unit in Holometabolic Insects

Veiko Krauss and Gunter Reuter
Institute of Genetics, Martin Luther University Halle-Wittenberg, D-06108 Halle, Germany
Manuscript received January 25, 2000
Accepted for publication July 13, 2000

ABSTRACT
The Drosophila suppressor of position-effect variegation Su(var)3-9 encodes a heterochromatin-associated protein that is evolutionarily conserved. In contrast to its yeast and mammalian orthologs, the Drosophila Su(var)3-9 gene is fused with the locus encoding the γ subunit of translation initiation factor eIF-2. Synthesis of the two unrelated proteins is resolved by alternative splicing. A similar dicistronic Su(var)3-9/eIF-2γ transcription unit was found in Clytus arietis, Leptinotarsa decemlineata, and Scoliopterix libatrix, representing two different orders of holometabolic insects (Coleoptera and Lepidoptera). In all these species the N terminus of eIF-2γ, which is encoded by the first two exons, is fused to SU(VAR)3-9. In contrast to Drosophila melanogaster, RT-PCR analysis in the two coleopteran and the lepidopteran species demonstrated the usage of a nonconserved splice donor site located within the 3′ end of the SU(VAR)3-9 ORF, resulting in removal of the Su(var)3-9-specific stop codon from the mRNA and complete in-frame fusion of the SU(VAR)3-9 and eIF-2γ ORFs. In the centipede Lithobius forficatus eIF-2γ and Su(var)3-9 are unconnected. Conservation of the dicistronic Su(var)3-9/eIF-2γ transcription unit in the studied insects indicates its origin before radiation of holometabolic insects and represents a useful tool for molecular phylogenetic analysis in arthropods.

During recent years molecular analysis has revealed several interesting exceptional types of gene organization in eukaryotes (Blumenthal 1998). These include cases where several genes are arranged into a multicistronic unit expressed from a single promoter. The polycistronic pre-mRNA is processed by alternative or trans-splicing to monocistronic mRNAs. In Caenorhabditis elegans ~25% of the genes are estimated to be contained in polycistronic units (Zorio et al. 1994). Dicistronic transcription units that are resolved by alternative splicing were documented in C. elegans unc-17/cha-1, Drosophila ChAT/VAChT, and mammalian UOG-1/GDF-1 (Lee 1991; Alfonso et al. 1994; Kitamoto et al. 1998). A similar gene structure was found for the sesB/Ant2 genes in Drosophila (Zhang et al. 1999). In contrast the Adh/Adhr locus of Drosophila melanogaster is transcribed as one dicistronic mRNA (Brogna and Ashburner 1997).

Here we describe another dicistronic transcription unit that evolved by the fusion of two functionally unrelated genes: the suppressor of position-effect variegation Su(var)3-9 and the gene encoding the γ subunit of translation initiation factor eIF-2. The eIF-2γ protein binds GTP and the initiator tRNA(Met) at translation initiation (Erickson and Hannig 1996) whereas SU(VAR)3-9 is a heterochromatin-associated protein involved in chromatin condensation (Tschiersch et al. 1994; Schotta and Reuter 2000). Heterochromatin-associated proteins (HPs) were first identified by dominant modifier mutations of position-effect variegation (PEV) in Drosophila (Reuter and Spierer 1992; Weiler and Wakimoto 1995; Wallrath 1998). Three different HPs have been studied in more detail: the HP1 chromo domain (James and Elgin 1986; Eissenberg et al. 1992), the SU(VAR)3-7 zinc finger (Cleard et al. 1997), and the SU(VAR)3-9 chromo and SET domain protein (Tschiersch et al. 1994). HP1 and SU(VAR)3-9 are evolutionarily conserved. Su(var)3-9 orthologs were identified in Schizosaccharomyces pombe (Clr4; Ivanova et al. 1998) and in mice and humans (SUV39h1/SUV39H1; Aagaard et al. 1999).

Molecular analysis of the Su(var)3-9 genomic region in Drosophila revealed a unique gene structure (Tschiersch et al. 1994). The D. melanogaster SU(VAR)3-9 sequences are encoded by a specific exon within intron 2 of eIF-2γ. The mRNAs of the two genes are produced by alternative splicing. SU(VAR)3-9 becomes fused to the N-terminal 80 amino acids of eIF-2γ encoded by the first two exons. This unusual gene structure is not found in fission yeast or mammals. Human eIF-2γ and SUV39H1 are both X chromosomal genes but are located at distant regions (Xp21 and Xp11.2, respectively; Geraghty et al. 1993; Ehrmann et al. 1998).

Here we show that the fusion of Su(var)3-9 and eIF-2γ into a dicistronic transcription unit is conserved in holometabolic insects. It is not found in the centipede Lithobius forficatus. In D. melanogaster SU(VAR)3-9 is fused to the 80 N-terminal amino acids of eIF-2γ whereas in the coleopteran and lepidopteran species studied in addition alternative splicing at a nonconserved splice donor site at the 3′ end of the SU(VAR)3-9 open reading frame (ORF) results in a complete fusion of the SU(VAR)3-9 and eIF-2γ ORFs.

Corresponding author: Veiko Krauss, Department of Genetics, University of Leipzig, D-04103 Leipzig, Johannisallee 21–23, Germany. E-mail: [email protected]
Genetics 156: 1157–1167 (November 2000)

MATERIALS AND METHODS

Study specimens: D. erecta was received from Umea European Drosophila Stock Center. Adult specimens of Leptinotarsa decemlineata and L. forficatus were captured around Halle (Sachsen-Anhalt, Germany). Adult Clytus arietis and Scoliopterix libatrix were captured near Nebra (Sachsen-Anhalt) and Ruhla (Thüringen, Germany), respectively. From each species one individual was used for RNA isolation (Trizol reagent; Life Technologies) and RT-PCR analysis. Two other individuals from each species were used for independent DNA isolation and PCR analysis of genomic fragments to evaluate DNA polymorphisms.

Sampling DNA sequences: Sequences of primers and their positions within the corresponding genes can be found in Table 1 and Figure 1, respectively. Su(var)3-9-specific sequences of ~1 kb length were isolated with the degenerated primer pair chromo800 and chromo1790 from C. arietis and S. libatrix. Partial eIF-2γ cDNAs from C. arietis, S. libatrix, L. decemlineata, and L. forficatus were isolated by reverse transcriptase (RT)-PCR using degenerate primers EF120 and EF440. Thereafter, for each of the studied species, a separate sequence sampling strategy was applied:

In Scoliopterix, genomic PCR fragments were generated using the primer pairs EF120/Sco4-1 and Sco4-3/EF440. The genomic region including exons 1 and 2 was amplified by inverse PCR using primers Sco1-5, Sco2, and Sco4-3. The exon-intron structure at the 5′ end of Scoliopterix eIF-2γ was determined after RT-PCR using the primers Sco5-1, Sco5-2 (5′ to the putative translation start point), and primer Sco8-2 (inside exon 3). Similarly, the splicing of Su(var)3-9 was determined by RT-PCR using the primers Sco5-1, Sco5-2, Sco1-4, and Sco4-2 for the 5′ end; and Sco4-3, Sco4-4, Sco8-1, and Sco8-2 at the 3′ end of the Su(var)3-9-specific exon (2a). 5′ rapid amplification of cDNA ends (RACE) and 3′ RACE products could not be generated from this species.

In Clytus, genomic PCR fragments were amplified using the primer pairs EF120/Clyrev1 and Cly3-9-4/EF440. The sequence of the eIF-2γ transcript was determined by overlapping 5′ RACE (primers ClyEF2 and ClyEF3) and 3′ RACE (primers Cly5, ClyEF1, and ClyEF4). With the help of primer pairs ClyEF4/ClyEF8 and ClyEF5/ClyEF7, the position and sizes of introns 3, 4, 5, and 6 were determined. The exon-intron structure of Su(var)3-9 was analyzed by RT-PCR using primer pairs Cly5/Clyrev1, Cly3-9-4/ClyEF2, and Cly3-9-5/ClyEF7.

In Leptinotarsa, the sequence of the eIF-2γ transcript was determined with overlapping 5′ RACE (primers Lep2 and Lep7) and 3′ RACE (primers Lep5, Lep1, and Lep15) experiments. The sequence of exons 1 and 2 and of introns 1 and 2a, as well as the 5′ part of the Su(var)3-9-specific exon 2a were determined using inverse PCR. Circularized genomic DNA was amplified using primer Lepinv in combination with Lep8, Lepint1, Lep9, Lep10, Lep11, and Lep12. The exon-intron structure of Su(var)3-9 was determined by RT-PCR using primer pairs Lep5/Lep13, Lep14/Lep2, and Lep17/Lep16.

For the centipede L. forficatus a 1.7-kb genomic and a 320-bp cDNA product were amplified by genomic PCR and RT-PCR using primer pair EF120/EF440.

Sequences were determined by direct sequencing or sequencing of two or three independent clones from different genomic PCR reactions. In addition, all transcribed regions were sequenced as RT-PCR products (directly or as a clone). Therefore, the transcript sequences were determined from three independent specimens of each species. PCR fragments were cloned using the pGEM-T PCR cloning kit (Promega, Madison, WI) according to the manufacturer's conditions.

5′ RACE, 3′ RACE, and RT-PCR: 5′ RACE experiments were carried out on 1–2 μg total RNA using the 5′ RACE kit version 2 (Life Technologies) according to the manufacturer's instructions. 3′ RACE was carried out on 1–2 μg total RNA with 200 units M-MLV reverse transcriptase (Life Technologies). For first-strand cDNA synthesis a poly(T)-primer with anchor sequences was used. After second-strand synthesis cDNA was amplified with anchor oligo and gene-specific primers. The 3′ RACE amplification was performed with a nested gene-specific primer. Reverse transcription of total RNA was carried out with M-MLV reverse transcriptase (Life Technologies) and subsequent PCR was done using standard conditions.

Sequencing: PCR products from 5′ RACE, 3′ RACE, and RT-PCR were gel eluted and directly sequenced using an ABI 377 sequencer. Genomic PCR fragments from D. erecta and products of inverse genomic PCR were directly sequenced. For DNA sequence analysis, MacVector 5.0 (Oxford Molecular, Palo Alto, CA) software was used. The GenBank/EMBL numbers of the genomic and cDNA sequences are, for D. melanogaster, AJ290956; D. erecta, AJ290957; L. forficatus, AJ290958; S. libatrix, AJ290959 and AJ290960; C. arietis, AJ290961, AJ290962, and AJ290963; and L. decemlineata, AJ290964 and AJ290965.

Phylogenetic analysis: Multiple protein sequence alignments were performed by means of the ClustalW algorithm of the SeqPup program (D. G. Gilbert), with corrections made by eye. For phylogenetic analysis we used the maximum-likelihood tree reconstruction method of the PUZZLE 4.0 program (Strimmer and Haeseler 1997) on the basis of the JTT model of amino acid substitution (Jones et al. 1992). Quartet puzzling (1000 steps) was performed to infer support values for internal branches. Trees were drawn using Treeview (R. Page). The alignments were also analyzed by bootstrapping (branch-and-bound search method, 100 steps) with the maximum parsimony algorithm of PAUP 3.1 (Swofford 1993). Alignments are accessible at http://www.uni-leipzig.de/~genetics/ForschVK.htm.

RESULTS

The Su(var)3-9/eIF-2γ dicistronic transcription unit of holometabolic insects: The fusion of the two functionally unrelated Su(var)3-9 and eIF-2γ genes into a dicistronic transcription unit was first indicated in D. melanogaster (Tschiersch et al. 1994). The eIF-2γ locus of D. melanogaster consists of five exons (Figure 1). Most of the ORF of the SU(VAR)3-9 heterochromatin protein is situated within the large intron 2 of eIF-2γ. 5′ RACE experiments proved that both the Su(var)3-9 and eIF-2γ mRNAs use the same transcription initiation site 109 bp before the translation initiation codon of both proteins (accession no. AJ290956), suggesting that the two mRNAs for the unrelated proteins are transcribed as a single pre-mRNA, which is alternatively spliced. The two genes share a common promoter and both mRNAs contain the

TABLE 1
Primers used for PCR analysis

Species                      Primer name  Sequence
Degenerate oligonucleotides  chromo800    GGITAMICMIGAYWSIGARWIIACITGGGA
                             chromo1790   ARRTTIGGRTCRCAISWRTGRTTIAIRAARTG
                             EF120        GGYACICGITGYACIGGHTAICAIGGHTAYAAHTAICAICGRAC
                             EF440        CTRACRGGICCIGTRCTRTADRAITACYSITGITACRAITTRCC
S. libatrix                  Sco5-1       TGTAAAATTCGTTGCCTACG
                             Sco5-2       CCACAATTATGGCTTCG
                             Sco2         AGAAATATAAACTTAACGCTAGTC
                             Sco1-5       TGTCTGCACTCCTGAGATG
                             Sco1-4       GGCCAACCTTTCCACTTTATGTG
                             Sco4-2       TCAATCTTATCACAGTATCTCAG
                             Sco4-1       GAAGAGCATTTACAAGCC
                             Sco4-3       TTGTGGATGCAGCACATC
                             Sco4-4       AATCTTGGAGTGTGGGCTGC
                             Sco8-1       CCAAGTGGTTCTGAAGC
                             Sco8-2       TCAGGCACGGGAAATTG
                             Sco8-3       AATTTCCCGTGCCTGAG
C. arietis                   Cly5         TTCTTCGTCCAAGTTCC
                             ClyEF3       CGAGTTCATTCTTGAATCG
                             ClyEF1       TCGATTCAAGAATGAACTC
                             Clyint1      CGTTTGTGTACTTTGTTAAG
                             Clyrev2      ACCTGCGTACCTAACGTCTC
                             Clyrev3      TGTGGATATGTCGGTCTCGC
                             Clyrev1      GAAAATACACAACGGCACC
                             Cly3-9-4     TTGGACTACAACTCCAGGG
                             Cly3-9-5     GCACTTTATCAACCACTCGTG
                             ClyEF2       CAGAAATGTAACAGCCCG
                             ClyEF4       GCCACGTCAGTTTCGTC
                             ClyEF5       GCAGTTCGCTGTGCCTGG
                             ClyEF8       ATTAGACCTCCAGGCACAG
                             ClyEF6       GTCGCCCTCCATACGCAC
                             ClyEF7       TCAAACGGACAAATAAACTTTTTTCC
L. decemlineata              Lep5         GTGCAAAACATGGCGTC
                             Lepinv       TTGGTTGACCCGTAGTCACG
                             Lep7         ACTTCTGGGGATAAGGCTG
                             Lep8         ATTGGACACGTAGCTCACCGG
                             Lep1         TCAACAGTAGTCAAAGCCATTTC
                             Lepint1      CTTCGTTGAAGGTGTGTTG
                             Lep9         AAACATAAGGAATGCCG
                             Lep10        TGTTTTTCTTTTCTGGACTG
                             Lep11        AGGAGCGAAAGGGAATC
                             Lep12        TCGCAAGTCTTGGATGC
                             Lep14        TGATCCAAATCTCGGAG
                             Lep13        AACTGCATACACTCCGAG
                             Lep17        GGTGCCAACAACTCCTG
                             Lep2         GCTGACGTGTCTCACCAGTTG
                             Lep15        TCGTTGATTGCCCTGGTC
                             Lep16        AGTTATTCGATGGTTCGATGG
L. forficatus                Lit1-1       TTAGGATATGCCAACGC

I, inosine positions; for degenerate positions the International Union of Pure and Applied Chemistry key was used. See Figure 1 for the position and orientation of the primers in the sequence.
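The degenerate primers in Table 1 are written in the IUPAC nucleotide code, with I marking inosine. As a rough illustration of what that notation implies, the following Python sketch counts how many distinct oligonucleotides a degenerate primer stands for. The function name `degeneracy` is hypothetical, and treating inosine as a single synthesized base that adds no degeneracy is an assumption of this sketch, not something stated in the table footnote.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    # Assumption: inosine is one physical base, so it contributes a factor of 1.
    "I": "I",
}


def degeneracy(primer: str) -> int:
    """Return the number of distinct oligonucleotides in a degenerate primer pool."""
    count = 1
    for symbol in primer.upper():
        # A symbol outside the IUPAC table raises KeyError.
        count *= len(IUPAC[symbol])
    return count


# The four degenerate primers listed in Table 1.
DEGENERATE_PRIMERS = {
    "chromo800": "GGITAMICMIGAYWSIGARWIIACITGGGA",
    "chromo1790": "ARRTTIGGRTCRCAISWRTGRTTIAIRAARTG",
    "EF120": "GGYACICGITGYACIGGHTAICAIGGHTAYAAHTAICAICGRAC",
    "EF440": "CTRACRGGICCIGTRCTRTADRAITACYSITGITACRAITTRCC",
}

for name, sequence in DEGENERATE_PRIMERS.items():
    inosines = sequence.count("I")
    print(f"{name}: length {len(sequence)}, inosines {inosines}, "
          f"degeneracy {degeneracy(sequence)}")
```

Under a different convention, where inosine is counted as standing for any of the four bases, the `"I"` entry would map to `"ACGT"` and the counts would rise accordingly.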

first two exons, resulting in an 80-amino-acid N-terminal extension of SU(VAR)3-9 in D. melanogaster.

An identical genomic organization of Su(var)3-9/eIF-2γ was found in D. erecta, which is 9–15 million years evolutionarily distant (Powell 1997). To study the evolutionary conservation of this unusual gene structure within insects, we analyzed C. arietis, L. decemlineata (both Coleoptera), and S. libatrix (Lepidoptera). Genomic and cDNA fragments of the Su(var)3-9/eIF-2γ region in these species were isolated by a combination of genomic PCR using degenerated primers, inverse genomic PCR, 5′ and 3′ RACE, as well as RT-PCR (cf. MATERIALS AND METHODS).

Sequence analysis of these DNA fragments proved the location of the SU(VAR)3-9 ORF within intron 2 of eIF-2γ in all these species (Figure 1). In 5′ RACE experiments the transcription initiation site of eIF-2γ was determined in Clytus and Leptinotarsa. Sequence comparison with Su(var)3-9-specific RT-PCR fragments showed that exons 1 and 2 are present in both the Su(var)3-9 and the eIF-2γ mRNAs.

Figure 1.—The Su(var)3-9/eIF-2γ dicistronic transcription unit of four species and the eIF-2γ gene of the centipede Lithobius. The commonly accepted phylogenetic relationships (Fortey and Thomas 1998) of the analyzed species are indicated by the tree. Dm, D. melanogaster; Sl, S. libatrix; Ca, C. arietis; Ld, L. decemlineata; Lf, L. forficatus; and pA, poly(A) tail. The nomenclature of the exon-intron structure is given at the top. Introns are in open boxes and exons are in solid boxes. The transcript part of L. decemlineata eIF-2γ containing undetermined intron positions is in a shaded box. All eIF-2γ exons are in yellow boxes, the chromo domain coding regions are indicated in red, the SAC domain coding regions are in green, and the SET domain coding regions are marked blue. Positions and directions of the used degenerate PCR primers are shown at the D. melanogaster gene structure. Positions of the used species-specific primers are given under each exon-intron scheme. Downstream-directed oligonucleotides are red and upstream-directed oligonucleotides are blue.

The cDNA sequences of eIF-2γ were completed with 3′ RACE or RT-PCR experiments. Surprisingly, cDNA sequences that cover exon 2a [Su(var)3-9] and exon 3 (eIF-2γ) could be amplified by RT-PCR in all studied coleopteran and lepidopteran species. Direct sequencing of these RT-PCR amplificates yielded the interesting result that in all the non-Drosophilid species the Su(var)3-9 exon 2a was fused in-frame with exon 3 of eIF-2γ. This was proved by RT-PCR for every species using two different primer pairs. First, the region between the SET domain-encoding sequence of Su(var)3-9 and exon 3 of eIF-2γ was amplified. Afterward, for the two coleopteran species primer pairs were used that amplified the sequence of the fusion transcript between Su(var)3-9 exon 2a (Cly3-9-5 or Lep17 and ClyEF7 or Lep16, respectively) and the 3′ untranslated region of eIF-2γ (cf. Figure 1). In C. arietis beside the Su(var)3-9-eIF-2γ in-frame fusion transcript, a polyadenylated Su(var)3-9 transcript variant not fused with eIF-2γ exons 3–7 has been detected in 3′ RACE experiments. Within Su(var)3-9 exon 2a in each of the studied species a specific 5′ splice site is found (Figure 2). The in-frame fusion of the SU(VAR)3-9 and eIF-2γ ORFs results in small terminal deletions in the ORF of exon 2a. Furthermore, in S. libatrix two different amplificates were isolated, indicating the usage of two alternative 5′ splice sites (Figure 2). This could be proved by an RT-PCR blot using primers Sco4-4/Sco8-2 for PCR and a 300-bp genomic fragment covering exon 3 within eIF-2γ for hybridization (Figure 3). According to the Drosophila 5′ splice site scoring table (Mount 1993), the best possible splice site between the SET domain and the conserved stop is used. In the Drosophila species studied no splice consensus sequence is found in the corresponding region of Su(var)3-9. Remarkably, only three single nucleotide substitutions between Clytus and Leptinotarsa cause the use of a slightly differently positioned 5′ splice site.

Figure 2.—Splicing of intron 2b in Su(var)3-9/eIF-2γ. For nomenclature see Figure 1. Specific exons of Su(var)3-9 are green; exons of eIF-2γ are yellow. The out-of-frame ORF of the pink boxed eIF-2γ exon 3 sequence in Scoliopterix is in parentheses. Note that exon 3 in Scoliopterix is identical in both cases, but used in different reading frames. Also in Clytus two splice variants of Su(var)3-9 exist, one with and one without exons 3–7.

The dicistronic organization of Su(var)3-9/eIF-2γ was found in all the holometabolic insect species studied (Figure 1). The mRNAs of Su(var)3-9 and eIF-2γ in holometabolic insects are produced by alternative splicing, resulting in an N-terminal fusion of the SU(VAR)3-9 protein with the first 80 amino acids of eIF-2γ. A further fusion of SU(VAR)3-9 at the C terminus with the eIF-2γ protein appears to be caused by the usage of a 5′ splice site at the 3′ end of the SU(VAR)3-9 ORF in the coleopteran and lepidopteran species (Figures 1 and 2).

Phylogenetic analysis of SU(VAR)3-9 orthologs: Because of high sequence conservation between all known eIF-2γ genes, the first AUG used as a start codon can unambiguously be deduced. There is evidence for the use of the same start site of translation in SU(VAR)3-9. The first AUG in Su(var)3-9 mRNAs is in the same sequence context as in the eIF-2γ transcripts. In both Leptinotarsa and Scoliopterix, in-frame stop codons are found 5′ to the first AUG codon. No other methionine codon N-terminal to the chromobox is conserved between all the studied insect species. Furthermore, in Clytus and Leptinotarsa no other AUG codon is found between the putative start AUG and the chromobox.

The sequence data available allow detailed analysis of conserved and more variable regions within SU(VAR)3-9 and eIF-2γ (Figure 4). No significant sequence conservation between different insect orders is found within regions II, IV, and VII, which are located between the conserved regions of SU(VAR)3-9. The highest degree of sequence identity between all the known SU(VAR)3-9 orthologs is found in the SET domain (35%), followed by the SET-domain-associated cysteine-rich (SAC; 16%) and chromo domains (12%). Phylogenetic trees were computed using the complete SU(VAR)3-9 sequence or individual regions within the gene. If the SET and SAC domains are computed together (Figure 5), all known SU(VAR)3-9 orthologs are branched together with 93% quartet puzzle and 67% bootstrap support excluding other similar SET domain proteins such as Dm E(Z) (Jones and Gelbart 1993), Hs G9a (Milner and Campbell 1993), and Ce R05D3.11 (accession no. P34544). Our results suggest a common origin of all SU(VAR)3-9 orthologs, independent of the diagnostic feature, which is the occurrence of both the chromo and SET domains within all SU(VAR)3-9 proteins.

Characterization of the eIF-2γ gene structure in arthropods: The eIF-2γ gene was isolated from Saccharomyces cerevisiae, S. pombe, and humans (Hannig et al. 1993; Gaspar et al. 1994; Dorris et al. 1995). Further eIF-2γ homologous sequences could be identified by BLAST searches (Altschul et al. 1997) in Arabidopsis thaliana [two different genes, accession nos. AC002411 and AL021713, verified by overlapping expressed sequence tags (EST)] and C. elegans (genomic clone Y74C10, verified by overlapping EST). We isolated eIF-2γ homologous sequences from five holometabolic insects (D. melanogaster, D. erecta, S. libatrix, C. arietis, and L. decemlineata) as well as from the centipede L. forficatus (Figure 6). In all analyzed insect species eIF-2γ is associated with Su(var)3-9 within a dicistronic transcription unit. In contrast, in all noninsect species studied these genes are independent.

A maximum-likelihood tree that is based on eIF-2γ amino acid sequences is shown in Figure 7 and is consistent with a recent consensus species phylogeny (Maddison 1997). However, Keeling et al. (1998) places C. elegans on different nodes in an eIF-2γ gene tree. This may be due to a partially incorrect protein sequence contained in the WORMPEP (rel. 16) database (Y39G10A 246.C). We corrected this sequence with the help of overlapping ESTs and improved the alignment. The

resulting perfect agreement of the eIF-2γ gene tree with the species tree argues for a true orthologue relation of all compared eIF-2γ sequences, irrespective of their quite distinct gene structure in insects, and underlines the usefulness of the eIF-2γ sequence and gene structure as an excellent molecular marker for phylogenetics.

Figure 3.—RT-PCR amplification of Scoliopterix Su(var)3-9 and eIF-2γ transcripts. Oligo(dT)-primed cDNA from adults was amplified by using the Su(var)3-9-specific primer pair Sco4-4 and Sco8-2 or the eIF-2γ-specific primer pair Sco5-2 and Sco8-2, respectively. The amplified regions span the intron 2b of Su(var)3-9 or introns 1 and 2 of eIF-2γ, respectively. Fragments of 451 and 264 bp correspond to the major and minor splice variants of Su(var)3-9. The 351-bp fragment represents the transcript of eIF-2γ. The RT-PCR reaction separated on a 1% agarose gel was hybridized after blotting with the indicated eIF-2γ-exon3-specific 300-bp fragment. The three additional fragments in the Su(var)3-9 lane might indicate the presence of three other transcript variants. The additional signal in the eIF-2γ lane is likely due to a partial spliced transcript containing intron 1.

DISCUSSION

In only a few cases have polycistronic transcription units in higher eukaryotes been found where unrelated proteins are specified by a single pre-mRNA after alternative splicing (Blumenthal 1998). The unc-17/cha-1 dicistronic cluster, which was first discovered in C. elegans (Alfonso et al. 1994), is also found in insects and mammals (Berrard et al. 1995; Kitamoto et al. 1998), indicating an early origin during evolution as well as a functional relevance of this structure. In contrast, our analysis of the Su(var)3-9/eIF-2γ dicistronic transcription unit showed that it is only found in insects. Therefore, we conclude that the fusion of these two functionally completely unrelated genes into a dicistronic unit arose before radiation of holometabolic insects after the crown arthropod lines (chelicerates, crustaceans, myriapods, and hexapods) had been separated. Suggesting a monophyletic origin of this gene arrangement, we have to place the event between 280 and 545 million years ago (Fortey and Thomas 1998). In contrast to already known cases of polycistronic clusters (Alfonso et al. 1994; Zhang et al. 1999) where the shared exons are noncoding, the fused Su(var)3-9 and eIF-2γ genes are unique in sharing N-terminally two coding exons. The insertion of Su(var)3-9 into an intron of eIF-2γ could be caused by either an insertion of an intron-free Su(var)3-9 gene by retroinsertion or via a translocation-like process. The ancient Su(var)3-9 gene has been lost because in none of the studied species has any indication for other Su(var)3-9 or eIF-2γ homologous sequences been detected and it is unlikely that gene duplications exist.

In the dicistronic transcription unit SU(VAR)3-9 becomes fused with the 80 N-terminal amino acids of eIF-2γ. Our data suggest that the same translation start site is used for both proteins, also indicating that the original Su(var)3-9 gene was not completely inserted into eIF-2γ. Probably, the ancient promoter and at least one exon remained at the original site. In all studied species such as yeast, D. melanogaster, and mammals, the SU(VAR)3-9 protein is associated with heterochromatin and is connected with gene silencing (Tschiersch et al. 1994; Ivanova et al. 1998; Aagaard et al. 1999). This functional conservation indicates that the N-terminal addition of eIF-2γ amino acid sequences to SU(VAR)3-9 in insects does not interfere with its function. This is also supported by studies with transgenic Drosophila lines expressing N-terminally truncated SU(VAR)3-9 (Aagaard et al. 1999). Overexpression of both the Drosophila full-length as well as the N-terminally truncated protein results in enhancement of gene silencing in position-effect variegation.

In Coleopterans and Lepidopterans an in-frame fusion of the SU(VAR)3-9 and eIF-2γ encoding sequences raises the question about possible implications for function of the deduced fusion protein. The N-terminal ~200 amino acids of eIF-2γ represent the conserved G domain, involved in both GTP and Met-tRNA binding (Dorris et al. 1995; Erickson and Hannig 1996; Erickson et al. 1997). In the putative fusion protein, amino acids of the G domain essential for GTP and Met-tRNA binding become separated by the inserted SU(VAR)3-9 sequence. Structure predictions (PredictProtein server;

http://www2.ebi.ac.uk/~rost/predictprotein/; Rost 1996) indicate a complete disruption of the conserved secondary structure of the whole N-terminal region of eIF-2γ within the putative SU(VAR)3-9/eIF-2γ fusion protein (Figure 6). Furthermore, in the SU(VAR)3-9-specific sequence immediately following the N-terminal 80–82 amino acids of eIF-2γ in all the species studied, a helical structure is predicted for the first 5–10 amino acids. In this region a weak sequence conservation with a consensus of E(R/K)(L/V)(S/Q)(E/F) is found (gray sequence block in Figure 4), which could be efficient in total disruption of protein secondary structure within the N-terminal part of the eIF-2γ G domain fused to SU(VAR)3-9.

Figure 4.—Alignment of SU(VAR)3-9 orthologs. Above each amino acid sequence the regional subdivision and the degree of conservation are shown. Sequences of conserved regions are colored: The common region with eIF-2γ is yellow boxed, the chromo domain is red, the SAC domain is green, and the SET domain is blue. A block of a weakly conserved sequence [consensus E(RL)(LV)(SQ)(EF)] immediately C-terminal to the common region with eIF-2γ is gray boxed. The SAC domain (Huang et al. 1998) contains parts before and after the SET domain. Sequences that are black boxed at the C terminus are deleted as a result of the in-frame fusion of SU(VAR)3-9 and eIF-2γ ORFs in the indicated species. Abbreviations (in addition to Figure 1): Sp Clr4p, S. pombe Clr4p (Ivanova et al. 1998); Dm Suv39, D. melanogaster Su(var)3-9 (Tschiersch et al. 1994); Hs SUV39H1, Homo sapiens SUV39H1 (Aagaard et al. 1999).

Within all Su(var)3-9 genes the position of the stop codon is conserved, indicating that this position predates the evolution of the dicistronic gene structure. In the coleopteran and lepidopteran species studied, where a transcript with a complete in-frame fusion of the SU(VAR)3-9 and eIF-2γ ORFs is found, a nonconserved 5′ splice donor site located at the 3′ end of the Su(var)3-9 ORF is used. In the putative SU(VAR)3-9/

Figure 5.—SAC and SET domain-based tree of SU(VAR)3-9-related protein sequences. Numbers refer to support above (quartet puzzling) and below (bootstrap) internal branches, and branch length reflects maximum-likelihood distances. Abbreviations (in addition to Figures 1 and 4): Dm E(Z), D. melanogaster E(Z) (Jones and Gelbart 1993); Hs G9a, H. sapiens G9a (Milner and Campbell 1993); Ce R05D3.11, C. elegans ORF (accession no. P34544).
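The bootstrap support values in this tree were computed with PAUP 3.1. The resampling idea behind any bootstrap analysis can be sketched in a few lines of Python; this is only an illustration of the principle, not the PAUP procedure. The function name `bootstrap_alignment` and the three toy sequences are hypothetical and do not come from the alignments used here.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns; repeats are allowed.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


# Hypothetical toy alignment (three sequences, eight columns).
toy = {
    "seq_a": "MKLVE-RT",
    "seq_b": "MKIVEQRT",
    "seq_c": "MRLVD-KT",
}

rng = random.Random(0)  # fixed seed so the resampling is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(replicates[0])
```

In a full analysis each replicate would be passed to a tree-building method, and the support for a branch would be the percentage of replicate trees that contain it.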

Figure 6.—eIF-2γ alignment. Above the amino acid sequence the three parts of the GTP binding domain (Naranda et al. 1995) are boxed in gray. The N-terminal prolongation in budding yeast is boxed. A deletion of this region is without consequences (Erickson et al. 1997). In all sequences, identified locations of loss-of-function point mutations are shaded gray (Dorris et al. 1995; Naranda et al. 1995; Erickson and Hannig 1996; Erickson et al. 1997). Known intron positions are boxed in black. Under the sequence of the N-terminal amino acid block a secondary structure prediction (Rost 1996) for the eIF-2γ proteins is shown. “Disturbed” indicates the prediction of secondary structure for the N-terminal eIF-2γ region fused to SU(VAR)3-9. α-helices and β-sheets are minimal consensus predictions of structures for all shown insect proteins. Abbreviations (in addition to Figures 1 and 4): Sc GCD11p, S. cerevisiae eIF-2γ (Hannig et al. 1993); Ce, C. elegans.
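Conservation figures such as the identity percentages quoted for the SET, SAC, and chromo domains are typically derived by counting identical columns in a multiple alignment. The Python sketch below shows one way to do that. How gaps were treated in the reported values is not stated, so counting any column with a gap as non-identical is an assumption of this sketch; the function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_seqs):
    """Percentage of alignment columns in which every sequence has the same residue.

    Assumption of this sketch: a column containing a gap ('-') is never identical.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(
        1
        for column in zip(*aligned_seqs)
        if "-" not in column and len(set(column)) == 1
    )
    return 100.0 * identical / length


# Hypothetical three-sequence alignment: columns 1 and 4 are fully conserved.
example = ["MKLV", "MKIV", "MRLV"]
print(percent_identity(example))
```

Restricting the input to the columns of a single domain gives a per-domain value of the kind quoted in the text.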

Figure 7.—Maximum-likelihood tree of eIF-2γ protein sequences. Numbers refer to support above (quartet puzzling) and below (bootstrap) internal branches, and branch length reflects maximum-likelihood distances.

eIF-2γ chimeric protein of coleopteran and lepidopteran species ~470 most highly conserved amino acid positions of eIF-2γ are added C-terminally to the putative SU(VAR)3-9 proteins. Whether such a fusion protein is stable or post-translationally processed is not yet known. However, the fusions of SU(VAR)3-9 with eIF-2γ sequences might also have promoted modifications of SU(VAR)3-9 function.

The dicistronic Su(var)3-9/eIF-2γ gene structure in holometabolic insects represents an exceptionally useful tool for phylogenetic analysis within arthropods. There are considerable differences in phylogenetic concepts within this phylum (Fortey and Thomas 1998). The fusion of the two conserved Su(var)3-9 and eIF-2γ genes into a dicistronic transcription unit is unlikely to occur several times independently. This gene arrangement therefore should represent a synapomorphy (shared derived character) for a monophyletic group of arthropods. Both the genomic organization as well as sequence conservation of Su(var)3-9 and eIF-2γ allow a comprehensive molecular analysis of arthropod systematics and evolution.

We are grateful to Andreas Fischer for supplying specimens. We thank Dr. Michael Ashburner for critical reading of the manuscript and helpful comments. This work was supported by grants from the Deutsche Forschungsgemeinschaft and the Fonds der Chemischen Industrie (to G.R.).

LITERATURE CITED

Aagaard, L., G. Laible, P. Selenko, M. Schmid, R. Dorn et al., 1999 Functional mammalian homologues of the Drosophila PEV-modifier Su(var)3-9 encode centromere-associated proteins that complex with the heterochromatin component M31. EMBO J. 18: 1923–1938.
Alfonso, A., K. Grundahl, J. R. McManus, J. M. Asbury and J. B. Rand, 1994 Alternative splicing leads to two cholinergic proteins in Caenorhabditis elegans. J. Mol. Biol. 241: 627–630.
Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang et al., 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
Berrard, S., H. Varoqui, R. Cervini, M. Israel, J. Mallet et al., 1995 Coregulation of two embedded gene products, choline acetyltransferase and the vesicular acetylcholine transporter. J. Neurochem. 65: 939–942.
Blumenthal, T., 1998 Gene clusters and polycistronic transcription in eukaryotes. Bioessays 20: 480–487.
Brogna, S., and M. Ashburner, 1997 The Adh-related gene of Drosophila melanogaster is expressed as a functional dicistronic messenger RNA: multigenic transcription in higher organisms. EMBO J. 16: 2023–2031.
Cleard, F., M. Delattre and P. Spierer, 1997 SU(VAR)3-7, a Drosophila heterochromatin-associated protein and companion of HP1 in the genomic silencing of position-effect variegation. EMBO J. 17: 5280–5288.
Dorris, D. R., F. L. Erickson and E. M. Hannig, 1995 Mutations in GCD11, the structural gene for eIF-2γ in yeast, alter translational regulation of GCN4 and the selection of the start site for protein synthesis. EMBO J. 14: 2239–2249.
Ehrmann, I. E., P. S. Ellis, S. Mazeyrat, S. Duthie, N. Brockdorff et al., 1998 Characterization of genes encoding translation initiation factor eIF-2γ in mouse and human: sex chromosome localization, escape from X-inactivation and evolution. Hum. Mol. Genet. 7: 1725–1737.
Eissenberg, J. C., G. D. Morris, G. Reuter and T. Hartnett, 1992 The heterochromatin-associated protein HP-1 is an essential protein in Drosophila with dosage-dependent effects on position-effect variegation. Genetics 131: 345–352.
Erickson, F. L., and E. M. Hannig, 1996 Ligand interactions with eukaryotic translation initiation factor 2: role of the γ-subunit. EMBO J. 15: 6311–6320.
Erickson, F. L., L. D. Harding, D. R. Dorris and E. M. Hannig,

1997 Functional analysis of homologs of translation initiation Milner, C. M., and D. R. Campbell, 1993 The G9a gene in the factor 2␥ in yeast. Mol. Gen. Genet. 253: 711–719. human major histocompatibility complex encodes a novel pro- Fortey, R. A., and R. H. Thomas, 1998 Arthropod Relationships. Chap- tein containing ankyrin-like repeats. Biochem. J. 290: 811–818. man & Hall, London. Mount, S. M., 1993 Messenger RNA splicing signals in Drosophila Gaspar, N. J., T. G. Kinzy, B. J. Scherer, M. Hu¨mbelin, J. W. B. genes, pp. 333–358 in An Atlas of Drosophila Genes, edited by G. Hershey et al., 1994 Translation initiation factor eIF-2: cloning Maroni. Oxford University Press, Oxford/New York. and expression of the human cDNA encoding the ␥-subunit. J. Naranda, T., I. Sirangelo, B. J. Fabbri and J. W. B. Hershey, 1995 Biol. Chem. 269: 3415–3422. Mutations in the NKXD consensus element indicate that GTP Geraghty, M. T., L. C. Brody, L. S. Martin, M. Marble, W. Kearns binds to the ␥-subunit of translation initiation factor eIF2. FEBS et al., 1993 The isolation of cDNAs from OATL1 at Xp11.2 using Lett. 372: 249–252. a 480-kb YAC. Genomics 16: 440–446. Powell, J. R., 1997 Progress and Prospects in Evolutionary Biology: The Hannig, E. M., A. M. Cigan, B. A. Freeman and T. G. Kinzy, 1993 Drosophila Model. Oxford University Press, Oxford/New York. GCD11, a negative regulator of GCN4 expression, encodes the ␥ Reuter, G., and P. Spierer, 1992 Position-effect variegation and subunit of eIF-2 in Saccharomyces cerevisiae. Mol. Cell. Biol. 13: chromatin proteins. Bioessays 14: 605–612. 506–520. Rost, B., 1996 PHD: predicting one-dimensional protein structure Huang, N., E. V. Baur, J.-M. Garnier, T. Lerouge, J.-L. Vonesch by profile-based neural networks. Methods Enzymol. 266: 525– et al., 1998 Two distinct nuclear receptor interaction domains 539. in NSD1, a novel SET protein that exhibits characteristics of both Schotta, G., and G. 
Reuter, 2000 Controlled expression of tagged corepressors and coactivators. EMBO J. 17: 3398–3412. protein proteins in Drosophila using a new modular P-element Ivanova, A. V., M. J. Bonaduce, S. V. Ivanov and A. J. S. Klar, 1998 vector system. Mol. Gen. Genet. 262: 916–920. The chromo and SET domains of the Clr4 protein are essential Strimmer, K., and A. V. Haeseler, 1997 PUZZLE 4.0: Maximum for silencing in fission yeast. Nat. Genet. 19: 192–195. Likelihood Analysis for Nucleotide, Amino Acid, and Two-State James, T. C., and S. C. R. Elgin, 1986 Identification of a nonhistone Data. http://evolution.genetics.washington.edu/phylip/software. chromosomal protein associated with heterochromatin in Dro- html sophila melanogaster and its gene. Mol. Cell. Biol. 6: 3862–3872. Swofford, D. L., 1993 PAUP: Phylogenetic Analysis Using Parsimony. Jones, R. S., and W. M. Gelbart, 1993 The Drosophila polycomb- Illinois Natural History Survey, Champaign, IL. group gene enhancer of zeste contains a region with sequence Tschiersch, B., A. Hofmann, V. Krauss, R. Dorn, G. Korge et al., similarity to trithorax. Mol. Cell. Biol. 13: 6357–6366. 1994 The protein encoded by the Drosophila position-effect Jones, D. T., W. R. Taylor and J. M. Thornton, 1992 The rapid variegation suppressor gene Su(var)3-9 combines domains of an- generation of mutation data matrices from protein sequences. tagonistic regulators of homeotic gene complexes. EMBO J. 13: Comput. Appl. Biosci. 8: 275–282. 3822–3831. Keeling, P. J., N. M. Fast and G. I. McFadden, 1998 Evolutionary Wallrath, L. L., 1998 Unfolding the mysteries of heterochromatin. relationship between translation initiation factor eIF-2␥ and sele- Curr. Opin. Genet. Dev. 8: 147–153. nocysteine-specific elongation factor SELB: change of function Weiler, K. S., and B. T. Wakimoto, 1995 Heterochromatin and in translation factors. J. Mol. Evol. 47: 649–655. gene expression in Drosophila. Annu. Rev. Genet. 29: 577–605. Kitamoto, T., W. Wang and P. M. 
Salvaterra, 1998 Structure and Zhang, Y. Q., J. Roote, S. Brogna, A. W. Davis, D. N. Barbash et organization of the Drosophila cholinergic locus. J. Biol. Chem. al., 1999 Stress sensitive B encodes an adenine nucleotide translo- 273: 2706–2713. case in Drosophila melanogaster. Genetics 153: 891–903. Lee, S. J., 1991 Expression of growth/differentiation factor 1 in the Zorio, D. A. R., N. N. Cheng, T. Blumenthal and J. Spieth, 1994 nervous system: conservation of a dicistronic structure. Proc. Natl. Operons as a common form of chromosomal organization in C. Acad. Sci. USA 88: 4250–4254. elegans. Nature 372: 270–272. Maddison, D. R., 1997 The Tree of Life homepage. http://phylogeny. arizona.edu/tree Communicating editor: J. A. Birchler