
Proc. Natl. Acad. Sci. USA Vol. 92, pp. 9122-9126, September 1995 Evolution

A nuclear gene of eubacterial origin in Euglena gracilis reflects cryptic endosymbioses during evolution

(endosymbiotic gene transfer/eukaryotic evolution/glyceraldehyde-3-phosphate dehydrogenase/kinetoplastids)

KATRIN HENZE*, ABDELFATTAH BADR†‡, MICHAEL WETTERN†, RÜDIGER CERFF*, AND WILLIAM MARTIN*§

*Institut für Genetik, Technische Universität Braunschweig, Spielmannstrasse 7, D-38023 Braunschweig, Federal Republic of Germany; †Institut für Botanik, Technische Universität Braunschweig, Mendelssohnstrasse 1, D-38023 Braunschweig, Federal Republic of Germany; and ‡Botany Department, Faculty of Science, Tanta University, Tanta, Egypt

Communicated by Russell F. Doolittle, University of California, San Diego, CA, June 26, 1995 (received for review April 5, 1995)

ABSTRACT Genes for glycolytic and Calvin-cycle glyceraldehyde-3-phosphate dehydrogenase (GAPDH) of higher eukaryotes derive from ancient gene duplications that occurred in eubacterial genomes. Both were transferred to the nucleus during the course of endosymbiosis. We have cloned cDNAs encoding chloroplast and cytosolic GAPDH from the early-branching photosynthetic protist Euglena gracilis, and we have determined the structure of its nuclear gene for cytosolic GAPDH. The gene contains four introns that possess unusual secondary structures, do not obey the GT-AG rule, and are flanked by 2- to 3-bp direct repeats. A gene phylogeny for these sequences in the context of eubacterial homologues indicates that euglenozoa, like higher eukaryotes, obtained their GAPDH genes from eubacteria via endosymbiotic (organelle-to-nucleus) gene transfer. The data further suggest that some lineages acquired GAPDH genes from eubacterial donors that did not ultimately give rise to contemporary membrane-bound organelles. These lineages are the early-branching protists Giardia lamblia and Entamoeba histolytica, which lack mitochondria, and portions of the trypanosome lineage. Protist nuclear gene sequences therefore preserve evidence that "cryptic" (possibly ephemeral) endosymbioses during evolution may have entailed successful gene transfer.

Some genes for proteins essential to chloroplasts and mitochondria were encoded in the genomes of the free-living antecedents of these organelles and were transferred to the nucleus during evolution. This process, endosymbiotic gene transfer, is a special case of interkingdom horizontal gene transfer and took place in a biologically meaningful context. The contemporary protein products of these genes are synthesized on cytosolic ribosomes and reimported into the organelle of their genetic origin. Intracellular gene transfer is an ongoing process, but the evidence suggests that most genes were transferred during the early phases of endosymbiosis (1-4). DNA transferred from organelles to the nucleus may conceivably have carried not only coding sequences but also the forerunners of the spliceosomal introns now widespread in eukaryotic nuclei (5-7). Little is known about nuclear genes from protists that branched early in eukaryotic evolution.

Euglena gracilis is well suited for the study of endosymbiosis and organellar gene transfer, for three reasons.

(i) Euglena's plastids are surrounded by three membranes instead of two and possess chlorophylls a and b. These findings led to the suggestion (8) that Euglena's plastids may have arisen through engulfment of a eukaryotic alga (secondary endosymbiosis), a notion supported by molecular sequence analyses (9, 10).

(ii) Some algae of secondary symbiotic origin possess a vestigial nucleus (nucleomorph) of the eukaryotic symbiont (11, 12), but Euglena does not. Evolutionary degeneration of the symbiont may have entailed reduction and fusion of different genetic apparatuses, which now make up the genome of the three-membrane-bounded plastid.

(iii) Cytological (13, 14) and DNA sequence (15) data indicate that the host cell of Euglena's secondary symbiosis shared a common ancestor with the kinetoplastids, a group of nonphotosynthetic flagellates encompassing trypanosomes and their relatives (16).

Thus, in Euglena, nuclear genes of endosymbiotic origin may have been transferred twice: once to the algal nucleus, and once more to the nucleus of the kinetoplastid-related host.

Biochemical studies had indicated that Euglena, like other photosynthetic eukaryotes, possesses two distinct GAPDH enzymes (17). These are the Calvin-cycle GAPDH of chloroplasts (GapA, EC 1.2.1.13) and the glycolytic GAPDH of the cytosol (GapC, EC 1.2.1.12). Neither is encoded in Euglena's chloroplast DNA (cpDNA) (18). Here we show that the nuclear GapC and GapA genes of Euglena, as in other eukaryotes, are descendants of ancient gene duplications that occurred in eubacterial genomes (4, 19). We argue that during evolution, the kinetoplastid lineage and the amitochondriate protists Giardia and Entamoeba independently obtained GAPDH genes from eubacteria through cryptic endosymbiosis. That is, the transfers took place in an endosymbiotic context that resulted in abortive organelle genesis. We also report the structure of Euglena's expressed nuclear GapC gene.¶

Abbreviations: GAPDH, glyceraldehyde-3-phosphate dehydrogenase; cpDNA, chloroplast DNA.
§To whom reprint requests should be addressed.
¶The sequences from Euglena gracilis reported in this paper have been deposited in the GenBank data base [accession nos. L21903 (GapC cDNA), L21904 (GapA cDNA), and L39772 (GapC gene)].

MATERIALS AND METHODS

Isolation of Recombinant Clones. Euglena cultures (SAG 1224-5/25) were grown as described (20) under a 14-hr light/10-hr dark regime, aerated with 1.5% CO2. Nucleic acid isolation and cDNA cloning were performed as described (21). The cDNA library was screened by plaque hybridization (4) with the end-labeled oligonucleotide 5'-TGGTAYGAYAANGART-3'. About 10^6 recombinants of an Mbo I genomic library in λEMBL4 (22) were also screened by plaque hybridization. The probe was the random-labeled Not I insert of pEGC20, which encodes GapC (see Results). Five clones containing a 4.2-kb HindIII fragment, identified by Southern hybridization of genomic DNA (data not shown), were purified. The hybridizing HindIII fragment of λEGCg110 was subcloned into pBluescript vectors (Stratagene) and sequenced. Other molecular methods were as described (22).

Phylogenetic Analysis. An amino acid alignment (available upon request) was produced with the LINEUP program of the WISGEN package (23), and the nucleotide sequences (368 codons per sequence) were aligned according to it. A matrix of divergence at nonsynonymous sites (24) was used to construct a neighbor-joining tree (25). The topology was tested by bootstrap neighbor-joining analysis using Dayhoff-matrix distances between protein sequences (PHYLIP 3.5).

RESULTS AND DISCUSSION

GAPDH: A Eubacterial Gene Family. Previous work indicated that genes for eukaryotic GAPDH enzymes are descendants of an ancient gene family that existed in the common ancestor of extant eubacteria (4, 19, 26). The GAPDH gene phylogeny in Fig. 1 supports this view.

To date, GAPDH gene families of at least three members have been characterized in two eubacteria, Anabaena variabilis (4) and E. coli (28, 29). The third E. coli sequence is not complete (GenBank accession no. L09067). For gap1, gap2, and gap3 of E. coli, orthologous genes have been characterized in other free-living eubacteria, as shown in the schematic topologies (subtrees) tI, tIII, and tIV in Fig. 1. For Anabaena gap2, orthologues have been characterized only in cyanobacteria (unpublished data) and in the nuclei of photosynthetic eukaryotes, to which they were transferred from the antecedents of modern chloroplasts (tII).

Broad-scale surveys of GAPDH gene diversity in eubacteria have not been reported. However, enough individual eubacterial gap gene sequences exist in the data base to reveal that the common eubacterial ancestor had four or more gap genes. Members of the ancestral gene family may have been lost independently in different eubacterial lineages, or they may not yet have been characterized; open branches in tI-tIV indicate both cases. Sequences from different subtrees share on average about 50% amino acid identity. Within subtrees, average amino acid identity is about 60% or greater. The exception is subtree tIII (45-50%), which contains the rapidly evolving Anabaena gap3 and E. coli gap2 sequences. Subtree tIII receives only very weak support from bootstrap analysis, so its identification is tentative.

Subtree tI contains by far the greatest number of sequenced eubacterial genes. E. coli gap1 and the Serratia sequence were chosen to represent the roughly 60 sequences reported from enterics (ref. 30 and GenBank release 84). The partial sequence of a gap1 gene from the β-purple bacterium Pseudomonas solanacearum was recently reported (GenBank accession no. L19269). Only the C-terminal 75 aa of that gene have been sequenced. Phylogenetic analysis of this region suggests that the gene is orthologous to E. coli and Anabaena gap1 and branches robustly within tI (data not shown), which supports this interpretation of the subtree. The data do not indicate whether the Thermus and Trichomonas sequences in Fig. 1 are orthologues of tI-tIV or represent branches of further duplicate gap gene subtrees.

[Fig. 1: neighbor-joining GAPDH gene phylogeny with bootstrap values. For each sequence the figure lists species and gene, taxon, enzyme compartment or eubacterial gene cluster, intron type, and schematic eubacterial topology (tI-tIV). Tree not reproduced.]

FIG. 1. Phylogenetic distribution of GAPDH gene types, enzyme compartmentalization, operon structures, and introns. The gene phylogeny was constructed as described in Materials and Methods. Bootstrap numbers give the number of times the branch was detected out of 100 neighbor-joining replicates, using Dayhoff-matrix distances (PHYLIP 3.5) between aligned protein sequences. Branches lacking numbers were found in <40 replicates. Eukaryotic taxon designations refer to phyla recognized by Corliss (27); no mt, amitochondriate protists. Gene designations in eubacterial operons are gap (GAPDH), pyk (pyruvate kinase), tal (transaldolase), fbp (fructose-1,6-bisphosphatase), prk (phosphoribulokinase), tkl (transketolase), fda (fructose-1,6-bisphosphate aldolase), and pgk (phosphoglycerate kinase). Intron type refers to introns characterized in GAPDH genes; GT-AG, spliceosomal intron(s) present. The schematic eubacterial topologies tI-tIV are intended to support and clarify the interpretation of eukaryotic GAPDH genes as descendants of a eubacterial gene family. Branches bearing characterized genes are shown solid. Branches for the indicated taxa whose genes have yet to be characterized, or have been lost during evolution, are not filled in. *, The β-purple branch in tI refers to the partial sequence reported for Pseudomonas solanacearum (not shown in tree; see text). T., Trypanosoma (cruzi or brucei) or Trypanoplasma (borelli); L., Leishmania; E., Escherichia; GapCg, glycosomal GapC; purple, purple (proteo-) bacteria; gram pos., Gram-positive eubacteria; mem, membranes; nd, not determined; mt, mitochondria; cp, chloroplasts. Sequences were retrieved from GenBank (release 84).

The Euglena gracilis cDNA library contained 60,000 recombinants, from which we identified 41 positive clones that hybridized to the GAPDH-specific oligonucleotide probe. Nineteen clones containing >1-kb inserts were terminally sequenced. Ten of these were identical in their 5' and/or 3' untranslated regions to pEGC20 (accession no. L21903), which encodes Euglena's cytosolic GapC enzyme (see below). The remaining 9 terminally sequenced clones were identical to pEGA23 (accession no. L21904), which encodes the chloroplast enzyme GapA. pEGC20 and pEGA23 thus appear to represent Euglena's major transcripts for GapC and GapA, respectively.

GapA of Euglena Does Not Encode a Polyprotein. Two previously studied nuclear gene-encoded chloroplast proteins of Euglena are translated as multimeric precursor polyproteins (31, 32). These are ribulose-bisphosphate carboxylase small subunit (rbcS) and light-harvesting chloroplast protein I (LHCI). Their concatenated subunits are transported across the three chloroplast membranes and then proteolytically processed to yield mature monomers. GapA of Euglena also encodes a chloroplast protein, but it is not organized in the same unusual manner. pEGA23 encodes a single 333-aa mature subunit preceded by a 147-aa N-terminal extension. Northern blots of Euglena poly(A)+ RNA probed with pEGA23 clearly revealed a single band at 1.8 kb (data not shown). This result excludes the possibility that multimeric cDNAs recombined out during cloning in E. coli.

The N-terminal extension of GapA is considerably longer than typical transit peptides of higher plants (33). It shows no similarity to other chloroplast GAPDH transit peptides. It is, however, similar in length to the ~140-aa transit peptides of three previously characterized nuclear gene-encoded chloroplast proteins of Euglena (34) (Fig. 2). These transit peptides contain two highly hydrophobic domains (underlined in Fig. 2) separated by an ~60-aa hydrophilic stretch. They also carry topogenic signals that target the precursor to the endoplasmic reticulum during import across Euglena's three outer chloroplast membranes (32, 34).

[Fig. 2: N-terminal transit-peptide sequences of the Euglena GapA, Lhcp, RbcS, and Pbgd precursors, with charged and hydroxylated residues marked. Sequence panel not reproduced.]

FIG. 2. Comparison of transit-peptide regions for nuclear gene-encoded chloroplast proteins of E. gracilis. Sequences were taken from this paper and from the references in ref. 34. Positively charged, negatively charged, and hydroxylated amino acids are indicated with +, -, and 0, respectively. Hydrophobic domains are doubly underlined.

Euglena rbcS, rbcL, psbA, and tufA sequences are more similar to chlorophyte homologues than to rhodophyte or eubacterial ones, which suggests that Euglena's plastids arose by secondary endosymbiosis from chlorophyte antecedents (9, 10). We expected Euglena's GapA sequence to reveal the same pattern of homology. The gene phylogeny shows that GapA of Euglena is clearly a homologue of cyanobacterial gap2. However, it assumes an outgroup position relative to both chlorophyte and rhodophyte GapA homologues (Fig. 1). This result is incongruent with phylogenies inferred from cpDNA genes, which are more reliable markers for plastid origins than nuclear genes are (9, 10). Furthermore, bootstrap support for the outgroup position of Anabaena gap2 in this subtree is low. We predict that gap2 homologues exist in purple bacteria, but none have yet been found, so this internal reference is lacking. Thus, the position of Euglena's GapA sequence in Fig. 1 indicates a eubacterial origin for the gene. It neither supports a secondary endosymbiotic origin for Euglena's plastids nor clearly indicates which type of eubacterium donated the gene.

Cryptic Endosymbiosis in Kinetoplastids. In the gene phylogeny in Fig. 1, cytosolic GapC of Euglena is orthologous to glycosomal GAPDH from kinetoplastids and has thus been recompartmentalized during euglenozoan evolution. Yet some kinetoplastids (e.g., T. brucei and L. mexicana) also possess a second, distinct cytosolic GAPDH enzyme in addition to the glycosomal form (35, 36). Given the overall topology of Fig. 1, it is exceedingly unlikely that enterobacterial gap1 and kinetoplastid cytosolic GapC genes reflect prokaryote-eukaryote divergence. One or the other of these genes appears to reside in the wrong genome, which suggests that gene transfer occurred between the antecedents of these organisms (37). In which direction did it occur? We argue that E. coli gap1 is native to the eubacterial genome. In that case, kinetoplastids acquired the gene for their cytosolic GAPDH from bacteria in an additional transfer event, after they had acquired the glycosomal homologue. Contemporary kinetoplastids and euglenoids are known to carry endosymbiotic bacteria (13, 16). It therefore seems quite plausible that the common ancestor of T. brucei and L. mexicana acquired its gene for cytosolic GAPDH from a γ-purple bacterial donor, in a symbiotic context that did not result in a membrane-bound organelle. More distantly related euglenozoans, such as Trypanoplasma borelli and Euglena, separated from the Trypanosoma/Leishmania lineage before the symbiotic event (Fig. 1 and ref. 38) and thus apparently never obtained the gene. The gene for cytosolic GAPDH in kinetoplastids therefore provides evidence for an evolutionarily recent symbiotic event that entailed gene transfer but did not proceed to organellogenic symbiosis.

Amitochondriate Protists with Eubacterial GAPDH Genes. In light of this evidence, the positions of the GAPDH genes from the amitochondriate protists Giardia lamblia and Entamoeba histolytica deserve particular attention. Their GAPDH genes assume a crown position within the eukaryotic phylogeny, albeit with poor support. This contrasts with results obtained with other phylogenetic markers (15, 39). Have Giardia and Entamoeba obtained eubacterial gap genes even though they possess no obvious organelle that would betray a symbiotic event? Entamoeba was recently shown to possess nuclear genes for two typically mitochondrial proteins, chaperonin cpn60 and pyridine nucleotide transhydrogenase (40). The strongly mitochondrion-like phylogeny of those genes supports the view that Entamoeba is not primitively amitochondriate. The position of Entamoeba's GAPDH gene in Fig. 1 is not resolved. Nevertheless, the gene could derive from the same endosymbiotic donor as its cpn60 and pyridine nucleotide transhydrogenase sequences. Similar observations were made for Giardia's bacterial-like homologue of the 78-kDa glucose-regulated protein (GRP78)/70-kDa heat shock protein (HSP70) (41). Those who argue that these protists are not primitively amitochondriate might contend that their eubacterial GAPDH genes circumstantially support that view.

The position in the gene phylogeny of GAPDH from another amitochondriate protist, Trichomonas vaginalis, is highly intriguing. This organism lacks mitochondria but possesses a DNA-free, double-membrane-bound, energy-producing organelle, the hydrogenosome (42). Trichomonas GAPDH has eubacterial features at both the biochemical and the sequence level (43). Yet, like the Thermus gene, it cannot be assigned to any of the schematic topologies tI-tIV in Fig. 1. Trichomonas appears to have obtained its GAPDH gene from eubacteria as well (43), but the data leave three questions open. They do not clearly indicate which type of bacterium donated the gene. They do not show whether the gene donor was involved in hydrogenosome evolution. Nor do they show whether the gene descends from an otherwise uncharacterized member of the ancestral gap gene family. Its unique position in the phylogeny would not exclude the possibility that Trichomonas GapC is the only truly endogenous eukaryotic GAPDH described to date.

The schematic topologies in Fig. 1 are clearly a working hypothesis. Barring extensive differential loss, the hypothesis predicts that gap genes will be found that trace the overall course of eubacterial evolution in at least four independent subtrees. However, alternative interpretations can also account for various aspects of the data. Widespread transfer of GAPDH genes between eubacteria, by conjugation or other means, may have occurred during evolution. In that case tI-tIV may be erroneous, although it would remain unexplained why such promiscuity is not observed for other genes. Subtree tI might also consist of two separate eubacterial phylogenies in its upper and lower portions, as the unstable position of Anabaena gap1 suggests. Finally, eukaryotic origins have not been fully resolved (44). They might conceivably have entailed fusion of archaebacterial and eubacterial genomes, and the eubacterial genome (or both) may have possessed several gap genes. In that case, GAPDH genes in contemporary eukaryotic chromosomes could reflect both prefusion duplications and later cryptic endosymbiotic transfers. Reconstructing the sequence of such events reliably would become extremely difficult.

Euglena GapC Contains Four Unusual Introns. The sequence of the 4.2-kb genomic HindIII fragment showed that GapC contains no nucleotide substitutions relative to pEGC20. Three introns, of 46, 251, and 378 bp, occur in the coding region. Intron 4, in the 3' untranslated region, is 441 bp long. All four introns may assume unusual secondary structures (Fig. 3). Tessier et al. (45) also noted less elaborate secondary structures in the introns of the Euglena rbcS gene. The acceptor and donor ends of all four GapC introns can form a stem-loop structure. Introns 2-4 possess further internal structures, very atypical of pre-mRNA introns in higher eukaryotes. We found no similarity between these secondary structures and those characteristic of group I (46) or group II (47) introns, nor could we identify a "bulging A" motif like that of group III introns (48).

[Fig. 3: map of the Euglena GapC gene on the 4.2-kb HindIII fragment (0.5-kb scale bar) and predicted RNA secondary structures of introns 1-4. Drawing not reproduced.]

FIG. 3. Structure of the Euglena GapC gene on the sequenced 4.2-kb HindIII fragment, and predicted intron RNA secondary structures. Exons are shown as boxes, with coding regions shaded black. The secondary structures of individual introns were generated with the MFOLD and SQUIGGLES programs of the Genetics Computer Group (Madison, WI) package. The 190-nt region of intron 4 not shown consists of a single 95-bp imperfect hairpin.

All four GapC introns are flanked by 2- to 3-bp direct repeats, so their positions could not be determined to the base. Except for intron 1, none of the introns can be placed within its short duplication so as to conform to the GT-AG rule (Fig. 4). The only two previously characterized nuclear genes from Euglena, rbcS (45) and lhcp2 (49), also contain introns that lack consensus GT-AG borders. Depending on precise intron localization, the Euglena GapC introns might possess G at position +5 (Fig. 4), a residue also conserved in group III introns (48).

Judging from Fig. 4, none of Euglena's GapC introns is precisely conserved in position with introns from any known GAPDH gene. A numbering scheme for GAPDH intron positions has been proposed previously (19). In that scheme, intron 1 is 9 bp 3' of position 15, intron 2 is 3 bp 3' of position 32, and intron 3 is 4 bp 3' of position 38. Intron 2 could be considered identical to position 32 if it is shifted to the extreme left within its 3-bp border duplication. These three positions are found in GapC but not in GapA or GapB. Two views have been proposed for conserved intron positions in GAPDH genes: common ancestry from eubacterial genomes (19), and independent insertion (50). The Euglena positions favor neither view.

[Fig. 4: intron border sequences. Consensus borders are shown for S. pombe, S. cerevisiae, vertebrate, and plant nuclear introns and for Euglena chloroplast group III introns. These are followed by the borders of Euglena nuclear introns (GapC introns 1-4, rbcS introns, and lhcp2 introns). Sequence panel not reproduced.]

FIG. 4. Intron border consensus sequences from eukaryotic nuclei and from Euglena chloroplast group III introns, compared with Euglena nuclear intron borders. Euglena nuclear introns flanked by short direct repeats (underlined) cannot be placed unambiguously between exons. In such cases, the placement shown was chosen arbitrarily to maximize the occurrence of G (or R) at position +5 of the intron. This is the only recognizable feature these introns share with the other types listed. Euglena nuclear introns other than GapC were taken from the published sequences (45, 49). References for the intron border consensus sequences are found in refs. 45-49. S. pombe, Schizosaccharomyces pombe; S. cerevisiae, Saccharomyces cerevisiae.

The lack of consensus borders and the unusual secondary structures of Euglena nuclear introns raise the question of their origin, which in turn bears on the origin of spliceosomal introns in general. If the commonalities of group II and spliceosomal introns are taken as evidence of common ancestry, three scenarios for the origin of spliceosomal introns follow (7):
(i) group II introns invaded the nucleus via endosymbiotic gene transfer from organelles and subsequently gave rise to spliceosomal introns;
(ii) group II introns were present in the common ancestor of eubacteria and eukaryotes but gave rise to spliceosomal introns only in eukaryotes; or
(iii) group II and spliceosomal introns coexisted in the common ancestor of eubacteria and eukaryotes, but spliceosomal introns became abundant only in eukaryotes.
Our findings suggest that nuclear introns evolved independently in Euglena and in higher eukaryotes, but they do not distinguish among the scenarios of Roger et al. (7). At the structural level, the Euglena introns are distinct enough to suggest that they may represent a novel intron class, and their splicing mechanism deserves further attention.

Endosymbiosis and gene transfer have contributed to the nuclear complement of eukaryotic genes. Some symbiotic events have left very clear evidence of their occurrence in the form of organellar genomes in plastids and mitochondria. Other, "cryptic" symbioses appear to have entailed gene transfer but resulted either in organelles that no longer possess vestigial genomes or in no organelle at all. Nuclear genes in protists provide evidence for the existence of these cryptic symbiotic events.

We thank M. Müller, H. Brinkmann, P. Michels, and C. Schnarrenberger for critical comments on the manuscript, and C. Kohler for excellent technical assistance. A.B. was the recipient of an Alexander von Humboldt fellowship. This work was supported by Grants Ma1426/1-2 and Ma1426/3-1 from the Deutsche Forschungsgemeinschaft to W.M.

1. Weeden, N. F. (1981) J. Mol. Evol. 17, 133-139.
2. Baldauf, S. & Palmer, J. (1990) Nature (London) 344, 262-265.
3. Gantt, J. S., Baldauf, S., Calie, P. J., Weeden, N. F. & Palmer, J. D. (1991) EMBO J. 10, 3073-3078.
4. Martin, W., Brinkmann, H., Savona, C. & Cerff, R. (1993) Proc. Natl. Acad. Sci. USA 90, 8692-8696.
5. Roger, A. J. & Doolittle, W. F. (1993) Nature (London) 364, 289-290.
6. Ferat, J.-L. & Michel, F. (1993) Nature (London) 364, 358-361.
7. Roger, A. J., Keeling, P. J. & Doolittle, W. F. (1994) Molecular Evolution of Physiological Processes (Rockefeller Univ. Press, New York).
8. Gibbs, S. P. (1978) Can. J. Bot. 56, 2883-2889.
9. Martin, W., Somerville, C. C. & Loiseaux-de Goër, S. (1992) J. Mol. Evol. 35, 385-403.
10. Morden, C. W., Delwiche, C. F., Kuhsel, M. & Palmer, J. D. (1992) BioSystems 28, 75-90.
11. Maier, U.-G. (1992) BioSystems 28, 69-74.
12. McFadden, G., Gilson, P. R., Hoffmann, C. J. B., Adcock, G. J. & Maier, U.-G. (1994) Proc. Natl. Acad. Sci. USA 91, 3690-3694.
13. Walne, P. L. & Kivic, P. A. (1989) in Handbook of Protoctista, eds. Margulis, L., Corliss, J. O., Melkonian, M. & Chapman, D. J. (Jones & Bartlett, Boston), pp. 270-287.
14. Surek, B. & Melkonian, M. (1986) Protoplasma 133, 39-49.
15. Sogin, M., Gunderson, J., Elwood, H., Alonso, R. M. & Peattie, D. (1989) Science 243, 75-77.
16. Vickerman, K. (1990) in Handbook of Protoctista, eds. Margulis, L., Corliss, J. O., Melkonian, M. & Chapman, D. J. (Jones & Bartlett, Boston), pp. 215-238.
17. Theiss-Seuberling, H.-B. (1984) Plant Cell Physiol. 25, 601-609.
18. Hallick, R. B., Hong, L., Drager, R. G., Favreau, M., Monfort, A., Orsat, B., Spielmann, A. & Stutz, E. (1993) Nucleic Acids Res. 21, 3537-3544.
19. Kersanach, R., Brinkmann, H., Liaud, M.-F., Zhang, D.-X., Martin, W. & Cerff, R. (1994) Nature (London) 367, 387-389.
20. Cook, J. R. (1968) in The Biology of Euglena II, ed. Buetow, D. E. (Academic, New York), pp. 243-314.
21. Henze, K., Schnarrenberger, C., Kellermann, J. & Martin, W. (1994) Plant Mol. Biol. 26, 1961-1973.
22. Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab. Press, Plainview, NY), 2nd Ed.
23. Devereux, J., Haeberli, P. & Smithies, O. (1984) Nucleic Acids Res. 12, 387-395.
24. Nei, M. & Gojobori, T. (1986) Mol. Biol. Evol. 3, 418-426.
25. Saitou, N. & Nei, M. (1987) Mol. Biol. Evol. 4, 406-425.
26. Meyer-Gauen, G., Cerff, R., Schnarrenberger, C. & Martin, W. (1994) Plant Mol. Biol. 26, 1155-1166.
27. Corliss, J. (1994) Acta Protozool. 33, 1-51.
28. Branlant, G. & Branlant, C. (1985) Eur. J. Biochem. 150, 61-66.
29. Alefounder, P. R. & Perham, R. N. (1989) Mol. Microbiol. 3, 723-732.
30. Nelson, K., Whittam, T. S. & Selander, R. K. (1991) Proc. Natl. Acad. Sci. USA 88, 6667-6671.
31. Chan, R., Keller, M., Canaday, S., Weil, J. & Imbault, P. (1990) EMBO J. 9, 333-338.
32. Houlné, G. & Schantz, R. (1988) Mol. Gen. Genet. 213, 479-486.
33. von Heijne, G., Hirai, T., Klösgen, R.-B., Steppuhn, J., Bruce, B., Keegstra, K. & Herrmann, R. (1991) Plant Mol. Biol. Reporter 9, 104-126.
34. Kishore, R., Muchhal, U. & Schwartzbach, S. (1993) Proc. Natl. Acad. Sci. USA 90, 11845-11849.
35. Michels, P., Marchand, M., Kohl, L., Allert, S., Vellieux, F. M. D., Wierenga, R. & Opperdoes, F. (1991) Eur. J. Biochem. 198, 421-428.
36. Hannaert, V., Blaauw, M., Kohl, L., Allert, S., Opperdoes, F. R. & Michels, P. A. M. (1992) Mol. Biochem. Parasitol. 55, 115-126.
37. Doolittle, R., Feng, D., Anderson, K. & Alberro, M. (1990) J. Mol. Evol. 31, 383-388.
38. Wiemer, E. A. C., Hannaert, V., van den IJssel, P. R. L. A., Van Roy, J., Opperdoes, F. & Michels, P. A. M. (1995) J. Mol. Evol. 40, 443-454.
39. Hashimoto, T., Nakamura, Y., Nakamura, F., Shirakura, T., Adachi, J., Goto, N., Okamoto, K. & Hasegawa, M. (1994) Mol. Biol. Evol. 11, 65-71.
40. Clark, C. G. & Roger, A. J. (1995) Proc. Natl. Acad. Sci. USA 92, 6518-6521.
41. Gupta, R. S., Aitken, K., Falah, M. & Singh, B. (1994) Proc. Natl. Acad. Sci. USA 91, 2895-2899.
42. Müller, M. (1988) Annu. Rev. Microbiol. 42, 465-488.
43. Markos, A., Miretsky, A. & Müller, M. (1993) J. Mol. Evol. 37, 631-643.
44. Doolittle, R. F. (1995) Proc. Natl. Acad. Sci. USA 92, 2421-2423.
45. Tessier, L. H., Paulus, F., Keller, M., Vial, C. & Imbault, P. (1995) J. Mol. Biol. 245, 22-33.
46. Cech, T. R. (1990) Annu. Rev. Biochem. 59, 543-568.
47. Michel, F., Umesono, K. & Ozeki, H. (1989) Gene 82, 5-30.
48. Copertino, D. W., Shigeoka, S. & Hallick, R. B. (1992) EMBO J. 11, 5041-5050.
49. Muchhal, U. S. & Schwartzbach, S. D. (1992) Plant Mol. Biol. 18, 287-299.
50. Stoltzfus, A., Spencer, D. F., Zuker, M., Logsdon, J. M. & Doolittle, W. F. (1994) Science 265, 202-207.
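The degenerate oligonucleotide used to screen the cDNA library (Materials and Methods), 5'-TGGTAYGAYAANGART-3', is written in IUPAC ambiguity codes: Y = C or T, R = A or G, N = any base. The minimal Python sketch below expands the probe into the concrete 16-mers present in the hybridization mixture. It assumes only the standard IUPAC nucleotide code table. The helper name `expand_iupac` is hypothetical and is not part of any published tool.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_iupac(probe: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate (IUPAC) probe."""
    choices = [IUPAC[base] for base in probe.upper()]
    # product() varies the rightmost position fastest, so the output order is deterministic.
    return ["".join(combo) for combo in product(*choices)]

probe = "TGGTAYGAYAANGART"  # 5'->3', as given under Materials and Methods
variants = expand_iupac(probe)
print(len(probe), len(variants))
print(variants[0])
```

The probe has two Y positions, one N, and one R, so it is a mixture of 2 × 2 × 4 × 2 = 32 distinct 16-mers.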
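The tree in Fig. 1 was built by the neighbor-joining method of Saitou and Nei (25) from a distance matrix. The Python sketch below shows that joining step on its own. It assumes the distances have already been computed; the paper used nonsynonymous divergence (24) and Dayhoff-matrix distances in PHYLIP 3.5, and neither calculation is reproduced here. It is not PHYLIP's implementation. The function name, the internal-node labels `n1, n2, ...`, and the four-taxon additive matrix are all hypothetical, chosen for illustration.

```python
import itertools

def neighbor_joining(names: list[str], dist: dict[frozenset, float]) -> list[tuple[str, str, float]]:
    """Build an unrooted neighbor-joining tree from pairwise distances.

    `dist` maps frozenset({a, b}) to the distance between taxa a and b.
    Returns edges as (node, node, branch length). Internal nodes get
    placeholder labels n1, n2, ...
    """
    d = dict(dist)
    nodes = list(names)
    edges: list[tuple[str, str, float]] = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r_i: summed distance from node i to every other current node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(
            itertools.combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from i and j to their new common node u.
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        counter += 1
        u = f"n{counter}"
        edges += [(u, i, li), (u, j, lj)]
        # Distance from u to each remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges

# Hypothetical additive distances generated from the unrooted tree ((A:1,B:2):1,(C:3,D:4)).
toy = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
       ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7}
dist = {frozenset(pair): value for pair, value in toy.items()}
edges = neighbor_joining(["A", "B", "C", "D"], dist)
for edge in edges:
    print(edge)
```

The toy matrix is additive, so neighbor joining recovers the generating tree exactly. The leaf branches come out as 1, 2, 3, and 4, and the internal branch as 1. Real sequence data are not additive, which is one reason the paper assessed the topology by bootstrap analysis.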
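The Results note that the GapC introns are flanked by 2- to 3-bp direct repeats, so their exact positions cannot be fixed. Only intron 1 can be placed within its duplication so as to obey the GT-AG rule. The Fig. 4 legend adds that placements were chosen to maximize G at position +5. The reasoning rests on one property: shifting an intron by one base leaves the spliced product unchanged exactly when the base leaving one end equals the base entering the other. The Python sketch below enumerates all equivalent placements on that basis and tests each for GT-AG borders and G at +5. The exon and intron sequences are a hypothetical toy example, not Euglena data, and all function names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    start: int  # 0-based index of the first intron base
    end: int    # 0-based index one past the last intron base

def equivalent_placements(seq: str, start: int, end: int) -> list[Placement]:
    """List every intron placement that yields the same spliced product.

    Sliding the intron one base right keeps seq[:start] + seq[end:] unchanged
    exactly when seq[start] == seq[end]; sliding left requires
    seq[start - 1] == seq[end - 1]. Runs of such matches are the short
    direct repeats flanking the intron.
    """
    # Move to the leftmost equivalent placement first.
    while start > 0 and seq[start - 1] == seq[end - 1]:
        start, end = start - 1, end - 1
    placements = [Placement(start, end)]
    # Then record every placement reachable by sliding right.
    while end < len(seq) and seq[start] == seq[end]:
        start, end = start + 1, end + 1
        placements.append(Placement(start, end))
    return placements

def obeys_gt_ag(intron: str) -> bool:
    """True if the intron begins with GT and ends with AG."""
    return intron.startswith("GT") and intron.endswith("AG")

def g_at_plus5(intron: str) -> bool:
    """True if the fifth intron base (position +5, first base = +1) is G."""
    return len(intron) >= 5 and intron[4] == "G"

# Hypothetical toy gene: exon 1 + intron + exon 2 (not a Euglena sequence).
exon1, intron, exon2 = "ATGCC", "GTATGTTTTTTTAG", "GTCA"
seq = exon1 + intron + exon2
start = len(exon1)
placements = equivalent_placements(seq, start, start + len(intron))
for p in placements:
    candidate = seq[p.start:p.end]
    print(p.start, candidate, obeys_gt_ag(candidate), g_at_plus5(candidate))
```

In this toy case the repeat "GT" allows three equivalent placements, and only the leftmost one has GT-AG borders. The Results report that for GapC introns 2-4, no placement within the repeat satisfies the rule.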
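Two quantitative steps in the analysis are easy to state concretely. Fig. 1 reports bootstrap support as the number of times a branch was recovered in 100 neighbor-joining replicates. The Results compare sequences by average amino acid identity, about 50% between subtrees and about 60% or more within them. The Python sketch below covers two ingredients of these steps, assuming alignments are given as equal-length strings with "-" for gaps. The first is percent identity over ungapped columns. The second is generation of one bootstrap pseudo-alignment by resampling columns with replacement. In a full analysis each replicate would be converted to distances (Dayhoff-matrix distances in the paper, not reproduced here) and rebuilt into a tree, and branch recovery would be counted. The alignment is invented for illustration and the function names are hypothetical.

```python
import random

def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)

def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Draw alignment columns with replacement to make one bootstrap pseudo-alignment."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    # The same sampled columns are used for every sequence, so each column stays intact.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

# Invented toy alignment (not data from the paper).
aln = {"s1": "MKVGINGFGRIGR", "s2": "MKIGINGFGRIGR", "s3": "MTVAINGF-RIGR"}
print(round(percent_identity(aln["s1"], aln["s2"]), 1))
rng = random.Random(1)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_replicate(aln, rng) for _ in range(100)]
```

Resampling whole columns keeps each column's residues together, so each replicate is a plausible alternative data set of the same size. Computing distances on the replicates and counting recovered branches completes the bootstrap.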