Proc. Natl. Acad. Sci. USA Vol. 95, pp. 11769–11774, September 1998 Evolution

Amitochondriate amoebae and the evolution of DNA-dependent RNA polymerase II

JOHN W. STILLER*, ELLEN C. S. DUFFIELD, AND BENJAMIN D. HALL

Department of Botany, University of Washington, Box 355325, Seattle, WA 98195

Communicated by Estella B. Leopold, University of Washington, Seattle, WA, August 13, 1998 (received for review May 11, 1998)

ABSTRACT Unlike parasitic groups that are defined by the absence of mitochondria, the Pelobiontida is composed mostly of free-living species. Because of the presence of ultrastructural and cellular features that set them apart from all other eukaryotic organisms, it has been suggested that pelobionts are primitively amitochondriate and may represent the earliest-evolved lineage of extant eukaryotes. Analyses of rRNA genes, however, have suggested that the group arose well after the diversification of the earliest-evolved protists. Here we report the sequence of the gene encoding the largest subunit of DNA-dependent RNA polymerase II (RPB1) from the pelobiont Mastigamoeba invertens. Sequences within RPB1 encompass several of the conserved catalytic domains that are common to eubacterial, archaeal, and eukaryotic nuclear-encoded RNA polymerases. In RNA polymerase II, these domains catalyze the transcription of all nuclear pre-mRNAs, as well as the majority of small nuclear RNAs. In contrast with rDNA-based trees, phylogenetic analyses of RPB1 sequences indicate that Mastigamoeba represents an early branch of eukaryotic evolution. Unlike sequences from parasitic amitochondriate protists that were included in our study, there is no indication that Mastigamoeba RPB1 is attracted to the base of the eukaryotic tree artifactually. In addition, the presence of introns and a heptapeptide C-terminal repeat in the Mastigamoeba RPB1 sequence, features that are typically associated with more recently derived eukaryotic groups, raises provocative questions regarding models of protist evolution that depend almost exclusively on rDNA sequence analyses.

Abbreviations: CTD, C-terminal domain of RNA polymerase II; RPB1, RNA polymerase II largest subunit; RPC1, RNA polymerase III largest subunit; SSU rDNA, small-subunit ribosomal RNA; Bya, billion years ago.
Data deposition: The sequence reported in this paper has been deposited in the GenBank database (accession no. AF083338).
*To whom reprint requests should be addressed. e-mail: stiller@u.washington.edu.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. §1734 solely to indicate this fact.
© 1998 by The National Academy of Sciences 0027-8424/98/9511769-6$2.00/0
PNAS is available online at www.pnas.org.

The protistan order Pelobiontida is composed predominantly of free-living amoeboflagellates that inhabit microoxic and anoxic environments (1–3). Because they lack mitochondria, Golgi bodies, and most membrane-bound organelles, it has been argued that pelobionts, along with other protist groups defined by an absence of mitochondria (Metamonada, Parabasalia, and Microsporidia), branched from the eukaryotic line before the establishment of an endosymbiotic relationship with the ancestor of mitochondria and, therefore, represent the earliest eukaryotic lineages (1). Among these groups, the Pelobiontida exhibit cytologic features that have been interpreted as ancestral even to other amitochondrial protists (2). The presence of a single basal body and flagellum, unique flagellar-root ultrastructure, and a highly reduced endomembrane complement suggest that pelobionts are distinct from all other eukaryotes (2, 3). Based primarily on these cytologic features, amitochondriate amoebae were proposed to be the most ancient of extant eukaryotes (2). Initial molecular analyses have not supported this conclusion.

The amitochondriate amoeboid gut parasite Entamoeba histolytica does not branch near the base of eukaryotes in phylogenetic analyses of small-subunit ribosomal RNA gene sequences (SSU rDNA) (4). In addition, the presence in Entamoeba of two genes apparently of mitochondrial origin suggests that it is secondarily amitochondriate (5). Because no flagellated stage has been observed in Entamoeba, however, corroborative ultrastructural evidence relating Entamoeba to the Pelobiontida is lacking, and proposed relationships between the two taxa (2) have no support. Therefore, neither the derived position of Entamoeba in phylogenetic analyses, nor indications that its ancestors contained mitochondria, should be construed as evidence against the antiquity and possibly primitive amitochondrial nature of pelobiont amoebae.

Analysis of the SSU rDNA sequence from the pelobiont Mastigamoeba balamuthi (under the synonym Phreatamoeba balamuthi) also does not indicate an ancient origin of the Pelobiontida (6). Although the sequence contains large insertions in variable regions, and is one of the longest SSU rRNA genes yet reported, the M. balamuthi sequence branches at a derived position near or within the so-called “crown” (7) (6, 8). A partial large-subunit rDNA sequence from an unidentified species ascribed to the pelobiont Pelomyxa also does not branch near the base of the eukaryotic tree (9). To our knowledge, however, neither this sequence nor a specific identification of the taxon has been published.

The unusually long M. balamuthi rDNA sequence has been the only molecular character from a clearly defined pelobiont available for comparative evolutionary analysis. Given the cytological and ultrastructural evidence for a possible ancient origin of the Pelobiontida, additional molecular markers are needed to provide an accurate account of their evolutionary position among eukaryotes. Here we report the sequence of the gene encoding the largest subunit of RNA polymerase II (RPB1) from ‘Mastigamoeba invertens’ Klebs. Analyses of this sequence suggest that Mastigamoeba represents an early eukaryotic lineage and raise questions about current models of eukaryotic evolution that depend heavily on phylogenetic analyses of aligned rDNA sequences.

MATERIALS AND METHODS

Culturing. “Mastigamoeba invertens” was obtained from the American Type Culture Collection (ATCC no. 50338; www.atcc.org/atcc.html) and cultured in #1773 Hexamita medium following ATCC instructions. Cultures were maintained in 16 × 122 mm screw-capped tubes at 25°C for 5–9 days to achieve peak cell densities of 1–2 × 10^6 cells per ml and subsequently either subcultured or harvested. Tubes were sampled every few days and examined under ×320 phase-contrast microscopy to determine cell densities and to monitor for the presence of eukaryotic contaminants.

DNA Extraction. Cells were pelleted at 2,000 × g and 4°C for 10 min. Pellets were gently resuspended in lysis buffer (2% cetyltrimethylammonium bromide/1.4 M NaCl/20 mM EDTA/100 mM Tris·Cl, pH 8.0/10 µg/ml RNase A) and immediately recentrifuged at 12,000 × g for 10 min to pellet unlysed bacteria and cellular debris. The supernatant was then extracted with an equal volume of 1:1 (vol/vol) chloroform/isoamyl alcohol. DNA was precipitated with two-thirds volume of isopropyl alcohol, resuspended in TE (10 mM Tris·HCl, pH 7.4/1 mM EDTA) buffer, and column purified (Qiagen, Chatsworth, CA).

Isolation of Mastigamoeba Genes. Conserved core regions A to G of RPB1 (10) were PCR-amplified by using a combination of universal and specific primers as described previously (11). The 3′ end of the gene was isolated by suppression PCR genomic walking, using Mastigamoeba-specific primers in opposition to linker primers ligated to genomic DNA (12, 13). All PCR fragments were cloned and sequenced in complementary directions.

Sequence and Phylogenetic Analyses. The inferred Mastigamoeba RPB1 amino acid sequence was aligned with eukaryotic polymerase II, RNA polymerase III (RPC1), and archaeal largest subunits by using Clustal W (14). Sequences were further adjusted by eye and areas with ambiguous gaps were removed from the alignment. The remaining gaps were treated as missing data in phylogenetic analyses using unweighted parsimony (PAUP 3.1.1 at default settings) (15), distance (PHYLIP 5.372) (16), and maximum-likelihood (PUZZLE) (17) algorithms. Distance matrices were constructed in PROTDIST by using the George/Baker/Hunt categories model for weighting amino acid changes. Distance trees were produced by neighbor-joining. Maximum-likelihood analyses were performed under the JTT (Jones, Taylor, and Thornton) model of amino acid-substitution probabilities and assuming a gamma distribution, estimated from the data set, for rate variation among sites.

One hundred bootstrap replicates were performed in parsimony and distance analyses. Parsimony values were produced by using 10 random-addition subreplicates per round of bootstrapping. Likelihood support values were obtained from PUZZLE based on the number of times a given bipartition was supported in 10,000 quartet-puzzling trees (17). Analyses were carried out by using 1,138 aligned positions from 24 RPB1, RPC1, and archaeal sequences. A second set of analyses was performed with a somewhat larger alignment (1,232 positions) of only the 15 RPB1 sequences to better assess their relative branching order in the absence of more-distant outgroups.

Paired-sites tests were used to compare the relative support for different basal sequences as the earliest RPB1 branch as well as support for the early branching position of Mastigamoeba. Alternative trees were constructed using PAUP by constraining only the relevant branches and finding the remaining topology that was most parsimonious. These alternative topologies were supplied as user trees in Kishino–Hasegawa and Templeton tests (18, 19) by using PUZZLE and PROTPARS (PHYLIP), respectively.

To examine the tendency of different RPB1 sequences to attract to long branches in phylogenetic reconstructions (20), we produced 100 random sequences based on maximum-likelihood estimates (PUZZLE) of RPB1 amino acid composition using MACCLADE 3.01 (21). One hundred independent parsimony analyses were performed with the 1,232-position alignment to determine which branches of the RPB1 tree attracted each random sequence. Five separate searches were used with random addition of taxa to find the most parsimonious point of attachment for each random sequence. Because parsimony may be more prone to long-branch effects (20), we also performed maximum-likelihood analyses of 1,000 quartet-puzzling trees with two randomly generated sequences and the substitution parameters described above.

RESULTS

A total of 5,866 bp were sequenced from the Mastigamoeba RPB1 gene, from the first highly conserved region A motif through the stop codon. The sequence is interrupted by five large insertions (Fig. 1), each of which disrupts the inferred amino acid sequence either by introducing a stop codon or by producing a shift in the reading frame. Three of these insertions occur in conserved regions of the RPB1 gene that contain no significant indels among aligned eukaryotic sequences, and a fourth interrupts a motif in region F that is universally conserved in size, even in archaeal and eubacterial homologues (22). In all five cases the assumption of the presence of a spliceosomal intron bounded by GT-AG splice sites restores the RPB1 reading frame.

In addition to canonical splice-junction dinucleotides, the Mastigamoeba insertions have other features typical of eukaryotic spliceosomal introns. In four of the five insertions the positions immediately following the presumed GT splice donor site match the conserved yeast consensus sequence exactly (ref. 23, Fig. 1). For the most proximal insertion, which occurs in a region of RPB1 that is unconserved and difficult to align, there are several GT-AG boundaries that can restore the ORF; however, none of the possible GT-splice donor sites is followed by the conserved nucleotides present in the four downstream insertions. In contrast to yeast introns, but similar to the corresponding intron regions from plants, vertebrates, and red algae (23), sequences upstream (positions −3 to −15) of the putative 3′-splice acceptor sites are pyrimidine-rich (66% T+C) in the Mastigamoeba insertions (Fig. 1). Although no clearly conserved internal motif is present, there are several potential branch sites in each of the five insertions. Thus, the insertions in Mastigamoeba appear to be best characterized as spliceosomal introns.

FIG. 1. Insertion sites relative to conserved domains and terminal sequences for five putative introns found in the Mastigamoeba RPB1 gene. Amino acid residues shown in boldface are conserved and can be aligned among all RPB1 sequences. Dinucleotides coding for putative splice donor and acceptor sites are also in boldface.

The 3′ end of the Mastigamoeba gene encodes 25 tandemly repeated heptads (Fig. 2), similar to those of the C-terminal domain (CTD) of RPB1 sequences from so-called “crown” eukaryotic groups. The CTD has been found in all animals, plants, and fungi, as well as in the slime mold Dictyostelium and in the soil protist Acanthamoeba, but is absent in other, putatively more-ancient protists (13). There are several features that differ, however, between the Mastigamoeba and CTD repeats. The CTD heptads are composed of the consensus sequence YSPTSPS, whereas in Mastigamoeba the C
terminus consists of YSPASPA repeats. In addition, in the “crown” sequences there are typically numerous substitutions within individual CTD heptads that cause them to differ from the consensus sequence. For example, in Drosophila melanogaster only 2 perfect heptads are present among 42 repeats. In Saccharomyces, Arabidopsis, and Mus, 65%, 42%, and 40% of repeats, respectively, contain the consensus heptapeptide (24). In contrast, the Mastigamoeba heptads are nearly invariant with only a single amino acid that deviates from the consensus among all 25 repeats (Fig. 2). The Mastigamoeba sequence also exhibits absolute codon bias at sixfold degenerate serine residues; all second-position serine residues are encoded by AGC, and all fifth-position serine residues are encoded by TCN codons (Fig. 2). Likewise, all prolines at position three are encoded by the CCA triplet.

FIG. 2. Codon usage in the 25 tandemly repeated Mastigamoeba heptads. The nonsynonymous substitution is indicated in boldface and underlined, and the resulting amino acid change is shown to the left of the heptad in boldface.

Phylogenetic Position of Mastigamoeba RPB1. Phylogenetic analyses of aligned RPB1 sequences place Mastigamoeba outside the so-called “crown” of eukaryotic evolution with strong statistical support (Fig. 3A). Parsimony and distance bootstrap values of 91% and 100%, respectively, separate the four most basal taxa, Giardia, Mastigamoeba, Trichomonas, and Trypanosoma, from the remaining eukaryotic taxa. Quartet-puzzling maximum-likelihood support is 95% for that node. With outgroup sequences removed (Fig. 3B) these respective support values remain robust at bootstrap values of 95%, 100%, and 95% (parsimony, distance, and maximum likelihood). In addition, paired-sites tests using both parsimony (P = 0.0002) and likelihood (P = 0.008) analyses (Table 1) significantly reject grouping Mastigamoeba RPB1 with sequences that contain a canonical CTD.

The relative branching order among the four basal taxa is not well-resolved. Bootstrap values are low in all analyses, and branching topologies differ among parsimony, distance, and maximum-likelihood trees. Paired-sites analyses (Table 1) indicate that placing any of these four taxa as the earliest eukaryotic branch does not represent a significantly worse topology, although the tree with Trypanosoma at the base is rejected at P = 0.09 and P = 0.07 in parsimony and likelihood analyses, respectively. There appears to be no significant or even consistent preference for Giardia, Trichomonas, or Mastigamoeba as the most-basal RPB1 sequence.

FIG. 3. Consensus trees from parsimony, neighbor-joining, and maximum-likelihood phylogenetic analyses. Branch lengths are from maximum-likelihood analyses. (A) Tree based on an alignment of 24 RNA polymerase largest-subunit homologues. Bootstrap support values (parsimony, neighbor-joining, and maximum-likelihood) are shown at nodes relevant to the branching position of Mastigamoeba. (B) The consensus tree based on RPB1 sequences produced in the absence of outgroups.

Reliability of Basal Branches? Several methods were used to assess the possibility that any of the four sequences occupying a deep branching position may do so because of an increased rate of amino acid substitution with corresponding long-branch effects (20). We examined the frequency of individual substitutions at the most-highly conserved positions in our alignment. Unique autapomorphic changes at otherwise invariable positions were counted, as well as nonconservative amino acid substitutions (George/Baker/Hunt categories)
(16) at otherwise conservative positions. Because some increase in such changes is expected in more ancient eukaryotes, we also scored the reverse condition, that is, positions at which each basal taxon matched an invariable or conserved amino acid in the archaebacterial sequences that did not occur in other RPB1 sequences.

There is an overabundance of substitutions at highly conserved positions primarily in Giardia but also in Trichomonas (Fig. 4A). Mastigamoeba RPB1 has the fewest of these unique substitutions of the four basal sequences. The Giardia sequence has several amino acid substitutions at positions that are invariant in RPB1 and archaebacterial sequences and in nearly all largest-subunit homologues (ref. 25, Fig. 5). For example, there are differences in the Giardia sequence at positions in the D and G conserved motifs that are otherwise strongly conserved and are known to participate in forming the catalytic center of DNA-dependent RNA polymerases (25). Furthermore, the only other sequence in our full alignment that has a substitution anywhere in the highly conserved D motif is the largest subunit of Giardia RNA polymerase III (26) (Fig. 5).

FIG. 5. Substitutions (boldface and underlined) in the most highly conserved polymerase largest-subunit motifs. Ac, Acanthamoeba; At, Arabidopsis; Dd, Dictyostelium; Gl, Giardia; Hh, Halobacterium; Hs, Homo; Mi, Mastigamoeba; Mt, Methanobacterium; Sa, Sulfolobus; Sc, Saccharomyces; Tb, Trypanosoma; Tv, Trichomonas; B1, RPB1; C1, RPC1.

Given the great number of substitutions at the most-conserved RPB1 positions, it stands to reason that the Giardia sequence would have evolved even more rapidly at less-conserved sites. This increase in substitution rate appears to be reflected in the extremely long branch length of Giardia RPB1 in phylogenetic analyses (Fig. 3). Moreover, the Giardia sequence is the only one of the four basal sequences that deviates significantly (χ² test) from a maximum-likelihood estimate of RPB1 amino acid composition for all eukaryotes.

A possible explanation for the high level of autapomorphic substitutions in Giardia or Trichomonas may be that they represent groups that arose long before the emergence of other protists and, therefore, that they had far more time to accumulate unique differences in their RPB1 sequences. In that event we may also expect to see an excess of unique conserved residues shared between Giardia or Trichomonas and archaebacterial outgroups. This is because equally numerous but different autapomorphic substitutions also should have accumulated in the long branch leading to all remaining eukaryotic groups. The number of such putatively ancestral sites, however, is approximately equal in the sequences from each of the four most basal branches of the RPB1 tree (Fig. 4A).

Random Rooting of RPB1 Trees. To further explore the tendency of long branches to attract each of the four basal taxa, we rooted RPB1 trees with 100 randomly generated sequences. Of the 202 most parsimonious trees produced in these analyses, the random sequence was attracted to the Giardia branch 88% of the time (Fig. 4B). In contrast, Mastigamoeba was attracted to the random sequence only once among all 202 most-parsimonious trees. When Giardia was removed from the alignment and the analyses repeated, the Mastigamoeba sequence still failed to attract the random sequence (only 6%) compared with Trichomonas (53%) and Trypanosoma (41%). In analyses performed using maximum likelihood, the random sequences were attracted to Giardia in 96%, Giardia + Trichomonas in 19%, and Trichomonas alone in 3% of quartet-puzzling bipartitions. Although these data suggest that the positions of Giardia and possibly Trichomonas near the base of the RPB1 clade should be viewed with caution, there is no indication that the position of Mastigamoeba is artifactual.

Table 1. Paired-sites analyses with maximum-likelihood/parsimony of alternative tree topologies with each of the four earliest-branching sequences constrained as the most-basal branch

Tree tested                δlnL/steps    SD-ML/SD-MP    P
Most basal RPB1 branch
  Giardia                  best/15       —/25           —/0.54
  Trichomonas              18.4/best     26.1/—         0.48/—
  Trypanosoma              44.5/31       24.7/18.3      0.07/0.09
  Mastigamoeba             9.1/28        13.5/24.3      0.50/0.25
M.i. + CTD taxa            97.3/102      36.9/26.8      0.008/0.0002

Also tested is the placement of Mastigamoeba (M.i.) with the CTD-containing taxa. δlnL, change in natural logarithm of likelihood; SD, standard deviation; ML, maximum likelihood; MP, maximum parsimony.
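The likelihood-based P values in Table 1 appear consistent with a two-tailed normal approximation applied to the ratio of the log-likelihood difference to its standard deviation. The Python sketch below illustrates that arithmetic under this assumption only. It is not the PUZZLE implementation, the function name `two_tailed_p` is hypothetical, and the parsimony (Templeton) values are not addressed.

```python
import math


def two_tailed_p(delta: float, sd: float) -> float:
    """Two-tailed P for z = delta / sd under a standard normal approximation."""
    z = abs(delta) / sd
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal CDF Phi.
    return math.erfc(z / math.sqrt(2.0))


# (delta lnL, SD) pairs copied from the maximum-likelihood entries of Table 1.
table1_ml = {
    "Trichomonas": (18.4, 26.1),
    "Trypanosoma": (44.5, 24.7),
    "Mastigamoeba": (9.1, 13.5),
    "M.i. + CTD taxa": (97.3, 36.9),
}

for tree, (delta, sd) in table1_ml.items():
    z = delta / sd
    print(f"{tree:16s} z = {z:4.2f}  P = {two_tailed_p(delta, sd):.3f}")
```

Under this approximation the four values should come out close to the likelihood column of Table 1 (roughly 0.48, 0.07, 0.50, and 0.008), which is why only the grouping of Mastigamoeba with CTD-containing taxa is rejected at conventional thresholds.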

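The heptad comparisons reported in the Results (how many repeats match a consensus such as YSPASPA or YSPTSPS, and at which positions they deviate) can be illustrated with a short Python sketch. The function name `score_heptads` and the toy tract are hypothetical; the tract is not the Mastigamoeba sequence, and the sketch assumes the repeats are in register from the first residue.

```python
from collections import Counter


def score_heptads(tract: str, consensus: str) -> dict:
    """Split a C-terminal tract into 7-residue units and compare each to a consensus."""
    if len(consensus) != 7:
        raise ValueError("consensus must be a heptapeptide")
    usable = len(tract) - len(tract) % 7  # ignore any trailing partial repeat
    heptads = [tract[i:i + 7] for i in range(0, usable, 7)]
    perfect = sum(h == consensus for h in heptads)
    # Tally mismatches by 1-based position within the heptad.
    mismatches = Counter(
        pos + 1
        for h in heptads
        for pos, (a, b) in enumerate(zip(h, consensus))
        if a != b
    )
    return {
        "n": len(heptads),
        "perfect": perfect,
        "mismatch_positions": dict(mismatches),
    }


# Toy tract (an assumption for illustration): three perfect repeats plus one variant.
toy = "YSPASPA" * 3 + "YSPTSPA"
print(score_heptads(toy, "YSPASPA"))
print(score_heptads(toy, "YSPTSPS"))
```

Scoring the same tract against both consensus heptads shows how a nearly invariant repeat array can match one consensus almost perfectly while differing from the other at fixed positions (here positions four and seven).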
FIG. 4. Indicators for long branch attraction among RPB1 sequences. (A) The number of unique changes in each sequence at otherwise universally conserved sites as well as sites with a conserved ancestral character shared only with archaebacterial outgroups. G, Giardia; Tr, Trichomonas; Ty, Trypanosoma; M, Mastigamoeba; P, Plasmodium; R, red algae; C, other crown taxa. Averages are shown for red algae and for other crown eukaryotes. (B) The number of times randomly generated sequences were attracted to each taxon in 100 independent parsimony analyses.

DISCUSSION

Phylogenetic analyses based on conserved domains of RPB1 provide strong evidence that Mastigamoeba belongs to an early branch of the eukaryotic tree. Unlike other branches near the base of the RPB1 clade, there is no indication that the Mastigamoeba sequence is attracted to outgroups because of an increased rate of substitution or a biased amino acid composition. An early evolutionary origin of the Pelobiontida is consistent with the proposal that the group diverged before the endosymbiotic origin of mitochondria (2, 3, 27) and with features of cellular ultrastructure that suggest they may be the most ancient surviving eukaryotic lineage (2, 3).

C-Terminal Repeats and Introns in an Ancient Eukaryote? The presence of introns and a heptad repeat in the RPB1 gene of an early eukaryote raises intriguing questions. The CTD, consisting of tandemly repeated heptapeptides with the consensus sequence YSPTSPS, appears to be a synapomorphic feature that unites certain “crown” eukaryotes including green plants, animals, fungi, and several related protist lineages (12, 28). Heptads with the CTD consensus are not present in RPB1 genes isolated from any eukaryote outside of these taxa (Fig. 3), including red algae, which appear to have emerged just before the radiation of CTD-containing lineages (13, Fig. 3). Mastigamoeba RPB1, however, is not the only gene from a eukaryote outside of this group that has tandem heptads with a non-CTD consensus. The apicomplexan parasite Plasmodium falciparum also contains an RPB1 C-terminal heptapeptide with its own regularly repeated difference from the CTD consensus: lysine in place of serine at position seven (29). The Plasmodium heptads also show codon bias and, based on phylogenetic analyses, are believed to be the result of a recent amplification of this YSPTSPK heptad only in the P. falciparum lineage (30).

Likewise, the unique characteristics of the tandem repeats in Mastigamoeba are most easily explained if they resulted from the amplification of a YSPASPA heptad that occurred independently of the initial amplification of the YSPTSPS repeats present in certain “crown” eukaryotes. Small tracts with partial sequence similarity to CTD heptads are present in virtually all eukaryotic RPB1 C termini. Although no tandem repeats occur in other putatively ancient protists, the RPB1 C termini of Trichomonas, Giardia, and trypanosomatids all are enriched in amino acids that make up the CTD, and each contains fragmentary sequences that are identical to portions of the CTD consensus heptapeptide. Presumably the ancestral RPB1 gene had similar sequences that provided the raw material for subsequent but independent heptad multiplications in different lineages. The hypothesis that distinct repeats were amplified independently in Mastigamoeba and in the “crown” group is supported by their statistically robust separation in all RPB1-based phylogenetic analyses (Fig. 3, Table 1), combined with the lack of a canonical CTD in groups that branch between Mastigamoeba and the CTD-containing taxa.

It is also possible that the common ancestor of Mastigamoeba and “crown” eukaryotes contained a CTD, but that the consensus sequence diverged between the two lineages. These ancestral repeats would then have been lost independently at least from red algae and from the Plasmodium lineage and then regained in Plasmodium falciparum. In this case, small heptad-like segments in other protists also may represent the remains of degenerate CTD regions. Given the centrality of the CTD to the entire mRNA transcription cycle (ref. 31, and see below) such wholesale loss in so many different lineages seems unlikely; however, the functional role of tandem repeats has been investigated only in plants, animals, and fungi (31). Repeated heptads might have been under different selective constraints during the evolutionary history of other eukaryotic lineages.

The functional significance of the Mastigamoeba heptads also is unclear. The CTD in animals, plants, and fungi acts as a coordinating center for much of the mRNA transcription cycle (31, 32). In an unphosphorylated state, the CTD plays a key role in the assembly of polymerase II holoenzyme components needed to establish the transcription initiation complex. Subsequent phosphorylation of the CTD allows RNA polymerase II to release the promoter and undergo processive elongation. When phosphorylated, the CTD also interacts with a myriad of protein cofactors that mediate most aspects of pre-mRNA processing (31). The coupling of these various stages of mRNA synthesis afforded by the RNA polymerase II CTD is essential to the complex development and tissue differentiation that occurs in multicellular plants and animals. Which if any of these functions are performed by noncanonical C-terminal heptads in a pelobiont amoeba is unknown; however, the presence of introns in Mastigamoeba suggests at least one role for these tandem repeats.

One of the key functions of the CTD is to act as an organizational platform for bringing together the elongating RNA polymerase II, spliceosomes, and related splicing factors for the efficient cotranscriptional excision of large numbers of introns that are present in many genes (32). The presence of both introns and a tandemly repeated RPB1 heptad in Mastigamoeba, therefore, is probably not coincidental. Spliceosomal introns and C-terminal repeats are absent from other putatively ancient protist lineages (13, 33). That they are both present in Mastigamoeba RPB1 suggests that its heptad repeat may act to recruit spliceosomes and that the repeat arose concurrently with the spread of introns through the Mastigamoeba genome.

The presence of introns in Mastigamoeba RPB1 raises an additional question regarding the placement of pelobionts among the earliest eukaryotic lineages. The observed distribution of introns among eukaryotic genomes has led to the idea that their presence is restricted to more recently evolved groups (33). The relatively derived position of Mastigamoeba in rDNA trees and the presence of introns in Mastigamoeba RPB1 appear at first glance to provide two independent pieces of evidence that the Pelobiontida is not an ancient eukaryotic lineage. However, the argument that introns first appeared in late-evolving eukaryotes is based on correlations between intron presence or absence and position on the rDNA tree (33). Therefore, both pieces of evidence absolutely depend on the ability of rDNA phylogenies to recover the earliest events in eukaryotic evolution accurately.

Early Eukaryotic Evolution and rDNA. It has been suggested that parasitic organisms cluster at the base of the rDNA tree because of increased rates of sequence evolution or variant base composition (34–36). A growing body of cytologic, biochemical, and now molecular evidence (34) suggests that such an artifactual misplacement in rDNA trees is true for at least one parasitic amitochondriate group, the Microsporidia. Phylogenetic analyses of both mitochondrial-type HSP70 (34) and tubulin (37) sequences indicate a close relationship between microsporidia and fungi. Although phylogenies based on EF-1α place a microsporidian as the most basal eukaryotic taxon, the presence of a shared insertion is consistent with a close relationship between microsporidia and fungi and suggests that the deep branching position is artifactual, because of the extremely long branch length associated with this microsporidian sequence (38). Despite the evolutionary connection to fungi indicated by these various independent characters, microsporidia branch consistently near the root of the eukaryotic tree in rDNA-based phylogenetic analyses, even when corrections are made for variations in substitution rates and base composition among sequences (39).

We have provided evidence that Giardia and probably Trichomonas RPB1 sequences are unusually divergent and attract long branches. This suggests that their deep branching position in RPB1 phylogenies may be unreliable, perhaps because of more rapid rates of sequence evolution in these organisms. Data from microsporidia suggest that such increased substitution rates can affect multiple genes (34, 38). Given the deep, robust, but apparently incorrect position of microsporidia in rDNA phylogenies, it seems reasonable to ask whether similar effects may underlie the early branching position of other protists as well.

Fossils and Molecular Clocks. The extremely long internodes on rDNA trees between archaebacterial outgroups and the first branches of the eukaryotic “crown” radiation (40) also are inconsistent with the fossil record and with dates for the origin of eukaryotes from protein sequence-based molecular clock estimates. Fossils of multicellular algae that are considered part of the rDNA crown radiation indicate that several of these groups already had achieved broad diversity by 1 billion years ago (Bya) (41). Large, blade-like algae are found from at least 1.7 Bya (42), whereas multicellular eukaryotes of unknown affiliation appear as long ago as 2.1 Bya (43). Molecular clock calibrations based on numerous protein-coding genes place the origin of all eukaryotes at just over 2 Bya (44). Independent clock estimates from rDNA sequences suggest an origin for all ciliated protists, a lineage that emerges near the base of the “crown” radiation, at over 2 Bya (45).

Taken together, these mutually consistent data imply that the earliest branching “crown” taxa began to radiate shortly after the common origin of all extant eukaryotes. Yet the vast majority of the total length of the rDNA eukaryotic tree occurs in the segments leading to the radiation of multicellular organisms (40). Indeed, the assertion that “crown taxa” evolved late in eukaryotic evolution is based largely on the extremely long branches at the base of the rDNA tree (7). If the fossil record and protein-clock data are more accurate reflections of the evolutionary record, most of this length in rDNA trees must be artifactual.

Statistical analysis of over 200 eukaryotic rDNA sequences, accounting for biases in base composition and rate variation, as well as nonindependence of substitutions across sites, indicates that the basal branches of the eukaryotic tree are not well-resolved in rDNA phylogenies (39). If, as suggested by these authors, the base of the rDNA tree is reduced to a polytomy, the conflict between protein and rDNA-based phylogenetic analyses of pelobionts is eliminated. It is partly because of historical accident that rDNA was the first gene to be sampled widely, thus providing the first molecular framework for early eukaryotic evolution. An objective understanding of the complex historical clues contained in modern genomes must leave room for a broad interpretation of accumulating molecular data from the protist world, and not be restricted by phylogenetic topologies based on rDNA or on any other single gene sequence.

The RPB1 sequence from Mastigamoeba raises many questions about eukaryotic evolution. Additional RPB1 genes should be sampled from diverse eukaryotes to provide a larger data set for phylogenetic analyses, a better understanding of the evolutionary history of C-terminal repeats, their relationship to the presence of introns, and the evolution of mRNA transcription in general. In particular, RPB1 sequences from other pelobionts will help to determine whether the heptad repeats and introns in ‘Mastigamoeba invertens’ are representative of the Pelobiontida as a whole. Molecular analyses also are needed to determine whether there is vestigial evidence, as has been found in all parasitic protist groups (34, 40), that pelobionts once contained mitochondria. Their simple intracellular structure, ultrastructural features, and lack of parasitism, all combined in organisms that occur sympatrically with many mitochondriate protists (3), argue for a further exploration of the intriguing hypothesis that pelobiont amoebae may be a true lineage of primitively amitochondriate eukaryotes.

1. Cavalier-Smith, T. (1983) in Endocytobiology II, eds.
6. Hinkle, G., Leipe, D. D., Nerad, T. A. & Sogin, M. L. (1994) Nucleic Acids Res. 22, 465–469.
7. Knoll, A. H. (1992) Science 256, 622–627.
8. Cavalier-Smith, T. & Chao, E. E. (1996) J. Mol. Evol. 43, 551–562.
9. Morin, L. & Mignot, J.-P. (1995) Eur. J. Protistol. 31, 402.
10. Jokerst, R. S., Weeks, J. R., Zehring, W. A. & Greenleaf, A. L. (1989) Mol. Gen. Genet. 215, 266–275.
11. Stiller, J. W. & Hall, B. D. (1997) Proc. Natl. Acad. Sci. USA 94, 4520–4525.
12. Siebert, P. D., Chenchik, A., Kellogg, D. E., Lukyanov, K. A. & Lukyanov, S. A. (1995) Nucleic Acids Res. 23, 1087–1088.
13. Stiller, J. W. & Hall, B. D. (1998) J. Phycol. 34, in press.
14. Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673–4680.
15. Swofford, D. L. (1993) PAUP (Illinois Natural History Survey, Champaign, IL), version 3.1.1.
16. Felsenstein, J. (1989) Cladistics 5, 164–165.
17. Strimmer, K. & von Haeseler, A. (1996) Mol. Biol. Evol. 13, 964–969.
18. Kishino, H. & Hasegawa, M. (1989) J. Mol. Evol. 29, 170–179.
19. Felsenstein, J. (1985) Syst. Zool. 34, 152–161.
20. Hendy, M. D. & Penny, D. (1989) Syst. Zool. 38, 297–309.
21. Maddison, W. P. & Maddison, D. R. (1992) MACCLADE (Sinauer, Sunderland, MA), version 3.0.
22. Pühler, G., Leffers, H., Gropp, F., Palm, P., Klenk, H.-P., Lottspeich, F., Garrett, R. A. & Zillig, W. (1989) Proc. Natl. Acad. Sci. USA 86, 4569–4573.
23. Liaud, M.-F., Brandt, U. & Cerff, R. (1995) Plant Mol. Biol. 28, 313–325.
24. Corden, J. L. (1990) Trends Biochem. Sci. 15, 383–387.
25. Mustaev, A., Kozlov, M., Markovtsov, V., Zaychikov, E., Denissova, L. & Goldfarb, A. (1997) Proc. Natl. Acad. Sci. USA 94, 6641–6645.
26. Graham, S. (1997) Ph.D. Dissertation (University of Toronto, Canada).
27. Brugerolle, G. (1991) Protoplasma 164, 70–90.
28. Klenk, H.-P., Zillig, W., Lanzendörfer, M., Brampp, B. & Palm, P. (1995) Arch. Protistenkd. 145, 221–230.
29. Li, W.-B., Bzik, D. J., Gu, H., Tanaka, M., Fox, B. A. & Inselburg, J. (1989) Nucleic Acids Res. 17, 9621–9636.
30. Giesecke, H., Barale, J.-C., Langsley, G. & Cornelissen, W. C. A. (1991) Biochem. Biophys. Res. Commun. 180, 1350–1355.
31. Steinmetz, E. J. (1997) Cell 89, 491–494.
32. Corden, J. L. & Patturajan, M. (1997) Trends Biochem. Sci. 22, 413–416.
33. Palmer, J. D. & Logsdon, J. M., Jr. (1991) Curr. Opin. Genet. Dev. 1, 470–477.
34. Germot, A., Philippe, H. & Le Guyader, H. (1997) Mol. Biochem. Parasitol. 87, 159–168.
35. Siddall, M. E., Hong, H. & Desser, S. S. (1992) J. Protozool. 39, 361–367.
36. Hasegawa, M. & Hashimoto, T. (1993) Nature (London) 361, 23.
37. Keeling, P. J. & Doolittle, W. F. (1996) Mol. Biol. Evol. 13, 1297–1305.
38. Baldauf, S. L. & Doolittle, W. F. (1997) Proc. Natl. Acad. Sci. USA 94, 12007–12012.
39. Kumar, S. & Rzhetsky, A. (1996) J. Mol. Evol. 42, 183–193.
40. Sogin, M. L. (1997) Curr. Opin. Genet.
Dev. 7, 792–799. Schwemmler, W. & Schenk, H. E. A. (de Gruyter, Berlin), pp. 41. Xiao, S., Zhang, Y. & Knoll, A. H. (1998) Nature (London) 391, 1027–1034. 553–558. 2. Cavalier-Smith, T. (1991) Biosystems 25, 25–38. 42. Shixing, Z. & Huineng, C. (1995) Science 270, 620–622. 3. Simpson, A. G. B., Bernard, C., Fenchel, T. & Patterson, D. J. 43. Han, T.-M. & Runnegar, B. (1992) Science 257, 232–235. (1997) Eur. J. Protistol. 33, 87–98. 44. Feng D.-F., Cho, G. & Doolittle, R. F. (1997) Proc. Natl. Acad. 4. Sogin, M. L. (1991) Curr. Opin. Genet. Dev. 1, 457–463. Sci. USA 94, 13028–13033. 5. Clark, C. G. & Roger, A. J. (1995) Proc. Natl. Acad. Sci. USA 92, 45. Wright, A. D. G. & Lynn, D. H. (1997) Arch. Protistenkd. 148, 6518–6521. 329–341. Downloaded by guest on October 1, 2021