<<

Spliceosomal introns in the deep-branching vaginalis

Sˇ teˇpa´ nka Vanˇa´cˇova´*†, Weihong Yan*, Jane M. Carlton‡, and Patricia J. Johnson*§

*Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095; and ‡The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850

Edited by W. Ford Doolittle, Dalhousie University, Halifax, NS, Canada, and approved February 2, 2005 (received for review October 11, 2004) have evolved elaborate splicing mechanisms to remove introns thought to be excised by a spliceosome-independent introns that would otherwise destroy the protein-coding capacity mechanism (10, 11). In contrast, apicomplexan parasites Plas- of genes. Nuclear premRNA splicing requires sequence motifs in modium and Toxoplasma and entamoebids (12) appear to have the intron and is mediated by a ribonucleoprotein complex, the only conventional cis-splicing introns and no transsplicing. spliceosome. Here we demonstrate the presence of a splicing The two deepest-branching lineages are the apparatus in the protist and show that RNA and lineages (13–15). Only three motifs found in yeast and metazoan introns are required for diplomonad genes are known to contain introns (16, 17), and no splicing. We also describe the first introns in this deep-branching intron-containing genes have been reported in the parabasalid lineage. The positions of these introns are often conserved in lineage. However, the presence of a highly conserved essential orthologous genes, indicating they were present in a common spliceosomal protein, PRP8, in the parabasalid, Trichomonas ancestor of trichomonads, yeast, and metazoa. All examined T. vaginalis (18), provides evidence that splicing is likely to occur. vaginalis introns have a highly conserved 12-nt 3؅ splice-site motif The spliceosomal machinery and the RNA motifs required for that encompasses the branch point and is necessary for splicing. splicing have not been examined for either of these lineages. This motif is also found in the only described intron in a gene from Such analyses are critical, because the only known intron from another deep-branching eukaryote, intestinalis. These the diplomonad, Giardia, lacks the canonical 5Ј GT essential for studies demonstrate the conservation of intron splicing signals splicing in yeast and metazoa (16), suggesting that splicing across large evolutionary distances, reveal unexpected motif con- mechanisms in deep-branching may differ (19). servation in deep-branching lineages that suggest a simplified Here we examine the motifs required for the parabasalid T. mechanism of splicing in primitive unicellular eukaryotes, and vaginalis to splice heterologous introns and characterize the first support the presence of introns in the earliest eukaryote. introns found in this lineage. These introns share an extremely conserved extended 3Ј SS motif that encompasses the branch evolution ͉ gene expression ͉ old introns ͉ splicing point and is necessary for splicing with the only known intron in Giardia. Their positions within respective genes are also often conserved in orthologous genes in distal lineages, indicating the plicing of nuclear premRNA plays a central role in eukary- presence of introns in this deep-branching lineage before its otic gene expression and contributes to gene regulation and S divergence from its common ancestor with plants and . protein diversity (1, 2). Splice-site (SS) motifs, recognized by small nuclear RNAs and protein components of the spliceosome, Materials and Methods coordinate multiple steps in splicing (1). Precision is required to Cultures. T. vaginalis strain T1 was maintained in Diamond’s generate mRNAs that encode functional proteins, yet the SS complete media supplemented with iron (20). motifs that define exon͞intron junctions are short and often weakly conserved. Two types of spliceosomal introns, U2- and Generation of Chloramphenicol Acetyltransferase (CAT) Constructs U12-dependent, have been described in eukaryotes (reviewed in Containing Introns. The CAT gene was inserted into the Asp-718 ref. 3). Groups I and II introns, capable of self splicing in vitro and BamHI restriction sites of a pBluescript vector that had and present in some eubacterial genomes, are also found in previously been subjected to site-specific mutagenesis to disrupt organellar genomes of unicellular eukaryotes and plants (re- the two PvuII restriction sites. Forward and reverse oligomers viewed in refs. 4 and 5). Most spliceosomal introns are U2- corresponding to the sequence of each specific intron tested were Ј dependent and are bordered at the 5 end by a consensus of 6 nt, annealed and blunt-end ligated into the PvuII site of the CAT Ј Ј encompassing the invariant 5 GT. The 5 splice-site consensus gene within this construct. The CAT genes containing these in yeast (Saccharomyces cerevisiae) introns is generally highly introns were subsequently inserted at the Asp-718 and BamHI conserved, whereas the consensus in mammalian introns is often restriction sites to the T. vaginalis expression vector Master-Neo Ј Ј limited to the 5 GT. The 3 SS motif of mammalian and yeast (21) that contains a neomycin phosphotransferase selectable U2-dependent introns is shorter and typically limited to the last marker. The sequence of each construct was verified by sequencing. 3nt(T͞CAG3Ј) (1, 3, 6, 7). In addition to the 5Ј and 3Ј SS motifs, yeast introns also require a conserved branch-point sequence Transfections. Transfections were conducted as described (20), (TACTAAC), which is only loosely conserved in mammalian and cultures were selected with the drug Gentamicin (GIBCO) U2-dependent introns. (G418) for 5 days before harvesting to assess CAT activity. Spliceosomal introns are common in and plant genes, absent in eubacterial and archaeal genes, and found in several unicellular protists. The prevalence of splicing in protists is This paper was submitted directly (Track II) to the PNAS office. unknown, and the properties of protist introns vary considerably. Freely available online through the PNAS open access option. In euglenoids and trypanosomes, both transsplicing and cis- Abbreviations: SS, splice site; CAT, chloramphenicol acetyltransferase; STK, serine-threo- splicing U2-dependent introns are present (8). Although virtu- nine protein kinase. ally all trypanosome premRNAs undergo transsplicing, only one †Present address: Biozentrum, University of Basel, Klingelbergstrasse 70, CH-4056 Basel, cis-splicing intron has been identified (9). In euglenoids, U2- Switzerland. dependent cis-splicing introns are common; however, a subset of §To whom correspondence should be addressed. E-mail: [email protected]. nuclear genes are also present that contain nonconventional © 2005 by The National Academy of Sciences of the USA

4430–4435 ͉ PNAS ͉ March 22, 2005 ͉ vol. 102 ͉ no. 12 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0407500102 Downloaded by guest on September 28, 2021 CAT Assay. CAT assays were performed on cell lysates by using radioactively labeled chloramphenicol C14, as described by Promega.

DNA and RNA Isolation. Genomic DNA from T. vaginalis was isolated by using the guanidium hydrochloride method as de- scribed (22). Total RNA was isolated by using a lithium chloride procedure and enriched in mRNA by using a Promega mRNA isolation kit.

RT-PCR Analyses. The first-strand synthesis for RT-PCR was generated with SuperScript RTIII (Invitrogen) on RNaseH- treated mRNA with a CAT-specific reverse primer, AAGGC- CGGATAAAACTTGTGCT. Four microliters of the resulting cDNA was then used for PCR with a combination of CAT- specific primers (CATFor2, TCAGTTGCTCAATGTAC- CTATAACCAG; CATRT2, TCATCAGGCGGGCAA- GAATGT). The final PCR products were resolved on a 2% agarose gel.

Confirmation of Trichomonas Endogenous Introns. RNA was isolated by TRIzol preparation (Invitrogen). The first-strand cDNA was generated on total RNA with the oligo dT primer by using Superscript RT III reverse transcriptase. A list of oligonucleo- tides can be found in Supporting Text, which is published as supporting information on the PNAS web site.

RNA Blot Analysis. Total RNA or mRNA was separated on a 1.2% agarose gel and blotted to nitrocellulose by using standard procedures. Hybridization probes were labeled by nick transla- tion by using DNA polymerase (Boehringer), and hybridizations were conducted at 65°C in 3ϫ SSC, followed by washing at the same temperature, reducing the salt concentration to 0.1 ϫ SSC. Fig. 1. T. vaginalis can splice a 35-nt intron. (A) CAT activity in T. vaginalis DNA Sequencing. All RT-PCR products were subcloned into the cells transfected with plasmid constructs of a CAT gene containing the wild- TOPO vector (Invitrogen). DNA sequencing was performed type Giardia intron (Int2) and mutations of this 35-nt sequence (Int1 and with M13 forward and reverse primers by using the Big Dye Int3–12). Mutations are indicated in bold and underlined. The positive control version 3 labeling kit (Applied Biosystems). The sequence was (at the top) contains a construct with an intron-less CAT gene and the negative obtained by using an Applied Biosystems 3700 Capillary DNA control (at the bottom) has a CAT gene with the Int1 intron inserted in the opposite orientation. CAT activity is expressed as cpm of n-butyrylated chlor- Analyzer. amphenicol per microgram of protein. The number of independent transfec- tions tested are designated in brackets, and standard error bars are shown. (B) Algorithm for the Intron Search. A PERL script was written to search RT-PCR products separated on a 2% agarose gel show two populations of CAT

the preliminary T. vaginalis sequence data, obtained from The cDNAs corresponding to spliced and unspliced mRNA in T. vaginalis cells EVOLUTION Institute for Genomic Research through the web site (www.tigr. transfected with the Int1 (GT) and the Int4 (GC) constructs. No spliced CAT org͞tdb͞e2k1͞tvg), for the presence of the following motif: cDNA is detected in cells transfected with the Int2 construct containing wild-type Giardia intron (CT) or the Int1 construct with the intron inserted in GYNNGYN(n)ACTAACACACAG, where Y, T or C and n, any the reverse orientation (neg). Pos, RT-PCR reaction by using cells transfected nucleotide. As indicated by N(n), no size limit was set for the distance between the putative 5Ј SS motif (GYNNGY) and the with the intronless CAT construct. (C) RNA blot analysis of total RNA from Ј transfected cells. The blot was hybridization with a CAT gene probe showing putative 3 SS motif (ACTAACACACAG). that cells without CAT activity do transcribe the CAT RNA. A ferredoxin probe was used as an internal standard for RNA loading. Results Trichomonads Are Capable of Splicing Short Introns and Require Conserved Splicing Motifs. To address whether the primitive eu- histolytica, T. cruzi, and adenovirus-2 contained canonical 5Ј karyote T. vaginalis contains the spliceosomal machinery re- GT-AG3Ј SS motif and putative branch site sequences, none of quired for intron removal, we developed a functional assay to test the introns were removed from the CAT gene (data not shown). its ability to splice heterologous introns. Introns were inserted In contrast, analysis of T. vaginalis transfectants harboring a into the PvuII site in the CAT reporter gene contained in a T. CAT gene containing the G. intestinalis intron (16) in which the vaginalis expression vector (21), creating either an in-frame stop 5Ј CT was modified to a 5Ј GT revealed the presence of CAT codon or shifting the ORF. This would require accurate splicing activity (Fig. 1A, Int1). Transfectants possessing the CAT gene of the intron from the premRNA for the production of CAT Ј Ј activity in cell transfected with these constructs. with the wild-type Giardia intron (e.g., 5 CT-AG3 SS) do not Initially, we tested whether T. vaginalis could remove introns produce CAT activity (Fig. 1A, Int2). These data demonstrate from two related anaerobic parasitic protists, a 35-nt intron from that T. vaginalis contains a functional premRNA splicing ma- the Giardia intestinalis ferredoxin gene (16) and a 46-nt intron chinery and suggest it requires a 5ЈGT SS. from an Entamoeba histolytica serine͞threonine kinase gene Additional mutational analyses of the motifs necessary for (23), as well as a 302-nt intron from the cruzi splicing in T. vaginalis transfectants established the requirement poly(A) polymerase gene (9) and a 46-nt modified version of an of standard eukaryotic canonical 5Ј and 3Ј SS for splicing. adenovirus intron (24). Although the three introns from E. Alterations at any of these positions in the Giardia intron

Vanˇa´cˇova´ et al. PNAS ͉ March 22, 2005 ͉ vol. 102 ͉ no. 12 ͉ 4431 Downloaded by guest on September 28, 2021 Fig. 2. T. vaginalis poly(A) polymerase gene contains an intron with extended conserved 5Ј and 3Ј SS that are necessary for splicing. (A) Alignment of the only known Giardia intron and the T. vaginalis poly(A) polymerase (PAP) intron reveals identity at 5 of 6 nt at the 5Ј SS and 12 or 12 at the 3Ј SSs. (B) RT-PCR products generated by using T. vaginalis genomic DNA (gDNA) or mRNA (cDNA) as the template and primers flanking the putative intron in the PAP gene were separated on a 2% agarose gel. (C) CAT activity in T. vaginalis cells transfected with plasmid constructs of a CAT gene containing the Giardia 35-nt intron with mutations (Int13–25), as indicated in bold and underlined. The positive control is described in Fig. 1. CAT activity is expressed as cpm of n-butyrylated chloramphenicol per microgram of protein. The number of independent transfections assayed is shown in brackets, and bars designate standard error. A derived consensus sequence required at the 5Ј and 3Ј SS for CAT activity is listed in bold.

abolished CAT activity (Fig. 1A, Ints 2, 3, 5, 6, and 10–12), with fectants harboring intron-containing CAT genes were shown to the exception of an intron containing 5Ј GC-AG3Ј sites (Int4). contain CAT premRNA (Fig. 1C). These findings are consistent with reports demonstrating that a minority of introns contain a 5Ј GC SS (25). We also observed A T. vaginalis Gene Contains an Intron with Extended Conserved that changing the third and fourth or the fifth and sixth dinucle- Motifs Found in the Only Known Giardia Intron. Having demon- otides at the 5Ј SS to TG abolished CAT activity (Fig. 1A; Int7 strated the presence of splicing machinery in T. vaginalis,we and Int8), indicating a critical role for the first 6 nt at the 5Ј SS searched for intron-containing genes on release of sequence data in premRNA splicing in T. vaginalis. To test whether a sequence from the T. vaginalis Genome Sequencing Project (www.tigr.org͞ identical to the consensus branch site (ACTAAC) of yeast tdb͞e2k1͞tvg) by examining homologues of intron-containing introns (26) is required for splicing in T. vaginalis, we mutated genes in other protists. An ORF encoding a putative poly(A) this sequence in the 35-nt intron (Fig. 1A, Int9) and found that polymerase (PAP1) (9) that is interrupted by a 94-nt sequence CAT activity was abolished. Combined, these re- flanked by canonical 5Ј GT-AG3Ј SS was identified. Inspection sults show that the splicing machinery necessary to remove of this putative intron exposed an unexpected identity of the last U2 spliceosomal introns from premRNAs is present in 12 nt, ACTAACACACAG, with those of the 35-nt Giardia trichomonads, and that the same motifs used by yeast and intron (Fig. 2A). Moreover, 5 of the 6 nt at the putative 5Ј SS of metazoa are required for splicing in this deep-branching the two sequences are also identical (Fig. 2 A and C) and similar eukaryote. to that found in yeast introns. RT-PCR with PAP1-specific To confirm that CAT activity reflects splicing and to demon- primers was used to show that the 94-nt sequence is an intron, strate precise intron removal, we used RT-PCR to clone and because it is removed from the premRNA precisely at the sequence CAT mRNA. As shown in Fig. 1B, two products of the predicted positions (Fig. 2B). The conservation at the 5Ј SS and predicted size for unspliced and spliced mRNAs are found in the atypical extended strict conservation at the 3Ј SSs strongly cells transfected with the CAT gene introns bordered by either suggest a functional role for these motifs. To test this, these a GT (Int1) or a GC (Int4) 5Ј SS. In contrast, transfectants that sequences were mutated in the 35-nt CAT intron (Int1) and did not express detectable CAT activity (e.g., CT͞Int2 and assayed for CAT activity. Single- or double-nucleotide mutations negative control cells͞Int13) produce only a single RT-PCR of6ntatthe5Ј SS revealed that several positions could vary and product, corresponding in length to unspliced RNA (Fig. 1B). still support splicing (T to C at position 3 and GT to AC or GC These data confirm that transfectants without activity produce at positions 5 and 6; see Fig. 2C, Int13, Int15, and Int16). CAT premRNA but are incapable of splicing it. Sequencing of However, splicing activity was lost when the C at the fourth the RT-PCR products showed accurate splicing by T. vaginalis of position was preceded by a G (GTGCGT) (Fig. 2C; Int14). the canonical introns in Int1 and Int4 transfectants (Fig. 1B) and Similarly, CAT activity was observed only when the C at position demonstrated that longer RT-PCR products contain the intron. 6 was preceded by an A or a G (Fig. 2C; Int15 and Int16), because The presence of unspliced RNA in Int1 and Int4 transfectants placingaTorCatposition 5 abolished splicing (Fig. 2C; Int17 indicates inefficiency of splicing that might be attributed to the and Int18). Strict sequence requirements for the 12 nt at the 3Ј short length of the intron or the artificial exons that flank it. To end were found to be necessary for splicing (Fig. 2C, Int19 to demonstrate that the lack of gene transcription does not account Int25). Two single-nucleotide mutations in the putative branch for the absence of CAT activity, Northern blot analysis of total site, the conserved A and T (Int 19, ACCAAC and Int 20, RNA was also performed, and all examined T. vaginalis trans- ACTACC) also abolished splicing, consistent with the role for

4432 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0407500102 Vanˇa´cˇova´ et al. Downloaded by guest on September 28, 2021 81) were used in RT-PCR. The sequences interrupting these ORFs were found to be precisely removed from their mRNAs, demonstrating they are introns (Fig. 3B).

Analyses of T. vaginalis Introns. Classification of the types of genes containing introns shows that 10 of 18 ORFs are homologous to different types of STKs. The remaining eight encode three C-terminal phosphatases (Scp), two centaurins, two TATA-binding protein-associated factors (Taf6), and one poly(A) polymerase (PAP1) (Table 2 and Fig. 4). To determine whether paralogous copies of these genes might also contain introns with an imperfect match to the search motif, BLAST analyses were performed, and two additional introns, one in a putative centaurin gene (cent115) and another in a STK gene (stk70), were found. These putative introns have a precisely Fig. 3. In silico analysis reveals the presence of additional intron-containing conserved 5Ј-GTATGT motif but differ in 1 of 12 nt at the 3Ј SS genes. (A) The sequence used to search the T. vaginalis genome database (Table 2 and Fig. 4). (www.tigr.org͞tdb͞e2k1͞tvg) for introns is shown (at the top), and the 5Ј and We found that the position of introns in T. vaginalis genes is Ј 3 motifs and length of the 39 sequences identified are listed (at the bottom). strongly biased toward the first third of the gene. Also slightly (B) RT-PCR products generated by using T. vaginalis genomic DNA (gDNA) or over half (15 of 28) of the introns are inserted between two mRNA (cDNA) as the template and primers flanking the putative introns in genes encoding a TATA-binding protein-associated factor (Taf6–81), a small codons [e.g., phase zero introns (28, 29)]. Eight interrupt codons C-terminal domain phosphatase (Scp 67), and a STK (STK 196). after the first nucleotide (phase 1 introns) and five after the second nucleotide (phase 2 introns), resulting in a phase distri- bution of 5:2.7:1.7 (phases 0, 1, and 2, respectively; Table 5, which this sequence as a branch site motif. Similarly, we observed that is published as supporting information on the PNAS web site). splicing is inactivated when the branch motif ACTAAC is moved No strong nucleotide preference was observed in the exon 2 or 7 nt upstream (Int24 and Int25), indicating a spatial immediately adjacent to the SS; however, an A oraTisoften requirement for the motif immediately upstream of the 3Ј SS. found in the last exon position bordering the 5Ј SS. This is in Finally, three additional mutants provide evidence that the contrast to that observed for U2 exon͞intron borders of other entire 12-nt motif is required for splicing, because alterations in eukaryotic introns where this position is G Ͼ 80% of the time (3). all positions 3–7 nt upstream of the 3Ј AG disrupted splicing (Fig. T. vaginalis introns found in orthologous genes are of similar 2C; Int 21, Int 22, and Int 23). These results led to the prediction length and structure and are often, but not always, located in that T. vaginalis introns are typically bordered by 5Ј-GYAYGY roughly the same position within the gene (Tables 3–5). This and ACTAACACACAG-3Ј conserved motifs. positional conservation indicates that the intron was present before gene multiplication and divergence. Not all orthologous Large-Scale Search for Intron-Containing Genes in the T. vaginalis genes have positionally conserved introns; for example, only one Genome. Strict conservation of 5Ј and 3Ј SS and placement of the of the four PAP genes in the T. vaginalis genome database putative branch site adjacent to the 3Ј SS motif in introns from contains an intron (data not shown). Giardia and T. vaginalis, as well as the observed requirement of these sequences for splicing, led us to search for other T. vaginalis Intron Positions in Orthologous Genes from Representing intron-containing genes by using in silico methods. A database Other Eukaryotic Lineages. Introns found in the same position in containing 2.7-fold coverage of the T. vaginalis genome (www. homologous genes of phylogenetically diverse species are inter- tigr.org͞tdb͞e2k1͞tvg) was searched with the motif 5Ј- preted to be ‘‘old’’ introns, whereas those in a unique position are EVOLUTION referred to as ‘‘new’’ introns (30). To determine whether T. GYnnGY. . . no size limit. . . ACTAACACACAG-3Ј, and 39 vaginalis introns are old or new, we compared their position to putative introns were identified. These were all short and varied that of introns in orthologous genes from a protist and various in length from 59 to 196 nt. Although our search motif allowed yeast and metazoa. Sixty percent of the T. vaginalis introns were any nucleotide to be in positions 3 and 4 at the 5Ј end search Ј found to interrupt the gene in the exact position of an intron in motif, 34 of the 39 hits contain the same 6 nt at the putative 5 an orthologous gene from at least one other eukaryotic lineage SS, GTATGT, and the remaining five differ by only 1 nt Ј (Tables 1 and 5). Moreover, an intron was found in another (GTACGT) (Fig. 3A). Further analyses of the 5 end of these eukaryotic orthologue within four amino acids of the position sequences revealed a strong bias for an A at position 7 (32 of 39) observed for all T. vaginalis introns. Given the extreme diver- andaTatposition 8 (33 of 39). Positions 9–13 likewise showed gence of T. vaginalis proteins, this likely reflects a relatively strong preference for a T (21 of 39 and 25 of 39, divergence around an old intron as opposed to the addition of a respectively) (Fig. 4, which is published as supporting informa- new intron. These data indicate that some T. vaginalis genes tion on the PNAS web site). Although a consensus sequence is contain old introns, present before the divergence of parabasa- common for the first 6 nt of U2 introns (3), extension of this lids from other eukaryotic lineages. Moreover, they reinforce the conservation to 13 nt was unexpected. conclusion that introns were present when unicellular eu- A search of the regions surrounding these putative introns by karyotes first evolved, and that their paucity in most extant using ARTEMIS (27) was then performed, revealing the presence protists and yeasts is the result of subsequent loss (31, 32). of interrupted ORFs in 26 introns, and BLAST analyses showed that 20 of these introns are orthologues of known genes in other Discussion organisms, whereas the other 8 encode unique proteins (Tables We have shown that T. vaginalis, a protist of the parabasalid 2–4, which are published as supporting information on the PNAS deep-branching eukaryotic lineage, possesses splicing machinery web site). To directly test whether these sequences are introns, that functionally depends on standard conserved motifs and have specific primers flanking three genes encoding a putative C- identified the first intron-containing genes in this lineage. Our terminal domain phosphatase (scp67), a serine threonine kinase studies illustrate the conservation of functional splicing signals (STK) gene (stk196), and the transcription factor TAF6 (taf6– across large evolutionary expanses, show that T. vaginalis contain

Vanˇa´cˇova´ et al. PNAS ͉ March 22, 2005 ͉ vol. 102 ͉ no. 12 ͉ 4433 Downloaded by guest on September 28, 2021 Table 1. Conservation of intron position in T. vaginalis genes vaginalis, where Prp8 plays an indispensable role. In silico and in orthologous genes from other eukaryotic lineages analyses to determine whether genes encoding other critical Gene͞organism H.s. D.m. C.e. A.t. S.p. S.c. P.f. spliceosomal proteins are present in trichomonads await a complete coverage of the genome sequence. The structural PAP1 94 0 ϪϪϪϪϪϪsimilarity between T. vaginalis and Giardia spliceosomal introns Cent 116, 122, 127 0 0 ϪϪϪϪϪand group II self-splicing introns, a class of introns thought to be SCP 67, 70, 76 ϪϪ 2 ϪϪϪϪevolutionarily related to spliceosomal small nuclear RNA TAF6 81, 91 3 Ϫ 14ϪϪϪ(snRNA) machinery (5, 35), is noteworthy. The distance (7 nt) STK 93 3 3 0 2 4 ϪϪbetween the branch site and the 3Ј SS of group II introns is the STK 59, 68,105 ϪϪϪ0 ϪϪ4 same as that observed for the putative branch site contained with STK 70,78 Ϫ 2341ϪϪthe conserved 12 nt at the 3Ј SS of Trichomonas and Giardia STK 110,114,134 0 1 0 3 3 ϪϪintrons. The use of analogous splicing pathways by group II STK 99 0 Ϫ 23ϪϪϪintrons and nuclear premRNA has been taken to support a STK 196 2 3 2 2 ϪϪ4 common evolutionary origin (35). The possibility that T. vagi- nalis and Giardia introns represent evolutionary intermediates Genes are abbreviated as follows: PAP, poly(A) polymerase; SCP, small C-terminal domain phosphatase; and TAF6, TBP-associated factor TAF6, where possessing features of both systems is an exciting one, with numbers correspond to intron length. Organisms are abbreviated as follows: important mechanistic implications. Despite strong conservation H.s., Homo sapiens; D.m., Drosophila melanogaster; C.e., Caenorhabditis of canonical 5Ј GT-AG3Ј and branch site motifs in T. vaginalis elegans; A.t., Arabidopsis thaliana; S.c., Saccharomyces cerevisiae; S.p., Schizo- introns implying that trichomonad snRNAs would likewise be saccharomyces pombe; and P.f., Plasmodium falciparum. 0 indicates an intron conserved, these RNAs have yet to be identified. at the same position, and 1, 2, 3, or 4 indicates the presence of an intron shifted We have shown that Trichomonas is capable of splicing a 35-nt in position by one to four amino acids relative to the T. vaginalis intron. Ϫ intron, whereas most eukaryotic introns have a minimal length indicates the absence of an intron in a similar position in the orthologous of Ϸ50 nt (36). This minimal length has been interpreted to gene. See Table 5 and Supporting Text for detailed analyses. reflect physical constraints imposed by the spliceosome. Whether the ability to splice such short introns by T. vaginalis and Giardia reflects significant differences in their splicing compo- old introns, and imply that spliceosomal introns were present in ͞ genes of the common ancestor of all extant eukaryotes. nents and or mechanisms awaits further analyses. In addition to the conservation of typical U2 introns, extended The complete sequence and annotation of genomes of 5Ј and 3Ј splicing motifs were found in all examined T. vaginalis numerous eukaryotes have allowed large-scale comparisons of introns. These extended motifs, which are also found in the only intron position(s) in phylogenetically diverse organisms (31). With rare exceptions, intron-poor unicellular eukaryotes have reported intron in Giardia (16, 33), a protist belonging to the Ј sister diplomonad lineage, appear to be required for splicing, been found to have a significant 5 bias in intron position, a property not observed in metazoan genes (31, 37). Our because their mutation abolishes splicing. Strikingly, 100% analyses of T. vaginalis introns are consistent with these nucleotide identity was observed in five of six positions at the 5Ј findings, because introns are typically found within the first SS with 92% identity at the remaining position in 41 putative T. one-third of the gene. These data support the argument that vaginalis introns and the Giardia intron. Likewise, 11 or 12 nt at the paucity of introns in unicellular eukaryotes results from a the 3Ј SS of these introns are identical, with the remaining secondary loss, driven by homologous recombination between position exhibiting 95% identity. Although the first 6 nt at the 5Ј the gene and an intronless cDNA derived by reverse transcrip- SS are often conserved in U2 introns, particularly in yeast introns tion of the corresponding mRNA (31, 37, 38), as opposed to (3), the extraordinarily high level of conservation of 12 nt at the Ј introns simply being less abundant during early eukaryotic 3 SS in T. vaginalis and Giardia introns is unprecedented. evolution. Moreover, the remarkable conservation of this trait between two Two theories have dominated the ‘‘origin of introns’’ debate. lineages and their requirement for splicing imply a difference in ͞ ͞ The first posited that introns were present in the common SS recognition and or intron excision ligation by spliceosomes ancestor of all cells and were subsequently lost in extant eubac- in these unicellular organisms. On the other hand, endogenous teria, archaea, and certain unicellular eukaryotes (the ‘‘intron- ͞ genes with introns of variable lengths and exon intron compo- early’’ hypothesis or the exon theory of genes) (28, 39–42), and sitions that allow for a degree of nucleotide flexibility incom- the second postulated that introns were added to genes in patible with our assay may be present that do not require strict eukaryotes and never existed in orthologous genes in intron- conservation of SS sequences. deficient organisms (43, 44) (the ‘‘intron-late’’ hypothesis). At Remarkably, the putative branch site motif (ACTAAC) re- the crux of this debate is the issue of whether introns were quired for splicing in T. vaginalis is contained within the highly present in the last common ancestor of archaea, eubacteria, and conserved 12-nt 3Ј motif, from Ϫ7toϪ12 nt. Attempts to eukaryotes. The data presented here support the intron-early exchange this sequence with upstream sequences eliminated hypothesis. splicing, indicating that the distance between the 3Ј SS and this A mixed model of intron evolution incorporating features of both element is critical. This spacing requirement is also consistent the intron-early and intron-late hypotheses has recently been em- with the ability of T. vaginalis to splice the 5Ј GT-Giardia intron braced (27–30, 37, 39, 45–48). Comparative genomics have revealed (Int1) and not three others, all of which contain standard 5Ј and that numerous introns present in a common ancestor of fungi, 3Ј SSs but do not have their branch site motifs adjacent to the 3Ј nematodes, and arthropods were lost in these lineages but retained SS. This immediate proximity of the branch site and the 3Ј end in vertebrates and plants (29, 39). When present, such introns are in T. vaginalis introns, an extremely rare trait of yeast and located in the same position in divergent orthologues and are metazoan introns, promotes speculation that their spliceosomes referred to as old introns. Unique positions for introns in the genes may combine the steps of branch site and 3Ј SS recognition. of specific groups of vertebrates and plants have also provided Interestingly, an essential spliceosomal protein, Prp8, that inter- evidence that introns can be added to genes (49), so-called new acts with both 5Ј and 3Ј canonical motifs and the branch site (34), introns (39, 48, 50). Consistent with the hypothesis that the 5Ј is the only reported spliceosomal protein in T. vaginalis (18). positional bias of introns in T. vaginalis genes is indicative of Thus, the close juxtaposition of the branch site and the 3Ј SS may secondary loss of introns, we have shown that many of the retained reflect a simplified mechanism for spliceosomal assembly in T. introns in T. vaginalis genes are old. The position of Ϸ60% of

4434 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0407500102 Vanˇa´cˇova´ et al. Downloaded by guest on September 28, 2021 examined T. vaginalis introns are conserved in orthologous genes We thank Drs. Doug Black, Richard Stefl, and Nancy Sturm, as well from at least one other eukaryotic lineage. This observation and the as our colleagues in the lab, for helpful discussions and critical requirement of canonical motifs for splicing in this deep-branching comments on the manuscript, and we thank Reviewer no. 1 for eukaryote indicate that introns and the spliceosomal machinery insightful comments. This work was supported by the National Insti- tute of Allergy and Infectious Diseases, National Institutes of Health necessary for their excision were present in the ancestral eukaryote. Grant R01AI30537, and a Co-Operative Agreement Award This places spliceosomes on the roster of essential defining eukary- (U01AI050913) to The Institute for Genomic Research to sequence otic features and establishes the presence of introns in all examined the genome of T. vaginalis. P.J.J. is a Burroughs Wellcome Scholar in eukaryotes. Molecular Parasitology.

1. Reed, R. (2000) Curr. Opin. Cell. Biol. 12, 340–345. 26. Sharp, P. A. (1994) Cell 77, 805–815. 2. Black, D. L. (2000) Cell 103, 367–370. 27. Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M. A. 3. Sharp, P. A. & Burge, C. B. (1997) Cell 91, 875–879. & Barrell, B. (2000) Bioinformatics 16, 944–945. 4. Belfort, M., Derbyshire, V., Parker, M. M., Cousineau, B. & Lambowitz, A. M. 28. Roy, S. W. (2003) Genetica 118, 251–266. (2002) in Mobile DNA II, eds. Craig, N. L., Craigie, R., Gellert, M. & 29. Zhaxybayeva, O. & Gogarten, J. P. (2003) Curr. Biol. 13, 764–766. Lambowitz, A. M. (Am. Soc. Microbiol. Press, Washington, DC), pp. 761–783. 30. Fedorov, A., Merican, A. F. & Gilbert, W. (2002) Proc. Natl. Acad. Sci. USA 5. Bonen, L. & Vogel, J. (2001) Trends Genet. 17, 322–331. 99, 16128–16133. 6. Tarn, W. Y. & Steitz, J. A. (1997) Trends Biochem. Sci. 22, 132–137. 31. Mourier, T. & Jeffares, D. C. (2003) Science 300, 1393. 7. Moore, M. J. (2000) Nat. Struct. Biol. 7, 14–16. 32. Roy, S. W., Fedorov, A. & Gilbert, W. (2003) Proc. Natl. Acad. Sci. USA 100, 8. Ebel, C., Frantz, C., Paulus, F. & Imbault, P. (1999) Curr. Genet. 35, 542–550. 7158–7162. 9. Mair, G., Shi, H., Li, H., Djikeng, A., Aviles, H. O., Bishop, J. R., Falcone, F. H., 33. McArthur, A. G., Morrison, H. G., Nixon, J. E., Passamaneck, N. Q., Kim, U., Gavrilescu, C., Montgomery, J. L., Santori, M. I., et al. (2000) RNA 6, 163–169. Hinkle, G., Crocker, M. K., Holder, M. E., Farr, R., Reich, C. I., et al. (2000) 10. Muchhal, U. S. & Schwartzbach, S. D. (1994) Nucleic Acids Res. 22, 5737–5744. FEMS Microbiol. Lett. 189, 271–273. 11. Tessier, L. H., Paulus, F., Keller, M., Vial, C. & Imbault, P. (1995) J. Mol. Biol. 34. Siatecka, M., Reyes, J. L. & Konarska, M. M. (1999) Genes Dev. 13, 1983–1993. 245, 22–33. 35. Jacquier, A. (1996) Biochimie 78, 474–487. 12. Wilihoeft, U., Campos-Gongora, E., Touzni, S., Bruchhaus, I. & Tannich, E. 36. Deutsch, M. & Long, M. (1999) Nucleic Acids Res. 27, 3219–3228. (2001) Protist 152, 149–156. 37. Sakurai, A., Fujimori, S., Kochiwa, H., Kitamura-Abe, S., Washio, T., Saito, R., 13. Baldauf, S. L., Roger, A. J., Wenk-Siefert, I. & Doolittle, W. F. (2000) Science Carninci, P., Hayashizaki, Y. & Tomita, M. (2002) Gene 300, 89–95. 290, 972–977. 38. Fink, G. R. (1987) Cell 49, 5–6. 14. Horner, D. S. & Embley, T. M. (2001) Mol. Biol. Evol. 18, 1970–1975. 39. Rogozin, I. B., Wolf, Y. I., Sorokin, A. V., Mirkin, B. G. & Koonin, E. V. (2003) 15. Simpson, A. G., Roger, A. J., Silberman, J. D., Leipe, D. D., Edgcomb, V. P., Curr. Biol. 13, 1512–1517. Jermiin, L. S., Patterson, D. J. & Sogin, M. L. (2002) Mol. Biol. Evol. 19, 40. de Souza, S. J., Long, M., Schoenbach, L., Roy, S. W. & Gilbert, W. (1996) Proc. 1782–1791. 93, 16. Nixon, J. E., Wang, A., Morrison, H. G., McArthur, A. G., Sogin, M. L., Loftus, Natl. Acad. Sci. USA 14632–14636. B. J. & Samuelson, J. (2002) Proc. Natl. Acad. Sci. USA 99, 3701–3705. 41. Darnell, J. E. & Doolittle, W. F. (1986) Proc. Natl. Acad. Sci. USA 83, 17. Simpson, A. G., MacQuarrie, E. K. & Roger, A. J. (2002) Nature 419, 270. 1271–1275. 18. Fast, N. M. & Doolittle, W. F. (1999) Mol. Biochem. Parasitol. 99, 275–278. 42. Gilbert, W., Marchionni, M. & McKnight, G. (1986) Cell 46, 151–153. 19. Johnson, P. J. (2002) Proc. Natl. Acad. Sci. USA 99, 3359–3361. 43. Palmer, J. D. & Logsdon, J. M., Jr. (1991) Curr. Opin. Genet. Dev. 1, 470–477. 20. Diamond, L. S. (1957) J. Parasitol. 43, 488–490. 44. Cavalier-Smith, T. (1991) Trends Genet. 7, 145–148. 21. Ortiz, D. & Johnson, P. J. (2003) Mol. Biochem. Parasitol. 128, 43–49. 45. Fedorov, A., Roy, S., Fedorova, L. & Gilbert, W. (2003) Genome. Res. 13, 22. Delgadillo, M. G., Liston, D. R., Niazi, K. & Johnson, P. J. (1997) Proc. Natl. 2236–2241. Acad. Sci. USA 94, 4716–4720. 46. Fedorov, A., Roy, S., Cao, X. & Gilbert, W. (2003) Genome. Res. 13, 1155–1157. 23. Urban, B., Blasig, C., Forster, B., Hamelmann, C. & Horstmann, R. D. (1996) 47. Stoltzfus, A. (2004) Curr. Biol. 14, 351–352. Mol. Biochem. Parasitol. 80, 171–178. 48. Qiu, W. G., Schisler, N. & Stoltzfus, A. (2004) Mol. Biol. Evol. 21, 1252–1263. 24. Konarska, M. M., Grabowski, P. J., Padgett, R. A. & Sharp, P. A. (1985) Nature 49. Logsdon, J. M., Jr., Stoltzfus, A. & Doolittle, W. F. (1998) Curr. Biol. 8, 313, 552–557. 560–563. 25. Burset, M., Seledtsov, I. A. & Solovyev, V. V. (2000) Nucleic Acids Res. 28, 50. Sverdlov, A. V., Rogozin, I. B., Babenko, V. N. & Koonin, E. V. (2003) Curr. 4364–4375. Biol. 13, 2170–2174. EVOLUTION

Vanˇa´cˇova´ et al. PNAS ͉ March 22, 2005 ͉ vol. 102 ͉ no. 12 ͉ 4435 Downloaded by guest on September 28, 2021