
Proc. Natl. Acad. Sci. USA Vol. 96, pp. 10267–10271, August 1999

Late changes in spliceosomal introns define clades in vertebrate evolution

BYRAPPA VENKATESH*†, YANA NING‡, AND SYDNEY BRENNER*‡

*Marine Molecular Genetics Laboratory, Institute of Molecular and Cell Biology, 30 Medical Drive, Singapore 117609; and ‡Molecular Sciences Institute, 2168 Shattuck Avenue, Berkeley, CA 94704

Contributed by Sydney Brenner, July 2, 1999

ABSTRACT The evolutionary origin of spliceosomal introns has been the subject of much controversy. Introns are proposed to have been both lost and gained during evolution. If gains or losses of introns are unique events in evolution, they can serve as markers for phylogenetic analysis. We have made an extensive survey of the phylogenetic distribution of seven spliceosomal introns that are in Fugu genes, but not in their mammalian homologues; we show that these introns were acquired by actinopterygian (ray-finned) fishes at various stages of evolution. We have also investigated the intron pattern of the rhodopsin gene in fishes, and show that the four introns found in the ancestral rhodopsin gene were simultaneously lost in a common ancestor of ray-finned fishes. These changes in introns serve as excellent markers for phylogenetic analysis because they reliably define clades. Our intron-based cladogram establishes the difficult-to-ascertain phylogenetic relationships of some ray-finned fishes. For example, it shows that bichirs (Polypterus) are the sister group of all other extant ray-finned fishes.

Abbreviations: Dyst6a and Dyst10a, novel introns found in the Fugu dystrophin gene; Gh4a, fifth intron in the growth hormone gene; Mhc2a, intron 2a in the Mhc class II β-chain gene; Mll25a, intron 25a in the Fugu mixed lineage leukemia gene; RAG1a and RAG1b, novel introns in the RAG1 gene.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AF134595–AF134630; AF134919–AF134976; AF137083–AF137130; AF137132–AF137262; AF142553–AF142564; and AF148142–AF148144).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. §1734 solely to indicate this fact.

†To whom reprint requests should be addressed. E-mail: mcbbv@imcb.nus.edu.sg.

PNAS is available online at www.pnas.org.

Two competing theories have been proposed to explain the origin of spliceosomal introns, which are widespread in eukaryote genomes but absent from prokaryotes. The “introns early theory” states that the introns are ancient and have been lost in different lineages (1, 2). On the other hand, the “introns late theory” maintains that the spliceosomal introns were inserted into the eukaryote genes later in evolution (3–6). Although the distribution of intron phases and the correlation between intron positions and protein module boundaries have been proposed as evidence for the ancient origin of introns (2), the restricted phylogenetic distribution of some introns suggests that they arose late in evolution (7–10).

We and others have identified extra spliceosomal introns in the pufferfish (Fugu) and some other teleosts that are absent from mammals (refs. 11–18; B. Peixoto and S.B., unpublished work). Likewise, extra introns were also found in some mammalian genes (16, 19, 20). These discordant introns could be the result of either loss of ancestral introns or gain of novel introns in different lineages. Because the spliceosomal introns are not self-splicing and are not known to be mobile, the loss or the gain of spliceosomal introns in a lineage is likely to be a unique event, occurring at a specific point in its evolution; hence it might serve as a decisive marker for evolutionary studies. To evaluate and confirm whether the extra vertebrate introns are the result of a loss or gain of introns, we made an extensive survey of the phylogenetic distribution of seven of these introns, including one each from the growth hormone gene (15), the major histocompatibility class II B-chain (MhcII) gene (13) and the mixed lineage leukemia-like (Mll) gene (16), and two each from the dystrophin gene (14) and the RAG1 gene (21). We also analyzed the distribution of introns in the rhodopsin gene, which has been shown to contain introns in mammals (22) and in primitive vertebrates such as lampreys (23) and skates (24), but not in some fishes (25).

METHODS

PCR. Genomic DNA was extracted from fresh or frozen tissues by using standard protocols. Gene fragments were amplified by PCR with AmpliTaq DNA polymerase (Perkin–Elmer) or the Expand Long Template PCR system (Boehringer Mannheim). A typical PCR cycle consisted of an initial denaturing step of 92°C for 2 min, 35 cycles of 92°C for 30 sec, 55°C for 1 min, and 72°C for 2 min, followed by a final elongation step at 72°C for 6 min. For some combinations of primers and templates, higher (60°C) or lower (50°C) annealing temperatures were used to optimize the amplification conditions. The following sense and antisense primers complementary to the exons flanking the intron were used in PCR.

Growth hormone intron 4a (Gh4a), TGYTTYAARAARGAYATG and AGRTANGTYTCNACCT flanking the Fugu growth hormone gene from codon 175 to 177 (15) (the antisense primer includes two bases complementary to AG, which are the invariant 3′ end bases of introns, and thus no PCR product was obtained when the gene lacked the intron).

Mhc II B-chain intron 2a (Mhc2a), TGCWGYGYRTAYGRSTTCTACCC and AGGCTKGKRTGCTCCACCWRRCA (extended primers Tu97 and Tu40) (26).

Mixed lineage leukemia intron 25a (Mll25a), GCNCGNTCNAAYATGTTYTTYGG and ATRTTNCCRCARTCRTCRCTRTT flanking the Fugu Mll gene from codon 3124 to 3376 (16).

Dystrophin intron 6a (Dyst6a), ATGGCNGGNYTNCARCARAC and GCNARNCCRTCRTTCCARCT flanking the human dystrophin gene from codon 134 to 166 (27).

Dystrophin intron 10a (Dyst10a), TAYCARACNGCNYTNGARGA and TGNGTRTGRAAYTGNTCYTT flanking the human dystrophin gene from codon 350 to 376 (27).

RAG1 intron a (RAG1a), AARTTYTCNGANTGGAARTT and ACRTCNACYTTRAANACYTT flanking the zebrafish RAG1 gene from codon 27 to 157 (21).

RAG1 intron b (RAG1b), TTYGCNGARAARGARGARGG and TACATYTTRTGRTAYTGRCT flanking the zebrafish RAG1 gene from codon 456 to 511 (21).

Rhodopsin, CCNTAYGAYTAYCCNCARTAYTA and TTNCCRCARCAYAANGTNGT flanking the rhodopsin gene from codon 30 to 319 (25).
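The primers listed above are written with IUPAC ambiguity codes (Y, R, N, and so on), so each one stands for a pool of concrete oligonucleotides. The following is a minimal illustrative sketch, not part of the original methods, of how the pool size (degeneracy) of a primer can be counted and its members enumerated. The function names `degeneracy` and `expand` are hypothetical helpers introduced here; the two example sequences are the Gh4a primers quoted above.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())

def expand(primer: str):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer.upper())):
        yield "".join(combo)

# Gh4a primer pair as given in the text.
GH4A_SENSE = "TGYTTYAARAARGAYATG"
GH4A_ANTISENSE = "AGRTANGTYTCNACCT"

print(degeneracy(GH4A_SENSE))      # 32 (five two-fold positions)
print(degeneracy(GH4A_ANTISENSE))  # 64 (two two-fold and two four-fold positions)
```

Enumerating with `expand` is only practical for primers of modest degeneracy; for the pairs above the pools are small enough to list in full.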


FIG. 1. Phylogenetic distribution of spliceosomal introns. The phylogeny shown is based on Nelson’s classification of fishes (28). A plus sign (+) indicates presence and a minus sign (−) indicates absence of intron. An asterisk (*) indicates that no PCR fragment was obtained, presumably because of the large size of the intron. The letter “a” indicates a failure to clone by PCR because of the highly variable coding sequence in this region of RAG1. Data from previous studies (Gh4a, refs. 11 and 15; Mll25a, ref. 16; Dyst6a and Dyst10a, ref. 14; Mhc2a, refs. 12, 13, and 29; RAG1a and RAG1b, B. Peixoto and S.B., unpublished work and refs. 21 and 30; Rhod, refs. 22, 24, and 25) are enclosed in parentheses. Gh4a, (15); Mll25a, (16); Dyst6a and Dyst10a, (14); Mhc2a, (13); RAG1a and RAG1b, (21); Rhod, rhodopsin gene (+ represents presence of four introns and − represents absence of all four introns). The rhodopsin gene from the primitive vertebrate (lamprey) contains four introns (23) (not shown) as in the mammalian rhodopsin. RAG1a intron was previously reported to be absent from the rainbow trout (Oncorhynchus mykiss) (30), but our PCR products from the rainbow trout and the brown trout (data not shown) contained this intron.

FIG. 2. Cladogram showing the phylogenetic relationships of fishes, inferred by using the presence or absence of introns as character states. The numbers in the boxes represent introns cloned by us (1, rhodopsin; 2, RAG1b; 3, RAG1a; 4, Gh4a; 5, Mll25a; 6, Dyst10a; 7, Dyst6a; and 8, Mhc2a). B is an extension of the Division Teleostei from A.

The rhodopsin primers amplified the gene when there was no intron present but failed to amplify the gene when introns were present (e.g., Polypterus), presumably because of the large size of the gene that contained introns. The presence of introns in the Polypterus (bichir) rhodopsin gene was subsequently confirmed by amplifying and sequencing the second intron together with its flanking exons by using the primers complementary to the flanking exons (GTNGTNTTYACNTGGATHATGGC and CCRCANGARCAYTGCATNCCYTC). The absence of Gh4a intron from the Torpedo (electric ray), (), and () was confirmed by cloning and sequencing the growth hormone gene fragments spanning exon 3, intron 3, and exon 4 by PCR with primers specific for the genes. Because of the sequence similarity between dystrophin and its related protein utrophin, the dystrophin primers that we used also amplified the corresponding utrophin fragments. Utrophin fragments from all of the fishes were cloned and sequenced, and none contained introns at positions corresponding to Dyst6a and Dyst10a (data not shown).

Cloning and Sequencing. The PCR fragments were cloned into a T-vector (modified pBluescript) and sequenced by using an ABI 373A DNA sequencer (Perkin–Elmer Applied Biosystems). All the PCR fragments were sequenced completely, with the exception of those which contained introns larger than 1.5 kb. Only end sequences (about 350 bp) of such large introns were determined.

RESULTS AND DISCUSSION

By using degenerate PCR primers, we amplified genomic fragments spanning the introns from representatives of most of the major groups of ray-finned fishes; we confirmed the presence or absence of an intron by cloning and sequencing the PCR products. We also cloned the introns from shark/torpedo (Chondrichthyes), which are the common ancestors of ray-finned fishes and the tetrapods. Fig. 1 summarizes our results and shows that most of the intron gain or loss events occur at unique points in the lineage.

The MhcII locus is an apparent exception to this general rule. Sequences without introns and sequences with small introns can always be amplified by PCR. However, if a large intron is present, we often fail to obtain a PCR product. Thus we find MhcII fragments (more than one in each species) with no intron until we encounter paracanthopterygians, for which no product was obtained. We conclude that the paracanthopterygian MhcII has an intron, but it is too large to be amplified. In Mugil (mullet) and some other fishes including the tetraodontoid boxfish (Ostracion), two PCR products were found, one without and one with an intron. In other recent fishes only the intron-containing fragments were found. However, four of the perciforms, seabass (Dicentrarchus), grouper (Epinephelus), blenny (Salarias), and goby (Cryptocentrus), contain only intron-lacking fragments. Because it is common for fishes to have multiple copies of the histocompatibility gene (29, 31, 32), only one of these copies might have acquired an intron, and in later lineages either copy could have been lost. We believe that the losses of the copy with the intron account for the genes without introns in the four perciform fishes.

The late changes in spliceosomal introns clearly define clades in the ray-finned fish lineage and resolve some of the taxonomic problems of this large group of living vertebrates.
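The argument that each intron gain or loss happened once can be phrased as a simple consistency check on presence/absence data: if every change is a unique event on one tree, the sets of taxa carrying each derived state must be either nested or disjoint. The sketch below illustrates that check only; it is not the procedure used to build Fig. 2, the helper names are hypothetical, and the taxa and intron labels are invented placeholders rather than the survey results.

```python
from itertools import combinations

def compatible(a: set, b: set) -> bool:
    """Two binary characters can each change once on the same rooted tree
    if the taxon sets carrying the derived state are nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)

def conflicts(characters: dict) -> list:
    """Return the pairs of characters that cannot both be unique events."""
    return [
        (x, y)
        for (x, a), (y, b) in combinations(characters.items(), 2)
        if not compatible(a, b)
    ]

# Hypothetical toy data: each intron maps to the taxa in which the
# derived state (for example, intron present) was scored.
toy = {
    "intron_1": {"taxon_B", "taxon_C", "taxon_D"},
    "intron_2": {"taxon_C", "taxon_D"},
    "intron_3": {"taxon_D"},
    "intron_4": {"taxon_A", "taxon_D"},  # overlaps the others without nesting
}

print(conflicts(toy))
# [('intron_1', 'intron_4'), ('intron_2', 'intron_4')]
```

A conflicting pair does not by itself say which character is misleading; it only flags that at least one of the two must have changed more than once, as the text proposes for the MhcII copies in the four perciforms.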

Table 1. The size range and phase of introns, and the consensus sequences flanking them

Intron     Size range, kb   Phase   Flanking consensus sequence/intron
Gh4a       0.07–1.08        0       AAG/GT
Mll25a     0.07–1.20        2       TTT/GA
Dyst6a     0.09–4.00        0       CAG/GT
Dyst10a    0.09–1.22        0       GAG/GT
MhcII2a    0.08–0.68        2       CAG/GT
RAG1a      0.09–1.27        2       CAG/GT
RAG1b      0.06–4.1         1       AAG/GC

Although a robust classification of fishes has been developed by using morphological characters, the phylogenetic relationships of many closely related groups of fishes are ambiguous, because of the paucity of synapomorphies and/or mosaic distribution of morphological characters (28, 33). Molecular data also have not been very useful in resolving these relationships (33). The intron-based cladogram inferred by us shows that the bichir (Polypterus) is the sister group of all other extant ray-finned fishes. The presence of a large number of both primitive and derived characters in this bony fish has rendered its phylogenetic position uncertain. The intron pattern in the bichir, in particular the presence of intron RAG1b, which is absent from Chondrichthyes (cartilaginous fishes) and tetrapods but is present in other ray-finned fishes (Actinopterygii), and the presence of rhodopsin introns, which are absent from all other ray-finned fishes but are present in Chondrichthyes and tetrapods, unequivocally place this fish at the bottom of the ray-finned fish lineage.

Our results also clearly establish the relationships among the groups Chondrichthyes, Chondrostei, Neopterygii, and Teleostei (Fig. 2A). The Chondrichthyes are the ancestors of (Chondrostei + Neopterygii + Teleostei). Among actinopterygians, Chondrostei branched off after Polypteriformes (bichirs), and Neopterygii and Teleostei are sister groups. These relationships are in agreement with the morphology-based phylogeny of these fishes (34), and they disagree with the mitochondrial sequence-based phylogenetic tree that had identified Chondrichthyes and Teleostei as sister groups (35). Our cladogram also resolves the interrelationships of protacanthopterygian fishes (Fig. 2B). It shows that Osmeridae evolved after both and salmoniformes (their relative positions are not resolved by our data). Likewise, Galaxiidae branched after Osmeridae. Extensive sampling of fish species belonging to should demarcate the species in which introns Gh4a, Mll25a, and Dyst10a are present, thus providing further insight into their evolutionary relationships.

Our results provide direct evidence that both gain and loss of spliceosomal introns have occurred in evolution. The loss of some or all of the introns from a gene can be explained by the insertion of a reverse-transcribed, partially or completely spliced RNA into the genome. However, the events involved in the gain of spliceosomal introns are not well understood. The consensus sequence flanking five of the seven “late” introns that we have cloned (Table 1) is identical to the previously proposed “proto-splice” site (MAG/R) (36), and thus lends support to the hypothesis that new introns are gained at this site. Based on the sequence similarity between recent introns and preexisting introns, it has been suggested that the new introns were gained through gene conversion events (7) or by transposition of a preexisting intron to a new location (8). We have observed that some of the extra introns found here contain sequences with some degree of homology to the sequences in their 5′ flanking exons (Fig. 3). The Gh4a intron

FIG. 3. Introns that contain sequences corresponding to their 5′ flanking exons. Coding sequences are shown in uppercase letters and intron sequences in lowercase letters. Intron sequences are aligned with their 5′ flanking exons to show the codon sequences (underlined) in the intron. Nucleotide sequences are numbered above the line. Amino acid sequences are shown in single-letter codes.

from Gadus (Atlantic cod) contains sequences corresponding to a stretch of 17 codons with 9 conserved (similar exon sequences are also found in the Gh4a intron from the Arctic cod, Boreogadus; data not shown); Mhc2a from Mugil contains two copies of sequences corresponding to 14 codons, and Mhc2a from Dissostichus contains sequences corresponding to 11 codons with 9 conserved (Fig. 3).

These exons have undergone tandem duplications, and, although the conservation of sequence suggests that the duplications are recent, their existence leads us to propose a hypothesis for the origin of late introns. We suggest that tandem duplications of exons could give rise to novel introns if the sequence contains or acquires the elements required for splicing. Indeed, we note that if the proto-splice site is part of the sequence duplicated, it will automatically provide the correct sequences for the intron boundaries. This mechanism does not require recombination or transposition events acting over long distances. The virtue of this theory (exon duplication) for the origin of late introns is that it can also explain the origin of novel exons for alternative splicing, and it provides a common mechanism for both events.

Genetic events leading to the gain or loss of introns might have occurred initially in a single individual, and they would still be required to be fixed. Thus the ancestral population would be polymorphic for these changes, but could still have constituted a single species. The fixation of the event would take place in the descendants of the polymorphic ancestor regardless of the taxonomic position we now ascribe to it. These descendants thus constitute a monophyletic lineage derived from a single species. Thus the changes in spliceosomal introns are robust cladistic markers for tracing monophyletic lineages and establishing evolutionary relationships. The discordant introns can be easily identified from comparisons of the sequences of orthologous genes from phylogenetically distant taxa such as the Fugu and human; these introns can then be traced in many different species by using PCR methods to identify the phylogenetic branch points.

We thank C. H. Cheng-DeVries, M. Inoue, P. Linser, G. Martinez, G. Mikawa, J. G. Patil, H. S. Pradeep, S. H. San Ling, N. Takamatsu, S. M. Veeranna, Y. Wakamatsu, S. Winkler, and G. Yearsley for fish DNA; and we are grateful to B. H. Tay and B. Y. Goh for technical help in cloning and sequencing.

1. Gilbert, W., Marchionni, M. & McKnight, G. (1986) Cell 46, 151–154.
2. Gilbert, W., de Souza, S. J. & Long, M. (1997) Proc. Natl. Acad. Sci. USA 94, 7698–7703.
3. Palmer, J. D. & Logsdon, J. M. (1991) Curr. Opin. Genet. Dev. 1, 470–477.
4. Cavalier-Smith, T. (1991) Trends Genet. 7, 145–148.
5. Cho, G. & Doolittle, R. F. (1997) J. Mol. Evol. 44, 573–584.
6. Logsdon, J. M. (1998) Curr. Opin. Genet. Dev. 8, 637–648.
7. Hankeln, T., Fried, H., Ebersberger, I., Martin, J. & Schmidt, E. R. (1997) Gene 205, 151–160.
8. Tarrio, R., Rodriguez-Trelles, F. & Ayala, F. J. (1998) Proc. Natl. Acad. Sci. USA 95, 1658–1662.
9. O’Neill, R. J. W., Brennan, F. E., Delbridge, M. L., Crozier, R. H. & Graves, J. A. M. (1998) Proc. Natl. Acad. Sci. USA 95, 1653–1657.
10. Frugoli, J. A., McPeek, M. A., Thomas, T. L. & McClung, C. R. (1998) Genetics 149, 355–365.
11. Agellon, L. B., Davies, S. L., Chen, T. T. & Powers, D. A. (1988) Proc. Natl. Acad. Sci. USA 85, 5136–5140.
12. Figueroa, F., Ono, H., Tichy, H., O’hUigin, C. & Klein, J. (1995) Proc. R. Soc. Lond. Ser. B 259, 325–330.
13. Lim, E. H. & Brenner, S. (1995) Immunogenetics 42, 432–433.
14. Elgar, G. (1994) Ph.D. thesis (Cambridge Univ., Cambridge, U.K.).
15. Venkatesh, B. & Brenner, S. (1997) Gene 187, 211–215.
16. Caldas, C., Kim, M.-H., MacGregor, A., Cain, D., Aparicio, S. & Wiedemann, L. M. (1998) Oncogene 16, 3233–3241.
17. Gottgens, B., Gilbert, J. G. R., Barton, L. M., Aparicio, S., Hawker, K., Mistry, S., Vaudin, M., King, A., Bentley, D., Elgar, G., et al. (1998) Genomics 48, 52–62.
18. Sandford, R., Sgotto, B., Aparicio, S., Brenner, S., Vaudin, M., Wilson, R. K., Chissoe, S., Pepin, K., Bateman, A., Chothia, C., et al. (1997) Hum. Mol. Genet. 6, 1483–1489.
19. Armes, N., Gilley, J. & Fried, M. (1997) Genome Res. 7, 1138–1152.
20. Tassone, R., Villard, L., Clancy, K. & Gardiner, K. (1999) Gene 226, 211–223.
21. Willet, C. E., Zapata, A. G., Hopkins, N. & Steiner, L. A. (1997) Immunogenetics 45, 394–404.
22. Baehr, W., Falk, J. D., Bugra, K., Triantafyllos, J. T. & McGinnis, J. F. (1988) FEBS Lett. 238, 253–256.
23. Zhang, H. & Yokoyama, S. (1997) Gene 191, 1–6.
24. O’Brien, J., Ripps, H. & Al-Ubaidi, M. R. (1997) Gene 193, 141–150.
25. Fitzgibbon, J., Hope, A., Slobodyanyuk, S. J., Bellingham, J., Bowmaker, J. K. & Hunt, D. M. (1995) Gene 164, 273–277.
26. Betz, U. A. K., Mayer, W. E. & Klein, J. (1994) Proc. Natl. Acad. Sci. USA 91, 11065–11069.
27. Nobile, C., Marchi, J., Nigro, V., Roberts, R. G. & Danieli, G. A. (1997) Genomics 45, 421–424.
28. Nelson, J. S. (1994) Fishes of the World (Wiley, New York).
29. Ono, H., Klein, D., Vincek, V., Figueroa, F., O’hUigin, C., Tichy, H. & Klein, J. (1992) Proc. Natl. Acad. Sci. USA 89, 11886–11890.
30. Hansen, J. D. & Kaattari, S. L. (1995) Immunogenetics 42, 188–195.
31. Ono, H., O’hUigin, C., Vincek, V. & Klein, J. (1993) Immunogenetics 38, 223–234.
32. McConnell, T. J., Godwin, U. B. & Cuthbertson, B. J. (1998) Immunol. Rev. 166, 294–300.
33. Stepien, C. A. & Kocher, T. D. (1997) in Molecular Systematics of Fishes, eds. Kocher, T. D. & Stepien, C. A. (Academic, San Diego), pp. 1–11.
34. De Pinna, M. C. C. (1996) in Interrelationships of Fishes, eds. Stiassny, M. L. J., Parenti, L. R. & Johnson, G. D. (Academic, San Diego), pp. 147–162.
35. Rasmussen, A. & Arnason, U. (1999) Proc. Natl. Acad. Sci. USA 96, 2177–2182.
36. Dibb, N. J. & Newman, A. J. (1989) EMBO J. 8, 2015–2021.