Genes Genet. Syst. (2010) 85, p. 319–326

Different status of the gene for ribosomal protein S16 in the chloroplast genome during evolution of the genus Arabidopsis and closely related species

Shradha Roy1†, Minoru Ueda1†‡, Koh-ichi Kadowaki2 and Nobuhiro Tsutsumi1*

1Laboratory of Molecular Genetics, Graduate School of Agricultural and Life Sciences, University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
2Genetic Diversity Department, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan

(Received 8 May 2010, accepted 20 November 2010)

The ribosomal protein S16 (RPS16), the product of the rps16 gene, is generally encoded in the chloroplast genomes of flowering plants. However, it has been reported that chloroplast-encoded RPS16 in mono- and dicotyledonous plants has been substituted by the product of nuclear-encoded rps16, which was transferred from the mitochondria to the nucleus before the early divergence of angiosperms. Current databases show that the chloroplast-encoded rps16 has become a pseudogene in four species of the Brassicaceae (Aethionema grandiflorum, Arabis hirsuta, Draba nemorosa, and Lobularia maritima). Further analysis of Arabidopsis thaliana and its close relatives has shown that pseudogenization has also occurred via the loss of splicing capacity (Arabidopsis thaliana and Olimarabidopsis pumila). In contrast, the spliced product of chloroplast-encoded rps16 is observed in close relatives of Arabidopsis thaliana (Arabidopsis arenosa, Arabidopsis lyrata, and Crucihimalaya lasiocarpa). In this study, we identified the different functional status of rps16 in several chloroplast genomes in the genus Arabidopsis and its close relatives. Our results strongly suggest that nuclear- and chloroplast-encoded rps16 genes coexisted for at least 126 million years. We raise the possibility of the widespread pseudogenization of rps16 in the angiosperm chloroplast genomes via the loss of its splicing capacity, even when the rps16 encoded in the chloroplast genome is transcriptionally active.

Key words: Brassicaceae, Gene substitution, group II intron, splicing

Edited by Toru Terachi
* Corresponding author. E-mail: [email protected]
† These authors contributed equally to this work.
‡ Present address: Department of Botany, Graduate School of Science, Kyoto University, Oiwake-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8502, Japan
Note: Sequences obtained in this work have been deposited in DDBJ/EMBL/GenBank with accession numbers: AB467267-AB467269, AB514944. A supplementary figure for this article is at http://wwwsoc.nii.ac.jp/gsj3/sup/85(5)Roy

INTRODUCTION

It is generally accepted that the mitochondria and chloroplasts are descendants of α-proteobacteria and cyanobacteria, respectively. Most of the genes in the ancestral endosymbionts either have been translocated to the nuclear genome of the host cell or have been lost during evolution after the initial endosymbiotic event. Alternatively, nuclear genes with different organellar origins have substituted for organelle-encoded genes, resulting in the present structures of the chloroplast and mitochondrial genomes. Since then, the components of each organelle have changed relative to those of their progenitors. The details of this process are complex and still unclear (Gray, 1992; Rand et al., 2004; Timmis et al., 2004).

At present, more than 70 chloroplast genomes of angiosperms have been completely sequenced, and this number is increasing. Although the number of genes and their order are generally conserved among angiosperm chloroplast genomes, exceptional gene losses have been identified (e.g., rpl33 in Phaseolus vulgaris (Guo et al., 2007) and Vigna radiata (Tangphatsornruang et al., 2010), infA in almost all rosid species (Millen et al., 2001), rpl32 in the Populus genus (Okumura et al., 2006; Steane, 2005), rps16 in Medicago truncatula (Saski et al., 2005), P. vulgaris (Guo et al., 2007), Cicer arietinum (Jansen et al., 2008), V. radiata (Tangphatsornruang et al., 2010), and the Populus genus (Okumura et al., 2006; Steane, 2005)). It is also possible that genes that have been transferred or substituted recently in evolution remain in the chloroplast genome as remnants when gene transfer or substitution has occurred. These residual genes should provide insight into the mechanistic process by which they were transferred or substituted (for example, infA (Millen et al., 2001), rpl32 (Cusack and Wolfe, 2007; Ueda et al., 2007), and rps16 (Ueda et al., 2008)). RPS16 is essential for cell viability (Persson et al., 1995) and plays an important role in the assembly of the 30S subunit (Held and Nomura, 1975) in Escherichia coli. It was also demonstrated that chloroplast-encoded (cp) rps16 has been replaced with the nuclear-encoded rps16, which had originated from the mitochondria, in M. truncatula and Populus alba, and that the present status of the rps16 gene substitution in the chloroplast genome corresponds to the intermediate stage in most angiosperms (Ueda et al., 2008). This is interesting evidence of the process of gene substitution in eukaryotes.

A database analysis revealed that the cp rps16 genes in four Brassicaceae species have already become pseudogenes in that the gene contains a deletion within its coding sequence in Arabis hirsuta, a nonsense mutation in Aethionema grandiflorum, and the complete loss of the second exon in Draba nemorosa and Lobularia maritima. This suggests that this type of pseudogenization may also have occurred in the other Brassicaceae chloroplast genomes. In the Brassicaceae, the phylogenetic positions of Arabidopsis and its close relatives and the estimated times of their divergence have been analyzed in detail (Clauss and Koch, 2006). If the rps16 in the chloroplast genome of each species were poised at a different functional stage, each pseudogenization event must have occurred after the divergence of Arabidopsis and its close relatives, suggesting that cp rps16 pseudogenization has occurred at different times during evolution.

The chloroplast genome encodes rps16 in Arabidopsis thaliana (Sato et al., 1999). However, an expression analysis revealed that the cp rps16 is a pseudogene in this species because the splicing of the group II intron is defective. Further analysis has shown that this pseudogenization of cp rps16 has occurred in another closely related species, because a similar splicing defect has been detected in Olimarabidopsis pumila. In contrast, active rps16 genes, confirmed by their nucleotide sequences and intron splicing capacities, have been detected in the chloroplast genomes of other closely related species, including Arabidopsis arenosa, Arabidopsis lyrata, and Crucihimalaya lasiocarpa. These observations indicate that the functional status of rps16 in these chloroplast genomes is poised at different stages, and that the cp rps16 gene substitution has been completed in Arabidopsis thaliana and O. pumila.

In this study, we demonstrate dynamic differences in the status of the cp rps16 genes in Arabidopsis and its close relatives, and provide an example of the evolutionary time required for a gene to complete the gene substitution process. Our data indicate that the chloroplast and nuclear-encoded rps16 genes coexisted for more than 126 million years (myr) in Arabidopsis thaliana and O. pumila, and raise the possibility that the pseudogenization of cp rps16 occurred via a splicing defect, despite the detectable coding sequence of rps16 in the plant chloroplast genome.

The correlation of gene loss with self-compatibility is discussed in this study because in the genus Arabidopsis and its close relatives, out-crossing plants (Arabidopsis arenosa and Arabidopsis lyrata) tend to have retained, and inbreeding plants (Arabis hirsuta, Arabidopsis thaliana, C. lasiocarpa, and O. pumila) tend to have lost, the cp rps16 in their chloroplast genomes.

MATERIALS AND METHODS

Plant material and growth conditions
Arabidopsis thaliana (L.) Heynh. cv. Columbia, Arabidopsis arenosa (The Arabidopsis Biological Resource Center (ABRC), stock number CS3901), Arabidopsis lyrata subsp. lyrata (ABRC stock number CS22696), Arabis hirsuta (RIKEN BioResource Center stock number SJO02300), O. pumila (The Nottingham Arabidopsis Stock Centre (NASC), stock number N3700), C. lasiocarpa (NASC stock number N6191), and Brassica oleracea var. acephala (The Genebank stock number 87899 in National Institute of Agrobiological Sciences (Tsukuba, Japan)) were used as plant materials. O. pumila and C. lasiocarpa once belonged to the Arabidopsis genus (Clauss and Koch, 2006). Plants were grown in a growth chamber at 22°C with a long day photoperiod (18 h/6 h light/dark cycle).

Database analysis
All sequence similarity searches (BLASTN or 2BLAST) were conducted using the internet-based Basic Local Alignment Search Tool (BLAST) available through the National Center for Biotechnology Information (NCBI). Each search was conducted using default parameters. The ClustalX software (ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/) (Thompson et al., 1997) was used for the multiple alignment of nucleotide sequences. The alignments were alphabetically ordered in Figs. 1B and 1C.

DNA cloning of RT-PCR and the genomic PCR products
Total DNA and RNA were isolated from the green leaves of plants using the Isoplant II kit (Nippon Gene, Tokyo, Japan) and the RNeasy Plant Mini kit (Qiagen, Valencia, CA, USA), respectively. Isolated RNA was further purified by incubation with RNase-free DNase I (TaKaRa Bio, Ohtsu, Japan), according to the manufacturer's instructions. First-strand cDNA synthesis was performed using 1 μg of total RNA, 0.5 units of SuperScript III reverse transcriptase, and 20 pmol of random hexamer primer mix (Invitrogen, Carlsbad, CA, USA). The resultant cDNAs were used as templates to amplify the cp rps16 cDNA. Primer pairs for the amplification of genes on the chloroplast genome were as follows: cp rps16 for Arabidopsis arenosa, Arabidopsis lyrata, Arabidopsis thaliana, B. oleracea, O. pumila, and C. lasiocarpa, P1 (5′-GGAATGTTATGGTAAAACTTCG-3′) and P2 (5′-CCGTAAAAATCCCAGCCTTC-3′) primers; cp rps16 for Arabis hirsuta, P3 (5′-CTATGGTAAAACTTAATGCGG-3′) and P4 (5′-CCGTAAAAAGTCCTGCCTTC-3′) primers; cp ndhB for all plants, P5 (5′-GCTTCAAGAAATAGTGAATGG-3′) and P6 (5′-GAGTAATAGCAATGAGATTCC-3′) primers. P1, P2, P5, and P6 primers were designed from Arabidopsis thaliana chloroplast genome sequence information (GenBank accession no. AP000423). P3 and P4 primers were designed from Arabis hirsuta chloroplast genome sequence information (GenBank accession no. AP009369). The positions of these primer pairs are shown in Fig. 3A. Amplification of the genomic sequence of cp rps16 was performed using LA Taq DNA polymerase (TaKaRa Bio) and primers P7 (5′-TTCGTACCTTAGGAGGAATAC-3′) and P8 (5′-TCCATTCATAGAGAAAGAAAAGG-3′), which were designed from conserved sequences between Arabidopsis thaliana and Arabis hirsuta. Total DNA was denatured at 94°C for 5 min and was amplified using 35 PCR cycles at 94°C for 30 s, 55°C for 30 s, and 72°C for 2 min. PCR products were directly sequenced. RT–PCR was conducted using KOD plus DNA polymerase (TOYOBO, Osaka, Japan). cDNAs were denatured at 98°C for 2 min and were amplified using 35 PCR cycles at 98°C for 15 s, 53°C for 30 s, and 68°C for 1.5 min. The PCR products were subsequently cloned into the pCR-Blunt II-TOPO vector by using Zero Blunt TOPO PCR Cloning Kit (Invitrogen). Independent clones were sequenced using universal primers.

RESULTS

Pseudogenization of rps16 in the chloroplast genomes of some Brassicaceae
The rps16 in the angiosperm chloroplast genome contains two exons and one group II intron (Fig. 1A). To obtain the sequences encoding rps16 in the Brassicaceae chloroplast genomes, we conducted a BLASTN search using the sequence of the cp rps16 gene of Arabidopsis thaliana as the query. The results identified 11 complete Brassicaceae chloroplast genomes (GenBank accession numbers: Aethionema grandiflorum, AP009367; Arabis hirsuta, AP009369; Barbarea verna, AP009370; Brassica rapa subsp. pekinensis, AC189190; Capsella bursa-pastoris, AP009371; Crucihimalaya wallichii, AP009372; D. nemorosa, AP009373; Lepidium virginicum, AP009374; L. maritima, AP009375; Nasturtium officinale, AP009376; O. pumila, NC_009267) and the partial chloroplast genome, including rps16, of Sinapis alba (GenBank accession number: X13609) (Neuhaus et al., 1989). A comparison of these sequences revealed that the rps16 genes in the Aethionema grandiflorum, Arabis hirsuta, D. nemorosa, and L. maritima chloroplast genomes have become pseudogenes. In the chloroplast genomes of Arabis hirsuta and D. nemorosa, there is a 10-bp deletion within the first exon of rps16, leading to a frameshift (Fig. 1B). Deletions of 9 bp and 1 bp (Supplementary Fig. 1) and an insertion of 6 bp were also found in the second exon (Fig. 1D) of rps16 in the Arabis hirsuta chloroplast genome relative to the conserved sequences of the genus Arabidopsis. In the chloroplast genomes of both D. nemorosa and L. maritima, the loss of the entire second exon and part of the intron of rps16 was observed (Supplementary Fig. 1). The nucleotide sequence of rps16 in the Arabidopsis arenosa chloroplast genome, including the intron, is 1,161 bp long. The genomic sequences of rps16 in D. nemorosa and L. maritima are similar (434 bp and 468 bp, respectively) to the Arabidopsis arenosa chloroplast rps16 genome sequence (Supplementary Fig. 1). In Aethionema grandiflorum, the loss of 18 C-terminal amino acids in RPS16 is deduced by the occurrence of a nonsense mutation in the second exon (Supplementary Fig. 1). Arabis is phylogenetically close to Arabidopsis thaliana (Koch et al., 2000). Therefore, it is possible that the pseudogenization of cp rps16 observed in Arabis hirsuta also occurs in the Arabidopsis lineage.

Fig. 1. Alignment of the nucleotide sequences of the chloroplast-encoded (cp) rps16 genes of the Brassicaceae. (A) Schematic representation of the exon–intron structure of the cp rps16 of Arabidopsis thaliana. White boxes and an oblique line indicate exons and the intron, respectively. (B) Comparison of the first exons and the 5′ splice site consensus sequences of the Brassicaceae. The black line pinpoints the 5′ consensus splice sites. Black and gray backgrounds indicate nucleotide sequences that are conserved among 16 and among more than 12 sequences, respectively. (C) Comparison of the sequences around the 3′ splice site in the Brassicaceae. The black line pinpoints the 3′ splice site consensus sequences. Black and gray backgrounds indicate nucleotide sequences that are conserved among 14 and 13 sequences, respectively. (D) Comparison of the sequences around the presumed stop codon of rps16 among Arabidopsis-related species. Underlining indicates the presumed translation stop codon in Arabidopsis lyrata and other species, except Arabidopsis thaliana. Black and gray backgrounds indicate nucleotide sequences that are conserved among six and five sequences, respectively. In all panels, dashes indicate gaps that have been inserted to increase the sequence similarities. The numbers in the upper part of the white boxes in A or in the alignments in B–D indicate the positions of the coding sequences corresponding to rps16 in Arabidopsis arenosa. Plants encoding a pseudo-rps16 gene in the chloroplast genome are denoted with ψ in B–D. Abbreviations: Aar, Arabidopsis arenosa; Agr, Aethionema grandiflorum; Ahi, Arabis hirsuta; Aly, Arabidopsis lyrata; Ath, Arabidopsis thaliana; Bra, Brassica rapa; Bve, Barbarea verna; Cbu, Capsella bursa-pastoris; Cla, Crucihimalaya lasiocarpa; Cwa, Crucihimalaya wallichii; Dne, Draba nemorosa; Lma, Lobularia maritima; Lvi, Lepidium virginicum; Nof, Nasturtium officinale; Opu, Olimarabidopsis pumila; and Sal, Sinapis alba.

Comparison of the genomic sequences of cp rps16 in the genus Arabidopsis and its close relatives
To assess whether the pseudogenization of cp rps16 in the chloroplast genome is widespread among the genus Arabidopsis and its close relatives (Arabidopsis arenosa, Arabidopsis lyrata, O. pumila, and C. lasiocarpa; their phylogenetic relationships are shown in Fig. 2), the genomic sequences of cp rps16 were compared among these species. The complete chloroplast genome sequences are available for Arabidopsis thaliana and O. pumila. For the other species, the sequences around cp rps16 were obtained by PCR amplification and were analyzed in this study (GenBank accession numbers: Arabidopsis arenosa, AB467268; Arabidopsis lyrata, AB467269; C. lasiocarpa, AB467267). The sequences encoding cp rps16 in Arabidopsis arenosa, Arabidopsis lyrata, O. pumila, and C. lasiocarpa showed high similarity to that of cp rps16 of Arabidopsis thaliana (the whole-sequence alignments are shown in Supplementary Fig. 1). An open reading frame (ORF) encoding 88 amino acids of RPS16 was found in the chloroplast genomes of Arabidopsis arenosa, O. pumila, and C. lasiocarpa (Fig. 1D). In Arabidopsis lyrata, a 1-bp insertion at position 239 of the coding sequence of the cp rps16 caused a predicted translational stop codon 19 bp upstream from the conserved translational stop codon in the close relatives of Arabidopsis, and the ORF thus encodes 82 amino acids (Fig. 1D). Therefore, it is likely that both nuclear- and chloroplast-encoded rps16 genes exist in Arabidopsis arenosa, Arabidopsis lyrata, Arabidopsis thaliana, C. lasiocarpa, and O. pumila.

Fig. 2. Phylogenetic diagram of the relationships among Brassicaceae species. The phylogenetic relationships among Brassicaceae species are derived from Beilstein et al. (2006), Clauss and Koch (2006), and Koch et al. (2000). The numbers indicate the estimated divergence times predicted by Clauss and Koch (2006). Abbreviations in splicing status: +, spliced; –, unspliced; nt, not tested. Abbreviations for the status of rps16 in the chloroplast genome: ○, intact gene; Ψ, pseudogene; nt, not tested. Abbreviations for the mating systems: SC, self-compatible; SI, self-incompatible. Abbreviation in the phylogenetic diagram: Mya, million years ago.
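The indel arithmetic reported in the two Results subsections above can be checked with a short Python sketch. It is an illustration only, not the authors' analysis: the helper names are invented here, the toy sequence is hypothetical (it is not an rps16 sequence), and the 19-bp figure is reproduced under the assumption that the distance is counted between the two stop codons in the coordinates of the Arabidopsis lyrata sequence, that is, including the inserted base.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def causes_frameshift(indel_bp: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_bp % 3 != 0


def codons_before_stop(cds: str) -> int:
    """Count sense codons from the first base up to the first in-frame stop."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    raise ValueError("no in-frame stop codon found")


def stop_shift_bp(intact_aa: int, truncated_aa: int, inserted_bp: int) -> int:
    """Distance (bp) from the premature stop to the conserved stop, measured
    in the coordinates of the sequence carrying the insertion (an assumption
    about how the 19 bp is counted)."""
    return (intact_aa - truncated_aa) * 3 + inserted_bp


# Indel lengths reported for the Arabis hirsuta gene: the 10-bp and 1-bp
# deletions shift the frame; the 9-bp deletion and 6-bp insertion do not.
shifts = {n: causes_frameshift(n) for n in (10, 9, 1, 6)}

# Hypothetical toy CDS: a 1-bp insertion after the start codon exposes an
# earlier in-frame stop codon and shortens the ORF.
toy = "ATGGCTGATAACGTTTAA"
toy_ins = toy[:3] + "C" + toy[3:]

print(shifts)
print(codons_before_stop(toy), codons_before_stop(toy_ins))
print(stop_shift_bp(88, 82, 1))
```

With the values given in the text (88 amino acids for the intact ORF, 82 for Arabidopsis lyrata, and a 1-bp insertion), the last call returns 19, which agrees with the stated 19 bp under the counting assumption above.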

Confirmation of the splicing of cp rps16 transcripts in Arabidopsis and its closely related species
Group II introns are further divided into subgroups IIA and IIB. The cp rps16 contains a subgroup IIB intron (Michel et al., 1989). The strictly conserved primary consensus splicing sequences of group II introns are GUGYG and AY (Y: C or U) at the 5′ and 3′ splice sites, respectively (Lehmann and Schmidt, 2003; Michel and Ferat, 1995; Michel et al., 1989). Further analysis revealed that the consensus sequence at the conserved 5′ splice site of the group II intron of cp rps16 is disrupted in Arabidopsis thaliana, where it is changed from GUGYG to GUACG (Fig. 1B). In Arabis hirsuta and other species closely related to Arabidopsis, the 5′ and 3′ splice site consensus sequences are conserved. This suggests that the cp rps16 gene may have become a pseudogene as the result of a splicing defect in the group II intron of Arabidopsis thaliana.

To confirm the splicing of cp rps16 transcripts in Arabidopsis-related species and B. oleracea as the outgroup, we performed RT-PCR on RNA extracted from these plants and cloned and sequenced the cDNAs. The deduced sequence of chloroplast RPS16 showed only 31% amino acid identity to nuclear-encoded RPS16 (AT5G56940; dual-targeted protein to the mitochondria and chloroplasts) in Arabidopsis thaliana. Moreover, no significant nucleotide similarity between their cDNAs was detected. The splicing products of cp rps16 were detected in Arabidopsis arenosa, Arabidopsis lyrata, B. oleracea, and C. lasiocarpa, and the sizes of the amplified products were all 243 bp (GenBank accession numbers: Arabidopsis arenosa, AB467268; Arabidopsis lyrata, AB467269; B. oleracea, AB514944; C. lasiocarpa, AB467267). The detection of ndhB splicing products (320 bp) in all species was used as a control to check the efficacy of first-strand cDNA preparation (Fig. 3B). The RT-PCR result for B. oleracea showed that the splicing of rps16 occurred before the divergence of the Arabidopsis lineage in Brassicaceae species. Sequencing the RT-PCR products and genomic sequences of the cp rps16 genes from Arabidopsis lyrata, Arabidopsis arenosa, and C. lasiocarpa revealed that the consensus splice site sequences (5′, GUGYG and 3′, AY) of the group II intron were functional. In contrast, only the primary transcript of cp rps16 was detected in Arabidopsis thaliana (1,110 bp), as predicted (Fig. 3B). The pseudogenization of cp rps16 in Arabidopsis thaliana was also considered in the supplemental data of Asakura et al. (2008). Our results, combined with theirs, strongly suggest that the pseudogenization of the cp rps16 was initiated in Arabidopsis thaliana because the intron of cp rps16 lost its splicing activity. Furthermore, only the primary transcript of cp rps16 was detected in Arabis hirsuta (1,095 bp) and O. pumila (1,147 bp) (Fig. 3B), although the 5′ and 3′ splice site consensus sequences of their cp rps16 genes are conserved (Figs. 1B and 1C). These data strongly suggest that the pseudogenization of the cp rps16 has also been initiated in at least O. pumila through a defect in the splicing of the gene transcript.

Fig. 3. Confirmation of the occurrence of intron splicing in cp rps16 transcripts among Arabidopsis-related species. (A) Schematic representation of the exon–intron structures of the chloroplast-encoded rps16 and ndhB genes of Arabidopsis thaliana. White boxes and oblique lines indicate exons and introns, respectively. The two arrows (which are not drawn to scale) show the positions of the primer pairs used for the RT–PCR shown in Fig. 3B. (B) RT–PCR amplification of the chloroplast-encoded rps16 and ndhB genes from Brassica oleracea (lane 1), Arabidopsis lyrata (lane 2), Arabidopsis arenosa (lane 3), Arabidopsis thaliana (lane 4), O. pumila (lane 5), C. lasiocarpa (lane 6), and Arabis hirsuta (lane 7). The ndhB gene was used as the control to check the efficacy of the first-strand cDNA preparation. The bands represent size markers in increments of 100 bp (range, 100–1,000 bp).

DISCUSSION

Timing of the cp rps16 gene substitution
Our analysis of the functional status of cp rps16 in Arabidopsis and its close relatives suggests that the pseudogenization of cp rps16 occurred independently in them, as in four Brassicaceae species, after their divergence. To the best of our knowledge, no other study has reported that the status of an organelle-encoded gene differs within the same genus, as shown in this study. Therefore, no report has defined the period from the onset of the substitution of an organelle-encoded gene by a nuclear gene to the occurrence of the disruption of that gene in the organellar genome. Because pseudogenization occurred independently in the chloroplast genomes investigated, the evolutionary timing of the gene loss from the chloroplast genome by gene substitution can be accurately estimated.

The estimated time of divergence of Arabidopsis thaliana and all the remaining Arabidopsis species was 3.0–5.8 million years ago (mya) (Clauss and Koch, 2006). Moreover, the time of divergence of the Arabidopsis and Olimarabidopsis (Crucihimalaya) species is estimated to have been 10–14 mya (Clauss and Koch, 2006). This suggests that the independent pseudogenization of the cp rps16 in Arabidopsis thaliana and O. pumila by dysfunctional splicing occurred within the last 5.8 myr and 14 myr, respectively (Fig. 2). The occurrence of gene substitution implies the emergence of a nuclear-encoded counterpart of the organellar protein. The onset of cp rps16 gene substitution is minimally estimated at 140–150 mya (Ueda et al., 2008). Considering the time of divergence of Arabidopsis thaliana and O. pumila, the nuclear genome gained an rps16 copy ~140 mya, and the chloroplast and nuclear copies have coexisted (perhaps redundantly) since then. However, in the last 5.8–14 myr, the cp rps16 copies have become recognizable pseudogenes in Arabidopsis thaliana and O. pumila.

This suggests that the process of complete gene substitution of cp rps16 lasted for over 126 myr in Arabidopsis thaliana and O. pumila. Thus, our work pinpoints the evolutionary period from the onset to the completion of this gene substitution event.

The possibility of widespread cp rps16 pseudogenization
The 5′ splicing consensus sequence of cp pseudo-rps16 is conserved in Aethionema grandiflorum, D. nemorosa, O. pumila, and Arabis hirsuta, but not in L. maritima. Only in L. maritima has it changed from GUGYG to GUGCA (Fig. 1B). This indicates that this pseudogenization event was initiated by nucleotide substitutions within an intron, as well as at the 5′ splicing consensus sequence, although the disruption of the 5′ splicing consensus in Arabidopsis thaliana is considered to define its pseudogenization. So far, the CRS2, CFM3, and WTF1 proteins have been demonstrated to be necessary for the splicing of the cp rps16 subgroup IIB intron in angiosperms (Kroeger et al., 2009). It is possible that their functions in promoting the splicing of the cp rps16 intron have been impaired. Therefore, the pseudogenization of cp rps16 might be attributable not only to nucleotide changes within its intron but also to the impairment of nuclear-encoded proteins.

rps16 is encoded in the chloroplast genomes of three Arabidopsis-related species (Arabidopsis arenosa, Arabidopsis lyrata, and C. lasiocarpa), and the C-terminal amino acid residues of RPS16 vary among these species (Arabidopsis arenosa and C. lasiocarpa, 88 aa; Arabidopsis lyrata, 82 aa). Similar C-terminal variations in RPS16 are found in other angiosperms (Arabidopsis thaliana, 78 aa; Oryza sativa, Nicotiana tabacum, and Zea mays, 85 aa; Solanum lycopersicum, 88 aa) because independent deletions have occurred in each species (Kahlau et al., 2006). C-terminal variations in cp RPS16 may imply the beginning of pseudogenization. The presence and splicing of cp rps16 have been confirmed in Glycine max and Lotus japonicus in legumes (Ueda et al., 2008). However, Doyle et al. (1995) have demonstrated, using slot-blot hybridization, that the independent loss of the cp rps16 has occurred in several leguminous plants, as in the Brassicaceae.

When all these data are considered, it is highly likely that the pseudogenization of cp rps16 is widespread and has occurred independently in several angiosperm species, even when an active rps16 seems to be encoded in the chloroplast genome. Our study also indicates that the creation of a phylogenetic tree using only one gene is unreliable and misrepresents phylogenetic relationships because a pseudogene does not always reflect the phylogenetic position of each species, as is true for cp rps16.

Possible correlation between self-compatibility (inbreeding) and the loss of an organelle-encoded gene in the presence of a functional counterpart in the nuclear genome
Why does the loss of the rps16 from the chloroplast genome seem to have accelerated in evolutionarily recent times in the Arabidopsis lineage? Self-incompatibility is ancestral in the Brassicaceae, and self-compatible plants have emerged independently during the divergence of the Brassicaceae (Mitchell-Olds et al., 2005). When Brassicaceae plants are compared in terms of their self-compatibility (Hall et al., 2002), those plants with pseudo-rps16 in their chloroplast genomes (Arabidopsis thaliana, Arabis hirsuta, and O. pumila) are found to be self-compatible (inbred) (Fig. 2). Aethionema grandiflorum (http://www.sunnygardens.com/garden_plants/aethionema/aethionema_0029.php), D. nemorosa (Hall et al., 2002), and L. maritima (Picóa and Retana, 2001) are also self-compatible plants. Conversely, plants with a probably intact rps16 in their chloroplast genomes are self-incompatible (outcrossed) (Arabidopsis arenosa (Chen, 2007), Arabidopsis lyrata, and B. oleracea (Hall et al., 2002), and S. alba (Ford and Kay, 1985; Melzer et al., 1990)). The expressed sequence tags (ESTs) encoding the spliced rps16 genes of the chloroplast genomes of Raphanus raphanistrum (a self-incompatible plant; GenBank accession numbers: EY915189 and EY911083; Sampson, 1967) and Brassica napus (self-compatible plant; GenBank accession number: EV076332; Okamoto et al., 2007) are available in the NCBI EST database. B. napus, Barbarea verna (http://www.pfaf.org/user/Plant.aspx?LatinName=Barbarea verna), C. bursa-pastoris (Hintz et al., 2006), C. lasiocarpa (Tague, 2001), C. wallichii (Hall et al., 2002), L. virginicum (Lemen, 1980), and N. officinale (Manton, 1935) are self-compatible, and self-compatible plants tend to lose the rps16 from their chloroplast genomes, whereas self-incompatible plants tend to retain the rps16 in their chloroplast genomes (Fig. 2). Six complete legume chloroplast genomes have been reported to date (G. max, L. japonicus, M. truncatula, P. vulgaris, C. arietinum, and V. radiata) (Jansen et al., 2008; Tangphatsornruang et al., 2010). A similar tendency to lose the cp rps16 in self-compatible plants is observed in these species. The loss of the functional rps16 from the chloroplast genome has been demonstrated in M. truncatula, P. vulgaris, C. arietinum, and V. radiata, and these are all self-compatible plants (Choi et al., 2004; Guo et al., 2007; Khan et al., 2004; Toker et al., 2006). The epistatic model (Wade and Goodnight, 2006) predicts that selfing reproduction maintains cyto–nuclear gene combinations and increases the response to the selection of epistatic combinations, potentially encouraging gene transfer. Conversely, outcrossing tends to break apart adaptive cyto–nuclear gene combinations, potentially reducing the amount of adaptive transfer in outcrossing lineages (Brandvain and Wade, 2009). In short, it predicts that the level of inbreeding is positively associated with the level of functional transfer (and loss) of organellar genes (Brandvain et al., 2007), so self-compatibility may correlate positively with the loss of rps16 from the chloroplast genome. Although a strong correlation between the loss of the cp rps16 and self-compatibility has been observed, the mechanism is completely unknown.

We thank the Arabidopsis Biological Resources Center (ABRC), the Genebank of National Institute of Agrobiological Sciences (Tsukuba, Japan), and Nottingham Arabidopsis Stock Center (NASC) for providing the plant seeds. Arabis hirsuta was provided by the RIKEN BRC through the National BioResource Project of the MEXT, Japan. The authors are grateful to Drs. T. Shikanai and Y. Nishimura for their support. This work was partly supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan (grant 15208001) to N.T. and K.K., and by a Japan Society for the Promotion of Science Research Fellowship for Young Scientists to M.U.

REFERENCES

Asakura, Y., Bayraktar, O. A., and Barkan, A. (2008) Two CRM protein subfamilies cooperate in the splicing of group IIB introns in chloroplasts. RNA 14, 2319–2332.
Beilstein, M. A., Al-Shehbaz, I. A., and Kellogg, E. A. (2006) Brassicaceae phylogeny and trichome evolution. Am. J. Bot. 93, 607–619.
Brandvain, Y., and Wade, M. J. (2009) The functional transfer of genes from the mitochondria to the nucleus: the effects of selection, mutation, population size and rate of self-fertilization. Genetics 182, 1129–1139.
Brandvain, Y., Barker, M. S., and Wade, M. J. (2007) Gene co-inheritance and gene transfer. Science 315, 1685.
Chen, Z. J. (2007) Genetic and epigenetic mechanisms for gene expression and phenotypic variation in plant polyploids. Annu. Rev. Plant Biol. 58, 377–406.
Choi, H. K., Kim, D., Uhm, T., Limpens, E., Lim, H., Mun, J. H., Kalo, P., Penmetsa, R. V., Seres, A., Kulikova, O., et al. (2004) A sequence-based genetic map of Medicago truncatula and comparison of marker colinearity with M. sativa. Genetics 166, 1463–1502.
Clauss, M. J., and Koch, M. A. (2006) Poorly known relatives of Arabidopsis thaliana. Trends Plant Sci. 11, 449–459.
Cusack, B. P., and Wolfe, K. H. (2007) When gene marriages don't work out: divorce by subfunctionalization. Trends Genet. 23, 270–272.
Doyle, J. J., Doyle, J. L., and Palmer, J. D. (1995) Multiple independent losses of two genes and one intron from legume chloroplast genomes. Syst. Bot. 20, 272–294.
Ford, M. A., and Kay, Q. O. N. (1985) The genetics of incompatibility in Sinapis arvensis L. Heredity 54, 99–102.
Gray, M. W. (1992) The endosymbiont hypothesis revisited. Int. Rev. Cytol. 141, 233–357.
Guo, X., Castillo-Ramirez, S., Gonzalez, V., Bustos, P., Fernandez-Vazquez, J. L., Santamaria, R. I., Arellano, J., Cevallos, M. A., and Davila, G. (2007) Rapid evolutionary change of common bean (Phaseolus vulgaris L) plastome, and the genomic diversification of legume chloroplasts. BMC Genomics 8, 228.
Hall, A. E., Fiebig, A., and Preuss, D. (2002) Beyond the Arabidopsis genome: opportunities for comparative genomics. Plant Physiol. 129, 1439–1447.
Held, W. A., and Nomura, M. (1975) Escherichia coli 30 S ribosomal proteins uniquely required for assembly. J. Biol. Chem. 250, 3179–3184.
Hintz, M., Bartholmes, C., Nutt, P., Ziermann, J., Hameister, S., Neuffer, B., and Theissen, G. (2006) Catching a ‘hopeful monster’: shepherd’s purse (Capsella bursa-pastoris) as a model system to study the evolution of development. J. Exp. Bot. 57, 3531–3542.
Jansen, R. K., Wojciechowski, M. F., Sanniyasi, E., Lee, S. B., and Daniell, H. (2008) Complete plastid genome sequence of the chickpea (Cicer arietinum) and the phylogenetic distribution of rps12 and clpP intron losses among legumes (Leguminosae). Mol. Phylogenet. Evol. 48, 1204–1217.
Kahlau, S., Aspinall, S., Gray, J. C., and Bock, R. (2006) Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes. J. Mol. Evol. 63, 194–207.
Khan, S., Wani, M. R., and Parveen, K. (2004) Induced genetic variability for quantitative traits in Vigna radiata (L.) Wilczek. Pak. J. Bot. 36, 845–850.
Koch, M. A., Haubold, B., and Mitchell-Olds, T. (2000) Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol. Biol. Evol. 17, 1483–1498.
Kroeger, T. S., Watkins, K. P., Friso, G., Van Wijk, K. J., and Barkan, A. (2009) A plant-specific RNA-binding domain revealed through analysis of chloroplast group II intron splicing. Proc. Natl. Acad. Sci. USA 106, 4537–4542.
Lehmann, K., and Schmidt, U. (2003) Group II introns: structure and catalytic versatility of large natural ribozymes. Crit. Rev. Biochem. Mol. Biol. 38, 249–303.
Lemen, C. (1980) Allocation of reproductive effort to the male and female strategies in wind-pollinated plants. Oecologia 45, 156–159.
Manton, I. (1935) The cytological history of Watercress (Nasturtium officinale R. Br.). Mol. Gen. Genet. 69, 132–157.
Melzer, S., Majewski, D. M., and Apel, K. (1990) Early changes in gene expression during the transition from vegetative to generative growth in the long-day plant Sinapis alba. Plant Cell 2, 953–961.
Michel, F., and Ferat, J. L. (1995) Structure and activities of group II introns. Annu. Rev. Biochem. 64, 435–461.
Michel, F., Umesono, K., and Ozeki, H. (1989) Comparative and functional anatomy of group II catalytic introns--a review. Gene 82, 5–30.
Millen, R. S., Olmstead, R. G., Adams, K. L., Palmer, J. D., Lao, N. T., Heggie, L., Kavanagh, T. A., Hibberd, J. M., Gray, J. C., Morden, C. W., et al. (2001) Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell 13, 645–658.
Mitchell-Olds, T., Al-Shehbaz, I. A., Koch, M., and Sharbel, T. F. (2005) Crucifer evolution in the post-genomic era. In: Plant Diversity and Evolution: Genotypic and Phenotypic Variation in Higher Plants (ed.: Henry, R. J.), pp. 119–137. CAB International, Oxfordshire, UK.
Neuhaus, H., Scholz, A., and Link, G. (1989) Structure and expression of a split chloroplast gene from mustard (Sinapis alba): ribosomal protein gene rps16 reveals unusual transcriptional features and complex RNA maturation. Curr. Genet. 15, 63–70.
Okamoto, S., Odashima, M., Fujimoto, R., Sato, Y., Kitashiba, H., and Nishio, T. (2007) Self-compatibility in Brassica napus is caused by independent mutations in S-locus genes. Plant J. 50, 391–400.
Okumura, S., Sawada, M., Park, Y. W., Hayashi, T., Shimamura, M., Takase, H., and Tomizawa, K. (2006) Transformation of poplar (Populus alba) plastids and expression of foreign proteins in tree chloroplasts. Transgenic Res. 15, 637–646.
Persson, B. C., Bylund, G. O., Berg, D. E., and Wikstrom, P. M. (1995) Functional analysis of the ffh-trmD region of the Escherichia coli chromosome by using reverse genetics. J. Bacteriol. 177, 5554–5560.
Picó, F. X., and Retana, J. (2001) The flowering pattern of the perennial herb Lobularia maritima: an unusual case in the Mediterranean basin. Acta Oecol-Int. J. Ecol. 22, 209–217.
Rand, D. M., Haney, R. A., and Fry, A. J. (2004) Cytonuclear coevolution: the genomics of cooperation. Trends Ecol. Evol. 19, 645–653.
Sampson, D. R. (1967) Frequency and distribution of self-incompatibility alleles in Raphanus raphanistrum. Genetics 56, 241–251.
Saski, C., Lee, S. B., Daniell, H., Wood, T. C., Tomkins, J., Kim, H. G., and Jansen, R. K. (2005) Complete chloroplast genome sequence of Glycine max and comparative analyses with other legume genomes. Plant Mol. Biol. 59, 309–322.
Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E., and Tabata, S. (1999) Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. 6, 283–290.
Steane, D. A. (2005) Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae). DNA Res. 12, 215–220.
Tague, B. W. (2001) Germ-line transformation of Arabidopsis lasiocarpa. Transgenic Res. 10, 259–267.
Tangphatsornruang, S., Sangsrakru, D., Chanprasert, J., Uthaipaisanwong, P., Yoocha, T., Jomchai, N., and Tragoonrung, S. (2010) The chloroplast genome sequence of mungbean (Vigna radiata) determined by high-throughput pyrosequencing: Structural organization and phylogenetic relationships. DNA Res. 17, 11–22.
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Timmis, J. N., Ayliffe, M. A., Huang, C. Y., and Martin, W. (2004) Endosymbiotic gene transfer: organelle genomes forge eukaryotic chromosomes. Nat. Rev. Genet. 5, 123–135.
Toker, C., Canci, H., and Ceylan, F. O. (2006) Estimation of outcrossing rate in chickpea (Cicer arietinum L.) sown in autumn. Euphytica 151, 201–205.
Ueda, M., Fujimoto, M., Arimura, S., Murata, J., Tsutsumi, N., and Kadowaki, K. (2007) Loss of the rpl32 gene from the chloroplast genome and subsequent acquisition of a preexisting transit peptide within the nuclear gene in Populus. Gene 402, 51–56.
Ueda, M., Nishikawa, T., Fujimoto, M., Takanashi, H., Arimura, S., Tsutsumi, N., and Kadowaki, K. (2008) Substitution of the gene for chloroplast RPS16 was assisted by generation of a dual targeting signal. Mol. Biol. Evol. 25, 1566–1575.
Wade, M. J., and Goodnight, C. J. (2006) Cyto-nuclear epistasis: two-locus random genetic drift in hermaphroditic and dioecious species. Evolution 60, 643–659.