
Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch

Year: 2020

Towards a plant model for enigmatic U‐to‐C RNA editing: the organelle genomes, transcriptomes, editomes and candidate RNA editing factors in the hornwort Anthoceros agrestis

Gerke, Philipp ; Szövényi, Péter ; Neubauer, Anna ; Lenz, Henning ; Gutmann, Bernard ; McDowell, Rose ; Small, Ian ; Schallenberg‐Rüdinger, Mareike ; Knoop, Volker

Abstract: Hornworts are crucial to understand the phylogeny of early land plants. The emergence of ‘reverse’ U‐to‐C RNA editing accompanying the widespread C‐to‐U RNA editing in plant chloroplasts and mitochondria may be a molecular synapomorphy of a hornwort–tracheophyte clade. C‐to‐U RNA editing is well understood after identification of many editing factors in models like Arabidopsis thaliana and Physcomitrella patens, but there is no plant model yet to investigate U‐to‐C RNA editing. The hornwort Anthoceros agrestis is now emerging as such a model system. We report on the assembly and analyses of the A. agrestis chloroplast and mitochondrial genomes, their transcriptomes and editomes, and a large nuclear gene family encoding pentatricopeptide repeat (PPR) proteins likely acting as RNA editing factors. Both organelles in A. agrestis feature high amounts of RNA editing, with altogether > 1100 sites of C‐to‐U and 1300 sites of U‐to‐C editing. The nuclear genome reveals > 1400 genes for PPR proteins with variable carboxyterminal DYW domains. We observe significant variants of the ‘classic’ DYW domain, in the meantime confirmed as the cytidine deaminase for C‐to‐U editing, and discuss the first attractive candidates for reverse editing factors given their excellent matches to U‐to‐C editing targets according to the PPR‐RNA binding code.

DOI: https://doi.org/10.1111/nph.16297

Posted at the Zurich Open Repository and Archive, University of Zurich ZORA URL: https://doi.org/10.5167/uzh-178955 Journal Article Published Version

The following work is licensed under a Creative Commons: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License.

Originally published at: Gerke, Philipp; Szövényi, Péter; Neubauer, Anna; Lenz, Henning; Gutmann, Bernard; McDowell, Rose; Small, Ian; Schallenberg‐Rüdinger, Mareike; Knoop, Volker (2020). Towards a plant model for enigmatic U‐to‐C RNA editing: the organelle genomes, transcriptomes, editomes and candidate RNA editing factors in the hornwort Anthoceros agrestis. New Phytologist, 225(5):1974-1992. DOI: https://doi.org/10.1111/nph.16297

Towards a plant model for enigmatic U-to-C RNA editing: the organelle genomes, transcriptomes, editomes and candidate RNA editing factors in the hornwort Anthoceros agrestis

Philipp Gerke1, Péter Szövényi2, Anna Neubauer2, Henning Lenz3, Bernard Gutmann4, Rose McDowell5, Ian Small5, Mareike Schallenberg-Rüdinger1 and Volker Knoop1

1Institut für Zelluläre und Molekulare Botanik (IZMB), University of Bonn, Kirschallee 1, 53115 Bonn, Germany; 2Department of Systematic and Evolutionary Botany, University of Zurich, Zollikerstr. 107, 8008 Zürich, Switzerland; 3IBG-2: Plant Sciences, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany; 4EditForce Inc., West Zone #429, Kyushu University, 744 Motooka, Nishi-Ku, Fukuoka 819-0395, Japan; 5ARC Centre of Excellence in Plant Energy Biology, University of Western Australia at Crawley, Perth, WA 6009, Australia

Author for correspondence: Volker Knoop
Tel: +49 228 73 6466
Email: [email protected]

Received: 29 June 2019
Accepted: 20 October 2019

New Phytologist (2019)
doi: 10.1111/nph.16297

Key words: Anthoceros agrestis, chloroplast DNA, DYW domain, mitochondrial DNA, PPR proteins, PPR-RNA binding code, reverse U-to-C RNA editing, RNA editing factors.

Summary

Hornworts are crucial to understand the phylogeny of early land plants. The emergence of ‘reverse’ U-to-C RNA editing accompanying the widespread C-to-U RNA editing in plant chloroplasts and mitochondria may be a molecular synapomorphy of a hornwort–tracheophyte clade. C-to-U RNA editing is well understood after identification of many editing factors in models like Arabidopsis thaliana and Physcomitrella patens, but there is no plant model yet to investigate U-to-C RNA editing. The hornwort Anthoceros agrestis is now emerging as such a model system.

We report on the assembly and analyses of the A. agrestis chloroplast and mitochondrial genomes, their transcriptomes and editomes, and a large nuclear gene family encoding pentatricopeptide repeat (PPR) proteins likely acting as RNA editing factors.

Both organelles in A. agrestis feature high amounts of RNA editing, with altogether > 1100 sites of C-to-U and 1300 sites of U-to-C editing. The nuclear genome reveals > 1400 genes for PPR proteins with variable carboxyterminal DYW domains.

We observe significant variants of the ‘classic’ DYW domain, in the meantime confirmed as the cytidine deaminase for C-to-U editing, and discuss the first attractive candidates for reverse editing factors given their excellent matches to U-to-C editing targets according to the PPR-RNA binding code.

Introduction

The phylogenetic placement of hornworts (Anthocerotophyta) among land plants (Embryophyta) is still contentious (e.g. Cox, 2018). A consensus seemed to have been established that hornworts are sister to vascular plants (tracheophytes), suggested by the gain of a shared mitochondrial intron absent in the other two bryophyte clades, the liverworts and the mosses (Groth-Malonek et al., 2005). A hornwort–tracheophyte (HT) clade was subsequently well supported by concatenated, organellar ‘phylogenomic’ sequence data sets (Qiu et al., 2006). Recent phylogenetic analyses using nuclear transcriptome data sets, however, suggest alternative scenarios for the phylogeny of early embryophytes (Puttick et al., 2018).

One intriguing character that would provide a further molecular synapomorphy of the HT clade is ‘reverse’ U-to-C RNA editing in plant mitochondria and chloroplasts. Whereas C-to-U RNA editing is present in all major land plant clades, including the liverworts and the mosses, no evidence has ever been found for U-to-C editing in these two clades (Malek et al., 1996; Freyer et al., 1997; Steinhauser et al., 1999; Rüdinger et al., 2012). However, there is no doubt that reverse U-to-C RNA editing is abundantly present in hornworts (Yoshinaga et al., 1996; Steinhauser et al., 1999; Kugita et al., 2003b), in ferns (Vangerow et al., 1999; Guo et al., 2015; Knie et al., 2016), and, among lycophytes, at least in the order Isoetales (Grewe et al., 2011).

The mechanism of C-to-U-type RNA editing is reasonably well understood, mainly owing to the characterization of many RNA editing factors in model systems such as the flowering plants Arabidopsis thaliana and Oryza sativa and in the moss model system Physcomitrella patens (Barkan & Small, 2014; Ichinose et al., 2014; Schallenberg-Rüdinger & Knoop, 2016). By now, c. 80 site-specific RNA editing factors have been characterized, recently summarized in the database EdiFacts, an addition to the PREPACT service for the analysis of plant-type RNA editing (Lenz et al., 2018). These site-specific RNA editing factors are unique RNA-binding pentatricopeptide repeat (PPR) proteins featuring additional carboxyterminal domains called E1, E2, and DYW (Cheng et al., 2016). Their upstream arrays of PPRs in editing factors are of the ‘PLS-type’ featuring classical 35 amino acid P-type repeats

© 2019 The Authors. New Phytologist © 2019 New Phytologist Trust. New Phytologist (2019), www.newphytologist.com. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

along with shorter (S-type) and longer (L-type) variants (Lurin et al., 2004; Cheng et al., 2016). PPR arrays are fundamental for specific binding to transcripts in a one-PPR-per-ribonucleotide manner, and the essentials of a PPR-RNA recognition code with the fifth (5) and the last (L) amino acid of P-type and S-type PPRs recognizing individual nucleotides in the RNA target have been identified (Barkan et al., 2012; Takenaka et al., 2013; Yagi et al., 2013; Kobayashi et al., 2019; Yan et al., 2019).

Given its evident similarity to known cytidine deaminases, including important zinc ion (Zn2+)-binding motifs, the terminal DYW domain is the prime candidate to carry the enzymatic activity to convert cytidines into uridines (Salone et al., 2007; Iyer et al., 2011; Boussardon et al., 2014; Hayes et al., 2015; Wagoner et al., 2015; Ichinose & Sugita, 2018; Oldenkott et al., 2019). The upstream PPR stretch for RNA recognition linked in cis to a downstream E1, E2, and the DYW domain is evident for all editing factors in the model moss P. patens. Thanks to the simplicity of this plant model, all organelle editing sites in the moss have been assigned to their corresponding DYW-type editing factors (Ichinose et al., 2013; Schallenberg-Rüdinger et al., 2013; Sugita et al., 2013; Ichinose et al., 2014; Schallenberg-Rüdinger & Knoop, 2016). However, the setup of organelle RNA editing is evidently more complex in flowering plants, where truncated proteins require interactions with DYW domains supplied in trans, frequently mediated by extra helper proteins (e.g. NUWA and multiple organellar RNA editing factor (MORF)/RNA-editing factor interacting protein (RIP) proteins) in much more complex editosomes (Takenaka et al., 2012; Bentolila et al., 2012; Boussardon et al., 2012; Sun et al., 2013, 2015, 2016; Zehrmann et al., 2015; Diaz et al., 2017; Bayer-Csaszar et al., 2017; Andres-Colas et al., 2017; Guillaumot et al., 2017; Sandoval et al., 2019).

In contrast to C-to-U editing, we have no idea yet about the mechanisms of reverse U-to-C RNA editing. The main reason is that the editomes of the aforementioned model systems and those of other flowering plants seem to be entirely devoid of U-to-C RNA editing (Edera et al., 2018; Lenz et al., 2018). We could not corroborate occasional reports of reverse RNA editing in angiosperms (P. Gerke, V. Knoop, unpublished findings) and consider it likely that U-to-C RNA editing is phylogenetically restricted to hornworts, lycophytes, and ferns. Hence, the investigation of reverse RNA editing calls for a new model organism from one of the latter three plant clades. For this purpose, we consider the hornworts to be the most attractive candidates, assuming that, independent of their exact phylogenetic position among the bryophytes, they are phylogenetically closest to the evolutionary origins of U-to-C RNA editing.

Towards that goal, we report here on the assembly of the organelle genomes of Anthoceros agrestis, on the accompanying transcriptome and editome studies, and on the first analyses of the vastly extended and surprisingly diversified nuclear gene family of ‘DYW-type’ PPR proteins in that hornwort. We speculate that U-to-C RNA editing has originated from the more ancient and widespread C-to-U editing, using the same mechanisms for RNA target recognition linked to a biochemical enzyme variant, possibly converting a deaminase into a transaminase. Given the likely earlier evolutionary origin of plant C-to-U RNA editing among land plants it is suggestive that PPR proteins remain at the core of target recognition also for sites of U-to-C editing. We present the first preliminary candidates for potential U-to-C RNA editing factors to be investigated in future functional studies in A. agrestis as an emerging new model system in plant molecular biology.

Materials and Methods

DNA/RNA extraction and sequencing

Both RNA and DNA were extracted from the A. agrestis BONN strain described previously (Szövényi et al., 2015). DNA samples (100 ng) were used to prepare paired-end DNA-sequencing libraries using the Nextera XT library preparation kit (Illumina Inc., San Diego, CA, USA) and each sequenced on one-third of a MiSeq flow cell (250 bp) at the Functional Genomic Center Zurich (FGCZ) as described. Raw reads were assembled using the A5-MISEQ pipeline (Coil et al., 2015), specially designed for paired-end MiSeq reads and small genomes. Raw DNA reads were deposited in the European Nucleotide Archive and are available under study accession no. PRJEB8683. To identify sites undergoing RNA editing in the organellar transcripts, we extracted total RNA from 4-wk-old gametophyte tissues as described (Szövényi et al., 2015). RNA was processed using the RiboMinus Plant Kit for RNA-Seq (Thermo Fisher Scientific) to deplete ribosomal RNAs (rRNAs) and used to prepare a stranded RNA-sequencing (RNA-seq) library (TruSeq mRNA library kit) that was paired-end sequenced (150 bp) on 1/4th lane of an Illumina HiSeq4000 machine at FGCZ. Raw RNA-seq reads were deposited in the European Nucleotide Archive and are available under study (run) accession no. PRJEB33107 (ERR3383408).

Organelle genome assembly

Assembly of next-generation sequencing (NGS) raw sequence data (ERR771108) was carried out using MEGAHIT software (Li et al., 2016) with stepwise increase of k-mer values up to 141 bp to assemble sequence contigs. Mitochondrial and chloroplast contigs were initially identified with BLASTN searches using available organelle genomes as queries. Mitochondrial contigs were characterized by MEGAHIT ‘multi’ values reflecting coverage between 105 and 337, whereas those of chloroplast origin reached higher values of up to 1947 for the inverted repeat (IR) regions. Gaps between contigs were due to difficult microrepeat or homopolymer sequences in intergenic regions, which were filled by targeted PCRs in both organelles. The chloroplast and mitochondrial genomes of A. agrestis were submitted under NCBI/GenBank accession nos. MK087646 and MK087647, respectively.

Determination of RNA editing events

DNA reads (ERR771108) and newly generated RNA reads were trimmed (settings PE phred33 ILLUMINACLIP:TruSeq3PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36) with TRIMMOMATIC v.0.35 (Bolger et al.,


2014) and mapped against the organelle genomes using GSNAP (Wu et al., 2016). JACUSA (Piechotta et al., 2017) was used to determine RNA–DNA differences among the two mapping files generated. To identify RNA editing sites we set thresholds of coverage by at least 30 reads and an editing efficiency of at least 5%. Care was given to problematic cases like RNA editing close to exon–intron borders, mapping to pseudogene fragments, or mismapping to rRNA sequences from the other organelle. RNA editing was independently determined for selected cases by targeted reverse transcription (RT)-PCR as discussed later.

Identification of candidate RNA editing factors

An updated version of the PPR finder tool (http://ppr.130.95.176.97.xip.io/fasta/) based on the recent reassessment of PPR-types, E1, E2, and DYW domains (Cheng et al., 2016), was used to identify proteins encoding those domains in the A. agrestis genome assembly. A total of 3089 PPR proteins were identified, of which 1464 were selected as being of an ‘E+’ type, revealing at least the beginning of a DYW domain with the characteristic PG box or variants thereof at its amino terminus. Amino acids at positions 5 and L (last) were extracted from the PPR repeats for evaluation of candidate targets using the core rules of the PPR-RNA recognition code (Barkan et al., 2012).

Phylogenetic tree construction

PPR proteins were aligned with MAFFT (Kuraku et al., 2013) followed by manual adjustment. An alignment region comprising 191 positions including the three C-terminal PPRs P2, L2, and S2, the E1 and E2 domains (Cheng et al., 2016), and extending into the first 20 amino acids of the DYW domain was selected for phylogenetic analysis given the variable downstream truncations of many Anthoceros PLS-type proteins and to avoid noninformative similarities arising from homoplasies within the further upstream PPRs. The set of 1464 proteins initially identified was reduced to 1428 for phylogenetic reconstruction since 36 proteins (including three ‘pure’ DYW proteins) showed degenerations in the C-terminal domains. Maximum likelihood phylogenetic tree construction was done with IQ-TREE v.1.6.5 (Trifinopoulos et al., 2016) using the JTT+F+G4 model identified as best-fitting substitution model with the implemented MODELFINDER (Kalyaanamoorthy et al., 2017). Node reliability was determined from 1000 bootstrap replicates with ultrafast bootstrap approximation UFBoot (Hoang et al., 2018).

PPR target prediction

PPR positions 5 and L were extracted for P and S-type PPRs and translated into weight matrices as input for the TARGETSCAN module recently implemented in PREPACT (Lenz et al., 2018). Arbitrary numerical assignments for matches according to the PPR-RNA recognition code were essentially as outlined previously, but now extended with weights for purine or pyrimidine ambiguities should only position 5 but not position L match according to the code rules, hence resulting in the weight matrix shown in Table 1.

Table 1 Weights assigned to individual PPRs used as the input for the TARGETSCAN feature of PREPACT (Lenz et al., 2018) to scan for candidate RNA targets of individual PPR proteins.

Nucleotide identity weights (%) are given in columns A, C, G and U.

PPR-type   Pos. 5   Pos. L                  A    C    G    U    Position weight (%)
P or S     T OR S   N                       90   0    10   0    200
P or S     T OR S   D                       10   0    90   0    200
P or S     T OR S   NOT (N OR D)            50   0    50   0    200
P or S     N        N OR S                  0    60   0    40   100
P or S     N        D                       0    30   0    70   100
P or S     N        NOT (N OR D OR S)       0    50   0    50   100
L          ANY      ANY                     25   25   25   25   0

Searches for targets were performed using the TARGETSCAN option ‘around known editing sites’ for the newly determined organelle editomes. Initial scores (ISC) for a match between a PPR protein and a candidate target are the sum of percentages for the individual positions. The ISC values were divided by the respective matrix length (ml) to compensate for the length differences of PPR arrays. For the ‘reverse’ assignments of PPR proteins to a given editing site, the rank (Rk) among the top matches for a given protein was additionally considered to result in an ultimate ‘score-of-fit’ SOF = ISC/(ml × Rk). To test for statistical significance in the assignments of different DYW-types to C-to-U vs U-to-C editing sites, a one-proportion Z-test vs equal proportions (0.5) was conducted.

Results

The assembly of the complete chloroplast and mitochondrial genomes of A. agrestis from NGS data was straightforward given their stoichiometric dominance in the total DNA preparations. On average, the respective contigs representing single-copy sequences in the three different plant genomes had coverages of above 1000 for chloroplast sequences, of c. 100–170 for mitochondrial sequences and c. 5–20 for nuclear sequences.

The Anthoceros agrestis plastome

The chloroplast genome of A. agrestis is conserved as typical for land plants, featuring the canonical arrangement of a large single-copy (LSC) region and a small single-copy (SSC) region separated by a pair of inverted repeats (Fig. 1). The total chloroplast DNA (cpDNA) size is 160 760 bp, consisting of an LSC of 107 329 bp and an SSC of 22 167 bp separated by a pair of IRs of 15 632 bp each. Likewise, the A. agrestis cpDNA features an expected gene complement. Noteworthy characteristics, however, are a continuous, large ycf1 reading frame, which is disrupted in the cpDNAs of Anthoceros angustus, formerly referred to as Anthoceros formosae (Kugita et al., 2003a) and Nothoceros aenigmaticus (Villarreal et al., 2013). A continuous ycf1 is, however, also present in the


Fig. 1 The Anthoceros agrestis chloroplast genome. The chloroplast DNA (cpDNA) map was drawn with OGDRAW (Lohse et al., 2013). The A. agrestis cpDNA (deposited in the database under accession no. MK087646) has a typical plant circular plastome structure consisting of a large and a small single-copy region separated by a pair of inverted repeats and the expected gene complement. Gene categories are indicated in the legend. Notable features are the presence of a trnS-CGA gene between psbK and psbI, a continuous long ycf1 reading frame, and the presence of a group I intron in the rrn23 gene for the large ribosomal rRNA. The numbers given for all protein-coding sequences indicate nonsilent C-to-U/U-to-C edits (bold) and additional silent edits after the plus sign. Symbols ‘>’ and ‘<’ indicate creation of start and stop codons by RNA editing, respectively. Genes indicated in red lack messenger RNA editing.

very recently determined cpDNA of Leiosporoceros dussii (Villarreal Aguilar et al., 2018). A group I intron (rrn23i2620g1) in the rrn23 gene for the large chloroplast rRNA is exclusively present in the genus Anthoceros. Both observations are noteworthy, given a more ancestral state of evolution and an extended intron complement, which we also observe for the mitochondrial DNA (mtDNA) in A. agrestis (see below). The IRs are extended to include the 3′-ends of rps12, rps7 and ndhB, which are part of the LSC in other land plants, including Leiosporoceros and Nothoceros. As in the other hornwort plastomes, A. agrestis also lacks the rps15 gene ancestrally located between ycf1 and ndhH in the SSC. We detected a trnS-CGA gene between psbK and psbI, which is also present in Leiosporoceros. Upon closer inspection, we found that this peculiar trnS gene is also conserved in the liverwort Pellia endiviifolia, in the moss Takakia lepidozioides, and in the other hornwort plastomes but had previously been missed in the respective annotations.

The Anthoceros agrestis chloroplast editome

Most events of RNA editing in plant organelles serve to reconstitute conserved amino acid identities in protein coding

sequences. We predicted 1371 candidate sites of RNA editing using the 12 nonangiosperm chloroplast references available with the latest update of PREPACT and the default ‘commons’ threshold level of 70% (Lenz et al., 2018). Analysing the transcriptome data, we ultimately identified 1549 sites of chloroplast RNA editing (636 C-to-U and 913 U-to-C edits) in the A. agrestis chloroplast (Supporting Information Table S1). We use the previously proposed nomenclature to designate RNA editing sites (Rüdinger et al., 2009; Lenz et al., 2010) indicating the affected gene, followed by ‘eU’ or ‘eC’ to indicate creation of uridine or cytidine, respectively, followed by its position and finally, for the dominating type of edits in coding regions, the effected codon sense change. Hence, edit ‘atpAeU2TM’ would create a methionine (AUG) start codon from a genomic (ACG) threonine codon in the atpA mRNA. The chloroplast editome includes 67 editing sites in 5′ and 3′ untranslated regions (UTRs), 27 editing sites in introns, and 124 silent edits in coding regions that could not be predicted. No case of silent editing in either direction of pyrimidine exchange was observed in AGY serine, CGY arginine, GGY glycine, or UGY cysteine codons, fully matching previous observations of only very rare RNA editing immediately downstream of a guanosine (Lenz et al., 2018). Many silent sites and those outside of coding regions are edited to much lower degrees than those in codon-changing positions (Table S1).

The confirmed sites of nonsilent RNA editing fit the predictions very well. Altogether, 275 in-frame stop codons are removed from reading frames by U-to-C editing (Table S1) and six translation start codons (in atpB, atpH, ccsA, cysA, ndhD, and petA) and four stop codons (in ndhC, ndhG, petD, and petG) are created by C-to-U editing (Fig. 1). Only five very short reading frames and psbA are not affected by RNA editing (Fig. 1). The atpA transcript is a prototypical example with its 35 nonsilent (and expected) sites edited to levels between 75% and 98% and the only silent site (atpAeU1068PP) edited to only 10% (Table S1). Similarly, in psbC, 15 expected nonsilent sites are edited to 91–99%, whereas an unexpected nonsilent ‘extra’ edit that does not reconstitute a conserved amino acid (psbCeC734IT) and two others in the 5′-UTR are edited only 5–11% (Table S1). However, a few notable exceptions exist. The psaB gene features two silent sites (psaBeU15FF and psaBeU525LL) edited at high frequencies above 97% (Table S1). By contrast, removal of some stop codons is surprisingly inefficient; for example, 9% at the cysTeC163*Q site or only 6% at the ndhCeC175*Q site. Because these values were barely above our general criteria to reliably identify editing sites in the RNA-seq data (minimum 5% change at minimum 30-fold coverage), we rechecked similar positions with initially undetected edits, confirming that expected edits in rpoB, rpoC2, ycf2, and, most notably, in ycf1 indeed exist but fell below those initial quality thresholds (Table S1).

We verified numerous predicted RNA editing sites in chlL, ccsA, ndhE, petN, psaJ, psbN, rpl14, and rpl22 in A. agrestis that also seem to be necessary in the sister species A. angustus, but were likely missed in an early RT-PCR-based editing study (Kugita et al., 2003b). Altogether, the nonsilent edits identified in coding sequences match the predictions very well. Only 21 strongly predicted editing events remained unconfirmed, 12 nonsilent edits were unexpected, and 52 fell below the initial prediction threshold owing to lack of amino acid sequence conservation at these sites (Table S1).

A particularly interesting finding concerns the divergent RNA editing patterns in the two Anthoceros sister species (Fig. 2). An edited nucleotide in one species may be ‘pre-edited’ with the appropriate pyrimidine already present in the cpDNA of the respective other species. Strikingly, the majority of C-to-U edits are shared between the two taxa, whereas reverse U-to-C editing sites are much more variable, most often with A. agrestis requiring U-to-C editing where A. angustus features a cytidine at genomic level (Fig. 2a). The psbC and psbD genes are prominent examples (Fig. 2b,c). Of 44 nonsilent edits – 17 in psbC and 27 in psbD – only 12 are shared between the two Anthoceros species, whereas 32 occur exclusively in one taxon to reconstitute on a transcript level what is genomically present in the other. Most importantly, 26 of these unique editing sites in these two genes (i.e. 80%) are U-to-C edits in A. agrestis.

RNA editing contributes to classifying organellar genes as functional or as pseudogenes, most notably given the numerous necessary conversions of stops into glutamine or arginine codons through U-to-C editing. An intriguing case is matK, the maturase in the group II intron of the trnK gene, highly conserved among plants. We confirmed 11 edits as predicted, but only at very low frequencies (Table S1). More importantly, we could not identify editing at 12 further sites including necessary removals of eight stop codons, even after rechecking whether they had been missed owing to the initial threshold levels. Hence, we consider matK a pseudogene also in A. agrestis, yet in an earlier state of degeneration than its counterpart in A. angustus (Kugita et al., 2003b). Notably, however, the host gene of the corresponding intron, trnK, is subject to efficient editing as expected. As in A. angustus, the trnK gene features a UUC anticodon at the DNA level that would erroneously match GAR glutamate codons, but which is edited into a UUU anticodon to properly match AAR lysine (K) codons instead.

Whereas RNA editing patterns suggest matK to be a pseudogene, exactly the opposite is observed for the hornwort-specific small orf51 downstream of rps16, lacking sequence similarities to other proteins. Here, we find that three RNA editing sites are shared with A. angustus, supporting a possible functional role.

The Anthoceros agrestis chondrome

The A. agrestis mtDNA of 227 925 bp (Fig. 3) is larger than those in other hornworts previously investigated: 184 908 bp in N. aenigmaticus, earlier designated as Megaceros aenigmaticus (Li et al., 2009), 209 482 bp in Phaeoceros laevis (Xue et al., 2010) or 212 153 bp in L. dussii (Villarreal Aguilar et al., 2018). However, it is surpassed in size by the very recently determined mtDNA sequence of 242 410 bp in the sister taxon A. angustus (Dong et al., 2018). The extended Anthoceros chondrome sizes result from a larger set of introns, larger intron sequences, less pseudogene degeneration, and larger intergenic sequences (IGSs; Fig. 3).
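The editing-site nomenclature and the calling thresholds described for the chloroplast editome can be illustrated with a minimal Python sketch. It is not part of the published pipeline: the regular expression is an assumption derived only from the labels quoted in the text (such as atpAeU2TM or ndhCeC175*Q), it covers coding-region labels only, and the names `parse_site`, `EditSite` and `passes_thresholds` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed label layout: gene, "e", created nucleotide(s), position, codon change.
# Derived from examples in the text; not an official grammar of the nomenclature.
SITE_PATTERN = re.compile(
    r"(?P<gene>\w+?)e(?P<created>[UC]+)(?P<pos>\d+)(?P<before>[A-Z*])(?P<after>[A-Z*])"
)


class EditSite(NamedTuple):
    gene: str
    direction: str  # "C-to-U" if a uridine is created, "U-to-C" if a cytidine is created
    position: int   # nucleotide position in the coding sequence
    codon: int      # 1-based codon number derived from the position
    before: str     # amino acid ("*" for stop) encoded before editing
    after: str      # amino acid ("*" for stop) encoded after editing
    silent: bool    # True when the codon meaning is unchanged


def parse_site(label: str) -> EditSite:
    """Split a coding-region editing-site label into its components."""
    match = SITE_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a recognised editing-site label: {label!r}")
    direction = "C-to-U" if match["created"][0] == "U" else "U-to-C"
    position = int(match["pos"])
    before, after = match["before"], match["after"]
    codon = (position - 1) // 3 + 1
    return EditSite(match["gene"], direction, position, codon, before, after, before == after)


def passes_thresholds(coverage: int, edited_reads: int,
                      min_coverage: int = 30, min_efficiency: float = 0.05) -> bool:
    """Apply the stated criteria: at least 30 reads and at least 5% editing."""
    if coverage < min_coverage:
        return False
    return edited_reads / coverage >= min_efficiency


if __name__ == "__main__":
    for label in ("atpAeU2TM", "ndhCeC175*Q", "psbDeCC586LP", "atpAeU1068PP"):
        print(parse_site(label))
```

Applied to psbDeCC586LP, the sketch places the edit in codon 196, in agreement with the codon number given for this multistep edit in the legend of Fig. 2.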

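The score-of-fit defined under ‘PPR target prediction’ in Materials and Methods, SOF = ISC/(ml × Rk), can be sketched in Python as follows. The lookup reproduces the rows of Table 1. How TARGETSCAN combines nucleotide weights with position weights is not spelled out in the text, so the product used in `initial_score` is an assumption, as is the treatment of amino acid combinations not listed in Table 1; all function names are hypothetical.

```python
from typing import Dict, List, Tuple

NUCLEOTIDES = "ACGU"
PPR = Tuple[str, str, str]  # (PPR type, amino acid at position 5, amino acid at position L)


def table1_weights(ppr_type: str, aa5: str, aa_last: str) -> Tuple[Dict[str, int], int]:
    """Return (nucleotide weights in %, position weight in %) as listed in Table 1."""
    uninformative = dict(zip(NUCLEOTIDES, (25, 25, 25, 25)))
    if ppr_type == "L":
        return uninformative, 0
    if aa5 in ("T", "S"):  # purine-recognising combinations
        if aa_last == "N":
            weights = (90, 0, 10, 0)
        elif aa_last == "D":
            weights = (10, 0, 90, 0)
        else:
            weights = (50, 0, 50, 0)
        return dict(zip(NUCLEOTIDES, weights)), 200
    if aa5 == "N":  # pyrimidine-recognising combinations
        if aa_last in ("N", "S"):
            weights = (0, 60, 0, 40)
        elif aa_last == "D":
            weights = (0, 30, 0, 70)
        else:
            weights = (0, 50, 0, 50)
        return dict(zip(NUCLEOTIDES, weights)), 100
    # Assumption: combinations absent from Table 1 contribute nothing.
    return uninformative, 0


def initial_score(pprs: List[PPR], target: str) -> float:
    """Sum per-position percentages (ISC); weighting by position weight is an assumption."""
    if len(pprs) != len(target):
        raise ValueError("PPR array and target must have the same length")
    total = 0.0
    for (ppr_type, aa5, aa_last), nucleotide in zip(pprs, target):
        weights, position_weight = table1_weights(ppr_type, aa5, aa_last)
        total += weights[nucleotide] * position_weight / 100
    return total


def score_of_fit(isc: float, matrix_length: int, rank: int) -> float:
    """SOF = ISC / (ml x Rk), as defined in the text."""
    return isc / (matrix_length * rank)


if __name__ == "__main__":
    # Hypothetical three-repeat array aligned to a hypothetical target "AUG".
    array = [("P", "T", "N"), ("S", "N", "D"), ("L", "A", "A")]
    isc = initial_score(array, "AUG")
    print(isc, score_of_fit(isc, len(array), rank=1))
```

Dividing by the matrix length keeps long and short PPR arrays comparable, and dividing by the rank penalises a target that is only a lower-ranking match for the protein in question.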

Fig. 2 (a) Venn diagrams summarizing chloroplast nonsilent RNA editing events in coding regions in Anthoceros agrestis (this study) and Anthoceros angustus (Kugita et al., 2003b). Most C-to-U editing sites (441) are shared between the two sister taxa, whereas most U-to-C editing sites (470) are exclusively present in A. agrestis. (b, c) The chloroplast psbC (b) and psbD (c) genes are given as examples for the differences in the editing patterns. Editing overview panels have been created with PREPACT (Lenz et al., 2018). Scales on top indicate codon numbering; a symbol legend is shown at the bottom. Only 12 of altogether 44 editing sites are shared, whereas 32 are species specific (grey shading) to reconstitute codons conserved at the genomic level in the other species. Among the species-specific edits, 26 are U-to-C editing events exclusively occurring in A. agrestis (grey arrows). Codon 196 in psbD is affected by multistep editing psbDeCC586LP (twice U-to-C, purple dot) to convert a leucine into a proline codon (double arrowhead).
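The kind of comparison summarized in Fig. 2a, shared versus species-specific editing sites split by editing direction, can be sketched with plain set operations. The following Python example is only an illustration: the function name `compare_editomes` and the site tuples in the usage part are hypothetical and do not correspond to sites from Table S1.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Site = Tuple[str, int, str]  # (gene, nucleotide position, editing direction)


def compare_editomes(sites_a: Set[Site], sites_b: Set[Site]) -> Dict[str, Counter]:
    """Count shared and species-specific editing sites per editing direction."""

    def by_direction(sites: Set[Site]) -> Counter:
        return Counter(direction for _gene, _position, direction in sites)

    return {
        "shared": by_direction(sites_a & sites_b),
        "only_a": by_direction(sites_a - sites_b),
        "only_b": by_direction(sites_b - sites_a),
    }


if __name__ == "__main__":
    # Hypothetical toy data standing in for two species' editomes.
    species_a = {("geneX", 10, "C-to-U"), ("geneX", 25, "U-to-C"), ("geneY", 7, "U-to-C")}
    species_b = {("geneX", 10, "C-to-U"), ("geneY", 40, "C-to-U")}
    for category, counts in compare_editomes(species_a, species_b).items():
        print(category, dict(counts))
```

A site that appears in only one set corresponds to the ‘pre-edited’ situation described in the text, provided that the other species carries the appropriate pyrimidine at the genomic level; checking that requires the aligned DNA sequences and is outside the scope of this sketch.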

The mitochondrial gene complement

The A. agrestis mtDNA reflects an evolutionarily ancient state with intact genes (atp8, rpl2 and trnS-GCU) that are degenerated or missing in other hornworts (Table 2). We explicitly include here the analysis of intron-encoded maturases in group II introns, which have been somewhat neglected in the previous studies and adapt a systematic maturase nomenclature as suggested (Guo & Mower, 2013). Maturase loci are labelled with mat followed by a hyphen and the systematic name for the respective host intron. Accordingly, the matR locus, conserved in other plants (but degenerated into a pseudogene in hornworts), would become mat-nad1i728g2. The A. agrestis chondrome carries two, most likely functional, maturases in other introns: mat-cox2i373g2 and


[Figure 3: circular map of the Anthoceros agrestis mitochondrial genome with per-gene numbers of RNA editing sites]

Fig. 3 The Anthoceros agrestis mitochondrial genome. As in other bryophytes, the chondrome maps as a circular molecule. The most notable differences to the mitochondrial DNAs (mtDNAs) of other hornworts concern less degenerated pseudogenes (PSX), more recognizable pseudogene traces, and apparently intact genes degenerated in the other species. The A. agrestis mtDNA is deposited in the database under accession no. MK087647. The genome map was drawn with OGDRAW (Lohse et al., 2013). Gene categories are indicated in the legend. The display of RNA editing sites is like in Fig. 1.

mat-cobi787g2c (Fig. 3). We suggest an added ‘c’ in the latter case to indicate that the maturase ORF is continuous with the upstream reading frame of the host gene. Like its cob host gene, mat-cobi787g2c is heavily edited, including several sites evidently reconstituting amino acid positions conserved among maturases (Table S2).

Mitochondrial introns

The mtDNA of A. agrestis sets a record for an embryophyte organelle genome with altogether 44 introns, several of which are absent in the other hornwort genera (Table 2). Four of the A. agrestis mitochondrial introns are of group I (g1) and 40 are of the group II (g2) type, the latter including clearly detectable ‘fossil’ introns in degenerated pseudogenes ccmFC, rps3 and sdh3. All of these are conserved in the very recently determined mtDNA of A. angustus (Dong et al., 2018), for which we suggest a reinterpretation of some gene structures.

Intron atp9i87g2 is conserved in liverworts, mosses, and lycophytes but is absent in Nothoceros and Phaeoceros. Likely owing to the tiny second atp9 exon of only 8 bp, it has been missed in the annotation of the A. angustus mtDNA where it was erroneously


merged with atp9i95g2 (Table 2). Anthoceros features three additional introns in cox1: cox1i249g1, cox1i653g2, and cox1i1116g1. Introns cox1i249g1 and cox1i653g2 at present share no significant similarities with any other sequence in the databases. We also detected the terminal group I intron cox1i1305g1 that has been overlooked in previous hornwort mtDNA studies where – again owing to a tiny exon of only 7 bp – a larger upstream cox1i1298g2 was erroneously annotated previously (Table 2). Intron nad5i881g2, initially detected in Anthoceros punctatus (Beckert et al., 1999), is conserved in the other Anthoceros species and in Leiosporoceros but absent in the other hornworts.

Most interestingly, we could identify rps3i74g2 in the A. agrestis rps3 pseudogene (Table 2), an intron previously detected only in vascular plants. Not listed in the earlier surveys (Dong et al., 2018), we now find that rps3 (including rps3i74g2) and rpl16 are present as pseudogenes in the mtDNA of A. angustus, too. The discovery of rps3i74g2 adds to the candidate synapomorphies of an HT clade. Finally, we identified the trnS-GCU gene including intron trnS-GCUi43g2, conserved in liverworts and overlooked in the previous hornwort mtDNA annotations (Fig. 3; Table 2).

The Anthoceros agrestis mitochondrial editome

We identified 496 events of C-to-U and 403 sites of U-to-C editing in the mitochondrial transcriptome (Table S2). Hence, chloroplast exceeds mitochondrial RNA editing both in total numbers (1549 vs 899) and in the U-to-C/C-to-U ratio (58% vs 46%).

As in the chloroplast, mitochondrial RNA editing mostly reconstitutes codon identities as predicted (Table S2). Among the rare edits in IGSs, four were found in the large and pseudogene-rich IGS between rpl2 and atp9, possibly as a leftover of formerly functional rps3 and rpl16 editing, and 10 edits were identified in the pseudogene-rich IGS between atp8 and nad5, not affecting the (likely) rpl10 pseudogene, however (Fig. 3). Like in the chloroplast editome, we observed surprisingly inefficient stop codon removal in a few cases (e.g. of only 20% for cox2eC55*Q). Most importantly, RNA editing efficiencies consistently remained low for the tatC gene also in independent RT-PCR approaches, and we found only marginal evidence (< 3%) for the necessary stop codon conversion tatCeC511*Q (Table S2), leaving the status of tatC as a functional gene dubious. Editing in the other mitochondrial genes, however, is largely as predicted. We use the cox1 gene here as an example (Fig. 4), and also later in the interest of discussing candidate site-specific editing factors.

Mitochondrial RNA editing in 5′ and 3′-UTRs and in other structural RNAs is low (Table S2). One noteworthy exception is the tRNA-Asp (GUC) encoded by the trnD-GUC gene. Two events of U-to-C editing strengthen base pairing in the dihydrouridine and in the pseudouridine stem by converting G–U into G–C base pairs, but a further candidate C-to-U edit to create an additional base pair in the anticodon stem was not detected (Fig. S1).

Strikingly, we detected many edits in mitochondrial introns. A surprising number of 61 RNA editing sites in both directions of pyrimidine exchange cluster in the characteristic domains V and VI at the end of group II introns (Fig. 5). Most of these editing events in A. agrestis contribute to stabilizing the characteristic structures of these two small domains at the 3′ intron ends and exist in a ‘pre-edited’ state at the mtDNA level in the homologous introns of other hornworts. Notably, chloroplast introns were much less affected despite an overall dominance of chloroplast over mitochondrial RNA editing (Table S1). One U-to-C editing event in consensus position 29 of domain V is shared by 19 group II introns (Figs 5, 10; see later). Later, we discuss a candidate RNA editing factor with an excellent match to the domain V sequences conserved in seven of these introns (Fig. 10; see later).

The diverse Anthoceros agrestis nuclear PPR gene family

The recently redefined HMMER profiles for the different types of PPRs and for the E1, E2 and DYW domains (Cheng et al., 2016) were used to scan the A. agrestis genome assemblies. We identified a total of 3089 loci encoding PPR proteins. Of these, only 145 were of the P-type containing only canonical P-type PPRs, whereas most protein models featured PLS-type PPRs, as typical for hitherto identified C-to-U RNA editing factors. Among the latter, 1480 were initially scored as ‘pure’ PLS-type PPR proteins lacking recognizable additional carboxyterminal domains, 77 with an E1 domain, 447 with an E1 and E2 domain, and 928 as E+ proteins (i.e. C-terminally extended beyond their E2 domain). Only 12 protein models were initially classified as having a canonical DYW domain, including three with no extensive upstream PPR array.

Carefully reinspecting the initially identified loci revealed that many of these in fact feature highly deviant DYW domain variants and/or DYW domain truncations. A phylogenetic tree of the A. agrestis PLS-type proteins reassessed as extending beyond an E2 domain – hence including canonical, divergent, and truncated DYW domain variants – is shown in Fig. 6. All nine RNA editing factors of P. patens and 28 A. thaliana RNA editing factors with full-length DYW domains were used as an outgroup.

The extended DYW protein family sensu lato in A. agrestis falls into distinct clades (Fig. 6) featuring significant deviations from the DYW domains in the hitherto identified C-to-U editing factors (Fig. 7). Only a few A. agrestis proteins have a complete DYW domain with a conserved N-terminal ‘PG box’ including the characteristic PGxSWIE motif (Okuda et al., 2007; Hayes et al., 2013). They are accompanied by clades including C-terminally truncated and many ‘WW-type’ homologues that feature a notable variant of the PG box with the tryptophan (W) in position 5 of the PGxSWIE motif duplicated (Figs 6, 7). Most proteins (734) of the A. agrestis gene family, however, are placed in a superclade of proteins with generally full-length, but highly diverged DYW domains with characteristic differences in conserved amino acid positions (Fig. 7). Proteins of this superclade are characterized by a significantly different PG box (‘KPAxA’) and fall into two subtypes with ‘DRH’ or ‘GRP’ replacing the eponymous DYW tripeptide at the end of the ‘classic’ DYW domain.

Within the KPAxA superclade, the GRP-type proteins dominate in number and seem to be a more recent expansion of the gene family emerging from the DRH-type proteins (Fig. 6).

Table 2 The Anthoceros agrestis mitochondrial DNA gene and intron complement compared with those of Anthoceros angustus (Dong et al., 2018), Nothoceros aenigmaticus (Li et al., 2009), and Phaeoceros laevis (Xue et al., 2010).

Columns in each of the three blocks: gene or intron, followed by A. agrestis, A. angustus, N. aenigmaticus, P. laevis.

atp1 + + + + | nad1 + + + + | rps12 ψ ψ ψ ψ
atp1i805g2 + + + + | nad1i287g2 + + + + | rps13 ψ ψ + +
atp1i1019g2 + + + + | nad1i348g2 + + + + | rps14 +
atp1i1050g2 + + + + | nad1i728g2M + + + + | rps19 ψ ψ
atp4 + + + + | nad2 + + + + | rrn5 + + + +
atp6 + + + + | nad2i709g2 + + + + | rrn26 + + + +
atp6i80g2 + + + + | nad2i1282g2 + + + + | rrn18 + + + +
atp6i439g2 + + + + | nad3 + + + + | sdh3 ψ ψ ψ ψ
atp8 + + ψ ψ | nad3i52g2 + + + + | sdh3i100g2 + + + +
atp9 + + + + | nad3i140g2 + + + + | sdh4 + + + +
atp9i87g2 + + | nad4 + + + + | tatC + + + +
atp9i95g2 + + + + | nad4i461g2 + + + + |
ccmFC ψ ψ ψ ψ | nad4i976g2 + + + + | trnA-UGC + + + +
ccmFCi829g2 + + + + | nad4L + + + + | trnC-GCA + + + +
cob + + + + | nad5 + + + + | trnD-GUC + + + +
cobi420g1 + + + | nad5i230g2 + + + + | trnE-UUC + + + +
cobi787g2Mc + + + + | nad5i881g2 + + | trnF-GAA + + + +
cobi838g2 + + + + | nad5i1455g2 + + + + | trnG-GCC + + + +
cox1 + + + + | nad5i1477g2 + + + + | trnG-UCC ψ ψ ψ ψ
cox1i44g2 + + + + | nad6 + + + + | trnH-GUG + + + +
cox1i150g2 + + + + | nad6i444g2 + + + + | trnI-CAU + + + +
cox1i253g1 + + | nad7 ψ ψ ψ ψ | trnK-UUU + + + +
cox1i653g2 + + | nad9 + + + + | trnL-CAA + + + +
cox1i1116g1 + + | nad9i246g2 + + + + | trnL-UAA + + + +
cox1i1298g2 + + + + | nad9i502g2 + + + + | trnL-UAG + + ψ +
cox1i1305g1 + + + + | rpl2 + + | trnM-CAU + + + +
cox2 + + + + | rpl5 ψ | trnMf-CAU + + + +
cox2i98g2 + + | rpl6 ψ ψ ψ ψ | trnP-UGG + + + +
cox2i281g2 + + + + | rpl10 + + + + | trnQ-UUG + + + +
cox2i373g2M + + + + | rpl16 ψ ψ | trnR-UCU ψ ψ ψ ψ
cox2i381g2 + + + | rps1 ψ ψ ψ ψ | trnS-GCU + + ψ ψ
cox2i564g2 + + + | rps2 ψ ψ ψ | trnS-GCUi43g2 + + ψ ψ
cox2i691g2 + + + | rps3 ψ ψ | trnS-UGA + + ψ ψ
cox3 + + + + | rps3i74g2 + + | trnT-GGU + + + +
cox3i109g2 + + + | rps4 ψ ψ ψ ψ | trnV-UAC ψ ψ ψ ψ
mat-cobi787g2c + + + ψ | rps7 ψ ψ ψ ψ | trnW-CCA + + + +
mat-cox2i373g2 + + ψ + | rps8 ψ | trnY-GUA + + + +
mat-nad1i728g2 ψ ψ ψ ψ | rps11 ψ ψ ψ ψ |

Our assessment on gene and intron complements differs from those reported in the previous studies in several instances, as exemplarily discussed in the main text for selected examples. Pseudogenes are indicated by ψ. Yellow shading highlights group II (g2), orange shading indicates group I (g1) introns. Introns are labelled as previously suggested (Dombrovska & Qiu, 2004; Knoop, 2004), and a nomenclature proposal (Guo & Mower, 2013) is adapted to consistently label intron-encoded maturases (‘mat’) to properly indicate their respective host introns (see main text).
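The intron labels used in the text and in Table 2 follow a regular pattern (host gene, ‘i’, a number, ‘g1’ or ‘g2’, optional trailing letters, and an optional ‘mat-’ prefix for maturases). As a minimal sketch, assuming only that pattern, such labels could be split into their parts in Python as follows. The names `IntronLabel`, `parse_intron_label` and `count_by_group` are hypothetical helpers introduced for illustration, and trailing letters such as ‘M’ or ‘c’ are stored without interpretation.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional


class IntronLabel(NamedTuple):
    maturase: bool  # True when the label carries the 'mat-' prefix
    gene: str       # host gene, e.g. 'cox1' or 'trnS-GCU'
    position: int   # the number following 'i' in the label
    group: int      # 1 for group I (g1), 2 for group II (g2)
    suffix: str     # trailing letters such as 'M' or 'c', kept as-is


# Assumed pattern: [mat-]<gene>i<number>g<1|2>[letters]
_LABEL = re.compile(r"(mat-)?(.+?)i(\d+)g([12])([A-Za-z]*)")


def parse_intron_label(label: str) -> Optional[IntronLabel]:
    """Split an intron label into its parts; return None for plain gene names."""
    m = _LABEL.fullmatch(label)
    if m is None:
        return None
    prefix, gene, pos, group, suffix = m.groups()
    return IntronLabel(prefix is not None, gene, int(pos), int(group), suffix)


def count_by_group(labels: Iterable[str]) -> Dict[int, int]:
    """Count group I and group II introns, skipping genes and 'mat-' entries."""
    counts = {1: 0, 2: 0}
    for label in labels:
        parsed = parse_intron_label(label)
        if parsed is not None and not parsed.maturase:
            counts[parsed.group] += 1
    return counts


if __name__ == "__main__":
    for name in ["cox1i1305g1", "trnS-GCUi43g2", "mat-cobi787g2c", "cox1"]:
        print(name, parse_intron_label(name))
```

Applied to a full list of labels, `count_by_group` gives the group I versus group II tally; plain gene names such as `cox1` are ignored because they do not match the pattern.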


Fig. 4 The cox1 gene as an example for mitochondrial RNA editing in Anthoceros agrestis. The cox1 exons show 80 sites of C-to-U (blue) and U-to-C (red) editing, predictably reconstituting conserved codons. Silent editing sites are shown in green font and codons affected by multiple editing in purple. The output is based on the complementary DNA analysis function in PREPACT (Lenz et al., 2018). Some 50% of A. agrestis edits are shared with lycophytes Isoetes engelmannii and/or Selaginella moellendorffii included in the PREPACT 3.0 editome references. RNA editing of the cox1 start codon simultaneously creates the stop codon for atp4, which overlaps by 4 bp. Top-scoring candidate binding regions are indicated exemplarily for PLS-type PPR proteins of the DRH, GRP and WW types now identified in A. agrestis, which show characteristic differences to canonical DYW domains (see Figs 6, 7).

Other than a joint four-amino-acid deletion, many differences in amino acid conservation (most notably G3A, S5A, D21E, Y47H, S71A, I76L, I89L, V97M, K110R, F124V, and Y137R) are shared between all KPAxA-type DYW domains (Fig. 7). Additional differences (most notably V4, T6, A31, L32, V34, T37, P57, S85, V90, E106, D115, G121, Y122, A128, K135, G136 and P138) are seen in GRP-type DYW domains alone (Fig. 7).

Candidate factors for C-to-U and U-to-C RNA editing

Previously characterized C-to-U RNA editing factors are PLS-type proteins with a complete canonical DYW domain or, when truncated, extend across a recognizable PG box motif and acquire the DYW cytidine deaminase activity in trans. Hence, we consider most of the DYW variant proteins in A. agrestis, including the truncated versions with recognizable PG boxes and their variants, prime editing factor candidates, also considering their large number correlating well with the abundant RNA editing now identified. The high number of proteins with deviant DYW domains is intriguing in the light of the high amount of reverse U-to-C RNA editing that we could identify. Naturally, the characteristic DRH and GRP DYW domain variants (Fig. 7) could be attractive candidates to represent factors for reverse U-to-C RNA editing.

We used the consensus motifs for the different DYW protein clades now identified in A. agrestis (Fig. 7) as queries to scan available genome and transcriptome data (GenBank/NCBI and OneKP data). We could identify DYW proteins with a canonical PG box in all land plant clades except the marchantiid (complex-thalloid) liverworts, lacking RNA editing altogether and fitting the previous surveys on RNA editing (Rüdinger et al., 2012). By contrast, KPAxA-type DYW domains were exclusively


identified in the available transcriptome data of hornworts, ferns, and lycophytes. We could not find evidence for KPAxA-type proteins in mosses, liverworts, seed plants, or in the Selaginellales, where, despite most extensive C-to-U editing, not a single case of reverse U-to-C editing had been identified (Hecht et al., 2011; Oldenkott et al., 2014). Hence, the presence of the now discovered KPAxA-type DYW domains with their divergent amino acid signatures (Fig. 7) seems to have a perfect

Fig. 5 Numerous events of RNA editing (yellow background) were identified in terminal domains V and VI of mitochondrial group II introns and contribute to stabilizing their canonical secondary structures by converting G–U into G–C pairs (red) or establishing A–U pairs from A–C mismatches (blue, colons). The bulged A for lariat formation in domain VI is highlighted (bold, underlined). Two cases of apparent misediting weakening base pairs in atp1i805g2 and nad4i976g2 occur at low frequencies only (Supporting Information Table S2). Seven additional C-to-U edits could be expected for atp1i1019g2, cox2i281g2, cox2i564g2, nad1i728g2, nad2i709g2, nad2i1282g2 and nad9i502g2 (blue font, no background) but were not observed in the transcriptome data. Noteworthy is the U-to-C editing event in domain V consensus position 29 (underlined in the cobi787g2 example, top right) occurring in altogether 19 introns. This editing event is also identified in seven further introns (Fig. 10) where domain V sequences match a candidate RNA editing factor.
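The two stabilizing cases named in the Fig. 5 caption (G–U converted into G–C by U-to-C editing, and A–C mismatches converted into A–U by C-to-U editing) can be expressed as a small rule. The following Python sketch is only an illustration of that reasoning; the function name `stabilizing_edits` is hypothetical, and the sketch considers single pyrimidine exchanges on isolated base pairs without any structural context.

```python
# Canonical Watson-Crick pairs in RNA (order: first strand, second strand).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# Pyrimidine exchange editing in both directions.
EDITS = {"C": "U", "U": "C"}


def stabilizing_edits(base1: str, base2: str):
    """List single C-to-U or U-to-C edits that turn a pair into Watson-Crick.

    Returns tuples of (strand, edit), where strand is 1 or 2.
    An empty list means the pair is already Watson-Crick or cannot be
    rescued by one pyrimidine exchange.
    """
    if (base1, base2) in WATSON_CRICK:
        return []
    options = []
    if base1 in EDITS and (EDITS[base1], base2) in WATSON_CRICK:
        options.append((1, f"{base1}-to-{EDITS[base1]}"))
    if base2 in EDITS and (base1, EDITS[base2]) in WATSON_CRICK:
        options.append((2, f"{base2}-to-{EDITS[base2]}"))
    return options


if __name__ == "__main__":
    for pair in [("G", "U"), ("A", "C"), ("G", "C"), ("U", "U")]:
        print(pair, stabilizing_edits(*pair))
```

A G–U wobble pair is reported as improvable here because only strict Watson-Crick pairs count as stable in this simplified model.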


Fig. 6 Maximum likelihood phylogenetic tree of 1428 PLS-type PPR proteins identified in the nuclear genome assembly of Anthoceros agrestis, which carry carboxyterminal domains extending into recognizable full or truncated DYW domains. The DYW-type PPR proteins of Physcomitrella patens and those identified as C-to-U RNA editing factors in Arabidopsis thaliana were used to root the gene family tree. Only five proteins in A. agrestis have full-length canonical DYW domains; many others are variably truncated behind the ‘PG box’. Most proteins in A. agrestis with characteristic alterations in their DYW domain signatures (‘WW-type’, ‘DRH-type’ and ‘GRP-type’; see also Fig. 7) fall into clades with significant bootstrap support. Protein models are shown next to each clade, with significant amino acid changes indicated in the colour of the respective collapsed clade. Amino acids of the conserved cytidine deaminase signature are underlined. Dotted lines indicate variable E2/DYW domain truncations in some members of the respective clades.


Fig. 7 panel labels: ePGxSW_DYW, KPAxA_DRH, KPAxA_GRP (two alignment rows each); annotations: PGxSWWTD in ‘WW-type’; 4 aa deletion.

Fig. 7 Different sequence conservation profiles in the DYW domain variants ‘DRH’ and ‘GRP’ identified in Anthoceros agrestis. Conservation plots were created using the WEBLOGO service at http://weblogo.berkeley.edu/logo.cgi (Crooks et al., 2004). The profile for canonical DYW domains (ePGxSW_DYW-type) is based on the alignment of proven C-to-U RNA editing factors with full-length DYW domains characterized in Arabidopsis thaliana (28 sequences) and Physcomitrella patens (nine sequences) and their five full-length homologues in A. agrestis. No significant differences in the conservation profile are observed for the 236 C-terminally truncated homologues in A. agrestis in the respective amino-terminal regions. Numerous characteristic changes in conserved positions along the entire DYW domain are observed among the KPAxA_DRH (168 sequences) and the KPAxA_GRP-type DYW proteins (482 sequences) identified in A. agrestis. Truncated DRH-type and GRP-type proteins lacking a DYW domain were excluded for the WEBLOGO creation. Significant changes in amino acids conserved at a threshold of at least 0.6 (orange lines) are highlighted with black lines for positions shared among all KPAxA-type DYW domains and with red lines for those in the GRP-type proteins alone. Other than shifts in amino acid conservation, the KPAxA-type proteins share a deletion of four amino acids (alignment positions 24–27). The amino-terminal PGSWIE heptapeptide motif of the ‘PG box’ in the canonical DYW domain is changed to PGSWWTD including a tryptophan (W) duplication in alignment position 6 in 474 truncated ‘WW-type’ proteins (top left). No differences are seen, however, in the highly conserved motifs H⁷⁰xExnCxxC¹⁰¹ and H¹²⁵xFx4CSC¹³⁴, which are very likely relevant for binding zinc ions (Hayes et al., 2013).

phylogenetic overlap with taxa showing reverse U-to-C RNA editing.

Consequently, we strived to identify candidate targets in the organelle transcriptomes for the PPR arrays in front of the different DYW variant proteins. To this end, we used all reassessed proteins featuring at least a recognizable PG box variant and minimally 14 upstream PLS-type PPRs to extract possible RNA targeting information from positions 5 and L of their P and S-type PPRs (see Materials and Methods section). To avoid bias, no preselection for possible organelle targeting preference to chloroplasts or mitochondria was done, and top matches were scored both for searches of PPR arrays against candidate editing targets (Fig. 8a) and the other way around (Fig. 8b). This analysis revealed that the reverse U-to-C editing sites strongly dominate among the top candidate targets for the KPAxA-type proteins (Fig. 8). By contrast, the PPR arrays upstream of the classic PGxSWIE-type and of the WW-type variants preferentially matched sequences upstream of the now identified C-to-U editing positions.

Examples of top matches for one member each of the deviant WW, DRH, and GRP-type DYW proteins within the cox1 mRNA are shown in Fig. 9. The PPR arrays in each of the proteins have at least 13 perfect matches to their potential targets upstream of the respective editing sites, with the WW-type protein matching to a C-to-U editing site and the DRH and GRP proteins potentially binding upstream of U-to-C editing sites.
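The selection step described above (a recognizable PG box variant, at least 14 PLS-type PPRs, and extraction of positions 5 and L from the P and S-type repeats) can be pictured with a small data model. The Python sketch below is an assumption-laden illustration, not the pipeline used in the study: the classes `Repeat` and `Protein`, the function `binding_pairs` and the example protein are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Repeat:
    kind: str     # 'P', 'L' or 'S'; subtype labels such as 'P2' also work
    aa5: str      # amino acid at position 5 of the repeat
    aa_last: str  # amino acid at position L (last position) of the repeat


@dataclass
class Protein:
    name: str
    repeats: List[Repeat]  # ordered from N- to C-terminus
    has_pg_box: bool       # True if a PG box variant is recognizable


MIN_REPEATS = 14  # minimum number of PLS-type PPRs stated in the text


def binding_pairs(protein: Protein) -> Optional[List[Optional[Tuple[str, str]]]]:
    """Return (position 5, position L) per repeat, or None if the protein is excluded.

    L-type repeats yield None in the list because only P and S-type
    repeats are used for the code-based target prediction.
    """
    if not protein.has_pg_box or len(protein.repeats) < MIN_REPEATS:
        return None
    return [
        (r.aa5, r.aa_last) if r.kind[:1] in ("P", "S") else None
        for r in protein.repeats
    ]


if __name__ == "__main__":
    # Hypothetical protein with 15 repeats in a repeating P-L-S pattern.
    kinds = ["P", "L", "S"] * 5
    demo = Protein("demo", [Repeat(k, "N", "D") for k in kinds], has_pg_box=True)
    print(binding_pairs(demo)[:3])
```

Keeping a `None` placeholder for L-type repeats preserves the register between repeats and target nucleotides, which matters once the pairs are aligned against a sequence.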



Fig. 8 Bar charts summarizing the respective top matches between members of the five different clades of DYW proteins now identified in Anthoceros agrestis (x-axis) and all C-to-U (blue) or U-to-C (red) editing sites now determined in the two organelles. Positions 5 and L were extracted from P and S-type PPRs of 1049 ‘DYW’ proteins featuring at least 14 PPRs and translated into a scoring matrix following the PPR-RNA code rules (see the Materials and Methods section). (a) The numbers of respective top matching C-to-U or U-to-C editing sites in the complete A. agrestis organelle editome for the members of the five different DYW protein clades. The respective numbers of PPR proteins per type are given above the bars. Only cases where the score of the best-fitting editing site is higher than the second-best hit are included. (b) Reciprocal assignments for the superset of 2405 editing sites identified among the top candidate targets for the different DYW-type proteins under (a). ***, P > 0.99 for the preferred mutual assignments of DRX and GRP proteins to U-to-C edits and SWIEV and SWW-type DYW proteins to C-to-U edits over equal distributions (50% each) in the one-proportion Z-test.
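Two steps in the Fig. 8 caption lend themselves to a short illustration: keeping only proteins whose best-scoring site is strictly better than the second-best hit, and comparing the resulting share of U-to-C (or C-to-U) assignments against an equal 50% split with a one-proportion Z-test. The Python sketch below shows the textbook form of both steps. The function names and all numbers in the usage example are hypothetical and are not taken from the study.

```python
import math
from typing import Optional, Sequence, Tuple


def unique_top_hit(scores: Sequence[float]) -> Optional[int]:
    """Return the index of the best score, or None if the top score is tied."""
    if not scores:
        return None
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if len(ranked) > 1 and scores[ranked[0]] == scores[ranked[1]]:
        return None
    return ranked[0]


def one_proportion_z(successes: int, n: int, p0: float = 0.5) -> Tuple[float, float]:
    """One-proportion Z-test against the null proportion p0.

    Returns (z, one-sided p-value for an observed proportion above p0).
    """
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)
    return z, p_value


if __name__ == "__main__":
    # Hypothetical scores of one protein against three candidate sites.
    print(unique_top_hit([3.0, 7.5, 6.0]))
    # Hypothetical tally: 80 of 100 uniquely assigned proteins hit U-to-C sites.
    print(one_proportion_z(80, 100))
```

The normal approximation used here is reasonable for the large counts involved; for small samples an exact binomial test would be the more cautious choice.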

We noted that among the RNA editing sites within group II intron domains V and VI, one event of U-to-C editing in domain V consensus position 29 is shared among 19 different mitochondrial group II introns (Figs 5, 10). Seven of these introns share extended similarities in their upstream domain V sequences. The PPR array of a GRP-type DYW protein matches excellently to the candidate target sequence upstream of this shared U-to-C editing site (Fig. 10). Intriguingly, matches are not only observed for the P and S-type PPRs according to the PPR-RNA recognition code rules but may be extended to two of the L-type repeats (L-5SN and L-8SN) potentially matching the conserved adenines in the corresponding targets, as recently observed for the moss DYW protein PPR65 targeting the ccmFC RNA (Oldenkott et al., 2019).

Discussion

Despite being the species poorest of all major clades of extant land plants, hornworts are fundamentally important to understand the backbone phylogeny of embryophytes (Villarreal et al., 2013). Fossils like Horneophyton may represent ‘evolutionary bridges’ between bryophytes and early tracheophytes (e.g. Hetherington & Dolan, 2018), but no morphological or developmental synapomorphies conclusively resolve the phylogeny of the three extant bryophyte clades relative to tracheophytes. The discussion has recently been reactivated with analyses of large nuclear transcriptome data sets (Wickett et al., 2014; Cox, 2018; de Sousa et al., 2018; Morris et al., 2018; Puttick et al., 2018; Rensing, 2018a,b), questioning the previously suggested phylogeny with liverworts sister to all other embryophytes and an HT clade (Qiu et al., 1998; Groth-Malonek et al., 2005; Qiu et al., 2006). Notably, however, the latter phylogeny was identified again in a recent study using concatenated chloroplast genes with broad embryophyte taxon sampling (Lutzoni et al., 2018). The biochemical composition of cell-wall xyloglucans (Peña et al., 2008; Schultink et al., 2014) or the ‘fossil’ group II intron rps3i74g2 in A. agrestis identified here may also support an HT clade, similar to the nad5i1477g2 intron (Groth-Malonek et al., 2005) or, possibly, the evolutionary gain of ‘reverse’ U-to-C RNA editing in land plant organelles.

The ‘reverse’ type of U-to-C RNA editing in plant organelles is frequently referred to as ‘occasional’, suggesting it to comprise rare events accompanying the dominant and near-omnipresent C-to-U editing in plant chloroplasts and mitochondria. However, U-to-C editing is clearly abundantly present in hornworts and ferns and can even be the dominant direction of pyrimidine exchange RNA editing, as reported here and in earlier studies (Kugita et al., 2003b; Guo et al., 2015; Knie et al., 2016). Independent of their exact phylogenetic position, hornworts will likely represent the most ancient plant clade featuring reverse U-to-C RNA editing in their organelle genomes. Hence, we consider them the best a priori choice for future functional studies of U-to-C RNA editing, likely retaining the ancestral features of the ‘reverse’ editing biochemistry. A small genome size of only 84 Mbp – notably in contrast to the voluminous and polyploid genomes of most ferns – and the current progress on establishing it as a new plant model system (Szövényi et al., 2015) make A. agrestis particularly attractive for studies of U-to-C RNA editing.

With the assembled A. agrestis chloroplast and mitochondrial genomes and their complete editomes now available in addition to nuclear genome assemblies, the hornwort allows the correlation of abundant RNA editing in both directions of pyrimidine exchange with potential specificity factors. Analysing the vastly extended and diversified family of DYW-type PPR proteins in A. agrestis revealed three highly derived variants (referred to here as WW, DRH, and GRP; see Figs 6, 7) of the likely ancestral DYW domain characteristic of RNA editing factors identified in C-to-U-only RNA editing models like Arabidopsis or


Fig. 9 Matches of selected PLS-type PPR proteins with noncanonical DYW domain variants (a) WW, (b) DRH, and (c) GRP having top-scoring candidate targets upstream of RNA editing sites in the cox1 gene. The potential cox1 targets (underlined sequences in Fig. 4) are the respective best-scoring sequences upstream of more than 2400 organelle editing sites now identified for each protein according to a scoring matrix following the PPR-RNA recognition rules (see the Materials and Methods section). Numbering runs backward both for the target sequence upstream of the editing sites and for the PPR arrays with the terminal S2-type PPR juxtaposed with target position −4. The terminal ‘P2-L2-S2’ PPR triplet with slightly differing amino acid signatures is underlined. Background shading indicates matches following the core RNA recognition rules for P and S-type PPRs (grey shading) according to amino acids in positions 5 and L (T/S+N: A; T/S+D: G; N+D: U; N+S: C; N+N: Y), with green indicating perfect matches, blue indicating pyrimidine transitions, and orange indicating mismatches. U-to-C editing is indicated in red; C-to-U editing is in blue.
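The core recognition rules listed in the Fig. 9 caption (T/S+N: A; T/S+D: G; N+D: U; N+S: C; N+N: Y) and its three shading classes can be written down as a lookup table. The Python sketch below is a minimal illustration under two assumptions: a ‘pyrimidine transition’ is taken to mean that one pyrimidine is predicted and the other is observed, and repeat/nucleotide pairs are supplied already aligned. The names `CODE`, `classify` and `score_alignment` are hypothetical, and the scoring matrix of the study itself is not reproduced.

```python
from typing import Dict, Iterable, Tuple

# Predicted nucleotide for amino acids at positions 5 and L of P and S-type PPRs.
CODE: Dict[Tuple[str, str], str] = {
    ("T", "N"): "A", ("S", "N"): "A",
    ("T", "D"): "G", ("S", "D"): "G",
    ("N", "D"): "U",
    ("N", "S"): "C",
    ("N", "N"): "Y",  # Y stands for either pyrimidine (C or U)
}
PYRIMIDINES = {"C", "U"}


def classify(aa5: str, aa_last: str, nucleotide: str) -> str:
    """Classify one repeat/nucleotide juxtaposition.

    Returns 'match', 'transition' (assumed: predicted and observed bases are
    different pyrimidines), 'mismatch', or 'unscored' when the amino acid
    combination is not covered by the core rules.
    """
    predicted = CODE.get((aa5, aa_last))
    if predicted is None:
        return "unscored"
    if predicted == nucleotide or (predicted == "Y" and nucleotide in PYRIMIDINES):
        return "match"
    if predicted in PYRIMIDINES and nucleotide in PYRIMIDINES:
        return "transition"
    return "mismatch"


def score_alignment(pairs: Iterable[Tuple[str, str]], target: str) -> Dict[str, int]:
    """Count the classes for an array of (aa5, aaL) pairs aligned to a target."""
    counts = {"match": 0, "transition": 0, "mismatch": 0, "unscored": 0}
    for (aa5, aa_last), nucleotide in zip(pairs, target):
        counts[classify(aa5, aa_last, nucleotide)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical four-repeat array against a hypothetical target 'ACCG'.
    demo_pairs = [("T", "N"), ("N", "D"), ("N", "S"), ("V", "T")]
    print(score_alignment(demo_pairs, "ACCG"))
```

Counting ‘match’ entries per candidate site is one simple way to arrive at statements such as ‘at least 13 perfect matches’, although a weighted scoring matrix would rank candidates differently.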

Physcomitrella. Intriguingly, we find that the DRH and GRP variants comprising the larger KPAxA clade now identified seem to have no homologues in plant taxa lacking U-to-C RNA editing and to match preferably to reverse U-to-C editing sites in A. agrestis (Figs 8–10). Additionally, the significant variability of U-to-C RNA editing in particular within Anthoceros (Fig. 2) may reveal valuable insights on the coevolution of editing sites and their cofactors.

One surprising additional result of our survey is the numerous edits in the small terminal domains V and VI of mitochondrial group II introns. The current understanding of target identification implies an alignment of PPRs and RNAs in a collinear fashion, but RNA secondary structures may interfere with this process. RNA editing may occur immediately after transcription before highly base-paired secondary structures are formed, or the binding of the PPR array may compete with secondary structure formation in an equilibrium of RNA molecules in different states.

Despite hundreds of conventional C-to-U edits along with the more abundant U-to-C edits, we found most of the ‘classic’ DYW domains typical of C-to-U editing factors to be C-terminally truncated in A. agrestis (Fig. 6). In the angiosperm models, such truncations are compensated for by separate DYW domains provided in trans (Boussardon et al., 2012; Andrés-Colás et al., 2017; Diaz et al., 2017; Guillaumot et al., 2017). Interestingly, we also identified three small DYW-only proteins outside of the large PLS-type PPR gene family in the A. agrestis genome, which feature the conserved cytidine deaminase signatures and a terminal DYW tripeptide (Fig. S2). Judged from transcript coverage, these three genes are more highly expressed than the PLS-type proteins, just as previously found for DYW2 in angiosperms (Andrés-Colás et al., 2017). Hence, they could represent DYW domains to be supplied in trans for the many truncated proteins in A. agrestis, similar to the C-to-U editing setup in angiosperms. Disruption of single-polypeptide RNA editing factors like in the moss Physcomitrella (Schallenberg-Rüdinger & Knoop, 2016) may have occurred independently or may be yet another synapomorphy of the HT clade. We found no evidence in A. agrestis, however, for additional, non-PPR ‘helper’ components identified in angiosperms, such as MORFs/RIPs indicative of more complex editosomes (Bentolila et al., 2012; Takenaka et al., 2012; Zehrmann et al., 2015; Bayer-Császár et al., 2017; Haag et al., 2017).

By contrast, complete full-length DYW domains dominate among the here defined KPAxA-type DYW proteins (Fig. 6). Despite the numerous deviations in their amino acid conservation profiles, the cytidine deaminase signature for Zn²⁺ coordination (Salone et al., 2007; Boussardon et al., 2014; Hayes et al., 2015; Wagoner et al., 2015; Ichinose & Sugita, 2018) is highly conserved among these proteins (Fig. 7). Consequently, and despite all the many differences compared with the more widespread


Fig. 10 A candidate U-to-C RNA editing factor for a conserved editing event in Anthoceros agrestis mitochondrial group II introns. (a) Graphic display of group II intron domains V and VI is like in Fig. 5. The U-to-C RNA editing event in consensus position 29 of domain V is shared by altogether 19 group II introns (see Fig. 5 for 12 additional cases). (b) The PPR array of the variant DYW protein KPAxA-GRP-8056F3 matches to the domain V sequences of seven introns upstream of the editing site in consensus position 29. Numbering and shading are like in Fig. 9. PPR-13 is of the ‘SS’-type (italics). Other than for the P and S-type repeats (light grey), matches are also observed for L-8SN and L-5SN (dark grey) juxtaposed with conserved adenosines in positions −11 and −8, respectively.

‘classic’ counterpart (Fig. 7), the KPAxA-type DYW domain would be unlikely to use a completely different mechanism of catalysis. As a working hypothesis, the characteristic differences between the KPAxA-type and classic DYW domains may result in acceptance of an amino-group donor as a co-substrate for uridine amination. We assume that future work on the organelle editomes and candidate editing factors in Anthoceros presented here will help to elucidate those and alternative hypotheses. Possibly, such reverse editing factors may in the future even prove to operate in heterologous systems, as recently shown for C-to-U RNA editing factors of Physcomitrella, conferring C-to-U editing in Escherichia coli and ultimately confirming the ‘classic’ DYW domain as cytidine deaminase acting on polyribonucleotides (Oldenkott et al., 2019). Possibly, the new E. coli assay system could also prove to perform reverse U-to-C RNA editing and would subsequently allow the analysis of its biochemistry in detail. The matches between candidate reverse editing factors and their potential targets identified here are evidently a good starting point in that direction. The simple bacterial system will be more straightforward and superior in allowing the screening of many more candidate factors and targets than by genetic transformation of established plant models like Arabidopsis or Physcomitrella. In parallel, we will aim for the creation of knockout lines for candidate reverse editing factors in A. agrestis to elucidate their function.


Comparing plant lifestyles gives no reasonable clues as to why some plant lineages (like the Selaginellales) have lost reverse RNA editing altogether, may have never possessed it in the first place (possibly mosses and liverworts, depending on the ultimately true phylogeny of the bryophyte clades), or why U-to-C editing may even dominate over C-to-U editing in other lineages (Knie et al., 2016). Based on the working hypotheses presented here, the experimental approaches outlined herein will hopefully help to answer that puzzling evolutionary question or, for example, also why RNA editing evolves so dramatically fast in at least some genera, like Amaranthus or Silene among the angiosperms (Sloan et al., 2010; Hein et al., 2019), Selaginella among the lycophytes (Smith, 2019), Adiantum among ferns (Zumkeller et al., 2016), or, as also demonstrated here for U-to-C editing, in Anthoceros among the hornworts.

Acknowledgements

Work on RNA editing is supported by a grant of the German Research Foundation (DFG grant no. SCHA1952/2-1) to MS-R. PS and AN are thankful for the financial support of the Swiss National Science Foundation (grants 160004 and 131726), the Georges and Antoine Claraz Foundation (Switzerland), the US National Science Foundation, the ‘Forschungskredit’, and the University Research Priority Program ‘Evolution in Action’ of the University of Zurich. AN was also supported by the Foundation of German Business (sdw).

Author contributions

PG assembled organelle genomes and did editome analyses. AN and PS isolated nucleic acids and conducted NGS and assemblies. IS, HL, PG, BG and RM designed bioinformatic pipelines to analyse PPR proteins and editing targets. MSR did phylogenetic analyses. VK and MSR designed the study and coordinated experimental efforts. VK wrote and edited the manuscript after critical input from the co-authors.

ORCID

Bernard Gutmann https://orcid.org/0000-0003-4657-0925
Volker Knoop https://orcid.org/0000-0002-8485-9423
Henning Lenz https://orcid.org/0000-0002-8080-0328
Mareike Schallenberg-Rüdinger https://orcid.org/0000-0002-6874-4722
Ian Small https://orcid.org/0000-0001-5300-1216
Péter Szövényi https://orcid.org/0000-0002-0324-4639

References

Andres-Colas N, Zhu Q, Takenaka M, De Rybel B, Weijers D, Van Der Straeten D. 2017. Multiple PPR protein interactions are involved in the RNA editing system in Arabidopsis mitochondria and plastids. Proceedings of the National Academy of Sciences, USA 114: 8883–8888.
Barkan A, Rojas M, Fujii S, Yap A, Chong YS, Bond CS, Small I. 2012. A combinatorial amino acid code for RNA recognition by pentatricopeptide repeat proteins. PLoS Genetics 8: e1002910.
Barkan A, Small I. 2014. Pentatricopeptide repeat proteins in plants. Annual Review of Plant Biology 65: 415–442.
Bayer-Csaszar E, Haag S, Jörg A, Glass F, Härtel B, Obata T, Meyer EH, Brennicke A, Takenaka M. 2017. The conserved domain in MORF proteins has distinct affinities to the PPR and E elements in PPR RNA editing factors. Biochimica et Biophysica Acta (BBA) – Gene Regulatory Mechanisms 1860: 813–828.
Beckert S, Steinhauser S, Muhle H, Knoop V. 1999. A molecular phylogeny of bryophytes based on nucleotide sequences of the mitochondrial nad5 gene. Plant Systematics and Evolution 218: 179–192.
Bentolila S, Heller WP, Sun T, Babina AM, Friso G, van Wijk KJ, Hanson MR. 2012. RIP1, a member of an Arabidopsis protein family, interacts with the protein RARE1 and broadly affects RNA editing. Proceedings of the National Academy of Sciences, USA 109: E1453–E1661.
Bolger AM, Lohse M, Usadel B. 2014. TRIMMOMATIC: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.
Boussardon C, Avon A, Kindgren P, Bond CS, Challenor M, Lurin C, Small I. 2014. The cytidine deaminase signature HxE(x)nCxxC of DYW1 binds zinc and is necessary for RNA editing of ndhD-1. New Phytologist 203: 1090–1095.
Boussardon C, Salone V, Avon A, Berthome R, Hammani K, Okuda K, Shikanai T, Small I, Lurin C. 2012. Two interacting proteins are necessary for the editing of the ndhD-1 site in Arabidopsis plastids. Plant Cell 24: 3684–3694.
Cheng S, Gutmann B, Zhong X, Ye Y, Fisher MF, Bai F, Castleden I, Song Y, Song B, Huang J et al. 2016. Redefining the structural motifs that determine RNA binding and RNA editing by pentatricopeptide repeat proteins in land plants. The Plant Journal 85: 532–547.
Coil D, Jospin G, Darling AE. 2015. A5-MISEQ: an updated pipeline to assemble microbial genomes from Illumina MiSeq data. Bioinformatics 31: 587–589.
Cox CJ. 2018. Land plant molecular phylogenetics: a review with comments on evaluating incongruence among phylogenies. Critical Reviews in Plant Sciences 37: 113–127.
Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WEBLOGO: a sequence logo generator. Genome Research 14: 1188–1190.
Diaz MF, Bentolila S, Hayes ML, Hanson MR, Mulligan RM. 2017. A protein with an unusually short PPR domain, MEF8, affects editing at over 60 Arabidopsis mitochondrial C targets of RNA editing. The Plant Journal 92: 638–649.
Dombrovska E, Qiu Y-L. 2004. Distribution of introns in the mitochondrial gene nad1 in land plants: phylogenetic and molecular evolutionary implications. Molecular Phylogenetics and Evolution 32: 246–263.
Dong S, Wu H, Zhang S, Zhang L, Liu Y, Xue JY, Chen Z, Goffinet B. 2018. Complete mitochondrial genome sequence of Anthoceros angustus: conservative evolution of the mitogenomes in hornworts. Bryologist 121: 14–22.
Edera AA, Gandini CL, Sanchez-Puerta MV. 2018. Towards a comprehensive picture of C-to-U RNA editing sites in angiosperm mitochondria. Plant Molecular Biology 97: 1–17.
Freyer R, Kiefer-Meyer M-C, Kössel H. 1997. Occurrence of plastid RNA editing in all major lineages of land plants. Proceedings of the National Academy of Sciences, USA 94: 6285–6290.
Grewe F, Herres S, Viehöver P, Polsakiewicz M, Weisshaar B, Knoop V. 2011. A unique transcriptome: 1782 positions of RNA editing alter 1406 codon identities in mitochondrial mRNAs of the lycophyte Isoetes engelmannii. Nucleic Acids Research 39: 2890–2902.
Groth-Malonek M, Pruchner D, Grewe F, Knoop V. 2005. Ancestors of trans-splicing mitochondrial introns support serial sister group relationships of hornworts and mosses with vascular plants. Molecular Biology and Evolution 22: 117–125.
Guillaumot D, Lopez-Obando M, Baudry K, Avon A, Rigaill G, Falcon de Longevialle A, Broche B, Takenaka M, Berthome R, De Jaeger G et al. 2017. Two interacting PPR proteins are major Arabidopsis editing factors in plastid and mitochondria. Proceedings of the National Academy of Sciences, USA 114: 8877–8882.
Guo W, Grewe F, Mower JP. 2015. Variable frequency of plastid RNA editing among ferns and repeated loss of uridine-to-cytidine editing from vascular plants. PLoS ONE 10: e0117075.
Guo W, Mower JP. 2013. Evolution of plant mitochondrial intron-encoded maturases: frequent lineage-specific loss and recurrent intracellular transfer to the nucleus. Journal of Molecular Evolution 77: 43–54.


Haag S, Schindler M, Berndt L, Brennicke A, Takenaka M, Weber G. 2017. Crystal structures of the Arabidopsis thaliana organellar RNA editing factors MORF1 and MORF9. Nucleic Acids Research 45: 4915–4928.
Hayes ML, Dang KN, Diaz MF, Mulligan RM. 2015. A conserved glutamate residue in the C-terminal deaminase domain of pentatricopeptide repeat proteins is required for RNA editing activity. Journal of Biological Chemistry 290: 10136–10142.
Hayes ML, Giang K, Berhane B, Mulligan RM. 2013. Identification of two pentatricopeptide repeat genes required for RNA editing and zinc binding by C-terminal cytidine deaminase-like domains. Journal of Biological Chemistry 288: 36519–36529.
Hecht J, Grewe F, Knoop V. 2011. Extreme RNA editing in coding islands and abundant microsatellites in repeat sequences of Selaginella moellendorffii mitochondria: the root of frequent plant mtDNA recombination in early tracheophytes. Genome Biology and Evolution 3: 344–358.
Hein A, Brenner S, Knoop V. 2019. Multifarious evolutionary pathways of a nuclear RNA editing factor: disjunctions in co-evolution of DOT4 and its chloroplast target rpoC1eU488SL. Genome Biology and Evolution 11: 798–813.
Hetherington AJ, Dolan L. 2018. Bilaterally symmetric axes with rhizoids composed the rooting structure of the common ancestor of vascular plants. Philosophical Transactions of the Royal Society B: Biological Sciences 373: e20170042.
Hoang DT, Chernomor O, von Haeseler A, Minh BQ, Vinh LS. 2018. UFBOOT2: improving the ultrafast bootstrap approximation. Molecular Biology and Evolution 35: 518–522.
Ichinose M, Sugita M. 2018. The DYW domains of pentatricopeptide repeat RNA editing factors contribute to discriminate target and non-target editing sites. Plant and Cell Physiology 59: 1652–1659.
Ichinose M, Sugita C, Yagi Y, Nakamura T, Sugita M. 2013. Two DYW subclass PPR proteins are involved in RNA editing of ccmFc and atp9 transcripts in the moss Physcomitrella patens: first complete set of PPR editing factors in plant mitochondria. Plant and Cell Physiology 54: 1907–1916.
Ichinose M, Uchida M, Sugita M. 2014. Identification of a pentatricopeptide repeat RNA editing factor in Physcomitrella patens chloroplasts. FEBS Letters 588: 4060–4064.
Iyer LM, Zhang D, Rogozin IB, Aravind L. 2011. Evolution of the deaminase fold and multiple origins of eukaryotic editing and mutagenic nucleic acid deaminases from bacterial toxin systems. Nucleic Acids Research 39: 9473–9497.
Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. 2017. MODELFINDER: fast model selection for accurate phylogenetic estimates. Nature Methods 14: 587–589.
Knie N, Grewe F, Fischer S, Knoop V. 2016. Reverse U-to-C editing exceeds C-to-U RNA editing in some ferns – a monilophyte-wide comparison of chloroplast and mitochondrial RNA editing suggests independent evolution of the two processes in both organelles. BMC Evolutionary Biology 16: e134.
Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Current Genetics 46: 123–139.
Kobayashi T, Yagi Y, Nakamura T. 2019. Comprehensive prediction of target RNA editing sites for PLS-class PPR proteins in Arabidopsis thaliana. Plant and Cell Physiology 60: 862–874.
Kugita M, Kaneko A, Yamamoto Y, Takeya Y, Matsumoto T, Yoshinaga K. 2003a. The complete nucleotide sequence of the hornwort (Anthoceros formosae) chloroplast genome: insight into the earliest land plants. Nucleic Acids Research 31: 716–721.
Kugita M, Yamamoto Y, Fujikawa T, Matsumoto T, Yoshinaga K. 2003b. RNA editing in hornwort chloroplasts makes more than half the genes functional. Nucleic Acids Research 31: 2417–2423.
Kuraku S, Zmasek CM, Nishimura O, Katoh K. 2013. ALEAVES facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity. Nucleic Acids Research 41: W22–W28.
Lenz H, Hein A, Knoop V. 2018. Plant organelle RNA editing and its specificity factors: enhancements of analyses and new database features in PREPACT 3.0. BMC Bioinformatics 19: e255.
Lenz H, Rüdinger M, Volkmar U, Fischer S, Herres S, Grewe F, Knoop V. 2010. Introducing the plant RNA editing prediction and analysis computer tool PREPACT and an update on RNA editing site nomenclature. Current Genetics 56: 189–201.
Li D, Luo R, Liu C-M, Leung C-M, Ting H-F, Sadakane K, Yamashita H, Lam T-W. 2016. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 102: 3–11.
Li L, Wang B, Liu Y, Qiu YL. 2009. The complete mitochondrial genome sequence of the hornwort Megaceros aenigmaticus shows a mixed mode of conservative yet dynamic evolution in early land plant mitochondrial genomes. Journal of Molecular Evolution 68: 665–678.
Lohse M, Drechsel O, Kahlau S, Bock R. 2013. ORGANELLARGENOMEDRAW – a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Research 41: W575–W581.
Lurin C, Andres C, Aubourg S, Bellaoui M, Bitton F, Bruyere C, Caboche M, Debast C, Gualberto J, Hoffmann B et al. 2004. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell 16: 2089–2103.
Lutzoni F, Nowak MD, Alfaro ME, Reeb V, Miadlikowska J, Krug M, Arnold AE, Lewis LA, Swofford DL, Hibbett D et al. 2018. Contemporaneous radiations of fungi and plants linked to symbiosis. Nature Communications 9: e5451.
Malek O, Lättig K, Hiesel R, Brennicke A, Knoop V. 1996. RNA editing in bryophytes and a molecular phylogeny of land plants. EMBO Journal 15: 1403–1411.
Morris JL, Puttick MN, Clark JW, Edwards D, Kenrick P, Pressel S, Wellman CH, Yang Z, Schneider H, Donoghue PCJ. 2018. The timescale of early land plant evolution. Proceedings of the National Academy of Sciences, USA 115: E2274–E2283.
Okuda K, Myouga F, Motohashi R, Shinozaki K, Shikanai T. 2007. Conserved domain structure of pentatricopeptide repeat proteins involved in chloroplast RNA editing. Proceedings of the National Academy of Sciences, USA 104: 8178–8183.
Oldenkott B, Yamaguchi K, Tsuji-Tsukinoki S, Knie N, Knoop V. 2014. Chloroplast RNA editing going extreme: more than 3400 events of C-to-U editing in the chloroplast transcriptome of the lycophyte Selaginella uncinata. RNA 20: 1499–1506.
Oldenkott B, Yang Y, Lesch E, Knoop V, Schallenberg-Rüdinger M. 2019. Plant-type pentatricopeptide repeat proteins with a DYW domain drive C-to-U RNA editing in Escherichia coli. Communications Biology 2: e85.
Peña MJ, Darvill AG, Eberhard S, York WS, O’Neill MA. 2008. Moss and liverwort xyloglucans contain galacturonic acid and are structurally distinct from the xyloglucans synthesized by hornworts and vascular plants. Glycobiology 18: 891–904.
Piechotta M, Wyler E, Ohler U, Landthaler M, Dieterich C. 2017. JACUSA: site-specific identification of RNA editing events from replicate sequencing data. BMC Bioinformatics 18: e7.
Puttick MN, Morris JL, Williams TA, Cox CJ, Edwards D, Kenrick P, Pressel S, Wellman CH, Schneider H, Pisani D et al. 2018. The interrelationships of land plants and the nature of the ancestral embryophyte. Current Biology 28: 733–745.
Qiu YL, Cho Y, Cox JC, Palmer JD. 1998. The gain of three mitochondrial introns identifies liverworts as the earliest land plants. Nature 394: 671–674.
Qiu Y-L, Li L, Wang B, Chen Z, Knoop V, Groth-Malonek M, Dombrovska O, Lee J, Kent L, Rest J et al. 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proceedings of the National Academy of Sciences, USA 103: 15511–15516.
Rensing SA. 2018a. Great moments in evolution: the conquest of land by plants. Current Opinion in Plant Biology 42: 49–54.
Rensing SA. 2018b. Plant evolution: phylogenetic relationships between the earliest land plants. Current Biology 28: R210–R213.
Rüdinger M, Funk HT, Rensing SA, Maier UG, Knoop V. 2009. RNA editing: only eleven sites are present in the Physcomitrella patens mitochondrial transcriptome and a universal nomenclature proposal. Molecular Genetics and Genomics 281: 473–481.
Rüdinger M, Volkmar U, Lenz H, Groth-Malonek M, Knoop V. 2012. Nuclear DYW-type PPR gene families diversify with increasing RNA editing frequencies in liverwort and moss mitochondria. Journal of Molecular Evolution 74: 37–51.


Salone V, Rüdinger M, Polsakiewicz M, Hoffmann B, Groth-Malonek M, Szurek B, Small I, Knoop V, Lurin C. 2007. A hypothesis on the identification of the editing enzyme in plant organelles. FEBS Letters 581: 4132–4138.
Sandoval R, Boyd RD, Kiszter AN, Mirzakhanyan Y, Santibanez P, Gershon PD, Hayes ML. 2019. Stable native RIP9 complexes associate with C-to-U RNA editing activity, PPRs, RIPs, OZ1, ORRM1, and ISE2. The Plant Journal 99: 1116–1126.
Schallenberg-Rüdinger M, Kindgren P, Zehrmann A, Small I, Knoop V. 2013. A DYW-protein knockout in Physcomitrella affects two closely spaced mitochondrial editing sites and causes a severe developmental phenotype. The Plant Journal 76: 420–432.
Schallenberg-Rüdinger M, Knoop V. 2016. Coevolution of organelle RNA editing and nuclear specificity factors in early land plants. In: Rensing SA, ed. Genomes and evolution of charophytes, bryophytes and ferns. Advances in Botanical Research, Vol. 78. Amsterdam, the Netherlands: Elsevier Academic Press, 37–93.
Schultink A, Liu L, Zhu L, Pauly M. 2014. Structural diversity and function of xyloglucan sidechain substituents. Plants 3: 526–542.
Sloan DB, MacQueen AH, Alverson AJ, Palmer JD, Taylor DR. 2010. Extensive loss of RNA editing sites in rapidly evolving Silene mitochondrial genomes: selection vs. retroprocessing as the driving force. Genetics 185: 1369–1380.
Smith DR. 2019. Unparalleled variation in RNA editing among Selaginella plastomes. Plant Physiology. doi: 10.1104/pp.19.00904.
de Sousa F, Foster PG, Donoghue PCJ, Schneider H, Cox CJ. 2018. Nuclear protein phylogenies support the monophyly of the three bryophyte groups (Bryophyta Schimp.). New Phytologist 222: 565–575.
Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V. 1999. Plant mitochondrial RNA editing: extreme in hornworts and dividing the liverworts? Journal of Molecular Evolution 48: 303–312.
Sugita M, Ichinose M, Ide M, Sugita C. 2013. Architecture of the PPR gene family in the moss Physcomitrella patens. RNA Biology 10: 1439–1445.
Sun T, Bentolila S, Hanson MR. 2016. The unexpected diversity of plant organelle RNA editosomes. Trends in Plant Science 21: 926–973.
Sun T, Germain A, Giloteaux L, Hammani K, Barkan A, Hanson MR, Bentolila S. 2013. An RNA recognition motif-containing protein is required for plastid RNA editing in Arabidopsis and maize. Proceedings of the National Academy of Sciences, USA 110: E1169–E1178.
Sun T, Shi X, Friso G, Van Wijk K, Bentolila S, Hanson MR. 2015. A zinc finger motif-containing protein is essential for chloroplast RNA editing. PLoS Genetics 11: e1005028.
Szövényi P, Frangedakis E, Ricca M, Quandt D, Wicke S, Langdale JA. 2015. Establishment of Anthoceros agrestis as a model species for studying the biology of hornworts. BMC Plant Biology 15: e98.
Takenaka M, Zehrmann A, Brennicke A, Graichen K. 2013. Improved computational target site prediction for pentatricopeptide repeat RNA editing factors. PLoS ONE 8: e65343.
Takenaka M, Zehrmann A, Verbitskiy D, Kugelmann M, Härtel B, Brennicke A. 2012. Multiple organellar RNA editing factor (MORF) family proteins are required for RNA editing in mitochondria and plastids of plants. Proceedings of the National Academy of Sciences, USA 109: 5104–5109.
Trifinopoulos J, Nguyen L-T, von Haeseler A, Minh BQ. 2016. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Research 44: W232–W235.
Vangerow S, Teerkorn T, Knoop V. 1999. Phylogenetic information in the mitochondrial nad5 gene of pteridophytes: RNA editing and intron sequences. Plant Biology 1: 235–243.
Villarreal Aguilar JC, Turmel M, Bourgouin-Couture M, Laroche J, Salazar Allen N, Li F-W, Cheng S, Renzaglia K, Lemieux C. 2018. Genome-wide organellar analyses from the hornwort Leiosporoceros dussii show low frequency of RNA editing. PLoS ONE 13: e0200491.
Villarreal JC, Forrest LL, Wickett N, Goffinet B. 2013. The plastid genome of the hornwort Nothoceros aenigmaticus (Dendrocerotaceae): phylogenetic signal in inverted repeat expansion, pseudogenization, and intron gain. American Journal of Botany 100: 467–477.
Wagoner JA, Sun T, Lin L, Hanson MR. 2015. Cytidine deaminase motifs within the DYW domain of two pentatricopeptide repeat-containing proteins are required for site-specific chloroplast RNA editing. Journal of Biological Chemistry 290: 2957–2968.
Wickett NJ, Mirarab S, Nguyen N, Warnow T, Carpenter E, Matasci N, Ayyampalayam S, Barker MS, Burleigh JG, Gitzendanner MA et al. 2014. Phylotranscriptomic analysis of the origin and early diversification of land plants. Proceedings of the National Academy of Sciences, USA 111: E4859–E4868.
Wu TD, Reeder J, Lawrence M, Becker G, Brauer MJ. 2016. GMAP and GSNAP for genomic sequence alignment: enhancements to speed, accuracy, and functionality. Methods in Molecular Biology 1418: 283–334.
Xue JY, Liu Y, Li L, Wang B, Qiu YL. 2010. The complete mitochondrial genome sequence of the hornwort Phaeoceros laevis: retention of many ancient pseudogenes and conservative evolution of mitochondrial genomes in hornworts. Current Genetics 56: 53–61.
Yagi Y, Hayashi S, Kobayashi K, Hirayama T, Nakamura T. 2013. Elucidation of the RNA recognition code for pentatricopeptide repeat proteins involved in organelle RNA editing in plants. PLoS ONE 8: e57286.
Yan J, Yao Y, Hong S, Yang Y, Shen C, Zhang Q, Zhang D, Zou T, Yin P. 2019. Delineation of pentatricopeptide repeat codes for target RNA prediction. Nucleic Acids Research 47: 3728–3738.
Yoshinaga K, Iinuma H, Masuzawa T, Uedal K. 1996. Extensive RNA editing of U to C in addition to C to U substitution in the rbcL transcripts of hornwort chloroplasts and the origin of RNA editing in green plants. Nucleic Acids Research 24: 1008–1014.
Zehrmann A, Härtel B, Glass F, Bayer-Csaszar E, Obata T, Meyer E, Brennicke A, Takenaka M. 2015. Selective homo and heteromer interactions between the Multiple Organellar RNA Editing Factor (MORF) proteins in Arabidopsis thaliana. Journal of Biological Chemistry 290: 6445–6456.
Zumkeller SM, Knoop V, Knie N. 2016. Convergent evolution of fern-specific mitochondrial group II intron atp1i361g2 and its ancient source paralogue rps3i249g2 and independent losses of intron and RNA editing among Pteridaceae. Genome Biology and Evolution 8: 2505–2519.

Supporting Information

Additional Supporting Information may be found online in the Supporting Information section at the end of the article.

Fig. S1 The tRNA-Asp(GUC) as an example for mitochondrial RNA editing in a structural RNA.

Fig. S2 Three ‘DYW-only’-type PPR proteins of Anthoceros agrestis.

Table S1 Anthoceros agrestis chloroplast RNA editing sites.

Table S2 Anthoceros agrestis mitochondrial RNA editing sites.

Please note: Wiley Blackwell are not responsible for the content or functionality of any Supporting Information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.
