Marcussen et al. BMC Evolutionary Biology 2010, 10:45 http://www.biomedcentral.com/1471-2148/10/45

RESEARCH ARTICLE Open Access

Evolution of RNA polymerase IV/V genes: evidence of subneofunctionalization of duplicated NRPD2/NRPE2-like paralogs in Viola (Violaceae)

Thomas Marcussen1, Bengt Oxelman2, Anna Skog1, Kjetill S Jakobsen1*

Abstract

Background: DNA-dependent RNA polymerase IV and V (Pol IV and V) are multi-subunit enzymes occurring in plants. The origin of Pol V, specific to angiosperms, from Pol IV, which is present in all land plants, is linked to the duplication of the gene encoding the largest subunit and the subsequent subneofunctionalization of the two paralogs (NRPD1 and NRPE1). Additional duplication of the second-largest subunit, NRPD2/NRPE2, has happened independently in at least some eudicot lineages, but its paralogs are often subject to concerted evolution and gene death, and little is known about their evolution or their affinity with Pol IV and Pol V.

Results: We sequenced a ~1500 bp NRPD2/E2-like fragment from 18 Viola species, mostly paleopolyploids, and 6 non-Viola Violaceae species. Incongruence between the NRPD2/E2-like gene phylogeny and species phylogeny indicates a first duplication of NRPD2 relatively basally in Violaceae, with subsequent sorting of paralogs in the descendants, followed by a second duplication in the common ancestor of Viola and Allexis. In Viola, the mutation pattern suggested (sub-) neofunctionalization of the two NRPD2/E2-like paralogs, NRPD2/E2-a and NRPD2/E2-b. The dN/dS ratios indicated that a 54 bp region was under strong positive selection in both paralogs immediately following duplication. This 54 bp region encodes a domain that is involved in the binding of the Nrpd2 subunit with other Pol IV/V subunits, and may be important for correct recognition of subunits specific to Pol IV and Pol V. Across all Viola taxa 73 NRPD2/E2-like sequences were obtained, of which 23 (32%) were putative pseudogenes, all occurring in polyploids. The NRPD2 duplication was conserved in all lineages except the diploid MELVIO clade, in which NRPD2/E2-b was lost, and its allopolyploid derivatives from hybridization with the CHAM clade, section Viola and section Melanium, in which NRPD2/E2-a occurred in multiple copies while NRPD2/E2-b paralogs were either absent or pseudogenized.

Conclusions: Following the relatively recent split of Pol IV and Pol V, our data indicate that these two multi-subunit enzymes are still specializing and each acquiring fully subfunctionalized copies of their subunit genes. Even after specialization, the NRPD2/E2-like paralogs are prone to pseudogenization and gene conversion, and NRPD2 and NRPE2 copy number is highly dynamic, modulated by allopolyploidy and gene death.

* Correspondence: [email protected] 1Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biology, University of Oslo, 0316 Oslo, Norway

© 2010 Marcussen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Eukaryotes normally possess three nuclear DNA-dependent RNA polymerases (Pols), Pol I-III, functionally specialized for synthesis of different types of RNA and thus essential for viability. The Pol holoenzymes consist of about 12 subunits, of which the two largest are tightly bound and together constitute the catalytic seat of the enzyme and are generally polymerase-type specific [1-4]. Angiosperms (flowering plants) are unique in possessing two additional RNA polymerases that are not essential for viability, Pol IV and Pol V (previously called Pol IVa and IVb, or RNAP IVa and IVb). They are functionally distinct, with Pol IV being required for 24 nt siRNA production and Pol V for siRNA-mediated gene silencing of transposons and other repeated elements [5].

The subunit nomenclature of nuclear RNA polymerases has varied among research groups and organisms, and is often in conflict with names for unrelated genes. In the following, we have therefore adopted the 4-letter gene names registered with The Arabidopsis Information Resource. By convention the largest subunits of Pol I, II, III, IV and V are Nrpa1, Nrpb1, Nrpc1, Nrpd1 and Nrpe1 respectively, and the genes encoding these are NRPA1, NRPB1, NRPC1, NRPD1 and NRPE1, respectively. Likewise, the genes encoding the second-largest subunits of the five polymerases are designated NRPA2, NRPB2, NRPC2, NRPD2 and NRPE2, respectively.

The genes encoding the largest and second-largest subunits of Pol IV, NRPD1 and NRPD2 respectively, originated by independent duplication of their Pol II homologs, NRPB1 and NRPB2. The NRPB1/NRPD1 duplication is shared by both charophytes and embryophytes while the NRPB2/NRPD2 duplication is found only in embryophytes [3]. While Pol IV is found in all plants, Pol V appears to exist only in angiosperms (flowering plants) following duplication of, at least, the largest subunit gene (NRPD1/NRPE1) basally in this lineage [3]. A recent study in the eudicot angiosperm Arabidopsis thaliana confirms the close relationship of Pol IV and Pol V with Pol II and shows that many of their 12 subunits are shared among these three RNA polymerases [1]. Nevertheless, 4 subunits of Pol IV and 6 subunits of Pol V are distinct from their Pol II paralogs, and Pol IV and Pol V differ in 4 subunits. Interestingly, 3 duplicated Pol IV/V genes (third, seventh and ninth largest subunits) appear to be incompletely subfunctionalized with respect to Pol IV and Pol V. These have a higher sequence similarity than the fully specialized gene pairs (e.g. NRPD1/NRPE1) and are presumably derived from more recent duplication events.

Following the duplication and specialization of the Pol IV/V largest subunit genes (NRPD1/NRPE1) in angiosperms, duplication of the second-largest subunit genes (NRPD2/E2) seems comparatively rare. NRPD2/E2 is apparently a singleton in monocots (Oryza, Zea) as well as in several families of eudicots, e.g., Aceraceae (Acer), Asteraceae (Carthamus), Lamiaceae (Galeopsis; Brysting AK, unpublished), Myrtaceae (Myrtus), Solanaceae (Solanum) and Vitaceae (Vitis) [3,6-8]. A few eudicot lineages, however, possess duplicate NRPD2/E2 copies, e.g., Brassicaceae (Arabidopsis; but only one paralog is expressed), Caprifoliaceae (Lonicera), Celastraceae (Maytenus), Euphorbiaceae (Manihot), Salicaceae (Populus; but only one paralog is expressed), Caryophyllaceae (Silene and many other genera) and Violaceae (Viola and Allexis; herein) [6,9-11]. This indicates that duplicated NRPD2/E2 genes may in fact be a common feature in eudicots. It is clear that the NRPD2/E2 duplications have occurred independently in these lineages, and that they are also frequently lost, with sorting among lineages as a common result [10].

The mechanisms behind gene duplication are well known in eukaryotes and in plants [e.g., [12]]. While it is clear that by far the most likely fate of a duplicate gene is gene death [7,13,14], mechanisms accounting for the duplications being retained in the genome have been, until recently, less well understood [15]. Duplicate genes may be preserved by a neutral mechanism in which each paralog accumulates loss-of-function mutations (degeneration) that are complemented by the other copy. Such mutations can happen either at the regulatory level, causing the paralogs to diverge in pattern of expression (duplication-degeneration-complementation (DDC) [16]), or at the product level, causing the paralogs to diverge in function (subfunctionalization [17]). Furthermore, either mechanism can eliminate possible structural trade-offs imposed by different functions performed by a multifunctional gene [18], by unlinking these functions. These mechanisms can thus be regarded as prerequisites for the ability of duplicate genes to specialize and acquire new functions (subneofunctionalization [19]). Regulatory and functional subfunctionalization are both well documented in gene families, in eukaryotes in general as well as in plants [e.g., [15,20-23]].

The RNA polymerase subunit encoded by NRPD2/E2, Nrpd2/Nrpe2, has a discrete double function in angiosperms, assembling either with Pol IV or with Pol V. A duplication of this gene might have been preserved if the two paralogs underwent subfunctionalization with respect to Pol type, which would have required some degree of co-evolution of co-assembling subunits.

In this study we have investigated the evolution of NRPD2/E2-like genes within the Violaceae (Malpighiales), with particular reference to Viola (Figure 1). This gene occurs in a single copy in most genera of the family but it is duplicated in others (Allexis and Viola). In a similar system within tribe Sileneae of the Caryophyllaceae (Caryophyllales), concerted evolution was found to be prominent among NRPD2/E2 paralogs [10]. In that study, however, only intron 6 was investigated. In order to be able to examine possible neofunctionalization among NRPD2/E2-like duplicates in the Violaceae system, we have expanded this range to include also the flanking exons of intron 6.

Figure 1 Sequence characteristics of NRPD2/E2. A. The corresponding localization in Arabidopsis NRPD2/E2 of the region amplified for Violaceae. B. Specifications for the region in Violaceae. Filled triangles denote PCR primers used to amplify the two standard PCR regions, and open triangles denote primers used to amplify shorter stretches if PCRs with the standard primers repeatedly failed. NRPD2/E2 domains involved in recognition and binding to other Pol IV/V subunits are indicated, based on findings in yeast Pol II [2].
[Figure 1 labels: panel A, Arabidopsis NRPD2/E2 with segments numbered 1 to 7, scale bar 0.5 kb; panel B, PCR 1 (~900 bp) and PCR 2 (~900 bp), primers 5F3062, 7R3883, 5F2898, vex7F3418, 6F3263, vex6R3371 and Svex7R3420, spanning exon 5, intron 5, exon 6, intron 6 and exon 7; interaction sites Nrpd3/e3, Nrpd10/e10, Nrpd3/e3 and Nrpd1/e1.]

The Violaceae consist of some 900 species in 23 mostly tropical genera [24]. Their relationships have recently been examined in a phylogenetic study based on plastid and nuclear ribosomal gene DNA sequences [25]. With more than 500 species, Viola is the largest genus of the Violaceae and the only one widely distributed in the northern hemisphere [26]. Based on chromosome counts [e.g., [27]] and isozyme expression data [28,29] it can be estimated that roughly two thirds of Viola species belong to paleopolyploid lineages having secondary base numbers ranging from x = 10 to x = 27 or higher. “True” diploids are known only from two sections, Andinium (x = 7) from South America [30] and Chamaemelanium (x = 6), which is mainly northern amphi-Pacific [e.g., [31]]. Tentative genus phylogenies have been based on rRNA Internal Transcribed Spacer (ITS) sequence data in several studies [e.g., [26,32]] but this marker has proven of no use for recovering any of these polyploid relationships [e.g., [33]]. The genus phylogeny is currently being re-examined using low-copy nuclear genes (Marcussen, Oxelman, Jakobsen, unpublished data).

In Arabidopsis five of the 12 genes associated with Pol IV/V have been duplicated, apparently independently, and have undergone subfunctionalization with respect to Pol IV and V [cf. [1]]. For NRPD2/E2 in eudicots, available sequence information suggests numerous independent duplications and that these paralogs are often subject to concerted evolution and gene death [10]. In this study, we elucidate the origin of duplication of the NRPD2/E2-like genes within the Violaceae and aspects of its evolution and phylogeny within Viola. Polyploidy, which is known to be a major evolutionary process in Viola, could be thought to interact with a nascent gene family such as NRPD2/E2. For instance, could redundancy resulting from polyploidy destabilize the incipient differentiation of the two paralogs, NRPD2/E2-a and NRPD2/E2-b, or could the occasional loss of primary duplication be compensated for by secondary duplications resulting from polyploidy? The immediate consequence of gene duplication is redundancy, which will generally lead to loss or pseudogenization of one paralog unless the paralogs become subfunctionalized or neofunctionalized. Positive selection can be taken as evidence of neofunctionalization. It is therefore of relevance to detect to what degree positive selection has acted on duplicated NRPD2/E2-like paralogs within the Violaceae, and if it has, at which sites and on which phylogenetic branches.

Results

Assignment and naming of NRPD2/E2-like homologs in Violaceae

NRPD2/E2-a and NRPD2/E2-b are arbitrary labels that denote the two paralogs found in Viola and Allexis. They do not reflect orthology to duplicated NRPD2/E2 loci outside of Violaceae, and do not imply that the respective binding specificities of the paralogs to Pol IV and Pol V are known. Digits appended to the sequence name separate homoeologs of a paralog within a single specimen (e.g., banksii_B2 refers to homoeolog 2 of NRPD2/E2-b in V. banksii).

NRPD2/E2-like homologs in Violaceae

GenBank sequence data demonstrate duplicate copies of NRPD2/E2 in both Manihot esculenta (CK652029, DV448133) and Populus trichocarpa (e.g., DT509274, CV227572) but not in Euphorbia esula (DV145650). In Manihot both paralogs are potentially functional but in Populus one paralog (CV227572) is characterized by frameshift and non-synonymous mutations not reconcilable with NRPD2/E2 activity. The two copies found in Manihot, Populus and Viola are not orthologous to each other (not shown). We obtained and analyzed sequence information from six non-Viola Violaceae taxa (Table 1). These sequences were aligned

Table 1 Material and gene sequences used

Species | Taxonomic group (base chromosome number) | 2n | x | GenBank accession ID | Voucher ID
Viola congesta | sect. Andinium (x = 7) | – | 2x | a: GU289564; b: GU289615 | Marcussen 641 (O)
Viola biflora | sect. Chamaemelanium (x = 6) | 2n = 12 | 2x | a: GU289574; b: GU289625 | Marcussen 775 (O)
Viola brevistipulata | sect. Chamaemelanium (x = 6) | 2n = 12 | 2x | a: GU289575; b: GU289626, GU289627 | Marcussen 803 (O)
Viola canadensis | sect. Chamaemelanium (x = 6) | 2n = 12, 24 | 4x | a^a: GU289576; b^a: GU289637 | Marcussen 802 (O)
Viola nuttallii | sect. Chamaemelanium (x = 6) | 2n = 24 | 4x | a: GU289577, GU289578, GU289579; b: GU289628, GU289629 | Marcussen 801 (O)
Viola pubescens | sect. Chamaemelanium (x = 6) | 2n = 12 | 2x | a: GU289580; b: GU289630 | Marcussen 637 (O)
Viola maculata | sect. Chilenium | – | 8x | a: GU289570, GU289571, GU289572, GU289573; b: GU289616, GU289617, (GU289618^b), GU289619 | Marcussen 804 (O)
Viola banksii | sect. Erpetion | – | 8x-10x | a: GU289565, GU289566, GU289567^c, GU289568^c, (GU289569^b); b: (GU289620^c), GU289621, GU289622^b, GU289623, GU289624 | Marcussen 630 (O)
Viola bicolor | sect. Melanium | 2n = 34 | 12x? | a: GU289603, GU289604, GU289605^c, GU289606^b, GU289607^c, GU289608^b | Marcussen 743 (O)
Viola calcarata | sect. Melanium | 2n = 20 | 12x? | a: GU289609, GU289610, GU289611, GU289612^c, GU289613, GU289614 | Marcussen 672 (O)
Viola dirimliensis | sect. Melanium | 2n = 8 | 8x? | a: GU289599, GU289600, GU289601, GU289602^c | Marcussen 650 (O)
Viola epipsila | sect. Viola (x = 10, 12) | 2n = 24 | 4x | a: GU289587, GU289588; b: GU289635^b | Marcussen 661 (O)
Viola hirta | sect. Viola (x = 10, 12) | 2n = 20 | 4x | a: GU289581, GU289582 | Marcussen 682 (O)
Viola mirabilis | sect. Viola (x = 10, 12) | 2n = 20 | 4x | a: GU289583, GU289584; b: GU289631 | Marcussen 683 (O)
Viola selkirkii | sect. Viola (x = 10, 12) | 2n = 24 | 4x | a: GU289589, GU289590; b: GU289634^b | Marcussen 698 (O)
Viola spathulata | sect. Viola (x = 10, 12) | – | 8x? | a: GU289593, GU289594, GU289595^b, GU289596^b, GU289597, GU289598; b: GU289636^b | Marcussen 670 (O)
Viola uliginosa | sect. Viola (x = 10, 12) | 2n = 20 | 4x | a: GU289585, GU289586; b: GU289632 | Marcussen 662 (O)
Viola verecunda | sect. Viola (x = 10, 12) | 2n = 24 | 4x | a: GU289591, GU289592; b: GU289633 | Marcussen 697 (O)
Allexis batangae | Violaceae (outgroup) | – | 2x | a: GU289562; b: GU289563^c | Bos 4241 (UPS)
Anchietea parvifolia | Violaceae (outgroup) | – | 2x | GU289559^c | Myndel Pedersen 13944 (UPS)
Corynostylis arborea | Violaceae (outgroup) | – | 2x | GU289560 | Asplund 14509 (UPS)
Cubelium concolor (= Hybanthus concolor) | Violaceae (outgroup) | 2n = 48 | 2x | GU289561 | Pläck & Bodin s.n. (UPS)
Hybanthus enneaspermus | Violaceae (outgroup) | 2n = 16, 32 | 2x | GU289558 | unknown 2001-05-13 (UPS)
Rinorea ilicifolia | Violaceae (outgroup) | – | 2x | GU289557 | Friis et al. 2445 (UPS)
Populus trichocarpa | Salicaceae (outgroup) | 2n = 38 | – | a: DT509274; b: CV227572 | –
Manihot esculenta | Euphorbiaceae (outgroup) | 2n = 36, 54, 72 | – | a: DV448133; b: (CK652029) | –
Euphorbia esula | Euphorbiaceae (outgroup) | 2n = 20, 48-60 | – | DV145650 | –

Taxa used in this study, with respective GenBank accessions for DNA sequences and voucher information. For each taxon systematic affinity (sections within Viola for ingroup, and families within Malpighiales for outgroup), chromosome counts (2n, where available), and putative ploidal levels (x, inferred from the NRPD2/E2 data) are indicated. GenBank accession IDs are sorted by paralog (a and b), and by homoeolog (ascending numbers); gene copies excluded from phylogenetic analysis, because they were considered too short for reliable analysis, are put in brackets. Note that NRPD2/E2-a and NRPD2/E2-b are arbitrary labels, and that NRPD2/E2-a and NRPD2/E2-b in Euphorbiaceae, Salicaceae and Violaceae are not orthologous to each other. Herbarium acronyms for voucher specimen deposition (i.e., O, U) follow Holmgren and Holmgren [50].
^a the secondarily duplicated gene copies in Viola canadensis differed only in 3 (NRPD2/E2-a) and 8 (NRPD2/E2-b) substitutions, and their respective consensuses were used as single sequences in the analyses
^b partial sequence (exon 6 to exon 7); PCR 1 failed (see Figure 1)
^c partial sequence (exon 5 to intron 6); PCR 2 failed (see Figure 1)

to exon (mRNA) sequences from GenBank of Euphorbia, Manihot and Populus. Outside Viola, we found singleton NRPD2/E2 in all of Anchietea parvifolia, Corynostylis arborea, Cubelium concolor (= Hybanthus concolor), Hybanthus enneaspermus and Rinorea ilicifolia. Like Viola, Allexis batangae had duplicated NRPD2/E2 genes, but only the NRPD2/E2-b paralog was putatively functional; its NRPD2/E2-a paralog was a pseudogene that contained three frameshift mutations and stop codons in all three reading frames.

Our inferences of the plastid and nuclear ribosomal phylogeny of Violaceae (Figure 2a) were congruent with previous analyses of the family, regarding both general topology [25] and the placement of Cubelium [34]. Rinorea was placed as sister to the rest of the Violaceae, with a Cubelium + Orthion clade and an Allexis + Viola clade as successive sisters to a Hybanthus (Anchietea + Corynostylis) clade. All branches received high (95-100%) bootstrap support.

The NRPD2/E2 phylogenies (Figure 2b) were incongruent with the species tree. Again, Rinorea was placed as sister to the rest of the Violaceae with relatively high bootstrap support (MP: 71%/ML: 93%). Within rest-Violaceae three well-supported clades were found, one consisting of the NRPD2/E2-a copy of Allexis and Viola (92%/95%), a second of the NRPD2/E2-b copy of Allexis and Viola (92%/96%), and a third (67%/76%) consisting of Hybanthus and Cubelium as sisters to a strongly supported (93%/100%) Anchietea + Corynostylis clade. Whether it is Hybanthus (MP) or Cubelium (ML) that is sister to the rest within the last clade depends on the analysis, but neither topology receives strong bootstrap support (52% and 61%, respectively). Weak support is given for an NRPD2/E2-a + NRPD2/E2-b clade (Allexis and Viola; 52%/68%). However, the inter-relationships of these three main clades remain elusive and depend on whether Cubelium and Hybanthus are included in the analysis (not shown).

No evidence of recombination was detected in the Violaceae alignment using GARD (see methods). Two possible recombination breakpoints were detected, but the topologies resulting from phylogenetic analyses of the partitions were congruent.

The reconciled tree (Figure 2c), constructed in GeneTree by embedding the NRPD2/E2 tree (Figure 2b) within the species tree (Figure 2a), explains the incongruence between these two trees by hypothesizing two events of gene duplication and three losses. A first duplication was postulated on the basal branch of all Violaceae except Rinorea, meaning that one paralog would have been lost in Viola and Allexis but retained in Cubelium, Hybanthus, Anchietea and Corynostylis. The second paralog may have been retained only in Viola and Allexis, before duplicating a second time in their common ancestor and diversifying into their present NRPD2/E2-a and NRPD2/E2-b paralogs.

Figure 2 Phylogeny of the NRPD2/E2-like gene family in Violaceae. Bootstrap support values (1000 replicates) are indicated above (MP) and below (ML) branches, respectively. Names of pseudogenes are capitalized. A. Violaceae phylogeny inferred by ML analysis of the four genes atpB, matK, rbcL and 16S. Sequence data were obtained from Tokuoka [25] with an additional matK sequence of Cubelium concolor from GenBank (EF135550, as Hybanthus concolor). Names of taxa not included in phylogeny B are underlined. B. Phylogeny of NRPD2/E2 paralogs in Violaceae inferred by ML analysis. C. Tree reconciliation between the NRPD2/E2 gene tree (B) and the Violaceae organism tree (A), using the GeneTree software. The two inferred gene duplication events and three gene loss events are indicated.

NRPD2/E2-like homologs in Viola

There were considerable differences in the relative number of copies of NRPD2/E2-a and NRPD2/E2-b across lineages of the genus Viola (Table 1), but seen as a whole NRPD2/E2 always occurred in two or more potentially functional copies. Only the two diploid sections Andinium and Chamaemelanium appeared to have single and functional copies of each of NRPD2/E2-a and NRPD2/E2-b. All gene copies appeared functional in the neopolyploids of the latter section (V. nuttallii and V. canadensis). Non-functional gene copies were identified by the often numerous occurrence of premature stop codons and frameshift mutations within exons (up to a single 862 bp deletion comprising all of exon 6 in B1_banksii); in a single case (NRPD2/E2-b in V. uliginosa) the sequence was assumed to be non-functional because of a partial duplication within the highly conserved GEMERD amino acid motif of exon 7. Taxa of section Erpetion (V. banksii) and section Chilenium (V. maculata) had equal numbers of NRPD2/E2-a and NRPD2/E2-b copies, 5 and 4 of each, respectively, but differed in their respective numbers of putatively functional copies. All members of the sections Melanium and Viola had unbalanced numbers of NRPD2/E2-a and NRPD2/E2-b. Typically, taxa of section Viola had two putatively functional copies of NRPD2/E2-a and one non-functional copy of NRPD2/E2-b (except in V. hirta and in V. spathulata). Members of section Melanium had four to six copies of NRPD2/E2-a, of which one or several could be non-functional, but no copies of NRPD2/E2-b. Unbalanced numbers of NRPD2/E2-a and NRPD2/E2-b copies were found also in V. brevistipulata and V. nuttallii (section Chamaemelanium) but, in light of their ploidy levels and expected copy number, this likely reflects heterozygosity in one of the NRPD2/E2-a loci (V. nuttallii) or the NRPD2/E2-b locus (V. brevistipulata).

The MP and ML phylogenies of NRPD2/E2-a and NRPD2/E2-b in Viola are all largely congruent (Figure 3) with an as yet unpublished phylogeny for the genus based on another low-copy nuclear gene (Marcussen T, Oxelman B, Blaxland K, Jakobsen KS, in prep.), with the exceptions that NRPD2/E2-b is absent in the

[Figure 3 panels: NRPD2/E2-a and NRPD2/E2-b; clades labeled MELVIO, CHILERP and CHAM; outgroup Allexis batangae.]

Figure 3 Phylogenies of NRPD2/E2-a and NRPD2/E2-b in Viola. Allexis batangae NRPD2/E2-a and NRPD2/E2-b paralogs are used as outgroup. Bootstrap support values are indicated as percentages, MP values above branches (based on 1000 replicates), and ML values below branches (based on 100 replicates). Branches receiving high bootstrap support (>95% for both MP and ML) are indicated with a terminal dot. Pseudogenes are indicated in capital letters. Taxa are grouped and color-coded according to section.

MELVIO clade, in the entire section Melanium and in V. hirta of section Viola. Generally higher bootstrap support was obtained for NRPD2/E2-b than for NRPD2/E2-a, reflecting that the former has ca 200 bp longer introns and therefore more phylogenetically informative sites. Working our way up from the root of the two consensus trees in Figure 3, V. congesta (section Andinium) is sister to the rest of the genus, flanked by branches receiving strong bootstrap support in all analyses. Next comes a polytomy of three lineages, here referred to as CHILERP, MELVIO (only NRPD2/E2-a) and CHAM. The CHILERP clade, which received only weak ML bootstrap support, but was recovered for both NRPD2/E2-a (54%) and NRPD2/E2-b (52%), consisted of various V. banksii (section Erpetion) and V. maculata (section Chilenium) lineages, of which one internal mixed species lineage received 100% bootstrap support. The MELVIO clade, missing for NRPD2/E2-b, received strong support for NRPD2/E2-a (MP: 96%/ML: 93%) and consisted of a basal polytomy of taxa of section Viola within which a strongly supported (100%) section Melanium is nested. The large and strongly supported CHAM clade included sequences from all represented sections except Andinium and, in the case of NRPD2/E2-b, Melanium. A basal dichotomy in the CHAM clade led to two strongly supported sub-clades: one sub-clade consisting of one V. maculata sister to two V. banksii sequences, and a second sub-clade in which taxa of sections Chamaemelanium and Viola formed a polytomy; in NRPD2/E2-a section Melanium was a monophyletic group (92%/99%) within this basal polytomy. Within the polytomies of CHAM and MELVIO the species constellations (V. mirabilis (V. uliginosa + V. hirta)), (V. epipsila + V. selkirkii + V. verecunda), and V. dirimliensis sister to the rest of section Melanium were common.

Selective forces

The pattern of change of dN/dS ratios along the sequence is shown in Figure 4, using the sliding window option for pairwise comparison of Rinorea with three data sets: (1) 3 NRPD2/E2 sequences from Cubelium/Corynostylis/Hybanthus, in which the gene is not duplicated; (2) 13 NRPD2/E2-a sequences from Allexis and Viola; and (3) 10 NRPD2/E2-b sequences from Viola. The dN/dS ratios are well below 1 throughout most of the sequence for Cubelium/Corynostylis/Hybanthus, thus indicating purifying selection. It is for the most part also so for NRPD2/E2-a and NRPD2/E2-b, but for both paralogs a 54 bp (18 amino acid) region with dN/dS considerably higher than 1 is identified near the 3' end of exon 6 (nucleotide positions 249 through 302), indicating positive selection in both these paralogs. Within the region of positive selection a compensatory pattern conserving regionally the net charge of mutations was found (not shown): substitution of E/D (glutamic acid/aspartic acid) in position 300 is compensated for by gain of E in position 252 in NRPD2/E2-b. Thus, the positions of charged amino acids are subject to selection.

The 54 bp region where positive selection was detected (Figure 4) was further analyzed using the CodeML software of the PAML package for estimating dN/dS ratios of 60 specified branches in the predefined phylogenetic tree. A 60-parameter model, assuming one dN/dS ratio for each branch, was found to marginally better fit the data (p = 0.0516) than a single-parameter model, assuming a uniform dN/dS ratio across all branches in the tree. Although many branches in the tree had positive dN/dS ratios, especially those immediately after the duplication basal to Allexis and Viola, only for the branch basal to B_congesta was the dN/dS ratio significantly larger than 1 (p = 0.0526). A model assuming a common dN/dS ratio for the three basal-most branches following the duplication, i.e. basal to A_Allexis, A_congesta and B_congesta, received strong support (p = 0.0101). Thus, both NRPD2/E2 paralogs seem to have been subjected to positive selection (dN > dS) soon after the duplication, but apparently not at exactly the same time (Figures 5 and 6). For NRPD2/E2-a, positive selection is hypothesized (i) immediately after the duplication of NRPD2/E2 and before the split of Allexis and Viola, and (ii) within the rest of Viola after Viola section Andinium split off, and finally (iii) also within the CHAM clade. For NRPD2/E2-b positive selection occurred somewhat later, and only in the branch leading to Viola (i.e. not in Allexis).
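The branch-model comparisons above are reported only as p values. The following is a minimal sketch of how such a comparison is commonly evaluated, assuming a standard likelihood-ratio test between nested models (the text does not state the test explicitly). Twice the difference in log-likelihood is compared with a chi-square distribution whose degrees of freedom equal the number of extra parameters; with one extra parameter, as when a 2-rate model is compared with a single-ratio model, the chi-square tail has a closed form. The function name and the log-likelihood values are hypothetical placeholders, not values from this study.

```python
import math


def lrt_pvalue_df1(lnl_null: float, lnl_alt: float) -> float:
    """P value of a likelihood-ratio test with one extra free parameter.

    lnl_null: log-likelihood of the simpler (nested) model.
    lnl_alt:  log-likelihood of the model with one additional parameter.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic <= 0.0:
        # The richer model fits no better, so there is no evidence against the null.
        return 1.0
    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods, chosen only to illustrate the calculation.
lnl_one_ratio = -250.00   # single dN/dS ratio across the tree
lnl_two_ratio = -246.70   # separate ratio for a set of foreground branches

p_value = lrt_pvalue_df1(lnl_one_ratio, lnl_two_ratio)
print(f"2*delta(lnL) = {2 * (lnl_two_ratio - lnl_one_ratio):.2f}, p = {p_value:.4f}")
```

Comparing the 60-parameter free-ratio model with the single-ratio model would need the general chi-square tail with 59 degrees of freedom, which this closed form does not cover.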

0 100 200 300 400 500 600 700 800 Discussion Nucleotide position NRPD2/E2 phylogeny within Violaceae d d NRPD2/E2 Figure 4 Sliding window plot of N/ S ratios for in The NRPD2/E2-a and NRPD2/E2-b phylogenies differ in Violaceae. The plot was generated by comparing the Rinorea sequence to singleton NRPD2/E2 in Cubelium, Hybanthus and several respects from the already published organism Corynostylis (black), to NRPD2/E2-a in Allexis and Viola (gray), and to phylogenies for Viola [26,32,35], based solely on the (3) NRPD2/E2-b in Viola (white). Window length was set to 54 bases nuclear ITS region, and for Violaceae [25], based on 4 and step size to 9 bases. Sites interacting with the other Pol IV/V nuclear and chloroplast regions. Our results indicate subunits Nrpd1/Nrpe1 (d1/e1), Nrpd3/Nrpe3 (d3/e3) and Nrpd10/ that, at the family level, this incongruence is due to Nrpe10 (d10/e10) are shown, based on findings for Pol II [2]. Sites duplication of NRPD2/E2 and the uneven sorting of under neutral (dN/dS = 1) or positive selection (dN/dS > 1) are seen in a restricted 54 bp region, from position 249 through 302, for paralogs among lineages (Figure 2). Within Viola,the

NRPD2/E2-a and NRPD2/E2-b while purifying selection (dN/dS <1) incongruence appears to result partly from the notorious predominates in the rest of the locus. Corynostylis arborea and failure of ITS to capture allopolyploid relationships, a B_Allexis batangae were excluded from the sliding window analysis common consequence of gene conversion among the because of a lack of data from exon 7. often thousands of copies of this gene within the plant Marcussen et al. BMC Evolutionary Biology 2010, 10:45 Page 9 of 15 http://www.biomedcentral.com/1471-2148/10/45

genome [33], and partly from the evolutionarily unstable copy number of NRPD2/E2 (Figure 3). A genus phylogeny based on low-copy nuclear genes is currently being constructed and is largely congruent with the NRPD2/E2-a phylogeny (Marcussen T, Oxelman B, Blaxland K, Jakobsen KS, in prep.).

[Figure 5 tree: branches for singleton NRPD2/E2, NRPD2/E2-a and NRPD2/E2-b across the sampled Violaceae, annotated with per-branch dN and dS estimates and with the NRPD2/E2 duplication; legend entries: dN/dS > 1, p = .010; dN = dS = 0 (no data).]
Figure 5 dN and dS estimated for NRPD2/E2-a and NRPD2/E2-b along individual branches in the Violaceae phylogeny. Values of dN (above branches) and dS (below branches) along each branch were estimated by using the free-ratio model using the CodeML program in PAML [49]. For the branches drawn with bold lines, dN/dS was larger than 1 (p = 0.010) under a 2-rate model, which suggests that positive selection acted on these lineages. Branches collapsing in a MP phylogenetic analysis of the 54 bp region are indicated with broken lines. NRPD2/E2 duplication in the common ancestor of Allexis and Viola is indicated. Only species names are shown for Viola, and non-Viola taxon names are shown in boldface. Pseudogenes are indicated with capital letters.

In comparison with an organism phylogeny of Violaceae, based on data from Tokuoka [25] with an extra accession of Cubelium rbcL, our data suggest that NRPD2/E2 was duplicated twice within the evolutionary history of the family. The reconciled GeneTree phylogeny (Figure 2C) indicated a first duplication relatively basally in the family, after the split of Rinorea, with subsequent complete sorting of paralogs in the descendant genera, so that one paralog was retained in Cubelium, Hybanthus, Anchietea and Corynostylis, and the second paralog was retained in Allexis and Viola. This interpretation, however, rests entirely on the conflicting phylogenetic position of Cubelium, which received strong bootstrap support in the species phylogeny (MP: 98%, ML: 100%) but was less strongly supported in the NRPD2/E2 phylogeny (MP: 67%, ML: 76%). On the other hand, we found no evidence that this incongruence resulted from recombination.

The second duplication event, in the common ancestor of the genera Allexis and Viola, is incontestable because it is retained in most of the descendants. Lineage sorting of paralogs may, however, also explain the phylogenetic pattern and link the two duplication events. The first duplication may in fact have persisted in the lineage leading to Allexis and Viola, but been subject to an event of sequence replacement in the common ancestor of these two genera. Either scenario would appear in the phylogeny as an independent duplication basal to Allexis and Viola.

Interestingly, for the Caryophyllaceae it has not been possible to trace back the origin of the NRPD2/E2 duplication event either. Judging by paralog similarity, the duplication in tribe Sileneae seems to be a relatively recent one and may well have occurred within this tribe [10]. In contrast, Cerastium, which belongs to another subfamily [36], has NRPD2/E2-a and NRPD2/E2-b
paralogs that are substantially more divergent than in the Sileneae and that may result from an older duplication (Brysting AK, Mathiesen C, Marcussen T, in prep.). Thus, it may be that the small NRPD2/E2 gene family is subject to massive concerted evolution between paralogs, as already indicated in Silene by gene conversion (loss of NRPD2/E2-b and duplication of NRPD2/E2-a) within one lineage. Due to these factors it may be hard to pinpoint the duplication event on a phylogenetic tree.

[Figure 6 diagram: organism lineages labelled Allexis, Viola sect. Andinium, V. sect. Chilenium, V. sect. Erpetion, V. sect. Chamaemelanium, V. sect. Viola, V. spathulata and V. sect. Melanium, with the CHILERP, CHAM and MELVIO clades and the NRPD2/E2 duplication marked; legend entries: organism lineage, NRPD2/E2-a lineage, NRPD2/E2-b lineage.]
Figure 6 NRPD2/E2 gene lineage phylogeny versus organism phylogeny of Allexis and Viola. Unbroken lines denote putatively expressed paralogs, broken lines denote pseudogenes, and thick lines branches that have undergone positive selection (cf. Figure 5). Dots indicate events of pseudogenization or gene death. Truncated gray branches indicate the three (presumably extinct) lineages that contributed genomes to the four paleopolyploid Viola sections Chilenium, Erpetion, Melanium and Viola. Putative allopolyploidization events are indicated by diagonal lines connecting the organism lineages. For simplicity, the sections Melanium and Viola are shown as derived from the same allopolyploidization event, and secondary gene duplications within section Melanium have been omitted.

Within the Violaceae, there are certain indications from ongoing research that the original duplication of NRPD2/E2 may be connected with whole genome duplications via allopolyploidy. Recent findings for the genus Ionidium, which belongs to the same clade as Anchietea and Corynostylis in the present study, based on karyology (Seo MN, Sanso AM, Xifreda CC, unpublished) and a low-copy nuclear gene (unpublished data), suggest that the currently accepted base chromosome number (x = 8) for this genus, and for large parts of the family, is in fact tetraploid. There is some evidence of paleotetraploidy also in Viola as, apart from NRPD2/E2, several other low-copy genes have also been found to be duplicated, i.e. chalcone synthase [37], shikimate dehydrogenase (unpublished data) and homeotic floral genes (Ballard HE, personal communication).

Most Viola groups were found to have a more or less balanced number of NRPD2/E2-a and NRPD2/E2-b copies. Presumably due to redundancy following polyploidy, massive pseudogenization of this gene family has happened in the paleopolyploid sections Chilenium and Erpetion. However, the situation is very different in the other two polyploid sections, Viola and Melanium. These two sections have their allopolyploid origin in one or several wide hybridization events between two major diploid clades, CHAM and MELVIO. The CHAM clade, today represented by the diploid section Chamaemelanium, has apparently functional copies of both NRPD2/E2-a and NRPD2/E2-b, while the MELVIO clade, which is now extinct as diploid, has secondarily lost its NRPD2/E2-b paralog. Assuming subneofunctionalization of the two paralogs (which is suggested by positive selection, see below), this would mean that the remaining MELVIO paralog, which is by phylogenetic origin an “A” paralog, must have regained the ancestral expression state performing both “A” and “B” functions. Thus, the sections Viola and Melanium inherited one paralog of each
NRPD2/E2-a and NRPD2/E2-b from the CHAM ancestor, and from the MELVIO an NRPD2/E2-a paralog with both “A” and “B” function. This “incomplete redundancy” in the polyploids may have led to further gene death within the two sections. In species belonging to these sections (V. spathulata excepted), the current CHAM NRPD2/E2-b paralog is either a pseudogene (section Viola except V. hirta) or has been completely lost (section Melanium and V. hirta), while having two NRPD2/E2-a copies, one derived from CHAM and a second from MELVIO. In V. spathulata all the MELVIO paralogs have been pseudogenized and only the CHAM-derived paralogs are expressed; this was, apparently, also followed by a more recent polyploidization event.

Thus, both sections Viola and Melanium, although tetraploid, possess putatively functional NRPD2/E2 copies only of NRPD2/E2-a, derived from the ancestral MELVIO and CHAM genomes. As both these seem functional, and have not suffered the same fate as the NRPD2/E2-b paralog during the same period of time, it may be that some degree of de novo subfunctionalization has evolved between these two NRPD2/E2-a paralogs. Further research is needed to shed light upon this issue.

Positive selection is seen in regions associated with subunit interaction
In cases with ongoing neofunctionalization following gene duplication, one would expect positive selection to be acting on parts of one or both paralogs, and dN to be larger than dS [e.g., [38]]. Our findings for NRPD2/E2 fit well with these assumptions: only purifying selection was detected among taxa with singleton NRPD2/E2, while soon after duplication of the gene in the common ancestor of Allexis and Viola both NRPD2/E2 paralogs seem to have been subjected to rapid sub- and neofunctionalization. This is detected as dN/dS ratios larger than 1 along these branches, indicating positive selection, especially early in the divergence process (Figure 6). This process appears to have happened at different times in NRPD2/E2-a and in NRPD2/E2-b. Our branch analysis suggests rapid specialization of NRPD2/E2-a in the common ancestor of Allexis and Viola while for NRPD2/E2-b, positive selection occurred at a later time and only in Viola, not in Allexis. Compared to Viola, higher redundancy in Allexis, due to a still incomplete complementation of the two paralogs, may have facilitated pseudogenization of NRPD2/E2-a in A. batangae.

It is likely that the 54 bp region is important for the specialization and neofunctionalization of the two NRPD2/E2 paralogs in Viola. Crystallography of yeast Pol II has shown that this region of the second-largest subunit (Nrpb2) is part of a “hybrid binding” domain that is involved in subunit recognition and binding [2]. The first half of this region corresponds to an ordered loop interacting with the tenth-largest subunit (Nrpb10) and its second half forms an α-helix that interacts with the third-largest subunit (Nrpb3). Since structure and function tend to be conserved over all eukaryote Pols [2] and, especially, because of the close phylogenetic relationship between the Pol II and Pol IV/V subunit genes [3], we can assume that this region of Nrpd2/Nrpe2 interacts with homologs of Nrpd3 and Nrpd10. Under the assumption that the differentiation of NRPD2/E2-a and NRPD2/E2-b reflects specialization with respect to Pol IV and V, we suggest that this region is important for correct recognition of subunits specific to Pol IV and Pol V in Viola. This likely applies also to duplicated NRPD2/E2 in other eudicot lineages, and in this respect the between-paralog divergence of the very same region also in Silene [10] is noteworthy.

In Arabidopsis thaliana the exact subunit compositions of Pol IV and Pol V are known [1]. In this species, only a single copy of NRPD2/E2 is expressed (although another very similar duplicate is pseudogenized) and its protein product assembles with both Pol IV and Pol V [1]. Of the two subunit genes with whose gene products Nrpd2-a and Nrpd2-b interact, the NRPD10 homolog is not duplicated in Arabidopsis and is shared between Pols II, IV and V. NRPD3/E3, however, exists in two rather similar paralogs (85% sequence similarity at the protein level) that are incompletely subfunctionalized between Pol II/V and Pol V and apparently have been under positive selection (not shown). In the other genome-sequenced eudicots, Populus trichocarpa and Vitis vinifera, neither of these genes is duplicated. If, however, NRPD3/E3 is duplicated in Viola and Violaceae, in addition to the basal differentiation of NRPD1 and NRPE1, this could give some indications about how NRPD2/E2 came to be duplicated in this angiosperm family.

Conclusions
Aspects of the build, function and origin of the two atypical plant RNA polymerases Pol IV and Pol V are a hot topic in current research. This knowledge has in turn opened for study the dynamics of the origin and specialization of the individual subunits and their co-evolution within a phylogenetic framework.

Herein, we have presented the first documentation of possible co-evolution among subunits of Pol IV/V, from within the angiosperm family Violaceae. Following duplication, NRPD2/E2-a and NRPD2/E2-b, encoding Pol IV/V subunits, underwent rapid specialization (neofunctionalization) in a region that is important for subunit interaction and recognition. We conclude that correct recognition of the type-specific subunits is important for the correct function of each of Pol IV and Pol V.

Our study on Violaceae and previous studies on Caryophyllaceae draw a picture of NRPD2/E2 as a young gene family, still in the process of diverging and specializing and still subject to strong concerted evolution among paralogs. The few species and genera that have been studied show a variable number of Pol IV/V gene copies. Since the divergence of Pol IV and Pol V seems to have only barely preceded the radiation of the angiosperms, we can expect their young gene families to have acquired different lineage-specific specializations within angiosperms.

Methods
Material
The investigated 18 species of Viola (Table 1) were selected so as to cover the taxonomic diversity (i.e. following Ballard et al. [26]), geographical diversity and ploidal levels of the genus Viola. Represented in this study were Viola section Andinium (South America; V. congesta), section Chilenium (South America; V. maculata), section Erpetion (eastern Australia; V. banksii), section Chamaemelanium (mainly East Asia and North America; 4 species), section Melanium (mainly Mediterranean; 3 species) and section Viola (northern hemisphere; 7 species). Six outgroup taxa (Table 1) were selected from within Violaceae, of which Allexis was known to be phylogenetically close to Viola [25], and non-Violaceae outgroups from within the Malpighiales, Populus trichocarpa (Salicaceae), Manihot esculenta and Euphorbia esula (Euphorbiaceae).

DNA isolation
DNA was extracted using a CTAB extraction protocol [39]. In most cases stock DNA was diluted 20 times for working solutions of which 1 μl was used per 20-40 μl PCR reaction. For “difficult” DNA preparations, where higher template amounts or cleaner template were needed in the PCR reaction, the obtained stock DNA solution was further cleaned using the DNeasy Blood & Tissue Kit (Qiagen, Düsseldorf, Germany), following the manufacturer’s guidelines except that the first 2 steps were omitted; the obtained working solution was not diluted, and 5-10 μl were used in 80-160 μl PCR reactions divided into an appropriate number of tubes.

PCR and sequencing
Primer sequences and standard PCR conditions are presented in Table 2. The nomenclature of exons and introns follows the terminology in Arabidopsis. The NRPD2/E2 locus, ranging from exon 5 through most of exon 7, was in most taxa amplified in two PCRs with overlapping range, one PCR covering exon 5 through intron 6 using the primers 5F2898 and Svex7R3420, and a second covering exon 6 through exon 7 using the primers vex6F3263 and 7R3883 (Figure 1). This approach was preferred to amplifying the entire locus in a single PCR, because (1) it increases the chance of discovering all paralogs (especially for pseudogenes where the primer binding sites are no longer conserved), and (2) it reduces the amount of PCR recombination, which is expected to increase with gene copy number, their similarity and length of the amplified fragment [cf. [40]]. Where one of the PCRs failed (notably for the outgroup taxa, for which DNA was extracted from herbarium material and was of inferior quality), amplification of shorter stretches of DNA was attempted in three separate PCRs, using the primer pairs (1) 5F2898 (or 5F3062) and vex6R3371, (2) vex6F3263 and Svex7R3420, and (3) vex7F3418 and 7R3883. The primers were designed based on DNA sequences available on GenBank, and in some cases based on already existing Viola sequence data.

PCR products were separated by electrophoresis on 1% agarose gels, and multiple bands were cut out separately and cleaned using the E.Z.N.A. Gel Extraction Kit (Omega Bio Tek, Doraville, GA, USA) following the manufacturer’s manual. Some cleaned products were sequenced directly, but generally these were cloned using the TOPO TA Cloning Kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s manual, with the exception that only half of the volumes recommended for the reactions were used. Between 3 and 20 positive colonies from each reaction were screened by direct PCR using primers TOPO_F

Table 2 Standard PCR and sequencing primers, primer combinations and annealing temperatures used

Region            Forward primer                        Reverse primer                         Annealing temperature
ex5-in6 (PCR 1)   5F2898: TTGACAGCCTYGATGATGAT          Svex7R3420: ATCTTGAAAATCCAGCCC         52°C
ex5-in6           5F3062: AATGATGASGGGAAGAATTTTGC       Svex7R3420                             52°C
ex6-ex7 (PCR 2)   vex6F3263: GYCARCTYCTTGAGGCTGC        7R3883: ATVCCCATGCTGAAKAGCTCYTG        59°C
ex5-ex6           5F2898                                vex6R3371: YMTCRACACTGGGAGTGGAG        54°C
ex5-ex6           5F3062                                vex6R3371                              57°C
ex6-in6           vex6F3263                             Svex7R3420                             52°C
ex7               vex7F3418: GGCTGGATTTTCAAGATGG        7R3883                                 55°C

PCR mix: 20 to 40 μl reactions; 0.2 mM dNTPs, 0.25 μM of each of the primers, 1× Phusion HF buffer, 0.008 U/μl Phusion polymerase. The PCR conditions were as follows: initial denaturation at 95°C for 30 s followed by 35 cycles of 95°C for 9 s, annealing at the temperature specified in the table for 30 s, and 72°C for 30 s. The PCR ended with 7:30 minutes at 72°C and subsequent soak at 10°C.
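Several primers in Table 2 contain IUPAC ambiguity codes (Y, R, S, M, K, V), so each of those primer names stands for a small pool of sequences and not a single oligonucleotide. As an illustration only, and not part of the original methods, the Python sketch below counts and enumerates the unambiguous sequences behind each primer. The helper names `degeneracy` and `expand` are hypothetical and introduced here; the primer strings are copied from Table 2 and the code table is the standard IUPAC nucleotide alphabet.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (symbol -> bases it may represent).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Primer sequences as listed in Table 2.
PRIMERS = {
    "5F2898": "TTGACAGCCTYGATGATGAT",
    "5F3062": "AATGATGASGGGAAGAATTTTGC",
    "vex6F3263": "GYCARCTYCTTGAGGCTGC",
    "vex6R3371": "YMTCRACACTGGGAGTGGAG",
    "Svex7R3420": "ATCTTGAAAATCCAGCCC",
    "vex7F3418": "GGCTGGATTTTCAAGATGG",
    "7R3883": "ATVCCCATGCTGAAKAGCTCYTG",
}


def degeneracy(primer):
    """Return how many unambiguous sequences a degenerate primer encodes."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Report length and pool size for each primer in Table 2.
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, {degeneracy(seq)} variant(s)")
```

Under these assumptions, a primer such as 7R3883, which combines V, K and Y, represents 3 × 2 × 2 = 12 sequences, whereas Svex7R3420 and vex7F3418 contain no ambiguity codes and represent one sequence each.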

(GGCTCGTATGTTGTGTGGAATTGTG) and TOPO_R (AGTCACGACGTTGTAAAACGACGG). PCR products were diluted 10 times and sequenced one way using either T7 or M13R universal primer. Sequencing was done with BigDye 3.1 sequencing Kit (Applied Biosystems, Foster City, CA, USA) on a 3730 ABI DNA analyzer (Applied Biosystems).

All sequence chromatograms were checked manually and sequence alignments established in BioEdit [41] by manual adjustments. Indel characters were coded by using the simple gap-coding method in SeqState [42] and appended to the alignment. Four data alignment matrices were generated; these are available as additional files 1, 2, 3 and 4. (1) The first (Violaceae) matrix (see Additional file 1) consisted of intron and exon sequences of Allexis batangae, Anchietea parvifolia, Corynostylis arborea, Cubelium concolor, Hybanthus enneaspermus, Rinorea ilicifolia, Viola biflora and V. congesta aligned to an outgroup consisting of GenBank exon-only sequences of non-Violaceae Populus trichocarpa, Manihot esculenta and Euphorbia esula. (2) The second (Viola) matrix (see Additional file 2) consisted of all Viola sequences aligned to Allexis batangae sequences (outgroup). (3) The third matrix (see Additional file 3) was used to examine dN/dS ratios with a sliding window approach using the DnaSP software, and consisted of only exon sequences of all the non-Viola Violaceae sequences along with all NRPD2/E2-a and NRPD2/E2-b sequences of V. congesta, V. banksii, V. maculata, V. biflora, V. brevistipulata, V. nuttallii and V. pubescens. (4) The fourth matrix (see Additional file 4) was used to estimate individual dN/dS ratios for phylogenetic branches using the PAML software; this matrix consisted of a reduced data set of the 44 Violaceae taxa having non-identical NRPD2/E2 sequences (i.e., duplicate sequences were removed) within a 54 bp exon domain in which positive selection was detected in the DnaSP analysis. Rinorea was used as outgroup.

Phylogenetic reconstruction
The Violaceae and Viola matrices were used for phylogenetic reconstruction. Owing to the large number of sequences in the Viola matrix, NRPD2/E2-a and NRPD2/E2-b were analyzed separately, using maximum parsimony (MP) and maximum likelihood (ML). MP analyses of all three data sets were performed with TNT version 1.1 [43], using traditional search, tree bisection-reconnection (TBR) branch swapping, 10 replicates (number of added sequences), and 10 trees saved per replication. Maximum Parsimony bootstrap analyses were carried out with the same settings and with 1000 replicates. Maximum Likelihood analyses of all three data sets were performed with Treefinder version of March 2008 [44] and run with different nucleotide substitution models for the three partitions of the sequence data: exons, introns and coded indels. Nucleotide substitution models for the exon and intron partitions were proposed by Treefinder, while for coded indels a uniform rate model (Jukes-Cantor) was applied (in Treefinder this substitution model had to be specified as “HKY [{1,1,1,1,1,1}, Optimum]:GI [Optimum]:4”). The Violaceae matrix (1) was analyzed using the 4-rate model J1+I+G (TA = TG; CA = CG) for both exons and introns. The Viola matrix (2) was analyzed with the 3-rate model TN (TA = TG = CA = CG) for exons and the 4-rate model J1+I+G (TA = TG; CA = CG) for introns. Maximum Likelihood bootstrap analyses were carried out with the same settings and with 100 replicates.

Detection of gene duplication in Violaceae
The obtained NRPD2/E2 phylogeny was incongruent with the organism phylogeny of Violaceae, and so we further analyzed the Violaceae matrix for possible gene recombination and/or duplication events. The GeneTree software [45] was used to construct a reconciled tree by embedding the NRPD2/E2 tree within the Violaceae species tree. The Violaceae species tree was obtained from re-analysis of Tokuoka’s [25] 4-gene data set of atpB, matK, rbcL and 16S for a taxon subset corresponding to the one used in this study. Cubelium concolor was not included in Tokuoka’s phylogeny but a matK sequence was available on GenBank (EF135550, as Hybanthus concolor). In order to firmly place Cubelium in the family phylogeny and to compensate for weaker data for this taxon, we also included in our analysis Orthion subsessile, which has been considered close to Cubelium on morphological grounds [34]. Maximum Parsimony and ML analyses were carried out as above, except for ML using a 6-rate substitution model GTR+G. To screen the Violaceae alignment for possible recombination we employed the Genetic Algorithms for Recombination Detection (GARD) [46,47], on the exons only, using general parameter settings (GTR model of nucleotide substitution and beta-gamma rate variation with 2 rate classes). We then separated the alignments at the detected breakpoints within the region, estimated MP phylogenetic trees (1000 bootstrap replicates) for the individual sections, and checked for incongruent topologies.

Detection of positive selection
Estimating dN (the number of nonsynonymous substitutions per nonsynonymous site) and dS (the number of synonymous substitutions per synonymous site) between coding sequences is useful for detecting whether there has been purifying selection (dN/dS < 1), neutral evolution (dN/dS = 1) or positive selection (dN/dS > 1).

For data matrix 3 (exons), the polymorphism and divergence module of the DnaSP package [48], using the
sliding window option, was used to make a graphic representation of the pattern of change of dN/dS ratios along the sequence. We generated three sequence categories that were each compared to Rinorea (which is sister to the other examined Violaceae taxa). The first category consisted of all taxa (except Rinorea) having singleton NRPD2/E2, i.e. Corynostylis, Cubelium and Hybanthus. The second category consisted of 13 NRPD2/E2-a exon sequences from Allexis and Viola (V. congesta, 2 of V. banksii, 3 of V. maculata, V. biflora, V. brevistipulata, 3 of V. nuttallii and V. pubescens). The third category consisted of 10 NRPD2/E2-b exon sequences from the same Viola taxa as for the second category. As DnaSP does not support gaps nor stop codons, pseudogenes (except for Allexis NRPD2/E2-a, because of its phylogenetic position) and incomplete sequences (e.g., Anchietea NRPD2/E2 and Allexis NRPD2/E2-b) had to be omitted. Within Viola, taxa of the sections Melanium and Viola were omitted because they do not possess functional copies of both NRPD2/E2-a and NRPD2/E2-b.

For the 54 bp region for which positive selection was detected with DnaSP, we used the CodeML software of the PAML package [49] to further determine at which branches in the phylogeny positive selection had occurred. A simplified organism phylogeny was used as input tree file due to the short length (54 bp) of the sequence analyzed (Figure 5).

Additional file 1: Violaceae matrix. This file represents the NRPD2/E2 alignment from 8 Violaceae taxa aligned to exon sequences of a non-Violaceae outgroup.
[ http://www.biomedcentral.com/content/supplementary/1471-2148-10-45-S1.TXT ]
Additional file 2: Viola matrix. This file represents the alignment of NRPD2/E2 copies from 18 Viola taxa aligned to Allexis batangae as outgroup.
[ http://www.biomedcentral.com/content/supplementary/1471-2148-10-45-S2.TXT ]
Additional file 3: DnaSP Positive selection sliding window matrix. This file represents the alignment of 27 NRPD2/E2 exon sequences from Violaceae taxa used to examine dN/dS ratios with a sliding window approach using the DnaSP software.
[ http://www.biomedcentral.com/content/supplementary/1471-2148-10-45-S3.TXT ]
Additional file 4: PAML positive selection matrix. This file represents the 54 basepair alignment of NRPD2/E2 exon sequences for 44 Violaceae taxa.
[ http://www.biomedcentral.com/content/supplementary/1471-2148-10-45-S4.TXT ]

Acknowledgements
We thank Kim Blaxland and Gerd Knoche for providing plant material, and two anonymous reviewers for reading and commenting on a previous version of this manuscript. Nicola Barson is thanked for correcting the language. This work was supported by the Norwegian Research Council (grant no. 170832: “Allopolyploid evolution in plants: patterns and processes within the genus Viola”) and the Swedish Research Council (grant no. 2006-3766).

Author details
1Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biology, University of Oslo, 0316 Oslo, Norway. 2Department of Plant and Environmental Sciences, University of Gothenburg, SE-40530 Göteborg, Sweden.

Authors’ contributions
TM designed the study, collected material, contributed the molecular studies, performed the phylogenetic and selection analysis and led the writing of the manuscript. AS contributed to the molecular studies. BO and KSJ participated in the coordination and design of the study, interpretation of the results and writing of the manuscript. All authors read and approved the final manuscript.

Received: 16 May 2009 Accepted: 16 February 2010 Published: 16 February 2010

References
1. Ream TS, Haag JR, Wierzbicki AT, Nicora CD, Norbeck A, Zhu JK, Hagen G, Guilfoyle TJ, Paša-Tolić L, Pikaard CS: Subunit compositions of the RNA-silencing enzymes Pol IV and Pol V reveal their origins as specialized forms of RNA Polymerase II. Mol Cell 2009, 33:192-203.
2. Cramer P, Bushnell DA, Kornberg RD: Structural basis of transcription: RNA polymerase II at 2.8 Ångström resolution. Science 2001, 292(5523):1863-1876.
3. Luo J, Hall BD: A multistep process gave rise to RNA polymerase IV of land plants. J Mol Evol 2007, 64(1):101-112.
4. Onodera Y, Haag J, Ream T, Nunes P, Pontes O, Pikaard C: Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 2005, 120(5):613-622.
5. Wierzbicki AT, Haag JR, Pikaard CS: Noncoding transcription by RNA Polymerase Pol IVb/Pol V mediates transcriptional silencing of overlapping and adjacent genes. Cell 2008, 135(4):635-648.
6. Oxelman B, Yoshikawa N, McConaughty BL, Luo J, Denton AL, Hall BD: RPB2 gene phylogeny in flowering plants, with particular emphasis on asterids. Mol Phylogenet Evol 2004, 32(2):462-479.
7. Jaillon O, Aury J-M, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, et al: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 2007, 449(7161):463-468.
8. Vilatersana R, Brysting AK, Brochmann C: Molecular evidence for hybrid origins of the invasive polyploids Carthamus creticus and C. turkestanicus (Cardueae, Asteraceae). Mol Phylogenet Evol 2007, 44(2):610-621.
9. The Arabidopsis Genome Initiative: Analysis of the genome sequence of the Arabidopsis thaliana. Nature 2000, 408:796-815.
10. Popp M, Oxelman B: Evolution of a RNA polymerase gene family in Silene (Caryophyllaceae) - incomplete concerted evolution and topological congruence among paralogues. Syst Biol 2004, 53(6):914-932.
11. Ralph S, Oddy C, Cooper D, Yueh H, Jancsik S, Kolosova N, Philippe RN, Aeschliman D, White R, Huber D, et al: Genomics of hybrid poplar (Populus trichocarpa × deltoides) interacting with forest tent caterpillars (Malacosoma disstria): normalized and full-length cDNA libraries, expressed sequence tags, and a cDNA microarray for the study of insect-induced defences in poplar. Mol Ecol 2006, 15(5):1275-1297.
12. Kong H, Landherr LL, Frohlich MW, Leebens-Mack J, Ma H, dePamphilis CW: Patterns of gene duplication in the plant SKP1 gene family in angiosperms: evidence for multiple mechanisms of rapid gene birth. Plant J 2007, 50(5):873-885.
13. Town CD, Cheunga F, Maitia R, Crabtree J, Haasa BJ, Wortmana JR, Hinea EE, Althoffa R, Arbogasta TS, Tallona LJ, et al: Comparative genomics of Brassica oleracea and Arabidopsis thaliana reveal gene loss, fragmentation, and dispersal after polyploidy. Plant Cell 2006, 18:1348-1359.
14. Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al: The Genome of Black
Cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313(5793):1596-1604.
15. Yang X, Tuskan GA, Cheng Z-M: Divergence of the Dof gene families in poplar, Arabidopsis, and rice suggests multiple modes of gene evolution after duplication. Plant Physiol 2006, 142:820-830.
16. Force A, Lynch M, Pickett FB, Amores A, Yan Y-L, Postlethwait J: Preservation of duplicate genes by complementary, degenerative mutations. Genetics 1999, 151:1531-1545.
17. Hughes AL: The evolution of functionally novel proteins after gene duplication. Proc Biol Sci 1994, 256(1346):119-124.
18. Hittinger C, Carroll S: Gene duplication and the adaptive evolution of a classic genetic switch. Nature 2007, 449(7163):677-681.
19. He X, Zhang J: Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 2005, 169:1157-1164.
20. Drea SC, Lao NT, Wolfe KH, Kavanagh TA: Gene duplication, exon gain and neofunctionalization of OEP16-related genes in land plants. Plant J 2006, 46(5):723-735.
21. Akhunov ED, Akhunova AR, Dvorak J: Mechanisms and rates of birth and death of dispersed duplicated genes during the evolution of a multigene family in diploid and tetraploid wheats. Mol Biol Evol 2007, 24(2):539-550.
22. Wang R, Chong K, Wang T: Divergence in spatial expression patterns and in response to stimuli of tandem-repeat paralogues encoding a novel class of proline-rich proteins in Oryza sativa. J Exper Biol 2006, 57(11):2887-2897.
23. Federico ML, Iñiguez-Luy FL, Skadsen RW, Kaeppler HF: Spatial and temporal divergence of expression in duplicated barley germin-like protein-encoding genes. Genetics 2006, 174(1):179-190.
24. Munzinger J, Ballard HE: Hekkingia (Violaceae), a new genus of arborescent violet from French Guiana, with a key to genera in the family. Syst Bot 2003, 28(2):345-351.
25. Tokuoka T: Molecular phylogenetic analysis of Violaceae (Malpighiales) based on plastid and nuclear DNA sequences. J Plant Res 2008, 121:253-260.
26. Ballard HE, Sytsma KJ, Kowal RR: Shrinking the violets: Phylogenetic relationships of infrageneric groups in Viola (Violaceae) based on internal transcribed spacer DNA sequences. Syst Bot 1998, 23(4):439-458.
27. Miyaji Y: Untersuchungen über die Chromosomezahlen bei einigen Viola-Arten. [In Japanese]. Bot Mag Tokyo 1913, 27:443-460.
28. Marcussen T, Nordal I: Viola suavis, a new species in the Nordic flora, with analyses of the relation to other species in the subsection Viola (Violaceae). Nord J Bot 1998, 18(2):221-237.
29. Nordal I, Jonsell B: A phylogeographic analysis of Viola rupestris: Three post-glacial immigration routes into the Nordic area?. Bot J Linn Soc 1998, 128(2):105-122.
30. Sanso AM, Seo MN: Chromosomes of some Argentine angiosperms and their taxonomic significance. Caryologia 2005, 58(2):171-177.
31. Clausen J: Cytotaxonomy and distributional ecology of western North American violets. Madroño 1964, 17:173-197.
32. Ballard HE, Sytsma KJ: Evolution and biogeography of the woody Hawaiian violets (Viola, Violaceae): Arctic origins, herbaceous ancestry and bird dispersal. Evolution 2000, 54(5):1521-1532.
33. Álvarez I, Wendel JF: Ribosomal ITS sequences and plant phylogenetic inference. Mol Phylogenet Evol 2003, 29:417-434.
34. Feng M: Floral morphogenesis and molecular systematics of the family Violaceae. PhD Athens: Ohio University 2005.
35. Yoo KO, Jang SK, Lee WT: Phylogeny of Korean Viola based on ITS sequences. Korean J Plant Taxon 2005, 35(1):7-23, (in Korean).
36. Fior S, Karis PO, Casazza G, Minuto L, Sala F: Molecular phylogeny of the Caryophyllaceae (Caryophyllales) inferred from chloroplast matK and nuclear rDNA ITS sequences. Amer J Bot 2006, 93(3):399-411.
40. Popp M, Oxelman B: Inferring the history of the polyploid Silene aegaea (Caryophyllaceae) using plastid and homoeologous nuclear DNA sequences. Mol Phylogenet Evol 2001, 20(3):474-481.
41. Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nuc Acids Symp Ser 1999, 41:95-98.
42. Müller K: SeqState - primer design and sequence statistics for phylogenetic DNA data sets. Appl Bioinform 2005, 4:65-69.
43. Goloboff PA, Farris JS, Nixon KC: TNT, a free program for phylogenetic analysis. Cladistics 2008, 24:774-786.
44. Jobb G, von Haeseler A, Strimmer K: TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol 2004, 4(18):9.
45. Page RDM: GeneTree: comparing gene and species phylogenies using reconciled trees. Bioinformatics 1998, 14(9):819-820.
46. Rieseberg LH, Baird SJE, Gardner KA: Hybridization, introgression, and linkage evolution. Plant Mol Ecol 2000, 42(1):205-224.
47. Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SD: Automated phylogenetic detection of recombination using a genetic algorithm. Mol Biol Evol 2006, 23(10):1891-1901.
48. Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R: DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 2003, 19:2496-2497.
49. Yang Z: PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol Biol Evol 2007, 24(8):1586-1591.
50. Holmgren PK, Holmgren NH: [continuously updated]. Index Herbariorum: A global directory of public herbaria and associated staff. New York Botanical Garden’s Virtual Herbarium. 1998 http://sweetgum.nybg.org/ih/.

doi:10.1186/1471-2148-10-45
Cite this article as: Marcussen et al.: Evolution of plant RNA polymerase IV/V genes: evidence of subneofunctionalization of duplicated NRPD2/NRPE2-like paralogs in Viola (Violaceae). BMC Evolutionary Biology 2010 10:45.

37.
Hof van den K, Berg van den RG, Gravendeel B: Chalcone synthase gene • Convenient online submission lineage diversification confirms allopolyploid evolutionary relationships • Thorough peer review of European rostrate violets. Mol Biol Evol 2008, 25(10):2099-2108. • No space constraints or color figure charges 38. Yang Z: Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol 1998, • Immediate publication on acceptance 15(5):568-573. • Inclusion in PubMed, CAS, Scopus and Google Scholar 39. Gabrielsen TM, Bachmann K, Jakobsen KS, Brochmann C: Glacial survival • Research which is freely available for redistribution does not matter: RAPD phylogeography of Nordic Saxifraga oppositifolia. Mol Ecol 1997, 6:831-842. Submit your manuscript at www.biomedcentral.com/submit