This is an Accepted Manuscript of an article published in The Plant Journal on 14 February 2017, available online: http://dx.doi.org/10.1111/tpj.13442

Cytogenetic features of rRNA genes across land plants: analysis of the Plant rDNA database

Sònia Garcia1, Ales Kovařík2, Andrew R. Leitch3, Teresa Garnatje1

1Institut Botànic de Barcelona (IBB‐CSIC‐ICUB), Passeig del Migdia s/n, 08038 Barcelona, Catalonia, Spain.

2Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, 612 65 Brno, Czech Republic.

3School of Biological and Chemical Sciences, Queen Mary University of London, London, UK.

Author for correspondence: Sònia Garcia. Tel: +34 932890611. Fax: +34 932890614. E‐mail: [email protected]

Emails of all other authors: [email protected], [email protected], [email protected]

Running title: Analysis of the Plant rDNA database

Keywords: ribosomal DNA, 5S, 18S‐5.8S‐26S, 35S, rDNA loci, genome evolution, chromosome, cytogenetics, land plants.

Total word count (main body, excluding summary, references and legends): 4266

Summary: 202

Introduction: 543

Results: 1407

Discussion: 1462

Experimental procedures: 743

Acknowledgements: 111

Summary

The online resource www.plantrdnadatabase.com stores information on the number, chromosomal locations and structure of the 5S and 18S‐5.8S‐26S (35S) ribosomal DNAs (rDNA) in plants. This resource was exploited to study relationships between rDNA locus number, distribution, the occurrence of linked (L‐type) and separated (S‐type) 5S and 35S rDNA units, chromosome number, genome size and ploidy level. The analyses presented summarise current knowledge on rDNA locus numbers and distribution in plants. We analysed 2,949 karyotypes, from 1,791 species and 86 plant families, and performed ancestral character state reconstructions. The ancestral karyotype (2n=16) has two terminal 35S sites and two interstitial 5S sites, while the median (2n=24) presents four terminal 35S sites and three interstitial 5S sites. Whilst 86.57% of karyotypes show S‐type organisation (the ancestral condition), the L‐type arrangement has evolved independently several times during plant evolution. A non‐terminal position of 35S rDNA was found in about 25% of single locus karyotypes, suggesting that terminal locations are not essential for functionality and expression. Single locus karyotypes are very common, even in polyploids. In this regard, polyploidy is followed by subsequent locus loss. This results in a decrease in locus number per monoploid genome, forming part of the diploidisation process that returns polyploids to a diploid‐like state over time.

Significance Statement

Ribosomal DNAs (rDNAs) are abundant in genomes and exhibit sequence conservation. Here we analyse rDNA locus numbers, arrangements and distribution across plant phylogeny and assess their impact on general genomic processes such as polyploidisation.

Introduction

Since the early 1980s, Fluorescence In Situ Hybridization (FISH) has enabled informed karyotyping with many applications in both medical and life science fields. In particular, FISH has been applied to many taxa to discriminate specific chromosomes, and to numerous hybrids and allopolyploids to identify individual genomes (Jiang and Gill, 1994, 2006). With numerous technical variants and refinements (such as fiber‐FISH, genomic ISH, mFISH and Q‐FISH), FISH continues to be useful in, for example, species or crop line characterisation. FISH studies targeting rDNA remain the most common, with more than one report a week in 2015 alone (Web‐of‐knowledge search for “rDNA and (FISH or in situ) and plant”), probably because the sequences are abundant, repeated and highly conserved (Hemleben and Zentgraf, 1994; Heslop‐Harrison and Schwarzacher, 2011).

Two types of rDNA are present in eukaryotes: the 35S (in plants; Seitz and Seitz, 1979) or 45S (in animals) rDNA encoding the 18S‐5.8S‐26S rRNA genes, and the 5S rDNA encoding 5S rRNA. Typically these genes occur as tandemly arranged, repetitive units that vary greatly in copy number between species, from a few to several hundreds or even thousands of copies. In most plants and animals the 18S‐5.8S‐26S rRNA genes are physically separated from the 5S rRNA genes (Separate or S‐type arrangement). More rarely they are linked in the same unit, the so‐called Linked or L‐type arrangement (see examples in Sone et al., 1999, in bryophytes; Garcia et al., 2009a, 2010, in angiosperms; Galián et al., 2012, in Ginkgo; Garcia and Kovařík, 2013, in other gymnosperms).

Ribosomal DNA repeat units occur at one or many rDNA loci in the genome. They are thought to evolve in concert (concerted evolution), so that they are more homogeneous in sequence within the array than would be expected by random mutation; yet, in non‐coding regions at least, the copies do differ within and between species (Elder and Turner, 1995; Eickbush and Eickbush, 2007). The mechanism(s) giving rise to concerted evolution are unclear. Potentially, copy variants arise from unit “birth‐and‐death” (Nei and Rooney, 2005), whereby new genes arise from successive duplications and are then either maintained for a long time, lost, or degenerate into pseudogenes, or via recombination‐based processes (e.g. unequal recombination). Certainly many rDNA arrays carry pseudogenes (Rooney and Ward, 2005). Many species have copy variants too: for example, there are different 5S rDNA variants in the genomes of fish (Martins and Wasko, 2004), orchids (Lan and Albert, 2011), cotton and tobacco (Cronn et al., 1996; Matyášek et al., 2002), and different 35S rDNA variants in some species of Cactaceae (Harpke and Peterson, 2006), Apocynaceae (Weitemier et al., 2015) and cycads (Wang et al., 2016).

This paper reports the distribution of 5S and 35S rDNA loci reported for plant genomes. It exploits the data contained in the Plant rDNA database, release 2 (Garcia et al., 2014), with additional data published until December 2015. The Plant rDNA database (www.plantrdnadatabase.com) is an online resource that currently presents the data from more than 600 publications which localise rDNA using FISH. A synthesis of these rDNA data is the objective of this paper. Given the amount of data assembled, the Plant rDNA database provides a unique opportunity to evaluate the biology influencing rDNA locus number and distribution, enabling a comprehensive overview of rDNA distribution in plants.

Results

The Plant rDNA database and the data from Supporting Information Table S1 together compile information from 2,949 karyotypes, including 1,791 plant species, excluding duplicates (i.e. when the same species has the same count). A detailed summary of these data is presented in Table 1 and in Supporting Information Table S5. The ploidy levels range from 1x to 20x and chromosome numbers from 4 to 180. The study includes data from 77 angiosperm families, representing 18.51% of the 416 families recognised by the APG IV (2016). For gymnosperms, data are available for 6 out of 12 families considered by Christenhusz et al. (2011). Bryophytes are represented by only two of 212 families (1.79%) according to Goffinet and Buck (2004) and ferns by three of 37 families recognised by Smith et al. (2006). Those families best represented in terms of percentage of all families are the Pinaceae (amongst gymnosperms), Poaceae (amongst monocots) and Fabaceae (amongst eudicots). Most karyotypes (66.26%) are diploid and, amongst gymnosperms, all the species analysed but two are diploid.

As for the phylogenetic reconstruction, the resulting tree topology was overall consistent with currently accepted land plant phylogeny. All the large groups (bryophytes, ferns, gymnosperms, monocots and eudicots), as well as the most important plant families, were monophyletic and highly supported (Posterior Probability values, PPs, between 0.99 and 1; Supporting Information Fig. S4).

5S and 35S rDNA locus numbers, locus distributions and chromosome numbers

The median karyotype, calculated from all data, is 2n=24 with one to two interstitial 5S rDNA loci and two terminal 35S rDNA loci. Karyotypes most commonly have more 35S rDNA than 5S rDNA loci (47.79% of karyotypes), although for 19.05% the reverse is the case, the remainder (33.16%) having the same number of loci. The percentages are similar if eudicots and monocots are considered separately, whilst in gymnosperms most species have more 35S rDNA loci than 5S rDNA loci (see Supporting Information Table S5). However, these data are strongly influenced by the prevalence of Pinaceae in the database (61.67% of gymnosperm karyotypes belong to this family).

If a karyotype displays a single 5S or 35S rDNA locus then that locus must be functional, whilst in karyotypes with multiple loci, some may be inactive. When considering only diploid karyotypes, 51.38% have a single 5S rDNA locus and 35.16% have a single 35S locus. Even when the analysis includes polyploids, which might be expected to have multiple loci, the percentage of single locus karyotypes remains high: 40.35% of karyotypes have a single 5S rDNA locus and 27.94% a single 35S rDNA locus (see Table 2 for a breakdown by taxonomic group: eudicots, monocots and gymnosperms).

In eudicots and monocots, the position of 35S rDNA on the chromosomes is typically terminal, whilst in gymnosperms the sites are mostly interstitial (Figure 1), the latter again reflecting the prevalence of Pinaceae. To account for uneven representation of taxa across the land plant lineages, we reconstructed ancestral karyotypes (i.e. the somatic chromosome number and the numbers and positions of 5S and 35S rDNA of the land plant ancestor), an approach that considers phylogenetic relationships amongst taxa. The reconstructed ancestral land plant karyotype has one interstitial 5S locus and one terminal 35S locus, with 2n=16. The ancestral position of 35S rDNA is reconstructed as terminal, and it is the most common position across the tree (Figure 2). Moreover, 76.21% of plants with a single 35S rDNA locus have that locus at a terminal location. In contrast, when there are multiple 35S loci, the proportion of terminal sites drops to 51.51%.

In contrast to 35S rDNA, there is a more variable distribution of 5S rDNA sites across land plants, with interstitial sites being the most common overall (34.99%), although in gymnosperms (dominated by Pinaceae) a terminal position is most abundant (38.71% of gymnosperm karyotypes).

Many karyotypes (32.77%) have at least one chromosome with both 5S and 35S sites. Of these, 72.61% have both sites on the same chromosome arm and, usually, the 35S site is distal to the 5S rDNA site. The numbers of 5S and 35S rDNA loci are positively correlated, both in the phylogenetically based generalised least squares (PGLS) test (p<0.005), which considers phylogenetic relationships across taxa, and in the non‐PGLS test (rho=0.338, p<0.0001), which includes the whole dataset.

Finally, where there are multiple data on rDNA number and distribution for the same species (by the same or different research groups), 50.9% (113 karyotypes) had the same results, whilst the remainder showed different results. A summary of the cases of variation or inconsistency of results with multiple sampling at the species level is presented in Table 3.

L‐type and S‐type organisation of rDNA

There are many fewer species recorded with an L‐type organisation of rDNA (4.21% of karyotypes) compared with an S‐type organisation (86.57% of karyotypes), which is typical for both angiosperms and gymnosperms. The latter organisation is reconstructed to be ancestral for land plants. From FISH images alone (overlapping 5S and 35S rDNA signals), only a few species (9.22%) appear to have the L‐type arrangement somewhere in the karyotype. However, it is difficult to be confident that the rDNA sequences are indeed linked; to be certain, molecular evidence (i.e. sequencing data confirming the proximity and joint organisation of both genes) would be needed. With molecular evidence, there are no records of L‐type organisation in monocots, and only 5.52% of eudicots show this arrangement. Amongst gymnosperms there is a higher proportion of L‐type organisation (11.83%), but again, the data are dominated by Pinaceae. Species with an L‐type arrangement of rDNA have terminal locations of rDNA in most cases (82.14%). Species with L‐type and S‐type arrangements of 35S and 5S rRNA genes do not have significantly different numbers of rDNA loci.

Genome size, rDNA, ploidy level and life cycle

There is no correlation between the number of 5S or 35S rDNA loci and chromosome number (2n), but there is a significant positive relationship with ploidy level (for 5S, rho=0.496, p<0.0001; for 35S, rho=0.331, p<0.0001; see Figure 3). There is a significant but small positive correlation between genome size and the number of 5S rDNA signals (rho=0.200, p<0.0001) and 35S rDNA signals (rho=0.202, p<0.0001), which remains when phylogenetic relationships amongst taxa are considered using PGLS (5S, p=0.005; 35S, p=0.02).

Over the entire dataset, there is a small but significant reduction in the number of 35S rDNA loci per monoploid genome (number of loci/ploidy level) with increasing ploidy level (p<0.0001, rho=‐0.237; see Figure 4). This trend remains apparent when analysing monocots (p<0.0001, rho=‐0.287) and eudicots (p<0.0001, rho=‐0.177) separately. Similarly, for 5S rDNA there is a small but significant reduction in the number of 5S sites per monoploid genome with increasing ploidy level (p<0.0001, rho=‐0.120), although that reduction is less pronounced than for 35S rDNA (Figure 4). Table 4 lists genera with ploidy level variation in the database. Locus number reduction with increasing ploidy level is observed in all these genera and for both 5S and 35S loci. In those genera with the largest range in ploidy levels (e.g. Saccharum and Fragaria, Table 4), the loss of 35S and 5S rDNA loci per monoploid genome is more pronounced (although in Fragaria, as in some other genera presented in Table 4, that trend is not significant, probably due to small sample sizes). Within a genus, in most groups of angiosperms and in the gymnosperms analysed, 5S rDNA locus number is more constant than 35S rDNA locus number. Of the species with a linked arrangement of 5S and 35S, only the genus Artemisia had taxa at several ploidy levels. In this genus, there is a significant locus reduction per monoploid genome with increasing ploidy level (R2=0.98, p=0.009).
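
The quantity analysed above, the number of loci per monoploid genome (number of loci/ploidy level), and its rank correlation with ploidy level can be illustrated with a minimal sketch. It is not the analysis code used for this study (which was run in R); the function names and the six example karyotypes are hypothetical, and the Spearman coefficient is computed here simply as the Pearson correlation of average ranks.

```python
from statistics import mean


def loci_per_monoploid_genome(n_loci, ploidy):
    """Number of rDNA loci divided by the ploidy level (x)."""
    return n_loci / ploidy


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation (assumes neither variable is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical karyotypes: (ploidy level, number of 35S rDNA loci)
karyotypes = [(2, 2), (2, 1), (4, 3), (4, 2), (6, 3), (8, 3)]
ploidy = [p for p, _ in karyotypes]
per_genome = [loci_per_monoploid_genome(n, p) for p, n in karyotypes]
rho = spearman_rho(ploidy, per_genome)
print(f"rho = {rho:.3f}")  # negative when loci per monoploid genome fall with ploidy
```

In this toy example the total number of loci rises with ploidy level while the number per monoploid genome falls, which is the pattern described in the text.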

When phylogenetic relationships amongst taxa are considered there is a significant reduction in the number of 5S (PGLS, p=0.005) and 35S (PGLS, p<0.0001) rDNA loci per chromosome with increasing chromosome number. The parallel test with ploidy level could not be performed as all taxa included in the phylogenetic reconstruction were diploid.

An analysis of the relationship between the abundance of rDNA loci in karyotypes and the position of these loci in chromosomes showed that a “mixed” (i.e. terminal, interstitial or centromeric positions are found for a given rDNA) organisation was associated with large numbers of loci (Kruskal‐Wallis test, p<0.0001 for both rDNAs in the whole dataset and considering eudicots, monocots and gymnosperms separately).

We tested the relationship between number of 5S and 35S rDNA loci and life cycle, but no significant differences were found (Kruskal‐Wallis test p=0.202 for 35S and p=0.09 for 5S in the whole dataset).
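
For readers unfamiliar with the Kruskal‐Wallis test used here, the following minimal sketch shows how its H statistic is obtained from pooled ranks, including the usual correction for ties. The function name and the grouped locus counts are hypothetical, and the sketch stops at the statistic; the p‐value would come from a chi‐squared distribution with (number of groups − 1) degrees of freedom.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}   # value -> average rank in the pooled sample
    tie_term = 0   # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        t = j - i + 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        tie_term += t ** 3 - t
        i = j + 1
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    weighted = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return h / correction


# Hypothetical numbers of 35S loci in three life-cycle classes
locus_counts = [[1, 2, 2, 3], [1, 1, 2, 4], [2, 2, 3, 3]]
h_stat = kruskal_wallis_h(locus_counts)
print(f"H = {h_stat:.3f} with {len(locus_counts) - 1} degrees of freedom")
```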

Discussion

Plant family representation in the database

Between the first and second release of the Plant rDNA Database, we increased species number by more than 50% and nearly doubled the number of karyotypes included (Garcia et al., 2014). The data compiled for the present study increase the dataset by a further 16.48% (excluding duplicates) and include data from 83 new publications. All these data will be included in a third release of the Plant rDNA Database (Garcia et al., in preparation). The most extensively represented families are those that include crops, namely Poaceae (19.29%), Fabaceae (11.97%) and Asteraceae (11.83%), followed closely by Brassicaceae and Solanaceae. Of the many families that are missing from this analysis, perhaps the most surprising absence is that of early diverging angiosperms (the most basal angiosperms or ANITA grade, made up of Amborella, Nymphaeales and Austrobaileyales). There are also too few data for mosses and ferns, and none for algae. However, a very recent study (Rosato et al., 2016), published after the compilation analysed in this paper, provides information on rDNA organisation, locus and copy number for a wide range of species from the three earliest land plant lineages: Marchantiophyta (liverworts), Bryophyta (mosses) and Anthocerotophyta (hornworts).

The distribution, evolution and activity of rDNA units

The results show that (1) the typical plant karyotype has more 35S rDNA loci than 5S rDNA loci and (2) 35S rDNA loci are mostly terminal on the chromosome, particularly in species with an L‐type arrangement of rDNA and in species with a single 35S rDNA locus. In addition, the ancestral location of 35S rDNA is reconstructed as being terminal (Figure 2). We also show that (3) 5S rDNA loci are predominantly interstitial or centromeric, a trend that is most apparent in species with a single locus. This location is also reconstructed as ancestral to seed plants (Figure 2). However, terminal 35S rDNA or interstitial/centromeric 5S rDNA positions are not essential for functionality, since many species occur with a single locus in other chromosomal locations. Results also show that (4) a mixed position of 5S and 35S rDNAs in a given karyotype is associated with high numbers of loci in the karyotype. This could be caused by the activity of mobile genetic elements acting to disperse rDNA around the genome; in this regard, fragments of ribosomal RNA genes have been found in some transposable elements (Kapitonov and Jurka, 2003).

A tendency to be closer to the chromosome ends had already been noted for 35S rDNA genes or nucleolar organiser regions (NORs) by Lima de Faria (1976) in his chromosome field hypothesis. It has also been suggested that terminal positions of 35S rDNA loci might facilitate higher frequencies of interlocus homogenisation than are found in interstitial or pericentromeric loci (Skalicka et al., 2003; Cronn et al., 1996; Fulneček et al., 2002; Lan and Albert, 2011; Fukushima et al., 2011). If so, a predominant terminal location of 35S rDNA may have implications for the typical rates of evolution of rDNA arrays in plants, and selection may act favourably on individuals with homogenised arrays (Eickbush and Eickbush, 2007). Potentially, high transcriptional activity of 35S rDNA loci can turn the locus into a fragile site on the chromosome (Kobayashi, 2011), and perhaps terminal 35S rDNA fragile sites are less deleterious than interstitial ones. Furthermore, centromeric domains are frequently heterochromatic (e.g. Arabidopsis thaliana, Pontes et al., 2006). It is possible that heterochromatinisation processes associated with centromeres negatively impact gene expression in these regions in some species. In this context, the 5S locus might be more prone to heterochromatinisation than the 35S locus due to its quite frequent centromeric position. In Arabidopsis, 5S loci harbour heterochromatic epigenetic marks, including those of constitutive heterochromatin (Vaillant et al., 2007).

Relationships between 5S and 35S rDNA loci and genome size, and intraspecific variation of 5S and 35S rDNA loci

Almost one third of the karyotypes (32.77%) display at least one chromosome carrying both rDNAs, and in 72.61% of cases they are found on the same chromosome arm. Examples of this include several species of Brassicaceae (Ali et al., 2005; Hasterok et al., 2006), Fabaceae (Robledo and Seijo, 2010), Linum (Muravenko et al., 2004, 2009), Coffea (Hamon et al., 2009), Festuca (Harper et al., 2004; Loureiro et al., 2007) and Avena (Badaeva et al., 2011). The finding that 5S and 35S rDNA occur on the same chromosome arm more often than expected under a random distribution is consistent with the findings of Roa and Guerra (2015) that the association arose multiple times in evolution, although it was rarely conserved across a genus.

The numbers of 5S and 35S loci are positively correlated and a high proportion of species have the same number of 5S and 35S sites (33.16%). It has been proposed that concerted evolution of copy numbers of 5S and 45S genes in human and mouse genomes maintains stoichiometric ratios of 5S and 45S gene transcripts (Gibbons et al., 2015). It is easy to envisage that a stoichiometric balance of transcripts can be readily maintained with a linked arrangement (L‐type) of 5S and 35S units, since array homogenisation likely acts to maintain functionality and copy numbers of both unit types (Garcia et al., 2012). However, an L‐type arrangement of rRNA genes has only been shown at the sequence level in a few genera of the family Asteraceae in eudicots (Garcia et al., 2009a, 2010) and in several groups of gymnosperms (Galián et al., 2012; Garcia and Kovařík, 2013). No monocots with an L‐type arrangement have been demonstrated, as far as we are aware. Previously, the L‐type organisation has been reported in a liverwort and a moss (Sone et al., 1999), in yeast (Rubin and Sulston, 1973) and bacteria (Lafontaine and Tollervey, 2001). This occurrence has led to the hypothesis that L‐type organisation was the ancestral condition (Wicke et al., 2011). However, the distribution of L‐type taxa is scattered across land plant phylogenies, as also demonstrated for fungi (Bergeron and Drouin, 2008), protists (Drouin and Tsang, 2012), fishes (Rebordinos et al., 2013), arthropods (Drouin et al., 1992) and metazoans as a whole (Vierna et al., 2013). Even within the family Asteraceae, where the L‐type is relatively common, it has evolved on multiple occasions (Garcia et al., 2010). The overwhelming prevalence of the S‐type arrangement of rDNA in seed plants may be related to the fact that each rDNA unit type is transcribed by a different polymerase (RNA polymerase I for 35S and RNA polymerase III for 5S) in a different nuclear compartment, i.e. inside or outside the nucleolus (reviewed in Layat et al., 2012).

The number of 5S and 35S rDNA loci is positively correlated with genome size. Potentially the relationship between genome size and locus number reflects the accumulation of repeats, including rDNA, in species with large genome sizes and the multiplication of locus number associated with polyploidy (see also below). However, there are several exceptions to this trend, including at both extremes of plant genome size range. For example, in Brassicaceae, species with small genomes (2C≈ 0.5 pg) can have up to eight 35S loci while certain Liliaceae species with large genomes (2C≈ 125 pg) have only two 35S loci.

The dataset analysed here includes more than 200 species that have been assessed at least twice, by the same or different authors. We have found that about one half of these show different distributions of rDNAs (see Table 3), with the number of 5S rDNA sites remaining more stable within a species than the number of 35S rDNA sites, the latter sometimes being highly variable. Interindividual variability of rDNA sites could potentially be exploited as population genetics markers (Olanj et al., 2015).

Reduction of rDNA loci in polyploids

rDNA locus number tends to increase with ploidy level (Figure 3), reflecting the multiplication of the whole genome. However, there is a reduction in the number of rDNA loci per monoploid genome following polyploidy across monocots, eudicots and the entire database (Figure 4 and Table 4). This reduction in the number of rDNA sites per monoploid genome remains significant, despite some exceptions and the huge variety of species with dramatically different chromosome numbers and genome sizes. The phenomenon has been reported previously for individual genera, e.g. Artemisia (Garcia et al., 2009b) and Nicotiana (Leitch et al., 2008; Renny‐Byfield et al., 2012). Given that whole genome duplication (WGD) or polyploidy is a driving force in angiosperm evolution, we might expect that there would be high numbers of rDNA loci, reflecting the numerous palaeopolyploidy events in the evolutionary history of most angiosperm lineages (Renny‐Byfield and Wendel, 2014; Soltis et al., 2009). But (1) the high number of single locus karyotypes (including those from chromosomally polyploid species) and (2) the reduction in the number of loci per monoploid genome with increasing ploidy level (Figure 4) provide evidence of diploidising processes occurring in polyploids over millions of years, reflecting the “wondrous cycles of polyploidy in plants” (Wendel, 2015).

Experimental procedures

Data collection

We used data on the number and chromosomal location of 5S and 35S rDNA loci available in the second release of the Plant rDNA database (Garcia et al., 2014), with additional data listed in Supporting Information Table S1. For the statistical analyses, chromosomal positions of rDNA loci were reduced to four categories: “centromeric” (including pericentromeric), “interstitial”, “terminal” (including subterminal) and “mixed” (when at least two of these categories were found in the same karyotype). The L‐type organisation of a given taxon was only considered if physical linkage of both rDNAs had also been demonstrated at the molecular level. Other criteria used for compiling the database are explained in detail in Garcia et al. (2012).
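
The reduction of chromosomal positions to four categories can be sketched as follows. This is an illustration only, not the procedure actually used to build the dataset: the mapping table, the function name and the input format (one position label per rDNA site of a given rDNA type in one karyotype) are assumptions.

```python
# Reported position label -> analysis category (labels are assumed spellings)
CATEGORY = {
    "centromeric": "centromeric",
    "pericentromeric": "centromeric",
    "interstitial": "interstitial",
    "terminal": "terminal",
    "subterminal": "terminal",
}


def karyotype_position(site_positions):
    """Collapse the positions of one rDNA type in one karyotype to a category."""
    categories = {CATEGORY[p.strip().lower()] for p in site_positions}
    if not categories:
        raise ValueError("no positions reported")
    if len(categories) > 1:
        return "mixed"  # at least two different categories in the karyotype
    return next(iter(categories))


print(karyotype_position(["terminal", "subterminal"]))   # terminal
print(karyotype_position(["pericentromeric"]))           # centromeric
print(karyotype_position(["terminal", "interstitial"]))  # mixed
```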

Statistical analyses and ancestral state reconstruction

We performed statistical tests that considered the phylogenetic relationships between taxa, and reconstructed ancestral conditions. However, the restricted sequence data available for tree construction resulted in a substantial reduction in the number of taxa that could be analysed (from 2,949 to 259 karyotypes, i.e. less than 10% of the dataset available). Thus we also statistically analysed the dataset as a whole, without using ancestral reconstruction approaches.

The Taxonomic Name Resolution Service v. 4.0 (http://tnrs.iplantcollaborative.org/) was used for correcting and standardizing plant names, in order to avoid ambiguous, superfluous or incorrect names resulting in mismatched or unwitting duplication of records (Boyle et al., 2013). All data manipulations and statistical analyses were performed with RStudio v. 0.98.1078, a user interface for R (www.rstudio.com). Duplicates were removed from the dataset prior to all analyses. If a species had rDNA locus numbers that differed between or within publications, we treated each difference as a separate record. Regression analyses, one‐way ANOVA and the Shapiro–Wilk test for normality were performed. Since in most cases datasets were not normally distributed, we performed non‐parametric tests such as the Spearman rank correlation, the Kruskal‐Wallis test by ranks and the multiple comparison test after Kruskal‐Wallis (using the pgirmess package for R). In addition, to analyse variation in chromosome number, genome size (2C), and number and position of rDNA sites in a phylogenetic context, the phylogenetically based generalised least squares (PGLS) algorithm, as implemented in the nlme package for R (version 3.1‐118), was used as in Pinheiro et al. (2015). Phylogenetic analysis of variance was used to test the difference in the number of rDNA sites with regard to chromosome position using the function phylANOVA in the phytools package for R (Revell, 2012). The packages ape and geiger were also required for the phylogenetic‐statistical analyses.
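
The rule for handling duplicates described above can be illustrated with a short sketch. It is not the R code used for this study: the record fields (`species`, `2n`, `n_5S`, `n_35S`), the choice of these fields as the key defining a duplicate, and the example records are all assumptions made for illustration.

```python
def remove_duplicates(records):
    """Keep one record per distinct (species, 2n, 5S loci, 35S loci) combination.

    Records of the same species that differ in any of these counts are kept
    as separate records (assumed key; see the lead-in).
    """
    seen = set()
    unique = []
    for rec in records:
        key = (rec["species"], rec["2n"], rec["n_5S"], rec["n_35S"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical records compiled from different publications
records = [
    {"species": "Species A", "2n": 18, "n_5S": 1, "n_35S": 2},
    {"species": "Species A", "2n": 18, "n_5S": 1, "n_35S": 2},  # duplicate
    {"species": "Species A", "2n": 18, "n_5S": 1, "n_35S": 3},  # differs: kept
    {"species": "Species B", "2n": 24, "n_5S": 2, "n_35S": 2},
]
unique = remove_duplicates(records)
print(len(unique))  # 3
```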

In order to perform the phylogenetic‐statistical analyses, a phylogenetic tree was constructed with sequences of the matK (1066 bp), atpF‐atpH (297 bp) and rbcL (1240 bp) chloroplast regions (total: 2603 bp), downloaded from GenBank and listed in Supporting Information Table S2. The resulting tree included 259 plant genera, of which two were bryophytes (0.8%), three were ferns (1.16%), 14 were gymnosperms (5.41%), 56 were monocots (21.62%) and 184 were eudicots (71.04%), i.e. angiosperms made up 92.66% of the genera. All taxa included in these analyses were chromosomally diploid. In ancestral character reconstructions, one species per genus was used in most cases (although sometimes sequences from different species had to be used to represent a genus), and the analysis was conducted using modal values for chromosome numbers (2n) and for 5S and 35S rDNA numbers and positions.

The three sequence matrices obtained with the three molecular markers were manually edited and concatenated (Supporting Information Data S3) with Mesquite v. 3.02 (Maddison and Maddison, 2015). The phylogenetic analyses were performed in the CIPRES Science Gateway (https://www.phylo.org/). Bayesian Inference phylogenetic analysis was performed in MrBayes v. 3.2.6 (Ronquist and Huelsenbeck, 2003) using the GTR+I+G model previously determined with jModelTest v. 2.1.6 (Darriba et al., 2012) under the Akaike information criterion (Akaike, 1979). The Markov chain Monte Carlo (MCMC) sampling approach was used to calculate posterior probabilities (PPs). Four consecutive MCMC computations were run for 100,000,000 generations, with tree sampling every 10,000 generations. The first 25% of tree samples were discarded as the burn‐in period. PPs were estimated through the construction of a 50% majority rule consensus tree.
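
As a worked illustration of the sampling scheme just described, the sketch below counts how many tree samples one run yields and how many remain after burn‐in. The function is hypothetical, and two details are assumptions rather than statements about MrBayes: whether the starting tree at generation 0 is counted as a sample, and rounding the number of discarded samples down.

```python
def retained_samples(generations, sample_every, burnin_fraction,
                     include_start=True):
    """Return (sampled, discarded, retained) tree counts for one MCMC run."""
    sampled = generations // sample_every
    if include_start:
        sampled += 1  # assumption: the tree at generation 0 is also stored
    discarded = int(sampled * burnin_fraction)  # assumption: rounded down
    return sampled, discarded, sampled - discarded


# Settings reported in the text: 100,000,000 generations, sampling every
# 10,000 generations, first 25% of samples discarded as burn-in.
sampled, discarded, kept = retained_samples(100_000_000, 10_000, 0.25)
print(sampled, discarded, kept)  # 10001 2500 7501
```

Under these assumptions each run contributes roughly 7,500 trees to the majority rule consensus.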

The ancestral character reconstructions were performed with Mesquite v. 3.02. The 50% majority rule consensus tree resulting from the Bayesian Inference analysis was used as the input tree file. The number and position of rDNA sites and the 2n chromosome numbers were transformed into categorical data (discrete and ordered). Ancestral states were then inferred with the “trace character history” function using maximum likelihood under the Mk1 model, in which all changes are equally probable, following the same approach as Vaio et al. (2013).
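
To make concrete what "all changes are equally probable" means, the sketch below gives the transition probabilities of a k‐state Markov model with a single rate, which is the general form of an Mk1‐type model. It is an illustration under assumptions, not Mesquite's implementation: here `rate` is taken to be the instantaneous rate of change between any two distinct states, which may differ from the parameterisation Mesquite reports, and the function name and example values are hypothetical.

```python
import math


def mk1_transition_probability(k, rate, t, same_state):
    """Probability of ending in the same or in one given different state
    after branch length t, for k states and one shared rate."""
    decay = math.exp(-k * rate * t)
    if same_state:
        return 1.0 / k + (k - 1.0) / k * decay
    return 1.0 / k - decay / k


# Hypothetical character with four states (e.g. four position categories)
k, rate, t = 4, 0.1, 2.0
p_same = mk1_transition_probability(k, rate, t, True)
p_diff = mk1_transition_probability(k, rate, t, False)
# Each row of the transition matrix sums to one
print(round(p_same + (k - 1) * p_diff, 12))  # 1.0
```

For very long branches both probabilities approach 1/k, so the tip states carry little information about the ancestor.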

Acknowledgments

This work was supported by the Dirección General de Investigación Científica y Técnica, government of Spain (CGL2013‐49097‐C2‐2‐P), the Generalitat de Catalunya, government of Catalonia ("Ajuts a grups de recerca consolidats", 2014SGR514), the IRBio (“Institut de Recerca en Biodiversitat”), the Czech Science Foundation (P506‐16‐02149J) and NERC (UK). SG benefits from a “Ramón y Cajal” contract from the government of Spain. Professors Alfredo Ruiz (UAB) and Miguel Ángel Canela (IESE) are acknowledged for their advice on statistics. Daniel Vitales (IBB‐CSIC) and Maité Guignard (Queen Mary University of London) are thanked for their advice on phylogenetic inference and Ugo d’Ambrosio (IBB‐CSIC) for compiling additional data published until December 2015 for the Plant rDNA database.

Short legends for Supporting Information

Table S1 Data from July 2013 to December 2015 added to the 2nd release of the Plant rDNA Database (June 2013) for the analyses.

Table S2 List of matK, atpF‐atpH and rbcL sequences with accession numbers from GenBank used for phylogenetic tree construction for ancestral state reconstruction of 2n and of the positions of 5S and 35S rDNA loci in chromosomes.

Data S3 Concatenated sequence matrix (matK + atpF‐atpH + rbcL) used for phylogenetic tree construction for ancestral state reconstruction of 2n and of the positions of 5S and 35S rDNA loci in chromosomes.

Figure S4 Phylogenetic tree used for the statistical analyses and ancestral state reconstruction. Numbers above branches are PPs.

Table S5 Additional information on the distribution of 5S and 35S rDNA loci contained in the Plant rDNA database (second release) and in Supporting Information Table S1.

References

Akaike, H. (1979) A Bayesian extension of the minimum AIC procedure of autoregressive model fitting. Biometrika, 66, 237‐242.

Ali, H.B.M., Lysak, M.A. and Schubert, I. (2005) Chromosomal localization of rDNA in the Brassicaceae. Genome, 48, 341‐346.

Angiosperm Phylogeny Group, The. (2016) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot. J. Linn. Soc., 181, 1‐20.

Badaeva, E.D., Shelukhina, O.Y., Dedkova, S., Loskutov, I.G. and Pukhalskyi, V.A. (2011) Comparative cytogenetic analysis of hexaploid Avena L. species. Russ. J. Genet., 47, 691‐702.

Bergeron, J. and Drouin, G. (2008) The evolution of 5S ribosomal RNA genes linked to the rDNA units of fungal species. Curr. Genet., 54, 123‐131.

Christenhusz, M., Reveal, J., Farjon, A., Gardner, M.F., Mill, R.R. and Chase, M.W. (2011) A new classification and linear sequence of extant gymnosperms. Phytotaxa, 19, 55‐70.

Cronn, R.C., Zhao, X., Paterson, A. and Wendel, J.F. (1996) Polymorphism and concerted evolution in a tandemly repeated gene family: 5S ribosomal DNA in diploid and allopolyploid cottons. J. Mol. Evol., 42, 685‐705.

Darriba, D., Taboada, G.L., Doallo, R. and Posada, D. (2012) jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods, 9, 772‐772.

Drouin, G. and Tsang, C. (2012) 5S rRNA gene arrangements in protists: a case of nonadaptive evolution. J. Mol. Evol., 74, 342‐351.


Drouin, G., Sévigny, J.M., McLaren, I.A., Hofman, J.D. and Doolittle, W.F. (1992) Variable arrangement of 5S ribosomal genes within the ribosomal DNA repeats of arthropods. Mol. Biol. Evol., 9, 826‐835.

Eickbush, T.H. and Eickbush, D.G. (2007) Finely orchestrated movements: evolution of the ribosomal RNA genes. Genetics, 175, 477‐485.

Elder, J.F. Jr. and Turner, B.J. (1995) Concerted evolution of repetitive DNA sequences in eukaryotes. Q. Rev. Biol., 70, 297‐320.

Fukushima, K., Imamura, K., Nagano, K. and Hoshi, Y. (2011) Contrasting patterns of the 5S and 45S rDNA evolutions in the Byblis liniflora complex (Byblidaceae). J. Plant Res., 124, 231‐244.

Fulneček, J., Matyášek, R. and Kovařík, A. (2002) Distribution of 5‐methylcytosine residues in 5S rRNA genes in Arabidopsis thaliana and Secale cereale. Mol. Genet. Genom., 268, 510‐517.

Galián, J.A., Rosato, M. and Rosselló, J.A. (2012) Early evolutionary colocalization of the nuclear ribosomal 5S and 45S gene families in seed plants: evidence from the living fossil gymnosperm Ginkgo biloba. Heredity, 108, 640‐646.

Garcia, S., Lim, K.Y., Chester, M., Garnatje, T., Pellicer, J., Vallès, J., Leitch, A.R. and Kovařík, A. (2009a) Linkage of 35S and 5S rRNA genes in Artemisia (family Asteraceae): first evidence from angiosperms. Chromosoma, 118, 85‐97.

Garcia, S., Garnatje, T., Pellicer, J., McArthur, E.D., Siljak‐Yakovlev, S. and Vallès, J. (2009b) Ribosomal DNA, heterochromatin, and correlation with genome size in diploid and polyploid North American endemic sagebrushes (Artemisia, Asteraceae). Genome, 52, 1012‐1024.


Garcia, S., Panero, J.L., Siroky, J. and Kovařík, A. (2010) Repeated reunions and splits feature the highly dynamic evolution of 5S and 35S ribosomal RNA genes (rDNA) in the Asteraceae family. BMC Plant Biol., 10, 176.

Garcia, S., Garnatje, T. and Kovařík, A. (2012) Plant rDNA database: ribosomal DNA loci information goes online. Chromosoma, 121, 389‐394.

Garcia, S. and Kovařík, A. (2013) Dancing together and separate again: gymnosperms exhibit frequent changes of fundamental 5S and 35S rRNA gene (rDNA) organisation. Heredity, 111, 23‐33.

Garcia, S., Gálvez, F., Gras, A., Kovařík, A. and Garnatje, T. (2014) Plant rDNA database: update and new features. Database, 2014, bau063.

Gibbons, J.G., Branco, A.T., Godinho, S.A., Yu, S. and Lemos, B. (2015) Concerted copy number variation balances ribosomal DNA dosage in human and mouse genomes. Proc. Natl Acad. Sci. USA, 112, 2485‐2490.

Goffinet, B. and Buck, W.R. (2004) Systematics of the Bryophyta (mosses): from molecules to a revised classification. Monog. Sys. Bot., 98, 205‐239.

Hamon, P., Siljak‐Yakovlev, S., Srisuwan, S., Robin, O., Poncet, V., Hamon, S. and De Kochko, A. (2009) Physical mapping of rDNA and heterochromatin in chromosomes of 16 Coffea species: a revised view of species differentiation. Chromosome Res., 17, 291‐304.

Harper, J.A., Thomas, I.D., Lovatt, J.A. and Thomas, H.M. (2004) Physical mapping of rDNA sites in possible diploid progenitors of polyploid Festuca species. Plant Syst. Evol., 245, 163‐168.

Harpke, D. and Peterson, A. (2006) Non‐concerted ITS evolution in Mammillaria (Cactaceae). Mol. Phyl. Evol., 41, 579‐593.


Hasterok, R., Wolny, E., Hosiawa, M., Kowalczyk, M., Kulak‐Ksiazczyk, S., Ksiazczyk, T., Heneen, W.K. and Maluszynska, J. (2006) Comparative analysis of rDNA distribution in chromosomes of various species of Brassicaceae. Ann. Bot., 97, 205‐216.

Hemleben, V. and Zentgraf, U. (1994) Structural organization and regulation of transcription by RNA polymerase I of plant nuclear ribosomal RNA genes. In Plant Promoters and Transcription Factors (Nover, L., ed.). Springer Berlin Heidelberg, pp. 3‐24.

Heslop‐Harrison, J.S. and Schwarzacher, T. (2011) Organisation of the plant genome in chromosomes. Plant J., 66, 18‐33.

Jiang, J. and Gill, B.S. (1994) Nonisotopic in situ hybridization and plant genome mapping: the first 10 years. Genome, 37, 717‐725.

Kapitonov, V.V. and Jurka, J. (2003) A novel class of SINE elements derived from 5S rRNA. Mol. Biol. Evol., 20, 694‐702.

Kobayashi, T. (2011) Regulation of ribosomal RNA gene copy number and its role in modulating genome integrity and evolutionary adaptability in yeast. Cell. Mol. Life Sci., 68, 1395‐1403.

Lafontaine, D.L. and Tollervey, D. (2001) The function and synthesis of ribosomes. Nat. Rev. Mol. Cell Biol., 2, 514‐520.

Lan, T. and Albert, V.A. (2011) Dynamic distribution patterns of ribosomal DNA and chromosomal evolution in Paphiopedilum, a lady's slipper orchid. BMC Plant Biol., 11, 126.

Layat, E., Sáez‐Vásquez, J. and Tourmente, S. (2012) Regulation of Pol I‐transcribed 45S rDNA and Pol III‐transcribed 5S rDNA in Arabidopsis. Plant Cell Physiol., 53, 267‐276.


Leitch, I.J., Hanson, L., Lim, K.Y., Kovařík, A., Chase, M.W., Clarkson, J.J. and Leitch, A.R. (2008) The ups and downs of genome size evolution in polyploid species of Nicotiana (Solanaceae). Ann. Bot., 101, 805‐814.

Lima‐de‐Faria, A. (1976) The chromosome field. Hereditas, 83, 1‐22.

Loureiro, J., Kopecký, D., Castro, S., Santos, C. and Silveira, P. (2007) Flow cytometric and cytogenetic analyses of Iberian Peninsula Festuca spp. Plant Syst. Evol., 269, 89‐105.

Maddison, W.P. and Maddison, D.R. (2015) Mesquite: a modular system for evolutionary analysis. Version 2.75. 2011. URL http://mesquiteproject.org.

Martins, C. and Wasko, A.P. (2004) Organization and evolution of 5S ribosomal DNA in the fish genome. In Focus on Genome Research (Williams, C.R., ed.). Nova Science Publishers Inc., pp. 335‐363.

Matyášek, R., Fulneček, J., Lim, K.Y., Leitch, A.R. and Kovařík, A. (2002) Evolution of 5S rDNA unit arrays in the plant genus Nicotiana (Solanaceae). Genome, 45, 556‐562.

Mizouchi, H., Marasek, A. and Okazaki, K. (2007) Molecular cloning of Tulipa fosteriana rDNA and subsequent FISH analysis yields cytogenetic organization of 5S rDNA and 45S rDNA in T. gesneriana and T. fosteriana. Euphytica, 155, 235‐248.

Muravenko, O.V., Amosova, A.V., Samatadze, T.E., Semenova, O.Y., Nosova, I.V., Popov, K.V., Shostak, N.G., Zoshchuk, S.A. and Zelenin, A.V. (2004) Chromosome localization of 5S and 45S ribosomal DNA in the genomes of Linum L. species of the section Linum (Syn. Protolinum and Adenolinum). Russ. J. Genet., 40, 193‐196.

Muravenko, O.V., Yurkevich, O.Y., Bolsheva, N.L., Samatadze, T.E., Nosova, I.V., Zelenina, D.A., Volkov, A.A., Popov, K.V. and Zelenin, A.V. (2009) Comparison of genomes of eight species of sections Linum and Adenolinum from the genus Linum based on chromosome banding, molecular markers and RAPD analysis. Genetica, 135, 245‐255.

Nei, M. and Rooney, A.P. (2005) Concerted and birth‐and‐death evolution of multigene families. Ann. Rev. Genet., 39, 121.

Olanj, N., Garnatje, T., Sonboli, A., Vallès, J. and Garcia, S. (2015) The striking and unexpected cytogenetic diversity of genus Tanacetum L. (Asteraceae): a cytometric and fluorescent in situ hybridisation study of Iranian taxa. BMC Plant Biol., 15, 174.

Pinheiro, F., Cafasso, D., Cozzolino, S. and Scopece, G. (2015) Transitions between self‐compatibility and self‐incompatibility and the evolution of reproductive isolation in the large and diverse tropical genus Dendrobium (Orchidaceae). Ann. Bot., 116, 457‐467.

Pontes, O., Li, C.F., Costa Nunes, P., Haag, J., Ream, T., Vitins, A., Jacobsen, S.E. and Pikaard, C.S. (2006) The Arabidopsis chromatin‐modifying nuclear siRNA pathway involves a nucleolar RNA processing center. Cell, 126, 79‐92.

Rebordinos, L., Cross, I. and Merlo, A. (2013) High evolutionary dynamism in 5S rDNA of fish: state of the art. Cytogenet. Genome Res., 141, 103‐113.

Renny‐Byfield, S. and Wendel, J.F. (2014) Doubling down on genomes: polyploidy and crop plants. Am. J. Bot., 101, 1711‐1725.

Renny‐Byfield, S., Kovařík, A., Chester, M., Nichols, R.A., Macas, J., Novák, P. and Leitch, A.R. (2012) Independent, rapid and targeted loss of highly repetitive DNA in natural and synthetic allopolyploids of Nicotiana tabacum. PloS ONE, 7, e36963.

Revell, L.J. (2012) Phytools: an R package for phylogenetic comparative biology and other things. Method. Ecol. Evol., 3, 217‐223.


Roa, F. and Guerra, M. (2015) Non‐random distribution of 5S rDNA sites and its association with 45S rDNA in plant chromosomes. Cytogenet. Genome Res., 146, 243‐249.

Robledo, G. and Seijo, G. (2010) Species relationships among the wild B genome of Arachis species (section Arachis) based on FISH mapping of rDNA loci and heterochromatin detection: a new proposal for genome arrangement. Theor. Appl. Genet., 121, 1033‐1046.

Ronquist, F. and Huelsenbeck, J.P. (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics, 19, 1572‐1574.

Rooney, A.P. and Ward, T.J. (2005) Evolution of a large ribosomal RNA multigene family in filamentous fungi: birth and death of a concerted evolution paradigm. Proc. Natl Acad. Sci. USA, 102, 5084‐5089.

Rosato, M., Kovařík, A., Garilleti, R. and Rosselló, J.A. (2016) Conserved organisation of 45S rDNA sites and rDNA gene copy number among major clades of early Land Plants. PloS ONE, 11, e0162544.

Rubin, G.M. and Sulston, J.E. (1973) Physical linkage of the 5 S cistrons to the 18 S and 28 S ribosomal RNA cistrons in Saccharomyces cerevisiae. J. Mol. Biol., 79, 521‐530.

Seitz, U. and Seitz, U. (1979) The molecular weight of rRNA precursor molecules and their processing in higher plant cells. Z. Naturforschung C, 34, 253‐258.

Skalická, K., Lim, K.Y., Matyášek, R., Koukalová, B., Leitch, A.R. and Kovařík, A. (2003) Rapid evolution of parental rDNA in a synthetic tobacco allotetraploid line. Am. J. Bot., 90, 988‐996.

Smith, A.R., Pryer, K.M., Schuettpelz, E., Korall, P., Schneider, H. and Wolf, P.G. (2006) A classification for extant ferns. Taxon, 55, 705‐731.


Soltis, D.E., Albert, V.A., Leebens‐Mack, J., Bell, C.D., Paterson, A.H., Zheng, C., Sankoff, D., dePamphilis, C.W., Wall, P.K. and Soltis, P.S. (2009) Polyploidy and angiosperm diversification. Am. J. Bot., 96, 336‐348.

Sone, T., Fujisawa, M., Takenaka, M., Nakagawa, S., Yamaoka, S., Sakaida, M., Nishiyama, R., Katsuyuki, T.Y., Ohmido, N., Fukui, K., Fukuzawa, H. and Ohyama, K. (1999) Bryophyte 5S rDNA was inserted into 45S rDNA repeat units after the divergence from higher land plants. Plant Mol. Biol., 41, 679‐685.

Vaillant, I., Tutois, S., Cuvillier, C., Schubert, I. and Tourmente, S. (2007) Regulation of Arabidopsis thaliana 5S rRNA genes. Plant Cell Physiol., 48, 745‐752.

Vaio, M., Gardner, A., Emshwiller, E. and Guerra, M. (2013) Molecular phylogeny and chromosome evolution among the creeping herbaceous Oxalis species of sections Corniculatae and Ripariae (Oxalidaceae). Mol. Phylogenet. Evol., 68, 199‐211.

Vierna, J., Wehner, S., zu Siederdissen, C.H., Martínez‐Lage, A. and Marz, M. (2013) Systematic analysis and evolution of 5S ribosomal DNA in metazoans. Heredity, 111, 410‐421.

Wang, W., Ma, L., Becher, H., Garcia, S., Kovaříkova, A., Leitch, I.J., Leitch, A.R. and Kovařík, A. (2016) Astonishing 35S rDNA diversity in the gymnosperm species Cycas revoluta Thunb. Chromosoma, 125, 683‐699.

Weitemier, K., Straub, S.C., Fishbein, M. and Liston, A. (2015) Intragenomic polymorphisms among high‐copy loci: a genus‐wide study of nuclear ribosomal DNA in Asclepias (Apocynaceae). PeerJ, 3, e718.

Wendel, J. (2015) The wondrous cycles of polyploidy in plants. Am. J. Bot., 102, 1753‐1756.


Wicke, S., Costa, A., Muñoz, J. and Quandt, D. (2011) Restless 5S: The re‐arrangements and evolution of the nuclear ribosomal DNA in land plants. Mol. Phylogenet. Evol., 61, 321‐332.


                                                                                                      Median; range  Median; range  Median; range
Group          Families     Genera        Species        Karyotypes  Range in ploidy  Polyploids (%)  2n             5S sites       35S sites
Monocots       18 (20.93%)  91 (23.08%)   537 (29.98%)   1037        2‐20             40.89           24; 4‐180      3; 1‐71        4; 1‐45
Eudicots       59 (68.60%)  284 (72.08%)  1172 (65.44%)  1780        2‐16             22.92           22; 4‐136      2; 1‐34        4; 1‐42
Angiosperms    77 (89.53%)  375 (95.16%)  1709 (95.42%)  2817        2‐20             29.53           24; 4‐180      2; 1‐71        4; 1‐45
Gymnosperms    6 (6.98%)    16 (4.06%)    76 (4.24%)     120         2‐4*             1.66            24; 16‐48      4; 1‐24        10; 2‐38
Bryophytes     2 (2.32%)    2 (0.51%)     3 (0.17%)      6           2                ‐               9; 9‐18        4; 2‐8         2; 1‐10
Pteridophytes  3 (3.49%)    3 (0.76%)     6 (0.34%)      6           1‐2              ‐               22; 20‐78      ‐              7; 2‐10
Global         86           394           1791           2949        1‐20             28.28           24; 4‐180      3; 1‐71        4; 1‐45

Table 1 Summary of the data analysed from the Plant rDNA Database (release 2.0) and the present partial update with data until December 2015. (*) Only two gymnosperm species from genus Juniperus are tetraploid.


Table 2 Proportion of karyotypes with a single 5S or 35S rDNA locus in the Plant rDNA Database (second release) and Supporting Information Table S1.

                           All ploidy levels      Only diploid level
                           5S rDNA    35S rDNA    5S rDNA    35S rDNA
Eudicots                   48.88%     31.29%      57.88%     36.89%
Monocots                   26.32%     23.43%      38.69%     34.91%
Gymnosperms                38.33%     17.50%      38.33%     17.50%
Non‐Pinaceae gymnosperms   50%        45.24%      50%        45.24%
Global                     40.35%     27.94%      51.38%     35.16%


Table 3 Taxa showing examples of intraspecific variation (A, B, C) and constancy (D) in the number of rDNA loci. n. ind.: number of individuals assessed; n. sources: number of publications contributing data for a given taxon. (1) L‐type Artemisia species, where numbers of 5S and 35S sites are equal; (2) some individuals within a species with different chromosome number (+/‐ 2).

Taxa                      Family            2n              Ploidy level  5S              35S                        n. ind.  n. sources

A. Examples of variation of both 5S and 35S rDNA site numbers in a species
Alstroemeria aurea        Alstroemeriaceae  16              2             8, 12, 14       24, 26                     3        2
Brassica napus            Brassicaceae      38              4             8, 10, 12, 18   10, 11, 14, 16             13       6
Brassica rapa             Brassicaceae      20              2             4, 6, 8, 9, 10  6, 10, 12                  12       6
Phaseolus vulgaris        Fabaceae          22              2             2, 4, 8         4, 5, 6, 8, 9, 12, 14, 16  14       7
Pinus nigra               Pinaceae          24              2             2, 4            16, 18, 26, 22             7        3
Pinus sylvatica           Pinaceae          24              2             2, 4            14, 16                     4        4
Tulipa gesneriana         Liliaceae         24              2             45, 56          11, 13                     2        1
Zingeria biebersteiniana  Poaceae           4               2             2, 4            2, 4, 6                    4        2

B. Examples of variation of 5S rDNA site number in a species
Arabidopsis thaliana      Brassicaceae      10              2             4, 6, 10        4                          4        3
Lens culinaris (1)        Fabaceae          14              2             2, 4            2                          5        3
Musa acuminata            Musaceae          22              2             4, 6, 8         2                          5        3

C. Examples of variation of 35S rDNA site number in a species
Brachypodium sylvaticum   Poaceae           18              2             2               4, 5, 6                    5        1
Brassica juncea           Brassicaceae      36              4             10              12, 14, 16                 7        3
Capsicum annuum           Solanaceae        24              2             2               2, 4, 6                    5        2
Fragaria vesca            Rosaceae          14              2             2               4, 5, 6                    12       2
Lotus japonicus           Fabaceae          12              2             2               4, 6                       4        3
Genus Larix               Pinaceae          24              2             2               4, 5, 6                    9        3

D. Examples of constancy of both 5S and 35S rDNA site numbers in a species
Arachis duranensis        Fabaceae          20              2             2               4                          3        3
Artemisia absinthium      Asteraceae        18              2             4 (1)           4 (1)                      4        3
Brachypodium pinnatum     Poaceae           28              4             4               4                          4        1
Hydrangea aspera                            34 (2)          2             2               4                          4        1
Medicago sativa           Fabaceae          16              2             4               2                          6        2
Medicago sativa           Fabaceae          32              4             8               4                          3        1
Vicia narbonensis         Fabaceae          14              2             2               2                          6        4
Genus Daucus              Apiaceae          18, 20, 22, 44  2, 4          2               2                          13       2
Genus Spondias            Anacardiaceae     32              ?             2               2                          6        1


Table 4 Genera presenting different ploidy levels in the database and the range of their rDNA site numbers. The R2 and p‐values refer to the linear regression model between the average numbers of rDNA sites per monoploid genome vs. ploidy level for each genus, all slopes being negative. Significant values are highlighted in bold. A Shapiro‐Wilk test was performed to check for normality. N: sample size. (*) Only one rDNA type was considered for this genus as its rDNA arrangement is L‐type, with both 5S and 35S rRNA genes linked in the same sites.

Columns: Genus; N; number of 5S and 35S sites at each ploidy level (2x, 3x, 4x, 5x, 6x, 7x, 8x, 9x, 10x, 11x, 12x, 14x); R2 and p‐value (5S, 35S)

R2=0.98 Artemisia* 53 1‐9 6‐12 12 16‐18 p=0.009

R2=0.88 R2=0.138 Brachyscome 7 2 2 2 2 4 2‐4 6 2 8 2 9 p=0.536 p=0.016

R2=0.92 R2=0.412 Chenopodium 54 2‐4 2‐4 2‐6 2‐4 6‐8 2‐4 3 p=0.55 p=0.179

R2=0.54 R2=0.646 Eleocharis 30 2 4‐10 2‐4 2‐8 2 4 2‐4 4‐8 4 8‐10 9 p=0.102 p=0.152


R2=0.81 R2=0.75

Festuca 66 2‐6 2‐6 4‐8 2‐8 2‐8 4‐8 8 2 4‐11 2‐12 1 9

p=0.036 p=0.05

R2=0.61 R2=0.612 Fragaria 37 2 4‐6 4 12 6 16‐17 2 10 4‐6 12‐15 8 p=0.118 p=0.115

R2=0.86 Helictotricho R2=0.851 17 2‐6 4‐12 6‐9 6‐12 18 10 4 n p=0.252 p=0.24

R2=0.938 R2=0.75 Hordeum 51 1‐9 1‐12 3‐14 2‐24 4‐10 6‐18 p=0.159 p=0.333

R2=0.91

2‐ R2=0.656 2 Lepidium 13 2 2 2 2‐4 2 2 2 4 2 4 p=0.096 p=0.001

1

R2=0.85 Melampodiu R2=0.443 32 2 4 2‐4 4 4‐6 4‐6 6 m p=0.536 p=0.247

R2=0.885 R2=0.78 Rumex 12 2‐4 2‐4 4‐6 4 4 2 2‐8 4 p=0.05 7


p=0.113

R2=0.86 R2=0.847 Saccharum 16 4 4 6 8 6‐10 6‐10 8‐12 8‐12 10‐14 10 12 12 14 10 4 p=0.003 p=0.002
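The regression summarised in Table 4 (average number of rDNA sites per monoploid genome against ploidy level, fitted per genus) can be illustrated with a minimal sketch. This is not the authors' script: the function names and the example karyotypes below are hypothetical, the fit is a plain least-squares line, and neither the p‐value nor the Shapiro‐Wilk normality check is computed here.

```python
from statistics import mean


def linear_regression(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared). Assumes xs and ys each vary.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def sites_per_monoploid_genome_by_ploidy(karyotypes):
    """Average (site number / ploidy level) for each ploidy level.

    `karyotypes` is a list of (ploidy_level, site_number) pairs.
    """
    grouped = {}
    for ploidy, sites in karyotypes:
        grouped.setdefault(ploidy, []).append(sites / ploidy)
    levels = sorted(grouped)
    return levels, [mean(grouped[p]) for p in levels]


# Hypothetical karyotypes for one genus: (ploidy level, number of 35S sites).
example_genus = [(2, 4), (2, 6), (4, 6), (4, 8), (6, 8), (6, 10)]

levels, averages = sites_per_monoploid_genome_by_ploidy(example_genus)
slope, intercept, r_squared = linear_regression(levels, averages)
print(f"slope={slope:.3f}, R2={r_squared:.3f}")
```

In this invented example the absolute number of sites rises with ploidy while the number per monoploid genome falls, so the fitted slope is negative, which is the pattern the table legend reports for every genus.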


Figure legends

Figure 1 Distribution of the terminal (including subterminal), interstitial, centromeric (including pericentromeric) and mixed positions of 5S and 35S rDNA loci across eudicots, monocots and gymnosperms in the Plant rDNA database (second release) and Supporting Information Table S1. The distribution in gymnosperms excluding Pinaceae is presented to show the bias produced by this overrepresented family.

Figure 2 Character state reconstruction of the ancestral 35S (left) and 5S (right) chromosome localization.

Figure 3 Number of 5S and 35S rDNA loci per ploidy level. Both rDNAs are positively and significantly correlated with ploidy level (for 5S, rho=0.496, p<0.0001; for 35S, rho=0.331, p<0.0001).

Figure 4 Number of 5S and 35S rDNA loci per monoploid genome across different ploidy levels. There is a mild but significant loss of loci with increasing ploidy (p<0.0001, rho=‐0.237 for 35S; p<0.0001, rho=0.120 for 5S).
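The rho values quoted in the legends of Figures 3 and 4 are rank correlations between locus number and ploidy level. The sketch below assumes that rho denotes Spearman's coefficient (the legends do not name the test) and computes it as the Pearson correlation of tie‐averaged ranks. The function names and the example data are hypothetical, and no p‐value is computed.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes xs and ys each vary."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: ploidy level and number of 5S loci for six karyotypes.
ploidy = [2, 2, 4, 4, 6, 8]
loci = [2, 4, 4, 6, 6, 10]
rho = spearman_rho(ploidy, loci)
print(f"rho={rho:.3f}")
```

Dividing each locus count by its ploidy level before ranking gives the per‐monoploid‐genome version of the same calculation, as plotted in Figure 4.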
