
Plant James C. Schnable1 and Eric Lyons3,4

1Center for Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, USA 2Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, USA 3School of Plant Sciences, University of Arizona, Tucson, AZ, USA 4iPlant Collaborative, Bio5 Institute, University of Arizona, Tucson, AZ, USA

ABSTRACT

Identifying and characterizing plant paleopolyploidies remains an ongoing area of investigation. Ancient whole genome duplications can be identified through whole genome comparisons using a combination of data derived from genomic structure (e.g. syntenic dotplots) and evolutionary distances (e.g. synonymous mutation rates). These methods require large genomic and/or transcriptomic datasets, and our understanding of the identity, placement, and timing of these ancient events continues to change as new datasets become available. The images and text presented here represent our best estimates using the data available at this time, but must be seen as tentative and subject to revision as new datasets and types of analyses become available.

Keywords: Comparative , Plant , Whole Genome Duplication, Plant Paleopolyploidy

INTRODUCTION

Figure 1. Phylogenetic tree of plant species, predominantly those with published genome sequences, with known ancient whole genome duplications marked. Tree is current as of April 2013.

1 EUDICOT HEXAPLOIDY

Synonyms: Arabidopsis Gamma

This hexaploidy (genome tripling) is shared by the core eudicots (the rosids and the asterids), and may be present in additional, basal eudicots, although it will not be possible to reach this conclusion until the genomes of species from basal lineages are sequenced and assembled to the pseudomolecule level. This whole genome duplication was first identified as the most ancient of three whole genome duplications present in the genome of Arabidopsis, and was assigned the name "gamma" in 2003 (Bowers et al., 2003). In that paper, hampered by the fact that only two plant genomes had yet been sequenced and by the strongly accelerated rate of base pair substitution in Arabidopsis, the authors concluded that the gamma event was likely shared by both monocots and eudicots and could potentially be as old as the split between gymnosperms and flowering plants 300 million years ago. With the publication in 2007 of the grape genome (Jaillon et al., 2007), which has not experienced any duplications since the eudicot hexaploidy and does not show the same acceleration of nucleotide substitutions, it became possible to conclude that the eudicot hexaploidy was not shared with monocots and was shared by all rosids. More recent work in the asterids (Cenci et al., 2010) indicates that this evolutionarily successful eudicot clade shares the same ancient whole genome duplication. Based on patterns of fractionation, the eudicot hexaploidy is believed to have been a two-step event (Lyons et al., 2008). It is possible, although not confirmed, that the tetraploidy seen in columbine (WGD #10) comes from the tetraploid intermediate of this process.

2 ARABIDOPSIS ALPHA

The alpha tetraploidy of Arabidopsis was first given that name by Bowers et al. (2003). It is shared by most or all of the crucifers (family Brassicaceae).

3 ARABIDOPSIS BETA

No lineage has yet been identified in which the beta tetraploidy (naming convention from Bowers et al., 2003) is the most recent whole genome duplication. Despite what the image above might indicate, this duplication is significantly older than Arabidopsis alpha; however, precise dating is difficult given the acceleration of the substitution rate in the Arabidopsis lineage.

4 BRASSICA HEXAPLOIDY

The first proposal that the base Brassica genomes (B. rapa, B. oleracea, and B. nigra) may comprise three ancestral genomes that were Arabidopsis-like in structure was based on mapping studies (Lagercrantz, 1998), but it remained controversial (Lukens et al., 2004). Studies based on comparative chromosome painting (Lysak et al., 2005) and BAC sequencing (Yang et al., 2006; Cheung et al., 2009) further established the Brassica triplication hypothesis, which was confirmed by the genome sequence of Brassica rapa (Wang et al., 2011). Ongoing studies are attempting to determine whether the Brassica hexaploidy event occurred in one or two steps, and the (allo) origin of the event(s). An analysis of fractionation patterns in Brassica rapa supports the two-step model for this hexaploidy (Tang et al., 2012).

5 POPLAR TETRAPLOIDY

When the genome of poplar was released in 2006, researchers identified a new ancient whole genome duplication (Tuskan et al., 2006). Poplar retains around 8000 pairs of duplicated genes, which is an unexpectedly high rate of retention given the age of this duplication.

6 TETRAPLOIDY

This event occurred within the genus Linum. Domesticated flax (L. usitatissimum) and its close relative Linum bienne both share this event.

7 APPLE TETRAPLOIDY

The apple genome paper discussed an ancient whole genome duplication identified in that lineage, which is estimated to be >50 million years old (Velasco et al., 2010).

8 TETRAPLOIDY

The relatively recent whole genome duplication in soybean was long suspected based on a number of different forms of analysis, from chromosome number and early linkage mapping studies (Shoemaker et al., 1996), analysis of Ks peaks (Schlueter et al., 2004), and phylogenies of individual gene families (Pfeil et al., 2005) to analysis of the fractionation of individual sequenced regions (Schlueter et al., 2008). As expected, when the genome of soybean was published in 2010, researchers did identify a recent whole genome duplication (peak Ks = 0.13, estimated age of divergence between whole genome duplicates 13 million years). The minimum age of the event has been fixed at five million years based on the divergence of Glycine species carrying the duplication (Doyle and Egan, 2010).
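The link between a Ks peak and an age estimate can be illustrated with a strict molecular clock, under which two duplicate copies each accumulate synonymous substitutions at a rate r, so that Ks = 2rT. The following is a minimal sketch under that assumption, not the dating procedure used in the cited studies. The function names are hypothetical, and the rate is simply back-calculated from the two soybean figures quoted above (Ks = 0.13, 13 million years) rather than taken from any source.

```python
# Strict molecular clock sketch (an assumption, not the cited method):
# both duplicate copies diverge at rate r, so Ks = 2 * r * T.

def implied_rate(ks_peak, age_years):
    """Synonymous substitutions per site per year implied by a Ks peak and an age."""
    return ks_peak / (2.0 * age_years)


def age_from_ks(ks_peak, rate_per_site_per_year):
    """Age in years of a duplication with the given Ks peak under a strict clock."""
    return ks_peak / (2.0 * rate_per_site_per_year)


# Figures quoted in the text for the soybean duplication.
soy_rate = implied_rate(0.13, 13e6)
soy_age = age_from_ks(0.13, soy_rate)

print(f"implied rate: {soy_rate:.2e} substitutions per site per year")
print(f"recovered age: {soy_age / 1e6:.1f} million years")
```

Because substitution rates differ among lineages (the acceleration in Arabidopsis noted earlier is one example), a rate derived in one lineage should not be assumed to transfer to another.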

9 PAPILIONOID TETRAPLOIDY

Synonym: Legume tetraploidy (note that this name is misleading, as this duplication is not shared by many clades within the legumes)

Linkage mapping studies in soybean led to the hypothesis of a second, older polyploidy event in that lineage (Shoemaker et al., 1996). This hypothesis was corroborated by evidence from Ks studies (Blanc and Wolfe, 2004; Schlueter et al., 2008) for an ancient polyploidy event in the Medicago truncatula genome. Phylogenomic studies (Pfeil et al., 2005) provided evidence that the soybean and Medicago Ks peaks were due to a single event that occurred in their common ancestor (also shared by Lotus). Subsequently the event was also shown to be shared by peanut (Arachis hypogaea), a member of the clade sister to the Glycine-Medicago-Lotus clade (Bertioli et al., 2009). Unpublished information suggests that this WGD is also found in lupin (Lupinus), in the clade sister to the Arachis-Glycine et al. clade. Phylogenomic studies in a caesalpinioid legume, Chamaecrista fasciculata, showed that this species (and thus all caesalpinioid and mimosoid legumes) lacks this polyploidy event, indicating that the duplication occurred at the base of (or within) the papilionoid subfamily (Cannon et al., 2010). It is unknown whether the "papilionoid WGD" occurred in the ancestor of all papilionoid legumes, because there are not yet any data for early diverging lineages within the subfamily. The "papilionoid WGD" is estimated to have occurred between 50 and 60 million years ago, early in the evolution of the legume family.

10 COLUMBINE TETRAPLOIDY

Determined from SynMap analysis (Lyons et al., 2008).

11 TETRAPLOIDY

An analysis of conserved orthologous gene groups (COGs) and large numbers of ESTs identified evidence of two ancient whole genome duplications shared by both monocots and eudicots (Jiao et al., 2011). The more recent of the two, placed at 192 million years ago, occurred after the split from gymnosperms (non-flowering seed plants) and is shared by all extant flowering plant species, including Amborella trichopoda.

12 SEED PLANT TETRAPLOIDY

The more ancient of the two events identified by Jiao et al. (2011) is shared by all flowering plants as well as gymnosperms, but occurred after the divergence of the seed plant lineage from Selaginella, a basal vascular plant.

13 MAIZE TETRAPLOIDY

The suspicion that maize is an ancient polyploid can be traced back through at least a generation of maize geneticists, and finds its earliest roots in the large number of duplicate mutant loci found in maize, sometimes in parallel orders along different chromosomes. Perhaps the most famous of these are the pairs of duplicate regulators of anthocyanin biosynthesis: aleurone1 and Purple plant1. Brandon Gaut and John Doebley concluded that maize was an allopolyploid in the late 1990s (Gaut and Doebley, 1997). While the question of whether maize is an allo- or autopolyploid has been argued back and forth over the years, the polyploidy question itself was settled more than a decade before the publication of the first draft of the maize genome.

The two subgenomes of maize are estimated to have diverged 12 million years ago (Swigoňová et al., 2004). If maize is an autopolyploid, the two genomes also merged into a single genome 12 million years ago, but if maize is an allopolyploid, the two genomes could have evolved as separate species for several million years before the wide cross that created the polyploid ancestor of modern maize. In either case, the two ancestral genomes of maize have been contained in the same nucleus for at least five million years (Swigoňová et al., 2004). By comparing the organization of the maize genome to that of other grass species, it is possible to reconstruct the ten pairs of homeologous chromosomes present in that first polyploid maize ancestor (Wei et al., 2007). By measuring biased gene loss (fractionation) and biased expression of duplicate pairs, it is possible to assign one copy of each ancestral chromosome pair in maize to a parental subgenome, either maize1 or maize2 (Schnable et al., 2011).
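The idea of assigning homeologous regions to subgenomes by biased gene loss can be sketched as follows. This is a simplified illustration, not the procedure of Schnable et al. (2011): it uses retained gene counts only and ignores expression bias, and it assumes that the copy retaining more genes is the one labeled maize1. The function name, region names, and gene counts are all hypothetical.

```python
# Toy sketch of subgenome assignment by fractionation bias.
# Assumption: within each homeologous pair, the region that retained more
# genes is labeled "maize1" and the more fractionated region "maize2".

def assign_subgenomes(retained_counts):
    """Label each region of every homeologous pair by its retained gene count.

    retained_counts maps an ancestral chromosome name to a dict holding
    exactly two entries: region name -> number of retained genes.
    """
    assignment = {}
    for ancestral, counts in retained_counts.items():
        if len(counts) != 2:
            raise ValueError(f"{ancestral}: expected exactly two homeologous regions")
        (region_a, n_a), (region_b, n_b) = counts.items()
        if n_a == n_b:
            # No bias detectable from gene counts alone.
            assignment[region_a] = assignment[region_b] = "unresolved"
        elif n_a > n_b:
            assignment[region_a], assignment[region_b] = "maize1", "maize2"
        else:
            assignment[region_a], assignment[region_b] = "maize2", "maize1"
    return assignment


# Hypothetical counts for two ancestral chromosomes (illustrative only).
example = {
    "ancestral_1": {"region_x": 820, "region_y": 510},
    "ancestral_2": {"region_p": 300, "region_q": 640},
}
labels = assign_subgenomes(example)
for region, label in sorted(labels.items()):
    print(region, label)
```

A real analysis would also need a statistical test for whether the difference in retention exceeds what chance would produce, which this sketch omits.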

14 GRASS TETRAPLOIDY

Synonyms: Rho

All grass species sequenced to date share a common whole genome duplication. This duplication is estimated to have occurred between 70 and 90 million years ago, and homeologs from this duplication have a modal synonymous substitution rate of 0.9 (Paterson et al., 2004). While the fact that a significant portion of the rice genome is covered by duplicate syntenic regions has been known for many years, the reason for this duplication was disputed, with explanations ranging from multiple segmental duplications to one or more whole genome duplications. There were two reasons for this confusion. First, only 65.7% of the rice genome is covered by syntenic duplicate regions (Yu et al., 2005). Second, the duplicate homeologous regions located on chromosomes 11 and 12 of rice and 5 and 7 of sorghum have continued to experience gene conversion, so this pair of duplicate regions appears much younger than the rest of the duplicate regions in grass genomes (Wang et al., 2011).

15 GRASS TETRAPLOIDY B

Synonyms: Sigma

By using the genomes of both rice and sorghum to reconstruct the gene order present in duplicate segments prior to the tetraploidy shared by all grasses, Tang and coworkers were able to identify an even more ancient whole genome duplication in the monocot lineage (Tang et al., 2010). They estimated an age of 130 million years and a median synonymous substitution rate between gene pairs of 1.7 for this duplication. However, the authors caution that these estimates could be substantially off, as the synonymous substitution rates of these gene pairs are close to saturation and there are potentially confounding effects from mutation rate variation among lineages. Comparisons to the banana genome placed sigma in the lineage leading to grasses after the split between Poales (which contains the grasses) and Zingiberales (which contains banana) (D'Hont et al., 2012).

16 MONOCOT TETRAPLOIDY

By comparing sigma duplicate regions in the grasses to the grape genome, the same research group (Tang et al., 2010) found that, at least in some cases, eight sigma regions showed detectable synteny to a single region of the grape genome. This suggests there were in fact two whole genome duplications in the monocot lineage following the monocot-eudicot split but before the pre-grass duplication shared by all grasses.

17 BANANA BETA

The publication of the banana genome reported two whole genome duplications with roughly similar rates of divergence, indicating that the two events occurred at roughly the same time (D'Hont et al., 2012). This is the older of the two events and is dated to approximately 65 million years ago.

18 BANANA ALPHA

Evidence for at least one whole genome duplication in the banana lineage was reported prior to the publication of the banana genome (Lescot et al., 2008). This event is the more recent of the two tetraploidies identified in the analysis of the banana genome (D'Hont et al., 2012), and was estimated to have occurred roughly 61 million years ago (Lescot et al., 2008).

19 COTTON WGD

The cotton lineage whole genome duplication has been estimated to have occurred between 13 and 20 million years ago (Wang et al., 2012).

20 SOLANUM HEXAPLOIDY

The sequencing of the tomato genome revealed a hexaploidy shared by both tomato and potato and estimated to have occurred between 52 and 92 million years ago (Consortium et al., 2012). At the more recent end of that range, the whole genome duplication is consistent with the placement shown in Figure 1. At the more ancient end of the range, this triplication would also be shared by monkey flower and a large proportion of all the asterids.

21 DATE PALM WGD

Date palm is believed to contain a whole genome duplication which occurred sometime after its split from the lineage leading to banana and the grasses. This duplication was inferred from a peak in the synonymous substitution rate between duplicate date palm genes at around 0.25 synonymous substitutions per site (see supplementary figure 19 of D'Hont et al., 2012). However, the date palm genome is not yet well enough assembled for this duplication to be verified through syntenic analysis and classified as a tetraploidy, hexaploidy, etc.
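Inferring a duplication from a Ks peak, as described above, amounts to finding the mode of the distribution of Ks values between duplicate gene pairs. The following is a minimal sketch of one simple way to do this with a histogram. It is not the method used in the cited analysis; the function name, bin width, and upper Ks cutoff are assumptions, and the Ks values are synthetic, generated only to have a peak near 0.25.

```python
import random
from collections import Counter


def modal_ks(ks_values, bin_width=0.1, max_ks=2.0):
    """Return the center of the most populated Ks histogram bin.

    Values at or above max_ks are dropped, since Ks estimates approach
    saturation at high divergence. Both parameters are illustrative choices.
    """
    counts = Counter(
        int(ks // bin_width) for ks in ks_values if 0.0 < ks < max_ks
    )
    if not counts:
        raise ValueError("no Ks values in the accepted range")
    top_bin, _ = counts.most_common(1)[0]
    return (top_bin + 0.5) * bin_width


# Synthetic data: a cluster of duplicate pairs from one hypothetical event
# centered on Ks = 0.25, plus a flat background of unrelated duplicates.
rng = random.Random(0)
wgd_pairs = [rng.gauss(0.25, 0.05) for _ in range(500)]
background = [rng.uniform(0.0, 2.0) for _ in range(500)]

peak = modal_ks(wgd_pairs + background)
print(f"modal Ks bin center: {peak:.2f}")
```

A histogram mode is sensitive to the choice of bin width, and a Ks peak alone cannot distinguish a tetraploidy from a hexaploidy, which is why the text notes that syntenic analysis is still needed.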

22 SWITCHGRASS TETRAPLOIDY

Switchgrass (Panicum virgatum) is a relatively recent tetraploid species; the tetraploidy is estimated to have occurred within the last 0.5-1 million years.

23 CLEOME WGD

Cleome belongs to a family of plants that is sister to the crucifers (the Brassicaceae), and at least some species in the genus (including C. spinosa and C. gynandra) carry a hexaploidy which occurred after that lineage split from the ancestor of the crucifers. This event has been identified both by synteny within individual sequenced BACs (Schranz and Mitchell-Olds, 2006) and by synonymous substitution rate analysis of thousands of sequenced Cleome transcripts (Barker et al., 2009).

24 BANANA GAMMA

The publication of the banana genome (D'Hont et al., 2012) reported a third, more ancient whole genome duplication, estimated to have occurred around 100 million years ago. By comparing the sigma duplicate regions of Poales (WGD #15) with the banana beta ancestral blocks, D'Hont and coworkers placed this gamma WGD after the split between Poales and Zingiberales.

REFERENCES

Bertioli, D. J., Moretzsohn, M. C., Madsen, L. H., Sandal, N., Leal-Bertioli, S. C., Guimarães, P. M., Hougaard, B. K., Fredslund, J., Schauser, L., Nielsen, A. M., et al. (2009). An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes. BMC Genomics, 10(1):45.

Blanc, G. and Wolfe, K. H. (2004). Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. The Plant Cell, 16(7):1667–1678.

Bowers, J. E., Chapman, B. A., Rong, J., and Paterson, A. H. (2003). Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature, 422(6930):433–438.

Cannon, S. B., Ilut, D., Farmer, A. D., Maki, S. L., May, G. D., Singer, S. R., and Doyle, J. J. (2010). Polyploidy did not predate the evolution of nodulation in all legumes. PLoS One, 5(7):e11630.

Cenci, A., Combes, M.-C., and Lashermes, P. (2010). Comparative sequence analyses indicate that Coffea (asterids) and Vitis (rosids) derive from the same paleo-hexaploid ancestral genome. Molecular Genetics and Genomics, 283(5):493–501.

Cheung, F., Trick, M., Drou, N., Lim, Y. P., Park, J.-Y., Kwon, S.-J., Kim, J.-A., Scott, R., Pires, J. C., Paterson, A. H., et al. (2009). Comparative analysis between homoeologous genome segments of Brassica napus and its progenitor species reveals extensive sequence-level divergence. The Plant Cell, 21(7):1912–1928.

Consortium, T. G. et al. (2012). The tomato genome sequence provides insights into fleshy fruit evolution. Nature, 485(7400):635–641.

Doyle, J. J. and Egan, A. N. (2010). Dating the origins of polyploidy events. New Phytologist, 186(1):73–85.

D'Hont, A., Denoeud, F., Aury, J.-M., Baurens, F.-C., Carreel, F., Garsmeur, O., Noel, B., Bocs, S., Droc, G., Rouard, M., et al. (2012). The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature, 488(7410):213–217.

Gaut, B. S. and Doebley, J. F. (1997). DNA sequence evidence for the segmental allotetraploid origin of maize. Proceedings of the National Academy of Sciences, 94(13):6809–6814.

Jaillon, O., Aury, J.-M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., et al. (2007). The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature, 449(7161):463–467.

Jiao, Y., Wickett, N. J., Ayyampalayam, S., Chanderbali, A. S., Landherr, L., Ralph, P. E., Tomsho, L. P., Hu, Y., Liang, H., Soltis, P. S., et al. (2011). Ancestral polyploidy in seed plants and angiosperms. Nature, 473(7345):97–100.

Lagercrantz, U. (1998). Comparative mapping between Arabidopsis thaliana and Brassica nigra indicates that Brassica genomes have evolved through extensive genome replication accompanied by chromosome fusions and frequent rearrangements. Genetics, 150(3):1217–1228.

Lescot, M., Piffanelli, P., Ciampi, A. Y., Ruiz, M., Blanc, G., Leebens-Mack, J., da Silva, F. R., Santos, C. M., D'Hont, A., Garsmeur, O., et al. (2008). Insights into the Musa genome: syntenic relationships to rice and between Musa species. BMC Genomics, 9(1):58.

Lukens, L. N., Quijada, P. A., Udall, J., Pires, J. C., Schranz, M., and Osborn, T. C. (2004). Genome redundancy and plasticity within ancient and recent Brassica crop species. Biological Journal of the Linnean Society, 82(4):665–674.

Lyons, E., Pedersen, B., Kane, J., and Freeling, M. (2008). The value of nonmodel genomes and an example using SynMap within CoGe to dissect the hexaploidy that predates the rosids. Tropical Plant Biology, 1(3-4):181–190.

Lysak, M. A., Koch, M. A., Pecinka, A., and Schubert, I. (2005). Chromosome triplication found across the tribe Brassiceae. Genome Research, 15(4):516–525.

Paterson, A., Bowers, J., and Chapman, B. (2004). Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proceedings of the National Academy of Sciences of the United States of America, 101(26):9903–9908.

Pfeil, B., Schlueter, J., Shoemaker, R., and Doyle, J. (2005). Placing paleopolyploidy in relation to divergence: a phylogenetic analysis in legumes using 39 gene families. Systematic Biology, 54(3):441–454.

Schlueter, J. A., Dixon, P., Granger, C., Grant, D., Clark, L., Doyle, J. J., and Shoemaker, R. C. (2004). Mining EST databases to resolve evolutionary events in major crop species. Genome, 47(5):868–876.

Schlueter, J. A., Scheffler, B. E., Jackson, S., and Shoemaker, R. C. (2008). Fractionation of synteny in a genomic region containing tandemly duplicated genes across Glycine max, Medicago truncatula, and Arabidopsis thaliana. Journal of Heredity, 99(4):390–395.

Schnable, J. C., Springer, N. M., and Freeling, M. (2011). Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proceedings of the National Academy of Sciences, 108(10):4069–4074.

Schranz, M. E. and Mitchell-Olds, T. (2006). Independent ancient polyploidy events in the sister families Brassicaceae and Cleomaceae. The Plant Cell, 18(5):1152–1165.

Shoemaker, R., Polzin, K., Labate, J., Specht, J., Brummer, E., Olson, T., Young, N., Concibido, V., Wilcox, J., Tamulonis, J., et al. (1996). Genome duplication in soybean (Glycine subgenus soja). Genetics, 144(1):329–338.

Swigoňová, Z., Lai, J., Ma, J., Ramakrishna, W., Llaca, V., Bennetzen, J. L., and Messing, J. (2004). Close split of sorghum and maize genome progenitors. Genome Research, 14(10a):1916–1923.

Tang, H., Bowers, J. E., Wang, X., and Paterson, A. H. (2010). Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proceedings of the National Academy of Sciences, 107(1):472–477.

Tang, H., Woodhouse, M. R., Cheng, F., Schnable, J. C., Pedersen, B. S., Conant, G., Wang, X., Freeling, M., and Pires, J. C. (2012). Altered patterns of fractionation and exon deletions in Brassica rapa support a two-step model of paleohexaploidy. Genetics, 190(4):1563–1574.

Tuskan, G. A., Difazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph, S., Rombauts, S., Salamov, A., et al. (2006). The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 313(5793):1596–1604.

Velasco, R., Zharkikh, A., Affourtit, J., Dhingra, A., Cestaro, A., Kalyanaraman, A., Fontana, P., Bhatnagar, S. K., Troggio, M., Pruss, D., et al. (2010). The genome of the domesticated apple (Malus × domestica Borkh.). Nature Genetics, 42(10):833–839.

Wang, K., Wang, Z., Li, F., Ye, W., Wang, J., Song, G., Yue, Z., Cong, L., Shang, H., Zhu, S., et al. (2012). The draft genome of a diploid cotton Gossypium raimondii. Nature Genetics, 44(10):1098–1103.

Wang, X., Tang, H., and Paterson, A. H. (2011). Seventy million years of concerted evolution of a homoeologous chromosome pair, in parallel, in major Poaceae lineages. The Plant Cell, 23(1):27–37.

Wei, F., Coe, E., Nelson, W., Bharti, A. K., Engler, F., Butler, E., Kim, H., Goicoechea, J. L., Chen, M., Lee, S., et al. (2007). Physical and genetic structure of the maize genome reflects its complex evolutionary history. PLoS Genet, 3(7):e123.

Yang, T.-J., Kim, J. S., Kwon, S.-J., Lim, K.-B., Choi, B.-S., Kim, J.-A., Jin, M., Park, J. Y., Lim, M.-H., Kim, H.-I., et al. (2006). Sequence-level analysis of the diploidization process in the triplicated flowering locus C region of Brassica rapa. The Plant Cell, 18(6):1339–1347.

Yu, J., Wang, J., Lin, W., Li, S., Li, H., Zhou, J., Ni, P., Dong, W., Hu, S., Zeng, C., et al. (2005). The genomes of Oryza sativa: a history of duplications. PLoS Biol, 3(2):e38.
