<<

Article The water lily genome and the early evolution of flowering

https://doi.org/10.1038/s41586-019-1852-5 Liangsheng Zhang1,23,24*, Fei Chen1,2,23,24, Xingtan Zhang1,23, Zhen Li3,4,23, Yiyong Zhao5,6,23, Rolf Lohaus3,4,23, Xiaojun Chang1,7,23, Wei Dong1, Simon Y. W. Ho8, Xing Liu1, Aixia Song1, Received: 6 August 2019 Junhao Chen9, Wenlei Guo9, Zhengjia Wang9, Yingyu Zhuang1, Haifeng Wang1, Xuequn Chen1, Accepted: 31 October 2019 Juan Hu1, Yanhui Liu1, Yuan Qin1, Kai Wang1, Shanshan Dong7, Yang Liu7,10, Shouzhou Zhang7, Xianxian Yu11, Qian Wu12,13, Liangsheng Wang12,13, Xueqing Yan13,14, Yuannian Jiao13,14, Published online: 18 December 2019 Hongzhi Kong13,14, Xiaofan Zhou15, Cuiwei Yu16, Yuchu Chen16, Fan Li17, Jihua Wang17, Open access Wei Chen18, Xinlu Chen19, Qidong Jia20, Chi Zhang19, Yifan Jiang2, Wanbo Zhang2, Guanhua Liu21, Jianyu Fu21, Feng Chen2,19,20,24, Hong Ma6,24, Yves Van de Peer3,4,22,24 & Haibao Tang1,24

Water lilies belong to the angiosperm order Nymphaeales. Amborellales, Nymphaeales and Austrobaileyales together form the so-called ANA-grade of angiosperms, which are extant representatives of lineages that diverged the earliest from the lineage leading to the extant mesangiosperms1–3. Here we report the 409-megabase genome sequence of the blue-petal water lily (Nymphaea colorata). Our phylogenomic analyses support Amborellales and Nymphaeales as successive sister lineages to all other extant angiosperms. The N. colorata genome and 19 other water lily transcriptomes reveal a Nymphaealean whole-genome duplication event, which is shared by Nymphaeaceae and possibly Cabombaceae. Among the genes retained from this whole-genome duplication are homologues of genes that regulate fowering transition and fower development. The broad expression of homologues of foral ABCE genes in N. colorata might support a similarly broadly active ancestral ABCE model of foral organ determination in early angiosperms. Water lilies have evolved attractive foral scents and colours, which are features shared with mesangiosperms, and we identifed their putative biosynthetic genes in N. colorata. The chemical compounds and biosynthetic genes behind foral scents suggest that they have evolved in parallel to those in mesangiosperms. Because of its unique phylogenetic position, the N. colorata genome sheds light on the early evolution of angiosperms.

Many water lily species, particularly from Nymphaea (Nymphaeaceae), were anchored onto 14 pseudo-chromosomes (Extended Data Fig. 1 have large and showy flowers and belong to the angiosperms (also and Extended Data Table 1). Genome completeness was estimated to called flowering plants). Their aesthetic beauty has captivated nota- be 94.4% (Supplementary Note 2). We annotated 31,580 protein-coding ble artists such as the French impressionist Claude Monet. Water lily genes and predicted repetitive elements with a collective length of flowers have limited differentiation in perianths (outer floral organs), 160.4 Mb, accounting for 39.2% of the genome (Supplementary Note 3). but they possess both male and female organs and have diverse scents The N. colorata genome provides an opportunity to resolve the and colours, similar to many mesangiosperms (core angiosperms, relationships between Amborellales, Nymphaeales and all other extant including , monocots, and magnoliids) (Supplementary angiosperms (Fig. 1a). Using six eudicots, six monocots, N. colorata and Note 1). In addition, some water lilies have short life cycles and enormous Amborella5, and each of three gymnosperm species (Ginkgo biloba, numbers of seeds4, which increase their potential as a model to rep- Picea abies and Pinus taeda) as an outgroup in turn, we identified 2,169, resent the ANA-grade of angiosperms and to study early evolutionary 1,535 and 1,515 orthologous low-copy nuclear (LCN) genes, respec- events within the angiosperms. In particular, N. colorata Peter has a tively (Fig. 1b). Among the LCN gene trees inferred from nucleotide relatively small genome size (2n = 28 and approximately 400 Mb) and sequences using G. biloba as an outgroup, 62% (294 out of 475 trees) blue petals that make it popular in breeding programs (Supplementary place Amborella as the sister lineage to all other extant angiosperms Note 1). with bootstrap support greater than 80% (type II, Fig. 1c). Using P. abies We report here the genome sequence of N. colorata, obtained using or P. taeda as the outgroup, Amborella is placed as the sister lineage PacBio RSII single-molecule real-time (SMRT) sequencing technol- to the remaining angiosperms in 57% and 54% of the LCN gene trees, ogy. The genome was assembled into 1,429 contigs (with a contig N50 respectively. LCN gene trees inferred using amino acid sequences show of 2.1 Mb) and total length of 409 Mb with 804 scaffolds, 770 of which similar phylogenetic patterns (Supplementary Note 4.1).

A list of affiliations appears at the end of the paper.

Nature | Vol 577 | 2 January 2020 | 79 Article a Type I Type II Type III Eudicots (E) E E Bootstrap (BS) BS BS M M Arabidopsis lyrata Monocots (M) d Arabidopsis thaliana Capsella rubella Amborellales (A) N N Thellungiella halophila Gossypium arboreum

Theobroma cacao Nymphaeales (N) A A Citrus clementina Ricinus communis Gymnosperms (G) Manihot esculenta G G Populus trichocarpa Glycine max b Total Type I Type II Type III Outgroup Phaseolus vulgaris trees Trees BS>50% BS>80% Trees BS>50% BS>80% Trees BS>50% BS>80% Medicago truncatula Citrullus lanatus Ginkgo biloba 2,169 310 194 46 811 662 294 314 241 135 Cucumis sativus Prunus persica Picea abies 1,535 233 139 26 407 302 108 183 129 56 Fragaria vesca Morus notabilis Pinus taeda 1,515 220 131 24 381 285 116 190 138 74 Eucalyptus grandis Vitis vinifera Vitales c Ginkgo biloba Picea abies Pinus taeda Chenopodium quinoa

Beta vulgaris Alternanthera philoxeroides Type I Type I Type I Stellaria media 46 Mirabilis jalapa 26 24 10% Type III Phytolacca americana 14% Type III 11% Type III 135 F14 cooperi 56 74 Opuntia dillenii 28% 29% 35% Polygonum runcinatum Tamarix chinensis Hieracium umbellatum Type II Type II Type II

294 116 Cichorium intybus Eudicots 108 Lactuca sativa 62% 57% 54% Sonchus asper Tagetes erecta Dahlia pinnata Cirsium vulgare BS > 80% Dipsacus laciniatus Asterids Lonicera japonica Ligusticum striatum 1 Daucus carota Apium graveolens e Hedera nepalensis Mentha canadensis Mimulus guttatus 2 3 Phyla dulcis F16 Sesamum indicum Olea europaea Solanum lycopersicum 4 5 6 Capsicum annuum 6 Vinca major Aucuba japonica F7 Philadelphus incanus 8 9 F12 Hydrangea macrophylla Cornus wilsoniana Camellia japonica Gunnera manicata Gunnerales Buxus sinica 10 14 11 F4 Nelumbo nucifera F10 Platanus acerifolia Meliosma parviflora Early-diverging Aquilegia coerulea Eudicot lineages 12 13 Nandina domestica F21 Eschscholzia californica Cinnamomum camphora Magnoliids Laurales F8 Litsea cubeba 15 17 16 Chimonanthus praecox Liriodendron chinense Magnoliales F9 Piper nigrum Houttuynia cordata Piperales 18 19 Aristolochia tagala Oryza sativa Brachypodium distachyon F15 Sorghum bicolor Ananas comosus Commelinids 20 F20 Canna indica F13

Musa acuminata Monocots Trachycarpus fortunei Asparagus officinalis Yucca filamentosa Asparagales Phalaenopsis equestris Phalaenopsis aphrodite F19 Lilium brownii Liliales 7 Pandanus utilis Pandanales 7 Dioscorea opposita Dioscoreales Pinellia ternate Spirodela polyrhiza Alismatales Zostera marina Austrobaileyales Acorus calamus Acorales Illicium henryi Austrobaileyales Nymphaea ‘Choolarp’ 1 Nymphaea ‘Paramee’ 2 Nymphaea ‘Thong Garnjana’ 3 ANA-grade angiosperms Nymphaea ‘Midnight’ 4 Summarized BS values from 5 coalescent trees F18 Nymphaea ‘Woods blue goddess’5 At least 4 trees: BS ≥ 80 Nymphaea caerulea 6 Nymphaea colorata 7 At least 4 trees: BS ≥ 70 Nymphaeales Nymphaea gigantea ‘Hybrid I’ 8 At least 3 trees: BS ≥ 80 Nymphaea gigantea‘Albert de Lestang’ 9 Low support Nymphaea tetragona 10 Nymphaea mexicana 11 F Fossil calibration point Nymphaea potamophila12 F17 Nymphaea prolifera 13 Nymphaea rubra 14 F6 Euryale ferox 15 Nymphaeales Victoria cruziana 16 F5 Nuphar lutea 17 Nuphar advena 18 F1 Brasenia schreberi 19 F11 Amborellales Cabomba caroliniana 20 Amborella trichopoda Amborellales Pinus taeda F3 Picea abies Gymnosperms F2 Gnetum montanum Ginkgo biloba Carboniferous Permian Triassic Jurassic Cretaceous Palaeogene Neogene 375 350 325 300 275 250 225 200 175 150 125 100 75 50 25 0 Time (Ma)

Fig. 1 | Phylogenomic relationships of angiosperms. a, Three different outgroup. The percentage was calculated by dividing the number of type I, II or evolutionary relationships among major clades of angiosperms. b, Number of III trees (BS > 80%) by the total number of trees. d, Summary phylogeny and LCN gene trees with different bootstrap support (BS) values based on timescale of 115 plant species. Blue bars at nodes represent 95% credibility nucleotide sequences from six eudicots, six monocots, N. colorata, Amborella intervals of the estimated dates. e, The flowers of the 20 sampled water lilies in and three different gymnosperms. c, Comparison of gene trees supporting the Nymphaeales used in d. three evolutionary relationships using each gymnosperm in turn as the

To minimize the potential shortcomings of sparse taxon sampling6, ANA-grade, eudicots, magnoliids, monocots and a gymnosperm out- we also inferred an angiosperm species tree using sequences from group (Gnetum montanum, G. biloba, P. abies and P. taeda) (Methods). 44 genomes and 71 transcriptomes, including representatives of the For further phylogenetic inference of these 115 species, we selected,

80 | Nature | Vol 577 | 2 January 2020 b Number of gene families 1 a 1 600 Number of anchor pairs 1 Nymphaea colorata Victoria cruziana−N. colorata 1 Nymphaea caerulea 500 Nuphar advena−N. colorata 2.5 Nymphaea gigantea 0.2 (KS) 9 400 2.0 9 Nymphaea mexicana 0.2569 0.1016 Euryale ferox 300 Cabomba caroliniana−N. colorata 1.5 19 0.0607 19 0.0734 Victoria cruziana 0.0527 0.0864

1.0 thologue density 200 211(28) Nuphar lutea Illicium henryi−N. colorata Nymphaeales 213

Or 0.0203

Number of anchor pairs Amborella trichopoda 0.5132 0.1301 0.5 5(1) Nuphar advena 100 −N. colorata 5 0.0435 Cabomba caroliniana 5(2) 0.033 0.5284 0 0 6 Austrobaileyales Illicium henryi 0.5892 01234 Amborellales Amborella trichopoda 0.7231 KS Ginkgo biloba

c d N2 N2 N. colorata N. colorata C2 (loss) N1 N1 C. caroliniana C. caroliniana C1 C1

Fig. 2 | A Nymphaealean WGD shared by Nymphaeaceae and possibly pairs in N. colorata coalesced. All mapped duplication events have BS ≥ 80% in

Cabombaceae. a, KS age distributions for paralogues found in collinear regions the gene trees. c, Left, the scenario of a WGD (yellow four-pointed star) before (anchor pairs) of N. colorata and for orthologues between N. colorata and the divergence between Nymphaeaceae and Cabombaceae. Right, a possible selected Nymphaealean and angiosperm species. Red and yellow arrows gene tree under this scenario, with loss of one duplicate in C. caroliniana. Two indicate under- and overestimations of the N. colorata–Nuphar advena and red dots show where the anchor pair of N. colorata would coalesce. d, Left, N. colorata–C. caroliniana divergence, respectively. b, WGD phylogenomic scenario of a WGD in the stem lineage of Nymphaeaceae involving an analysis. Numbers in parentheses are the number of gene families with retained allotetraploid (green four-pointed star) that formed between two ancestral C. caroliniana duplicates supporting the duplication events. Numbers below parents after the divergence of the lineages leading to N. colorata and branches show branch lengths in KS units. The double-arrowed line denotes C. caroliniana, with one of the parents being more closely related to total KS from the pointed node to N. colorata. We used G. biloba (dashed branch) C. caroliniana. Right, a gene tree under such a scenario. Red dots are as in c. as an outgroup. The red dot denotes the branch on which most of the anchor

based on various criteria, five different LCN gene sets including 1,167, occurred before or close to the divergence between Nymphaeaceae 834, 683, 602 and 445 genes. Analyses of these five datasets all yielded and Cabombaceae (Extended Data Fig. 2d, Supplementary Note 5.3), similar tree topologies with Amborella and Nymphaeales as successive considering the variable substitution rates among Nymphaealean line- sister lineages to all other extant angiosperms (Fig. 1d, e, Supplemen- ages (Fig. 2a, b, Extended Data Fig. 2c). An alternative interpretation tary Note 4.2). of the above results could be that the WGD signatures were from an Molecular dating of angiosperm lineages, using a stringent set of 101 allopolyploidy event that occurred between ancestral Nymphaeaceae LCN genes and with age calibrations based on 21 fossils7, inferred the and Cabombaceae lineages shortly after their divergence and that crown age of angiosperms at 234–263 million years ago (Ma) (Fig. 1d). gave rise to the Nymphaeaceae (but not Cabombaceae) stem lineage The split between monocots and eudicots was estimated at 171–203 Ma (Fig. 2d, Supplementary Note 5.4). and that between Nymphaeaceae and Cabombaceae at 147–185 Ma. The water lily lineage descended from one of the early divergences Genomic collinearity unveiled evidence of a whole-genome dupli- among angiosperms, before the radiation of mesangiosperms. Thus, cation (WGD) event in N. colorata (Extended Data Figs. 1f, 2a and Sup- this group offers a unique window into the early evolution of angio- plementary Note 5.1). The number of synonymous substitutions per sperms, particularly that of the flower. We identified 70 MADS-box synonymous site (KS) distributions for N. colorata paralogues further genes, including homologues of the genes for the ABCE model of floral showed a signature peak at KS of approximately 0.9 (Fig. 2a) and peaks at organ identities: AP1 (and also FUL) and AGL6 (A function for sepals similar KS values were identified in other Nymphaeaceae species (Sup- and petals), AP3 and PI (B function for petals and stamen), AG (C func- plementary Note 5.2), which suggests an ancient single WGD event that tion for stamen and carpel), and SEP1 (E function for interacting with is probably shared among Nymphaeaceae members. Comparison of ABC function proteins). Phylogenetic and collinearity analyses of the the N. colorata paralogue KS distribution with KS distributions of ortho- MADS-box genes and their genomic neighbourhood indicate that an logues (representing speciation events) between N. colorata and other ancient tandem duplication before the divergence of seed plants gave Nymphaeales lineages, Illicium henryi, and Amborella suggests that birth to the ancestors of A function (FUL) and E function genes (SEP) the WGD occurred just after the divergence between Nymphaeaceae (Extended Data Fig. 3, Supplementary Note 6.1). Also, owing to the Nym- and Cabombaceae (Fig. 2a). By contrast, phylogenomic analyses of phaealean WGD, N. colorata has two paralogues, AGa and AGb of the gene families that contained at least one paralogue pair from collinear C-function gene AG (Extended Data Fig. 4). Similarly, the Nymphaealean regions of N. colorata suggest that the WGD is shared between Nym- WGD-derived duplicates are homologous to other genes associated phaeaceae and Cabombaceae (Fig. 2b, Supplementary Note 5.4). If true, with development of carpel and stamen8, and to genes that regulate Cabomba caroliniana seems to have retained few duplicates (Fig. 2b, flowering time9 and auxin-controlled circadian opening and closure of c), which would explain the absence of a clear peak in the C. caroliniana the flower10 (Extended Data Figs. 4–6, Supplementary Note 6.2–6.4). paralogue KS distribution (Supplementary Note 5.2). Absolute dating The expression profiles of N. colorata ABCE homologues largely agree of the paralogues of N. colorata does suggest that the WGD could have with their putative ascribed roles in floral organ patterning (Fig. 3a).

Nature | Vol 577 | 2 January 2020 | 81 Article

a 10 8 6 4 2 0 b Eudicots, typical ABCE model log (FPKM+1) 2 B FLCb OsMADS32 AC Cluster 1 FLCa E SOC1 SVP Sepal Petal Stamen Carpel 2 AGa, C function AGb, C function Cluster Nymphaea, ancestral ABCE model SEP, E function AP3, B function AGL6, A function A, AGL6 AGL32b AGL32c B, AP3 AGL32f AGL12a B, PI AGL32g STK B, AGL32d AGL32a 3 AGL32d, B function C, AGa ANR1a Cluster ANR1b C, AGb AGL32e PI, B function E, SEP AGL15 FUL Sepal Petal Stamen Carpel AGL12b Gymnosperms, BC model Root Petal Sepal Carpel B Stamen s C Juvenile leaMaturef leaf C Mature leafstalk Juvenile ower Female cone/ Juvenile leafstalk Male cone/ megasporangiate Floral organs microstrobilus strobilus Vegetative organ

Fig. 3 | MADS-box genes in N. colorata and proposed floral ABCE model in genes. Expression values were scaled by log2(FPKM + 1), in which FPKM is early angiosperms. a, Gene expression patterns of MIKCc from various organs fragments per kilobase of exon per million mapped reads. b, The flowering of N. colorata. Three clusters of genes were classified according to the ABCE model in N. colorata that specifies floral organs is proposed based on the expression of type II MADS-box genes. The organ types (vegetative organs and gene expression values (bar heights) from a. floral organs) were matched to the expression patterns of type II MADS-box

Notably, the N. colorata AGL6 homologue is mainly expressed in sepals sesquiterpene biosynthase in N. colorata. Also, methyl decanoate has and petals, whereas the FUL homologue is mainly expressed in carpels, not been detected as a volatile compound in monocots and eudicots18 suggesting that AGL6 acts as an A-function gene in N. colorata. The two and is thought to be synthesized in N. colorata by the SABATH family C-function homologues AGa and AGb are highly expressed in stamens of methyltransferases19. The N. colorata genome contains 13 SABATH and carpels, respectively, whereas AGb is also expressed in sepals and homologues and 12 of them form a Nymphaeales-specific group (Sup- petals, suggesting that they might have undergone subfunctionaliza- plementary Fig. 41). Among these 12 members, NC11G0120830 showed tion and possibly neofunctionalization for flower development after the highest expression in petals (Fig. 4c) and its corresponding recom- the Nymphaealean WGD. Furthermore, the ABCE homologues in N. binant protein was demonstrated to be a fatty acid methyltransferase colorata generally exhibit wider ranges of expression in floral organs that had the highest activity with decanoic acid as the substrate (Fig. 4d, than their counterparts in eudicot model systems (Fig. 3b). This wider Supplementary Note 7.1). These results suggest that the floral scent expression pattern, in combination with broader expression of at least biosynthesis in N. colorata has been accomplished through enzymatic some ABCE genes in some eudicots representing an early-diverging functions that have evolved independently from those in mesangio- lineage11, some monocots12 and magnoliids13, suggest an ancient ABCE sperms (Fig. 4e). model for flower development, with subsequent canalization of gene Nymphaea colorata is valued for the aesthetically attractive blue expression and function regulated by the more specialized ABCE genes colour of petals, which is a rare trait in ornamentals. To understand the during the evolution of mesangiosperms, especially core eudicots8. molecular basis of the blue colour, we identified delphinidin 3′-O-(2″- This could also account for the limited differentiation between sepals O-galloyl-6″-O-acetyl-β-galactopyranoside) as the main blue anthocya- and petals in Nymphaeales species, and is consistent with a single type nidin pigment (Extended Data Fig. 8a–c). By comparing the expression of perianth organ proposed in an ancestral angiosperm flower14. profiles between two N. colorata cultivars with white and blue petals Floral scent serves as olfactory cues for insect pollinators15. Whereas for genes in a reconstructed anthocyanidin biosynthesis pathway, we Amborella flowers are scentless16, N. colorata flowers release 11 different found genes for an anthocyanidin synthase and a delphinidin-modi- volatile compounds, including terpenoids (sesquiterpenes), fatty- fication enzyme, the expression of which was significantly higher in acid derivatives (methyl decanoate) and benzenoids (Fig. 4a). The N. blue petals than in white petals (Extended Data Fig. 8d, e). These two colorata genome contains 92 putative terpene synthase (TPS) genes, enzymes catalyse the last two steps of anthocyanidin biosynthesis and which are ascribed to four previously recognized TPS subfamilies in are therefore key enzymes specialized in blue pigment biosynthesis20,21 angiosperms: TPS-b, TPS-c, TPS-e/f and TPS-g (Fig. 4b), but none was (Supplementary Note 7.2). found for TPS-a, which is responsible for sesquiterpene biosynthesis Water lilies have a global distribution that includes cold regions in mesangiosperms17. Notably, TPS-b contains more than 80 genes in (northern China and northern Canada), unlike the other ANA-grade N. colorata; NC11G0123420 is highly expressed in flowers (Extended angiosperms Amborella (Pacific Islands) and Austrobaileyales (tem- Data Fig. 7); this result suggests that it may be a candidate gene for perate and tropical regions). We detected marked expansions of genes

82 | Nature | Vol 577 | 2 January 2020 a b Arabidopsis thaliana Populus trichocarpa Oryza sativa Nonyl acetate Sorghum bicolor 85 TPS-b IS Nymphaea colorata in N. colorata Amborella trichopoda 5.0 Picea abies m-Anisaldehyde Selaginella moellendorffii 2 Physcomitrella patens Marchantia polymorpha 4.0 11 fragrance molecules Tetradecyne 3.0 9

2.0 Germacrene B Abundance (total ion) (E)-β- α-Bergamotene Fanesene Isodaucene Heptadecene Methyl Pentadecane 1.0 Methyl 10 decanoate3 78 octanoate 3 8 Farnesane 11 11 44 66 0 5 0.6 d 15 20 25 1.0 NC11G0120830 Retention time (min) 0.8 c 0.6 SABATH gene family FPKM NC11G0120830 0.4 NC7G0246550 1,400 NC2G0007900 Relative activity 0.2 NC11G0240820 1,200 NC11G0240810 0 NC13G0302010 1,000 Butanoic Hexanoic Octanoic Decanoic Dodecanoic Tetradecanoic NC12G0138630 acid (C4) acid (C6) acid (C8) acid (C10) acid (C12) acid (C14) NC11G0138610 800 NC11G0138620 Fatty acids 600 NC2G0007880 NC11G0120810 e Floral sesquiterpenes Floral methyl decanoate 400 NC11G0120820 Compound Gene Compound Gene Petal Stamen Sepal Juvenile leaf Juvenile leafstalk Root Mature leaf Carpel Mature leafstalk Juvenile owe r 200 Monocots + TPS-a –

Eudicots + TPS-a –

Nymphaeales + TPS-b/g + DAMT

Amborellales – –

Fig. 4 | Floral scent and biosynthesis in N. colorata. a, Gas chromatogram of d, Relative activity of Escherichia coli-expressed NC11G0120830 with six fatty floral volatiles from the flower of N. colorata. The internal standard (IS) is nonyl acids as substrates, with the activity on decanoic acid set at 1.0. Data are acetate. Methyl esters are in blue; terpenes are in red. Floral scent was mean ± s.d. of three independent measurements. e, The presence (+) and measured three times independently with similar results. b, Phylogenetic tree absence (−) of sesquiterpenes and methyl decanoate as floral scent compounds of terpene synthases from N. colorata and representative plants showing the and their respective biosynthetic genes in four major lineages of angiosperms subfamilies from a–h and x. c, Expression analysis of SABATH genes of when known. DAMT, decanoic acid methyltranferase. N. colorata showed that NC11G0120830 had the highest expression level in petal. related to immunity and stress responses in N. colorata, including 1. Byng, J. W. et al. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot. J. Linn. Soc. 181, 1–20 genes encoding nucleotide-binding leucine-rich repeat (NLR) proteins, (2016). protein kinases and WRKY transcription factors, compared with those 2. Zeng, L. et al. Resolution of deep angiosperm phylogeny using conserved nuclear genes in Amborella and some mesangiosperms (Extended Data Fig. 9, Sup- and estimates of early divergence times. Nat. Commun. 5, 4956 (2014). 3. Qiu, Y. L. et al. The earliest angiosperms: evidence from mitochondrial, plastid and plementary Note 8). It is possible that increased numbers of these genes nuclear genomes. Nature 402, 404–407 (1999). enabled water lilies to adapt to various ecological habitats globally. 4. Chen, F. et al. Water lilies as emerging models for Darwin’s abominable mystery. Hortic. In conclusion, the N. colorata genome offers a reference for compara- Res. 4, 17051 (2017). 5. Amborella Genome Project. The Amborella genome and the evolution of flowering tive genomics and for resolving the deep phylogenetic relationships plants. Science 342, 1241089 (2013). among the ANA-grade and mesangiosperms. It has also revealed a WGD 6. Wiens, J. J. Missing data, incomplete taxa, and phylogenetic accuracy. Syst. Biol. 52, specific to Nymphaeales, and provides insights into the early evolution 528–538 (2003). 7. Coiro, M., Doyle, J. A. & Hilton, J. How deep is the conflict between molecular and fossil of angiosperms on key innovations such as flower development and evidence on the age of angiosperms? New Phytol. 223, 83–99 (2019). floral scent and colour. 8. Alvarez-Buylla, E. R. et al. Flower development. Arabidopsis Book 8, e0127 (2010). 9. Zhao, N. et al. Identification of flowering regulatory genes in allopolyploid Brassica juncea. Hortic. Plant J. 5, 109–119 (2019). 10. Ke, M. et al. Auxin controls circadian flower opening and closure in the waterlily. BMC Online content Plant Biol. 18, 143 (2018). Any methods, additional references, Nature Research reporting sum- 11. Sharma, B. & Kramer, E. M. Aquilegia B gene homologs promote petaloidy of the sepals and maintenance of the C domain boundary. Evodevo 8, 22 (2017). maries, source data, extended data, supplementary information, 12. Dodsworth, S. Petal, sepal, or tepal? B-genes and monocot flowers. Trends Plant Sci. 22, acknowledgements, peer review information; details of author con- 8–10 (2017). tributions and competing interests; and statements of data and code 13. Chanderbali, A. S. et al. Conservation and canalization of gene expression during angiosperm diversification accompany the origin and evolution of the flower. Proc. Natl availability are available at https://doi.org/10.1038/s41586-019-1852-5. Acad. Sci. USA 107, 22570–22575 (2010).

Nature | Vol 577 | 2 January 2020 | 83 Article

14. Sauquet, H. et al. The ancestral flower of angiosperms and its early diversification. Nat. 1Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Commun. 8, 16047 (2017). Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, Key Laboratory of 15. Kessler, D. et al. How scent and nectar influence floral antagonists and mutualists. eLife 4, National Forestry and Grassland Administration for Orchid Conservation and Utilization, Fujian e07641 (2015). 2 Agriculture and Forestry University, Fuzhou, China. College of Horticulture, Nanjing 16. Thien, L. B. et al. The population structure and floral biology of Amborella trichopoda Agricultural University, Nanjing, China. 3Department of Plant Biotechnology and Bioinformatics, (Amborellaceae). Ann. Mo. Bot. Gard. 90, 466–490 (2003). 4 5 17. Chen, F., Tholl, D., Bohlmann, J. & Pichersky, E. The family of terpene synthases in plants: Ghent University, Ghent, Belgium. VIB Center for Plant Systems Biology, Ghent, Belgium. State a mid-size family of genes for specialized metabolism that is highly diversified throughout Key Laboratory of Genetic Engineering, Ministry of Education Key Laboratory of Biodiversity the kingdom. Plant J. 66, 212–229 (2011). Sciences and Ecological Engineering, School of Life Sciences, Fudan University, Shanghai, 18. Knudsen, J. T., Tollsten, L. & Bergstrom, L. G. Floral scents-a checklist of volatile China. 6Department of Biology, Huck Institutes of the Life Sciences, Pennsylvania State compounds isolated by head-space techniques. Phytochemistry 33, 253–280 (1993). University, University Park, PA, USA. 7Fairy Lake Botanical Garden, Shenzhen and Chinese 19. Zhao, N. et al. Structural, biochemical, and phylogenetic analyses suggest that indole-3- Academy of Sciences, Shenzhen, China. 8School of Life and Environmental Sciences, University acetic acid methyltransferase is an evolutionarily ancient member of the SABATH family. of Sydney, Sydney, New South Wales, Australia. 9State Key Laboratory of Subtropical Plant Physiol. 146, 455–467 (2008). Silviculture, School of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China. 20. Chen, W. H. et al. Downregulation of putative UDP-glucose: flavonoid 3-O- 10BGI-Shenzhen, Shenzhen, China. 11School of Urban-Rural Planning and Landscape glucosyltransferase gene alters flower coloring in Phalaenopsis. Plant Cell Rep. 30, 12 1007–1017 (2011). Architecture, Xuchang University, Xuchang, China. Key Laboratory of Plant Resources/Beijing 13 21. Wu, Q. et al. Transcriptome sequencing and metabolite analysis for revealing the blue Botanical Garden, Institute of Botany, Chinese Academy of Sciences, Beijing, China. University flower formation in waterlily. BMC Genomics 17, 897 (2016). of the Chinese Academy of Sciences, Beijing, China. 14State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China. 15Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in Microbiology Research Centre, South China Agricultural University, Guangzhou, China. published maps and institutional affiliations. 16Hangzhou Tianjing Aquatic Botanical Garden, Zhejiang Humanities Landscape Co. Ltd., Hangzhou, China. 17National Engineering Research Center for Ornamental Horticulture, Key Open Access This article is licensed under a Creative Commons Attribution Laboratory for Flower Breeding of Yunnan Province, Floriculture Research Institute, Yunnan 4.0 International License, which permits use, sharing, adaptation, Academy of Agricultural Sciences, Kunming, China. 18Innovative Institute of Chinese Medicine distribution and reproduction in any medium or format, as long as you give and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China. appropriate credit to the original author(s) and the source, provide a link to the Creative 19Department of Plant Sciences, University of Tennessee, Knoxville, TN, USA. 20Graduate School Commons license, and indicate if changes were made. The images or other third party of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA. 21Key material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Laboratory of Tea Quality and Safety Control, Ministry of Agriculture and Rural Affairs, Tea 22 Creative Commons license and your intended use is not permitted by statutory regulation Research Institute, Chinese Academy of Agricultural Sciences, Hangzhou, China. Centre for or exceeds the permitted use, you will need to obtain permission directly from the Microbial Ecology and Genomics, Department of Biochemistry, Genetics and Microbiology, copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/ University of Pretoria, Pretoria, South Africa. 23These authors contributed equally: Liangsheng by/4.0/. Zhang, Fei Chen, Xingtan Zhang, Zhen Li, Yiyong Zhao, Rolf Lohaus, Xiaojun Chang. 24These authors jointly supervised this work: Liangsheng Zhang, Fei Chen, Feng Chen, Hong Ma, Yves © The Author(s) 2019 Van de Peer, Haibao Tang. *e-mail: [email protected]

84 | Nature | Vol 577 | 2 January 2020