The Litsea Genome and the Evolution of the Laurel Family
ARTICLE | OPEN
https://doi.org/10.1038/s41467-020-15493-5

Yi-Cun Chen 1,2,17, Zhen Li 3,4,17, Yun-Xiao Zhao 1,2,17, Ming Gao 1,2,17, Jie-Yu Wang 5,6,17, Ke-Wei Liu 7,8,9,17, Xue Wang 1,2, Li-Wen Wu 1,2, Yu-Lian Jiao 1,2, Zi-Long Xu 1,2, Wen-Guang He 1,2, Qi-Yan Zhang 1,2, Chieh-Kai Liang 10, Yu-Yun Hsiao 11, Di-Yang Zhang 5, Si-Ren Lan 5, Laiqiang Huang 7,8,9, Wei Xu 12, Wen-Chieh Tsai 10,11,13 ✉, Zhong-Jian Liu 2,5,6,8,14 ✉, Yves Van de Peer 3,4,15,16 ✉ & Yang-Dong Wang 1,2 ✉

The laurel family within the Magnoliids has attracted attention owing to its scents, variable inflorescences, and controversial phylogenetic position. Here, we present a chromosome-level assembly of the Litsea cubeba genome, together with low-coverage genomic and transcriptomic data for many other Lauraceae. Phylogenomic analyses show phylogenetic discordance at the position of Magnoliids, suggesting incomplete lineage sorting during the divergence of monocots, eudicots, and Magnoliids. An ancient whole-genome duplication (WGD) event occurred just before the divergence of Laurales and Magnoliales; subsequently, independent WGDs occurred almost simultaneously in the three Lauralean lineages. The phylogenetic relationships within Lauraceae correspond to the divergence of inflorescences, as evidenced by the phylogeny of FUWA, a conserved gene involved in determining panicle architecture in Lauraceae. Monoterpene synthases responsible for production of specific volatile compounds in Lauraceae are functionally verified. Our work sheds light on the evolution of the Lauraceae and on the genetic basis for floral evolution and specific scents.

1 State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing 100091, China.
2 Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Hangzhou 311400, China.
3 Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium.
4 VIB Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium.
5 Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
6 College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China.
7 School of Life Sciences, Tsinghua University, Beijing 100084, China.
8 Center for Biotechnology and Biomedicine, Shenzhen Key Laboratory of Gene and Antibody Therapy, State Key Laboratory of Chemical Oncogenomics, State Key Laboratory of Health Sciences and Technology (prep), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong 518055, China.
9 Center for Precision Medicine and Healthcare, Tsinghua-Berkeley Shenzhen Institute (TBSI), Shenzhen, Guangdong 518055, China.
10 Department of Life Sciences, National Cheng Kung University, Tainan 701, Taiwan.
11 Orchid Research and Development Center, National Cheng Kung University, Tainan 701, Taiwan.
12 Novogene Bioinformatics Institute, Beijing 100083, China.
13 Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 701, Taiwan.
14 Henry Fok College of Biology and Agriculture, Shaoguan University, Shaoguan 512005, China.
15 Center for Microbial Ecology and Genomics, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0028, South Africa.
16 College of Horticulture, Nanjing Agricultural University, Nanjing 210095, China.
17 These authors contributed equally: Yi-Cun Chen, Zhen Li, Yun-Xiao Zhao, Ming Gao, Jie-Yu Wang, Ke-Wei Liu.
✉ email: [email protected]; [email protected]; [email protected]; [email protected]

NATURE COMMUNICATIONS | (2020) 11:1675 | https://doi.org/10.1038/s41467-020-15493-5 | www.nature.com/naturecommunications

Lauraceae, also referred to as the laurel family, is a family from the order Laurales in Magnoliids1. The family includes a total of 2500–3000 globally distributed species in 44 genera of the woody subfamily Lauroideae and about 25 species in 1 genus of the parasitic subfamily Cassythoideae2. The morphological features of flowers in Lauraceae species3, including various inflorescences and the existence of both bisexual and unisexual flowers4–6 (Supplementary Fig. 1), provide a reference for studying flower evolution in angiosperms. In addition, the specific scents from Lauraceae species have made the laurel family economically important as a source of medicine, spices, and perfumes2. A diverse array of terpenoids, mainly monoterpenes and sesquiterpenes, defines the scents of different species in Lauraceae7,8. Terpene synthases (TPSs) are primarily responsible for monoterpene production9; however, research on the TPS gene families in Lauraceae is still in its infancy due to the hitherto limited genomic data.

From a phylogenetic perspective, the relationships among Magnoliids, monocots, and eudicots remain debated10–13. For instance, the analysis of the Cinnamomum kanehirae12 genome supported a sister relationship of Magnoliids and eudicots to the exclusion of monocots, while the genomes of Liriodendron chinense11 and Persea americana14 suggested Magnoliids as a sister group to the clade consisting of both eudicots and monocots. Still other studies, among them those based on organellar genes and a limited number of nuclear genes, support a sister relationship between Magnoliids and monocots, to the exclusion of eudicots15,16. The often-conflicting evolutionary relationships also reflect the morphological complexities among monocots, eudicots, and Magnoliids. For example, spiral floral phyllotaxis is present in Magnoliids and eudicots, but not in monocots; and Magnoliids and eudicots generally have carpels with one, two, or more ovules, while most monocots have more than two ovules17. However, flowers are trimerous in Magnoliids and monocots but tetramerous or pentamerous in eudicots. Intermediate ascidiate carpels are predominantly present in Magnoliids, which differentiates them from other angiosperm lineages17.

As two species from the core Lauraceae (including the Laureae-Cinnamomeae group and Persea group)18, C. kanehirae and P. americana, have already been sequenced12,14, we here present a chromosome-level assembly of the genome of the May Chang tree (Litsea cubeba Lour.), which is from the sister clade to C. kanehirae in the core Lauraceae. It is an important species for producing essential oils (roughly 95% terpenoid) that are widely used in perfumes, cosmetics, and medicine all over the world19–21.

… Table 2). Further scaffolding was done based on 292.17 Gb reads from a sequencing library of Genome-wide Chromosome Conformation Capture (Hi-C). We were able to anchor a total of 1018 scaffolds covering 1253.47 Mb (94.56%) of the assembled genome into 12 pseudochromosomes (Supplementary Figs. 2c, 3, and Supplementary Table 3). To assess the completeness of the assembly, we performed CEGMA22 and BUSCO23 assessments and used mRNA sequences of L. cubeba, finding the completeness of the genome to be 95.97% (Supplementary Table 4), 88.4% (Supplementary Table 5), and 97% (Supplementary Table 6), respectively.

A combination of homolog-based comparisons and structure-based analyses resulted in an annotation of 735 Mb of transposable elements (TEs), representing 55.47% of the L. cubeba genome (Supplementary Table 7), which is between that of C. kanehirae (~48% in a 730.7 Mb genome)12 and L. chinense (~62% in a 1742.4 Mb genome)11 (Supplementary Table 8). Long terminal repeats (LTRs) are the predominant TEs in the genome of L. cubeba, representing 47.64% (631 Mb) of the whole genome. Both L. cubeba and L. chinense have larger genomes than C. kanehirae12, and both contain a higher content of LTR/gypsy and copia elements (45.31%) than the C. kanehirae genome (16.50%)12. This suggests that LTR/gypsy and copia elements contribute most to the expansions of the L. cubeba and L. chinense genomes (Supplementary Table 8).

A high-confidence set of 31,329 protein-coding genes was predicted in the L. cubeba genome, of which 29,262 (93.4%) and 27,753 (88.59%) were supported by transcriptome data and protein homologs, respectively (Supplementary Fig. 4a and Supplementary Table 9). A total of 29,651 (94.6%) predicted protein-coding genes were functionally annotated (Supplementary Fig. 4a and Supplementary Table 10) and 30,314 (91.3%) of the genes could be located on the 12 pseudochromosomes. In addition, 1284 (89.2%) of the 1440 protein-coding genes in the BUSCO plant set were predicted in the L. cubeba genome (Supplementary Table 5).

We then compared the genomes of 26 plant species to obtain gene families that are significantly expanded in Lauraceae or that are unique to Lauraceae (Supplementary Fig. 4b, c and Supplementary Table 11). Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment analyses found that the significantly expanded gene families are especially enriched in the KEGG pathways of monoterpenoid biosynthesis, biosynthesis of secondary metabolites, and metabolic pathways (Supplementary Table 12) and in the GO terms of TPS activity, transferase activity, and catalytic activity (Supplementary Table 13).
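As an illustration of how a pathway enrichment of this kind can be scored, the sketch below computes a one-sided hypergeometric p-value for the overlap between a set of expanded-family genes and a pathway. The text here does not state which test or software was used, so the choice of test, the function name `hypergeom_enrichment_p`, and every count except the 31,329 predicted genes are assumptions made purely for illustration.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, pathway_genes, expanded_genes, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    total_genes    -- size of the background gene set
    pathway_genes  -- background genes annotated to the pathway
    expanded_genes -- genes in the expanded families (the "draw")
    overlap        -- expanded genes that are also in the pathway
    """
    max_overlap = min(pathway_genes, expanded_genes)
    # Number of ways to draw the expanded set from the background.
    denominator = comb(total_genes, expanded_genes)
    # Sum the upper tail exactly with integer arithmetic.
    numerator = sum(
        comb(pathway_genes, k)
        * comb(total_genes - pathway_genes, expanded_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    background = 31_329  # predicted protein-coding genes reported in the text
    # Hypothetical counts, NOT taken from the paper:
    pathway = 60         # genes annotated to a pathway of interest
    expanded = 500       # genes in significantly expanded families
    shared = 12          # expanded genes that fall in the pathway
    p = hypergeom_enrichment_p(background, pathway, expanded, shared)
    print(f"one-sided enrichment p-value: {p:.3e}")
```

In practice such p-values are computed for every pathway or GO term and then corrected for multiple testing; that step is omitted here to keep the sketch short.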
Many monoterpene Further, to revisit the phylogenetic position of Magnoliids relative synthase (TPS-b) genes are included in the above enriched