Plastome Phylogenomic Study of Gentianeae (Gentianaceae)
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2020.04.02.021840; this version posted April 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 Plastome phylogenomic study of Gentianeae (Gentianaceae): widespread gene 2 tree discordance and its association with evolutionary rate heterogeneity of 3 plastid genes 4 5 Xu Zhang1,2,3,#, Yanxia Sun1,2,#, Jacob B. Landis4,5, Zhenyu Lv6, Jun Shen1,3, Huajie Zhang1,2, Nan Lin1,3, Lijuan 6 Li1,3, Jiao Sun1,3, Tao Deng6, Hang Sun6,*, Hengchang Wang1,2,* 7 8 1CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, 9 Chinese Academy of Sciences, Wuhan 430074, Hubei, China 10 2Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, 11 Hubei, China 12 3University of Chinese Academy of Sciences, Beijing, 100049 China 13 4Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA, USA 14 5School of Integrative Plant Science, Section of Plant Biology and the L.H. Bailey Hortorium, Cornell 15 University, Ithaca, NY USA 16 6Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese 17 Academy of Sciences, Kunming, Yunnan, 650201 China 18 19 *Correspondence: [email protected]; [email protected] 20 #The authors contributed equally to this study 21 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.02.021840; this version posted April 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 22 Abstract 23 Background: Plastome-scale data have been prevalent in reconstructing the plant Tree of Life. 24 However, phylogenomic studies currently based on plastomes rely primarily on maximum likelihood 25 (ML) inference of concatenated alignments of plastid genes, and thus phylogenetic discordance 26 produced by individual plastid genes has generally been ignored. Moreover, structural and functional 27 characteristics of plastomes indicate that plastid genes may not evolve as a single locus and are 28 experiencing different evolutionary forces, yet the genetic characteristics of plastid genes within a 29 lineage remain poorly studied. 30 Results: We sequenced and annotated ten plastome sequences of Gentianeae. Phylogenomic 31 analyses yielded robust relationships among genera within Gentianeae. We detected great variation 32 of gene tree topologies and revealed more than half of the genes, including one (atpB) of the three 33 widely used plastid markers (rbcL, atpB and matK) in phylogenetic inference of Gentianeae, are 34 likely contributing to phylogenetic ambiguity of Gentianeae. Estimation of nucleotide substitution 35 rates showed extensive rate heterogeneity among different plastid genes and among different 36 functional groups of genes. Comparative analysis suggested that the ribosomal protein (RPL and 37 RPS) genes and the RNA polymerase (RPO) genes have higher substitution rates and genetic 38 variations in Gentianeae. Our study revealed that just one (matK) of the three (matK, ndhB and rbcL) 39 widely used markers show high phylogenetic informativeness (PI) value. Due to the high PI and 40 lowest gene-tree discordance, rpoC2 is advocated as a promising plastid DNA barcode for taxonomic 41 studies of Gentianeae. Furthermore, our analyses revealed a positive correlation of evolutionary rates 42 with genetic variation of plastid genes, but a negative correlation with gene-tree discordance under 43 purifying selection. 44 Conclusions: Overall, our results demonstrate the heterogeneity of nucleotide substitution rates and 45 genetic characteristics among plastid genes providing new insights into plastome evolution, while 46 highlighting the necessity of considering gene-tree discordance into phylogenomic studies based on 47 plastome-scale data. 48 Keyword Plastome; Phylogenetic discordance; Gentianeae; Coalescence; Gene trees; Nucleotide 49 substitution rates; Gene characteristics; Phylogenetic informativeness 50 2 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.02.021840; this version posted April 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 51 Background 52 Whole plastomes have become more accessible with the explosive development of next-generation 53 sequencing (NGS) technologies [1-3]. Due to the unique mode of inheritance, conservativeness in 54 gene content and order, and high copy number per cell [4, 5], plastomes have been widely used in 55 reconstructing the plant Tree of Life (e.g. [6-10]). Moreover, compared to standard fragment DNA 56 barcodes, plastome-scale data can provide an abundance of informative sites for phylogenetic 57 analyses [5, 11-13]. Nonetheless, the effectiveness of plastome-scale data is ultimately reflected by 58 the extent to which they reveal the “true” phylogenetic relationships of a given lineage [14]. 59 Although plastomes have been canonically regarded as a linked single locus due to its 60 uniparental inheritance and lack of sexual recombination [4, 5, 15], structural and functional 61 characteristics of plastomes suggest that plastid genes may not evolve as a single locus and are 62 experiencing different evolutionary forces [16-18]. Furthermore, despite evolving at a lower rate than 63 genes in the nucleus [19], rate variation within the plastomes has been documented between the 64 single copy and inverted repeat regions, different functional groups of genes, or across angiosperm 65 lineages (e.g. [16, 20-22]). Factors contributing to rate variation and affecting the evolution of 66 different plastid genes include mutation rate variation across families and between 67 coding/non-coding regions, as well as variation in the SSC due to the presence of two configurations 68 of the inversion [14]. However, in many recent phylogenomic studies using the full plastomes, only 69 the results from the full concatenated data set are presented (e.g. [6, 7, 23]). In these cases, gene-tree 70 discordance due to evolutionary rate variation of individual genes remains poorly understood. In 71 addition to concatenated approaches, multispecies coalescent methods (MSC) account for gene tree 72 heterogeneity allowing for the assessment of ancient hybridization, introgression, and incomplete 73 lineage sorting (ILS) by using the summed fits of gene trees to estimate the species tree [24, 25]. 74 Recently, phylogenomic studies suggest that a combination of concatenated and coalescent methods 75 can produce accurate phylogenies and benefit the investigation into the incongruence between gene 76 trees and species trees [18, 26]. 77 Comparative genomic studies based on plastomes have mainly concentrated on structure 78 variations, such as contraction or expansion of inverted repeats (IR) (e.g. [27-29]) and genomic 79 rearrangements (e.g. [30-33]), yet the genetic characteristics of plastid genes within a lineage, such 3 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.02.021840; this version posted April 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 80 as genetic variation and phylogenetic informativeness, remain poorly studied. These characteristics 81 may vary among different genes or functional groups of genes and are of great importance in our 82 understanding of plastome evolution and phylogenetic inference. Additionally, the correlation 83 between evolutionary rate and gene characteristics can be invoked as an explanation of the primary 84 impetus of plastome evolution [16, 20, 34, 35]. 85 The tribe Gentianeae, with its two subtribes Gentianinae and Swertiinae, include ca. 950 species 86 in 21 genera, exhibiting the highest species diversity of the Gentianaceae [36]. Members of 87 Gentianinae are easily distinguishable from Swertiinae by the presence of intracalycine membranes 88 between the corolla lobes and plicae between the corolla lobes, with both traits absent in Swertiinae 89 [36-38]. Although several phylogenetic studies have confirmed the monophyly of both subtribes 90 [36-40], the generic delimitation within Gentianeae remains ambiguous, especially within Swertiinae, 91 with some genera (e.g., Swertia L., Gentianella Moench, Comastoma (Wettst.) Toyok., Lomatogonium 92 A.Braun) being paraphyletic [38]. The current phylogeny of Gentianeae is based upon a few DNA 93 markers, commonly including ITS, atpB, rbcL, matK, and trnL–trnF [36-41], thus a full taxonomic 94 and evolutionary understanding of these groups is hindered by the unsatisfactory phylogenetic 95 resolution. 96 To gain new insights into the evolution of plastomes, and to improve delineation of the 97 phylogenetic affinities among genera in Gentianeae, we constructed a dataset of plastome sequences 98 including 29 Gentianeae species and three outgroups. We generated 76 protein-coding gene (PCG) 99 sequences to infer phylogenies via both