bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
1 Comparative analyses and phylogenetic relationships of 15 Trapa (Trapaceae) species
2 based on Complete Chloroplast Genomes
3 Xiangrong Fan a, b +, Wuchao Wang c, d +, Godfrey K. Wagutu c, d, Wei Li c, e, Xiuling Li f *,
4 Yuanyuan Chen c, e *
5
6 a College of Science, Tibet University, Lhasa 850000, Tibet Autonomous Region, P. R. China
7 b Research Center for Ecology and Environment of Qinghai-Tibetan Plateau, Tibet University, Lhasa
8 850000, Tibet Autonomous Region, P. R. China
9 c Hubei Key Laboratory of Wetland Evolution & Ecological Restoration, Wuhan Botanical Garden,
10 Chinese Academy of Sciences, Wuhan 430074, Hubei, P. R. China
11 d University of Chinese Academy of Sciences, Beijing 100049, P. R. China
12 e Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Center of Plant
13 Ecology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, P. R. China
14 f College of Life Science, Linyi University, Linyi 276000, Shandong, P. R. China
15
16 + The authors are equally contributing to the study
17 * Correspondence:
18 Yuanyuan Chen, Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden,
19 Center of Plant Ecology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, P. R.
20 China. Email: [email protected]
21 Xiuling Li, College of Life Science, Linyi University, Linyi 276000, Shandong, P. R. China. Email:
23
1 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
24 Abstract
25 Trapa L. is floating-leaved aquatic plants with important economic and ecological values. However,
26 species identification and phylogenetic relationship are still unresolved for Trapa. In this study, complete
27 chloroplast genomes of 13 Trapa species/taxa were sequenced and annotated. Combined with released
28 sequences of the other two species, comparative analysis of cp genomes was first performed on the 15
29 Trapa species/taxa. The 15 cp genomes exhibited quadripartite structures with medium size of 155,
30 453-155, 559 bp. IR/SC junctions were conservative with no obvious change found. Long repetitive
31 repeats and SSRs were mostly detected in the intergenic and LSC regions, providing useful plastid markers
32 for species and relationship identification. Three phylogenetic analyses (MP, ML and BI) consistently
33 showed two clusters within Trapa, including large- and small-seed species/taxa, respectively. This study
34 provided the baseline information for phylogeography of Trapa, which would facilitate the management
35 and utilization of genetic resources of the genus.
36
37 Keywords: Trapa; complete chloroplast genomes; comparative analysis; species identification;
38 phylogeography
39
2 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
40 1. Introduction
41 Trapaceae, containing the only genus Trapa, is annual floating-leaved aquatic herbs naturally distributed in
42 tropical, subtropical and temperate regions of Eurasia and Africa, and invading North America and
43 Australia (Chen et al., 2007). APG II (The Angiosperm Phylogeny Group, 2003) equated Trapaceae with
44 Lythraceae. However, a handful of morphological differences exist between the two families. For example,
45 flowers of Trapaceae are solitary, 4-merous and actinomorphic, with half-inferior and slightly perigynous
46 ovaries; Lythraceae has racemes or cymes, and the flowers are usually 4-, 6- or 8-merous, regular or
47 irregular, with obvious perigynous ovaries. Therefore, Trapaceae is still be used today by some researchers
48 (Chen et al., 2007). Trapa has important edible value because high content of starch of seeds, which has
49 been widely cultivated as an important aquatic crop in China and India (Hummel and Kiviat, 2004). Trapa
50 seed pericarps were traditional herb medicines in China, and recent studies found that the extraction of
51 seed pericarps had bioactive components to restrain cancer, atherosclerosis, inflammation and oxidation
52 (Akao et al., 2013; Ciou et al., 2008; Yu and Shen, 2015; Li et al., 2017; Kauser et al., 2018). Additionally,
53 Trapa plants can be used to purify water body due to their excellent performance in absorbing heavy
54 metals and nutrients (Sweta et al., 2015; Xu et al., 2020). Therefore, Trapa has important economic and
55 ecological values. However, many Trapa species are becoming endangered or even locally extinct due to
56 human interferences in Europe and China (Gupta and Beentje, 2017; Chen et al., field observations).
57 Conversely, Trapa plants were notorious intruders in Canada and the northeastern United States.
58 The knowledge of accurate species identification and phylogenetic relationship among species is
59 essential to effectively protect and utilize genetic resources of wild plants and to manage and remove
60 invasive plants (Campbell et al., 2009; Chorak et al., 2019). However, due to the large range of
61 morphological variation and the lack of effective diagnostic criteria, researchers have held sharp different
3 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
62 opinions on taxonomic classification of Trapa, with 1 or 2 polymorphic species or more than 20, 30 or 70
63 species within the genus (Cook, 1996; Vassiljev, 1949, 1965; Tutin, 1968; Yan, 1983; Kak, 1988; Chen et
64 al., 2007). Meanwhile, the phylogenetic relationship for Trapa species was unresolved by the efforts of
65 pollen morphology, cytology, quantitative classification and ontogenesis (Diao et al., 1990; Ding et al.,
66 1991; Huang et al., 1996; Wang and Ding, 1997; Fan et al., 2016). For example, Xiong et al. (1985a, b)
67 proposed that T. acornis Nakano and T. bispinosa Roxb were closely related based on the results of
68 quantitative classification, while Ding et al. (1991) showed that instead of T. bispinosa, T. quadrispinosa
69 Roxb was closed related to T. acornis based on their pollen morphology. And studies of quantitative
70 classification showed that the Trapa species/taxa with seeds of similar size evolved closely (Xiong et al.,
71 1990; Fan et al., 2016), which was proved by the results of allozymes (Takano and Kadono, 2005).
72 Additionally, molecular methods further illustrated that the Trapa species with small seeds, T. incisa and T.
73 maximowiczii, were of basal classification status within the genus (Jiang and Ding, 2004; Fan et al., 2021).
74 For the large-seed Trapa species, a close relationship was found between T. bispinosa and T. quadrispinosa
75 based on RAPDs and AFLPs (Jiang and Ding, 2004; Li et al., 2017; Fan et al., 2021). It is worth noting
76 that there are two diversity centers in Trapa genus, one distributed in the mid-lower reaches of Yangtze
77 River, and the other one located in the Tumen River and Amur River basins from bordering region of
78 Russia and China (Xue et al., 2016, 2017). However, most previous studies involved fewer species/taxa,
79 mostly collected from the mid-lower reaches of Yangtze River, and genotyped by molecular markers or
80 nuclear genome sequencing (Li et al., 2017; Fan et al., 2021). Researches about chloroplast genome were
81 rarely involved (Xue et al., 2017; Sun et al., 2020).
82 Chloroplast (cp) is a core organelle in plants for photosynthesis (Howe et al., 2003). In angiosperms,
83 the cp genome is haploid, maternal-inherited and shaped into a DNA circle with conserved quadripartite
4 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
84 structure. Additionally, the cp genome has slow evolutionary rate, high copy numbers per cell and compact
85 size with 120-170kb in length (Ruhlman and Jansen, 2014). Those characteristics of cp genome, combined
86 with the development of high-throughput technology, make sequencing of the complete cp genome an
87 ideal tool for species identification and plant phylogenic evaluation (Cauz-Santos et al., 2017; Hong et al.,
88 2020; Yang et al., 2018). To date, whole sequences of six Trapa chloroplast genomes (T. natans, T. bicornis,
89 T. quadrispinosa, T. kozhevnikovirum, T. incisa and T. maximowiczii) have been published, but there have
90 been no researches involving comparative analysis of the Trapa cp genomes. For Trapa, more efforts
91 should be made to analyze the interspecific difference and assess the phylogenic relation based on the
92 complete cp sequences.
93 Besides the wild species, Trapa also has a large number of cultivated species or varieties because of
94 its long history of cultivation. The comprehensive analysis of cp genomes focused on wild Trapa, and the
95 cultivated species or varieties would be included in another research. In this study, we sequenced the whole
96 cp genomes of 13 wild Trapa species/taxa. Combined with the two published cp genomes, comprehensive
97 analysis was first carried out within Trapa genus based on the 15 cp genomes data. Our specific goals were
98 as follows: (1) to compare the chloroplast structures within Trapa; (2) to detect the highly variable regions
99 as DNA barcoding markers for Trapa species identification; (3) to infer the phylogenetic relationship
100 among Trapa species. This study will provide the baseline information for phylogeography within Trapa
101 and facilitate the management and utilization of genetic resources of Trapa.
102
5 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
103 2. Materials and methods
104 2.1 Plant Materials and DNA Extraction
105 In the autumns of 2018 and 2019, young and healthy leaves were collected from 13 Trapa species/taxa.
106 Among them, 10 species/taxa were recorded in Chinese Flora Republicae Popularis Sinicae (T. bispinosa,
107 T. quadrispinosa, T. japonica, T. mammillifera, T. macropoda var. bispinosa, T. litwinowii, T. arcuata, T.
108 pseudoincisa, T. manshurica, and T. maximowiczii; Wan, 2000); two species (T. potaninii and T. sibirica)
109 were first recorded in Floral of USSR (Vassiljev, 1949). A new Trapa taxon was collected from Baidang
110 Lake, Anhui province, China. The seeds of the taxon have two horns with the height from 23.4 to 34.8 mm
111 and the width from 13.4 to 18.3 mm. The horns of the new taxon were wide and drooping, looking like a
112 pig's ears. Based on the nut size and the sample site, the taxon was named Trapa natans var. baidangensis.
113 All voucher specimens were deposited in the herbarium of Wuhan Botanical Garden (HIB; Table 1). The
114 fresh leaves were cleaned and dried in silica gel immediately. Genomic DNA was extracted from the dried
115 leaves according to the CTAB protocol described by Doyle (1987). The DNA concentration and quality
116 was quantified by the NanoDrop 2000 micro spectrophotometer (Thermo Fisher Scientific).
117 2.2 Chloroplast Genome Sequencing and Assembling
118 High quality DNA was used to build the genomic libraries. Sequencing was performed using paired end
119 150bp (average short-insert about 350 bp) on Illumina 2500 (Illumina, San Diego, California, USA) at
120 Beijing Novogene bio Mdt InfoTech Ltd (Beijing, China). To get the high quality clean data, Fastp (Chen
121 et al., 2018) was run to cut and filter the raw reads with default settings. For the 13 Trapa species/taxa
122 sequenced, more than 5G clean data for each sample was generated after removing adapters and low
123 quality reads, and the number of clean reads ranged from 5.22 G (T. mammillifera) to 6.06 G (T. bispinosa).
124 De novo assembly was carried out using the assembler GetOrganelle v1.7 (Jin et al., 2020) with the default
6 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
125 setting. The software Geneious primer (Biomatters Ltd., Auckland, New Zealand) was employed to align
126 the contigs and determine the order of the newly assembled plastomes, with T. bicornis genome sequence
127 as a reference. All the annotated cp sequence data reported here were deposited in GenBank with accession
128 numbers shown in Table 1.
129 2.3 Annotation and codon usage
130 We used the genome annotator PGA (Qu et al., 2019) and software GeSeq (Tillich et al., 2017) to annotate
131 protein coding genes, tRNAs and rRNAs, according to the references of T. quadrispinosa (MT_941481)
132 and T. maximowiczii (NC_037023). Manual correction was carried out to locate the start and stop codons
133 and the boundaries between the exons and introns. With tRNAscan-SE v1.21, BLASTN searches were
134 further performed to confirm the tRNA and rRNA genes (Schattner et al., 2005). The physical maps of cp
135 genomes were generated by OGDRAW (Lohse et al., 2007).
136 The relative synonymous codon usage (RSCU) was the ratio of the frequency of a particular codon to
137 the expected frequency of that codon, which was obtained by DAMBE v6.04 (Cauzsantos et al., 2017).
138 When the value of RSCU is larger than 1, the codon is used more often than expected. Otherwise, when the
139 RSCU value < 1, the codon is less used than expected (Sharp et al., 1987).
140 2.4 Comparative genomic analyses
141 Comparative genomic analyses were carried out among the 15 wild Trapa species/taxa, which included the
142 13 species/taxa newly sequenced, and two published ones (T. kozhevnikovirum and T. incisa) (Wang et al.,
143 2021; Wagutu et al., 2021). The published cp genomes were downloaded from the National Center for
144 Biotechnology Information (NCBI) organelle genome database (https://www.ncbi.nlm.nih.gov).
145 The mVISTA program in Shuffle-LAGAN mode was used to compare the complete cp genome within
146 the 15 species/taxa, with the annotation of T. quadrispinos as a reference (MT_941481). After a manual
7 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
147 alignment using MEGA X, all regions, including coding and non-coding regions, were extracted to detect
148 the hyper-variable sites (Kumar et al., 2018). The nucleotide variability (Pi) was computed using DnaSP
149 5.10 (Librado and Rozas, 2009).
150 2.5 Analysis of repeat sequences and SSRs
151 Repeat sequences, including forward, palindromic, reverse and complement repeats, were detected by
152 REPuter (Kurtz and Schleiermacher, 1999). The parameters were set with repeat size of ≥30 bp and 90% or
153 greater sequence identity (hamming distance of 3).
154 Simple sequence repeats (SSRs) were identified using MISA perl script (Thiel et al., 2003), with the
155 threshold number of repeats set as 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and
156 hexa-nucleotide SSRs, respectively.
157 2.6 Phylogenetic analyses
158 Phylogenetic analyses based on the complete chloroplast genomes were carried out for the 18 species/taxa,
159 including the 13 newly sequenced species/taxa and five published ones (T. bicornis, T. natans, T.
160 maximowiczii, T. insica and T. kozhevnikovirum). Considering the close relationship between Trapaceae
161 and Sonneratiaceae/Lythraceae (Sun et al., 2020), Sonneratia alba (Sonneratiaceae) and two Lagerstroemia
162 species (L. calyculata and L. intermedia, Lythraceae) were included as outgroups. The whole cp genome of
163 the five published Trapa species and the three species as outgroups were downloaded from Genbank.
164 The sequences were aligned using program Mafft 7.0 (Katoh and Standley, 2013) with default
165 parameters. Then, the F81+G was selected as the best-fit model of nucleotide substitution by software
166 Jmodeltest 2 (Darriba et al., 2012). The phylogenetic trees were constructed using three methods: (1) A
167 maximum likelihood (ML) tree was performed using PhyML v.3.0 (Guindon et al., 2010) with 5000
168 bootstrap replications. The result was visible by the software Figtree v1.4
8 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
169 (https://github.com/rambaut/figtree/releases); (2) The Maximum Parsimony (MP) tree was run by the mega
170 X (Kumar et al., 2018) with 5000 bootstrap values; (3) Bayesian inference (BI) tree was built by the
171 MrBayes v. 3.2.6 (Ronquist et al., 2012) with 2,000,000 generations and sampling every 5000 generations.
172 The first 25% of all trees were regarded as “burn-in” and discarded, and the Bayesian posterior
173 probabilities (PP) were calculated from the remaining trees.
174
175 3. Results and discussion
176 3.1 Structure and content of Chloroplast Genome
177 Previous studies showed that the length of cp genome ranged from 120-170kb (Ruhlman and Jansen, 2014).
178 In the present study, the cp genomes of the 15 Trapa species/taxa have medium size with 155,453-155,559
179 bp, compared to the recent study from Lythraceae with 152,049-160,769 bp (Xu et al., 2017; Zheng et al.,
180 2020). The two species with small seeds showed the smallest cp genome size with 155, 453 for T.
181 maximowiczii and 155, 477 for T. incisa. For the 13 Trapa species/taxa with large seeds, the cp genomes of
182 the three species/taxa (T. bispinosa, T. quadrispinosa and T. macropoda var. bispinosa) showed the relative
183 small size with 155, 485-155, 495 bp, and the others exhibited similar size with 155, 535 bp (T. litwinowii)
184 - 155, 559 bp (T. mammillifera and T. natans var. baidangensis) (Table 2).
185 Unlike those of the non-photosynthetic (parasitic and mycoheterotrophic) plants, the plastid genomes
186 of green plants are usually highly conserved with the quadripartite structure (Naumann et al., 2016). The
187 typical quadripartite structure was also found in Trapa, which consisted of a pair of inverted repeat regions
188 (IRs) (26, 380 – 26, 388 bp) separated by the small single copy (SSC) (18, 265 – 18, 279 bp) and large
189 single copy (LSC) regions (88, 398 – 88, 512 bp). This suggested the difference in cp genome size of wild
190 Trapa species mainly occurred in LSC region.
9 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
191 The 15 Trapa speices/taxa showed the identical level of GC content with the total content 36.40-
192 36.41%, which was close to that of Lagerstroemia (37.57 - 37.72%, Zheng et al., 2020) and Lythraceae
193 (36.41 - 37.72%, Gu et al., 2019). Previous studies detected that the GC content varied greatly among
194 different regions of cp genome. High level of GC content was generally contained in rRNA genes which
195 were located in IR regions (Zheng et al., 2020). Similarly, for Trapa, the GC content of rRNA genes
196 reached to 55.48%, which resulted in the highest GC content in the IR regions (42.77%) compared with
197 that of the other two regions (LSC, 34.17 - 34.20%; SSC, 30.17 - 30.21%) (Figure 1; Table 4).
198 There were 130 genes annotated in the cp genomes of the 13 Trapa species/taxa with large seeds,
199 compared with the 129 genes in the two small-seed Trapa species (T incisa and T. maximowiczii). Previous
200 studies showed that the number of PCGs and tRNAs varied in different species, but the number of rRNA
201 was stable (Zhang et al., 2020b), which was proved in the present study. The cp genomes of the 13 Trapa
202 species/taxa with large seeds contained 85 protein-coding genes (PCGs), 37 tRNA genes and 8 rRNA
203 genes. In comparison, the two Trapa species with small seeds had 83 PCGs, 38 tRNA genes and 8 rRNA
204 genes. The two small-seed species lost the gene ycf3 related to the conserved open reading frames and one
205 rps12 for self-replication. Among the unique genes, 44 genes were related to photosynthesis and 59 genes
206 were associated with self-replication (Table 3).
207 3.2 Hypervariable Regions and Comparative Genomic Analysis
208 Except the gene space, the remaining regions were non-coding sequences, comprising intergenic spacers
209 (IGS) regions, introns and pseudogenes (Lu et al., 2017). In this study, the intergenic regions and introns
210 among the 15 Trapa species/taxa ranged from 46, 592 to 46, 812 bp and from 15, 957 to 18, 852 bp,
211 respectively. IGS and introns generally have higher nucleotide substitution rates than the protein-coding
212 regions, which yield many phylogenetically informative sites (Shaw et al. 2007; Jansen and Ruhlman,
10 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
213 2012). In this study, high level of the nucleotide diversity (Pi) values was found in the non-coding regions
214 (0 - 0.0123, with average 0.000857) compared with those of the coding regions (0 - 0.00282, with average
215 0.000217) (Figure 7).
216 Previous studies exhibited a negative correlation between the GC content and the variability of cp
217 genome sequences (Zhang et al., 2020a; Zheng et al., 2020). This was proved in the present study. Most
218 variable regions for Trapa were located in IGS regions with the lowest GC content (30.28-30.42%) (Table
219 4). And the highest Pi values (0.00416-0.0123) were found in the IGS regions, including ndhE—ndhG,
220 psbA—trnK-UUU, psbK—psbI, rps2—rpoC2 and atpA—atpF (Figure 7). Therefore, IGS is often used as
221 DNA barcoding markers in many species (Nguyen et al., 2015). It is worth noting that IGS regions were
222 mostly distributed in the LSC region. Among the 20 highest Pi values, 16 divergence hotspots with low
223 sequence identity were found, which included 11 regions (psaC, clpP, psbE, atpF, rps2—rpoC2,
224 psbK—psbI, psbA—trnK-UUU, trnS-GCU—trnG-UCC, trnH-GUG—psbA, trnT-GGU—psbD and
225 accD-psaI) in the LSC and 5 regions (ndhD, ndhI, ycf1, ndhF and ndhE—ndhG) in the SSC (Figure 3).
226 We found 15/16 genes containing one or two introns for the small-seed/large-seed species, consisting
227 of 6 tRNA genes (trnK-UUU, trnA-UGC, trnI-GAU, trnG-UCC, trnV-UAC and trnL-UAA) and 8
228 protein-coding genes (PCGs) (rps16, rpoC1, atpF, ndhB, ndhA, petD, petB and rpl16) with one intron, and
229 one PCG clpP with two introns for the small-seed Trapa species, but two PCGs (ycf3 and clpP) for the
230 large-seed ones. The introns of the PCGs shared the similar position and length in each of the 15 Trapa
231 taxa (Table 5), suggesting the high conservation of cp genome within Trapa. The rpl2 intron loss was
232 considered as one of the important evolutionary event in Lythraceae, which was presumed to occur after
233 the divergence of Lythraceae from Onagraceae (Chao et al., 2017; Gu et al., 2016, 2019; Zheng et al.,
234 2020). Like Lythraceae, the 15 Trapa species/taxa exhibited the loss of rpl2 intron, which might suggest a
11 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
235 close genetic relationship between Lythraceae and Trapaceae.
236 The gene contains several internal stop codons, which tends to be a pseudogene (Lu et al., 2017;
237 Sloan et al., 2014). Alternatively, if the sequence was conserved over broad evolutionary distances and lack
238 of internal stop, it tends to be assumed as functional protein-coding genes (Raubeson et al., 2007). In the
239 present study, though the ycf3 gene was only detected in the large-seed Trapa taxa, none internal stop
240 codon was found within the gene. Therefore, instead of a pseudogene, the ycf3 appears as a protein-coding
241 gene.
242 3.3 Repeat Structure and cpSSR
243 For cp genomes, long repeats play an important role in the sequence rearrangement and recombination,
244 which could be as modules of large assemblies and participate in catalytic function or protein binding
245 (Nazareno et al. 2015; Zheng et al., 2020). A total of 595 long repetitive repeats (30 - 65bp) were identified
246 from the 15 Trapa taxa, consisting of 324 forward, 230 palindromic, 25 reverse and 16 complementary
247 repeats. For the genus Trapa, the size of top three most frequently shown long repeats was 30bp, 31bp and
248 65bp, which was found 227, 77 and 60 times, respectively. For each species/taxon, the number of long
249 repeats varied from 37 (T. incisa and T. maximowiczii) to 41 (T. sibirica); and the number of forward,
250 palindromic, reversed and complementary repeats were 19-24, 15-16, 0-3 and1-2, respectively (Figure 5).
251 Additionally, most long repetitive repeats were distributed in intergenic areas, and a few existed in shared
252 genes, such as ycf2, which was analogous to the results of other angiosperm lineages (Yang et al., 2017;
253 Zheng et al., 2020).
254 Due to a high polymorphism rate at the intra- and inter-species level, cp genome SSRs have been
255 viewed as excellent molecular markers in population genetics and phylogenetic research (Xue et al., 2017;
256 Zheng et al., 2020). In this study, for each species/taxon, the number of total SSRs was from 110 (T.
12 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
257 quadrispinosa) to 123 (T. incisa and T. maximowiczii). Most cp SSRs, from 83.48% (T. japonica, T.
258 manshurica and T. litwinowii) to 86.18% (T. maximowiczii), were distributed in the LSC regions, which
259 was higher than that in the genus Lagerstroemia (66.55%; Xu et al., 2017) and Myrsinaceae (74.37%; Yan
260 et al., 2019). Among these SSRs, the mononucleotide A/T repeat units occupied the highest proportion
261 with 78.18 - 80.49%, and the second and third highest proportions were dinucleotide repeats (AC/GT and
262 AT/AT) and tetranucleotide repeats (AAAG/CTTT, AAAT/ATTT, AACC/GGTT, AAGT/ACTT,
263 AATT/TTAA, ACAT/ATGT and AGAT/ATCT) accounting for 9.40 - 10.00% and 8.13 - 9.09%,
264 respectively (Figure 6). The observed high AT content in cp SSRs was also found in the other genera (Asaf
265 et al., 2016; Chao et al., 2017; Kuang et al., 2011)
266 3.4 Boundaries of IR regions and codon usage
267 For angiosperms, the boundaries between IR regions and single-copy (SC) regions result in the difference
268 of genome size by expansion or shrinkage (Kim and Lee, 2004; Lin et al., 2012). In this study, the
269 junctions of IR/SC regions were compared among the 15 Trapa species\taxa. The cp genomes of the 15
270 Trapa plants showed many minor differences in the boundary regions although the number and order of
271 genes were highly conserved. The rps19 gene spanned the LSC/IRb border and extended into the IRb
272 region with the length 75-83 bp (except T. manshurica with 75 bp, the others were 83 bp). The ycf1 gene
273 covered the junction of SSC/IRa, and the part of ycf1 into IRa region showed a stable size (1095 bp) for all
274 the Trapa plants. On the contrary, the remaining part of ycf1, located in SSC, showed a variable size, with
275 4539 bp for the two small-seed species and 4533 bp for the remaining 13 taxa. Accordingly, a pseudogene
276 fragment ψycf1 was detected in the border of IRa/LSC with 1098 bp for all Trapa taxa, and the parts
277 located in IRb and SSC were 1095 bp and 3 bp, respectively. The gene trnH was distributed on the right
278 side of the border of IRb/LSC, with an interval of 32 - 47 bp from the border to the gene (Figure 4).
13 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
279 For all the Trapa species/taxa, a total of 64 types of codons encoding 20 amino acids were detected.
280 In all, the 83/85 protein coding genes within Trapa encoded 25, 832 to 26, 590 codons. The results were
281 comparable to that of the genus Lagerstroemia, 79 genes encoding 25, 068 - 27, 111 codons in their cp
282 genomes (Zheng et al., 2020). Three termination codons, UGA, UAG and UAA, were found for all the
283 Trapa species. The highest number of codon occurrences (> 1000) was found in GAA, UUU, AUU and
284 AAA, which encoded Glutamicacid, Phenylalanine, Isoleucine and Lysine, respectively. Leucine,
285 Isoleucine and Serine were the top three amino acids with high frequency, , with the frequency 2771-2826,
286 2229-2298 and 1992-2067, respectively. Conversely, Cysteine (288-298) and Tryptophan (459-468) were
287 the least frequent amino acids.
288 Like that of Lagerstroemia (Zheng et al., 2020), the RSCU value of a single amino acid showed a
289 positive correlation with the number of codons encoding it. The highest and lowest RSCU values were
290 exhibited in the UUA encoding Leucine (1.96) and the AGC encoding Serine (approximately 0.34),
291 respectively (Figure 2). The results also showed that 30 codons were used frequently with RSCU values >
292 1, and all of them ended with A/U, which might be associated with the high proportion of A/T in cp
293 genomes (Eguiluz et al., 2017).
294 3.5 Phylogenetic Analysis
295 Based on the whole cp genomes, the three phylogenetic trees (MP, ML and BI) showed the similar
296 clustering patterns. The species from the same family clustered together. At first, the branch with the two
297 Lagerstroemia species (Lythraceae) separated from the other species. And S. alba (Sonneratiaceae) showed
298 to be a sister of Trapa species, which was also supported by recent studies (Sun et al., 2020; Wang et al.,
299 2021; Wagutu et al., 2021).
300 All the Trapa species/taxa were divided into two clusters, the clusters of small-seed species and
14 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
301 large-seed species, which were consistent with the results of morphological studies (Fan et al., 2016). The
302 two species with small seeds initially separated from the other Trapa taxa, which agreed to the results of
303 AFLPs study recently published (Fan et al., 2021). Additionally, the present study further detected a close
304 genetic relationship between the two species with small seeds. On the other hand, in the cluster with large
305 seeds, the cultivated species T. bicornis was the outermost part of the group, which might suggest a
306 complex origin of cultivated Trapa species. However, it was reported that another common cultivated
307 species T. bicornis var. cochinchinensis demonstrated a hybrid origin between T. quadrispinosa and T.
308 bispinosa (Fan et al., 2021). The remaining large-seed Trapa taxa were divided into three clusters: (1) the
309 first cluster included three Trapa species (T. quadrispinosa, T. bispinosa and T. macropoda var. bispinosa).
310 The intimate relationship between the former two has been proved by many studies (Li et al., 2017; Fan et
311 al., 2021). It was the first time for T. macropoda var. bispinosa being reported, and the only difference
312 between this species and T. bispinosa was the larger seed bottom of this species; (2) the second cluster
313 contained six species/taxa (T. japonica, T. mammillifera, T. potaninii, T. pseudoincisa, T. arcuata and T.
314 natans var. baidangensis), which were of the deformed and obvious tubercles and thick husks; (3) the third
315 cluster covered four species (T. litwinowii, T. manshurica, T. kozhevnikovirum and T. sibirica), which
316 possess smooth and tight seed coats and were collected from the Amur River (Figure 8). Among them, T.
317 litwinowii is the only one with two horns. Compared with T. litwinowii, T. manshurica shares the same
318 seed size and seed ornamentations, but T. manshurica has four horns. T. manshurica and T. sibirica share
319 similar morphologic characteristics, but T. sibirica has larger nuts, small crown and longer fruit neck. The
320 seed crown of the three Trapa species (T. litwinowii, T. manshurica and T. sibirica) was large and curled
321 outwardly. T. kozhevnikovirum and T. sibirica have the similar seed size, but T. kozhevnikovirum has a
322 small seed crown and unobvious seed neck.
15 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
323 4. Conclusions
324 This study reported the comparative analysis results of 15 Trapa cp genome sequences with detailed gene
325 annotation. The 15 cp genomes are of the similar quadripartite structure with a high degree of the synteny
326 in gene order, suggesting the high conservation of the cp genome in Trapa. There was no obvious change
327 found in the IR/SC junctions, indicating the junctions were relatively conservative in Trapa. Most of the
328 long repetitive repeats and SSRs were detected in the intergenic and LSC regions, which provided effective
329 cp markers for species identification, genetic relationship analysis and the population genetics. Three
330 phylogenetic analysis approaches (MP, ML and BI methods) consistently supported the 15 Trapa taxa
331 separated into two major evolutionary branches, large- and small-seed branches, which was agreeable to
332 the results of morphological studies. And a close genetic relationship was found between the two
333 small-seed species. The baseline cp genome information of Trapa should be useful for species
334 identification and phylogeographic analysis of Trapa species/taxa, which will be beneficial to the
335 management and utilization of genetic resources of the genus.
336
337 Author contributions
338 FX collected and identified the species of sample, designed the experiments, analyzed the data and wrote
339 the paper. WW performed the experiments, contributed reagents/materials/analysis tools, prepared figures
340 and/or tables and wrote the paper. CY conceived and designed the experiments, reviewed drafts of the
341 paper. GKW contributed reagents/materials/analysis tools. LW and LX help conceived and designed the
342 experiments and reviewed the paper.
343
344 Funding
16 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
345 This work was supported by the National Scientific Foundation of China (31100247), the Talent Program
346 of Wuhan Botanical Garden of the Chinese Academy of Sciences (Y855291B01) and the High-level Talent
347 Training Program of Tibet University (2018-GSP-018).
348
17 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
349 Reference
350 Akao, S., Maeda, K., Hosoi, Y., Nagare, H., Maeda, M., Fujiwara, T. (2013). Cascade utilization of water
351 chestnut: recovery of phenolics, phosphorus, and sugars. Environ. Sci. Pollut. Res. 20, 5373-5378. doi:
352 10.1007/s11356-013-1547-7
353 Angiosperm Phylogeny Group (2003). An update of the Angiosperm Phylogeny Group classification for
354 the orders and families of flowering plants: APG II. Bot. J. Linnean Soc. 141(4), 399-436. doi:
355 10.1046/j.1095-8339.2003.t01-1-00158.x
356 Asaf, S., Khan, A. L., Khan, A. R., et al. (2016). Complete chloroplast genome of Nicotiana otophora and
357 its comparison with related species. Front. Plant Sci., 7, 843. doi: org/10.3389/fpls.2016.00843
358 Campbell, B.T., Williams, V.E., Park, W. (2009). Using molecular markers and field performance data to
359 characterize the Pee Dee cotton germplasm resources. Euphytica 169, 285-301. doi:
360 10.1007/s10681-009-9917-4
361 Cauzsantos, L.A., Munhoz, C.F., et al. (2017). The chloroplast genome of passifora edulis (passiforaceae)
362 assembled from long sequence reads: structural organization and phylogenomic studies in
363 malpighiales. Front. Plant Sci. 8,334. doi: 10.3389/fpls.2017.00334
364 Cauz-Santos, L.A., Munhoz, C.F., Rodde, N., et al. (2017). The chloroplast genome of Passifora edulis
365 (Passiforaceae) assembled from long sequence reads: structural organization and phylogenomic
366 studies in malpighiales. Front. Plant Sci. 8, 334. doi: 10.3389/fpls.2017.00334
367 Chao, X., Dong, W., Li, W., Lu, Y., Xie, X., Jin, X., Shi, J., He, K., Suo, Z. (2017). Comparative analysis
368 of six Lagerstroemia complete chloroplast genomes. Front. Plant Sci. 8, 15. doi:
369 10.3389/fpls.2017.00015
370 Chen, J.R., Ding, B.Y., Funston, M. (2007). Trapaceae. In Flora of China, vol. 13. Science Press Beijing &
18 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
371 Missouri Botanical Garden Press, St. Louis, USA, pp. 290-291.
372 Chen, S., Zhou, Y., Chen, Y., Gu, J. (2018). Fastp: an ultra-fast all-in-one FASTQ preprocessor.
373 Bioinformatics 34, i884–i890. doi: 10.1093/bioinformatics/bty560
374 Chorak, G.M., Dodd L.L., Rybicki N., Ingram K., Buyukyoruk M., Kadono Y., Chen Y.Y., Thum R.A.,
375 (2019). Cryptic introduction of water chestnut (Trapa) in the northeastern United States. Aquat. Bot.
376 155, 32-37. doi: 10.1016/j.aquabot.2019.02.006
377 Ciou, J. Y., Wang, C. C. R., Chen, J., Chiang, P. Y. (2008). Total phenolics content and antioxidant activity
378 of extracts from dried water caltrop (Trapa taiwanensis Nakai) hulls. J. Food Drug Analysis, 16,
379 41e47. doi: 10.38212/2224-6614.2362
380 Cook, C.D.K. (1996). Aquatic plant book. SPB Academic Pub.
381 Darriba, D., Taboada, G., Doallo, R. et al. (2012). jModelTest 2: more models, new heuristics and parallel
382 computing. Nat. Methods 9, 772. doi: 10.1038/nmeth.2109
383 Diao, Z.S. (1990). The morphogenesis in the ontogeny of the family Trapaceae. J.Yuzhou Univ. 3, 1-11. (In
384 Chinese with English abstract)
385 Ding, B.Y., Fang Y.Y. (1991). Study on the pollen morphology of Trapa from Zhejiang. Acta
386 phytotaxonomica sinica. 29(3), 172-177. (In Chinese with English abstract)
387 Doyle, J.J., Doyle, J.L. (1987). A rapid DNA isolation procedure for small quantities of fresh leaf tissue.
388 Phytochem. Bull. 19, 11-15.
389 Eguiluz, M., Rodrigues, N.F., Guzman, F., Yuyama, P., Margis, R. (2017). The chloroplast genome
390 sequence from Eugenia unifora, a Myrtaceae from Neotropics. Plant Syst. Evol. 303, 1199–1212. doi:
391 10.1007/s00606-017-1431-x
392 Fan, X.R., Li, Z., Chu. H.J., Li, W., Liu, Y.L., Chen, Y.Y. (2016). Analysis of morphological plasticity of
19 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
393 Trapa L. from China and their taxonomic significance. Plant Sci. J. 34(3), 340-351. (In Chinese with
394 English Abstract)
395 Gu, C., Ma, L., Wu, Z., Chen, K., Wang, Y. (2019). Comparative analyses of chloroplast genomes from 22
396 Lythraceae species: inferences for phylogenetic relationships and genome evolution within Myrtales.
397 BMC Plant Biol. 19, 281. doi: 10.1186/s12870-019-1870-3
398 Gu, C., Tembrock, L.R, Johnson, N.G., Simmons, M.P., Wu, Z. (2016). The Complete Plastid Genome of
399 Lagerstroemia fauriei and Loss of rpl2 Intron from Lagerstroemia (Lythraceae). PLoS ONE 11(3),
400 e0150752. doi: 10.1371/journal.pone.0150752
401 Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. (2010). New algorithms and
402 methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst
403 Biol. 59(3), 307-321. doi: 10.1093/sysbio/syq010
404 Gupta, A.K., Beentje, H.J. (2017). Trapa natans, The IUCN Red List of Threatened Species 2017,
405 e.T164153A84299204. doi:10.2305/IUCN.UK, 2017-1, RLTS.T164153A84299204.
406 Hong, Z., Wu, Z., Zhao, K., Yang, Z., Zhang, N., Guo, J., Tembrock, L. R., Xu, D. (2020). Comparative
407 analyses of five complete chloroplast genomes from the genus Pterocarpus (Fabacaeae). Int. J. Mol.
408 Sci. 21(11), 3758. doi: 10.3390/ijms21113758.
409 Howe, C. J., Barbrook, A. C., Koumandou, V. L., Nisbet, R. E. R., Symington, H. A., Wightman, T. F.
410 (2003). Evolution of the chloroplast genome. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 358,
411 99-107. doi: org/10.1098/rstb.2002.1176
412 Huang, T., Ding, B.Y., Hu, R.Y., et al. (1996). Cytotaxonomic studies on the genus Trapa in China. In
413 Research and Application of Life Sciences. Hangzhou, Zhejiang University Press, 235-239. (In
414 Chinese)
20 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
415 Hummel, M., Kiviat, E. (2004). Review of World literature on Water Chestnut with implications for
416 management in North America. J. Aquat. Plant Manage. 42, 17-27. doi:
417 10.1560/N0YU-G77C-TLKX-HW5K
418 Jansen, R. K.; Ruhlman, T. A. (2012). Plastid genomes of seed plants. In Genomics of Chloroplasts and
419 Mitochondria; Bock, R., Knoop, V., Eds.; Springer: Dordrecht, The Netherlands.
420 Jiang, W.M., Ding, B.Y. (2004). Genetic relationship among Trapa species assessed by RAPD markers. J.
421 Zhejiang Univ. Agr. Life Sci. 30, 191-196. (In Chinese with English Abstract)
422 Jin, J., Yu, W., Yang, J. et al. (2020). GetOrganelle: a fast and versatile toolkit for accurate de novo
423 assembly of organelle genomes. Genome Biol. 21, 241. doi: 10.1186/s13059-020-02154-5
424 Kak, A.M. (1988). Aquatic and wetland vegetation of western Himalayas. J. Econ. Taxon. Bot. 12,
425 447-451.
426 Katoh, K., and Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7:
427 improvements in performance and usability. Mol. Biol. Evol. 30, 772-780. doi:
428 10.1093/molbev/mst010
429 Kauser, A., Shah, S. M. A., Iqbal, N., et al. (2018). In vitro antioxidant and cytotoxic potential of
430 methanolic extracts of selected indigenous medicinal plants. Progr. Nutr. 20(4), 706-12. doi:
431 10.23751/pn.v20i4.7523
432 Kim, K. J., Lee, H. L. (2004). Complete chloroplast genome sequences from Korean ginseng (Panax
433 schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants. DNA Res.
434 11(4), 247-261. doi: 10.1093/dnares/11.4.247
435 Kuang, D. Y., Wu, H., Wang, Y. L., Gao, L. M., Lu, L. (2011). Complete chloroplast genome sequence of
436 Magnolia kwangsiensis (Magnoliaceae): implication for DNA barcoding and population genetics.
21 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
437 Genome 54(8), 663-673. doi: 10.1139/g11-026
438 Kumar, S., Stecher, G., Li M., Knyaz, C., Tamura, K. (2018). MEGA X: Molecular Evolutionary Genetics
439 Analysis across computing platforms. Mol. Biol. Evol. 35, 1547-1549. doi: 10.1093/molbev/msy096
440 Kurtz, S., Schleiermacher, C. (1999). REPuter: fast computation of maximal repeats in complete genomes.
441 Bioinformatics 15, 426-427. doi: 10.1093/bioinformatics/15.5.426
442 Librado, P., and Rozas, J. (2009). DnaSP v5: a software for comprehensive analysis of DNA polymorphism
443 data. Bioinformatics 25, 1451-1452. doi: 10.1093/bioinformatics/btp187
444 Lin, C.P., Wu, C.S., Huang, Y.Y., Chaw, S.M. (2012). The complete chloroplast genome of ginkgo biloba
445 reveals the mechanism of inverted repeat contraction. Genome Biol. Evol. 4, 374-381. doi:
446 10.1093/gbe/evs021
447 Lohse, M., Drechsel, O., Bock, R.M. (2007). Organellar genome DRAW (OGDRAW): a tool for the easy
448 generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr. Genet.
449 52, 267-274. doi: 10.1007/s00294-007-0161-y
450 Lu, R. S., Pan, L., Qiu, Y. X. (2017). The complete chloroplast genomes of three Cardiocrinum (Liliaceae)
451 species: comparative genomic and phylogenetic analyses. Front. Plant Sci. 7: 2054. doi:
452 10.3389/fpls.2016.02054
453 Naumann, J., Der, J. P., Wafula, E. K., Jones, S. S., Wagner, S. T., Honaas, L. A., Ralph, P. E., Bolin, J. F.,
454 Maass, E., Neinhuis, C. (2016). Detecting and characterizing the highly divergent plastid genome of
455 the nonphotosynthetic parasitic plant Hydnora visseri (Hydnoraceae). Genome Biol. Evol. 8, 345-363.
456 doi: 10.1093/gbe/evv256
457 Nazareno, A. G., Carlsen, M., Lohmann, L. G. (2015). Complete chloroplast genome of Tanaecium
458 tetragonolobum: the first Bignoniaceae plastome. PLoS ONE 10(6), e0129930. doi:
22 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
459 10.1371/journal.pone.0129930
460 Nguyen, P. A. T., Kim, J. S., Kim, J. H. (2015). The complete chloroplast genome of colchicine plants
461 (Colchicum autumnale L. and Gloriosa superba L.) and its application for identifying the genus.
462 Planta 242, 223-237. doi: 10.1007/s00425-015-2303-7
463 Qu, X., Moore, M.J., Li, D. et al. (2019). PGA: a software package for rapid, accurate, and flexible batch
464 annotation of plastomes. Plant Methods 15, 50. doi: 10.1186/s13007-019-0435-7
465 Raubeson, L.A., Peery, R., Chumley, T.W., Dziubek, C., Fourcade, H.M., Boore, J.L., Jansen, R.K. (2007).
466 Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar
467 advena and Ranunculus macranthus. BMC Genomics 8, 174. doi: org/10.1186/1471-2164-8-174
468 Ronquist, F., Teslenko, M., van der Mark, P. et al. (2012). MrBayes 3.2: efficient Bayesian phylogenetic
469 inference and model choice across a large model space. Syst. Biol. 61, 539-542. doi:
470 10.1093/sysbio/sys029
471 Ruhlman, T.A., Jansen, R.K. (2014). The plastid genomes of flowering plants. In: Maliga P. (eds)
472 Chloroplast Biotechnology. Methods in Molecular Biology (Methods and Protocols) 1132. Humana
473 Press, Totowa, NJ. doi: org/10.1007/978-1-62703-995-6_1
474 Schattner, P., Brooks, A.N., Lowe, T.M. (2005). The tRNAscan-SE, snoscan and snoGPS web servers for
475 the detection of tRNAs and snoRNAs. Nucleic Acids Res. 33, 686-689. doi: 10.1093/nar/gki366
476 Sharp, P. M., Li, W. H. (1987). The codon Adaptation Index--a measure of directional synonymous codon
477 usage bias, and its potential applications. Nucleic Acids Res. 15, 1281-1295. doi:
478 10.1093/nar/15.3.1281
479 Shaw, J., Lickey, E. B., Schilling, E. E., Small, R. L. (2007). Comparison of whole chloroplast genome
480 sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the
23 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
481 hare III. Am J Bot. 94(3), 275-88. doi: 10.3732/ajb.94.3.275
482 Sloan, D.B., Triant, D.A., Forrester, N.J., Bergner, L.M., Wu, M., Taylor, D.R. (2014). A recurring
483 syndrome of accelerated plastid genome evolution in the angiosperm tribe Sileneae (Caryophyllaceae).
484 Mol. Phylogenet Evol. 2014 Mar;72:82-9. doi: 10.1016/j.ympev.2013.12.004
485 Sun, F.F., Yin, Y., Xue, B., Zhou, R., Xu, J. (2020). The complete chloroplast genome sequence of Trapa
486 bicornis Osbeck (Lythraceae), Mitochondrial DNA Part B, 5, 2746-2747. doi:
487 10.1080/23802359.2020.1788432
488 Sweta, Bauddh, K., Singh, R., Singh, R.P. (2015). The suitability of Trapa natans for phytoremediation of
489 inorganic contaminants from the aquatic ecosystems. Ecol. Eng. 83, 39-42. doi:
490 10.1016/j.ecoleng.2015.06.003
491 Takano, A., Kadono, Y., 2005. Allozyme variations and classifcation of Trapa (Trapaceae) in Japan. Aquat.
492 Bot. 83, 108–118. doi: 10.1016/j.aquabot.2005.05.008
493 Thiel, T., Michalek, W., Varshney, R., Graner, A. (2003). Exploiting EST databases for the development
494 and characterization of gene-derived SSR markers in barley (Hordeum vulgare L.). Theor. Appl.
495 Genet. 106, 411-422. doi: 10.1007/s00122-002-1031-0
496 Tillich, M., Lehwark, P., Pellizzer, T., Ulbricht-Jones, E.S., Fischer, A., Bock, R., Greiner, S. (2017).
497 GeSeq-versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 45(W1), W6-W11.
498 doi: 10.1093/nar/gkx391
499 Tutin, T.G. (1968). Flora Europaea. Cambridage University Press, Cambridage, 303-452.
500 Vassiljev, V. (1965). Species novae Africanicae generis Trapa L.. Nov. Sist. Vyss. Rast. 32, 175-194.
501 Vassiljev, V.N. (1949). Water Caltrops-Hydrocaryaceae Raimann. Flora of the USSR: Vol. 15 Moscow:
502 Publishing House of As of USSR, 637-662.
24 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
503 Wagutu, G.K., Fan, X., Wang, W., Li, W., Chen Y. (2021). The complete chloroplast genome sequence of
504 Trapa kozhevnikovirum Pshenn. (Lythraceae). Mitochondrial DNA Part B: Resources. (Under review)
505 Wan, W.H., 2000. Trapaceae. Flora Republicae Popularis Sinicae, Vol. 53. Science Press, Beijing, China,
506 pp. 1–26 (In Chinese)
507 Wang, L.J., Ding, B.Y., (1997). Study on the chromosomes of three Chinese Trapa species. Ningbo
508 Agricultural Science and Technology 1, 7-9. (In Chinese)
509 Wang, W., Fan, X., Li, X., Chen Y. (2021). The complete chloroplast genome sequence of Trapa incisa
510 Sieb. & Zucc (Lythraceae). Mitochondrial DNA Part B: Resources. (Under review)
511 Xiong, Z.T., Huang, D.S., Wang, H.Q., Sun X.Z. (1990). Numrical taxonomic studies in Trapa in Hubei .
512 Numerical evaluations of taxonomic characters. Plant Sci. J. 8(1), 47-52. (In Chinese with English
513 abstract)
514 Xiong, Z.T., Wang, H.Q., Sun X.Z. (1985a). Numrical taxonomic studies in Trapa in Hubei I. Plant Sci. J.
515 3, 45-53. (In Chinese with English abstract)
516 Xiong, Z.T., Wang, H.Q., Sun X.Z. (1985b). Numrical taxonomic studies in Trapa in Hubei II. Plant Sci. J.
517 3, 157-164. (In Chinese with English abstract)
518 Xu, C., Dong, W., Li, W. et al. (2017). Comparative analysis of six Lagerstroemia complete chloroplast
519 genomes. Front. Plant Sci. 8, 15. doi: 10.3389/fpls.2017.00015
520 Xu, L., Cheng, S., Zhuang, P. et al. (2020). Assessment of the nutrient removal potential of floating native
521 and exotic aquatic macrophytes cultured in Swine Manure Wastewater. Int. J. Environ. Res. Public
522 Health 17, 1103. doi: 10.3390/ijerph17031103
523 Xue, J.H., Xue, Z.Q., Wang, R.X., Rubtsova, T.A., Pshennikova, L.M., Guo, Y.M. (2016). Didtribution
524 pattren and morphological diversity of Trapa L. in the Heilong and Tumen River Basin. J. Plant Sci.
25 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
525 34, 506-520. (In Chinese with English abstract)
526 Xue, Z.Q., Xue, J.H., Victorovna, K.M., Ma, K.P. (2017). The complete chloroplast DNA sequence of
527 Trapa maximowiczii Korsh. (Trapaceae), and comparative analysis with other Myrtales species. Aquat.
528 Bot. 143, 54-62. doi: 10.1016/j.aquabot.2017.09.003
529 Yan S. Z. (1983). Aquatic macrophytes of China. Science press, Beijing.
530 Yan, X., Liu, T., Yuan, X., Xu, Y., Yan, H., Hao, G. (2019). Chloroplast genomes and comparative analyses
531 among thirteen taxa within Myrsinaceae s.str. Clade (Myrsinoideae, Primulaceae). Int. J. Mol. Sci.
532 20(18), 4534. doi: 10.3390/ijms20184534
533 Yang, J., Vázquez, L., Chen, X., Li, H., Zhang, H., Liu, Z., et al. (2017). Development of chloroplast and
534 nuclear DNA markers for Chinese oaks (Quercus subgenus Quercus) and assessment of their utility as
535 DNA barcodes. Front. Plant Sci. 8, 816. doi: 10.3389/fpls.2017.0081
536 Yang, Z., Zhao, T., Ma, Q., Liang, L., Wang, G. (2018). Comparative genomics and phylogenetic analysis
537 revealed the chloroplast genome variation and interspecific relationships of Corylus (Betulaceae)
538 Species. Front. Plant Sci. 9. doi: org/10.3389/fpls.2018.00927
539 Yu, H., Shen, S. (2015). Phenolic composition, antioxidant, antimicrobial and antiproliferative activities of
540 water caltrop pericarps extract. LWT-Food Sci. Technol. 61, 238-243. doi: 10.1016/j.lwt.2014.11.003
541 Zhang, C. Y., Liu, T. J., Mo, X. L., et al. (2020a). Comparative analyses of the chloroplast genomes of
542 patchouli plants and their relatives in Pogostemon (Lamiaceae). Plants, 2020, 9(11), 1497. doi:
543 10.3390/plants9111497
544 Zhang, F., Wang, T., Shu, X., Wang, N., Zhuang, W., Wang, Z. (2020b). Complete chloroplast genomes and
545 comparative analyses of L. chinensis, L. anhuiensis, and L. aurea (Amaryllidaceae). Int. J. Mol. Sci.
546 21(16), 5729. doi: 10.3390/ijms21165729
26 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
547 Zheng, G., Wei, L., Ma, L., Wu, Z., Chen, K. (2020). Comparative analyses of chloroplast genomes from
548 13 Lagerstroemia (Lythraceae) species: identification of highly divergent regions and inference of
549 phylogenetic relationships. Plant Mol. Biol. 102, 659-676. doi: 10.1007/s11103-020-00972-6
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
27 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
569 Figure legends
570 Figure 1 Structural map of the Trapa chloroplast genome. Genes drawn inside the circle are transcribed
571 clockwise, and those outside are counterclockwise. Small single copy (SSC), large single copy (LSC), and
572 inverted repeats (IRa, IRb) are indicated. Genes belonging to different functional groups are color-coded.
573
574 Figure 2 Codon contents of 20 amino acids and stop codon of coding genes of Trapa chloroplast genome.
575 Color of the histogram corresponds to the color of codons.
576
577 Figure 3 Sequence alignment of whole chloroplast genomes using the Shuffle LAGAN alignment
578 algorithm in mVISTA. Trapa bicornis was chosen to be the reference genome. The vertical scale indicates
579 the percentage of identity, ranging from 50 to 100%.
580
581 Figure 4 Comparison of junctions between the LSC, SSC, and IR regions among 15 species. Distance in
582 the figure is not to scale. LSC Large single-copy, SSC Small single-copy, IR inverted repeat.
583
584 Figure 5 Number of long repetitive repeats on the complete chloroplast genome sequence of 15 Trapa
585 species. (a) frequency of the repeats more than 30 bp, (b) frequency of repeat types.
586
587 Figure 6 The comparison of simple sequence repeats (SSRs) distribution in 15 chloroplast genomes. (a)
588 frequency of common motifs; (b) number of different SSR types.
589
590 Figure 7 The nucleotide variability (Pi) value in the 15 Trapa chloroplast genomes. (a) Intergenic regions.
28 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
591 b Protein-coding genes. These regions are arranged according to their location in the chloroplast genome.
592
593 Figure 8 The phylogenetic tree is based on 21 complete chloroplast genome sequences using Maximum
594 likelihood (ML), Maximum parsimony (MP) and Bayesian inference (BI). The number above the lines
595 indicates bootstrap values for ML and MP and posterior probabilities for BI of the phylogenetic analysis
596 for each clade.
597
29 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.