bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Comparative analyses and phylogenetic relationships of 15 Trapa (Trapaceae) species

2 based on Complete Chloroplast Genomes

3 Xiangrong Fan a, b +, Wuchao Wang c, d +, Godfrey K. Wagutu c, d, Wei Li c, e, Xiuling Li f *,

4 Yuanyuan Chen c, e *

5

6 a College of Science, Tibet University, Lhasa 850000, Tibet Autonomous Region, P. R.

7 b Research Center for Ecology and Environment of Qinghai-Tibetan Plateau, Tibet University, Lhasa

8 850000, Tibet Autonomous Region, P. R. China

9 c Hubei Key Laboratory of Wetland Evolution & Ecological Restoration, Wuhan Botanical Garden,

10 Chinese Academy of Sciences, Wuhan 430074, Hubei, P. R. China

11 d University of Chinese Academy of Sciences, Beijing 100049, P. R. China

12 e Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Center of

13 Ecology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, P. R. China

14 f College of Life Science, Linyi University, Linyi 276000, Shandong, P. R. China

15

16 + The authors are equally contributing to the study

17 * Correspondence:

18 Yuanyuan Chen, Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden,

19 Center of Plant Ecology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, P. R.

20 China. Email: [email protected]

21 Xiuling Li, College of Life Science, Linyi University, Linyi 276000, Shandong, P. R. China. Email:

22 [email protected]

23

1 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

24 Abstract

25 Trapa L. is floating-leaved aquatic with important economic and ecological values. However,

26 species identification and phylogenetic relationship are still unresolved for Trapa. In this study, complete

27 chloroplast genomes of 13 Trapa species/taxa were sequenced and annotated. Combined with released

28 sequences of the other two species, comparative analysis of cp genomes was first performed on the 15

29 Trapa species/taxa. The 15 cp genomes exhibited quadripartite structures with medium size of 155,

30 453-155, 559 bp. IR/SC junctions were conservative with no obvious change found. Long repetitive

31 repeats and SSRs were mostly detected in the intergenic and LSC regions, providing useful plastid markers

32 for species and relationship identification. Three phylogenetic analyses (MP, ML and BI) consistently

33 showed two clusters within Trapa, including large- and small-seed species/taxa, respectively. This study

34 provided the baseline information for phylogeography of Trapa, which would facilitate the management

35 and utilization of genetic resources of the .

36

37 Keywords: Trapa; complete chloroplast genomes; comparative analysis; species identification;

38 phylogeography

39

2 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

40 1. Introduction

41 Trapaceae, containing the only genus Trapa, is annual floating-leaved aquatic herbs naturally distributed in

42 tropical, subtropical and temperate regions of Eurasia and Africa, and invading and

43 (Chen et al., 2007). APG II (The Angiosperm Phylogeny Group, 2003) equated Trapaceae with

44 . However, a handful of morphological differences exist between the two families. For example,

45 flowers of Trapaceae are solitary, 4-merous and actinomorphic, with half-inferior and slightly perigynous

46 ovaries; Lythraceae has racemes or cymes, and the flowers are usually 4-, 6- or 8-merous, regular or

47 irregular, with obvious perigynous ovaries. Therefore, Trapaceae is still be used today by some researchers

48 (Chen et al., 2007). Trapa has important edible value because high content of of seeds, which has

49 been widely cultivated as an important aquatic crop in China and India (Hummel and Kiviat, 2004). Trapa

50 seed pericarps were traditional herb medicines in China, and recent studies found that the extraction of

51 seed pericarps had bioactive components to restrain cancer, atherosclerosis, inflammation and oxidation

52 (Akao et al., 2013; Ciou et al., 2008; Yu and Shen, 2015; Li et al., 2017; Kauser et al., 2018). Additionally,

53 Trapa plants can be used to purify water body due to their excellent performance in absorbing heavy

54 metals and nutrients (Sweta et al., 2015; Xu et al., 2020). Therefore, Trapa has important economic and

55 ecological values. However, many Trapa species are becoming endangered or even locally extinct due to

56 human interferences in and China (Gupta and Beentje, 2017; Chen et al., field observations).

57 Conversely, Trapa plants were notorious intruders in Canada and the northeastern .

58 The knowledge of accurate species identification and phylogenetic relationship among species is

59 essential to effectively protect and utilize genetic resources of wild plants and to manage and remove

60 invasive plants (Campbell et al., 2009; Chorak et al., 2019). However, due to the large range of

61 morphological variation and the lack of effective diagnostic criteria, researchers have held sharp different

3 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

62 opinions on taxonomic classification of Trapa, with 1 or 2 polymorphic species or more than 20, 30 or 70

63 species within the genus (Cook, 1996; Vassiljev, 1949, 1965; Tutin, 1968; Yan, 1983; Kak, 1988; Chen et

64 al., 2007). Meanwhile, the phylogenetic relationship for Trapa species was unresolved by the efforts of

65 pollen morphology, cytology, quantitative classification and ontogenesis (Diao et al., 1990; Ding et al.,

66 1991; Huang et al., 1996; Wang and Ding, 1997; Fan et al., 2016). For example, Xiong et al. (1985a, b)

67 proposed that T. acornis Nakano and T. bispinosa Roxb were closely related based on the results of

68 quantitative classification, while Ding et al. (1991) showed that instead of T. bispinosa, T. quadrispinosa

69 Roxb was closed related to T. acornis based on their pollen morphology. And studies of quantitative

70 classification showed that the Trapa species/taxa with seeds of similar size evolved closely (Xiong et al.,

71 1990; Fan et al., 2016), which was proved by the results of allozymes (Takano and Kadono, 2005).

72 Additionally, molecular methods further illustrated that the Trapa species with small seeds, T. incisa and T.

73 maximowiczii, were of basal classification status within the genus (Jiang and Ding, 2004; Fan et al., 2021).

74 For the large-seed Trapa species, a close relationship was found between T. bispinosa and T. quadrispinosa

75 based on RAPDs and AFLPs (Jiang and Ding, 2004; Li et al., 2017; Fan et al., 2021). It is worth noting

76 that there are two diversity centers in Trapa genus, one distributed in the mid-lower reaches of Yangtze

77 River, and the other one located in the Tumen River and Amur River basins from bordering region of

78 Russia and China (Xue et al., 2016, 2017). However, most previous studies involved fewer species/taxa,

79 mostly collected from the mid-lower reaches of Yangtze River, and genotyped by molecular markers or

80 nuclear genome sequencing (Li et al., 2017; Fan et al., 2021). Researches about chloroplast genome were

81 rarely involved (Xue et al., 2017; Sun et al., 2020).

82 Chloroplast (cp) is a core organelle in plants for photosynthesis (Howe et al., 2003). In angiosperms,

83 the cp genome is haploid, maternal-inherited and shaped into a DNA circle with conserved quadripartite

4 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

84 structure. Additionally, the cp genome has slow evolutionary rate, high copy numbers per cell and compact

85 size with 120-170kb in length (Ruhlman and Jansen, 2014). Those characteristics of cp genome, combined

86 with the development of high-throughput technology, make sequencing of the complete cp genome an

87 ideal tool for species identification and plant phylogenic evaluation (Cauz-Santos et al., 2017; Hong et al.,

88 2020; Yang et al., 2018). To date, whole sequences of six Trapa chloroplast genomes (T. natans, T. bicornis,

89 T. quadrispinosa, T. kozhevnikovirum, T. incisa and T. maximowiczii) have been published, but there have

90 been no researches involving comparative analysis of the Trapa cp genomes. For Trapa, more efforts

91 should be made to analyze the interspecific difference and assess the phylogenic relation based on the

92 complete cp sequences.

93 Besides the wild species, Trapa also has a large number of cultivated species or varieties because of

94 its long history of cultivation. The comprehensive analysis of cp genomes focused on wild Trapa, and the

95 cultivated species or varieties would be included in another research. In this study, we sequenced the whole

96 cp genomes of 13 wild Trapa species/taxa. Combined with the two published cp genomes, comprehensive

97 analysis was first carried out within Trapa genus based on the 15 cp genomes data. Our specific goals were

98 as follows: (1) to compare the chloroplast structures within Trapa; (2) to detect the highly variable regions

99 as DNA barcoding markers for Trapa species identification; (3) to infer the phylogenetic relationship

100 among Trapa species. This study will provide the baseline information for phylogeography within Trapa

101 and facilitate the management and utilization of genetic resources of Trapa.

102

5 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

103 2. Materials and methods

104 2.1 Plant Materials and DNA Extraction

105 In the autumns of 2018 and 2019, young and healthy leaves were collected from 13 Trapa species/taxa.

106 Among them, 10 species/taxa were recorded in Chinese Flora Republicae Popularis Sinicae (T. bispinosa,

107 T. quadrispinosa, T. japonica, T. mammillifera, T. macropoda var. bispinosa, T. litwinowii, T. arcuata, T.

108 pseudoincisa, T. manshurica, and T. maximowiczii; Wan, 2000); two species (T. potaninii and T. sibirica)

109 were first recorded in Floral of USSR (Vassiljev, 1949). A new Trapa taxon was collected from Baidang

110 Lake, Anhui province, China. The seeds of the taxon have two horns with the height from 23.4 to 34.8 mm

111 and the width from 13.4 to 18.3 mm. The horns of the new taxon were wide and drooping, looking like a

112 pig's ears. Based on the size and the sample site, the taxon was named Trapa natans var. baidangensis.

113 All voucher specimens were deposited in the herbarium of Wuhan Botanical Garden (HIB; Table 1). The

114 fresh leaves were cleaned and dried in silica gel immediately. Genomic DNA was extracted from the dried

115 leaves according to the CTAB protocol described by Doyle (1987). The DNA concentration and quality

116 was quantified by the NanoDrop 2000 micro spectrophotometer (Thermo Fisher Scientific).

117 2.2 Chloroplast Genome Sequencing and Assembling

118 High quality DNA was used to build the genomic libraries. Sequencing was performed using paired end

119 150bp (average short-insert about 350 bp) on Illumina 2500 (Illumina, San Diego, California, USA) at

120 Beijing Novogene bio Mdt InfoTech Ltd (Beijing, China). To get the high quality clean data, Fastp (Chen

121 et al., 2018) was run to cut and filter the raw reads with default settings. For the 13 Trapa species/taxa

122 sequenced, more than 5G clean data for each sample was generated after removing adapters and low

123 quality reads, and the number of clean reads ranged from 5.22 G (T. mammillifera) to 6.06 G (T. bispinosa).

124 De novo assembly was carried out using the assembler GetOrganelle v1.7 (Jin et al., 2020) with the default

6 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

125 setting. The software Geneious primer (Biomatters Ltd., Auckland, New Zealand) was employed to align

126 the contigs and determine the order of the newly assembled plastomes, with T. bicornis genome sequence

127 as a reference. All the annotated cp sequence data reported here were deposited in GenBank with accession

128 numbers shown in Table 1.

129 2.3 Annotation and codon usage

130 We used the genome annotator PGA (Qu et al., 2019) and software GeSeq (Tillich et al., 2017) to annotate

131 protein coding genes, tRNAs and rRNAs, according to the references of T. quadrispinosa (MT_941481)

132 and T. maximowiczii (NC_037023). Manual correction was carried out to locate the start and stop codons

133 and the boundaries between the exons and introns. With tRNAscan-SE v1.21, BLASTN searches were

134 further performed to confirm the tRNA and rRNA genes (Schattner et al., 2005). The physical maps of cp

135 genomes were generated by OGDRAW (Lohse et al., 2007).

136 The relative synonymous codon usage (RSCU) was the ratio of the frequency of a particular codon to

137 the expected frequency of that codon, which was obtained by DAMBE v6.04 (Cauzsantos et al., 2017).

138 When the value of RSCU is larger than 1, the codon is used more often than expected. Otherwise, when the

139 RSCU value < 1, the codon is less used than expected (Sharp et al., 1987).

140 2.4 Comparative genomic analyses

141 Comparative genomic analyses were carried out among the 15 wild Trapa species/taxa, which included the

142 13 species/taxa newly sequenced, and two published ones (T. kozhevnikovirum and T. incisa) (Wang et al.,

143 2021; Wagutu et al., 2021). The published cp genomes were downloaded from the National Center for

144 Biotechnology Information (NCBI) organelle genome database (https://www.ncbi.nlm.nih.gov).

145 The mVISTA program in Shuffle-LAGAN mode was used to compare the complete cp genome within

146 the 15 species/taxa, with the annotation of T. quadrispinos as a reference (MT_941481). After a manual

7 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

147 alignment using MEGA X, all regions, including coding and non-coding regions, were extracted to detect

148 the hyper-variable sites (Kumar et al., 2018). The nucleotide variability (Pi) was computed using DnaSP

149 5.10 (Librado and Rozas, 2009).

150 2.5 Analysis of repeat sequences and SSRs

151 Repeat sequences, including forward, palindromic, reverse and complement repeats, were detected by

152 REPuter (Kurtz and Schleiermacher, 1999). The parameters were set with repeat size of ≥30 bp and 90% or

153 greater sequence identity (hamming distance of 3).

154 Simple sequence repeats (SSRs) were identified using MISA perl script (Thiel et al., 2003), with the

155 threshold number of repeats set as 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and

156 hexa-nucleotide SSRs, respectively.

157 2.6 Phylogenetic analyses

158 Phylogenetic analyses based on the complete chloroplast genomes were carried out for the 18 species/taxa,

159 including the 13 newly sequenced species/taxa and five published ones (T. bicornis, T. natans, T.

160 maximowiczii, T. insica and T. kozhevnikovirum). Considering the close relationship between Trapaceae

161 and Sonneratiaceae/Lythraceae (Sun et al., 2020), Sonneratia alba (Sonneratiaceae) and two Lagerstroemia

162 species (L. calyculata and L. intermedia, Lythraceae) were included as outgroups. The whole cp genome of

163 the five published Trapa species and the three species as outgroups were downloaded from Genbank.

164 The sequences were aligned using program Mafft 7.0 (Katoh and Standley, 2013) with default

165 parameters. Then, the F81+G was selected as the best-fit model of nucleotide substitution by software

166 Jmodeltest 2 (Darriba et al., 2012). The phylogenetic trees were constructed using three methods: (1) A

167 maximum likelihood (ML) tree was performed using PhyML v.3.0 (Guindon et al., 2010) with 5000

168 bootstrap replications. The result was visible by the software Figtree v1.4

8 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

169 (https://github.com/rambaut/figtree/releases); (2) The Maximum Parsimony (MP) tree was run by the mega

170 X (Kumar et al., 2018) with 5000 bootstrap values; (3) Bayesian inference (BI) tree was built by the

171 MrBayes v. 3.2.6 (Ronquist et al., 2012) with 2,000,000 generations and sampling every 5000 generations.

172 The first 25% of all trees were regarded as “burn-in” and discarded, and the Bayesian posterior

173 probabilities (PP) were calculated from the remaining trees.

174

175 3. Results and discussion

176 3.1 Structure and content of Chloroplast Genome

177 Previous studies showed that the length of cp genome ranged from 120-170kb (Ruhlman and Jansen, 2014).

178 In the present study, the cp genomes of the 15 Trapa species/taxa have medium size with 155,453-155,559

179 bp, compared to the recent study from Lythraceae with 152,049-160,769 bp (Xu et al., 2017; Zheng et al.,

180 2020). The two species with small seeds showed the smallest cp genome size with 155, 453 for T.

181 maximowiczii and 155, 477 for T. incisa. For the 13 Trapa species/taxa with large seeds, the cp genomes of

182 the three species/taxa (T. bispinosa, T. quadrispinosa and T. macropoda var. bispinosa) showed the relative

183 small size with 155, 485-155, 495 bp, and the others exhibited similar size with 155, 535 bp (T. litwinowii)

184 - 155, 559 bp (T. mammillifera and T. natans var. baidangensis) (Table 2).

185 Unlike those of the non-photosynthetic (parasitic and mycoheterotrophic) plants, the plastid genomes

186 of green plants are usually highly conserved with the quadripartite structure (Naumann et al., 2016). The

187 typical quadripartite structure was also found in Trapa, which consisted of a pair of inverted repeat regions

188 (IRs) (26, 380 – 26, 388 bp) separated by the small single copy (SSC) (18, 265 – 18, 279 bp) and large

189 single copy (LSC) regions (88, 398 – 88, 512 bp). This suggested the difference in cp genome size of wild

190 Trapa species mainly occurred in LSC region.

9 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

191 The 15 Trapa speices/taxa showed the identical level of GC content with the total content 36.40-

192 36.41%, which was close to that of Lagerstroemia (37.57 - 37.72%, Zheng et al., 2020) and Lythraceae

193 (36.41 - 37.72%, Gu et al., 2019). Previous studies detected that the GC content varied greatly among

194 different regions of cp genome. High level of GC content was generally contained in rRNA genes which

195 were located in IR regions (Zheng et al., 2020). Similarly, for Trapa, the GC content of rRNA genes

196 reached to 55.48%, which resulted in the highest GC content in the IR regions (42.77%) compared with

197 that of the other two regions (LSC, 34.17 - 34.20%; SSC, 30.17 - 30.21%) (Figure 1; Table 4).

198 There were 130 genes annotated in the cp genomes of the 13 Trapa species/taxa with large seeds,

199 compared with the 129 genes in the two small-seed Trapa species (T incisa and T. maximowiczii). Previous

200 studies showed that the number of PCGs and tRNAs varied in different species, but the number of rRNA

201 was stable (Zhang et al., 2020b), which was proved in the present study. The cp genomes of the 13 Trapa

202 species/taxa with large seeds contained 85 protein-coding genes (PCGs), 37 tRNA genes and 8 rRNA

203 genes. In comparison, the two Trapa species with small seeds had 83 PCGs, 38 tRNA genes and 8 rRNA

204 genes. The two small-seed species lost the gene ycf3 related to the conserved open reading frames and one

205 rps12 for self-replication. Among the unique genes, 44 genes were related to photosynthesis and 59 genes

206 were associated with self-replication (Table 3).

207 3.2 Hypervariable Regions and Comparative Genomic Analysis

208 Except the gene space, the remaining regions were non-coding sequences, comprising intergenic spacers

209 (IGS) regions, introns and pseudogenes (Lu et al., 2017). In this study, the intergenic regions and introns

210 among the 15 Trapa species/taxa ranged from 46, 592 to 46, 812 bp and from 15, 957 to 18, 852 bp,

211 respectively. IGS and introns generally have higher nucleotide substitution rates than the protein-coding

212 regions, which yield many phylogenetically informative sites (Shaw et al. 2007; Jansen and Ruhlman,

10 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

213 2012). In this study, high level of the nucleotide diversity (Pi) values was found in the non-coding regions

214 (0 - 0.0123, with average 0.000857) compared with those of the coding regions (0 - 0.00282, with average

215 0.000217) (Figure 7).

216 Previous studies exhibited a negative correlation between the GC content and the variability of cp

217 genome sequences (Zhang et al., 2020a; Zheng et al., 2020). This was proved in the present study. Most

218 variable regions for Trapa were located in IGS regions with the lowest GC content (30.28-30.42%) (Table

219 4). And the highest Pi values (0.00416-0.0123) were found in the IGS regions, including ndhE—ndhG,

220 psbA—trnK-UUU, psbK—psbI, rps2—rpoC2 and atpA—atpF (Figure 7). Therefore, IGS is often used as

221 DNA barcoding markers in many species (Nguyen et al., 2015). It is worth noting that IGS regions were

222 mostly distributed in the LSC region. Among the 20 highest Pi values, 16 divergence hotspots with low

223 sequence identity were found, which included 11 regions (psaC, clpP, psbE, atpF, rps2—rpoC2,

224 psbK—psbI, psbA—trnK-UUU, trnS-GCU—trnG-UCC, trnH-GUG—psbA, trnT-GGU—psbD and

225 accD-psaI) in the LSC and 5 regions (ndhD, ndhI, ycf1, ndhF and ndhE—ndhG) in the SSC (Figure 3).

226 We found 15/16 genes containing one or two introns for the small-seed/large-seed species, consisting

227 of 6 tRNA genes (trnK-UUU, trnA-UGC, trnI-GAU, trnG-UCC, trnV-UAC and trnL-UAA) and 8

228 protein-coding genes (PCGs) (rps16, rpoC1, atpF, ndhB, ndhA, petD, petB and rpl16) with one intron, and

229 one PCG clpP with two introns for the small-seed Trapa species, but two PCGs (ycf3 and clpP) for the

230 large-seed ones. The introns of the PCGs shared the similar position and length in each of the 15 Trapa

231 taxa (Table 5), suggesting the high conservation of cp genome within Trapa. The rpl2 intron loss was

232 considered as one of the important evolutionary event in Lythraceae, which was presumed to occur after

233 the divergence of Lythraceae from Onagraceae (Chao et al., 2017; Gu et al., 2016, 2019; Zheng et al.,

234 2020). Like Lythraceae, the 15 Trapa species/taxa exhibited the loss of rpl2 intron, which might suggest a

11 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

235 close genetic relationship between Lythraceae and Trapaceae.

236 The gene contains several internal stop codons, which tends to be a pseudogene (Lu et al., 2017;

237 Sloan et al., 2014). Alternatively, if the sequence was conserved over broad evolutionary distances and lack

238 of internal stop, it tends to be assumed as functional protein-coding genes (Raubeson et al., 2007). In the

239 present study, though the ycf3 gene was only detected in the large-seed Trapa taxa, none internal stop

240 codon was found within the gene. Therefore, instead of a pseudogene, the ycf3 appears as a protein-coding

241 gene.

242 3.3 Repeat Structure and cpSSR

243 For cp genomes, long repeats play an important role in the sequence rearrangement and recombination,

244 which could be as modules of large assemblies and participate in catalytic function or protein binding

245 (Nazareno et al. 2015; Zheng et al., 2020). A total of 595 long repetitive repeats (30 - 65bp) were identified

246 from the 15 Trapa taxa, consisting of 324 forward, 230 palindromic, 25 reverse and 16 complementary

247 repeats. For the genus Trapa, the size of top three most frequently shown long repeats was 30bp, 31bp and

248 65bp, which was found 227, 77 and 60 times, respectively. For each species/taxon, the number of long

249 repeats varied from 37 (T. incisa and T. maximowiczii) to 41 (T. sibirica); and the number of forward,

250 palindromic, reversed and complementary repeats were 19-24, 15-16, 0-3 and1-2, respectively (Figure 5).

251 Additionally, most long repetitive repeats were distributed in intergenic areas, and a few existed in shared

252 genes, such as ycf2, which was analogous to the results of other angiosperm lineages (Yang et al., 2017;

253 Zheng et al., 2020).

254 Due to a high polymorphism rate at the intra- and inter-species level, cp genome SSRs have been

255 viewed as excellent molecular markers in population genetics and phylogenetic research (Xue et al., 2017;

256 Zheng et al., 2020). In this study, for each species/taxon, the number of total SSRs was from 110 (T.

12 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

257 quadrispinosa) to 123 (T. incisa and T. maximowiczii). Most cp SSRs, from 83.48% (T. japonica, T.

258 manshurica and T. litwinowii) to 86.18% (T. maximowiczii), were distributed in the LSC regions, which

259 was higher than that in the genus Lagerstroemia (66.55%; Xu et al., 2017) and Myrsinaceae (74.37%; Yan

260 et al., 2019). Among these SSRs, the mononucleotide A/T repeat units occupied the highest proportion

261 with 78.18 - 80.49%, and the second and third highest proportions were dinucleotide repeats (AC/GT and

262 AT/AT) and tetranucleotide repeats (AAAG/CTTT, AAAT/ATTT, AACC/GGTT, AAGT/ACTT,

263 AATT/TTAA, ACAT/ATGT and AGAT/ATCT) accounting for 9.40 - 10.00% and 8.13 - 9.09%,

264 respectively (Figure 6). The observed high AT content in cp SSRs was also found in the other genera (Asaf

265 et al., 2016; Chao et al., 2017; Kuang et al., 2011)

266 3.4 Boundaries of IR regions and codon usage

267 For angiosperms, the boundaries between IR regions and single-copy (SC) regions result in the difference

268 of genome size by expansion or shrinkage (Kim and Lee, 2004; Lin et al., 2012). In this study, the

269 junctions of IR/SC regions were compared among the 15 Trapa species\taxa. The cp genomes of the 15

270 Trapa plants showed many minor differences in the boundary regions although the number and order of

271 genes were highly conserved. The rps19 gene spanned the LSC/IRb border and extended into the IRb

272 region with the length 75-83 bp (except T. manshurica with 75 bp, the others were 83 bp). The ycf1 gene

273 covered the junction of SSC/IRa, and the part of ycf1 into IRa region showed a stable size (1095 bp) for all

274 the Trapa plants. On the contrary, the remaining part of ycf1, located in SSC, showed a variable size, with

275 4539 bp for the two small-seed species and 4533 bp for the remaining 13 taxa. Accordingly, a pseudogene

276 fragment ψycf1 was detected in the border of IRa/LSC with 1098 bp for all Trapa taxa, and the parts

277 located in IRb and SSC were 1095 bp and 3 bp, respectively. The gene trnH was distributed on the right

278 side of the border of IRb/LSC, with an interval of 32 - 47 bp from the border to the gene (Figure 4).

13 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

279 For all the Trapa species/taxa, a total of 64 types of codons encoding 20 amino acids were detected.

280 In all, the 83/85 protein coding genes within Trapa encoded 25, 832 to 26, 590 codons. The results were

281 comparable to that of the genus Lagerstroemia, 79 genes encoding 25, 068 - 27, 111 codons in their cp

282 genomes (Zheng et al., 2020). Three termination codons, UGA, UAG and UAA, were found for all the

283 Trapa species. The highest number of codon occurrences (> 1000) was found in GAA, UUU, AUU and

284 AAA, which encoded Glutamicacid, Phenylalanine, Isoleucine and Lysine, respectively. Leucine,

285 Isoleucine and Serine were the top three amino acids with high frequency, , with the frequency 2771-2826,

286 2229-2298 and 1992-2067, respectively. Conversely, Cysteine (288-298) and Tryptophan (459-468) were

287 the least frequent amino acids.

288 Like that of Lagerstroemia (Zheng et al., 2020), the RSCU value of a single amino acid showed a

289 positive correlation with the number of codons encoding it. The highest and lowest RSCU values were

290 exhibited in the UUA encoding Leucine (1.96) and the AGC encoding Serine (approximately 0.34),

291 respectively (Figure 2). The results also showed that 30 codons were used frequently with RSCU values >

292 1, and all of them ended with A/U, which might be associated with the high proportion of A/T in cp

293 genomes (Eguiluz et al., 2017).

294 3.5 Phylogenetic Analysis

295 Based on the whole cp genomes, the three phylogenetic trees (MP, ML and BI) showed the similar

296 clustering patterns. The species from the same family clustered together. At first, the branch with the two

297 Lagerstroemia species (Lythraceae) separated from the other species. And S. alba (Sonneratiaceae) showed

298 to be a sister of Trapa species, which was also supported by recent studies (Sun et al., 2020; Wang et al.,

299 2021; Wagutu et al., 2021).

300 All the Trapa species/taxa were divided into two clusters, the clusters of small-seed species and

14 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

301 large-seed species, which were consistent with the results of morphological studies (Fan et al., 2016). The

302 two species with small seeds initially separated from the other Trapa taxa, which agreed to the results of

303 AFLPs study recently published (Fan et al., 2021). Additionally, the present study further detected a close

304 genetic relationship between the two species with small seeds. On the other hand, in the cluster with large

305 seeds, the cultivated species T. bicornis was the outermost part of the group, which might suggest a

306 complex origin of cultivated Trapa species. However, it was reported that another common cultivated

307 species T. bicornis var. cochinchinensis demonstrated a hybrid origin between T. quadrispinosa and T.

308 bispinosa (Fan et al., 2021). The remaining large-seed Trapa taxa were divided into three clusters: (1) the

309 first cluster included three Trapa species (T. quadrispinosa, T. bispinosa and T. macropoda var. bispinosa).

310 The intimate relationship between the former two has been proved by many studies (Li et al., 2017; Fan et

311 al., 2021). It was the first time for T. macropoda var. bispinosa being reported, and the only difference

312 between this species and T. bispinosa was the larger seed bottom of this species; (2) the second cluster

313 contained six species/taxa (T. japonica, T. mammillifera, T. potaninii, T. pseudoincisa, T. arcuata and T.

314 natans var. baidangensis), which were of the deformed and obvious tubercles and thick husks; (3) the third

315 cluster covered four species (T. litwinowii, T. manshurica, T. kozhevnikovirum and T. sibirica), which

316 possess smooth and tight seed coats and were collected from the Amur River (Figure 8). Among them, T.

317 litwinowii is the only one with two horns. Compared with T. litwinowii, T. manshurica shares the same

318 seed size and seed ornamentations, but T. manshurica has four horns. T. manshurica and T. sibirica share

319 similar morphologic characteristics, but T. sibirica has larger nuts, small crown and longer fruit neck. The

320 seed crown of the three Trapa species (T. litwinowii, T. manshurica and T. sibirica) was large and curled

321 outwardly. T. kozhevnikovirum and T. sibirica have the similar seed size, but T. kozhevnikovirum has a

322 small seed crown and unobvious seed neck.

15 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

323 4. Conclusions

324 This study reported the comparative analysis results of 15 Trapa cp genome sequences with detailed gene

325 annotation. The 15 cp genomes are of the similar quadripartite structure with a high degree of the synteny

326 in gene order, suggesting the high conservation of the cp genome in Trapa. There was no obvious change

327 found in the IR/SC junctions, indicating the junctions were relatively conservative in Trapa. Most of the

328 long repetitive repeats and SSRs were detected in the intergenic and LSC regions, which provided effective

329 cp markers for species identification, genetic relationship analysis and the population genetics. Three

330 phylogenetic analysis approaches (MP, ML and BI methods) consistently supported the 15 Trapa taxa

331 separated into two major evolutionary branches, large- and small-seed branches, which was agreeable to

332 the results of morphological studies. And a close genetic relationship was found between the two

333 small-seed species. The baseline cp genome information of Trapa should be useful for species

334 identification and phylogeographic analysis of Trapa species/taxa, which will be beneficial to the

335 management and utilization of genetic resources of the genus.

336

337 Author contributions

338 FX collected and identified the species of sample, designed the experiments, analyzed the data and wrote

339 the paper. WW performed the experiments, contributed reagents/materials/analysis tools, prepared figures

340 and/or tables and wrote the paper. CY conceived and designed the experiments, reviewed drafts of the

341 paper. GKW contributed reagents/materials/analysis tools. LW and LX help conceived and designed the

342 experiments and reviewed the paper.

343

344 Funding

16 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

345 This work was supported by the National Scientific Foundation of China (31100247), the Talent Program

346 of Wuhan Botanical Garden of the Chinese Academy of Sciences (Y855291B01) and the High-level Talent

347 Training Program of Tibet University (2018-GSP-018).

348

17 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

349 Reference

350 Akao, S., Maeda, K., Hosoi, Y., Nagare, H., Maeda, M., Fujiwara, T. (2013). Cascade utilization of water

351 chestnut: recovery of phenolics, phosphorus, and sugars. Environ. Sci. Pollut. Res. 20, 5373-5378. doi:

352 10.1007/s11356-013-1547-7

353 Angiosperm Phylogeny Group (2003). An update of the Angiosperm Phylogeny Group classification for

354 the orders and families of flowering plants: APG II. Bot. J. Linnean Soc. 141(4), 399-436. doi:

355 10.1046/j.1095-8339.2003.t01-1-00158.x

356 Asaf, S., Khan, A. L., Khan, A. R., et al. (2016). Complete chloroplast genome of Nicotiana otophora and

357 its comparison with related species. Front. Plant Sci., 7, 843. doi: org/10.3389/fpls.2016.00843

358 Campbell, B.T., Williams, V.E., Park, W. (2009). Using molecular markers and field performance data to

359 characterize the Pee Dee cotton germplasm resources. Euphytica 169, 285-301. doi:

360 10.1007/s10681-009-9917-4

361 Cauzsantos, L.A., Munhoz, C.F., et al. (2017). The chloroplast genome of passifora edulis (passiforaceae)

362 assembled from long sequence reads: structural organization and phylogenomic studies in

363 malpighiales. Front. Plant Sci. 8,334. doi: 10.3389/fpls.2017.00334

364 Cauz-Santos, L.A., Munhoz, C.F., Rodde, N., et al. (2017). The chloroplast genome of Passifora edulis

365 (Passiforaceae) assembled from long sequence reads: structural organization and phylogenomic

366 studies in malpighiales. Front. Plant Sci. 8, 334. doi: 10.3389/fpls.2017.00334

367 Chao, X., Dong, W., Li, W., Lu, Y., Xie, X., Jin, X., Shi, J., He, K., Suo, Z. (2017). Comparative analysis

368 of six Lagerstroemia complete chloroplast genomes. Front. Plant Sci. 8, 15. doi:

369 10.3389/fpls.2017.00015

370 Chen, J.R., Ding, B.Y., Funston, M. (2007). Trapaceae. In Flora of China, vol. 13. Science Press Beijing &

18 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

371 Missouri Botanical Garden Press, St. Louis, USA, pp. 290-291.

372 Chen, S., Zhou, Y., Chen, Y., Gu, J. (2018). Fastp: an ultra-fast all-in-one FASTQ preprocessor.

373 Bioinformatics 34, i884–i890. doi: 10.1093/bioinformatics/bty560

374 Chorak, G.M., Dodd L.L., Rybicki N., Ingram K., Buyukyoruk M., Kadono Y., Chen Y.Y., Thum R.A.,

375 (2019). Cryptic introduction of water chestnut (Trapa) in the northeastern United States. Aquat. Bot.

376 155, 32-37. doi: 10.1016/j.aquabot.2019.02.006

377 Ciou, J. Y., Wang, C. C. R., Chen, J., Chiang, P. Y. (2008). Total phenolics content and antioxidant activity

378 of extracts from dried (Trapa taiwanensis Nakai) hulls. J. Food Drug Analysis, 16,

379 41e47. doi: 10.38212/2224-6614.2362

380 Cook, C.D.K. (1996). book. SPB Academic Pub.

381 Darriba, D., Taboada, G., Doallo, R. et al. (2012). jModelTest 2: more models, new heuristics and parallel

382 computing. Nat. Methods 9, 772. doi: 10.1038/nmeth.2109

383 Diao, Z.S. (1990). The morphogenesis in the ontogeny of the family Trapaceae. J.Yuzhou Univ. 3, 1-11. (In

384 Chinese with English abstract)

385 Ding, B.Y., Fang Y.Y. (1991). Study on the pollen morphology of Trapa from Zhejiang. Acta

386 phytotaxonomica sinica. 29(3), 172-177. (In Chinese with English abstract)

387 Doyle, J.J., Doyle, J.L. (1987). A rapid DNA isolation procedure for small quantities of fresh leaf tissue.

388 Phytochem. Bull. 19, 11-15.

389 Eguiluz, M., Rodrigues, N.F., Guzman, F., Yuyama, P., Margis, R. (2017). The chloroplast genome

390 sequence from Eugenia unifora, a Myrtaceae from Neotropics. Plant Syst. Evol. 303, 1199–1212. doi:

391 10.1007/s00606-017-1431-x

392 Fan, X.R., Li, Z., Chu. H.J., Li, W., Liu, Y.L., Chen, Y.Y. (2016). Analysis of morphological plasticity of

19 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

393 Trapa L. from China and their taxonomic significance. Plant Sci. J. 34(3), 340-351. (In Chinese with

394 English Abstract)

395 Gu, C., Ma, L., Wu, Z., Chen, K., Wang, Y. (2019). Comparative analyses of chloroplast genomes from 22

396 Lythraceae species: inferences for phylogenetic relationships and genome evolution within .

397 BMC Plant Biol. 19, 281. doi: 10.1186/s12870-019-1870-3

398 Gu, C., Tembrock, L.R, Johnson, N.G., Simmons, M.P., Wu, Z. (2016). The Complete Plastid Genome of

399 Lagerstroemia fauriei and Loss of rpl2 Intron from Lagerstroemia (Lythraceae). PLoS ONE 11(3),

400 e0150752. doi: 10.1371/journal.pone.0150752

401 Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. (2010). New algorithms and

402 methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst

403 Biol. 59(3), 307-321. doi: 10.1093/sysbio/syq010

404 Gupta, A.K., Beentje, H.J. (2017). Trapa natans, The IUCN Red List of Threatened Species 2017,

405 e.T164153A84299204. doi:10.2305/IUCN.UK, 2017-1, RLTS.T164153A84299204.

406 Hong, Z., Wu, Z., Zhao, K., Yang, Z., Zhang, N., Guo, J., Tembrock, L. R., Xu, D. (2020). Comparative

407 analyses of five complete chloroplast genomes from the genus Pterocarpus (Fabacaeae). Int. J. Mol.

408 Sci. 21(11), 3758. doi: 10.3390/ijms21113758.

409 Howe, C. J., Barbrook, A. C., Koumandou, V. L., Nisbet, R. E. R., Symington, H. A., Wightman, T. F.

410 (2003). Evolution of the chloroplast genome. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 358,

411 99-107. doi: org/10.1098/rstb.2002.1176

412 Huang, T., Ding, B.Y., Hu, R.Y., et al. (1996). Cytotaxonomic studies on the genus Trapa in China. In

413 Research and Application of Life Sciences. Hangzhou, Zhejiang University Press, 235-239. (In

414 Chinese)

20 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

415 Hummel, M., Kiviat, E. (2004). Review of World literature on Water Chestnut with implications for

416 management in North America. J. Aquat. Plant Manage. 42, 17-27. doi:

417 10.1560/N0YU-G77C-TLKX-HW5K

418 Jansen, R. K.; Ruhlman, T. A. (2012). Plastid genomes of seed plants. In Genomics of Chloroplasts and

419 Mitochondria; Bock, R., Knoop, V., Eds.; Springer: Dordrecht, The Netherlands.

420 Jiang, W.M., Ding, B.Y. (2004). Genetic relationship among Trapa species assessed by RAPD markers. J.

421 Zhejiang Univ. Agr. Life Sci. 30, 191-196. (In Chinese with English Abstract)

422 Jin, J., Yu, W., Yang, J. et al. (2020). GetOrganelle: a fast and versatile toolkit for accurate de novo

423 assembly of organelle genomes. Genome Biol. 21, 241. doi: 10.1186/s13059-020-02154-5

424 Kak, A.M. (1988). Aquatic and wetland vegetation of western Himalayas. J. Econ. Taxon. Bot. 12,

425 447-451.

426 Katoh, K., and Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7:

427 improvements in performance and usability. Mol. Biol. Evol. 30, 772-780. doi:

428 10.1093/molbev/mst010

429 Kauser, A., Shah, S. M. A., Iqbal, N., et al. (2018). In vitro antioxidant and cytotoxic potential of

430 methanolic extracts of selected indigenous medicinal plants. Progr. Nutr. 20(4), 706-12. doi:

431 10.23751/pn.v20i4.7523

432 Kim, K. J., Lee, H. L. (2004). Complete chloroplast genome sequences from Korean ginseng (Panax

433 schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants. DNA Res.

434 11(4), 247-261. doi: 10.1093/dnares/11.4.247

435 Kuang, D. Y., Wu, H., Wang, Y. L., Gao, L. M., Lu, L. (2011). Complete chloroplast genome sequence of

436 Magnolia kwangsiensis (Magnoliaceae): implication for DNA barcoding and population genetics.

21 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

437 Genome 54(8), 663-673. doi: 10.1139/g11-026

438 Kumar, S., Stecher, G., Li M., Knyaz, C., Tamura, K. (2018). MEGA X: Molecular Evolutionary Genetics

439 Analysis across computing platforms. Mol. Biol. Evol. 35, 1547-1549. doi: 10.1093/molbev/msy096

440 Kurtz, S., Schleiermacher, C. (1999). REPuter: fast computation of maximal repeats in complete genomes.

441 Bioinformatics 15, 426-427. doi: 10.1093/bioinformatics/15.5.426

442 Librado, P., and Rozas, J. (2009). DnaSP v5: a software for comprehensive analysis of DNA polymorphism

443 data. Bioinformatics 25, 1451-1452. doi: 10.1093/bioinformatics/btp187

444 Lin, C.P., Wu, C.S., Huang, Y.Y., Chaw, S.M. (2012). The complete chloroplast genome of ginkgo biloba

445 reveals the mechanism of inverted repeat contraction. Genome Biol. Evol. 4, 374-381. doi:

446 10.1093/gbe/evs021

447 Lohse, M., Drechsel, O., Bock, R.M. (2007). Organellar genome DRAW (OGDRAW): a tool for the easy

448 generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr. Genet.

449 52, 267-274. doi: 10.1007/s00294-007-0161-y

450 Lu, R. S., Pan, L., Qiu, Y. X. (2017). The complete chloroplast genomes of three Cardiocrinum (Liliaceae)

451 species: comparative genomic and phylogenetic analyses. Front. Plant Sci. 7: 2054. doi:

452 10.3389/fpls.2016.02054

453 Naumann, J., Der, J. P., Wafula, E. K., Jones, S. S., Wagner, S. T., Honaas, L. A., Ralph, P. E., Bolin, J. F.,

454 Maass, E., Neinhuis, C. (2016). Detecting and characterizing the highly divergent plastid genome of

455 the nonphotosynthetic parasitic plant Hydnora visseri (Hydnoraceae). Genome Biol. Evol. 8, 345-363.

456 doi: 10.1093/gbe/evv256

457 Nazareno, A. G., Carlsen, M., Lohmann, L. G. (2015). Complete chloroplast genome of Tanaecium

458 tetragonolobum: the first Bignoniaceae plastome. PLoS ONE 10(6), e0129930. doi:

22 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

459 10.1371/journal.pone.0129930

460 Nguyen, P. A. T., Kim, J. S., Kim, J. H. (2015). The complete chloroplast genome of colchicine plants

461 (Colchicum autumnale L. and Gloriosa superba L.) and its application for identifying the genus.

462 Planta 242, 223-237. doi: 10.1007/s00425-015-2303-7

463 Qu, X., Moore, M.J., Li, D. et al. (2019). PGA: a software package for rapid, accurate, and flexible batch

464 annotation of plastomes. Plant Methods 15, 50. doi: 10.1186/s13007-019-0435-7

465 Raubeson, L.A., Peery, R., Chumley, T.W., Dziubek, C., Fourcade, H.M., Boore, J.L., Jansen, R.K. (2007).

466 Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar

467 advena and Ranunculus macranthus. BMC Genomics 8, 174. doi: org/10.1186/1471-2164-8-174

468 Ronquist, F., Teslenko, M., van der Mark, P. et al. (2012). MrBayes 3.2: efficient Bayesian phylogenetic

469 inference and model choice across a large model space. Syst. Biol. 61, 539-542. doi:

470 10.1093/sysbio/sys029

471 Ruhlman, T.A., Jansen, R.K. (2014). The plastid genomes of flowering plants. In: Maliga P. (eds)

472 Chloroplast Biotechnology. Methods in Molecular Biology (Methods and Protocols) 1132. Humana

473 Press, Totowa, NJ. doi: org/10.1007/978-1-62703-995-6_1

474 Schattner, P., Brooks, A.N., Lowe, T.M. (2005). The tRNAscan-SE, snoscan and snoGPS web servers for

475 the detection of tRNAs and snoRNAs. Nucleic Acids Res. 33, 686-689. doi: 10.1093/nar/gki366

476 Sharp, P. M., Li, W. H. (1987). The codon Adaptation Index--a measure of directional synonymous codon

477 usage bias, and its potential applications. Nucleic Acids Res. 15, 1281-1295. doi:

478 10.1093/nar/15.3.1281

479 Shaw, J., Lickey, E. B., Schilling, E. E., Small, R. L. (2007). Comparison of whole chloroplast genome

480 sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the

23 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

481 hare III. Am J Bot. 94(3), 275-88. doi: 10.3732/ajb.94.3.275

482 Sloan, D.B., Triant, D.A., Forrester, N.J., Bergner, L.M., Wu, M., Taylor, D.R. (2014). A recurring

483 syndrome of accelerated plastid genome evolution in the angiosperm tribe Sileneae (Caryophyllaceae).

484 Mol. Phylogenet Evol. 2014 Mar;72:82-9. doi: 10.1016/j.ympev.2013.12.004

485 Sun, F.F., Yin, Y., Xue, B., Zhou, R., Xu, J. (2020). The complete chloroplast genome sequence of Trapa

486 bicornis Osbeck (Lythraceae), Mitochondrial DNA Part B, 5, 2746-2747. doi:

487 10.1080/23802359.2020.1788432

488 Sweta, Bauddh, K., Singh, R., Singh, R.P. (2015). The suitability of Trapa natans for phytoremediation of

489 inorganic contaminants from the aquatic ecosystems. Ecol. Eng. 83, 39-42. doi:

490 10.1016/j.ecoleng.2015.06.003

491 Takano, A., Kadono, Y., 2005. Allozyme variations and classifcation of Trapa (Trapaceae) in Japan. Aquat.

492 Bot. 83, 108–118. doi: 10.1016/j.aquabot.2005.05.008

493 Thiel, T., Michalek, W., Varshney, R., Graner, A. (2003). Exploiting EST databases for the development

494 and characterization of gene-derived SSR markers in barley (Hordeum vulgare L.). Theor. Appl.

495 Genet. 106, 411-422. doi: 10.1007/s00122-002-1031-0

496 Tillich, M., Lehwark, P., Pellizzer, T., Ulbricht-Jones, E.S., Fischer, A., Bock, R., Greiner, S. (2017).

497 GeSeq-versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 45(W1), W6-W11.

498 doi: 10.1093/nar/gkx391

499 Tutin, T.G. (1968). Flora Europaea. Cambridage University Press, Cambridage, 303-452.

500 Vassiljev, V. (1965). Species novae Africanicae generis Trapa L.. Nov. Sist. Vyss. Rast. 32, 175-194.

501 Vassiljev, V.N. (1949). Water Caltrops-Hydrocaryaceae Raimann. Flora of the USSR: Vol. 15 Moscow:

502 Publishing House of As of USSR, 637-662.

24 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

503 Wagutu, G.K., Fan, X., Wang, W., Li, W., Chen Y. (2021). The complete chloroplast genome sequence of

504 Trapa kozhevnikovirum Pshenn. (Lythraceae). Mitochondrial DNA Part B: Resources. (Under review)

505 Wan, W.H., 2000. Trapaceae. Flora Republicae Popularis Sinicae, Vol. 53. Science Press, Beijing, China,

506 pp. 1–26 (In Chinese)

507 Wang, L.J., Ding, B.Y., (1997). Study on the chromosomes of three Chinese Trapa species. Ningbo

508 Agricultural Science and Technology 1, 7-9. (In Chinese)

509 Wang, W., Fan, X., Li, X., Chen Y. (2021). The complete chloroplast genome sequence of Trapa incisa

510 Sieb. & Zucc (Lythraceae). Mitochondrial DNA Part B: Resources. (Under review)

511 Xiong, Z.T., Huang, D.S., Wang, H.Q., Sun X.Z. (1990). Numrical taxonomic studies in Trapa in Hubei .

512 Numerical evaluations of taxonomic characters. Plant Sci. J. 8(1), 47-52. (In Chinese with English

513 abstract)

514 Xiong, Z.T., Wang, H.Q., Sun X.Z. (1985a). Numrical taxonomic studies in Trapa in Hubei I. Plant Sci. J.

515 3, 45-53. (In Chinese with English abstract)

516 Xiong, Z.T., Wang, H.Q., Sun X.Z. (1985b). Numrical taxonomic studies in Trapa in Hubei II. Plant Sci. J.

517 3, 157-164. (In Chinese with English abstract)

518 Xu, C., Dong, W., Li, W. et al. (2017). Comparative analysis of six Lagerstroemia complete chloroplast

519 genomes. Front. Plant Sci. 8, 15. doi: 10.3389/fpls.2017.00015

520 Xu, L., Cheng, S., Zhuang, P. et al. (2020). Assessment of the nutrient removal potential of floating native

521 and exotic aquatic macrophytes cultured in Swine Manure Wastewater. Int. J. Environ. Res. Public

522 Health 17, 1103. doi: 10.3390/ijerph17031103

523 Xue, J.H., Xue, Z.Q., Wang, R.X., Rubtsova, T.A., Pshennikova, L.M., Guo, Y.M. (2016). Didtribution

524 pattren and morphological diversity of Trapa L. in the Heilong and Tumen River Basin. J. Plant Sci.

25 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

525 34, 506-520. (In Chinese with English abstract)

526 Xue, Z.Q., Xue, J.H., Victorovna, K.M., Ma, K.P. (2017). The complete chloroplast DNA sequence of

527 Trapa maximowiczii Korsh. (Trapaceae), and comparative analysis with other Myrtales species. Aquat.

528 Bot. 143, 54-62. doi: 10.1016/j.aquabot.2017.09.003

529 Yan S. Z. (1983). Aquatic macrophytes of China. Science press, Beijing.

530 Yan, X., Liu, T., Yuan, X., Xu, Y., Yan, H., Hao, G. (2019). Chloroplast genomes and comparative analyses

531 among thirteen taxa within Myrsinaceae s.str. Clade (Myrsinoideae, Primulaceae). Int. J. Mol. Sci.

532 20(18), 4534. doi: 10.3390/ijms20184534

533 Yang, J., Vázquez, L., Chen, X., Li, H., Zhang, H., Liu, Z., et al. (2017). Development of chloroplast and

534 nuclear DNA markers for Chinese oaks (Quercus subgenus Quercus) and assessment of their utility as

535 DNA barcodes. Front. Plant Sci. 8, 816. doi: 10.3389/fpls.2017.0081

536 Yang, Z., Zhao, T., Ma, Q., Liang, L., Wang, G. (2018). Comparative genomics and phylogenetic analysis

537 revealed the chloroplast genome variation and interspecific relationships of Corylus (Betulaceae)

538 Species. Front. Plant Sci. 9. doi: org/10.3389/fpls.2018.00927

539 Yu, H., Shen, S. (2015). Phenolic composition, antioxidant, antimicrobial and antiproliferative activities of

540 water caltrop pericarps extract. LWT-Food Sci. Technol. 61, 238-243. doi: 10.1016/j.lwt.2014.11.003

541 Zhang, C. Y., Liu, T. J., Mo, X. L., et al. (2020a). Comparative analyses of the chloroplast genomes of

542 patchouli plants and their relatives in Pogostemon (Lamiaceae). Plants, 2020, 9(11), 1497. doi:

543 10.3390/plants9111497

544 Zhang, F., Wang, T., Shu, X., Wang, N., Zhuang, W., Wang, Z. (2020b). Complete chloroplast genomes and

545 comparative analyses of L. chinensis, L. anhuiensis, and L. aurea (Amaryllidaceae). Int. J. Mol. Sci.

546 21(16), 5729. doi: 10.3390/ijms21165729

26 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

547 Zheng, G., Wei, L., Ma, L., Wu, Z., Chen, K. (2020). Comparative analyses of chloroplast genomes from

548 13 Lagerstroemia (Lythraceae) species: identification of highly divergent regions and inference of

549 phylogenetic relationships. Plant Mol. Biol. 102, 659-676. doi: 10.1007/s11103-020-00972-6

550

551

552

553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

27 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

569 Figure legends

570 Figure 1 Structural map of the Trapa chloroplast genome. Genes drawn inside the circle are transcribed

571 clockwise, and those outside are counterclockwise. Small single copy (SSC), large single copy (LSC), and

572 inverted repeats (IRa, IRb) are indicated. Genes belonging to different functional groups are color-coded.

573

574 Figure 2 Codon contents of 20 amino acids and stop codon of coding genes of Trapa chloroplast genome.

575 Color of the histogram corresponds to the color of codons.

576

577 Figure 3 Sequence alignment of whole chloroplast genomes using the Shuffle LAGAN alignment

578 algorithm in mVISTA. Trapa bicornis was chosen to be the reference genome. The vertical scale indicates

579 the percentage of identity, ranging from 50 to 100%.

580

581 Figure 4 Comparison of junctions between the LSC, SSC, and IR regions among 15 species. Distance in

582 the figure is not to scale. LSC Large single-copy, SSC Small single-copy, IR inverted repeat.

583

584 Figure 5 Number of long repetitive repeats on the complete chloroplast genome sequence of 15 Trapa

585 species. (a) frequency of the repeats more than 30 bp, (b) frequency of repeat types.

586

587 Figure 6 The comparison of simple sequence repeats (SSRs) distribution in 15 chloroplast genomes. (a)

588 frequency of common motifs; (b) number of different SSR types.

589

590 Figure 7 The nucleotide variability (Pi) value in the 15 Trapa chloroplast genomes. (a) Intergenic regions.

28 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

591 b Protein-coding genes. These regions are arranged according to their location in the chloroplast genome.

592

593 Figure 8 The phylogenetic tree is based on 21 complete chloroplast genome sequences using Maximum

594 likelihood (ML), Maximum parsimony (MP) and Bayesian inference (BI). The number above the lines

595 indicates bootstrap values for ML and MP and posterior probabilities for BI of the phylogenetic analysis

596 for each clade.

597

29 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.05.17.444415; this version posted May 17, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.