diversity

Article

Complete Chloroplast Genome Sequence and Comparative and Phylogenetic Analyses of the Cultivated Cyperus esculentus

Wei Ren 1,†, Dongquan Guo 1,†, Guojie Xing 1,†, Chunming Yang 1, Yuanyu Zhang 1, Jing Yang 1, Lu Niu 1, Xiaofang Zhong 1, Qianqian Zhao 1, Yang Cui 1, Yongguo Zhao 2,* and Xiangdong Yang 1,*

1 Jilin Provincial Key Laboratory of Agricultural Biotechnology, Jilin Academy of Agricultural Sciences, Changchun 130024, China
2 College of Biology and Food Engineering, Guangdong University of Petrochemical Technology, Maoming 525000, China
* Correspondence: Y.Z. (Yongguo Zhao); X.Y. (Xiangdong Yang)
† These authors contributed equally to this study.

Abstract: Cyperus esculentus accumulates large amounts of oil as one of the main storage reserves in its underground tubers. This makes the crop not only a promising resource for edible oil and biofuel in the food and chemical industries, but also a model system for studying oil accumulation in non-seed tissues. In this study, we determined the chloroplast genome sequence of cultivated C. esculentus (var. sativus Boeckeler). The complete chloroplast genome of C. esculentus was 186,255 bp in size and possessed a typical quadripartite structure containing one large single-copy region (100,940 bp), one small single-copy region (10,439 bp), and a pair of inverted repeat regions (37,438 bp each). Sequence analyses indicated that the chloroplast genome encodes 141 genes, including 93 protein-coding genes, 40 transfer RNA genes, and 8 ribosomal RNA genes. We also identified 396 simple sequence repeats and 49 long repeats, including 15 forward repeats and 34 palindromes, within the chloroplast genome of C. esculentus. Most of these repeats were distributed in noncoding regions. Whole chloroplast genome comparison with four other Cyperus species indicated that both the large single-copy and inverted repeat regions were more divergent than the small single-copy region, with the highest variation found in the inverted repeat regions. In phylogenetic trees based on the complete chloroplast genomes of 13 species, all five Cyperus species formed a clade, and C. esculentus was more closely related to C. rotundus than to the other three Cyperus species. In summary, the chloroplast genome sequence of cultivated C. esculentus provides a valuable genomic resource for species identification, evolutionary studies, and comparative genomic research on this crop species and other Cyperus species in the Cyperaceae family.

Keywords: Cyperus esculentus; chloroplast genome; comparative analysis; phylogeny

Citation: Ren, W.; Guo, D.; Xing, G.; Yang, C.; Zhang, Y.; Yang, J.; Niu, L.; Zhong, X.; Zhao, Q.; Cui, Y.; et al. Complete Chloroplast Genome Sequence and Comparative and Phylogenetic Analyses of the Cultivated Cyperus esculentus. Diversity 2021, 13, 405. https://doi.org/10.3390/d13090405

Academic Editor: Mario A. Pagnotta

Received: 1 July 2021 Accepted: 22 August 2021 Published: 26 August 2021

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Cyperus esculentus L., also known as yellow tigernut, yellow nutsedge, or chufa, is a perennial C4 plant in the sedge family (Cyperaceae), which comprises approximately 5500 species worldwide. It occurs as wild or cultivated varieties and exhibits ecological plasticity and a wide global distribution [1]. Cultivated C. esculentus (var. sativus Boeckeler) originated in the Mediterranean area, where it has been grown for its edible tubers since pre-dynastic Egypt (fourth millennium BC) [1]. The tubers of Cyperus esculentus have been reported to contain approximately 40% starch, 30% oil, 20% sugar, 9% protein, 6% fiber, and high levels of vitamins E and C [2]. In contrast to oil crops that mainly produce oil in seeds or in the mesocarp of certain fruits, C. esculentus might be the only known plant that accumulates a large amount of oil as one of the main storage reserves in its underground tubers [2]. These unique characteristics make it a promising resource for producing edible oil and biofuel in the food and chemical industries, and also provide a novel model system for studying oil accumulation in non-seed tissues [3]. Additionally, as a medicine, C. esculentus has been reported to help boost blood circulation, reduce cardiovascular diseases and heart attacks, and prevent stroke and inflammation in the respiratory passages [4–8]. Previous studies revealed that C. esculentus can inhibit free radicals and/or key enzymes involved in starch digestion, such as α-amylase and α-glucosidase, making this plant a dietary control option for patients with diabetes [9].

As active metabolic centers responsible for photosynthesis and the synthesis of amino acids, nucleotides, fatty acids, phytohormones, vitamins, and other metabolites, chloroplasts play important roles in the physiology and development of land plants and algae [10,11]. In most land plants, chloroplast genomes exhibit highly conserved structure and organization and typically exist as circular DNA molecules of 120–170 kb [12]. Chloroplast genomes generally have a quadripartite structure, with a large single-copy (LSC) region and a small single-copy (SSC) region separated by a pair of inverted repeats (IRs), although the IR regions are missing in some species [13,14]. Specific characteristics of the chloroplast genome, such as its maternal inheritance, haploid nature, and low level of recombination, make it a robust tool for genomic and phylogenetic studies of several plant families [15–18]. Moreover, the considerable variation within chloroplast genomes can provide useful information for evaluating the phylogenetic relationships of taxonomically unresolved plant taxa and for understanding the relationships among the nuclear, chloroplast, and mitochondrial genomes of plants [17,19–21]. C. esculentus exhibits remarkable variability, with several morphotypes, because of its ecological plasticity and wide distribution. However, very little information on the nuclear, chloroplast, and mitochondrial genomes of C. esculentus has been reported. Here, we assembled and annotated the complete chloroplast genome of cultivated C. esculentus, compared the chloroplast genomes of Cyperus species, and inferred the phylogenetic position of C. esculentus in the Cyperaceae family.

2. Materials and Methods

2.1. Plant Materials and Genomic DNA Extraction

Cultivated C. esculentus was collected from plants grown in Wuhan, China, in 2017, and a voucher specimen (herbarium voucher No. JYD-2) was deposited at the Jilin Academy of Agricultural Sciences, Changchun, China. Fresh leaves were collected and immediately stored at −80 °C until analysis. Total genomic DNA was extracted using a modified CTAB method [22]. The integrity, quality, and concentration of the DNA were determined by agarose gel electrophoresis and a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).

2.2. DNA Sequencing and Genome Assembly

High-quality genomic DNA was used to construct libraries with an average fragment size of 350 bp using the Nextera XT DNA Library Preparation Kit (Illumina, San Diego, CA, USA), which were sequenced on the Illumina NovaSeq 6000 platform (Illumina). More than 18.5 million paired-end reads with an average length of 150 bp were generated and quality-filtered using NGS QC Toolkit v2.3.3 [23]. The clean reads were assembled using SPAdes 3.11.0 (http://cab.spbu.ru/software/spades/, accessed on 25 December 2020) [24], and the complete circular chloroplast assembly graph was identified and extracted by visualizing the resulting GFA graph files (e.g., with Bandage).
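Extracting the chloroplast graph is usually done interactively in Bandage, as described above. As a minimal, hypothetical sketch of the underlying idea, the Python below parses segment (S) and link (L) records from a GFA 1 file and groups segments into connected components with their total sequence length, so that a plastid-sized component can be shortlisted. The function names and the toy graph are illustrative assumptions, not part of the authors' pipeline. Note also that in an assembly graph the two IR copies typically collapse into a single path, so a plastid component may total roughly LSC + SSC + one IR rather than the full genome length.

```python
"""Hypothetical sketch: connected components of a GFA 1 assembly graph."""


def parse_gfa(text):
    """Return ({segment_name: sequence_length}, [(from_segment, to_segment), ...])."""
    segments = {}
    links = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            # A '*' sequence means the sequence is stored elsewhere; count it as 0 here.
            segments[fields[1]] = 0 if fields[2] == "*" else len(fields[2])
        elif fields[0] == "L" and len(fields) >= 5:
            links.append((fields[1], fields[3]))
    return segments, links


def components(segments, links):
    """Group linked segments (orientation ignored) with union-find, largest first."""
    parent = {name: name for name in segments}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        if a in parent and b in parent:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for name in segments:
        groups.setdefault(find(name), []).append(name)
    result = [(sorted(members), sum(segments[m] for m in members)) for members in groups.values()]
    return sorted(result, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_gfa = "\n".join([
        "H\tVN:Z:1.0",
        "S\tA\tACGTACGT",
        "S\tB\tGGG",
        "S\tC\tTT",
        "S\tD\tAAAAA",
        "L\tA\t+\tB\t+\t0M",
    ])
    for members, length in components(*parse_gfa(toy_gfa)):
        print(members, length)
```

In practice, a real SPAdes graph would also be filtered by coverage, since plastid segments are usually far more deeply covered than nuclear ones; that step is omitted here.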

2.3. Annotation and Analysis of C. esculentus Chloroplast DNA Sequence

The C. esculentus chloroplast sequence was annotated using PGA [25], and the annotation was checked using BLAST. A circular gene map of the chloroplast genome was drawn using Organellar Genome DRAW v1.3.1 (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html, accessed on 25 December 2020) [26]. The codon usage frequency and relative synonymous codon usage (RSCU) of all protein-coding genes (PCGs) in C. esculentus were analyzed using MEGA X to determine whether the chloroplast genes were under selection [27].
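For reference, RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. RSCU = 1 therefore means no bias, RSCU > 1 means the codon is used more often than expected, and RSCU < 1 means it is used less often. The Python sketch below computes RSCU from a list of coding sequences. It assumes the standard codon assignments, which plastids share, and counts stop codons as their own synonymous family. It illustrates the definition and is not a re-implementation of MEGA X.

```python
"""Sketch: relative synonymous codon usage (RSCU) under the standard genetic code."""
from collections import Counter

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(cds_list):
    """Count in-frame codons over all coding sequences (RNA 'U' is treated as 'T')."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for pos in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[pos:pos + 3]
            if codon in CODON_TO_AA:  # skips codons with ambiguous bases such as N
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / (total count for the amino acid / number of synonymous codons)."""
    families = {}
    for codon, aa in CODON_TO_AA.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the data, so RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


if __name__ == "__main__":
    toy_cds = ["ATGTTATTATTGCTTTAA"]  # hypothetical: Met, Leu x4 (TTA twice), stop
    values = rscu(codon_counts(toy_cds))
    print(round(values["TTA"], 2), round(values["ATG"], 2))
```

In this toy input, leucine is encoded four times across six synonymous codons, so TTA (used twice) gets RSCU = 2 / (4/6) = 3. The RSCU values within each synonymous family always sum to the number of codons in that family.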

2.4. Genome Comparison with Other Cyperus Species

For comparative analysis, the chloroplast genome sequences of four Cyperus species, C. fuscus Linn. (MK431855), C. glomeratus Linn. (MK423990), C. difformis Linn. (MK423991), and C. rotundus Linn. (MT473237), were retrieved from GenBank. The mVISTA program (http://genome.lbl.gov/vista/mvista/about.shtml, accessed on 12 November 2021) in Shuffle-LAGAN mode was used to compare the complete chloroplast genome of C. esculentus with those of the four selected Cyperus species [28]. To identify rapidly evolving regions suitable as molecular markers, the sequences were first aligned using MAFFT v7 [29] and then manually adjusted using BioEdit software (v8.1.0). Sliding window analysis was conducted to evaluate nucleotide diversity (Pi) across the whole chloroplast genome using DnaSP v6 [30], with a window length of 800 bp and a step size of 100 bp. Microsatellites (SSRs) were detected in the chloroplast genome sequences using the MIcroSAtellite identification tool (MISA; http://pgrc.ipk-gatersleben.de/misa, accessed on 13 November 2021) with the following minimum numbers of repeat units: eight for mononucleotide SSRs, five for dinucleotide SSRs, four for trinucleotide SSRs, and three for tetra-, penta-, and hexanucleotide SSRs. The maximum distance between two SSRs for them to be combined into a compound SSR was set to 100 bp [31]. Dispersed long repeats were identified using the REPuter program with a minimum repeat size of 30 bp and a sequence identity of at least 90% [32]. Tandem repeats were identified using the web-based Tandem Repeats Finder (https://tandem.bu.edu/trf/trf.html, accessed on 14 November 2021), with alignment parameters of 2, 7, and 7 for matches, mismatches, and indels, respectively [33].
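The sliding-window step can be illustrated with a short Python sketch. It uses the window length (800 bp) and step (100 bp) given above and computes per-site nucleotide diversity as the mean number of pairwise differences divided by the number of compared sites. As a simplifying assumption, columns containing gaps or ambiguous bases are dropped from each window. DnaSP's own handling of gaps and its exact window coordinates may differ, so this sketch is not a substitute for the published values.

```python
"""Sketch: sliding-window nucleotide diversity (Pi) over a multiple sequence alignment."""
from itertools import combinations

VALID = set("ACGT")


def window_pi(columns):
    """Mean pairwise differences per site, using only gap-free, unambiguous columns."""
    usable = [col for col in columns if all(base in VALID for base in col)]
    if not usable:
        return None
    n_seqs = len(usable[0])
    n_pairs = n_seqs * (n_seqs - 1) // 2
    diffs = sum(a != b for col in usable for a, b in combinations(col, 2))
    return diffs / (n_pairs * len(usable))


def sliding_window_pi(alignment, window=800, step=100):
    """Return (1-based window midpoint, Pi) for each full window of the alignment."""
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = list(zip(*(seq.upper() for seq in alignment)))
    results = []
    for start in range(0, length - window + 1, step):
        midpoint = start + (window + 1) / 2
        results.append((midpoint, window_pi(columns[start:start + window])))
    return results
```

For example, `sliding_window_pi(["AAAA", "AAAT", "AAAT"], window=4, step=1)` yields one window in which the last column contributes 2 differences over 3 pairs and 4 sites, that is, Pi = 2/12. Windows scoring above a chosen threshold (0.2 in Figure 4) would then be flagged as hotspots.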

2.5. Phylogenetic Analysis

Phylogenetic trees were constructed using the chloroplast genome sequences of 13 species from the NCBI Organelle Genome and Nucleotide Resources database: 5 Cyperaceae species (C. esculentus, C. rotundus, C. fuscus, C. glomeratus, and C. difformis), 4 Poaceae species (Brachypodium distachyon, Oryza sativa, Setaria italica, and Sorghum bicolor), 1 Bromeliaceae species (Ananas comosus), 2 Arecaceae species (Elaeis guineensis and Phoenix dactylifera), and 1 Orchidaceae species (Phalaenopsis equestris). The sequences were aligned using MAFFT v7.1 with the FFT-NS-2 strategy [29], and ModelFinder was used to select TVM+F+I+G4 as the best-fit substitution model. A maximum likelihood tree was then constructed using IQ-TREE v1.6 with 1000 bootstrap replicates [34,35], with P. equestris (JF719062) as the outgroup.
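Bootstrap support values of this kind come from resampling alignment columns. As a toy illustration only, the Python sketch below draws nonparametric bootstrap replicates (columns sampled with replacement). Instead of inferring a maximum likelihood tree, it uses a deliberately crude stand-in criterion: the fraction of replicates in which a chosen taxon's nearest neighbour by uncorrected p-distance is a given partner. The sequence names and data are hypothetical. Real support values should come from IQ-TREE itself, which also offers an ultrafast bootstrap approximation.

```python
"""Toy sketch: nonparametric bootstrap of alignment columns with a p-distance proxy."""
import random


def p_distance(a, b):
    """Proportion of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def nearest_neighbour_support(alignment, taxon, partner, replicates=1000, seed=1):
    """Fraction of replicates in which `partner` is the closest sequence to `taxon`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_replicate(alignment, rng)
        others = [name for name in rep if name != taxon]
        # Ties are broken by dictionary order, which is acceptable for a toy proxy.
        closest = min(others, key=lambda name: p_distance(rep[taxon], rep[name]))
        hits += closest == partner
    return hits / replicates


if __name__ == "__main__":
    toy = {  # hypothetical 10-column alignment
        "esculentus": "AAAAAAAAAA",
        "rotundus": "AAAAAAAAAA",
        "fuscus": "CCCCCCCCCC",
        "glomeratus": "CCCCCCCCCG",
    }
    print(nearest_neighbour_support(toy, "esculentus", "rotundus", replicates=100))
```

The design point is that each replicate keeps the alignment length but reweights sites at random, so a grouping supported by many independent sites survives most replicates.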

3. Results and Discussion

3.1. Cyperus esculentus Chloroplast Genome Assembly and Its Features

More than 18.5 million paired-end reads, yielding 5.51 Gb of clean data, were produced on the Illumina NovaSeq 6000 platform. After assembly, a single contig of 186,255 bp spanning the entire chloroplast genome of cultivated C. esculentus was obtained and deposited at NCBI (GenBank accession number: MW542207). A circular map of the complete chloroplast genome is shown in Figure 1.
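A quick arithmetic check links the read count to the data volume. Assuming that "18.5 million paired-end reads" counts read pairs, each contributing two 150-bp reads, the raw yield is about 18.5 × 10^6 × 2 × 150 bp ≈ 5.55 Gb. This is consistent with the 5.51 Gb of clean data reported after filtering. If the figure instead counted single reads, the yield would be only about 2.8 Gb.

```python
def raw_yield_gb(read_pairs, read_length_bp):
    """Total sequenced bases in Gb, assuming two reads of equal length per pair."""
    return read_pairs * 2 * read_length_bp / 1e9


# "More than 18.5 million" is treated as a lower bound and assumed to count read pairs.
print(f"{raw_yield_gb(18.5e6, 150):.2f} Gb")  # compare with the 5.51 Gb of clean data
```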

Figure 1. Circular representation of the complete chloroplast genome of C. esculentus. Genes inside the circle are transcribed clockwise, and those outside the circle are transcribed counterclockwise. Genes with different functions are color-coded. The darker gray in the inner circle shows the GC content, whereas the lighter gray shows the AT content. LSC: large single copy. SSC: small single copy. IRA and IRB: inverted repeats.

Like those of most angiosperms, the C. esculentus chloroplast genome displayed a typical quadripartite structure, with one LSC region of 100,940 bp and one SSC region of 10,439 bp separated by two copies of the IR (IRA and IRB, 37,438 bp each). The total guanine-cytosine (GC) content, an important indicator of the affinity between species, was 33.19% in the C. esculentus chloroplast genome, which is similar to that of closely related species in the genus Cyperus [36]. Specifically, the GC contents of the LSC and SSC regions were 30.96% and 25.11%, respectively, whereas the GC content was higher in the IR regions (37.32%). The high GC content of the IR regions may be attributable to the high GC content of the rRNA and tRNA genes located there [37,38]. The size and GC content of the C. esculentus chloroplast genome are similar to those of C. rotundus, whereas the other three Cyperus species, C. difformis, C. fuscus, and C. glomeratus, have smaller genomes and higher total GC contents (Table 1).

Table 1. Features of the chloroplast genomes of C. esculentus and four related Cyperus species.

| Genome Features | C. esculentus | C. fuscus | C. glomeratus | C. difformis | C. rotundus |
|---|---|---|---|---|---|
| Genome size (bp) | 186,255 | 167,660 | 167,523 | 167,974 | 186,119 |
| LSC length (bp) | 100,940 | 79,318 | 81,756 | 82,970 | 100,961 |
| SSC length (bp) | 10,439 | 12,192 | 9385 | 8150 | 10,414 |
| IR length (bp) | 37,438 | 38,075 | 38,191 | 38,427 | 37,372 |
| Overall GC content (%) | 33.19 | 36.13 | 36.34 | 36.06 | 33.19 |
| GC content in LSC (%) | 30.96 | 35.11 | 35.19 | 34.61 | 30.91 |
| GC content in SSC (%) | 25.11 | 28.32 | 28.73 | 28.15 | 25.64 |
| GC content in IR (%) | 37.32 | 38.45 | 38.5 | 38.48 | 37.37 |
| Number of genes | 141 | 137 | 135 | 137 | 132 |
| Number of PCGs | 93 | 94 | 93 | 91 | 84 |
| Number of tRNA genes | 40 | 35 | 34 | 38 | 40 |
| Number of rRNA genes | 8 | 8 | 8 | 8 | 8 |
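Because each genome is quadripartite, its length should equal LSC + SSC + 2 × IR. The overall GC content should likewise be the length-weighted mean of the regional GC contents, and PCGs + tRNA genes + rRNA genes should equal the total gene count. The Python sketch below applies these three checks to the values transcribed from Table 1. For C. esculentus, for example, 100,940 + 10,439 + 2 × 37,438 = 186,255 bp. Recomputed overall GC values can differ from the reported ones by a few hundredths of a percentage point, because the regional percentages are themselves rounded.

```python
"""Consistency check of Table 1: region sizes, weighted GC content, and gene totals."""

# species: (genome, LSC, SSC, IR, GC all, GC LSC, GC SSC, GC IR, genes, PCGs, tRNAs, rRNAs)
TABLE1 = {
    "C. esculentus": (186_255, 100_940, 10_439, 37_438, 33.19, 30.96, 25.11, 37.32, 141, 93, 40, 8),
    "C. fuscus": (167_660, 79_318, 12_192, 38_075, 36.13, 35.11, 28.32, 38.45, 137, 94, 35, 8),
    "C. glomeratus": (167_523, 81_756, 9_385, 38_191, 36.34, 35.19, 28.73, 38.50, 135, 93, 34, 8),
    "C. difformis": (167_974, 82_970, 8_150, 38_427, 36.06, 34.61, 28.15, 38.48, 137, 91, 38, 8),
    "C. rotundus": (186_119, 100_961, 10_414, 37_372, 33.19, 30.91, 25.64, 37.37, 132, 84, 40, 8),
}


def check(row):
    """Return (size consistent?, recomputed overall GC %, gene total consistent?)."""
    genome, lsc, ssc, ir, _gc_all, gc_lsc, gc_ssc, gc_ir, genes, pcg, trna, rrna = row
    size_ok = lsc + ssc + 2 * ir == genome  # the IR is counted twice
    gc_recomputed = (lsc * gc_lsc + ssc * gc_ssc + 2 * ir * gc_ir) / genome
    genes_ok = pcg + trna + rrna == genes
    return size_ok, gc_recomputed, genes_ok


for species, row in TABLE1.items():
    size_ok, gc_recomputed, genes_ok = check(row)
    print(f"{species}: size OK={size_ok}, GC reported={row[4]:.2f}, "
          f"recomputed={gc_recomputed:.2f}, gene total OK={genes_ok}")
```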

The C. esculentus chloroplast genome was predicted to contain 141 genes, including 93 protein-coding genes (PCGs), 40 tRNA genes, and 8 rRNA genes. Four of these genes (infA, trnL-UAA, rps18, and accD) were found only in the chloroplast genome of C. esculentus and are absent from those of the other four Cyperus species. Of these genes, 29 PCGs, 18 tRNA genes, and 8 rRNA genes were located in the IR regions and were therefore duplicated, whereas 58 PCGs and 20 tRNA genes were found in the LSC region, and six PCGs and two tRNA genes were found in the SSC region. Furthermore, 17 PCGs and eight tRNA genes contained introns (Table 2, Table S1). All of these genes contain a single intron except ycf3 and rps12, which contain two. Seven of the intron-containing genes (ndhA, ycf68, ndhB, rps12, rpl2, trnA-UGC, and trnI-GAU) are located in the IR regions and are thus duplicated. The other 11 genes are located in the LSC region, and no intron-containing genes were found in the SSC region.

Table 2. Genes identified in the chloroplast genome of C. esculentus.

| Category | Function | Genes |
|---|---|---|
| RNA genes | Transfer RNA | trnA-UGC *, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnfM-CAU, trnG-GCC *, trnG-UCC, trnH-GUG, trnI-CAU, trnI-GAU *, trnK-UUU *, trnL-CAA, trnL-UAA *, trnL-UAG, trnN-GUU, trnP-UGG, trnQ-UUG, trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, trnV-UAC *, trnW-CCA, trnY-GUA |
| | Ribosomal RNA | rrn23, rrn16, rrn5, rrn4.5 |
| Transcription and translation related genes | Transcription and splicing | rpoC1 *, rpoC2, rpoA, rpoB |
| | Translation, ribosomal proteins | rps2, rps3, rps4, rps7, rps8, rps11, rps12 **, rps14, rps15, rps16 *, rps18, rps19, rpl2 *, rpl14, rpl16 *, rpl20, rpl22, rpl32, rpl33, rpl36 |
| Photosynthesis | ATP synthase | atpE, atpB, atpA, atpF *, atpH, atpI |
| | Photosystem I | psaI, psaB, psaA, psaC, psaJ, ycf3 **, ycf4 |
| | Photosystem II | psbD, psbC, psbZ, psbT, psbH, psbK, psbI, psbJ, psbF, psbE, psbM, psbN, psbL, psbA, psbB |
| | Calvin cycle | rbcL |
| | Cytochrome complex | petN, petA, petL, petG, petB *, petD * |
| | NADH dehydrogenase | ndhB *, ndhI, ndhK, ndhC, ndhF, ndhD, ndhG, ndhE, ndhA *, ndhH, ndhJ |
| Other genes | Conserved reading frames | infA, ycf68 *, ycf2, accD, cemA, ccsA, matK |

* Genes containing one intron; ** genes containing two introns.
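The intron counts given in the text follow from Table 2 once IR duplication is taken into account, because genes located in the IRs are present in two copies. The Python sketch below encodes the intron-containing genes marked in Table 2 (* one intron, ** two introns) and the seven IR-located genes named in the text. It recovers 17 PCG copies, 8 tRNA gene copies, and 11 single-copy intron-containing genes. The data structures are transcribed for illustration only.

```python
"""Sketch: counting intron-containing gene copies from Table 2, allowing for IR duplication."""

# Number of introns per gene, from the * / ** marks in Table 2
INTRONS = {
    "trnA-UGC": 1, "trnG-GCC": 1, "trnI-GAU": 1, "trnK-UUU": 1, "trnL-UAA": 1, "trnV-UAC": 1,
    "rpoC1": 1, "rps12": 2, "rps16": 1, "rpl2": 1, "rpl16": 1, "atpF": 1,
    "ycf3": 2, "petB": 1, "petD": 1, "ndhB": 1, "ndhA": 1, "ycf68": 1,
}
# Intron-containing genes reported in the IR regions, hence present twice
IR_GENES = {"ndhA", "ycf68", "ndhB", "rps12", "rpl2", "trnA-UGC", "trnI-GAU"}


def copy_number(gene):
    """Two copies for IR genes, one copy otherwise."""
    return 2 if gene in IR_GENES else 1


trna_copies = sum(copy_number(g) for g in INTRONS if g.startswith("trn"))
pcg_copies = sum(copy_number(g) for g in INTRONS if not g.startswith("trn"))
single_copy = sorted(g for g in INTRONS if g not in IR_GENES)
two_intron_genes = sorted(g for g, n in INTRONS.items() if n == 2)

print(pcg_copies, trna_copies, len(single_copy), two_intron_genes)
```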

Bias in codon usage plays a crucial role in regulating gene expression and cellular function and provides an additional means for studying speciation and evolution at the molecular level [39]. Based on the protein-coding sequences in the C. esculentus chloroplast genome, the effective number of codons and the frequency of codon usage were calculated (Table S2). These PCGs encoded a total of 22,609 codons. Of these, leucine (2442 codons, 10.8% of the total) and cysteine (250 codons, 1.1% of the total) were the most and least abundant amino acids, respectively, consistent with observations in many other angiosperm chloroplast genomes [38]. AAA (encoding lysine) was the most frequently used codon, whereas UAG (a stop codon) was the least used (Table S2). We further calculated RSCU values to detect codon usage bias in the coding sequences of C. esculentus. UUA (leucine), AGA (arginine), and UCU (serine) showed high RSCU values (>2.0), indicating that these codons were used more frequently than expected under uniform synonymous codon usage. In contrast, GCG (alanine), GGC (glycine), and CUG (leucine) showed the lowest RSCU value (0.26). No usage bias (RSCU = 1) was found for methionine (AUG) or tryptophan (UGG). Overall, codon usage in the C. esculentus chloroplast genome was biased towards A- or U-ending codons: 30 of the 31 preferred codons (RSCU > 1) ended with A or U (Table S2). Similar A- or U-ending codon usage bias has been observed in other species [38,40]. In contrast, most G- or C-ending codons had RSCU values < 1, indicating that they are less commonly used in C. esculentus chloroplast genes. In future work, the species-specific patterns of synonymous codon usage could be used to investigate gene expression regulation, speciation, and evolution in C. esculentus and other Cyperus species.
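The classification used in this paragraph can be expressed as a small Python helper: preferred codons have RSCU > 1, and RSCU = 1 indicates no bias. In the example, the values above 2 are hypothetical placeholders. Only the 0.26 for CUG and the 1.0 for the single-codon AUG are taken from the text. Applied to the full set of RSCU values in Table S2, the same function should return the 31 preferred codons and the 30 of them that end in A or U.

```python
def preferred_codons(rscu_values, threshold=1.0):
    """Return (preferred codons with RSCU > threshold, those ending in A or U/T), both sorted."""
    preferred = sorted(codon for codon, value in rscu_values.items() if value > threshold)
    au_ending = [codon for codon in preferred if codon[-1] in "AUT"]
    return preferred, au_ending


# Hypothetical RSCU values (placeholders except CUG = 0.26 and AUG = 1.0)
example = {"UUA": 2.1, "AGA": 2.05, "UCU": 2.02, "UUC": 1.1, "CUG": 0.26, "AUG": 1.0}
preferred, au_ending = preferred_codons(example)
print(len(preferred), len(au_ending), preferred)
```

AUG is excluded because RSCU = 1 marks the absence of bias rather than a preference.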

3.2. Long Repeats and Simple Sequence Repeats (SSRs) in the C. esculentus Chloroplast Genome

Dispersed repeat sequences are considered to play an important role in chloroplast genome rearrangement and recombination [21,41]. Long repetitive sequences have also been used as valuable markers in studies of plant evolution, comparative genomics, and phylogenetics [42]. In the present study, we identified two types of long repeats in the C. esculentus chloroplast genome, comprising 15 forward repeats and 34 palindromic repeats; no reverse or complement repeats were detected (Table 3). Most of the repeats (46 of 49, 93.9%) ranged from 64 to 385 bp in length, and only three repeats were longer than 600 bp. Of the 49 long repeats, 20 were located only in the LSC region, 12 only in the IR regions, and one only in the SSC region. Sixteen repeats spanned more than one region, for example, both the LSC and IR regions or both the SSC and IR regions (Table 3). In addition, four forward repeats and two palindromic repeats were detected in the IR regions (IRA and IRB) and were thus duplicated. Among these 49 repeats, most (85.71%) were located in intergenic spacers. Seven and two forward repeats were found in the protein-coding genes rpoC2 and infA, respectively, whereas rpl33 and rpoB each contained one palindromic repeat and one forward repeat.
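REPuter distinguishes four repeat types. In a forward repeat, the second copy reads the same as the first; in a palindromic repeat, the second copy is the reverse complement of the first; reverse and complement repeats are defined analogously. The minimal Python sketch below classifies an exact repeat pair from its 1-based start positions and size, as listed in Table 3. It handles only mismatch-free pairs (all pairs in Table 3 have 0 mismatches) and does not reproduce REPuter's search algorithm.

```python
"""Sketch: classify an exact repeat pair as forward, reverse, complement, or palindromic."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    return complement(seq)[::-1]


def classify_repeat(genome, start1, start2, size):
    """Return 'F', 'P', 'R', or 'C' for two copies (1-based starts), or None if no exact match."""
    genome = genome.upper()
    first = genome[start1 - 1:start1 - 1 + size]
    second = genome[start2 - 1:start2 - 1 + size]
    if len(first) < size or len(second) < size:
        raise ValueError("repeat extends beyond the sequence")
    if first == second:
        return "F"  # forward (direct) repeat
    if first == reverse_complement(second):
        return "P"  # palindromic: the same sequence read on the opposite strand
    if first == second[::-1]:
        return "R"  # reverse
    if first == complement(second):
        return "C"  # complement
    return None


if __name__ == "__main__":
    toy_genome = "ACGTTT" + "GGGG" + "AAACGT"  # hypothetical sequence
    print(classify_repeat(toy_genome, 1, 11, 6))  # AAACGT is the reverse complement of ACGTTT
```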

Table 3. Size, type, and location of the long repeats in the C. esculentus chloroplast genome.

| ID | Repeat Start 1 | Type | Size (bp) | Repeat Start 2 | Mismatch (bp) | E-Value | Gene | Region |
|---|---|---|---|---|---|---|---|---|
| 1 | 2942 | F | 1175 | 31,558 | 0 | 0 | IGS | SSC; IRB |
| 2 | 2942 | P | 1175 | 163,961 | 0 | 0 | IGS | SSC; IRA |
| 3 | 120,442 | F | 697 | 143,897 | 0 | 0 | IGS | LSC |
| 4 | 18,319 | P | 385 | 76,067 | 0 | 1.57 × 10^−222 | IGS; rpoB | IRB; LSC |
| 5 | 76,067 | F | 385 | 177,990 | 0 | 1.57 × 10^−222 | rpoB; IGS | LSC; IRA |
| 6 | 46,192 | P | 235 | 147,126 | 0 | 3.2 × 10^−132 | IGS | IRB; LSC |
| 7 | 147,126 | F | 235 | 150,267 | 0 | 3.2 × 10^−132 | IGS | LSC; IRA |
| 8 | 69,193 | F | 216 | 69,337 | 0 | 8.8 × 10^−121 | rpoC2 | LSC |
| 9 | 65,231 | F | 209 | 122,316 | 0 | 1.44 × 10^−116 | IGS | LSC |
| 10 | 15,879 | P | 206 | 20,605 | 0 | 9.22 × 10^−115 | IGS | IRB |
| 11 | 15,879 | F | 206 | 175,883 | 0 | 9.22 × 10^−115 | IGS | IRB; IRA |
| 12 | 20,605 | F | 206 | 180,609 | 0 | 9.22 × 10^−115 | IGS | IRB; IRA |
| 13 | 175,883 | P | 206 | 180,609 | 0 | 9.22 × 10^−115 | IGS | IRB |
| 14 | 6115 | F | 135 | 6246 | 0 | 5.14 × 10^−72 | IGS | SSC |
| 15 | 143,213 | F | 117 | 144,757 | 0 | 3.53 × 10^−61 | IGS | LSC |
| 16 | 66,316 | F | 116 | 146,776 | 0 | 1.41 × 10^−60 | infA; IGS | LSC |
| 17 | 69,266 | F | 106 | 69,746 | 0 | 1.48 × 10^−54 | rpoC2 | LSC |
| 18 | 69,410 | F | 106 | 69,746 | 0 | 1.48 × 10^−54 | rpoC2 | LSC |
| 19 | 41,590 | F | 95 | 122,865 | 0 | 6.22 × 10^−48 | IGS | IRB; LSC |
| 20 | 122,865 | P | 95 | 155,009 | 0 | 6.22 × 10^−48 | IGS | LSC; IRA |
| 21 | 64,509 | F | 92 | 142,999 | 0 | 3.98 × 10^−46 | IGS | LSC |
| 22 | 44,316 | F | 89 | 47,901 | 0 | 2.55 × 10^−44 | IGS | IRB; LSC |
| 23 | 47,901 | P | 89 | 152,289 | 0 | 2.55 × 10^−44 | IGS | LSC; IRA |
| 24 | 69,673 | F | 89 | 69,721 | 0 | 2.55 × 10^−44 | rpoC2 | LSC |
| 25 | 143,776 | F | 89 | 145,353 | 0 | 2.55 × 10^−44 | IGS | LSC |
| 26 | 69,529 | F | 88 | 69,577 | 0 | 1.02 × 10^−43 | rpoC2 | LSC |
| 27 | 20,178 | P | 81 | 58,390 | 0 | 1.67 × 10^−39 | IGS | IRB; LSC |
| 28 | 29,232 | F | 81 | 29,406 | 0 | 1.67 × 10^−39 | IGS | IRB |
| 29 | 29,232 | P | 81 | 167,207 | 0 | 1.67 × 10^−39 | IGS | IRB; IRA |
| 30 | 29,406 | P | 81 | 167,381 | 0 | 1.67 × 10^−39 | IGS | IRB; IRA |
| 31 | 58,390 | F | 81 | 176,435 | 0 | 1.67 × 10^−39 | IGS | LSC; IRA |
| 32 | 167,207 | F | 81 | 167,381 | 0 | 1.67 × 10^−39 | IGS | IRA |
| 33 | 143,058 | F | 79 | 145,560 | 0 | 2.67 × 10^−38 | IGS | LSC |
| 34 | 69,193 | F | 72 | 69,481 | 0 | 4.38 × 10^−34 | rpoC2 | LSC |
| 35 | 66,236 | F | 69 | 146,696 | 0 | 2.8 × 10^−32 | infA; IGS | LSC |
| 36 | 16,930 | P | 68 | 118,533 | 0 | 1.12 × 10^−31 | IGS; rpl33 | IRB; LSC |
| 37 | 118,533 | F | 68 | 179,696 | 0 | 1.12 × 10^−31 | rpl33; IGS | LSC; IRA |
| 38 | 145,380 | F | 68 | 146,273 | 0 | 1.12 × 10^−31 | IGS | LSC |
| 39 | 41,417 | F | 67 | 122,682 | 0 | 4.48 × 10^−31 | IGS | IRB; LSC |
| 40 | 65,220 | F | 67 | 142,805 | 0 | 4.48 × 10^−31 | IGS | LSC |
| 41 | 122,682 | P | 67 | 155,210 | 0 | 4.48 × 10^−31 | IGS | LSC; IRA |
| 42 | 19,762 | P | 66 | 29,760 | 0 | 1.79 × 10^−30 | IGS | IRB |
| 43 | 19,762 | F | 66 | 166,868 | 0 | 1.79 × 10^−30 | IGS | IRB; IRA |
| 44 | 29,760 | F | 66 | 176,866 | 0 | 1.79 × 10^−30 | IGS | IRB; IRA |
| 45 | 53,694 | F | 66 | 143,045 | 0 | 1.79 × 10^−30 | IGS | LSC |
| 46 | 166,868 | P | 66 | 176,866 | 0 | 1.79 × 10^−30 | IGS | IRA |
| 47 | 64,721 | F | 64 | 143,253 | 0 | 2.87 × 10^−29 | IGS | LSC |
| 48 | 64,721 | F | 64 | 144,797 | 0 | 2.87 × 10^−29 | IGS | LSC |
| 49 | 69,266 | F | 64 | 69,698 | 0 | 2.87 × 10^−29 | rpoC2 | LSC |

F: forward; P: palindromic; IGS: intergenic spacer.
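The Region column can be reproduced from the start coordinates if one assumption holds: the genome is numbered in the order shown in the Figure 4 caption (SSC, IRB, LSC, IRA), beginning with the SSC at position 1. This assumption is consistent with most rows of Table 3. With the region sizes from Table 1, the boundaries fall at 10,439 (SSC end), 47,877 (IRB end), 148,817 (LSC end), and 186,255 (IRA end). The Python sketch below maps a 1-based position to its region.

```python
"""Sketch: map a 1-based chloroplast coordinate to its region, assuming SSC-IRB-LSC-IRA order."""
from bisect import bisect_left
from itertools import accumulate

# Region sizes (bp) from Table 1, in the assumed order of the coordinate system
LAYOUT = [("SSC", 10_439), ("IRB", 37_438), ("LSC", 100_940), ("IRA", 37_438)]
NAMES = [name for name, _ in LAYOUT]
ENDS = list(accumulate(size for _, size in LAYOUT))  # [10439, 47877, 148817, 186255]


def region_of(position):
    """Return the region containing a 1-based position."""
    if not 1 <= position <= ENDS[-1]:
        raise ValueError(f"position must be in 1..{ENDS[-1]}")
    # bisect_left finds the first region whose end coordinate is >= position
    return NAMES[bisect_left(ENDS, position)]


if __name__ == "__main__":
    for start in (2942, 31_558, 120_442, 177_990):  # start coordinates from Table 3
        print(start, region_of(start))
```

Because a repeat copy may cross a boundary, a fuller version would also check the end coordinate (start + size − 1) and report both regions when they differ.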

Simple sequence repeats (SSRs) are short stretches of tandemly repeated DNA motifs that typically exhibit high levels of polymorphism, even among closely related species. Chloroplast SSRs are potentially useful molecular markers for population genetics and polymorphism studies [43]. In the present study, 396 SSRs were detected in the C. esculentus chloroplast genome, distributed in the LSC, SSC, and IR regions (Table S3). These SSRs included 339 mononucleotide (85.6%), 22 dinucleotide (5.56%), 10 trinucleotide (2.53%), 23 tetranucleotide (5.81%), and two pentanucleotide (0.51%) repeats; no hexanucleotide repeats were found. Most SSRs were of the A/T type (378, 95.45%), whereas only 18 (4.55%) were of the C/G type, indicating that SSRs in the C. esculentus chloroplast genome are generally composed of short polyadenine or polythymine repeats (Table S3), consistent with earlier studies [38]. Although most SSRs (297, 75%) were found in intergenic spacers, 99 SSRs were identified in 32 PCGs and six tRNA genes. Most of these genes contained mono- or dinucleotide SSRs, whereas six protein-coding genes, ndhF, atpF, psaI, cemA, petB, and rpl16, each contained a 12-bp tetranucleotide SSR. The extensive variation in the length, number, and distribution of these repeat sequences can be used to develop specific markers to investigate the diversity and differentiation of C. esculentus and related species in the genus Cyperus.
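The SSR search criteria listed in Section 2.4 can be illustrated with a simplified, MISA-like scan in Python. This sketch has several limitations: it reports only perfect SSRs, scans greedily from left to right, ignores compound SSRs, and skips motifs that are themselves repeats of a shorter unit (e.g., ATAT). It is an approximation for illustration and will not reproduce MISA's output exactly.

```python
"""Simplified, MISA-like detection of perfect SSRs (illustrative sketch only)."""
from collections import Counter

# Minimum number of repeat units per motif length (1-6 bp), following Section 2.4
MIN_UNITS = {1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """False if the motif is itself a tandem repeat of a shorter unit (e.g. 'ATAT' = 'AT' x 2)."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq):
    """Greedy left-to-right scan; returns (0-based start, motif, number of units) tuples."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in range(1, 7):
            motif = seq[i:i + k]
            if len(motif) < k or "N" in motif or not is_primitive(motif):
                continue
            units = 1
            while seq[i + units * k:i + (units + 1) * k] == motif:
                units += 1
            if units >= MIN_UNITS[k]:
                found = (i, motif, units)
                break
        if found:
            hits.append(found)
            i += found[2] * len(found[1])  # jump past the whole repeat
        else:
            i += 1
    return hits


def motif_length_counts(ssrs):
    """Tally SSRs by motif length (1 = mononucleotide, 2 = dinucleotide, ...)."""
    return Counter(len(motif) for _, motif, _ in ssrs)
```

For example, `find_ssrs("GC" + "A" * 10 + "GC" + "AT" * 5 + "G")` reports a 10-unit A run at position 2 and a 5-unit AT repeat at position 14, whereas a run of seven A's falls below the eight-unit threshold and is ignored. Applied to a full chloroplast sequence, `motif_length_counts` gives the kind of mononucleotide-to-pentanucleotide breakdown reported above. The exact numbers would still depend on how MISA treats compound and overlapping repeats.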

3.3. Comparative Analysis of Chloroplast Genomic Structure with Other Cyperus Species

To further understand the structural characteristics of the C. esculentus chloroplast genome, sequence alignments were performed using the five sequenced chloroplast genomes of Cyperus species: C. esculentus, C. rotundus, C. difformis, C. fuscus, and C. glomeratus. C. esculentus and C. rotundus have the largest chloroplast genomes (186,255 and 186,119 bp, respectively) and the largest LSC regions (100,940 and 100,961 bp, respectively). All five species have IR regions of similar sizes (37,372–38,427 bp), whereas their LSC and SSC regions vary in size (Table 1). Differences in chloroplast genome length among these Cyperus species therefore mainly reflect differences in the lengths of the LSC and SSC regions. A detailed comparative analysis of the junctions between the IRs and the two single-copy regions, LSC/IRA (JLA), LSC/IRB (JLB), SSC/IRA (JSA), and SSC/IRB (JSB), was conducted, together with an analysis of the placement of adjacent genes in the chloroplast genomes of the five Cyperus species. Six genes, psbA, rpl22, rps19, rps3, rpl16, and trnH, were detected at the junctions of the LSC and IRs (Figure 2).

Figure 2. Comparison of the junctions between the IRs and the two single-copy regions (LSC and SSC) in the chloroplast genomes of five Cyperus species. Colored boxes above or below the main line indicate adjacent border genes. Distances between genes and boundaries are given in base pairs (bp). JLA: junction between LSC and inverted repeat A (IRA). JLB: junction between LSC and IRB. JSA: junction between SSC and IRA. JSB: junction between SSC and IRB.

In C. esculentus, rps3 is located entirely within the IR regions, 1728 bp from JLB, whereas in C. rotundus, rpl22 lies closer to JLB and JLA (2257 bp) than rps3 does within the IR regions. In C. difformis and C. fuscus, rps19 is located entirely in the IR regions, 12 bp from JLB and JLA, whereas in C. glomeratus it spans the LSC/IR boundaries. The rpl16 gene, which lies entirely within the LSC region in C. rotundus, spans the LSC/IRA boundary in C. esculentus. Among the three genes detected at the junctions of the SSC and IRs, ndhE and ndhF were located in the SSC in all five genomes, 79–168 and 248–401 bp away from JSB and JSA, respectively, whereas ndhG was located in the IR regions, 37–154 bp from JSB and JSA. We also found that C. esculentus and C. rotundus contained the same genes at JLB (trnH) and JLA (rpl16), whereas the duplicated rps19 was detected at similar locations in C. fuscus, C. glomeratus, and C. difformis. In contrast, in C. esculentus and C. rotundus, the duplicated rps19 (in the IR regions) was farther from both JLA and JLB than rps3 and rpl22. In addition, psbA and trnH were both located in the LSC region, with psbA closer to JLB (168–404 bp) in C. fuscus, C. glomeratus, and C. difformis, whereas in C. esculentus and C. rotundus, trnH was closer to JLB (31 bp) than psbA. Moreover, the three genes ndhG, ndhE, and ndhF showed the same distances to JSB and JSA in C. fuscus, C. glomeratus, and C. difformis, which differed from those in C. esculentus and C. rotundus. These results indicate that C. esculentus and C. rotundus are similar in genomic structure, as are C. fuscus, C. glomeratus, and C. difformis.

Expansion and contraction of the IR regions at their borders play important roles in evolution and usually generate length variation in plant chloroplast genomes [44]. Compared with the IR-SSC boundaries (JSB and JSA), the IR-LSC boundaries (JLB and JLA) were highly variable among the five Cyperus species. This IR-LSC boundary variation likely results from expansion of the IR regions: in C. esculentus and C. rotundus, rps3 and rpl22 lie within the IR regions and are thus duplicated, whereas in the other three species these two genes are located in the LSC region.

We further analyzed differences among the chloroplast sequences of the five Cyperus species using mVISTA, with C. glomeratus as the reference. The chloroplast genome sequences of C. esculentus and C. rotundus were highly similar (Figure 3), and the sequences of C. fuscus, C. glomeratus, and C. difformis differed little from one another. However, marked differences were found between the C. esculentus/C. rotundus group and the C. fuscus/C. glomeratus/C. difformis group. Highly variable regions were mainly concentrated in noncoding sequences; only four highly variable regions containing genes (infA, rpoC2, rps18, and accD) were found (Figure 3). Moreover, both the LSC and IR regions were more divergent than the SSC region in the chloroplast genome of C. esculentus, with the two most variable regions (Pi > 0.32) located in rps15-trnR in the IR regions. This contrasts with observations in other species, in which the IR regions usually show lower variability than the LSC and SSC regions [38]. Eight other regions with high variability (0.2 < Pi < 0.32) were located in the LSC region: the IGS rps19-psbA, rps16-atpA, atpI-rpoC2, rpoB-rbcL, rps12-psbB, petD-rpoA, rps11-rps14, and psaA-rps3. Only one variable region was found in the SSC region (IGS ndhF-trnL) (Figure 4). The high variability in these regions, especially in the LSC and IR regions, provides information for developing markers for molecular classification and phylogenetic analysis of C. esculentus.

Figure 3. Global alignment of the chloroplast genomes of five Cyperus species with C. glomeratus as the reference. The analysis was performed using the mVISTA program. Gray arrows and thick black lines above the alignment indicate genes with their orientation and the positions of the inverted repeats (IRs), respectively. The y-axis shows percent identity from 50% to 100%.

Figure 4. Sliding window analysis of the whole chloroplast genome of C. esculentus. X-axis: position of the midpoint of each window. Y-axis: nucleotide diversity (Pi) of each window; the threshold for variation hotspots was set to 0.2. The most variable hotspots are labeled on the graph. The chloroplast genome is shown in the order SSC-IRB-LSC-IRA.

3.4. Phylogenetic Analysis of C. esculentus

Complete chloroplast sequences are valuable for deciphering phylogenetic relationships, particularly between closely related taxa or where recent divergence, rapid speciation, or slow genome evolution has resulted in limited sequence variation [17,45,46]. Among the more than 950 species in the genus Cyperus, C. esculentus displays wide genetic variability and is often confused with other Cyperus species [47]. In this study, we constructed a phylogenetic tree from complete chloroplast genome sequences to estimate the relationships between C. esculentus and 12 other angiosperm species across various taxonomic levels, with P. equestris as the outgroup. The phylogeny was largely congruent with previous hypotheses on the evolutionary relationships of C. esculentus [47]. All five Cyperus species were placed closer to the Poaceae species than to Ananas comosus (Bromeliaceae) and the two Arecales species (Phoenix dactylifera and Elaeis guineensis), confirming the current taxonomic classification (i.e., C. esculentus belongs to the Poales lineage). Among the five Cyperus species, C. esculentus was most closely related to C. rotundus, whereas the other three species, C. fuscus, C. glomeratus, and C. difformis, were very closely related to one another (Figure 5). These results are also consistent with previous comparative analyses, in which C. esculentus showed a closer relationship with C. rotundus [48].

Figure 5. Phylogenetic tree of the chloroplast genomes of 13 species, constructed using the maximum likelihood method. The five Cyperus species are shown in the upper part of the tree, and P. equestris was used as the outgroup.

4. Conclusions

In this study, we report the complete chloroplast genome sequence of cultivated C. esculentus for the first time. The genome showed features similar to those of C. rotundus: both have a larger genome size, a longer LSC region, and a lower total GC content than the other three Cyperus species. Moreover, C. esculentus contained the largest number of genes among the five Cyperus species. We identified 396 SSRs and 49 long repeats that can be used for population genetics and evolutionary studies. Comparative analysis further revealed that the chloroplast genome of C. esculentus had a structure and composition similar to that of C. rotundus, but significantly different from those of C. fuscus, C. glomeratus, and C. difformis. Both the LSC and IR regions were more divergent than the SSC region in the chloroplast genome of C. esculentus, with the two most variable regions (rps15-trnR) found in the IR regions. This is an exception to the general observation in other species that the IR regions are usually less variable than the LSC and SSC regions. In addition, eight variable regions were distributed in the single copy regions, and only one variable region was found in the SSC region. The phylogenetic results demonstrate that Cyperus belongs to the Poales lineage and that C. esculentus is most closely related to C. rotundus.

Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/d13090405/s1, Table S1: Summary of introns and exons of genes in chloroplast genome of C. esculentus, Table S2: Codon usage of chloroplast genome of C. esculentus, Table S3: SSRs identified in the chloroplast genome of C. esculentus.

Author Contributions: X.Y. and Y.Z. (Yongguo Zhao) designed the experiments; W.R., D.G. and G.X. prepared the sample, performed the experiments, and analyzed the data; C.Y., Y.Z. (Yuanyu Zhang), J.Y., L.N., X.Z., Q.Z. and Y.C. made revisions to the final manuscript. All authors participated in the manuscript revision. All authors have read and agreed to the published version of the manuscript.

Funding: This work was supported by the National Key Research and Development Program of China (2019YFC0507601, 2019YFC0507602).

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: The entire chloroplast genome sequence of the cultivated C. esculentus was obtained and deposited at NCBI (GenBank accession number: MW542207).

Acknowledgments: We thank the reviewers who helped improve our manuscript.

Conflicts of Interest: The authors declare no conflict of interest.

References
1. De Castro, O.; Gargiulo, R.; Del Guacchio, E.; Caputo, P.; De Luca, P. A molecular survey concerning the origin of Cyperus esculentus (Cyperaceae, Poales): Two sides of the same coin (weed vs. crop). Ann. Bot. 2015, 115, 733–745.
2. Turesson, H.; Marttila, S.; Gustavsson, K.-E.; Hofvander, P.; Olsson, M.E.; Bülow, L.; Stymne, S.; Carlsson, A.S. Characterization of oil and starch accumulation in tubers of Cyperus esculentus var. sativus (Cyperaceae): A novel model system to study oil reserves in nonseed tissues. Am. J. Bot. 2010, 97, 1884–1893.
3. Yang, Z.; Ji, H.; Liu, D. Oil biosynthesis in underground oil-rich storage vegetative tissue: Comparison of Cyperus esculentus tuber with oil seeds and fruits. Plant Cell Physiol. 2016, 57, 2519–2540.
4. Hassan, H.A. Effect of dietary supplementation with tigernut tubers on streptozotocin-induced diabetic rats. Egypt J. Hosp. Med. 2007, 29, 475–485.
5. Maritim, A.C.; Sanders, R.A.; Watkins, J.B., III. Diabetes, oxidative stress, and antioxidants: A review. J. Biochem. Mol. Toxicol. 2003, 17, 24–38.
6. Olabiyi, A.A.; Oboh, G.; Adefegha, S.A. Effect of dietary supplementation of tiger nut (Cyperus esculentus L.) and walnut (Tetracarpidium conophorum Müll. Arg.) on sexual behavior, hormonal level, and antioxidant status in male rats. J. Food Biochem. 2017, 41, e12351.
7. Sabiu, S.; Oladipo Ajani, E.; Sunmonu, T.O.; Tom Ashafa, A.O. Kinetics of modulatory role of Cyperus esculentus L. on the specific activity of key carbohydrate metabolizing enzymes. Afr. J. Tradit. Complement. Altern. Med. 2017, 14, 46–53.
8. Sánchez-Zapata, E.; Fernández-López, J.; Angel Pérez-Alvarez, J. Tiger nut (Cyperus esculentus) commercialization: Health aspects, composition, properties, and food applications. Compr. Rev. Food Sci. Food Saf. 2012, 11, 366–377.
9. Elshebini, S.M.; Moaty, M.; Tapozada, S.T.; Hanna, L.M.; Mohamed, H.I.; Raslan, H.M. Effect of regular consumption of tiger nut (Cyperus esculentus) on insulin resistance and tumor necrosis factor-alpha in obese type 2 diabetic Egyptian women. Med. J. Cairo Univ. 2010, 78, 607–614.
10. Gray, M.W. The evolutionary origins of organelles. Trends Genet. 1989, 5, 294–299.
11. Howe, C.J.; Barbrook, A.C.; Koumandou, V.L.; Nisbet, R.; Symington, H.A.; Wightman, T.F. Evolution of the chloroplast genome. Philos. Trans. R. Soc. Lond. B Biol. 2003, 358, 99–107.

12. Wicke, S.; Schneeweiss, G.M.; dePamphilis, C.W.; Müller, K.F.; Quandt, D. The evolution of the plastid chromosome in land plants: Gene content, gene order, gene function. Plant Mol. Biol. 2011, 76, 273–297.
13. Sugiura, M. History of chloroplast genomics. Photosynth. Res. 2003, 76, 371–377.
14. Shetty, S.M.; Md Shah, M.U.; Makale, K.; Mohd-Yusuf, Y.; Khalid, N.; Othman, R.Y. Complete chloroplast genome sequence of Musa balbisiana corroborates structural heterogeneity of inverted repeats in wild progenitors of cultivated bananas and plantains. Plant Genom. 2016, 9.
15. Wu, F.-H.; Chan, M.-T.; Liao, D.-C.; Hsu, C.-T.; Lee, Y.-W.; Daniell, H.; Duvall, M.R.; Lin, C.-S. Complete chloroplast genome of Oncidium Gower Ramsey and evaluation of molecular markers for identification and breeding in Oncidiinae. BMC Plant Biol. 2010, 10, 68.
16. Li, P.; Lu, R.-S.; Xu, W.-Q.; Ohi-Toma, T.; Cai, M.-Q.; Qiu, Y.-X.; Cameron, K.M.; Fu, C.-X. Comparative genomics and phylogenomics of East Asian tulips (Amana, Liliaceae). Front. Plant Sci. 2017, 8, 451.
17. Daniell, H.; Lin, C.-S.; Yu, M.; Chang, W.-J. Chloroplast genomes: Diversity, evolution, and applications in genetic engineering. Genom. Biol. 2016, 17, 134.
18. Bi, Y.; Zhang, M.-F.; Xue, J.; Dong, R.; Du, Y.-P.; Zhang, X.-H. Chloroplast genomic resources for phylogeny and DNA barcoding: A case study on Fritillaria. Sci. Rep. 2018, 8, 1184.
19. Eguiluz, M.; Rodrigues, N.F.; Guzman, F.; Yuyama, P.; Margis, R. The chloroplast genome sequence from Eugenia uniflora, a Myrtaceae from Neotropics. Plant Syst. Evol. 2017, 303, 1199–1212.
20. Du, Y.-P.; Bi, Y.; Yang, F.-P.; Zhang, M.-F.; Chen, X.-Q.; Xue, J.; Zhang, X.-H. Complete chloroplast genome sequences of Lilium: Insights into evolutionary dynamics and phylogenetic analyses. Sci. Rep. 2017, 7, 5751.
21. Wang, X.; Zhou, T.; Bai, G.; Zhao, Y. Complete chloroplast genome sequence of Fagopyrum dibotrys: Genome features, comparative analysis and phylogenetic relationships. Sci. Rep. 2018, 8, 12379.
22. Doyle, J.J.; Doyle, J.L. A rapid DNA isolation procedure from small quantities of fresh leaf tissues. Phytochem. Bull. 1987, 19, 11–15.
23. Patel, R.K.; Jain, M. NGS QC Toolkit: A toolkit for quality control of next generation sequencing data. PLoS ONE 2012, 7, e30619.
24. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.; Pham, S.; Prjibelski, A.D.; et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012, 19, 455–477.
25. Qu, X.-J.; Moore, M.J.; Li, D.-Z.; Yi, T.-S. PGA: A software package for rapid, accurate, and flexible batch annotation of plastomes. Plant Methods 2019, 15, 50.
26. Lohse, M.; Drechsel, O.; Bock, R. OrganellarGenomeDRAW (OGDRAW): A tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr. Genet. 2007, 52, 267–274.
27. Kumar, S.; Stecher, G.; Li, M.; Knyaz, C.; Tamura, K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 2018, 35, 1547–1549.
28. Frazer, K.A.; Pachter, L.; Poliakov, A.; Rubin, E.M.; Dubchak, I. VISTA: Computational tools for comparative genomics. Nucleic Acids Res. 2004, 32, 273–279.
29. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780.
30. Rozas, J.; Ferrer-Mata, A.; Sánchez-DelBarrio, J.C.; Guirao-Rico, S.; Librado, P.; Ramos-Onsins, S.E.; Sánchez-Gracia, A. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol. 2017, 34, 3299–3302.
31. Thiel, T.; Michalek, W.; Varshney, R.; Graner, A. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 2003, 106, 411–422.
32. Kurtz, S.; Choudhuri, J.V.; Ohlebusch, E.; Schleiermacher, C.; Stoye, J.; Giegerich, R. REPuter: The manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001, 29, 4633–4642.
33. Benson, G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 1999, 27, 573–580.
34. Nguyen, L.-T.; Schmidt, H.A.; von Haeseler, A.; Minh, B.Q. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 2015, 32, 268–274.
35. Hoang, D.T.; Chernomor, O.; von Haeseler, A.; Minh, B.Q.; Vinh, L.S. UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 2018, 35, 518–522.
36. Zhu, Z. The complete chloroplast genome of pioneering plant Cyperus difformis (Cyperaceae) in ecological restoration. Mitochondrial DNA B Resour. 2019, 4, 1988–1989.
37. Guo, S.; Guo, L.; Zhao, W.; Xu, J.; Li, Y.; Zhang, X.; Shen, X.; Wu, M.; Hou, X. Complete chloroplast genome sequence and phylogenetic analysis of Paeonia ostii. Molecules 2018, 23, 246.
38. Biju, V.C.; P.R., S.; Vijayan, S.; Rajan, V.S.; Sasi, A.; Janardhanan, A.; Nair, A.S. The complete chloroplast genome of , and phylogenetic analysis with . Plant Genom. 2019, 12, 190032.
39. Plotkin, J.B.; Kudla, G. Synonymous but not the same: The causes and consequences of codon bias. Nat. Rev. Genet. 2011, 12, 32–42.

40. Li, X.; Li, Y.; Zang, M.; Li, M.; Fang, Y. Complete chloroplast genome sequence and phylogenetic analysis of Quercus acutissima. Int. J. Mol. Sci. 2018, 19, 2443.
41. Weng, M.-L.; Blazier, J.C.; Govindu, M.; Jansen, R.K. Reconstruction of the ancestral plastid genome in Geraniaceae reveals a correlation between genome rearrangements, repeats, and nucleotide substitution rates. Mol. Biol. Evol. 2013, 31, 645–659.
42. Park, I.; Yang, S.; Choi, G.; Kim, W.J.; Moon, B.C. The complete chloroplast genome sequences of Aconitum pseudolaeve and Aconitum longecassidatum, and development of molecular markers for distinguishing species in the Aconitum subgenus Lycoctonum. Molecules 2017, 22, 2012.
43. Yang, Y.; Zhou, T.; Duan, D.; Yang, J.; Feng, L.; Zhao, G. Comparative analysis of the complete chloroplast genomes of five Quercus species. Front. Plant Sci. 2016, 7, 959.
44. Kim, K.-J.; Lee, H.-L. Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants. DNA Res. 2004, 11, 247–261.
45. Williams, A.V.; Miller, J.T.; Small, I.; Nevill, P.G.; Boykin, L.M. Integration of complete chloroplast genome sequences with small amplicon datasets improves phylogenetic resolution in Acacia. Mol. Biol. Evol. 2016, 96, 1–8.
46. Tonti-Filippini, J.; Nevill, P.G.; Dixon, K.; Small, I. What can we do with 1000 plastid genomes? Plant J. 2017, 90, 808–818.
47. Larridon, I.; Bauters, K.; Reynders, M.; Huygh, W.; Muasya, A.M.; Simpson, D.A.; Goetghebeur, P. Towards a new classification of the giant paraphyletic genus Cyperus (Cyperaceae): Phylogenetic relationships and generic delimitation in C4 Cyperus. Bot. J. Linn. Soc. 2013, 172, 106–126.
48. Whitney, K.D.; Baack, E.J.; Hamrick, J.L.; Godt, M.J.W.; Barringer, B.C.; Bennett, M.D.; Eckert, C.G.; Goodwillie, C.; Kalisz, S.; Leitch, I.J.; et al. A role for nonadaptive processes in plant genome size evolution? Evolution 2010, 64, 2097–2109.