bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
1 The complete mitochondrial genome of endangered Assam Roofed Turtle, Pangshura
2 sylhetensis (Testudines: Geoemydidae): Genomic features and Phylogeny
3 Shantanu Kundu, Vikas Kumar*, Kaomud Tyagi, Kailash Chandra
4 Centre for DNA Taxonomy, Molecular Systematics Division, Zoological Survey of India, M
5 Block, New Alipore, Kolkata-700053, India.
6
7 * Corresponding author:
8 Dr. Vikas Kumar, Scientist D
9 Centre for DNA Taxonomy, Molecular Systematics Division,
10 Zoological Survey of India, M Block, New Alipore, Kolkata-700053
11 Email: [email protected]
12 Mob. No. +91-9674944233
13
14 Running Head:
15 Complete mitogenome of Pangshura sylhetensis
16
17
18
19
20
1 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
22 Abstract
23 Assam Roofed Turtle, Pangshura sylhetensis is an endangered and least studied species endemic
24 to India and Bangladesh. The genomic feature of P. sylhetensis mitogenome is still anonymous
25 to the scientific community. The present study decodes the first complete mitochondrial genome
26 of P. sylhetensis (16,568 bp) by using next-generation sequencing. This de novo assembly
27 encodes 13 Protein-coding genes (PCGs), 22 transfer RNAs (tRNAs), two ribosomal RNAs
28 (rRNAs), and one control region (CR). Most of the genes were encoded on the majority strand,
29 except NADH dehydrogenase subunit 6 (nad6) and eight tRNAs. Most of the PCGs were started
30 with an ATG initiation codon, except for Cytochrome oxidase subunit 1 (cox1) and NADH
31 dehydrogenase subunit 5 (nad5) with GTG. The study also found the typical cloverleaf
32 secondary structure in most of the tRNA genes, except for serine (trnS1) with lack of
33 conventional DHU arm and loop. Both, Bayesian and Maximum-likelihood topologies showed
34 distinct clustering of all the Testudines species with their respective taxonomic ranks and
35 congruent with the previous phylogenetic hypotheses (Pangshura and Batagur sister taxa).
36 Nevertheless, the mitogenomic phylogeny with other amniotes corroborated the sister
37 relationship of Testudines with Archosaurians (Birds and Crocodilians). Additionally, the
38 mitochondrial Gene Order (GO) analysis indicated that, most of the Testudines species showed
39 plesiomorphy with typical vertebrate GO.
40 Keywords:
41 Chelonian, Geoemydidae, Mitogenome, Species phylogeny, Gene Order, Evolution.
42
43
2 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
44 Introduction
45 The evolution of living organisms is a continuous process over generations and difficult to
46 understand by measuring with a distinct hypothesis [1]. Several biological as well as
47 environmental factors play an important role to change the descendent gene of a common
48 ancestor that subsequently shifted into the next offspring’s or new species. Indeed, the genetic
49 traits of a certain population drift randomly and evidenced to be gradually leads by the natural
50 selection. Nevertheless, due to this high potency and tremendous success of genetic information,
51 the evolutionary pattern of many living taxa are now well-understood through phylogenetic
52 assessment. Hence, the world-wide collaborative effort of biologists ‘Tree of Life (ToL) project’
53 has been launched to illuminate the biodiversity and their evolutionary history [2].
54 Testudines (turtles, tortoises, and terrapins) are one of the oldest living organisms in the
55 earth with an extended evolutionary history [3,4]. Besides the morphology, the genetic approach
56 has been repeatedly applied to address the systematics of Testudines [5-7]. Both nuclear and
57 mitochondrial genes have been extensively utilized to perceive the knowledge on Testudines
58 phylogeny and genetic diversity [8-10]. Nonetheless, the Testudines mitogenomes were also
59 scrutinized to understand their evolution and furnished the evidence of ‘Turtle-Archosaur
60 affinity’ [11,12]. Additionally, to address the evolution of turtles within amniotes and strengthen
61 the Vertebrate tree of life, the phylogenomic approach was also evaluated [13,14]. However, the
62 phylogeny of Testudines is yet to be reconciled by mining of large-scale molecular data to know
63 their precise evolutionary scenario.
64 Testudines mitochondrial genomes are usually circular and double stranded with 16-19
65 kb in size contains 37 genes [15-19]. Remarkably, some of the reptile species including turtles,
3 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
66 have variable numbers of genes in their mitogenome due to the losing of protein-coding genes
67 (PCGs), duplication of control regions (CRs), and occurrence of multiple contiguous numbers of
68 one or more transfer RNA genes (tRNAs) [20]. Besides the structural features of different genes,
69 the gene orders (GOs) in mitogenome have also evidenced to be the signature of defining clades
70 at different taxonomic levels in both vertebrates and invertebrates [21-25]. In general, the
71 arrangements of GO are defined by transposition, inversion, inverse transposition, and tandem
72 duplication and random loss (TDRL) mechanisms in comparisons with the typical ancestral GO
73 [26]. Moreover, these evolutionary events also help to define the plesiomorphic/apomorphic
74 status and transformational pathways of mitogenome [27,28]. Thus, apart from the conventional
75 analysis; to infer the evolutionary pathways leading to the detected diversity of GOs, it is
76 essential to test phylogeny and GO conjointly [29,30]. As of now, more than 100 Testudines
77 species mitogenomes of 56 genera within 12 families and two sub-orders were generated
78 worldwide to address several phylogenetic questions. However, the combined study on GO and
79 phylogeny has never been assessed for Testudines to discuss the evolutionary tracts.
80 The evolutionarily distinct genus Pangshura is known by four extant species from
81 Southeast Asian countries [31]. Among them, the Assam Roofed turtle, Pangshura sylhetensis is
82 a highly threatened species and categorized as ‘Endangered’ in the International Union for
83 Conservation of Nature (IUCN) Red data list [32]. The distribution of P. sylhetensis is restricted
84 to Bangladesh and India [33,34]. Although, the partial segments of mitochondrial genetic
85 information are publicly available, but the mitogenomic information on this species is still
86 anonymous to the scientific communities. Therefore, the present study aimed to generate the
87 complete mitochondrial genome of P. sylhetensis and execute the comparative analysis with
88 other Testudines belonging to both sub-orders (Cryptodira and Pleurodira). Here, we used
4 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
89 phylogenetic (Bayesian and Maximum Likelihood) and GO analyses for better insights into the
90 Testudines evolutionary scenario.
91 Materials and Methods
92 Ethics statement and sample collection
93 To conduct the field survey and biological sampling, prior permission was acquired from the
94 wildlife authority of Arunachal Pradesh (Letter No. SFRI/APBB/09‐2011‐1221‐1228 dated
95 22.07.2016) and Zoological Survey of India, Kolkata (Letter No. ZSI/MSD/CDT/2016‐17 dated
96 29.07.2016). No turtle specimens were sacrificed in the present study. The P. sylhetensis
97 specimen was collected from the tributaries of Brahmaputra River (Latitude 27.50 N and
98 Longitude 96.24 E) from the nearby localities of Namdapha National Park in northeast India
99 (Fig. 1). The species photograph was taken by the first authors (S.K.) and edited manually in
100 Adobe Photoshop CS 8.0. The country level topology map has been downloaded from DIVA-
101 GIS Spatial data platform (http://www.diva-gis.org/datadown) and overlaying by ArcGIS 10.6
102 software (ESRI®, CA, USA). The range distribution was edited manually in Adobe Photoshop
103 CS 8.0 followed by the published record [35]. The blood sample was collected in sterile
104 condition from the hind limb of the live individual by using micro‐syringe and stored in EDTA
105 vial at 4°C. Subsequently the specimen was released back in the same eco-system with ample
106 care and attention.
107 Mitochondrial DNA extraction and sequencing
108 The collected blood sample (20 μl) was thoroughly blended with 1 ml buffer (0.32 M Sucrose, 1
109 mM EDTA, 10 mM TrisHCl) and centrifuged at 700 × g for 5 min at 4°C to remove nuclei and
110 cell debris. The supernatant was collected in 1.5 ml eppendorf tubes and centrifuged at 12,000 ×
5 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
111 g for 10 min at 4°C to precipitate the mitochondrial pellet. The pellet was re-suspended in 200 μl
112 of buffer (50 mM TrisHCl, 25 mM of EDTA, 150 mM NaCl), with the addition of 20 μl of
113 proteinase K (20 mg/ml) followed by incubation at 56°C for 1-2 hr and the mitochondrial DNA
114 was extracted by Qiagen DNeasy Blood & Tissue Kit (QIAGEN Inc.). The DNA quality was
115 visualized in 1% agarose gel electrophoresis, and the concentration was quantified by
116 NANODROP 2000 spectrophotometer (Thermo Scientific).
117 Mitogenome assembly and annotation
118 Complete mitochondrial genome sequencing and de novo assembly was carried out at Xcelris
119 Labs Limited, Gujarat, India (http://www.xcelrisgenomics.com/). The genome library was
120 sequenced using the Illumina platform (2x150bp PE chemistry) with the average size of library
121 was 545bp, to generate ~3 GB data (Illumina, Inc, USA). The total yield of the Number of Paired
122 end was 26,263,128 with the maximum data of 3.78 GB. The raw reads were handled using
123 cutadapt tool (http://code.google.com/p/cutadapt/) for adapters and low quality bases trimming
124 with cutoff of Phread quality score (Q score) of 20. The high quality reads were down sampled to
125 2 million reads using Seqtk program (https://github.com/lh3/seqtk). High quality paired end data
126 was assembled with NOVOPlasty v2.6.7 with default parameters [36]. The mitogenome of
127 Mauremys reevesii (accession no. KJ700438) was used as a reference seed sequence for de novo
128 assembly with 16568 bp (~16.5Kb) single contig. To validate the assembly obtained from
129 NOVOPlasty assembler, similarity search was carried out in GenBank database using BLASTn
130 v2.2.28 search (https://blast.ncbi.nlm.nih.gov) algorithm. Further, the contig was subjected to
131 confirm by MITOS v806 online webserver (http://mitos.bioinf.uni-leipzig.de). The DNA
132 sequences of the protein coding genes (PCGs) were primarily translated into the putative amino
133 acid sequences on the basis of the vertebrate mitochondrial genetic code. The exact initiation and
6 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
134 termination codons were identified in ClustalX using other publicly available reference
135 sequences of Testudines [37]. The de-novo mitogenome was submitted to the GenBank database
136 through the Sequin submission tool (Fig. S1).
137 Data set construction and comparative analysis
138 The spherical representation of the generated mitogenome of P. sylhetensis was plotted by
139 CGView Server (http://stoth ard.afns.ualbe rta.ca/cgview_server/) with default parameters [38].
140 The strand direction and arrangements of each PCG, tRNA, and ribosomal RNA (rRNA) were
141 also checked through MITOS online server. The overlapping regions and intergenic spacers
142 between the neighbor genes were counted manually through Microsoft Excel. The tRNA genes
143 of P. sylhetensis were verified in MITOS online server, tRNAscan‐SE Search Server 2.0
144 (http://lowel ab.ucsc.edu/tRNAs can-SE/) and ARWEN 1.2 [39,40]. The base composition of all
145 stems (DHU, acceptor, TѱC, anticodon) were examined manually to discriminate the
146 Watson‐Crick, wobble, and uneven base pairing. On the basis of homology search in the Refseq
147 database (https ://www.ncbi.nlm.nih.gov/refse q/), 51 Testudines mitogenomes representing nine
148 families of Cryptodira and three families of Pleurodira were downloaded from GenBank and
149 incorporated in the dataset for comparative analysis (Table S1). The genome size and
150 comparative analysis of nucleotide composition with other Testudines were calculated using
151 MEGA6.0 [41]. The base composition skew was calculated as defined previously: AT skew = (A
152 – T)/(A + T), GC skew = (G – C)/(G + C) [42]. The start and stop codons of PCGs were checked
153 through the Open Reading Frame Finder web tool (https ://www.ncbi.nlm.nih.gov/orffi nder/).
154 To determine the location for replication of the L-strand and putative secondary structures, all
155 Testudines CRs were analyzed through online Mfold web server (http://unafold.rna.albany.edu).
7 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
156 Due to the lack of proper annotation, the CRs of two species (Amyda cartilaginea and
157 Phrynops hilarii) were missing and thus not incorporated in the comparative analysis.
158 Phylogenetic and Gene Order (GO) analysis
159 To assess the Bayesian analysis (BA) and maximum‐likelihood (ML) phylogenetic relationship,
160 two datasets were prepared in the present analysis. The first dataset includes all the PCGs of 52
161 Testudines mitogenomes including P. sylhetensis (Table S1). As the targeted species is a
162 Cryptodiran species, all the Pleurodiran species were treated as out-group taxa in the first dataset.
163 Further, for reassess the existing hypothesis of Testudines phylogenetic position with other
164 amniotes, all the PCGs of 11 database sequences (Whale: KU891394, Human: AP008580,
165 Platypus: NC_000891, Opossum: AJ508398, Iguana: NC_002793, Lizards: AB080237, Snake:
166 HM581978, Tuatara: AF534390, Bird: AP003580, Alligator: NC_001922, and Crocodile:
167 HM488007) were acquired from the GenBank database to construct the second dataset (Table
168 S1). Among them, the database sequences of aquatic Amniota (Physeter catodon: KU891394)
169 was used as out-group taxa. The PCGs were aligned individually by codons using MAFFT
170 algorithm in TranslatorX with L‐INS‐i strategy with GBlocks parameters [43]. Finally, the two
171 datasets of all PCGs was concatenated using SequenceMatrix v1.7.84537 [44]. The suitable
172 models for phylogenetic analysis were estimated by PartitionFinder 2 [45] at CIPRES Science
173 Gateway V. 3.3 [46] (Table S2). The BA tree was built through Mr. Bayes 3.1.2 and the
174 metropolis‐coupled Markov Chain Monte Carlo (MCMC) was run for 100,000,000 generations
175 with sampling at every 100th generation and 25% of samples were discarded as burn‐in [47]. The
176 ML tree was constructed using IQ‐Tree web server with 1000 bootstrap support [48]. Both BA
177 and ML topologies for two datasets were further refined in iTOL v4.
8 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
178 (https://itol.embl.de/login.cgi) tool for better illustration [49] and edited with Adobe Photoshop
179 CS 8.0.
180 Further, to check the gene arrangement scenario, most contemporary TreeREx analysis
181 was adopted to infer the evolutionary pathways within the Testudines, leading to the observed
182 diversity of the GOs. TreeREx can easily distinguish the putative GOs at the internal nodes of a
183 reference tree as it works in a bottom-up manner through the iterative analysis of triplets or
184 quadruplets of GOs to decide all the GOs in the entire tree [30]. In the TreeREx analysis, the
185 consistent nodes are considered to be most reliable and marked by green color, whereas the
186 fallback nodes have the highest level of uncertainty marked by red color. In doing so, we used
187 the default settings of TreeREx suggested on the website (http://pacosy.informatik.uni-
188 leipzig.de/185-0-TreeREx.html) to analyze every node of the reference phylogenetic tree was
189 defined by a GO. To finalize the gene arrangements dataset of 52 Testudines for TreeREx
190 analysis (Table S3); the insertion, deletion, and duplication of genes within the database
191 available mitogenomes were reminisced as discussed in the previous studies [50-53].
192 Results and Discussion
193 Mitogenome structure and organization
194 The mitogenome (16,568 bp) of endangered Assam Roofed turtle, P. sylhetensis was determined
195 in the present study (GenBank accession no. MK580979). The mitogenome was encoded by 37
196 genes, comprising 13 PCGs, 22 tRNAs, two rRNAs, and a major non-coding CR. Among them,
197 nine genes (NADH dehydrogenase subunit 6 and eight tRNAs) were located on the minority
198 strand, while the remaining 28 genes were located on the majority strand (Table 1 and Fig. 1). In
199 the present dataset, the length of mitogenome was varied from 15,339 bp (A. cartilaginea) to
9 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
200 19,403 bp (Stigmochelys pardalis). Out of 52 Testudines species, 43 species were showed
201 strands symmetry as observed in typical vertebrates [54]. The gene arrangements scenarios
202 within the Testudines species are discussed elaborately in the forthcoming subsection. The
203 nucleotide composition of P. sylhetensis mitogenome was A+T biased (59.27%) as represented
204 in other Testudines mitogenomes ranging from 57.76% (Pleurodiran species P. hilarii) to
205 64.19% (Cryptodiran species Kinosternon leucostomum) (Table S4). The A+T composition of P.
206 sylhetensis PCGs was 58.77%. The AT skew and GC skew was 0.124 and -0.334 in the
207 mitogenome of P. sylhetensis. The comparative analysis showed that the AT skew ranged from
208 0.087 (Testudo graeca) to 0.208 (Carettochelys insculpta) and GC skew from -0.296 (S.
209 pardalis) to -0.412 (Trionyx triunguis) (Table S4). A total of 15 overlapping regions with a total
210 length of 73 bp were identified in P. sylhetensis mitogenome. The longest overlapping region (21
211 bp) was observed between tRNA‐Leucine (trnL1) and NADH dehydrogenase subunit 5 (nad5).
212 Further, a total of 12 intergenic spacer regions with a total length of 129 bp were observed in P.
213 sylhetensis mitogenome with a longest region (66 bp) between tRNA‐Proline (trnP) and CR
214 (Table 1).
215 Protein‐coding genes
216 The total length of PCGs was 11,268 bp in P. sylhetensis, which represent 68.01% of complete
217 mitogenome. The AT skew and GC skew was 0.056 and -0.345 in PCGs of P. sylhetensis (Table
218 S4). Most of the PCGs of P. sylhetensis started with an ATG initiation codon; however the GTG
219 initiation codon was observed in Cytochrome oxidase subunit 1 (cox1) and nad5. The TAG
220 termination codon was used by four PCGs, TAA by four PCGs, AGG by two PCGs, and GAA
221 by one PCG. The incomplete termination codons TA(A) and T(AA) were detected in
222 Cytochrome oxidase subunit 3 (cox3) and Cytochrome b (cytb) gene respectively. The
10 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
223 comparative analysis with other Testudines species revealed that the high frequency of initiation
224 codons (ATN, GTG) were observed in NADH dehydrogenase subunit 1 (nad1), NADH
225 dehydrogenase subunit 2 (nad2), NADH dehydrogenase subunit 3 (nad3), NADH dehydrogenase
226 subunit 4L (nad4L), NADH dehydrogenase subunit 4 (nad4), nad5, and cytb. The frequency of
227 ATG initiation codon is always higher in most of the PCGs (64.71% to 96.08%), except for cox1
228 with GTG start codon (72.55%). The higher frequency of complete termination codon was
229 observed in eight PCGs, except five PCGs with incomplete termination codon (Table S5 and Fig.
230 2).
231 Ribosomal RNA and transfer RNA genes
232 The total length of two rRNA genes of P. sylhetensis was 2,560 bp with ranged from 1,611 bp
233 (Caretta caretta) to 2,685 bp (Pelodiscus sinensis) in the present dataset. The AT content within
234 rRNA genes was 58.98% with AT and GC skew was 0.272 and -0.173 respectively (Table S4).
235 Total 22 tRNAs were found in the P. sylhetensis mitogenome with the total length of 1,550 bp.
236 In other Testudines, the length of tRNAs was varied from 1,407 bp (A. cartilaginea) to 1,684 bp
237 (Platysternon megacephalum). The AT content within tRNA genes was 59.87% with AT and GC
238 skew was 0.017 and 0.057 respectively (Table S4). Among all the tRNA genes, 14 were present
239 in majority strand and eight tRNA genes (trnQ, trnA, trnN, trnC, trnY, trnS2, trnE, and trnP) on
240 minority strand. The anticodons of all the tRNA genes were also detected in the present study.
241 Most of the tRNA genes were folded into classical cloverleaf structure except trnS1 (without
242 Dihydrouridine (DHU) stem and loop (Fig. S2). The conventional base pairings (A=T and G≡C)
243 were observed in most of the tRNAs [55]; however the wobble base pairing were observed in the
244 stem of 13 tRNAs (trnL2, trnN, trnA, trnW, trnP, trnE, trnR, trnG, trnC, trnY, trnS2, trnK, trnQ)
245 (Fig. S2).
11 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
246 Control regions
247 The CR of P. sylhetensis was typically distributed with three functional domains, the termination
248 associated sequence (TAS), the central conserved (CD), and the conserved sequence block
249 (CSB) as observed in other vertebrate CRs [56]. As compared to the TAS and CSB domain with
250 varying numbers of tandem repeats, the CD domain consisting with highly conserved sequences.
251 Hence, the pattern of CR was varied among different vertebrates, including Testudines [56,57].
252 The total length of P. sylhetensis CR was 1,067 bp, ranging from 600 bp (Lepidochelys olivacea)
253 to 3,885 bp (S. pardalis) in the present dataset. In P. sylhetensis CR, the AT and GC skew was -
254 0.025 and -0.249 (Table S4). The TAS domain was ‘TACATA’, while the CSB domain was
255 further divided into four regions: CSB-F (AGAGATAAGCAAC), CSB-1 (GACATA), CSB-2
256 (TTAAACCCCCCTACCCCCC), and CSB-3 (TCGTCAAACCCCTAAATCC). The CR is also
257 acknowledged for the initiation of replication, and is positioned between trnP and trnF for most
258 of the Testudines except P. megacephalum [50,52]. In, P. sylhetensis 27 bp nucleotides are
259 present between CSB-1 and stem-loop structure, however in other species, it was ranging from
260 zero to 90 bp (T. triunguis) (Fig. S3 and S4). Overall, the structural features of the replication of
261 the L-strand and putative secondary structures are different in all Testudines species, which can
262 be used as a species-specific marker.
263 Phylogeny of Testudines mitogenomes
264 The phylogenetic position of Testudines in the Vertebrate tree of life has been evaluated in last
265 few decades. Several approaches have been aimed to reconcile the phylogeny for better
266 understanding of their origin and diversification [4]. Besides, morphological parameters the gene
267 based topology have been widely measured for their identification, species delimitation, and
12 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
268 population genetics assessment [58,59]. Nevertheless, to interpret the evolutionary scenario, the
269 tested dataset should comprise with comprehensive genetic information on various taxa
270 representing all extant lineages. Both morphological and molecular data corroborates to erect all
271 the Pangshura species from the closest congeners of Batagur in earlier studies [31,60,61]. The
272 mitogenomic data have successfully evidenced the phylogenetic relationships of many
273 Testudines species, including the studied genus Pangshura under Geoemydidae family [62]. Till
274 date, majority of species under sub-family Geoemydinae has been targeted throughout the world
275 to generate their mitogenomes. Nevertheless, only two species mitogenomes of sub-family
276 Batagurinae are presently available in the global database. Hence, the present study adds the de
277 novo assembly third Batagurinae species (P. sylhetensis) in the global database. Both BA and
278 ML phylogenies effectively discriminated all the studied Testudines species compiled in the first
279 dataset with high bootstrap support and posterior probability (Fig. 3 and S5). The present
280 phylogeny evidenced the sister relationship of P. sylhetensis with Batagur trivittata species as
281 described in the previous phylogeny [62]. The other representative species are also showed close
282 clustering within their respective families and sub-orders as well as consistent with the previous
283 phylogenetic hypothesis [59,61]. The present mitogenomic phylogeny with concatenated 13
284 PCGs was adequate to generate a robust phylogeny and illuminate the relationship between P.
285 sylhetensis with other Testudines species. More taxa from different taxonomic lineages from
286 diverse distribution localities and their large-scale genetic data will be worthy to settle more
287 exhaustive phylogeny and evolutionary connection.
288 In addition, the evolutionary position of Testudines is still controversial in relationship
289 with other amniotes [63-65]. To resolve the impediments on the placement of Testudines either
290 in anapsids or diapsids, the mitogenome of Pleurodiran species has been analyzed previously and
13 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
291 reject the placement of Testudines as most basal position in the amniote tree of life [11,66]. The
292 mitogenome sequences have also been studied to evaluate the relationships between
293 Archosaurians (birds and crocodilians) and Lepidosaurians (tuatara, snakes, and lizards) [67]. In
294 recent past, the reptilian transcriptome based phylogenetic analysis also suggests that, Testudines
295 are not the basal of extant reptiles but showed affinity towards other Archosaurians [68]. Further,
296 the candidate nuclear protein-coding locus (NPCL) markers were also evaluated and
297 demonstrated that, turtles are robustly recovered as the sister group of Archosauria (birds and
298 crocodilians) and the evolutionary timescales was congruent with the Timetree of Life [69].
299 Later on, the phylogenomic approaches and ultra-conserved elements (UCEs) based analysis
300 were endeavored to test the prevailing hypothesis and supports the sister relationships of
301 Testudines with Archosaurians [13,70,71]. Hence, to affirm the existing phylogenetic hypothesis
302 of Testudines with other amniotes, we constructed the phylogenetic tree of mitogenomes dataset
303 (Testudines + other amniotes). Both BA and ML phylogeny of this dataset showed close
304 clustering with birds, alligator, and crocodile in comparison with other amniotes with high
305 bootstrap support and posterior probability (Fig. 4 and S6). Hence, the present mitogenome
306 based phylogenies are congruent with the prevailing hypothesis and strengthen the evolutionary
307 thoughts of Testudines in the Vertebrate tree of life.
308 Gene arrangements
309 To define the phylogenetic clustering at different taxonomic levels, and presume the
310 evolutionary pathways among Testudines, TreeRex analysis was adopted to assure the gene
311 arrangements. A total of 50 consistent nodes were detected in the present analysis (Fig. 5).
312 Considering A50 node as an ancestral trait of both Cryptodiran and Pleurodiran species in the
313 present dataset, the gene arrangement events are plesiomorphy for most of the Testudines species
14 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
314 with few exceptions. Four gene arrangements events were detected in the present dataset: (i) an
315 inversion of trnP was observed in A39 node of E. macrurus, (ii) an inversion of trnP was
316 observed in A21 node of E. imbricata, (iii) two inversion of trnP and trnS1 were observed on
317 node A11 towards A10 which separates the family Testudinidae from Geoemydidae, (iv) two
318 inversion of trnS1, trnP and one TDRL event towards P. megacephalum separates the family
319 Platysternidae to Emydidae (Fig. 5). However, the synapomorphy was observed in three species
320 under three different families, Chelidae (E. macrurus), Cheloniidae (E. imbricata), and
321 Platysternidae (P. megacephalum). As compared the gene arrangement scenario with other
322 species under Chelidae and Cheloniidae, the GO of both E. macrurus and E. imbricata might
323 occurr due to the parallel evolution, which needs further investigation. Further, to know the
324 phylogenetic position and evolutionary history of the sole member (P. megacephalum) under the
325 monotypic family Platysternidae, the mitogenomes data has been largely assessed. The unusual
326 gene arrangements, duplication of CR, and loss of redundant genes was observed in P.
327 megacephalum mitogenome [50,51,53]. Later on, the micro-evolutionary analysis was performed
328 to know the evolution of CRs, and evidenced that, the duplicate CR in P. megacephalum derived
329 from the heterological ancestral recombination of mitochondrial DNA [52]. The present TreeRex
330 based GO analysis revealed that, both inversion and TDRL events plays a major role in the
331 independent evolution of P. megacephalum as observed in the earlier studies. In the recent past,
332 the draft genome of this species has been assembled to know the species even better [72]. Hence,
333 we recommended whole genome data of Testudines and their comparative analysis will be
334 essential to comprehend the molecular mechanisms leading to their distinctive morphology and
335 physiology. Altogether, the structural features of mitochondrial genomes, phylogeny, and
336 TreeRex based gene arrangement analysis fortify the knowledge on the Testudines evolution.
15 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
337 Conclusions and conservation inferences
338 The evolutionarily distinct and globally endangered (EDGE) freshwater turtle, P. sylhetensis
339 (family Geoemydidae) is endemic to India and Bangladesh. Due to the habitat fragmentation,
340 anthropogenic threats, and illegal poaching; the populations of P. sylhetensis have remarkably
341 declining in its range distribution [73-75]. Hence, on priority, P. sylhetensis is categorized as an
342 ‘endangered’ species in the IUCN Red data list, ‘Appendix II’ category in the Convention on
343 International Trade in Endangered Species of Wild Fauna and Flora (CITES), and ‘Schedule I’
344 species in Indian Wildlife (Protection) Act, 1972. Nevertheless, owing to the unavailability of in-
345 depth genetic information of many Testudines; species-specific biological information has not
346 been amply resolved. The present study assembled and characterized the first complete
347 mitogenome of P. sylhetensis (16,568 bp) and deliberated the comparative analysis with other 52
348 Testudines representing 12 families and two sub-orders (Cryptodira and Pleurodira). The
349 estimated PCGs based BA and ML phylogenies indicated that, P. sylhetensis is closely related to
350 B. trivittata and congruent with the earlier study [62]. Nevertheless, the phylogenetic
351 reassessment with more mitogenomes, evidenced diapsid affinity and sister relationship of
352 Testudines with Archosaurians, which would be fortify the Vertebrate tree of life. The GO
353 analysis also revealed that, most of the species mitogenomes accommodate similar gene
354 arrangements as observed in typical vertebrates with few exceptions (E. macrurus, E. imbricata,
355 P. megacephalum, and shared nodes of Geoemydidae-Testudinidae). Hence, the generated
356 mitogenomic information of P. sylhetensis would be useful for further phylogenetic
357 reconciliation, to understand the evolutionary history, and adopt the precise conservation
358 management.
359
16 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
360 Acknowledgements
361 We thank the Director of Zoological Survey of India (ZSI), Ministry of Environment, Forests
362 and Climate Change (MoEF&CC), Govt. of India for providing necessary permissions and
363 facilities. We are thankful to the Arunachal Pradesh Biodiversity Board for providing necessary
364 permissions and facilities. The first author (S.K) acknowledges the fellowship grant received
365 from Council of Scientific and Industrial Research (CSIR) Senior Research Associateship
366 (Scientists’ Pool Scheme) Pool No. 9072-A.
367 Funding
368 This research was funded by the Ministry of Environment, Forest and Climate Change
369 (MoEF&CC): Zoological Survey of India (ZSI), Kolkata in-house project, ‘National Faunal
370 Genome Resources (NFGR). The funders had no role in study design, data collection and
371 analysis, decision to publish, or preparation of the manuscript.
372 Competing interests:
373 The authors declare that they have no competing interests.
374 Author Contributions
375 Conceptualization: SK, VK; Data curation: KT, SK; Formal analysis: SK; KT; Funding
376 acquisition: KC, VK; Investigation: SK, VK; Methodology: SK, KT, VK; Project administration:
377 KC, VK; Resources: KC; Software: SK, KT; Supervision: VK, KC; Validation: SK, VK;
378 Visualization: SK, KT; Writing – original draft: SK, KT, VK; Writing – review & editing: SK,
379 VK, KC.
17 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
381 Data Availability Statement
382 The following information was supplied regarding the accessibility of DNA sequences: The
383 complete mitogenome of Pangshura sylhetensis is deposited in GenBank of NCBI under
384 accession number MK580979.
385 ORCID IDs:
386 Shantanu Kundu: https://orcid.org/0000-0002-5488-4433
387 Vikas Kumar: http://orcid.org/0000-0002-0215-0120
388 Kaomud Tyagi: https://orcid.org/0000-0003-1064-9826
389 Kailash Chandra: https://orcid.org/0000-0001-9076-5442
390 References
391 1. Darwin C. On the origin of species. London: John Murray. 1859; 502 p.
392 2. Maddison DR, Schulz K-S. The Tree of Life Web Project. 2007; http://tolweb.org.
393 3. Hedges SB, Poling LL. A molecular phylogeny of reptiles. Science. 1999; 283: 998-1001.
394 4. Lourenço JM, Claude J, Galtier N, Chiari Y. Dating cryptodiran nodes: origin and
395 diversification of the turtle superfamily Testudinoidea. Mol Phylogenet Evol. 2012; 62: 496-
396 507.
397 5. Meylan PA. The phylogenetic relationships of softshelled turtles (family Trionychidae). Bull
398 Nat Hist Mus. 1987; 186: 1-101.
399 6. Gaffney ES, Meylan PA. A phylogeny of turtles. In: Benton MJ (Ed.), the Phylogeny and
400 Classification of Tetrapods. Clarendon Press, Oxford, England. 1988; 157-219.
401 7. Shaffer HB, Meylan P, McKnight ML. Tests of turtle phylogeny: molecular, morphological,
402 and paleontological approaches. Syst Biol. 1997; 46: 235-268.
18 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
403 8. Engstrom TN, Shaffer HB, McCord WP. Multiple Data Sets, High Homoplasy, and the
404 Phylogeny of Softshell Turtles (Testudines: Trionychidae), Syst Biol. 2004; 53: 693-710.
405 9. Fujita MK, Engstrom TN, Starkey DE, Shaffer HB. Turtle phylogeny: insights from a novel
406 nuclear intron. Mol Phylogenet Evol. 2004; 31: 1031-1040.
407 10. Barley AJ, Spinks PQ, Thomson RC, Shaffer HB. Fourteen nuclear genes provide
408 phylogenetic resolution for difficult nodes in the turtle tree of life. Mol Phylogenet Evol.
409 2010; 55: 1189-1194.
410 11. Zardoya R, Meyer A. Complete mitochondrial genome suggests diapsid affinities of turtles.
411 Proc Natl Acad Sci USA. 1998; 95: 14226-14231.
412 12. Kumazawa Y, Nishida M. Complete mitochondrial DNA sequences of the green turtle and
413 blue-tailed mole skink: Statistical evidence for Archosaurian affinity of turtles. Mol Biol Evol.
414 1999; 16: 784-792.
415 13. Fong JJ, Brown JM, Fujita MK, Boussau B. A Phylogenomic Approach to Vertebrate
416 Phylogeny Supports a Turtle-Archosaur Affinity and a Possible Paraphyletic Lissamphibia.
417 PLoS ONE. 2012; 7: e48990.
418 14. Shaffer HB, McCartney-Melstad E, Near T, Mount GG, Spinks PQ. Phylogenomic analyses
419 of 539 highly informative loci dates a fully resolved time tree for the major clades of living
420 turtles (Testudines). Mol Phylogenet Evol. 2017; 115: 7-15.
421 15. Zhang L, Nie L, Cao C, Zhan Y. The complete mitochondrial genome of the Keeled box
422 turtle Pyxidea mouhotii and phylogenetic analysis of major turtle groups. J Genet Genom.
423 2008; 35: 33-40.
19 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
424 16. Huang YN, Li J, Jiang QY, Shen XS, Yan XY, Tang YB, et al. Complete mitochondrial
425 genome of the Cyclemys dentata and phylogenetic analysis of the major family Geoemydidae.
426 Genet Mol Res. 2015; 14: 3234-3243.
427 17. Li W, Zhang X-C, Zhao J, Shi Y, Zhu X-P. Complete mitochondrial genome of Cuora
428 trifasciata (Chinese three-striped box turtle), and a comparative analysis with other box
429 turtles. Gene. 2015; 555: 169-177.
430 18. Feng L, Yang J, Zhang YP, Zhao GF. The complete mitochondrial genome of the Burmese
431 roofed turtle (Batagur trivittata) (Testudines: Geoemydidae). Conserv Genet Res. 2017; 9: 95.
432 19. Kundu S, Kumar V, Tyagi K, Chakraborty R, Singha D, Rahaman I, et al. Complete
433 mitochondrial genome of Black Soft-shell Turtle (Nilssonia nigricans) and comparative
434 analysis with other Trionychidae. Sci Rep. 2018; 8: 17378.
435 20. Rest JS, Ast JC, Austin CC, Waddell PJ, Tibbetts EA, Hay JM, et al. Molecular systematics
436 of primary reptilian lineages and the tuatara mitochondrial genome. Mol Phylogenet Evol.
437 2003; 29: 289-97.
438 21. Macey JR, Larson A, Ananjeva NB, Fang Z, Papenfuss TJ. Two novel gene orders and the
439 role of light-strand replication in rearrangement of the vertebrate mitochondrial genome. Mol
440 Biol Evol. 1997; 14: 91-104.
441 22. Mindell D, Sorenson MD, Dimcheff DE. Multiple independent origins of mitochondrial gene
442 order in birds. Proc Natl Acad Sci USA. 1998; 95: 10693-10697.
443 23. Cameron SL, Johnson KP, Whiting MF. The mitochondrial genome of the screamer louse
444 Bothriometopus (Phthiraptera: Ischnocera): effects of extensive gene rearrangements on the
445 evolution of the genome. J Mol Evol. 2007; 65: 589-604.
20 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
446 24. Wei S-J, Shi M, Chen X-X, Sharkey MJ, van Achterberg C, Ye G-Y, et al. New Views on
447 Strand Asymmetry in Insect Mitochondrial Genomes. PLoS ONE. 2010; 5: e12708.
448 25. Kumar V, Tyagi K, Kundu S, Chakraborty R, Singha D, Chandra K. The first complete
449 mitochondrial genome of marigold pest thrips, Neohydatothrips samayunkur (Sericothripinae)
450 and comparative analysis. Sci Rep. 2019; 9: 191.
451 26. San Mauro D, Gower DJ, Zardoya R, Wilkinson M. A hotspot of gene order rearrangement
452 by tandem duplication and random loss in the vertebrate mitochondrial genome. Mol Biol
453 Evol. 2006; 23: 227-234.
454 27. Perseke M, Fritzsch G, Ramsch K, Bernt M, Merkle D, Middendorf M, et al. Evolution of
455 mitochondrial gene orders in echinoderms. Mol Phylogenet Evol. 2008; 47: 855-864.
456 28. Babbucci M, Basso A, Scupola A, Patarnello T, Negrisolo E. Is it an ant or a butterfly?
457 Convergent evolution in the mitochondrial gene order of Hymenoptera and Lepidoptera.
458 Genome Biol Evol. 2014; 6: 3326-43.
459 29. Bernt M, Merkle D, Ramsch K, Fritzsch G, Perseke M, Bernhard D, et al. CREx: inferring
460 genomic rearrangements based on common intervals. Bioinformatics. 2007; 23 2957-2958.
461 30. Bernt M, Merkle D, Middendorf M. An algorithm for inferring mitogenome rearrangements
462 in a phylogenetic tree, In: Nelson CE, Vialette S, editors. Comparative genomics, RECOMB-
463 CG 2008, LNB 5267. Berlin: Springer. 2008; 143-157.
464 31. Praschag P, Hundsdörfer AK, Fritz U. Phylogeny and taxonomy
465 of endangered South and South-east Asian freshwater turtles elucidated by
466 mtDNA sequence variation (Testudines: Geoemydidae: Batagur, Callagur,
467 Hardella, Kachuga, Pangshura). Zool Scr. 2007; 36: 429-442.
21 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
468 32. IUCN. The IUCN red list of threatened species, Version. 2019-1. 2019. Accessed September
469 6, 2019.
470 33. Das I. Colour Guide to the Turtles and Tortoises of the Indian Subcontinent, Portishead,
471 Avon: R & A Publishing Limited. 1991.
472 34. Buhlmann KA, Akre TSB, Iverson JB, Karapatakis D, Mittermeier RA, Georges A, et al. A
473 global analysis of tortoise and freshwater turtle distributions with identification of priority
474 conservation areas. Chelonian Conserv Biol. 2009; 8: 116-149.
475 35. Turtle Taxonomy Working Group (TTWG) [Rhodin AGJ, Iverson JB, Bour R, Fritz U,
476 Georges A, Shaffer HB, van Dijk PP]. Turtles of the world: Annotated checklist and atlas of
477 taxonomy, synonymy, distribution, and conservation status (8th Ed.). In Rhodin AGJ, Iverson
478 JB, van Dijk PP, Saumure RA, Buhlmann KA, Pritchard PCH, Mittermeier RA (Eds.),
479 Conservation biology of freshwater turtles and tortoises: A compilation project of the
480 IUCN/SSC Tortoise and Freshwater Turtle Specialist Group. Chelonian Res Monogr. 2017; 7:
481 1-292.
482 36. Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo assembly of organelle genomes
483 from whole genome data. Nucleic Acids Res. 2017; 45: e18.
484 37. Thompson JD, Gibson TJ, Higgins DG. Multiple Sequence Alignment Using ClustalW and
485 ClustalX. Curr Protoc Bioinformatics. 2002; 2.3.1-2.3.22.
486 38. Grant JR, Stothard P. The CGView Server: a comparative genomics tool for circular
487 genomes. Nucleic Acids Res. 2008; 36: W181-184.
488 39. Lowe TM, Chan PP. tRNAscan-SE On-line: Search and Contextual Analysis of Transfer
489 RNA Genes. Nucleic Acids Res. 2016; 44: W54-57.
22 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
490 40. Laslett D, Canbäck B. ARWEN, a program to detect tRNA genes in metazoan mitochondrial
491 nucleotide sequences. Bioinformatics. 2008; 24: 172-175.
492 41. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular evolutionary
493 genetics analysis Version 6.0. Mol Biol Evol. 2013; 30: 2725-2729.
494 42. Perna NT, Kocher TD. Patterns of nucleotide composition at fourfold degenerate sites of
495 animal mitochondrial genomes. J Mol Evol. 1995; 41: 353-359.
496 43. Abascal F, Zardoya R, Telford MJ. TranslatorX: multiple alignments of nucleotide sequences
497 guided by amino acid translations. Nucleic Acids Res. 2010; 38: W7-13.
498 44. Vaidya G, Lohman DJ, Meier R. SequenceMatrix: concatenation software for the fast
499 assembly of multigene datasets with character set and codon information. Cladistics. 2010;
500 27: 171-180.
501 45. Lanfear R, Frandsen PB, Wright AM, Senfeld T, Calcott B. PartitionFinder 2: New Methods
502 for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic
503 Analyses. Mol Biol Evol. 2016; 34: 772-773.
504 46. Miller MA, Schwartz T, Pickett BE, He S, Klem EB, Scheuermann RH, et al. A RESTful
505 API for Access to Phylogenetic Tools via the CIPRES Science Gateway. Evol Bioinform.
506 2015; 11: 43-48.
507 47. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed
508 models. Bioinformatics. 2003; 19: 1572-1574.
509 48. Trifinopoulos J, Nguyen L-T, von Haeseler A, Minh BQ. W-IQ-TREE: a fast online
510 phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 2016; 44: W232-
511 W235.
23 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
512 49. Letunic I, Bork P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree
513 display and annotation. Bioinformatics. 2007; 23: 127-128.
514 50. Parham JF, Feldman CR, Boore JL. The complete mitochondrial genome of the enigmatic
515 bigheaded turtle (Platysternon): description of unusual genomic features and the reconciliation
516 of phylogenetic hypotheses based on mitochondrial and nuclear DNA. BMC Evol Biol. 2006;
517 6: 1-11.
518 51. Peng QL, Nie LW, Pu YG. Complete mitochondrial genome of Chinese big-headed turtle,
519 Platysternon megacephalum, with a novel gene organization in vertebrate mtDNA. Gene.
520 2006; 380: 14-20.
521 52. Zheng C, Nie L, Wang J, Zhou H, Hou H, Wang H, et al. Recombination and Evolution of
522 Duplicate Control Regions in the Mitochondrial Genome of the Asian Big-Headed Turtle,
523 Platysternon megacephalum. PLoS ONE. 2013; 8: e82854.
524 53. Luo H, Li H, Huang A, Ni Q, Yao Y, Xu H, et al. The Complete Mitochondrial Genome
525 of Platysternon megacephalum peguense and Molecular Phylogenetic Analysis. Genes
526 (Basel). 2019; 10: E487.
527 54. Anderson S, Bruijn M, Coulson A, Eperon I, Sanger F, Young I. Complete sequence of
528 bovine mitochondrial DNA. Conserved features of the mammalian mitochondrial genome. J
529 Mol Biol. 1982; 156: 683-717.
530 55. Varani G, McClain WH. The G-U wobble base pair: A fundamental building block of RNA
531 structure crucial to RNA function in diverse biological systems. EMBO Reports. 2000; 1: 18-
532 23.
533 56. Ruokonen M, Kvist L. Structure and evolution of the avian mitochondrial control region.
534 Mol Phylogenet Evol. 2002; 23: 422-432.
24 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
535 57. Wang L, Zhou X, Nie L. Organization and variation of mitochondrial DNA control region in
536 pleurodiran turtles. Zoologia. 2011; 28: 495-504.
537 58. Le M, Raxworthy CJ, McCord WP, Mertz L. A molecular phylogeny of tortoises
538 (Testudines: Testudinidae) based on mitochondrial and nuclear genes. Mol Phylogenet Evol.
539 2006; 40: 517-531.
540 59. Guillon JM, Guéry L, Hulin V, Girondot M. A large phylogeny of turtles (Testudines) using
541 molecular data. Contrib Zool. 2012; 81: 147-158.
542 60. Spinks PQ, Shaffer HB, Iverson JB, McCord WP. Phylogenetic hypotheses for the turtle
543 family Geoemydidae. Mol Phylogenet Evol. 2004; 32: 164-182.
544 61. Le M, McCord WP, Iverson JB. On the paraphyly of the genus Kachuga (Testudines:
545 Geoemydidae). Mol Phylogenet Evol. 2007; 45: 398-404.
546 62. Kundu S, Kumar V, Tyagi K, Chakraborty R, Chandra K. The first complete mitochondrial
547 genome of the Indian Tent Turtle, Pangshura tentoria (Testudines: Geoemydidae):
548 Characterization and comparative analysis. Ecol Evol. 2019; 1-15.
549 63. Rieppel O, deBragga M. Turtles as diapsid reptiles. Nature. 1996; 384: 453-455.
550 64. Rieppel O. Turtles as diapsid reptiles. Zool Scr. 2000; 29: 199-212.
551 65. Hedges SB. Amniote phylogeny and the position of turtles. BMC Biol. 2012; 10: 64.
552 66. Mindell DP, Sorenson MD, Dimcheff DE, Hasegawa M, Ast JC, Yuri T. Interordinal
553 Relationships of Birds and Other Reptiles Based on Whole Mitochondrial Genomes, Syst
554 Biol. 1999; 48: 138-152.
555 67. Janke A, Arnason U. The complete mitochondrial genome of Alligator mississippiensis and
556 the separation between recent archosauria (birds and crocodiles). Mol Biol Evol. 1997; 14:
557 1266-1272.
25 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
558 68. Tzika AC, Helaers R, Schramm G, Milinkovitch MC. Reptilian-transcriptome v1.0, a
559 glimpse in the brain transcriptome of five divergent Sauropsida lineages and the phylogenetic
560 position of turtles. EvoDevo. 2011; 2: 1-18.
561 69. Shen XX, Liang D, Wen JZ, Zhang P. Multiple genome alignments facilitate development of
562 NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles, Mol
563 Biol Evol. 2011; 28: 3237-3252.
564 70. Chiari Y, Cahais V, Galtier N, Delsuc F. Phylogenomic analyses support the position of
565 turtles as sister group of birds and crocodiles. BMC Biol. 2012; 10: 65.
566 71. Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn T.C. More
567 than 1000 ultraconserved elements provide evidence that turtles are the sister group of
568 archosaurs, Biol Lett. 2012; 8: 783-786.
569 72. Cao D, Wang M, Ge Y, Gong S. Draft genome of the big-headed turtle Platysternon
570 megacephalum. Sci Data. 2019; 6: 60.
571 73. van Dijk PP. The status of turtles in Asian. In van Dijk PP, Stuart BL, Rhodin AGJ. (Eds.),
572 Asian Turtle Trade: Proceedings of a Workshop on Conservation and Trade of Freshwater
573 Turtles and Tortoises in Asia. Lunenburg, MA: Chelonian Research Foundation. 2000; 15-23.
574 74. Kundu S, Kumar V, Laskar BA, Tyagi K, Chandra K. Pet and turtle: DNA barcoding
575 identified twelve Geoemydid species in northeast India. Mitochondrial DNA B. 2018; 3: 513-
576 518.
577 75. Rhodin AGJ, Stanford CB, van Dijk PP, Eisemberg C, Luiselli L, Mittermeier RA, et al.
578 Global Conservation Status of Turtles and Tortoises (Order Testudines). Chelonian Conserv
579 Biol. 2018; 17: 135-161.
26 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
581 Tables and Figures Captions:
582 Table 1. List of annotated mitochondrial genes of Pangshura sylhetensis.
583 Fig 1. The spatial range distribution (marked by gray shadow) and collection locality (27.50 N
584 96.24 E marked by red square) of P. sylhetensis. The species photograph and mitochondrial
585 genome of P. sylhetensis. Protein-coding genes are marked by blue, deep green, and yellowish-
586 green color boxes, rRNA genes are marked by violet color boxes, tRNA genes are marked by red
587 color boxes, and control region is marked by gray color box.
588 Fig 2. Frequency of start and stop codons for 13 protein-coding genes in the 52 Testudines
589 mitogenomes.
590 Fig 3. Bayesian (BA) phylogenetic tree based on the concatenated nucleotide sequences of 13
591 PCGs of 52 Testudines species showing the evolutionary relationship of P. sylhetensis. Color
592 boxes indicate the family level clustering for the studied species. The BA tree is drawn to scale
593 with posterior probability support values were indicated along with the branches. Representative
594 organism line diagrams are acquired from cyberspace.
595 Fig 4. Bayesian (BA) phylogenetic tree based on the concatenated 13 PCGs of Testudines and
596 other amniotes mitochondrial genomes indicated the diapsid affinities of Testudines and close
597 relationship with Archosaurians. Color boxes indicate the family level clustering for the studied
598 Testudines species and two sub-orders are marked by black bar. The BA tree is drawn to scale
599 with posterior probability support values were indicated along with the branches. Representative
600 organism line diagrams are acquired from cyberspace.
27 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
601 Fig 5. Gene-order based topology based on TreeREx analysis revealed the gene arrangement
602 scenario within 52 Testudines species. All internal nodes are shared color as per phylogenetic
603 clustering. Four gene arrangement scenarios are superimposed besides the tree.
604 Supporting information
605 S1 Fig. A pictorial overview of the methodologies used for sequencing and analysis of P.
606 sylhetensis mitogenome and bioanalyzer profiles after sonication of enriched mitochondrial DNA
607 sample and libraries.
608 S2 Fig. Putative secondary structures for 22 tRNA genes in mitochondrial genome of P.
609 sylhetensis. The first structure shows the nucleotide positions and details of stem-loop of tRNAs.
610 The tRNAs are represented by full names and IUPAC-IUB single letter amino acid codes.
611 Different base pairings are marked by red, blue and green color bars respectively.
612 S3 Fig. Comparison of control region (CR) stem-loop structures of the origin of L-strand
613 replication of 24 Testudines mitochondrial genomes.
614 S4 Fig. Comparison of control region (CR) stem-loop structures of the origin of L-strand
615 replication of 26 Testudines mitochondrial genomes.
616 S5 Fig. Maximum Likelihood (ML) phylogenetic tree based on the concatenated nucleotide
617 sequences of 13 PCGs of 52 Testudines species. Color boxes indicate the family level clustering
618 for the studied species. The ML tree is drawn by IQ-Tree with bootstrap support values were
619 indicated along with each node.
620 S6 Fig. Maximum Likelihood (ML) phylogenetic tree based on the concatenated 13 PCGs of
621 Testudines and other amniotes mitochondrial genomes. Color boxes indicate the family level
28 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
622 clustering for the studied Testudines species. The ML tree is drawn by IQ-Tree with bootstrap
623 support values were indicated along with each node.
624 S1 Table. List of mitogenome sequences of Testudines and other amniotes species acquired from
625 the NCBI database.
626 S2 Table. Estimated models by partitioning the 13 PCGs separately through PartitionFinder 2 for
627 phylogenetic analysis of two datasets (52 mitogenomes of Testudines and 63 mitogenomes of
628 Testudines + other amniotes).
629 S3 Table. Gene arrangements of the studied Testudines species used in the TreeREx analysis.
630 S4 Table. Nucleotide composition of 52 Testudines species mitochondrial genomes. The A+T
631 biases of whole mitogenome, protein coding genes, tRNA, rRNA, and control regions were
632 calculated by AT-skew = (A-T)/(A+T) and GC-skew= (G-C)/(G+C), respectively.
633 S5 Table. Frequency of start and stop codon distribution within the complete mitogenomes of 52
634 Testudines.
29 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
636 Table 1 List of annotated mitochondrial genes of Pangshura sylhetensis.
Size Anti- Start Stop Intergenic Gene Direction Location (bp) codon codon codon Nucleotides trnF + 1-69 69 GAA . . 0 rrnS + 70-1031 962 . . . 0 trnV + 1032-1100 69 TAC . . 11 rrnL + 1112-2697 1586 . . . 1 trnL2 + 2699-2774 76 TAA . . 0 nad1 + 2775-3743 969 . ATG TAG -1 trnI + 3743-3813 71 GAT . . -1 trnQ - 3813-3883 71 TTG . . -1 trnM + 3883-3951 69 CAT . . 0 nad2 + 3952-4992 1041 . ATG TAG -2 trnW + 4991-5064 74 TCA . . -1 trnA - 5064-5132 69 TGC . . 1 trnN - 5134-5207 74 GTT . . 27 trnC - 5235-5300 66 GCA . . 0 trnY - 5301-5371 71 GTA . . 1 cox1 + 5373-6923 1551 . GTG AGG -12 trnS2 - 6912-6982 71 TGA . . 0 trnD + 6983-7052 70 GTC . . 0 cox2 + 7053-7739 687 . ATG TAG 1 trnK + 7741-7813 73 TTT . . 1 atp8 + 7815-7982 168 . ATG TAA -10 atp6 + 7973-8656 684 . ATG TAA -1 cox3 + 8656-9440 785 . ATG TA(A) -1 trnG + 9440-9507 68 TCC . . 1 nad3 + 9509-9857 349 . ATG GAA 1 trnR + 9859-9927 69 TCG . . 0 nad4l + 9928-10224 297 . ATG TAA -7 nad4 + 10218-11594 1377 . ATG TAA 14 trnH + 11609-11677 69 GTG . . 0 trnS1 + 11678-11744 67 GCT . . -1 trnL1 + 11744-11816 73 TAG . . -21 nad5 + 11796-13628 1833 . GTG TAG -8 nad6 - 13621-14145 525 . ATG AGG -3 trnE - 14143-14210 68 TTC . . 4 cytb + 14215-15361 1147 . ATG T(AA) -3 trnT + 15359-15430 72 TGT . . 0 trnP - 15431-15501 71 TGG . . 66 D-loop . 15568-16389 822 . . . . 637
638
30 bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.