bioRxiv preprint doi: https://doi.org/10.1101/827782; this version posted November 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1 The complete mitochondrial genome of endangered Assam Roofed Turtle, Pangshura

2 sylhetensis (Testudines: Geoemydidae): Genomic features and Phylogeny

3 Shantanu Kundu, Vikas Kumar*, Kaomud Tyagi, Kailash Chandra

4 Centre for DNA Taxonomy, Molecular Systematics Division, Zoological Survey of India, M

5 Block, New Alipore, Kolkata-700053, India.

7 * Corresponding author:

8 Dr. Vikas Kumar, Scientist D

9 Centre for DNA Taxonomy, Molecular Systematics Division,

10 Zoological Survey of India, M Block, New Alipore, Kolkata-700053

11 Email: [email protected]

12 Mob. No. +91-9674944233

14 Running Head:

15 Complete mitogenome of Pangshura sylhetensis

22 Abstract

23 The Assam Roofed turtle, Pangshura sylhetensis, is an endangered and little-studied species endemic

24 to India and Bangladesh. The genomic features of the P. sylhetensis mitogenome are still unknown

25 to the scientific community. The present study decodes the first complete mitochondrial genome

26 of P. sylhetensis (16,568 bp) by using next-generation sequencing. This de novo assembly

27 encodes 13 Protein-coding genes (PCGs), 22 transfer RNAs (tRNAs), two ribosomal RNAs

28 (rRNAs), and one control region (CR). Most of the genes were encoded on the majority strand,

29 except NADH dehydrogenase subunit 6 (nad6) and eight tRNAs. Most of the PCGs started

30 with an ATG initiation codon, except for Cytochrome oxidase subunit 1 (cox1) and NADH

31 dehydrogenase subunit 5 (nad5) with GTG. The study also found the typical cloverleaf

32 secondary structure in most of the tRNA genes, except for serine (trnS1), which lacks the

33 conventional DHU arm and loop. Both Bayesian and Maximum-likelihood topologies showed

34 distinct clustering of all the Testudines species within their respective taxonomic ranks and were

35 congruent with the previous phylogenetic hypotheses (Pangshura and sister taxa).

36 Moreover, the mitogenomic phylogeny with other amniotes corroborated the sister

37 relationship of Testudines with Archosaurians (Birds and Crocodilians). Additionally, the

38 mitochondrial Gene Order (GO) analysis indicated that most of the Testudines species showed

39 plesiomorphy with the typical vertebrate GO.

40 Keywords:

41 Chelonian, Geoemydidae, Mitogenome, Species phylogeny, Gene Order, Evolution.

44 Introduction

45 The evolution of living organisms is a continuous process over generations and is difficult to

46 understand through any single hypothesis [1]. Several biological as well as

47 environmental factors play an important role in changing the genes inherited from a common

48 ancestor, which are subsequently passed on to offspring or new species. Indeed, the genetic

49 traits of a population drift randomly and are gradually shaped by natural

50 selection. Owing to the power and success of genetic information,

51 the evolutionary patterns of many living taxa are now well understood through phylogenetic

52 assessment. Accordingly, a worldwide collaborative effort of biologists, the ‘Tree of Life (ToL) project’,

53 was launched to illuminate biodiversity and its evolutionary history [2].

54 Testudines (turtles, tortoises, and terrapins) are among the oldest living organisms on

55 Earth, with an extended evolutionary history [3,4]. Besides morphology, the genetic approach

56 has been repeatedly applied to address the systematics of Testudines [5-7]. Both nuclear and

57 mitochondrial genes have been extensively used to investigate Testudines

58 phylogeny and genetic diversity [8-10]. Testudines mitogenomes were also

59 scrutinized to understand their evolution and furnished evidence of ‘Turtle-Archosaur

60 affinity’ [11,12]. Additionally, to address the evolution of turtles within amniotes and strengthen

61 the Vertebrate tree of life, the phylogenomic approach was also evaluated [13,14]. However, the

62 phylogeny of Testudines is yet to be reconciled by mining large-scale molecular data to establish

63 their precise evolutionary scenario.

64 Testudines mitochondrial genomes are usually circular and double-stranded, 16-19

65 kb in size, and contain 37 genes [15-19]. Remarkably, some of the species including turtles,

66 have variable numbers of genes in their mitogenome due to the loss of protein-coding genes

67 (PCGs), duplication of control regions (CRs), and occurrence of multiple contiguous copies of

68 one or more transfer RNA genes (tRNAs) [20]. Besides the structural features of different genes,

69 the gene orders (GOs) in mitogenomes have also proved to be signatures for defining clades

70 at different taxonomic levels in both vertebrates and invertebrates [21-25]. In general,

71 GO rearrangements are explained by transposition, inversion, inverse transposition, and tandem

72 duplication and random loss (TDRL) mechanisms in comparison with the typical ancestral GO

73 [26]. Moreover, these evolutionary events also help to define the plesiomorphic/apomorphic

74 status and transformational pathways of mitogenomes [27,28]. Thus, apart from the conventional

75 analysis, to infer the evolutionary pathways leading to the detected diversity of GOs, it is

76 essential to test phylogeny and GO conjointly [29,30]. To date, mitogenomes of more than 100 Testudines

77 species from 56 genera within 12 families and two sub-orders have been generated

78 worldwide to address several phylogenetic questions. However, GO and

79 phylogeny have never been assessed jointly for Testudines to discuss their evolutionary pathways.
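
The four rearrangement mechanisms named above (inversion, transposition, inverse transposition, and TDRL) can be illustrated on a signed gene order. The following is a minimal sketch, not the implementation used by any rearrangement software; the function names, the "-" prefix convention for minority-strand genes, and the four-gene toy order are assumptions made for illustration only.

```python
from typing import List, Set

# Hypothetical convention: a gene is a string, and a leading "-" marks
# a gene located on the minority strand.


def flip(gene: str) -> str:
    """Switch the strand of a single gene."""
    return gene[1:] if gene.startswith("-") else "-" + gene


def inversion(order: List[str], i: int, j: int) -> List[str]:
    """Reverse the block order[i:j] and flip the strand of each gene in it."""
    return order[:i] + [flip(g) for g in reversed(order[i:j])] + order[j:]


def transposition(order: List[str], i: int, j: int, k: int) -> List[str]:
    """Cut the block order[i:j] and re-insert it at index k of the remainder."""
    block, rest = order[i:j], order[:i] + order[j:]
    return rest[:k] + block + rest[k:]


def inverse_transposition(order: List[str], i: int, j: int, k: int) -> List[str]:
    """Transpose the block order[i:j] to index k, then invert it in place."""
    moved = transposition(order, i, j, k)
    return inversion(moved, k, k + (j - i))


def tdrl(order: List[str], keep_first: Set[str]) -> List[str]:
    """Tandem duplication and random loss.

    The whole order is duplicated in tandem; genes in keep_first survive
    in the first copy and all other genes survive in the second copy.
    """
    first = [g for g in order if g in keep_first]
    second = [g for g in order if g not in keep_first]
    return first + second


# Toy gene order (illustrative only, not taken from Table S3).
toy = ["trnT", "-trnP", "CR", "trnF"]
print(inversion(toy, 1, 2))
print(transposition(toy, 0, 1, 2))
print(inverse_transposition(toy, 0, 1, 2))
print(tdrl(toy, {"trnT", "CR"}))
```

In this representation, an inversion of a single tRNA gene (such as the trnP inversions reported later in the text) is simply a strand flip of one element.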

80 The evolutionarily distinct genus Pangshura is known by four extant species from

81 Southeast Asian countries [31]. Among them, the Assam Roofed turtle, Pangshura sylhetensis, is

82 a highly threatened species and categorized as ‘Endangered’ in the International Union for

83 Conservation of Nature (IUCN) Red data list [32]. The distribution of P. sylhetensis is restricted

84 to Bangladesh and India [33,34]. Although partial segments of mitochondrial genetic

85 information are publicly available, the complete mitogenomic information on this species is still

86 unknown to the scientific community. Therefore, the present study aimed to generate the

87 complete mitochondrial genome of P. sylhetensis and perform a comparative analysis with

88 other Testudines belonging to both sub-orders (Cryptodira and Pleurodira). Here, we used

89 phylogenetic (Bayesian and Maximum Likelihood) and GO analyses for better insights into the

90 Testudines evolutionary scenario.

91 Materials and Methods

92 Ethics statement and sample collection

93 To conduct the field survey and biological sampling, prior permission was acquired from the

94 wildlife authority of Arunachal Pradesh (Letter No. SFRI/APBB/09‐2011‐1221‐1228 dated

95 22.07.2016) and Zoological Survey of India, Kolkata (Letter No. ZSI/MSD/CDT/2016‐17 dated

96 29.07.2016). No turtle specimens were sacrificed in the present study. The P. sylhetensis

97 specimen was collected from the tributaries of Brahmaputra River (Latitude 27.50 N and

98 Longitude 96.24 E) from the nearby localities of Namdapha National Park in northeast India

99 (Fig. 1). The species photograph was taken by the first author (S.K.) and edited manually in

100 Adobe Photoshop CS 8.0. The country-level map was downloaded from the DIVA-GIS

101 spatial data platform (http://www.diva-gis.org/datadown) and overlaid using ArcGIS 10.6

102 software (ESRI®, CA, USA). The range distribution was edited manually in Adobe Photoshop

103 CS 8.0 following the published record [35]. The blood sample was collected under sterile

104 conditions from the hind limb of the live individual using a micro-syringe and stored in an EDTA

105 vial at 4°C. Subsequently, the specimen was released back into the same ecosystem with ample

106 care and attention.

107 Mitochondrial DNA extraction and sequencing

108 The collected blood sample (20 μl) was thoroughly mixed with 1 ml buffer (0.32 M sucrose, 1

109 mM EDTA, 10 mM Tris-HCl) and centrifuged at 700 × g for 5 min at 4°C to remove nuclei and

110 cell debris. The supernatant was collected in 1.5 ml Eppendorf tubes and centrifuged at 12,000 ×

111 g for 10 min at 4°C to precipitate the mitochondrial pellet. The pellet was re-suspended in 200 μl

112 of buffer (50 mM Tris-HCl, 25 mM EDTA, 150 mM NaCl), with the addition of 20 μl of

113 proteinase K (20 mg/ml), followed by incubation at 56°C for 1-2 hr, and the mitochondrial DNA

114 was extracted using the Qiagen DNeasy Blood & Tissue Kit (QIAGEN Inc.). The DNA quality was

115 checked by 1% agarose gel electrophoresis, and the concentration was quantified using a

116 NanoDrop 2000 spectrophotometer (Thermo Scientific).

117 Mitogenome assembly and annotation

118 Complete mitochondrial genome sequencing and de novo assembly were carried out at Xcelris

119 Labs Limited, Gujarat, India (http://www.xcelrisgenomics.com/). The genome library, with an

120 average library size of 545 bp, was sequenced on the Illumina platform (2 × 150 bp paired-end (PE) chemistry;

121 Illumina, Inc., USA) to generate ~3 GB of data. A total of 26,263,128 paired-end

122 reads were obtained, with a maximum yield of 3.78 GB. The raw reads were processed using the

123 cutadapt tool (http://code.google.com/p/cutadapt/) to trim adapters and low-quality bases

124 with a Phred quality score (Q score) cutoff of 20. The high-quality reads were downsampled to

125 2 million reads using the Seqtk program (https://github.com/lh3/seqtk). The high-quality paired-end data

126 were assembled with NOVOPlasty v2.6.7 using default parameters [36]. The mitogenome of

127 reevesii (accession no. KJ700438) was used as a reference seed sequence for the de novo

128 assembly, which yielded a single contig of 16,568 bp (~16.5 kb). To validate the assembly obtained from

129 the NOVOPlasty assembler, a similarity search was carried out against the GenBank database using the BLASTn

130 v2.2.28 (https://blast.ncbi.nlm.nih.gov) algorithm. Further, the contig was

131 confirmed using the MITOS v806 online web server (http://mitos.bioinf.uni-leipzig.de). The DNA

132 sequences of the protein coding genes (PCGs) were primarily translated into the putative amino

133 acid sequences on the basis of the vertebrate mitochondrial genetic code. The exact initiation and

134 termination codons were identified in ClustalX using other publicly available reference

135 sequences of Testudines [37]. The de-novo mitogenome was submitted to the GenBank database

136 through the Sequin submission tool (Fig. S1).
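
The translation step described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the present study: it builds the vertebrate mitochondrial genetic code (NCBI translation table 2) from its standard compact string and translates a PCG codon by codon. The function name and the toy sequences are assumptions made for illustration.

```python
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial code), written with
# codons enumerated in TCAG order. It differs from the standard code at
# ATA (Met), TGA (Trp), and AGA/AGG (stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_vertebrate_mito(seq: str) -> str:
    """Translate a coding sequence with the vertebrate mitochondrial code.

    Stop codons are written as "*", codons with ambiguous bases as "X",
    and a trailing partial codon (for example an incomplete T-- or TA-
    stop) is ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Toy sequences (illustrative only, not taken from the assembled genome).
print(translate_vertebrate_mito("ATGATATGAAGA"))  # ATG, ATA, TGA, AGA
print(translate_vertebrate_mito("ATGCCCT"))       # trailing "T" is ignored
```

Note that the table reports GTG as valine; when GTG is used as an initiation codon, as reported here for cox1 and nad5, the first residue is conventionally read as methionine, which this sketch does not model.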

137 Data set construction and comparative analysis

138 The circular representation of the generated mitogenome of P. sylhetensis was plotted with the

139 CGView Server (http://stothard.afns.ualberta.ca/cgview_server/) using default parameters [38].

140 The strand direction and arrangement of each PCG, tRNA, and ribosomal RNA (rRNA) were

141 also checked through the MITOS online server. The overlapping regions and intergenic spacers

142 between neighboring genes were counted manually in Microsoft Excel. The tRNA genes

143 of P. sylhetensis were verified with the MITOS online server, tRNAscan-SE Search Server 2.0

144 (http://lowelab.ucsc.edu/tRNAscan-SE/), and ARWEN 1.2 [39,40]. The base composition of all

145 stems (DHU, acceptor, TψC, anticodon) was examined manually to discriminate

146 Watson-Crick, wobble, and mismatched base pairing. On the basis of a homology search in the RefSeq

147 database (https://www.ncbi.nlm.nih.gov/refseq/), 51 Testudines mitogenomes representing nine

148 families of Cryptodira and three families of Pleurodira were downloaded from GenBank and

149 incorporated in the dataset for comparative analysis (Table S1). The genome size and

150 comparative analysis of nucleotide composition with other Testudines were calculated using

151 MEGA6.0 [41]. The base composition skew was calculated as defined previously: AT skew = (A

152 – T)/(A + T), GC skew = (G – C)/(G + C) [42]. The start and stop codons of PCGs were checked

153 through the Open Reading Frame Finder web tool (https://www.ncbi.nlm.nih.gov/orffinder/).

154 To determine the location for replication of the L-strand and putative secondary structures, all

155 Testudines CRs were analyzed through online Mfold web server (http://unafold.rna.albany.edu).

156 Due to the lack of proper annotation, the CRs of two species (Amyda cartilaginea and Phrynops

157 hilarii) were missing and thus not incorporated in the comparative analysis.
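
The skew definitions given above, AT skew = (A – T)/(A + T) and GC skew = (G – C)/(G + C), can be computed directly from a nucleotide sequence. The sketch below is only an illustration of those formulas and not the MEGA6.0 procedure; the function name and the toy sequence are assumptions, and the sequence is assumed to contain at least one A or T and at least one G or C.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, float]:
    """Return A+T content (%) and AT/GC skews for a nucleotide sequence.

    Only unambiguous bases (A, T, G, C) are counted; any other symbol
    is ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    return {
        "at_content": 100.0 * (a + t) / (a + t + g + c),
        "at_skew": (a - t) / (a + t),   # AT skew = (A - T)/(A + T)
        "gc_skew": (g - c) / (g + c),   # GC skew = (G - C)/(G + C)
    }


# Toy sequence (illustrative only): A=3, T=1, G=1, C=3.
print(base_composition("AAATGCCC"))
```

A positive AT skew together with a negative GC skew, as reported for the complete mitogenomes in this dataset, indicates more A than T and more C than G on the analyzed strand.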

158 Phylogenetic and Gene Order (GO) analysis

159 To infer the Bayesian analysis (BA) and maximum-likelihood (ML) phylogenetic relationships,

160 two datasets were prepared in the present analysis. The first dataset includes all the PCGs of 52

161 Testudines mitogenomes including P. sylhetensis (Table S1). As the targeted species is a

162 Cryptodiran species, all the Pleurodiran species were treated as out-group taxa in the first dataset.

163 Further, to reassess the existing hypothesis of the phylogenetic position of Testudines among other

164 amniotes, all the PCGs of 11 database sequences (Whale: KU891394, Human: AP008580,

165 Platypus: NC_000891, Opossum: AJ508398, Iguana: NC_002793, Lizards: AB080237, Snake:

166 HM581978, Tuatara: AF534390, Bird: AP003580, Alligator: NC_001922, and Crocodile:

167 HM488007) were acquired from the GenBank database to construct the second dataset (Table

168 S1). Among them, the database sequence of an aquatic amniote (Physeter catodon: KU891394)

169 was used as the out-group taxon. The PCGs were aligned individually by codons using the MAFFT

170 algorithm in TranslatorX with the L-INS-i strategy and GBlocks parameters [43]. Finally, the PCGs of each of the two

171 datasets were concatenated using SequenceMatrix v1.7.84537 [44]. The suitable

172 models for phylogenetic analysis were estimated by PartitionFinder 2 [45] at CIPRES Science

173 Gateway V. 3.3 [46] (Table S2). The BA tree was built with MrBayes 3.1.2, and the

174 Metropolis-coupled Markov Chain Monte Carlo (MCMC) was run for 100,000,000 generations

175 with sampling at every 100th generation, and 25% of samples were discarded as burn-in [47]. The

176 ML tree was constructed using the IQ-Tree web server with 1000 bootstrap replicates [48]. Both BA

177 and ML topologies for the two datasets were further refined in the iTOL v4

178 (https://itol.embl.de/login.cgi) tool for better illustration [49] and edited with Adobe Photoshop

179 CS 8.0.
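
The MCMC settings stated above imply a fixed number of retained samples per run, which the following arithmetic sketch makes explicit. It is an illustration only: the function name is hypothetical, the sample of the starting state (generation 0) is ignored, and a single run is assumed.

```python
from typing import Tuple


def mcmc_sample_counts(
    generations: int, sample_every: int, burnin_fraction: float
) -> Tuple[int, int, int]:
    """Return (total, discarded, retained) sample counts for one MCMC run."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


# Settings reported in the text: 100,000,000 generations, sampling at
# every 100th generation, 25% of samples discarded as burn-in.
total, discarded, retained = mcmc_sample_counts(100_000_000, 100, 0.25)
print(total, discarded, retained)  # 1000000 250000 750000
```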

180 Further, to check the gene arrangement scenario, the TreeREx analysis

181 was adopted to infer the evolutionary pathways within the Testudines leading to the observed

182 diversity of the GOs. TreeREx infers the putative GOs at the internal nodes of a

183 reference tree; it works in a bottom-up manner through the iterative analysis of triplets or

184 quadruplets of GOs to decide all the GOs in the entire tree [30]. In the TreeREx analysis, the

185 consistent nodes are considered the most reliable and are marked in green, whereas the

186 fallback nodes have the highest level of uncertainty and are marked in red. We used

187 the default settings of TreeREx suggested on the website

188 (http://pacosy.informatik.uni-leipzig.de/185-0-TreeREx.html), so that every node of the reference phylogenetic tree was

189 assigned a GO. To finalize the gene arrangement dataset of 52 Testudines for the TreeREx

190 analysis (Table S3), the insertion, deletion, and duplication of genes within the database

191 available mitogenomes were considered as discussed in the previous studies [50-53].

192 Results and Discussion

193 Mitogenome structure and organization

194 The mitogenome (16,568 bp) of the endangered Assam Roofed turtle, P. sylhetensis, was determined

195 in the present study (GenBank accession no. MK580979). The mitogenome contains 37

196 genes (13 PCGs, 22 tRNAs, and two rRNAs) and a major non-coding CR. Among them,

197 nine genes (NADH dehydrogenase subunit 6 and eight tRNAs) were located on the minority

198 strand, while the remaining 28 genes were located on the majority strand (Table 1 and Fig. 1). In

199 the present dataset, the mitogenome length varied from 15,339 bp (A. cartilaginea) to

200 19,403 bp (Stigmochelys pardalis). Out of 52 Testudines species, 43 species showed

201 strand symmetry as observed in typical vertebrates [54]. The gene arrangement scenarios

202 within the Testudines species are discussed in detail in a later subsection. The

203 nucleotide composition of the P. sylhetensis mitogenome was A+T biased (59.27%), as observed

204 in other Testudines mitogenomes, where it ranged from 57.76% (Pleurodiran species P. hilarii) to

205 64.19% (Cryptodiran species Kinosternon leucostomum) (Table S4). The A+T composition of P.

206 sylhetensis PCGs was 58.77%. The AT skew and GC skew were 0.124 and -0.334, respectively, in the

207 mitogenome of P. sylhetensis. The comparative analysis showed that the AT skew ranged from

208 0.087 (Testudo graeca) to 0.208 (Carettochelys insculpta) and the GC skew from -0.296 (S.

209 pardalis) to -0.412 (Trionyx triunguis) (Table S4). A total of 15 overlapping regions with a total

210 length of 73 bp were identified in the P. sylhetensis mitogenome. The longest overlapping region (21

211 bp) was observed between tRNA-Leucine (trnL1) and NADH dehydrogenase subunit 5 (nad5).

212 Further, a total of 12 intergenic spacer regions with a total length of 129 bp were observed in the P.

213 sylhetensis mitogenome, with the longest region (66 bp) between tRNA-Proline (trnP) and the CR

214 (Table 1).
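
Overlapping regions and intergenic spacers such as those summarized above can be derived from the annotated gene coordinates. The sketch below is a hedged illustration of that bookkeeping, not the manual spreadsheet procedure used in this study; the function name and the toy coordinates are hypothetical, coordinates are assumed to be 1-based and inclusive, and the junction that closes the circular genome is not considered.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int]            # (name, start, end), 1-based inclusive
Junction = Tuple[str, str, int]        # (upstream gene, downstream gene, length in bp)


def classify_junctions(genes: List[Gene]) -> Tuple[List[Junction], List[Junction]]:
    """Return (overlaps, spacers) between neighboring genes sorted by start."""
    ordered = sorted(genes, key=lambda gene: gene[1])
    overlaps: List[Junction] = []
    spacers: List[Junction] = []
    for (name1, _, end1), (name2, start2, _) in zip(ordered, ordered[1:]):
        gap = start2 - end1 - 1        # bases strictly between the two genes
        if gap < 0:
            overlaps.append((name1, name2, -gap))
        elif gap > 0:
            spacers.append((name1, name2, gap))
    return overlaps, spacers


# Toy annotation (illustrative coordinates only, not from Table 1).
toy_genes = [
    ("geneA", 1, 100),
    ("geneB", 98, 200),    # shares positions 98-100 with geneA
    ("geneC", 201, 300),   # directly adjacent to geneB
    ("geneD", 311, 400),   # 10 bp spacer after geneC
]
overlaps, spacers = classify_junctions(toy_genes)
print(overlaps, sum(length for _, _, length in overlaps))
print(spacers, sum(length for _, _, length in spacers))
```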

215 Protein‐coding genes

216 The total length of PCGs was 11,268 bp in P. sylhetensis, which represents 68.01% of the complete

217 mitogenome. The AT skew and GC skew were 0.056 and -0.345, respectively, in the PCGs of P. sylhetensis (Table

218 S4). Most of the PCGs of P. sylhetensis started with an ATG initiation codon; however, the GTG

219 initiation codon was observed in Cytochrome oxidase subunit 1 (cox1) and nad5. The TAG

220 termination codon was used by four PCGs, TAA by four PCGs, AGG by two PCGs, and AGA

221 by one PCG. The incomplete termination codons TA(A) and T(AA) were detected in

222 Cytochrome oxidase subunit 3 (cox3) and Cytochrome b (cytb) genes, respectively. The

223 comparative analysis with other Testudines species revealed that a high frequency of initiation

224 codons (ATN, GTG) was observed in NADH dehydrogenase subunit 1 (nad1), NADH

225 dehydrogenase subunit 2 (nad2), NADH dehydrogenase subunit 3 (nad3), NADH dehydrogenase

226 subunit 4L (nad4L), NADH dehydrogenase subunit 4 (nad4), nad5, and cytb. The

227 ATG initiation codon was the most frequent in most of the PCGs (64.71% to 96.08%), except for cox1,

228 in which the GTG start codon predominated (72.55%). Complete termination codons were more frequent

229 in eight PCGs, whereas incomplete termination codons were more frequent in the other five (Table S5 and Fig.

230 2).

231 Ribosomal RNA and transfer RNA genes

232 The total length of the two rRNA genes of P. sylhetensis was 2,560 bp; the range was 1,611 bp

233 (Caretta caretta) to 2,685 bp ( sinensis) in the present dataset. The AT content within the

234 rRNA genes was 58.98%, with AT and GC skews of 0.272 and -0.173, respectively (Table S4).

235 A total of 22 tRNAs were found in the P. sylhetensis mitogenome, with a total length of 1,550 bp.

236 In other Testudines, the total length of tRNAs varied from 1,407 bp (A. cartilaginea) to 1,684 bp

237 (Platysternon megacephalum). The AT content within the tRNA genes was 59.87%, with AT and GC

238 skews of 0.017 and 0.057, respectively (Table S4). Among all the tRNA genes, 14 were present

239 on the majority strand and eight tRNA genes (trnQ, trnA, trnN, trnC, trnY, trnS2, trnE, and trnP) on the

240 minority strand. The anticodons of all the tRNA genes were also detected in the present study.

241 Most of the tRNA genes folded into the classical cloverleaf structure, except trnS1, which lacks the

242 dihydrouridine (DHU) stem and loop (Fig. S2). The conventional base pairings (A=T and G≡C)

243 were observed in most of the tRNAs [55]; however, wobble base pairing was observed in the

244 stems of 13 tRNAs (trnL2, trnN, trnA, trnW, trnP, trnE, trnR, trnG, trnC, trnY, trnS2, trnK, trnQ)

245 (Fig. S2).

246 Control regions

247 The CR of P. sylhetensis comprised three functional domains: the termination

248 associated sequence (TAS), the central conserved domain (CD), and the conserved sequence block

249 (CSB), as observed in other vertebrate CRs [56]. Whereas the TAS and CSB domains contain

250 varying numbers of tandem repeats, the CD consists of highly conserved sequences.

251 Hence, the pattern of the CR varies among different vertebrates, including Testudines [56,57].

252 The total length of the P. sylhetensis CR was 1,067 bp; CR length ranged from 600 bp (Lepidochelys olivacea)

253 to 3,885 bp (S. pardalis) in the present dataset. In the P. sylhetensis CR, the AT and GC skews were

254 -0.025 and -0.249, respectively (Table S4). The TAS domain was ‘TACATA’, while the CSB domain was

255 further divided into four regions: CSB-F (AGAGATAAGCAAC), CSB-1 (GACATA), CSB-2

256 (TTAAACCCCCCTACCCCCC), and CSB-3 (TCGTCAAACCCCTAAATCC). The CR is also

257 known for the initiation of replication, and is positioned between trnP and trnF in most

258 of the Testudines except P. megacephalum [50,52]. In P. sylhetensis, 27 bp are

259 present between CSB-1 and the stem-loop structure, whereas in other species this distance ranged from

260 zero to 90 bp (T. triunguis) (Fig. S3 and S4). Overall, the structural features of the replication of

261 the L-strand and the putative secondary structures differ among all Testudines species and could

262 be used as species-specific markers.

263 Phylogeny of Testudines mitogenomes

264 The phylogenetic position of Testudines in the Vertebrate tree of life has been evaluated over the last

265 few decades. Several approaches have aimed to reconcile the phylogeny for a better

266 understanding of their origin and diversification [4]. Besides morphological parameters, gene-based

267 topologies have been widely used for their identification, species delimitation, and

268 population genetics assessment [58,59]. Nevertheless, to interpret the evolutionary scenario, the

269 tested dataset should comprise comprehensive genetic information on various taxa

270 representing all extant lineages. Both morphological and molecular data corroborated the separation of all

271 the Pangshura species from their closest relatives in Batagur in earlier studies [31,60,61]. The

272 mitogenomic data have successfully resolved the phylogenetic relationships of many

273 Testudines species, including the studied genus Pangshura under the family Geoemydidae [62]. To

274 date, the majority of species under sub-family Geoemydinae have been targeted throughout the world

275 to generate their mitogenomes. Nevertheless, mitogenomes of only two species of sub-family

276 Batagurinae are presently available in the global database. Hence, the present study adds the de

277 novo assembly of a third Batagurinae species (P. sylhetensis) to the global database. Both BA and

278 ML phylogenies effectively discriminated all the studied Testudines species compiled in the first

279 dataset with high bootstrap support and posterior probability (Fig. 3 and S5). The present

280 phylogeny showed the sister relationship of P. sylhetensis with Batagur trivittata, as

281 described in the previous phylogeny [62]. The other representative species also clustered

282 closely within their respective families and sub-orders, consistent with the previous

283 phylogenetic hypothesis [59,61]. The present mitogenomic phylogeny with concatenated 13

284 PCGs was adequate to generate a robust phylogeny and illuminate the relationship of P.

285 sylhetensis with other Testudines species. More taxa from different taxonomic lineages and

286 diverse distribution localities, and their large-scale genetic data, will be valuable for establishing a more

287 exhaustive phylogeny and evolutionary connection.

288 In addition, the evolutionary position of Testudines in relation to

289 other amniotes is still controversial [63-65]. To resolve the uncertainty over the placement of Testudines either

290 in anapsids or diapsids, the mitogenome of Pleurodiran species has been analyzed previously and

291 rejected the placement of Testudines at the most basal position in the amniote tree of life [11,66]. The

292 mitogenome sequences have also been studied to evaluate the relationships between

293 Archosaurians (birds and crocodilians) and Lepidosaurians (tuatara, snakes, and lizards) [67]. In the

294 recent past, reptilian transcriptome-based phylogenetic analysis also suggested that Testudines

295 are not basal among extant reptiles but show affinity towards Archosaurians [68]. Further,

296 the candidate nuclear protein-coding locus (NPCL) markers were also evaluated and

297 demonstrated that turtles are robustly recovered as the sister group of Archosauria (birds and

298 crocodilians) and that the evolutionary timescales were congruent with the Timetree of Life [69].

299 Later on, phylogenomic approaches and ultra-conserved element (UCE) based analyses

300 were used to test the prevailing hypothesis and supported the sister relationship of

301 Testudines with Archosaurians [13,70,71]. Hence, to affirm the existing phylogenetic hypothesis

302 of Testudines with other amniotes, we constructed the phylogenetic tree of the mitogenome dataset

303 (Testudines + other amniotes). Both BA and ML phylogenies of this dataset showed Testudines clustering

304 closely with birds, alligator, and crocodile in comparison with other amniotes, with high

305 bootstrap support and posterior probability (Fig. 4 and S6). Hence, the present mitogenome-based

306 phylogenies are congruent with the prevailing hypothesis and strengthen the understanding of the evolutionary

307 position of Testudines in the Vertebrate tree of life.

308 Gene arrangements

309 To define the phylogenetic clustering at different taxonomic levels and infer the

310 evolutionary pathways among Testudines, TreeREx analysis was adopted to assess the gene

311 arrangements. A total of 50 consistent nodes were detected in the present analysis (Fig. 5).

312 Considering the A50 node as the ancestral state of both Cryptodiran and Pleurodiran species in the

313 present dataset, the gene arrangement is plesiomorphic for most of the Testudines species

314 with few exceptions. Four gene arrangements events were detected in the present dataset: (i) an

315 inversion of trnP was observed in A39 node of E. macrurus, (ii) an inversion of trnP was

316 observed in A21 node of E. imbricata, (iii) two inversion of trnP and trnS1 were observed on

317 node A11 towards A10 which separates the family Testudinidae from Geoemydidae, (iv) two

318 inversion of trnS1, trnP and one TDRL event towards P. megacephalum separates the family

319 Platysternidae to (Fig. 5). However, the synapomorphy was observed in three species

320 under three different families, (E. macrurus), (E. imbricata), and

321 Platysternidae (P. megacephalum). As compared the gene arrangement scenario with other

322 species under Chelidae and Cheloniidae, the GO of both E. macrurus and E. imbricata might

323 occurr due to the parallel evolution, which needs further investigation. Further, to know the

324 phylogenetic position and evolutionary history of the sole member (P. megacephalum) under the

325 monotypic family Platysternidae, the mitogenomes data has been largely assessed. The unusual

326 gene arrangements, duplication of CR, and loss of redundant genes was observed in P.

327 megacephalum mitogenome [50,51,53]. Later on, the micro-evolutionary analysis was performed

328 to know the evolution of CRs, and evidenced that, the duplicate CR in P. megacephalum derived

329 from the heterological ancestral recombination of mitochondrial DNA [52]. The present TreeRex

330 based GO analysis revealed that, both inversion and TDRL events plays a major role in the

331 independent evolution of P. megacephalum as observed in the earlier studies. In the recent past,

332 the draft genome of this species has been assembled to know the species even better [72]. Hence,

333 we recommended whole genome data of Testudines and their comparative analysis will be

334 essential to comprehend the molecular mechanisms leading to their distinctive morphology and

335 physiology. Altogether, the structural features of mitochondrial genomes, phylogeny, and

336 TreeRex based gene arrangement analysis fortify the knowledge on the Testudines evolution.
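The two event types named above, inversion and TDRL, can be illustrated with a toy model. The following is a minimal Python sketch and not the TreeREx implementation: a gene order is written as a list of (gene, strand) pairs, an inversion reverses a segment and flips its strands (for a single gene this is just a strand flip), and a TDRL duplicates the order in tandem and keeps each gene from exactly one of the two copies. The six-gene order and the function names are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

Gene = Tuple[str, int]  # (name, strand), where strand is +1 or -1


def invert(order: List[Gene], start: int, end: int) -> List[Gene]:
    """Invert order[start..end]: reverse the segment and flip every strand."""
    segment = [(name, -strand) for name, strand in reversed(order[start:end + 1])]
    return order[:start] + segment + order[end + 1:]


def tdrl(order: List[Gene], keep_in_first_copy: Set[str]) -> List[Gene]:
    """Tandem duplication/random loss applied to the whole order.

    The order is duplicated in tandem; genes named in keep_in_first_copy
    survive in the first copy, all other genes survive in the second copy.
    """
    first = [g for g in order if g[0] in keep_in_first_copy]
    second = [g for g in order if g[0] not in keep_in_first_copy]
    return first + second


# Hypothetical toy order (an assumption, not taken from S3 Table).
toy = [("cytb", 1), ("trnT", 1), ("trnP", -1), ("trnF", 1), ("rrnS", 1), ("trnV", 1)]

flipped = invert(toy, 2, 2)  # single-gene inversion of trnP
shuffled = tdrl(toy, {"cytb", "trnP", "rrnS"})

print(flipped[2])
print([name for name, _ in shuffled])
```

In this toy case the inversion only changes the strand of trnP, whereas the TDRL changes the relative order of the genes while leaving every strand untouched, which is why the two event types leave different signatures in a gene order.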


Conclusions and conservation inferences

The evolutionarily distinct and globally endangered (EDGE) freshwater turtle P. sylhetensis (family Geoemydidae) is endemic to India and Bangladesh. Owing to habitat fragmentation, anthropogenic threats, and illegal poaching, the populations of P. sylhetensis have declined markedly across its range [73-75]. Hence, P. sylhetensis is categorized as an ‘endangered’ species in the IUCN Red List, is listed in ‘Appendix II’ of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES), and is a ‘Schedule I’ species in the Indian Wildlife (Protection) Act, 1972. Nevertheless, owing to the unavailability of in-depth genetic information for many Testudines, species-specific biological information has not been amply resolved. The present study assembled and characterized the first complete mitogenome of P. sylhetensis (16,568 bp) and compared it with other 52 Testudines representing 12 families and two sub-orders (Cryptodira and Pleurodira). The PCG-based BA and ML phylogenies indicated that P. sylhetensis is closely related to B. trivittata, which is congruent with the earlier study [62]. Further, the phylogenetic reassessment with more mitogenomes evidenced the diapsid affinity of Testudines and their sister relationship with Archosaurians, which would strengthen the vertebrate tree of life. The GO analysis also revealed that the mitogenomes of most species have gene arrangements similar to those of typical vertebrates, with few exceptions (E. macrurus, E. imbricata, P. megacephalum, and the shared nodes of Geoemydidae-Testudinidae). Hence, the generated mitogenomic information of P. sylhetensis would be useful for further phylogenetic reconciliation, for understanding the evolutionary history, and for adopting precise conservation management.



Acknowledgements

We thank the Director of the Zoological Survey of India (ZSI), Ministry of Environment, Forest and Climate Change (MoEF&CC), Govt. of India, for providing the necessary permissions and facilities. We are thankful to the Arunachal Pradesh Biodiversity Board for providing the necessary permissions and facilities. The first author (S.K.) acknowledges the fellowship grant received from the Council of Scientific and Industrial Research (CSIR) Senior Research Associateship (Scientists’ Pool Scheme), Pool No. 9072-A.

Funding

This research was funded by the Ministry of Environment, Forest and Climate Change (MoEF&CC): Zoological Survey of India (ZSI), Kolkata in-house project, ‘National Faunal Genome Resources (NFGR)’. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests:

The authors declare that they have no competing interests.

Author Contributions

Conceptualization: SK, VK; Data curation: KT, SK; Formal analysis: SK, KT; Funding acquisition: KC, VK; Investigation: SK, VK; Methodology: SK, KT, VK; Project administration: KC, VK; Resources: KC; Software: SK, KT; Supervision: VK, KC; Validation: SK, VK; Visualization: SK, KT; Writing – original draft: SK, KT, VK; Writing – review & editing: SK, VK, KC.


Data Availability Statement

The following information was supplied regarding the accessibility of DNA sequences: The complete mitogenome of Pangshura sylhetensis is deposited in GenBank of NCBI under accession number MK580979.
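For readers who want to retrieve the deposited record programmatically, the following is a minimal sketch that builds a request to the NCBI E-utilities efetch endpoint for this accession. The helper names are hypothetical, the choice of FASTA output is an assumption, and the download function is defined but not called here because it needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "fasta") -> str:
    """Build an efetch URL for a nucleotide record (nuccore database)."""
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_record(accession: str, rettype: str = "fasta") -> str:
    """Download the record as text (requires network access)."""
    with urlopen(efetch_url(accession, rettype), timeout=30) as resp:
        return resp.read().decode("utf-8")


url = efetch_url("MK580979")
print(url)
```

Passing rettype="gb" instead of the default would request the annotated GenBank flat file rather than the bare sequence.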

ORCID IDs:

Shantanu Kundu: https://orcid.org/0000-0002-5488-4433
Vikas Kumar: http://orcid.org/0000-0002-0215-0120
Kaomud Tyagi: https://orcid.org/0000-0003-1064-9826
Kailash Chandra: https://orcid.org/0000-0001-9076-5442

References

1. Darwin C. On the origin of species. London: John Murray. 1859; 502 p.
2. Maddison DR, Schulz K-S. The Tree of Life Web Project. 2007; http://tolweb.org.
3. Hedges SB, Poling LL. A molecular phylogeny of reptiles. Science. 1999; 283: 998-1001.
4. Lourenço JM, Claude J, Galtier N, Chiari Y. Dating cryptodiran nodes: origin and diversification of the turtle superfamily Testudinoidea. Mol Phylogenet Evol. 2012; 62: 496-507.
5. Meylan PA. The phylogenetic relationships of softshelled turtles (family Trionychidae). Bull Nat Hist Mus. 1987; 186: 1-101.
6. Gaffney ES, Meylan PA. A phylogeny of turtles. In: Benton MJ (Ed.), The Phylogeny and Classification of Tetrapods. Clarendon Press, Oxford, England. 1988; 157-219.
7. Shaffer HB, Meylan P, McKnight ML. Tests of turtle phylogeny: molecular, morphological, and paleontological approaches. Syst Biol. 1997; 46: 235-268.


8. Engstrom TN, Shaffer HB, McCord WP. Multiple Data Sets, High Homoplasy, and the Phylogeny of Softshell Turtles (Testudines: Trionychidae). Syst Biol. 2004; 53: 693-710.
9. Fujita MK, Engstrom TN, Starkey DE, Shaffer HB. Turtle phylogeny: insights from a novel nuclear intron. Mol Phylogenet Evol. 2004; 31: 1031-1040.
10. Barley AJ, Spinks PQ, Thomson RC, Shaffer HB. Fourteen nuclear genes provide phylogenetic resolution for difficult nodes in the turtle tree of life. Mol Phylogenet Evol. 2010; 55: 1189-1194.
11. Zardoya R, Meyer A. Complete mitochondrial genome suggests diapsid affinities of turtles. Proc Natl Acad Sci USA. 1998; 95: 14226-14231.
12. Kumazawa Y, Nishida M. Complete mitochondrial DNA sequences of the green turtle and blue-tailed mole skink: Statistical evidence for Archosaurian affinity of turtles. Mol Biol Evol. 1999; 16: 784-792.
13. Fong JJ, Brown JM, Fujita MK, Boussau B. A Phylogenomic Approach to Vertebrate Phylogeny Supports a Turtle-Archosaur Affinity and a Possible Paraphyletic Lissamphibia. PLoS ONE. 2012; 7: e48990.
14. Shaffer HB, McCartney-Melstad E, Near T, Mount GG, Spinks PQ. Phylogenomic analyses of 539 highly informative loci dates a fully resolved time tree for the major clades of living turtles (Testudines). Mol Phylogenet Evol. 2017; 115: 7-15.
15. Zhang L, Nie L, Cao C, Zhan Y. The complete mitochondrial genome of the Keeled box turtle Pyxidea mouhotii and phylogenetic analysis of major turtle groups. J Genet Genom. 2008; 35: 33-40.


16. Huang YN, Li J, Jiang QY, Shen XS, Yan XY, Tang YB, et al. Complete mitochondrial genome of the Cyclemys dentata and phylogenetic analysis of the major family Geoemydidae. Genet Mol Res. 2015; 14: 3234-3243.
17. Li W, Zhang X-C, Zhao J, Shi Y, Zhu X-P. Complete mitochondrial genome of Cuora trifasciata (Chinese three-striped box turtle), and a comparative analysis with other box turtles. Gene. 2015; 555: 169-177.
18. Feng L, Yang J, Zhang YP, Zhao GF. The complete mitochondrial genome of the Burmese roofed turtle (Batagur trivittata) (Testudines: Geoemydidae). Conserv Genet Res. 2017; 9: 95.
19. Kundu S, Kumar V, Tyagi K, Chakraborty R, Singha D, Rahaman I, et al. Complete mitochondrial genome of Black Soft-shell Turtle (Nilssonia nigricans) and comparative analysis with other Trionychidae. Sci Rep. 2018; 8: 17378.
20. Rest JS, Ast JC, Austin CC, Waddell PJ, Tibbetts EA, Hay JM, et al. Molecular systematics of primary reptilian lineages and the tuatara mitochondrial genome. Mol Phylogenet Evol. 2003; 29: 289-297.
21. Macey JR, Larson A, Ananjeva NB, Fang Z, Papenfuss TJ. Two novel gene orders and the role of light-strand replication in rearrangement of the vertebrate mitochondrial genome. Mol Biol Evol. 1997; 14: 91-104.
22. Mindell D, Sorenson MD, Dimcheff DE. Multiple independent origins of mitochondrial gene order in birds. Proc Natl Acad Sci USA. 1998; 95: 10693-10697.
23. Cameron SL, Johnson KP, Whiting MF. The mitochondrial genome of the screamer louse Bothriometopus (Phthiraptera: Ischnocera): effects of extensive gene rearrangements on the evolution of the genome. J Mol Evol. 2007; 65: 589-604.


24. Wei S-J, Shi M, Chen X-X, Sharkey MJ, van Achterberg C, Ye G-Y, et al. New Views on Strand Asymmetry in Insect Mitochondrial Genomes. PLoS ONE. 2010; 5: e12708.
25. Kumar V, Tyagi K, Kundu S, Chakraborty R, Singha D, Chandra K. The first complete mitochondrial genome of marigold pest thrips, Neohydatothrips samayunkur (Sericothripinae) and comparative analysis. Sci Rep. 2019; 9: 191.
26. San Mauro D, Gower DJ, Zardoya R, Wilkinson M. A hotspot of gene order rearrangement by tandem duplication and random loss in the vertebrate mitochondrial genome. Mol Biol Evol. 2006; 23: 227-234.
27. Perseke M, Fritzsch G, Ramsch K, Bernt M, Merkle D, Middendorf M, et al. Evolution of mitochondrial gene orders in echinoderms. Mol Phylogenet Evol. 2008; 47: 855-864.
28. Babbucci M, Basso A, Scupola A, Patarnello T, Negrisolo E. Is it an ant or a butterfly? Convergent evolution in the mitochondrial gene order of Hymenoptera and Lepidoptera. Genome Biol Evol. 2014; 6: 3326-3343.
29. Bernt M, Merkle D, Ramsch K, Fritzsch G, Perseke M, Bernhard D, et al. CREx: inferring genomic rearrangements based on common intervals. Bioinformatics. 2007; 23: 2957-2958.
30. Bernt M, Merkle D, Middendorf M. An algorithm for inferring mitogenome rearrangements in a phylogenetic tree. In: Nelson CE, Vialette S, editors. Comparative genomics, RECOMB-CG 2008, LNB 5267. Berlin: Springer. 2008; 143-157.
31. Praschag P, Hundsdörfer AK, Fritz U. Phylogeny and taxonomy of endangered South and South-east Asian freshwater turtles elucidated by mtDNA sequence variation (Testudines: Geoemydidae: Batagur, Callagur, Hardella, Kachuga, Pangshura). Zool Scr. 2007; 36: 429-442.


32. IUCN. The IUCN red list of threatened species, Version 2019-1. 2019. Accessed September 6, 2019.
33. Das I. Colour Guide to the Turtles and Tortoises of the Indian Subcontinent. Portishead, Avon: R & A Publishing Limited. 1991.
34. Buhlmann KA, Akre TSB, Iverson JB, Karapatakis D, Mittermeier RA, Georges A, et al. A global analysis of tortoise and freshwater turtle distributions with identification of priority conservation areas. Chelonian Conserv Biol. 2009; 8: 116-149.
35. Turtle Taxonomy Working Group (TTWG) [Rhodin AGJ, Iverson JB, Bour R, Fritz U, Georges A, Shaffer HB, van Dijk PP]. Turtles of the world: Annotated checklist and atlas of taxonomy, synonymy, distribution, and conservation status (8th Ed.). In Rhodin AGJ, Iverson JB, van Dijk PP, Saumure RA, Buhlmann KA, Pritchard PCH, Mittermeier RA (Eds.), Conservation biology of freshwater turtles and tortoises: A compilation project of the IUCN/SSC Tortoise and Freshwater Turtle Specialist Group. Chelonian Res Monogr. 2017; 7: 1-292.
36. Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 2017; 45: e18.
37. Thompson JD, Gibson TJ, Higgins DG. Multiple Sequence Alignment Using ClustalW and ClustalX. Curr Protoc Bioinformatics. 2002; 2.3.1-2.3.22.
38. Grant JR, Stothard P. The CGView Server: a comparative genomics tool for circular genomes. Nucleic Acids Res. 2008; 36: W181-184.
39. Lowe TM, Chan PP. tRNAscan-SE On-line: Search and Contextual Analysis of Transfer RNA Genes. Nucleic Acids Res. 2016; 44: W54-57.


40. Laslett D, Canbäck B. ARWEN, a program to detect tRNA genes in metazoan mitochondrial nucleotide sequences. Bioinformatics. 2008; 24: 172-175.
41. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular evolutionary genetics analysis Version 6.0. Mol Biol Evol. 2013; 30: 2725-2729.
42. Perna NT, Kocher TD. Patterns of nucleotide composition at fourfold degenerate sites of mitochondrial genomes. J Mol Evol. 1995; 41: 353-359.
43. Abascal F, Zardoya R, Telford MJ. TranslatorX: multiple alignments of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 2010; 38: W7-13.
44. Vaidya G, Lohman DJ, Meier R. SequenceMatrix: concatenation software for the fast assembly of multigene datasets with character set and codon information. Cladistics. 2010; 27: 171-180.
45. Lanfear R, Frandsen PB, Wright AM, Senfeld T, Calcott B. PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses. Mol Biol Evol. 2016; 34: 772-773.
46. Miller MA, Schwartz T, Pickett BE, He S, Klem EB, Scheuermann RH, et al. A RESTful API for Access to Phylogenetic Tools via the CIPRES Science Gateway. Evol Bioinform. 2015; 11: 43-48.
47. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003; 19: 1572-1574.
48. Trifinopoulos J, Nguyen L-T, von Haeseler A, Minh BQ. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 2016; 44: W232-W235.


49. Letunic I, Bork P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics. 2007; 23: 127-128.
50. Parham JF, Feldman CR, Boore JL. The complete mitochondrial genome of the enigmatic bigheaded turtle (Platysternon): description of unusual genomic features and the reconciliation of phylogenetic hypotheses based on mitochondrial and nuclear DNA. BMC Evol Biol. 2006; 6: 1-11.
51. Peng QL, Nie LW, Pu YG. Complete mitochondrial genome of Chinese big-headed turtle, Platysternon megacephalum, with a novel gene organization in vertebrate mtDNA. Gene. 2006; 380: 14-20.
52. Zheng C, Nie L, Wang J, Zhou H, Hou H, Wang H, et al. Recombination and Evolution of Duplicate Control Regions in the Mitochondrial Genome of the Asian Big-Headed Turtle, Platysternon megacephalum. PLoS ONE. 2013; 8: e82854.
53. Luo H, Li H, Huang A, Ni Q, Yao Y, Xu H, et al. The Complete Mitochondrial Genome of Platysternon megacephalum peguense and Molecular Phylogenetic Analysis. Genes (Basel). 2019; 10: E487.
54. Anderson S, Bruijn M, Coulson A, Eperon I, Sanger F, Young I. Complete sequence of bovine mitochondrial DNA. Conserved features of the mammalian mitochondrial genome. J Mol Biol. 1982; 156: 683-717.
55. Varani G, McClain WH. The G-U wobble base pair: A fundamental building block of RNA structure crucial to RNA function in diverse biological systems. EMBO Reports. 2000; 1: 18-23.
56. Ruokonen M, Kvist L. Structure and evolution of the avian mitochondrial control region. Mol Phylogenet Evol. 2002; 23: 422-432.


57. Wang L, Zhou X, Nie L. Organization and variation of mitochondrial DNA control region in pleurodiran turtles. Zoologia. 2011; 28: 495-504.
58. Le M, Raxworthy CJ, McCord WP, Mertz L. A molecular phylogeny of tortoises (Testudines: Testudinidae) based on mitochondrial and nuclear genes. Mol Phylogenet Evol. 2006; 40: 517-531.
59. Guillon JM, Guéry L, Hulin V, Girondot M. A large phylogeny of turtles (Testudines) using molecular data. Contrib Zool. 2012; 81: 147-158.
60. Spinks PQ, Shaffer HB, Iverson JB, McCord WP. Phylogenetic hypotheses for the turtle family Geoemydidae. Mol Phylogenet Evol. 2004; 32: 164-182.
61. Le M, McCord WP, Iverson JB. On the paraphyly of the genus Kachuga (Testudines: Geoemydidae). Mol Phylogenet Evol. 2007; 45: 398-404.
62. Kundu S, Kumar V, Tyagi K, Chakraborty R, Chandra K. The first complete mitochondrial genome of the Indian Tent Turtle, Pangshura tentoria (Testudines: Geoemydidae): Characterization and comparative analysis. Ecol Evol. 2019; 1-15.
63. Rieppel O, deBraga M. Turtles as diapsid reptiles. Nature. 1996; 384: 453-455.
64. Rieppel O. Turtles as diapsid reptiles. Zool Scr. 2000; 29: 199-212.
65. Hedges SB. Amniote phylogeny and the position of turtles. BMC Biol. 2012; 10: 64.
66. Mindell DP, Sorenson MD, Dimcheff DE, Hasegawa M, Ast JC, Yuri T. Interordinal Relationships of Birds and Other Reptiles Based on Whole Mitochondrial Genomes. Syst Biol. 1999; 48: 138-152.
67. Janke A, Arnason U. The complete mitochondrial genome of Alligator mississippiensis and the separation between recent archosauria (birds and crocodiles). Mol Biol Evol. 1997; 14: 1266-1272.


68. Tzika AC, Helaers R, Schramm G, Milinkovitch MC. Reptilian-transcriptome v1.0, a glimpse in the brain transcriptome of five divergent Sauropsida lineages and the phylogenetic position of turtles. EvoDevo. 2011; 2: 1-18.
69. Shen XX, Liang D, Wen JZ, Zhang P. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 2011; 28: 3237-3252.
70. Chiari Y, Cahais V, Galtier N, Delsuc F. Phylogenomic analyses support the position of turtles as sister group of birds and crocodiles. BMC Biol. 2012; 10: 65.
71. Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. More than 1000 ultraconserved elements provide evidence that turtles are the sister group of archosaurs. Biol Lett. 2012; 8: 783-786.
72. Cao D, Wang M, Ge Y, Gong S. Draft genome of the big-headed turtle Platysternon megacephalum. Sci Data. 2019; 6: 60.
73. van Dijk PP. The status of turtles in Asia. In van Dijk PP, Stuart BL, Rhodin AGJ (Eds.), Asian Turtle Trade: Proceedings of a Workshop on Conservation and Trade of Freshwater Turtles and Tortoises in Asia. Lunenburg, MA: Chelonian Research Foundation. 2000; 15-23.
74. Kundu S, Kumar V, Laskar BA, Tyagi K, Chandra K. Pet and turtle: DNA barcoding identified twelve Geoemydid species in northeast India. Mitochondrial DNA B. 2018; 3: 513-518.
75. Rhodin AGJ, Stanford CB, van Dijk PP, Eisemberg C, Luiselli L, Mittermeier RA, et al. Global Conservation Status of Turtles and Tortoises (Order Testudines). Chelonian Conserv Biol. 2018; 17: 135-161.


Tables and Figures Captions:

Table 1. List of annotated mitochondrial genes of Pangshura sylhetensis.

Fig 1. The spatial range distribution (marked by gray shadow) and collection locality (27.50 N 96.24 E, marked by red square) of P. sylhetensis, together with the species photograph and the mitochondrial genome of P. sylhetensis. Protein-coding genes are marked by blue, deep green, and yellowish-green boxes, rRNA genes by violet boxes, tRNA genes by red boxes, and the control region by a gray box.

Fig 2. Frequency of start and stop codons for 13 protein-coding genes in the 52 Testudines mitogenomes.

Fig 3. Bayesian (BA) phylogenetic tree based on the concatenated nucleotide sequences of 13 PCGs of 52 Testudines species, showing the evolutionary relationship of P. sylhetensis. Color boxes indicate the family-level clustering of the studied species. The BA tree is drawn to scale, with posterior probability support values indicated along the branches. Representative organism line diagrams were obtained from online sources.

Fig 4. Bayesian (BA) phylogenetic tree based on the concatenated 13 PCGs of the mitochondrial genomes of Testudines and other amniotes, indicating the diapsid affinities of Testudines and their close relationship with Archosaurians. Color boxes indicate the family-level clustering of the studied Testudines species, and the two sub-orders are marked by black bars. The BA tree is drawn to scale, with posterior probability support values indicated along the branches. Representative organism line diagrams were obtained from online sources.


Fig 5. Gene-order topology from the TreeREx analysis, showing the gene arrangement scenario within 52 Testudines species. All internal nodes are colored according to the phylogenetic clustering. The four gene arrangement scenarios are shown beside the tree.

Supporting information

S1 Fig. A pictorial overview of the methodologies used for sequencing and analysis of the P. sylhetensis mitogenome, and bioanalyzer profiles after sonication of the enriched mitochondrial DNA sample and libraries.

S2 Fig. Putative secondary structures of the 22 tRNA genes in the mitochondrial genome of P. sylhetensis. The first structure shows the nucleotide positions and details of the stem-loops of the tRNAs. The tRNAs are represented by full names and IUPAC-IUB single-letter amino acid codes. Different base pairings are marked by red, blue, and green bars.

S3 Fig. Comparison of control region (CR) stem-loop structures of the origin of L-strand replication of 24 Testudines mitochondrial genomes.

S4 Fig. Comparison of control region (CR) stem-loop structures of the origin of L-strand replication of 26 Testudines mitochondrial genomes.

S5 Fig. Maximum Likelihood (ML) phylogenetic tree based on the concatenated nucleotide sequences of 13 PCGs of 52 Testudines species. Color boxes indicate the family-level clustering of the studied species. The ML tree was drawn with IQ-Tree, with bootstrap support values indicated at each node.

S6 Fig. Maximum Likelihood (ML) phylogenetic tree based on the concatenated 13 PCGs of the mitochondrial genomes of Testudines and other amniotes. Color boxes indicate the family-level clustering of the studied Testudines species. The ML tree was drawn with IQ-Tree, with bootstrap support values indicated at each node.

S1 Table. List of mitogenome sequences of Testudines and other amniote species acquired from the NCBI database.

S2 Table. Models estimated by partitioning the 13 PCGs separately through PartitionFinder 2 for phylogenetic analysis of the two datasets (52 mitogenomes of Testudines and 63 mitogenomes of Testudines + other amniotes).

S3 Table. Gene arrangements of the studied Testudines species used in the TreeREx analysis.

S4 Table. Nucleotide composition of the mitochondrial genomes of 52 Testudines species. The composition biases of the whole mitogenome, protein-coding genes, tRNAs, rRNAs, and control regions were calculated as AT-skew = (A-T)/(A+T) and GC-skew = (G-C)/(G+C).

S5 Table. Frequency of start and stop codon distribution within the complete mitogenomes of 52 Testudines.
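The skew formulas given in the S4 Table caption can be sketched in a few lines of Python. This is an illustration only: the function name and the short example sequence are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def base_skews(seq: str):
    """Return (AT-skew, GC-skew) = ((A-T)/(A+T), (G-C)/(G+C)) for one strand."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[b] for b in "ATGC")
    if a + t == 0 or g + c == 0:
        raise ValueError("sequence needs at least one A/T base and one G/C base")
    return (a - t) / (a + t), (g - c) / (g + c)


# Hypothetical sequence with A=4, T=2, G=1, C=3.
at_skew, gc_skew = base_skews("AAAATTGCCC")
print(at_skew, gc_skew)
```

A positive AT-skew means more A than T on the strand analysed, and a negative GC-skew means more C than G; both values lie between -1 and 1.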


Table 1. List of annotated mitochondrial genes of Pangshura sylhetensis.

Gene    Direction  Location      Size (bp)  Anticodon  Start codon  Stop codon  Intergenic nucleotides
trnF    +          1-69          69         GAA        .            .           0
rrnS    +          70-1031       962        .          .            .           0
trnV    +          1032-1100     69         TAC        .            .           11
rrnL    +          1112-2697     1586       .          .            .           1
trnL2   +          2699-2774     76         TAA        .            .           0
nad1    +          2775-3743     969        .          ATG          TAG         -1
trnI    +          3743-3813     71         GAT        .            .           -1
trnQ    -          3813-3883     71         TTG        .            .           -1
trnM    +          3883-3951     69         CAT        .            .           0
nad2    +          3952-4992     1041       .          ATG          TAG         -2
trnW    +          4991-5064     74         TCA        .            .           -1
trnA    -          5064-5132     69         TGC        .            .           1
trnN    -          5134-5207     74         GTT        .            .           27
trnC    -          5235-5300     66         GCA        .            .           0
trnY    -          5301-5371     71         GTA        .            .           1
cox1    +          5373-6923     1551       .          GTG          AGG         -12
trnS2   -          6912-6982     71         TGA        .            .           0
trnD    +          6983-7052     70         GTC        .            .           0
cox2    +          7053-7739     687        .          ATG          TAG         1
trnK    +          7741-7813     73         TTT        .            .           1
atp8    +          7815-7982     168        .          ATG          TAA         -10
atp6    +          7973-8656     684        .          ATG          TAA         -1
cox3    +          8656-9440     785        .          ATG          TA(A)       -1
trnG    +          9440-9507     68         TCC        .            .           1
nad3    +          9509-9857     349        .          ATG          GAA         1
trnR    +          9859-9927     69         TCG        .            .           0
nad4l   +          9928-10224    297        .          ATG          TAA         -7
nad4    +          10218-11594   1377       .          ATG          TAA         14
trnH    +          11609-11677   69         GTG        .            .           0
trnS1   +          11678-11744   67         GCT        .            .           -1
trnL1   +          11744-11816   73         TAG        .            .           -21
nad5    +          11796-13628   1833       .          GTG          TAG         -8
nad6    -          13621-14145   525        .          ATG          AGG         -3
trnE    -          14143-14210   68         TTC        .            .           4
cytb    +          14215-15361   1147       .          ATG          T(AA)       -3
trnT    +          15359-15430   72         TGT        .            .           0
trnP    -          15431-15501   71         TGG        .            .           66
D-loop  .          15568-16389   822        .          .            .           .
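The size and intergenic-nucleotide columns of Table 1 follow directly from the gene coordinates. As a minimal sketch (the helper names are illustrative, and only five consecutive rows of the table are used), size is taken as end minus start plus 1, and the intergenic value as the start of the next gene minus the end of the current gene minus 1, so that negative values denote overlaps.

```python
# Five consecutive rows of Table 1: (gene, start, end), 1-based and inclusive.
rows = [
    ("trnV", 1032, 1100),
    ("rrnL", 1112, 2697),
    ("trnL2", 2699, 2774),
    ("nad1", 2775, 3743),
    ("trnI", 3743, 3813),
]


def gene_size(start: int, end: int) -> int:
    """Length in bp of a feature with inclusive coordinates."""
    return end - start + 1


def intergenic(features):
    """Spacer (positive) or overlap (negative) between each gene and the next."""
    return [
        (gene, next_start - end - 1)
        for (gene, _, end), (_, next_start, _) in zip(features, features[1:])
    ]


sizes = {gene: gene_size(start, end) for gene, start, end in rows}
gaps = dict(intergenic(rows))
print(sizes)
print(gaps)
```

For these rows the sketch reproduces the tabulated sizes (69, 1586, 76, 969, and 71 bp) and intergenic values (11, 1, 0, and -1), the last being the one-nucleotide overlap between nad1 and trnI.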

