
Journal of Genetics (2019) 98:88 © Indian Academy of Sciences https://doi.org/10.1007/s12041-019-1134-x

RESEARCH ARTICLE

Comparative analysis of camelid mitochondrial genomes

MANEE M. MANEE1,2∗ , MANAL A. ALSHEHRI1, SARAH A. BINGHADIR1, SHAHAD H. ALDHAFER3, RIYOF M. ALSWAILEM3, ABDULMALEK T. ALGARNI1, BADR M. AL-SHOMRANI1 and MOHAMED B. AL-FAGEEH1

1 National Centre for Biotechnology, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia
2 Centre of Excellence for Genomics, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia
3 College of Computer and Information Sciences, Imam Muhammad Ibn Saud Islamic University, Riyadh 11432, Saudi Arabia
*For correspondence. E-mail: [email protected].

Received 9 April 2019; revised 26 June 2019; accepted 9 July 2019

Abstract. Camelus dromedarius has played a pivotal role in both the culture and the way of life of the Arabian Peninsula, particularly in arid regions where other animals cannot easily be domesticated. Although the mitochondrial genomes of several camelid species have recently been sequenced, wider phylogenetic studies are yet to be performed. Conserved gene content, a rapid evolutionary rate and rare recombination make the mitochondrial genome a useful molecular marker for phylogenetic studies of closely related species. Here we carried out a comparative analysis of previously sequenced camelid mitochondrial genomes with an emphasis on C. dromedarius, and report several notable findings. First, the arrangement of mitochondrial genes in C. dromedarius is similar to that of the other camelids. Second, multiple sequence alignment of intergenic regions shows up to 90% similarity across camelid species, reaching 99% among Arabian camels. Third, we identified the three domains of the control region (termination-associated sequence, central conserved domain and conserved sequence blocks). Phylogenetic tree analysis showed that the C. dromedarius mitogenomes clustered in the same clade as the Lama pacos mitogenome. These findings will enhance our understanding of the nucleotide composition and molecular evolution of the mitogenomes of the genus Camelus, and provide more data for comparative mitogenomics in the family Camelidae.

Keywords. mitochondrial DNA; genome annotation; phylogenetic analysis; control region; intergenic regions; Camelus dromedarius.

Introduction

Camelus dromedarius, often referred to as the Arabian camel, is a member of the camelid family and by far the most important domesticated herbivore in the desert of the Arabian Peninsula. The family Camelidae is divided into two tribes, namely Camelini (dromedary, Bactrian and wild Bactrian camels) and Lamini (llama, alpaca, guanaco and vicuña), which mainly inhabit southwest Asia, northern Africa and South America (Wang et al. 2012; Wu et al. 2014). C. dromedarius is an even-toed ungulate in the Camelini tribe. It is characterized by thick eyelashes, a long curved neck and a single hump, as opposed to the two humps of C. bactrianus and C. ferus. These unique physical features, among others, give Arabian camels the ability to conserve water and regulate body temperature, so that they thrive in extremely harsh environments such as the Arabian Peninsula (Al-Swailem et al. 2010). For instance, the camel's hump stores up to 35 kg of fat, which can be broken down into water and energy whenever needed. C. dromedarius also adapts to arid conditions by promoting the expression of various genes, such as small heat shock genes (Manee et al. 2017). Heat shock proteins play a key role in protecting cells against environmental stress. In addition, recent genomic studies have revealed complex characteristics related to adaptation to harsh desert environments, including heat stress responses, fat and water metabolism, dehydration and choking dust (Wu et al. 2014). Genome data, particularly mitochondrial DNA (mtDNA), can be used to investigate evolutionary relationships among species and to infer their evolutionary and demographic history (Robinson et al. 2010; Gonzalez-Freire et al. 2015).

The mitochondrion is a vital organelle responsible for oxidative phosphorylation to produce adenosine triphosphate (ATP). It also has various other roles in metabolism, cell signalling, cell cycle regulation, initiation of cell differentiation, cell proliferation, ageing and apoptosis (Mandal et al. 2011). Mitochondria possess their own DNA (the mitogenome). The mitogenome is considered a powerful tool for phylogenetic studies because of its small size, rare recombination and limited DNA repair (Chen 2013; Zinovkina 2018). In humans, the mitogenome is a circular double-stranded DNA molecule of about 16 kb that encodes 37 genes (Boore 1999). It also has a high copy number compared with the nuclear genome. Its maternal inheritance pattern precludes recombination, so the mitogenome sequence is usually stable through generations (Gupta et al. 2015).

This study is the first report on mitogenome annotation and characterization of C. dromedarius. In particular, we annotated and characterized the mitochondrial genome of C. dromedarius from the United Arab Emirates and described its gene structure, arrangement and base composition, as well as its noncoding regions. Further, the C. dromedarius mitogenome was compared with those of other camels to identify similarities and differences between these animals. We also constructed phylogenetic trees to analyse evolutionary relationships within the family Camelidae.

Author contributions: MMM and MBA conceived and designed the experiments; MMM, MAA, SAB, SHA, RMA and ATA carried out the experiments; MMM, MAA, SAB, ATA and BMA analysed the data; MMM, MAA and SAB wrote the manuscript. All authors reviewed the manuscript.

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s12041-019-1134-x) contains supplementary material, which is available to authorized users.

Table 1. Thirty-one mitochondrial genomes from Camelidae family members used in this study.

No.  Scientific name   Common name          Location       Accession no.  Length (bp)
1    C. dromedarius    Arabian camel        Middle East    EU159113.1     16643
2    C. dromedarius    Arabian camel        Middle East    KU605075.1     16379
3    C. dromedarius    Arabian camel        Middle East    KU605079.1     16379
4    C. dromedarius    Arabian camel        Middle East    KU605080.1     16379
5    C. dromedarius    Arabian camel        Middle East    KU605076.1     16379
6    C. dromedarius    Arabian camel        Middle East    KU605072.1     16379
7    C. dromedarius    Arabian camel        Middle East    KU605073.1     16379
8    C. dromedarius    Arabian camel        Middle East    KU605074.1     16379
9    C. dromedarius    Arabian camel        Middle East    KU605077.1     16379
10   C. dromedarius    Arabian camel        Middle East    KU605078.1     16379
11   C. bactrianus     Bactrian camel       Central Asia   KU666462.1     16379
12   C. bactrianus     Bactrian camel       Central Asia   KU666463.1     16398
13   C. bactrianus     Bactrian camel       Central Asia   KU666464.1     16398
14   C. bactrianus     Bactrian camel       Central Asia   KU666465.1     16398
15   C. bactrianus     Bactrian camel       Central Asia   NC_009628.2    16659
16   C. bactrianus     Bactrian camel       Central Asia   KU666460.1     16398
17   C. bactrianus     Bactrian camel       Central Asia   KU666461.1     16398
18   C. ferus          Wild Bactrian camel  Central Asia   KU666452.1     16379
19   C. ferus          Wild Bactrian camel  Central Asia   KU666453.1     16379
20   C. ferus          Wild Bactrian camel  Central Asia   KU666454.1     16379
21   C. ferus          Wild Bactrian camel  Central Asia   KU666451.1     16379
22   C. ferus          Wild Bactrian camel  Central Asia   KU666455.1     16379
23   C. ferus          Wild Bactrian camel  Central Asia   KU666457.1     16379
24   C. ferus          Wild Bactrian camel  Central Asia   KU666456.1     16379
25   C. ferus          Wild Bactrian camel  Central Asia   KU666458.1     16379
26   C. ferus          Wild Bactrian camel  Central Asia   KU666459.1     16379
27   C. ferus          Wild Bactrian camel  Central Asia   NC_009629.2    16379
28   L. glama          Llama                –              AP003426.1     16597
29   L. guanicoe       Guanaco              South America  NC_011822.1    16649
30   L. pacos          Alpaca               South America  Y19184.1       16652
31   V. vicugna        Vicuña               South America  NC_013558.1    16084

Materials and methods

Annotation and analysis of the mitochondrial genome

The C. dromedarius reference mitogenome was included as a sample because of the geographical location of its collection site (United Arab Emirates, Arabian Peninsula). It was the first C. dromedarius mitogenome deposited in GenBank, under accession number EU159113.1. We annotated this mitogenome and identified protein-coding genes (PCGs), transfer RNA (tRNA) genes and ribosomal RNA (rRNA) genes. For this we used the free web server MITOS (http://mitos.bioinf.uni-leipzig.de/index.py) with default settings and the genetic code set to '05 – Invertebrate/Mito' (Bernt et al. 2013). All tRNA genes were confirmed using tRNAscan-SE (v2.0) with default parameters and the sequence source set to vertebrate mitochondrial (Lowe and Chan 2016). The positions of the rRNA genes and of the control region were confirmed using the boundaries of the tRNA genes. All detected C. dromedarius genes, including PCGs, and the control region were then refined by multiple sequence alignment against the mitochondrial DNA and amino acid sequences of the 30 other Camelidae mitogenomes (table 1). These alignments used the MAFFT web server (v7.130) (https://mafft.cbrc.jp/alignment/server/) (Katoh and Standley 2013). Intergenic spacers and overlapping regions between genes were also identified using an in-house script. Classification of the domains and elements of the control region was based on previously published sequence data from several mammals (Gemmell et al. 1996). Short tandem repeats were identified using PERF (v0.2.5) (Avvaru et al. 2017).

Figure 1. The organization of the C. dromedarius mitogenome (16643 bp). The circular genome map depicts protein-coding genes (green), ribosomal RNAs (red), tRNAs (pink) and the repeat region (brown). The origin and direction of L-strand (OL) replication are indicated by bent blue arrows.

Comparative analysis of the mitochondrial genomes

We compared the mitochondrial genomes of 31 individuals from seven species, including C. dromedarius; all accession numbers are given in table 1. Base composition, AT-skew and GC-skew were determined for these mitogenomes. We calculated base composition using Geneious software (Kearse et al. 2012). AT-skew and GC-skew were calculated according to the formulas AT-skew = (A% − T%)/(A% + T%) and GC-skew = (G% − C%)/(G% + C%). The relative synonymous codon usage (RSCU) of all PCGs for C. dromedarius (accession no. EU159113.1) and C. bactrianus (accession no. KU666462.1) was calculated using MEGA software (v7) (Kumar et al. 2016). The secondary structures of tRNA genes of 14 selected camelid mitogenomes were also determined using the MITOS web server (Bernt et al. 2013).

Phylogenetic analysis

We generated two phylogenetic trees for the 31 camelid mitogenomes (table 1), one based on whole mitogenomes and the other based on the extracted intergenic regions. Multiple sequence alignments were conducted using ClustalW integrated in MEGA (v7) (Kumar et al. 2016).
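The RSCU values were computed in MEGA. To reproduce such values outside MEGA, the conventional definition is RSCU = (count of a codon × number of synonymous codons for its amino acid) / (total count of all codons for that amino acid). A value of 1 therefore means no usage bias. The Python sketch below illustrates that definition; it is not MEGA's implementation. It assumes the vertebrate mitochondrial code (NCBI translation table 2), in which AGA and AGG are stops, TGA encodes Trp and ATA encodes Met. The code table behind the published values may differ, since MITOS was run with code 05. Codons are written in the DNA alphabet (TAT rather than UAU), stop codons are excluded as in figure 3, and the function names are illustrative.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial), codons ordered TTT, TTC, TTA, TTG, TCT, ...
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)
}


def codon_counts(cds_list: Iterable[str]) -> Counter:
    """Count sense codons in frame 1 of each CDS; stops, partial and non-ACGT codons are skipped."""
    counts: Counter = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if CODE.get(codon, "*") != "*":
                counts[codon] += 1
    return counts


def rscu(cds_list: Iterable[str]) -> Dict[str, float]:
    """RSCU = count(codon) * n_synonymous / total count of that amino acid's codons."""
    counts = codon_counts(cds_list)
    families: Dict[str, List[str]] = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values: Dict[str, float] = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input: RSCU is undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    toy = ["ATGTTATTATTGCTT"]  # hypothetical CDS fragment, not a real camel sequence
    v = rscu(toy)
    print(v["TTA"], v["TTG"], v["CTC"], v["ATG"])  # 3.0 1.5 0.0 2.0
```

In the toy input, Leu (six codons in this code table) occurs four times, twice as TTA. TTA therefore scores 2 × 6 / 4 = 3.0, meaning it is used three times more often than uniform usage would predict.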

Table 2. Annotation of the C. dromedarius mitochondrial genome.

Gene/region     Position (bp)  Strand  Length  GC%     Intergenic/overlap  Start/stop
trnF(ttc)       1–67           +       67      47.76   0
rrnS            68–1034        +       967     43.32   −1
trnV(gta)       1034–1099      +       66      28.78   0
rrnL            1100–2663      +       1564    38.37   0
trnL2(tta)      2664–2738      +       75      40.00   3
nad1            2742–3698      +       957     46.37   −1                  ATG/TAA
trnI(gat)       3698–3766      +       69      28.98   −3
trnQ(ttg)       3836–3764      –       73      43.83   1
trnM(cat)       3838–3906      +       69      43.47   0
nad2            3907–4950      +       1044    37.47   −2                  ATA/TAG
trnW(tca)       4949–5016      +       68      39.70   5
trnA(tgc)       5090–5022      –       69      37.68   1
trnN(gtt)       5164–5092      –       73      43.83   2
OL              5167–5198      +       32              −1
trnC(gca)       5264–5198      –       67      43.28   0
trnY(gta)       5331–5265      –       67      47.76   1
cox1            5333–6877      +       1545    43.85   1                   ATG/TAG
trnS2(tga)      6947–6879      –       69      39.13   6
trnD(gtc)       6954–7020      +       67      37.31   0
cox2            7021–7704      +       684     39.20   3                   ATG/TAA
trnK(ttt)       7708–7774      +       67      44.77   1
atp8            7776–7979      +       204     37.43   −43                 ATG/TAA
atp6            7937–8617      +       681     39.55   −1                  GTG/TAA
cox3            8617–9400      +       784     47.63   0                   ATG/TAA
trnG(tcc)       9401–9470      +       70      37.14   9
nad3            9480–9827      +       348     44.63   −10                 ATA/TAG
trnR(tcg)       9818–9885      +       68      26.47   0
nad4l           9886–10182     +       297     43.19   −7                  ATG/TAA
nad4            10176–11553    +       1378    43.39   0                   ATG/TAA
trnH(gtg)       11554–11622    +       69      27.53   0
trnS1(gct)      11623–11681    +       59      38.98   1
trnL1(tag)      11683–11752    +       70      37.14   −9
nad5            11744–13570    +       1827    41.41   −17                 ATA/TAA
nad6            14078–13554    –       525     40.26   4                   ATG/TAA
trnE(ttc)       14151–14083    –       69      36.23   4
cob             14156–15295    +       1140    42.41   0                   ATG/AGA
trnT(tgt)       15296–15364    +       69      44.92   −1
trnP(tgg)       15429–15364    –       66      54.54   0
OH              15605–16063    +       459     580
Control region  15430–16643    +       1214    46.7    0

Positions of L-strand (–) features are listed from the higher to the lower coordinate. The intergenic/overlap value is the number of spacer nucleotides (positive) or overlapping nucleotides (negative) between a feature and the next feature along the genome.
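The in-house script used to identify intergenic spacers and overlaps is not published. The following is a hypothetical Python sketch, not the authors' code, showing that the intergenic/overlap column of table 2 follows directly from the gene coordinates. Features are ordered by start position, and for each consecutive pair the gap is (next start − current end − 1); a negative gap is an overlap. Table 2 lists L-strand coordinates high-to-low, so each span is normalized with min/max first. The function name spacers_and_overlaps is illustrative.

```python
from typing import List, Tuple

# (feature name, first coordinate, second coordinate) exactly as printed in table 2
Region = Tuple[str, int, int]


def spacers_and_overlaps(regions: List[Region]) -> List[Tuple[str, str, int]]:
    """Return (feature, next_feature, gap) for consecutive features.

    gap > 0: intergenic spacer of that many bp
    gap < 0: overlap of abs(gap) bp
    gap == 0: features are directly adjacent
    """
    # L-strand features are written high-to-low, so normalise every span to (start, end)
    spans = sorted(((name, min(a, b), max(a, b)) for name, a, b in regions),
                   key=lambda span: span[1])
    result = []
    for (name, _, end), (next_name, next_start, _) in zip(spans, spans[1:]):
        result.append((name, next_name, next_start - end - 1))
    return result


if __name__ == "__main__":
    # A contiguous stretch of table 2 (C. dromedarius, EU159113.1)
    stretch = [
        ("trnL1", 11683, 11752),
        ("nad5", 11744, 13570),
        ("nad6", 14078, 13554),   # L-strand
        ("trnE", 14151, 14083),   # L-strand
        ("cob", 14156, 15295),
        ("trnT", 15296, 15364),
        ("trnP", 15429, 15364),   # L-strand
    ]
    for gene, nxt, gap in spacers_and_overlaps(stretch):
        print(f"{gene:>5} -> {nxt:<5} {gap:+d}")
```

For this stretch the sketch returns −9, −17, 4, 4, 0 and −1, matching the table. Applied to atp8 (7776–7979) and atp6 (7937–8617), it gives the 43-bp overlap.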

Figure 2. Graphical illustration of the AT-skew and GC-skew of the protein-coding genes in the C. dromedarius mitogenome.
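The skews shown in figure 2 use the formulas given in the methods: AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C). Below is a minimal Python sketch of these formulas; the function names are illustrative and not from any package. It is applied to the rounded whole-genome percentages reported for C. dromedarius (30.8% A, 27.1% T, 26.6% C, 15.5% G). The sketch gives a GC-skew of −0.264 and an AT-skew of about 0.064. The small difference from the reported 0.065 presumably reflects rounding of the input percentages.

```python
from typing import Dict


def base_composition(seq: str) -> Dict[str, float]:
    """Percentage of A, C, G and T in a DNA sequence; other symbols (N, gaps) are ignored."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * n / total for base, n in counts.items()}


def at_skew(a: float, t: float) -> float:
    """AT-skew = (A - T) / (A + T); positive values mean A is more frequent than T."""
    return (a - t) / (a + t)


def gc_skew(g: float, c: float) -> float:
    """GC-skew = (G - C) / (G + C); negative values mean C is more frequent than G."""
    return (g - c) / (g + c)


def skews(seq: str) -> Dict[str, float]:
    """AT- and GC-skew of one sequence, e.g. a single protein-coding gene."""
    comp = base_composition(seq)
    return {"AT": at_skew(comp["A"], comp["T"]), "GC": gc_skew(comp["G"], comp["C"])}


if __name__ == "__main__":
    # Rounded whole-mitogenome percentages reported for C. dromedarius (EU159113.1)
    print(round(at_skew(30.8, 27.1), 3))  # 0.064
    print(round(gc_skew(15.5, 26.6), 3))  # -0.264
    # Hypothetical toy sequence with equal A and T and an excess of C over G
    print(skews("ACCCTAGT"))  # {'AT': 0.0, 'GC': -0.5}
```

The toy sequence mimics the pattern reported for atp6 (AT-skew of 0 with a negative GC-skew). For real genes, the per-gene sequences would be passed to skews().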

The aligned sequences were subjected to phylogenetic tree inference using the maximum likelihood (ML) method with the Tamura–Nei model implemented in MEGA (v7). The tree with the highest log likelihood was reported. The initial tree was obtained automatically by applying the neighbour-joining (NJ) and BioNJ algorithms to a matrix of pairwise distances estimated using the maximum composite likelihood (MCL) model. All positions containing gaps or missing data were excluded from the analysis. Both sets of evolutionary relationships were also assessed using the NJ method with the MCL model in MEGA (v7). The resulting phylogenetic trees were drawn using TreeGraph (v2) (Stöver and Müller 2010).

Results

Mitochondrial genome organization

The mitochondrial DNA sequence of C. dromedarius (GenBank accession no. EU159113.1) is 16643 bp long. It contains 37 genes (13 PCGs, the small and large subunit rRNA genes and 22 tRNA genes) as well as a control region (figure 1). The gene arrangement, positions, lengths and intergenic spacers of the C. dromedarius mitogenome are shown in table 2. Most of the genes (28) are encoded on the heavy strand (H-strand). nad6 and eight tRNA genes (trnQ, trnA, trnN, trnC, trnY, trnS2, trnE and trnP) are encoded on the light strand (L-strand). Several genes overlap, sharing a total of 96 nucleotides, and the largest overlaps occur between PCGs. The longest overlap, 43 bp, was shared between atp8 and atp6, and the second longest, 17 bp, between nad5 and nad6. No introns were detected in the C. dromedarius mitogenome. The consensus sequence of the C. dromedarius mitogenome was aligned to the other previously sequenced camelid mitogenomes (table 1). Overall, these 31 mitogenomes showed 66.6% nucleotide identity. The lengths of the complete mitogenomes ranged from 16084 bp for Vicugna vicugna to 16659 bp for C. bactrianus (NC_009628.2), with the C. dromedarius reference at 16643 bp (table 1).

Genome composition

In all the investigated camelid species, the base composition of the mitochondrial genome was biased towards A and T (table 1 in electronic supplementary material at http://www.ias.ac.in/jgenet/). The C. dromedarius mitogenome contains 30.8% A, 27.1% T, 26.6% C and 15.5% G. The overall AT-skew is 0.065 and the GC-skew is −0.264. In all camelid mitogenomes, A was more prevalent than T and C more prevalent than G. The GC-skew ranged from −0.286 to −0.256 and the AT-skew from 0.057 to 0.076 (table 1 in electronic supplementary material). The negative GC-skew values reflect an excess of cytosine over guanine in Camelidae members. We also computed the GC content of the control region and the intergenic regions, which was 46.7% and 51.86%, respectively. To further estimate the level of base bias, we calculated AT-skew and GC-skew for the PCGs of C. dromedarius mtDNA (figure 2). All GC-skew values were negative except for nad6, which has a positive GC-skew of 0.483. For example, atp6 had an AT-skew of 0, reflecting equal amounts of A and T, and a GC-skew of −0.363, reflecting a notably C-skewed composition.

Protein coding genes

The complete mtDNA of C. dromedarius encodes 13 PCGs with a total length of 11,407 bp, accounting for 68.5% of the entire mitogenome. The lengths of individual PCGs range from 204 bp (atp8) to 1827 bp (nad5). Most PCGs start with ATG; nad2, nad3 and nad5 start with ATA (table 2). atp6 uses GTG, which is also a known start codon in vertebrate mitogenomes (Satoh et al. 2006). With regard to stop codons, nine PCGs terminate with TAA, three with TAG, and only one (cob) ends with AGA. We compared the PCGs of the 31 mitogenomes in terms of their lengths and start/stop codons. Among the investigated species, nad4l, nad2 and cob were the most similar proteins, while nad4 and atp8 were the least similar. The lengths and protein positions were identical in Lama glama (AP003426.1), L. guanicoe (NC_011822.1) and V. vicugna (NC_013558.1). In all species, nad5 was the longest PCG, with lengths ranging from 1797 to 1830 bp.

In both the C. dromedarius and C. bactrianus mitogenomes, Leu, Ser, Pro and Thr were the most frequently encoded amino acids. The least common amino acid differed between them: Asp in C. dromedarius and Glu in C. bactrianus (figure 3). Excluding stop codons, the most frequently used codons in C. dromedarius were UAU (Tyr), followed by AAA (Lys), AUU (Ile) and AAU (Asn). These codons consist exclusively of A and U (T) bases. They contribute to the high A + T content seen in C. dromedarius and in all the other 30 mitogenomes in our study.

Figure 3. Relative synonymous codon usage of mitogenomes in C. dromedarius and C. bactrianus (KU666462.1). The stop codon is not given.

Transfer and rRNA genes

As in many metazoan mitogenomes, the C. dromedarius mitogenome contains 22 tRNA genes. They vary in size from 59 bp (trnS1) to 75 bp (trnL2), and their sizes did not differ distinctly between camelid species. With the exception of trnS1, all tRNA genes could be folded into typical cloverleaf secondary structures. As in many metazoan mitogenomes, the secondary structure of trnS1 lacks a discernible dihydrouridine (DHU) stem (Shi et al. 2016).

Of the 31 camelid mitogenomes investigated, we selected 14, including C. dromedarius, for the comparison of tRNA secondary structures. We studied variation in the base-pair composition of tRNA genes in all selected mitogenomes relative to C. dromedarius (figure 4). In all examined genes, the amino acid acceptor (AA) stem and the anticodon (AC) loop were 7 bp long. The DHU arm was 2–4 bp long and the AC arm 4–5 bp long. Across tRNAs, the AC loop was the most conserved loop, with a single site substitution in trnL and in trnC. When considering whole genes, trnM was the only fully conserved tRNA, followed by trnL1 with one site mutation. The greatest number of mutations was observed in trnD (14 sites), followed by trnT (13 sites). All tested Arabian camels, which are depicted in black, have identical tRNA structures and base compositions (figure 4).

Figure 4. Comparison of the secondary structures of the tRNA genes of 14 Camelidae mitogenomes. The secondary structures are drawn from the tRNA genes of C. dromedarius investigated in this study. Differences at each position in the other 13 mitogenomes are marked next to the corresponding nucleotide. Each species is indicated by a unique colour, as shown in the bottom right corner of the figure.

As in many vertebrate mitogenomes, the rRNA genes are located at the beginning of the sequence, between trnF and trnL2, and are separated by the trnV gene. The small and large ribosomal RNA genes are 967 bp and 1564 bp long (table 2), respectively, with GC contents of 43.3% and 38.4%. Sequence similarity among the tested species was high, reaching 95.3% for 12S rRNA and 96.7% for 16S rRNA.

Intergenic and control regions

The C. dromedarius mitogenome contains a total of 116 bp of intergenic spacers. These are spread over 22 locations and distributed equally between the two strands. The spacers lie between genes encoded on the same strand or on different strands, and they range in length from 1 to 33 bp. The largest intergenic spacer (33 bp) is located between trnN and trnC (figure 1). Multiple sequence alignment of the intergenic regions revealed 90% sequence similarity across the 31 investigated camelid mitogenomes. Across all species, the locations of the intergenic regions were highly similar, shifting by at most three nucleotides. Within the 10 Arabian camels, similarity was higher (99%), with a single nucleotide substitution found in KU605078.1 (figure 1 in electronic supplementary material). Among the wild Bactrian camels, the intergenic regions were identical. As with the coding regions, the intergenic regions of C. dromedarius are AT-rich, with an average AT content of 65.5%.

As in many other mammals, the control region is the largest noncoding sequence in the C. dromedarius mitogenome. It is 1214 bp long and is located between the trnP and trnF genes. Comparative analysis of the 31 camelid mitogenomes showed that, unlike the other mitogenome components, the control region varied greatly, with a high frequency of insertions and deletions.

In vertebrate mitogenomes, the control region consists of several highly conserved domains: the conserved sequence blocks (CSB I–III), the extended termination-associated sequence (ETAS) and the central conserved domain (CD). A termination-associated sequence (TAS) has additionally been identified in a number of vertebrate mitogenomes, where its proposed primary function is to terminate DNA synthesis. We also identified this sequence in the camelid species, at the 5′ end of the control region, with over 96.5% similarity between the tested species. We also identified a conserved TGCAT motif that could form a stem–loop secondary structure and may contribute to TAS function. The conserved sequence blocks CSB I–III, at the 3′ end of the control region, are thought to be involved in the origin of H-strand replication (OH) and in other necessary functions of mitochondrial metabolism. This part of the control region also contains the promoters for transcription of both L-strand (LSP) and H-strand (HSP) genes. Characteristic motifs were used to detect the CSB domains: CSBI (GACATA), CSBII (AAACCCCCCTTACCCCCCA) and CSBIII (CTGCCAAACCCCAAAAACA). The three domains were 98.2% conserved across all tested camelids except V. vicugna, in which CSB II and CSB III were absent. The CD domain is a conserved region located downstream of the TAS (111–115 bp) that contains a series of consecutive domains. These domains are 27, 39, 24, 23, 18 and 42 bp long for CSB-F, CSB-E, CSB-D, CSB-C, CSB-B and CSB-A, respectively. Analysis of the CSB-D unit indicated that it may contain complementary subsequences (TTCTTACTTCAGGACCAT/GATGGTCCTGAAGTAA) able to form a hairpin loop. The CSB-C and CSB-D sequences were 100% identical across the 31 tested mitogenomes. Across the entire region, the llama showed the most variation (substitutions and insertions) (table 3).

Table 3. Sequences of the conserved regions in the control region (CR) of C. dromedarius.

Functional domain  Element  Nucleotide sequence
TAS                –        TGCATAATTTGTTTG
Central CD         A        TCCGCTATGGCCGTCTGAGGCCCCGTCGCAGTCAAATCAATT
Central CD         B        TCATGCATTTGGTATTTT
Central CD         C        TCTTAAATAAGACATCTCGATGG
Central CD         D        ATCTGGTTCTTACTTCAGGACCAT
Central CD         E        CCTCTTCTCGCTCCGGGCCCATCCATTGTGGGGGTTTCT
Central CD         F        CAGGCCGCGTGAAATCATCAACCCGCT
CSB                CSB1     GTCAATGGTCGCAGGACATAA
CSB                CSB2     AAACCCCCCTTACCCCCCA
CSB                CSB3     CTGCCAAACCCCAAAAACA

Nucleotides presented in bold show variations in some tested species.

Between CSB II and CSB I, variable number tandem repeats (VNTRs) were observed in the C. dromedarius mitogenome (figure 5). Two 6-bp consensus motifs, ACGTAC and ACACGC, are repeated 10 and five times in the control region, respectively. These repeats were absent from most of the investigated camels. All tested species in the llama outgroup shared highly similar repeats with C. dromedarius, C. ferus (NC_009629.2) and C. bactrianus (NC_009628.2).

Figure 5. A schematic drawing of the structural organization of the mitochondrial control region of C. dromedarius. The genes flanking the control region, tRNA-Phe and tRNA-Pro, are shown in red. Conserved elements in the control region are shown as grey boxes: TAS, termination-associated sequence; CD, central conserved domain; CSB, conserved sequence block. VNTRs denote regions containing tandem repeats.

Phylogenetic analysis

We constructed ML and NJ phylogenetic trees from a multiple sequence alignment of the Arabian camel mitogenome and the 30 additional camelid mitogenomes. The camelid mitogenomes clustered into three clades, shown in blue, green and orange (figure 6; figure 2 in electronic supplementary material). Our mitogenome of interest grouped together with all the dromedary camels. In addition, Lama pacos clustered with C. dromedarius in the same clade, while the other Lama species (L. guanicoe and L. glama) were closely related to C. bactrianus.

The phylogenetic relationships reconstructed from the nucleotide sequences of the intergenic regions confirmed that all members of the same genus cluster together, except L. pacos. In this tree, C. dromedarius clustered with V. vicugna, L. guanicoe and L. glama (figure 7; figure 3 in electronic supplementary material). Figures 6 and 7 show that V. vicugna, L. guanicoe and L. glama always cluster together. However, they cluster with C. bactrianus in the whole-genome tree and with C. dromedarius in the tree constructed from intergenic regions.

Figure 6. Phylogenetic tree generated using the maximum likelihood method based on the complete mitochondrial genomes of seven camelid species. The C. dromedarius mitogenome (EU159113) is shown in bold.

Figure 7. Maximum likelihood phylogenetic tree based on intergenic regions from the mitochondrial genomes of seven camelid species. The C. dromedarius mitogenome (EU159113) is shown in bold.

Discussion

Animal mitochondrial genomes are generally circular double-stranded DNA molecules. They usually encode the same set of 13 PCGs, two rRNA genes and 22 tRNA genes, together with a control region (Cui et al. 2007; Peng et al. 2007; Wada et al. 2007). Our results show that the gene organization and arrangement in the C. dromedarius mitogenome are the same as in other mammals. The comparison with other camelid mitogenomes showed that the slight differences in length are mainly due to differences in the control region. This finding is consistent with the faster evolutionary rate of the control region relative to protein-coding and rRNA genes, and with its prior use to examine genetic variation within mammals (Tang et al. 2006). We used the AT-skew, GC-skew and genomic nucleotide content to characterize the base composition of camelid mitochondrial genomes (Vanyushin and Kirnos 1977). The AT content was markedly higher than the GC content, consistent with the base composition of the mitochondrial genomes of other mammalian species (Hu and Gao 2016; Xiao et al. 2016). The GC-skew was negative for the whole mitochondrial genomes, and the protein-coding regions are mainly C-skewed.

All PCGs begin with an ATG start codon except nad2, nad3 and nad5, which start with ATA, and atp6, which starts with GTG. ATG and ATA are the most common start codons in mammals (Xiufeng and Árnason 1994). Within PCGs, the UAU (Tyr) codon had the highest usage and GCG (Ala) the lowest. These findings are in line with the observed bias against G and C across the whole C. dromedarius mitogenome. Structural analysis of the tRNA genes revealed that most have a cloverleaf structure; the exception is trnS1, in which the DHU arm simply forms a loop. Other mammalian mitogenomes encode OL in a noncoding region of about 30 bp flanked by tRNA genes (Taanman 1999). Likewise, we identified the OL of C. dromedarius in the intergenic region between trnN and trnC (figure 1). For species within the same genus, the intergenic regions were identical or very similar. We identified the conserved sequences (TAS, CSB I–III and CD) within the control region by their homology with those of other Camelidae species (Brown et al. 1986). We noticed that the CSB-C and CSB-D sequences were highly informative for discriminating among Camelidae members. These findings suggest that the nucleotide sequences of both the repeated motifs and the CSBs are good markers for species discrimination (Saccone et al. 1991).
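The authors detected short tandem repeats with PERF. The sketch below does not reproduce PERF's algorithm. It is a naive, hypothetical Python scan for perfect tandem repeats of a fixed motif length, included only to make the idea of a VNTR array concrete. Motifs that are themselves repeats of a shorter unit (for example AAAAAA) are skipped, so each array is reported by its primitive unit. The demo sequence is synthetic. It embeds the two 6-bp motifs reported for the C. dromedarius control region (ACGTAC × 10 and ACACGC × 5) between arbitrary flanks and is not the real control-region sequence.

```python
from typing import List, Tuple


def is_primitive(motif: str) -> bool:
    """True if motif is not a repeat of a shorter unit (ACAC is not primitive; ACGTAC is)."""
    return motif not in (motif + motif)[1:-1]


def find_tandem_repeats(seq: str, motif_len: int, min_copies: int) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, copy number) for perfect tandem repeats.

    Scans left to right; after reporting a repeat array it jumps past it,
    so each array is reported once, in the first phase encountered.
    """
    seq = seq.upper()
    hits: List[Tuple[int, str, int]] = []
    i = 0
    while i + motif_len <= len(seq):
        motif = seq[i:i + motif_len]
        copies, j = 1, i + motif_len
        # extend the array while the next window repeats the motif exactly
        while seq[j:j + motif_len] == motif:
            copies += 1
            j += motif_len
        if copies >= min_copies and is_primitive(motif):
            hits.append((i, motif, copies))
            i = j
        else:
            i += 1
    return hits


if __name__ == "__main__":
    # Synthetic sequence: hypothetical flanks around the two reported 6-bp motifs
    demo = "GG" + "ACGTAC" * 10 + "TT" + "ACACGC" * 5 + "GG"
    for start, motif, copies in find_tandem_repeats(demo, motif_len=6, min_copies=5):
        print(start, motif, copies)
```

Run as a script, this prints "2 ACGTAC 10" and "64 ACACGC 5". The naive scan costs O(n × k) per motif length, which is adequate for a 1.2-kb control region. Dedicated tools such as PERF are built for much larger sequences and for scanning a range of motif sizes at once.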

In summary, the sequence structure of the C. dromedarius mitogenome is typical and shares most genomic features with those of other camelid species. The camelid species were classified into three main clusters according to the phylogenetic clades. Phylogenetic analyses showed that the C. dromedarius mitogenomes clustered in the same clade as the L. pacos mitogenome. The mitochondrial genome of the Arabian camel is highly similar to those of the other six camelid species. Nevertheless, Camelidae members differ distinctly in the mtDNA evolutionary analysis and in the control region. Broader taxonomic sampling is necessary to gain further insights into the evolution of Camelidae.

Acknowledgements

The authors would like to thank Maha A. Alshuaibi at the College of Computer and Information Sciences, Imam Muhammad Ibn Saud Islamic University, for her valuable input in executing this project. We also thank Amer S. Alharthi at the National Centre for Robotics Technologies and Intelligent Systems, King Abdulaziz City for Science and Technology, for his technical support. This work was funded by the Life Science and Environment Research Institute and the Centre of Excellence for Genomics (Grant 20-0078), King Abdulaziz City for Science and Technology, Saudi Arabia.

References

Al-Swailem A. M., Shehata M. M., Abu-Duhier F. M., Al-Yamani E. J., Al-Busadah K. A., Al-Arawi M. S. et al. 2010 Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius. PLoS One 5, e10720.
Avvaru A. K., Sowpati D. T. and Mishra R. K. 2017 PERF: an exhaustive algorithm for ultra-fast and efficient identification of microsatellites from large DNA sequences. Bioinformatics 34, 943–948.
Bernt M., Donath A., Jühling F., Externbrink F., Florentz C., Fritzsch G. et al. 2013 MITOS: improved de novo metazoan mitochondrial genome annotation. Mol. Phylogenet. Evol. 69, 313–319.
Boore J. L. 1999 Animal mitochondrial genomes. Nucleic Acids Res. 27, 1767–1780.
Brown G. G., Gadaleta G., Pepe G., Saccone C. and Sbisà E. 1986 Structural conservation and variation in the D-loop-containing region of vertebrate mitochondrial DNA. J. Mol. Biol. 192, 503–511.
Chen X. J. 2013 Mechanism of homologous recombination and implications for aging-related deletions in mitochondrial DNA. Microbiol. Mol. Biol. Rev. 77, 476–496.
Cui P., Ji R., Ding F., Qi D., Gao H., Meng H. et al. 2007 A complete mitochondrial genome sequence of the wild two-humped camel (Camelus bactrianus ferus): an evolutionary history of Camelidae. BMC Genomics 3, 241.
Gemmell N. J., Western P. S., Watson J. M. and Graves J. 1996 Evolution of the mammalian mitochondrial control region: comparisons of control region sequences between monotreme and therian mammals. Mol. Biol. Evol. 13, 798–808.
Gonzalez-Freire M., De Cabo R., Bernier M., Sollott S. J., Fabbri E., Navas P. et al. 2015 Reconsidering the role of mitochondria in aging. J. Gerontol. A Biol. Sci. Med. Sci. 70, 1334–1342.
Gupta A., Bhardwaj A., Supriya, Sharma P., Pal Y., Mamta et al. 2015 Mitochondrial DNA: a tool for phylogenetic and biodiversity search in equines. J. Biodivers. Endanger. Species S1, 006.
Hu X.-D. and Gao L.-Z. 2016 The complete mitochondrial genome of domestic sheep, Ovis aries. Mitochondrial DNA A DNA Mapp. Seq. Anal. 27, 1425–1427.
Katoh K. and Standley D. M. 2013 MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780.
Kearse M., Moir R., Wilson A., Stones-Havas S., Cheung M., Sturrock S. et al. 2012 Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649.
Kumar S., Stecher G. and Tamura K. 2016 MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870–1874.
Lowe T. M. and Chan P. P. 2016 tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 44, W54–W57.
Mandal S., Lindgren A. G., Srivastava A. S., Clark A. T. and Banerjee U. 2011 Mitochondrial function controls proliferation and early differentiation potential of embryonic stem cells. Stem Cells 29, 486–495.
Manee M. M., Alharbi S. N., Algarni A. T., Alghamdi W. M., Altammami M. A., Alkhrayef M. N. et al. 2017 Molecular cloning, bioinformatics analysis, and expression of small heat shock protein beta-1 from Camelus dromedarius, Arabian camel. PLoS One 12, e0189905.
Peng R., Zeng B., Meng X., Yue B., Zhang Z. and Zou F. 2007 The complete mitochondrial genome and phylogenetic analysis of the giant panda (Ailuropoda melanoleuca). Gene 397, 76–83.
Robinson K., Creed J., Reguly B., Powell C., Wittock R., Klein D. et al. 2010 Accurate prediction of repeat prostate biopsy outcomes by a mitochondrial DNA deletion assay. Prostate Cancer Prostatic Dis. 13, 126–131.
Saccone C., Pesole G. and Sbisà E. 1991 The main regulatory region of mammalian mitochondrial DNA: structure-function model and evolutionary pattern. J. Mol. Evol. 33, 83–91.
Satoh T. P., Miya M., Endo H. and Nishida M. 2006 Round and pointed-head grenadier fishes (Actinopterygii: Gadiformes) represent a single sister group: evidence from the complete mitochondrial genome sequences. Mol. Phylogenet. Evol. 40, 129–138.
Shi X., Tian P., Lin R., Huang D. and Wang J. 2016 Characterization of the complete mitochondrial genome sequence of the globose head whiptail Cetonurus globiceps (Gadiformes: Macrouridae) and its phylogenetic analysis. PLoS One 11, e0153666.
Stöver B. C. and Müller K. F. 2010 TreeGraph 2: combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics 11, 7.
Taanman J.-W. 1999 The mitochondrial genome: structure, transcription, translation and replication. Biochim. Biophys. Acta Bioenerg. 1410, 103–123.
Tang Q., Liu H., Mayden R. and Xiong B. 2006 Comparison of evolutionary rates in the mitochondrial DNA cytochrome b gene and control region and their implications for phylogeny of the Cobitoidea (Teleostei: Cypriniformes). Mol. Phylogenet. Evol. 39, 347–357.
Vanyushin B. F. and Kirnos M. D. 1977 Structure of animal mitochondrial DNA (base composition, pyrimidine clusters, character of methylation). Biochim. Biophys. Acta Nucleic Acids Protein Synth. 475, 323–336.
Wada K., Nishibori M. and Yokohama M. 2007 The complete nucleotide sequence of mitochondrial genome in the Japanese sika deer (Cervus nippon), and a phylogenetic analysis between Cervidae and Bovidae. Small Rumin. Res. 69, 46–54.
Wang Z., Ding G., Chen G., Sun Y., Sun Z., Zhang H. et al. 2012 Genome sequences of wild and domestic Bactrian camels. Nat. Commun. 3, 1202.
Wu H., Guang X., Al-Fageeh M. B., Cao J., Pan S., Zhou H. et al. 2014 Camelid genomes reveal evolution and adaptation to desert environments. Nat. Commun. 5, 5188.
Xiao X., Yang S., Lin D., Wang Y., Hua Y., Wang Y. et al. 2016 The complete mitochondrial genome and phylogenetic analysis of Chinese Jianchang horse (Equus caballus). Clon. Transgen. 5, 2.
Xiufeng X. and Árnason Ú. 1994 The complete mitochondrial DNA sequence of the horse, Equus caballus: extensive heteroplasmy of the control region. Gene 148, 357–362.
Zinovkina L. 2018 Mechanisms of mitochondrial DNA repair in mammals. Biochemistry 83, 233–249.

Corresponding editor: Subramaniam Ganesh