bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Sequence and structure comparison of ATP synthase F0 2 subunits 6 and 8 in notothenioid fish. 3 4 Gunjan Katyal1, Brad Ebanks1, Magnus Lucassen2, Chiara Papetti3 and Lisa 5 Chakrabarti1,4*. 6 7 8 9 10 11

12 1 School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington,

13 LE12 5RD, UK.

14 2 Alfred Wegener Institute, Bremerhaven, Germany.

15 3 Biology Department, University of Padova, via U. Bassi, 58/b, 35121, Padova, Italy.

16 4 MRC-Versus Arthritis Centre for Musculoskeletal Ageing Research, UK.

17 * Corresponding author [email protected]

18 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

19 20 21 22 23 24 25 Abstract: 26 27 The Channichthyidae family (icefish) are the only known vertebrate to be 28 devoid of haemoglobin. Mitochondrial changes such as tight coupling of the 29 mitochondria have facilitated sustained oxygen and respiratory activity in the fish. This 30 makes it important to appreciate features in the sequence and structure of the proteins 31 directly involved in proton transport, which could have physiological implications. ATP 32 synthase subunit a (ATP6) and subunit 8 (ATP8) are proteins that function as part of 33 the F0 component (proton pump) of the F0F1complex. Both are encoded by the 34 mitochondrial genome and involved in oxidative phosphorylation. To explore 35 mitochondrial sequence variation for ATP6 and ATP8 we have gathered sequences 36 and predicted structures of these two proteins of fish from the sub- 37 order, a sub-Antarctic species. We compared these with seven other vertebrate 38 species in order to reveal whether there might be physiologically important differences 39 that can help us to understand the unique biology of the icefish. 40 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

41 Introduction 42 43 The oceans which surround Antarctica, and their sub-zero temperatures, are home to 44 fish of the suborder Notothenioidei - a prime example of a marine species flock. This 45 suborder comprises four families, three of which include only sub-Antarctic species 46 (Bovichtidae, Pseudaphritidae, Eleginopsidae) and the that 47 encompasses all Antarctic and secondary non-Antarctic species, and are sometimes 48 referred as Cryonotothenioidea[1–3]. 49 Notothenioids are renowned for their physiological adaptations to cold temperatures. 50 This includes the ability to synthesise antifreeze glycoproteins (AFGP) and antifreeze- 51 potentiating proteins (AFPP)[4]. The capacity to synthesise antifreeze glycopeptides 52 (AFGPs) is a biochemical adaptation that enabled the Notothenioidei to colonize and 53 thrive in the extreme polar environment[5]. These proteins are largely composed of a 54 Thr-Ala-Ala repeat with a conjugated disaccharide via the hydroxyl group of the Thr 55 residue and reduce the freezing point of the internal fluids[6,7]. 56 Within the Nototheniidae family the subfamily Channichthyinae, also known as icefish, 57 are remarkable due to the absence of haemoglobin and, in some species, myoglobin 58 too[8–10]. The sub-zero temperatures of the water they inhabit allow the highest levels 59 of oxygen solubility, which is suggested to facilitate their survival despite the loss of 60 globin proteins[10]. 61 Myoglobin is absent in the oxidative skeletal muscle in all icefish, but the absence of 62 myoglobin in cardiac muscle has been reported in only six of the sixteen species of 63 the Channichthyinae subfamily[11,12]. While the molecular genetics of how myoglobin 64 expression has been lost have been studied, the physiological differences between 65 those that express and those that do not express myoglobin are not fully understood. 66 Small intracellular diffusion distances to mitochondria and a greater percentage of cell 67 volume occupied by mitochondria are two evolutionary adaptations that can 68 compensate for the absence of myoglobin[13,14]. In the particular case of 69 Champsocephalus gunnari, the mRNA transcript of myoglobin is present in the cardiac 70 tissue but a 5-bp frameshift insertion hinders the synthesis of protein from the mRNA 71 transcript[11,15]. 72 Notothenioidei have high densities of mitochondria in muscle cells, versatility in 73 mitochondrial biogenesis and a unique lipidomic profile[16–18]. These features have 74 also been hypothesised to facilitate sustained oxygen consumption and respiratory 75 activity in the absence of haemoglobin and myoglobin. 76 Complex V of the electron transport chain, ATP synthase, is responsible for the 77 production of intracellular ATP from ADP and inorganic phosphate. Composed of an 78 F0 and F1 component, the F0 component is responsible for channelling protons from 79 the intermembrane space across the inner mitochondrial membrane and into the

80 mitochondrial matrix[19–21]. The rotation of the c-ring in F0, and with this the γ-subunit 81 of the central stalk, facilitates the translocation of protons across the inner 82 mitochondrial membrane that ultimately drives the catalytic mechanism of the F1 83 component[22,23]. 84 The motor unit F0, embedded in the inner membrane of mitochondria, is composed of 85 subunits b, OSCP (oligomycin sensitivity conferring protein), d, e, f, g, h, i/j, k which 86 are encoded by nuclear genes and subunits a (ATP6) and 8 (ATP8), which are 87 encoded by mitochondrial genes[24]. Despite the structure of the complex having 88 been first resolved decades ago, and hypotheses of the chemical mechanism were 89 developed over half a century ago, significant breakthroughs continue to be made in bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

90 our understanding of both the structure and function of the enzyme and its F0 91 component[25–28]. 92 Both ATP synthase subunit a (ATP6) and subunit 8 (ATP8) are proteins that function 93 as part of the F0 component of ATP synthase, encoded by genes that overlap within 94 the mitochondrial genome[29]. This overlap is over a short, but variable between 95 species, base pair sequence where the translation initiation site of subunit 8 is 96 contained within the coding region of subunit 6. 97 The peripheral stalk is a crucial component of the F0 component forming a physical 98 connection between the membrane sector of the complex and the catalytic core. It 99 provides flexibility, aids in the assembly and stability of the complex, and forms the 100 dimerization interface between ATP synthase pairs[30]. ATP8 is an integral 101 transmembrane component of the peripheral stalk, serving an important role in the 102 assembly of the complex[31]. The C-terminus of ATP8 extends 70 Å from the surface 103 of the makes contacts with subunits b, d and F6, while the N-terminus has been 104 reported to make connections with subunits b, f and 6 in the intermembrane 105 space[32,33]. Subunit 8 is also known to play a role in the activity of the enzyme 106 complex[34]. 107 ATP6 is an α-helical protein embedded within the inner mitochondrial membrane and

108 it interacts closely with the c-ring of F0, providing aqueous half-channels that shuttle 109 protons to and from the rotating c-ring[20,35]. It has previously been reported that 110 ATP6 has at least five hydrophobic transmembrane spanning α helices domain, where 111 two of the helices h4 and h5 are well conserved across manyspecies[36]. 112 Proteins coded by mitochondrial DNA (mtDNA) are involved in oxidative 113 phosphorylation and can directly influence the metabolic performance of this pathway. 114 Evaluating the selective pressures acting on these proteins can provide insights in 115 their evolution, where mutations in the mtDNA can be favourable, neutral or harmful. 116 The amino acid changes can cause inefficiencies in the electron transfer chain, 117 causing oxidative damage by excess production of reactive oxygen species and 118 eventually interrupting the production of mitochondrial energy. Due to the tight 119 coupling of icefish mitochondria relative to their red-blooded relatives, any changes in 120 the structure of ATP Synthase subunits, particularly those directly involved in the 121 transport of protons across the membrane, could result in significant physiological 122 outcomes[37]. 123 In this work, we combine sequence analyses and secondary structure prediction 124 analyses to explore mitochondrial genetic variation for ATP6 and ATP8 in the 125 Notothenioidei suborder species Champsocephalus gunnari (C. gunnari), 126 Chionodraco rastrospinosus (C. rastrospinosus), Chaenocephalus aceratus (C. 127 aceratus), Notothenia coriiceps (N. coriiceps), bernacchii (T. bernacchii), 128 the sub-Antarctic Eleginops maclovinus (E. maclovinus), and compare this with seven 129 other vertebrate species. These include: Nothobranchius furzeri (turquoise killifish N. 130 furzeri, Cyprinodontiformes), Danio rerio (zebrafish, D. rerio, Cypriniformes), the lizard 131 Anolis carolinensis (A. carolinesis, Squamata), the domestic guinea pig Cavia 132 porcellus (C. porcellus), the Bowhead whale Balaena mysticetus (B. mysticetus), the 133 naked mole-rat Heterocephalus glaber (H. glaber), and the eastern red bat Lasiurus 134 borealis (L. borealis) to shed light on the molecular evolution of these proteins in 135 vertebrate species. 136 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

137 Methodology 138 139 Extraction of gene and protein sequences of ATP8 and ATP6 suborder 140 Notothenioid and other vertebrates. The list of complete coding sequences (CDS) 141 and protein sequences of the proteins were obtained from the National Centre for 142 Biotechnology Information (NCBI) protein database search, we chose only the Refseq 143 (provides a comprehensive, integrated, non-redundant, well-annotated set of 144 sequences, including genomic DNA, transcripts, and proteins) sequence queries 145 (https://www.ncbi.nlm.nih.gov/ lMSast searched:17th August 2020). 146 Multiple Protein Sequence alignment (MSA). (-/-) indicates absence of both 147 haemoglobin and myoglobin genes, whereas (-/+) indicate absence of haemoglobin 148 but presence of myoglobin. The sequences for the Notothenioidei suborder species 149 Champsocephalus gunnari (-/-), Chionodraco rastrospinosus (-/+), Chaenocephalus 150 aceratus (-/-), Notothenia coriiceps (+/+), Trematomus bernacchii, Eleginops 151 maclovinus (+/+), and seven other vertebrate species, Nothobranchius furzeri, Danio 152 rerio, Anolis carolinensis, Cavia porcellus, Balaena mysticetus, Heterocephalus 153 glaber, Lasiurus borealis were aligned using Clustal omega[38] to prepare the initial 154 alignment of ATP6 protein under the criteria of the presence and the absence of 155 haemoglobin and myoglobin proteins in the species, the alignments were also verified 156 using the other two progressive methods, MAFFT[39] and MUSCLE[38]. The same 157 method was applied for protein ATP8. The MSA was visualised and edited using 158 JALVIEW[40]. 159 Codon Alignment. Complete nucleotide coding sequences for genes ATP6 and 160 ATP8 from the thirteen vertebrate species were retrieved from NCBI GenBank 161 database (see Table 1). The sequences were aligned using Clustal omega[38] and 162 were manually edited and visualised as codons using MATLAB version R2018b 163 (9.5.0). 164 Phylogenetic Tree. The phylogenetic tree for the ATP6 protein was created from the 165 MSA obtained by Clustal omega[38] and saved in PHYLIP format. Using the Simple 166 Phylogeny[38] a phylogenetic tree was created with the neighbour joining algorithm 167 and visualised with iTOL v5[41]. 168 Comparison of properties of amino acids among the sequence from the above- 169 mentioned species. Using the ExPASy[42] tool ProtScale[43], different amino acid 170 properties such as the molecular weight of amino acids across the sequence, 171 hydrophobicity trend of amino acids, α--helix forming amino acids, average flexibility 172 trend and mutability for the protein ATP6 were compared graphically among the 173 thirteen species (https://web.expasy.org/protscale/). 174 Structure prediction for protein sequences. The MSA was structurally validated 175 using the structure prediction tool I-TASSER[44] (Iterative Threading ASSEmbly 176 Refinement) a hierarchical approach to protein structure and function prediction, to 177 generate the protein structure for AT6 from different species 178 (https://zhanglab.ccmb.med.umich.edu/I-TASSER/). 179 Figures. Protein structure images were produced with PyMOL v. 2.3.2. (The PyMOL 180 Molecular Graphics System, Version 2.0 Schrödinger, LLC.) Graphs were produced 181 with MATLAB version R2018b (9.5.0). Sequence logos were created using the 182 webserver WebLogo using 4000 vertebrate protein sequences for the protein ATP6 183 (http://weblogo.threeplusone.com/). 184 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

185 RESULTS 186 187 Codon Alignment 188 189 MSA of all the sequences of ATP6 (See Fig. 1) and ATP8 (See Fig. 2) from the 190 different vertebrate species (Table 1 & Table 2) for both nucleotide (codon) and 191 proteins identified several conserved codons and amino acid residues. Five of the six 192 Antarctic fish species have twelve nucleotides (four codons) at the 5’ end of the gene 193 sequence which are not found in the other eight vertebrate species. The codon 194 alignment ATP6 for species E. maclovinus, N. coriiceps, C. rastrospinosus and C. 195 aceratus show that GTG codes for methionine, as the start codon for the protein. GTG 196 which is originally known for coding the amino acid valine has been accepted as a 197 mitochondrion start codon for invertebrate mitogenomes[45–47]. A common feature 198 with the species that have GTG as a start codon is that N. coriiceps, E. maclovinus, 199 C. rastrospinosus have genes coding for myoglobin, where the latter is devoid of 200 haemoglobin. C. aceratus do not express myoglobin due to a 15 bp sequence 201 insertion, other than that difference, their myoglobin gene sequence is identical to that 202 of C. rastrospinosus[12]. The only exception to this is the red-blooded species T. 203 bernacchii, but this may be attributed to the unverified source of its sequence 204 submission. 205 Another trend that has been observed through sequence alignment is that the species 206 that are more similar and have the same amino acid for a particular position also have 207 codons with the same nucleotide (nt) at the third position. ‘TGA’ codons or ‘stop 208 codons’ are found within the translated sequence, here these code for tryptophan, as 209 seen in human and yeast mitochondria[48]. A variation in the length of the sequences 210 was observed, with an average length for ATP6 nt sequence of 683 and 74 nt for ATP8 211 gene sequences. The ATP6 sequence ends with a TAA stop codon in all species 212 except the two red blooded Antarctic fish species, N. coriiceps and E. maclovinus. 213 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

214 Table 1 Features of mtATP6 and mtATP8 genes for the thirteen vertebrate species

Species/Features C. gunnari C. rastrospinosus C. aceratus N. corriceps T.bernacchii E.maclovinus N.furzeri D. rerio A. carolinensis L. Borealis H. glaber C. Porcellus B. mysticetus

Common Name Mackerel Icefish Ocellated icefish Blackfin Icefish Marbled rockcod Emerald rockcod Rockcod Killifish Zebra fish Lizard Eastern red Naked mole Rat Guinea Pig Bowhead Whale

Accession No. NC_018340.1 NC_039543.1 NC_015654.1 NC_015653.1 KU166863 NC_033386.1 NC_011814.1 NC_002333.2 EU747728.2 NC_016873.1 NC_015112.1 NC_000884.1 NC_005268.1

Haemoglbin - - - + + + + + + + + + + Myoglobin - + - + + + + + + + + + + Length of 683 695 695 695 695 695 682 683 683 680 680 680 680 nucleotide ATP6 4_amino acids at + GTG-AAC-CTG- + GTG-AAC- + GTG-GTC-CTG-+ ATG-AAC-TTG- + GTG-AAC------5'end ACC CTG-ACC ACC GCC CTG-ACC Residue at Alanine Serine Serine Serine Serine Serine Leucine Threonine Asparagine - - - - position 35/39 5' flanking region 67nt-trn- 74 nt 75nt 75nt 75nt 75nt 75nt 74nt 73nt 66nt 73nt 68nt 71nt ATP8 Lysine 215 ATP6-Start codon atg gtg gtg gtg atg gtg atg atg atg atg atg atg atg

216

217 Overlapping genes 218 219 The overlap between genes is encoded on the same strand (see Table 1). The length 220 of overlap was 22 nt in ATP8-ATP6 for the five of the six species of Notothenioidei 221 suborder, that is excluding icefish C. gunnari where the overlap was of 10nt. Species 222 H. glaber, L. borealis and C. porcellus had an overlap of 43nt between ATP6 and 223 ATP8. The shortest overlap between the two genes were observed in the species A. 224 carolinesis has an overlap of 10nt and N. furzeri and D. rerio, have an overlap of 7nts.

225

226 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

227 Table 2 Features for protein ATP synthase F0 subunit 6 and 8 for the thirteen vertebrate species 228 analysed in this study.

229 230 231 Protein alignment and structural changes in ATP6 232 233 The complete amino acid sequences for ATP6 and ATP8 were aligned separately for 234 the thirteen vertebrate species (see Figs. 4 & 5). Protein sequence alignment showed 235 conserved residues across the species based on identity and similarity. Four Antarctic 236 fish species, N. coriiceps, T. bernacchii, C. rastrospinosus, C. aceratus and the sub- 237 Antarctic E. maclovinus have four amino acids at the N-terminal with a total of 231 238 residues. As previously mentioned, the only exception to this, is the species C. gunnari 239 with 227 residues similar to that of other fish species, N. furzeri and D. rerio. Species 240 A. carolinesis, L. borealis, H. glaber and C. porcellus have 226 residues and B. 241 mysticetus has 225 residues. The protein ATP6 in vertebrates is known to have 226- 242 228 residues. In humans, four point mutations in the ATP6 gene account for 82% of 243 disease associated with this gene, suggesting point mutations could have 244 physiological relevance[49,50]. Common features in all 13 species were as follows: 245 (1) several hydrophobic amino acids (light pink) were observed to be conserved across 246 the sequences in the species, (2) insertions and deletions of amino acids occurred 247 more frequently near N-termini, and (3) the C-terminal of the protein sequence is 248 hydrophilic. Dashes in the amino acid sequence represent gaps which may be an 249 insertion or deletion of a residue. The gap in the alignment is observed for the species 250 L. borealis, C. porcellus, B. mysticetus and H. glaber at position 35, and at the C- 251 terminal end for A. carolinesis and B. mysticetus, at position 226 and 225 respectively. 252 The amino acid at position 35 has predominantly hydrophilic residues except in the 253 two species C. gunnari and N. furzeri, where it is substituted with alanine or leucine 254 respectively. All the Antarctic species and E. maclovinus have a serine at this position, 255 except C. gunnari. When we look at the codon alignment of the ATP6 gene, serine is 256 encoded by codon TCT predominantly at position 39 for all the species except T. 257 bernacchii and the alanine for the species C. gunnari is encoded by GCT (see Fig. 1). 258 Multiple sequence alignment of four thousand species was used to generate the 259 WebLogo (see Fig. 4), which shows the conservation of amino acid residues across 260 the species. Position 42 shows proline and serine are conserved as we find in the 261 notothenioid species. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

262 A similar pattern was found in the amino acid alignment of ATP8, where the species, 263 B. mysticetus, H. glaber, C. porcellus and L. borealis, that showed a gap in the 264 previous alignment have hydrophilic residues whereas the other species have a gap 265 at the position 47. This observation could be attributed to the overlapping nature of the 266 nucleotide sequences coding for the two proteins. The protein sequence of ATP6 was 267 observed to be more conserved than ATP8. The amino acid sequences at the N- 268 terminal are more diverse, and the methionine residues are usually followed by amino 269 acids with short polar side chains [51]. Alanine is a non-polar amino acid whereas 270 serine is a polar amino acid. The hydrophobicity plot, average flexibility, mutability and 271 coil prediction across the sequences has shown that T. bernacchii and E. maclovinus 272 show similar trends in their physico-chemical properties across the sequence. 273 Notothenia coriiceps, C. aceratus and C. rastrospinosus follow this trend. 274 Champsocephalus gunnari is the only species out of the six notothenioids that varies 275 from the others. 276 277 The phylogenetic tree generated from the thirteen protein sequences for ATP6 (Fig. 278 3) shows that the species are clustered into three visible clusters, the first cluster has 279 species H. glaber, B. mysticetus, C. porcellus and L. borealis, the second cluster has 280 species A. carolinesis, D. rerio and N. furzeri, and the final cluster comprised the six 281 Notothenioidei. Notothenioidei are clustered together where the icefish, C. 282 rastrospinosus (presence of myoglobin) and C. aceratus, share a common node with 283 C. gunnari which can be interpreted as the most recent common ancestor in 284 agreement with the most recent and complete phylogenies of notothenioids[3]. 285 286 Protein structure differences were predicted at position 38-39 for species C. gunnari 287 (icefish), N. furzeri, D. rerio and A. carolinesis, where a strand-strand structure is found 288 at that position. All other species have coil structures at those positions (see Fig. 6). 289 For species T. bernacchii and E. maclovinus there is also a prediction for a strand 290 structure at positions 42-43. 291 292 293 294 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

295 Discussion 296 297 We present our analysis highlighting differences in sequence and structure observed 298 in the two proteins of complex V, ATP8 and ATP6, encoded by mtDNA between the 299 red- and white blooded notothenioids. Our analyses are based on the current genome 300 annotation available which is subject to change as more information becomes 301 available. We have only used the RefSeq sequences as these are reviewed by NCBI 302 and represent a compilation of the current knowledge of a gene and protein products 303 and is synthesised using information integrated from multiple sources. RefSeq is used 304 as a reference standard for a variety of purposes such as genome annotation and 305 reporting locations of sequence variation. The RefSeq and GenBank entries available 306 for a ATP6 sequences for the Antarctic fish, NC_015653.1, AP006021.1 (N. coriiceps), 307 NC_039543.1, MF622064.1 (C. rastrospinosus), NC_033386.1, KY038381.1 (E. 308 maclovinus), NC_015654.1, YP_004581502.1 (C. aceratus), which are submitted by 309 different authors, have the start codon as GTG for the five notothenioid species. The 310 protein length of ATP6 has been consistent in all the entries, 231 amino acids. 311 312 It has previously been shown that mitochondria from icefish are more tightly coupled 313 than those of their red-blooded counterparts[37]. Mitochondria that are tightly coupled 314 usually have competent membranes and protons can only get into the matrix by 315 passing through complex V. The initial difference we see is that red-blooded species 316 N. coriiceps, E. maclovinus, T. bernacchii, and the two icefish, C. rastrospinosus that 317 are devoid of haemoglobin but have myoglobin and C. aceratus species with a nearly 318 identical gene to that of C. rastrospinosus, have an additional 12 nucleotides at the N- 319 terminal. The only exception to this is the icefish C. gunnari. 320 321 GTG as an alternative start codon 322 323 The biosynthesis of proteins encoded by their respective mRNA requires an initiation 324 codon for their translation. ATG is the usual initiation codon but GTG has been 325 reported as initiation codon in some lower organisms, the frequency of annotated 326 alternate codon in higher organisms is found to be less than 1 %[52]. An in-vitro study 327 of GTG-mediated translation of enhanced green fluorescent protein suggested that 328 initiation with GTG codon regulates expression of lower levels of the protein and a 329 similar observation was made for the protein endopin 2B-2[53]. It has also been 330 observed in a few human diseases that a mutation of the ATG initiation codon to a 331 GTG are associated with diseases such as beta-thalassemia and Norrie disease, 332 where GTG mutation leads to inactivation of the gene[54,55]. Another example is a 333 disruption caused by GTG as the initiation codon in the gene CYP2C19, which resulted 334 in poor metabolism of a drug, mephenytoin, when compared to the gene with an ATG 335 initiation codon[56]. Numerous studies on bacteria and lower organisms show GTG 336 as a start codon, where the non-methionine codon is initially coded for, however, when 337 they act as a start codon the initial amino acid is substituted with a methionine[53,57]. 338 There is only a single report of a vertebrate species, rat, where GTG is the start codon 339 in mtDNA[58]. An ATG to GTG exchange in human gene FRMD7 (FERM Domain 340 Containing 7) has been found as a first base transversion of the start codon that 341 accounts for a mutation, causing morphological changes in the optic nerve head[59]. 342 The level of corresponding protein expression has been shown to be lower when 343 initiated using an alternative codon such as GTG rather than ATG[53,60]. GTG was bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

344 observed as a start codon for ATP8 in fish Philomycus bilineatus, which adds onto the 345 show GTG as an acceptable start codon[47]. 346 A few but increasing number of mammalian genes have been found to give rise to an 347 alternative initiation codon in regulatory proteins such as transcription factors, growth 348 factors and a few kinases in humans and rats. The finding in all these studies have 349 shown a similar trend of a lower level of protein production when compared to an ATG 350 start codon[61–63]. It has been shown that the fish inhabiting colder climates had 351 undergone stronger selective constraints in order to avoid deleterious 352 mutations[64,65]. MtDNA coding genes such as ATP6, could be placed under 353 selective pressures by low environmental temperatures. A larger ratio of substitution 354 for different sites could indicate proteins undergoing adaptations[66]. A decrease in 355 ATP6 activity previously reported, shows incomplete ATPase complexes that are 356 capable of ATP hydrolysis but not ATP synthesis. ATPase complexes completely 357 lacking subunit a, were capable of maintaining structural interactions between F1 and 358 F0 parts of the enzyme but the interactions were found to be weaker[67]. 359 360 The GTG initiation for protein ATP6 in these fish species could suggest a common 361 parallel evolution of the translation machinery. The favouring of GTG as a start codon 362 could also mean a higher stability of the protein as GC base pair has higher thermal 363 stability when compared to the AT base pair which is attributed from stronger stacking 364 interaction between GC bases and a presence of triple bond compared to that of AT 365 double bond[68]. 366 367 368 369 Overlap of ATP8 and ATP6 genes 370 371 Protein coding genes ATP8 and ATP6 are located adjacent to each other and are 372 overlapping on the same strand in humans and other vertebrates, with an overlap of 373 44 nt (NCBI: NC_012920.1) observed in the humans for the gene. It has been 374 previously reported that ATP8-ATP6 overlap is generally of 10 nt in the fish 375 genome[69]. Species T. bernacchii, E. maclovinus, N. coriiceps, C. rastrospinosus and 376 C. aceratus show an overlap of 22 nts and C. gunnari has a 10 nt overlap, as reported 377 previously in other fish genomes mentioned above. The gene coding ATP8 ends with 378 the stop codon TAG for all notothenioid species and TAA for the other vertebrate 379 species, a single exception to this was H. glaber that ends with a TAG stop codon. It 380 has been previously hypothesised that TAG is a sub-optimal stop codon which is less 381 likely to be selected. A study showed that the protein encoding genes that end with 382 TAA stop codons are, on average more abundant than those with genes ending with 383 TGA or TAG and further shows that a switch of stop codon TAG from TGA might pass 384 through the mutational path of TAA stop codon which could be subject to positive 385 selection in several groups.[70]. 386 387 Protein Alignment and Structural changes in ATP6 388 389 The four Antarctic fish species, N. coriiceps, T. bernacchii, C. rastrospinosus, C. 390 aceratus and the sub-Antarctic E. maclovinus have four amino acids at the N-terminal 391 of ATP6 and a total of 231 residues. As previously mentioned, the only exception to 392 this is the species C. gunnari with 227 residues similar to N. furzeri and D. rerio. N- bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

393 terminal addition of amino acids can influence the properties of the protein, as it can 394 change the molecular weight of the protein, the charge, hydrophobicity, and this has 395 been seen in the yeast meta-caspase prion protein Mca1[71]. 396 Amino acid position 35 is populated with predominantly hydrophilic residues, apart for 397 the two species C. gunnari and N. furzeri, where respectively, alanine and leucine are 398 found. All the other Antarctic fish species and E. maclovinus have a serine at this 399 position. When we look at the codon alignment of the ATP6 gene, serine is encoded 400 by codon TCT at position 39 for all the species except T. bernacchii (encoded by TCC) 401 and the alanine for the species C. gunnari is encoded by GCT. Serine is the only amino 402 acid that is encoded by two codon sets. 403 A common example of a missense mutation is where the single base pair can alter the 404 corresponding codon to a different amino acid. This base substitution even though 405 affecting a single codon can still have a significant effect on the protein production. It 406 has been recently discovered that serine at a highly conserved position is more often 407 encoded in TCN fashion and will tend to substitute non-synonymously to proline and 408 alanine, which shows that codon for which serine is coded indicate different types of 409 selection for amino acid and its acceptable substitutions[72]. 410 The hydrophobicity plot, average flexibility, mutability and coil prediction across the 411 sequences highlights differences in the physiochemical properties across the 412 sequence of protein ATP6 in the species C. gunnari. 413 The secondary structure of a protein is the way in which protein molecules are coiled 414 and folded in a certain way according to the primary sequence. Beta-strands give 415 stability to the structure of a protein, its intrinsic flexibility can sometimes return it to 416 coil configuration in order for the protein to perform other functions. Structural changes 417 were observed at position 38-39 for species C. gunnari, N. furzeri, D. rerio and A. 418 carolinesis, where strand-strand structure was predicted at that position. All other 419 species are predicted to have coil structures at those positions (see Fig. 6). Species 420 T. bernacchii and E. maclovinus are predicted to have strand structures at positions 421 42-43. 422 Protein structure, dynamics and function are all interlinked and it is vital to understand 423 the structure of a protein in relation to function to comprehend molecular 424 processes[73]. We have used the unique biology of the icefish to gain a better 425 understanding of the variability of ATP6 and ATP8 sequence and structure which has 426 importance for mitochondrial function. 427 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

428

429 CONFLICTS OF INTEREST 430 431 The authors declare that they have no conflict of interest. 432

433 FUNDING 434 435 Gunjan Katyal was supported by Vice Chancellor’s International Scholarship for Research 436 Excellence, University of Nottingham (2018-2021). Brad E Banks was supported by the 437 Biotechnology and Biological Sciences Research Council [grant number BB/J014508/1. 438 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

439 REFERENCES: 440

441 1. Duhamel G, Hulley P-A, Causse R, Koubbi P, Vacchi M, 442 Pruvost P, et al. Biogenic Patterns of Fish. Biogeographic 443 Atlas of the Southern Ocean. Biogeographic Atlas of the 444 Southern Ocean. The Scientific Committee on Antarctic 445 Research; 2014. doi:10.13140/2.1.1828.3203

446 2. Dornburg A, Federman S, Lamb AD, Jones CD, Near TJ. 447 Cradles and museums of Antarctic teleost biodiversity. Nat 448 Ecol Evol. 2017;1: 1379–1384. doi:10.1038/s41559-017- 449 0239-y

450 3. Near TJ, MacGuigan DJ, Parker E, Struthers CD, Jones CD, 451 Dornburg A. Phylogenetic analysis of Antarctic 452 notothenioids illuminates the utility of RADseq for resolving 453 Cenozoic adaptive radiations. Mol Phylogenet Evol. 454 2018;129: 268–279. doi:10.1016/j.ympev.2018.09.001

455 4. Duman JG. ice-binding (antifreeze) proteins and 456 glycolipids: An overview with emphasis on physiological 457 function. J Exp Biol. 2015;218: 1846–1855. 458 doi:10.1242/jeb.116905

459 5. Devries AL. Role of glycopeptides and pepddes in inhibition 460 of crystallization of water in polar fishes. Philos Trans R Soc 461 London B, Biol Sci. 1984;304: 575–588. 462 doi:10.1098/rstb.1984.0048

463 6. Chen L, Devries AL, Cheng CHC. Evolution of antifreeze 464 glycoprotein gene from a trypsinogen gene in Antarctic 465 notothenioid fish. Proc Natl Acad Sci U S A. 1997;94: 3811– 466 3816. doi:10.1073/pnas.94.8.3811

467 7. Harding MM, Anderberg PI, Haymet ADJ. “Antifreeze” 468 glycoproteins from polar fish. European Journal of 469 Biochemistry. 2003. pp. 1381–1392. doi:10.1046/j.1432- 470 1033.2003.03488.x bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

471 8. Johan T. Ruud. Vertebrates without Erythrocytes and Blood 472 Pigment. Nature. 1954;173: 848–850.

473 9. Hamoir G. Biochemical adaptation of the muscles of the 474 channichthyidae to their lack in hemoglobin and myoglobin. 475 Comp Biochem Physiol -- Part B Biochem. 1988;90: 557– 476 559. doi:10.1016/0305-0491(88)90295-7

477 10. Sidell BD, O’Brien KM. When bad things happen to good 478 fish: The loss of hemoglobin and myoglobin expression in 479 Antarctic icefishes. J Exp Biol. 2006;209: 1791–1802. 480 doi:10.1242/jeb.02091

481 11. Sidell BD, Vayda ME, Small DJ, Moylan TJ, Londraville RL, 482 Yuan ML, et al. Variable expression of myoglobin among the 483 hemoglobinless Antarctic icefishes. Proc Natl Acad Sci U S 484 A. 1997;94: 3420–3424. doi:10.1073/pnas.94.7.3420

485 12. Small DJ, Moylan T, Vayda ME, Sidell BD. The myoglobin 486 gene of the Antarctic icefish, Chaenocephalus aceratus, 487 contains a duplicated TATAAAA sequence that interferes 488 with transcription. J Exp Biol. 2003;206: 131–139. 489 doi:10.1242/jeb.00067

490 13. O’Brien KM, Xue H, Sidell BD. Quantification of diffusion 491 distance within the spongy myocardium of hearts from 492 antarctic fishes. Respir Physiol. 2000;122: 71–80. 493 doi:10.1016/S0034-5687(00)00139-0

494 14. O’Brien KM, Sidell BD. The interplay among cardiac 495 ultrastructure, metabolism and the expression of oxygen- 496 binding proteins in Antarctic fishes. J Exp Biol. 2000;203: 497 1287–1297.

498 15. Small DJ, Vayda ME, Sidell BD. A novel vertebrate 499 myoglobin gene containing three A+T-rich introns is 500 conserved among antarctic teleost species which differ in 501 myoglobin expression. J Mol Evol. 1998;47: 156–166. 502 doi:10.1007/PL00006372 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

503 16. Archer SD, Johnston IA. Density of Cristae and Distribution 504 of Mitochondria in the Slow Muscle Fibers of Antarctic Fish. 505 Physiol Zool. 1991;64: 242–258. 506 doi:10.1086/physzool.64.1.30158522

507 17. O’Brien KM. Mitochondrial biogenesis in cold-bodied fishes. 508 J Exp Biol. 2011;214: 275–285. doi:10.1242/jeb.046854

509 18. Biederman AM, Kuhn DE, O’Brien KM, Crockett EL. 510 Mitochondrial membranes in cardiac muscle from Antarctic 511 notothenioid fishes vary in phospholipid composition and 512 membrane fluidity. Comp Biochem Physiol Part - B Biochem 513 Mol Biol. 2019;235: 46–53. doi:10.1016/j.cbpb.2019.05.011

514 19. Dimroth P. Operation of the F0 motor of the ATP synthase. 515 Biochimica et Biophysica Acta - Bioenergetics. 2000. pp. 516 374–386. doi:10.1016/S0005-2728(00)00088-8

517 20. Angevine CM, Fillingame RH. Aqueous access channels in 518 subunit a of rotary ATP synthase. J Biol Chem. 2003;278: 519 6066–6074. doi:10.1074/jbc.M210199200

520 21. Aksimentiev A, Balabin IA, Fillingame RH, Schulten K. 521 Insights into the Molecular Mechanism of Rotation in the F 522 o Sector of ATP Synthase. Biophys J. 2004;86: 1332–1344. 523 doi:10.1016/S0006-3495(04)74205-8

524 22. Noji H, Yasuda R, Yoshida M, Kinosita K. Direct observation 525 of the rotation of F1-ATPase. Nature. 1997;386: 299–302. 526 doi:10.1038/386299a0

527 23. Ma J, Flynn TC, Cui Q, Leslie AGW, Walker JE, Karplus M. 528 A dynamic analysis of the rotation mechanism for 529 conformational change in F1-ATPase. Structure. 2002;10: 530 921–931. doi:10.1016/S0969-2126(02)00789-X

531 24. Artika IM. Current understanding of structure, function and 532 biogenesis of yeast mitochondrial ATP synthase. J Bioenerg 533 Biomembr. 2019;51: 315–328. doi:10.1007/s10863-019- 534 09809-4 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

535 25. Mitchell P. Coupling of phosphorylation to electron and 536 hydrogen transfer by a chemi-osmotic type of mechanism. 537 Nature. 1961;191: 144–148. doi:10.1038/191144a0

538 26. Abrahams JP, Leslie AGW, Lutter R, Walker JE. Structure 539 at 2.8 Â resolution of F1-ATPase from bovine heart 540 mitochondria. Nature. 1994;370: 621–628. 541 doi:10.1038/370621a0

542 27. Klusch N, Murphy BJ, Mills DJ, Yildiz Ö, Kühlbrandt W. 543 Structural basis of proton translocation and force generation 544 in mitochondrial ATP synthase. Elife. 2017;6: 1–16. 545 doi:10.7554/eLife.33274

546 28. Kubo S, Niina T, Takada S. Molecular dynamics simulation 547 of proton-transfer coupled rotations in ATP synthase FO 548 motor. Sci Rep. 2020;10: 1–16. doi:10.1038/s41598-020- 549 65004-1

550 29. Fearnley IM, Walker JE. Two overlapping genes in bovine 551 mitochondrial DNA encode membrane components of ATP 552 synthase. EMBO J. 1986;5: 2003–2008. 553 doi:10.1002/j.1460-2075.1986.tb04456.x

554 30. Colina-Tenorio L, Dautant A, Miranda-Astudillo H, Giraud 555 MF, González-Halphen D. The peripheral stalk of rotary 556 ATPases. Front Physiol. 2018;9: 1–19. 557 doi:10.3389/fphys.2018.01243

558 31. Devenish RJ, Papakonstantinou T, Galanis M, Law RH, 559 Linnane AW, Nagley P. Structure/function analysis of yeast 560 mitochondrial ATP synthase subunit 8. Ann N Y Acad Sci. 561 1992;671: 403–14. doi:10.1111/j.1749- 562 6632.1992.tb43814.x

563 32. Stephens AN, Khan MA, Roucou X, Nagley P, Devenish RJ. 564 The molecular neighborhood of subunit 8 of yeast 565 mitochondrial F1F0-ATP synthase probed by cysteine 566 scanning mutagenesis and chemical modification. J Biol 567 Chem. 2003;278: 17867–17875. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

568 doi:10.1074/jbc.M300967200

569 33. Lee J, Ding SJ, Walpole TB, Holding AN, Montgomery MG, 570 Fearnley IM, et al. Organization of subunits in the 571 membrane domain of the bovine F-ATPase revealed by 572 covalent cross-linking. J Biol Chem. 2015;290: 13308– 573 13320. doi:10.1074/jbc.M115.645283

574 34. GRANDIER‐VAZEILLE X, OUHABI R, GUÉRIN M. 575 Antibodies against subunits of F0 sector of ATP synthase 576 from Saccharomyces cerevisiae Stimulation of ATP 577 synthase by subunit‐8‐reactive antibodies and inhibition by 578 subunit‐9‐reactive antibodies. Eur J Biochem. 1994;223: 579 521–528. doi:10.1111/j.1432-1033.1994.tb19021.x

580 35. Allegretti M, Klusch N, Mills DJ, Vonck J, Kühlbrandt W, 581 Davies KM. Horizontal membrane-intrinsic α-helices in the 582 stator a-subunit of an F-type ATP synthase. Nature. 583 2015;521: 237–240. doi:10.1038/nature14185

584 36. Gu J, Zhang L, Zong S, Guo R, Liu T, Yi J, et al. Cryo-EM 585 structure of the mammalian ATP synthase tetramer bound 586 with inhibitory protein IF1. Science (80- ). 2019;364: 1068– 587 1075. doi:10.1126/science.aaw4852

588 37. Mueller IA, Grim JM, Beers JM, Crockett EL, O’Brien KM. 589 Inter-relationship between mitochondrial function and 590 susceptibility to oxidative stress in red- and white-blooded 591 Antarctic notothenioid fishes. J Exp Biol. 2011;214: 3732– 592 41. doi:10.1242/jeb.062042

593 38. Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan 594 N, et al. The EMBL-EBI search and sequence analysis tools 595 APIs in 2019. Nucleic Acids Res. 2019;47: W636–W641. 596 doi:10.1093/nar/gkz268

597 39. Rozewicki J, Li S, Amada KM, Standley DM, Katoh K. 598 MAFFT-DASH: Integrated protein sequence and structural 599 alignment. Nucleic Acids Res. 2019;47: W5–W10. 600 doi:10.1093/nar/gkz342 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

601 40. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton 602 GJ. Jalview Version 2-A multiple sequence alignment editor 603 and analysis workbench. Bioinformatics. 2009;25: 1189– 604 1191. doi:10.1093/bioinformatics/btp033

605 41. Letunic I, Bork P. Interactive Tree of Life (iTOL) v4: Recent 606 updates and new developments. Nucleic Acids Res. 607 2019;47: 256–259. doi:10.1093/nar/gkz239

608 42. Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, 609 Bairoch A. ExPASy: The proteomics server for in-depth 610 protein knowledge and analysis. Nucleic Acids Res. 611 2003;31: 3784–3788. doi:10.1093/nar/gkg563

612 43. Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins 613 MR, Appel RD, et al. The Proteomics Protocols Handbook. 614 Proteomics Protoc Handb. 2005; 571–608. 615 doi:10.1385/1592598900

616 44. Yang J, Zhang Y. I-TASSER server: New development for 617 protein structure and function predictions. Nucleic Acids 618 Res. 2015;43: W174–W181. doi:10.1093/nar/gkv342

619 45. Guo J, Yang H, Zhang C, Xue H, Xia Y, Zhang JE. Complete 620 mitochondrial genome of the apple snail Pomacea diffusa 621 (Gastropoda, Ampullariidae) with phylogenetic 622 consideration. Mitochondrial DNA Part B Resour. 2017;2: 623 865–867. doi:10.1080/23802359.2017.1407683

624 46. Lin GM, Xiang P, Sampurna BP, Hsiao C Der. Genome 625 skimming yields the complete mitogenome of Chromodoris 626 annae (Mollusca: Chromodorididae). Mitochondrial DNA 627 Part B Resour. 2017;2: 609–610. 628 doi:10.1080/23802359.2017.1372715

629 47. Yang T, Xu G, Gu B, Shi Y, Mzuka HL, Shen H. The 630 complete mitochondrial genome sequences of the 631 Philomycus bilineatus (Stylommatophora: Philomycidae) 632 and phylogenetic analysis. Genes (Basel). 2019;10: 1–13. 633 doi:10.3390/genes10030198 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

634 48. Fox TD. Five TGA “stop” codons occur within the translated 635 sequence of the yeast mitochondrial gene for cytochrome c 636 oxidase subunit II. Proc Natl Acad Sci U S A. 1979;76: 637 6534–6538. doi:10.1073/pnas.76.12.6534

638 49. Verny C, Guegen N, Desquiret V, Chevrollier A, Prundean 639 A, Dubas F, et al. Hereditary spastic paraplegia-like disorder 640 due to a mitochondrial ATP6 gene point mutation. 641 Mitochondrion. 2011;11: 70–75. 642 doi:10.1016/j.mito.2010.07.006

643 50. Pfeffer G, Frcpc CM, Blakely EL, Alston CL, Boggild M, 644 Horvath R, et al. Europe PMC Funders Group Adult-onset 645 spinocerebellar ataxia syndromes due to MTATP6 646 mutations. 2014;83: 883–886. doi:10.1136/jnnp-2012- 647 302568.Adult-onset

648 51. Arnesen T, Van Damme P, Polevoda B, Helsens K, Evjenth 649 R, Colaert N, et al. Proteomics analyses reveal the 650 evolutionary conservation and divergence of N-terminal 651 acetyltransferases from yeast and humans. Proc Natl Acad 652 Sci U S A. 2009;106: 8157–8162. 653 doi:10.1073/pnas.0901931106

654 52. Donath A, Jühling F, Al-Arab M, Bernhart SH, Reinhardt F, 655 Stadler PF, et al. Improved annotation of protein-coding 656 genes boundaries in metazoan mitochondrial genomes. 657 Nucleic Acids Res. 2019;47: 10543–10552. 658 doi:10.1093/nar/gkz833

659 53. Hwang SR, Garza CZ, Wegrzyn JL, Hook VYH. 660 Demonstration of GTG as an alternative initiation codon for 661 the serpin endopin 2B-2. Biochem Biophys Res Commun. 662 2005;327: 837–844. doi:10.1016/j.bbrc.2004.12.053

663 54. Villalobos-ar AR, Casas-casta M, Guti E, Perea FJ, Thein 664 SL, Ibarra B. Globin gene haplotypes in Mexican mestizos. 665 1997; 498–500.

666 55. Isashiki Y, Ohba N, Yanagita T, Hokita N, Doi N, Nakagawa bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

667 M, et al. Novel mutation at the initiation codon in the Norrie 668 disease gene in two Japanese families. Hum Genet. 669 1995;95: 105–108. doi:10.1007/BF00225085

670 56. Ferguson RJ, De Morais SMF, Benhamou S, Bouchardy C, 671 Blaisdell J, Ibeanu G, et al. A new genetic defect in human 672 CYP2C19: Mutation of the initiation codon is responsible for 673 poor metabolism of S-mephenytoin. J Pharmacol Exp Ther. 674 1998;284: 356–361.

675 57. Kitamoto T, Wang W, Salvaterra PM. Structure and 676 organization of the Drosophila cholinergic locus. J Biol 677 Chem. 1998;273: 2706–2713. doi:10.1074/jbc.273.5.2706

678 58. Gadaleta G, Pepe G, Candia G De, Quagliariellol C, Sbisa 679 E, Saccone C, et al. Nucleic Acids Research Nucleotide 680 sequence of rat nitochondrial NADH dehydrogenase 681 subunit 1 . GTG , a new initiator codon in vertebrate 682 mitochondrial genome We have determined the nucleotide 683 sequence of cloned fragments that contain with ATT . 684 However th. 1988;22: 6233.

685 59. Choi JH, Shin JH, Seo JH, Jung JH, Choi KD. A start codon 686 mutation of the FRMD7 gene in two Korean families with 687 idiopathic infantile nystagmus. Sci Rep. 2015;5: 5–10. 688 doi:10.1038/srep13003

689 60. Panicker IS, Browning GF, Markham PF. The effect of an 690 alternate start codon on heterologous expression of a PhoA 691 fusion protein in mycoplasma gallisepticum. PLoS One. 692 2015;10: 1–10. doi:10.1371/journal.pone.0127911

693 61. Short JD, Pfarr CM. Translational regulation of the JunD 694 messenger RNA. J Biol Chem. 2002;277: 32697–32705. 695 doi:10.1074/jbc.M204553200

696 62. Claus P, Döring F, Gringel S, Müller-Ostermeyer F, Fuhlrott 697 J, Kraft T, et al. Differential intranuclear localization of 698 fibroblast growth factor-2 isoforms and specific interaction 699 with the survival of motoneuron protein. J Biol Chem. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

700 2003;278: 479–485. doi:10.1074/jbc.M206056200

701 63. Rhee S, Yang SJ, Lee SJ, Park D. βPix-bL, a novel isoform 702 of βPix, is generated by alternative translation. Biochem 703 Biophys Res Commun. 2004;318: 415–421. 704 doi:10.1016/j.bbrc.2004.04.039

705 64. Silva G, Lima FP, Martel P, Castilho R. Thermal adaptation 706 and clinal mitochondrial DNA variation of European 707 anchovy. Proc R Soc B Biol Sci. 2014;281. 708 doi:10.1098/rspb.2014.1093

709 65. Sun YB, Shen YY, Irwin DM, Zhang YP. Evaluating the roles 710 of energetic functional constraints on teleost mitochondrial- 711 encoded protein evolution. Mol Biol Evol. 2011;28: 39–44. 712 doi:10.1093/molbev/msq256

713 66. Deng Z, Wang X, Xu S, Gao T, Han Z. Population genetic 714 structure and selective pressure on the mitochondrial ATP6 715 gene of the Japanese sand lance Ammodytes personatus 716 Girard. J Mar Biol Assoc United Kingdom. 2019;99: 1409– 717 1416. doi:10.1017/S0025315419000225

718 67. Ješina P, Tesařová M, Fornůsková D, Vojtíšková A, Pecina 719 P, Kaplanová V, et al. Diminished synthesis of subunit a 720 (ATP6) and altered function of ATP synthase and 721 cytochrome c oxidase due to the mtDNA 2 bp microdeletion 722 of TA at positions 9205 and 9206. Biochem J. 2004;383: 723 561–571. doi:10.1042/BJ20040407

724 68. Yakovchuk P, Protozanova E, Frank-Kamenetskii MD. 725 Base-stacking and base-pairing contributions into thermal 726 stability of the DNA double helix. Nucleic Acids Res. 727 2006;34: 564–574. doi:10.1093/nar/gkj454

728 69. Satoh TP, Miya M, Mabuchi K, Nishida M. Structure and 729 variation of the mitochondrial genome of fishes. BMC 730 Genomics. 2016;17. doi:10.1186/s12864-016-3054-y

731 70. Belinky F, Babenko VN, Rogozin IB, Koonin E V. Purifying bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

732 and positive selection in the evolution of stop codons. Sci 733 Rep. 2018;8: 1–11. doi:10.1038/s41598-018-27570-3

734 71. Erhardt M, Wegrzyn RD, Deuerling E. Extra N-terminal 735 residues have a profound effect on the aggregation 736 properties of the potential yeast prion protein Mca1. PLoS 737 One. 2010;5. doi:10.1371/journal.pone.0009929

738 72. Schwartz GW, Shauli T, Linial M, Hershberg U. Serine 739 substitutions are linked to codon usage and differ for 740 variable and conserved protein regions. Sci Rep. 2019;9: 1– 741 9. doi:10.1038/s41598-019-53452-3

742 73. Guillermo Garcia-Manero, Hui Yang, Shao-Qing Kuang, 743 Susan O’Brien, Deborah Thomas and HK. 基因的改变NIH 744 Public Access. Bone. 2005;23: 1–7. 745 doi:10.1016/j.str.2010.01.020.Structure/function

746 bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/2021.01.11.426195; this version posted January 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.