
The Phylogenetic Analysis of Cytochrome C to Gain an Insight into Mitochondrial and Endosymbiotic Gene Transfer

By Rhys M. Thomas

Word Count-20,614 Abstract Word Count-377

Submitted in partial fulfilment of the requirement for the degree of Master of Science in Molecular

School of Sciences and Education Staffordshire University

Submitted August 2020

Student Number-18022710

Contents
Table of Figures
Abstract
1.0 Introduction
1.1 The Mitochondria
1.1.1 Mitochondrial Function
1.1.2 Mitochondrial Origins
1.1.3 Mitochondrial Genome
1.1.4 Mitochondrial Gene Transfer
1.2
1.2.1 Cytochrome C Structure
1.2.2 Eukaryotic Function
1.2.3 Prokaryotic Function
1.2.4 Cytochrome C Similarity
1.2.5 Cytochrome C’s
1.3
1.3.1 Cytochrome C in Phylogenetics
1.3.2 How Phylogenetic Trees Work
1.3.3 Maximum Likelihood Phylogenetic Trees
1.4 Project Aims, Objectives and Hypotheses
2.0 Methods and Materials
2.1 How the Sequences Were Chosen
2.2 Databases Utilised
2.3 Software and Tools Utilised
2.4 Statistical Analysis
3.0 Results
3.1 Cytochrome C Structure and Function
3.2 Cytochrome C Sequence Alignments
3.3 Cytochrome C Sequence Variation
3.4 Mitochondrial and Nuclear Cytochrome C Sequence Comparison
3.5 Cytochrome C
3.6 Cytochrome C Syntenic Analysis
3.7 Estimated Time of Divergence
4.0 Discussion
4.1 Origins of the Mitochondria
4.2 Cytochrome C Structure and Function
4.3 Cytochrome C Sequence Variations
4.4 Mitochondrial and Nuclear Cytochrome C Variations
4.5 Cytochrome C Phylogenetic Tree
4.6 Cytochrome C Syntenic Analysis
4.7 Limitations of the Study
4.8 Further Work
5.0 Conclusion
References
Acknowledgements
Appendix 1 – Project Declaration Form
Appendix 2 – Raw Entropy
Appendix 3 – Prokaryotic and Eukaryotic Maximum Likelihood Tree
Appendix 4 – Abbreviation List

Table of Figures

Figure 1. Diagrammatic Mitochondria.
Figure 2. A Tree of Life.
Figure 3. Cytochrome C Variation in 3D Structure.
Figure 4. MUSCLE Sequence Alignment of Eukaryotic and Prokaryotic Cytochrome C.
Figure 5. Cytochrome C Eukaryotic and Prokaryotic Entropy Plot.
Figure 6. Comparison of D. variabilis Cytochrome C under Different Genetic Codes.
Figure 7. Evolutionary analysis of Cytochrome C by Maximum Likelihood method.
Figure 8. Syntenic Analysis of Various Cytochrome C Genomic Regions.
Figure 9. Microscope of Parakaryon myojinensis.

Abstract

Approximately 4.6 billion years ago, life is thought to have begun at submarine volcanic vents referred to as ‘black smokers’, which were common during the Archean period. These supplied thermal energy and a constant supply of reactive inorganic molecules separated by a pH gradient similar to that seen in mitochondria. However, it was not until the ‘great oxidation event’, approximately 2.4 billion years ago, that the environment began to oxidise in cycles powered by cyanobacteria’s production of oxygen. This would lead to an eventual endosymbiotic event, in which an α-proteobacterium was taken up by an archaeon more than 1.45 billion years ago, as dated via the earliest eukaryotic microfossils. Then came the ‘Cambrian explosion’ 540 million years ago, which gave rise to the huge diversity witnessed today. However, since the endosymbiotic event almost 98% of the genes that are required for mitochondrial function have undergone endosymbiotic gene transfer to the nuclear genome.

This study uses cytochrome c as an example because it is almost entirely encoded on the nuclear genome and is an essential protein with a small, conserved size, making it a prime candidate for phylogenetic analysis. The method used software to create a maximum likelihood tree to infer evolutionary relationships, and online resources to identify syntenic regions around the cytochrome c gene to infer a common endosymbiotic gene transfer event. Estimated times of divergence between species were included to infer a possible time of endosymbiotic gene transfer.

The results, which are biased towards Chordata owing to the availability of genetic information in online databases, indicate that the endosymbiotic event occurred prior to the divergence of and , 312 million years ago. This was inferred by using TimeTree to estimate the time of divergence of two given species for which CoGe was able to identify syntenic regions of 200,000 bases on the 5’ and 3’ sides of the cytochrome c gene.

For the first time, the endosymbiotic gene transfer of cytochrome c has been mapped and its timing estimated, with the transfer occurring prior to 312 million years ago. However, this research also identified that this is a conservative estimate owing to the limited diversity of the available genetic data, and it is plausible that this event occurred over 1660 million years ago. Other mitochondrial genes which have undergone endosymbiotic gene transfer events appear not to have been studied.


1.0 Introduction

This research targets the mitochondrial protein cytochrome c (CytC), which can be located on both the nuclear and mitochondrial genomes of eukaryotes. Conserved through both eukaryotic and prokaryotic genomes, CytC is a protein relentlessly studied via phylogenetic analysis. Continuing this analysis of CytC will help to elucidate its own evolutionary history and potentially that of the mitochondria.

1.1 The Mitochondria

Mitochondria are ancient organelles in eukaryotic cells (Friedman and Nunnari, 2014). These double membrane-bound organelles, pictured below (Figure 1), are present in all cells except red blood cells and make up to 10% of cellular volume. They measure approximately 0.75-3.0µm, and a given cell can contain up to 1000 individual mitochondria, although the numbers of these pleomorphic organelles vary with cell type, cell cycle stage and intracellular state (Srivastava and Pande, 2016). Mitochondria are conserved across all known eukaryotes due to the and commonly inherited genes; even eukaryotes lacking ‘traditional’ mitochondria contain hydrogenosomes and mitosomes, which are of mitochondrial descent owing to their shared protein import processes and which branch as sisters to mitochondria-bearing lineages (Martin and Mentel, 2010).

Figure 1. Diagrammatic Mitochondria. A partial cross section image of a mitochondrion, showing all the major components of the organelle, created using BioRender software.

In 1890 a histological description of mitochondria was published by Richard Altmann, who termed them bioblasts and believed they were autonomous and important to both metabolic and genetic functions (Altmann, 1894; O'Rourke, 2010). Eight years later, in 1898, Carl Benda coined the term mitochondria, originating from the Greek “mitos” meaning thread and “chondros” meaning granule, owing to their appearance in long chains during spermatogenesis (Benda, 1898; Srivastava and Pande, 2016). In 1912, Benjamin Kingsbury hypothesised that mitochondria are involved in respiration, based on morphological changes. However, this did not become widely accepted until 1949, when Eugene Kennedy and Albert Lehninger demonstrated that respiratory enzymes are located within mitochondria (Kennedy and Lehninger, 1949; Pagliarini and Rutter, 2013). In the intervening period, in 1925, Charles MacMunn’s work was rediscovered by David Keilin. MacMunn had described a respiratory pigment found in muscles and tissues among all orders of the animal kingdom; these pigments have two states, and the reduced state gives off a spectrum with four absorbance bands. Keilin furthered this work, characterising the respiratory pigments and naming them cytochromes (Pagliarini and Rutter, 2013; Keilin and Hardy, 1925). Lynn Margulis (then Lynn Sagan) published an article in 1967 that was reportedly rejected by more than a dozen journals but is now regarded as marking the resurgence of the endosymbiotic theory of organelle origins. The article theorised that mitochondria, photosynthetic plastids and the basal bodies of flagella were once free-living prokaryotic cells (Gray, 2017; Sagan, 1967).

The mitochondria are separated from the cellular cytoplasm by the outer membrane, while the inner membrane forms cristae through invagination of the membrane. This is utilised to increase the surface area and help define the of the organelle (Taanman, 1999).

The inner and outer membranes have completely different protein contents and therefore function differently. The outer membrane consists mostly of porins (30-35kDa), which facilitate the passage of ions and small molecules through their transmembrane channels. Uncharged molecules of up to approximately 5kDa are able to pass through the outer membrane. However, proteins larger than this require the binding of a signalling sequence at their N-terminus to a large multi-subunit protein on the outer membrane referred to as translocase (Herrmann and Neupert, 2000). This actively moves the signalled protein across the membrane, while other mitochondrial pro-proteins are imported via specialized translocation proteins (Srivastava and Pande, 2016).

The mitochondrial inner membrane contains a higher proportion of protein than any other cellular membrane, containing up to 76% of the organelle’s proteins. This membrane is a closely regulated permeability barrier and is the site of the chemiosmotic apparatus for energy production, covered in detail in section 1.1.1 (Srivastava and Pande, 2016).

Enclosed by the inner membrane, the matrix contains approximately two-thirds of the total proteins, comprising hundreds of enzymes, tRNA, mitochondrial DNA (mtDNA) and specialised ribosomes (Srivastava and Pande, 2016). Each mitochondrion carries 5-10 copies of mtDNA, while the number of genes varies between organisms, as covered in detail in section 1.1.4. In addition, each mitochondrion possesses the ability to synthesize its required proteins, as identified in 1958 by John R. McLean (McLean et al., 1958). The ribosomes responsible are termed mitoribosomes; mammalian versions are composed of an mt-large subunit and an mt-small subunit, similar to other ribosomes, but their density is vastly different, sedimenting at 55S, unlike eukaryotic cytosolic and bacterial versions, which sediment at 80S and 70S respectively (Mai, Chrzanowska-Lightowlers and Lightowlers, 2017). This allows the mitochondria to synthesise their own proteins, providing local control in response to environmental stresses.

1.1.1 Mitochondrial Function

The mitochondria are often referred to as the ‘powerhouse of the cell’, although they could be considered a ‘gate between life and death’, as this more accurately depicts the vital functions undertaken (Cottet-Rousselle et al., 2011). These cellular organelles have five key roles within the cell: ATP production, regulators of , generation of free radicals, biogenesis of and a role in apoptosis (Srivastava and Pande, 2016). Only two of these utilise CytC: the production of ATP and apoptosis.

The production of ATP is the principal role of the mitochondria and is achieved via oxidative phosphorylation (OXPHOS), an aerobic process that produces 13 times more ATP than glycolysis, an anaerobic process (Rich, 2003). The products of glycolysis, including pyruvate, are produced in the cytosol and transported into the mitochondria, and more NADH and FADH2 are produced via the citric acid cycle within the matrix. The OXPHOS system is located on the cristae and comprises the mitochondrial respiratory chain and F0F1 ATP Synthase (complex V, EC 7.1.2.2). The mitochondrial respiratory chain comprises four multimeric complexes, named NADH dehydrogenase (complex I, EC 7.1.1.2), succinate dehydrogenase (complex II, EC 1.3.5.1), cytochrome C reductase (complex III, EC 7.1.1.8) and cytochrome C oxidase (complex IV, EC 7.1.1.9), and two mobile electron carriers, coenzyme Q and CytC. Electrons transferred from either NADH or FADH2 are shuttled across the electron transport chain (ETC) complexes. This shuttling incrementally releases energy, which is utilised to pump protons into the intermembrane space. The accumulation of protons within the intermembrane space results in an electrochemical gradient, and this potential energy is utilised by F0F1 ATP Synthase via chemiosmosis to phosphorylate ADP to ATP. ATP can then be transported out of the mitochondria through both the inner and outer membranes by the adenine nucleotide translocator and porin (Srivastava and Pande, 2016). Although this process is efficient, small numbers of electrons can prematurely reduce O2, forming reactive oxygen species (ROS), resulting in , mtDNA damage leading to (Huang and Manton, 2004).

Apoptosis can be initiated via two pathways, the extrinsic pathway and the intrinsic pathway, the latter also referred to as the mitochondrial pathway. CytC, primarily known for its function in ATP synthesis, is utilised within the intrinsic pathway. When the cell receives an apoptotic signal, such as the upregulation by p53 of genes such as bax, bak and noxa, or the activation of the and permeability transition pore, evidence suggests that these members of the B-cell lymphoma protein-2 family are required for CytC release (Degenhardt et al., 2002; Ow et al., 2008; Srivastava and Pande, 2016). Although the mechanisms by which CytC is released from the intermembrane space of the mitochondria through the outer membrane are controversial and remain largely unclear, most evidence supports increased permeability of the outer membrane, while other models involve the restructuring of the cristae within the mitochondria (Ow et al., 2008; Wang and Youle, 2009). Once CytC is released into the cytosol it forms a complex with apoptotic protease activating factor 1 (Apaf-1), binding to the WD40 region and inducing a conformational change that displaces and opens the CARD domain. ATP bound to the nucleotide binding domain also undergoes hydrolysis, causing a further conformational change that results in a locked form, which coassembles with six other subunits to form a symmetric wheel that recruits procaspase-9, forming the active apoptosome. Docking activates caspase-9, which cleaves procaspase-3 to its active form, caspase-3, the ‘executioner caspase’, causing a caspase cascade and apoptosis (Wang and Youle, 2009). The use of CytC in apoptosis appears to be evolutionarily conserved. Purified CytC from Pseudomonas aeruginosa, termed CytC551, is utilised as an electron transport protein during the denitrification process; despite being of prokaryotic origin, CytC551 was able to enter J774 cell line-derived macrophages and induce apoptosis (Hiraoka et al., 2004).

1.1.2 Mitochondrial Origins

Before considering the events leading up to the origin of mitochondria, knowledge of the changes in atmospheric conditions is required. This begins 4.6 billion years ago with one of the biggest riddles in science, the origins of life. The first suggestions were that lightning in the early volatile atmosphere supplied the energy to synthesize organic molecules from the ‘primordial soup’, and that the origins of life were carbon and hydrocarbons synthesised within comets and meteorites as they burned in the atmosphere (Joyce, 2007). However, currently the most plausible explanation is that submarine volcanic vents referred to as ‘black smokers’, common during the Archean period, supplied the thermal energy and a constant supply of reactive inorganic molecules separated by a pH gradient similar to the proton gradient seen across the mitochondrial inner membrane. The movement across this membrane released energy and synthesised more complex molecules (Herschy et al., 2014). Both theories rely on biological building blocks such as nucleotides and amino acids being fundamentally the most energetically stable given the appropriate conditions. From this, life continued to evolve into both anaerobic prokaryotes and within a reducing environment in which oxygen was present at <1ppm and methanogenesis was a potential energy source. Then, approximately 2.4 billion years ago, the environment began to shift to an oxidizing atmosphere in cycles during the ‘great oxidation event’, powered by cyanobacteria’s production of oxygen. This change in the environment would lead to the eventual endosymbiotic event (Schirrmeister et al., 2013).

All endosymbiotic models for the origin of mitochondria are variations on two basic models. The first is the ‘archezoan scenario’, in which the host of the proto-mitochondrial endosymbiont was an archezoan, a hypothetical early eukaryote lacking a mitochondrion. The second is the ‘symbiogenesis scenario’, in which the uptake of an α-proteobacterium by an archaeon led to the modern-day mitochondria and the compartmentalization of eukaryotic cells known today (Koonin, 2010). The archezoan scenario closely resembles the classic hypothesis of mitochondrial beginnings by Lynn Margulis (Margulis, 1970), whereas the symbiogenesis scenario takes into consideration the atmospheric change to an oxidizing atmosphere and the hydrogen hypothesis put forward in 1998 (Martin and Müller, 1998).

Two findings ultimately led to the abandonment of the archezoan scenario. Firstly, rigorous phylogenetic reconstructions showed that the early branching of archezoan taxa in the eukaryotic tree is a methodological artefact. This was caused by the high rate of sequence divergence of the archezoan samples chosen for the analysis, which gave rise to long branch attraction and therefore incorrectly placed these archezoans at the base of the eukaryotic tree (Gray, 2012). Now, the eukaryotic tree is depicted more accurately as a bush in which no lineage diverges earlier than another, as seen below in Figure 2 (Koonin, 2010). Secondly, lineages believed to be without mitochondria (amitochondrial) were found on closer inspection to contain mitochondrial remnants, which were termed mitochondrial-related organelles (MRO) and are discussed further in section 1.1.3. Therefore, there are no known amitochondrial eukaryotic lineages, although this does not mean they do not exist, merely that they are yet to be discovered (Gray, 2012).

Figure 2. A Tree of Life. A tree based on whole genomes shows the chimeric origins of eukaryotes, in which no eukaryote diverges earlier than another. The archaeal cell (purple) acquires an α-proteobacterium (blue) via endosymbiosis, which formed the basis of the mitochondria. Also depicted is the acquisition of chloroplasts (green) in Plantae (Lane, 2015).

Currently the most prominent explanation is the symbiogenesis scenario combined with the hydrogen hypothesis, in which an autotrophic, strictly hydrogen-dependent archaeon, as seen in methanogenesis, came together with an α-proteobacterium. Hydrogen was a diminishing commodity in the changing oxidative environment, and the α-proteobacterium was able to respire and generate molecular hydrogen as a waste product of heterotrophic metabolism. It was the archaeon’s dependence on molecular hydrogen that produced a symbiotic relationship (Martin and Müller, 1998; Koonin, 2010). Therefore, the defining features of the eukaryotic cell appeared after this symbiogenesis (Gray, 2012).

Arguments against this theory are documented, such as the endocytosis event itself, which is a eukaryotic hallmark and essential for the incorporation of a bacterial symbiont. However, bacterial endosymbioses are documented, such as a γ-proteobacterium inside a β-proteobacterium, although archaeal endocytosis is still unclear (von Dohlen et al., 2001; Thao, Gullan and Baumann, 2002). It has been argued more recently that the endosymbiosis event was in line with an ‘inside-out’ theory. This states that an increasingly intimate association between the archaeon and the α-proteobacterium’s membranes drove the host to form protrusions and blebs to achieve greater surface area. These protrusions then engulfed the α-proteobacterium, and the multiple membranes gave rise to the nuclear membrane and the endoplasmic reticulum (Baum and Baum, 2014).

Another argument aimed at the symbiogenesis scenario is that in many eukaryotic lineages the bacterial-type genes within the genome appear to be from a wide array of prokaryotic lineages or fail to associate with any specific lineage. The three strongest signals are cyanobacteria, proteobacteria (in which α-proteobacteria are a definite signal) and Thermoplasmatales (Pisani, Cotton and McInerney, 2007). It would be expected that the overwhelming genetic signal would be from α-proteobacteria, although this is not the case. This potentially alludes to the ancestral genomes already being a mixture of different origins due to a process of horizontal gene transfer. However, this is based on assumptions owing to the unknown lineages and metabolic types of the symbionts (Gray, 2012).

The hydrogen hypothesis claims to explain the simultaneous origins of both aerobic and anaerobic respiration within eukaryotes, both of which were contributed by the α-proteobacterium, which was able to differentially express both methods of respiration depending on the environmental conditions, assuming it contained both pathways. It further tries to explain the loss of aerobic respiration genes in eukaryotic lineages in which mitochondria were converted to MROs that function in anaerobic respiration, which would subsequently have been inherited vertically and cluster as a monophyletic lineage. However, more detailed research into the proteins utilised in anaerobic pyruvate metabolism in eukaryotes conflicted with this prediction by showing that MROs and their enzymes display a high degree of independent and convergent evolution (Hug, Stechmann and Roger, 2010).

The earliest eukaryotic microfossils date back to 1.45 billion years ago and, owing to the assumed dependence of the symbiogenesis scenario on mitochondria being key in the development of eukaryotes, this can be seen as the minimum age of the mitochondria and as an estimated starting date of eukaryotic evolution, until the discovery of a microfossil pre-dating this. Geochemical views show this corresponds to the times in which the oceans were mostly anoxic and only ‘primitive’ lifeforms existed (Martin and Mentel, 2010). Then, approximately 540 million years ago (MYA), the ‘Cambrian explosion’ occurred, with the appearance of metazoans (Zhuravlev and Wood, 2018).

This single symbiogenesis event, occurring once in 4 billion years, enabled the vast diversity seen today. The energy-producing potential of eukaryotes allowed for more complex genomes, dependent on the power supplied by mitochondria, while prokaryotic genome size and complexity have remained constrained by bioenergetics (Lane and Martin, 2010).

1.1.3 Mitochondrial Genome

In most metazoans the mtDNA is firmly anchored to the inner membrane of the matrix and packaged into a nucleoid, a protein-DNA complex comprised mostly of mitochondrial transcription factor A (Gammage and Frezza, 2019). In the mammals initially sequenced, the mtDNA genome was found to be approximately 16-18kbp of circular DNA which encodes 2 rRNAs, 22 tRNAs and 13 respiratory proteins (Anderson et al., 1981; Anderson et al., 1982; Bibb et al., 1981). Later, in 1998, it was uncovered that there is great variation in size, physical form and coding capacity across the eukaryotes (Gray et al., 1998). This variation can be attributed to additional proteins that are encoded by the mtDNA, mostly extra respiratory proteins and ribosomal proteins. However, these genomes have also shown vast reductions in size, losing multiple proteins. Currently the smallest known mitochondrial genome encodes only three proteins, highly fragmented and rearranged mt-small subunit and mt-large subunit rRNA genes, and no tRNA genes; this genome belongs to Plasmodium falciparum at approximately 6kb (Feagin, 2000; Gray, 2012). In contrast, land plant mitochondrial genomes have expanded greatly in size, although not in coding capacity. Plant mitochondrial genomes can reach up to 17,000kb, are often non-circular and in specific species are divided into separate chromosomes (Hahn and Zuryn, 2019). Throughout this sequencing of mitochondrial genomes looked like typical bacterial genomes (Gray, 2012).

As mentioned, mtDNA is present in multiple copies, which allows variant alleles, whether inherited or arising sporadically, to normally be contained within only a small fraction of the mitochondrial gene pool within a cell. This allows detrimental mutations to be tolerated without consequences in small quantities. However, if the mutations exceed the biochemical threshold, the same mutations can cause serious disease or even be lethal to the individual. This is a complex phenotypic manifestation that depends on numerous factors such as the mutation, cell and tissue type (Hahn and Zuryn, 2019).

The mitochondrial genome of mammals differs from their nuclear genome in multiple ways, although this varies between eukaryotes: the genes that are encoded by the mtDNA lack introns and use an alternative genetic code. The mitochondrial code encodes an extra methionine (AUA) instead of isoleucine and two extra stop codons (AGA, AGG) instead of arginine, while a stop codon (UGA) in the nuclear genome is replaced with tryptophan (Barrell, Bankier and Drouin, 1979). Finally, in 2016 it was published that another isoleucine codon (AUU) is used in the mitochondrial code as a start codon (Haag et al., 2016). However, this research was done in Homo sapiens, and it is becoming clear that codon usage actually differs between species. For example, in the invertebrate mitochondrial genetic code AGA and AGG encode serine, AUA encodes methionine instead of isoleucine and UGA encodes tryptophan instead of a stop codon (Bender, Hajieva and Moosmann, 2008). There is a general assumption that the mitochondrial genome is also controlled by a single regulatory sequence, an A+T rich, non-coding region. This region contains two transcription start sites, one on each of the light and heavy DNA strands. This method of transcription is polycistronic, and the same region is also the origin of replication for the mitochondrial genome. However, replication has been known to abort soon after initiation, forming a displacement loop containing the template, daughter and a displaced lagging strand (Hahn and Zuryn, 2019). As well as standard mitochondrial genomes, other extreme forms exist, referred to as MROs.
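The codon reassignments described in the paragraph above can be illustrated with a short sketch. The Python below is not part of the original analysis. It builds the standard codon table from the conventional NCBI ordering and then applies only the reassignments named in the text. The names STANDARD, VERTEBRATE_MITO, INVERTEBRATE_MITO and translate are hypothetical, and labelling the serine-encoding variant as the invertebrate code is an assumption made for illustration.

```python
from itertools import product

# Standard genetic code in the conventional NCBI order (bases U, C, A, G;
# third position varies fastest). '*' marks a stop codon.
BASES = "UCAG"
STANDARD_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(codon): aa
            for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Only the reassignments named in the text are applied here.
# Assumption: the human code described is the vertebrate mitochondrial code.
VERTEBRATE_MITO = {**STANDARD, "AUA": "M", "AGA": "*", "AGG": "*", "UGA": "W"}
# Assumption: the serine-encoding variant is the invertebrate mitochondrial code.
INVERTEBRATE_MITO = {**STANDARD, "AUA": "M", "AGA": "S", "AGG": "S", "UGA": "W"}


def translate(seq, table):
    """Translate an RNA or DNA sequence codon by codon using the given table.

    Trailing bases that do not form a full codon are ignored; ambiguous
    bases are not handled and raise a KeyError.
    """
    rna = seq.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    return "".join(table[rna[i:i + 3]] for i in range(0, usable, 3))


toy = "AUGAUAUGAAGA"  # toy sequence: AUG AUA UGA AGA
print(translate(toy, STANDARD))
print(translate(toy, VERTEBRATE_MITO))
print(translate(toy, INVERTEBRATE_MITO))
```

For this toy sequence the three tables give MI*R, MMW* and MMWS respectively, which shows why a mitochondrially encoded sequence read with the wrong genetic code can appear truncated or carry substituted residues.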

Both hydrogenosomes and mitosomes are extreme versions of mitochondrial genome reduction referred to as MROs, and both lack mtDNA completely. They are distinguished from each other in that hydrogenosomes retain the ability to generate ATP, whereas mitosomes do not (Gray, 2012). Originally discovered in 1973, the hydrogenosome, as mentioned, lacks a genome, a complete , ETC and cytochromes. As its name alludes to, it produces molecular hydrogen as one of the end products of a substrate-level phosphorylation pathway utilising pyruvate and a characteristic set of expected proteins (Gray, 2012; Lindmark and Müller, 1973). The mitosome is unable to generate ATP and is normally discovered in anaerobic, parasitic . Mitosomes are capable of very little metabolic activity, although the Fe-S cluster formation pathway is present in both conventional mitochondria and MROs, rather than the expected OXPHOS. It may be this Fe-S cluster formation that is the reason for being of the mitochondria and its evolutionary products (Tovar et al., 2003).

There also appears to be a transitional form of MRO in specific anaerobes, blurring the boundaries between typical mitochondria and MROs. These novel MROs are able to utilise the ATP-producing hydrogenase pathway; however, unlike hydrogenosomes, they retain their genome, although they lack multiple mtDNA-encoded genes such as those of respiratory complexes III, IV and V. Owing to this they are unable to generate ATP via OXPHOS but are able to perform more complex metabolism than both hydrogenosomes and mitosomes (Gray, 2011).

The origins and evolution of the mitochondria remain elusive, although a minimally derived mitochondrial genome has been identified in Reclinomonas americana; this eukaryote is a termed . Its mitochondrial genome contains 27 proteins, the richest mitochondrial genome identified to date, on 69kb of mtDNA. This genome resembles a typical bacterial genome and has been referred to as ‘a eubacterial genome in miniature’ (Lang et al., 1997). This is because of its apparent bacterial characteristics, such as putative Shine-Dalgarno motifs, operon-like gene clusters and highly like rRNA (Gray, 2012).

Rickettsia prowazekii’s genome was shown to resemble a mitochondrial genome owing to its reduced size and dependence on a host cell, although this genome reduction is an independent evolutionary reduction. Potentially the mitochondria and Rickettsiales share a common ancestor. This is supported by consistent phylogenetic findings; however, it is uncertain whether they are sister groups or whether mitochondria branch from within the Rickettsiales and, more precisely, the Rickettsiaceae family. The robustness of such data has been questioned (Esser et al., 2004), as the inferred relationship may be a phylogenetic artefact due to the high A+T content of the genomes resulting in a long branch artefact. It must be stated, however, that bacteria in general tend to have higher A+T contents (Almpanis et al., 2018). Owing to this, more comprehensive phylogenomic approaches are recommended.
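As a rough illustration of the compositional bias discussed above, the A+T content of a sequence can be computed as the fraction of A and T (or U) among its unambiguous bases. This is a minimal sketch rather than a reproduction of any analysis in the cited studies; the function name at_content is hypothetical, and skipping ambiguous bases is an assumption made here for simplicity.

```python
def at_content(seq):
    """Return the fraction of A/T(U) among the unambiguous bases of seq.

    Ambiguous symbols such as N and gap characters are ignored.
    Raises ValueError if no unambiguous base is present.
    """
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "ATU" for b in bases) / len(bases)


print(at_content("ATGC"))      # half of the bases are A or T
print(at_content("AATTNNGC"))  # the two N symbols are not counted
```

Comparing this value across the genomes included in a tree is one simple way to check whether taxa that group together also share an unusually high A+T content, which would be consistent with a compositional artefact.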

Metagenomic data from the Global Oceanic Survey have shown that oceanic α-proteobacteria are abundant. One particular α-proteobacterium has shown potential: Candidatus Pelagibacter ubique, taken from the SAR11 clade, which comprises 30-40% of the total oceanic cell count, has the smallest genome of any free-living bacterium at 1.3Mb (Giovannoni et al., 2005). It has been reported that this clade and Ca. P. ubique share a common ancestor with mitochondria and together form a sister group to Rickettsiales (Thrash et al., 2011). However, this has been rebutted, with the suggestion that Ca. P. ubique is in fact more closely related to α-proteobacteria with large genomes and origins in soil and the oceans, and that a rare oceanic group of α-proteobacteria termed the Oceanic Mitochondria Affiliated Clade (OMAC) is in fact the closest living relative to mitochondria (Brindefalk et al., 2011; Viklund, Ettema and Andersson, 2012).

These differing opinions typify the challenges of phylogenetic analyses, which are affected by systematic errors, biased taxon sampling and the restricted gene content of the mitochondrial genome. This is worsened by the mosaicism of the mitochondrial genome, which is believed to be comprised of Rickettsiales, Rhizobiales and, in certain other sampled mitochondria, specifically that of R. americana, other α-proteobacterial genes. These findings suggest a scenario in which both Rickettsiales and Rhizobiales combined within an archaeal cell approximately 1.5 billion years ago, which resulted in gene loss and rearrangements, and furthermore underwent horizontal gene transfer and recombination events 500-600 MYA and again 30-90 MYA (Georgiades and Raoult, 2011). However, given the rarity of successful endosymbiosis events, the merger of two bacteria within an archaeon seems complex and unlikely, while horizontal gene transfer prior to endosymbiosis seems more plausible. It must be stated that the origin of the mitochondria remains elusive.

What is known is that the reduction in the mitochondrial genome results not only from the loss of genes made redundant by the presence of nuclear counterparts but also from the transfer of mitochondrial genes to the nuclear genome (Gray, 2012). However, the mitochondria retain certain genes, which allows for local control of processes such as oxidative phosphorylation, the synthesis of new proteins and apoptosis without the delay of transporting proteins encoded by nuclear DNA, as emphasised by the example of the size of neurons, their energy demands and the distance from the nucleus (Kann and Kovács, 2007). This is also seen in ‘giant bacteria’ such as Epulopiscium spp., which can contain thousands of times more DNA than a bacterium such as Escherichia coli owing to their need for local control (Mendell et al., 2008). This suggests that the evolution of the mitochondrial genome is driven by adaptation to the internal environment of the cell.

1.1.4 Mitochondrial Gene Transfer

The transfer of mitochondrial genes to the nucleus is referred to as endosymbiotic gene transfer (EGT) or NUMTogenesis (from NUMT, nuclear mitochondrial DNA); this process has been reported in 85 eukaryotic genomes, including that of H. sapiens. It is an ongoing evolutionary process through which the mitochondrial genome has been reduced to the point where 98% of the genes required for mitochondrial function are encoded on nuclear chromosomes (Rodley et al., 2012).

EGT, together with the minimal use of mitochondrial genetic material within and between species, has potentially evolved to minimise the probability of mtDNA acquiring nucleotide damage in the highly oxidative environment of the mitochondrion. It is believed that mtDNA also mutates faster than nuclear DNA, although exact figures vary. Work on Drosophila mutation accumulation lines suggests that the mtDNA mutation rate is 10-70 times greater than that of nuclear DNA (Haag-Liautard et al., 2008). This is thought to occur because of ROS formed by electrons leaking from the ETC and reacting with molecular oxygen. The close proximity of the mitochondrial genome to such sites, a consequence of local control, leaves the mtDNA susceptible to ROS damage. Furthermore, the mtDNA is not highly compacted or protected by histones. It is also clear, however, that copying errors during replication are a major contributor to the high mtDNA mutation rate (Hahn and Zuryn, 2019).

The mitochondrial-specific polymerase γ controls the replication of the mitochondrial genome. While nuclear genome replication is linked to the cell cycle, mtDNA replication is not and occurs much more frequently. This replication continues in post-mitotic cells, thus increasing the likelihood of introducing copying errors. Polymerase γ does have proofreading capabilities; however, many of the DNA repair mechanisms that help maintain the nuclear genome appear to be absent or, where present, less effective within the mitochondria (Yakes and Van Houten, 1997). Both ROS damage and copying errors likely play a role in the accumulation of mtDNA damage, and the chance of this damage affecting the function of a vital protein or RNA is high, albeit at low heteroplasmy. In H. sapiens, such pathogenic mutations are estimated to have a prevalence of 1:200 in the western population (Elliott et al., 2008; Hahn and Zuryn, 2019).

This transfer of mitochondrial DNA has been well established and is an ongoing evolutionary process. In H. sapiens these nuclear insertion events have been estimated to occur at a rate of approximately 5 × 10⁻⁶ per germ cell per generation (Leister, 2005). Insertion in the germline is key if the EGT is to be passed on to the offspring and not lost in the heteroplasmy of somatic cells. Over time, many of these integrated sequences have been highly modified by inversions, deletions, duplications and displaced sequences; conversely, some are conserved and closely resemble their parent mitochondrial gene. There appears to be no pattern in the selection of genes for EGT, which seem to be drawn at random from different regions of the mitochondrial genome, although an under-representation of the displacement loop region is reported; the reason for this remains elusive (Dayama et al., 2014).

Such EGT events result in synteny between species after divergence from a common ancestor. Synteny is defined by the presence of multiple genes on the same chromosome of any species, and does not require information about their relative order or the distance between the genes on the chromosome. The term is utilised in comparative genomics to refer to the presence of the same syntenic genes in species that have diverged from a common ancestor (Stein, 2013).
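The idea of syntenic genes described above can be made concrete with a short sketch. The gene names and chromosome assignments below are hypothetical placeholders, not data from this study; the functions simply report which genes share a chromosome with a focal gene in both of two species, ignoring gene order and distance.

```python
def syntenic_partners(gene_to_chrom, focal_gene):
    """Return the set of genes located on the same chromosome as focal_gene."""
    chrom = gene_to_chrom[focal_gene]
    return {g for g, c in gene_to_chrom.items() if c == chrom and g != focal_gene}


def conserved_synteny(species_a, species_b, focal_gene):
    """Genes that share a chromosome with focal_gene in both species.

    Gene order and inter-gene distance are deliberately ignored.
    """
    return syntenic_partners(species_a, focal_gene) & syntenic_partners(species_b, focal_gene)


# Hypothetical gene-to-chromosome maps (illustrative only).
species_a = {"CYCS": "chr7", "geneA": "chr7", "geneB": "chr7", "geneC": "chr2"}
species_b = {"CYCS": "chr6", "geneA": "chr6", "geneB": "chr11", "geneC": "chr6"}

shared = conserved_synteny(species_a, species_b, "CYCS")
print(sorted(shared))
```

Using sets keeps the comparison independent of position, which mirrors the definition: only co-location on a chromosome matters.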

Studies have tried to elucidate the genomic features required for an insertion, using the vast array of mitochondrial sequences found within the H. sapiens genome. Some reportedly identified repetitive elements such as LINEs and Alu as areas of co-localisation of EGT insertions (Tsuji et al., 2012), while another study found this not to be the case (Gherman et al., 2007). There is also evidence that these EGT insertions are found in open chromatin regions and near A+T oligomer sequences (Tsuji et al., 2012). However, these studies generally focus on H. sapiens lineages and not the wider eukaryotic family. Despite this, a set of five general rules has been proposed for where EGT sequences insert: near retrotransposons, with no preference in orientation, but tending not to insert inside retrotransposons; within regions where DNA has high curvature; within regions rich in A+T oligomers, especially TAT; and in open chromatin regions (Dayama et al., 2014; Tsuji et al., 2012).

The majority of organelle genes, including those of the mitochondria, are thought to have been transferred early and perhaps rapidly in migrations, although the transfer of mitochondrial genes is now thought to be highly discontinuous (Keeling and Palmer, 2008). It appears that the mitochondrial genome has been maintained by evolution to encode highly hydrophobic membrane proteins (Björkholm et al., 2015). For a mitochondrial gene to remain functional in the nuclear genome, it requires appropriate promoter and termination sequences to allow expression, and pre-sequences that allow the protein product to enter its destined subcellular compartment (Martin and Herrmann, 1998).

Before mitochondrial genes can be relocated, they must escape the mitochondria. Multiple possibilities have been proposed, such as , division and cell stress, all of which result in disruption of the organelle membrane. This then allows for the uptake of mitochondrial DNA via the nuclear import machinery (Kleine, Maier and Leister, 2009). This has been seen in Saccharomyces cerevisiae with inactive Yme1p, an ATP-dependent metalloprotease located in the mitochondria. The inactivation of Yme1p causes degradation of the affected mitochondria within a vacuole and increases the rate at which mtDNA escapes to the nucleus. Similarly, the inactivation of an inner membrane protein, Yme2p, leads to increased relocation of mtDNA to the nucleus (Hanekamp and Thorsness, 1996; Thorsness, White and Fox, 1993). Given how frequently both mitochondrial and plastid DNA are found within the nuclear DNA of plants, it can be inferred that this release of DNA is common. In H. sapiens it is thought that this process occurs with the programmed degeneration of organelles during male gametogenesis (Kleine, Maier and Leister, 2009).

Microhomologies of 2 to 5 bp were found adjacent to the integration sites in both S. cerevisiae and Nicotiana tabacum, which suggests that organelle DNA integrates into the nuclear DNA via non-homologous end-joining repair of double-stranded breaks (DSBs). This repair mechanism requires only a small sequence homology of 0 to 4 bp, called micro-identities, between the ends, which allows the non-complementary ends of the DSB and the organelle DNA to be bound together (van Gent, Hoeijmakers and Kanaar, 2001). This process and its micro-identities have been found across eukaryotes and are therefore a common phenomenon.

EGT has resulted in a reduction of the mitochondrial genome; however, not all gene loss is due to the relocation of functional genes. Some mitochondrial genes have been lost owing to the presence of functional eukaryotic versions, as seen in S. cerevisiae, where the nucleus encodes over 400 mitochondrial proteins, of which only half are bacterial in origin while the rest appear to be of eukaryotic descent (Berg and Kurland, 2000). EGT has also been observed in plastids, and it is believed to promote the stability of long-term endosymbiosis by making the symbiont dependent on the host. This transfer of genetic material has shaped the genomes of modern eukaryotes; however, the rate of EGT and the number of times it has occurred for a given gene remain poorly understood (Petitjean and Williams, 2017).

1.2 Cytochrome C

CytC was once described as a model protein for molecular evolution because of its highly conserved nature and its presence within eukaryotes, bacteria and archaea (Margoliash, 1963). Somewhere along the eukaryotic evolutionary line this protein has undergone one or multiple EGT events, and it is now found almost entirely on the nuclear genome, although it is still present on a few mitochondrial genomes.

1.2.1 Cytochrome C Structure

Proteins are composed of structural units called domains, which are distinct functional units. With regard to evolution, these are utilised as building blocks that may be rearranged by recombination to alter function. Conserved domains are recurring units in molecular evolution that can be determined via sequence and structural analysis, and the comparative analysis of these structures can identify conservation. This is seen in CytC, one of the most studied proteins, possibly owing to its high thermodynamic stability and the ease of purification afforded by its red colour. It has a small size of approximately 12,000 Da, comprising around 105 amino acids depending on the organism, is highly soluble and has a high helical content of five helices (Bertini, Cavallaro and Rosato, 2006; Hannibal et al., 2016). This cationic haemoprotein is characterised by the attachment of a haem to the polypeptide chain via two covalent thioether bonds between the thiol groups of two cysteine residues and the vinyl groups of the haem. This results in a CXXCH sequence motif, in which the two cysteine residues almost always occur and the histidine provides an axial ligand to the haem iron. Rarely, there can be three or four X residues, which can be any residue except cysteine (Allen et al., 2003). In mitochondrial CytC the N- and C-terminal α-helices are referred to as α1 and α5 respectively; the long helix is referred to as α3 and precedes another short helix and a loop which contains the second axial ligand, methionine in the majority of cases (Bertini, Cavallaro and Rosato, 2006).
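The haem-binding motif described above lends itself to a simple pattern search. The following is a minimal sketch rather than a validated annotation tool: the regular expression encodes C, then two to four residues that are not cysteine, then C followed by the axial-ligand histidine, and the test sequence is a made-up string used only to exercise the pattern.

```python
import re

# C, then 2-4 non-cysteine residues, then C and the axial-ligand histidine.
# The 2-4 range covers CXXCH plus the rarer three- and four-residue spacers.
HAEM_MOTIF = re.compile(r"C[^C]{2,4}CH")


def find_haem_motifs(protein_seq):
    """Return (1-based start position, matched motif) for each hit."""
    return [(m.start() + 1, m.group())
            for m in HAEM_MOTIF.finditer(protein_seq.upper())]


# Made-up sequence (not a real CytC) containing a CXXCH and a CXXXCH site.
example = "MKTAGCAQCHLLPECDEFCHGG"
print(find_haem_motifs(example))
```

Note that `finditer` reports non-overlapping matches only, which is sufficient for a sketch but could miss overlapping candidate sites in multi-haem cytochromes.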

C-type cytochromes can be classified into four classes, generally based on the number of haems bound. C-type cytochromes with several haem groups make up classes III and IV, whereas CytC binds a single haem and is placed in class I, which comprises the low-spin soluble CytC of mitochondria and prokaryotes. In this class the haem group is attached near the N-terminus, while the second axial ligand is a methionine residue located approximately 40 residues further towards the C-terminus. This class is further divided into five subclasses, IA to IE, with mitochondrial CytC placed in IB. Finally, class II comprises high-spin cytochromes and some low-spin cytochromes. Within this class the haem attachment site is closer to the C-terminus and the second axial ligand is closer to the N-terminus (Ambler, 1991; Bertini, Cavallaro and Rosato, 2006).

1.2.2 Eukaryotic Function

As an essential component of the ETC, CytC is translated as apocytochrome c prior to translocation into the intermembrane space. Once inside, a haem group is attached by holocytochrome-c synthase (EC 4.4.1.17) to form holocytochrome c. CytC is multifunctional, with life and death functions far wider than the ETC and apoptosis (Hüttemann et al., 2011; Wang and Youle, 2009).

The moiety from the haem group and the high isoelectric point of 9.6 are essential for aerobic respiration and enable CytC to act as the intermediate that shuttles electrons from complex III to complex IV (Wang and Youle, 2009).

As mentioned in section 1.1.1, the mitochondria and CytC are essential for OXPHOS, in which multiple substrates, including NADH and FADH2, feed electrons into the ETC. This transfer of electrons allows a mitochondrial membrane potential to be formed by proton pumps. The potential is then used to drive complex V, a rotary nanomotor that converts the potential into the chemical and rotational energy required to combine inorganic phosphate and ADP into ATP (Hüttemann et al., 2011). CytC accepts an electron from complex III, and is thereby reduced, before transferring it to complex IV. In mammals this step is often thought of as rate limiting, as the eventual transfer of the electron to oxygen to form water via complex IV releases ΔG°′ = −100 kJ/mol of free energy, twice as much as at either complex I or complex III. Because of this, the irreversible reaction must be tightly regulated to reduce ROS, via phosphorylation at residues such as 28 (threonine), 47 (serine) and 48 (tyrosine), which lowers the interaction between CytC and complex IV. This theory is also supported by the evolutionary changes in complex IV, in which 300 amino acids out of 1500 have been replaced in anthropoid primates; charged residues in the CytC binding site have been replaced with uncharged hydrophobic residues, decreasing the affinity to meet species-specific energy demands (Kalpage et al., 2020; Pierron et al., 2012).

CytC was first shown to play a crucial role in apoptosis in 1996, when molecular changes to CytC were identified as the key step (Krippner et al., 1996; Liu et al., 1996). Since then it has become apparent that the key event is the release of CytC into the cytosol, the subsequent actions of which were discussed in section 1.1.1. There are exceptions, such as Drosophila, in which CytC stays associated with the mitochondria (Dorstyn et al., 2002). It has also been suggested that released CytC initiates apoptosis by binding to inositol triphosphate receptors, resulting in calcium release into the cytosol, calpain activation and the release of apoptosis-inducing factor (Cao et al., 2007).

However, it has been shown that isolated mitochondria can both release and take up CytC, with re-uptake restoring their function. This selective release of CytC does not necessarily involve rupture of the membrane or opening of the mitochondrial permeability transition pore. Consequently, the release of CytC during apoptosis is not an immediate kill switch, and a CytC threshold must be reached before apoptotic events become irreversible (Chalmers and Nicholls, 2003).

CytC-null cell lines show lower caspase-3 activation on apoptotic stimulation, indicating the essential role of CytC in the cascade. Studies have shown that injections of Bos taurus CytC restored the ETC in a Mus musculus model for sepsis (Piel, Deutschman and Levy, 2008; Piel et al., 2007). These results show that CytC can translocate across the cell membrane via cell-penetrating epitopes within the N- and C-terminal helices, yet the injected CytC did not induce apoptosis in the cytosol. It was speculated that the differences between the M. musculus and B. taurus CytC sequences resulted in ineffective binding to Apaf-1. However, because apoptosis can be induced between prokaryotes and eukaryotes (Hiraoka et al., 2004), this could instead be explained by phosphorylation by serum kinases prior to entry into the cell, similar to the control mechanism seen in OXPHOS (Hüttemann et al., 2011; Kalpage et al., 2020).

CytC is involved in the redox-coupled import of mitochondrial proteins that contain CX3C and CX9C motifs. This is done in connection with the proteins Erv1 and Mia40. Import occurs via the translocases of the outer membrane and requires post-import modification, such as the folding induced by the creation of disulphide bridges. The imported translocases of the inner membrane are subject to oxidation by Mia40, and the reactivation of Mia40 depends on its oxidation by Erv1, which in turn utilises CytC for its own re-oxidation. This cascade of redox reactions provides a route for electron transfer from Mia40 to CytC and ultimately into the ETC (Hüttemann et al., 2011).

ROS are neutralised in cells by multiple means, normally via enzymes. However, these are limited by their kinetic reaction rates, and some ROS, such as hydroxyl radicals, are too reactive to be neutralised effectively via an enzymatic pathway owing to a half-life of approximately 1 ns. It has been shown that free CytC within the intermembrane space acts as a radical scavenger by removing the unpaired electron from superoxide to form O2 (Pereverzev et al., 2003). This extracted electron can then be utilised in the ETC via transfer to complex IV, thereby restoring the oxidised form of CytC (Hüttemann et al., 2011). However, CytC can also be associated with increased ROS formation under specific conditions involving p66shc, a splice variant of the growth factor adapter Shc that is found throughout the cell but is also localised in the mitochondria, where it can affect function. p66shc is regulated via reversible phosphorylation of residue 36 (serine). The phosphorylation of this site increases with age in multiple organs, which leads to increased ROS production and the accumulation of oxidative damage seen in aged mice (Lebiedzinska et al., 2009). Under stress conditions this is exacerbated, as mitochondrial p66shc is oxidised by CytC and subsequently produces hydrogen peroxide. Therefore, it is possible that phosphorylation of CytC may be utilised in the cell to combat ROS formation (Hüttemann et al., 2011).

There is also a unique isoform, first identified in the testicular cells of rodents, which shares 86% homology with its somatic variant; however, its expression and functions are distinct. As spermatogenesis proceeds, expression of the testicular isoform of CytC increases until it becomes the predominant form, which has three-fold higher activity in hydrogen peroxide reduction and significantly increased apoptotic activity. It is proposed that this tightly regulates the integrity of sperm to ensure efficient DNA transmission (Liu et al., 2006). However, this isoform has been lost during the evolution of primates and is a pseudogene in H. sapiens. Therefore, H. sapiens somatic CytC must function as the rodent testicular isoform does, which may explain the evolution of CytC and complex IV binding affinity to prevent ROS (Hüttemann et al., 2011). For this research, only somatic forms of CytC were considered owing to their more conserved function.

1.2.3 Prokaryotic Function

While the eukaryotic CytC present in the mitochondria receives the vast majority of attention, prokaryotic CytC is much less researched despite being much more diverse in structure and function. Some of these proteins contain multiple haem groups and can be classified as c-type cytochromes; their functions are diverse and they can be involved in the extraordinary ETCs present in multiple prokaryotic species, although many species have the ‘classical’ aerobic respiratory chain from which that of the mitochondria was derived. These cytochromes can be utilised in or the reduction of varying terminal electron acceptors such as nitrates, dimethyl sulfoxides and metals. Because of this, bacteria utilise various methods of energy metabolism and include phototrophs, methylotrophs, denitrifiers and sulphate reducers (Richard-Fogal et al., 2007). It is these oxidation and reduction reactions utilised by bacteria that have had profound impacts on the environment (Kranz et al., 2009).

Apoptosis is believed to be a major eukaryotic invention that appears not to be directly related to prokaryotic predecessors, although bacterial cells can commit suicide in certain circumstances, such as fruiting body formation in Myxobacteria. It is apparent that these mechanisms are not essential for the survival of prokaryotes, and their molecular cascade appears to differ from the eukaryotic apoptotic machinery and does not utilise CytC (Lewis, 2000). However, as the mitochondria are often defined as the principal sensor of cellular damage and apoptosis, it is reasonable to conclude that eukaryotic apoptosis is a distant evolutionary product of bacterial proteins. In this scenario, scavenged bacterial genes such as would become the core of the eukaryotic caspases, and such core components would undergo further lineage-specific specialisation, such as the expansion of caspases in . Such ‘early’ proteins and domains were exapted as they were built upon to achieve different functions, thus allowing CytC, a major of mitochondrial damage, to be incorporated into the apoptosis cascade (Koonin and Aravind, 2002).

1.2.4 Cytochrome C Similarity

CytC is regarded as a highly conserved protein that spans a breadth of species from the domains eukaryotes, archaea and bacteria, indicating little change over millions of years of evolution (Keya and Priya, 2016). However, prokaryotic CytC is often much larger than its eukaryotic counterpart and is often characterised as a c-type cytochrome, and as research develops it is becoming apparent that there are vast variations in prokaryotic CytC structures. These can contain multiple haem groups that function within the diverse ETCs of prokaryotes, although it is mitochondrial CytC upon which the spotlight falls (Kranz et al., 2009).

Despite this variation, there is plenty of evidence to suggest that the functional regions of these proteins remain conserved across domains and millions of years of evolution. One example has already been covered, in which B. taurus CytC was able to restore the ETC in a M. musculus model for sepsis (Piel, Deutschman and Levy, 2008; Piel et al., 2007). The next involves the CytC of Pseudomonas aeruginosa, termed cytochrome C551, which is utilised in the ETC during denitrification. This protein is nevertheless able to enter J774 cell-line-derived macrophages and induce apoptosis (Hiraoka et al., 2004). Such compatibility is apparent not only in apoptosis but also in the ETC, in the interactions of CytC and complex IV. It has been shown that CytC from 5 prokaryotes and 7 eukaryotes is able to bind and react with the complex IV enzymes of other species. For example, the enzyme of Thiobacillus novellus was able to react readily with the CytC of krusei, sarda and Thunnus thynnus, while the bovine enzyme was able to react with all eukaryotic CytCs. In contrast to its ability to induce apoptosis in macrophages, P. aeruginosa CytC showed limited reaction with complex IV enzymes, potentially indicating that the denitrification process requires an altered structure that is less compatible with aerobic respiration. On the whole, however, these results indicate that despite this variation in function and size, functional regions must be conserved across domains to permit activity (Yamanaka and Fukumori, 1981).

1.2.5 Cytochrome C’s Molecular Clock

Despite the conserved nature of CytC, creationists have pointed to potential pitfalls in using this conservation as evidence of evolution, because comparisons of amino acid sequences show that H. sapiens CytC is more similar to that of Alligator mississippiensis than to that of another primate, Otolemur garnettii: H. sapiens shares 87.62% similarity with A. mississippiensis compared with 86.67% with O. garnettii. However, this variation is caused by the molecular clock of the CytC gene varying over time (Hofmann, 2017).
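A pairwise percentage of this kind is the number of matching alignment columns divided by the alignment length. The sketch below assumes that "similarity" here means percent identity over an ungapped alignment of 105 residues (the approximate CytC length given in section 1.2.1); under that assumption the quoted figures would correspond to 92 and 91 identical positions. These counts are inferred from the percentages, not taken from the underlying alignment, and the toy sequences are not real CytC data.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions with identical residues.

    Assumes the two sequences are already aligned and of equal length;
    columns where the residue is a gap ('-') count as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b and a != "-" for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Toy aligned sequences (illustrative only): 9 of 10 columns match.
print(round(percent_identity("GDVEKGKKIF", "GDVEKGKKIW"), 2))

# Inferred counts for a 105-column alignment (an assumption, see above).
print(round(100 * 92 / 105, 2), round(100 * 91 / 105, 2))
```

On this reading the two comparisons differ by a single residue, which illustrates how small the difference behind the two percentages may be.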

The molecular clock is a measure of non-synonymous substitutions between sequences of the same gene and can be used to estimate the time since their divergence from a common ancestor, on the assumption that the elapsed time is proportional to the number of differences. However, it is apparent that this rate fluctuates, and CytC has undergone large peaks of genetic replacement at approximately 25-40 and 40-90 MYA (Hofmann, 2017). Goodman (1985) compared nucleotide substitution data rather than amino acid substitution data to ensure the capture of synonymous substitutions, and expressed the results in units of nucleotide replacements per 100 codons per 100 million years. The substitution rate was found to have peaked between 40 and 90 MYA, when it reached an average of 17.3 nucleotide replacements per 100 codons per 100 million years. Because CytC is approximately 100 amino acids in length, and so requires approximately 100 codons, the rate of 17.3 replacements per 100 codons corresponds to a 17.3% rate of change. This period corresponds to the origin of placental mammals through to the divergence of New World monkeys. Between 40 and 25 MYA the rate dropped to 12.6%, and it plummeted to 1.9% in the time after 25 MYA, when hominoids had diverged from Old World monkeys. This research identified that the highest substitution rate occurred during the periods crucial for early primate evolution before falling; this sudden drop was referred to by Goodman as the ‘hominoid slowdown’ (Goodman, 1985; Hofmann, 2017).
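To illustrate what these rates imply, the sketch below converts each quoted rate into an expected number of replacements over its interval. It takes the units to be nucleotide replacements per 100 codons per 100 million years, as the units are first stated above, and assumes the rate is constant within each interval; the interval boundaries are read from the text and the function name is illustrative.

```python
# Rates as quoted in the text, in nucleotide replacements per 100 codons
# per 100 million years; interval bounds are in MYA.
PERIODS = [
    ("90-40 MYA", 90, 40, 17.3),
    ("40-25 MYA", 40, 25, 12.6),
    ("25-0 MYA", 25, 0, 1.9),
]


def expected_replacements(rate_per_100_myr, start_mya, end_mya, codons=100):
    """Expected replacements in `codons` codons over one interval,
    assuming the rate is constant within that interval."""
    duration_myr = start_mya - end_mya
    return rate_per_100_myr * (duration_myr / 100.0) * (codons / 100.0)


for label, start, end, rate in PERIODS:
    print(f"{label}: {expected_replacements(rate, start, end):.2f} per 100 codons")
```

Because the intervals differ in length, the expected counts fall even more steeply than the rates themselves, which is one way of picturing the slowdown.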

These variable nucleotide substitution rates of CytC correlate with similar variation in the nucleotide substitution rates of other components of the ETC, specifically complex IV, with which CytC interacts directly during OXPHOS. Research in 2004 showed that increased substitution rates of COX4-1, a subunit of complex IV, correlated with both periods of heightened substitution rate. This shows that multiple proteins in the ETC had an increased substitution rate following the divergence of anthropoid monkeys from , thus explaining the variation in H. sapiens and O. garnettii similarity (Grossman et al., 2004).

1.3 Phylogenetics

CytC has been described as both a good and a bad example for phylogenetic analysis. It is considered a good example because its small size and highly conserved structure and function across domains make it useful for the study of cladistics (Keya and Priya, 2016). Conversely, it has previously been shown to produce unexpected similarities owing to variations in the molecular clock of CytC (Hofmann, 2017).

1.3.1 Cytochrome C in Phylogenetics

Molecular phylogenetics uses DNA and protein sequences from various species or populations to infer their evolutionary relationships, which are represented as a phylogeny or tree; this has transformed the landscape of . The field originated in the 1960s, when protein sequences became available, and continued to grow as genomic data became readily available. Phylogeny is determined using a ubiquitous protein or gene, which ensures that comparisons are independent of the overall phenotype (Keya and Priya, 2016). Phylogenetics is no longer restricted to Darwinian and Mendelian models of vertical gene transfer but can also consider the significance of lateral gene transfer (Sleator, 2011).

Research has taken advantage of CytC being evolutionarily conserved across a wide range of species, and it has therefore been utilised to establish homology and phylogenetic relationships between species. This research has focused on multiple alignments of the amino acid sequence rather than the nucleotide sequence (Keya and Priya, 2016), which allows for the differences in genetic code between mitochondrial and nuclear sequences. However, this method in turn misses synonymous mutations. Despite this, amino acid sequences are repeatedly used to establish phylogenetic relationships between species.

CytC is ubiquitous, which ensures that comparisons are independent of the species phenotype. Research published in 2016 compared the structure and function of CytC with the phylogeny inferred from sequence alignments. It appeared that the protein was functionally redundant, meaning that dissimilar CytC sequences form the same general structure and perform the same biological roles. This does not extend to performance, however, as some CytC sequences have been shown to be more efficient electron transporters than others. From this information it is suggested that two different organisms that share similar protein sequences are genealogically related (Keya and Priya, 2016).

From the construction of phylogenetic trees of CytC it is shown that H. sapiens and CytC sequences are much more similar to each other than to that of Candida albicans. This is due to inheritance, with the homologous similarities of CytC across a wide range of species indicating an early common ancestor. This is coupled with the conserved function of CytC and the fact that the phylogenetic trees constructed generally match other phylogenies, such as those built from 18S rRNA, one of the genes most frequently used to create phylogenetic trees (Meyer et al., 2010). This indicates that CytC can be utilised as molecular evidence of evolution, and it has been used in this way since 1967, when Fitch and Margoliash constructed phylogenetic trees from CytC (Fitch and Margoliash, 1967). The results from such phylogenies match the proposed theory of by which states that a group of organisms have a common descent if they have a common ancestor.

It is apparent that CytC is rarely used for the phylogenetic analysis of bacterial species. This can be attributed firstly to the use of 16S rRNA as a standard, coupled with smaller and the use of to identify strains within species (Tyler et al., 2018), and secondly to the fact that CytC sequences within bacteria depend on their roles within the ETC and the terminal electron acceptor (Richard-Fogal et al., 2007).

1.3.2 How Phylogenetic Trees Work

Generally, a phylogenetic tree is a diagrammatic representation of biological species that are connected through common descent. Trees consist of branches and nodes. A branch represents the persistence of a species through time, and in some phylogenetic trees the branch length represents the degree of divergence of a sequence from the ancestral sequence. Longer branches can therefore indicate more substitutions, but this is tree dependent; the information can be given as the number of substitutions per unit length of sequence or as a percentage representing the change per 100 units. Other lines, such as vertical lines, show connections between main branches and do not imply evolutionary distance. Nodes can be split into internal nodes, which represent a theoretical most recent common ancestor and are considered older than nodes residing closer to the tips, and external nodes, which represent the species from which the chosen sequences were taken (Godini and Fallahi, 2019).
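The relationship between branches, internal nodes and external nodes can be sketched with a small data structure. The tree below is entirely hypothetical (names and branch lengths are placeholders, with lengths read as substitutions per site); the function sums branch lengths from the root to each external node, which is the quantity a longer horizontal branch depicts in trees of this kind.

```python
# A rooted tree as nested tuples: (name, branch_length, children).
# Names and branch lengths are hypothetical, in substitutions per site.
TREE = ("root", 0.0, [
    ("ancestor_1", 0.05, [
        ("species_A", 0.02, []),
        ("species_B", 0.03, []),
    ]),
    ("species_C", 0.12, []),
])


def root_to_tip(node, distance=0.0):
    """Map each external node (leaf) to its summed branch length from the root."""
    name, length, children = node
    total = distance + length
    if not children:          # external node: a sampled sequence
        return {name: total}
    result = {}
    for child in children:    # internal node: a hypothetical common ancestor
        result.update(root_to_tip(child, total))
    return result


for leaf, dist in root_to_tip(TREE).items():
    print(f"{leaf}: {dist:.2f}")
```

In practice trees are usually exchanged in a text format and read with a phylogenetics library; the nested tuples here are only a stand-in to keep the example self-contained.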

There are two types of phylogenetic tree, rooted and unrooted. Generally, to construct a rooted tree an outgroup of one or more closely related but distinct organisms is included in the analysis (Huelsenbeck, Bollback and Levine, 2002). Trees can also be rooted without an outgroup using evolutionary parsimony, a mathematical model based on the balanced transversion assumption that transversions from purines to pyrimidines occur approximately as often as those from pyrimidines to purines (Sinsheimer, Little and Lake, 2012). The main purpose of rooted trees is to infer evolutionary relationships and find the most recent common ancestor, whereas unrooted trees cannot infer the direction of evolution, only the relationships between samples. Both types of tree can also include a molecular clock, which shows the rate of evolution in DNA and amino acid sequences over time (Godini and Fallahi, 2019). For a full review of the construction of phylogenetic trees refer to Godini and Fallahi (2019).

As mentioned above, not all phylogenetic trees can infer the true evolutionary history of the organisms chosen; even with an appropriate rooted tree, anomalies arise. These anomalies are often interpreted as cryptic species; however, they may be due to varying levels of intraspecific variation, which can be observed within a single gene in as few as two individuals. Because of this, the standard is to use multiple genetic loci to infer phylogeny. The use of a single gene, such as the 18S or 16S rRNA gene, is not prohibited, however, as this standardises the locus and reduces cost, time and complexity (Tobe, Kitchener and Linacre, 2010).

1.3.3 Maximum Likelihood Phylogenetic Trees

Phylogenetic trees can be built with several methods, each with its own drawbacks; three common tree-building methods are distance, maximum parsimony and maximum likelihood (Hall, 2013). This research focuses on maximum likelihood phylogenetic trees because they show the greatest agreement with typical taxonomic groupings. These trees also show the fewest inconsistencies when compared with traditional morphological classifications, other molecular phylogeny studies and expected relationships between species (Tobe, Kitchener and Linacre, 2010).

Maximum likelihood trees evaluate the probability of hypothetical trees in predicting the relatedness of the chosen sequences, and the final tree constructed is the one with the highest probability of predicting the variation within the sequences. Evolutionary models, such as the different forms of mutation, are utilised to calculate the probability (Vandamme, 2009). This informs us about the observable and ancestral evolutionary events that have occurred in the sequences across the tree. Substitution (replacement) models for both nucleotides and amino acids are key to the formation of a maximum likelihood tree (Le and Gascuel, 2008). These are often coupled with the statistical analysis of bootstrapping to detect which groupings are more frequent.

The advantages of maximum likelihood trees are that evolutionary models are utilised for the analysis and that they can be used to study sequence evolution and evolutionary hypotheses, such as molecular clocks via the likelihood ratio test, and protein evolution. Their drawbacks are that the calculations are computationally demanding and time consuming, and that the result can be statistically weak if incorrect choices of species are made (Godini and Fallahi, 2019).
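The idea of scoring a hypothesis by its likelihood can be illustrated in the simplest possible case, a single branch joining two sequences. The sketch below is not the JTT amino acid model used in this research; it assumes the much simpler Jukes-Cantor (JC69) nucleotide model, and the site counts are hypothetical. It evaluates the log-likelihood over a grid of candidate branch lengths and compares the best candidate with the known closed-form estimate.

```python
import math

def jc69_log_likelihood(n_sites, n_diffs, d):
    """Log-likelihood of branch length d (substitutions per site) for two
    aligned nucleotide sequences under the Jukes-Cantor (JC69) model."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site carries the same base
    p_diff = 0.25 - 0.25 * decay   # probability of one specific different base
    return (n_sites * math.log(0.25)               # equal base frequencies
            + (n_sites - n_diffs) * math.log(p_same)
            + n_diffs * math.log(p_diff))

def jc69_ml_distance(n_sites, n_diffs):
    """Closed-form maximum likelihood estimate of d under JC69."""
    p = n_diffs / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical data: 100 aligned sites, 10 of which differ.
candidates = [i / 1000 for i in range(1, 1001)]
best = max(candidates, key=lambda d: jc69_log_likelihood(100, 10, d))
print(best, jc69_ml_distance(100, 10))
```

A full maximum likelihood tree search repeats this kind of evaluation over every branch of many candidate topologies, which is why the method is computationally demanding.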

1.4 Projects Aims, Objectives and Hypotheses

Currently there is no published research identifying the EGT of CytC. This research aims to study the evolutionary history of CytC using phylogenetic trees and analysis of its structure, similarity and function in both eukaryotes and prokaryotes. This will help elucidate the history of CytC, which was originally of α-proteobacterial origin and relocated via EGT from the mitochondria to the genomic DNA, by showing syntenic regions around the CytC gene between different organisms and estimating the time at which such an event occurred.

This research has multiple objectives, as follows.

• The collection of various CytC amino acid sequences from the National Center for Biotechnology Information (NCBI) GenBank, comprising a wide range of taxa representing the eukaryotic and prokaryotic domains and including both chromosomal and mitochondrial genomes. This information will also comprise other genetic information, such as the genes flanking the CytC sequences and their chromosomal location.


• Use of BioEdit software to create a variability graph based on a measure of entropy, as entropy increases with variation.

• The structure of cytochrome c will be mapped in 3D, showing the location of higher and lower variation, using ConSurf.

• Using Molecular Evolutionary Genetics Analysis (MEGA) software, the amino acid sequences will be aligned to analyse the differences between the sequences. The main intention of this alignment is the creation of the maximum likelihood phylogenetic tree, with bootstrapping used for the statistical analysis to understand the percentage confidence in each clade.

• MEGA’s translate function will be utilised to identify whether differing genetic codes between nuclear and mitochondrial encoded CytC can explain differences in the encoded location of the CytC gene.

• The Genome Evolutionary Analysis tool, CoGe, will be used for syntenic analysis to identify regions of conserved DNA around the CytC gene and so infer a shared EGT event between two organisms. This will be coupled with TimeTree to estimate the time of divergence, therefore indicating the time of the EGT event.

The research hypothesis states that the EGT of CytC from the mitochondrial genome to the nuclear genome has occurred once during the evolutionary history of the phylum Chordata and its mitochondrial genome.


2.0 Methods and Materials

This research is based on the methodology utilised in similar phylogenetic studies by Keya and Priya (2016) and Torktaz, Behjati and Rostami (2016) on their respective proteins, CytC and otospiralin.

2.1 How the Sequences Were Chosen

Amino acid and nucleotide sequences of CytC were obtained from both the mitochondrial and nuclear genomes of species utilising oxygen as the final electron acceptor. In total, 30 amino acid sequences were collected: 20 from eukaryotes and 10 from prokaryotes, 5 of which are α-proteobacteria. No Archaea were included, as no archaea utilising oxygen could be identified. Nucleotide sequences were obtained for the CytC of Dermacentor variabilis and Rhipicephalus appendiculatus. All sequences were obtained from NCBI GenBank (https://www.ncbi.nlm.nih.gov/genbank/). The accession numbers, scientific names and common names of the obtained sequences are detailed below in table 2.

All CytC sequences were checked to ensure that each contains only one CXXCH motif, and homology to other CytC sequences was checked using the Basic Local Alignment Search Tool (BLAST) to ensure that these sequences were not other c-type cytochromes.
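The motif check described above can be sketched with a regular expression. This is a minimal illustration rather than the procedure actually used in this research; the function names and the short test sequences are hypothetical.

```python
import re

# C, any two residues, C, H. The lookahead means overlapping occurrences
# (for example "CAACHACH") are counted as well.
CXXCH = re.compile(r"(?=C..CH)")

def count_cxxch(protein):
    """Number of CXXCH haem-binding motifs in an amino acid sequence."""
    return len(CXXCH.findall(protein.upper()))

def has_single_cxxch(protein):
    """True when the sequence carries exactly one CXXCH motif."""
    return count_cxxch(protein) == 1

# Hypothetical fragments, not sequences from this study.
print(has_single_cxxch("MKCSQCHTV"))   # one motif
print(count_cxxch("CAACHACH"))         # two overlapping motifs
```

Sequences with more than one motif would be candidates for the larger multi-haem c-type cytochromes that the BLAST check is intended to exclude.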


Table 2. Summarisation of CytC Sequences. Accession numbers of all 30 CytC sequences obtained from NCBI GenBank, including scientific and common names; (mt) denotes sequences on the mitochondrial genome and (α) denotes prokaryotes of the α-proteobacterium lineage.

Accession Number | Scientific Name | Common Name
NP_061820.1 | Homo sapiens |
NP_001124639.1 | Pongo abelii | Sumatran
NP_001183974.1 | Canis lupus familiaris |
NP_001039526.1 | Bos taurus |
NP_001072946.1 | Gallus gallus | Red
XP_010397328.1 | Corvus cornix cornix |
XP_007420882.1 | Python bivittatus |
JAG45171.1 | Crotalus horridus | Timber
NP_001291051.1 | Esox lucius |
NP_001002068.1 | Danio rerio |
NP_001240955.1 | Glycine max | Soybean
XP_008795309.1 | Phoenix dactylifera | Date Palm
NP_001298995.1 | Papilio xuthus | Asian Swallowtail
XP_026726003.1 | Trichoplusia ni |
QEO21489.1 | Candida auris |
AAB50255.1 | Aspergillus nidulans | N/A
XP_029837279.1 | Ixodes scapularis |
JAP85941.1 | Rhipicephalus appendiculatus | Brown Ear Tick
ACL36921.1 | Conocephalum conicum | Great Scented Liverwort (mt)
AAY86488.1 | Dermacentor variabilis | American Dog Tick (mt)
POA07372.1 | lentus | N/A
WP_100225714.1 | Escherichia coli | N/A
WP_045781414.1 | Klebsiella michiganensis | N/A
RMW73485.1 | Serratia marcescens | N/A
ACA83734.1 | Rhodothermus marinus | N/A
RPF74376.1 | Rickettsiales bacterium TMED289 (α) | N/A
OUV53009.1 | Rickettsiales bacterium TMED127 (α) | N/A
WP_011391249.1 | Rhodospirillum rubrum (α) | N/A
WP_137882326.1 | Rhizobiales bacterium 2SD (α) | N/A
WP_008555532.1 | Rhodobacterales bacterium Y4I (α) | N/A


2.2 Databases Utilised

GenBank, maintained by the NCBI (Benson et al., 2007), was utilised to obtain the amino acid and nucleotide sequence data in FASTA format, as described above in section 2.1.
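For readers unfamiliar with the FASTA format, the sketch below shows how such a file can be read into a mapping from identifier to sequence. It is an illustrative parser only; the record names and residues are hypothetical and are not taken from the sequences used in this research.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of identifier -> sequence.

    The identifier is the first whitespace-delimited token after '>'.
    Sequence lines belonging to one record are joined together.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = (line[1:].split() or [""])[0]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}

# Hypothetical two-record file.
example = ">seqA hypothetical record\nMKCSQ\nCHTV\n\n>seqB\nMRCAQCH\n"
print(parse_fasta(example))
```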

The Protein Data Bank (https://www.rcsb.org/) is the primary database for macromolecular 3D structures and is used along with the Structural Classification of Proteins, which classifies 3D protein structures in a hierarchical scheme of structural classes (Parasuraman, 2012). It was utilised alongside ConSurf for the acquisition of the 3D structures of both eukaryotic and prokaryotic CytC.

The Genome Evolutionary Analysis database (CoGe, https://genomevolution.org/CoGe/) was used to identify organisms similar to those utilised in the initial creation of the phylogenetic tree. This was necessary because few sequences available in the CoGe database matched the sequences obtained from GenBank; the substitutes were chosen specifically to correspond to the clades within the phylogenetic tree.

2.3 Software and Tools Utilised

BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) for single sequence alignments (Altschul et al., 1990) utilises its own algorithm for the comparison of single sequences, which results in bit scores and E-values. The bit score represents how good the alignment is, while the E-value indicates the number of hits that can be expected to be seen by chance when searching a database of a particular size. No cut-off values were utilised; BLAST was used to ensure that the sequences shared homology with other CytC sequences and not with other, larger or more diverse c-type cytochromes.
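The relationship between the two statistics can be written as E = m × n × 2^(-S'), where S' is the bit score and m and n are the lengths of the query and the database. The sketch below applies this formula directly; it assumes raw lengths, whereas BLAST itself uses edge-corrected effective lengths, and the numbers are purely illustrative.

```python
def expected_hits(bit_score, query_length, database_length):
    """Expected number of chance hits (E-value) for a given bit score.

    Implements E = m * n * 2**(-S'), using uncorrected lengths as a
    simplifying assumption.
    """
    return query_length * database_length * 2.0 ** (-bit_score)

# Illustrative values: each extra bit of score halves the expected
# number of chance hits, and a larger database raises it.
print(expected_hits(10, 4, 256))
print(expected_hits(11, 4, 256))
```

This makes clear why the same alignment receives a larger E-value when searched against a larger database.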

The ConSurf server (https://consurf.tau.ac.il/), linked to the Protein Data Bank, was used to identify the functional regions of proteins by mapping how variable regions of the amino acid sequences are during alignment (Glaser et al., 2003). The same method was utilised independently for both eukaryotic CytC of H. sapiens origin (Protein Data Bank reference 6ECJ) and prokaryotic CytC of Rhodothermus marinus (Protein Data Bank reference 3CP5). To form a multiple sequence alignment in ConSurf, a homolog search was undertaken with the following parameters: HMMER homolog search algorithm, 1 iteration, an E-value cut-off of 0.0001 and the UNIREF-90 protein database. The alignment method utilised was MAFFT-L-INS-i. ConSurf then automatically selected 150 homologs with a maximum percentage identity between sequences of 95% and a minimum of 35%. The Bayesian calculation method and the default best model of evolutionary substitution were used.

BioEdit version 7.2.5 (Ibis Biosciences) was used to create entropy plots indicating the variation at the different positions of a multiple sequence alignment (Hall, 1999). Separate multiple sequence alignments for eukaryotes and prokaryotes, aligned via multiple sequence comparison by log-expectation (MUSCLE) in MEGA (discussed below), were uploaded onto the BioEdit software prior to the creation of the two entropy plots.
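The entropy measure can be sketched as the Shannon entropy of each alignment column, H = -Σ f ln f, where f is the frequency of each residue in that column. The code below is an illustration and not BioEdit's own implementation; it assumes natural logarithms and that gap characters are left out of the frequency counts, and the toy alignment is hypothetical.

```python
import math
from collections import Counter

def column_entropy(column, gap="-"):
    """Shannon entropy (natural log) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def entropy_profile(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]

def variable_positions(alignment, threshold=1.0):
    """1-based positions whose entropy touches or passes the threshold."""
    return [i for i, h in enumerate(entropy_profile(alignment), start=1)
            if h >= threshold]

# Hypothetical four-sequence alignment.
toy_alignment = ["ACD", "AC-", "AEF", "AGH"]
print(entropy_profile(toy_alignment))
print(variable_positions(toy_alignment))
```

Under this assumption a column consisting of a single residue plus gaps has an entropy of zero, so low entropy does not always mean strong conservation.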

MEGA-X version 10.1.8 was utilised for the alignment of CytC sequences and the construction of the maximum likelihood tree (Kumar et al., 2018). Multiple sequence alignments were undertaken using the MUSCLE algorithm for eukaryotes, prokaryotes and a combination of eukaryotes and prokaryotes. The MUSCLE alignment options were set as follows: gap open penalty -2.90, gap extend penalty 0.00 and hydrophobicity multiplier 1.20. Maximum memory usage was set at 2048 MB, max iterations at 16 and the cluster method at UPGMA. For the phylogeny reconstruction, a maximum likelihood tree was created using 500 replicates of the bootstrap method. The substitution model was set to the Jones-Taylor-Thornton (JTT) model for amino acid substitutions, the rates among sites were set to uniform and all sites were utilised, including gaps and missing data. Under the tree inference options, the maximum likelihood heuristic method was set to nearest-neighbour interchange (NNI).

CoGe was utilised to identify two or more genomic regions that are considered syntenic. For this, regions of genomic DNA sequence extending 200,000 bases 3’ and 5’ of the CytC gene were obtained from CoGe and analysed using the (B)LastZ: Large Regions alignment algorithm under the following conditions: gap start penalty 400, gap extend penalty 30, no chaining, score threshold 3000 and mask threshold 0.
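The flanking window described above can be expressed as a small coordinate calculation. This is only a sketch of the arithmetic, assuming 1-based inclusive coordinates; the function name and the gene positions are hypothetical, and CoGe performs this selection through its own interface.

```python
def flanking_window(gene_start, gene_end, flank=200_000, chrom_length=None):
    """Return (start, end) of the region covering a gene plus `flank`
    bases on either side, clipped to the ends of the chromosome."""
    start = max(1, gene_start - flank)
    end = gene_end + flank
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end

# Hypothetical gene coordinates.
print(flanking_window(250_000, 252_000))
print(flanking_window(100_000, 102_000, chrom_length=250_000))
```

Clipping matters for genes near the end of a chromosome or scaffold, where fewer than 200,000 bases are available on one side.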


TimeTree (http://timetree.org/) was used to estimate the time of divergence between pairs of organisms utilised in the syntenic analysis (Kumar et al., 2017). The online tool compiles information from multiple published phylogenetic analyses to give an estimated mean time of divergence, from which a date by which the EGT must already have occurred can be inferred.

2.4 Statistical Analysis

Bootstrapping is a statistical method that checks the branch topology of a phylogenetic tree. It does this by re-sampling columns of the multiple sequence alignment to create new alignments that replace the original set. This is normally done 100 times, each time building a new phylogenetic tree, and the value given to each branch point is the number of times that branch was built. Therefore, the more often a branch point occurs, the higher the number; values of 90-100 are normally considered statistically significant. This method is crude but applicable to all tree-building methods. Although it has been shown to be conservative in its estimate of significance, this analysis is preferred because evolution does not follow mathematical models (Holmes, 2003).
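The re-sampling step can be sketched as follows. To keep the example short, the "tree" is reduced to a single question, whether the first sequence is closer to the second than to the third by simple p-distance, rather than a maximum likelihood reconstruction as performed in MEGA. The sequences, function names and seed are hypothetical.

```python
import random

def resample_columns(alignment, rng):
    """Build a bootstrap replicate by drawing columns with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def bootstrap_support(alignment, replicates=500, seed=0):
    """Percentage of replicates in which sequence 0 is closer to
    sequence 1 than to sequence 2 (a stand-in for one branch of a tree)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        a, b, c = resample_columns(alignment, rng)
        if p_distance(a, b) < p_distance(a, c):
            hits += 1
    return 100.0 * hits / replicates

# Hypothetical alignment with an unambiguous signal.
toy = ["AAAAAAAA", "AAAAAAAA", "CCCCCCCC"]
print(bootstrap_support(toy))
```

In a real analysis each replicate alignment is used to build a complete tree, and the support for a clade is the percentage of replicate trees that contain it.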


3.0 Results

Prior to the analysis of the results, supplementary data were collected from GenBank regarding the mitochondrial genome and the location of the CytC gene, including information on the flanking genes for the eukaryotic organisms in which CytC is present on the nuclear genome. These data can be seen in table 3.

Firstly, as shown in table 3, the majority of the mitochondrial genomes for which information was accessible encoded 13 proteins, the exceptions being the plants and Candida auris, the latter encoding 14 proteins. Next, the chromosomal location of the CytC gene varied greatly between taxa. Mammals, birds and reptiles shared a CytC length of 105 amino acids and showed similar flanking genes, such as neuropeptide VF (NPVF) and a common open reading frame (ORF) referred to as C7orf31, for which homologs were shown. On the 3’ side of CytC, multiple organisms showed oxysterol binding protein like 3 (OSBPL3) and gasdermin E (GSDME). In some cases, such as H. sapiens, GSDME was displaced by an uncharacterised non-coding RNA. The other taxa utilised showed much greater variation in CytC length and in the genes flanking CytC. Interestingly, the GenBank data on Danio rerio show the presence of two CytC genes, referred to as cycsa and cycsb. cycsa shows similarities to the H. sapiens flanking genes NPVF, OSBPL3 and GSDME, missing only the C7orf31 homolog, whereas cycsb resembles Esox lucius at the 3’ end of CytC due to the presence of the nuclear factor, erythroid 2-like 2b (nfe2l2b) gene. This will be utilised later in section 3.6 with regards to the CoGe analysis.


Table 3. Summary of Cytochrome C and Mitochondrial Data in Eukaryotes. A collection of data from the nuclear genome of the eukaryotic organisms utilised in this study to indicate any conserved evolutionary ties that may be identifiable when placed on the maximum likelihood tree. A question mark denotes missing genetic information in the number of mitochondrial encoded genes, chromosomal location and CytC flanking genes.

Scientific Name | Number of Mitochondrial Encoded Protein Genes | Chromosomal Location | Genes Flanking Cytochrome C | Length of Cytochrome C (number of Amino Acids)
Homo sapiens | 13 | 7p15.3 | 5’-RPL7AP41-C7orf31-CytC-OSBPL3-Uncharacterised ncRNA-3’ | 105
Pongo abelii | 13 | 7 | 5’-NPVF-C7H7orf31-CytC-OSBPL3-GSDME-3’ | 105
Canis lupus familiaris | 13 | 14 | 5’-NPVF-C14H7orf31-CytC-Uncharacterised ncRNA-OSBPL3-3’ | 105
Bos taurus | 13 | 4 | 5’-NPVF-C4H7orf31-CytC-OSBPL3-EID1-3’ | 105
Gallus gallus | ? | 2 | 5’-NPVF-C2H7orf31-CytC-OSBPL3-DFNA5-3-3’ | 105
Corvus cornix cornix | ? | 2 | 5’-NPVF-CUNH7orf31-CytC-Uncharacterised ncRNA-OSBPL3-3’ | 105
Python bivittatus | 13 | 3 | 5’-NPVF-CUNH7orf31-CytC-?-3’ | 105
Crotalus horridus | 13 | ? | ? | 105
Esox lucius | 13 | 16 | 5’-casp10-prkra-CytC-pde11a-nfe2l2a-3’ | 104
Danio rerio | 13 | 6 | (a) 5’-hoxa1a-npvf-CytC-osbpl3a-gsdmea-3’; (b) 5’-notum2-ccr7-CytC-nfe2l2b-lnpk-3’ | 104
Glycine max | 88 (this includes tRNA and uncharacterised ORFs) | 9 | 5’-acp1-nai1-CytC-zip4-str4a-3’ | 112
Phoenix dactylifera | 44 (this includes uncharacterised ORFs) | ? | 5’-pcmp-e67-Uncharacterised Protein-CytC-gba2-sgr2-3’ | 113
Papilio xuthus | ? | ? | 5’-cyb5r4-Uncharacterised Protein-CytC-Uncharacterised Protein-Uncharacterised Protein-3’ | 102
Trichoplusia ni | 13 | 1 | 5’-Uncharacterised Protein-Uncharacterised Protein-CytC-senp8-cyb5r4-3’ | 108
Candida auris | 14 | 2 | 5’-rap1-cwc22-CytC-Uncharacterised Protein-Uncharacterised Protein-3’ | 110
Aspergillus nidulans | ? | ? | ? | 113
Ixodes scapularis | ? | ? | 5’-wdr75-mmtag2-CytC-4Cl1-Uncharacterised-3’ | 109
Rhipicephalus appendiculatus | ? | ? | ? | 109

3.1 Cytochrome C Structure and Function

PDB structures for CytC of eukaryotic origin (H. sapiens, 6ECJ) and prokaryotic origin (R. marinus, 3CP5) were entered as queries into ConSurf to analyse the variation within these proteins by aligning homologs to determine areas of variation within the 3D structure.

As seen below in figure 3 A and D, the cartoon models for both prokaryotic and eukaryotic CytC show that the characteristic helices are conserved in both H. sapiens and R. marinus CytC, although they appear to form slightly different configurations. Figure 3 B and E use a space fill model in which 150 homologs have been identified and aligned, showing that prokaryotes and eukaryotes share a similar structure, with the more conserved residues of the protein in the centre and the more variable regions on the outer side of the protein. This is confirmed by figure 3 C and F, in which the variable regions have been removed and only the highly conserved residues are shown. These are located around the ligand, highlighting the conserved function of the haem group, while the outer regions of the protein are more variable due to binding to their own respective complex IV, with which they share an evolutionary relationship.


Figure 3. Cytochrome C Variation in 3D Structure. (A) Rhodothermus marinus CytC showing the structure in a cartoon model and the ligand in ball and stick form, taken from the PDB (3CP5). (B) Rhodothermus marinus CytC showing the structure in a space fill model and the ligand in ball and stick form, using ConSurf to identify 150 homologs. (C) Rhodothermus marinus CytC showing the functional region in a space fill model and the ligand in ball and stick form, using ConSurf to identify 150 homologs and the highly conserved residues. (D) Homo sapiens CytC showing the structure in a cartoon model and the ligand in ball and stick form, taken from the PDB (6ECJ). (E) Homo sapiens CytC showing the structure in a space fill model and the ligand in ball and stick form, using ConSurf to identify 150 homologs. (F) Homo sapiens CytC showing the functional region in a space fill model and the ligand in ball and stick form, using ConSurf to identify 150 homologs and the highly conserved residues.


3.2 Cytochrome C Sequence Alignments

MUSCLE multiple sequence alignments were compared in a single diagram in which the sequences are aligned on top of each other in a coordinate system. The 20 eukaryotic CytC sequences were taken from table 2 and aligned via MUSCLE; the same was undertaken with the 10 prokaryotic sequences associated with the origins of mitochondria and common aerobic bacteria.

Figure 4 A and B are screenshots from MEGA. Figure 4A shows all 114 amino acid positions within the alignment. The results show a high level of similarity, and amino acids conserved within all 20 sequences are indicated with an asterisk. Amino acids that are different but comprise similar R-groups, indicating similar functions, are shown in matching colours. From this it can be inferred that eukaryotic CytC is highly similar. There is a clear CXXCH sequence motif present at the beginning of the CytC sequence (highlighted via a black box). In comparison, the prokaryotic sequence alignment (figure 4B), showing 219 amino acid positions, contains large areas of gaps within the sequences, an indicator that CytC is variable within prokaryotes; this is backed up by the absence of asterisks, as no amino acids are conserved.
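The asterisk annotation described above can be reproduced with a short function that marks every alignment column in which all sequences carry the same residue. This is a sketch of the idea rather than MEGA's own routine; it assumes that a column containing a gap is never counted as conserved, and the three short sequences are hypothetical.

```python
def conservation_line(alignment, gap="-"):
    """Return a string with '*' under fully conserved columns and a
    space elsewhere. Columns containing a gap are treated as variable."""
    marks = []
    for column in zip(*alignment):
        conserved = gap not in column and len(set(column)) == 1
        marks.append("*" if conserved else " ")
    return "".join(marks)

# Hypothetical aligned fragments around a CXXCH motif.
fragments = ["MKCAQCH", "MKCSQCH", "MRCSQCH"]
for seq in fragments:
    print(seq)
print(conservation_line(fragments))
```

An alignment with many gaps, such as the prokaryotic one, produces few or no asterisks under this rule.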


Figure 4. MUSCLE Sequence Alignment of Eukaryotic and Prokaryotic Cytochrome C. (A) MUSCLE alignment of 20 eukaryotic CytC sequences showing all 114 amino acid positions. (B) MUSCLE alignment of 10 prokaryotic CytC sequences showing all 219 amino acid positions, the greater length being due to the greater variation. Asterisks above the aligned amino acid sequences indicate amino acids that are conserved across all species (none shown in the prokaryotic sequences). Amino acids highlighted in the same colour indicate a similar R-group and amino acid classification, and gaps in the alignment are shown using a dash. Both black boxes indicate the areas of the CXXCH motif.

3.3 Cytochrome C Sequence Variation

Using the same sequence alignments shown above in section 3.2, entropy plots were constructed to highlight areas of variation, as entropy increases with increasing amounts of variation within the MUSCLE sequence alignment of both eukaryotic and prokaryotic CytC. When the entropy plot touches or passes the scale of 1, the position is considered to be highly variable.

The variable positions of eukaryotic CytC, taken from 20 sequences, are 5-10, 12-14, 19, 21-22, 25, 32, 54, 60, 64, 70, 72, 75-76, 93, 98-99, 102, 110 and 113-114, peaking at two points, 13 and 99, as seen in figure 5A. This indicates areas where mutations are most likely to have occurred during the evolutionary process. As well as indicating how variable sections of the protein are, the plot can be utilised to identify positions at which no variation in amino acids is present; these data match the asterisks shown in the sequence alignment in section 3.2. These positions are as follows: 20, 24, 27-28 (24, 27 and 28 make up the CXXCH motif), 30, 37, 39-42, 44, 47-48, 51, 55, 58, 61-63, 69, 77-78, 81-86, 88-90, 92, 94, 96, 101, 103-104 and 108. However, these positions correspond only to the alignment and not to the individual positions of each CytC sequence.

The variable positions taken from the 10 prokaryotic CytC sequences are as follows: 1-27, 60-74, 93-98, 100-111, 113-123, 125-141, 151-169 and 173-200. Although the entropy plot in figure 5B does show areas of no variation, this is not due to highly conserved amino acids but merely to gaps in the alignment, as seen in figure 4B.


Figure 5. Cytochrome C Eukaryotic and Prokaryotic Entropy Plot. (A) Entropy plot of 20 eukaryotic CytC sequences in which peaks that touch or pass the scale of 1 are considered variable, these areas are from positions 5-10, 12-14, 19, 21-22, 25, 32, 54, 60, 64, 70, 72, 75-76, 93, 98-99, 102, 110 and 113-114. (B) Entropy plot of 10 prokaryotic CytC sequences in which peaks that touch or pass the scale of 1 are considered variable, these areas are from positions 1-27, 60-74, 93-98, 100-111, 113-123, 125-141, 151-169 and 173-200.


3.4 Mitochondrial and Nuclear Cytochrome C Sequence Comparison

Using the nucleotide mRNA sequence of the mitochondria encoded CytC of D. variabilis (accession number DQ084330.1), the sequence was input into MEGA and translated using both the standard genetic code of the nuclear genome and the invertebrate mitochondrial genetic code, to identify whether any variation in the sequence may indicate why it has remained encoded by the mitochondria.

The results shown below in figure 6 indicate that the reason D. variabilis CytC is still encoded by the mitochondria is not the differing genetic codes utilised by the nuclear and invertebrate mitochondrial genomes, as the translations match except at position 101, at which the mitochondrial code encodes methionine while the standard code encodes isoleucine, both non-polar amino acids. A comparison between D. variabilis and a relative from the same family, R. appendiculatus (accession number GEDV01002616.1), which has a nuclear encoded CytC, showed only one amino acid variation, at position 2, where the residue was changed for glycine in D. variabilis, and few differences in the nucleotide sequences.
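The comparison of the two genetic codes can be sketched without MEGA. The standard code is built from the conventional 64-letter amino acid string in TCAG codon order, and the invertebrate mitochondrial code (NCBI translation table 5) differs from it at four codons: ATA encodes methionine, TGA encodes tryptophan, and AGA and AGG encode serine. The function names and the short test sequence are hypothetical; this is not the D. variabilis sequence.

```python
from itertools import product

BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # TTT, TTC, TTA, ...

# Standard genetic code in the same codon order.
STANDARD = dict(zip(
    CODONS,
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))

# Invertebrate mitochondrial code: four codons are reassigned.
INVERTEBRATE_MT = dict(STANDARD, ATA="M", TGA="W", AGA="S", AGG="S")

def translate(mrna, table):
    """Translate a coding sequence codon by codon; unknown codons give 'X'."""
    seq = mrna.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, usable, 3))

def differing_positions(mrna):
    """1-based residue positions where the two codes disagree, with the
    standard-code residue first and the mitochondrial-code residue second."""
    std = translate(mrna, STANDARD)
    mt = translate(mrna, INVERTEBRATE_MT)
    return [(i, s, m) for i, (s, m) in enumerate(zip(std, mt), start=1)
            if s != m]

# Hypothetical sequence containing one ATA codon.
print(differing_positions("ATGATAGGC"))
```

An isoleucine to methionine difference of the kind reported at position 101 is what an ATA codon produces under these two codes.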


Figure 6. Comparison of Cytochrome C under Different Genetic Codes. A comparison of the mRNA of the mitochondrial encoded CytC of Dermacentor variabilis translated under the standard genetic code and the invertebrate mitochondrial genetic code. The only difference is at position 101, at which methionine is replaced with isoleucine, both non-polar amino acids.

3.5 Cytochrome C Phylogenetic Tree

The evolutionary analysis of CytC was carried out by constructing a maximum likelihood tree in MEGA-X, with the bootstrap method used as a test of phylogeny. All 20 eukaryotic sequences aligned in section 3.2 were utilised alongside the prokaryotic R. marinus sequence, which was used to root the tree. This outgroup was chosen on the basis of an earlier maximum likelihood tree constructed from the 20 eukaryotic and 10 prokaryotic sequences (see Appendix 3), which identified the closest prokaryotic CytC sequence that could be utilised as an outgroup. The constructed phylogenetic tree is depicted below in figure 7.

The phylogenetic tree shows no major inconsistencies, with all taxa falling within their expected groups; for example, the birds, mammals and reptiles are together in their respective clades. The bootstrapping analysis shows a range of values, indicating that some of the internal nodes are significantly more robust than others in the tree.

Figure 7. Evolutionary analysis of Cytochrome C by Maximum Likelihood method. The evolutionary history was inferred by using the Maximum Likelihood method and JTT matrix-based model. The bootstrap consensus tree inferred from 500 replicates [2] is taken to represent the evolutionary history of the taxa analysed. Branches corresponding to partitions reproduced in less than 50% bootstrap replicates are collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbour-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model, and then selecting the topology with superior log likelihood value. This analysis involved 21 amino acid sequences. There was a total of 164 positions in the final dataset. Evolutionary analyses were conducted in MEGA-X. Red dots indicate divergence points in species that will be analysed for syntenic region.

3.6 Cytochrome C Syntenic Analysis

Syntenic analysis was performed using CoGe and its GEvo web-based software to visualise areas of syntenic sequence. Sequences of various organisms from the CoGe database were compared to each other based on their inferred relationships from the evolutionary analysis of CytC seen in figure 7. With this it is possible to visualise syntenic regions between two organisms, as the comparison covers a larger region than the genes mentioned in table 3, accounts for conserved areas of non-coding sequence and allows inference of whether the location of CytC is conserved between species. The divergences from a common ancestor marked as red dots in figure 7 were analysed to see whether syntenic regions around the CytC gene are conserved and so indicate endosymbiotic gene transfer of CytC prior to the divergence.

Figure 8 below shows six syntenic alignments of different species chosen to represent a variety of taxa. Figure 8A, comprising H. sapiens and Pongo abelii, shows high levels of syntenic regions around the CytC gene, depicted as overlapping pink lines, while in 8B A. mississippiensis and Anolis carolinensis, both reptiles, show far fewer syntenic regions. In the example of birds (figure 8C), Taeniopygia guttata and Gallus gallus show more syntenic regions than 8B but fewer than 8A. Syntenic regions are even found in more distantly related species, as seen in 8D, E and F for H. sapiens and G. gallus, A. mississippiensis and H. sapiens, and H. sapiens and B. taurus respectively. All of these show varying levels of syntenic regions.


A) Homo sapiens (top) and Pongo abelii (bottom)

B) Alligator mississippiensis (top) and Anolis carolinensis (bottom)

C) Taeniopygia guttata (top) and Gallus gallus (bottom)


D) Homo sapiens (top) and Gallus gallus (bottom)

E) Alligator mississippiensis (top) and Homo sapiens (bottom)

F) Homo sapiens (top) and Bos taurus (bottom)

Figure 8. Syntenic Analysis of Various Species Cytochrome C Genomic Regions. Using GEvo the web-based software on CoGe, two species were analysed at a time and 200,000 bases 3’ and 5’ of the CytC gene were analysed to identify areas that are syntenic. The red lines indicate the syntenic areas and their matching areas on the other species, Orange blocks represent unsequenced areas, yellow represents the selected gene CytC and green/blue/grey areas represent areas of genes. (A) Homo sapiens and Pongo abelii. (B) Alligator mississippiensis and Anolis carolinensis. (C) Taeniopygia guttata and Gallus gallus. (D) Homo sapiens and Gallus gallus. (E) Alligator mississippiensis and Homo sapiens. (F) Homo sapiens and Bos taurus.


3.7 Estimated Time of Divergence

To analyse the number of times that CytC has undergone EGT, an estimated time of divergence was obtained to determine whether the more ancestral nodes explored from figure 7 have fewer syntenic regions than more closely related species. Fewer syntenic regions are expected between more ancestral nodes because an increased time since divergence allows more mutation and recombination to alter the DNA sequences.

The results, detailed below in table 4, show that as the time of divergence increases from 15.76 MYA between H. sapiens and P. abelii to 312 MYA between H. sapiens and G. gallus, and between A. mississippiensis and H. sapiens, the syntenic regions decrease. This indicates an ongoing evolutionary process in which the areas 200,000 bases upstream and downstream of CytC are mutating.

Table 4. Estimated Time of Divergence Using TimeTree. To estimate the syntenic relationships between CytC and the time since the possible endosymbiotic gene transfer, the times since divergence of numerous species were calculated using TimeTree. These data include the number of studies the information was obtained from and the confidence intervals (Kumar et al., 2017).

Corresponding Figure 8 Diagram | Name of Species | Estimated Time Since Divergence (MYA) | Confidence Intervals (MYA) | Number of Studies Data Taken From
A | Homo sapiens, Pongo abelii | 15.76 | 14.74 - 16.70 | 60
B | Alligator mississippiensis, Anolis carolinensis | 280 | 273 - 284 | 30
C | Taeniopygia guttata, Gallus gallus | 98 | 93.2 - 104.6 | 43
D | Homo sapiens, Gallus gallus | 312 | 294 - 323 | 30
E | Alligator mississippiensis, Homo sapiens | 312 | 294 - 323 | 30
F | Homo sapiens, Bos taurus | 96 | 91 - 101 | 65


4.0 Discussion

This research set out to help elucidate the origins of mitochondrial evolution and the EGT of CytC via the study of its structure, function and similarity in both eukaryotic and prokaryotic organisms. The aim was to identify whether the EGT that relocated CytC to the genomic DNA occurred once, by comparing whether the chromosomal locations were conserved. Prior to this, the research identified a gap in the inclusion of prokaryotic CytC in phylogenetic trees. Using only CytC, this research failed to provide further information on the origins of the mitochondria, as discussed in detail in section 4.1. However, it did show similarities in structure and function (section 4.2) and differences in sequence variation (section 4.3). The D. variabilis amino acid sequence was compared under the nuclear and mitochondrial genetic codes to try to elucidate why CytC is still encoded on the mitochondrial genome; little difference was identified, as discussed in section 4.4.

Finally, the research hypothesis stated that the endosymbiotic gene transfer of CytC from the mitochondrial genome to the nuclear genome occurred once during the evolutionary history of the Chordata and their mitochondrial genome. To test this, a maximum likelihood tree was constructed with bootstrapping to ensure confidence in the clades (section 4.5), prior to being used to elucidate syntenic regions around the chromosomal location of CytC, highlighting conserved regions. This showed a likelihood that the EGT of CytC occurred once during the development of the Chordata, prior to the divergence of the last common ancestor of mammals, reptiles and birds approximately 312 MYA, as discussed in section 4.6. Therefore, the research hypothesis can be accepted: the EGT of CytC from the mitochondrial genome to the nuclear genome has occurred once during the evolutionary history of the phylum Chordata and its mitochondrial genome.


4.1 Origins of the Mitochondria

As already discussed, the origin of the mitochondria is based on the most prominent explanation, the symbiogenesis scenario, which occurred after the great oxidation event that took place 2.4 billion years ago (Gray, 2012; Schirrmeister et al., 2013). However, the actual origins of the α-proteobacterium remain unclear. This research aimed to shed light on the potential origins of the mitochondria through the use of CytC, a protein that is deemed essential for life and is highly conserved. The aim was to achieve this by constructing a phylogenetic tree based on both eukaryotic and prokaryotic CytC, inclusive of R. americana, whose mitochondrial genome is the most bacteria-like mitochondrial genome known to date (Lang et al., 1997), and comparing this to α-proteobacteria such as Rhizobiales, Rhodobacterales and Rickettsiales, which have all been implicated in the potential origins of the mitochondria (Esser et al., 2004; Georgiades and Raoult, 2011), to identify similarities that may indicate the origin of CytC.

Although this phylogenetic tree can be seen in Appendix 3, it was utilised only to identify an outgroup for the phylogenetic tree constructed for syntenic analysis. This was because the mitochondrial genome of R. americana (accession number NC_001823.1) did not encode CytC or any potential homologs when a BLAST search was performed. This indicates that, although this eukaryote has the most bacteria-like mitochondrial genome, EGT has already transferred CytC to the nuclear DNA. Coupled with the nuclear genome of this organism not being available, leaving no CytC to analyse, this aim was unachievable. However, it is becoming apparent in other studies that the focus of sequencing is mainly on animals, plants and fungi, when in fact the answer may lie in the gene-rich, bacteria-like mitochondrial genomes of the which has led to the suggestion that these represent an early diverging group from the eukaryotic lineage, branching near the root of the tree. However, evidence for this conclusion is difficult to obtain due to the process of EGT.

The genome reduction of the mitochondria and lastly the patterns of gene presence and absence across the spectrum of eukaryotes are all poorly understood (Janouškovec et al., 2017). Although it is accepted that mitochondria lacking the ETC lose their mitochondrial genome, there is currently no theory that accounts for the size variations within mitochondrial genomes and the genes present on the nuclear genome, potentially because the focus remains on the more familiar eukaryotes and not on the diverse set of eukaryotic microbes, which in turn may hold the clues to elucidating this theory.

From this, being able to work backwards to visualise what the original α-proteobacterium genome may have looked like may be the best option (Petitjean and Williams, 2017). However, a discovery in 2012 did show tantalising clues that endosymbiotic events, believed to be a once in a 4 billion year event, may not be so rare, with the discovery of Parakaryon myojinensis. This intermediate form between eukaryote and prokaryote, seen below in figure 9, shows a potential endosymbiotic relationship. However, this was the only specimen identified and much mystery still remains around the discovery. Nevertheless, it highlights that undiscovered organisms may in fact be able to help elucidate the origins of the mitochondria (Björkholm et al., 2015).

Figure 9. Electron Micrograph of Parakaryon myojinensis. An ultrathin section of Parakaryon myojinensis taken via an electron microscope. The cell has approximately 100 times the volume of E. coli and contains a large nucleoid (N), a single nucleoid membrane (NM), cell wall (CW), plasma membrane (PM) and the potential presence of endosymbionts (E). There also appear to be no mitochondria (Björkholm et al., 2015).

Elucidating this theory of how the reduction and gene transfer occur will help alleviate some of the difficulty in phylogenetically reconstructing evolutionary events, as current methods are based on comparing modern mitochondrial genomes to modern α-proteobacterium genomes without an understanding of how the mitochondrial genome has changed over 1.45 billion years (Martin and Mental, 2010). Yet even with this reduction in genome, many of the proto-mitochondrion genes have retained enough phylogenetic signal to be identified. Based on how prokaryotic genomes resemble modern day mitochondrial genomes, the two most likely candidates are distant ancestors of either the SAR11 or the Rickettsiales taxon (Thrash et al., 2011).

4.2 Cytochrome C Structure and Function

Both the structure and the function of CytC had to be investigated to highlight conserved regions, firstly to identify that eukaryotic and prokaryotic CytC have conserved structure and function, and then to find a potential reason why the D. variabilis CytC is still encoded by the mitochondrial genome. This ensures that CytC can be utilised for both eukaryotes and prokaryotes, giving a rooted tree that can be utilised to infer evolutionary relationships, and furthermore identifies regions in which potential changes in the genetic code may elucidate why transferring the CytC gene to the nuclear genome may be detrimental.

As seen in figure 3, both the eukaryotic and prokaryotic CytC cartoon models show the helices that are indicative of CytC, and the space-fill models show that the region around the haem group is highly conserved, while towards the outside of the protein the regions become less conserved.

Figure 3 A and D show R. marinus and H. sapiens CytC respectively in a cartoon model and agree with the established view that CytC comprises approximately 5 helices (Bertini, Cavallaro and Rosato, 2006; Hannibal et al., 2016). As stated, CytC is characterised by the attachment of the haem to the polypeptide via thioether bonds from a CXXCH sequence motif (Allen et al., 2003).

The conserved central structure described above has also been identified using the same software, ConSurf, prior to this research (Keya and Priya, 2016). This is unsurprising given the conserved nature of this protein and the way in which the software selects homologs, even though this research utilised H. sapiens CytC (6ECJ) as a focal point and Keya and Priya (2016) utilised T. thynnus CytC (3CYT), thus showing the conserved nature of eukaryotic CytC. However, other work has taken the analysis of CytC further and identified alternative conformations linked to its numerous functions, such as its primary function as an electron carrier, or its role in the intrinsic apoptotic pathway and as a redox sensor when in the cytosol. This transition into other conformations and a gain of peroxidase activity is believed to be the driving force that allows CytC to be multifunctional and enables its translocation across cellular membranes. During studies CytC has been shown to interact with , undergo post-translational modification, examples of which are tyrosine nitration, methionine sulfoxidation, small alterations in electrical fields and phosphorylation. It is believed that these changes in conformation allow CytC the flexibility to perform multiple in vivo functions.

However, assigning each function to an alternative conformation remains a major challenge, although it is apparent that CytC undergoes these conformational changes during normal and altered homeostasis (Hannibal et al., 2016). This highlights the issue with assessing 3D structural data regarding CytC: the protein structure is in fact variable, and the ConSurf structure is solely a representation of CytC.

The central region is marked by a CXXCH sequence motif; however, this is not the case for all species, and the motif shows more variation than is suggested by the images in figure 3 C and F, which show a highly conserved nature. Members of the phylum Euglenozoa, which includes Trypanosoma species, contain mitochondrial CytC with a haem bound by only one thioether bond. This phylum has a sequence motif, (A/F)XXCH, that differs from that of eukaryotes in general.

Despite the difference in the sequence motif and in the haem attachment, an X-ray crystal structure of CytC from the trypanosomatid Crithidia fasciculata showed that the overall structure remained similar to that of the general mitochondrial CytC utilising the CXXCH motif. This similarity also includes the stereochemistry of the covalently bound haem attachment. However, despite these similarities in conformation, S. cerevisiae’s CytC haem lyase is unable to mature the trypanosomatid CytC (Fülöp et al., 2009).

As stated by Richard-Fogal et al. (2007), eukaryotic CytC gains the majority of the spotlight, which has resulted in very little research into the structures of prokaryotic CytC. As this study has suggested, the structure of prokaryotic CytC is not too dissimilar to that of eukaryotic CytC, as seen in figure 3.

Although research regarding the 3D structure of prokaryotic CytC is scarce, complex IV, which accepts electrons from CytC when bound, shows remarkable similarities between the prokaryotic and the mitochondrial enzyme. The crystal structures, specifically the core subunits I, II and III, look identical at the atomic level (Michel et al., 1998). Identity can be higher than 99% (99.73%), as seen for complex IV subunit II between proteobacterium and P. aeruginosa. The difficulty in studying the structure of prokaryotic CytC is down to the variation seen across prokaryotes, which has been described previously as vast variation in sizes and functions (Kranz et al., 2009). This is seen within this research in the multiple sequence alignment in figure 4B, which shows differences in size, gaps and only a partial CXXCH alignment. It is potentially because of this variation that no consensus has been reached on overall prokaryotic CytC structure.

This research was able to show that the outer residues of CytC in both eukaryotes and prokaryotes are less conserved than the integral central portion of the protein. However, early X-ray structure analysis showed conserved hydrophobic and aromatic side chains, and glycine. This was observed in over 30 species. It also showed that one portion of the molecule's surface is conserved, although the individual acidic side chains appear not to be, with positive charges localised around two hydrophobic channels that lead from the centre to the surface. These channels can potentially be seen in figure 3 B and E, where the haem group is shown as sticks at the centre of the protein. Therefore, it is possible that these hydrophobic channels are utilised to transfer electrons from the haem group to complex IV (Dickerson, 1971). These unusual conserved surface features of CytC, identified as early as 1971, are potentially down to the interactions with complex III and IV, which have been shown to have similar substitution rates and peaks that correlate with one another (Grossman et al., 2004). Individual acidic side chain variations, while not affecting the overall function of the protein, are merely conservative missense mutations which over time become established in a given population. It is these mutations that are witnessed in figure 3 B and E, showing areas of conserved residues. However, unlike complex IV, complex III has a generally lower percentage identity of approximately 50% between prokaryotes and eukaryotes. This indicates greater variation in complex III than in complex IV and therefore, as indicated by Grossman et al. (2004), that the evolutionary changes in CytC are linked to complex IV instead of the more varied complex III.

The major limitation of using ConSurf to examine the structure of CytC is that the software identifies other CytC structures based on 150 homologs with a maximum percentage identity between sequences of 95% and a minimum of 35%. This results in a selection process that is biased towards identifying similar CytC molecules. This is not a major issue for eukaryotes, as figure 4A shows these are highly similar sequences and structures, and even the differences in the phylum Euglenozoa are not major. However, within prokaryotes this does not represent a fair selection of the diverse sequences and structures. Nevertheless, this method is simple and easily accessible, allowing for the quick analysis of CytC to ensure this protein is evolutionarily conserved.
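The effect of an identity window on homolog selection can be illustrated with a minimal Python sketch. It is not ConSurf's implementation: the function names, the toy sequences and the simplification of applying both the 35% and 95% thresholds against the query only are assumptions made for illustration, and ConSurf's own definition of percentage identity may differ.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned, equal-length sequences.

    Columns where both sequences have a gap are ignored (an assumption;
    other definitions of identity exist).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y and x != gap)
    return 100.0 * matches / len(compared)


def select_homologs(query, candidates, low=35.0, high=95.0, limit=150):
    """Keep candidates whose identity to the query lies within [low, high]."""
    kept = [(name, seq) for name, seq in candidates.items()
            if low <= percent_identity(query, seq) <= high]
    return dict(kept[:limit])


# Hypothetical toy sequences, not taken from the thesis data.
query = "GDVEKGKKIF"
candidates = {
    "identical": "GDVEKGKKIF",  # 100% identity, above the window
    "mid": "GDVAKGKRIF",        # 80% identity, kept
    "half": "GDVEKAAAAA",       # 50% identity, kept
    "far": "ADLAQGKRLY",        # 30% identity, below the window
}
selected = select_homologs(query, candidates)
print(sorted(selected))
```

In this toy case the most divergent sequence is discarded, which mirrors the bias described above: a window that suits the similar eukaryotic sequences can exclude the more diverse prokaryotic ones.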

4.3 Cytochrome C Sequence Variations

By utilising an entropy plot to identify variation within multiple sequence alignments, where entropy increases as variation increases, it is possible to identify which amino acids within CytC are conserved and, conversely, which have an increased likelihood of mutation. This method has been utilised to identify such variations in otospiralin (Torktaz, Behjati and Rostami, 2016).
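The principle behind an entropy plot can be shown with a minimal Python sketch. It assumes Shannon entropy measured in bits and treats every symbol, including gaps, as a distinct character; the entropy definition and logarithm base behind figure 5 are not stated here, so these choices, the function names and the toy alignment are illustrative assumptions only.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy of every column of a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must share one length")
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment, not taken from figure 4A.
toy_alignment = ["GDVEK", "GDVAK", "GDIEK", "GDVEK"]
profile = alignment_entropy(toy_alignment)

# Columns with entropy above zero contain more than one residue (1-based).
variable_positions = [i + 1 for i, h in enumerate(profile) if h > 0]
print(variable_positions)
```

A fully conserved column has an entropy of zero, and the value rises as more residues share the column more evenly, which is why peaks in the plot mark variable positions.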

Utilising entropy to identify peaks in variation of CytC appears not to have been done except in this study, with other phylogenetic analyses relying on multiple sequence alignments to identify variation in the amino acid sequence manually. Such work has been shown by Keya and Priya (2016) and Hofmann (2017). Hofmann focussed solely on 14 amino acid positions (1, 3, 11, 12, 15, 21, 44, 46, 50, 58, 83, 85, 89 and 96) and was able to identify differences at all of these positions between 9 eukaryotic species. Keya and Priya (2016) focused on the entire 105 amino acids of 4 eukaryotic species and identified differences at amino acid positions 12, 13, 16, 47, 51, 59, 61, 63, 84, 89, 90, 93 and 104. Therefore, both studies showed variation in positions 12 and 89, although it is likely that further variation in CytC was not identified by Hofmann (2017) due to focusing solely on 14 amino acid positions.

Utilising the information gained from figure 5A, taken from 20 eukaryotic species, variation is seen in positions 5-10, 12-14, 19, 21-22, 25, 32, 54, 60, 64, 70, 72, 75-76, 93, 98-99, 102, 110 and 113-114 of the sequence alignment. These were converted into the amino acid positions of sequences 105 amino acids in length, such as that of H. sapiens, as utilised in the other studies (Hofmann, 2017; Keya and Priya, 2016). This excluded alignment positions 5-10 from this research, as these are variations seen in longer CytC sequences, and therefore the variable positions taken from figure 5A correspond to amino acid positions 3-5, 10, 12, 13, 16, 23, 45, 51, 55, 61, 63, 66-67, 84, 89, 90, 93, 101, 104 and 105. A direct comparison of all three studies identified variation in positions 12 and 89, while this research and Keya and Priya (2016) additionally shared positions 13, 16, 51, 61, 63, 84, 90, 93 and 104. Hofmann (2017) and this research additionally matched only at position 3.
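The comparison of the three position lists can be reproduced with simple set operations. The following Python sketch uses only the amino acid positions reported above; the helper name `expand` and the variable names are hypothetical and chosen for illustration.

```python
def expand(spec):
    """Expand a string such as "3-5, 10" into a set of integer positions."""
    positions = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            positions.update(range(int(start), int(end) + 1))
        else:
            positions.add(int(part))
    return positions


# Variable amino acid positions as listed in the text above.
hofmann = expand("1, 3, 11, 12, 15, 21, 44, 46, 50, 58, 83, 85, 89, 96")
keya_priya = expand("12, 13, 16, 47, 51, 59, 61, 63, 84, 89, 90, 93, 104")
this_study = expand(
    "3-5, 10, 12, 13, 16, 23, 45, 51, 55, 61, 63, 66-67, "
    "84, 89, 90, 93, 101, 104, 105"
)

shared_by_all = sorted(hofmann & keya_priya & this_study)
shared_with_keya_priya = sorted(this_study & keya_priya)
shared_with_hofmann = sorted(this_study & hofmann)
unique_to_this_study = sorted(this_study - hofmann - keya_priya)

print(shared_by_all)
print(len(unique_to_this_study))
```

With these lists, positions 12 and 89 are shared by all three studies, and 10 positions are found only in this research.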

Although the use of entropy to identify positions of variation in CytC appears not to have been done prior to this, the results above, listing matching amino acid positions between all three studies, show that this method is able to detect the variation seen in the manual identification of differences in sequence alignments. These consistent findings across three studies of CytC from various taxa have highlighted positions at which this protein’s amino acid sequence is variable. However, this study identified 10 more positions of variation when compared to the other studies. It is likely that these extra positions were identified due to the increased number of sequences, as this research utilised 20 eukaryotic organisms, Hofmann (2017) utilised 10 eukaryotic organisms and Keya and Priya (2016) utilised only 4 eukaryotic organisms. Therefore, as a more varied array of organisms is included, such as arthropods and fungi, which the other studies never included, an increase in variation is expected due to earlier divergence times (Zou and Zhang, 2019).

Although entropy plots require less manual checking of sequence alignments, the downside is that they do not account for amino acid positions that, although variable, have conserved chemistry within their R groups. This can be seen in figure 4A, in which different amino acids are highlighted in the same colour to show conserved chemistry. This is important due to the already discussed areas of CytC where the individual acidic side chain is important and not the amino acid itself, allowing for mutations that alter the amino acid sequence (Dickerson, 1971).

With regards to prokaryotic variation, the entropy graph seen in figure 5B highlights large areas of variation as follows: 1-27, 60-74, 93-98, 100-111, 113-123, 125-141, 151-169 and 173-200. This matches what is known about the vast variation in prokaryotic CytC; however, due to the lack of research regarding prokaryotic CytC, this research is unable to compare to other studies. As the variation is so vast in prokaryotes, such a comparison would only support the theory that prokaryotic CytC is vastly variable in size, functions and therefore amino acid sequences (Kranz et al., 2009).

4.4 Mitochondrial and Nuclear Cytochrome C Variations

By utilising the different genetic codes available in MEGA, the variation between D. variabilis, with its CytC encoded on the mitochondrial genome, and its closest available relation on GenBank, R. appendiculatus, which is from the same family Ixodidae and has its CytC encoded on the nuclear genome, was investigated. This was also attempted with the mitochondrial CytC of the plant Conocephalum conicum; however, no close relation with an available CytC encoded by the nuclear genome could be identified using GenBank. This was undertaken to identify reasons why, against the ‘standard’ of CytC being nuclear encoded, CytC is encoded on the mitochondrial genome.

Due to the unavailability of a closely related species of C. conicum on online databases, the nucleotide sequence could not be compared. Using MEGA, the genetic code can be changed for the translation of the nucleotide sequence, to identify whether changes in the coding may in fact result in non-functional proteins via amino acid replacements. However, this was not done for C. conicum because of initial work done in 1990 which states that the mitochondria and chloroplasts of green plants utilise the universal genetic code without any changes (Jukes and Osawa, 1990).

However, recently the mitochondrial genomes of green plants have begun to gain attention because of their complex genome organization and structural diversity across clades that correlate with major differences in evolutionary rates (Noutahi et al., 2019). Liverworts, or Marchantiophyta, such as C. conicum are thought of as pioneer plants that first colonised land and are located between and higher plants (Koselski, Trebacz and Dziubinska, 2019). When combined with research on vascular plant mitochondrial genomes, it is noted that they contain a variable set of introns, with liverworts containing between 23 and 30, with an average of 28 introns. All major lineages of land plants contain a unique number of introns and no single intron is shared between all land plant mitochondrial genomes; therefore, intron content is conserved within but not among lineages. This suggests that intron gains and losses occurred during the early evolution of land plants, while conservative evolution was able to maintain intron content within lineages (Slipiko et al., 2017). The mitochondrial genomes of land plants also contain numerous repeated sequences, which include large repeats (>1,000bp), medium sized repeats (100-1,000bp) and small repeats (50-100bp), and as the repeats decrease in size from ‘large’ to ‘small’ they appear to increase in number within the mitochondrial genome. The increased repeat length may facilitate intragenomic recombination and therefore account for the fluid genomic structure of vascular plant mitochondrial genomes (Mower, Sloan and Alverson, 2012).

It has been identified that the mitochondrial genome of liverworts contains an intermediate amount of repeated sequences of intermediate sizes, which may therefore allow intragenomic recombination (Dong et al., 2019).

Despite the intragenomic recombination that the liverwort mitochondrial genome is capable of undertaking, it remains structurally stable, similar to liverwort plastomes. It is believed that this is shaped by nuclear encoded DSB repair proteins that suppress ectopic recombination across small direct repeats within the mitochondria. During phylogenetic analysis, four DSB repair gene families were identified, RecA, RecG, RecX and OSB, which show notable liverwort-specific subfamily expansions. This suggests an unusual evolutionary pattern of DNA repair mechanisms in liverworts; however, the subcellular location of these proteins yielded ambiguous results when using TargetP, therefore it cannot be ruled out that the liverwort-specific expansions may contribute to the mitochondrial stability of liverworts. Selective pressures might have driven the divergence of the DSB repair gene families, which may exhibit a wider range of functions that have helped support the high structural integrity of the mitochondrial genome (Dong et al., 2019).

The conservative evolution of the liverwort mitochondrial genome and potentially liverwort-specific DSB repair proteins may explain the low recombination levels. Coupled with this, EGT is often associated with increasing the stability of the symbiosis (Petitjean and Williams, 2017). However, with the increased stability of the liverwort mitochondrial genome, EGT may in turn be less likely to occur, resulting in CytC remaining encoded on the mitochondrial genome. Due to the lack of liverwort mitochondrial genomes available, this is speculation and requires further sequencing and analysis of liverwort mitochondrial genomes.

Unlike the mitochondrial genome of liverworts and plants, the D. variabilis mitochondrial genome is representative of the ‘standard’ mitochondrial genome, with an average size of 14,633bp, and is much more influenced by gene rearrangements and the lengths of non-coding regions. Utilising MITOS online analysis, it has been shown that no and deletions are observed in tick mitochondrial genomes, which contain 13 protein coding genes, 2 rRNA and 22 tRNA genes (Wang et al., 2019). However, Wang et al. (2019) did not include the mitochondrial genome of D. variabilis in their sample of 63 species, none of which encoded CytC on their mitochondrial genome. Even more interestingly, the study sampled 4 closely related species of the same genus, Dermacentor. Therefore, it is plausible that D. variabilis may be the only tick species in which the mitochondrial genome encodes CytC, differing even from the other closely related species in the genus Dermacentor.

Firstly, the nucleotide sequences of D. variabilis and R. appendiculatus were aligned and compared to identify any differences, which identified a small number of nucleotide differences. This is not unexpected as they share the same family, Ixodidae, as Wang et al. (2019) showed via phylogenetic analysis. Utilising MEGA’s translation tool, the nucleotide CytC sequence from the mitochondrial genome was translated using both the standard nuclear genetic code and the invertebrate mitochondrial genetic code to identify whether differences may be the reason that CytC has remained on the mitochondrial genome via negative selection. This identified one amino acid change at position 101: the invertebrate mitochondrial genetic code encoded methionine while the standard nuclear genetic code encoded isoleucine. This can be seen in figure 6.
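How one nucleotide sequence can yield different residues under the two genetic codes can be sketched in Python without MEGA. The tables below follow NCBI translation table 1 (standard) and table 5 (invertebrate mitochondrial), which differ at the codons TGA, ATA, AGA and AGG. ATA is the codon that the two tables read as isoleucine and methionine respectively, but whether it is the codon at position 101 of the D. variabilis sequence is not stated here, so the fragment and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa
            for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Invertebrate mitochondrial code (NCBI table 5): four codons are reassigned.
INVERT_MITO = dict(STANDARD, TGA="W", ATA="M", AGA="S", AGG="S")


def translate(dna, table):
    """Translate a DNA string codon by codon; trailing partial codons are dropped.

    Codons containing ambiguous bases such as N raise a KeyError.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


def differing_positions(dna):
    """Return (position, standard residue, mitochondrial residue) where the codes disagree."""
    nuclear = translate(dna, STANDARD)
    mito = translate(dna, INVERT_MITO)
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(nuclear, mito)) if a != b]


# Hypothetical fragment (GGT GAT ATA AAA), not the D. variabilis sequence.
fragment = "GGTGATATAAAA"
print(translate(fragment, STANDARD))
print(translate(fragment, INVERT_MITO))
print(differing_positions(fragment))
```

Running the same comparison over a full coding sequence would list every position at which the choice of genetic code changes the residue, which is the comparison summarised in figure 6.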

This position, when compared to the entropy graph (figure 5A), corresponds to alignment position 105, which shows little variation. Both methionine and isoleucine are non-polar amino acids; however, methionine is known to be susceptible to oxidation by natural biological oxidants, which causes methionine sulfoxide to form, and this methionine is present on the outer surface of the protein. Such oxidation has been known to often reduce or eliminate biological activity. Although it remains to be elucidated why methionine oxidation alters the biological activity, the oxidation of proteins is known to destabilise the structure and is regarded as chemical mutagenesis. The methionine side chain is replaced with methionine sulfoxide, a larger and polar side chain, which unsurprisingly affects structure and function. A strategy often utilised by pharmaceutical companies is to replace methionine with other residues more resistant to oxidation, such as isoleucine (Kim et al., 2001). Therefore, it is plausible that this is a strategy that has arisen via the evolutionary process of positive selection through EGT of CytC, resulting in a protein more resistant to oxidative events.

One residue utilised to reduce oxidation events is isoleucine, because it is more resilient to oxidative damage as, unlike methionine, it does not contain sulphur (Kim et al., 2001). This amino acid is found when the sequence is translated with the standard genetic code and is conserved across the 20 eukaryotic species obtained for this research, as seen in figure 4A at position 105. This may indicate a form of positive selection in which the transfer of CytC to the nuclear genome helps reduce oxidation events in what is a highly oxidative environment, due to the presence of ROS and oxidative events of CytC which increase with age (Lebiedzinska et al., 2009; Pereverzev et al., 2003).

Although this suggests a reason why CytC has undergone EGT, it does not answer why it has remained encoded on the mitochondrial genome in D. variabilis.

The amino acid at position 101 is not currently recognised as a site that is regulated via phosphorylation to control CytC function; these sites are 28, 47, 48, 58 and 97. These sites show conservation across species, although the majority of research focuses on mammals (Kalpage et al., 2020). Therefore, it remains difficult to draw conclusions on why CytC remains encoded via mtDNA in D. variabilis, partly due to the lack of research into the EGT of CytC, as this protein is now nearly entirely encoded on the nuclear genome and only 3 species with mitochondrially encoded CytC are identified on GenBank: D. variabilis, C. conicum and partial sequences of Leishmania spp. With regards to ticks, systematic investigations of tick mitochondrial genomes remain rare, as only 63 mitochondrial genomes have been completely sequenced, which leaves approximately 93% unexplored (Wang et al., 2019).

4.5 Cytochrome C Phylogenetic Tree

Using the sequences seen in figure 4A, a maximum likelihood tree was constructed to infer the evolutionary history of CytC across 20 eukaryotic species, seen in figure 7, using R. marinus as an outgroup to root the tree.

The phylogenetic tree seen in figure 7 does not represent an entire tree of life but a representation of the super group Opisthokonta, which contains metazoans and fungi; although this study does not include heterotrophic protists, the super group does include several lines. This super group is one of the most studied kingdoms of life (del Campo and Ruiz-Trillo, 2013). Many molecular studies, including single gene phylogenies, and indels, have continued to support the existence of this group, making it one of the most reliable super groups (Liu et al., 2009).

This research’s maximum likelihood tree showed all expected clades grouping together, such as H. sapiens and P. abelii and both reptiles, with varying degrees of bootstrap support.

However, one major inconsistency is the position of the clade containing E. lucius and D. rerio, which share the same class of bony fish. Bony fish are normally represented as an outgroup of a Chordata phylogenetic tree. In figure 7, however, they occupy the position expected of the reptiles. These clades can actually be swapped to represent a classical view of a Chordata phylogenetic tree. This is reflected by the bootstrap value of 37 for the node placing a common ancestor between birds and fish, indicating low statistical support.
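How a bootstrap value such as 37 arises can be sketched in Python: alignment columns are resampled with replacement, the analysis is repeated on each pseudo-alignment, and the support is the percentage of replicates in which the grouping reappears. This is not the MEGA maximum likelihood procedure used in this research; the function `first_two_closest` is a hypothetical stand-in for rebuilding a tree and checking for a clade, and the toy sequences are invented.

```python
import random


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement to build one pseudo-alignment."""
    n = len(alignment[0])
    picks = [rng.randrange(n) for _ in range(n)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_support(alignment, clade_present, replicates=100, seed=0):
    """Percentage of replicates in which clade_present(pseudo_alignment) is True."""
    rng = random.Random(seed)
    hits = sum(bool(clade_present(resample_columns(alignment, rng)))
               for _ in range(replicates))
    return 100.0 * hits / replicates


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def first_two_closest(alignment):
    """Stand-in for a tree search: are sequences 0 and 1 each other's nearest?"""
    a, b, c = alignment[:3]
    return hamming(a, b) < min(hamming(a, c), hamming(b, c))


# Hypothetical toy alignment, not taken from figure 4A.
toy = ["GDVEKGKK", "GDVEKGKR", "ADLAQGRR"]
support = bootstrap_support(toy, first_two_closest, replicates=200)
print(support)
```

A grouping supported by only a few columns will be lost in many replicates, so a low value indicates that the data do not consistently support that node rather than that the node is necessarily wrong.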

Topological variation in phylogenetic trees constructed from single genes among eukaryotes is not uncommon. There are multiple factors that can cause genes to give different topologies; however, there are three basic sources of variation (Castresana, 2007). The first is an important natural variation between genes caused by the random probability of mutations. Short genes such as CytC are most affected by the random nature of mutations, and the mutations found in CytC across different species may not accurately reflect their true phylogeny. The second is the random retention of ancestral polymorphism in species that are diverging, referred to as lineage sorting. This is an important natural source of variation in phylogenetic studies of closely related species. Lastly, there can be phylogenetic reconstruction artefacts such as the saturation of substitutions, base-compositional bias and long branch attraction, in which rapidly evolving species, or species that diverged long ago and are highly mutated, begin to appear closely related to distantly related species. However, Bayesian phylogenetic methods have helped reduce such issues (Huerta-Cepas et al., 2007). There are also methodological problems related to homology in the selection of orthologs, which are homologous genes created by speciation and not by gene duplication. Gene duplication leads to paralogs, which can lead to misinterpretation of phylogenetic trees (Jeffroy et al., 2006; Rokas and Carroll, 2006).

Despite the issues with single gene phylogenies, Keya and Priya (2016) were able to recover the appropriate clades on their Chordata phylogenetic tree; this indicates that the three issues detailed above may not be the cause of the topological variation seen in this research. CytC has also been utilised as a model protein for molecular evolution since 1963 (Margoliash, 1963). Therefore, the variation may be due to an issue described by Hofmann (2017), who reported an unexpected phylogenetic tree for the divergence of apes and old-world monkeys due to variations in the molecular clock of CytC.

However, D. rerio has been found to have a similar molecular clock rate compared to and birds (Huang, 2008). There does not appear to be specific data on the molecular clock of E. lucius. Finally, the paralogs of CytC seen in bony fish may be the result of whole genome duplication, which occurred approximately 300 MYA after the divergence of ’s termed genome duplication. After whole genome duplication events, rapid gene loss normally occurs in the first 60 million years, with more than 70-80% of duplicated genes deleted in early mass deletion events and genomic rearrangements, followed by 250 million years of slower loss of individual genes (Inoue et al., 2015). This is backed up by the presence on GenBank of two somatic CytC genes, A (Gene ID: 100034500), utilised in this study, and B (Gene ID: 415158). Although no such duplications are identified for E. lucius, the duplicate may have been lost during later deletion events after divergence from a common ancestor. This system is continuously and independently reshaped by evolution, as gene duplication and loss is the driving force of evolution to provide an atlas of gene repertoires (Fernández and Gabaldón, 2020). However, the phylogenetic analysis by Keya and Priya (2016) contains CytC from T. thynnus, another bony fish, which was placed in an appropriate position on the phylogenetic tree, indicating that the gene duplication event early in bony fish evolution may not cause the topological variation seen in this study.

With the major issues described by Castresana (2007) discussed and none identified as the cause of the topological variation seen in this study, the reason for the position of these CytC sequences remains unknown and requires a more detailed investigation to identify whether it reflects a long branch attraction artefact of CytC.

As described above, there are issues in using single gene phylogenetic trees, although most studies utilise the small subunit (SSU) rRNA and not functional proteins. The use of SSU rRNA is the gold standard for microbial identification and the elucidation of deep evolutionary relationships. Occurring in all cells and organelles, the rRNA genes are large and conservative, with eukaryotic and prokaryotic SSU rRNA sharing approximately 50% identity over the alignable regions. The rRNA genes have also not undergone lateral gene transfer, their structural properties allow for optimized alignments, and they can be obtained through various database sources (Pace, 2009).

However, there are limitations to utilising SSU rRNA for classification and the construction of phylogenetic trees, arising from the same conserved nature that allows for the identification of deep phylogenies. Because few changes accumulate in the sequence, it is unable to discriminate between close relatives of a species. This is seen in the use of rRNA, which does not reliably separate H. sapiens and M. musculus; therefore, a gene less conserved than rRNA is required, such as CytC, which has shown the capability to separate organisms in this study and in Keya and Priya (2016). However, other information can be taken from close similarities in genes used in phylogenetic studies, as they can indicate that different organisms are similar at the cellular level, in cell structure and in metabolism. This may indicate that the CytC seen in D. rerio and E. lucius is more similar to that of Corvus cornix cornix and G. gallus than to that of the expected Python bivittatus and Crotalus horridus. However, research on mitochondria indicates nothing unusually different regarding the mitochondrial genome or CytC, which in fact closely resembles H. sapiens CytC (Ambler and Daniel, 1991; Yan, Li and Zhou, 2008).

An issue with using SSU rRNA for resolving phylogenetic relationships is the available information, which in SSU rRNA is limited to approximately 1,000 characters. The amount of information, or number of characters, utilised in phylogenetic analysis is key as it influences the accuracy of the results: the more information within the data set, the better. This is a major issue when using the amino acid sequence of CytC, which contains only approximately 105 characters; therefore, the construction of a more accurate tree requires more information.
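The effect of character count can be made concrete with a small sketch. The Python below is illustrative only: the function name `alignment_summary` and the three toy sequences are assumptions made for this example and are not CytC data from this study. It counts the total columns in an alignment and how many of them vary between taxa, since only varying columns can carry phylogenetic signal.

```python
def alignment_summary(aligned):
    """Return (total_columns, variable_columns) for equal-length aligned sequences.

    `aligned` maps a taxon label to its aligned sequence. Gap characters ('-')
    are ignored when deciding whether a column varies.
    """
    sequences = list(aligned.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")

    variable = 0
    for column in zip(*sequences):
        residues = {residue for residue in column if residue != "-"}
        if len(residues) > 1:
            variable += 1
    return length, variable


# Hypothetical toy alignment (placeholder residues, not real CytC sequences).
toy_alignment = {
    "taxon_A": "MGDVEKGKKI",
    "taxon_B": "MGDVEKGKKI",
    "taxon_C": "MGDAEKGKKL",
}

total, variable = alignment_summary(toy_alignment)
print(f"{variable} of {total} columns vary between taxa")
```

With a sequence of roughly 105 characters, the number of varying columns available to resolve a branch can be very small, which is the limitation described above.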

One way to include more information in phylogenetic studies is to include more than one gene, an example of which would be to include the large subunit (LSU) rRNA gene, normally twice the size of SSU rRNA. However, databases contain only a small proportion of LSU rRNA sequences when compared to the vast array of SSU rRNA sequences available. Moreover, the limited diversity of LSU rRNA sequences hinders the construction of large-scale phylogenetic trees covering multiple taxa (Pace, 2009).

A possible technique to increase the available information for the construction of phylogenetic trees is to use a series of interconnected sequences or combinations of multiple genes. This approach is best utilised for the resolution of branches within trees, such as distinguishing the true position of fish and reptiles seen in figure 7. However, this method of using concatenated gene sets to elucidate deep-branching evolution is considerably uncertain due to the low accuracy of sequence alignments, as the potential for the inclusion of highly variable or nonhomologous genes is high. With the sequence diversity limited, the method is only utilised in a three-domain tree, outlined with rRNA sequences to resolve branching, but it often shows a lack of consistency within the domains (Ciccarelli et al., 2006).
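The idea of combining multiple genes can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the method used by Ciccarelli et al. (2006): the function `concatenate_alignments`, the taxon labels and the short nucleotide strings are all hypothetical, and each gene is assumed to have been aligned separately beforehand. Taxa missing from a gene are padded with gap characters so that every row of the combined matrix has the same length.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments end to end into one combined alignment.

    `gene_alignments` is a list of dicts, each mapping taxon label to that
    taxon's aligned sequence for one gene. A taxon absent from a gene is
    filled with '-' for the full width of that gene.
    """
    taxa = sorted({taxon for alignment in gene_alignments for taxon in alignment})
    combined = {taxon: "" for taxon in taxa}

    for alignment in gene_alignments:
        widths = {len(sequence) for sequence in alignment.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one gene alignment must be equal length")
        width = widths.pop()
        for taxon in taxa:
            combined[taxon] += alignment.get(taxon, "-" * width)
    return combined


# Hypothetical toy data: two short, already aligned genes.
gene_1 = {"taxon_A": "ACGT", "taxon_B": "ACGA"}
gene_2 = {"taxon_A": "TTG", "taxon_B": "TTC", "taxon_C": "TAG"}

combined = concatenate_alignments([gene_1, gene_2])
for taxon, sequence in combined.items():
    print(taxon, sequence)
```

The padding step also shows where the uncertainty described above enters: every gene added increases the character count, but poorly aligned or nonhomologous genes add noise to the same matrix.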

However, there has recently been a shift away from the above-mentioned methods towards phylogenomics, which reconstructs the evolutionary histories of organisms by utilising whole genomes or large fractions of genomes. This, coupled with the growing abundance of genomic information becoming readily available on a variety of organisms, has allowed phylogenomic studies to infer evolutionary relationships. In turn, this has driven the development of computer programmes to handle such data sets. These studies are normally limited to the construction of phylogenetic trees of organisms with smaller genomes such as bacteria and (Patané, Martins and Setubal, 2018).

4.6 Cytochrome C Syntenic Analysis

Due to the issues in the maximum likelihood tree (figure 7) discussed above regarding the placement of both the fish and reptile clades, the tree is not a true representation of the evolutionary history of the species or of the evolutionary path CytC has taken. Despite this, syntenic analysis can identify syntenic regions of nuclear DNA within 200,000 bases 3’ and 5’ of the CytC gene, a window chosen due to the vast non-coding sequences between the genes located around CytC. However, due to the restricted nature of the CoGe database, not all sequences utilised for the creation of the maximum likelihood tree were available; therefore, other organisms available from the CoGe database were used that matched the organisms at the of the clade, as explained in section 2.2.

Prior to the syntenic analysis, table 3 highlighted that the genes flanking CytC were conserved across taxa in the species utilised in this study, with genes such as NPVF, OSBPL3 and even an uncharacterised ORF (C7orf31) remaining conserved. On its own, this is not enough to identify syntenic regions, as it only accounts for genes, whereas non-coding sequences of DNA may also be conserved between different species. Figure 8 A-F shows diagrammatic representations of the different degrees of synteny between these regions of DNA.

From the information in table 4 and the syntenic analysis in figure 8 A-F, it is possible to determine that the EGT event of CytC happened prior to 312 MYA. This suggests that CytC was present on the nuclear genome upon the emergence of tetrapods, a superclass of four-limbed vertebrates that includes all reptiles, mammals, birds and amphibians, approximately 346 - 358 MYA. This is strengthened by the number of studies utilised in the creation of the estimated divergence time (Kumar et al., 2017). The syntenic analyses between H. sapiens and G. gallus (figure 8D) and between A. mississippiensis and H. sapiens (figure 8E) show multiple syntenic areas within the 200,000 bases 3’ and 5’ of the CytC gene, despite 312 million years of divergence. The syntenic areas increase as the time since divergence decreases, as seen in results section 3.7, table 4, which suggests that CytC did in fact undergo EGT prior to 312 MYA. However, this appears to be the only study identifying a possible timeframe for the EGT of CytC, or even of other originally mitochondrially encoded proteins. The differences witnessed in syntenic regions are due to changes in the vast non-coding regions between genes undergoing evolutionary mutations.

As already mentioned (section 1.1.4), EGT normally inserts into specific regions of DNA following five general rules, and these insertions tend to be near retrotransposons (Dayama et al., 2014; Tsuji et al., 2012). Retrotransposons have an impact on genome evolution due to their accumulation and activity in the genome over tens of millions of years. Specifically, LINE-1, Alu and SVA elements have tremendous effects on evolution in terms of structure and function, and have been heavily researched in primate genomes. To assess the impact these retrotransposons can have on evolution, the rate at which they insert themselves into different regions of the genome must first be considered: for both Alu and LINE-1 this is estimated at approximately one insertion in every 20 births in H. sapiens. SVA retrotransposition, however, appears to be much less frequent, occurring approximately once in every 900 births (Kano et al., 2009; Xing et al., 2009).

All three elements, LINE-1, Alu and SVA, can generate genomic rearrangements and create structural variation via deletions, duplications and inversions. The insertion of these elements can result in the subsequent deletion of adjacent genomic nucleotide sequences, ranging from 1 bp to >130 kb. These deletions occur via endonuclease-dependent and endonuclease-independent mechanisms (Gilbert et al., 2005). Both LINE-1 and Alu elements can also create genomic variation through recombination between non-allelic homologous elements, including elements that have been inserted within the genome for a long time. This process, defined as ectopic recombination, can also result in deletions, duplications and inversions of genetic material. Finally, the transduction of flanking sequences, either upstream or downstream, can occur in LINE-1 and SVA elements.

In 3’ transduction, the RNA transcription machinery misses the weak retrotransposon polyadenylation signal and therefore continues until it reaches an alternative polyadenylation signal downstream. 5’ transduction is similar: a promoter located upstream of a retrotransposon is utilised to begin transcription of the sequence. The transcript is then integrated back into the DNA via retrotransposition (Cordaux and Batzer, 2009; Hancks et al., 2009).

Given these factors affecting genomic stability, which can alter evolution, coupled with the other EGT rules such as insertion into large open chromatin regions (Dayama et al., 2014; Tsuji et al., 2012), it is plausible to expect these retrotransposons to have caused the vast differences in the DNA between syntenic regions of species that diverged 312 million years ago.

As previously discussed in section 4.5, the maximum likelihood tree constructed showed that H. sapiens and birds such as G. gallus shared a more recent common ancestor than H. sapiens and reptiles such as A. mississippiensis. However, the results from table 4 indicated that H. sapiens, birds and reptiles shared a common ancestor 312 MYA. This is more in line with the standard phylogenetic trees. Furthermore, when the divergence of H. sapiens and D. rerio is investigated using TimeTree, the divergence is shown to be approximately 435 MYA, taken from 32 studies (Kumar et al., 2017), thus putting the divergence of bony fish much later than suggested by the maximum likelihood tree constructed in this research.

This research has suggested that the EGT of CytC occurred prior to 312 MYA, before the last common ancestor of H. sapiens, birds and reptiles; however, D. variabilis is shown to still encode CytC on the mitochondrial genome. This suggests that the EGT should not have happened prior to the last common ancestor of H. sapiens and ticks, as this would result in CytC being present on the nuclear genome of all later-diverging species. Utilising TimeTree, the estimated divergence time between H. sapiens and R. appendiculatus is 797 MYA, taken from 31 studies (Kumar et al., 2017).

From this it can be estimated that the first CytC EGT occurred after 797 MYA and prior to 312 MYA.
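The bracketing logic behind this estimate can be written out as a short sketch. The Python below is illustrative only; the function name `egt_window` and its arguments are assumptions introduced here, and the two values are the TimeTree divergence estimates quoted above (312 MYA and 797 MYA). Lineages that share nuclear encoded CytC with H. sapiens give the most recent possible date, and lineages that still encode CytC on the mitochondrial genome give the oldest possible date.

```python
def egt_window(nuclear_divergences_mya, mitochondrial_divergences_mya):
    """Return (oldest_bound, youngest_bound) in MYA for a single EGT event.

    nuclear_divergences_mya: divergence times from the reference species for
        lineages that share the nuclear encoded gene; the transfer must
        predate the oldest of these.
    mitochondrial_divergences_mya: divergence times for lineages that still
        encode the gene on the mitochondrial genome; the transfer must
        postdate the most recent of these.
    """
    youngest_bound = max(nuclear_divergences_mya)
    oldest_bound = min(mitochondrial_divergences_mya)
    if oldest_bound <= youngest_bound:
        # The observations cannot be explained by one transfer event.
        raise ValueError("bounds are inconsistent with a single EGT event")
    return oldest_bound, youngest_bound


# Values quoted in the text: H. sapiens vs birds and reptiles (312 MYA, nuclear
# CytC with synteny) and H. sapiens vs ticks (797 MYA, mitochondrial CytC in
# D. variabilis).
oldest, youngest = egt_window([312], [797])
print(f"EGT estimated after {oldest} MYA and prior to {youngest} MYA")
```

The error branch corresponds to the situation discussed below, where conflicting observations would imply more than one transfer event rather than a single window.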

However, as D. variabilis still has mitochondrially encoded CytC while that of R. appendiculatus is nuclear encoded, this suggests that in tick lineages EGT must have occurred separately, and therefore more than once overall. This is not uncommon, as research suggests other mitochondrial proteins are encoded on both nuclear and mitochondrial genomes, such as the mitochondrial complex II subunits SDH3 and SDH4, fully reviewed in Roger, Muñoz-Gómez and Kamikawa (2017).

Mitochondrial genome reduction coupled with EGT has occurred since the last common ancestor of the eukaryotes, and the largest mitochondrial genome belongs to a jakobid, which can encode 66 protein genes (Roger, Muñoz-Gómez and Kamikawa, 2017). However, the R. americana mitochondrial genome available on GenBank (Reference Number NC_001823.1) contains no CytC, even when a BLAST search is run to identify possible homologs of other CytC nucleotide sequences. This suggests that CytC is already encoded on the nuclear genome; however, the nuclear genome is yet to be sequenced, so this cannot be confirmed. If this is true, it pushes the EGT date back before 1660 MYA, based on 8 studies. This conflicts with the D. variabilis CytC positioning: if correct, all metazoans and fungi, which comprise the supergroup Opisthokonta that emerged approximately 749 - 1461 MYA, would encode CytC on the nuclear genome (Kumar et al., 2017). This would make the mitochondrially encoded CytC of D. variabilis an anomaly.

C. conicum, however, is part of a different supergroup, which emerged approximately 773 - 1174 MYA, and may therefore have undergone a separate EGT event, predating the ‘Cambrian explosion’ approximately 540 MYA (Zhuravlev and Wood, 2018). This implies that either the lineage of ticks had already diverged prior to this point, which is backed up by table 3 showing different flanking genes compared to H. sapiens, B. taurus etc. (although other eukaryotic lineages are considered to have subsets of the R. americana mitochondrial genome (Roger, Muñoz-Gómez and Kamikawa, 2017)), or the original research stating that CytC is encoded on the mitochondrial genome of D. variabilis requires confirmation. C. conicum was not discussed in this context due to the unavailability of CytC sequences from closely related species with which to identify where they are encoded, and due to its mitochondrial stability already discussed in section 4.4.

As mentioned previously in this study and discussed by Pace (2009), the ability to completely elucidate the number of times EGT of CytC has occurred was hindered by the limited diversity of sequences in databases, specifically CoGe. Being able to identify regions of synteny between a wider range of species, such as H. sapiens and other would aid in the identification of a shared EGT event. However, identifying syntenic regions is a form of comparative genetics without numerical answers and is based on judgement.

4.7 Limitations of the Study

The main limitation of this study is that the majority of databases, and therefore the results in other research papers, are heavily focused on . This limits the research to only a small clade of eukaryotes, ignoring the vast diversity seen across eukaryotes and limiting the impact of this research. This research suggests that the EGT of CytC occurred prior to 312 MYA; however, with a more diverse dataset that is more inclusive of arthropods, fungi and protists, it would potentially be possible to move this date back. As only a small section of eukaryotes is included in the research, the estimate is conservative, whereas its true value requires the inclusion of early-diverging species. This would allow a more in-depth study and potentially move the date of EGT of CytC back; as already indicated, if R. americana does in fact encode CytC on its nuclear genome, the date could predate 1660 MYA.

This limited diversity of genetic information in public databases also impacts the investigation into the mitochondrially encoded CytC found in C. conicum and D. variabilis. Without the genetic information of closely related organisms, it becomes impossible to identify whether these two organisms are exceptions and anomalous results against the trend that CytC is generally encoded by the nuclear genome, or whether the family groups of these organisms have resisted EGT of one of the most essential mitochondrial proteins, based on its roles in ‘life and death’ scenarios for the cell. This resistance is potentially the case for C. conicum, due to the mitochondrial stability discussed in section 4.4. A third mitochondrially encoded CytC, that of Leishmania spp., was also excluded from the study as only a partial CytC sequence was available.

4.8 Further Work

Although this research appears to be the first attempt to map the EGT of the mitochondrial protein CytC, it does highlight that this method can potentially yield a date by which a gene must have undergone EGT. For further work regarding CytC, the building of a maximum likelihood tree should move from the sequence of CytC to a more standard approach using SSU and LSU rRNA. This will provide sufficient information that is conserved across eukaryotes but contains enough variation to identify clades. Alternatively, the growing trend in phylogenomics could be utilised; however, identifying large homologous sections of the genome may become difficult if a wide range of eukaryotic organisms is chosen. Utilising such techniques should overcome the misrepresented clades of bony fish and reptiles seen in figure 7 and help increase the significance of the bootstrapping analysis by creating more robust sequences. Once an accurate maximum likelihood tree has been created and evolutionary relationships can be inferred, each internal node should be systematically checked to estimate divergence times, aiding the identification of any potential errors in the phylogenetic tree. Once this has been achieved, the online tools at CoGe can be utilised to identify syntenic regions around CytC, thus either confirming or refuting the findings of this research by building upon a greater range of eukaryotic organisms. This method is not exclusive to CytC and can be utilised to investigate the EGT of other mitochondrial proteins, although this may be more difficult as other mitochondrial proteins are more widely distributed between the nuclear and mitochondrial genomes.

Secondly, work could be undertaken to confirm that the CytC of both C. conicum and D. variabilis is in fact encoded on the mitochondrial genome and, if this is confirmed, to sequence closely related organisms to identify whether the mitochondrial encoding of CytC is conserved across the genus. This would also help to increase the diversity seen within online genomic databases.

5.0 Conclusion

CytC is now nearly exclusively encoded by the nuclear genome, with only three known exceptions: C. conicum, D. variabilis and Leishmania spp. Because of this, it was hypothesised that the EGT of CytC occurred once in the Chordata lineage. These exceptions are not of the Chordata lineage and may in fact indicate other EGT events, resulting in the near widespread nuclear encoding of CytC. This research was able to identify that the EGT event for Chordata likely occurred prior to 312 MYA, before the divergence of mammals and reptiles. However, this research was constrained by the limited diversity of online genetic databases, and it is possible that CytC EGT occurred as early as 1660 MYA given the absence of CytC on the mitochondrial genome of R. americana, although this is speculation as the nuclear genome is yet to be sequenced. Due to the lack of other research focusing on the EGT of CytC, this appears to be the first time a time scale for such an event has been identified; because of this, this research is without support from other studies. However, the methods utilised have all been used in other phylogenetic studies. The approach has shown potential in assessing the EGT of CytC and can be used to elucidate more complex EGT events of mitochondrial proteins. Therefore, it has the potential to continue to broaden the understanding of the mystery that is mitochondrial evolution, and as this becomes understood it may help elucidate the origins of the mitochondria and its α-proteobacterium roots.

Lastly, this research has also highlighted the lack of diversity in the genetic information found in online databases across eukaryotes, which are heavily focused on Chordates. The limited availability of sequences from arthropods, fungi and protists, for example, limits the scope of such investigations. This can also be seen in prokaryotic genetic databases; as prokaryotes are the potential origin of mitochondria and the most diverse organisms, it is imperative that this diversity is increased to identify other prokaryotes that may share an ancestor with the origins of the mitochondria.

References

Allen, J. F., Raven, J. A., Allen, J. W. A., Daltrop, O., Stevens, J. M. and Ferguson, S. J. (2003) 'C-type

cytochromes: diverse structures and biogenesis systems pose evolutionary problems', Philosophical

Transactions of the Royal Society of London. Series B: Biological Sciences, 358(1429), pp. 255-266.

Almpanis, A., Swain, M., Gatherer, D. and McEwan, N. (2018) 'Correlation between bacterial G+C content,

genome size and the G+C content of associated and bacteriophages', Microb Genom, 4(4).

Altmann, R. (1894) Die Elementarorganismen und ihre Beziehungen den Zellen. Veit.

Altschul, S. F., Gish, W., , W., Myers, E. W. and Lipman, D. J. (1990) 'Basic local alignment search tool',

J Mol Biol, 215(3), pp. 403-10.

Ambler, R. P. (1991) 'Sequence variability in bacterial cytochromes c', Biochim Biophys Acta, 1058(1), pp.

42-7.

Ambler, R. P. and Daniel, M. (1991) 'Rattlesnake cytochrome c. A re-appraisal of the reported amino acid

sequence', The Biochemical journal, 274 ( Pt 3)(Pt 3), pp. 825-831.

Anderson, S., Bankier, A. T., Barrell, B. G., de Bruijn, M. H., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich,

D. P., , B. A., Sanger, F., Schreier, P. H., Smith, A. J., Staden, R. and Young, I. G. (1981) 'Sequence

and organization of the human mitochondrial genome', Nature, 290(5806), pp. 457-65.

Anderson, S., de Bruijn, M. H., Coulson, A. R., Eperon, I. C., Sanger, F. and Young, I. G. (1982) 'Complete

sequence of bovine mitochondrial DNA. Conserved features of the mammalian mitochondrial

genome', J Mol Biol, 156(4), pp. 683-717.

Barrell, B. G., Bankier, A. T. and Drouin, J. (1979) 'A different genetic code in human mitochondria', Nature,

282(5735), pp. 189-94.

Baum, D. A. and Baum, B. (2014) 'An inside-out origin for the eukaryotic cell', BMC Biol, 12, pp. 76.

Benda, C. (1898) 'Ueber die spermatogenese der vertebraten und höherer evertebraten, II. Theil: Die

histiogenese der spermien', Arch. Anat. Physiol, 73, pp. 393-398.

Bender, A., Hajieva, P. and Moosmann, B. (2008) 'Adaptive methionine accumulation in

respiratory chain complexes explains the use of a deviant genetic code in mitochondria',

Proceedings of the National Academy of Sciences of the United States of America, 105(43), pp.

16496-16501.

Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. and Wheeler, D. L. (2007) 'GenBank', Nucleic acids

research, 35(Database issue), pp. D21-5.

Berg, O. G. and Kurland, C. G. (2000) 'Why Mitochondrial Genes are Most Often Found in Nuclei', Molecular

Biology and Evolution, 17(6), pp. 951-961.

Bertini, I., Cavallaro, G. and Rosato, A. (2006) 'Cytochrome c: Occurrence and Functions', Chemical Reviews,

106(1), pp. 90-115.

Bibb, M. J., Van Etten, R. A., Wright, C. T., Walberg, M. W. and Clayton, D. A. (1981) 'Sequence and gene

organization of mouse mitochondrial DNA', Cell, 26(2 Pt 2), pp. 167-80.

Björkholm, P., Harish, A., Hagström, E., Ernst, A. M. and Andersson, S. G. E. (2015) 'Mitochondrial genomes

are retained by selective constraints on ', Proceedings of the National Academy of

Sciences of the United States of America, 112(33), pp. 10154-10161.

Brindefalk, B., Ettema, T. J. G., Viklund, J., Thollesson, M. and Andersson, S. G. E. (2011) 'A

phylometagenomic exploration of oceanic reveals mitochondrial relatives

unrelated to the SAR11 clade', PloS one, 6(9), pp. e24457-e24457.

Cao, G., Xing, J., Xiao, X., Liou, A. K. F., Gao, Y., Yin, X.-M., Clark, R. S. B., Graham, S. H. and Chen, J. (2007)

'Critical role of calpain I in mitochondrial release of apoptosis-inducing factor in ischemic neuronal

injury', Journal of , 27(35), pp. 9278-9293.

Castresana, J. (2007) 'Topological variation in single-gene phylogenetic trees', Genome biology, 8(6), pp.

216-216.

Chalmers, S. and Nicholls, D. G. (2003) 'The relationship between free and total calcium concentrations in

the matrix of liver and brain mitochondria', Journal of Biological Chemistry, 278(21), pp. 19062-

19070.

Ciccarelli, F. D., Doerks, T., von Mering, C., Creevey, C. J., Snel, B. and Bork, P. (2006) 'Toward automatic

reconstruction of a highly resolved tree of life', Science, 311(5765), pp. 1283-7.

Cordaux, R. and Batzer, M. A. (2009) 'The impact of retrotransposons on evolution', Nature

reviews. Genetics, 10(10), pp. 691-703.

Cottet-Rousselle, C., Ronot, X., Leverve, X. and Mayol, J.-F. (2011) 'Cytometric assessment of mitochondria

using fluorescent probes', Cytometry Part A, 79A(6), pp. 405-425.

Dayama, G., Emery, S. B., Kidd, J. M. and Mills, R. E. (2014) 'The genomic landscape of polymorphic human

nuclear mitochondrial insertions', Nucleic acids research, 42(20), pp. 12640-12649.

Degenhardt, K., Sundararajan, R., Lindsten, T., Thompson, C. and White, E. (2002) 'Bax and Bak

independently promote cytochrome C release from mitochondria', J Biol Chem, 277(16), pp. 14127-

34.

del Campo, J. and Ruiz-Trillo, I. (2013) 'Environmental survey -analysis reveals hidden diversity among

unicellular ', and evolution, 30(4), pp. 802-805.

Dickerson, R. E. (1971) 'The structure of cytochromec and the rates of molecular evolution', Journal of

Molecular Evolution, 1(1), pp. 26-45.

Dong, S., He, Q., Zhang, S., Wu, H., Goffinet, B. and Liu, Y. (2019) 'The mitochondrial genomes of Bazzania

tridens and Riccardia planiflora further confirm conservative evolution of mitogenomes in

liverworts', The Bryologist, 122(1), pp. 130-139.

Dorstyn, L., Read, S., Cakouros, D., Huh, J. R., Hay, B. A. and Kumar, S. (2002) 'The role of cytochrome c in

caspase activation in Drosophila cells', The Journal of , 156(6), pp. 1089-

1098.

Elliott, H. R., Samuels, D. C., Eden, J. A., Relton, C. L. and Chinnery, P. F. (2008) 'Pathogenic mitochondrial

DNA mutations are common in the general population', American journal of , 83(2),

pp. 254-260.

Esser, C., Ahmadinejad, N., Wiegand, C., Rotte, C., Sebastiani, F., Gelius-Dietrich, G., Henze, K., Kretschmann,

E., Richly, E., Leister, D., Bryant, D., Steel, M. A., Lockhart, P. J., Penny, D. and Martin, W. (2004) 'A

genome phylogeny for mitochondria among alpha-proteobacteria and a predominantly eubacterial

ancestry of yeast nuclear genes', Mol Biol Evol, 21(9), pp. 1643-60.

Feagin, J. E. (2000) 'Mitochondrial genome diversity in parasites', Int J Parasitol, 30(4), pp. 371-90.

Fernández, R. and Gabaldón, T. (2020) 'Gene gain and loss across the metazoan tree of life', Nature

& Evolution, 4(4), pp. 524-533.

Fitch, W. M. and Margoliash, E. (1967) 'Construction of Phylogenetic Trees', Science, 155(3760), pp. 279-

284.

Friedman, J. R. and Nunnari, J. (2014) 'Mitochondrial form and function', Nature, 505(7483), pp. 335-343.

Fülöp, V., Sam, K. A., Ferguson, S. J., Ginger, M. L. and Allen, J. W. (2009) 'Structure of a trypanosomatid

mitochondrial cytochrome c with attached via only one thioether bond and implications for

the substrate recognition requirements of heme lyase', Febs j, 276(10), pp. 2822-32.

Gammage, P. A. and Frezza, C. (2019) 'Mitochondrial DNA: the overlooked oncogenome?', BMC Biology,

17(1), pp. 53.

Georgiades, K. and Raoult, D. (2011) 'The rhizome of Reclinomonas americana, Homo sapiens,

humanus and Saccharomyces cerevisiae mitochondria', Biology direct, 6, pp. 55-55.

Gherman, A., Chen, P. E., Teslovich, T. M., Stankiewicz, P., Withers, M., Kashuk, C. S., Chakravarti, A., Lupski,

J. R., Cutler, D. J. and Katsanis, N. (2007) 'Population bottlenecks as a potential major shaping force

of human genome architecture', PLoS genetics, 3(7), pp. e119-e119.

Gilbert, N., Lutz, S., Morrish, T. A. and Moran, J. V. (2005) 'Multiple fates of L1 retrotransposition

intermediates in cultured human cells', Molecular and cellular biology, 25(17), pp. 7780-7795.

Giovannoni, S. J., Tripp, H. J., Givan, S., Podar, M., Vergin, K. L., Baptista, D., Bibbs, L., Eads, J., Richardson,

T. H., Noordewier, M., Rappé, M. S., Short, J. M., Carrington, J. C. and Mathur, E. J. (2005) 'Genome

streamlining in a cosmopolitan oceanic bacterium', Science, 309(5738), pp. 1242-5.

Glaser, F., Pupko, T., Paz, I., Bell, R. E., Bechor-Shental, D., Martz, E. and Ben-Tal, N. (2003) 'ConSurf:

Identification of Functional Regions in Proteins by Surface-Mapping of Phylogenetic Information',

Bioinformatics, 19(1), pp. 163-164.

Godini, R. and Fallahi, H. (2019) 'A brief overview of the concepts, methods and computational tools used

in phylogenetic tree construction and gene prediction', Meta Gene, 21, pp. 100586.

Goodman, M. (1985) 'Rates of molecular evolution: the hominoid slowdown', Bioessays, 3(1), pp. 9-14.

Gray, M. W. (2011) 'The incredible shrinking organelle', EMBO reports, 12(9), pp. 873-873.

Gray, M. W. (2012) 'Mitochondrial evolution', Cold Spring Harbor perspectives in biology, 4(9), pp. a011403-

a011403.

Gray, M. W. (2017) 'Lynn Margulis and the endosymbiont hypothesis: 50 years later', Molecular biology of

the cell, 28(10), pp. 1285-1287.

Gray, M. W., Lang, B. F., Cedergren, R., Golding, G. B., Lemieux, C., Sankoff, D., Turmel, M., Brossard, N.,

Delage, E., Littlejohn, T. G., Plante, I., Rioux, P., Saint-Louis, D., Zhu, Y. and Burger, G. (1998)

'Genome structure and gene content in protist mitochondrial ', Nucleic acids research, 26(4),

pp. 865-878.

Grossman, L. I., Wildman, D. E., Schmidt, T. R. and Goodman, M. (2004) 'Accelerated evolution of the

electron transport chain in anthropoid primates', Trends , 20(11), pp. 578-85.

Haag, S., Sloan, K. E., Ranjan, N., Warda, A. S., Kretschmer, J., Blessing, C., Hübner, B., Seikowski, J.,

Dennerlein, S., Rehling, P., Rodnina, M. V., Höbartner, C. and Bohnsack, M. T. (2016) 'NSUN3 and

ABH1 modify the wobble position of mt-tRNAMet to expand codon recognition in mitochondrial

translation', The EMBO journal, 35(19), pp. 2104-2119.

Haag-Liautard, C., Coffey, N., Houle, D., Lynch, M., Charlesworth, B. and Keightley, P. D. (2008) 'Direct

estimation of the mitochondrial DNA mutation rate in ', PLoS biology, 6(8),

pp. e204-e204.

Hahn, A. and Zuryn, S. (2019) 'Mitochondrial Genome (mtDNA) Mutations that Generate Reactive Oxygen

Species', (Basel, Switzerland), 8(9), pp. 392.

Hall, B. G. (2013) 'Building Phylogenetic Trees from Molecular Data with MEGA', Molecular Biology and

Evolution, 30(5), pp. 1229-1235.

Hall, T. A. 'BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows

95/98/NT'. 1999: [London]: Information Retrieval Ltd., c1979-c2000., 95-98.

Hancks, D. C., Ewing, A. D., Chen, J. E., Tokunaga, K. and Kazazian, H. H. (2009) '-trapping mediated by

the human retrotransposon SVA', Genome research, 19(11), pp. 1983-1991.

Hanekamp, T. and Thorsness, P. E. (1996) 'Inactivation of YME2/RNA12, which encodes an integral inner

mitochondrial membrane protein, causes increased escape of DNA from mitochondria to the

nucleus in Saccharomyces cerevisiae', Molecular and Cellular Biology, 16(6), pp. 2764.

Hannibal, L., Tomasina, F., Capdevila, D. A., Demicheli, V., Tórtora, V., Alvarez-Paggi, D., Jemmerson, R.,

Murgida, D. H. and Radi, R. (2016) 'Alternative Conformations of Cytochrome c: Structure, Function,

and Detection', Biochemistry, 55(3), pp. 407-428.

Herrmann, J. M. and Neupert, W. (2000) 'Protein transport into mitochondria', Current Opinion in

Microbiology, 3(2), pp. 210-214.

Herschy, B., Whicher, A., Camprubi, E., Watson, C., Dartnell, L., Ward, J., Evans, J. R. G. and Lane, N. (2014)

'An origin-of-life reactor to simulate alkaline hydrothermal vents', Journal of molecular evolution,

79(5-6), pp. 213-227.

Hiraoka, Y., Yamada, T., Goto, M., Das Gupta, T. K. and Chakrabarty, A. M. (2004) 'Modulation of mammalian

and death by prokaryotic and eukaryotic cytochrome c', Proc Natl Acad Sci U S A,

101(17), pp. 6427-32.

Hofmann, J. R. (2017) 'Rate variation during molecular evolution: and the cytochrome c

molecular clock', Evolution: Education and Outreach, 10(1), pp. 1.

Holmes, S. (2003) 'Bootstrapping Phylogenetic Trees: Theory and Methods', Statist. Sci., 18(2), pp. 241-255.

Huang, H. and Manton, K. G. (2004) 'The role of oxidative damage in mitochondria during aging: a review',

Front Biosci, 9, pp. 1100-1117.

Huang, S. (2008) 'The Genetic Equidistance Result of Molecular Evolution is Independent of Mutation Rates',

Journal of computer science and systems biology, 1, pp. 92-102.

Huelsenbeck, J. P., Bollback, J. P. and Levine, A. M. (2002) 'Inferring the root of a phylogenetic tree',

Systematic biology, 51(1), pp. 32-43.

Huerta-Cepas, J., Dopazo, H., Dopazo, J. and Gabaldón, T. (2007) 'The human phylome', Genome biology,

8(6), pp. R109-R109.

Hug, L. A., Stechmann, A. and Roger, A. J. (2010) 'Phylogenetic distributions and histories of proteins

involved in anaerobic pyruvate metabolism in eukaryotes', Mol Biol Evol, 27(2), pp. 311-24.

Hüttemann, M., Pecina, P., Rainbolt, M., Sanderson, T. H., Kagan, V. E., Samavati, L., Doan, J. W. and Lee, I.

(2011) 'The multiple functions of cytochrome c and their regulation in life and death decisions of

the mammalian cell: From respiration to apoptosis', Mitochondrion, 11(3), pp. 369-381.

Inoue, J., Sato, Y., Sinclair, R., Tsukamoto, K. and Nishida, M. (2015) 'Rapid genome reshaping by multiple-

gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling',

Proceedings of the National Academy of Sciences of the United States of America, 112(48), pp.

14918-14923.

Janouškovec, J., Tikhonenkov, D. V., Burki, F., Howe, A. T., Rohwer, F. L., Mylnikov, A. P. and Keeling, P. J.

(2017) 'A New Lineage of Eukaryotes Illuminates Early Mitochondrial Genome Reduction', Current

Biology, 27(23), pp. 3717-3724.e5.

Jeffroy, O., Brinkmann, H., Delsuc, F. and Philippe, H. (2006) 'Phylogenomics: the beginning of

incongruence?', TRENDS in Genetics, 22(4), pp. 225-231.

Joyce, G. F. (2007) 'Forty years of in vitro evolution', Angewandte Chemie International Edition, 46(34), pp.

6420-6436.

Jukes, T. H. and Osawa, S. (1990) 'The genetic code in mitochondria and chloroplasts', Experientia, 46(11),

pp. 1117-1126.

Kalpage, H. A., Wan, J., Morse, P. T., Zurek, M. P., Turner, A. A., Khobeir, A., Yazdi, N., Hakim, L., Liu, J.,

Vaishnav, A., Sanderson, T. H., Recanati, M.-A., Grossman, L. I., Lee, I., Edwards, B. F. P. and

Hüttemann, M. (2020) 'Cytochrome c phosphorylation: Control of mitochondrial electron transport

chain flux and apoptosis', The International Journal of Biochemistry & Cell Biology, 121, pp. 105704.

Kann, O. and Kovács, R. (2007) 'Mitochondria and neuronal activity', American Journal of Physiology-Cell

Physiology, 292(2), pp. C641-C657.

Kano, H., Godoy, I., Courtney, C., Vetter, M. R., Gerton, G. L., Ostertag, E. M. and Kazazian, H. H. (2009) 'L1

retrotransposition occurs mainly in embryogenesis and creates somatic mosaicism', Genes &

development, 23(11), pp. 1303-1312.

Keeling, P. J. and Palmer, J. D. (2008) 'Horizontal gene transfer in eukaryotic evolution', Nature Reviews

Genetics, 9(8), pp. 605-618.

Keilin, D. and Hardy, W. B. (1925) 'On cytochrome, a respiratory pigment, common to animals, yeast, and

higher plants', Proceedings of the Royal Society of London. Series B, Containing Papers of a Biological

Character, 98(690), pp. 312-339.

Kennedy, E. P. and Lehninger, A. L. (1949) 'Oxidation of fatty acids and tricarboxylic acid cycle intermediates

by isolated liver mitochondria', J Biol Chem, 179(2), pp. 957-72.

Keya, K. and Priya, S. (2016) 'A Study of Phylogenetic Relationships and Homology of Cytochrome C using

Bioinformatics'.

Kim, Y. H., Berry, A. H., Spencer, D. S. and Stites, W. E. (2001) 'Comparing the effect on protein stability of

methionine oxidation versus mutagenesis: steps toward engineering oxidative resistance in

proteins', Protein Engineering, Design and Selection, 14(5), pp. 343-347.

Kleine, T., Maier, U. G. and Leister, D. (2009) 'DNA transfer from organelles to the nucleus: the idiosyncratic

genetics of endosymbiosis', Annu Rev Plant Biol, 60, pp. 115-38.

Koonin, E. V. (2010) 'The origin and early evolution of eukaryotes in the light of phylogenomics', Genome

biology, 11(5), pp. 209-209.

Koonin, E. V. and Aravind, L. (2002) 'Origin and evolution of eukaryotic apoptosis: the bacterial connection',

Cell Death & Differentiation, 9(4), pp. 394-404.

Koselski, M., Trebacz, K. and Dziubinska, H. (2019) 'The role of vacuolar ion channels in salt stress tolerance

in the liverwort Conocephalum conicum', Acta Physiologiae Plantarum, 41(7), pp. 110.

Kranz, R. G., Richard-Fogal, C., Taylor, J.-S. and Frawley, E. R. (2009) 'Cytochrome c Biogenesis: Mechanisms

for Covalent Modifications and Trafficking of Heme and for Heme-Iron Redox Control',

Microbiology and Molecular Biology Reviews, 73(3), pp. 510.

Krippner, A., Matsuno-Yagi, A., Gottlieb, R. A. and Babior, B. M. (1996) 'Loss of function of cytochrome c in

Jurkat cells undergoing fas-mediated apoptosis', Journal of Biological Chemistry, 271(35), pp.

21629-21636.

Kumar, S., Stecher, G., Li, M., Knyaz, C. and Tamura, K. (2018) 'MEGA X: Molecular Evolutionary Genetics

Analysis across Computing Platforms', Molecular biology and evolution, 35.

Kumar, S., Stecher, G., Suleski, M. and Hedges, S. B. (2017) 'TimeTree: A Resource for Timelines, Timetrees,

and Divergence Times', Mol Biol Evol, 34(7), pp. 1812-1819.

Lane, N. (2015) 'The unseen World: Reflections on Leeuwenhoek (1677) ‘Concerning little animals’',

Philosophical Transactions of The Royal Society B Biological Sciences, 370.

Lane, N. and Martin, W. (2010) 'The energetics of genome complexity', Nature, 467(7318), pp. 929-934.

Lang, B. F., Burger, G., O'Kelly, C. J., Cedergren, R., Golding, G. B., Lemieux, C., Sankoff, D., Turmel, M. and

Gray, M. W. (1997) 'An ancestral mitochondrial DNA resembling a eubacterial genome in miniature',

Nature, 387(6632), pp. 493-7.

Le, S. Q. and Gascuel, O. (2008) 'An improved general amino acid replacement matrix', Molecular biology

and evolution, 25(7), pp. 1307-1320.

Lebiedzinska, M., Duszynski, J., Rizzuto, R., Pinton, P. and Wieckowski, M. R. (2009) 'Age-related changes in

levels of p66Shc and serine 36-phosphorylated p66Shc in organs and mouse tissues', Archives of

Biochemistry and Biophysics, 486(1), pp. 73-80.

Leister, D. (2005) 'Origin, evolution and genetic effects of nuclear insertions of organelle DNA', Trends

Genet, 21(12), pp. 655-63.

Lewis, K. (2000) 'Programmed death in bacteria', Microbiol Mol Biol Rev, 64(3), pp. 503-14.

Lindmark, D. G. and Müller, M. (1973) 'Hydrogenosome, a cytoplasmic organelle of the anaerobic flagellate

Tritrichomonas foetus, and its role in pyruvate metabolism', J Biol Chem, 248(22), pp. 7724-8.

Liu, X., Kim, C. N., Yang, J., Jemmerson, R. and Wang, X. (1996) 'Induction of apoptotic program in cell-free

extracts: requirement for dATP and cytochrome c', Cell, 86(1), pp. 147-157.

Liu, Y., Steenkamp, E. T., Brinkmann, H., Forget, L., Philippe, H. and Lang, B. F. (2009) 'Phylogenomic analyses

predict sistergroup relationship of nucleariids and fungi and paraphyly of zygomycetes with

significant support', BMC evolutionary biology, 9, pp. 272-272.

Liu, Z., Lin, H., Ye, S., Liu, Q.-y., Meng, Z., Zhang, C.-m., Xia, Y., Margoliash, E., Rao, Z. and Liu, X.-j. (2006)

'Remarkably high activities of testicular cytochrome c in destroying reactive oxygen species and in

triggering apoptosis', Proceedings of the National Academy of Sciences, 103(24), pp. 8965-8970.

Mai, N., Chrzanowska-Lightowlers, Z. M. A. and Lightowlers, R. N. (2017) 'The process of mammalian

mitochondrial protein synthesis', Cell and Tissue Research, 367(1), pp. 5-20.

Margoliash, E. (1963) 'PRIMARY STRUCTURE AND EVOLUTION OF CYTOCHROME C', Proceedings of the

National Academy of Sciences of the United States of America, 50(4), pp. 672-679.

Margulis, L. (1970) Origin of eukaryotic cells: Evidence and research implications for a theory of the origin

and evolution of microbial, plant and animal cells on the Precambrian Earth. Yale University Press.

Martin, W. and Herrmann, R. G. (1998) 'Gene Transfer from Organelles to the Nucleus: How Much, What

Happens, and Why?', Plant Physiology, 118(1), pp. 9.

Martin, W. and Müller, M. (1998) 'The hydrogen hypothesis for the first eukaryote', Nature, 392(6671), pp.

37-41.

Martin, W. F. and Mentel, M. (2010) 'The Origin of Mitochondria', Nature Education, 3(9), pp. 58.

McLean, J. R., Cohn, G. L., Brandt, I. K. and Simpson, M. V. (1958) 'Incorporation of labeled amino acids into

the protein of muscle and liver mitochondria', J Biol Chem, 233(3), pp. 657-63.

Mendell, J. E., Clements, K. D., Choat, J. H. and Angert, E. R. (2008) 'Extreme polyploidy in a large bacterium',

Proc Natl Acad Sci U S A, 105(18), pp. 6730-4.

Meyer, A., Todt, C., Mikkelsen, N. T. and Lieb, B. (2010) 'Fast evolving 18S rRNA sequences from

Solenogastres (Mollusca) resist standard PCR amplification and give new insights into mollusk

substitution rate heterogeneity', BMC Evol Biol, 10, pp. 70.

Michel, H., Behr, J., Harrenga, A. and Kannt, A. (1998) 'Cytochrome c oxidase: structure and spectroscopy',

Annu Rev Biophys Biomol Struct, 27, pp. 329-56.

Mower, J. P., Sloan, D. B. and Alverson, A. J. (2012) 'Plant mitochondrial genome diversity: the genomics

revolution', Plant Genome Diversity Volume 1: Springer, pp. 123-144.

Noutahi, E., Calderon, V., Blanchette, M., El-Mabrouk, N. and Lang, B. F. (2019) 'Rapid Genetic Code

Evolution in Green Algal Mitochondrial Genomes', Molecular biology and evolution, 36(4), pp. 766-783.

O'Rourke, B. (2010) 'From bioblasts to mitochondria: ever expanding roles of mitochondria in cell

physiology', Frontiers in physiology, 1, pp. 7-7.

Ow, Y.-L. P., Green, D. R., Hao, Z. and Mak, T. W. (2008) 'Cytochrome c: functions beyond respiration',

Nature Reviews Molecular Cell Biology, 9(7), pp. 532-542.

Pace, N. R. (2009) 'Mapping the tree of life: progress and prospects', Microbiology and molecular biology

reviews : MMBR, 73(4), pp. 565-576.

Pagliarini, D. J. and Rutter, J. (2013) 'Hallmarks of a new era in mitochondrial biochemistry', Genes &

development, 27(24), pp. 2615-2627.

Parasuraman, S. (2012) 'Protein data bank', J Pharmacol Pharmacother, 3(4), pp. 351-2.

Patané, J. S. L., Martins, J., Jr. and Setubal, J. C. (2018) 'Phylogenomics', Methods Mol Biol, 1704, pp. 103-187.

Pereverzev, M. O., Vygodina, T. V., Konstantinov, A. A. and Skulachev, V. P. (2003) 'Cytochrome c, an ideal

antioxidant'. Portland Press Ltd.

Petitjean, C. and Williams, T. A. (2017) 'Evolution: New Gene-Rich Mitochondria Found across the Eukaryotic

Tree', Current Biology, 27(23), pp. R1270-R1271.

Piel, D. A., Deutschman, C. S. and Levy, R. J. (2008) 'Exogenous cytochrome C restores myocardial

cytochrome oxidase activity into the late phase of sepsis', Shock (Augusta, Ga.), 29(5), pp. 612.

Piel, D. A., Gruber, P. J., Weinheimer, C. J., Courtois, M. R., Robertson, C. M., Coopersmith, C. M.,

Deutschman, C. S. and Levy, R. J. (2007) 'Mitochondrial resuscitation with exogenous cytochrome c

in the septic heart', Critical care medicine, 35(9), pp. 2120-2127.

Pierron, D., Wildman, D. E., Hüttemann, M., Letellier, T. and Grossman, L. I. (2012) 'Evolution of the couple

cytochrome c and cytochrome c oxidase in primates', Adv Exp Med Biol, 748, pp. 185-213.

Pisani, D., Cotton, J. A. and McInerney, J. O. (2007) 'Supertrees disentangle the chimerical origin of

eukaryotic genomes', Mol Biol Evol, 24(8), pp. 1752-60.

Rich, P. R. (2003) 'The molecular machinery of Keilin's respiratory chain', Biochemical Society Transactions,

31(6), pp. 1095-1105.

Richard-Fogal, C. L., Frawley, E. R., Feissner, R. E. and Kranz, R. G. (2007) 'Heme concentration dependence

and metalloporphyrin inhibition of the system I and II cytochrome c assembly pathways', Journal of

bacteriology, 189(2), pp. 455-463.

Rodley, C. D. M., Grand, R. S., Gehlen, L. R., Greyling, G., Jones, M. B. and O'Sullivan, J. M. (2012)

'Mitochondrial-Nuclear DNA Interactions Contribute to the Regulation of Nuclear Transcript Levels

as Part of the Inter-Organelle Communication System', PLOS ONE, 7(1), pp. e30943.

Roger, A. J., Muñoz-Gómez, S. A. and Kamikawa, R. (2017a) 'The Origin and Diversification of Mitochondria',

Current Biology, 27(21), pp. R1177-R1192.

Rokas, A. and Carroll, S. B. (2006) 'Bushes in the tree of life', PLoS biology, 4(11), pp. e352-e352.

Sagan, L. (1967) 'On the origin of mitosing cells', J Theor Biol, 14(3), pp. 255-74.

Schirrmeister, B. E., de Vos, J. M., Antonelli, A. and Bagheri, H. C. (2013) 'Evolution of multicellularity

coincided with increased diversification of cyanobacteria and the Great Oxidation Event',

Proceedings of the National Academy of Sciences, 110(5), pp. 1791.

Sinsheimer, J. S., Little, R. J. A. and Lake, J. A. (2012) 'Rooting Gene Trees without Outgroups: EP Rooting',

Genome Biology and Evolution, 4(8), pp. 821-831.

Sleator, R. D. (2011) 'Phylogenetics', Arch Microbiol, 193(4), pp. 235-9.

Slipiko, M., Myszczyński, K., Buczkowska-Chmielewska, K., Bączkiewicz, A., Szczecińska, M. and Sawicki, J.

(2017) 'Comparative analysis of four Calypogeia species revealed unexpected change in

evolutionarily-stable liverwort mitogenomes', Genes, 8(12), pp. 395.

Srivastava, N. and Pande, M. (2016) 'Mitochondrion: Features, functions and comparative analysis of

specific probes in detecting sperm cell damages', Asian Pacific Journal of Reproduction, 5(6), pp.

445-452.

Stein, N. (2013) 'Synteny (Syntenic Genes)', in Maloy, S. and Hughes, K. (eds.) Brenner's Encyclopedia of

Genetics (Second Edition). San Diego: Academic Press, pp. 623-626.

Taanman, J.-W. (1999) 'The mitochondrial genome: structure, transcription, translation and replication',

Biochimica et Biophysica Acta (BBA) - Bioenergetics, 1410(2), pp. 103-123.

Thao, M. L., Gullan, P. J. and Baumann, P. (2002) 'Secondary (gamma-Proteobacteria) endosymbionts infect

the primary (beta-Proteobacteria) endosymbionts of mealybugs multiple times and coevolve with

their hosts', Applied and environmental microbiology, 68(7), pp. 3190-3197.

Thorsness, P. E., White, K. H. and Fox, T. D. (1993) 'Inactivation of YME1, a member of the ftsH-SEC18-PAS1-

CDC48 family of putative ATPase-encoding genes, causes increased escape of DNA from

mitochondria in Saccharomyces cerevisiae', Molecular and Cellular Biology, 13(9), pp. 5418.

Thrash, J. C., Boyd, A., Huggett, M. J., Grote, J., Carini, P., Yoder, R. J., Robbertse, B., Spatafora, J. W., Rappé,

M. S. and Giovannoni, S. J. (2011) 'Phylogenomic evidence for a common ancestor of mitochondria

and the SAR11 clade', Scientific reports, 1, pp. 13-13.

Tobe, S. S., Kitchener, A. C. and Linacre, A. M. T. (2010) 'Reconstructing mammalian phylogenies: a detailed

comparison of the cytochrome b and cytochrome oxidase subunit I mitochondrial genes', PloS one,

5(11), pp. e14156-e14156.

Torktaz, I., Behjati, M. and Rostami, A. (2016) 'Phylogenetic analysis of otospiralin protein', Advanced

biomedical research, 5, pp. 41-41.

Tovar, J., León-Avila, G., Sánchez, L. B., Sutak, R., Tachezy, J., van der Giezen, M., Hernández, M., Müller, M.

and Lucocq, J. M. (2003) 'Mitochondrial remnant organelles of Giardia function in iron-sulphur

protein maturation', Nature, 426(6963), pp. 172-6.

Tsuji, J., Frith, M. C., Tomii, K. and Horton, P. (2012) 'Mammalian NUMT insertion is non-random', Nucleic

acids research, 40(18), pp. 9073-9088.

Tyler, S., Tyson, S., Dibernardo, A., Drebot, M., Feil, E. J., Graham, M., Knox, N. C., Lindsay, L. R., Margos, G.,

Mechai, S., Van Domselaar, G., Thorpe, H. A. and Ogden, N. H. (2018) 'Whole genome sequencing

and phylogenetic analysis of strains of the agent of Lyme disease Borrelia burgdorferi from

Canadian emergence zones', Scientific Reports, 8(1), pp. 10552.

van Gent, D. C., Hoeijmakers, J. H. J. and Kanaar, R. (2001) 'Chromosomal stability and the DNA double-

stranded break connection', Nature Reviews Genetics, 2(3), pp. 196-206.

Vandamme, A.-M. (2009) 'Basic concepts of molecular evolution', The phylogenetic handbook, 2, pp. 3-32.

Viklund, J., Ettema, T. J. and Andersson, S. G. (2012) 'Independent genome reduction and phylogenetic

reclassification of the oceanic SAR11 clade', Mol Biol Evol, 29(2), pp. 599-615.

von Dohlen, C. D., Kohler, S., Alsop, S. T. and McManus, W. R. (2001) 'Mealybug beta-proteobacterial

endosymbionts contain gamma-proteobacterial symbionts', Nature, 412(6845), pp. 433-6.

Wang, C. and Youle, R. J. (2009) 'The role of mitochondria in apoptosis', Annual review of genetics, 43, pp.

95-118.

Wang, T., Zhang, S., Pei, T., Yu, Z. and Liu, J. (2019) 'Tick mitochondrial genomes: structural characteristics

and phylogenetic implications', Parasites & Vectors, 12(1), pp. 451.

Xing, J., Zhang, Y., Han, K., Salem, A. H., Sen, S. K., Huff, C. D., Zhou, Q., Kirkness, E. F., Levy, S. and Batzer,

M. A. (2009) 'Mobile elements create structural variation: analysis of a complete human genome',

Genome research, 19(9), pp. 1516-1526.

Yakes, F. M. and Van Houten, B. (1997) 'Mitochondrial DNA damage is more extensive and persists longer

than nuclear DNA damage in human cells following oxidative stress', Proceedings of the National

Academy of Sciences of the United States of America, 94(2), pp. 514-519.

Yamanaka, T. and Fukumori, Y. (1981) 'Functional and Structural Comparisons between Prokaryotic and

Eukaryotic aa3-Type Cytochrome c Oxidases from an Evolutionary Point of View', Plant and Cell

Physiology, 22(7), pp. 1223-1230.

Yan, J., Li, H. and Zhou, K. (2008) 'Evolution of the mitochondrial genome in snakes: gene rearrangements

and phylogenetic relationships', BMC genomics, 9, pp. 569-569.

Zhuravlev, A. Y. and Wood, R. A. (2018) 'The two phases of the Cambrian Explosion', Scientific reports, 8(1),

pp. 16656-16656.

Zou, Z. and Zhang, J. (2019) 'Amino acid exchangeabilities vary across the tree of life', Science Advances,

5(12), pp. eaax3124.

Acknowledgements

I wish to express my sincere thanks to Dr Gavin McStay for his continued support, supervision and advice throughout the entire process, from planning and bioinformatic analysis to the writing process.

Secondly, I would like to thank all at Staffordshire University, including Dr Richard Halfpenny, who have aided in the development of my skills, no matter how small.

Finally, I would like to thank anyone I have come into contact with over the past 4 months who has had to deal with my rambling regarding this research and put up with me during the day following late nights.

Appendix 1 – Project Declaration Form

(NB: A signed copy of this form must be included within each copy of your dissertation.)

STUDENT NAME: Rhys Thomas

1) PLAGIARISM DISCLAIMER This dissertation contains only my own data. I have not copied sections from any textbook, article, web site or other dissertation without acknowledging the source.

Signed: R.M. Thomas [student]

2) MATERIALS SATISFACTORILY CLEARED AWAY I am satisfied that this student has returned all reusable materials for storage and has safely disposed of all waste materials.

Signed: [technician/supervisor]

3) SAFETY AND ACADEMIC ETHICS We confirm that the University’s guidelines concerning safety and ethics have been consulted and that the safety and ethical issues and implications in relation to this project have been considered. We confirm that risk assessments were approved and any necessary ethical approval was obtained prior to the commencement of data collection.

Signed: R.M. Thomas [student]

Signed: [supervisor]

Appendix 2 – Raw Entropy Data

Eukaryotic Entropy data

Alignment = C:\Users\rhysm\OneDrive\Desktop\Eukaryotes MEGA MUSCLE Alignment FASTA.fas

Position 1: 0.19852
Position 2: 0.61287
Position 3: 0.61287
Position 4: 0.89538
Position 5: 1.17520
Position 6: 1.44278
Position 7: 1.33032
Position 8: 1.34731
Position 9: 0.99582
Position 10: 1.33307
Position 11: 0.19852
Position 12: 0.87113
Position 13: 1.62297
Position 14: 0.92616
Position 15: 0.89538
Position 16: 0.19852
Position 17: 0.73059
Position 18: 0.19852
Position 19: 1.00976
Position 20: 0.00000
Position 21: 1.33032
Position 22: 0.99727
Position 23: 0.64745
Position 24: 0.00000
Position 25: 1.15821
Position 26: 0.32508
Position 27: 0.00000
Position 28: 0.00000
Position 29: 0.19852
Position 30: 0.00000
Position 31: 0.19852
Position 32: 1.20797
Position 33: 0.19852
Position 34: 0.42271
Position 35: 0.68744
Position 36: 0.19852
Position 37: 0.00000
Position 38: 0.97457
Position 39: 0.00000
Position 40: 0.00000
Position 41: 0.00000
Position 42: 0.00000
Position 43: 0.91429
Position 44: 0.00000
Position 45: 0.51819
Position 46: 0.42271
Position 47: 0.00000
Position 48: 0.00000
Position 49: 0.61287
Position 50: 0.50040
Position 51: 0.00000
Position 52: 0.61287
Position 53: 0.82607
Position 54: 1.33307
Position 55: 0.00000
Position 56: 0.64745
Position 57: 0.39440
Position 58: 0.00000
Position 59: 0.56234
Position 60: 1.03311
Position 61: 0.00000
Position 62: 0.00000
Position 63: 0.00000
Position 64: 1.20522
Position 65: 0.51819
Position 66: 0.61287
Position 67: 0.56234
Position 68: 0.93764
Position 69: 0.00000
Position 70: 1.33032
Position 71: 0.70835
Position 72: 1.07908
Position 73: 0.19852
Position 74: 0.19852
Position 75: 1.14212
Position 76: 1.17520
Position 77: 0.00000
Position 78: 0.00000
Position 79: 0.42271
Position 80: 0.42271
Position 81: 0.00000
Position 82: 0.00000
Position 83: 0.00000
Position 84: 0.00000
Position 85: 0.00000
Position 86: 0.00000
Position 87: 0.19852
Position 88: 0.00000
Position 89: 0.00000
Position 90: 0.00000
Position 91: 0.94335
Position 92: 0.00000
Position 93: 1.30415
Position 94: 0.00000
Position 95: 0.67301
Position 96: 0.00000
Position 97: 0.32508
Position 98: 1.37052
Position 99: 1.63996
Position 100: 0.50040
Position 101: 0.00000
Position 102: 1.36616
Position 103: 0.00000
Position 104: 0.00000
Position 105: 0.19852
Position 106: 0.63903
Position 107: 0.19852
Position 108: 0.00000
Position 109: 0.61287
Position 110: 1.58272
Position 111: 0.68744
Position 112: 0.39440
Position 113: 1.34515
Position 114: 1.28992

Prokaryotic Entropy data

Alignment = C:\Users\rhysm\OneDrive\Desktop\Prokaryotes MEGA MUSCLE Alignment FASTA.fas

Position 1: 0.67301
Position 2: 1.47081
Position 3: 1.69574
Position 4: 1.60944
Position 5: 1.88670
Position 6: 1.60944
Position 7: 1.64342
Position 8: 1.55711
Position 9: 1.69574
Position 10: 1.50479
Position 11: 1.69574
Position 12: 0.94335
Position 13: 1.60944
Position 14: 2.02533
Position 15: 1.69574
Position 16: 1.69574
Position 17: 1.69574
Position 18: 1.88670
Position 19: 1.64342
Position 20: 1.47081
Position 21: 1.74807
Position 22: 1.74807
Position 23: 1.35924
Position 24: 1.35924
Position 25: 1.35924
Position 26: 1.35924
Position 27: 1.22061
Position 28: 0.32508
Position 29: 0.32508
Position 30: 0.32508
Position 31: 0.32508
Position 32: 0.32508
Position 33: 0.32508
Position 34: 0.32508
Position 35: 0.32508
Position 36: 0.32508
Position 37: 0.32508
Position 38: 0.32508
Position 39: 0.32508
Position 40: 0.32508
Position 41: 0.32508
Position 42: 0.32508
Position 43: 0.32508
Position 44: 0.32508
Position 45: 0.32508
Position 46: 0.32508
Position 47: 0.32508
Position 48: 0.32508
Position 49: 0.32508
Position 50: 0.32508
Position 51: 0.32508
Position 52: 0.32508
Position 53: 0.32508
Position 54: 0.32508
Position 55: 0.32508
Position 56: 0.32508
Position 57: 0.32508
Position 58: 0.32508
Position 59: 0.32508
Position 60: 1.35924
Position 61: 1.49787
Position 62: 1.60944
Position 63: 1.47081
Position 64: 1.47081
Position 65: 1.50479
Position 66: 2.16396
Position 67: 1.47081
Position 68: 1.16828
Position 69: 1.83437
Position 70: 1.27985
Position 71: 1.35924
Position 72: 1.27985
Position 73: 1.97300
Position 74: 1.22061
Position 75: 0.63903
Position 76: 0.63903
Position 77: 0.63903
Position 78: 0.94045
Position 79: 0.94045
Position 80: 0.94045
Position 81: 0.94045
Position 82: 0.94045
Position 83: 0.94045
Position 84: 0.94045
Position 85: 0.94045
Position 86: 0.80182
Position 87: 0.94045
Position 88: 0.94045
Position 89: 0.80182
Position 90: 0.94045
Position 91: 0.80182
Position 92: 0.89795
Position 93: 1.22753
Position 94: 1.08890
Position 95: 1.08890
Position 96: 1.49787
Position 97: 1.16828
Position 98: 1.49787
Position 99: 0.63903
Position 100: 1.22061
Position 101: 1.49787
Position 102: 1.88670
Position 103: 1.27985
Position 104: 1.08890
Position 105: 1.50479
Position 106: 1.74807
Position 107: 1.83437
Position 108: 0.94045
Position 109: 1.83437
Position 110: 1.22753
Position 111: 0.94045
Position 112: 0.63903
Position 113: 1.35924
Position 114: 2.02533
Position 115: 1.74807
Position 116: 1.35924
Position 117: 2.16396
Position 118: 1.35924
Position 119: 2.02533
Position 120: 1.88670
Position 121: 1.97300
Position 122: 2.16396
Position 123: 0.94045
Position 124: 0.63903
Position 125: 1.74807
Position 126: 0.94335
Position 127: 1.60944
Position 128: 1.55711
Position 129: 1.69574
Position 130: 1.22753
Position 131: 1.08890
Position 132: 1.02965
Position 133: 1.41848
Position 134: 1.60944
Position 135: 2.02533
Position 136: 2.16396
Position 137: 1.88670
Position 138: 1.97300
Position 139: 1.60944
Position 140: 1.69574
Position 141: 2.16396
Position 142: 0.63903
Position 143: 0.50040
Position 144: 0.63903
Position 145: 0.63903
Position 146: 0.63903
Position 147: 0.63903
Position 148: 0.50040
Position 149: 0.80182
Position 150: 1.49787
Position 151: 2.02533
Position 152: 2.02533
Position 153: 1.49787
Position 154: 1.64342
Position 155: 2.02533
Position 156: 1.97300
Position 157: 1.88670
Position 158: 1.88670
Position 159: 1.64342
Position 160: 1.83437
Position 161: 1.60944
Position 162: 1.97300
Position 163: 1.22753
Position 164: 1.88670
Position 165: 1.69574
Position 166: 1.47081
Position 167: 1.69574
Position 168: 1.69574
Position 169: 1.49787
Position 170: 0.63903
Position 171: 0.63903
Position 172: 0.94045
Position 173: 1.50479
Position 174: 1.60944
Position 175: 1.69574
Position 176: 1.83437
Position 177: 1.35924
Position 178: 1.35924
Position 179: 1.74807
Position 180: 1.69574
Position 181: 1.49787
Position 182: 1.74807
Position 183: 1.50479
Position 184: 1.22061
Position 185: 1.64342
Position 186: 1.50479
Position 187: 2.02533
Position 188: 1.69574
Position 189: 1.36616
Position 190: 1.27985
Position 191: 1.60944
Position 192: 1.49787
Position 193: 1.50479
Position 194: 1.16828
Position 195: 1.60944
Position 196: 2.02533
Position 197: 1.60944
Position 198: 1.83437
Position 199: 1.49787
Position 200: 1.08890
Position 201: 0.94045
Position 202: 0.94045
Position 203: 0.94045
Position 204: 0.94045
Position 205: 0.94045
Position 206: 0.63903
Position 207: 0.63903
Position 208: 0.63903
Position 209: 0.63903
Position 210: 0.63903
Position 211: 0.63903
Position 212: 0.63903
Position 213: 0.63903
Position 214: 0.63903
Position 215: 0.32508
Position 216: 0.32508
Position 217: 0.32508
Position 218: 0.32508
Position 219: 0.32508
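For readers who wish to reproduce values of this kind, the following is a minimal Python sketch of a per-position Shannon entropy calculation. It assumes the listed figures are column entropies computed with natural logarithms, H = -Σ p ln p, over the residues in each alignment column. That assumption is consistent with the data above (a column in which 19 of 20 sequences share one residue gives 0.19852), but it is not confirmed as the exact procedure used to generate this appendix. The function names `column_entropy` and `alignment_entropy` are hypothetical, and treating a gap character as an ordinary symbol is a further assumption.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of one alignment column.

    Assumption: every symbol, including a gap character such as '-',
    is counted as its own state.
    """
    counts = Counter(column)
    total = len(column)
    # Sum -p * ln(p) over each distinct symbol in the column.
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy for every column of an equal-length set of aligned sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("All aligned sequences must have the same length")
    return [column_entropy([seq[i] for seq in sequences]) for i in range(length)]


if __name__ == "__main__":
    # Toy column: 19 identical residues and 1 variant out of 20 sequences.
    toy_column = ["G"] * 19 + ["A"]
    print(f"{column_entropy(toy_column):.5f}")
```

A fully conserved column returns 0, and a column split evenly across five residues returns ln 5 (about 1.60944), so lower values indicate stronger conservation at that position.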

Appendix 3 – Prokaryotic and Eukaryotic Maximum Likelihood Tree

Appendix 4 – Abbreviation List

Basic Local Alignment Search Tool – BLAST

Cytochrome C – CytC

Double Stranded Breaks – DSB

Electron Transport Chain – ETC

Endosymbiotic Gene Transfer – EGT

Genome Evolutionary Analysis – CoGe

Large Subunit – LSU

Molecular Evolutionary Genetics Analysis – MEGA

Millions of Years Ago – MYA

Mitochondrial DNA – mtDNA

Mitochondrial-Related Organelles – MRO

Multiple Sequence Comparison by Log-Expectation – MUSCLE

National Center for Biotechnology Information – NCBI

Open Reading Frame – ORF

Oxidative Phosphorylation – OXPHOS

Reactive Oxygen Species – ROS

Small Subunit – SSU
