Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 961

Mitochondrial Evolution

Turning Bugs into Features

BY OLOF KARLBERG

ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2004

ISSN 1104-232X
ISBN 91-554-5933-1

List of Papers

This thesis is based on the following papers, which will be referred to in the text by their roman numerals.

I Karlberg O, Canbäck B, Kurland CG, Andersson SGE (2000) The dual origin of the yeast mitochondrial proteome. Yeast 17, 170-187.

II Amiri H, Karlberg O, Andersson SGE (2003) Deep origin of plastid/parasite ATP/ADP translocases. J Mol Evol 56, 137-150.

III Alsmark CM*, Frank AC*, Karlberg EO*, Legault B, Ardell DH, Canbäck B, Eriksson A-S, Näslund AK, Handley SA, Huvet M, La Scola B, Holmberg M, Andersson SGE (2004) The louse-borne human pathogen Bartonella quintana is a genomic derivative of the zoonotic agent Bartonella henselae. Proc Natl Acad Sci U S A, in press.

IV Bousseau B, Karlberg EO, Frank AC, Legault B, Andersson SGE (2004) Inferring the α-proteobacterial ancestor. Proc Natl Acad Sci U S A, in revision.

Reprints were made with the permission of the publishers.

* Shared first authorship

Contents

1. Introduction
1.1 Eukaryota
1.2 Mitochondria – heart of the cell
1.2.1 The Serial Endosymbiosis Theory
1.2.2 Reduced and restructured
1.3 Mitochondrial ancestor(s)
1.3.1 The host
1.3.2 The endosymbiont
1.3.3 α-Proteobacteria
1.3.4 Rickettsia prowazekii
1.3.5 Bartonella henselae and Bartonella quintana
2. Aims
3. Methodological considerations
3.1 Determining the genome sequence (paper III)
3.1.1 Solving repeats
3.2 Defining the mitochondrial proteome (paper I)
3.3 Phylogenetic methods (papers I-IV)
3.3.1 Large scale phylogenies (papers I and IV)
3.4 Reconstructing the ancestor (paper IV)
3.4.1 Clustering
4. Results
4.1 Evolution in α-proteobacteria
4.1.1 Genome dynamics
4.1.2 From multi- to single-host pathogen
4.2 Mitochondrial evolution
4.2.1 Origin of the mitochondrial proteome
4.2.2 Fate of the mitochondrial ancestor
5. Discussion
5.1 Evolution in α-proteobacteria
5.1.1 From free-living to obligate parasitism
5.2 Mitochondrial evolution
5.2.1 The yeast mitochondrial proteome
5.2.2 The mitochondrial ancestor
5.3 ATP/ADP-translocases
6. Concluding remarks and future perspectives
7. Summary in Swedish
Mitokondriens evolution
8. Acknowledgments
9. References

Abbreviations

AIDS Acquired Immunodeficiency Syndrome
ADP Adenosine 5’-diphosphate
ATP Adenosine 5’-triphosphate
bp Base pairs
COG Clusters of Orthologous Groups
DNA Deoxyribonucleic acid
EMBL The European Molecular Biology Laboratory
Mb Megabases
NADH Nicotinamide adenine dinucleotide
NCBI National Center for Biotechnology Information (USA)
PCR Polymerase Chain Reaction
RNA Ribonucleic acid
rRNA Ribosomal RNA
SET Serial Endosymbiosis Theory
SSU Small Subunit
tRNA Transfer RNA
TrEMBL Translation of the EMBL database
TrEMBLnew Additions to TrEMBL since the last release

1. Introduction

Man has always been interested in dividing his environment into different categories. Carl von Linné is probably the most famous taxonomist ever, and much of his 18th century work is still valid. The idea that life can be categorized into increasingly larger groups of similar organisms was further fueled by Charles Darwin’s theory on the origin of species. Modern evolutionary biology has provided a framework for classification of organisms by their evolutionary relationships. Based on the observation that all life has the same basic structure – in the form of DNA for information storage, RNA for information transfer and proteins as executive units – it is generally assumed that all extant life has a common ancestor: LUCA, the Last Universal Common Ancestor. As taxonomic classification originally had to rely on features that could be observed by eye, under the microscope or in biochemical assays, the taxonomy of microorganisms was long a problematic field. Until Carl Woese presented his pioneering work on molecular-based taxonomy, life was divided into two top-level domains: prokaryota and eukaryota. By constructing phylogenies on the slow-evolving gene for small subunit (SSU) ribosomal RNA, Woese could demonstrate that prokaryota contained two substantially different groups (Woese and Fox, 1977). The result was the split of prokaryota into archaea and bacteria (Woese et al., 1990). Previously overlooked in microbiology, archaea are specialists at surviving in extreme environments and, although their cellular organization is similar to that of bacteria, they share many molecular features with the eukaryotes.

1.1 Eukaryota

When talking about life, most people think about human life or animals and plants. These are also the best known representatives of eukaryota. All individual organisms that we can see by eye are multicellular eukaryotes. But eukaryotes are much more than what we can see by eye and also include unicellular organisms such as amoebas and yeasts (fig. 1). The main feature that separates eukaryotes from bacteria and archaea is the containment of the genome in a nucleus. Eukaryotes are distinct from bacteria and archaea in other ways as well; the presence of various specialized cellular compartments, organelles, is only observed in eukaryotes (fig. 2).

[Figure 1: tree diagram. Groups shown: Animalia, Choanozoa, Fungi, Microsporidia*, Amoebozoa*, Apusozoa, Loukozoa, Metamonada*, Parabasalia*, Discicristata, Rhizaria, Alveolata, Chromobiota, Cryptophyta, Plantae.]

Figure 1 Schematic view of the eukaryotic taxonomy based on the work by Cavalier-Smith and Stechmann (Cavalier-Smith, 2002; Stechmann and Cavalier-Smith, 2002). Groups marked with * are those originally suggested as primitively amitochondrial in the Archezoa hypothesis.

Figure 2 Schematic drawing of the eukaryote cell. Adapted from (Alberts et al., 1994).

1.2 Mitochondria – heart of the cell

Mitochondria are essential organelles to all respiring eukaryotes. Often described as the powerhouses of the eukaryotic cell, they are the cellular compartments where organic compounds are oxidized to carbon dioxide and water with a high yield of chemical energy in the form of ATP. This process, called oxidative phosphorylation, is so efficient that it is often regarded as a prerequisite for multicellular life (Pfeiffer et al., 2001; Vellai et al., 1998). Indeed, to my knowledge there are no true multicellular organisms lacking mitochondria.

Energy conversion is, however, not the only function of mitochondria. Mitochondria are involved in many essential biochemical processes such as steroid synthesis (Jefcoate et al., 1992) and calcium regulation (Deryabina et al., 2004). They have even been shown to play an important role in apoptosis (Petit et al., 1996; Susin et al., 1998). So, although the regulatory center – or brain – of mitochondriate cells is in the nucleus, where gene expression is switched on and off, the organelle which keeps it all going is the mitochondrion, its heart.

The central role of mitochondria has also made mitochondrial dysfunction an important cause of (mostly hereditary) disease in both neural and muscular systems. Mutations in ND6, coding for a subunit of the NADH dehydrogenase complex, have been linked to sudden-onset blindness (Jun et al., 1994), while mutations in genes for tRNAs can cause myoclonic epilepsy, mitochondrial myopathy and ragged-red fiber disease (Goto et al., 1990; Shoffner et al., 1990). A single base mutation in the mitochondrial SSU rRNA greatly increases the risk of aminoglycoside-induced deafness (Fischel-Ghodsian et al., 1997). As the aminoglycoside antibiotics act on the bacterial ribosome, it is easy to imagine how mutations in the mitochondrial ribosome can make it sensitive to these antibiotics. This is because mitochondria are essentially bacteria that have evolved to serve the eukaryotic cell.

1.2.1 The Serial Endosymbiosis Theory

The idea that mitochondria are of bacterial origin has been around since the beginning of the 20th century and the man commonly credited is Ivan E. Wallin, an American professor of anatomy (Wallin, 1927). Almost twenty years before him, Constantin Mereschkowsky had claimed that chloroplasts are of bacterial origin (Martin and Kowallik, 1999; Mereschkowsky, 1905). Unfortunately, neither of these ideas gained a foothold at the time. Apart from the general reluctance to accept part of the human cell as bacterial, Wallin’s theory on the mitochondrial origin was probably not aided by his claim to have succeeded in growing mitochondria outside of the eukaryotic cell (now regarded as impossible because of the tight integration of mitochondria into the eukaryotic cell).

The theories were revived in 1970 by Lynn Margulis (Margulis, 1970; Margulis, 1981) and, supported by molecular evidence (Gray et al., 1984; Olsen et al., 1994; Viale et al., 1994; Yang et al., 1985), the bacterial origins of both mitochondria and chloroplasts are now undisputed. Margulis’ theory, as presented in her 1970 book, was dubbed the Serial Endosymbiosis Theory (SET) by Max Taylor (Taylor, 1974). Although Margulis should be credited for reviving these theories, not all of her ideas have been accepted and the current consensus can be regarded as a “light” version of her SET. The original theory comprises not only chloroplasts and mitochondria; undulipodia (flagella and cilia) are also claimed to have originated through an endosymbiotic event.

1.2.2 Reduced and restructured

During evolution, mitochondria have lost almost all of their genetic information and only a small number of genes are commonly found in the genomes of mitochondria. Although these genes – which include genes for cytochrome oxidase subunits, cytochrome b, ATP synthase subunits, succinate dehydrogenase subunits, ribosomal proteins and the ribosomal RNAs – are essential to mitochondrial function, they are far from enough for functional mitochondria. The majority of proteins used in mitochondrial functions are encoded in the eukaryotic nucleus and imported into mitochondria after translation. The SET implies a massive transfer of genes from the ancestral mitochondrial genome to the nucleus (Gray, 1992), and the transfer of genes from mitochondria to nucleus can still be observed in some plants. For example, one copy of ribosomal protein s12 can be found active in the nucleus and as a pseudogene in mitochondria (Grohmann et al., 1992). Although no such obvious gene migration has been observed in animals, a study of the human nuclear genome has revealed over 600 independent integrations of mitochondrial sequences as pseudogenes (Woischnik and Moraes, 2002).

1.3 Mitochondrial ancestor(s)

In order to better understand how mitochondria have evolved into what they are today, it is important to know what they have evolved from. A theory of endosymbiosis requires at least two participating organisms – a host and an endosymbiont. While the picture of the endosymbiont’s nature is becoming steadily clearer, the nature of the host is still obscured by time, and the dust has not yet settled around the competing hypotheses and speculations on the subject.

1.3.1 The host

The theories about the nature of the host can generally be divided into two categories: the eukaryotic host and the archaeal host. The best known theory involving a eukaryotic host cell is the archezoa hypothesis by Cavalier-Smith (Cavalier-Smith, 1987). According to this hypothesis the host that first acquired mitochondria was a primitive eukaryote from which some lineages diverged prior to the acquisition of mitochondria. These lineages, dubbed archezoa, were Metamonads, Microsporidia, Parabasalia and Archamoebae, all suggested to be deep-branching amitochondriate eukaryotes (fig. 1). As the archezoa are nucleated, the nucleus must have evolved prior to the acquisition of mitochondria. A major drawback of the archezoa hypothesis is that most of the suggested archezoa have been found to contain possible genetic remnants of mitochondria (Clark and Roger, 1995; Germot et al., 1996; Roger et al., 1998), and their placements in the eukaryotic evolutionary tree have been questioned (Hirt et al., 1999; Katinka et al., 2001).

Theories using different archaea as the host of choice explain eukaryotes as the result of a merger between the archaeal host and the bacterial endosymbiont. Hence, there were no eukaryotes prior to the mitochondrial endosymbiosis and the nucleus was either the result of a subsequent event or a direct effect of the same symbiosis that gave rise to mitochondria (Martin and Muller, 1998; Moreira and Lopez-Garcia, 1998).

1.3.2 The endosymbiont

Early in the formulation of the SET, the focus was directed towards proteobacteria as a group of bacteria comprising species with characteristics suitable for a mitochondrial ancestor. Early suggestions of proto-mitochondria-like bacteria were the δ-proteobacterium Bdellovibrio and the α-proteobacterium Paracoccus (Margulis, 1970; Margulis, 1981) because of their metabolic capabilities, including the Krebs cycle and oxidative phosphorylation utilizing cytochromes. With time and the help of phylogenetic analyses of small subunit (SSU) ribosomal RNA (rRNA) as well as proteins of the respiratory complexes, α-proteobacteria have become the taxonomic home of mitochondria (Gray et al., 1999; Gray and Spencer, 1996; Olsen et al., 1994; Sicheritz-Ponten et al., 1998; Viale et al., 1994; Yang et al., 1985).

1.3.3 α-Proteobacteria

Also known as purple [non-sulfur] bacteria, the proteobacteria are classified in four sub-divisions (alpha, beta, gamma, and epsilon/delta) and comprise around 17,000 out of some 44,000 bacterial species in the NCBI taxonomy database (www.ncbi.nih.gov/Taxonomy/). The number of α-proteobacteria is around 6,000 species. These numbers could reflect either the ecological success and rich variability found among these species, or their medical and scientific importance resulting in a relative over-sampling. The group of α-proteobacteria contains a wide spectrum of bacterial species living in diverse niches, from the obligate intracellular parasites in Rickettsiales to the free-living Caulobacterales. The group is divided into six different taxonomic orders, and a seventh order was recently described from water isolates from the Sargasso Sea (Cho and Giovannoni, 2003) (fig. 3). The spread in lifestyles among α-proteobacteria is reflected in the span of genome sizes from 1.1 Mb in Rickettsia prowazekii (Andersson et al., 1998) to 9.1 Mb in Bradyrhizobium japonicum (Kaneko et al., 2002), which almost covers the total span of bacterial genome sizes.

[Figure 3: tree diagram. Taxa shown, grouped by order:
Rhodobacterales: Rhodobacter sphaeroides, Paracoccus denitrificans
Rhizobiales: Bartonella henselae (1.9), Bartonella quintana (1.6), Brucella melitensis (3.3), Brucella suis (3.3), Mesorhizobium loti (7.0), Agrobacterium tumefaciens (4.9), Sinorhizobium meliloti (3.7), Bradyrhizobium japonicum (9.1), Rhodopseudomonas palustris (5.5)
Parvularculales: Parvularcula bermudensis, uncultured sludge bacterium H9
Caulobacterales: Caulobacter crescentus (4.0), Brevundimonas diminuta
Sphingomonadales: Sphingomonas paucimobilis, Zymomonas mobilis
Rhodospirillales: Rhodospirillum rubrum, Magnetospirillum gryphiswaldense
Rickettsiales: Rickettsia conorii (1.3), Rickettsia prowazekii (1.1), Wolbachia pipientis wMel (1.3)]

Figure 3 Schematic representation of the α-proteobacterial taxonomy based on paper IV and Cho and Giovannoni (Cho and Giovannoni, 2003). Numbers in parentheses denote genome size in Mb for sequenced genomes.

1.3.4 Rickettsia prowazekii

As a representative of the Rickettsiales, Rickettsia prowazekii is often regarded as the closest relative of mitochondria among extant bacterial species. This view is founded partly on the results of molecular phylogenies (Emelyanov, 2003; Gray and Spencer, 1996; Sicheritz-Ponten et al., 1998), but another reason is the striking similarities in phenotypes between R. prowazekii and mitochondria. As an obligate intracellular parasite, R. prowazekii is totally dependent on its eukaryotic host cell for survival. It has evolved transport mechanisms to utilize its nutrient-rich environment, resulting in a still ongoing reduction of its genome as whole biosynthetic pathways are discarded, and it has retained a complete set of genes for oxidative phosphorylation (Andersson and Andersson, 2001; Andersson et al., 1998).

However, it is important to remember the two billion years (Hedges et al., 2004; Hedges et al., 2001) that lie between the present and the time when mitochondria were first formed. In this perspective, the similarities between mitochondria and Rickettsiales should not be interpreted to mean that the mitochondrial ancestor was one of the Rickettsiales that we can find today, but rather that they are likely to stem from the same ancestor. Such an ancestor would have the previously mentioned metabolic capabilities as well as a predisposition to adapt to an intracellular lifestyle. The genome of R. prowazekii was one of the first genomes, and the first α-proteobacterial genome, to be sequenced when it was completed in 1998 (Andersson et al., 1998), paving the way for more extensive research on its relation to mitochondria.

R. prowazekii is also known as the causative agent of epidemic, louse-borne typhus, a disease that caused many deaths before the introduction of antibiotics. The use of lice as vector has proven especially advantageous during wars, when many people live close together under poor sanitary conditions, and it has been estimated that several million people died from typhus during the First and Second World Wars (Gross, 1996).

Important for the lifestyle of Rickettsia spp. is their ability to prey on the ATP found in the host cells. This is accomplished with a family of proteins called ATP/ADP translocases, which import ATP into the rickettsial cell in exchange for the less energy-rich ADP (Winkler, 1976). In mitochondria, the ATP/ADP exchange has the opposite polarity. The ATP/ADP translocases found in Rickettsia spp. are of a type also found in chloroplasts (Heldt, 1969), in intracellular parasites of the family Chlamydiaceae (Hatch et al., 1982) and in the eukaryotic parasite Encephalitozoon cuniculi (Katinka et al., 2001), which belongs to the Microsporidia. However, they are unrelated to the ATP/ADP translocases found in mitochondria and, despite the wide taxonomic spread of these organisms, the plastid/parasite type of ATP/ADP translocase has so far not been found in any other bacterial groups.

1.3.5 Bartonella henselae and Bartonella quintana

Although perhaps not the closest relatives of Rickettsiales and mitochondria, the α-proteobacterial genus Bartonella can help to shed some light on early mitochondrial evolution. Just like Rickettsia, these bacteria have specialized in a life inside eukaryotic cells. But Bartonella species also have the ability to grow outside of the cells, although they have to steal some nourishment from their hosts.

Another reason for interest in Bartonella species is their past and present ability to cause human disease (Karem et al., 2000). Just like R. prowazekii, B. quintana played an important role during the world wars as the causative agent of trench fever, characterized by multiple relapses of fever often associated with pain in the legs and the back. Both R. prowazekii and B. quintana are spread by the human body louse, but while R. prowazekii has its reservoir in the lice, where it is inherited by the offspring and shortens the lifespan of the lice, B. quintana has its reservoir in humans, with the lice as vector for transmission between humans (Karem et al., 2000). B. henselae is commonly found in cats, which seem to take no harm from the infection (Regnery et al., 1996), and transmission to humans can incidentally occur through cat scratches or bites, resulting in cat scratch disease. It has been estimated that as many as 30-60% of domestic cats in the U.S. are infected with B. henselae (Jameson et al., 1995).

While the clinical manifestations of infection with B. henselae or B. quintana are usually suppressed by the immune system after the acute phase of illness, infections in immunocompromised individuals, such as AIDS patients, can cause bacillary angiomatosis, bacillary peliosis and bacteremia (Karem et al., 2000). Bacillary angiomatosis and bacillary peliosis are characterized by tumor-like vasoproliferative lesions due to angiogenesis stimulation by the invading bacteria. This angiogenic effect of Bartonella infection is of special interest for cancer research as the process of angiogenesis is essential for continued tumor growth. Studies on Bartonella spp. can hopefully help in the understanding of these mechanisms.

2. Aims

In order to better understand the mitochondrial origin, we need to investigate the evolutionary processes that have shaped mitochondria. Because of the size and complexity of eukaryotic genomes, mitochondrial evolution is hard to study in its real environment. The only eukaryotic system that is well enough characterized is the model organism Saccharomyces cerevisiae (baker’s yeast), but even there a lot more needs to be done. Until the functional annotation of eukaryotic organisms has been taken to a new level, the best way to understand the evolutionary processes in early mitochondria is to study their closest bacterial relatives, the α-proteobacteria. The aim of this work is to get a better understanding of mitochondrial evolution by extending our knowledge about α-proteobacterial evolution in general and α-proteobacterial and mitochondrial adaptation to an intracellular lifestyle in particular.

More specifically, the following questions are asked:
• How do bacteria evolve from free-living to intracellular parasites or symbionts? What are the common processes and to what extent are they different in different lineages?
• What were the features and abilities of the mitochondrial ancestor?
• To what extent has the mitochondrial ancestor contributed to the genes in the eukaryotic nucleus?
• To what extent has the mitochondrial host contributed to the mitochondrial proteome?
• If the ATP/ADP translocases found in Rickettsia are unrelated to the mitochondrial translocases, where did they originate and were they present in the mitochondrial ancestor?

3. Methodological considerations

3.1 Determining the genome sequence (paper III)

The advances in molecular biology in the last decade have made it possible to study organism evolution by means of whole genome comparisons. With the genome sequence of R. prowazekii at hand and a need for other α-proteobacteria to compare with, the decision was made to sequence the somewhat larger genomes of B. henselae and B. quintana. Shotgun sequencing is the most common and most successful method for genome sequencing today, and it was also the method chosen for the Bartonella genomes. In short, shotgun sequencing is done by randomly sequencing a large number of fragments from a large DNA molecule and then resolving the full sequence by assembling the fragments according to overlapping regions. Despite the development of efficient sequencers and powerful computers, genome sequencing is seldom a straightforward process and, although educated guesses can be made from related organisms, the content and structure of a genome are never known until it is sequenced. Both B. henselae and B. quintana proved to contain their share of complex regions and, despite comprising only a minor fraction of the total sequence, these regions required most of the time and effort spent.
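The assembly principle described above can be illustrated with a toy example. The following is only a minimal sketch under strong assumptions, not the assembly software used for the Bartonella genomes: reads are taken to be error-free and from a single strand, and the pair of reads with the longest suffix-prefix overlap is merged greedily. The function names and the `min_overlap` parameter are hypothetical.

```python
def overlap(a, b, min_overlap=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if too short)."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the two reads with the longest overlap.

    Returns the list of contigs left when no overlap of at least
    min_overlap bases remains. Assumes error-free, same-strand reads.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing left to merge: a gap remains between contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGCC", "TAGCCTA"]
print(greedy_assemble(reads))
```

In this idealized case the four reads collapse into a single contig. With identical repeats longer than a read, the same greedy logic has no way of telling the copies apart, which is the problem discussed in the next section.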

3.1.1 Solving repeats

Repeats come in many variations: they can be long or short, and identical or only very similar. In general, they only pose a problem when they are identical or nearly identical and longer than the maximal length of a sequence read (~700 bp). In these cases the assembly software will not know that identical reads should be used in several places and will just put them all at the same location in the assembly, leaving a gap somewhere else (fig. 4). With enough sequence data, high quality single base differences can be sufficient for a correct assembly, but there is always a trade-off in the software between accepting ambiguities and closing gaps. Because of sequencing errors, there will always be ambiguities, and if the tolerance for these is set too strict, regions with low coverage or problematic structures will not be assembled. On the other hand, if the tolerance is set too loose, there is a high risk of mis-assembly of near-identical repeats. In the end, solving repeats needs human intervention, and becomes the major rate-limiting step in a sequencing project.

In general, assembly errors caused by repeats display themselves in two ways: an unexpectedly high coverage of a region (stacking) (fig. 4a) and/or the exclusion of some reads into a separate contig (fig. 4b). The first thing to try when resolving a repeat is to determine the distance between flanking unique sequences. Provided that the distance between the priming sites is shorter than 20 kb, this can generally be done with PCR, using genomic DNA as template. However, long range PCR can be very difficult, and depending on the flanking sequences it is not always possible to find suitable primer sites.

[Figure 4: two panels, A and B, each showing the correct arrangement of reads above the result of assembly.]

Figure 4 Two examples of mis-assembly due to long repeats. Black regions represent unique sequence while repeats are white/gray. The short lines represent sequence reads and the dotted lines reads which fit in more than one place in the assembly.
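The stacking symptom can in principle be screened for automatically. The sketch below is an illustration only and not part of the software used in this work: it assumes read placements are available as hypothetical (start, length) pairs on a contig, and it flags regions whose read depth exceeds an arbitrarily chosen multiple (`factor`) of the median depth.

```python
from statistics import median


def coverage(contig_length, placements):
    """Per-base read depth from (start, length) read placements (0-based)."""
    depth = [0] * contig_length
    for start, length in placements:
        for pos in range(max(start, 0), min(start + length, contig_length)):
            depth[pos] += 1
    return depth


def stacked_regions(depth, factor=1.8):
    """Half-open (start, end) intervals where depth exceeds factor * median depth."""
    threshold = factor * median(depth)
    regions, start = [], None
    for pos, d in enumerate(depth):
        if d > threshold and start is None:
            start = pos                    # a high-coverage run begins
        elif d <= threshold and start is not None:
            regions.append((start, pos))   # the run ended just before pos
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


# Five tiled reads of 10 bp, plus three extra reads stacked on positions 10-19.
placements = [(0, 10), (5, 10), (10, 10), (15, 10), (20, 10)] + [(10, 10)] * 3
print(stacked_regions(coverage(30, placements)))
```

A flagged region is only a candidate: high coverage can also arise by chance or from cloning bias, so the manual steps described in the text are still needed to confirm a collapsed repeat.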

Once the total length of the region is known, the next step is to find out how the different segments are organized within the region. This is where high quality single base differences can be very useful. By separating reads on the content of these small differences, it can be possible to identify the different copies of a repeat. If placed at the 3’ end of a primer, such differences can also be used to sort out the internal order of the repeated sequences. By trying PCR reactions between different unique locations within and between the repeats and measuring the lengths of the resulting products, it is possible to make a map of the repeats. However, it is not always obvious where the repeat boundaries are or even what is repeated and what is not.

Simple as it may sound in theory, it is a major obstacle in practice. As both the methods and materials used are of biological nature, there is ample room for error. First of all, sequencing errors will occur and are not always easy to distinguish from a single base difference between repeat copies. This can only be addressed by sequencing the same region many times. If the same difference occurs in several reads, then it can be assumed that the reads belong to different copies. However, it cannot always be excluded that the differences are caused by sequencing artifacts occurring at a high frequency because of special properties of the surrounding sequence, or that they reflect a true polymorphism in the DNA preparation.

Another source of problems is PCR reactions. These are not always accurate and may introduce new errors. PCR is highly dependent on many different factors such as salt concentrations, temperatures, primer sequences, and of course the amplified sequence. Furthermore, in a high throughput environment, it is impossible to optimize these parameters for all PCR reactions. This leads to a situation where the results are not always congruent between experiments, adding extra complexity to the picture.

The problems described above are very difficult to solve programmatically as they are all unique and must often be solved with some amount of “gut-feeling”. In this process, the computer can only be a support to help in getting a clear view of the picture. The capacity of a human mind to keep track of all the different experiments and sequence variations is limited, and computer support is needed. Unfortunately, there is to my knowledge no available software that provides this support. The editing programs that come with assembly packages are made for low level sequence editing, displaying only a few bases from a single contig at a time. Alternatively, there are programs like Miropeats (Parsons, 1995) that can display repeat information at the contig level, leaving out all of the details. The recently released program ReDiT (Tammi et al., 2004), which is a tool for detection and manual verification of problem areas, could prove a very useful aid in solving repeats, but as it is based on the editing program Consed (Gordon et al., 1998), it still suffers from too detailed a perspective.

As a solution, the needed functions were implemented in a Perl/Tk program (unpublished data). This program is best described as a dynamic map where the user can zoom in and out to change the level of detail. The program displays repeat information as well as primer locations and successful PCR products, differentiating between PCR products of correct length relative to the map and PCR products in disagreement with the map (fig. 5).

Figure 5 Screen dump of the Perl/Tk program used to solve repeats. Locations where primers match are shown as blue or red arrows depending on whether they match in one or in several places, gray arrows represent open reading frames, and colored boxes mark regions with similar sequences. Various information is displayed through mouse interaction with the map.
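The idea of separating reads by recurring single base differences, described earlier in this section, can be sketched as follows. This is a hypothetical illustration and not the unpublished Perl/Tk program: it assumes the reads have already been aligned and padded to equal length with "-" where a read gives no information, and it treats a column as discriminating only when at least two different bases are each supported by `min_support` reads, so that an isolated sequencing error is ignored.

```python
from collections import Counter


def discriminating_columns(aligned_reads, min_support=2):
    """Alignment columns where two or more bases each occur in >= min_support reads."""
    if not aligned_reads:
        return []
    columns = []
    for col in range(len(aligned_reads[0])):
        counts = Counter(read[col] for read in aligned_reads if read[col] in "ACGT")
        supported = [base for base, n in counts.items() if n >= min_support]
        if len(supported) >= 2:
            columns.append(col)
    return columns


def split_by_column(aligned_reads, col):
    """Group reads by the character they carry at one column ('-' forms its own group)."""
    groups = {}
    for read in aligned_reads:
        groups.setdefault(read[col], []).append(read)
    return groups


reads = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",  # T at column 4, seen twice: candidate second repeat copy
    "ACGTTCGTAC",
    "ACGTACGAAC",  # A at column 7, seen once: treated as a sequencing error
]
for col in discriminating_columns(reads):
    print(col, {base: len(group) for base, group in split_by_column(reads, col).items()})
```

As noted in the text, recurring differences can still be systematic sequencing artifacts or true polymorphisms, so a grouping of this kind is a hypothesis to be tested by PCR rather than a result.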

3.2 Defining the mitochondrial proteome (paper I)

A major problem when studying the mitochondrial proteome is how to know which proteins it includes. Proteins encoded on the mitochondrial genome and proteins that are part of known mitochondrial pathways are obvious, but knowing which of the proteins encoded in the nucleus are used in mitochondrial functions requires a well characterized organism. At the time when the study of paper I was done, the only mitochondrial proteome possible to use was that of Saccharomyces cerevisiae. By then it was composed of almost 400 nuclear-encoded proteins and 30 proteins encoded on the mitochondrial genome. The number of known mitochondrial proteins encoded in the S. cerevisiae nucleus has since increased to over 600 (Schon, 2001), but recent studies have confirmed the results of our pioneering work and extended its conclusions to both man and mouse (Gabaldon and Huynen, 2003; Mootha et al., 2003).

A problem with the proteome set used in paper I has turned out to be its unfortunate dependence on proprietary data. When the set was defined, the only database that allowed searches on subcellular localization was that from Proteome Inc. At the time, this database was free to use for academic research (though permission was needed for inclusion in publication). Since then, Proteome Inc. has been bought by Incyte and access to the Proteome data now requires a $2,000 subscription. Together with a constant change of gene names in yeast, this has made revision of the data in paper I cumbersome.

3.3 Phylogenetic methods (papers I-IV)

Genes evolve through mutations that alter the DNA sequence, and when a mutation occurs in one individual but not in another, the population becomes polymorphic with respect to that DNA locus. If the mutation does not impair the replication of the affected individual, selection is unlikely to purge it from the gene pool and all of that individual's offspring will retain the mutation. With time, many such mutations will accumulate, and the hereditary (or phylogenetic) relationships in the population can be studied by comparing the patterns of genetic differences. The same principle can be used to infer the phylogenetic relationships between different species or between duplicated genes within the same species. The advances in sequencing technology and evolutionary biology, and the need for new tools in taxonomy, have brought a wealth of different methods for phylogenetic inference. An explanation of all the existing phylogenetic methods is not within the scope of this work, but an excellent review of the subject can be found in Whelan et al. (2001). In short, they all have their pros and cons, and the various methods usually give similar results for sequences separated by short to moderate evolutionary distances. However, the greater the evolutionary distance, the larger the influence of erroneous assumptions in the phylogenetic method.

3.3.1 Large scale phylogenies (papers I and IV)

As phylogenetic analysis can be performed with a large array of different methods and models, the analysis of a single gene can be enough for months of work just trying to find the best method. Some more months could be needed for computing time alone, if the tree is big and the selected model complex. To cope with hundreds of genes, compromises must be made. The first, and maybe most important, step is the selection of sequences to include in each phylogenetic reconstruction. In general, a richer taxonomic sampling increases the possibility of correctly resolving the phylogeny. On the other hand, the number of possible trees, and with it the computational time, grows very fast with the number of included taxa. This was not a problem only a few years ago, as the number of sequences in the databases was limited, and for most genes it was possible to include all potentially homologous sequences. For paper I, a simple limit on the number of sequences from Eukaryotes, Bacteria and Archaea was enough, with a few exceptions where manual adjustment was made. Today, that approach is hardly possible because of the much larger number of hits found in a similarity search with an average sequence. The number of nucleotides in the EMBL database increased from 5 to 62 billion between 1999 and 2004 (www.ebi.ac.uk), reflecting the massive increase in sequencing. Looking at the distribution of nucleotides in the database, over 75% are from only four organisms of a very limited taxonomic coverage: human, chimpanzee, rat and mouse. Among those four, human sequences make up around half of the total nucleotides. To get an acceptable taxonomic distribution in a limited dataset, the inclusion of sequences should ideally be limited by their taxonomic placement. However, because of paralogies, it is difficult to use the source organism taxonomy as the criterion.
A better solution would be to make a draft tree with as many sequences as possible and then reduce the dataset by removing sequences that fall close together in the tree. Additionally, the data could be split into different trees for different paralogous groups of sequences. The problem with this approach is the time required for draft tree searches and manual pruning of the trees. A reasonable and effective solution is to limit the number of sequences on the basis of their similarity to already included sequences. Such a strategy guarantees a broad taxonomic sampling (if the sequences can be found) and at the same time limits the presence of very similar sequences. Neighbor Joining with p-distances can be used to get a first impression of the data. This is a very fast method, albeit not the most consistent or powerful, and from its results it can be decided whether the taxonomic sampling seems appropriate or whether the selection procedure needs to be adjusted. Once the datasets are fixed, more time-consuming methods can be applied. A common procedure is to apply one distance-based, one parsimony-based and one likelihood-based method and then compare the results. The assumption is usually that if all methods give the same result, then that result is likely to be correct, since the underlying signal is stronger than the differences in assumptions between the methods used. Though not necessarily giving the correct topology, congruence between methods is a reasonable substitute when more rigorous model evaluations are not feasible. In my experience, the most common problem is unresolved topologies, and these are usually unresolved regardless of the method used because the phylogenetic signal in the underlying data is not strong enough. For the cases where the different methods disagree but the trees are resolved, more time can be spent on trying to find the correct tree. In the end, someone has to look at the trees and decide what story they tell.
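The similarity-based selection described above can be illustrated with a minimal Python sketch. It is not the procedure used in papers I or IV: the function names, the greedy order and the threshold are assumptions made for illustration. It computes p-distances (the proportion of differing sites) between aligned sequences and keeps a sequence only if it is sufficiently different from every sequence already kept.

```python
from __future__ import annotations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ("-") are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def select_diverse(aligned: dict[str, str], min_distance: float) -> list[str]:
    """Greedy filter (hypothetical helper): keep a sequence only if its
    p-distance to every already kept sequence is at least min_distance."""
    kept: list[str] = []
    for name, seq in aligned.items():
        if all(p_distance(seq, aligned[k]) >= min_distance for k in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, the outcome depends on the order in which sequences are offered, so sequences that must be present (for example the query itself) should come first.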

3.4 Reconstructing the ancestor (paper IV)

With the increasing number of available α-proteobacterial genome sequences, it is now possible to do more large-scale comparisons of their contents and evolution. A problem that immediately arises is how such comparisons can be done without drowning in data, keeping an overview without getting lost in the details. A method that has been used previously for studies of the evolution of genome content is Maximum Parsimony reconstruction. In these studies, very large evolutionary distances have been covered in attempts to reconstruct the genome contents of the last common ancestor of all life (Mirkin et al., 2003), or the last common ancestor of archaea (Snel et al., 2002). In this perspective, the last common ancestor of α-proteobacteria is quite recent.
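To make the idea of a parsimony reconstruction of gene content concrete, the following Python sketch applies Fitch's small-parsimony rule to a single character on a rooted binary tree. It is an illustration under simplifying assumptions and not necessarily the procedure used in paper IV: the character is reduced to presence (1) or absence (0), gains and losses cost the same, and the tree and taxon names are made up.

```python
def fitch(tree, leaf_states):
    """Return (candidate ancestral states, minimum number of changes).

    tree is either a leaf name (str) or a (left, right) tuple of subtrees;
    leaf_states maps each leaf name to its observed state.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree and presence/absence pattern for one gene family.
example_tree = ((("taxon_a", "taxon_b"), "taxon_c"), ("taxon_d", "taxon_e"))
example_states = {"taxon_a": 1, "taxon_b": 1, "taxon_c": 0, "taxon_d": 1, "taxon_e": 1}
root_states, changes = fitch(example_tree, example_states)
```

In this made-up example the gene family is inferred to be present at the root, with a single loss on the lineage leading to taxon_c.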

3.4.1 Clustering genes

As with all phylogenetic reconstructions, it is important to correctly identify characters. In the case of reconstructing genome content, these characters are genes and the states are their copy numbers. Assuming that evolution acts through duplication and divergence, genes belonging to the same family should all go back to a single ancestral gene. As the ancestral gene has duplicated and the copies have diverged to different functions, their shared similarities might be lost and novel families formed. This makes it troublesome to assess which genes should be counted as the same character and which should be counted as different characters. A too inclusive criterion will miss the evolution of novel functionality, while a too strict criterion will risk counting even minor divergences as inventions. Ideally, the grouping of genes into families should be based on their function and evolutionary history. However, because of the large number of genes in a study like this, in combination with the varying quality of annotations, this is not feasible. The best alternative would be similarity searches combined with phylogenetic analyses, but again, the number of comparisons would be prohibitive. So, the only alternative that remains is to rely on similarity searches alone. Classifying genes by similarity searches is not an exact science. Generally, there is a trade-off between sensitivity and speed, where more accurate algorithms like Smith-Waterman (Smith and Waterman, 1981) have the potential to selectively find more of the distant matches than the much faster BLAST (Altschul et al., 1990). However, as this study is limited to relatively closely related organisms, the assumption can be made that if two sequences are not similar enough to be picked up with BLAST, then they should probably not be included in the same group because their functionality is too divergent.
Furthermore, the aim is not primarily to estimate the precise number of copies of genes in the α-proteobacterial ancestor, but rather to estimate its overall gene content: what functionality was present and what was not? Once sequence similarity scores have been settled on as the measure of choice, the question is how they should be applied. The resulting groups will be very different depending on how the similarity scores are used. Should the group be defined by the similarity between a reference protein and all other proteins using a simple threshold value, or should it be expanded by iterative comparisons where similarity, as defined by some threshold, to any of the included genes is enough for inclusion? Both methods can be simple to implement, but the former is too simplistic and could result in too many groups, depending on the selected reference sequences and thresholds. The latter is probably more accurate as it does not rely on single reference sequences, but the size of the clusters will be highly dependent on an arbitrarily selected threshold value. The Clusters of Orthologous Groups (COG) by Tatusov and co-workers (Tatusov et al., 1997; Tatusov et al., 2001) is similar to iterative BLAST in the sense that the groups are expanded through chains of similarities. However, it is made more robust by avoiding the problem of selecting a threshold and instead relying on "best reciprocal hits". This means that for a sequence to be incorporated in a COG, its best hit in a similarity search against another genome must in turn have the original sequence as its best hit when searched back against the first genome. By this scheme, the number of included sequences for a group can be kept relatively low and the risk of a too exclusive selection is minimized. A positive effect of using the COGs as characters is that there is a large compilation of pre-defined COGs available for download at NCBI.
Apart from the obvious savings in computer time, this also has an advantage in the form of pre-existing functional annotations and classifications, saving a lot of work on identifying functions for the different clusters. Although COG is not an ultimate tool for clustering genes, it is an acceptable trade-off between speed and quality in this application. Moreover, as COG is well known and has been used in several other studies, its use in this study makes the results easier to compare with the results of others.
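The reciprocal-best-hit criterion can be sketched in a few lines of Python. This shows only the pairwise step between two genomes; the published COG procedure combines such hits across several genomes, which is not reproduced here. The input format (nested dictionaries of similarity scores), the function names and the example scores are assumptions made for illustration.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores has the form {query: {subject: similarity_score}}.
    Queries without any hit are left out.
    """
    return {
        query: max(subjects, key=subjects.get)
        for query, subjects in scores.items()
        if subjects
    }


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up scores for two hypothetical genomes A and B.
a_vs_b = {"a1": {"b1": 200, "b2": 50}, "a2": {"b1": 120, "b2": 40}, "a3": {}}
b_vs_a = {"b1": {"a1": 190, "a2": 110}, "b2": {"a2": 45, "a1": 30}}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Note that no score threshold appears anywhere: a2 is excluded not because its score to b1 is too low, but because b1 has a better partner in a1.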

4. Results

4.1 Evolution in α-proteobacteria

4.1.1 Genome dynamics

Analysis of how the sizes of different gene categories correlate with the tenfold differences in genome size revealed small to moderate expansions in core functionalities like transcription and protein synthesis (paper IV, fig. 1b), while genes for e.g. transport, regulation and energy metabolism display rapid increases with genome size (paper IV, fig. 1a). Reconstructions of the α-proteobacterial ancestor and the internal nodes in the α-proteobacterial tree show massive losses and expansions of the gene sets at the transitions to parasitic and plant-associated lifestyles, respectively (paper IV, fig. 3). There is no evidence for a single origin of the plasmids found in many of the species, but these extra replicons appear to be highly dynamic and a source of rapid evolution. The estimated content of the α-proteobacterial ancestor reveals a free-living, aerobic and motile bacterium with good capabilities to interact with its environment.

4.1.2 From multi- to single-host pathogen

The specialization of B. quintana has been coupled with a genomic reduction, and no substantial addition of novel functionality could be observed (paper III). In effect, B. quintana is genetically a subset of B. henselae. Except for 25 of the total 1143 genes, B. quintana has nothing that B. henselae has not. Of these 25 genes, 17 are specific to B. quintana and one is similar only to a possible pseudo-gene in B. henselae, and hence no function can be assigned from sequence similarity to known genes. Only 8 of the genes not present in B. henselae have similarity to genes found in other species, and as many as 6 of these are found as pseudo-genes in B. henselae. The B. quintana specific genes for which a function can be inferred are involved in iron transport (fatB, fatC, fatD & ceuD) and hemolysin export, and also include a gene very similar to yopP, which is found on a virulence-determining plasmid in Yersinia spp. In Yersinia, the product of yopP has been reported to suppress the inflammatory response to infection (Hoffmann et al., 2004). While the previously identified family of hemin binding proteins (HbpA-E) in B. quintana (Minnick et al., 2003) is also present in B. henselae, the additional genes for iron transport and hemolysin export in B. quintana show its extreme dependence on hemin uptake from the environment. Indeed, B. quintana has the highest reported hemin requirement for bacterial in vitro growth (Myers et al., 1969). The origin and function of the remaining 18 B. quintana specific genes with unknown functions deserve further investigation, as they could be of importance for the differences in lifestyle between the two Bartonella species.

4.2 Mitochondrial evolution

4.2.1 Origin of the mitochondrial proteome

The mitochondrial proteome in yeast was used to study the mitochondrial adaptation from an independent species to a highly integrated organelle. Similarity searches and phylogenetic analyses showed that half of the mitochondrial proteins that are encoded in the nucleus have no discernible bacterial homologs, and only a small fraction of the nuclear-encoded proteins can be traced back to the α-proteobacteria (fig. 6). Phylogenetic analyses were not able to resolve the genetic history for the vast majority of mitochondrial proteins with bacterial homologs. A surprisingly large fraction of the proteins seemed to be specific for yeasts and have probably evolved relatively late. Breaking down the results by functional category revealed significant differences between, on the one hand, proteins involved in mitochondrial core functionality like energy metabolism and the translation machinery, which have a high fraction of bacterial homologs, and, on the other hand, transport and regulatory proteins, which are almost void of bacterial homologs (paper I, fig. 2). One example of proteins in the transport category is the mitochondrial ATP/ADP translocases, which belong to a large family of mitochondrial carrier proteins of eukaryotic origin (paper II, fig. 2) (Kuan and Saier, 1993) and have thus been recruited to mitochondrial function after the mitochondrial origin. Interestingly, ATP/ADP translocases with the opposite polarity are present in Rickettsiales, but these translocases are not related to the mitochondrial type. Phylogenetic analyses of the parasite/plastid type of ATP/ADP translocases show that they are deeply diverging and that the five paralog families found in Rickettsia species are the result of ancient divergences rather than recent duplications (paper II, fig. 1). A closer analysis (paper II, fig. 3) of conserved positions does not find any support for a horizontal transfer from Chlamydiaceae to Rickettsia, as has been suggested previously (Wolf et al., 1999).

Figure 6 Yeast nuclear genes sorted into functional categories. White slices represent proteins without bacterial homologs, gray slices represent proteins with bacterial homologs and black slices represent proteins of α-proteobacterial origin according to phylogenetic analyses (paper I). [Category labels recoverable from the figure: Transport, Regulation, Energy, Membrane, Biosynthesis, Translation.]

4.2.2 Fate of the mitochondrial ancestor

As the ancestor of all α-proteobacteria, the reconstructed ancestor in paper IV should also be ancestral to mitochondria. Through analysis of the estimated gene set in the α-proteobacterial ancestor, the minimum fraction of genes lost in the transition to mitochondria could be estimated at about a third of the circa 3000 genes originally present. The fate of the genes that have eukaryotic homologs is uncertain, but phylogenetic analyses show that at least some of the genes not kept for mitochondrial functions have been recruited for use elsewhere in the eukaryotic cell (Gabaldon and Huynen, 2003; paper IV and unpublished data).

5. Discussion

As the mitochondrial symbiosis took place around 2 billion years ago (Hedges et al., 2004; Hedges et al., 2001), mitochondria and α-proteobacteria have since had plenty of time to evolve in separate directions. Despite this, there is a multitude of convincing evidence for the bacterial origin of mitochondria (Gray et al., 1999; Gray and Spencer, 1996; Olsen et al., 1994; Sicheritz-Ponten et al., 1998; Viale et al., 1994; Yang et al., 1985). The enormous time span is of course a big obstacle in the study of mitochondrial origin, and it is very important to keep in mind that the organisms we have today, be they humans or bacteria, are the results of continuous evolution. Although evolution may proceed at different speeds in different lineages and can vary with time, all depending on the level of environmental stability and the selection for or against change, mitochondria and their bacterial sisters are separated by a period of time sufficient to evolve humans from unicellular life. Note, for example, how many extant α-proteobacteria are specialized for a symbiotic or parasitic lifestyle dependent on highly evolved eukaryotes such as plants and animals. This can obviously not have been the lifestyle of the mitochondrial ancestor. But, left with no other choice, extant α-proteobacteria are, together with the mitochondria themselves, the best source we have for insights into the mitochondrial origin.

5.1 Evolution in α-proteobacteria

α-proteobacteria is a group of bacteria comprising a wide variety of lifestyles, abilities and genome sizes. Most of those sequenced so far are interesting because of their close association with animals or plants, either as parasites or as symbionts. The last common ancestor of α-proteobacteria was, according to our results, a very capable organism with developed features like pili and flagella. This is reasonable, as it is more likely that a free-living organism evolves into a parasite than that a parasite breaks loose and starts to survive on its own. Our data and those of others (Andersson and Andersson, 2001; Fraser et al., 1995; Ogata et al., 2001) strongly point to a mechanism where parasitic bacteria that are highly dependent on their host evolve primarily through genome reduction, leading to ever higher dependence on the host for survival.

5.1.1 From free-living to obligate parasitism

Among the α-proteobacteria in the Rhizobiales group sequenced so far, Bartonella spp. are the smallest and the most host dependent. Brucella suis is the closest sequenced relative of Bartonella so far, and although known as a facultative intracellular pathogen in mammals, it has been reported to survive extended periods in soil (Elberg, 1981; Paulsen et al., 2002). The genome of B. suis is 3.3 Mb, divided between a larger and a smaller chromosome of 2.1 and 1.2 Mb respectively (Paulsen et al., 2002). Although rearrangements have occurred, the larger chromosome shares its backbone structure with Bartonella and other Rhizobiales, while the smaller chromosome is different from any other α-proteobacterial chromosome (paper III). It is also notable that in B. suis biovar 3, the two chromosomes are merged into one 3.3 Mb chromosome (Jumas-Bilak et al., 1998). Clearly, this extra chromosome is a valuable source of genomic plasticity, a genetic playground or test lab for innovations and additions of genetic material.
This genomic plasticity seems to be widespread among Rhizobiales, as many of them have extra-chromosomal genetic elements. Indicative of the variability in these extra chromosomes is that they do not have any discernible similarities, such as stretches of shared gene order or similarities in gene content, as can be seen for the main chromosomes. B. henselae still contains some remnants in the genome suggesting that the ancestor of Bartonella also once had an extra-chromosomal element, jumping in and out of the main chromosome. And though the genome of B. henselae contains many phage-related genes and remnants, B. quintana lacks most of these. This suggests that the genome of B. quintana has lost most of its dynamic capability after the divergence from its common ancestor with B. henselae. As a consequence, B. quintana has also lost much of its ability to adapt to new environments and has become irreversibly dependent on its human host. Despite many differences in metabolic properties and host interaction, as well as the taxonomic distance, the reducing processes that shaped B. quintana can also be seen in e.g. Rickettsia prowazekii, which is even more reduced and still in the process of discarding even more genes (Andersson and Andersson, 2001). Apart from a loss of selective pressure on the genes that are no longer needed in the stable environment inside a host, the genomic reduction is probably also pushed by small population sizes and a high number of evolutionary bottlenecks. Only a tiny fraction of the population is transferred at each jump from one host to the next, and thus the possibility for fixation of deleterious mutations is increased (Andersson and Kurland, 1998). Free-living bacteria do not usually have these recurring expansions from a small population, and their evolution is instead driven by the continuous competition for nutrients, where only advantageous mutations have a chance of getting fixed by outgrowing the rest of the population. This leads to a situation where it is an advantage to have a large and dynamic genome in an unstable and competitive environment. For parasites, this advantage is probably smaller, as the environment is stable and abundant in nutrients and the major selected feature is the ability to evade the host's defense systems. The net effect in parasites seems to be that the reducing forces are greater than the advantages of keeping a large genome.

5.2 Mitochondrial evolution

The adaptation of the mitochondrial ancestor into an organelle was accomplished through both massive losses and expansions of genetic material. A majority of the ancestral mitochondrial genome was probably lost in the process, and the majority of the proteins that now constitute the mitochondrial proteome seem to have originated in the eukaryotic nucleus. Still, some of the functions present in the mitochondrial ancestor were possibly recruited for function in the eukaryotic cytoplasm.

5.2.1 The yeast mitochondrial proteome

An implicit prediction of the SET is that there was a massive transfer of genes from the ancestral mitochondrial genome to the host nucleus (Gray, 1992). The results of paper I suggest that this transfer was rather limited and involved primarily the genes needed for the mitochondrial core metabolism. The proteins for regulation and transport, and even many of the additional proteins in the large respiratory complexes, have been replaced by, or supplemented with, novel proteins originating in the nucleus. The large fraction of novel proteins in the mitochondrial proteome reflects the extent to which the ancestral mitochondrion has been modified, optimized and integrated into its function as an organelle. A problem with the analyses made in paper I is that the inability to find bacterial homologs or to get informative phylogenetic reconstructions does not necessarily mean that a protein evolved within the eukaryote. The large evolutionary distance that lies between mitochondria and any other α-proteobacteria has certainly covered many traces. However, as much as a third of the mitochondrial proteins not found in bacteria were specific to yeast and are thus not likely to be anything but late additions to adjust mitochondrial functions. Additionally, as of this writing, only 7 of the original 194 proteins without bacterial hits have been revealed to have possible bacterial homologs (BLAST, E<1e-10) in an updated search against all bacterial proteins currently sequenced (SwissProt, TrEMBL, TrEMBLnew). Furthermore, recent studies on the mitochondrial proteomes of humans and mice have obtained similar results (Gabaldon and Huynen, 2003; Mootha et al., 2003). A mystery in mitochondrial evolution is how the mitochondrial protein import machinery has evolved. In yeast, this machinery consists of five protein complexes that transport proteins translated in the cytosol into the mitochondrion. This import is generally directed by an amino-terminal targeting pre-sequence on the precursor proteins, though most of the proteins inserted into the mitochondrial membranes contain the targeting and sorting information within the mature protein (Koehler, 2000). This ability to translocate proteins from the cytosol into the mitochondria is a prerequisite for the migration of genes from the mitochondrial genome into the nucleus. But the proteins in the mitochondrial protein import system seem to be exclusively of eukaryotic origin. That leads to the question of which came first. Without the need for protein import, no system for it will evolve, and with no system to import the proteins, no genes will successfully migrate to the nucleus. A lead to the solution has possibly been found by Marc et al. (2002), who have shown that the messenger RNAs for proteins predominantly in the group with bacterial homologs are selectively translated on polysomes associated with the mitochondrial membrane.

5.2.2 The mitochondrial ancestor

The taxonomic placement of mitochondria within the α-proteobacterial tree is still a matter of debate. A problem is the lack of sequences from early diverging α-proteobacteria. There is an abundance of SSU ribosomal sequences from unclassified α-proteobacteria in the databases, but that is usually the only sequence from those bacteria. Unfortunately, SSU rRNA does not contain enough phylogenetic information to satisfactorily resolve the α-proteobacterial placement of mitochondria. The mitochondrial ribosomes are different from other bacterial ribosomes and have been subject to fast evolution (Lang et al., 1999; Mears et al., 2002). The reconstructions are also extremely dependent on the included taxa, something that should raise doubts about the results. A recent analysis of mitochondrially encoded proteins in the respiratory chain has strengthened the case for placing mitochondria on the same branch as Rickettsiales (Emelyanov, 2003). Although the results of Emelyanov are convincing, they need to be verified with a broader taxon sampling from both mitochondria and α-proteobacteria. If mitochondria belong to the Rickettsiales, then it is tempting to assume that the mitochondrial ancestor had already started on the path towards intracellular parasitism that is a common feature of its descendants. As the ancestor reconstructed in paper IV is also the ancestor of the free-living α-proteobacteria, it follows that much of its gene content was discarded already before the mitochondrial symbiosis was established. This is also in agreement with the reconstruction of the ancestral mitochondrial metabolism by Gabaldón and Huynen (2003). From a large comparison of α-proteobacterial and eukaryotic genomes, they could infer a minimal genomic content of the ancestral mitochondrion and reconstruct its metabolic capabilities.
Their results reveal an organism with an abundance of transporters for metabolites like lipids and amino acids, and corresponding gaps in synthetic pathways. Again, this points to a host-dependent organism. Another finding of this study is that only a minority of the proteins of ancestral mitochondrial descent in eukaryotes are used in the mitochondria. The proteins identified by Gabaldón and Huynen are a minimal estimate of the ancestral mitochondrion, as their method requires that the genes are retained in the genomes of extant eukaryotes. Our reconstruction of the α-proteobacterial ancestor in paper IV can be used as the upper limit for the contents of the ancestral mitochondrial genome. Indeed, most of the proteins identified by Gabaldón and Huynen are also included in the reconstructed α-proteobacterial ancestor. The difference between the upper and lower estimates is approximately 2000 orthologous groups, of which almost half have no similarity to eukaryotic proteins. Preliminary phylogenetic analyses of the remaining proteins have identified about 100 additional candidates for transfer from the ancestral mitochondrial genome to the eukaryotic nucleus. Taken together, approximately 20-25% of the genes present in the α-proteobacterial ancestor have potentially been transferred to the eukaryotic genome, but only a minority of these were recruited back for mitochondrial function. Instead, the products of the transferred genes are used elsewhere in the eukaryotic cell, and a large number of novel proteins have evolved for mitochondrial functions. As the size of the ancestral mitochondrial genome is still unknown, it is not yet possible to say how much was lost in the transformation to an organelle.

5.3 ATP/ADP-translocases

An integral part of mitochondrial function is its net export of energy to the rest of the cell. This is accomplished by the mitochondrial ATP/ADP-translocase, which works by pumping ATP out of the mitochondrion in exchange for the less energy-rich ADP. Although this functionality is central to the mitochondrial role in cellular metabolism and to the notion of mitochondria as the power plant of the cell, the ATP/ADP-translocase did not originate from the symbiosis in which mitochondria were once formed. Instead, the mitochondrial ATP/ADP-translocase has evolved from a large group of mitochondrial carrier proteins that show no traces of bacterial heritage and are therefore assumed to be of eukaryotic origin (paper II; Runswick et al., 1987). If the mitochondrial ATP/ADP-translocase evolved after the initial mitochondrial symbiosis, then exchange of ATP for ADP was probably not a key factor in the symbiosis. This should not be a surprise, as there is no evolutionary gain for a free-living organism in exporting ATP into its environment without getting the protection and rich substrate that a symbiotic host would offer in return. Rather, the forces of evolution would work against the development of a protein functioning as a mitochondrial ATP/ADP-translocase before establishment of the symbiosis. This perspective opens up for theories and speculations about what the driving force for the mitochondrial symbiosis was, if it was not the functions we recognize as mitochondrial today. Although ideas about the driving forces behind the mitochondrial origin are difficult, if not impossible, to prove, they often make predictions or assumptions about the nature of the ancestral mitochondrion and its host. With increased knowledge about the mitochondrial ancestor and related bacteria, such ideas can be tested, and hopefully better ones formulated. ATP/ADP-translocases are also found among the mitochondrial relatives Rickettsiales.
There is no discernible evolutionary relation between the parasite and mitochondrial types of ATP/ADP-translocases, and the parasite translocases work in the opposite direction, depleting the host of ATP in exchange for ADP, in accordance with the parasitic nature of Rickettsiales. Although unrelated to the mitochondrial translocases, the parasite translocase could be an important piece in the evolutionary puzzle. The parasite type of ATP/ADP-translocase is found in evolutionarily very distant taxa, and the findings in paper II dismiss the theory by Wolf et al. (1999), which explains the strange taxonomic distribution of the parasite ATP/ADP-translocase by horizontal gene transfer from Chlamydiaceae to Rickettsia. Instead, our data support an ancient origin of these transporters, where potential horizontal gene transfers took place further back than can be resolved with phylogenetic methods. If this conclusion is valid, then it is reasonable to assume that the parasite type of ATP/ADP-translocase was also present in the mitochondrial ancestor. A possible scenario could be that the initial mitochondrial symbiosis was no symbiosis in the usual sense but parasitism. However, it should be noted that the recent sequencing of Wolbachia pipientis wMel, an obligate intracellular parasite of Drosophila melanogaster that belongs to the Rickettsiales, did not find any ATP/ADP-translocases (Wu et al., 2004). Whether the translocase has been lost in Wolbachia or gained later in Rickettsia is currently impossible to tell.

6. Concluding remarks and future perspectives

The transition from a free-living to an intracellular lifestyle leads to a massive reduction in genome size and possibly a reduction of adaptive capabilities, further pushing the parasite towards host dependency. It seems likely that this was also the process that captured the mitochondrial ancestor, causing a massive loss of genes. The transition from endosymbiont to organelle has left only a core of proteins in the mitochondrial proteome that can still be traced back to the ancestral mitochondrion. However, the contributions made by the ancestral mitochondrion were not limited to mitochondrial functions: the data now suggest a substantial transfer of genes to the nucleus without a corresponding targeting of their products to mitochondria. A remaining question is whether the mitochondrial ancestor was a parasite. Accumulating data point in that direction, but caution is needed in this interpretation, and more data from early diverging Rickettsiales are needed. Another question is what it was parasitizing. What was the nature of the host organism, the pre-eukaryote? In order to better understand the mitochondrial origin, the focus will soon have to shift towards the eukaryotic side of the transition and the sequencing of both mitochondriate and amitochondriate early diverging eukaryotes.

7. Summary in Swedish

Mitochondrial evolution

Life can be divided into three taxonomic groups: bacteria, archaea and eukaryotes. While bacteria and archaea are "simple" unicellular organisms, the eukaryotes span everything from unicellular amoebae and yeasts to animals and plants. A distinguishing feature of eukaryotes is the presence of organelles. The genetic material is enclosed in the cell nucleus, and the Golgi apparatus handles transport within the cell. Two other important organelles are the mitochondrion and, in photosynthetic eukaryotes, the chloroplast. Both mitochondria and chloroplasts were originally bacteria that have been adapted to function in the service of eukaryotes, and both organelles retain part of their genome. The chloroplasts carry out photosynthesis, while the main function of mitochondria is the conversion of chemical energy via the citric acid cycle and the respiratory chain. Both organelles also have other important tasks, and the mitochondrion plays a central role in, for example, programmed cell death.

The ancestor of the chloroplast has been traced to the cyanobacteria, also called blue-green algae, while the mitochondrion descends from a group of bacteria called α-proteobacteria. These are a group of bacteria with widely differing niches and properties, but many live in close contact with eukaryotes, either as parasites or as symbionts. The group of α-proteobacteria considered most closely related to mitochondria is Rickettsiales. This group includes Rickettsia prowazekii, an intracellular parasite that is spread between humans by lice and causes the disease typhus. There are many similarities between R. prowazekii and mitochondria: both have complete systems for the respiratory chain, are completely dependent on their host organisms, and have evolved transport systems to import nutrients from the surrounding host cell. Both have also undergone a strong reduction of their genome.

Another group of α-proteobacteria that are intracellular parasites is Bartonella. These, however, are better able to live outside their host cells (but normally depend on a host organism for survival) and have a larger genome than Rickettsia. To study the processes that shape intracellular parasites, we have sequenced the genomes of the two Bartonella species B. henselae and B. quintana. B. henselae occurs naturally in cats, where it causes no symptoms of disease, but it can be transmitted to humans through cat bites or scratches, which then causes a disease called cat-scratch disease. B. quintana has, as far as is known, its natural reservoir in humans, where, after the acute course known as trench fever or five-day fever, it can persist without symptoms. Like R. prowazekii, B. quintana is spread between humans by lice, and trench fever, which was a common ailment in wartime, today affects the homeless.

Comparisons between the two Bartonella species show that B. quintana, which is more specialized, has a more reduced and less dynamic genome than the generalist B. henselae. Further comparisons with other α-proteobacteria show a clear trend in which the genome shrinks in intracellular parasites while it expands in free-living plant symbionts. The results suggest that adaptation to an intracellular environment leads to a faster reduction of genome content and a reduced ability to adapt to change.

This genome reduction has probably also been at work during the evolution of the mitochondrion. Only genes for a small handful of proteins remain in the mitochondrial genome, and these are far from sufficient for the mitochondrion to function. Instead, most genes for mitochondrial proteins reside in the genome of the eukaryotic cell nucleus, and the proteins are then imported into the mitochondrion via special transporters. A common assumption has been that these genes were transferred from the mitochondrial genome to the cell nucleus. Our analyses suggest, however, that only a minority of all proteins found in the mitochondrion and encoded by nuclear genes have their origin in the mitochondrial genome. Most genes for mitochondrial proteins have probably evolved from the nuclear genome, and many of them are moreover relatively new.

By analysing the 13 α-proteobacteria whose genomes have been mapped, we were able to estimate which genes were present in the most recent common ancestor of these organisms and the mitochondria. The ancestor appears to have been a free-living, aerobic and motile bacterium with about 3000 genes in its genome. Of these, at least 1000 genes were lost from the mitochondrial ancestor or early in mitochondrial evolution, but at least 600-700 genes were transferred to the genome of the eukaryotic cell nucleus. Our data, as well as those of others, suggest however that for the majority of these transfers the gene products have not been returned to the mitochondrion.

A protein of great importance for mitochondrial function is the ATP/ADP-translocase. It pumps energy-rich ATP out of the mitochondrion to the rest of the cell in exchange for energy-poorer ADP, but although it is central to mitochondrial function today, it does not share the mitochondrion's origin. Instead it has evolved from a family of eukaryotic transport proteins. A question that then arises is what drove the symbiosis in which the mitochondrion arose, if it was not the exchange of energy. Many have tried to answer that question, but unfortunately it is difficult, since the event lies about 2 billion years back in time. One possibility is that it was not symbiosis in the usual sense but parasitism. Our results have shown that the ATP/ADP-translocase found in Rickettsia, which is not related to the mitochondrial ATP/ADP-translocase, has a very ancient origin and may already have been present in the ancestor of mitochondria. What is special about the ATP/ADP-translocase found in Rickettsia is that it has reversed polarity relative to the mitochondrial one. This means that Rickettsia steals energy from its host, and if this protein was already present in the mitochondrial ancestor, it may mean that the ancestor was a parasite living on other cells before it was captured and tamed into what it is today.
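The gene counts quoted in this summary can be put side by side with a few lines of arithmetic. The sketch below uses only the rounded figures given above (about 3000 ancestral genes, at least 1000 lost, at least 600-700 transferred to the nucleus). The variable names are introduced here for illustration, and because the source figures are approximate lower bounds, the percentages are rough lower bounds too.

```python
ANCESTRAL_GENES = 3000        # approximate size of the reconstructed ancestor
LOST_MIN = 1000               # "at least 1000" genes lost
TRANSFERRED_MIN = (600, 700)  # "at least 600-700" genes moved to the nucleus


def share(count: int, total: int = ANCESTRAL_GENES) -> float:
    """Return count as a percentage of the ancestral gene set."""
    return 100.0 * count / total


lost_pct = share(LOST_MIN)
transferred_pct = tuple(share(n) for n in TRANSFERRED_MIN)

# Genes whose fate is left open by the two lower bounds quoted above.
unassigned = tuple(ANCESTRAL_GENES - LOST_MIN - n for n in TRANSFERRED_MIN)

print(f"lost: at least {lost_pct:.1f}% of the ancestral genes")
print(f"transferred: at least {transferred_pct[0]:.1f}-{transferred_pct[1]:.1f}%")
print(f"not covered by either bound: {unassigned[1]}-{unassigned[0]} genes")
```

On these figures, roughly a third of the ancestral gene set was lost and about a fifth moved to the nucleus, which leaves well over a thousand genes that the quoted lower bounds do not account for.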

8. Acknowledgments

I would like to thank everybody who has helped in making this work possible and the stay at Molecular Evolution enjoyable.

I am especially grateful to my supervisor Siv for her great optimism and providing me with ideas, freedom, support and not least funding. Thanks also to Chuck for great interest in my work. Mikael, for a vast knowledge on foresting and many discussions on reef keeping. I am sure the arboretum module will grow up some day. Björn, for getting me started and expertise in regular =~ s/.(\w).*(\S)/$1xp$2$1s\163i\x6Fns/x. Ann-Sofie, I don’t need any more PCRs now. Bernard, thanks for all the primers. Carolin, for getting things done. Björn, thanks for all the help and watch out for those aliens. Bastien, Maxime and Ulrika thanks for excellent work. Boris, for the entertainment. Who said politics is boring? Hans-Henrik, you can’t solve everything with a database? Wagied, for being on time. Ola, for saving the cover? Håkan, thanks for all those root passwords. Keep me updated. Lisa, Gabor, Hillevi, Mats and everyone else at the department running Linux, thanks for the computer time. You can get back to work now. Sean, for bringing that Isle of Jura. It was appreciated, just not that good. Daniel, beer-club needs you. Cessie, Cissi and Åsa, the parties were better before you left. Thomas, may Tux be with you. Robert and Rolf for sharing their knowledge and wisdom at the coffee table. Idress, thank you. Olga, my favourite roommate. Thanks also to everyone else at Molecular Evolution, past and present, including: Alex, Bengt, Dave, Dirk, Haleh, Helena, Ivica, Jan, Jasper Karin*3, Kristina, Magnus, Linus, Otto, Ovidiu, Stefan… It’s been great working with you all.

If I forgot to mention you, thanks for (not) reminding me.

Finally I would like to thank my wonderful wife Elin for supporting me all this time and making sure I got home at nights.

9. References

Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. and Watson, J.D. (1994) Molecular Biology of the Cell. New York.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J Mol Biol, 215, 403-410.
Andersson, J.O. and Andersson, S.G. (2001) Pseudogenes, junk DNA, and the dynamics of Rickettsia genomes. Mol Biol Evol, 18, 829-839.
Andersson, S.G. and Kurland, C.G. (1998) Reductive evolution of resident genomes. Trends Microbiol, 6, 263-268.
Andersson, S.G., Zomorodipour, A., Andersson, J.O., Sicheritz-Ponten, T., Alsmark, U.C., Podowski, R.M., Naslund, A.K., Eriksson, A.S., Winkler, H.H. and Kurland, C.G. (1998) The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature, 396, 133-140.
Cavalier-Smith, T. (1987) The origin of eukaryotic and archaebacterial cells. Ann N Y Acad Sci, 503, 17-54.
Cavalier-Smith, T. (2002) The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int J Syst Evol Microbiol, 52, 297-354.
Cho, J.C. and Giovannoni, S.J. (2003) Parvularcula bermudensis gen. nov., sp. nov., a marine bacterium that forms a deep branch in the alpha-Proteobacteria. Int J Syst Evol Microbiol, 53, 1031-1036.
Clark, C.G. and Roger, A.J. (1995) Direct evidence for secondary loss of mitochondria in Entamoeba histolytica. Proc Natl Acad Sci U S A, 92, 6518-6521.
Deryabina, Y.I., Isakova, E.P. and Zvyagilskaya, R.A. (2004) Mitochondrial calcium transport systems: properties, regulation, and taxonomic features. Biochemistry (Mosc), 69, 91-102.
Elberg, S.S. (1981) A Guide to the Diagnosis, Treatment and Prevention of Human Brucellosis. WHO, Geneva.
Emelyanov, V.V. (2003) Common evolutionary origin of mitochondrial and rickettsial respiratory chains. Arch Biochem Biophys, 420, 130-141.
Fischel-Ghodsian, N., Prezant, T.R., Chaltraw, W.E., Wendt, K.A., Nelson, R.A., Arnos, K.S. and Falk, R.E. (1997) Mitochondrial gene mutation is a significant predisposing factor in aminoglycoside ototoxicity. Am J Otolaryngol, 18, 173-178.
Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D., Clayton, R.A., Fleischmann, R.D., Bult, C.J., Kerlavage, A.R., Sutton, G., Kelley, J.M., et al. (1995) The minimal gene complement of Mycoplasma genitalium. Science, 270, 397-403.
Gabaldon, T. and Huynen, M.A. (2003) Reconstruction of the proto-mitochondrial metabolism. Science, 301, 609.
Germot, A., Philippe, H. and Le Guyader, H. (1996) Presence of a mitochondrial-type 70-kDa heat shock protein in Trichomonas vaginalis suggests a very early mitochondrial endosymbiosis in eukaryotes. Proc Natl Acad Sci U S A, 93, 14614-14617.
Gordon, D., Abajian, C. and Green, P. (1998) Consed: a graphical tool for sequence finishing. Genome Res, 8, 195-202.
Goto, Y., Nonaka, I. and Horai, S. (1990) A mutation in the tRNA(Leu)(UUR) gene associated with the MELAS subgroup of mitochondrial encephalomyopathies. Nature, 348, 651-653.
Gray, M.W. (1992) The endosymbiont hypothesis revisited. Int Rev Cytol, 141, 233-357.
Gray, M.W., Burger, G. and Lang, B.F. (1999) Mitochondrial evolution. Science, 283, 1476-1481.
Gray, M.W., Sankoff, D. and Cedergren, R.J. (1984) On the evolutionary descent of organisms and organelles: a global phylogeny based on a highly conserved structural core in small subunit ribosomal RNA. Nucleic Acids Res, 12, 5837-5852.
Gray, M.W. and Spencer, D.F. (1996) Organellar evolution. In Roberts, D.M., Sharp, P., Alderson, G. and Collins, M.A. (eds.), Evolution of Microbial Life. Cambridge University Press, Cambridge.
Grohmann, L., Brennicke, A. and Schuster, W. (1992) The mitochondrial gene encoding ribosomal protein S12 has been translocated to the nuclear genome in Oenothera. Nucleic Acids Res, 20, 5641-5646.
Gross, L. (1996) How Charles Nicolle of the Pasteur Institute discovered that epidemic typhus is transmitted by lice: reminiscences from my years at the Pasteur Institute in Paris. Proc Natl Acad Sci U S A, 93, 10539-10540.
Hatch, T.P., Al-Hossainy, E. and Silverman, J.A. (1982) Adenine nucleotide and lysine transport in Chlamydia psittaci. J Bacteriol, 150, 662-670.
Hedges, S.B., Blair, J.E., Venturi, M.L. and Shoe, J.L. (2004) A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol Biol, 4, 2.
Hedges, S.B., Chen, H., Kumar, S., Wang, D.Y., Thompson, A.S. and Watanabe, H. (2001) A genomic timescale for the origin of eukaryotes. BMC Evol Biol, 1, 4.
Heldt, H.W. (1969) Adenine nucleotide translocation in spinach chloroplasts. FEBS Lett, 5, 11-14.
Hirt, R.P., Logsdon, J.M., Jr., Healy, B., Dorey, M.W., Doolittle, W.F. and Embley, T.M. (1999) Microsporidia are related to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc Natl Acad Sci U S A, 96, 580-585.
Hoffmann, R., Van Erp, K., Trulzsch, K. and Heesemann, J. (2004) Transcriptional responses of murine macrophages to infection with Yersinia enterocolitica. Cell Microbiol, 6, 377-390.

Jameson, P., Greene, C., Regnery, R., Dryden, M., Marks, A., Brown, J., Cooper, J., Glaus, B. and Greene, R. (1995) Prevalence of Bartonella henselae antibodies in pet cats throughout regions of North America. J Infect Dis, 172, 1145-1149.
Jefcoate, C.R., McNamara, B.C., Artemenko, I. and Yamazaki, T. (1992) Regulation of cholesterol movement to mitochondrial cytochrome P450scc in steroid hormone synthesis. J Steroid Biochem Mol Biol, 43, 751-767.
Jumas-Bilak, E., Michaux-Charachon, S., Bourg, G., O'Callaghan, D. and Ramuz, M. (1998) Differences in chromosome number and genome rearrangements in the genus Brucella. Mol Microbiol, 27, 99-106.
Jun, A.S., Brown, M.D. and Wallace, D.C. (1994) A mitochondrial DNA mutation at nucleotide pair 14459 of the NADH dehydrogenase subunit 6 gene associated with maternally inherited Leber hereditary optic neuropathy and dystonia. Proc Natl Acad Sci U S A, 91, 6206-6210.
Kaneko, T., Nakamura, Y., Sato, S., Minamisawa, K., Uchiumi, T., Sasamoto, S., Watanabe, A., Idesawa, K., Iriguchi, M., Kawashima, K., Kohara, M., Matsumoto, M., Shimpo, S., Tsuruoka, H., Wada, T., Yamada, M. and Tabata, S. (2002) Complete genomic sequence of nitrogen-fixing symbiotic bacterium Bradyrhizobium japonicum USDA110. DNA Res, 9, 189-197.
Karem, K.L., Paddock, C.D. and Regnery, R.L. (2000) Bartonella henselae, B. quintana, and B. bacilliformis: historical pathogens of emerging significance. Microbes Infect, 2, 1193-1205.
Katinka, M.D., Duprat, S., Cornillot, E., Metenier, G., Thomarat, F., Prensier, G., Barbe, V., Peyretaillade, E., Brottier, P., Wincker, P., Delbac, F., El Alaoui, H., Peyret, P., Saurin, W., Gouy, M., Weissenbach, J. and Vivares, C.P. (2001) Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature, 414, 450-453.
Koehler, C.M. (2000) Protein translocation pathways of the mitochondrion. FEBS Lett, 476, 27-31.
Kuan, J. and Saier, M.H., Jr. (1993) The mitochondrial carrier family of transport proteins: structural, functional, and evolutionary relationships. Crit Rev Biochem Mol Biol, 28, 209-233.
Lang, B.F., Gray, M.W. and Burger, G. (1999) Mitochondrial genome evolution and the origin of eukaryotes. Annu Rev Genet, 33, 351-397.
Marc, P., Margeot, A., Devaux, F., Blugeon, C., Corral-Debrinski, M. and Jacq, C. (2002) Genome-wide analysis of mRNAs targeted to yeast mitochondria. EMBO Rep, 3, 159-164.
Margulis, L. (1970) Origin of Eukaryotic Cells. Yale University Press, New Haven.
Margulis, L. (1981) Symbioses in Cell Evolution. W. H. Freeman, San Francisco.

Martin, W. and Kowallik, K.V. (1999) Annotated English translation of Mereschkowsky's 1905 paper 'Über Natur und Ursprung der Chromatophoren im Pflanzenreiche'. Eur J Phycol, 34, 287-295.
Martin, W. and Muller, M. (1998) The hydrogen hypothesis for the first eukaryote. Nature, 392, 37-41.
Mears, J.A., Cannone, J.J., Stagg, S.M., Gutell, R.R., Agrawal, R.K. and Harvey, S.C. (2002) Modeling a minimal ribosome based on comparative sequence analysis. J Mol Biol, 321, 215-234.
Mereschkowsky, C. (1905) Über Natur und Ursprung der Chromatophoren im Pflanzenreiche. Biol Centralbl, 25, 593-604.
Minnick, M.F., Sappington, K.N., Smitherman, L.S., Andersson, S.G., Karlberg, O. and Carroll, J.A. (2003) Five-member gene family of Bartonella quintana. Infect Immun, 71, 814-821.
Mirkin, B.G., Fenner, T.I., Galperin, M.Y. and Koonin, E.V. (2003) Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol Biol, 3, 2.
Mootha, V.K., Bunkenborg, J., Olsen, J.V., Hjerrild, M., Wisniewski, J.R., Stahl, E., Bolouri, M.S., Ray, H.N., Sihag, S., Kamal, M., Patterson, N., Lander, E.S. and Mann, M. (2003) Integrated analysis of protein composition, tissue diversity, and gene regulation in mouse mitochondria. Cell, 115, 629-640.
Moreira, D. and Lopez-Garcia, P. (1998) Symbiosis between methanogenic archaea and delta-proteobacteria as the origin of eukaryotes: the syntrophic hypothesis. J Mol Evol, 47, 517-530.
Myers, W.F., Cutler, L.D. and Wisseman, C.L., Jr. (1969) Role of erythrocytes and serum in the nutrition of Rickettsia quintana. J Bacteriol, 97, 663-666.
Ogata, H., Audic, S., Renesto-Audiffren, P., Fournier, P.E., Barbe, V., Samson, D., Roux, V., Cossart, P., Weissenbach, J., Claverie, J.M. and Raoult, D. (2001) Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science, 293, 2093-2098.
Olsen, G.J., Woese, C.R. and Overbeek, R. (1994) The winds of (evolutionary) change: breathing new life into microbiology. J Bacteriol, 176, 1-6.
Parsons, J.D. (1995) Miropeats: graphical DNA sequence comparisons. Comput Appl Biosci, 11, 615-619.
Paulsen, I.T., Seshadri, R., Nelson, K.E., Eisen, J.A., Heidelberg, J.F., Read, T.D., Dodson, R.J., Umayam, L., Brinkac, L.M., Beanan, M.J., Daugherty, S.C., Deboy, R.T., Durkin, A.S., Kolonay, J.F., Madupu, R., Nelson, W.C., Ayodeji, B., Kraul, M., Shetty, J., Malek, J., Van Aken, S.E., Riedmuller, S., Tettelin, H., Gill, S.R., White, O., Salzberg, S.L., Hoover, D.L., Lindler, L.E., Halling, S.M., Boyle, S.M. and Fraser, C.M. (2002) The Brucella suis genome reveals fundamental similarities between animal and plant pathogens and symbionts. Proc Natl Acad Sci U S A, 99, 13148-13153.

Petit, P.X., Susin, S.A., Zamzami, N., Mignotte, B. and Kroemer, G. (1996) Mitochondria and programmed cell death: back to the future. FEBS Lett, 396, 7-13.
Pfeiffer, T., Schuster, S. and Bonhoeffer, S. (2001) Cooperation and competition in the evolution of ATP-producing pathways. Science, 292, 504-507.
Regnery, R.L., Rooney, J.A., Johnson, A.M., Nesby, S.L., Manzewitsch, P., Beaver, K. and Olson, J.G. (1996) Experimentally induced Bartonella henselae infections followed by challenge exposure and antimicrobial therapy in cats. Am J Vet Res, 57, 1714-1719.
Roger, A.J., Svard, S.G., Tovar, J., Clark, C.G., Smith, M.W., Gillin, F.D. and Sogin, M.L. (1998) A mitochondrial-like chaperonin 60 gene in Giardia lamblia: evidence that diplomonads once harbored an endosymbiont related to the progenitor of mitochondria. Proc Natl Acad Sci U S A, 95, 229-234.
Runswick, M.J., Powell, S.J., Nyren, P. and Walker, J.E. (1987) Sequence of the bovine mitochondrial phosphate carrier protein: structural relationship to ADP/ATP translocase and the brown fat mitochondria uncoupling protein. EMBO J, 6, 1367-1373.
Schon, E. (2001) Mitochondria. In Wilson, L. and Matsudaira, P. (eds.), Methods in Cell Biology. Academic Press, New York, Vol. 65, pp. 463-482.
Shoffner, J.M., Lott, M.T., Lezza, A.M., Seibel, P., Ballinger, S.W. and Wallace, D.C. (1990) Myoclonic epilepsy and ragged-red fiber disease (MERRF) is associated with a mitochondrial DNA tRNA(Lys) mutation. Cell, 61, 931-937.
Sicheritz-Ponten, T., Kurland, C.G. and Andersson, S.G. (1998) A phylogenetic analysis of the cytochrome b and cytochrome c oxidase I genes supports an origin of mitochondria from within the Rickettsiaceae. Biochim Biophys Acta, 1365, 545-551.
Smith, T.F. and Waterman, M.S. (1981) Identification of common molecular subsequences. J Mol Biol, 147, 195-197.
Snel, B., Bork, P. and Huynen, M.A. (2002) Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res, 12, 17-25.
Stechmann, A. and Cavalier-Smith, T. (2002) Rooting the eukaryote tree by using a derived gene fusion. Science, 297, 89-91.
Susin, S.A., Zamzami, N. and Kroemer, G. (1998) Mitochondria as regulators of apoptosis: doubt no more. Biochim Biophys Acta, 1366, 151-165.
Tammi, M.T., Arner, E., Kindlund, E. and Andersson, B. (2004) ReDiT: Repeat Discrepancy Tagger--a shotgun assembly finishing aid. Bioinformatics, 20, 803-804.
Tatusov, R.L., Koonin, E.V. and Lipman, D.J. (1997) A genomic perspective on protein families. Science, 278, 631-637.
Tatusov, R.L., Natale, D.A., Garkavtsev, I.V., Tatusova, T.A., Shankavaram, U.T., Rao, B.S., Kiryutin, B., Galperin, M.Y., Fedorova, N.D. and Koonin, E.V. (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res, 29, 22-28.
Taylor, F.J.R. (1974) II. Implications and extensions of the serial endosymbiosis theory of the origin of eukaryotes. Taxon, 23, 229-258.
Wallin, I.E. (1927) Symbionticism and the Origin of Species. Williams and Wilkins, Baltimore.
Vellai, T., Takacs, K. and Vida, G. (1998) A new aspect to the origin and evolution of eukaryotes. J Mol Evol, 46, 499-507.
Whelan, S., Lio, P. and Goldman, N. (2001) Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet, 17, 262-272.
Viale, A.M., Arakaki, A.K., Soncini, F.C. and Ferreyra, R.G. (1994) Evolutionary relationships among eubacterial groups as inferred from GroEL (chaperonin) sequence comparisons. Int J Syst Bacteriol, 44, 527-533.
Winkler, H.H. (1976) Rickettsial permeability. An ADP-ATP transport system. J Biol Chem, 251, 389-396.
Woese, C.R. and Fox, G.E. (1977) Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci U S A, 74, 5088-5090.
Woese, C.R., Kandler, O. and Wheelis, M.L. (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A, 87, 4576-4579.
Woischnik, M. and Moraes, C.T. (2002) Pattern of organization of human mitochondrial pseudogenes in the nuclear genome. Genome Res, 12, 885-893.
Wolf, Y.I., Aravind, L. and Koonin, E.V. (1999) Rickettsiae and Chlamydiae: evidence of horizontal gene transfer and gene exchange. Trends Genet, 15, 173-175.
Wu, M., Sun, L.V., Vamathevan, J., Riegler, M., Deboy, R., Brownlie, J.C., McGraw, E.A., Martin, W., Esser, C., Ahmadinejad, N., Wiegand, C., Madupu, R., Beanan, M.J., Brinkac, L.M., Daugherty, S.C., Durkin, A.S., Kolonay, J.F., Nelson, W.C., Mohamoud, Y., Lee, P., Berry, K., Young, M.B., Utterback, T., Weidman, J., Nierman, W.C., Paulsen, I.T., Nelson, K.E., Tettelin, H., O'Neill, S.L. and Eisen, J.A. (2004) Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements. PLoS Biol, 2, E69.
Yang, D., Oyaizu, Y., Oyaizu, H., Olsen, G.J. and Woese, C.R. (1985) Mitochondrial origins. Proc Natl Acad Sci U S A, 82, 4443-4447.

Acta Universitatis Upsaliensis
Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology
Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to October, 1993, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science”.)

Distribution: Uppsala University Library Box 510, SE-751 20 Uppsala, Sweden www.uu.se, [email protected]

ISSN 1104-232X ISBN 91-554-5933-1