
Cover illustration by Stina Weststrand.

Abstract

The root represents the oldest point in a phylogeny, and determining it gives a time arrow. The first molecular treatment of the social amoebae (Dictyostelia) was carried out with small subunit ribosomal DNA (SSU rDNA) data in 2006, and from this four major groups were defined. However, the relationships among them, i.e. the root of the tree, could not be confidently determined. In this study, a ‘new’ protein data set, eukaryotic release factor 3 (eRF3), was developed for deep dictyostelid phylogeny. This involved designing degenerate PCR primers, developing PCR strategies and assembling currently available dictyostelid data sets. Phylogenetic reconstructions show substantial support for an alternative topology placing the root between the major groups 1 + 2 and 3 + 4. This outcome indicates that the current rooting (SSU rDNA data), which places group 1 as the most basal divergence, should be viewed with caution. In addition, the analyses reveal the root of dictyostelid group 4: D. purpureum can now, with strong support, be seen as the deepest divergence in this major group. This result is, among other things, important for determining the age of Dictyostelia.

Keywords: Dictyostelia, phylogeny, root, eRF3, RPB1, SSU rDNA, α-tubulin

Popular science summary

Understanding how organisms are related to each other is an important part of biological research. By constructing evolutionary family trees, phylogenies, we can gain a good understanding of, for example, how different traits may have arisen and how they have developed into what we see today.

One group of organisms in which phylogenetic studies play an important role is the social amoebae, Dictyostelia. Social amoebae are eukaryotic microbes with a fascinating life strategy. They live in the soil, and under favourable conditions they are unicellular, feed on bacteria and reproduce by cell division. When food runs out, the dictyostelids enter a new phase: multicellularity. Using self-secreted chemical signals, the amoebae clump together and form fruiting bodies that contain spores. Through these spores the dictyostelids can then disperse to new areas where conditions are hopefully better.

Because they sit on the border between unicellularity and multicellularity, the social amoebae are an important study object for understanding how complexity has arisen and how mechanisms such as cell-cell communication and cell differentiation work. However, all evolutionary questions need to be asked with phylogenetic information as their foundation, and our knowledge of the evolutionary relationships among the dictyostelids is far from complete. My work aims to ‘root’ the family tree of the social amoebae. This means that I have investigated what is up and what is down in the tree, that is, which dictyostelid species can be regarded as the ‘evolutionarily oldest’. Rooting evolutionary trees is one of the biggest problems in phylogenetic studies. Above all, it requires access to a large amount of molecular data. In my work, I have combined information from four different genes. One of these genes has never before been used for phylogenetic studies of the social amoebae, and for most of the project I have worked on generating data for this gene.

Collecting data from a ‘new’ gene involves several steps. I began by searching databases to find a gene suitable for the purpose. I then designed ‘primers’ that would ensure that the desired gene was correctly ‘cut out’ of the genome. The gene fragment was then copied in large numbers using PCR (Polymerase Chain Reaction) and cloning in bacteria, and sent off for sequencing. In the case of social amoebae, it is difficult to obtain data from the intended gene. The primers often cannot be designed specifically for dictyostelids, and since the amoebae live in soil, bacterial DNA commonly ends up in the samples. The lab work therefore involved a great deal of trial and error, but it nevertheless resulted in data from ten species of social amoebae.

The data I collected, together with data from the three genes that were already available, suggest that the root of the social amoebae's family tree may lie somewhere other than where it has so far been assumed to be. My work also shows that the dictyostelids as a whole are somewhat younger than previously thought. All of this contributes to a clearer picture of how the social amoebae have evolved and is a step towards a better understanding of the origin of, among other things, multicellularity.

Nomenclature

5PTase     inositol 5-phosphatase
aa         amino acid
BI         Bayesian inference
biPP       posterior probability
EF-2       elongation factor 2
eIF2       eukaryotic initiation factor 2
eIF5B      eukaryotic initiation factor 5B
eRF3       eukaryotic release factor 3
Hsp90      90 kDa heat shock protein
INTS1      integrator complex subunit 1
ITS        internal transcribed spacer
LBA        long branch attraction
ML         maximum likelihood
mlBP       bootstrap support value
NT         nucleotide
RPB1       RNA polymerase II largest subunit
SSU rDNA   small subunit ribosomal DNA

Contents

1 Introduction
1.1 Background
1.2 Dictyostelids
1.2.1 The dictyostelid life cycle
1.2.2 Dictyostelid evolution
1.2.3 Dictyostelid systematics
1.3 Molecular phylogeny and its pitfalls
1.3.1 Rooting
1.3.2 Long branch attraction, mutational saturation and heterotachy
1.3.3 The importance of increased sampling
1.3.4 Finding genes appropriate for phylogenetic studies
1.4 PCR amplification with degenerate primers
1.5 Questions addressed
1.6 Project aims

2 Materials and Methods
2.1 Selecting target genes
2.1.1 Phylogenetic evaluation of markers
2.1.2 Degenerate primer design
2.2 Taxonomic sampling
2.2.1 Ingroup taxa
2.2.2 Outgroup taxa
2.3 Wet lab procedures
2.3.1 Cell culture and DNA extraction
2.3.2 PCR amplification with degenerate primers
2.3.3 Cloning, PCR purification and sequencing
2.4 Multiple sequence alignment
2.5 Phylogenetic analyses
2.5.1 Bayesian inference
2.5.2 Maximum likelihood

3 Results
3.1 Evaluation of possible phylogenetic markers
3.2 PCR amplification with degenerate primers
3.3 Constructing data sets
3.4 Phylogenetic analyses
3.4.1 The root of the dictyostelid tree
3.4.2 The root of group 4
3.4.3 The root of group 2

4 Discussion
4.1 The root of the dictyostelid tree
4.1.1 Benefits of eRF3 as a marker for deep dictyostelid phylogeny
4.1.2 Problems with eRF3 and the other phylogenetic markers considered
4.1.3 Inositol 5-phosphatase as a phylogenetic marker
4.1.4 The ‘eRF3 root’ and evolutionary trends in Dictyostelia
4.2 The root of group 4
4.3 Conclusions

Acknowledgements
References
Appendices
A Primer maps
B eIF5B primers and unused eRF3 primers
C List of sequences used in the study
D Isolates not included in the analyses
E Supplemental trees

1 Introduction

1.1 Background

Understanding organismal relationships (phylogenetics) is the cornerstone of evolutionary study. Phylogenetic trees guide how evolutionary questions are asked and answered, and they are also important for how should be treated. However, many of the phylogenies currently available for organism groups across the tree of life are poorly resolved. They are mainly based on limited data sets, and additional phylogenetic work is needed to obtain more accurate representations of the true organismal histories.

One group of organisms for which the evolutionary picture seems complex is the social amoebae, Dictyostelia. They are eukaryotic microbes placed within the amoebozoan lineage, the closest sister group to the clade containing fungi and animals (Baldauf et al. 2000). Dictyostelids were first described in 1869 (O. Brefeld), and ever since they have been studied with a focus on their unique life history, a strategy lying on the border between uni- and multicellularity (Raper 1984). The evolution of multicellularity is a central question in biology for which the social amoebae serve as an important model system (Eichinger et al. 2005). In addition, they are uniquely useful for studying the evolution of social behaviour and basic cell biology such as cell-cell communication, cell movement and cell differentiation (Raper 1984; Kaushik and Nanjundiah 2003).

According to traditional taxonomy based on morphological data, Dictyostelia is subdivided into three separate genera: Dictyostelium, Polysphondylium and Acytostelium (Raper 1984). The first molecular phylogeny of the group was published by Schaap et al. in 2006. The work is based on parallel small subunit ribosomal DNA and alpha tubulin data sets, and it clearly indicates that the old classification of the social amoebae is deeply flawed. Instead of three distinct subdivisions, the dictyostelids comprise four major groups, none of which corresponds to the three traditionally proposed genera. Furthermore, Schaap et al. (2006) showed that Dictyostelia is an extremely deep taxon that is likely to harbour a large amount of hidden diversity.

In recent years, a substantial amount of additional dictyostelid data has been added to the data set of Schaap et al. (2006), and today nearly all branches within the four higher-level taxa are fully resolved (Romeralo et al. 2010a). Furthermore, three additional well-supported major clades have been distinguished alongside the original four, giving a total of seven major clades (Romeralo et al. 2010b). However, some of the most important questions in the evolution of Dictyostelia remain unanswered: the position of the root of the overall tree, as well as the roots of two of the major groups (groups 2 and 4). The root represents the oldest point in a phylogeny, and it makes it possible to determine the direction of evolution, that is, which taxon branched off before which (Baldauf 2003; Graham et al. 2002). Such knowledge is crucial for a correct interpretation of phylogenetic relationships and for understanding the order in which various characteristics evolved in Dictyostelia.


1.2 Dictyostelids

1.2.1 The dictyostelid life cycle

The social amoebae (Dictyostelia) constitute an important group of microbial eukaryotes. They have a global distribution and are considered to have forest soils as their primary habitat (Swanson et al. 1999). Even though much is known about specific dictyostelid species, especially the model species Dictyostelium discoideum, understanding of the taxon as a whole is far from complete. This applies both to the systematics of the group and to biogeographical and ecological aspects of the dictyostelid life history (Swanson et al. 1999). However, our knowledge of social amoebae and their evolution is changing. For example, in the last five years over 100 new dictyostelid species have been described from new sampling sites worldwide (Romeralo et al. 2010a).

Dictyostelia have fascinated developmental biologists ever since their first description in 1869 (Raper 1984). Much of the fascination lies in their unique life strategy, in which they alternate between unicellular and multicellular stages. In contrast to nearly all other multicellular organisms, however, dictyostelids become multicellular by aggregation (Raper 1984). The life cycle of social amoebae can be divided into several phases, and the following description concerns the model species D. discoideum, whose biology is in general best understood (Fig. 1). The cycle starts with a vegetative feeding phase, the growth phase, in which single-celled amoebae (myxamoebae) feed upon bacteria in the upper layers of the soil. As long as their food supply is plentiful, the amoebae grow and reproduce asexually by binary fission. When food becomes scarce, however, the social cycle is triggered, beginning with an aggregation phase. In D. discoideum, aggregation occurs when hundreds of thousands of free-living amoebae stream together to form a multicellular aggregate. Aggregation takes place in response to self-generated chemical signals called acrasins. The identity of the acrasin is known for only some species; in D. discoideum it is cyclic AMP (cAMP). In fact, cAMP signaling, which is also important in vertebrate cells, was first discovered in Dictyostelia (Bonner and Savage 1947; Shaffer 1953). (Raper 1984; Schaap 2007)

The multicellular aggregate forms a mound (the mound phase), which later develops into a tipped mound in which the amoebae have differentiated into prespore and prestalk cells. The tip continues to emit cAMP and can be seen as an ‘organising center’ that controls cell movement within the mound and thereby shapes the aggregate. It is also cAMP that induces cell differentiation. In the next step of the life cycle, the tipped mound transforms into a finger-shaped aggregate, and the prestalk cells, which are the most sensitive to cAMP, can now be seen to have moved towards the apical end of the cell mass. Commonly, the ‘finger’ eventually falls over to become a slug. The slug is motile and moves towards the soil surface in response to environmental stimuli such as chemical gradients, light and temperature (Bonner and Lamont 2005). (Fey et al. 2007; Raper 1984; Schaap 2007)

During the last phase of the dictyostelid life cycle, culmination, the finger or slug contracts into a ‘Mexican hat’ shape and starts to form a fruiting body. An inert cellulose tube is formed, through which the anterior prestalk cells move downwards from the apical end of the aggregate to become dead, vacuolated stalk cells. The prespore cells in the ‘tail’ move up to the top of the rising fruiting body, forming a globular mass of viable spores. When conditions become favourable, the spores germinate to form new free-living single-celled amoebae, and the life cycle starts all over again. For D. discoideum the whole cycle described, from depletion of the food resource to a mature fruiting body, takes about 24 hours on a Petri dish in the lab. (Fey et al. 2007; Raper 1984; Schaap 2007)

[Figure 1 diagram. Vegetative cycle (unicellularity, growth): bacteria, myxamoebae, binary fission. Social cycle (multicellularity): starvation, chemotaxis, streaming, cell adhesion (aggregation); tipped mound with cell differentiation into prestalk and prespore cells (mound); finger, migrating slug; Mexican hat, cellulose tube, stalk, spores, mature fruiting body (culmination).]

Figure 1: Life cycle of Dictyostelium discoideum. Both the vegetative cycle and the social cycle are shown, together with the different phases that the total cycle can be divided into: growth, aggregation, mound and culmination. Illustration by Stina Weststrand with information taken from Schaap (2007) and Fey et al. (2007).


Most, if not all, dictyostelid species can also respond to harsh conditions by encystation of single amoebae (microcysts) or by entering a sexual cycle (Kaushik and Nanjundiah 2003). Importantly, the stages of the social life cycle can vary remarkably between species of Dictyostelia. There are differences in aggregation pattern, slug formation and movement, fruiting body morphology, size¹, spore appearance, etc. Furthermore, fruiting bodies differ in the number of distinct cell types they possess. Most common is a fruiting body with two distinct cell types, stalk and spore cells; however, species of one of the three dictyostelid genera, Acytostelium, only have spore cells, and D. discoideum shows as many as five different cell types (Bonner 2003; Schaap 2007). Nevertheless, morphological characters are often difficult to distinguish, and overlapping features between species are common (Raper 1984).

1.2.2 Dictyostelid evolution

Owing to their exceptional life strategy, the dictyostelids constitute a unique model system for studying important questions in biology. Firstly, they provide an opportunity to better understand multicellular development in terms of cell-cell communication, cell differentiation and cell movement (Raper 1984; Schaap et al. 2006). Secondly, the life strategy of Dictyostelia involves social evolution, in which cell differentiation can be seen as a form of behavioural and division of labour (Strassmann et al. 2000).

A commonly encountered question when studying Dictyostelia is how their life history first evolved. Clustering together might be selectively advantageous if it makes dispersal more effective. A stalk lifts the spores above the soil, so that predation and decomposing agents can be avoided to some degree and the spores may be dispersed slightly further (Kaushik and Nanjundiah 2003; Strassmann et al. 2000). If the fruiting body grows even larger and the stalk can bear a bigger spore mass, these advantages should increase. However, this also requires higher complexity, and the larger dictyostelid species possess more than one cell type, of which the sterile stalk cells provide the additional support needed for lifting large spore masses (Bonner 2003; Strassmann et al. 2000). In addition, many of the larger species have a migration stage in their life cycle. The motile slug provides further opportunity for spore dispersal; a slug moves approximately three times faster than a solitary amoeba (Bonner 2003). Nonetheless, the life history of the social amoebae creates opportunities for cheating, and one interesting area of study is the altruistic behaviour of the dictyostelid stalk cells (e.g. Strassmann et al. 2000).

1.2.3 Dictyostelid systematics

Dictyostelia is placed within the supergroup Amoebozoa, with Opisthokonta, including animals and fungi, as the closest sister clade (Baldauf et al. 2000). More precisely, the social amoebae are classified as Mycetozoa, a group comprising species with spore-bearing fruiting bodies. The group is traditionally considered to contain three major divisions: Dictyostelia, Myxogastria and Protostelia (Baldauf and Doolittle 1997; Fiore-Donno et al. 2010; Shadwick et al. 2009). Myxogastria (plasmodial slime moulds) is the closest sister group to the social amoebae, and the two taxa are together called Macromycetozoa (Baldauf 2008; Fiore-Donno et al. 2010).

¹ A dictyostelid fruiting body is generally 1 mm or less in size, but there are species with fruiting bodies as tall as 10 mm or even several centimetres (Raper 1984; Schaap 2007).

Early classifications of the dictyostelids were mainly based on morphological characters, particularly fruiting body architecture and size. The traditional view divides the group into three genera: Dictyostelium, Polysphondylium and Acytostelium (Raper 1984). In general, species of Dictyostelium mainly show unbranched fruiting bodies with a terminal spore mass, species of Polysphondylium have fruiting structures with regular whorls of side branches, and the main character of Acytostelium is the acellular stalk (Raper 1984). In addition, the dictyostelid groups vary in, for instance, the size and shape of their spores, pigmentation and the chemoattractant used for aggregation (Raper 1984; Schaap 2007). However, dictyostelid classification is difficult, mainly because there are few distinguishable morphological characters (Raper 1984). In fact, recent molecular studies indicate that some of the identified morphospecies are species complexes, i.e. groups of species that are morphologically similar but evolutionarily distantly related (Romeralo et al. 2010a).

Swanson et al. (2002) were the first to suggest, based on a cladistic analysis of traditional taxonomic characters, that the old morphology-based classification into the three genera Dictyostelium, Polysphondylium and Acytostelium is not reliable. The first molecular phylogeny of Dictyostelia was published by Schaap et al. in 2006. This work includes close to a hundred dictyostelid isolates and, like Swanson et al. (2002), clearly indicates that the taxonomy of the social amoebae is highly flawed. In addition, the phylogeny shows that the dictyostelids possess an enormous molecular depth, which indicates that the taxon is complex and might contain a great amount of hidden diversity. Thus, although the traditional taxonomy of the dictyostelids is still in use, a taxonomic revision of the group is urgently needed (Romeralo et al. 2010a).

The molecular phylogeny from 2006 (Schaap et al.) is primarily based on small subunit ribosomal DNA (SSU rDNA) sequences, but the main conclusions were also confirmed by alpha tubulin data. These data show that the dictyostelids can be divided into four high-level taxa, informally referred to as groups 1–4, where group 1 is designated as possibly the evolutionarily oldest group and groups 3 + 4 as the youngest. None of the four major groups corresponds to the three traditionally recognised genera. Dictyostelium species are found in all four groups, and Polysphondylium species in two widely separated groups (2 and 4). The Acytostelium species form a subgroup within group 2. Group 4 seems to be the most speciose group of social amoebae, and group 2 can be seen as the most morphologically heterogeneous, with species from all three traditional genera. These results imply that fruiting body morphology in dictyostelids is very plastic and therefore not reliable as a phylogenetic marker. They also show that many morphological characters, such as fruiting body branching pattern, must have arisen several times independently in the evolutionary history of Dictyostelia.

A substantial amount of work has been conducted on the systematics of Dictyostelia since the phylogeny by Schaap et al. (2006). This work has mainly involved molecular and morphological description of new dictyostelid species and the addition of these taxa to the SSU rDNA tree. The most recent SSU rDNA phylogeny (Romeralo et al. 2010b) shows that besides the four major clades distinguished by Schaap et al. (2006), there are three additional well-supported groups lying in between them (the ‘violaceum complex’, the ‘polycephalum complex’ and the ‘polycarpum complex’). However, these complexes are still only represented by a few species each. Furthermore, analyses of the internal transcribed spacers (ITS) region of rDNA combined with SSU rDNA show full resolution for nearly all branches within the four major dictyostelid groups (Romeralo et al. 2010a). This has made it possible to identify various ‘subgroups’ informally referred to as 1A, 1B, 2A, etc. (Romeralo et al. 2010a).

1.3 Molecular phylogeny and its pitfalls

Phylogenetics is essentially about analysing characters to estimate evolutionary relationships. Nowadays, it is the rule rather than the exception to use large amounts of molecular data, and the most commonly applied methods for analysing such data are maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI). Regardless of the method applied, there are several pitfalls when working with phylogenies, including difficulties in rooting trees, in identifying appropriate data and with uneven data sampling, all of which can result in an erroneous representation of organismal relationships.

1.3.1 Rooting

One of the main questions when conducting phylogenetic analyses is where the root should be located on the tree. Since the root represents the deepest split in a phylogeny, it is essential for determining the directionality of evolution (Baldauf 2003; Graham et al. 2002). Thus, the root will answer the question: “in which order did the subsequent branching events in the tree occur?”, and an incorrect root position will therefore mislead the whole interpretation of a phylogeny.

The usual approach for rooting phylogenies is to use an external point of reference, an outgroup (Gribaldo and Philippe 2002). However, selecting an appropriate outgroup is often problematic. Ideally, outgroup taxa should be as closely related to the ingroup as possible, but in many cases the higher-level relationships between lineages are unknown or at least uncertain (Graham et al. 2002). For deep taxa such as Dictyostelia, another problem is that even the closest outgroups may be distantly related because many intervening lineages have gone extinct (Graur and Li 2000). This often results in spuriously rooted trees in which the branch(es) leading to the outgroup taxa are much longer than the ingroup branches (Graham et al. 2002; Philippe et al. 2004).
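To make the dependence on the outgroup concrete, the following is a minimal Python sketch, not the method used in this study: an unrooted tree is stored as an edge list, the root is placed on the branch leading to a single outgroup leaf, and the first ingroup split is read off. The taxon labels G1–G4 and OUT and the internal node names are hypothetical placeholders. Both example trees share the same unrooted ingroup topology ((G1, G2), (G3, G4)); only the point where the outgroup attaches differs.

```python
def build_adjacency(edges):
    """Unrooted tree as an adjacency list; node names are arbitrary strings."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def hang_from(adj, node, parent):
    """Orient the tree away from `parent`: leaves become strings, clades tuples."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node  # a leaf (terminal taxon)
    return tuple(hang_from(adj, child, node) for child in children)


def root_on_outgroup(adj, outgroup):
    """Place the root on the branch leading to a single outgroup leaf."""
    (attach,) = adj[outgroup]  # a leaf has exactly one neighbour
    return (outgroup, hang_from(adj, attach, outgroup))


def leaves(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def deepest_ingroup_split(rooted):
    """Leaf sets of the ingroup clades that diverge first under this rooting."""
    _, ingroup = rooted
    return {leaves(child) for child in ingroup}


# Hypothetical trees: same unrooted ingroup, different outgroup attachment.
between = build_adjacency([("OUT", "x"), ("x", "p"), ("x", "q"),
                           ("p", "G1"), ("p", "G2"), ("q", "G3"), ("q", "G4")])
on_g1 = build_adjacency([("OUT", "x"), ("x", "G1"), ("x", "p"),
                         ("p", "G2"), ("p", "q"), ("q", "G3"), ("q", "G4")])
print(deepest_ingroup_split(root_on_outgroup(between, "OUT")))
print(deepest_ingroup_split(root_on_outgroup(on_g1, "OUT")))
```

The first rooting places the root between G1 + G2 and G3 + G4, while the second makes G1 the deepest branch. Because the ingroup topology is identical in both cases, any error in where a long outgroup branch attaches translates directly into a different root.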

1.3.2 Long branch attraction, mutational saturation and heterotachy

Long branch attraction (LBA) occurs when the longest branches in a tree cluster together irrespective of their ‘true’ underlying relationship (Felsenstein 1978). This problem is encountered particularly in deep phylogenies, where highly divergent sequences are commonly present and many intervening taxa are missing (Baldauf 2003). The general outcome of LBA is that the longest ingroup branches are attracted to the long outgroup branch and thereby emerge as the deepest offshoots in the tree (Gribaldo and Philippe 2002). Bergsten (2005) emphasised that LBA can also be caused by uneven taxon sampling, even without any differences in evolutionary rate across the tree. To make things even more confusing, in some cases the longest ingroup branch is correctly placed as the most basal one (Bergsten 2005).

Several methods have been proposed to reduce the impact of LBA on phylogenetic inference. Zwickl and Hillis (2002) emphasised the importance of increased taxonomic sampling, with the caveat that additional taxa should be selected so that they break up long branches. Further, Bergsten (2005) stressed that focus should be put on slowly evolving positions; for example, fast-evolving third codon positions can be excluded when analysing protein-coding genes.
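Excluding third codon positions amounts to a simple column filter on an in-frame nucleotide alignment. The following is a minimal Python sketch, assuming the alignment is a dictionary of equal-length sequences that start at codon position 1; the taxon names are hypothetical.

```python
def drop_third_codon_positions(alignment):
    """Return a copy of an in-frame codon alignment with every third column removed."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    (length,) = lengths
    if length % 3 != 0:
        raise ValueError("alignment length is not a multiple of 3; is it in frame?")
    # Keep codon positions 1 and 2 (0-based columns i with i % 3 in {0, 1}).
    return {name: "".join(seq[i] for i in range(length) if i % 3 != 2)
            for name, seq in alignment.items()}


# Hypothetical taxa that differ only at synonymous third positions.
codons = {"taxon_A": "ATGGCTAAA", "taxon_B": "ATGGCCAAG"}
print(drop_third_codon_positions(codons))  # {'taxon_A': 'ATGCAA', 'taxon_B': 'ATGCAA'}
```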

Another issue in phylogenetic reconstruction, especially for deep phylogenies, is mutational saturation (Gribaldo and Philippe 2002). An alignment position (nucleotide (NT) or amino acid (aa)) can only take a limited number of states, so over long periods mutations start to repeat themselves at the same sites and evolutionary history is lost (Baldauf 2003). Much progress has been made in developing methods to overcome mutational saturation (e.g. Philippe and Laurent 1998). However, the issue is still far from solved.
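Saturation can be illustrated with the Jukes–Cantor (JC69) nucleotide model, a standard textbook model used here only as an example; it is not claimed to be one of the models applied in this study. Under JC69, the expected proportion of differing sites p after d substitutions per site is p = 3/4 (1 − exp(−4d/3)), which levels off at 0.75, so very different amounts of change produce nearly the same observed divergence. The sketch below computes this curve and its inverse, the corrected distance d = −3/4 ln(1 − 4p/3).

```python
import math


def jc69_expected_p(d):
    """Expected proportion of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p):
    """Invert the JC69 relationship: estimate d from an observed p-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); at 0.75 the sequences look random")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"d = {d:>4}: expected p = {jc69_expected_p(d):.3f}")
```

Between d = 2 and d = 5 the expected p changes by only about 0.05, which is why such corrections become unreliable for very deep divergences: a small error in the observed p translates into a large error in the estimated distance.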

A further serious problem, again especially for deep phylogenies, is heterotachy: different substitution rates can be observed for the same sequence site across an evolutionary tree (Gribaldo and Philippe 2002). The effect of heterotachy on phylogenetic methods has only recently been widely appreciated (e.g. Som and Fuellen 2009).

1.3.3 The importance of increased sampling

Extensive sampling is important for obtaining a reliable evolutionary tree. This concerns taxonomic sampling, i.e. adding taxa to a single gene data set, but also the need to analyse several different types of characters, for instance multiple genes (Bergsten 2005; Gribaldo and Philippe 2002; Philippe et al. 2004). Adding more taxa, both to the ingroup and to the outgroup, will often help to break up long branches (Bergsten 2005; Gribaldo and Philippe 2002).

The idea of combining several genes to achieve higher resolution in phylogenetic analyses has been discussed by e.g. Huelsenbeck et al. (1996), and the approach is today widely accepted (e.g. Baldauf 2003; Bergsten 2005; Gribaldo and Philippe 2002; Philippe et al. 2004). In particular, concatenated multigene analyses are important for resolving deep branches, for which single genes are strongly affected by phylogenetic noise; that is, the phylogenetic signal in an individual gene is often too weak (Baldauf et al. 2000; Baldauf 2003). Furthermore, Page (2000) stressed that single gene inference is sensitive to hidden paralogy: if sampling is poor, there is a risk of comparing paralogous rather than orthologous genes.

Multigene analyses should be performed with compatible and complementary data. The general approach (e.g. Bergsten 2005) is to combine data sets with different properties, for instance unlinked genes, or both molecular and morphological data. However, combining data is only appropriate if all data share the same underlying history. To make sure that this is the case, it is important to run controls (Bergsten 2005; Sullivan 1996). Most commonly these are single gene trees, and the goal is to identify possible conflicting signals in the data (Baldauf et al. 2000; Queiroz et al. 1995).
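One simple way to flag conflicting signal between single gene trees is to compare their clades: two clades can occur in the same tree only if they are nested or disjoint. The Python sketch below assumes both trees are rooted on the same outgroup and written as nested tuples; the gene trees and labels are hypothetical, and a real comparison would typically only count conflicts between well-supported clades (for example, clades with high bootstrap support), which this sketch ignores.

```python
from itertools import product


def leaves(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Leaf sets of every internal node in a rooted tree written as nested tuples."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def compatible(a, b):
    """Two clades can coexist in one tree if they are nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)


def conflicting_clades(tree1, tree2):
    return {(a, b) for a, b in product(clades(tree1), clades(tree2))
            if not compatible(a, b)}


# Hypothetical single gene trees rooted on the same outgroup.
gene_1 = ("OUT", (("G1", "G2"), ("G3", "G4")))
gene_2 = ("OUT", ("G1", ("G2", ("G3", "G4"))))
for a, b in conflicting_clades(gene_1, gene_2):
    print(sorted(a), "conflicts with", sorted(b))  # ['G1', 'G2'] conflicts with ['G2', 'G3', 'G4']
```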

A problem when combining data sets is that limited sampling can leave missing data entries, and such information loss can result in poorly resolved phylogenies (Philippe et al. 2004). A commonly used remedy (e.g. Philippe et al. 2004) is to construct chimeric sequences, i.e. to let closely related species or isolates together represent one taxon, each contributing different genes, in order to maximise taxonomic representation. Nevertheless, balancing the use of many genes against the use of many species is a constant problem in phylogenetic reconstruction (Philippe et al. 2004).
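Supermatrix construction and the chimeric-sequence strategy can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline used here: per-gene alignments are dictionaries of equal-length strings, taxa lacking a gene are padded with `?`, and the optional `stand_ins` mapping (a hypothetical parameter) lets a closely related isolate supply a missing gene. All gene and taxon names and sequences are placeholders.

```python
def concatenate(gene_alignments, taxa, stand_ins=None, missing="?"):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: {gene: {taxon: aligned_sequence}}
    stand_ins: optional {taxon: close_relative} used to build chimeric entries
    """
    stand_ins = stand_ins or {}
    matrix = {taxon: [] for taxon in taxa}
    partitions = []  # (gene, first_column, last_column), 1-based and inclusive
    start = 0
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
        (length,) = lengths
        for taxon in taxa:
            # Use the taxon itself, else its stand-in, else pad with missing data.
            source = taxon if taxon in aln else stand_ins.get(taxon)
            matrix[taxon].append(aln[source] if source in aln else missing * length)
        partitions.append((gene, start + 1, start + length))
        start += length
    return {taxon: "".join(parts) for taxon, parts in matrix.items()}, partitions


genes = {
    "gene_1": {"taxon_A": "ACGT", "taxon_B": "ACGA", "taxon_C2": "ACGG"},
    "gene_2": {"taxon_A": "TTG", "taxon_C": "TCG"},
}
supermatrix, parts = concatenate(genes, ["taxon_A", "taxon_B", "taxon_C"],
                                 stand_ins={"taxon_C": "taxon_C2"})
for taxon, seq in supermatrix.items():
    print(taxon, seq, f"{seq.count('?') / len(seq):.0%} missing")
print(parts)
```

Here taxon_C becomes a chimera (gene_1 from its close relative taxon_C2), while taxon_B keeps missing data for gene_2. The partition boundaries are what a partitioned analysis would need in order to assign a separate model to each gene.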

1.3.4 Finding genes appropriate for phylogenetic studies

The dictyostelid root appears to be very ancient (Schaap et al. 2006), indicating the need for phylogenetic markers (genes) suitable for resolving deep evolutionary relationships. A useful gene should meet the following criteria (Baldauf and Palmer 1993; Philippe et al. 2004; Tarasov et al. 2008): 1) slow rate of evolution, 2) large size (there is little point in working with a protein ≤ 400 aa), 3) single copy (no paralogs), 4) universal in the ingroup, and 5) sequences available from close outgroup taxa. Finding an appropriate outgroup is often easier if earlier work exists on the gene in question (Sandra Baldauf, personal communication).
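The five criteria can be expressed as a simple screening function. In the Python sketch below, the candidate genes, their values and the `rate_class` field are hypothetical placeholders; in practice the rate of evolution would be judged from alignments or preliminary trees rather than from a stored label.

```python
from dataclasses import dataclass


@dataclass
class CandidateGene:
    name: str
    length_aa: int
    copies_per_genome: int          # paralogs present if > 1
    found_in_all_ingroup: bool
    close_outgroup_available: bool
    rate_class: str                 # hypothetical label: "slow", "medium" or "fast"


def failed_criteria(gene, min_length_aa=400):
    """List which of the five screening criteria a candidate gene fails."""
    failures = []
    if gene.rate_class != "slow":
        failures.append("1) not slowly evolving")
    if gene.length_aa <= min_length_aa:
        failures.append(f"2) too short (<= {min_length_aa} aa)")
    if gene.copies_per_genome != 1:
        failures.append("3) not single copy")
    if not gene.found_in_all_ingroup:
        failures.append("4) not universal in ingroup")
    if not gene.close_outgroup_available:
        failures.append("5) no close outgroup sequences")
    return failures


# Hypothetical candidates, for illustration only.
candidates = [
    CandidateGene("gene_X", 650, 1, True, True, "slow"),
    CandidateGene("gene_Y", 320, 1, True, True, "slow"),
    CandidateGene("gene_Z", 900, 3, True, False, "medium"),
]
for gene in candidates:
    problems = failed_criteria(gene)
    print(gene.name, "passes" if not problems else problems)
```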

1.4 PCR amplification with degenerate primers

Sequencing genes for deep phylogeny in Dictyostelia requires the design of degenerate PCR primers. The complete genome has been sequenced for only three species of social amoebae (D. discoideum, D. purpureum and ), and because of the tremendous depth of the taxon, comparing just three sequences does not provide enough information to find universal primer sites. Instead, primer design has to make use of protein sequences from a broad range of distantly related taxa (Hubbard Center for Genome Studies 2003). From an alignment at the aa level, it is possible to identify regions that are conserved among all the included sequences. These regions can then be used to construct degenerate primers.
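Finding conserved regions in a protein alignment amounts to scanning for runs of columns in which every sequence carries the same residue. The Python sketch below uses strict identity and treats `-` as a gap; this is a simplifying assumption, since real primer design also tolerates conservative substitutions and weighs codon degeneracy. The sequences are hypothetical toy examples.

```python
def conserved_windows(alignment, k=6):
    """Start columns (0-based) of every run of k consecutive identical, gap-free columns."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    identical = [len(set(column)) == 1 and column[0] != "-"
                 for column in zip(*alignment)]
    return [i for i in range(len(identical) - k + 1) if all(identical[i:i + k])]


# Hypothetical aligned protein fragments from three taxa.
proteins = [
    "MKVLAGGWDE",
    "MKVLAGGWNE",
    "MKVIAGGWDE",
]
print(conserved_windows(proteins, k=4))  # [4]
```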

‘Degenerate primers’ refers to the degeneracy of the genetic code: the third codon position is often, and the first codon position sometimes, synonymous, meaning that most amino acids are encoded by more than one codon. This is the central difficulty in designing degenerate primers: one needs to identify universally conserved aa sites, but because of the degeneracy of the genetic code these do not necessarily have universally conserved NT sequences. The following are some general guidelines for degenerate primer design (Hubbard Center for Genome Studies 2003; Premier Biosoft International 2010; Sandra Baldauf, personal communication): 1) a length of 6–8 aa, 2) only the four aa closest to the 3′ end need to be degenerate, 3) aim for 2–4-fold degenerate aa, 4) a GC content of ∼50 %, and 5) a melting temperature (Tm) of 52–65 °C. The 3′ end determines the binding specificity of the primer and should preferably include G’s and C’s for stability.
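The link between conserved amino acids and primer degeneracy can be made explicit by back-translating a peptide with the standard genetic code and merging synonymous codons into IUPAC ambiguity codes. The Python sketch below returns a forward primer (5′→3′) and its degeneracy, i.e. the number of distinct oligonucleotides in the primer mix. The peptides are hypothetical examples, and a reverse primer would be the reverse complement of the back-translated downstream motif, which this sketch does not handle.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
SYNONYMOUS = {}
for codon, aa in zip(CODONS, AMINO_ACIDS):
    SYNONYMOUS.setdefault(aa, []).append(codon)

# IUPAC ambiguity codes keyed by the alphabetically sorted bases they stand for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "AG": "R", "CT": "Y", "CG": "S",
         "AT": "W", "GT": "K", "AC": "M", "CGT": "B", "AGT": "D", "ACT": "H",
         "ACG": "V", "ACGT": "N"}


def degenerate_primer(peptide):
    """Back-translate a peptide into a degenerate forward primer (5'->3').

    Each codon position is merged independently, so six-codon amino acids
    (L, S, R) come out more degenerate than their codon count.
    """
    primer, degeneracy = [], 1
    for aa in peptide.upper():
        if aa not in SYNONYMOUS or aa == "*":
            raise ValueError(f"not an amino acid: {aa!r}")
        for pos in range(3):
            bases = "".join(sorted({codon[pos] for codon in SYNONYMOUS[aa]}))
            primer.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(primer), degeneracy


print(degenerate_primer("KDE"))  # ('AARGAYGAR', 8)
print(degenerate_primer("MW"))   # ('ATGTGG', 1)
```

The six-codon amino acids show why leucine, serine and arginine are awkward near the 3′ end: `degenerate_primer("L")` gives `('YTN', 8)`, an 8-fold mix although leucine has only six codons.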


Because of their low specificity, degenerate primers make it more difficult to obtain a single product from PCR amplification. Gel bands of the appropriate size usually have to be excised and cloned, after which nested PCR can follow (e.g. Fiore-Donno et al. 2005). Nested PCR is used to quickly screen clones and identify vectors carrying the correct insert. It involves an additional PCR reaction with the putative product as template and a pair of primers located inside the ‘original’ ones on the target gene.

An additional problem is the amplification of DNA from contaminants. Social amoebae feed on bacteria when grown in culture, so it is important to use primers, or even genes, that are not present in bacteria.

1.5 Questions addressed

1. The root of the dictyostelid tree. SSU rDNA data, alone and in combination with α-tubulin data, support group 1 as the earliest diverging branch in the phylogeny of social amoebae (Schaap et al. 2006). However, this root is not well supported, most likely because two genes do not provide enough information to resolve a question as ancient as the root of the dictyostelid tree. Finding the overall root therefore requires additional data (genes) and an outgroup that is more closely related and better sampled.

2. The root of group 4. All available SSU rDNA phylogenies (and SSU rDNA/ITS phylogenies) (Romeralo et al. 2010a; Romeralo et al. 2010b; Schaap et al. 2006) place, with strong support, a clade of D. giganteum, D. brunneum and D. robustum as the most basal branch in group 4. However, single-gene analyses of α-tubulin aa data and concatenated analyses of SSU rDNA and α-tubulin NT data place D. purpureum as the first branch in the group. The rooting of group 4 is particularly important because the estimated divergence time of 400 million years between D. discoideum and D. purpureum (Parikh et al. 2010) could in fact be the age of group 4 as a whole, an outcome that would affect the calibration of the entire dictyostelid tree. In addition, group 4 includes the model species D. discoideum.

3. The root of group 2. The SSU rDNA phylogenies (Romeralo et al. 2010b; Schaap et al. 2006) and the SSU rDNA/ITS phylogeny (Romeralo et al. 2010a) indicate that A. ellipticum is the deepest offshoot in the subdivision of group 2 comprising the Polysphondyliums. This means that Acytosteliums are located on both sides of the group 2 root, and that Polysphondyliums are nested within Acytosteliums and therefore represent a reinvention of the cellular stalk. However, even though this topology is supported, it should be viewed with caution, since the branch leading to A. ellipticum is very long and long branches are prone to misplacement (long-branch attraction).


1.6 Project aims

The primary aim of this study was to gain more robust insight into the location of the overall root in the phylogeny of Dictyostelia. The approach was to develop two ‘new’ genes for deep dictyostelid phylogeny. Appropriate genes were found through database searches and used to design sets of degenerate primers. Data from a broad range of dictyostelid isolates were generated via PCR and subsequent sequencing. The new sequences were assembled and then combined with currently available dictyostelid data sets, both published (small subunit ribosomal DNA and α-tubulin (Schaap et al. 2006)) and unpublished (largest subunit of RNA polymerase II; the Baldauf lab). Together with appropriate outgroup taxa, the data were used to test various possible roots in Dictyostelia. With the same data, I also examined some of the internal relationships of the dictyostelid phylogeny.

2 Materials and Methods

2.1 Selecting target genes

Five genes were evaluated for their suitability for deep dictyostelid phylogeny. They encode the proteins Hsp90, eIF2, eIF5B, EF-2 and eRF3.

90 kDa heat shock protein (Hsp90). A member of the 90 kDa molecular chaperone , proteins that bind and facilitate the activities of other proteins. Hsp90 is active in the and its name is derived from the fact that it is heat-induced. It has a prokaryotic counterpart, HtpG. (Csermely et al. 1998)

Eukaryotic initiation factor 2 (eIF2). One of the factors that initiate eukaryotic translation, by mediating the binding of methionyl-tRNA (Met-tRNA) to the 40S ribosomal subunit in a GTP-dependent manner (Kimball 1999). No prokaryotic homologs are known (Unbehaun et al. 2007).

Eukaryotic initiation factor 5B (eIF5B). Another factor in the initiation of eukaryotic translation, which mediates the joining of the 60S ribosomal subunit to the 48S initiation complex. The factor is a ribosome-dependent GTPase, and its prokaryotic homolog is named IF2. (Unbehaun et al. 2007)

Elongation factor 2 (EF-2). A eukaryotic elongation factor that translocates the ribosome along the mRNA during protein synthesis. It is a GTP-binding protein, and its bacterial counterpart is EF-G. (Perentesis et al. 1992)

Eukaryotic release factor 3 (eRF3). A GTPase and translation termination factor that stimulates the main release factor eRF1 in the hydrolysis and release of peptidyl-tRNA from the ribosome. The protein RF3 serves the same purpose in bacteria, but the genes encoding the two proteins are not orthologs. (Inagaki and Doolittle 2000)

2.1.1 Phylogenetic evaluation of markers

A protein data set of broadly sampled taxonomic groups was created for each of the five target genes. Sequences were taken from: 1) all available Dictyostelia and Amoebozoa, 2) a broad sampling of other eukaryotes: Drosophila and Monosiga (Metazoa), basidiomycetes and ascomycetes (Fungi), Arabidopsis, Oryza and Ostreococcus (Plants), and 3) Bacteria (, Bacillus subtilis, Thermus thermophilus, Agrobacterium tumefaciens and Streptomyces coelicolor)². Bacteria were included to look for primer sites not shared with bacteria.

² It was a mistake not to include Klebsiella aerogenes, the bacterium on which the dictyostelids cultured in this study were feeding. However, Klebsiella gene sequences were later added to the eRF3 alignment from which usable primer sites were detected, and it could be concluded that the designed primers do not preferentially bind to Klebsiella.


Protein sequences were obtained via protein BLAST (BLASTp) at the website of the National Center for Biotechnology Information (NCBI) (http://blast.ncbi.nlm.nih.gov/) using the non-redundant database. For each protein, a sequence from D. discoideum was found through GenBank (www.ncbi.nih.gov/Entrez/) by keyword searching and used as the query. Sequences for each gene were searched by organism or organism group as given above. The E-value cut-off differed slightly between the five genes, but in general no sequences with an E-value higher than 10⁻⁷⁰ were used, except for bacteria, for which higher E-values were allowed. When a search returned an excessive number of hits with E-values better than the defined cut-off, a maximum of seven top-hit sequences were downloaded and included in the alignments. Sequences from D. purpureum were obtained via the BLASTp service at the website of the DOE Joint Genome Institute (DOE JGI) (http://genome.jgi-psf.org/Dicpu1/Dicpu1.home.html).
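The searches were run through the NCBI and JGI web interfaces. For anyone repeating the filtering on a local BLAST run, the following is a minimal sketch, not the procedure used here. It assumes BLAST+ tabular output (`-outfmt 6`) with the default twelve columns, applies an E-value cut-off (10⁻⁷⁰ by default) and keeps at most seven of the best subjects. The looser bacterial cut-off is left to the `max_evalue` parameter because its value is not stated, and the demo hits are invented.

```python
import csv
import io


def top_hits(tabular_text: str, max_evalue: float = 1e-70, max_hits: int = 7):
    """Return up to `max_hits` (subject id, E-value) pairs, best first.

    Expects BLAST tabular output (-outfmt 6, default columns):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        subject, evalue = row[1], float(row[10])
        if evalue > max_evalue:
            continue  # worse than the cut-off
        # A subject may have several HSPs; keep only its best E-value.
        if subject not in best or evalue < best[subject]:
            best[subject] = evalue
    return sorted(best.items(), key=lambda item: item[1])[:max_hits]


# Hypothetical hits, for illustration only.
DEMO = "\n".join([
    "query\tseqA\t90.0\t500\t40\t2\t1\t500\t1\t500\t1e-120\t800",
    "query\tseqB\t45.0\t480\t200\t9\t1\t480\t1\t480\t1e-50\t300",
    "query\tseqC\t80.0\t500\t90\t3\t1\t500\t1\t500\t2e-90\t600",
    "query\tseqA\t70.0\t120\t30\t1\t1\t120\t1\t120\t1e-20\t100",
])

if __name__ == "__main__":
    print(top_hits(DEMO))  # [('seqA', 1e-120), ('seqC', 2e-90)]
```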

Alignments were constructed at the aa level with ClustalX version 2.0.12 (Thompson et al. 1997). For each gene, a neighbour-joining distance tree (NJ-tree) was calculated in the same program, with 1000 bootstrap trials and gapped positions excluded. These ‘rough’ phylogenies were inspected in FigTree version 1.2.3 (http://tree.bio.ed.ac.uk/software/figtree/) to evaluate relative evolutionary rates and the possible presence of multiple gene copies. According to the marker criteria (section 1.3.4), eRF3 and eIF5B were the best candidates for inferring the root of the dictyostelid tree.
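ClustalX computes these NJ-trees internally. To illustrate the method itself, here is a minimal sketch of the standard neighbour-joining algorithm applied to a precomputed distance matrix. It omits the distance correction and bootstrapping that ClustalX performs and does not guard against negative branch lengths. The four-taxon matrix is a hypothetical toy example that is exactly tree-like.

```python
def neighbour_joining(names, dist):
    """Build an unrooted neighbour-joining tree.

    names: at least three taxon labels.
    dist:  dict mapping frozenset({x, y}) -> distance for every pair of taxa.
    Returns the tree as a Newick string with branch lengths.
    """
    d = dict(dist)
    nodes = list(names)

    def pair(x, y):
        return d[frozenset((x, y))]

    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r(i): summed distance from i to all other nodes.
        r = {i: sum(pair(i, j) for j in nodes if j != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for idx, a in enumerate(nodes):
            for b in nodes[idx + 1:]:
                q = (n - 2) * pair(a, b) - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        la = 0.5 * pair(a, b) + (r[a] - r[b]) / (2 * (n - 2))
        lb = pair(a, b) - la
        new = f"({a}:{la:.3f},{b}:{lb:.3f})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = 0.5 * (pair(a, k) + pair(b, k) - pair(a, b))
        nodes = [k for k in nodes if k not in (a, b)] + [new]

    # Three nodes left: connect them to one central node.
    a, b, c = nodes
    la = 0.5 * (pair(a, b) + pair(a, c) - pair(b, c))
    lb = 0.5 * (pair(a, b) + pair(b, c) - pair(a, c))
    lc = 0.5 * (pair(a, c) + pair(b, c) - pair(a, b))
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


# Hypothetical distances from the tree ((A:2,B:3):4,C:1,D:5).
TOY = {frozenset(p): v for p, v in {
    ("A", "B"): 5.0, ("A", "C"): 7.0, ("A", "D"): 11.0,
    ("B", "C"): 8.0, ("B", "D"): 12.0, ("C", "D"): 6.0,
}.items()}

if __name__ == "__main__":
    print(neighbour_joining(["A", "B", "C", "D"], TOY))
```

This prints `(C:1.000,D:5.000,(A:2.000,B:3.000):4.000);`, which recovers the tree the toy distances were derived from.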

2.1.2 Degenerate primer design

Sets of degenerate primers were designed for both eIF5B and eRF3. Two aa alignments were constructed for each marker: one including all the eukaryotes and one including only dictyostelid and bacterial sequences. The first alignment was used to find regions of ∼100 % consensus to which primers could be designed, and the second was used to check whether these primer sites were also present in bacteria. Consensus regions also present in bacteria were rejected. If multiple sequences were found for any , the shorter sequence(s) were excluded. For eRF3, all sequences from Arabidopsis were left out because of several Arabidopsis-specific indels. In addition, sequences that were obviously paralogs were excluded. The eRF3 searches also retrieved elongation factor 1-alpha (EF-1α) and 70 kDa heat shock protein (Hsp70) sequences. These sequences grouped outside the ‘eRF3 clade’ and had considerably higher E-values than the plausible eRF3 sequences.
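The window search described above can be sketched as follows. This is a simplified stand-in for inspecting the alignments by eye, not the procedure actually used. It assumes a gapped aa alignment given as equal-length strings and reports every window of a chosen length (7 aa here, within the 6–8 aa guideline of section 1.4) that is identical and gap-free in all eukaryotic sequences. Windows whose motif also occurs in any bacterial sequence are dropped. The sequences are toy examples, not real eRF3 data.

```python
def conserved_windows(eukaryote_alignment, bacterial_sequences, length=7):
    """Return (1-based start, motif) for fully conserved, gap-free alignment
    windows whose motif does not occur in any bacterial sequence."""
    n_cols = len(eukaryote_alignment[0])
    if any(len(seq) != n_cols for seq in eukaryote_alignment):
        raise ValueError("alignment rows must have equal length")
    bacteria = [seq.replace("-", "") for seq in bacterial_sequences]
    hits = []
    for start in range(n_cols - length + 1):
        windows = {seq[start:start + length] for seq in eukaryote_alignment}
        if len(windows) != 1:
            continue  # not 100 % conserved among eukaryotes
        motif = windows.pop()
        if "-" in motif:
            continue  # a gapped window cannot anchor a primer
        if any(motif in bact for bact in bacteria):
            continue  # shared with bacteria: could amplify the food bacteria
        hits.append((start + 1, motif))
    return hits


# Toy data for illustration only.
EUKARYOTES = ["MKGKTVEVGRA", "MKGKTVEVGRS", "LKGKTVEVGRA"]
BACTERIA = ["AAKTVEVGRQQ"]

if __name__ == "__main__":
    print(conserved_windows(EUKARYOTES, BACTERIA))  # [(2, 'KGKTVEV'), (3, 'GKTVEVG')]
```

Without the bacterial filter, the window KTVEVGR at position 4 would also be reported. It is rejected here because the toy bacterial sequence contains it.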

For each gene, the aim was to find at least two forward primers near the beginning of the gene, two reverse primers near the end and one of each kind in the middle. This design allowed nested PCR to identify correct bands and, for eIF5B, which was too long to amplify in a single reaction, made it possible to obtain full-length sequences from overlapping fragments. The primers that later worked for eRF3 are listed in Table 1 together with their GC contents and melting temperatures. Additional primers for eRF3 and the barely tested primers for eIF5B are listed in Tables B.1 and B.2. All primers were ordered from Eurofins MWG Operon (Ebersberg, Germany).


Table 1: Degenerate primers for eRF3. Start and end positions, nucleotide sequences, GC contents and melting temperatures (Tm).

Nameᵃ | Startᵇ | Endᵇ | Nucleotide sequence (5′ → 3′) | GC content (%)ᶜ | Tm (°C)ᶜ
2F | 188 | 195 | AAG GGT AAG ACT GTN GAR GTN GG | 50 | 61.5
3F | 224 | 231 | GA GCA GCA CAA GCN GAY GTN GG | 61.4 | 64.9
1R | 552 | 545 | AC GTT GAC CAC YTT NCC RAA NGC | 52.2 | 62.4
2R | 550 | 543 | AAC GAC CTT CCC RAA NGC DAT NGT | 49.3 | 62.4
3R | 480 | 473 | ATC TTC TAC TCC NGT RTG NGC RTG | 50 | 62.7

ᵃ F: forward; R: reverse.
ᵇ Start and end positions refer to the aa alignment of dictyostelid eRF3 sequences seen in Fig. A.1.
ᶜ Values were calculated by Eurofins MWG Operon (Ebersberg, Germany).
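The GC and Tm columns of Table 1 can be reproduced with a short calculation, which also gives each primer's fold degeneracy. The sketch below rests on two assumptions of ours, not statements about how Eurofins computes these values. First, each degenerate base counts towards GC content in proportion to its G/C options. Second, Tm follows the widely used GC-content formula Tm = 69.3 + 0.41·(%GC) − 650/N, where N is the primer length. With these assumptions the computed values match Table 1 to one decimal place.

```python
# IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# eRF3 primers from Table 1 (5' -> 3').
TABLE_1 = {
    "2F": "AAG GGT AAG ACT GTN GAR GTN GG",
    "3F": "GA GCA GCA CAA GCN GAY GTN GG",
    "1R": "AC GTT GAC CAC YTT NCC RAA NGC",
    "2R": "AAC GAC CTT CCC RAA NGC DAT NGT",
    "3R": "ATC TTC TAC TCC NGT RTG NGC RTG",
}


def primer_stats(primer: str):
    """Return (fold degeneracy, GC %, estimated Tm in °C) for a degenerate primer.

    A degenerate base adds to GC % the share of its possible bases that are
    G or C (N counts 0.5, D counts 1/3, ...).
    Tm uses the GC-content formula 69.3 + 0.41 * GC% - 650 / length.
    """
    seq = primer.replace(" ", "").upper()
    fold, gc = 1, 0.0
    for base in seq:
        options = IUPAC[base]
        fold *= len(options)
        gc += sum(b in "GC" for b in options) / len(options)
    gc_percent = 100 * gc / len(seq)
    tm = 69.3 + 0.41 * gc_percent - 650 / len(seq)
    return fold, gc_percent, tm


if __name__ == "__main__":
    for name, primer in TABLE_1.items():
        fold, gc, tm = primer_stats(primer)
        print(f"{name}: {fold}-fold, GC {gc:.1f} %, Tm {tm:.1f} °C")
```

The script reports 32-, 32-, 64-, 96- and 64-fold degeneracy for 2F, 3F, 1R, 2R and 3R, respectively.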

2.2 Taxonomic sampling

Sequence data were collected from a variety of sources. The SSU rDNA and α-tubulin matrices used in Schaap et al. (2006) were received from Sandra Baldauf, as were data for the largest subunit of RNA polymerase II (RPB1), currently being collected in the Baldauf lab. Sequences for eRF3 were obtained through my own laboratory work, whereas data for eIF5B were never collected because of lack of time. In addition, two ‘bonus proteins’, inositol 5-phosphatase (5PTase) and integrator complex subunit 1 (INTS1), were obtained as ‘contaminants’ with the eRF3 primers, and these proteins were also analysed further. For some taxa, data were also collected from other publicly available sequence resources.

2.2.1 Ingroup taxa

The main criterion when sampling dictyostelids for eRF3 was to obtain as broad a taxonomic distribution as possible. Sequences were especially wanted from both sides of all deep branches in the current phylogenies (Romeralo et al. 2010b; Schaap et al. 2006), i.e. from the major groups 1, 2, 3 and 4, and, with lower priority, also from the complexes lying between the four groups. In addition, data were sought from as many as possible of the smaller ‘internal groups’ defined by Romeralo et al. (2010a). Furthermore, isolates with available SSU rDNA, α-tubulin and RPB1 data were favoured. When sequences from exactly the same isolate were not available, closely related isolates whose data could be combined were selected. Some effort was also put into species considered especially interesting, e.g. D. giganteum in group 4 and the Polysphondyliums and Acytosteliums in group 2. The names of specific isolates are not written out in the text but can be found in Table C.1.

D. discoideum and P. pallidum sequences for α-tubulin, RPB1, eRF3 and the ‘bonus proteins’ 5PTase and INTS1 were obtained from GenBank (www.ncbi.nih.gov/Entrez/). A D. discoideum SSU rDNA sequence was found at the Sanger Institute (http://www.sanger.ac.uk/Projects/D_discoideum/) using nucleotide BLAST (BLASTn). D. purpureum sequences were obtained from DOE JGI (http://genome.jgi-psf.org/Dicpu1/Dicpu1.home.html) using BLASTn or BLASTp. All ingroup sequences are listed in Table C.1.


2.2.2 Outgroup taxa

eRF3 and RPB1 sequences from the in-progress genome sequence of Physarum polycephalum were found via the tBLASTn service³ at the Genome Center at Washington University (http://genome.wustl.edu/tools/blast). For RPB1, three contigs had to be evaluated and assembled by hand. The P. polycephalum SSU rDNA and α-tubulin sequences originated from GenBank.

An eRF3 sequence for Acanthamoeba castellanii was obtained from the genome project of the species (Human Genome Sequencing Center (HGSC); http://blast.hgsc.bcm.tmc.edu/blast.hgsc?organism=AcastellaniNeff) using tBLASTn. A. castellanii SSU rDNA, α-tubulin and RPB1 sequences were found through GenBank. Outgroup taxa for the two ‘bonus proteins’ 5PTase and INTS1 were taken as the three or four top hits (lowest E-values) from BLASTp searches performed at NCBI (http://blast.ncbi.nlm.nih.gov/). A list of all outgroup sequences is given in Table C.2.

2.3 Wet lab procedures

The lab procedures described here were primarily applied to eRF3. However, the eIF5B primers were tested on A. subglobosum LB1, and the results indicated that further work with these primers is feasible.

2.3.1 Cell culture and DNA extraction

Dictyostelid cultures were partly ordered from the Dicty Stock Center (dictyBase; Northwestern University, Chicago) and partly grown from frozen spores held by the Baldauf lab (Tables 4 and D.1). To extract DNA, all species were grown from spores with the bacterium Klebsiella aerogenes on SM agar (Standard Medium: 20 g l⁻¹ peptone; 2 g l⁻¹ yeast extract; 20 g l⁻¹ glucose; 2 g l⁻¹ MgSO4; 3.8 g l⁻¹ KH2PO4; 1.2 g l⁻¹ K2HPO4; 2 % agar (Romeralo et al. 2009)). The plates were kept at room temperature, and the culturing time varied from a couple of days for the fastest-growing species to several weeks for the more delicate ones. In some cases, pellets of activated charcoal were placed in the Petri dishes to facilitate growth by eliminating ammonia (NH3), which inhibits growth.

For DNA extraction, cells were harvested from the edge of the plaques growing on the SM plates. Cells were placed in a tube with 30 µl MasterAmp DNA extraction solution (Epicentre Technologies, Madison, WI), and the mixture was heated for 30 min at 60 °C followed by 8 min at 98 °C. Cell lysates were stored at −20 °C until needed. Some DNA extracts were already available from Schaap et al. (2006) (Tables 4 and D.1).

³ The tBLASTn algorithm uses a protein query for searching translated nucleotide databases.

2.3.2 PCR amplification with degenerate primers

eRF3 was amplified using PCR, with cell lysates used directly as template. Optimising the PCR initially required some trial and error, and three different protocols were eventually used (Table 2). The general PCR program was as follows: an initial denaturation step at 94 °C for 5 min, followed by 30 or 35 cycles of denaturation (94 °C for 1 min), annealing (50 °C or 55 °C for 1 min) and elongation (72 °C for 1 min). The final elongation step was in all cases 72 °C for 10 min. The parameters that varied between protocols were the number of cycles and the annealing temperature. In addition, the volume of each primer (forward and reverse) was either 1 µl or 1.5 µl (10 µM). More cycles, a lower annealing temperature and a larger amount of primers were used to increase the yield of the desired product when the primers seemed to bind the template too poorly in the first attempt. In the case of D. tenue Pan52, the DNA had to be diluted 5-fold with ddH2O to obtain amplification. All PCR reactions were performed using ‘illustra™ puReTaq Ready-To-Go PCR Beads’⁴ (GE Healthcare; Solna, Sweden), and the total volume in each reaction tube was 25 µl.

Table 2: Three alternative PCR protocols used in the study: thermocycling conditions, primer volumes and DNA volumes. The total reaction volume was 25 µl.

Step | Alt. 1 | Alt. 2 | Alt. 3
Initial denaturation | 94 °C, 5 min | 94 °C, 5 min | 94 °C, 5 min
Cycles | 35 | 30 | 35
Denaturation | 94 °C, 1 min | 94 °C, 1 min | 94 °C, 1 min
Annealing | 55 °C, 1 min | 55 °C, 1 min | 50 °C, 1 min
Elongation | 72 °C, 1 min | 72 °C, 1 min | 72 °C, 1 min
Final elongation | 72 °C, 10 min | 72 °C, 10 min | 72 °C, 10 min
Primer vol. (F/R)ᵃ (µl) | 1.5 | 1 | 1.5
DNA vol.ᵇ (µl) | 1 | 1 | 1

ᵃ F: forward; R: reverse. Primer concentration 10 µM.
ᵇ Undiluted or diluted DNA.

PCR products were separated on 1.5 % agarose gels, and individual bands of the expected size were excised from the gel and purified with the ‘QIAquick Gel Extraction Kit’⁵ (Qiagen; Crawley, UK) according to the manufacturer's protocol (‘Gel Extraction Spin Protocol’). Because degenerate primers were used, a single band per reaction was rarely seen. However, in most cases the right-sized band could be distinguished, and multiple eRF3 genes were never detected.

⁴ The beads contain dNTPs, puReTaq DNA polymerase, reaction buffer, stabilisers and BSA (bovine serum albumin).
⁵ The kit uses spin-column technology in combination with the selective binding properties of a silica membrane. It contains buffer QG, which solubilises the agarose gel slice and provides conditions favourable for DNA binding to the silica membrane; buffer PE, which washes impurities from the membrane; and ddH2O, which finally elutes the DNA.


2.3.3 Cloning, PCR purification and sequencing

Extracted gel bands were cloned with the ‘TOPO TA Cloning® Kit’⁶ according to the manufacturer's instructions (Invitrogen; Stockholm, Sweden). PCR products were cloned into the plasmid vector pCR®2.1-TOPO® and transformed into One Shot® TOP10 chemically competent Escherichia coli cells provided with the kit. Transformed cells were spread on LB plates (lysogeny broth medium with kanamycin: 14 g LB agar; 400 ml ddH2O; 400 µl kanamycin (50 mg ml⁻¹)) with X-gal. A maximum of 70 positive (white) clones per PCR product were prepared for sequencing by PCR amplification with the vector-specific primer pair T7promoter/M13R-pUC(-40) (5′-TAATACGACTCACTATAGGG-3′ and 5′-CAGGAAACAGCTATGAC-3′; Macrogen Inc.; Seoul, Korea). The PCR protocol for sequencing-grade DNA was as follows: an initial denaturation step at 94 °C for 10 min, followed by 30 cycles of denaturation (94 °C for 1 min), annealing (60 °C for 1 min) and elongation (72 °C for 1 min), and a final elongation at 72 °C for 7 min. The PCR reactions were either prepared as above using ‘illustra™ puReTaq Ready-To-Go PCR Beads’ (GE Healthcare; Solna, Sweden), adding 2 µl of each primer to a total volume of 25 µl, or prepared without PCR beads by mixing all components (Thermo Scientific; Surrey, UK) (2.5 µl reaction buffer (10×); 2.5 µl MgCl2 (25 mM); 2 µl dNTP; 0.13 µl Taq polymerase (5 U/µl); 1 µl of each primer (10 µM); 15.9 µl ddH2O) to a total volume of 25 µl.
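The final concentrations in the hand-mixed reaction follow from the dilution relation C1·V1 = C2·V2 with a 25 µl final volume. This small sketch only applies that relation to the stock concentrations and volumes listed above. The dNTPs are left out because their stock concentration is not given.

```python
FINAL_VOLUME_UL = 25.0

# Hand-mixed sequencing PCR (section 2.3.3): stock concentration, unit, µl added.
RECIPE = {
    "reaction buffer": (10.0, "x", 2.5),
    "MgCl2": (25.0, "mM", 2.5),
    "forward primer": (10.0, "µM", 1.0),
    "reverse primer": (10.0, "µM", 1.0),
}
TAQ_U_PER_UL, TAQ_UL = 5.0, 0.13


def final_concentration(stock, volume_ul, final_volume_ul=FINAL_VOLUME_UL):
    """Dilution relation C1 * V1 = C2 * V2, solved for C2."""
    return stock * volume_ul / final_volume_ul


if __name__ == "__main__":
    for name, (stock, unit, volume) in RECIPE.items():
        print(f"{name}: {final_concentration(stock, volume):g} {unit}")
    # Taq is given as total enzyme units per reaction, not as a concentration.
    print(f"Taq polymerase: {TAQ_U_PER_UL * TAQ_UL:g} U per reaction")
```

This gives 1× buffer, 2.5 mM MgCl2, 0.4 µM of each primer and 0.65 U of Taq polymerase per reaction.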

For D. fasciculatum and D. giganteum, positive clones were screened with the nested primers 3F/3R before amplification with T7promoter/M13R-pUC(-40), in order to identify vectors with an insert present. However, this screening step did not reveal any additional useful information, so it was omitted for subsequent isolates, for which all positive clones were sequenced instead.

DNA for sequencing obtained using PCR beads was purified with the ‘QIAquick® PCR Purification Kit’⁷ (Qiagen; Crawley, UK) according to the kit's protocol (‘PCR Purification Spin Protocol’). In contrast, DNA for sequencing obtained from directly mixed reaction components was sent to Macrogen Inc. (Seoul, Korea) for purification. Sequencing was performed by Macrogen Inc. on an ABI 3730XL sequence analyser. In most cases, all positive clones were sequenced with the primer T7promoter. When needed, a few of the positive clones were additionally sequenced with the reverse primer M13R-pUC(-40).

⁶ The kit applies a cloning strategy in which the PCR products are ligated directly into the plasmid vector pCR®2.1-TOPO®. The vector is supplied linearised with single 3′-T overhangs, which facilitate ligation with PCR products that, owing to the activity of Taq polymerase, carry single overhanging 3′-A residues. The PCR product is mixed with the TOPO® vector and a salt solution (NaCl and MgCl2), which is added to increase the number of transformants. The recombinant vector is then transformed into competent TOP10 Escherichia coli cells, and SOC medium is added to maximise the transformation efficiency.
⁷ The kit uses spin-column technology in combination with the selective binding properties of a silica membrane. It contains buffer PBI, which binds the DNA to the silica membrane; buffer PE, which washes impurities from the membrane; and ddH2O, which finally elutes the DNA.

Sequences were edited and assembled into contigs using the Pregap4 and Gap4 modules of the Staden package version 4.1 (Bonfield et al. 1995). Vector sequences were deleted, and the remaining consensus sequences were translated into amino acids using the online program ‘Six Frame Translation of Sequence’ provided by HGSC (http://searchlauncher.bcm.tmc.edu/seq-util/Options/sixframe.html). All sequences will shortly be submitted to GenBank.
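The HGSC tool was used as a web service. To show what a six-frame translation involves, here is a minimal sketch that assumes the standard genetic code. It translates the three forward frames and the three frames of the reverse complement. Codons containing ambiguous bases become X, and trailing partial codons are dropped. The frame labels +1 to −3 are our own convention, and the test sequence is made up.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1) for codons TTT, TTC, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate from position 0; ambiguous codons give 'X', partial codons are dropped."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna):
    """Return translations of frames +1, +2, +3 and -1, -2, -3 (reverse complement)."""
    dna = dna.upper().replace(" ", "")
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCAAGTAA").items():
        print(frame, protein)
```

For this toy sequence, frame +1 gives MAK* (ending at the TAA stop codon), and frame −1 gives LLGH.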

2.4 Multiple sequence alignment

In total, six different sequence alignments were constructed: SSU rDNA, α-tubulin, RPB1, eRF3, 5PTase and INTS1. A detailed summary of all sequences included in the different alignments, and of which sequences were concatenated to build chimeric sequences, can be found in Table C.1. The alignments were prepared for tree calculations by deleting all ambiguously aligned regions. The number of parsimony-informative characters was calculated for all single-gene alignments using PAUP* version 4.0b10 (Swofford 2003). For the concatenated data sets, single-gene alignments were assembled directly into a single file as separate partitions.
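PAUP* reports this count directly. For reference, the usual definition can be sketched as follows: a column is parsimony-informative if at least two different character states each occur in at least two sequences. The sketch treats gaps and unknown characters as missing, which is our assumption about how the count was set up. The alignments are toy examples.

```python
from collections import Counter


def parsimony_informative_sites(alignment, missing="-?"):
    """Count parsimony-informative columns in a list of equal-length sequences.

    A column is informative if at least two different states each occur in at
    least two sequences. Characters in `missing` are ignored (e.g. pass "-?N"
    for DNA; for proteins, N is asparagine and must not be treated as missing).
    """
    informative = 0
    for column in zip(*alignment):
        states = Counter(c.upper() for c in column if c.upper() not in missing)
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1
    return informative


# Toy alignment: columns 2-4 are informative; column 5 has a singleton (G).
TOY = ["AAGTA", "AAGCA", "ACTTA", "ACTCG"]

if __name__ == "__main__":
    print(parsimony_informative_sites(TOY))  # 3
```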

SSU rDNA and α-tubulin alignments used data sets from Schaap et al. (2006) as the ‘template’ to which additional sequences (see section 2.2) were added by hand using a simple text editor. Sequences not needed for this study were deleted. The SSU rDNA alignment was constructed at the DNA level, and the α-tubulin alignment at the aa level.

The RPB1 alignment (the Baldauf lab, 2009-11-03) was adjusted in BioEdit version 7.0.5.3 (Hall 1999). Possible introns were identified at the DNA level, both by comparison with the coding sequence (CDS) limits for D. discoideum and by the general rule that an intron usually begins with ‘GT’ and ends with ‘AG’. Introns were then deleted, and the whole alignment was translated into amino acids using BioEdit. Additional ingroup and outgroup sequences (see section 2.2) were later added manually.
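The GT–AG rule can be turned into a crude candidate scan. The sketch below is an illustration only. It lists every stretch that starts with GT, ends with AG and falls within an assumed length window, and it removes chosen introns. The intron calls in this study also relied on the annotated D. discoideum CDS limits, and a scan like this one produces many false positives. The length defaults and the toy gene are hypothetical.

```python
def gt_ag_candidates(dna, min_len=40, max_len=500):
    """Return (start, end) pairs (0-based, end exclusive) of stretches that
    begin with GT, end with AG and are min_len..max_len bases long."""
    dna = dna.upper()
    candidates = []
    start = dna.find("GT")
    while start != -1:
        last_end = min(start + max_len, len(dna))
        for end in range(start + min_len, last_end + 1):
            if dna[end - 2:end] == "AG":
                candidates.append((start, end))
        start = dna.find("GT", start + 1)
    return candidates


def splice_out(dna, introns):
    """Remove non-overlapping (start, end) intervals and join the remaining exons."""
    exons, previous = [], 0
    for start, end in sorted(introns):
        exons.append(dna[previous:start])
        previous = end
    exons.append(dna[previous:])
    return "".join(exons)


# Toy gene: exon ATGGCC + intron GTAAGTTTTAG + exon AAGTAA.
DEMO = "ATGGCC" + "GTAAGTTTTAG" + "AAGTAA"

if __name__ == "__main__":
    print(gt_ag_candidates(DEMO, min_len=8, max_len=12))  # [(6, 17), (10, 20)]
    print(splice_out(DEMO, [(6, 17)]))                    # ATGGCCAAGTAA
```

Only the first candidate is the real intron. The second spans the intron–exon boundary, which is why the CDS comparison was still needed.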

The eRF3, 5PTase and INTS1 alignments were directly constructed at the aa level utilising ClustalX version 2.0.12 (Thompson et al. 1997). Sequences with frame shifts were spliced together from translations of different reading frames. Introns were identified as above.

2.5 Phylogenetic analyses

Single-gene and combined data sets were analysed with Bayesian inference (BI) and maximum likelihood (ML) (Table 3). Individual gene analyses were performed to detect possible conflicting phylogenetic signals. The combined SSU rDNA/ITS fine-level phylogeny (Romeralo et al. 2010a) was used to identify closely related isolates that could be combined to construct chimeric sequences. In a few cases no combinable data were available, and the taxa concerned had to be left as missing data for individual genes. In the combined SSU rDNA and eRF3 data set, weighting was also applied as an exploratory test: the eRF3 alignment was tripled to make the number of parsimony-informative sites roughly equal for the two genes. All protein-coding genes were analysed at the aa level, and only unambiguously aligned positions were used.
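Concatenation and the tripled eRF3 weighting can be pictured with the sketch below. It joins per-gene alignments, fills genes missing for a taxon with ‘?’, and records 1-based column ranges that could be written as MrBayes `charset` lines. Keeping the three eRF3 copies in one partition is our assumption, since the text does not say how the tripled block was partitioned. The dummy sequences only reuse the Table 3 lengths (1509 SSU and 295 eRF3 positions).

```python
def concatenate(genes):
    """Concatenate per-gene alignments into one partitioned matrix.

    genes: list of (name, {taxon: aligned sequence}, copies). Each gene block is
    repeated `copies` times (copies=3 mimics the tripled eRF3 weighting); taxa
    missing from a gene are filled with '?' (missing data).
    Returns ({taxon: sequence}, [(name, first column, last column), ...]),
    with 1-based, inclusive column ranges.
    """
    taxa = sorted({taxon for _, seqs, _ in genes for taxon in seqs})
    parts = {taxon: [] for taxon in taxa}
    ranges, position = [], 1
    for name, seqs, copies in genes:
        lengths = {len(s) for s in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences differ in length")
        width = lengths.pop() * copies
        for taxon in taxa:
            parts[taxon].append(seqs[taxon] * copies if taxon in seqs else "?" * width)
        ranges.append((name, position, position + width - 1))
        position += width
    return {taxon: "".join(p) for taxon, p in parts.items()}, ranges


# Dummy sequences with the Table 3 lengths, for illustration only.
SSU = {"taxon1": "A" * 1509, "taxon2": "C" * 1509}
ERF3 = {"taxon1": "M" * 295}

if __name__ == "__main__":
    matrix, ranges = concatenate([("SSU", SSU, 1), ("eRF3x3", ERF3, 3)])
    for name, first, last in ranges:
        print(f"charset {name} = {first}-{last};")
```

With these lengths, SSU rDNA + eRF3 gives 1804 columns and SSU rDNA + eRF3×3 gives 2394, the totals listed in Table 3.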


2.5.1 Bayesian inference

The web server of ModelTest version 3.8 (Posada 2006), applying the Akaike information criterion (AIC), was used to find the most appropriate substitution model for the SSU rDNA data set. The test selected GTR + I + Γ, the general time-reversible model with a proportion of invariable sites and gamma-distributed rate variation among sites. This model, with four gamma rate categories, was implemented in the Bayesian inference (BI) analyses of the SSU rDNA data set performed in MrBayes version 3.1.2 (Ronquist and Huelsenbeck 2003). For the proteins, a mixture of amino acid models (I + Γ, four rate categories) was used. Concatenated (partitioned) data sets were analysed with a separate substitution model for each partition, and the evolutionary model was allowed to vary between genes.
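The AIC trades goodness of fit against model complexity: AIC = 2k − 2 ln L, where k is the number of free parameters and L the maximised likelihood, and the model with the lowest AIC is preferred. In the sketch below, the log-likelihoods are hypothetical, and only substitution-model parameters are counted. Both are illustrative assumptions, not ModelTest output.

```python
def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike information criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_parameters - 2 * log_likelihood


def best_model(fits):
    """fits: {model name: (ln L, free parameters)}; returns the lowest-AIC model."""
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits; parameter counts cover substitution-model parameters only
# (GTR+I+G: 5 rates + 3 base frequencies + I + gamma shape = 10).
FITS = {
    "JC": (-10500.0, 0),
    "HKY+I+G": (-9800.0, 6),
    "GTR+I+G": (-9790.0, 10),
}

if __name__ == "__main__":
    for name, (lnl, k) in FITS.items():
        print(f"{name}: AIC = {aic(lnl, k):.1f}")
    print("selected:", best_model(FITS))  # GTR+I+G
```

If GTR + I + Γ improved ln L by only 3 units over HKY + I + Γ instead of 10, its four extra parameters would no longer pay for themselves, and the simpler model would be chosen.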

Two independent runs of four chains each were run simultaneously using the Metropolis-coupled Markov chain Monte Carlo (MC3) algorithm. Parameters were sampled every 100 generations and trees every 10 generations for a total of one million generations⁸, after which the average standard deviation of split frequencies was < 0.018. Parameters and trees sampled before the two runs had converged (the first 25 % of each run) were discarded as burn-in. The tree topology and posterior probabilities of clades were determined from the remaining samples.
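MrBayes reports this convergence diagnostic itself. To make the quantity concrete, a simplified sketch is given below. It first discards the first 25 % of samples from each run. It then computes, for each split (bipartition), its frequency in each run, takes the standard deviation of the two frequencies and averages over splits. Ignoring splits rarer than 10 % in both runs is an assumed threshold; MrBayes applies a comparable minimum-frequency filter, but its exact default may differ between versions. The two runs are invented toy data.

```python
from collections import Counter
from statistics import stdev


def split_frequencies(samples, burnin_fraction=0.25):
    """samples: one set of splits per sampled tree, in sampling order.
    Discards the first `burnin_fraction` of samples and returns
    {split: fraction of the remaining trees containing it}."""
    kept = samples[int(len(samples) * burnin_fraction):]
    counts = Counter(split for tree in kept for split in tree)
    return {split: n / len(kept) for split, n in counts.items()}


def average_sd_of_split_frequencies(run_a, run_b, burnin_fraction=0.25, min_freq=0.1):
    """Mean (sample) standard deviation of split frequencies across two runs,
    ignoring splits whose frequency is below `min_freq` in both runs."""
    fa = split_frequencies(run_a, burnin_fraction)
    fb = split_frequencies(run_b, burnin_fraction)
    sds = [
        stdev((fa.get(s, 0.0), fb.get(s, 0.0)))
        for s in set(fa) | set(fb)
        if max(fa.get(s, 0.0), fb.get(s, 0.0)) >= min_freq
    ]
    return sum(sds) / len(sds) if sds else 0.0


# Toy runs: each tree is reduced to its set of splits (taxa on one side).
AB, AC = frozenset("AB"), frozenset("AC")
RUN_1 = [{AC}, {AB}, {AB}, {AB}]
RUN_2 = [{AB}, {AB}, {AB}, {AC}]

if __name__ == "__main__":
    print(round(average_sd_of_split_frequencies(RUN_1, RUN_2), 4))  # 0.2357
```

Values approaching zero indicate that the two runs sample the same splits at similar frequencies, which is why a small value such as < 0.018 is taken as a sign of convergence.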

2.5.2 Maximum likelihood

Maximum likelihood (ML) analyses were conducted in RAxML version 7.0.4 (Stamatakis 2006; Stamatakis et al. 2008) with 1000 rapid bootstrap replicates. The substitution model for SSU rDNA was GTR + I + Γ. The models for the proteins were the optimal models identified in the BI analyses. Every fifth bootstrap tree was used as a starting point for the ML tree search. For combined data sets, partitioned analyses were performed in which each gene was assigned its own substitution model (Table 3). ML analyses were performed for all single-gene and combined data sets except SSU rDNA/eRF3 and SSU rDNA/eRF3×3.
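RAxML's rapid bootstrap speeds up the tree searches, but each replicate still rests on ordinary non-parametric bootstrap resampling of alignment columns. A minimal sketch of that resampling step for a single, unpartitioned alignment follows. For partitioned data, one would presumably resample within each partition, which is not shown. The toy alignment is invented.

```python
import random


def bootstrap_replicate(alignment, rng=None):
    """Draw one non-parametric bootstrap pseudo-alignment.

    alignment: {taxon: aligned sequence}, all of equal length. Columns are
    sampled with replacement, so some appear several times and others not at all.
    """
    rng = rng or random.Random()
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


TOY_ALIGNMENT = {"A": "ACGTAC", "B": "ACGAAC", "C": "TCGAAG"}

if __name__ == "__main__":
    # Three reproducible replicates; a real analysis would draw 1000.
    for seed in range(3):
        print(bootstrap_replicate(TOY_ALIGNMENT, random.Random(seed)))
```

A tree is then inferred from each pseudo-alignment, and the bootstrap support of a clade is the percentage of these trees that contain it.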

⁸ In the SSU rDNA/eRF3 analyses shown in Figs. E.2 and E.3, the BI analyses were run for fewer (SSU rDNA/eRF3: 280 000 generations) or more (SSU rDNA/eRF3×3: 1.1 million generations) than one million generations.


Table 3: Single-gene and combined data sets, number of ingroup and outgroup taxa included, aligned positions and substitution models applied.

Data set | Ingroup taxa | Outgroup taxa | Aligned positions | Substitution modelᵃ
SSU | 10 | 2 | 1509 | GTR
α-tubulin | 10 | 2 | 325 | Dayhoff
RPB1 | 18 | 2 | 401 | Rtrev
eRF3 | 10 | 2 | 295 | Rtrev
5PTase | 4 | 3 | 292 | WAG
INTS1 | 4 | 4 | 189 | JTT
SSU + eRF3 | 10 | 2 | 1804 | Partitions
SSU + eRF3×3 | 10 | 2 | 2394 | Partitions
RPB1 + eRF3 | 10 | 2 | 700 | Partitions
α-tubulin + RPB1 + eRF3 | 10 | 2 | 1025 | Partitions
SSU + α-tubulin + RPB1 + eRF3 | 10 | 2 | 2534 | Partitions

ᵃ All models were set to include I + Γ, and they were chosen based on information from the BI analyses.

3 Results

3.1 Evaluation of possible phylogenetic markers

Of the five proposed phylogenetic markers (Hsp90, eIF2, eIF5B, EF-2 and eRF3), only eukaryotic initiation factor 5B (eIF5B) and eukaryotic release factor 3 (eRF3) were judged adequate for further use in phylogenetic analyses. The decisions were based on the criteria described in section 1.3.4. All E-values reported below were obtained using D. discoideum as the query.

90 kDa heat shock protein (Hsp90). First, the prokaryotic counterpart HtpG has E-values as low as 10⁻¹⁰⁶, which indicates that it would likely be difficult to find primer sites that exclude bacteria. Second, the gene appears to be present in multiple copies in most of the examined eukaryotes, with as many as three copies among the dictyostelids (Fig. 2). The gene was therefore rejected for further phylogenetic studies.

Eukaryotic initiation factor 2 (eIF2). Work on this protein was discontinued because the protein is too short, only 341 aa in D. discoideum.

Eukaryotic initiation factor 5B (eIF5B). The protein eIF5B is large (1045 aa in D. discoideum), and all eukaryotes had E-values < 10⁻¹³⁹. The best bacterial hit had an E-value of 10⁻²⁸, indicating that a possible bacterial homolog is not a threat to primer design. The NJ-tree (Fig. 3) further suggests that eIF5B is a good marker for deep phylogeny. The tree has an appropriate depth (cf. the scale bar, 0.1 substitutions/site), which indicates enough sequence variation to provide useful phylogenetic signal. Branch lengths are also approximately the same for ingroup (Dictyostelia) and outgroup (other eukaryotes) taxa, suggesting a relatively even evolutionary rate. Furthermore, the gene is present in only one copy in all the organisms studied here. eIF5B was therefore retained for further work in this study.

Elongation factor 2 (EF-2). All eukaryotes had E-values < 10⁻¹⁴³, and the bacterial hits had E-values ≥ 10⁻⁴⁵. However, the gene is present in at least three copies in Dictyostelia (data not shown). EF-2 was therefore rejected as a phylogenetic marker for this study.

Eukaryotic release factor 3 (eRF3). The eRF3 protein is large (557 aa in D. discoideum). All examined eukaryotes had several homologs that matched the protein with E-values < 10⁻⁶⁵. However, the NJ-tree showed that all low-E-value hits grouped strongly together to the exclusion of the higher-E-value hits, which were annotated as EF-1α and Hsp70 and formed their own well-supported groups (data not shown). EF-1α and Hsp70 should therefore not affect the use of eRF3 as a phylogenetic marker, since these sequences are distinct enough to allow the design of specific primers that exclude them. None of the bacterial sequences had an E-value < 10⁻²⁸. In addition, the gene appears to be present in a single copy in Dictyostelia and all other eukaryotes examined here. The tree also has a decent depth, indicating enough sequence variation for phylogenetic signal, and branch lengths are roughly the same for ingroup (Dictyostelia) and outgroup (other eukaryotes) taxa. eRF3 was therefore judged to be a good phylogenetic marker for this study.


Figure 2: Neighbour-joining tree of Hsp90 sequences from representative taxa across the tree of life. The topology shown was derived from a NJ-analysis performed in ClustalX version 2.0.12 (Thompson et al. 1997). Regions with gaps and missing ends were deleted from the calculation and support values obtained from a bootstrap analysis of 1000 trials can be seen on, or to the right of, the relevant branches. All branch lengths are drawn to scale given by the scale bar (0.1 substitutions/site). The taxa represented are from six major groups: fungi (red), Dictyostelia (orange), Amoebozoa without dictyostelids (light blue), plants (dark blue), Metazoa (brown) and bacteria (black).


Figure 3: Neighbour-joining tree of eIF5B sequences from representative taxa across the tree of life. Method and annotation as in Fig. 2.


3.2 PCR amplification with degenerate primers

Two phylogenetic markers, eRF3 and eIF5B, passed all initial criteria for deep phylogenetic reconstruction in Dictyostelia. I started with eRF3 because it is the smaller protein and should therefore be easier to amplify.

In total, three optimised PCR protocols were used to obtain eRF3 sequences, of which ‘Alternative 1’ (35 cycles, annealing temperature 55 °C) worked most frequently (Table 2). Sequences were most often obtained with the primer pair 2F/3R, but 2F/1R and 3F/1R also gave results. The ‘bonus proteins’ 5PTase and INTS1 were amplified with 2F/2R and 2F/3R, respectively (Table 4).

Table 4: Dictyostelid isolates from which eRF3, 5PTase or INTS1 sequences were obtained. DNA source, PCR protocol applied, primers, length and GC content of obtained gene fragments.

Species (a) | Group | Isolate | Source (b) | Gene | PCR protocol | Primers (c) | Length (NT) | GC (%)
D. fasciculatum | 1 | SH3 | Schaap | eRF3 | Alt. 1 | 2F/1R | 1092 | 49
A. longisorophorum | 2 | DB10A | DBS0235451 | eRF3 | Alt. 1 | 3F/1R | 1016 | 45
A. subglobosum | 2 | LB1 | Schaap | 5PTase | Alt. 2 | 2F/2R | 579 | 55
D. gloeosporum | 2 | TCK52 | DBS0235825 | eRF3 | Alt. 1 | 2F/3R | 876 | 47
D. gloeosporum | 2 | TCK52 | DBS0235825 | eRF3 | Alt. 1 | 3F/1R (d) | 971 | 47
P. filamentosum | 2 | SU1 | Schaap | eRF3 | Alt. 1 | 2F/3R | 876 | 46
P. pseudocandidum | 2 | TNSC91 | MR | eRF3 | Alt. 1 | 2F/3R | 876 | 44
D. tenue | 3 | Pan52 | MR | eRF3 | Alt. 1 (e) | 2F/3R | 875 | 33
D. tenue | 3 | PJ6 | Schaap | INTS1 | Alt. 1 | 2F/3R | 506 | 51
D. giganteum | 4 | WS589 | MR | eRF3 | Alt. 3 | 2F/3R | 877 | 40

(a) D: Dictyostelium, A: Acytostelium, P: Polysphondylium.
(b) Schaap: DNA kept by the Baldauf lab and used in Schaap et al. (2006); DB: dictyBase strain number (Dicty Stock Center; Northwestern University, Chicago); MR: spores obtained from Maria Romeralo, Baldauf lab.
(c) F: forward; R: reverse.
(d) This sequence was not used in any analysis: most of the other sequences started at primer 2F, so the D. gloeosporum 2F/3R sequence was preferred over the 3F/1R sequence.
(e) The DNA was diluted ×5; undiluted DNA failed with the same PCR protocol.

The correct PCR products were most often obtained when no multiple bands could be discerned on the gel (well D, Fig. 4). However, a single band of seemingly the right size did not always give the correct result (well A, Fig. 4), and in a few cases eRF3 sequences were obtained from reactions that appeared unclean (well B, Fig. 4). Well C (Fig. 4) shows a common outcome: a band of apparently the right size could be distinguished among multiple bands but turned out to be a contaminant.

For D. tenue Pan52, the DNA had to be diluted ×5 to obtain amplification, most likely because the proportions between the components of a PCR reaction are critical for success. Positive clones from D. fasciculatum and D. giganteum were additionally screened with the nested primers 3F/3R, but this extra step did not provide any additional useful information for identifying vectors carrying the correct insert.


Figure 4: Gel photo of PCR products obtained from PCR reactions run according to protocol Alt. 3 (D. minutum AP11A) and protocol Alt. 1 (D. tenue Pan52 and P. filamentosum SU1) (Table 2). 2F/1R, 2F/2R, etc. indicate primer pairs. Arrows point out right-sized bands. A: no result, B: eRF3 sequence obtained, C: no result, D: eRF3 sequence obtained.

3.3 Constructing data sets

eRF3 sequences were determined for seven dictyostelid isolates, and data for the three sequenced genomes, D. discoideum, D. purpureum and P. pallidum, were obtained from public databases (Tables 4 and C.1). Seven additional isolates were subjected to PCR, but their eRF3 sequences could not be amplified with any primer combination (Table D.1). All new sequences lacked introns and ranged in size from 875 to 1092 NT, corresponding to 52–68 % of the coding sequence length in D. discoideum, D. purpureum and P. pallidum. Whenever multiple gene fragments were obtained for an isolate, they were nearly identical in size and sequence, so the gene appears to be single copy in Dictyostelia. The GC content varied from 33 % in D. tenue to 49 % in D. fasciculatum (Table 4). The final eRF3 alignment comprised 295 aa.
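The GC contents in Table 4 and the coverage range above are simple ratios. The sketch below shows the arithmetic. It assumes that a fragment is a plain nucleotide string and that only A, C, G and T are counted. As a further assumption, the D. discoideum eRF3 coding sequence is taken to be 557 codons of 3 NT plus a stop codon (1674 NT). The 52–68 % range in the text also involves the D. purpureum and P. pallidum coding sequences, whose lengths are not given in this section, so only D. discoideum is used here. All function names are hypothetical.

```python
def gc_content(seq):
    """Percentage of G + C among unambiguous nucleotides (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous nucleotides in sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def cds_length_nt(protein_length_aa, include_stop=True):
    """Assumed CDS length: 3 NT per codon, optionally plus one stop codon."""
    return 3 * protein_length_aa + (3 if include_stop else 0)

def coverage_percent(fragment_nt, reference_cds_nt):
    """Share of a reference coding sequence covered by a gene fragment."""
    return 100.0 * fragment_nt / reference_cds_nt

ref = cds_length_nt(557)              # D. discoideum eRF3, 557 aa
print(round(coverage_percent(875, ref), 1))
print(round(coverage_percent(1092, ref), 1))
```

Under these assumptions, the shortest fragment (875 NT) covers about 52 % and the longest (1092 NT) about 65 % of the D. discoideum coding sequence.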

The SSU rDNA gene evolves slowly (Romeralo et al. 2010a) and is generally the first gene sequenced when a new species or taxon is examined (Sandra Baldauf, personal communication). α-tubulin, on the other hand, is, together with β-tubulin, the major component of microtubules in eukaryotic cells (Oakley 2000). Microtubules play an important role in different types of flagella and in the spindles formed during mitosis and meiosis. Owing to this function, α-tubulin is one of the most conserved tubulins, and one of the most conserved eukaryotic proteins in general (Oakley 2000). The SSU rDNA data set contained sequences from all ten dictyostelid isolates with eRF3 data. In contrast, α-tubulin data could only be found for seven of the ‘eRF3 isolates’; for the remaining three, closely related isolates were used instead (Table C.1). The SSU rDNA alignment included 1509 NT and the α-tubulin alignment 325 aa.


RPB1 is the largest subunit of RNA polymerase II, one of the three nuclear RNA polymerases of eukaryotic cells (Guilfoyle et al. 1994). RNA polymerase II synthesises pre-mRNA and small nuclear RNAs, and because RPB1 is highly conserved it has been used in phylogenetic studies before (e.g. Peer et al. 2000). The RPB1 alignment (Baldauf lab, 2009-11-03) contains sequences from 18 dictyostelid isolates, only seven of which are the same as those sequenced for eRF3 (Table C.1). Since this data set had not previously been analysed phylogenetically, the single-gene RPB1 analysis included all available sequences. For the concatenated data sets, however, only the seven isolates with eRF3 data were used, plus one chimeric sequence. The RPB1 data also had to be checked for introns. With D. discoideum RPB1 (GenBank accession number XM_636643.1; 5184 NT) as a reference, an intron was located between NT 348 and NT 464, in agreement with the CDS limits given in GenBank. The intron was present in all examined dictyostelid RPB1 sequences except those from major group 3, and varied in size from 58 to 116 NT. The RPB1 alignment comprised 401 aa.
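Locating an intron against a reference coding sequence amounts to finding stretches where the genomic fragment has bases but the reference CDS has none. The sketch below assumes that a pairwise alignment of the fragment (query) to the reference CDS has already been produced by some aligner and is given as two equal-length gapped strings. The function name and the `min_len` cut-off of 20 NT are hypothetical choices, not the procedure used in this study. A fuller check would also confirm GT…AG splice-site boundaries.

```python
def find_insertions(ref_aln, query_aln, min_len=20):
    """Runs where the reference CDS row has gaps but the query has bases.

    Returns (start, end, length) tuples in 1-based, ungapped query coordinates.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have equal length")
    hits = []
    q_pos = 0            # query bases seen so far
    run_start = None     # query coordinate where the current insertion began
    run_len = 0

    def close_run():
        # Record the current run only if it is long enough to be intron-like.
        if run_start is not None and run_len >= min_len:
            hits.append((run_start, run_start + run_len - 1, run_len))

    for r, q in zip(ref_aln, query_aln):
        if r == "-" and q == "-":
            continue                     # all-gap column: ignore
        if q != "-":
            q_pos += 1
        if r == "-":                     # query base with no reference counterpart
            if run_start is None:
                run_start = q_pos
            run_len += 1
        else:
            close_run()
            run_start, run_len = None, 0
    close_run()
    return hits

ref_row   = "ATGAAA------CCCTAA"
query_row = "ATGAAAGTAAAGCCCTAA"
print(find_insertions(ref_row, query_row, min_len=5))
```

In the toy example, the six query bases without a reference counterpart (positions 7 to 12) are reported as one candidate intron.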

The two ‘bonus proteins’, 5PTase and INTS1, were amplified by chance from A. subglobosum and D. tenue PJ6, respectively. The gene fragments obtained were only 576 and 506 NT long, representing 26–32 % and 7–13 %, respectively, of the total coding sequence length in D. discoideum, D. purpureum and P. pallidum. Both fragments lacked introns, and the GC content was 55 % for 5PTase and 51 % for INTS1. The final alignments also included sequences from the genome-sequenced dictyostelid isolates (Table C.1) and comprised 292 (5PTase) and 189 (INTS1) aa positions.

The proportion of parsimony informative sites in the single-gene data sets ranged from 22 % for SSU rDNA to 65 % for INTS1 (Table 5).

Table 5: Number of aligned positions and parsimony informative sites for single gene data sets.

Data set | Aligned positions | Parsimony informative sites
SSU rDNA | 1509 NT | 335 (22 %)
α-tubulin | 325 aa | 91 (28 %)
RPB1 | 401 aa | 106 (26 %)
eRF3 | 295 aa | 123 (42 %)
5PTase | 292 aa | 119 (41 %)
INTS1 | 189 aa | 123 (65 %)
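A site is parsimony informative when at least two different character states each occur in at least two sequences. Only such sites can distinguish between topologies under parsimony. The sketch below counts them. It assumes an alignment given as a dict of equal-length strings and treats ‘-’ and ‘?’ as missing data. How gaps were handled for Table 5 is not stated, so this is an assumption. The function names are hypothetical.

```python
from collections import Counter

def is_parsimony_informative(column, missing=frozenset("-?")):
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(c.upper() for c in column if c not in missing)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def count_informative_sites(alignment, missing=frozenset("-?")):
    """Return (number, percentage) of parsimony informative columns."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    n = sum(is_parsimony_informative([s[i] for s in seqs], missing)
            for i in range(length))
    return n, 100.0 * n / length

toy = {"t1": "AAGT", "t2": "AAGC", "t3": "ACTC", "t4": "ACT-"}
print(count_informative_sites(toy))
```

In the toy alignment, columns 2 and 3 (A/C and G/T, each state twice) are informative. The last column is not, because T occurs only once once the gap is ignored. The percentages in Table 5 follow the same ratio, e.g. 123 of 295 eRF3 positions is about 42 %.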

3.4 Phylogenetic analyses

Single-gene and combined trees were constructed and evaluated using Bayesian posterior probabilities (biPP) and maximum likelihood bootstrap support values (mlBP). Because taxonomic sampling is very limited for the two ‘bonus proteins’, 5PTase and INTS1, their trees are not discussed in the following sections. Both trees are, however, consistent with the major groups described by Schaap et al. (2006) (Figs. E.4 and E.5).


3.4.1 The root of the dictyostelid tree

Individually or in various combinations, the four phylogenetic markers gave rise to four different roots, albeit with very different levels of statistical support (Fig. 5). In all cases, groups 3 and 4 are recovered as sister lineages. Note that the placement of D. fasciculatum, the sole representative of group 1, is critical to the position of the dictyostelid root in the following analyses.

[Figure 5 panels, with support (biPP/mlBP) for the rooting branch: ‘SSU root’ (group 1 basal): SSU ∗/97; SSU + eRF3 ∗; SSU + eRF3 ×3 0.84; SSU + α-tubulin + RPB1 + eRF3 ∗/98. ‘eRF3 root’ (groups 1 + 2 vs. groups 3 + 4): eRF3 0.99/60; RPB1 + eRF3 0.94/69; α-tubulin + RPB1 + eRF3 0.81/69. α-tubulin root (group 2 basal): 0.71/64. RPB1 root (group 1 basal; group 2 a grade with A before P + D): 0.89/–.]

Figure 5: Dictyostelid root positions obtained by single gene or combined data sets. Support values are indicated for the branches marked with red points; biPP/mlBP or only biPP if one value is given. Values equal to 1.0 (biPP) are shown as asterisks (∗). The grouping (1, 2, 3 and 4) is according to Schaap et al. (2006). P, D and A for group 2 stand for Polysphondyliums, Dictyosteliums and Acytosteliums.

As expected, the single-gene analyses of SSU rDNA and α-tubulin data give results similar to those of Schaap et al. (2006) and Romeralo et al. (2010b). SSU rDNA sequences place group 1 as the most basal clade with strong support (1.0 biPP/97 % mlBP; Fig. 6). In contrast, α-tubulin data weakly support group 2 as the deepest major divergence (0.71 biPP/64 % mlBP; Fig. 7), as in Schaap et al. (2006) (biPP < 0.89, mlBP < 60 %). The eRF3 data show an alternative root that matches neither the SSU rDNA nor the α-tubulin root (Fig. 8). This root lies between groups 1 + 2 and groups 3 + 4, and its placement is highly supported by the BI analysis (0.99 biPP/60 % mlBP). Notably, Schaap et al. (2006) also obtained this root with SSU rDNA data for some choices of outgroup and analysis method. A fourth topology is recovered in the RPB1 phylogeny: group 1 is the deepest offshoot, as with the SSU rDNA data (0.89 biPP/mlBP < 50 %), but group 2 forms a grade, with Acytosteliums as the earliest branch followed by Polysphondyliums and Dictyosteliums (0.88 biPP/58 % mlBP; Fig. 9). The multigene analyses resulted in either the ‘SSU root’ or the ‘eRF3 root’ (Figs. 10 and 11, respectively).

The tree based on all comprehensive dictyostelid gene data currently available (SSU rDNA, α-tubulin, RPB1 and eRF3) shows the ‘SSU root’ with strong support (1.0 biPP/98 % mlBP; Fig. 10). Under a strict ‘total evidence’ approach, the ‘SSU root’ is therefore preferred. However, when only the protein sequences (α-tubulin, RPB1 and eRF3) are combined, the ‘eRF3 root’ emerges, although with only weak to moderate support (0.81 biPP/69 % mlBP; Fig. 11). As more genes (sites) are added to the analysis (Table 3), support for the critical branch of the ‘eRF3 rooting’ decreases. For example, combined RPB1 and eRF3 data support the ‘eRF3 root’ with a higher Bayesian posterior probability (0.94 biPP/69 % mlBP; Fig. E.1) than do all three proteins together (with α-tubulin added). Finally, the ‘SSU root’ was also obtained when SSU rDNA and eRF3 data alone were combined. Without weighting, the critical rooting branch was fully supported by the BI analysis (biPP = 1.0; Fig. E.2). However, when the eRF3 data set was tripled to make the number of parsimony informative sites approximately equal between the two genes, the Bayesian posterior probability dropped to 0.84 (Fig. E.3).
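Tripling a data set is a simple form of site weighting. Standard substitution models treat sites as independent, so repeating every column k times weights each site by k. The text does not say exactly how the tripled eRF3 set was built or how the factor 3 was chosen. The sketch below shows one plausible route under those assumptions: take the ceiling of 335/123 parsimony informative sites (Table 5), which gives 3 and 369 versus 335 sites. The function names are hypothetical.

```python
def replicate_partition(alignment, times):
    """Concatenate each aligned sequence with itself `times` times.

    Under site-independent substitution models this is equivalent to giving
    every column of the partition a weight of `times`.
    """
    if times < 1:
        raise ValueError("times must be a positive integer")
    return {name: seq * times for name, seq in alignment.items()}

def weight_to_match(target_sites, sites):
    """Smallest integer k such that k * sites >= target_sites."""
    return -(-target_sites // sites)   # ceiling division on integers

k = weight_to_match(335, 123)          # SSU rDNA vs. eRF3 informative sites
print(k, k * 123)
```

Column order is irrelevant under such models, so appending whole copies of the alignment is enough; the copies do not need to be interleaved.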

3.4.2 The root of group 4

Like the large SSU rDNA data set used by Schaap et al. (2006), the limited SSU rDNA data used in this study place D. giganteum as the most basal taxon of group 4 (1.0 biPP/93 % mlBP; Fig. 6). However, all other data sets examined here, individually or in combination, show D. purpureum as the deepest-diverging taxon with strong support. Most importantly, the full four-gene concatenated data set supports D. purpureum as the most basal branch in group 4 with 1.0 biPP and 90 % mlBP (Fig. 10).

3.4.3 The root of group 2

No eRF3 data were collected for the taxon critical for rooting group 2, A. ellipticum. Thus, the position of the root in this group remains uncertain.


Figure 6: A rooted SSU rDNA phylogeny of the ten dictyostelid isolates later combined with eRF3 data. The topology shown is the result of a Bayesian inference analysis of 1509 aligned NT positions performed in MrBayes version 3.1.2 (Ronquist and Huelsenbeck 2003). The analysis was run for one million generations, and the first 25 % of the samples were discarded as burn-in. Support values are displayed on the relevant branches as Bayesian posterior probability/maximum likelihood bootstrap. Values equal to 1.0 biPP or 100 % mlBP are indicated as ∗; a ‘–’ sign represents a support value less than or equal to 0.7 biPP/50 % mlBP. All branches are drawn to scale (scale bar: 0.1 substitutions per site). The four major groups are indicated as in Schaap et al. (2006). The within-group designations follow the classification suggested by Romeralo et al. (2010a). D: Dictyostelium, P: Polysphondylium, A: Acytostelium, followed by group number.
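The posterior probabilities in Figs. 6–11 are the frequencies with which a clade occurs among the trees sampled by MrBayes after burn-in. MrBayes reports these through its `sumt` command. The sketch below only illustrates the calculation. It assumes the sampled trees have already been parsed into rooted trees, each represented as a set of clades (frozensets of taxon names). This representation and the function name are hypothetical. The default burn-in fraction of 25 % matches the figure caption.

```python
def posterior_clade_support(sampled_trees, clade, burnin_fraction=0.25):
    """Fraction of post-burn-in trees that contain `clade`.

    Each tree is a collection of clades, each clade a frozenset of taxa.
    """
    n_burn = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burn:]
    if not kept:
        raise ValueError("no trees left after burn-in")
    target = frozenset(clade)
    return sum(1 for tree in kept if target in tree) / len(kept)

g34 = frozenset({"G3", "G4"})
g12 = frozenset({"G1", "G2"})
trees = [{g12}] * 2 + [{g34, g12}] * 3 + [{g12}] * 3
print(posterior_clade_support(trees, {"G3", "G4"}))
```

With eight toy samples, the first two are discarded as burn-in, and the clade occurs in three of the remaining six, giving a posterior probability of 0.5.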


Figure 7: A rooted α-tubulin phylogeny of the ten dictyostelid isolates later combined with eRF3 data. 325 aa positions were included. The vertical dotted line indicates an alternative branching pattern obtained by the ML analysis; the affected branch is marked with ‘x’. Method and annotation otherwise as in Fig. 6.


Figure 8: A rooted eRF3 phylogeny of Dictyostelia. 295 aa positions were included. Method and annotation otherwise as in Fig. 6.


Figure 9: A rooted phylogeny of all available RPB1 data for Dictyostelia. 401 aa positions were included. Method and annotation otherwise as in Fig. 6.


Figure 10: A rooted phylogeny of Dictyostelia based on combined SSU rDNA, α-tubulin, RPB1 and eRF3 sequences. 2534 positions were included and all sequence data were handled at the aa level except SSU rDNA which was at the DNA level. Chimeric sequences are shown with the names of all the isolates combined. Method and annotation otherwise as in Fig. 6.
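A concatenated (supermatrix) data set like the one in Fig. 10 joins the single-gene alignments taxon by taxon. Taxa missing from a gene are filled with missing-data symbols, and a chimera uses data from closely related isolates under one combined name. The sketch below assumes each alignment is a dict of equal-length strings and uses ‘?’ for missing data. The function names and toy taxa are hypothetical. The returned boundaries could be used to define gene partitions, for example to treat SSU rDNA as DNA and the other genes as protein.

```python
def relabel(alignment, old, new):
    """Copy of an alignment with taxon `old` renamed to `new` (e.g. a chimera)."""
    return {(new if name == old else name): seq for name, seq in alignment.items()}

def concatenate(partitions, taxa, missing="?"):
    """Build a supermatrix from {partition name: {taxon: aligned sequence}}.

    Taxa absent from a partition are padded with `missing`. Also returns
    1-based (partition, start, end) column boundaries.
    """
    matrix = {t: [] for t in taxa}
    boundaries = []
    start = 1
    for pname, aln in partitions.items():
        length = len(next(iter(aln.values())))
        for t in taxa:
            matrix[t].append(aln.get(t, missing * length))
        boundaries.append((pname, start, start + length - 1))
        start += length
    return {t: "".join(parts) for t, parts in matrix.items()}, boundaries

ssu = {"X": "ACGT", "Y": "ACGA"}
tub = {"X": "MK", "Z": "MR"}
print(concatenate({"SSU": ssu, "tub": tub}, ["X", "Y", "Z"]))
# Chimera: SSU rDNA from isolate Y and tubulin from its close relative Z.
chimeric = {"SSU": relabel(ssu, "Y", "Y+Z"), "tub": relabel(tub, "Z", "Y+Z")}
print(concatenate(chimeric, ["X", "Y+Z"]))
```

Merging two isolates into one chimeric terminal removes a block of missing data, at the cost of assuming that the two isolates are close enough to share one position in the tree.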


Figure 11: A rooted phylogeny of Dictyostelia based on combined α-tubulin, RPB1 and eRF3 sequences. 1025 aa positions were included. The dotted line indicates an alternative branching pattern obtained by the ML analysis; the affected branch is marked with ‘x’. Chimeric sequences are shown with the names of all the isolates combined. Method and annotation otherwise as in Fig. 6.

4 Discussion

An incorrectly rooted phylogeny lacks a great deal of important evolutionary information. The root represents the oldest point in a tree, and determining it gives evolution a time arrow (Baldauf 2003; Graham et al. 2002). Even though most of the finer-level relationships within Dictyostelia are starting to be resolved (Romeralo et al. 2010b), the position of the dictyostelid root remains uncertain. Here, a ‘new’ protein for deep dictyostelid phylogeny, eRF3, was sequenced and combined with available data for SSU rDNA, α-tubulin and RPB1. The resulting four-gene phylogeny indicates that the dictyostelid root proposed by Schaap et al. (2006) (the ‘SSU root’), which places group 1 as the most basal taxon, still prevails. However, an alternative root (the ‘eRF3 root’) between groups 1 + 2 and groups 3 + 4 receives substantial support, especially from protein-coding sequences. Furthermore, my data show with strong support that D. purpureum can be considered the deepest divergence in group 4.

4.1 The root of the dictyostelid tree

In this study, the first larger multigene data set was constructed for testing the location of the dictyostelid root. Including the eRF3 data clearly affected the strength of the results. The analyses including all four markers (SSU rDNA, α-tubulin, RPB1 and eRF3) resulted in the same topology as the large dictyostelid SSU rDNA phylogenies (Romeralo et al. 2010b; Schaap et al. 2006) (Fig. 10). Even so, the eRF3 and eRF3/RPB1 analyses run as controls indicate that this outcome cannot confidently rule out the alternative ‘eRF3 root’ (Figs. 8 and E.1). The previously published ‘SSU root’ is particularly questionable because the outgroup used in those studies is distantly related to the ingroup, as indicated by the long branch leading to the outgroup. One possible interpretation is therefore that placing group 1 as the deepest offshoot results from long-branch attraction of group 1 isolates (e.g. D. multistipes) to the outgroup taxa. In addition, the topology obtained with the SSU rDNA data was found to depend on which outgroup and which evolutionary models were used (Schaap et al. 2006). The alternative ‘eRF3 root’ obtained with my data has also been observed with the 2006 SSU rDNA data. Taken together, this shows that the SSU rDNA data set cannot stably resolve the root of the dictyostelid tree.

Besides the single-protein eRF3 tree and the eRF3/RPB1 tree, the ‘eRF3 root’ was also obtained when all proteins were analysed together (α-tubulin, RPB1 and eRF3; Fig. 11). The combined three-protein data set favours the ‘eRF3 root’ over the ‘SSU root’, albeit with weak support (0.81 biPP/69 % mlBP), whereas the four-gene data set does not. This indicates that the SSU rDNA data have a large impact on the analyses. That impact may reflect the different lengths of the alignments. The SSU rDNA data set includes 1509 aligned NT positions (335 parsimony informative sites), whereas the α-tubulin, RPB1 and eRF3 data sets hold 325 (91), 401 (106) and 295 (123) aa positions, respectively (Table 5). This larger amount of signal can cause the ‘SSU root’ to dominate when all four genes are analysed together. It may do so even though SSU rDNA might not contain enough information to resolve the deepest relationships, a level at which the eRF3 data in particular seem to carry a more conserved phylogenetic signal.


4.1.1 Benefits of eRF3 as a marker for deep dictyostelid phylogeny

There are several reasons to believe that eRF3 is a good phylogenetic marker for resolving deep relationships in social amoebae. Because of its role in translation termination, eRF3 is considered to be conserved among eukaryotes. In addition, my study shows that the gene is present as a single copy in the social amoebae, and also in the outgroup species used here. Furthermore, as much as 42 % of the positions in the eRF3 alignment, which includes all dictyostelid and outgroup taxa, are parsimony informative (Table 5). This indicates that the gene evolves fast enough to carry sufficient phylogenetic signal. In comparison, the proportion of parsimony informative sites ranges between 22 and 28 % for SSU rDNA, α-tubulin and RPB1. However, the total number of parsimony informative sites is still higher for SSU rDNA than for eRF3 (335 and 123, respectively). It is also important that eRF3 adds a new kind of information to the dictyostelid multigene data set. The gene encoding eRF3 is unlinked to the other three genes. Like SSU rDNA, it participates in translation, but eRF3 is a protein whereas SSU rDNA is an RNA, so the two markers have different properties.

4.1.2 Problems with eRF3 and the other phylogenetic markers considered

A main question here is how to improve the multigene data set for deep dictyostelid phylogeny. First, it is important to continue developing the eRF3 data set. The two main problems affecting the current data are 1) the limited taxonomic sampling and 2) the long branch leading to the outgroup, although the long outgroup branch can also be seen as a sampling problem (Bergsten 2005). Regarding the sampling of dictyostelids, a broader taxonomic representation is needed particularly for groups 1 and 3, each of which is represented here by only one isolate. Of special interest are isolates representing major branches that could fall on either side of the root. These are mainly taxa with long branches in the most recent SSU rDNA phylogeny (Romeralo et al. 2010b), whose position in the tree is therefore uncertain. In group 1, D. multistipes UK26b and D. deminutivum MexM19A are such isolates, and one example from group 3 is D. caveatum WS695. It is also necessary to sample A. ellipticum AE2, a particularly interesting taxon in group 2 whose phylogenetic position is crucial for determining the evolutionary path of multicellularity in the social amoebae. For A. ellipticum, it would be of special interest to obtain data from further isolates of the species that could break up the long branch leading to the taxon. To obtain as complete an evolutionary picture of the dictyostelids as possible, it would also be valuable to sequence the eRF3 gene for some of the taxa in the three major ‘complexes’ identified by Romeralo et al. (2010b). Most attention should be given to the ‘polycarpum complex’ and/or the ‘polycephalum complex’, since these two groups are positioned between group 2 and groups 3 + 4 in the 2010 SSU rDNA phylogeny (Romeralo et al. 2010b). Adding these two complexes to the eRF3 data set should therefore give further insight into the ‘true’ relationships among all the major dictyostelid groups.


The second problem affecting the eRF3 data set, and indeed the main problem in rooting the dictyostelid tree with any gene, concerns the adequacy of the chosen outgroup taxa. In this study these were Acanthamoeba castellanii and Physarum polycephalum. As in the 2006 SSU rDNA study (Schaap et al. 2006), it was difficult to find gene sequences for outgroup taxa positioned close enough to the ingroup. Most importantly, sampling was limited to the amoebozoan data available in public databases, which in the case of eRF3 meant only species with a fully sequenced genome. According to recently published phylogenies of Amoebozoa (Fiore-Donno et al. 2010; Pawlowski and Burki 2009; Shadwick et al. 2009), P. polycephalum is more closely related to Dictyostelia than is A. castellanii. P. polycephalum belongs to ‘Myxogastria’, whereas A. castellanii is a member of ‘Acanthhopodia + ’ (Fiore-Donno et al. 2010; Pawlowski and Burki 2009) or the ‘Acanthamoeboids’ (Shadwick et al. 2009). These phylogenies thus show that myxogastrids, including P. polycephalum, should be a more appropriate outgroup than A. castellanii and its closest relatives. One possible improvement of the current dictyostelid data set would therefore be to exclude the A. castellanii sequences and instead include data for additional myxogastrid species.

An additional aspect to consider when choosing outgroup taxa is the influence of compositional nucleotide biases between ingroup and outgroup terminals. Dictyostelids are known to be AT rich, with an overall AT content of ∼77.5 % in D. discoideum (Eichinger et al. 2005). In contrast, the closely related myxogastrids span a wide range of compositional bias, from extremely AT-rich to extremely GC-rich species (Fiore-Donno et al. 2005). However, most available data are for P. polycephalum, which is about as GC rich as D. discoideum is AT rich. This compositional difference often makes it difficult to use P. polycephalum as the outgroup in dictyostelid phylogeny, at least for SSU rDNA trees (Schaap et al. 2006). Nevertheless, only a few amoebozoan genomes have been sequenced so far, and we need to use what is available. As an alternative to using Myxogastria as the outgroup, the amoebozoan phylogeny by Fiore-Donno et al. (2010) shows that ‘Ceratiomyxida’ forms the sister clade to Dictyostelia + Myxogastria, so species from this group might also be useful future outgroup taxa.

When considering the multigene data set as a whole, it is essential to evaluate each single-gene data set separately and to question its influence on the phylogenetic outcome. For the four markers used in this study, I have already mentioned the problem that the SSU rDNA data comprise approximately as many parsimony informative sites as all the proteins taken together (Table 5). However, excluding the SSU rDNA data from the analyses is not a viable option. The SSU rDNA gene is important for showing the major divisions in the dictyostelid tree, and it also helps resolve many relationships within groups of social amoebae. In addition, it is the only RNA gene in the current dictyostelid multigene data set. Including several unlinked genes with different properties is important for obtaining as broad a sampling and as correct an evolutionary picture as possible (Bergsten 2005). Among the proteins included, α-tubulin is the one to be viewed with scepticism. Together with SSU rDNA, this protein confirmed the presence of four major dictyostelid divisions (Schaap et al. 2006). Even so, the resulting single-gene phylogeny is strongly affected by heterotachy (Fig. 7). The evolutionary pressure on α-tubulin has decreased in Dictyostelia owing to the loss of flagella in the group. This relaxation has allowed a relatively high evolutionary rate in α-tubulin, and since many of the available outgroup taxa still have flagella, heterotachy is an obvious problem. An especially high rate can be observed in group 4, whereas groups 1 + 2 show a lower rate. Since heterotachy can seriously affect phylogenetic reconstruction (Gribaldo and Philippe 2002; Som and Fuellen 2009), the best option might be to exclude α-tubulin from further phylogenetic studies of Dictyostelia. Instead, high priority should be given to developing an additional dictyostelid protein data set. Initial work performed during this study shows that eukaryotic initiation factor 5B (eIF5B) seems to be a good marker for deep dictyostelid phylogeny. Primers are already available (Table B.2), and a first test amplification indicated that some primer combinations work for A. subglobosum.

Even though multigene data sets are expected to give a more accurate picture of the actual evolutionary history, there is still a trade-off between including more genes and sampling more taxa for each gene (e.g. Baldauf et al. 2000). It is seldom possible to achieve equally broad sampling for all genes, so missing data and the need for chimeric sequences are common problems in large-scale phylogenetic reconstruction. The optimal situation would nevertheless be as wide a taxonomic sampling as possible for all genes used (Philippe et al. 2004). The most recent SSU rDNA phylogeny (Romeralo et al. 2010b) includes about 150 dictyostelid isolates, so the trees constructed here comprise only a small selection of the known dictyostelid diversity. Notably, as many as 50 new species of social amoebae have been discovered since 2006, which indicates that Dictyostelia is probably still only sparsely sampled. Most of the new species are small and delicate, and they are mostly found at new sampling locations (Romeralo et al. 2010b). Interestingly, these species often represent the deepest branches in the dictyostelid tree and thus constitute important data for understanding larger trends in the evolution of Dictyostelia (Schaap et al. 2006).

4.1.3 Inositol 5-phosphatase as a phylogenetic marker

Besides eIF5B, the unexpectedly obtained ‘bonus protein’ inositol 5-phosphatase (5PTase) is a potential marker for deep phylogenetic studies of social amoebae. The tree calculated here (Fig. E.4) only includes representatives of groups 2 and 4, but it shows the important sign of similar evolutionary rates in ingroup and outgroup taxa. In addition, the protein is involved in an entirely different cellular pathway from the other proteins used for dictyostelid phylogenies: 5PTase is an enzyme of inositol phosphate metabolism involved in growth and development in D. discoideum (Loovers et al. 2003). As for the second ‘bonus protein’, integrator complex subunit 1 (INTS1), the calculated tree (Fig. E.5) indicates that this marker might be affected by heterotachy; group 4 seems to have a slower evolutionary rate than groups 2 + 3. However, no general conclusions can be drawn for INTS1, since only approximately 10 % of the gene was sampled, compared with around 30 % of 5PTase.


4.1.4 The ‘eRF3 root’ and evolutionary trends in Dictyostelia

The alternative ‘eRF3 root’ affects how the morphological evolution of social amoebae is interpreted. Only a few morphological trends could be distinguished in a character-mapping study based on the 2006 SSU rDNA phylogeny (Schaap et al. 2006). The most striking of these is an increase in fruiting body and spore size from group 1, possibly the evolutionarily oldest group, to group 4, one of the youngest. With an alternative rooting, however, one has to reconsider how the expressions ‘evolutionarily oldest’ and ‘youngest group’ should be used for dictyostelids. The larger trends described by Schaap et al. (2006) assume that the ‘SSU root’ is correct, and the ‘eRF3 root’ might lead to a different interpretation.

4.2 The root of group 4

In addition to supporting a likely alternative root of the overall dictyostelid tree, the data used in this study reveal a well-supported root of major group 4 (Fig. 10). This root has long been an open question in dictyostelid phylogenetics, and the published SSU rDNA phylogenies (Schaap et al. 2006; Romeralo et al. 2010b) place D. giganteum as the deepest offshoot of the clade. This placement is strongly rejected here. Instead, D. purpureum sits on the most basal branch with full Bayesian support. This result was obtained with all data sets except the single-gene SSU rDNA data set. Most importantly, this topology was recovered by the combined data set comprising all four markers used in the study: SSU rDNA, α-tubulin, RPB1 and eRF3 (1.0 biPP/90 % mlBP; Fig. 10).

Finding the ‘true’ root of group 4 is crucial for estimating the overall age of Dictyostelia. The genetic distance between D. discoideum and D. purpureum indicates that the two species shared a common ancestor 400 million years ago (Parikh et al. 2010). With D. giganteum as the deepest branch, group 4 would therefore be more than 400 million years old. With D. purpureum as the most basal branch in group 4, however, the D. discoideum–D. purpureum split coincides with the root of the group, which suggests that group 4 is no more than 400 million years old. The location of the root of group 4 is also of interest because this subclade contains the model species D. discoideum.

4.3 Conclusions

Obtaining an accurate reconstruction of evolutionary relationships is challenging, especially for deep phylogenies such as that of Dictyostelia. Correctly placing the dictyostelid root is important for understanding the evolution of social amoebae at many taxonomic levels. Here, an eRF3 data set was constructed. The analyses show that the currently accepted root should be viewed with caution; this root is based solely on SSU rDNA and places group 1 as the most basal branch (Schaap et al. 2006). Instead, an alternative root between groups 1 + 2 and groups 3 + 4 receives substantial support, especially from protein-coding genes. This outcome indicates that the phylogeny of Dictyostelia is far from complete. Broader sampling, of both species and genes, is needed to clarify the evolutionary history of the social amoebae. Furthermore, this study gives full support to D. purpureum as the deepest divergence in dictyostelid group 4. This result calls into question the current age estimate for group 4 and, with it, the calibration of the whole dictyostelid tree.

Acknowledgements

First of all, I would like to give my greatest thanks to my supervisor Sandra Baldauf for excellent support and firm optimism throughout the project. I am also grateful to Petra Korall, who made it possible for me to do my degree project at the Department of Systematic Biology in the first place. Afsaneh Ahmadzadeh and Nahid Heidari have guided me in the lab. Special thanks go to Afsaneh, who has been a perfect discussion partner and always found solutions to my lab-based problems. Maria Romeralo and Allison Perrigo have been excellent teachers of dictyostelid culture techniques, and they also generously provided me with DNA samples. Thanks to Ding He and Omar Fiz-Palacious for help with various computer problems. I would also like to express my gratitude to everyone else at the department for giving me a warm welcome, with extra thanks to my friendly room-mates Pravech (Ping) Ajawatanawong and Åsa Kruys. Likewise, thanks to Anders for all the computer support at home.

Many of the dictyostelid cultures used in this study were provided by the Dicty Stock Center (dictyBase; Northwestern University, Chicago).

References

Baldauf, S. L. (2003). The deep roots of eukaryotes. Science. 300, 1703–1706.

Baldauf, S. L. (2008). An overview of the phylogeny and diversity of eukaryotes. Journal of Systematics and Evolution. 46, 263–273.

Baldauf, S. L. and Doolittle, W. F. (1997). Origin and evolution of the slime molds (Mycetozoa). Proceedings of the National Academy of Sciences of the United States of America. 94, 12007–12012.

Baldauf, S. L. and Palmer, J. (1993). Animals and fungi are each other’s closest relatives: Congruent evidence from multiple proteins. Proceedings of the National Academy of Sciences of the United States of America. 90, 11558–11562.

Baldauf, S. L., Roger, A. J., Wenk-Siefert, I. and Doolittle, W. F. (2000). A kingdom-level phylogeny of eukaryotes based on combined protein data. Science. 290, 972–977.

Bergsten, J. (2005). A review of long-branch attraction. Cladistics. 21, 163–193.

Bonfield, J., Smith, K. and Staden, R. (1995). A new DNA sequence assembly program. Nucleic Acids Research. 23, 4992–4999.

Bonner, J. T. (2003). On the origin of differentiation. Journal of Biosciences. 28, 523–528.

Bonner, J. T. and Lamont, D. S. (2005). Behavior of cellular slime molds in the soil. Mycologia. 97, 178–184.

Bonner, J. T. and Savage, L. (1947). Evidence for the formation of cell aggregates by chemotaxis in the development of the slime mold Dictyostelium discoideum. Journal of Experimental Zoology. 106, 1–26.

Csermely, P., Schnaider, T., Sóti, C., Prohászka, Z. and Nardai, G. (1998). The 90-kDa Molecular Chaperone Family: Structure, Function, and Clinical Applications. A Comprehensive Review. Pharmacology and Therapeutics. 79, 129–168.

Eichinger, L. et al. (2005). The genome of the social amoeba Dictyostelium discoideum. Nature. 435, 43–57.

Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology. 27, 401–410.

Fey, P., Kowal, A. S., Gaudet, P., Pilcher, K. E. and Chisholm, R. L. (2007). Protocols for growth and development of Dictyostelium discoideum. Nature Protocols. 2, 1307–1316.

Fiore-Donno, A.-M., Berney, C., Pawlowski, J. and Baldauf, S. L. (2005). Higher-order phylogeny of plasmodial slime molds (Myxogastria) based on elongation factor 1-A and small subunit rRNA gene sequences. The Journal of Eukaryotic Microbiology. 52, 201–210.

Fiore-Donno, A.-M., Nikolaev, S. I., Nelson, M., Pawlowski, J., Cavalier-Smith, T. and Baldauf, S. L. (2010). Deep phylogeny and evolution of slime moulds (Mycetozoa). Protist. 161, 55–70.

Graham, S., Olmstead, R. and Barrett, S. H. (2002). Rooting phylogenetic trees with distant outgroups: a case study from the commelinoid monocots. Molecular Biology and Evolution. 19, 1769–1781.

Graur, D. and Li, W. (2000). Fundamentals of molecular evolution. Sunderland, Massachusetts: Sinauer Associates.

Gribaldo, S. and Philippe, H. (2002). Ancient phylogenetic relationships. Theoretical Population Biology. 61, 391–408.

Guilfoyle, T., Ulmasov, T. and Larkin, R. (1994). RNA polymerase genes. Plant Molecular Biology Reporter. 12, S63–S66.

Hall, T. (1999). BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series. 41, 95–98.

Hubbard Center for Genome Studies (2003). Degenerate primers. url: http://hcgs.unh.edu/protocol/basic/pcrdegenpri.html (visited on 25/01/2010).

Huelsenbeck, J., Bull, J. and Cunningham, C. (1996). Combining data in phylogenetic analysis. Trends in Ecology and Evolution. 11, 152–158.

Inagaki, Y. and Doolittle, W. F. (2000). Evolution of the eukaryotic translation termination system: origins of release factors. Molecular Biology and Evolution. 17, 882–889.

Kaushik, S. and Nanjundiah, V. (2003). Evolutionary questions raised by cellular slime mould development. Proceedings of the Indian National Science Academy. B69, 825–852.

Kimball, S. R. (1999). Eukaryotic initiation factor eIF2. The International Journal of Biochemistry and Cell Biology. 31, 25–29.

Loovers, H., Veenstra, K., Snippe, H., Pesesse, X., Erneux, C. and Haastert, P. van (2003). A diverse family of inositol 5-phosphatases playing a role in growth and development in Dictyostelium discoideum. The Journal of Biological Chemistry. 278, 5652–5658.

Oakley, B. (2000). An abundance of tubulins. Trends in Cell Biology. 10, 537–542.

Page, R. (2000). Extracting species trees from complex gene trees: reconciled trees and vertebrate phylogeny. Molecular Phylogenetics and Evolution. 14, 89–106.

Parikh, A., Miranda, E., Katoh-Kurasawa, M., Fuller, D., Rot, G., Zagar, L., Curk, T., Sucgang, R., Chen, R., Zupan, B., Loomis, W., Kuspa, A. and Shaulsky, G. (2010). Conserved developmental transcriptomes in evolutionarily divergent species. Genome Biology. 11, R35.

Pawlowski, J. and Burki, F. (2009). Untangling the phylogeny of amoeboid protists. The Journal of Eukaryotic Microbiology. 56, 16–25.

Peer, Y. Van de, Abdelghani, A. and Meyer, A. (2000). Microsporidia: accumulating molecular evidence that a group of amitochondriate and suspectedly primitive eukaryotes are just curious fungi. Gene. 246, 1–8.

Perentesis, J. P., Phan, L. D., Gleason, W. B., LaPorte, D. C., Livingston, D. M. and Bodley, J. W. (1992). Saccharomyces cerevisiae elongation factor 2. Genetic cloning, characterization of expression, and G-domain modeling. The Journal of Biological Chemistry. 267, 1190–1197.

Philippe, H. and Laurent, J. (1998). How good are deep phylogenetic trees? Current Opinion in Genetics and Development. 8, 616–623.

Philippe, H., Snell, E. A., Bapteste, E., Lopez, P., Holland, P. W. H. and Casane, D. (2004). Phylogenomics of eukaryotes: impact of missing data on large alignments. Molecular Biology and Evolution. 21, 1740–1752.

Posada, D. (2006). ModelTest server: a web-based tool for the statistical selection of models of nucleotide substitution online. Nucleic Acids Research. 34, W700–W703.

Premier Biosoft International (2010). PCR primer design guidelines. url: http://www.premierbiosoft.com/tech_notes/PCR_Primer_Design.html (visited on 25/01/2010).

Queiroz, A. de, Donoghue, M. J. and Kim, J. (1995). Separate versus combined analysis of phylogenetic evidence. Annual Review of Ecology and Systematics. 26, 657–681.

Raper, K. B. (1984). The dictyostelids. Princeton, New Jersey: Princeton University Press.

Romeralo, M., Baldauf, S. L. and Cavender, J. C. (2009). A new species of cellular slime mold from southern Portugal based on morphology, ITS and SSU sequences. Mycologia. 101, 269–274.

Romeralo, M., Spiegel, F. W. and Baldauf, S. L. (2010a). A fully resolved phylogeny of the social amoebas (Dictyostelia) based on combined SSU and ITS rDNA sequences. Protist, (in press).

Romeralo, M., Cavender, J., Landolt, J., Stephenson, S. and Baldauf, S. (2010b). The expanding phylogeny of social amoebae defines new major lineages and emerging patterns in morphological evolution. (submitted).

Ronquist, F. and Huelsenbeck, J. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19, 1572–1574.

Schaap, P. (2007). Evolution of size and pattern in the social amoebas. BioEssays. 29, 635–644.

Schaap, P., Winckler, T., Nelson, M., Alvarez-Curto, E., Elgie, B., Hagiwara, H., Cavender, J., Milano-Curto, A., Rozen, D. E., Dingermann, T., Mutzel, R. and Baldauf, S. L. (2006). Molecular phylogeny and evolution of morphology in the social amoebas. Science. 314, 661–663.

Shadwick, L. L., Spiegel, F. W., Shadwick, J. D. L., Brown, M. W. and Silberman, J. D. (2009). Eumycetozoa = Amoebozoa?: SSU rDNA phylogeny of slime molds and its significance for the amoebozoan supergroup. PLoS ONE. 4, e6754.

Shaffer, B. (1953). Aggregation in cellular slime moulds: in vitro isolation of acrasin. Nature. 171, 975.

Som, A. and Fuellen, G. (2009). The effect of heterotachy in multigene analysis using the neighbor joining method. Molecular Phylogenetics and Evolution. 52, 846–851.

Stamatakis, A. (2006). RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 22, 2688–2690.

Stamatakis, A., Hoover, P. and Rougemont, J. (2008). A rapid bootstrap algorithm for the RAxML web servers. Systematic Biology. 57, 758–771.

Strassmann, J., Zhu, Y. and Queller, D. (2000). Altruism and social cheating in the social amoeba Dictyostelium discoideum. Nature. 408, 965–967.

Sullivan, J. (1996). Combining data with different distributions of among-site rate variation. Systematic Biology. 45, 375–380.

Swanson, A. R., Vadell, E. M. and Cavender, J. C. (1999). Global distribution of forest soil dictyostelids. Journal of Biogeography. 26, 133–148.

Swanson, A. R., Spiegel, F. W. and Cavender, J. C. (2002). Taxonomy, slime molds, and the questions we ask. Mycologia. 94, 968–979.

Swofford, D. (2003). PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sunderland, Massachusetts: Sinauer Associates.

Tarasov, O. V., Zhuravleva, G. A. and Abramson, N. I. (2008). Evaluation of the gene encoding translation termination factor eRF3 as a possible phylogenetic marker. Molecular Biology. 42, 834–842.

Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. and Higgins, D. G. (1997). The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research. 25, 4876–4882.

Unbehaun, A., Marintchev, A., Lomakin, I. B., Didenko, T., Wagner, G., Hellen, C. U. T. and Pestova, T. V. (2007). Position of eukaryotic initiation factor eIF5B on the 80S ribosome mapped by directed hydroxyl radical probing. The EMBO Journal. 26, 3109–3123.

Zwickl, D. and Hillis, D. (2002). Increased taxon sampling greatly reduces phylogenetic error. Systematic Biology. 51, 588–598.

Appendices

A Primer maps

Figure A.1: Primer map for eRF3. The aa alignment was constructed in ClustalX version 2.0.12 (Thompson et al. 1997). It includes eRF3 sequences from Dictyostelium discoideum (Ddi), Dictyostelium purpureum (Dpu) and Polysphondylium pallidum (Ppa). Gaps are indicated by hyphens, and a consensus sequence is shown below each row in the alignment. Only the forward primers (indicated by F's) were designed directly from the marked aa's. The reverse primers (indicated by R's) were constructed from the reverse complement of the sequence encoding the marked aa's. Primers shown: 1F, 2F, 3F, 1R, 2R, 3R and 4R. GeneBank (NCBI) accession numbers: Ddi – AAO61461.1, Ppa – EFA83433.1. DOE Joint Genome Institute: Dpu – 36325. N.B. The ‘real’ alignment from which the eRF3 primers were designed also included a range of other eukaryotes besides dictyostelids.
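The consensus line under each alignment row uses the Clustal conservation symbols:

- ‘*’ marks a fully conserved column.
- ‘:’ marks a column whose residues all fall in one ‘strong’ group of similar amino acids.
- ‘.’ marks a column whose residues all fall in one ‘weak’ group.

The Python sketch below reproduces this rule under three assumptions.

- The residue groups are those we recall from the ClustalW/ClustalX documentation.
- A column containing a gap is left blank in this sketch.
- The example rows are nine columns taken from the dictyostelid eRF3 alignment (Ddi, Dpu, Ppa). For these columns the consensus line in the original figure begins ‘**.***:*:’.

```python
# Residue groups used for the Clustal conservation line (assumed from the ClustalW docs).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
        "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Conservation symbol for one alignment column (one residue per sequence)."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # gapped columns get no symbol in this sketch
    if len(residues) == 1:
        return "*"  # identical residue in every sequence
    if any(residues <= set(group) for group in STRONG):
        return ":"
    if any(residues <= set(group) for group in WEAK):
        return "."
    return " "


def conservation_line(rows: list[str]) -> str:
    """Build the conservation line for equal-length aligned rows."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    return "".join(column_symbol("".join(col)) for col in zip(*rows))


rows = ["EDSREHLNI",  # Ddi
        "EDSREHVNL",  # Dpu
        "EDQREHVNI"]  # Ppa
print(conservation_line(rows))  # **.***:*:
```

Column 3 (S, S, Q) gets ‘.’ because S and Q share only the weak group SNDEQK. Columns 7 and 9 (L/V and I/L) fall in the strong group MILV.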

Figure A.2: Primer map for eIF5B. The aa alignment was constructed in ClustalX version 2.0.12 (Thompson et al. 1997). It includes eIF5B sequences from Dictyostelium discoideum (Ddi), Dictyostelium purpureum (Dpu) and Polysphondylium pallidum (Ppa). Gaps are indicated by hyphens, and a consensus sequence is shown below each row in the alignment. Only the forward primers (indicated by F's) were designed directly from the marked aa's. The reverse primers (indicated by R's) were constructed from the reverse complement of the sequence encoding the marked aa's. Primers shown: 1F, 2F + 3F, 4F, 5F, 1R, 2R, 3R and 4R. GeneBank (NCBI) accession numbers: Ddi – XP_641984.1, Ppa – EFA79965.1. DOE Joint Genome Institute: Dpu – 4800. N.B. The first 600 alignment positions are cut from this figure. N.B. The ‘real’ alignment from which the eIF5B primers were designed also included a range of other eukaryotes besides dictyostelids.
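Reverse primers anneal to the opposite strand, so their sequence is the reverse complement of the sequence coding for the marked aa's. With degenerate primers the complement must respect the IUPAC ambiguity codes; for example, R (A/G) pairs with Y (C/T). The Python sketch below applies this to primer 1R from Table B.2. The first four codons of the result, CAY TTY GAY GAY, encode H-F-D-D, a motif present in the dictyostelid eIF5B sequences at alignment positions 1155–1158. The function name `reverse_complement` is our own illustrative choice, not taken from any tool used in the study.

```python
# IUPAC nucleotide codes and their complements (R=A/G <-> Y=C/T, K=G/T <-> M=A/C,
# S and W are self-complementary, B <-> V, D <-> H, N <-> N).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) DNA sequence; spaces are ignored."""
    return seq.replace(" ", "").upper().translate(IUPAC_COMPLEMENT)[::-1]


primer_1r = "CCA CCG CCT CGT RTC RTC RAA RTG"  # eIF5B primer 1R, Table B.2
sense = reverse_complement(primer_1r)
print(sense)  # CAYTTYGAYGAYACGAGGCGGTGG
print(" ".join(sense[i:i + 3] for i in range(0, len(sense), 3)))
```

Applying the function twice returns the original sequence, which is a quick sanity check for the complement table.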

B eIF5B primers and unused eRF3 primers

Table B.1: Unused eRF3 primers. Start and end positions, nucleotide sequences, GC contents and melting temperatures (Tm).

Name (a)  Start (b)  End (b)  Nucleotide sequence (5′ → 3′)     GC content (%) (c)  Tm (°C) (c)
1F        131        138      TTC ATA GGT CAT GTN GAY GCN GG    50                  61.5
4R        396        389      TTC GAC CTT AGT YTT NCC NGG CAT   47.9                61.9

(a) F: forward; R: reverse.
(b) Start and end positions refer to the aa alignment of dictyostelid eRF3 sequences seen in Fig. A.1.
(c) Values were calculated by Eurofins MWG Operon (Ebersberg, Germany).
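The GC contents and melting temperatures in Tables B.1 and B.2 come from Eurofins MWG Operon, and the source does not say how they were computed. The Python sketch below offers a cross-check based on two assumptions.

- Each degenerate position counts as the fraction of G/C among the bases it stands for (N = 0.5, H = 1/3, and so on).
- The melting temperature follows the common approximation Tm = 69.3 + 0.41 × GC% − 650/L, where L is the primer length in nt.

Under these assumptions the sketch reproduces the tabulated values to one decimal for the primers checked here. That agreement does not prove it is the supplier's exact method.

```python
# Fraction of G or C among the bases each IUPAC code stands for (averaging is an assumption).
GC_FRACTION = {
    "A": 0.0, "T": 0.0, "G": 1.0, "C": 1.0,
    "R": 0.5, "Y": 0.5, "K": 0.5, "M": 0.5, "W": 0.0, "S": 1.0,
    "B": 2 / 3, "D": 1 / 3, "H": 1 / 3, "V": 2 / 3, "N": 0.5,
}


def _clean(primer: str) -> str:
    return primer.replace(" ", "").upper()


def gc_content(primer: str) -> float:
    """Expected GC content (%) of a degenerate primer, averaged over ambiguous positions."""
    seq = _clean(primer)
    return 100.0 * sum(GC_FRACTION[base] for base in seq) / len(seq)


def tm_estimate(primer: str) -> float:
    """Approximate Tm in degrees C: 69.3 + 0.41 * GC% - 650 / length (assumed formula)."""
    length = len(_clean(primer))
    return 69.3 + 0.41 * gc_content(primer) - 650.0 / length


unused_erf3 = [("1F", "TTC ATA GGT CAT GTN GAY GCN GG"),
               ("4R", "TTC GAC CTT AGT YTT NCC NGG CAT")]
for name, seq in unused_erf3:
    print(name, round(gc_content(seq), 1), round(tm_estimate(seq), 1))
# 1F 50.0 61.5
# 4R 47.9 61.9
```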

Table B.2: eIF5B primers. Start and end positions, nucleotide sequences, GC contents and melting temperatures (Tm).

Name (a)  Start (b)  End (b)  Nucleotide sequence (5′ → 3′)     GC content (%) (c)  Tm (°C) (c)
1F        633        640      GAG CGG AAG AGC AAY GTN CAR GG    58.7                65.1
2F        632        639      CC GAC AGG AGC ACN AAY GTN CAA    54.3                63.3
3F        632        639      CC GAC AGG AGC ACN AAY GTN CAG    58.7                65.1
4F        805        812      ACT GCT CGA CAT ACN GGN GAR GG    58.7                65.1
5F        910        917      GGC ACG GGA CAG AAR ATH GTN GC    58                  64.8
1R        1162       1155     CCA CCG CCT CGT RTC RTC RAA RTG   58.3                66.1
2R        1078       1071     GAC CTC CAA CAT RCA NGG CCA NAC   56.3                65.3
3R        1000       993      CAG TTC GGT CTT RTG NAC NGG NCC   58.3                66.1
4R        928        921      CGA TCT CCC ACG NAC NGC YTT YTC   58.3                66.1

(a) F: forward; R: reverse.
(b) Start and end positions refer to the aa alignment of dictyostelid eIF5B sequences seen in Fig. A.2.
(c) Values were calculated by Eurofins MWG Operon (Ebersberg, Germany).
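Each degenerate primer is really a pool of oligonucleotides, one for every combination of the bases behind its ambiguity codes. The Python sketch below computes that degeneracy and expands a primer into its concrete sequences. Applied to Table B.2, it gives 24 variants for 5F (R × H × N = 2 × 3 × 4) and 128 for 3R (R × N × N × N). The function names `degeneracy` and `expand` are illustrative, not taken from any tool used in the study.

```python
from __future__ import annotations

from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def _clean(primer: str) -> str:
    return primer.replace(" ", "").upper()


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in a degenerate primer pool."""
    return prod(len(IUPAC[base]) for base in _clean(primer))


def expand(primer: str) -> list[str]:
    """All concrete sequences encoded by a degenerate primer."""
    options = (IUPAC[base] for base in _clean(primer))
    return ["".join(bases) for bases in product(*options)]


TABLE_B2 = {
    "1F": "GAG CGG AAG AGC AAY GTN CAR GG",
    "2F": "CC GAC AGG AGC ACN AAY GTN CAA",
    "3F": "CC GAC AGG AGC ACN AAY GTN CAG",
    "4F": "ACT GCT CGA CAT ACN GGN GAR GG",
    "5F": "GGC ACG GGA CAG AAR ATH GTN GC",
    "1R": "CCA CCG CCT CGT RTC RTC RAA RTG",
    "2R": "GAC CTC CAA CAT RCA NGG CCA NAC",
    "3R": "CAG TTC GGT CTT RTG NAC NGG NCC",
    "4R": "CGA TCT CCC ACG NAC NGC YTT YTC",
}
for name, seq in TABLE_B2.items():
    print(name, degeneracy(seq))
```

Lower degeneracy keeps more of each matching primer variant in the reaction. This is one reason why the 3′ ends, where most ambiguity codes sit in these primers, were designed from conserved aa's.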

C List of sequences used in the study

Table C.1: List of dictyostelid sequences used in the study. Accession numbers are taken from GeneBank (NCBI; www.ncbi.nih.gov/Entrez/) if nothing else is given. Species included: D. fasciculatum, A. longisorophorum, D. monochasioides, D. brunneum, A. subglobosum, D. gloeosporum, D. tenue, P. filamentosum, P. luridum, D. discoideum, D. giganteum, P. pallidum, P. pseudocandidum and D. purpureum. D: Dictyostelium, A: Acytostelium, P: Polysphondylium. Isolates combined into chimeric sequences are marked with similar symbols. Schaap: sequences from the data set of Schaap et al. (2006). SLB: sequences from an unpublished RPB1 data set (2009-11-03), the Baldauf lab. This work: sequences obtained from my experiments during this study. Other sources: Sanger Institute (http://www.sanger.ac.uk/Projects/D_discoideum/) and DOE Joint Genome Institute (http://genome.jgi-psf.org/Dicpu1/Dicpu1.home.html). Not all of the 20 isolates for which sequences are available are shown here; all can be seen in Fig. 9.

Table C.2: List of outgroup sequences used in the study. Accession numbers are taken from GeneBank (NCBI; www.ncbi.nih.gov/Entrez/) if nothing else is given. Outgroup species: Amoebozoa – Acanthamoeba castellanii and Physarum polycephalum; Opisthokonta – Monosiga brevicollis, Trichoplax adhaerens, Strongylocentrotus purpuratus, Ciona intestinalis, Monodelphis domestica, Pediculus humanus corporis and Tribolium castaneum. Additional sources: Human Genome Sequencing Center (HGSC; http://blast.hgsc.bcm.tmc.edu/blast.hgsc?organism=AcastellaniNeff) and Genome Center at Washington University (GCWU; http://genome.wustl.edu/tools/blast).

D Isolates not included in the analyses

Table D.1: Dictyostelid isolates from which eRF3 sequences could not be obtained. DNA source and the reason for their exclusion.

Species          Group     Isolate  Source (a)   Why not used
D. parvisporum   1         OS126    DBS0235853   Not amplifiable
A. leptosomum    2         FG12A    DBS0235449   Not amplifiable
A. subglobosum   2         LB1      Schaap       5PTase
A. subglobosum   2         LB1      DBS0235452   Not amplifiable
D. minutum       3         AP11A    AP           Wrong organism when sequenced
D. tenue         3         PJ6      Schaap       INTS1
D. leptosomom    4         AP11S    AP           Wrong organism when sequenced
P. violaceum     No group  P6       MR           Not amplifiable

(a) DB: dictyBase strain no. (Dicty Stock Center; Northwestern University, Chicago); Schaap: DNA kept by the Baldauf lab and used in Schaap et al. (2006); AP: spores obtained from Allison Perrigo, the Baldauf lab; MR: spores obtained from Maria Romeralo, the Baldauf lab.

E Supplemental trees

Figure E.1: A rooted phylogeny of Dictyostelia based on combined RPB1 and eRF3 sequences. The topology shown is the result of a Bayesian inference analysis of 700 aligned aa positions performed in MrBayes version 3.1.2 (Ronquist and Huelsenbeck 2003). The analysis was run for one million generations, and the burn-in was set to 25 %. Support values are displayed on the relevant branches as Bayesian posterior probability/maximum likelihood bootstrap. Values equal to 1.0 (biPP) or 100 % (mlBP) are indicated as ∗; a dash (–) represents a support value less than or equal to 0.7/50. All branches are drawn to scale (scale bar: 0.1 substitutions per site). The four major groups are indicated as in Schaap et al. (2006). The ‘within group designations’ follow the classification suggested by Romeralo et al. (2010a). The chimeric sequence is shown with the names of the two isolates combined. D: Dictyostelium, P: Polysphondylium, A: Acytostelium.
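A Bayesian posterior probability for a branch is, in effect, the fraction of trees sampled after burn-in that contain that clade. The Python sketch below illustrates this with a toy sample and follows the analysis above in three respects.

- Each sampled tree is reduced to a set of clades. This is a simplification for brevity; MrBayes summarises full tree samples.
- The first 25 % of samples is discarded as burn-in, as in the analysis above.
- Labels use the figure conventions: ∗ for 1.0 or 100 %, and – for values ≤ 0.7 or ≤ 50 %.

The taxon names and bootstrap value in the example are hypothetical.

```python
def posterior_probability(samples, clade, burnin_fraction=0.25):
    """Fraction of post-burn-in samples (each a set of clades) that contain `clade`."""
    n_burnin = int(len(samples) * burnin_fraction)
    kept = samples[n_burnin:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    return sum(clade in tree for tree in kept) / len(kept)


def support_label(pp, bp=None):
    """Format 'PP/BP': '*' for 1.0 or 100, '–' for values <= 0.7 or <= 50."""
    pp_text = "*" if pp >= 1.0 else ("–" if pp <= 0.7 else f"{pp:.2f}")
    if bp is None:
        return pp_text
    bp_text = "*" if bp >= 100 else ("–" if bp <= 50 else f"{bp:.0f}")
    return f"{pp_text}/{bp_text}"


# Toy sample of 8 trees; the first 2 (25 %) are discarded as burn-in.
AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
samples = [{CD}, {CD}, {AB}, {AB}, {AB, CD}, {AB}, {CD}, {AB}]

pp_ab = posterior_probability(samples, AB)  # 5 of the 6 kept trees contain AB
print(round(pp_ab, 2))                      # 0.83
print(support_label(pp_ab, 69))             # 0.83/69
```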

Figure E.2: A rooted phylogeny of Dictyostelia based on combined SSU rDNA and eRF3 sequences. A total of 1804 positions (nt for SSU rDNA plus aa for eRF3) were included. The data set was subjected only to Bayesian inference, and the analysis was run for ∼280 000 generations. Method and annotation otherwise as in Fig. E.1.

Figure E.3: A rooted phylogeny of Dictyostelia based on combined SSU rDNA and triple-weighted eRF3 sequences. The eRF3 alignment was weighted to give the two genes an approximately equal number of parsimony-informative sites. A total of 2394 positions (nt for SSU rDNA plus aa for eRF3) were included. The data set was subjected only to Bayesian inference, and the analysis was run for ∼1.1 million generations. Method and annotation otherwise as in Fig. E.1.
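A site is parsimony-informative when at least two character states each occur in at least two taxa. Only such sites can favour one topology over another under parsimony. The Python sketch below counts these sites and derives a whole-number weight that roughly balances two partitions. The toy alignment is invented.

The position counts in Figs E.2 and E.3 (1804 vs 2394) are consistent with the eRF3 block being included three times. That would imply an eRF3 alignment of (2394 − 1804)/2 = 295 aa. This is our inference, not something stated in the source.

```python
from collections import Counter


def is_parsimony_informative(column, ignore="-?"):
    """True if at least two states each occur in at least two sequences (gaps/missing ignored)."""
    counts = Counter(state for state in column if state not in ignore)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative_sites(alignment):
    """Number of parsimony-informative columns in equal-length aligned sequences."""
    return sum(is_parsimony_informative(col) for col in zip(*alignment))


def balancing_weight(pis_reference, pis_weighted):
    """Whole-number weight so that pis_weighted * weight is close to pis_reference."""
    return max(1, round(pis_reference / pis_weighted))


def implied_partition_length(total_unweighted, total_weighted, weight):
    """Length of a partition that was included `weight` times instead of once."""
    return (total_weighted - total_unweighted) / (weight - 1)


toy = ["AAGT",
       "AAGC",
       "ACTT",
       "ACTA"]  # columns 2 and 3 are informative; column 4 has only one repeated state
print(count_informative_sites(toy))             # 2
print(implied_partition_length(1804, 2394, 3))  # 295.0
```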

Figure E.4: A rooted 5PTase phylogeny of Dictyostelia. A total of 292 aa positions were included. Method and annotation otherwise as in Fig. E.1.

Figure E.5: A rooted INTS1 phylogeny of Dictyostelia. A total of 189 aa positions were included. The dotted lines indicate alternative branching patterns obtained by the ML analysis, and the branches concerned are marked with ‘x’. Method and annotation otherwise as in Fig. E.1.
