
Thesis

A phylogenomic contribution to the eukaryotic tree of life

BURKI, Fabien


Reference

BURKI, Fabien. A phylogenomic contribution to the eukaryotic tree of life. Doctoral thesis: Univ. Geneva, 2009, no. Sc. 4077

URN: urn:nbn:ch:unige-18450 DOI: 10.13097/archive-ouverte/unige:1845

Available at: http://archive-ouverte.unige.ch/unige:1845

Disclaimer: layout of this document may differ from the published version.

UNIVERSITY OF GENEVA, FACULTY OF SCIENCE

Department of Zoology and Animal Biology
Professor Jan Pawlowski

A phylogenomic contribution to the eukaryotic tree of life

THESIS

Presented to the Faculty of Science of the University of Geneva

to obtain the degree of Doctor of Science, specialization in biology

by Fabien BURKI

of

Geneva (Switzerland)

Thesis No. 4077

GENEVA

ReproMail: printing workshop at Uni Mail

2009

Acknowledgements

I wish to express here my deepest gratitude to all the people who, near or far, gave me their support and trust throughout this adventure.

First of all, to Professor Jan Pawlowski… Jan, what can I say in particular? So many thoughts come to mind. We met 8 years ago, when I came knocking on your door for a diploma project. 8 years! After a period of uncertainty when I no longer really knew which path to follow, you once again placed your trust in me by accepting me as a PhD student, four and a half years ago. Since then, it has been nothing but happiness and discovery. Thank you for the freedom you gave me. Thank you for your availability, your attentiveness and your advice.

To Professor Kamran Shalchian-Tabrizi. Thank you so much for all your help, our countless discussions over Skype, and the aquavit. You're more than just a colleague!

To Professor Louisette Zaninetti, for the wonderful memories of my diploma years, which made me come back knocking on the lab's door.

To Juan, for your dedication, your availability and your explanations, always so clear that… you actually understand them, even when it comes to phylogeny…

To Jackie, for your knowledge, your contribution, and… our discussions. I hope with all my heart that our paths will cross again.

To José, thank you for your constant availability, your enthusiasm and your help.

To Loic… no, I won't say it here. But you alone know what I'm talking about. Simply, thank you for these 2 years; you'll forever be in my heart… yeah, well…

To Béa, even if this corridor was like an impassable barrier… May your overflowing enthusiasm never fade.

To Thierry: hey buddy, just because you're from Vaud doesn't mean you can't be on the list. Wait, are you actually from Vaud? Where is Les Brenets again?

To the diploma students Michael and Cyril, with whom the lab has already secured a bright future.

To Ignacio Bolivar, for your precious help during the early stages of this work.

To the former members: Cédric, Ben, Xav, Patrick, Fred, Sasha and Yurika. You filled part of my thesis with your gentle madness, and your departures left a big void.

To the collaborators on the various projects: Kamran Shalchian-Tabrizi, Jon Brate, Asmund Skjaeveland, Marianne Minge, Dag Klaveness, Kjetill Jakobsen from Oslo; John Archibald from Halifax; Patrick Keeling, Ales Horak from Vancouver; Yuji Inagaki, Tetsuo Hashimoto, Miako Sakaguchi from Tsukuba; Hervé Philippe from Montréal; Thomas Cavalier-Smith from Oxford; Colomban de Vargas, Ian Probert from Roscoff; Sergey Nikolaev from Geneva.

To the computers, a non-human but indispensable part of this kind of work, and to all the people who keep these clusters running. Vital-IT at the Swiss Institute of Bioinformatics and the Bioportal of the University of Oslo made it possible to complete this thesis in less than 20 years. I particularly wish to thank Bruno Nyffeler, Jacques Rougemont and Volker Flegel at Vital-IT, as well as Asmund Skjaeveland and Pal Enger at the Bioportal, for their invaluable help with countless debugging sessions. I would also like to thank Lorenza Bordoli, Vassilios Ioannidis and Laurent Falquet for their precious advice as I found my way as an apprentice bioinformatician.

To the Canadian Protist EST Program (PEP) consortium, which made several EST projects publicly available and allowed us to greatly improve the taxon sampling of our alignments.

I am infinitely grateful to the Swiss National Science Foundation, the State of Geneva, the Department of Zoology and Animal Biology and the Ernst & Lucie Schmidheiny Foundation for their financial support.

I am very grateful to John Archibald, Kjetill Jakobsen and Michel Milinkovitch for agreeing to evaluate my PhD thesis.

To those I have forgotten… few, I hope.

Finally, an enormous, immense, gigantic THANK YOU to my family: my mother, my father, my brother. It is above all thanks to your unconditional encouragement, always, that I made it.

And… Thanks, Grazie, Merci to you, Babs, for… everything. Thank you for being you, thank you for being there. Thank you for putting up with the difficult moments, when the stress became too much. Thank you for your support and your encouragement. Thank you, thank you, thank you. The adventure continues, together.

Summary (Résumé en français)

In recent years, molecular phylogeny has profoundly changed our view of the eukaryotic tree. Molecular phylogeny is the study of evolutionary relationships among living beings by comparing DNA or amino acid sequences. The scheme currently accepted by most researchers recognizes five major assemblages of organisms that together encompass all eukaryotic species: the ‘supergroups’ (unikonts, excavates, Plantae, chromalveolates and Rhizaria). The existence of these supergroups is broadly agreed upon, but the phylogenetic relationships between them remain very uncertain. Phylogenomics is a new tool of evolutionary biology. It makes it possible to address important questions that had so far remained unanswered for lack of sufficient information. Thanks to the accumulation of genomic data, we can now use an ever-increasing number of molecular markers (genes or proteins) for a growing diversity of organisms to reconstruct phylogenetic relationships across eukaryotes. Recent studies have analyzed huge multigene alignments (more than 120 concatenated genes) that include all supergroups. They have shown that it is now possible to reach ever further back in time. They can also resolve evolutionary relationships for which only a very large amount of data contains sufficient signal. Fully in line with this new genomic approach to phylogeny, we first obtained EST (Expressed Sequence Tag) libraries for 3 species belonging to the supergroup Rhizaria: two foraminifers, Reticulomyxa filosa and Quinqueloculina sp., and the cercozoan Gymnophrys cometa (recently renamed Limnofila borokensis).
In total we analyzed more than 4500 sequences, representing genes expressed at the time of RNA extraction. This is to date the largest dataset available for this major eukaryotic assemblage. Rhizaria were first recognized at the end of the 1990s and are currently defined solely on the basis of molecular sequences. Although they are recognized as one of the five eukaryotic supergroups, they had remained, until our study, outside the debate fueled by phylogenomics because of insufficient data. Our initial objective was simple: to use multigene alignments to 1) confirm the monophyly of Rhizaria and, above all, 2) position this supergroup in the eukaryotic tree, in other words to investigate the phylogenetic relationships among the various putative supergroups.

We therefore began by obtaining about 2000 ESTs for R. filosa (chapter 2). We combined them with the only public rhizarian library available at the time (Bigelowiella natans) to construct an alignment of 85 proteins (about 13,000 amino acids) and 37 species, two of them Rhizaria (chapter 3). Rhizaria thus made their entry into the field of phylogenomics. Using dozens of genes for the first time, this study confirmed that the group indeed shares a common ancestor. However, it could not convincingly establish the group's position in the eukaryotic tree, because our alignment did not contain enough phylogenetic signal. We then continued our sampling to increase our data, sequencing more than 2500 ESTs for two new rhizarian species (Quinqueloculina sp. and G. cometa). At the same time, numerous EST projects for a wide variety of protists were being made public. This allowed us to assemble a very large dataset: 123 genes representing about 30,000 amino acids for 50 eukaryotic species evenly distributed across the tree (chapter 4). This time, very interestingly, the analysis of this multigene alignment indicated that two of the putative supergroups (Rhizaria and chromalveolates) are in fact closely related. In this study we proposed a new tree containing only four major eukaryotic assemblages. We grouped Rhizaria and members of the chromalveolates under the new name ‘SAR’ (Stramenopiles, Alveolates, Rhizaria). The monophyly of ‘SAR’ implies that the largest known diversity of protists shares a common ancestor. This new relationship also has important consequences, notably for our understanding of how photosynthetic organisms evolved through the various endosymbioses.

To estimate the impact of taxon sampling on the resolution of our phylogenetic trees, we then extended our alignment to 65 species (chapter 5). The analysis of this alignment took us even further back in time by revealing the phylogenetic relationship between ‘SAR’ and plants (Plantae). This relationship is intriguing, notably because it links all photosynthetic species to one another in evolutionary terms. We also used the phylogenomic approach to try to resolve the position of species for which analyses based on only a few genes had been unsuccessful (chapter 6).
Specifically, we investigated the origin of two enigmatic groups of eukaryotes, the telonemids and centrohelids. They are among the last groups that still lack a real position in the eukaryotic tree. We suggest that they belong to the same clade as the haptophytes and cryptomonads. This group in fact corresponds to a new major eukaryotic assemblage. Because of the species it contains and its evolutionary position, it becomes central to understanding the distribution of plastids that resulted from secondary endosymbiosis with a red alga. We also took part in the study of another species without a clear phylogenetic position, Breviata anathema, which is crucial for our understanding of the early phases of eukaryotic evolution (see annexes).

Finally, we are currently working on the molecular dating of the eukaryotic tree. Knowing when the main eukaryotic groups diverged from one another is fundamental to better understanding these major evolutionary steps. Until now, such studies have suffered from two shortcomings: a lack of molecular data for enough species, and a lack of precise fossil calibration points to constrain the temporal inference. We propose to address both problems in two steps. We massively sequenced three species that have a good microfossil record, and we use these new calibrations in a molecular dating based on our multigene alignment. The motivations for this project are presented in chapter 7.

Co-author affiliations

Listed below, in alphabetical order, are the affiliations of all authors from outside the Department of Zoology and Animal Biology at the University of Geneva who participated (or are participating) in this work:

John M. Archibald: Dalhousie University, Department of Biochemistry and Molecular Biology, Halifax, Nova Scotia, B3H 1X5, Canada

Jon Brate: University of Oslo, Department of Biology, N-0316 Oslo, Norway

Thomas Cavalier-Smith: University of Oxford, Department of Zoology, South Parks Road, OX1 3PS, UK

Colomban de Vargas: Station Biologique, Equipe Evolution du Plancton et PaléoOcéans, 29682 Roscoff, France

Tetsuo Hashimoto: University of Tsukuba, Center for Computational Sciences, Institute for Biological Sciences, Ibaraki 305-8577, Tsukuba, Japan

Ales Horak: University of British Columbia, Botany Department, Vancouver, BC, V6S 1T4, Canada

Yuji Inagaki: University of Tsukuba, Center for Computational Sciences, Institute for Biological Sciences, Ibaraki 305-8577, Tsukuba, Japan

Kjetill S. Jakobsen: University of Oslo, Department of Biology, N-0316 Oslo, Norway

Patrick J. Keeling: University of British Columbia, Botany Department, Vancouver, BC, V6S 1T4, Canada

Dag Klaveness: University of Oslo, Department of Biology, N-0316 Oslo, Norway

Marianne A. Minge: University of Oslo, Department of Biology, N-0316 Oslo, Norway

Sergey I. Nikolaev: University of Geneva, Department of Genetic Medicine and Development, 1 rue Michel-Servet, 1211 Geneva, Switzerland

Hervé Philippe: Université de Montréal, Centre Robert Cedergren, Département de Biochimie, Montréal, Québec H3T1J4, Canada

Ian Probert: Station Biologique, Equipe Evolution du Plancton et PaléoOcéans, 29682 Roscoff, France

Miako Sakaguchi: University of Tsukuba, Center for Computational Sciences, Institute for Biological Sciences, Ibaraki 305-8577, Tsukuba, Japan

Kamran Shalchian-Tabrizi: University of Oslo, Department of Biology, N-0316 Oslo, Norway

Asmund Skjaeveland: University of Oslo, Department of Biology, N-0316 Oslo, Norway



Table of contents



Foreword

Abstract

Chapter 1: Introduction
1.1 Motivation
1.2 The tree of eukaryotes
1.2.1 World of Kingdoms
1.2.2 Molecular r-evolution: the SSU rRNA
1.2.3 Time for deconstruction
1.2.4 Groundwork for reconstructing
1.2.5 Where is the root?
1.2.6 [...] evolution
1.3 Phylogenomics
1.3.1 A definition
1.3.2 How does phylogenomics work?
1.3.3 Stochastic and systematic errors
1.3.4 So: can phylogenomics answer important questions?
1.3.5 The case of EGT in the context of phylogenomics

Chapter 2: Analysis of expressed sequence tags from a naked foraminiferan Reticulomyxa filosa
2.1 Project description
2.2 Abstract
2.3 Introduction
2.4 Results & discussion
2.4.1 Sequencing & clustering
2.4.2 Comparisons with databases
2.4.3 Functional annotation
2.5 Materials & methods
2.5.1 Cells and culture conditions
2.5.2 cDNA construction and EST sequencing
2.5.3 Sequence processing and analysis

Chapter 3: Monophyly of Rhizaria and multigene phylogeny of unicellular bikonts
3.1 Project description
3.2 Abstract
3.3 Introduction
3.4 Results
3.4.1 Sequences and alignments
3.4.2 Phylogenetic position of Rhizaria
3.5 Discussion
3.6 Materials & methods
3.6.1 Construction of the alignment
3.6.2 Phylogenetic analyses
3.6.3 Testing phylogenies
3.7 Supplementary material

Chapter 4: Phylogenomics reshuffles the eukaryotic supergroups
4.1 Project description
4.2 Abstract
4.3 Introduction
4.4 Results
4.4.1 Single-gene analyses and concatenation
4.4.2 Phylogenetic position of Rhizaria
4.5 Discussion
4.6 Materials & methods
4.6.1 Sampling, culture and construction of cDNA libraries
4.6.2 Construction of the alignments
4.6.3 Phylogenomic analyses
4.6.4 Tree topology tests
4.7 Supplementary material

Chapter 5: Phylogenomics reveals a new ‘Megagroup’ including most photosynthetic eukaryotes
5.1 Project description
5.2 Abstract
5.3 Introduction
5.4 Results
5.5 Discussion
5.6 Materials & methods
5.7 Supplementary material

Chapter 6: Early evolution of eukaryotes: two enigmatic heterotrophic groups are related to photosynthetic chromalveolates
6.1 Project description
6.2 Summary
6.3 Results and Discussion
6.3.1 Evolutionary origin of Telonemia and Centroheliozoa
6.3.2 An emerging group of great diversity, and the expansion of chromalveolates
6.3.3 Implications for plastid evolution
6.4 Experimental procedures
6.4.1 Cultures
6.4.2 cDNA library construction and 454 pyrosequencing
6.4.3 Contig assembly and sequence alignment
6.4.4 Phylogenetic analyses
6.5 Supplementary material
6.5.1 Analyses after removing both, or one, of T. subtilis and R. contractilis
6.5.2 Topology comparisons based on the supermatrices
6.5.3 Supplementary Table and Figures

Chapter 7: General conclusions and perspectives
7.1 Achievements
7.2 Origin and spread of chlorophyll-c containing plastids, and the early photosynthetic eukaryotes
7.3 A molecular time-scale for evolution: combining phylogenomics and the continuous microfossil record
7.4 Other perspectives

Chapter 8: Literature cited

Chapter 9: Annexes
9.1 Other projects in which I have been involved during my PhD
9.2 Journal-formatted copy of the published chapters
9.3 Articles related to our work





Foreword

This manuscript describes the research I started in September 2004 as a PhD student in the molecular phylogenetics of eukaryotes. At that time, the evolutionary tree of eukaryotes was being rebuilt after the weaknesses of rDNA-based phylogenies had been demonstrated. Four and a half years later much progress has been made, as we shall see, and a new picture of the evolutionary history of eukaryotes is emerging.

The following chapters are arranged chronologically, and I have tried to account for the gradual changes made possible by the successive releases of new data. Of course, the very last and most complete trees, which we obtained only a couple of weeks ago, could almost say it all and would by themselves summarize the main results of this thesis. But they would not accurately convey the journey I wish to relate in this manuscript, a journey through the recent modifications of the eukaryotic tree.

Chapters 2, 3, 4 and 5 correspond to the work we published during the course of this PhD. They are preceded by a general introduction and followed by a chapter, not yet published, that presents our most recent results. Finally, I conclude the main part of this manuscript with some important points raised by our research, and present the motivations for an ongoing project that addresses the timing of eukaryote evolution. Each central chapter starts with a brief ‘project description’ added here for the sake of clarity. Its purpose is to place the reader in the context of the time when the study was performed.

Because this manuscript is largely a collection of papers, I would like to warn the reader: it inevitably contains redundancy, particularly in the different introduction and discussion sections.

To keep the format of the chapters uniform, I chose to present the manuscript versions of the published articles. Readers who prefer the journal-formatted versions will find the original papers in the annexes.



Abstract

Resolving the global eukaryotic tree of life remains one of the most important and challenging tasks facing biologists. A phylogeny supporting the evolutionary relationships among all eukaryotic lineages would provide a fundamental framework for broad comparative genomics, as more and more complete genomes are released for an ever broader taxonomic sampling. For example, an important question directly related to the structure of the eukaryotic tree concerns the origin and spread of plastids. The process of endosymbiosis that eventually gave rise to plastids was responsible for some of the most significant events in evolution. Fully tackling this question, however, first requires a robust tree, because it is the framework against which the competing hypotheses can be tested.

In the last decade, molecular phylogenetic trees have gradually assigned most eukaryotes to one of five or six putative, very large assemblages, the so-called "supergroups". These comprise the ‘Opisthokonta’ and ‘Amoebozoa’ (often united as the ‘Unikonts’), ‘Archaeplastida’ or ‘Plantae’, ‘Chromalveolata’, ‘Excavata’, and ‘Rhizaria’. The strength of the evidence supporting these supergroups has been much debated and, importantly, the relationships between them have yet to be confirmed. These uncertainties are largely due to the limited amount of data available until recently for most of eukaryotic diversity. In particular, only a small fraction of unicellular eukaryotes had been studied at the molecular level. This led to major imbalances in phylogenies and prevented researchers from reliably inferring deep evolutionary relationships.

However, several sequencing efforts within the last couple of years have radically changed how phylogenetic trees are inferred at high taxonomic levels (i.e., among the supergroups). Reconstructing the evolutionary history of the eukaryotic groups is no longer a job for a few genes, with all the uncertainties such a lack of data entails. Nor is it a job for many genes with poor taxon sampling. Instead, huge phylogenomic datasets involving more than 100 genes can now be used to reconstruct the evolutionary steps that led to the current diversity of eukaryotes.
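The sketch below shows concretely what concatenating many genes means. It joins per-gene alignments end to end into a single "supermatrix". It is an illustration under stated assumptions, not the procedure used in the following chapters, whose methods are described in each chapter's Materials & methods section. Several choices here are hypothetical: the function name `concatenate`, the toy species and sequences, and the use of `?` to pad species missing from a gene.

```python
from typing import Dict, List

# Hypothetical representation: one gene alignment maps species name -> aligned sequence.
GeneAlignment = Dict[str, str]


def concatenate(genes: List[GeneAlignment], missing: str = "?") -> Dict[str, str]:
    """Join gene alignments end to end into one supermatrix.

    Species absent from a gene are padded with `missing` characters so that
    every row of the supermatrix has the same length (an assumed convention).
    """
    species = sorted({sp for gene in genes for sp in gene})
    rows: Dict[str, List[str]] = {sp: [] for sp in species}
    for gene in genes:
        if not gene:
            continue  # skip empty alignments
        widths = {len(seq) for seq in gene.values()}
        if len(widths) != 1:
            raise ValueError("all sequences in a gene alignment must have the same length")
        width = widths.pop()
        for sp in species:
            # Use the species' sequence if present, otherwise a run of missing characters.
            rows[sp].append(gene.get(sp, missing * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


# Toy example with two short "genes" and three hypothetical species.
gene_1 = {"species_A": "MKV", "species_B": "MKI"}
gene_2 = {"species_A": "LL", "species_C": "LI"}
for name, row in concatenate([gene_1, gene_2]).items():
    print(name, row)
```

All rows end up the same length. A given column therefore always refers to the same aligned position of the same gene across all species, even when some species lack data for that gene.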

My PhD work is part of the current effort to resolve the tree of eukaryotes. It started from a simple observation: phylogenomic studies at the eukaryotic level all lacked Rhizaria. This was an important gap, considering that Rhizaria represent one fifth of the recognized supergroups and include a large diversity of very different taxonomic groups of protists. The reason was that genomic data for Rhizaria were essentially absent from databases. We therefore generated Expressed Sequence Tags (ESTs) for several rhizarian species. We first obtained ESTs for a species belonging to the important group of foraminifers (chapter 2). We then reported a large-scale analysis of eukaryote phylogeny including data for 2 rhizarian species. This meant that, for the first time, representatives of every supergroup were analyzed together using a phylogenomic approach (chapter 3). Our results, based on a dataset of 37 species and 85 genes, confirmed the putative monophyly of Rhizaria. This was of interest because this supergroup is still defined only by molecular characters. At the same time, this project highlighted how difficult it is to infer the evolutionary relationships between the major groups of unicellular eukaryotes, even with more than 10,000 amino acid sites. Overall, our trees were poorly resolved within the bikont part. We concluded that longer alignments were needed and, perhaps more importantly, better taxon sampling.

Following these conclusions, we set out to build a much larger phylogenomic dataset to investigate the phylogenetic relationships between all the eukaryotic supergroups (chapter 4). We generated new genomic data and surveyed public databases to construct an alignment that was both longer and taxonomically broader. Our new dataset contained 49 species and 123 genes. Very interestingly, this matrix contained enough phylogenetic signal to confidently resolve several ancient divergence points in the evolutionary history of eukaryotes. Of particular significance, it strongly supported a relationship between Rhizaria and two main lineages of the supergroup chromalveolates (stramenopiles and alveolates): the ‘SAR’ grouping. We thus showed consistent affinities between assemblages that were thought to belong to different supergroups, and therefore not to share a recent common ancestor. These new relationships had important consequences for our understanding of the evolutionary history of eukaryotes. Notably, Rhizaria became a new player that cannot be ignored when addressing the putative single red algal origin of the chlorophyll-c containing plastids among the chromalveolates.

To investigate even earlier events in eukaryote evolution, we then substantially expanded our matrix with several publicly available species. The updated matrix reached 65 species, mostly bikonts, and 135 genes (chapter 5). We analyzed this new dataset with the latest phylogenetic methods. The resulting tree showed only three stems at its deepest level: two highly supported megagroups, which together include the vast majority of eukaryotic species, and the excavates, whose position remained uncertain. Our results brought convincing support for the clustering of almost all photosynthetic groups in a single mega-clade. We speculated that the observable diversity of plastids within the new megagroup could be traced back to its last common ancestor. It would then be the consequence of an increased capability of all its members to acquire and keep plastids or plastid-bearing cells.

Phylogenomics is not only helpful for inferring ancient relationships between the eukaryotic supergroups. It can also be used to address the evolutionary origin of lineages whose placement has so far proven difficult (referred to as ‘orphan’ lineages). We undertook massively parallel 454 sequencing of two such groups of uncertain phylogenetic position, Telonemia and Centroheliozoa (chapter 6). These groups were of great interest because both contain only heterotrophic organisms. Yet, based on weak hints, they had been suggested to be related to photosynthetic members of the chromalveolates. Our analyses of 72 species and 127 genes brought the first reliable support for placing telonemids and centrohelids in an expanded chromalveolate megagrouping that also contains Rhizaria. Within it, they are most likely closely related to haptophytes and cryptophytes. These two lineages are therefore of key importance for further investigations into the distribution of red algal-derived plastids. We also participated in a phylogenomic study of another orphan group, the breviate amoebae, which is of crucial significance for our understanding of early transitions in eukaryote evolution (see annexes).

Finally, we conclude this manuscript with general comments on our work and some possible future directions (chapter 7). In particular, we present our motivations for an ongoing project that aims to infer a molecular time-scale for major events in eukaryote evolution. Molecular dating can suffer from artifacts due to small amounts of data or a lack of reliably calibrated nodes. To reduce them, we massively sequenced one [...] and two coccolithophorids. These data will allow us to combine phylogenomics and micropaleontology, by means of the well-documented continuous microfossil record.



Chapter 1: Introduction   


1.1 Motivation

Is resolving the evolutionary tree of life an important achievement? The answer is yes, without any possible doubt. Most scientists are driven by pure curiosity, and the world, ourselves included, would be very different without our understanding of fundamental, not necessarily critical, topics. Beyond that, a fully resolved tree including all organisms is the necessary framework for studies aimed at understanding how countless characters were acquired and evolved. A tree is the reference for selecting key species that have the potential to answer important evolutionary questions using, for example, comparative genomics. Evolutionary studies help to place comparisons in perspective, so that one can understand how, when, and sometimes why similarities and differences between genomes arose [Eisen and Fraser 2003]. Genome sequencing was once strongly biased toward organisms relevant to human well-being (of economic or medical importance). Since then, the taxonomic range of species with extensive genomic datasets has increased dramatically, and it will continue to expand thanks to new sequencing technologies and lower costs. We have entered a very exciting period. We may now begin to revisit questions of general interest, such as the origin of multicellularity or photosynthesis, and find precise answers by digging into the growing mass of data. The tree of life has not been immune to change with these new data either. It has undergone a profound reshuffling that revealed a number of relationships between major lineages that were previously unknown and had been out of reach.

In line with these new possibilities, the aim of our project was to explore the potential of phylogenomics (i.e., the use of large datasets in molecular phylogenetics) for resolving ancient evolutionary relationships within the eukaryotic tree of life. Our initial focus was on one eukaryotic supergroup in particular, Rhizaria, because it had been largely ignored in phylogenomic studies (chapters 2, 3 and 4). We later became interested in other important questions involving the deep evolution of eukaryotes. These included the phylogenetic position of the [...] and the other photosynthetic species (chapter 5), the placement of “orphan” species (chapter 6) and the molecular dating of the eukaryotic tree (chapter 7). In this manuscript, I will discuss the recent topological changes to the tree in the light of our results.

But first, I shall present several general considerations that are necessary to place this work in context.


1.2 The tree of eukaryotes

1.2.1 World of Kingdoms

Inferring the evolutionary relationships among living beings is not a recent endeavor. Scientists did not wait for the molecular or “all-genomic” eras to propose trees depicting how species are related to one another. Among the abundant work that preceded the spread of molecular phylogenies, three landmark studies stand out. They were mainly based on morphology and modes of nutrition, and the trees they proposed appeared in general biology textbooks for decades:
- the 3 kingdoms of Haeckel in 1866, Figure 1-ch.1 (Plantae, Protista and Animalia) [Haeckel 1866];
- the 4 kingdoms of Copeland in 1938, Figure 2-ch.1 (Monera, Protista, Plantae, and Animalia) [Copeland 1938];
- the 5 kingdoms of Whittaker in 1969, Figure 3-ch.1 (Monera, Protista, Plantae, Fungi, and Animalia) [Whittaker 1969].

Ultimately these systems were quite similar. The authors essentially shifted the boundaries between kingdoms, going from three to five as knowledge accumulated. In all of them, evolution proceeds through several transitions from a basal pool of apparently simple, largely unicellular organisms (the protists) to more elaborate multicellular organisms. These proposals roughly recognized several major assemblages (e.g., Fungi, [...]). However, they failed to resolve the relationships among these assemblages and, importantly, to account for the fundamentally paraphyletic and complex nature of the protist lineages. These were very important pieces of work, and they remain extremely useful for grasping the huge diversity of life. This is certainly because they offer a “natural”, very pedagogical way of classifying organisms; Whittaker's system, for example, is still taught in most high school biology classes. Nevertheless, it is not my intention here to explore their strengths and weaknesses further.

1.2.2 Molecular r-evolution: the SSU rRNA

Instead I would like to consider in more detail another kind of phylogeny that has replaced the classical phenotype-based approach described above: phylogenies inferred by comparing DNA or amino acid sequences, i.e. molecular systematics, which have revolutionized our understanding of evolutionary relationships. In 1965, Zuckerkandl and Pauling [Zuckerkandl and Pauling 1965] argued that comparing informative molecules would make it possible to evaluate evolutionary relatedness. They were right, and since then molecular phylogeny has been regarded as the tool of choice for reconstructing evolutionary histories. This is especially true for protists, for which interpreting morphological characters alone is problematic, particularly in an evolutionary framework.


Figure 1-ch.1. Haeckel’s three kingdoms. From [Haeckel 1866].

Figure 2-ch.1. Copeland’s four kingdoms. From [Copeland 1938].

Figure 3-ch.1. Whittaker’s five kingdoms. From [Whittaker 1969].


The first molecular studies aimed at determining the evolutionary relationships among eukaryotes date back to the mid-1980s and relied principally on the small subunit ribosomal RNA (SSU rRNA) [Sogin et al. 1986; Friedman et al. 1987; Sogin et al. 1989; Woese et al. 1990; Sogin 1991], although the large subunit (LSU rRNA) also contributed to phylogenetic reconstruction to a lesser extent (e.g., [Perasso et al. 1989]). These pioneering studies all showed a handful of deeply diverging protist lineages (e.g., , , ) emerging progressively from the distant prokaryotic root, followed by a densely branched “crown” containing most eukaryotic diversity (Figure 4-ch.1). From these early molecular analyses, evolutionists drew the following principal conclusions:

1) Because eukaryotic SSU rRNA sequences were so divergent, the deepest eukaryotic branches seemed to exceed the depth of branching within the entire prokaryotic world [Sogin et al. 1986].

2) Consequently, eukaryotes were thought to have become distinct very early in the , and to be likely as old as the eubacteria and archaebacteria [Sogin et al. 1989].

3) The lowermost lineages of the eukaryotic tree were usually simple organisms, most of them living as parasites within animal hosts and, importantly, lacking , in particular the [Friedman et al. 1987]. This notion of primitive amitochondrial eukaryotes (the “Archezoa” hypothesis [Cavalier-Smith 1989]) had in fact been discussed before the publication of the molecular phylogenies that wrongly appeared to support it. It postulated that mitochondria-lacking eukaryotes had diverged before the acquisition of mitochondria through endosymbiosis and had evolved under anaerobic conditions. So when the first SSU rRNA trees including such organisms were published, specifically showing them branching deeper than any previously known eukaryotic sequences [Friedman et al. 1987; Sogin 1989; Sogin et al. 1989], the general consensus converged on the view that these amitochondrial lineages were genuinely primitive, relics of an ancient world devoid of oxygen (Figure 4-ch.1).

4) The apical part of the SSU rRNA tree, the so-called crown, contained major clades that branched near a common point, as if they had diverged nearly simultaneously [Sogin 1991]. These included, among others, the animals, fungi, plants, and diverse protist lineages that now form the grouping. Because the branching pattern among these groups could not be resolved, it was suggested that they originated in a massive radiation [Knoll 1992]. The same lack of a clear order of divergence among crown groups was soon also found with several protein markers [Baldauf 1999; Hirt et al. 1999; Pawlowski et al. 1999; Roger et al. 1999; Moreira et al. 2000], leading some to propose the “big-bang” hypothesis [Philippe et al. 2000a], which postulated that most eukaryotic phyla emerged within a relatively short period of time, so that too little phylogenetic signal could accumulate in the sequences to resolve their order.

Figure 4-ch.1. A typical SSU rRNA tree of eukaryotes, as it was being published in the mid-nineties. Plastid-bearing lineages are indicated in colors approximating their respective pigmentation. From [Embley and Martin 2006].

In the 1990s, as more and more species were sequenced, intermediate groups appeared between the Archezoa members and the eukaryotic crown. Like the amitochondrial species, these newcomers were characterized by a high rate of evolution, producing long branches in phylogenetic reconstructions. A classical example is the Foraminifera, whose SSU and LSU rRNA sequences both placed them in an intermediate position in the tree [Pawlowski et al. 1994; Pawlowski et al. 1996].

Interestingly, this view of the eukaryotic tree relied almost entirely on a single molecular marker (the SSU rRNA, although, as mentioned above, a few others had started to be used), and gene trees were largely equated with the organismal tree. Unfortunately, this marker proved to be highly saturated by multiple substitutions at the eukaryotic level, with very variable evolutionary rates between species [Philippe and Laurent 1998]. Because such rate heterogeneity makes phylogenetic inference prone to the Long Branch Attraction (LBA) artifact, in which two distant species with fast-evolving sequences are erroneously grouped together [Felsenstein 1978], the SSU rRNA topology became highly suspect [Embley and Hirt 1998; Philippe and Laurent 1998]. Furthermore, two other important requirements for accurate phylogenetic inference were not met: a well-sampled diversity of species and the use of appropriate tree reconstruction methods [Hendy and Penny 1989; Lecointre et al. 1993; Huelsenbeck 1997; Brinkmann et al. 2005]. Indeed, taxon sampling in the early days of molecular systematics was rather sparse, often with a single representative per major lineage, which increased sensitivity to LBA by leaving the basal long branches unbroken. Likewise, simplistic methods for inferring phylogenies (distance-based or parsimony), together with unrealistically simplified models of evolution, seriously hampered the resolution of the eukaryotic tree.
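To make the notion of saturation concrete, the short Python sketch below uses the Jukes-Cantor (JC69) substitution model. This model is our choice for illustration only and is not claimed to be the model used in the studies cited. Under JC69, the expected proportion p of differing sites between two nucleotide sequences separated by d substitutions per site is p = 3/4 (1 - exp(-4d/3)), and the corrected distance is d = -3/4 ln(1 - 4p/3). As d grows, p creeps towards 0.75, so very different true distances yield almost the same observed divergence, which is the kind of signal erosion that makes deep SSU rRNA branches hard to estimate.

```python
import math


def jc69_expected_p(d: float) -> float:
    """Expected proportion of differing sites under JC69 for d substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """JC69-corrected distance from an observed proportion p of differing sites.

    The correction is undefined once p reaches 0.75 (full saturation),
    so infinity is returned in that case.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Observed divergence flattens out as the true distance increases.
for d in (0.1, 0.5, 1.0, 2.0, 4.0, 8.0):
    p = jc69_expected_p(d)
    print(f"true d = {d:4.1f}   observed p = {p:.3f}")
```

Doubling d from 4 to 8 barely changes p, whereas small sampling errors in an observed p close to 0.75 translate into very large errors in the corrected distance.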

So in the late 1990s the tree of eukaryotes was still based largely on a single molecular marker whose ability to infer phylogenetic relationships at deep taxonomic levels was known to be limited.

1.2.3 Time for deconstruction

Important discrepancies between SSU rRNA trees and a growing number of protein-coding gene phylogenies began to suggest alternative hypotheses for evolutionary relationships within eukaryotes. Besides the diversification of molecular markers, much better (probabilistic) methods and models of evolution, as well as broader taxon sampling, became available. These new topologies, although often incongruent with one another [Philippe et al. 2000b], essentially moved species that had diverged much earlier in SSU rRNA trees into (or close to) the crown. Examples include studies of Microsporidia (α- and β-tubulins [Keeling and Doolittle 1996], or TBP [Fast et al. 1999]), of the slime molds (elongation factor-1α [Baldauf and Doolittle 1997]), and an important work investigating eukaryotes as a whole by combining four proteins (α- and β-tubulins, actin, and elongation factor-1α) [Baldauf et al. 2000]. At the same time, relationships between some major groups were recovered with reasonable statistical support, in some instances also combining data from rare genomic changes such as insertions/deletions (indels). For example, specific associations between animals and fungi [Baldauf and Palmer 1993], between green plants and red algae [Moreira et al. 2000], and a supertaxon grouping the alveolates and stramenopiles [Baldauf et al. 2000] began to appear.

Strikingly, in parallel with these relocations, which moved the supposedly amitochondrial lineages within the eukaryotic crown (thus out of the “primitive” part of the tree), genes derived from the mitochondrial symbiont were progressively discovered in these species that apparently lacked mitochondria [Clark and Roger 1995; Roger et al. 1996; Germot et al. 1997; Roger et al. 1998]. These genes were shown to encode proteins localized in particular organelles, the hydrogenosomes [Bui et al. 1996] and mitosomes [Tovar et al. 1999], which are most likely remnants of mitochondria. Altogether, the most parsimonious explanation is that mitochondria were ancestrally present in all eukaryotes but were repeatedly lost or reduced to small organelles in some lineages, which invalidates the Archezoa hypothesis entirely [Keeling 1998].

An important consequence of relocating these no-longer-primitive eukaryotes within the crown was that the transition between the prokaryotic outgroup and extant eukaryotes was in fact not a progressive transformation involving intermediate forms. The concept of a eukaryotic crown itself became obsolete, since all living eukaryotes actually belong to it, which implied a great reduction of the evolutionary distances between the former “basal” and “apical” lineages. Furthermore, it invalidated the suggestion that eukaryotes are extremely ancient, and implied that their evolutionary diversity is lower than that of prokaryotes.

1.2.4 Groundwork for reconstructing

These profound modifications of the structure of the eukaryotic tree led to the concept that most, if not all, eukaryotic diversity can be assigned to one of several major assemblages: the “supergroups” (Figure 5-ch.1). Reassembling the evolutionary history of eukaryotes was obviously not the result of a single study, but rather of uniting several types of data into one comprehensive picture. Despite the caveats mentioned above, single-gene trees remain a valuable source of information when combined with an appropriate knowledge of potential artifacts, because they are generally built from taxon-rich alignments. Thus, by correctly interpreting several individual trees, one may be able to discern general phylogenetic tendencies. However, as genomic data accumulated, it became possible to assemble larger datasets that in principle contain more phylogenetic signal, opening the way to address further evolutionary questions. Finally, discrete molecular characters such as indels, gene fusions, or gene order have also been useful in reevaluating the eukaryotic tree, as they are independent of phylogenetic reconstruction, although they too must be used with caution because they are not immune to misleading signal [Bapteste and Philippe 2002].

When this working hypothesis was first summarized in a paper, eight supergroups were recognized [Baldauf 2003]. This review was particularly relevant because it accounted for the “true diversity of life”, i.e. the discovery of uncultured minute organisms, nano- or pico-sized, that are scattered across the tree. Since then, reviews have regularly been published updating the tree of eukaryotes with the latest minor modifications, essentially representing the same scheme for eukaryote evolution [Simpson and Roger 2004; Adl et al. 2005; Keeling et al. 2005; Lane and Archibald 2008]. These trees, which are unrooted, all display a basal polytomy with five or six branches representing the supergroups that emerge from a common point, the order of divergence among these groups being very uncertain (Figure 5-ch.1). Importantly, the supergroups hypothesis represents a consensus for the tree of eukaryotes, the most accurate we have so far, but by no means an unshakable scheme. The existence (that is, the monophyletic origin) of most supergroups is still highly debatable. Generally, parts of these hypothesized major assemblages have been reasonably shown to share a common origin, but we currently lack evidence for the supergroups as a whole, including all postulated lineages (this is less true for the opisthokonts, which are usually robustly supported; see [Parfrey et al. 2006] for a broad discussion).

Figure 5-ch.1. One of the numerous schemes depicting the current view of eukaryotic evolution, representing the six hypothesized supergroups of eukaryotes. From [Lane and Archibald 2008].


Below I briefly introduce these six supergroups:

Opisthokonts: This supergroup contains animals and fungi [Cavalier-Smith and Chao 1995], which are thought to have evolved independently from unicellular lineages belonging to the paraphyletic assemblage [Lang et al. 2002; Cavalier-Smith and Chao 2003c; Steenkamp et al. 2006; Ruiz-Trillo et al. 2008; Shalchian-Tabrizi et al. 2008], which is also included in the supergroup. It is putatively united by the presence of a single posterior flagellum in several representatives [Cavalier-Smith and Chao 1995], as well as by abundant molecular evidence (e.g., single genes [Baldauf and Palmer 1993; Wainright et al. 1993], 4 genes [Baldauf et al. 2000], 143 genes [Rodriguez-Ezpeleta et al. 2005], and an amino acid insertion/deletion [Baldauf and Palmer 1993]). It is currently the most reliable supergroup, although some continue to argue instead for a close evolutionary relationship between animals and green plants [Stiller 2007].

Amoebozoa: This supergroup includes mostly amoeboid protists (that is cells with ) such as the classical with lobose pseudopodia, but also slime moulds and some amitochondrial lineages. Evidence for its monophyly, still weak at present, has emerged only recently and is based on single-gene and multigene phylogenies [Baldauf et al. 2000; Bapteste et al. 2002; Fahrni et al. 2003; Smirnov et al. 2005], as well as on a gene fusion in the mitochondrial genomes of the two species investigated [Lonergan and Gray 1996].

Opisthokonts and Amoebozoa are often united in a larger supergroup, Unikonts [Cavalier-Smith 2002], which is supported by several rare genomic changes (see section 1.2.5) [Stechmann and Cavalier-Smith 2002; Stechmann and Cavalier-Smith 2003b; Richards and Cavalier-Smith 2005], as well as by several single-gene phylogenies (e.g., [Baldauf and Palmer 1993]) and a growing number of multigene phylogenies (e.g., [Rodriguez-Ezpeleta et al. 2007a]).

Plantae (or Archaeplastida): This supergroup comprises the three main lineages of primary photosynthetic organisms, i.e. the group in which plastids with two membranes, which first evolved by primary endosymbiosis with a cyanobacterium, are found: glaucophytes, green plants, and red algae. Its monophyly has been generally accepted because a single origin of primary plastids is the most parsimonious explanation [Palmer 2003; Keeling 2004; Mcfadden and van Dooren 2004; Reyes-Prieto et al. 2007; Archibald 2009] (although see [Prechtl et al. 2004; Nowack et al. 2008] for examples of more recent, independent primary endosymbioses), but other views, in particular an earlier divergence of the red algae, are still strongly debated [Nozaki et al. 2003; Nozaki 2005; Nozaki et al. 2007; Stiller 2007; Maruyama et al. 2008]. Genomic data have recently recovered strong support for a monophyletic assemblage in several studies, based on both plastid [Martin et al. 2002; Chu et al. 2004; Hagopian et al. 2004; Rodriguez-Ezpeleta et al. 2005] and nuclear genes [Rodriguez-Ezpeleta et al. 2005; Rodriguez-Ezpeleta et al. 2007a].

Chromalveolata: This supergroup is undoubtedly the most debated, because of the lack of clear and simple evidence supporting it and because of its central role in our understanding of eukaryote evolution. It currently encompasses four diverse groups, mixing phototrophy and heterotrophy: stramenopiles (), cryptomonads, haptophytes (altogether the chromists [Cavalier-Smith 1998a]), and alveolates. This grouping stems from the proposal that plastid acquisitions by secondary endosymbiosis (i.e. involving two eukaryotes) should be rare in evolution, because establishing a protein-targeting system for a nascent plastid is genuinely complex [Cavalier-Smith 1999]. Specifically, the chromalveolate hypothesis postulates that a single secondary endosymbiosis with a red alga took place in the ancestor of all chromalveolates, giving rise to an orthologous plastid in all its descendants. A consequence is that the host lineages must be related, a condition that is generally not satisfied for the group as a whole, even with large alignments (haptophytes and cryptomonads often branch elsewhere in the tree, or are not supported as sister to the rest of the chromalveolates) [Harper et al. 2005; Patron et al. 2007], but see [Hackett et al. 2007]. On the other hand, plastid data often recover a common origin for the photosynthetic members of this supergroup [Yoon et al. 2002; Khan et al. 2007], but this does not rule out the possibility that red plastids were acquired independently via serial endosymbioses, leading to the current situation. Also in favor of the hypothesis are two specific gene duplications that undeniably cluster the plastid-targeted proteins of the chromalveolates [Harper and Keeling 2003; Patron et al. 2004], but the relationships of the cytosolic versions are much more ambiguous.

Rhizaria: This supergroup is the most recently recognized assemblage and is presently defined only on the basis of molecular data, commonly including organisms bearing “root-like reticulose or filose pseudopodia” [Cavalier-Smith 2002; Cavalier-Smith 2003]. In addition to typically amoeboid taxa, Rhizaria also include a large diversity of free-living , amoeboflagellates, and parasitic protists. The first hint of this grouping was a clade formed by the euglyphid and the photosynthetic in an SSU rRNA phylogeny [Bhattacharya et al. 1995]. This clade was later enlarged to include zooflagellate species and the plasmodiophorid parasites [Cavalier-Smith and Chao 1996-1997], leading to the creation of the Cercozoa [Cavalier-Smith 1996-1997]. The next important step was the finding that Cercozoa and Foraminifera are related in actin phylogeny [Keeling 2001]. This unexpected result was later confirmed by the discovery of an insertion of one or two amino acids in polyubiquitin [Archibald et al. 2003a; Bass et al. 2005], and by analyses of the gene for the large subunit of RNA polymerase [Longet et al. 2003] and of SSU rDNA [Berney and Pawlowski 2003]. The taxonomic composition of Cercozoa was progressively expanded to include various zooflagellates [Atkins et al. 2000; Kuhn et al. 2000], gromiids [Burki et al. 2002], testate amoebae [Wylezich et al. 2002], filose and reticulate protists [Nikolaev et al. 2003], and radiolarians [Polet et al. 2004]. Strong support for Rhizaria, comprising all previously included taxonomic groups plus Desmothoracida and Taxopodida, was recovered in a combined analysis of actin and SSU rDNA genes [Nikolaev et al. 2004]. The rhizarian supergroup continues to grow with new inclusions such as the marine ebriids [Hoppenrath and Leander 2006], the amoeboid Corallomyxa [Tekle et al. 2007], the parasitic plasmodial Paradinium [Skovgaard and Daugbjerg 2008], and the soil flagellate Sainouron [Cavalier-Smith et al. 2008].

Excavata: This supergroup is composed of diverse heterotrophic protists, many of which are anaerobic and/or parasitic, characterized in most (but not all) members by a distinctive feeding groove and two flagella [Simpson 2003]. It is tentatively assembled into one monophyletic entity on the basis of combined molecular and morphological data [Simpson 2003], but robust evidence is still lacking, although a recent phylogenomic study recovered moderate support for this supergroup [Hampl et al. 2009].

The last four supergroups described above (Plantae, Chromalveolata, Rhizaria, and Excavata) are often grouped together as the Bikont assemblages [Cavalier-Smith 2002].

1.2.5 Where is the root?

The answer to this central question in our understanding of eukaryotic evolution is… next question please! Indeed, we do not really know at present where the root lies, and all bets are off. The most common way to root a phylogenetic tree is to use an external group (outgroup), which positions the root of the ingroup lineages and thereby gives a direction to evolution. The natural outgroup for the eukaryotic tree consists of prokaryotes, usually archaebacteria [Woese et al. 1990; Woese 2002; Pace 2006], or more precisely the Crenarchaeota lineage, as recently shown [Cox et al. 2008]. Unfortunately, this approach has so far been unsuccessful with current models of evolution, because the very large genetic distances between eukaryotes and their outgroup lead to artifactual placements of the fastest-evolving eukaryotes at the base of the tree [Philippe and Germot 2000; Brinkmann et al. 2005].


An alternative method for rooting a phylogenetic tree relies on complex genetic changes, which are expected to be rare. Today, perhaps the most commonly cited position for the eukaryotic root is between unikonts and bikonts, as deduced from several such rare changes. First, most bikonts tested carry a fusion of TS and DHFR (thymidylate synthase and dihydrofolate reductase), two genes that are separate in unikonts and [Philippe et al. 2000b; Stechmann and Cavalier-Smith 2002; Stechmann and Cavalier-Smith 2003b]. Following a parsimony argument, if this fusion occurred only once and was never reversed, the TS-DHFR fusion would be a shared derived character of the bikonts, implying a root outside this “monophyletic” group. However, this scenario is questionable, notably because few taxa have so far been tested for the presence of the TS-DHFR fusion [Embley and Martin 2006], some of the putative basal eukaryotic lineages (such as and [Arisue et al. 2005]) lack these genes, and fusion-bearing species seem to be phylogenetically related to unikonts [Kim et al. 2006]. Yet other characters are consistent with a basal unikont-bikont bifurcation, for example a shared derived duplication of the phosphofructokinase gene in unikonts [Stechmann and Cavalier-Smith 2003b], or a particular type of myosin (type II) specific, again, to unikonts [Richards and Cavalier-Smith 2005]. To add further uncertainty, a puzzling character suggests instead a root within excavates, basal to . Arguing against the unikont-bikont split, the mitochondrial DNA encodes a bacterial-like RNA polymerase that differs from that of all other eukaryotes studied to date [Lang et al. 1997]. Hence, the position of the root of the eukaryotic tree remains an open question, and more rare genomic characters, combined with better phylogenies, are required.
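The parsimony argument behind the TS-DHFR root can be checked mechanically: if a derived character arose once on a tree rooted in the ancestral (unfused) state and was never reversed, the taxa carrying it must form exactly one clade. The Python sketch below encodes rooted trees as nested tuples and tests that condition for two rootings. Both topologies, and the split of excavates into the hypothetical leaves `Excavata_A` and `Excavata_B`, are toy assumptions for illustration, not published trees.

```python
from typing import FrozenSet, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name or a tuple of subtrees


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set of every node (leaves included) of a rooted tree."""
    clades = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            clades.extend(all_clades(child))
    return clades


def single_gain_compatible(tree: Tree, derived: set) -> bool:
    """True if the taxa with the derived state form exactly one clade, i.e. a
    single origin without reversal explains the pattern when the root is ancestral."""
    return frozenset(derived) in all_clades(tree)


fused = {"Plantae", "Chromalveolata", "Rhizaria", "Excavata_A", "Excavata_B"}

# Toy topology rooted between unikonts and bikonts.
unikont_bikont_root = (
    ("Opisthokonta", "Amoebozoa"),
    ("Plantae", ("Chromalveolata", ("Rhizaria", ("Excavata_A", "Excavata_B")))),
)

# Toy topology rooted within excavates: one excavate lineage branches first.
excavate_root = (
    "Excavata_A",
    (("Opisthokonta", "Amoebozoa"),
     ("Plantae", ("Chromalveolata", ("Rhizaria", "Excavata_B")))),
)

print(single_gain_compatible(unikont_bikont_root, fused))  # True
print(single_gain_compatible(excavate_root, fused))        # False
```

A False result does not refute a rooting; it only means that at least one extra event (a second fusion or a reversal) must be invoked, which is why the argument is so sensitive to how many taxa have actually been tested.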

1.2.6 Plastid evolution

Our work, as described in the following chapters, turned out to be intimately related to the fundamental question of plastid evolution. Plants and algae acquired photosynthesis through primary endosymbiosis, in which a free-living prokaryote (related to modern-day cyanobacteria) was engulfed, retained, and integrated by a heterotrophic eukaryote. All members of the supergroup Plantae possess plastids of primary origin, bounded by two membranes of cyanobacterial type [Gould et al. 2008; Archibald 2009]. These plastids subsequently spread across the tree of eukaryotes through secondary endosymbioses (as well as tertiary and serial endosymbioses), involving the uptake of either green or red algal endosymbionts by secondary eukaryotic hosts and resulting in plastids with three or four membranes [Gould et al. 2008; Archibald 2009]. Lineages harboring secondary plastids are found in three different supergroups: Excavata (green plastids), Rhizaria (green plastids), and Chromalveolata (red plastids).


It is generally accepted that primary endosymbiosis occurred only once in evolutionary history [Reyes-Prieto et al. 2007]. By contrast, how many times secondary endosymbiosis took place is uncertain, because it has not yet been possible to infer robust phylogenetic relationships among all secondarily photosynthetic species. It is widely recognized that two independent events explain the green plastids of euglenids (Excavata) and chlorarachniophytes (Rhizaria) [Rogers et al. 2007]. However, the number of secondary endosymbioses that led to the red plastid distribution we observe today is much more debated. One hypothesis is that all secondary red plastids derive from a single endosymbiosis: this is known as the chromalveolate hypothesis (see section 1.2.4) [Cavalier-Smith 1999]. But because heterotrophic species are present in most photosynthetic clades, photosynthesis (or perhaps even plastids) must then have been lost in multiple lineages to explain the current patchy distribution. Thus, at face value, multiple independent acquisitions of secondary red plastids remain a valid alternative hypothesis [Sanchez-Puerta and Delwiche 2008; Archibald 2009; Bodyl et al. 2009].
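The trade-off between one plastid gain followed by several losses and several independent gains can be made explicit with weighted (Sankoff) parsimony, in which a gain and a loss receive different costs. In the Python sketch below, the tree, the presence/absence pattern, and the cost values are all hypothetical; the point is only that the preferred scenario changes depending on how much more costly a gain is assumed to be than a loss, which is the crux of the argument that secondary endosymbioses should be rare.

```python
import math
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name or a tuple of subtrees
STATES = (0, 1)  # 0 = no secondary red plastid, 1 = plastid present


def sankoff(tree: Tree, leaf_states: Dict[str, int], cost) -> Dict[int, float]:
    """Minimum cost of the subtree for each possible state at its root node."""
    if isinstance(tree, str):
        return {s: 0.0 if leaf_states[tree] == s else math.inf for s in STATES}
    child_tables = [sankoff(child, leaf_states, cost) for child in tree]
    return {
        s: sum(min(cost[s][t] + table[t] for t in STATES) for table in child_tables)
        for s in STATES
    }


def cost_matrix(gain: float, loss: float):
    """cost[parent_state][child_state]; staying in the same state is free."""
    return {0: {0: 0.0, 1: gain}, 1: {0: loss, 1: 0.0}}


# Hypothetical tree: two plastid-lacking outgroups and five ingroup lineages.
tree = (("out1", "out2"), (("A", "B"), ("C", ("D", "E"))))
states = {"out1": 0, "out2": 0, "A": 1, "B": 0, "C": 1, "D": 0, "E": 1}

for gain, loss in ((1.0, 1.0), (5.0, 1.0), (1.0, 5.0)):
    # Root fixed in state 0: the ancestral eukaryote lacked a secondary plastid.
    total = sankoff(tree, states, cost_matrix(gain, loss))[0]
    print(f"gain={gain}, loss={loss}: minimum cost = {total}")
```

With a gain five times as costly as a loss, the cheapest reconstruction is a single gain at the base of the ingroup followed by two losses (cost 7). With equal costs several scenarios tie at 3, and when losses are weighted more heavily, three independent gains become the cheapest explanation.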

1.3 Phylogenomics

1.3.1 A definition

Phylogenomics is a ten-year-old discipline of evolutionary biology that has followed the increasing availability of complete genomes and other genomic data (mainly expressed sequence tags, ESTs). It was formalized when Eisen [Eisen 1998] coined the term phylogenomics to describe an approach combining comparative genomics with evolutionary information. He later proposed a short, general definition that captures very well what phylogenomics is about: the “intersection of evolution and genomics” [Eisen and Fraser 2003]. The main idea behind it is that studying genomes alone, without an evolutionary perspective, greatly limits the potential of such research.

Since these initial definitions, the scope of phylogenomics has broadened and now includes two main fields:

1) The prediction of molecular function by homology, through the inference of the evolutionary processes underlying the appearance of protein families, integrating experimental data into these computer-based analyses (reviewed in [Sjölander 2004]).

2) The inference of species relationships using genomes and genomic data.

I will explain this second aspect in more detail here, as it represents in my opinion the true phylogenomic approach (i.e., phylogeny + genomics) and is the approach I used to tackle the different questions asked in this work. It is hard to set a precise threshold above which a dataset can be called “a phylogenomic dataset”, but for the sake of simplicity I will consider studies analyzing more than 50 genes to be phylogenomic in nature. I will describe the caution required when employing phylogenomics, and how it can help in reconstructing the phylogeny of species.

1.3.2 How does phylogenomics work?

Generally, two types of approaches have been explored for inferring phylogenies with phylogenomic data [Delsuc et al. 2005] (Figure 6-ch.1):

Figure 6-ch.1. Methods of phylogenomic inference. From [Delsuc et al. 2005].

Sequence-based methods: As in any phylogenetic reconstruction, a crucial preliminary step must be performed carefully to ensure that the characters reflect “vertical transmission”: homology assessment. This is not an easy task, because genes of interest for inferring evolutionary history are often poorly sampled, so deciding between orthology and paralogy is not always possible (for example, it may be impossible to distinguish independent gene losses from genes that were simply not sampled in some species). Similarly, horizontally (or laterally) transferred genes (HGT) can be difficult to pinpoint with a limited taxon selection, yet they are a potential source of misleading signal [Beiko and Ragan 2008; Keeling and Palmer 2008]. But because the starting material for building large alignments is usually datasets of at least a thousand ESTs, finding a sufficient number of genes whose orthology can be established is nevertheless feasible. Moreover, precisely because phylogenomics deals with huge amounts of data, it is fair to assume that even if a little undetected paralogy or HGT remains after careful checks, it is unlikely to dominate the genuine phylogenetic signal [Lake and Rivera 2004].

The method of choice for assembling a set of orthologous genes is to infer a phylogenetic tree from every single-gene alignment, which requires each gene to be aligned and unambiguously aligned positions to be selected. This is much more precise than a simpler and faster selection of species/sequences based solely on BLAST results [Altschul et al. 1990], which is known to be unreliable. Indeed, the BLAST algorithm does not take evolutionary information into account, so the genes that appear most similar based on BLAST hits are often not each other's closest relatives in terms of phylogeny, leading to false-positive inclusions of species [Koski and Golding 2001].
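The pitfall of similarity-based selection can be illustrated with a toy tree: when a query's true sister evolves fast, a slower but more distant relative may look more similar. The Python sketch below uses patristic (path-length) distances on a hypothetical three-taxon tree as a stand-in for sequence divergence; it does not run BLAST, and the taxon names and branch lengths are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical rooted tree, stored as child -> (parent, branch length).
# Topology: ((Q:0.1, F:1.5)n1:0.1, S:0.3)root; F is Q's fast-evolving sister.
PARENT: Dict[str, Tuple[str, float]] = {
    "Q": ("n1", 0.1),
    "F": ("n1", 1.5),
    "n1": ("root", 0.1),
    "S": ("root", 0.3),
}
LEAVES: List[str] = ["Q", "F", "S"]


def distances_to_ancestors(node: str) -> Dict[str, float]:
    """Path length from node to itself and to each of its ancestors."""
    dist, out = 0.0, {node: 0.0}
    while node in PARENT:
        node, length = PARENT[node]
        dist += length
        out[node] = dist
    return out


def patristic(a: str, b: str) -> float:
    """Sum of branch lengths on the path between leaves a and b."""
    up_from_a = distances_to_ancestors(a)
    node, dist = b, 0.0
    while node not in up_from_a:  # climb from b until reaching an ancestor of a
        node, length = PARENT[node]
        dist += length
    return dist + up_from_a[node]


def sisters(leaf: str) -> List[str]:
    """Leaves sharing the same parent node as leaf."""
    parent = PARENT[leaf][0]
    return [x for x in LEAVES if x != leaf and PARENT[x][0] == parent]


most_similar = min((x for x in LEAVES if x != "Q"), key=lambda x: patristic("Q", x))
print("most similar to Q:", most_similar)  # S (path length 0.5)
print("sister of Q:", sisters("Q"))        # ['F'] (path length 1.6)
```

A similarity-only criterion would pair Q with S, whereas the tree itself shows that F is Q's sister, which is why inspecting each single-gene tree is preferred.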

After this first and important step, two options are conceivable: the supermatrix or the supertree approach [Delsuc et al. 2005]. The supermatrix approach consists of concatenating all selected single genes into one super-alignment and submitting it to classical phylogenetic reconstruction methods (the most reliable at present being the probabilistic methods, Maximum Likelihood [Felsenstein 1981] and Bayesian inference [Huelsenbeck et al. 2001]). The strategy generally applied is to treat the concatenated sequence as one “gene”, ignoring the evolutionary specificities of each constituent gene [Philippe et al. 2004; Philippe et al. 2005; Rodriguez-Ezpeleta et al. 2005; Rokas et al. 2005; Wiens 2005; Delsuc et al. 2006; Patron et al. 2007; Rodriguez-Ezpeleta et al. 2007a]. It presumes that the discordant histories, if any, contained in individual genes will be averaged away by the combined analysis of numerous characters. Another strategy, tested to a lesser extent, is to allow a different set of parameters for each gene in order to describe their different tempos and modes of evolution more adequately [Bapteste et al. 2002; Philippe et al. 2004; Patron et al. 2007], but the results were generally not significantly different from those of the “cruder” approach above, calling its real utility into question.
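The concatenation step can be sketched in a few lines of Python. The sketch assumes that each gene is already aligned and stored as a dictionary mapping taxon names to equal-length sequences; taxa absent from a gene are padded with `?` (missing data), and the function also returns per-gene partition boundaries, which is what a per-gene (partitioned) model would need. The function name, gene names, and taxa are hypothetical toy examples.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon -> aligned sequence


def concatenate(
    alignments: Dict[str, Alignment],
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments into a supermatrix.

    Returns the supermatrix (taxon -> sequence) and a list of
    (gene, first_column, last_column) partitions, 1-based and inclusive.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene in sorted(alignments):
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not of equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa not sampled for this gene are filled with missing data.
            pieces[taxon].append(aln.get(taxon, "?" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, partitions


genes = {
    "actin": {"A": "MKT", "B": "MKS"},
    "tubulin": {"A": "RE", "C": "RD"},
}
matrix, parts = concatenate(genes)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(parts)
```

Keeping the partition boundaries costs nothing and leaves open the choice between the single-model (“one gene”) analysis and a per-gene model.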

Because each gene in the concatenation is in principle sampled for its own set of species, depending on sequence availability, missing entries are the rule, and they are generally not evenly distributed (some species have many missing positions, others have nearly none). Potentially, they could drastically lower the resolution of a tree or induce artifacts due to model violations. Missing data occur even when complete genomes are available, because genes can be independently lost, duplicated, or horizontally transferred. This feature of phylogenomic alignments could be a serious drawback, making the discipline attractive in theory but impractical. Fortunately, empirical and simulation studies have shown that the percentage of missing data can actually be high, up to 90%, and yet the overall signal remains [Wiens 2003; Driskell et al. 2004; Philippe et al. 2004]. This is especially true in a phylogenomic context, as the number of sites present in a large concatenated alignment remains high even for species with a lot of missing data. Furthermore, it seems that adding even incomplete taxa is beneficial and improves phylogenetic accuracy by breaking long branches [Wiens 2005]. However, in my opinion this issue has not been investigated thoroughly enough, and specific questions, such as the influence of the distribution of missing data (in species evolutionarily close to, or distant from, species with no missing data), still need to be addressed. Otherwise, there is a risk that the supposedly weak influence of missing data will be taken at face value, so that highly incomplete species would not be treated with the necessary caution.
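Because the uneven distribution of missing data is the concern raised here, a first practical check is simply to measure it per taxon. The Python sketch below assumes a supermatrix stored as a dictionary of equal-length strings and counts both `?` and the gap character `-` as missing, which is a simplifying assumption (many programs treat gaps as missing data, but not all). The 0.5 flagging threshold and the function names are hypothetical.

```python
from typing import Dict, List


def missing_fraction(supermatrix: Dict[str, str], missing: str = "?-") -> Dict[str, float]:
    """Fraction of positions per taxon that are missing ('?') or gaps ('-')."""
    return {
        taxon: sum(ch in missing for ch in seq) / len(seq)
        for taxon, seq in supermatrix.items()
    }


def flag_incomplete(supermatrix: Dict[str, str], threshold: float = 0.5) -> List[str]:
    """Taxa whose missing fraction exceeds a (hypothetical) threshold."""
    fractions = missing_fraction(supermatrix)
    return sorted(t for t, f in fractions.items() if f > threshold)


toy = {"A": "MKTRE", "B": "MKS??", "C": "???RD"}
print(missing_fraction(toy))
print(flag_incomplete(toy))
```

Reporting these fractions alongside the tree, and checking whether the most incomplete taxa are also those whose position changes between analyses, is one simple way to avoid taking the weak influence of missing data at face value.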

The second sequence-based approach is the supertree, which differs from the supermatrix in that it combines the trees generated individually from the single genes, not the single genes themselves. In practice this method has barely been employed [Philip et al. 2005; Fitzpatrick et al. 2006]. Comparative studies of the efficiency of the supermatrix and supertree approaches, especially in a phylogenomic context, are needed to shed light on the benefits and disadvantages of each. Until then, phylogenomics will likely rely almost entirely on the supermatrix approach, owing to the much greater experience accumulated with it.
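Supertree methods need a way to merge trees whose taxon sets overlap only partially. One widely used technique, not named in the text above, is matrix representation with parsimony (MRP). Every clade of every input tree becomes a binary character, coded 1 for its members, 0 for the other taxa of that tree, and ? for taxa the tree does not contain. The sketch below builds such a matrix from rooted trees written as nested Python tuples (a hypothetical input format). The parsimony search on the resulting matrix is not shown.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name, or a tuple of subtrees


def leaf_set(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of all internal nodes except the root."""
    found: List[FrozenSet[str]] = []

    def walk(node: Tree, is_root: bool) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child, False) for child in node))
        if not is_root and len(members) > 1:
            found.append(members)
        return members

    walk(tree, True)
    return found


def mrp_matrix(trees: List[Tree]) -> Dict[str, str]:
    """Encode each clade of each input tree as one binary character."""
    taxa = sorted(frozenset().union(*(leaf_set(t) for t in trees)))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for tree in trees:
        present = leaf_set(tree)
        for clade in clades(tree):
            for taxon in taxa:
                if taxon in clade:
                    rows[taxon].append("1")
                elif taxon in present:
                    rows[taxon].append("0")
                else:
                    rows[taxon].append("?")  # taxon not sampled in this tree
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


# Two gene trees with partially overlapping taxa
matrix = mrp_matrix([(("A", "B"), ("C", "D")), (("A", "C"), "E")])
```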

Whole-genome feature methods: These relatively new methods do not directly rely on multiple-sequence alignments and generally cannot be applied to incomplete genomic sampling. They hold great promise for the near future as valuable, independent and complementary means of testing phylogenetic trees, once complete genomes become available for a larger diversity of organisms and their implementation has been improved. Because they are based on entire genomes, one can assume that this kind of data truly reflects organismal evolution, or at least approximates it better than single- or even multiple-gene phylogenies. Moreover, the events under investigation here, such as alterations of gene order or gene content in a species, are supposed to be extremely rare [Rokas and Holland 2000], and thus largely insensitive to homoplasy.

Looking at gene order or gene content amounts to considering each chromosome as a linear (or circular, for example in the case of mitochondria or genomes) ordering of genes [Moret and Warnow 2005], from which evolutionary relationships are putatively inferred. These methods use the number of shared orthologous genes between genomes as a similarity measure [Korbel et al. 2002]. In its most simplistic application, the similarity between two species is calculated as the number of genes they have in common divided by their total number of genes. Evolutionary distances are thus interpreted in terms of events such as the acquisition and loss of genes [Snel et al. 1999]. More sophisticated approaches have also been developed, such as parsimony algorithms [Fitz-Gibbon and House 1999; House and Fitz-Gibbon 2002] or statistical frameworks [Gu and Zhang 2004; Larget et al. 2005a; Larget et al. 2005b]. However, they have generally shown a limited capacity to resolve phylogenies because they cannot correctly handle saturation.
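The simplest gene-content measure described above can be sketched as follows. Gene sets are represented as sets of hypothetical orthologous-group identifiers. Because "total number of genes" can be read in more than one way, the function supports two normalizations: by the smaller gene set, or by the union of both sets (the latter gives the Jaccard index). Which convention a given study used should be checked against that study.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def gene_content_distance(genes_a: Set[str], genes_b: Set[str], normalize: str = "min") -> float:
    """1 - (number of shared gene families / normalizing gene count)."""
    shared = len(genes_a & genes_b)
    if normalize == "min":
        denominator = min(len(genes_a), len(genes_b))
    elif normalize == "union":
        denominator = len(genes_a | genes_b)
    else:
        raise ValueError("normalize must be 'min' or 'union'")
    if denominator == 0:
        raise ValueError("gene sets must not be empty")
    return 1.0 - shared / denominator


def distance_matrix(genomes: Dict[str, Set[str]], normalize: str = "min") -> Dict[Tuple[str, str], float]:
    """Pairwise gene-content distances for every pair of genomes."""
    return {
        (a, b): gene_content_distance(genomes[a], genomes[b], normalize)
        for a, b in combinations(sorted(genomes), 2)
    }


# Hypothetical orthologous-group identifiers
genome_x = {"OG1", "OG2", "OG3", "OG4"}
genome_y = {"OG1", "OG2", "OG5"}
```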

Another method is the sequence-string distribution approach. It converts into distances the observed frequencies of short oligonucleotides or oligopeptides, compared with a theoretical, completely random usage of these strings [Deschavanne et al. 1999; Edwards et al. 2002; Pride et al. 2003; Qi et al. 2004]. This method has not been extensively tested, and so far no model of evolution is available to explain the observations. It seems to be particularly prone to saturation of the signal, and is thus of limited use for deep evolutionary questions [Pride et al. 2003]. Nevertheless, it is worth investigating further, notably because it is one of the few methods that do not require prior identification of orthologs.
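A toy version of the sequence-string idea works in three steps. First, count all k-mers. Second, compare each count with the value expected if every k-mer were used equally often. Third, measure distances between the resulting deviation profiles. The uniform expectation, the log-ratio transform with a pseudocount of 1, and the Euclidean distance are assumptions made for illustration. The cited methods differ in their details.

```python
from itertools import product
from math import log, sqrt
from typing import List


def kmer_deviation_profile(seq: str, k: int = 2, alphabet: str = "ACGT") -> List[float]:
    """Log ratio of observed to expected k-mer counts under uniform random usage."""
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing ambiguous characters
            counts[word] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence too short or without valid k-mers")
    expected = total / len(words)
    # a pseudocount of 1 keeps the logarithm defined for unobserved words
    return [log((counts[w] + 1) / (expected + 1)) for w in words]


def string_distance(seq_a: str, seq_b: str, k: int = 2) -> float:
    """Euclidean distance between the deviation profiles of two sequences."""
    pa = kmer_deviation_profile(seq_a.upper(), k)
    pb = kmer_deviation_profile(seq_b.upper(), k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))
```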

1.3.3 Stochastic and systematic errors

Whereas some aspects of phylogenomics urgently require validation and further improvement (supertree, whole-genome methods), the supermatrix approach has been well tested and builds on more than 20 years of molecular phylogenetics, using the very same fundamental principles. This means that the classical problems in tree reconstruction are known. Importantly, they are often exacerbated in a phylogenomic context, so one has to be particularly careful when inferring phylogenomic trees. What are these problems?

Two types of errors are recognized in phylogenetic reconstruction, and they are the same in phylogenomics: stochastic (or sampling) errors and systematic errors [Delsuc et al. 2005]. To understand the issues here, one can visualize molecular phylogeny as a balance. On one side is the true phylogenetic signal, which carries the evolutionary history. On the other side are two components: the lack of signal, which results from the noise generated by homoplastic positions in the data (stochastic errors), and the unavoidable non-phylogenetic signal, which results from model misspecifications (systematic errors). This misleading side of the balance is thus the combined result of stochastic and systematic errors. The goal of a phylogenetic inference is logically to minimize the effects of the non-phylogenetic signal and, at the same time, to maximize the phylogenetic signal.

Stochastic errors arise when the number of positions in an alignment is small. In that case, the random background noise, which is inevitably present and accumulates through time, has a non-negligible effect relative to the positions that bear real information. This is typically the case for most single-gene phylogenies that address ancient relationships, which often yield poorly resolved trees. The obvious way to overcome this problem is to increase the amount of data, in the hope that enough phylogenetic signal will be recovered (i.e., synapomorphies will dominate homoplasy). In theory this works: if extremely large datasets are available, bootstrap support should in principle reach its maximum, completely resolving evolutionary trees [Gee 2003].

However, one has to keep in mind that a fully resolved tree, with maximum statistical support for all nodes, is not necessarily correct. Tree reconstruction artifacts may be at work and can also produce strong support (see for example [Rokas and Carroll 2005; Jeffroy et al. 2006]). This apparent contradiction is due to model violations that, by incorrectly explaining the data, lead to systematic errors [Jermiin et al. 2004; Rodriguez-Ezpeleta et al. 2007b]. Contrary to stochastic errors, systematic errors do not tend to vanish as data are added, because the misleading signal does not average out over numerous sites. If this signal is strong enough to prevail over the phylogenetic signal, well supported but nonetheless wrong trees are produced. Observing an increase in phylogenetic resolution as larger datasets are analyzed is therefore not a reliable criterion of accuracy. When not correctly handled by probabilistic models, several evolutionary features can cause systematic errors, among others:
- heterogeneity of nucleotide or amino acid composition, which tends to cluster species sharing the same composition incorrectly [Galtier and Lobry 1997; Foster 2004; Jermiin et al. 2004];
- variation of substitution rates across sites, which leads to the LBA artifact [Felsenstein 1978; Yang 1994];
- heterotachy, that is, the variation of the evolutionary rate at a given position through time [Lopez et al. 2002; Kolaczkowski and Thornton 2004].

More generally, any signal resulting from convergence in unrelated lineages will affect the correct estimation of the real phylogeny [Ho and Jermiin 2004].
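The contrast between stochastic and systematic errors can be illustrated with a deliberately simplified simulation. Each site independently does one of three things: it supports the true grouping with probability `p_true`, it supports an artifactual grouping with probability `p_artifact`, or it supports neither. The artifactual sites stand in for signal that a misspecified model misinterprets. Bootstrap support is the fraction of resampled datasets in which the true grouping receives more sites. This is a toy model of sites "voting", not a phylogenetic likelihood analysis.

```python
import random


def bootstrap_support(n_sites: int, p_true: float, p_artifact: float,
                      replicates: int = 100, seed: int = 1) -> float:
    """Fraction of bootstrap replicates in which the true grouping wins (toy model)."""
    rng = random.Random(seed)
    p_none = 1.0 - p_true - p_artifact
    # simulate the observed alignment: one "vote" per site
    sites = rng.choices(["true", "artifact", "none"],
                        weights=[p_true, p_artifact, p_none], k=n_sites)
    wins = 0
    for _ in range(replicates):
        resample = rng.choices(sites, k=n_sites)  # resample sites with replacement
        if resample.count("true") > resample.count("artifact"):
            wins += 1
    return wins / replicates
```

Consider calling `bootstrap_support(n, 0.35, 0.30)` and `bootstrap_support(n, 0.30, 0.35)` for increasing `n`. Support for the true grouping should approach 1 in the first case and 0 in the second. In this toy setting, more data removes stochastic error but only entrenches a systematic bias.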

1.3.4 So: can phylogenomics answer important questions?

From what has just been said, one may have the impression that phylogenomics does not really overcome the issues of single-gene phylogenies. But it does. It is in fact potentially very powerful for answering difficult evolutionary questions, and I hope that by the end of this manuscript you will be convinced. The important point is that the known sources of systematic errors have to be reduced as much as possible. The large size of the datasets naturally weakens stochastic noise, but it does not weaken systematic errors. Once systematic errors are reduced, the resolution of trees is expected to increase, giving reliable support for real nodes.


Several approaches have been proposed to reduce the impact of systematic errors. When some lineages evolve faster than the mean rate, they accumulate more multiple substitutions. The obvious way to avoid an artifactual grouping of these species is to discard them prior to the analysis (e.g., [Stefanović et al. 2004; Philippe et al. 2005]). A somewhat finer method is to selectively eliminate fast-evolving characters rather than a complete branch [Brinkmann and Philippe 1999]. This retains as many species as possible while removing the positions most likely to be saturated. This approach is not well suited to single-gene phylogenies, because too little signal remains after the removal of rapidly evolving sites. In phylogenomics, however, it has proven helpful [Delsuc et al. 2005; Rodriguez-Ezpeleta et al. 2007b]. An intermediate possibility is to successively remove whole genes based on their rate of evolution [Brinkmann et al. 2005]. Instead of trying to reduce systematic errors through data exclusion, phylogenies can also be improved by more sophisticated models of evolution. For example, a recently developed model, the CAT model [Lartillot and Philippe 2004], explains the heterogeneity of the amino acid replacement process better than classical homogeneous models. It gave promising results [Lartillot and Philippe 2008] and emphasized the urgent need for more realistic models. Finally, another important way of limiting the misleading effect of multiple substitutions is to include as many species as possible in a tree reconstruction, as long as they do not carry non-phylogenetic signal [Graybeal 1998; Mitchell et al. 2000; Lin et al. 2002]. This breaks long branches and allows the models to detect saturation that would otherwise remain hidden with a poorer taxon sampling.
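Removing the fastest-evolving positions can be sketched as below. The actual slow-fast method of Brinkmann and Philippe (1999) estimates site rates from the number of changes within predefined monophyletic groups. This sketch replaces that with a much cruder proxy, the number of distinct residues per column. It then simply drops a chosen fraction of the most variable columns.

```python
from typing import Dict, List


def site_variability(alignment: Dict[str, str], ignore: str = "-?X") -> List[int]:
    """Crude rate proxy per column: number of distinct residues minus one."""
    seqs = list(alignment.values())
    scores = []
    for i in range(len(seqs[0])):
        states = {s[i] for s in seqs if s[i] not in ignore}
        scores.append(max(len(states) - 1, 0))
    return scores


def remove_fastest_sites(alignment: Dict[str, str], fraction: float) -> Dict[str, str]:
    """Drop the given fraction of columns with the highest variability score."""
    scores = site_variability(alignment)
    n_remove = round(len(scores) * fraction)
    # stable sort: among equally variable columns, earlier ones are removed first
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    drop = set(ranked[:n_remove])
    return {taxon: "".join(c for i, c in enumerate(seq) if i not in drop)
            for taxon, seq in alignment.items()}


aln = {"A": "MKVL", "B": "MRVI", "C": "MSVF"}
slow_sites_only = remove_fastest_sites(aln, 0.5)  # keeps only the invariant columns here
```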

Nowadays, the phylogenomic approach cannot be ignored when inferring evolutionary relationships. This is particularly true for ancient divergences. As explained above, the time elapsed since these splits is so long that the phylogenetic signal contained in a few molecular markers has often been erased by the random noise that accumulates naturally through evolution. The first applications of this approach have already yielded promising results, and phylogenetic relationships have been proposed or confirmed. For example, it was shown that a single endosymbiotic event gave rise to the primary photosynthetic eukaryotes [Rodriguez-Ezpeleta et al. 2005]. The monophyly of a group formed by two important lineages of eukaryotes bearing secondary plastids of red algal origin, haptophytes and cryptomonads, was also demonstrated [Patron et al. 2007]. Within Metazoa, phylogenomic analyses strongly supported the monophyly of Ecdysozoa and Lophotrochozoa [Philippe et al. 2005] and a sister relationship between tunicates and vertebrates [Delsuc et al. 2006]. The origin of animals from their unicellular ancestors has also been addressed [Ruiz-Trillo et al. 2008; Shalchian-Tabrizi et al. 2008]. At a broader scale, a well-resolved tree of eukaryotes was proposed based on the analysis of 143 proteins [Rodriguez-Ezpeleta et al. 2007a]. It included most, but not all, important assemblages.

Our own work, as we shall see in the following chapters, allowed us to confirm the monophyly of Rhizaria (chapter 3) and to confidently resolve several ancient divergence points in the evolutionary history of eukaryotes. In particular, it supported a very robust relationship between Rhizaria and two main clades of the supergroup chromalveolates, stramenopiles and alveolates (chapter 4). It also supported a tree that displays only three stems at its deepest level (chapter 5): two highly supported megagroups, which enclose the vast majority of eukaryotic species, and the excavates.

Phylogenomics is also a valuable alternative for particular species that could not be positioned within the tree of eukaryotes by either morphological studies or single-gene phylogenies. Recent examples include the placement of the green flagellate Mesostigma viride in the Streptophyta [Rodriguez-Ezpeleta et al. 2007c] and the inference of the evolutionary origin of the breviate amoeba [Minge et al. 2009]. We investigated the phylogenetic position of two of the main ‘orphan’ groups of eukaryotes, Telonemia and Centroheliozoa. This project is described in chapter 6.

1.3.5 The case of EGT in the context of phylogenomics

Endosymbiotic gene transfer (EGT) partly explains why all known organelles of endosymbiotic origin, in particular the photosynthetic plastids, encode only a small fraction of the genes present in the free-living organisms that were engulfed [Lane and Archibald 2008]. A proportion of the genes has been transferred to the host nucleus. These genes are translated in the host cytoplasm, and the resulting proteins are targeted to the plastid by a dedicated protein import machinery [Soll and Schleiff 2004]. The proportion of transferred genes varies among species but is generally substantial. For example, it has been reported that the nuclear genome of the primary photosynthetic species Arabidopsis thaliana harbors a surprisingly high percentage of genes (18%) derived from the cyanobacterial progenitor of plastids [Martin et al. 2002]. Lower proportions have been observed in Chlamydomonas reinhardtii (6%) [Moustafa and Bhattacharya 2008], Cyanophora paradoxa (11%) [Reyes-Prieto et al. 2006], and Cyanidioschyzon merolae (13%) [Sato et al. 2005], but overall they remain very substantial. Species with secondary plastids also contain important fractions of genes that were likely transferred to the nuclear genome of the heterotrophic host cell. These genes came both from the plastid and from the nuclear genome of the endosymbiont. Among secondary algae with complete genomes available, the diatom Thalassiosira pseudonana encodes 9% of plant and algal genes and more than 2% of cyanobacterial genes [Armbrust et al. 2004]. Very recently, the genome of another diatom (Phaeodactylum tricornutum) was released and shown to contain 1.6% of red algal genes [Bowler et al. 2008]. Strikingly, even non-photosynthetic organisms related to plastid-bearing lineages possess genes of putative red algal or cyanobacterial ancestry. Such genes have been found in the oomycete Phytophthora ramorum [Tyler et al. 2006] and in ciliates [Reyes-Prieto et al. 2008].

So why is this massive movement of genes relevant in the context of phylogenomics? All eukaryotes are chimeras, through the endosymbiosis that gave rise to mitochondria. However, as we have just described, plants and primary algae are particularly complex mixtures of diverse origins. Secondary (and tertiary) algae are even more complicated. Yet these endosymbiotic events have created an important part of the eukaryotic biodiversity we observe today. They are characteristic of lineages belonging to three different supergroups and were perhaps even at the origin of the chromalveolates, so they cannot be ignored when inferring phylogenetic trees for many groups. When single genes were the rule, it was in most cases easy to determine whether the marker under consideration came from the host or had instead been laterally transferred from the endosymbiont. With phylogenomic datasets the situation is different, because it is difficult to investigate the evolutionary history of hundreds of genes in great detail. One must rely on automated or semi-automated procedures to identify vertically inherited candidates and to discard genes that were introduced into the host nucleus from other sources, for example by EGT. At present, it is difficult to assess to what extent this phenomenon biases phylogenomic reconstructions. Its potential to introduce noise and even misleading conclusions is certainly non-negligible, and much caution is needed. So far, however, several independent studies (as discussed in this manuscript) point towards the same general results, e.g., the clade composed of Rhizaria, stramenopiles and alveolates. In our opinion, this agreement validates the approach.
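One kind of automated screening mentioned above can be illustrated with a simple rule. A gene is flagged as a possible EGT candidate when most of its best database hits come from lineages that could have donated genes through endosymbiosis. The group labels, the top-5 window and the 50% threshold below are all hypothetical choices. The input is assumed to exclude hits from the query's own lineage. Inspecting single-gene trees remains more reliable than such a hit-based rule.

```python
from typing import Dict, List, Sequence

# Hypothetical labels for putative endosymbiont-derived sources
ENDOSYMBIONT_GROUPS = frozenset({"Cyanobacteria", "Rhodophyta", "Viridiplantae"})


def possible_egt(top_hit_groups: Sequence[str], top_n: int = 5, max_fraction: float = 0.5) -> bool:
    """True if more than `max_fraction` of the top hits come from endosymbiont-like groups.

    `top_hit_groups` lists the taxonomic group of each database hit, best first,
    with hits from the query's own lineage already removed.
    """
    considered = list(top_hit_groups[:top_n])
    if not considered:
        return False
    suspicious = sum(group in ENDOSYMBIONT_GROUPS for group in considered)
    return suspicious / len(considered) > max_fraction


def screen_genes(hits_per_gene: Dict[str, Sequence[str]], **kwargs) -> List[str]:
    """Names of genes flagged as possible EGT candidates, sorted."""
    return sorted(g for g, groups in hits_per_gene.items() if possible_egt(groups, **kwargs))
```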


Chapter 2: Analysis of expressed sequence tags from a naked foraminiferan Reticulomyxa filosa

 

Burki F, Nikolaev SI, Bolivar I, Guiard J & Pawlowski J

Published in: Genome, 49: 882-887, 2006

 31 Chapter 2: cDNA library of R.filosa and EST sequencing

2.1 Project description

This chapter describes the initial steps of the whole project, which preceded all the phylogenetic analyses: obtaining the first genomic data for a foraminifer, a member of the supergroup Rhizaria. This was not an easy task, as “forams” have largely proven very challenging to culture in the laboratory. To obtain sufficient amounts of starting material, we concentrated our efforts on Reticulomyxa filosa. It is one of the few cultivable species, and it happened to be maintained in our laboratory. To generate as many sequences as possible rapidly and cost-effectively, we chose to sequence cDNA libraries with the classical Sanger approach. This method was widely used on model organisms but not on poorly studied protists. Hence, we had to adapt a protocol to extract and amplify the RNA, and to construct, clone and finally sequence the cDNA from R. filosa. The main purpose of this paper was to validate our approach.


2.2 Abstract

Foraminifers are a major component of modern marine ecosystems and one of the most important oceanic producers of calcium carbonate. They are a key phylogenetic group among amoeboid protists, but our knowledge of their genome is still mostly limited to a few conserved genes. Here, we report the first study of expressed genes by means of expressed sequence tags (ESTs) from the freshwater naked foraminiferan Reticulomyxa filosa. Cluster analysis of 1630 valid ESTs enabled the identification of 178 groups of related sequences and 871 singlets. Approximately 50% of the 1059 putative unique ESTs could be annotated using Blast searches against the protein database SwissProt + Trembl. The EST database described here is the first step towards gene discovery in foraminifera. It should provide the basis for new insights into the genomic and transcriptomic characteristics of these interesting but poorly known protists.

2.3 Introduction

Foraminifers are a major component of modern marine ecosystems [Lee and Anderson 1991]. Their cosmopolitan distribution ranges from polar shelves to tropical coral reefs. Planktonic foraminifers are an important and ubiquitous group of marine zooplankton [de Vargas et al. 1997]. Benthic foraminifera are reported from marine habitats ranging from supralittoral sands and intertidal mudflats to the deepest abyssal trenches [Gooday 2002; Todo et al. 2005]. They are highly diverse even in the most extreme polar environments [Pawlowski et al. 2002; Habura et al. 2004]. They are also present in freshwater [Holzmann and Pawlowski 2002; Holzmann et al. 2003], as well as in terrestrial habitats [Meisterfeld et al. 2001].

Foraminifers play an important role in the biogeochemical cycles of inorganic and organic compounds [Sen Gupta 1999]. Together with coral reef communities and pelagic microorganisms such as coccolithophores, foraminifers are among the major oceanic producers of calcium carbonate. The production and burial of both organic carbon and biogenic carbonate within the marine system provide a potential sink for carbon. They therefore play a major role in the global carbon cycle [Barker et al. 2003].

Whereas the diversity and ecology of modern foraminifers are quite well characterized [Sen Gupta 1999], most aspects of foraminiferan molecular biology are still poorly studied. Similarly, very little is known about the other amoeboid protists that, together with foraminiferans, form the supergroup Rhizaria [Nikolaev et al. 2004]. To accelerate gene discovery and investigate evolution in Foraminifera as well as within other Rhizaria, we have conducted an expressed sequence tag (EST) project on the freshwater naked foraminiferan Reticulomyxa filosa.

Unlike other foraminifers, R. filosa cells grow rapidly in the laboratory, and simple culture conditions make it possible to obtain adequate amounts of pure DNA and RNA. R. filosa was first isolated from decaying leaves in the New York City area [Nauss 1949]. It was later rediscovered in a freshwater pond in Bochum, Germany [Hülsmann 1984], and in a freshwater fish tank in a laboratory in Berkeley, USA [Koonce and Schliwa 1985]. However, only recently was this naked species clearly identified as belonging to Foraminifera, a group traditionally considered to consist of testate protists [Pawlowski et al. 1999].

Before this survey began, the only molecular data available for the Foraminifera were LSU rRNA, SSU rRNA, and 6 protein-coding genes (namely actin, RNA polymerase II, -, - and -tubulin and ubiquitin) [Pawlowski et al. 1994; Pawlowski et al. 1996; Longet et al. 2003; Archibald and Keeling 2004; Flakowski et al. 2005; Habura et al. 2005]. We sequenced about 1900 cDNA clones, resulting in over 1050 unique sequences. This increased the available molecular data for the group more than 100-fold and created the first extensive high-throughput dataset for the Foraminifera.

2.4 Results & discussion

2.4.1 Sequencing & clustering

Total RNA extracted from R. filosa was used in an mRNA amplification procedure to generate antisense RNA copies of each mRNA in the sample. This method was chosen over other nucleic acid amplification methods because it does not significantly distort the relative abundance of individual mRNA sequences within an RNA population. The antisense RNA copies were then converted into cDNA by random hexamer priming. In theory, random hexamers can bind throughout virtually any RNA template, and cDNAs primed in this way have been shown to contain more 5’ information than those primed with oligo(dT) [Ozkaynak et al. 1990]. After ligation and transformation of the cDNA, a total of 1908 clones were isolated and sequenced. This yielded 1630 high-quality sequences that passed quality checks and vector trimming (see Materials and Methods for details). Within this EST collection, 759 sequences were assembled into 178 clusters. The remaining 871 sequences found no homolog in pairwise comparisons and thus remained singlets. At the end of the clustering procedure, the CAP4 engine (available in the Paracel Clustering Package, see Materials and Methods) takes all the ESTs in each cluster and attempts to assemble them into a single contig representing a single transcript. ESTs are binned on the basis of local similarity, so some sequences are occasionally not similar enough to be assembled together. As expected, 6 clusters therefore generated more than one contig (from 2 to 4). Overall, singlets plus contigs give a total of 1059 putative unique sequences (uniseqs).
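The clustering step groups ESTs that are linked by pairwise similarity, either directly or through intermediate reads; this is single-linkage clustering. The sketch below is not the Paracel/CAP4 implementation. It assumes that the similar pairs have already been determined (e.g., from pairwise alignments), and it uses a union-find structure to return multi-read clusters and singlets. Because linkage is transitive, two reads can share a cluster without overlapping each other. This is one reason why some clusters assemble into more than one contig.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_ests(est_ids: List[str],
                 similar_pairs: Iterable[Tuple[str, str]]) -> Tuple[List[List[str]], List[str]]:
    """Single-linkage clustering of ESTs; returns (clusters, singlets)."""
    parent: Dict[str, str] = {e: e for e in est_ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for e in est_ids:
        groups.setdefault(find(e), []).append(e)
    clusters = [members for members in groups.values() if len(members) > 1]
    singlets = [members[0] for members in groups.values() if len(members) == 1]
    return clusters, singlets


# e1 and e3 are not directly similar but end up in the same cluster via e2
clusters, singlets = cluster_ests(["e1", "e2", "e3", "e4", "e5", "e6"],
                                  [("e1", "e2"), ("e2", "e3"), ("e4", "e5")])
```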

Because of the use of random hexamers and the non-directional cloning strategy, this number of uniseqs probably overestimates the actual number of unique genes. Indeed, the same gene is expected to appear in 2 or more clusters or singlets if 2 or more non-overlapping ESTs belonging to the same transcript have been sequenced. To estimate roughly how much this may affect the clustering, we searched all uniseqs with blastx against the SwissProt + Trembl (swisstrembl) protein database. Only 15 groups of uniseqs, representing a total of 33 contigs or singlets, share the same best hit:
- 1 group with 4 contigs belonging to the same cluster;
- 1 group with 3 uniseqs;
- 13 groups with 2 uniseqs, two of which consist of 2 contigs from the same cluster.

This result does not exclude that some other ESTs with different hits also represent the same gene. It is nevertheless a good indication of the number of unique genes in our dataset (likely above 1000).
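The rough correction described above amounts to collapsing uniseqs that share a best hit. The minimal sketch below assumes a mapping from each uniseq to the accession of its best blastx hit, or `None` when there is no hit. Uniseqs without hits are counted individually. With the numbers above, the 33 uniseqs in the 15 shared-hit groups collapse into 15 putative genes.

```python
from typing import Dict, Optional


def estimate_unique_genes(best_hit: Dict[str, Optional[str]]) -> int:
    """Count putative genes, collapsing uniseqs that share a best-hit accession."""
    accessions = {acc for acc in best_hit.values() if acc is not None}
    without_hit = sum(acc is None for acc in best_hit.values())
    return len(accessions) + without_hit


# Arithmetic with the counts reported above: 33 uniseqs collapse into 15 groups
print(1059 - 33 + 15)  # 1041
```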

The distribution of the number of ESTs per cluster (Figure 1-ch.2) shows that the majority of ESTs are present in low copy number. Most reads were singlets (871 sequences), and the most frequent cluster size was 2 sequences (113 clusters). On the other hand, the largest cluster comprised 87 ESTs, followed by clusters of 61, 50, and 38 ESTs. The largest cluster thus contains more than twice as many sequences as the fourth largest. Overall, this distribution shows that there is low redundancy in our data, which makes our approach favorable for gene discovery.
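The size distribution shown in Figure 1-ch.2 can be tabulated from a list of cluster sizes in which singlets count as clusters of size 1. This is a minimal sketch. The final lines only check the arithmetic quoted in the text.

```python
from collections import Counter
from typing import Dict, List


def size_distribution(cluster_sizes: List[int]) -> Dict[int, int]:
    """Number of clusters of each size, sorted by size."""
    return dict(sorted(Counter(cluster_sizes).items()))


def largest_clusters(cluster_sizes: List[int], n: int = 4) -> List[int]:
    """The n largest cluster sizes, largest first."""
    return sorted(cluster_sizes, reverse=True)[:n]


# Checks on the numbers quoted in the text
print(871 + 759)          # 1630 reads: singlets plus clustered ESTs
print(round(87 / 38, 2))  # 2.29: largest cluster vs fourth largest
```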

Figure 1-ch.2. Frequency of occurrence of the different cluster sizes. The 871 clusters containing only 1 sequence correspond to singlets.


2.4.2 Comparisons with databases

Each uniseq was searched (blastx) against the swisstrembl amino acid database. Using an e-value threshold of 1e-5, 519 sequences (49%) matched a known protein in the database, whereas 540 (51%) had no similarity. This proportion of sequences without hits is slightly above those recently published in other EST studies of various organisms. It is nevertheless within the range typically expected when a large sample is acquired from a eukaryotic genome [Hackett et al. 2005; Jouannic et al. 2005; Keon et al. 2005; Ribichich et al. 2005]. The high percentage of sequences with no match probably reflects two factors: the high genomic divergence of foraminifera [Pawlowski et al. 1997], and the early stage of rhizarian genome exploration, which means that few sequences are available from organisms related to R. filosa. To date, the only genomic information available from another rhizarian is an EST dataset of Bigelowiella natans comprising about 3500 sequences [Keeling 2001]. Comparisons against this dataset using tblastx revealed that only 233 of our 1059 uniseqs had a significant match (e-value ≤ 1e-5). This is far fewer than the 519 hits obtained against swisstrembl. Only 4 of these 233 hits are new, i.e., they correspond to sequences with no similarity in the swisstrembl database. This comparison should be interpreted with some caution, since the B. natans dataset is very small and far from exhaustive compared with swisstrembl. However, there are likely to be substantial differences between R. filosa and B. natans in gene content and/or gene expression levels, despite their close evolutionary relationship. The phylogenetic analysis of ESTs from both species compared to other eukaryotes is in progress (F. Burki and J. Pawlowski, in prep. See Chapter 3).
Interestingly, we also searched the 540 sequences lacking similarity (see above) with tblastx against a custom EST dataset. This dataset contained all EMBL + GenBank ESTs, excluding organisms for which a whole genome has been made available. The search identified 31 additional matches among the sequences that could not be assigned a similarity through searches of the swisstrembl database.
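Extracting the best hit per query at the 1e-5 threshold can be sketched as below. The sketch assumes the 12-column tabular BLAST output, as produced by `-outfmt 6` in BLAST+ or `-m 8` in legacy blastall. The columns are query, subject, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value, and bit score. This is not the script used for this study, and the example lines are made up.

```python
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Best-scoring (subject, e-value, bit score) per query from tabular BLAST output."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# With the counts above: 519 of 1059 uniseqs had a hit
print(round(100 * 519 / 1059))  # 49
```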

To further characterize these unknown sequences, we checked them for putative open reading frames (ORFs). The aim was to test whether they simply represent untranslated regions (UTRs) of a transcript, rRNA contamination or other non-coding sequences, for which no homology can be found in a protein database. Of the 540 sequences that lacked a similarity, only 39 have a predicted ORF shorter than 50 amino acids, and 111 (including these 39) have a predicted ORF shorter than 100 amino acids. A 100-amino-acid cutoff is often used to discriminate real genes from random ORFs. Using it, we can suppose that the remaining 429 ESTs are indeed coding regions for which no similar known proteins are available.
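The ORF check can be sketched as a scan of all six reading frames for the longest stretch free of stop codons. The text does not state whether a start codon was required in the original analysis. ESTs primed with random hexamers are often fragments, so the sketch does not require one by default. The 100-amino-acid cutoff follows the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_aa(seq: str, require_start: bool = False) -> int:
    """Length in codons (amino acids) of the longest open stretch in six frames.

    Open stretches may be truncated by the ends of the read, as expected for
    EST fragments; with require_start=True they must begin at an ATG.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            length, started = 0, not require_start
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon in STOP_CODONS:
                    best = max(best, length)
                    length, started = 0, not require_start
                elif started or codon == "ATG":
                    started = True
                    length += 1
            best = max(best, length)  # stretch running off the end of the read
    return best


def is_probably_coding(seq: str, min_aa: int = 100) -> bool:
    return longest_orf_aa(seq) >= min_aa
```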


The top 20 most significant blastx hits against swisstrembl are shown in Table 1-ch.2. They include some classical highly expressed housekeeping proteins (actins, polyubiquitin, tubulins), which account for half of the 20 best hits. Importantly, 6 matches within the top 20 correspond to R. filosa TREMBL entries. In fact, most of the very few R. filosa proteins already deposited in the public databases were hit by at least one of our uniseqs. Matching these few proteins from the same organism is a good sign of the quality of our dataset.

Table 1-ch.2. Top 20 hits of the R. filosa ESTs to the swisstrembl protein database

Rank | Uniseq ID | E-value | AC | Protein description | Organism
1 | cl136 | 0 | Q820D0 | Glutamine amidotransferase class-II:Phosphoribosyl transferase | Nitrosomonas europaea
2 | cl047 | 0 | Q84VE1 | Adenosylhomocysteinase-like protein | Oryza sativa
3 | cl007 | 0 | Q7XAS6 | Pollen 2-phosphoglycerate dehydrogenase 2 | Cynodon dactylon
4 | cl039 | 0 | Q8AVH9 | Phosphoprotein phosphatase 2A-alpha catalytic chain | Xenopus laevis
5 | cl003 | 0 | Q9Y796 | Glyceraldehyde-3-phosphate dehydrogenase | Cryptococcus curvatus
6 | cl109 | 0 | Q9ZSE4 | Serine/threonine protein phosphatase PP2A catalytic subunit | Hevea brasiliensis
7 | cl015 | 0 | Q9Y018 | Actin 1 | Reticulomyxa filosa
8 | cl006 | 0 | Q9Y019 | Actin 2 | Reticulomyxa filosa
9 | cl013 | 0 | Q7KQK2 | Polyubiquitin | Plasmodium falciparum
10 | cl063 | 0 | Q26233 | Alpha-tubulin | Reticulomyxa filosa
11 | jaR174 | 0 | Q26235 | Beta-tubulin | Reticulomyxa filosa
12 | cl051 | 0 | Q5DFR4 | Hypothetical protein | Schistosoma japonicum
13 | cl052 | 0 | Q26236 | Beta-tubulin | Reticulomyxa filosa
14 | cl009 | 0 | O44024 | Alpha-tubulin 3 | Reticulomyxa filosa
15 | cl010 | 0 | P56839 | Phosphoenolpyruvate phosphomutase | Mytilus edulis
16 | cl113 | 1e-94 | Q6ZLZ9 | Alpha tubulin | Plasmodium falciparum
17 | cl025 | 2e-91 | Q9ZSW1 | Tubulin beta-1 chain |
18 | re407 | 1e-90 | Q75JR8 | Protein phosphatase 6 catalytic subunit | Dictyostelium discoideum
19 | cl117 | 5e-89 | Q7PY97 | ENSANGP00000018457 | Anopheles gambiae
20 | re549 | 3e-88 | Q6WE52 | Actin | Thecamoeba similis

Moreover, Table 2-ch.2 displays the identity of clusters made up of 10 or more ESTs. The largest cluster (87 ESTs) represents one of the two paralogues of the foraminiferan actin gene (ACT2). The other most common transcripts (Table 2-ch.2) were those encoding polyubiquitin (61 ESTs), actin 1 (50 ESTs) and alpha-tubulin (38 ESTs). This suggests that these are among the most highly expressed genes in R. filosa. Note that no ribosomal protein appears among these most abundant ESTs. This contrasts with what has frequently been reported in other analyses of this type, where this particular class of genes dominates frequency tables [Abu et al. 2004; Jouannic et al. 2005; Ribichich et al. 2005]. Furthermore, one of these nine most abundant clusters (19 ESTs) has no match to known proteins. Obviously, we cannot speculate on the function of the protein encoded by this unclassified cluster. However, given the high frequency of its ESTs, it might represent a highly expressed gene that is either specific to foraminifers or evolving too rapidly for its homologs to be detected. It would therefore be an interesting candidate for further investigation.

Table 2-ch.2. Transcript abundance as measured by EST redundancy.

Rank | EST count | Identity of clusters
1 | 87 | Actin 2
2 | 61 | Polyubiquitin
3 | 50 | Actin 1
4 | 38 | Alpha-tubulin 3
5 | 20 | Glyceraldehyde-3-phosphate dehydrogenase
6 | 19 | No similarity
7 | 16 | Putative cysteine protease
8 | 11 | Putative cysteine protease
9 | 10 | SFN protein

2.4.3 Functional annotation

As mentioned above, 49% (519/1059) of the non-redundant ESTs (uniseqs) share significant similarity with SwissProt or TREMBL entries. Putative functions were examined by classifying uniseqs into COG categories [Tatusov et al. 2003], using the automatic annotation protocol implemented in AutoFACT [Koski et al. 2005]. A function could be assigned to approximately the same number of ESTs (526) (Figure 2-ch.2). The ESTs are unequally distributed among the functional categories. A large proportion falls into a poorly annotated category: 21% of ESTs belong to the class General function prediction only. The second largest class of proteins is related to protein modification and turnover (14%), followed by the translation (11%) and energy production (10%) categories. Many other classes are represented by fewer uniseqs. Overall, our dataset covers a broad functional range, since the annotation procedure classified the ESTs into 20 of the 26 COG functional categories.


Figure 2-ch.2. Functional classification of the R. filosa ESTs based on COG categories. For each fraction, the label gives the COG category name (left), the number of clusters (centre), and the corresponding percentage of clusters. J: Translation, ribosomal structure and biogenesis / K: Transcription / L: Replication, recombination and repair / B: Chromatin structure and dynamics / D: Cell cycle control, cell division, chromosome partitioning / V: Defense mechanisms / T: Signal transduction mechanisms / Z: Cytoskeleton / U: Intracellular trafficking, secretion, and vesicular transport / O: Posttranslational modification, protein turnover, chaperones / C: Energy production and conversion / G: Carbohydrate transport and metabolism / E: Amino acid transport and metabolism / F: Nucleotide transport and metabolism / H: Coenzyme transport and metabolism / I: Lipid transport and metabolism / P: Inorganic ion transport and metabolism / Q: Secondary metabolites biosynthesis, transport and catabolism / R: General function prediction only / S: Function unknown.

2.5 Materials & methods

2.5.1 Cells and culture conditions

The R. filosa strain, obtained from Dr. R. Breuker (University of Bochum), was maintained in Volvic table water as culture medium and fed with pre-wetted wheat germ flakes [Breuker 1997]. Cells were checked for purity by light microscopy, collected by low-speed centrifugation, resuspended in 5 volumes of TriReagent (Sigma), and disrupted with hand-held pestles in matching microtubes.

2.5.2 cDNA library construction and EST sequencing

Following cell disruption, total RNA was extracted according to the TriReagent manufacturer's protocol. A total of 250 ng of this RNA was used for reverse transcription with an oligo(dT) primer bearing a T7 promoter, followed by in vitro transcription of the resulting cDNA with T7 RNA polymerase (MessageAmp aRNA Kit, Ambion). The resulting antisense RNA (aRNA) copies were then converted into cDNA with the SuperScript Choice System for cDNA Synthesis (Invitrogen), generating double-stranded EcoRI-ended cDNAs that were amplified by PCR using the adapter sequence as primer and ligated non-directionally into the pCR 2.1-TOPO vector (Invitrogen). These constructs were used directly to transform bacteria, and positive clones were verified by PCR. ESTs were sequenced with the ABI-PRISM Big Dye Terminator Cycle Sequencing Kit and analysed on an ABI-3100 DNA sequencer (Perkin-Elmer), all according to the manufacturers' instructions.

2.5.3 Sequence processing and analysis

The ABI chromatogram files were processed automatically with a custom pipeline, which included base calling and quality control with PHRED [Ewing et al. 1998], followed by trimming of vector and adapter regions with CrossMatch [Green 1996]. A final manual check eliminated remaining sequences shorter than 200 bp and removed poly(A) tails. The EST sequences were then clustered and assembled into contigs using the Paracel Clustering Package (PCP) (Paracel Inc, Pasadena CA). Clusters that contained only one sequence were classified as singlets. Together, contigs and singlets constitute the dataset of putative unique ESTs (uniseqs).
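The poly(A) removal and 200 bp length filter were done manually; an automated equivalent could look like the following sketch. The minimum tail length of 8 adenines is an arbitrary assumption, and only 3' tails are handled:

```python
import re

# Assumption: a run of >= 8 A's at the 3' end is treated as a poly(A) tail
POLY_A_TAIL = re.compile(r"A{8,}$")


def clean_est(seq, min_len=200):
    """Strip a 3' poly(A) tail, then drop the read if shorter than min_len bp.

    Returns the cleaned sequence, or None if the read is discarded.
    """
    trimmed = POLY_A_TAIL.sub("", seq.upper())
    return trimmed if len(trimmed) >= min_len else None


reads = {"r1": "ACGT" * 60 + "A" * 25,   # 240 bp insert + tail -> kept
         "r2": "ACGT" * 45 + "A" * 30}   # 180 bp after trimming -> dropped
cleaned = {name: clean_est(s) for name, s in reads.items()}
kept = {name: s for name, s in cleaned.items() if s is not None}
print(sorted(kept))  # ['r1']
```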

All uniseqs were searched against the SwissProt + TrEMBL protein sequence dataset (swisstreml) using blastx with an e-value cutoff of 1e-5. For ESTs with no similarity found, putative open reading frames (ORFs) were predicted with ESTScan [Iseli et al. 1999]. To assign functions, uniseqs were also subjected to automatic annotation with AutoFACT [Koski et al. 2005]. PHRED, CrossMatch, PCP, BLAST searches and ESTScan were all run on the remote server Ludwig-sun2 at the Swiss Institute of Bioinformatics (Lausanne, Switzerland) [Falquet et al. 2003]. AutoFACT was run on the Vital-IT computational facilities at the Swiss Institute of Bioinformatics (Lausanne, Switzerland).
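Selecting uniseqs with a significant blastx hit, and setting aside the rest for ORF prediction, can be sketched by parsing BLAST tabular output. This assumes the standard 12-column tabular format (BLAST+ `-outfmt 6`, legacy `-m 8`), in which the e-value and bit score are columns 11 and 12; the function names are hypothetical:

```python
def best_hits(lines, max_evalue=1e-5):
    """Keep the best-scoring hit per query with e-value <= max_evalue.

    lines: iterable of tab-separated BLAST tabular lines (12 standard columns).
    Returns {query_id: (subject_id, evalue, bitscore)}.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def without_hits(all_ids, hits):
    """Uniseqs with no significant match (candidates for ESTScan)."""
    return sorted(set(all_ids) - set(hits))


table = [
    "u1\tsp|P1\t80.0\t120\t20\t1\t1\t360\t5\t124\t1e-40\t160.0",
    "u1\tsp|P2\t60.0\t110\t40\t2\t1\t330\t5\t114\t1e-20\t90.0",
    "u2\ttr|Q9\t35.0\t60\t35\t3\t1\t180\t10\t69\t0.01\t35.0",
]
hits = best_hits(table)
print(hits)                                    # {'u1': ('sp|P1', 1e-40, 160.0)}
print(without_hits(["u1", "u2", "u3"], hits))  # ['u2', 'u3']
```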


Chapter 3: Monophyly of Rhizaria and multigene phylogeny of unicellular Bikonts 

   F. Burki & J. Pawlowski

Published in: Molecular Biology and Evolution, 23: 1922-1930, 2006

 41 Chapter 3: Rhizaria in a phylogenomic framework

3.1 Project description

With our R. filosa EST dataset in hand, we wanted to contribute to the general effort to resolve the deep evolutionary relationships within the eukaryotic tree. Genomic data were becoming available for enough species to allow the construction of large multigene alignments, but the diversity of these species was limited (mainly restricted to model organisms, which are often of limited phylogenetic relevance). Our newly produced ESTs constituted the second set of genomic sequences for a member of the supergroup Rhizaria. Importantly, this supergroup had never been included in large multigene phylogenies of eukaryotes. This was problematic because Rhizaria represent one of the five recognized supergroups and, in our opinion, cannot be ignored when discussing the tree of eukaryotes. This paper was the first attempt to place Rhizaria in a phylogenomic framework.


3.2 Abstract

Reconstructing a global phylogeny of eukaryotes is an ongoing challenge of molecular phylogenetics. The availability of genomic data from a broad range of eukaryotic phyla has helped to resolve the eukaryotic tree into a topology with a rather small number of large assemblages, but the relationships between these “supergroups” are yet to be confirmed. Rhizaria is the most recently recognized “supergroup”, but, in spite of this important position within the tree of life, its representatives are still missing from global phylogenies of eukaryotes. Here, we report the first large-scale analysis of eukaryote phylogeny including data for two rhizarian species, the foraminiferan Reticulomyxa filosa and the chlorarachniophyte Bigelowiella natans. Our results confirm the monophyly of Rhizaria (Foraminifera + Cercozoa), with very high bootstrap support in all analyses. The overall topology of our trees agrees with the current view of eukaryote phylogeny, with a basal division into “unikonts” (opisthokonts, Amoebozoa) and “bikonts” (Plantae, alveolates, stramenopiles and excavates). As expected, Rhizaria branch among bikonts, but their precise phylogenetic position is uncertain. Depending on the dataset and the type of analysis, Rhizaria branch as sister group to either stramenopiles or excavates. Overall, the relationships between the major groups of unicellular bikonts are poorly resolved, despite the use of 85 proteins and the largest taxonomic sampling available to date for this part of the tree. This may be due to an acceleration of evolutionary rates in some bikont phyla or to their rapid diversification in the early evolution of eukaryotes.

3.3 Introduction

Resolving the structure of the phylogenetic tree of eukaryotes is crucial for understanding the major evolutionary steps that explain the relationships between species. During the last two decades, advances in molecular systematics have led to the establishment of new monophyletic assemblages and have helped to clarify the relationships between the numerous lineages recognized on the basis of morphological and ultrastructural data. Initially based almost exclusively on the small-subunit rRNA gene (SSU rRNA) [Sogin et al. 1989; Sogin 1991; Kumar and Rzhetsky 1996; Pawlowski et al. 1996; Sogin and Silberman 1998], molecular phylogenies of eukaryotes were subsequently tested with protein-coding genes [Yamamoto et al. 1997; Moreira et al. 1999; Philippe et al. 2000b]. Despite their important role in the early days of molecular phylogenetics, single-gene phylogenies are now known to be highly sensitive to variation in evolutionary rates, which has often led to a distorted picture of early eukaryotic evolution [Stiller and Hall 1999; Morin 2000; Philippe 2000; Philippe and Germot 2000].

Over time, the accumulation of protein sequences from a large variety of eukaryotes has made it possible to test single-gene phylogenies using combined data [Baldauf et al. 2000]. A new view of the global phylogeny of eukaryotes has emerged from a growing body of evidence based on several mutually reinforcing kinds of data, such as (i) multigene phylogenies [Bapteste et al. 2002; Yoon et al. 2002; Philippe et al. 2004; Hampl et al. 2005; Harper et al. 2005; Philippe et al. 2005; Rodriguez-Ezpeleta et al. 2005; Sims et al. 2006; Steenkamp et al. 2006]; (ii) individual phylogenies converging on the same relationships [Fast et al. 2002; Simpson et al. 2002b; Longet et al. 2004]; (iii) discrete characters [Baldauf and Palmer 1993; Keeling and Palmer 2001; Stechmann and Cavalier-Smith 2002; Archibald et al. 2003a]; and (iv) morphological and ultrastructural data [Simpson et al. 2002a]. Overall, the vast majority of known eukaryotic diversity seems to be distributed among only five or six major divisions that are probably all monophyletic: the plants, excavates, chromalveolates and Rhizaria (all belonging to the so-called “bikonts”), and the “unikonts”, which comprise the opisthokonts and Amoebozoa [Keeling et al. 2005]. Identifying these natural supergroups raised the new challenge of understanding the relationships among them, which, for most of the eukaryotic tree, have yet to be confirmed.

Rhizaria [Cavalier-Smith 2002] is a recently recognized supergroup of eukaryotes encompassing organisms as diverse as filose testate amoebae, cercomonads, chlorarachniophytes, foraminifera, plasmodiophorids, haplosporidians, gromiids and radiolarians [Adl et al. 2005]. The first evidence for the group came from SSU rRNA-based phylogenies [Bhattacharya et al. 1995; Cavalier-Smith and Chao 1997]. The phylum Cercozoa was soon created to accommodate this new assemblage [Cavalier-Smith 1998b]. Further molecular studies confirmed the heterogeneity of this phylum, as various protists were included in it [Burki et al. 2002; Cavalier-Smith and Chao 2003a; Polet et al. 2004]. Protein data indicated a relationship between Foraminifera and Cercozoa [Keeling 2001; Archibald et al. 2003a; Longet et al. 2003] and a combined analysis of SSU rRNA and actin confirmed their relation with [Nikolaev et al. 2004]. Finally, a study of a single or double amino acid insertion in polyubiquitin suggests that Radiolaria represent the most basal branch of Rhizaria, followed by Foraminifera and Cercozoa [Bass et al. 2005].

Despite their now well-accepted taxonomic status, Rhizaria are still missing from most of the multigene phylogenies published to date [Bapteste et al. 2002; Philippe et al. 2004; Hampl et al. 2005; Rodriguez-Ezpeleta et al. 2005; Steenkamp et al. 2006]. Until recently, the only available rhizarian genomic information was an expressed sequence tag (EST) dataset for the chlorarachniophyte Bigelowiella natans comprising about 3500 sequences [Keeling and Palmer 2001]. Some of these sequences have been used in studies with aims other than exploring the phylogenetic position of Rhizaria [de Koning et al. 2005; Harper et al. 2005] or were even excluded from the final trees because of a suspected artefactual position [Simpson et al. 2006].

To include Rhizaria in multigene phylogenies of eukaryotes, we recently conducted an EST project on the freshwater naked foraminiferan Reticulomyxa filosa, which yielded approximately 1600 high-quality sequences [Burki et al. 2006]. Combining the available genomic information, we assembled in this study a dataset of 85 orthologous proteins for 37 eukaryotic species, including the two rhizarian species R. filosa and B. natans, in order to (i) confirm the monophyly of Rhizaria using a large number of protein-coding genes, and (ii) infer the phylogenetic position of this supergroup within eukaryotes.

3.4 Results

3.4.1 Sequences and alignments

Thirty-seven eukaryotic species representing a broad taxonomic sampling, and for which large amounts of data were available, were selected. From our initial dataset, we retained 85 proteins (see Supplementary Table S2-ch.3) according to the following criteria: (i) sequences could be retrieved for at least 19 of the 37 species (> 50 %); (ii) at least one of the two rhizarian species was present; and (iii) orthology among all species was unambiguous on the basis of ML trees. To minimize missing data in Rhizaria, the alignments were shortened by removing all sites present in neither R. filosa nor B. natans, leading to a final concatenated alignment of 13258 amino acid positions (complete alignment or CA). Overall, missing data averaged 21 % across the alignment, ranging from 0 % in Homo sapiens and Drosophila melanogaster to 79.55 % in (see Supplementary Table S1 for a detailed list).
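Selection criteria (i) and (ii), the rhizarian site filter and the missing-data calculation described above can be sketched as follows. Criterion (iii), orthology assessment from ML trees, requires tree inference and is not reproduced. The alignment representation (a dict of taxon name to aligned string, with '-' for gaps and '?' for missing data), the taxon labels and all function names are illustrative assumptions:

```python
MISSING = set("-?")
RHIZARIA = ("R_filosa", "B_natans")  # hypothetical taxon labels


def passes_criteria(gene_aln, min_taxa=19, rhizaria=RHIZARIA):
    """Criteria (i) and (ii): enough taxa, and at least one rhizarian present."""
    return len(gene_aln) >= min_taxa and any(r in gene_aln for r in rhizaria)


def keep_rhizarian_sites(gene_aln, rhizaria=RHIZARIA):
    """Drop alignment columns where no rhizarian has a residue."""
    length = len(next(iter(gene_aln.values())))
    keep = [i for i in range(length)
            if any(r in gene_aln and gene_aln[r][i] not in MISSING for r in rhizaria)]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in gene_aln.items()}


def concatenate(genes, taxa):
    """Concatenate gene alignments; taxa absent from a gene get '?' padding."""
    rows = {t: [] for t in taxa}
    for aln in genes:
        length = len(next(iter(aln.values())))
        for t in taxa:
            rows[t].append(aln.get(t, "?" * length))
    return {t: "".join(parts) for t, parts in rows.items()}


def missing_fraction(seq):
    """Fraction of positions that are gaps or missing data."""
    return sum(c in MISSING for c in seq) / len(seq)


# Toy genes, not real data; min_taxa lowered to fit the toy example
gene1 = {"R_filosa": "AC-D", "B_natans": "A--D", "H_sapiens": "ACED"}
gene2 = {"B_natans": "KL", "H_sapiens": "KM"}
print(passes_criteria(gene1, min_taxa=3), passes_criteria(gene2, min_taxa=3))  # True False

taxa = ["R_filosa", "B_natans", "H_sapiens"]
concat = concatenate([keep_rhizarian_sites(g) for g in (gene1, gene2)], taxa)
print(concat)  # {'R_filosa': 'ACD??', 'B_natans': 'A-DKL', 'H_sapiens': 'ACDKM'}
print({t: missing_fraction(s) for t, s in concat.items()})
# {'R_filosa': 0.4, 'B_natans': 0.2, 'H_sapiens': 0.0}
```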

We also analysed a reduced alignment from which genes not found in our R. filosa EST survey were removed, leaving 9947 amino acid positions (R. filosa no-missing-data alignment or NMDA). This was done for two reasons. First, R. filosa is our organism of main interest, so we wanted an alignment without any missing data for this species. Second, the B. natans EST dataset contains many sequences encoding plastid-targeted proteins, mostly of chlorophyte green algal origin but also of streptophyte algal, red algal or even bacterial origin [Archibald et al. 2003b]. Although quite a few of these ESTs have already been annotated [Archibald et al. 2003b; Rogers et al. 2004], it was crucial to avoid mixing host genes with non-annotated endosymbiont-derived or laterally transferred genes. Based on separate phylogenetic analyses of each selected gene, we were able to discard many questionable B. natans genes (i.e. B. natans genes that suspiciously branched very close to plants), but one might still argue that some genes represented only by B. natans among rhizarians in our complete alignment (CA) originated through secondary endosymbiosis or lateral gene transfer. Analysing the NMDA, in which every B. natans sequence has an orthologous R. filosa sequence, therefore gives greater confidence in our results (see below).

3.4.2 Phylogenetic position of Rhizaria

Analyses of the CA and the NMDA give trees of generally similar structure (Figure 1-ch.3 and Supplementary Material), congruent with global eukaryotic phylogenies inferred in previous EST-based studies [Philippe et al. 2004]. In all analyses, three major assemblages of species can be distinguished. The first comprises animals, fungi and Amoebozoa, i.e. the “unikonts” of Stechmann (2003). The second is composed of green plants and rhodophytes, which form a strongly supported group of primary photosynthetic eukaryotes [Rodriguez-Ezpeleta et al. 2005]. The third includes all other unicellular “bikonts” (stramenopiles, alveolates, rhizarians and excavates). These three major assemblages are strongly supported in the analysis of the complete alignment but, with the exception of the MrBayes analysis, support is generally weaker for the NMDA (Supplementary Figures S1, S3, S5-ch.3). Although most eukaryotic supergroups, including Rhizaria, are recovered in all analyses, their relationships are not well resolved. In particular, the unicellular bikonts appear as an unresolved radiation of four supergroups (Figure 1-ch.3).

Figure 1-ch.3. Next page. Consensus ML phylogenetic tree obtained with TREEFINDER from the analysis of the complete dataset (CA). Bootstrap support values from 100 replicates are shown at nodes; unresolved nodes correspond to relationships recovered in fewer than 50 replicates.
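Bootstrap support at a node is the percentage of replicate trees that contain the corresponding split, and nodes below 50 % are collapsed in the figure. The sketch below assumes each replicate tree has already been reduced to a list of splits (the taxa on one side of an internal branch); Newick parsing and ML inference (done here with TREEFINDER) are not shown, and the helper names are hypothetical:

```python
from collections import Counter


def canonical(split, taxa, ref):
    """Represent an unrooted split by the side that excludes the reference taxon."""
    side = frozenset(split)
    return frozenset(taxa) - side if ref in side else side


def bootstrap_support(replicates, taxa, ref):
    """Percentage of replicate trees containing each split."""
    counts = Counter()
    for splits in replicates:
        # a set, so each split is counted at most once per replicate
        counts.update({canonical(s, taxa, ref) for s in splits})
    return {split: 100 * n / len(replicates) for split, n in counts.items()}


def resolved_nodes(support, min_support=50):
    """Splits kept as resolved; those below min_support are collapsed.

    Note: at exactly 50 % two conflicting splits could both pass;
    a strict majority-rule consensus would use > 50 instead.
    """
    return {split: v for split, v in support.items() if v >= min_support}


taxa = {"A", "B", "C", "D", "E"}
replicates = [[{"B", "C"}], [{"A", "D", "E"}], [{"D", "E"}], [{"B", "C"}]]
support = bootstrap_support(replicates, taxa, ref="A")
print({"".join(sorted(s)): v for s, v in resolved_nodes(support).items()})  # {'BC': 75.0}
```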


The phylogenetic position of Rhizaria varied depending on both the type of alignment and the method of analysis. In the ML (PHYML) analysis of the complete alignment (Supplementary Figure S2-ch.3), Rhizaria branch as sister group to stramenopiles, whereas in the Bayesian analysis (Supplementary Figure S4-ch.3) they appear as sister group to excavates. This last topology was also found in the ML analysis with TREEFINDER, but in that case the ciliates branched between Rhizaria and excavates (not shown). Both ML and Bayesian analyses of the NMD alignment show Rhizaria branching as sister group to excavates, but bootstrap support for this and other groupings was rather weak (Supplementary Figures S3-ch.3 and S5-ch.3).

To better examine the position of Rhizaria, we successively removed some fast-evolving lineages that could introduce systematic bias into our analyses, a risk that is particularly acute with large-scale datasets [Brinkmann et al. 2005; Jeffroy et al. 2006]. In particular, to avoid a long-branch attraction (LBA) artefact [Felsenstein 1978], we reanalysed our data without excavates or without ciliates, which appeared particularly unstable in our analyses. These changes in species composition affected the rhizarian position differently depending on both the alignment and the method used. After removing both Giardia and Trichomonas, or all excavates at once, the topology of the CA tree remained unchanged (see Supplementary Figure S2-ch.3), whereas the NMDA topology changed drastically, recovering the relationship between Rhizaria and stramenopiles (data not shown). When ciliates were removed, Rhizaria branched as sister group to excavates in ML analyses of both the complete and NMD alignments. Finally, because R. filosa has a slightly longer branch than B. natans (see Figure 1-ch.3), we tested whether B. natans alone prefers the excavate or the stramenopile position by reconstructing a TREEFINDER tree (not shown). Interestingly, it branched as sister to stramenopiles, so we cannot rule out the possibility that the relationship between Rhizaria and excavates is due to the rapid evolutionary rates of foraminifers.

This observed instability could indicate that the data contain two opposing signals of similar strength (a phylogenetic and a non-phylogenetic signal) that prevent phylogenetic methods from finding the true evolutionary tree (N. Rodriguez-Ezpeleta and H. Philippe, personal communication). One way to eliminate the non-phylogenetic signal and extract the true evolutionary information is to remove potentially saturated fast-evolving sites. To do this, we divided the fastest-evolving amino acid positions in the CA into categories according to their evolutionary rates and inferred ML trees from alignments successively shortened by removing one class of sites at a time. Figure 2-ch.3 (a, b, c, d) shows the four different topologies we obtained, and Figure 2e-ch.3 their occurrence. The relationships were highly dependent on both the alignment and the method. PHYML gave a mixture of topologies B and C, whereas TREEFINDER mostly found topology D but also found topology B when the five fastest categories were removed. Based on these comparisons, no topology can be clearly favoured, as no consistent pattern emerges.
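Successive removal of fast-evolving sites can be sketched as below. It assumes each alignment column has already been assigned a rate category (higher = faster) by some rate-estimation method that is not shown; the category values and function name are illustrative, not those used for Figure 2-ch.3:

```python
def successive_site_removal(aln, site_class, n_steps):
    """Yield (removed_classes, shortened_alignment) for k = 1..n_steps.

    aln: dict taxon -> aligned sequence.
    site_class: list of rate categories, one per column (higher = faster).
    At step k the k fastest categories are excluded.
    """
    classes = sorted(set(site_class), reverse=True)  # fastest first
    for k in range(1, n_steps + 1):
        removed = set(classes[:k])
        keep = [i for i, c in enumerate(site_class) if c not in removed]
        shortened = {t: "".join(s[i] for i in keep) for t, s in aln.items()}
        yield sorted(removed, reverse=True), shortened


aln = {"x": "ABCDEF", "y": "GHIJKL"}
site_class = [1, 3, 2, 3, 1, 2]
for removed, short in successive_site_removal(aln, site_class, n_steps=2):
    # each shortened alignment would then be analysed with ML software
    print(removed, len(short["x"]), short["x"])
# [3] 4 ACEF
# [3, 2] 2 AE
```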

Figure 2-ch.3. Next page. Results of the fast-evolving site removal analysis. (A, B, C, D) The four different topologies obtained after successively excluding classes of sites (see Materials and Methods for details). The length of each triangle corresponds to the branch length of the fastest-evolving lineage in that group and the width is proportional to the number of taxa included in our analyses. (E) Summary of the different datasets analysed, giving for each class the length of the alignment in amino acids and the topology obtained with PHYML, TREEFINDER and MrBayes.


Additionally, to assess confidence in the comparison of topologies, we performed the “approximately unbiased” (AU) test, which is considered the least biased and most rigorous test available to date [Shimodaira 2002]. Specifically, the four different topologies obtained during this study (i.e. the topologies in Figure 2-ch.3) were tested on the CA, the NMDA, and the seven alignments resulting from the removal of classes of sites. The first block of Table 1-ch.3 corresponds to the comparison of the four trees given the CA and shows that no topology can be rejected, although topology D is just above the limit at the 0.05 significance level (P = 0.051). For the NMDA, the AU test significantly rejects topologies B and D (second block), keeping only solutions in which Rhizaria are directly related to excavates. The remaining results in Table 1-ch.3 show that all topologies pass the test (no rejection), except topology D, which is either rejected for the shortest alignments or lies just above the rejection limit.

3.5 Discussion

Our data provide new multigene evidence for a close evolutionary relationship between Foraminifera and Cercozoa. The grouping of the foraminiferan R. filosa and the chlorarachniophyte B. natans receives strong bootstrap support in all our analyses. Moreover, these two species branch together in all the different topologies we obtained (Figure 2-ch.3). A relationship between these two phyla was previously suggested by analyses of actin [Keeling 2001; Flakowski et al. 2005], polyubiquitin [Archibald et al. 2003a; Bass et al. 2005], RNA polymerase [Longet et al. 2003] and the SSU rRNA gene [Berney and Pawlowski 2003; Cavalier-Smith and Chao 2003a]. With more than 80 analysed genes, our study strongly confirms these single-gene analyses, providing compelling evidence for the monophyly of Rhizaria. However, as this supergroup is very heterogeneous [Adl et al. 2005], the phylogenetic position of other putative rhizarians, especially the and acantharian radiolarians, still needs to be confirmed with multigene data.

Although the monophyly of Rhizaria (Cercozoa + Foraminifera) was established by our data, their phylogenetic position in the eukaryotic tree remains uncertain. Our analyses yielded two competing hypotheses on the relationships between Rhizaria and other eukaryotes, preventing an unequivocal conclusion. According to the first hypothesis, Rhizaria are sister group to excavates. Several lines of evidence support this hypothesis: (i) all phylogenetic reconstruction methods used in this study recover this association when the alignment with no missing data for R. filosa is analysed; (ii) if ciliates are removed from the taxon sampling, this grouping is also recovered with the complete alignment; and (iii) topology comparisons never reject trees in which Rhizaria are specifically related to excavates, and these are always the best-scoring trees examined. Finally, this relationship has been suggested previously on the basis of secondary symbiosis with green algae in some excavates () and some rhizarians (chlorarachniophytes), known as the cabozoan hypothesis [Cavalier-Smith 1999].

Table 1-ch.3. Likelihood AU test of alternative tree topologies

Alignment | Tree topology | Δ ln L (a) | AU P value (b)
Complete (CA) | Rhiz. sister to exc. chromal. (Fig. 2A) | 50.5 | 0.147
Complete (CA) | Rhiz. sister to stram. (Fig. 2B) | 14.6 | 0.412
Complete (CA) | Rhiz. sister to exc. (Fig. 2C) | -14.6 | 0.802
Complete (CA) | Rhiz. sister to ciliates + exc. (Fig. 2D) | 69.9 | 0.051
No missing data R. filosa (NMDA) | Rhiz. sister to exc. chromal. (Fig. 2A) | -15.5 | 0.702
No missing data R. filosa (NMDA) | Rhiz. sister to stram. (Fig. 2B) | 63.2 | 0.028*
No missing data R. filosa (NMDA) | Rhiz. sister to exc. (Fig. 2C) | 15.5 | 0.438
No missing data R. filosa (NMDA) | Rhiz. sister to ciliates + exc. (Fig. 2D) | 71.1 | 0.044*
Without 3.80 | Rhiz. sister to exc. chromal. (Fig. 2A) | 42.4 | 0.183
Without 3.80 | Rhiz. sister to stram. (Fig. 2B) | 15.8 | 0.386
Without 3.80 | Rhiz. sister to exc. (Fig. 2C) | -15.8 | 0.816
Without 3.80 | Rhiz. sister to ciliates + exc. (Fig. 2D) | 71.4 | 0.052
Without 3.70 | Rhiz. sister to exc. chromal. (Fig. 2A) | 67.6 | 0.227
Without 3.70 | Rhiz. sister to stram. (Fig. 2B) | 18.1 | 0.360
Without 3.70 | Rhiz. sister to exc. (Fig. 2C) | -18.1 | 0.821
Without 3.70 | Rhiz. sister to ciliates + exc. (Fig. 2D) | 37.6 | 0.063
Without 3.60 | Rhiz. sister to exc. chromal. (Fig. 2A) | 50.7 | 0.138
Without 3.60 | Rhiz. sister to stram. (Fig. 2B) | 11.2 | 0.420
Without 3.60 | Rhiz. sister to exc. (Fig. 2C) | -11.2 | 0.793
Without 3.60 | Rhiz. sister to ciliates + exc. (Fig. 2D) | 69.2 | 0.057
Without 3.50 | Rhiz. sister to exc. chromal. (Fig. 2A) | 52.5 | 0.117
Without 3.50 | Rhiz. sister to stram. (Fig. 2B) | 14.7 | 0.401
Without 3.50 | Rhiz. sister to exc. (Fig. 2C) | -14.7 | 0.805
Without 3.50 | Rhiz. sister to ciliates + exc. (Fig. 2D) | 74.7 | 0.036*
Without 3.40 | Rhiz. sister to exc. chromal. (Fig. 2A) | 57.0 | 0.093
Without 3.40 | Rhiz. sister to stram. (Fig. 2B) | 15.5 | 0.374
Without 3.40 | Rhiz. sister to exc. (Fig. 2C) | -15.5 | 0.818
Without 3.40 | Rhiz. sister to ciliates + exc. (Fig. 2D) | 76.8 | 0.040*
Without 3.30 | Rhiz. sister to exc. chromal. (Fig. 2A) | 58.1 | 0.104
Without 3.30 | Rhiz. sister to stram. (Fig. 2B) | 12.5 | 0.409
Without 3.30 | Rhiz. sister to exc. (Fig. 2C) | -12.5 | 0.775
Without 3.30 | Rhiz. sister to ciliates + exc. (Fig. 2D) | 80.1 | 0.022*
Without 3.20 | Rhiz. sister to exc. chromal. (Fig. 2A) | 60.3 | 0.095
Without 3.20 | Rhiz. sister to stram. (Fig. 2B) | 14.5 | 0.372
Without 3.20 | Rhiz. sister to exc. (Fig. 2C) | -14.5 | 0.794
Without 3.20 | Rhiz. sister to ciliates + exc. (Fig. 2D) | 84.1 | 0.023*

(a) Log-likelihood difference. (b) Approximately unbiased test; P values marked with * are significant (P < 0.05) and correspond to rejected topologies. Abbreviations: Rhiz. = Rhizaria; exc. = excavates; chromal. = chromalveolates; stram. = stramenopiles.
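Reading Table 1-ch.3 amounts to applying a 0.05 threshold to the AU P values and identifying the topology with the best likelihood (lowest Δ ln L) for each alignment. The values below are copied from the table; the AU test itself (multiscale bootstrap resampling) is not reimplemented here, and the function name is hypothetical:

```python
# (delta lnL, AU p-value) per alignment and topology, from Table 1-ch.3
TABLE = {
    "CA":   {"A": (50.5, 0.147), "B": (14.6, 0.412), "C": (-14.6, 0.802), "D": (69.9, 0.051)},
    "NMDA": {"A": (-15.5, 0.702), "B": (63.2, 0.028), "C": (15.5, 0.438), "D": (71.1, 0.044)},
    "3.80": {"A": (42.4, 0.183), "B": (15.8, 0.386), "C": (-15.8, 0.816), "D": (71.4, 0.052)},
    "3.70": {"A": (67.6, 0.227), "B": (18.1, 0.360), "C": (-18.1, 0.821), "D": (37.6, 0.063)},
    "3.60": {"A": (50.7, 0.138), "B": (11.2, 0.420), "C": (-11.2, 0.793), "D": (69.2, 0.057)},
    "3.50": {"A": (52.5, 0.117), "B": (14.7, 0.401), "C": (-14.7, 0.805), "D": (74.7, 0.036)},
    "3.40": {"A": (57.0, 0.093), "B": (15.5, 0.374), "C": (-15.5, 0.818), "D": (76.8, 0.040)},
    "3.30": {"A": (58.1, 0.104), "B": (12.5, 0.409), "C": (-12.5, 0.775), "D": (80.1, 0.022)},
    "3.20": {"A": (60.3, 0.095), "B": (14.5, 0.372), "C": (-14.5, 0.794), "D": (84.1, 0.023)},
}


def summarize(table, alpha=0.05):
    """Per alignment: rejected topologies (p < alpha) and best topology."""
    out = {}
    for aln, rows in table.items():
        rejected = sorted(t for t, (_, p) in rows.items() if p < alpha)
        best = min(rows, key=lambda t: rows[t][0])  # lowest delta lnL
        out[aln] = (rejected, best)
    return out


for aln, (rejected, best) in summarize(TABLE).items():
    print(f"{aln:>5}  rejected={rejected}  best={best}")
```

For every alignment the best tree is A or C, both of which place Rhizaria next to excavates, consistent with point (iii) of the Discussion.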


More unexpected is the second hypothesis, which suggests that Rhizaria are sister group to stramenopiles. This grouping is recovered in many of the ML analyses (Figure 2-ch.3), and none of these trees can be statistically rejected (Table 1-ch.3). Moreover, Rhizaria also branch with stramenopiles when the fast-evolving excavate sequences are removed, as well as when the less divergent B. natans is analysed as the only rhizarian. If this configuration is confirmed by additional evidence, such as discrete characters or phylogenomic analyses of other, less rapidly evolving rhizarians, it would have important implications for the chromalveolate hypothesis [Harper et al. 2005]. This hypothesis is based, among other things, on a specific model of plastid evolution in which both stramenopiles and alveolates (with the exception of ciliates) have a plastid derived from a single endosymbiotic event with a red alga in their common ancestor [Cavalier-Smith 1999; Harper et al. 2005]. A sister relationship between Rhizaria and stramenopiles would complicate this picture, implying either that stramenopiles acquired their secondary plastid in an independent endosymbiotic event, or that the single engulfment of a red alga occurred at a very early stage of chromalveolate evolution and the resulting plastid was secondarily lost in certain lineages, such as ciliates and Rhizaria. Although such a scenario is certainly less parsimonious than the chromalveolate or cabozoan hypotheses, none of these hypotheses is currently strongly supported by multigene data.

The uncertainty concerning the phylogenetic position of Rhizaria reflects the general difficulty of resolving the phylogeny in this part of the eukaryotic tree. Except for plants, whose position seems well established, the relationships between all other groups of bikonts remain unresolved. This is not surprising given that even analyses of larger datasets, with more than 100 proteins, failed to resolve the phylogeny of bikonts properly [Bapteste et al. 2002]. For example, chromalveolates were strongly supported in multigene phylogenies only when no other unicellular bikonts were present in the analyses [Rodriguez-Ezpeleta et al. 2005], and other phylogenetic analyses provided only mixed support for this plastid-based view of eukaryotic relationships [Yoon et al. 2002; Yoon et al. 2004]. Despite this lack of clear support, the union of chromalveolate taxa is potentially supported by a gene replacement in which the cytosolic GAPDH gene was duplicated and retargeted to the plastid uniquely in these taxa [Fast et al. 2001; Harper and Keeling 2003]. Nevertheless, none of these studies was directly concerned with the overall phylogeny of bikonts, which resulted in a relatively limited taxon sampling of unicellular bikonts and no detailed analysis of their relationships. By adding Rhizaria and all available sequence data on stramenopiles, alveolates and excavates, we included in our analyses all major bikont phyla except haptophytes, cryptophytes and centrohelids. However, even with such exhaustive sampling we were unable to resolve the relationships between these taxa.

The obvious question is why multigene analyses cannot reliably resolve the phylogeny of unicellular bikonts. It has been proposed that the lack of resolution observed in other EST-based phylogenies is due to mutational saturation, phylogenetic incongruence or rapid diversification [Philippe et al. 2004]. Indeed, single-gene phylogenies have demonstrated that some excavates [Philippe et al. 2000b], foraminifers [Pawlowski et al. 1996] and ciliates [Philippe and Adoutte 1998] can evolve exceptionally rapidly, and it cannot be excluded that large parts of these genomes show accelerated rates of evolution. In our trees, this is particularly well illustrated by the case of ciliates ( + ). Although several lines of evidence indicate that ciliates share a common ancestor with apicomplexans and [Cavalier-Smith 1993; Leander and Keeling 2003; Leander and Keeling 2004], in our analyses they often branch as sister group to excavates (Figure 2d-ch.3); this branching is rejected by the AU test for most alignments and lies just above the rejection limit for the others, suggesting an artefactual position.

The accelerated rates of evolution in some unicellular bikonts, which potentially erase the phylogenetic signal, are probably the main source of problems when inferring their evolutionary relationships. However, other possible causes cannot be discarded. One of them could be the rapid diversification of eukaryotes suggested by some authors [Cavalier-Smith 2002]. Indeed, the lack of resolution in early animal phylogeny compared with the well-resolved phylogeny of fungi (also observed in our data, see Figure 1-ch.3) has been interpreted as indirect evidence for the Cambrian explosion [Rokas et al. 2005]. However, it is not clear why such a rapid diversification would occur in unicellular bikonts but not in other eukaryotes. Alternatively, the placement of the root of the eukaryotic tree between unikonts and bikonts, which rests principally on a single gene fusion [Stechmann and Cavalier-Smith 2002], may be incorrect. Some authors indeed suggest that the root could instead lie on the branch leading to opisthokonts or to the common ancestor of diplomonads/parabasalids [Arisue et al. 2005]. If this is true, then the unicellular bikonts would be paraphyletic and their phylogeny would be particularly difficult to resolve.

To conclude, resolving the phylogeny of bikonts will probably require several additional efforts. As illustrated by our study, adding new higher-level taxa such as Rhizaria is not sufficient on its own, but may help to consolidate the relationships within particular supergroups. It is doubtful whether better resolution can be achieved simply by increasing the number of analyzed genes (more EST or whole-genome data). In fact, the analysis of selected slowly evolving genes may be more informative than the analysis of large databases, as has been shown in the case of chromalveolates [Harper et al. 2005]. Searching for new genomic signatures may also be an essential complement to multigene analyses. Finally, proper rooting of the eukaryotic tree will be crucial for an accurate interpretation of the relationships between unicellular bikonts and for a better understanding of the deep phylogeny of eukaryotes.

3.6 Materials & methods

3.6.1 Construction of the alignment

Using our R. filosa ESTs as queries, we performed blastx searches against the UniProt protein database on the Swiss Institute of Bioinformatics server to find sufficiently conserved genes across a broad taxonomic sampling of eukaryotes. A homemade Perl script linking the BLAST output to the seqret program from the EMBOSS package (http://emboss.ch.embnet.org/EMBOSSDOC/programs/html/seqret.html) allowed us to retrieve all database sequences with an E-value < 10⁻⁴⁰ and store them in separate files, one per R. filosa gene. This relatively stringent cutoff was chosen to avoid including paralogous genes. The homologous proteins were then aligned with ClustalW [Thompson et al. 1994] and kept for further analyses if (i) they showed a reasonable taxonomic distribution and (ii) they were sufficiently conserved across all eukaryotes.
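The hit-filtering step can be sketched as follows. This is a minimal Python illustration, not the homemade Perl/seqret pipeline used here. It assumes BLAST tabular output, where the 11th column is the E-value, and all query and subject identifiers are hypothetical.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1e-40  # the stringent cutoff described in the text


def hits_by_query(lines, cutoff=EVALUE_CUTOFF):
    """Group subject IDs by query ID, keeping only hits with E-value < cutoff.

    Assumes tab-separated BLAST tabular lines, where column 11 is the E-value.
    """
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # 'and' short-circuits, so failing hits never create an empty group
        if evalue < cutoff and subject not in groups[query]:
            groups[query].append(subject)
    return dict(groups)


# Hypothetical hits: a strong and a weak match for gene_1, one match for gene_2.
example = [
    "gene_1\tsubj_A\t92.1\t350\t27\t0\t1\t350\t3\t352\t1e-180\t650",
    "gene_1\tsubj_B\t31.0\t120\t80\t3\t10\t129\t5\t124\t2e-12\t60",
    "gene_2\tsubj_C\t88.0\t76\t9\t0\t1\t76\t1\t76\t3e-45\t150",
]
print(hits_by_query(example))
```

Because the comparison is strict, a hit with an E-value of exactly 10⁻⁴⁰ is discarded.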

To increase the number of eukaryotes represented, we downloaded all available nucleotide sequences from GenBank through the NCBI browser (http://www.ncbi.nlm.nih.gov) for the stramenopiles, ciliates, alveolates, Entamoeba, Physcomitrella, Rhodophyta, Strongylocentrotus, Schistosoma, Giardia, Trichomonas, Alexandrium, and B. natans. We searched for homology between this dataset and our R. filosa ESTs using local tblastx searches (E-value < 10⁻⁴⁰) and added the matching sequences to our alignments. At this point, only genes found either in both R. filosa and B. natans or in R. filosa alone had been retained. To increase the number of genes, we repeated the BLAST and selection procedures, this time using the B. natans sequences as queries. Overall, this resulted in a dataset of aligned homologous genes containing, for Rhizaria, either both R. filosa and B. natans, only R. filosa, or only B. natans, in addition to all other eukaryotic species. Alignments were checked by eye and refined manually with BioEdit 7.0.5 [Hall 1999], and ambiguously aligned positions were removed with Gblocks [Castresana 2000].
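Gblocks combines several criteria (residue conservation, minimum block length, gap treatment). The much simpler column filter below is only a stand-in that illustrates the general idea of trimming poorly aligned positions. It is not the Gblocks algorithm, and the 50% gap threshold and the toy sequences are assumptions.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose fraction of gaps ('-') exceeds max_gap_fraction.

    alignment: dict {taxon: aligned sequence}; all sequences must be equal length.
    Simplified illustration only, NOT Gblocks.
    """
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}


# Hypothetical toy alignment: column 6 is all gaps and is removed.
aln = {"R_filosa": "MK-LV-A", "B_natans": "MKQLV-A", "H_sapiens": "MKQL--A"}
print(drop_gappy_columns(aln))
```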


Because data were limited for certain groups, and to maximize the number of genes per taxonomic assemblage, some higher taxa were represented by a combination of closely related species: Paramecium, Phytophthora, , Rhodophyta, (see Supplementary Table S1-ch.3 for details). To decide on the final set of genes used in this study, we checked for orthology among all retrieved sequences for each selected gene. We first carried out a Neighbor-Joining (NJ) analysis based on distances computed with PROTDIST 3.6 [Felsenstein 2004], which allowed us to discard very distant paralogous genes. To refine our selection, we then constructed a maximum likelihood (ML) tree for each gene using PHYML (JTT + F + Γ4) [Guindon and Gascuel 2003] and kept only the genes for which clear orthology between species could be identified.
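The text does not state how sequences from several species were merged into one chimeric taxon. The rule below is assumed purely for illustration: for each gene, take the sequence of the first species in a preference list that has one. The species order and the toy sequences are hypothetical.

```python
def build_chimera(gene_tables, priority):
    """Merge closely related species into one chimeric taxon (hypothetical rule).

    gene_tables: {gene: {species: sequence}}
    priority: species names in order of preference.
    Returns {gene: (species_used, sequence)} for genes where any listed species has data.
    """
    chimera = {}
    for gene, by_species in gene_tables.items():
        for species in priority:  # first listed species with data wins
            if species in by_species:
                chimera[gene] = (species, by_species[species])
                break
    return chimera


# Hypothetical toy data for two Paramecium species.
priority = ["Paramecium tetraurelia", "Paramecium aurelia"]
genes = {
    "actg": {"Paramecium tetraurelia": "MEEV", "Paramecium aurelia": "MDEV"},
    "tuba": {"Paramecium aurelia": "MREC"},
    "rpl4": {},
}
print(build_chimera(genes, priority))
```

Genes with no data for any listed species (here `rpl4`) are simply left missing for the chimeric taxon.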

3.6.2 Phylogenetic analyses

We concatenated all genes into alignments that were analyzed with both Maximum Likelihood (ML) and Bayesian Inference (BI). ML analyses used the programs PHYML [Guindon and Gascuel 2003] and TREEFINDER [Jobb et al. 2004]. Following the Akaike Information Criterion (AIC) [Posada and Buckley 2004] computed with ProtTest 1.2.6 [Abascal et al. 2005], we chose the RtREV + F + Γ model, which accounts for among-site rate variation (calculations were done with eight gamma categories). The WAG model, ranked second by the AIC, was also tested and yielded exactly the same topologies. To estimate the robustness of the phylogenetic inference, we used the bootstrap method [Felsenstein 1985] with 100 pseudo-replicates generated and analyzed with PHYML and TREEFINDER.
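Two steps of this paragraph can be sketched in a few lines. The first ranks models by AIC (AIC = 2k − 2 ln L, lower is better). The second draws one nonparametric bootstrap pseudo-replicate by resampling alignment columns with replacement. The log-likelihoods and parameter counts are invented for illustration and are not ProtTest output.

```python
import random


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def rank_models(fits):
    """fits: {model: (lnL, k)} -> list of (model, AIC), best first.

    Branch lengths are omitted from k: on a fixed tree they are shared by
    all models and cancel out in the comparison.
    """
    scored = [(name, aic(lnl, k)) for name, (lnl, k) in fits.items()]
    return sorted(scored, key=lambda item: item[1])


def bootstrap_replicate(alignment, rng):
    """One pseudo-replicate: resample alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][i] for i in cols) for n in names}


# Hypothetical fits: +F adds 19 frequency parameters, +G adds 1 shape parameter.
fits = {
    "RtREV+F+G": (-100000.0, 20),
    "WAG+F+G": (-100150.0, 20),
    "JTT+G": (-100400.0, 1),
}
print(rank_models(fits))
print(bootstrap_replicate({"a": "ACDEF", "b": "ACDEG"}, random.Random(1)))
```

Repeating `bootstrap_replicate` 100 times and inferring a tree from each replicate gives the pseudo-replicates used for bootstrap support.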

Bayesian analyses using the WAG + F + Γ4 model were performed with the parallel version of MrBayes 3.1.2 [Ronquist and Huelsenbeck 2003]. Each inference started from a random tree, used four Metropolis-coupled Markov chains (MCMCMC), and ran for 1,000,000 generations with sampling every 100 generations. The average standard deviation of split frequencies was used to assess the convergence of the two runs. Bayesian posterior probabilities were calculated from the majority-rule consensus of the trees sampled after the burn-in period, which was determined by checking the convergence of likelihood values across MCMCMC generations (roughly 20,000 to 50,000 generations, depending on the analysis).
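The post-burn-in summaries can be illustrated with trees encoded as sets of splits, where each split is written as the frozenset of taxa on one side. The sketch computes the majority-rule split frequencies from which posterior probabilities are read. It also computes a simple version of the average standard deviation of split frequencies between two runs. The 0.10 minimum-frequency filter is assumed to mirror a MrBayes option, MrBayes' exact computation may differ, and the toy trees are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def split_frequencies(trees, burnin):
    """Frequency of each split among trees sampled after the first `burnin` samples."""
    kept = trees[burnin:]
    if not kept:
        raise ValueError("burn-in discards every sampled tree")
    counts = Counter(split for tree in kept for split in tree)
    return {split: n / len(kept) for split, n in counts.items()}


def majority_rule(trees, burnin):
    """Splits found in more than half of the post-burn-in trees, with their frequencies."""
    return {s: f for s, f in split_frequencies(trees, burnin).items() if f > 0.5}


def asdsf(run_a, run_b, burnin, min_freq=0.10):
    """Average (sample) standard deviation of split frequencies across two runs."""
    fa = split_frequencies(run_a, burnin)
    fb = split_frequencies(run_b, burnin)
    splits = [s for s in set(fa) | set(fb)
              if max(fa.get(s, 0.0), fb.get(s, 0.0)) >= min_freq]
    if not splits:
        return 0.0
    return mean([stdev([fa.get(s, 0.0), fb.get(s, 0.0)]) for s in splits])


# Toy samples over taxa A-E; each split is named by its side not containing E.
AB, ABC, ABD, AC, ACD = (frozenset(x) for x in ("AB", "ABC", "ABD", "AC", "ACD"))
run1 = [{AB, ABC}, {AB, ABC}, {AB, ABD}, {AB, ABC}, {AC, ACD}]
run2 = [{AB, ABC}] * 5
print(majority_rule(run1, burnin=1))
print(asdsf(run1, run2, burnin=1))
```

Values of the average standard deviation approaching zero indicate that the two runs sample similar tree distributions.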

In subsequent analyses, amino acid positions were successively removed from the complete alignment according to their substitution rates. Site-specific substitution rates were computed with the program CODEML from the PAML package [Yang 1997], given the 15 possible trees uniting the bikonts when alveolates, stramenopiles, Rhizaria, and excavates are defined as a multifurcation, and the WAG model with all parameters estimated (12 gamma categories). Based on the substitution rates, expressed as the number of substitutions per site, we defined several categories of sites, ranging from the fastest-evolving to the slowest-evolving sites. Seven different alignments were generated, each with one category and all faster categories of sites removed (see Figure 2-ch.3 for details).
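The nested, rate-filtered alignments can be generated as below, given per-site rate estimates such as those produced by CODEML. The cut-off values and the toy alignment are hypothetical. The category boundaries actually used are those summarized in Figure 2-ch.3.

```python
def rate_filtered_alignments(alignment, site_rates, cutoffs):
    """Return one alignment per cutoff, keeping only sites with rate < cutoff.

    cutoffs should be given from highest to lowest, so each successive
    alignment removes one more (slower) rate category together with all
    faster ones.
    """
    names = list(alignment)
    n_sites = len(site_rates)
    if any(len(alignment[n]) != n_sites for n in names):
        raise ValueError("one rate per alignment column is required")
    filtered = []
    for cutoff in cutoffs:
        keep = [i for i, rate in enumerate(site_rates) if rate < cutoff]
        filtered.append({n: "".join(alignment[n][i] for i in keep) for n in names})
    return filtered


# Hypothetical per-site rates (substitutions per site) and cut-offs.
aln = {"R_filosa": "MKVLA", "H_sapiens": "MRILA"}
rates = [0.1, 2.5, 0.8, 1.6, 0.05]
for sub in rate_filtered_alignments(aln, rates, cutoffs=[2.0, 1.0]):
    print(sub)
```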

PHYML and CODEML were executed on the Vital-IT computational facilities at the Swiss Institute of Bioinformatics (http://www.vital-it.ch). MPI-MrBayes was run at the freely available University of Oslo Bioportal (http://www.bioportal.uio.no).

3.6.3 Testing phylogenies

Phylogenetic hypotheses were tested using the approximately unbiased (AU) test [Shimodaira 2002]. For each tested tree, site likelihoods were calculated using CODEML, and the AU test was performed using CONSEL [Shimodaira and Hasegawa 2001] with default scaling and replicate values.
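The full AU test relies on CONSEL's multiscale bootstrap and is not reproduced here. The sketch only shows two simpler quantities computed from the same per-site log-likelihoods. The first is each tree's log-likelihood difference from the best tree. The second is the RELL bootstrap proportion, which resamples site log-likelihoods instead of re-optimizing each tree. The site values are invented for illustration.

```python
import random


def delta_lnl(site_lnl_by_tree):
    """Log-likelihood of the best tree minus that of each tree (0 for the best)."""
    totals = {tree: sum(sites) for tree, sites in site_lnl_by_tree.items()}
    best = max(totals.values())
    return {tree: best - lnl for tree, lnl in totals.items()}


def rell_proportions(site_lnl_by_tree, replicates=1000, seed=0):
    """RELL bootstrap: resample sites with replacement; count how often each tree is best."""
    rng = random.Random(seed)
    trees = list(site_lnl_by_tree)
    n_sites = len(site_lnl_by_tree[trees[0]])
    wins = dict.fromkeys(trees, 0)
    for _ in range(replicates):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        scores = {t: sum(site_lnl_by_tree[t][i] for i in idx) for t in trees}
        wins[max(scores, key=scores.get)] += 1  # ties go to the first tree listed
    return {t: w / replicates for t, w in wins.items()}


# Hypothetical per-site log-likelihoods for two candidate trees.
site_lnl = {"T1": [-1.0, -1.0, -1.0], "T2": [-2.0, -1.0, -1.5]}
print(delta_lnl(site_lnl))
print(rell_proportions(site_lnl))
```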


3.7 Supplementary material

Supplementary Table S1-ch.3. Summary of the occurrence of missing data per taxon in the complete dataset.

Species | Chimera | Nb of amino acids | % of missing data
Alexandrium tamarense | - | 2713 | 79.55
Phaeodactylum tricornutum | - | 4744 | 64.29
Giardia intestinalis | - | 5811 | 55.98
Trichomonas vaginalis | - | 6965 | 47.47
Paramecium | Paramecium aurelia, Paramecium tetraurelia | 7424 | 44.00
Paramecium caudatum | - | 8615 | 35.02
Entamoeba histolytica | - | 8799 | 33.63
Phytophthora | [...], Phytophthora sojae | 9005 | 32.20
Thalassiosira pseudonana | - | 9281 | 30.00
Cryptosporidium | Cryptosporidium hominis, Cryptosporidium parvum | 9298 | 29.87
Bigelowiella natans | - | 9479 | 28.50
Rhodophyta | Porphyra yezoensis, Cyanidioschyzon merolae | 9531 | 28.13
Reticulomyxa filosa | - | 9947 | 25.00
Physcomitrella patens | - | 9983 | 24.73
[...] major | - | 10049 | 24.20
Theileria | Theileria parva, Theileria annulata | 10232 | 22.82
[...] cruzi | - | 10520 | 20.65
[...] | - | 10999 | 17.04
Schistosoma japonicum | - | 11148 | 15.92
Tetrahymena thermophila | - | 11287 | 14.87
Strongylocentrotus purpuratus | - | 11304 | 14.74
Ustilago maydis | - | 11422 | 13.85
[...] | - | 11762 | 11.28
Debaryomyces hansenii | - | 11896 | 10.27
Brachydanio rerio | - | 12246 | 7.63
Gibberella zeae | - | 12387 | 6.57
Xenopus laevis | - | 12448 | 6.11
Neurospora crassa | - | 12500 | 5.72
Oryza sativa | - | 12552 | 5.33
Yarrowia lipolytica | - | 12552 | 5.33
Cryptococcus neoformans | - | 12555 | 5.30
Anopheles gambiae | - | 12598 | 4.98
Caenorhabditis elegans | - | 12739 | 3.91
Arabidopsis thaliana | - | 13093 | 1.24
Mus musculus | - | 13134 | 0.94
Drosophila melanogaster | - | 13258 | 0.00
Homo sapiens | - | 13258 | 0.00

Summary of the species sampling and, where applicable, of the species whose sequences were concatenated into a single chimeric taxon (chimera). The number of amino acids per taxon and the occurrence of missing data in the complete dataset (CA) are also indicated.


Supplementary Table S2-ch.3. List of all gene names, protein names and the number of amino acid positions conserved for each gene alignment.

Gene name | Protein name | Number of aa positions
- | Hypothetical protein | 152
actg | Actin | 363
ant2 | ADP/ATP translocase 2 | 209
ap1m1 | AP-1 complex subunit mu-1 | 119
arf3 | ADP-ribosylation factor 3 | 152
atp5b | ATP synthase beta chain | 219
atp6f | Vacuolar ATP synthase 21 kDa proteolipid subunit | 101
atp6v0c | Vacuolar ATP synthase 16 kDa proteolipid subunit | 120
atp6v1b2 | Vacuolar ATP synthase subunit B | 144
bat1 | Spliceosome RNA helicase Bat1 | 112
cad | CAD protein | 204
cam | Calmodulin | 147
capzb | F-actin capping protein beta subunit | 113
cct1 | T-complex protein 1 subunit alpha | 200
cct5 | T-complex protein 1 subunit epsilon | 279
cdc42 | Cell division control protein 42 homolog | 135
cpox | Coproporphyrinogen III oxidase | 177
ctsd | Cathepsin D | 114
dars | Aspartyl-tRNA synthetase | 112
ddx48 | Probable ATP-dependent RNA helicase DDX48 | 130
dld | Dihydrolipoyl dehydrogenase | 161
drg2 | Developmentally-regulated GTP-binding protein 2 | 133
eif4a1 | Eukaryotic initiation factor 4A | 151
eif6 | Eukaryotic translation initiation factor 6 | 133
etf1 | Eukaryotic peptide chain release factor subunit 1 | 90
fah | Fumarylacetoacetase | 173
fbl | Fibrillarin | 168
gnai2 | Guanine nucleotide-binding protein G(i), alpha-2 subunit | 165
gnb1 | Guanine nucleotide-binding protein G(I)/G(S)/G(T) beta subunit 1 | 138
gnb2 | Guanine nucleotide-binding protein G(I)/G(S)/G(T) beta subunit 2 | 176
gnb2l1 | Guanine nucleotide-binding protein beta subunit 2-like 1 | 116
got2 | Aspartate aminotransferase | 143
hist1h3 | Histone H3.1 | 117
hsp90 | Heat shock protein HSP 90 | 178
isyna1 | Myo-inositol 1-phosphate synthase A1 | 124
magoh | Protein mago nashi homolog | 135
mat2a | S-adenosylmethionine synthetase isoform type-2 | 190
mdh1 | Malate dehydrogenase | 153
nme1 | Nucleoside diphosphate kinase A | 121
nmt1 | Glycylpeptide N-tetradecanoyltransferase 1 | 171
pdia4 | Protein disulfide-isomerase A4 | 151
phb2 | Prohibitin-2 | 143
phc | Phosphate carrier protein | 189
pmm1 | Phosphomannomutase 1 | 135
ppih | Peptidyl-prolyl cis-trans isomerase H | 121
ppp1cc | Serine/threonine-protein phosphatase PP1-gamma catalytic subunit | 98
ppp2r5e | Serine/threonine-protein phosphatase 2A 56 kDa regulatory subunit epsilon isoform | 169
ppv | Serine/threonine-protein phosphatase 6 | 83
prmt1 | Protein arginine N-methyltransferase 1 | 80
psma7 | Proteasome subunit alpha type 7 | 183
psmc1 | 26S protease regulatory subunit 4 | 136
psmc6 | 26S protease regulatory subunit S10B | 115
rab13 | Ras-related protein Rab-13 | 84
rab-6 | Ras-related protein Rab-6 | 87
rac2 | Ras-related C3 botulinum toxin substrate 2 | 144
rad51 | DNA repair protein RAD51 homolog 1 | 168
ran | GTP-binding nuclear protein Ran | 167
rfc2 | Activator 1 40 kDa subunit | 132
rpl10 | 60S ribosomal protein L10 | 172
rpl10a | 60S ribosomal protein L10a | 153
rpl15 | 60S ribosomal protein L15 | 180
rpl18a | 60S ribosomal protein L18a | 104
rpl23 | 60S ribosomal protein L23 | 126
rpl4 | 60S ribosomal protein L4 | 186
rpl5 | 60S ribosomal protein L5 | 181
rpl7a | 60S ribosomal protein L7a | 141
rplp0 | 60S ribosomal protein P0 | 220
rps18 | 40S ribosomal protein S18 | 121
rps2 | 40S ribosomal protein S2 | 96
rps3 | 40S ribosomal protein S3 | 92
rps4 | 40S ribosomal protein S4 | 186
rps9 | 40S ribosomal protein S9 | 156
rpsa | 40S ribosomal protein SA | 189
rpt2a | 26S proteasome AAA-ATPase subunit rpt2a | 101
ruvbl1 | RuvB-like 1 | 129
sec61a1 | Protein transport protein Sec61 alpha subunit isoform 1 | 169
smu | Putative small G-protein | 146
sod2 | Superoxide dismutase | 125
stt3a | Oligosaccharyl transferase STT3 subunit homolog | 178
tuba | Alpha-tubulin | 429
tubb | Beta-tubulin | 427
ube2d2 | Ubiquitin-conjugating enzyme E2 D2 | 114
uxs1 | UDP-glucuronic acid decarboxylase 1 | 187
wdr45l | WD-repeat phosphoinositide-interacting protein 3 | 129
ywhae | 14-3-3 protein epsilon | 198
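As a consistency check, the 85 per-gene lengths in Table S2-ch.3 sum to 13,258 positions. This matches the length of the complete alignment implied by Table S1-ch.3, where Homo sapiens and Drosophila melanogaster have 13,258 amino acids and 0.00% missing data. The sketch below concatenates per-gene alignments into such a supermatrix and reports the percentage of missing positions per taxon. The toy genes and the '?' padding character are assumptions, and alignment gaps are not counted as missing here.

```python
def concatenate(gene_alignments, missing="?"):
    """Concatenate per-gene alignments into a supermatrix.

    gene_alignments: list of {taxon: aligned sequence} dicts, one per gene.
    Taxa lacking a gene are padded with `missing`. Returns the supermatrix and
    the percentage of padded (missing) positions per taxon.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * length))
    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    total = sum(len(next(iter(aln.values()))) for aln in gene_alignments)
    percent_missing = {
        taxon: round(100 * seq.count(missing) / total, 2)
        for taxon, seq in matrix.items()
    }
    return matrix, percent_missing


# Hypothetical toy genes: the second gene is absent for R. filosa.
genes = [
    {"Homo_sapiens": "MKV", "Reticulomyxa_filosa": "MKI"},
    {"Homo_sapiens": "ACDE"},
]
matrix, pct = concatenate(genes)
print(matrix)
print(pct)
```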


Supplementary Figure S1-ch.3. Consensus ML phylogenetic tree obtained with TREEFINDER from the analysis of the R. filosa non-missing data alignment (NMDA). 100 bootstrap replicates were performed (bootstrap support values are shown at nodes), and unresolved nodes correspond to relationships recovered in fewer than 50 replicates.


Supplementary Figure S2-ch.3. Best ML phylogenetic tree obtained with PHYML from the analysis of the complete dataset (CA). Numbers on branches represent ML bootstrap support values generated with PHYML.


Supplementary Figure S3-ch.3. Best PHYML tree inferred from the alignment with no missing data for R. filosa (NMDA). Bootstrap support values are reported as in Supplementary Figure S2-ch.3.


Supplementary Figure S4-ch.3. CA tree derived by Bayesian analysis. Numbers at nodes represent Bayesian posterior probabilities.


Supplementary Figure S5-ch.3. NMDA tree derived by Bayesian analysis. Numbers at nodes represent Bayesian posterior probabilities.


Chapter 4: Phylogenomics reshuffles the eukaryotic supergroups 

Burki F, Shalchian-Tabrizi K, Minge M, Skjaeveland A, Nikolaev SI, Jakobsen KS & Pawlowski J

Published in: PLoS ONE, 2(8): e790, 200

 65 Chapter 4: Shuffling the supergroups and phylogenetic position of Rhizaria

4.1 Project Description

After our first attempt at phylogenomics (see chapter 3), we were left with mixed feelings about whether or not we would succeed in placing Rhizaria within the tree of eukaryotes. On the one hand, we had been rather unsuccessful in obtaining a globally resolved tree with our initial dataset; on the other hand, we were convinced we could go much further with more data. We therefore undertook to generate ESTs for two more rhizarian species, Quinqueloculina sp. and Gymnophrys cometa (later renamed Limnofila borokensis [Cavalier-Smith et al. 2008]), bringing the number of Rhizaria available for phylogenomics to four. Of major importance for our project, EST data for many other protists belonging to all eukaryotic supergroups were being released at about the same time by other groups (notably the Canadian Protist EST Program consortium, http://megasun.bch.umontreal.ca/pepdb/pep_main.html). This sharp increase in the diversity of taxa for which genomic data were accessible opened up new possibilities for exploring the relationships among the eukaryotic supergroups.


4.2 Abstract

Resolving the phylogenetic relationships between eukaryotes is an ongoing challenge of evolutionary biology. In recent years, the accumulation of molecular data has led to a new evolutionary understanding, in which all eukaryotic diversity has been classified into five or six supergroups. Yet the composition of these large assemblages and their relationships remain controversial. Here, we report the sequencing of expressed sequence tags (ESTs) for two species belonging to the supergroup Rhizaria and present the analysis of a unique dataset combining 29,908 amino acid positions and an extensive taxon sampling of 49 mainly unicellular species representative of all supergroups. Our results show a very robust relationship between Rhizaria and two main clades of the supergroup chromalveolates: stramenopiles and alveolates. We confirm the existence of consistent affinities between assemblages that were thought to belong to different supergroups of eukaryotes, and thus not to share a close evolutionary history. This well-supported phylogeny has important consequences for our understanding of the evolutionary history of eukaryotes. In particular, it calls into question a single red algal origin of the chlorophyll-c-containing plastids among the chromalveolates. We propose the abbreviated name ‘SAR’ (Stramenopiles + Alveolates + Rhizaria) to accommodate this new super assemblage of eukaryotes, which comprises the largest diversity of unicellular eukaryotes.

4.3 Introduction

Obtaining a well-resolved phylogenetic tree describing the relationships among all organisms is one of the most important challenges of modern evolutionary biology. A current hypothesis for the tree of eukaryotes proposes that all diversity can be classified into five or six putative very large assemblages, the so-called ‘supergroups’ (reviewed in [Keeling et al. 2005] and [Adl et al. 2005]). These comprise the ‘Opisthokonta’ and ‘Amoebozoa’ (often united as the ‘Unikonts’), ‘’ or ‘Plantae’, ‘Excavata’, ‘Chromalveolata’, and ‘Rhizaria’. The supergroup concept as a whole, however, has been shown to be only moderately supported [Parfrey et al. 2006], and the evolutionary links among these groups are yet to be confirmed. These uncertainties may be due to the limited amount of data available for most parts of eukaryotic diversity. In particular, only a small fraction of unicellular eukaryote diversity [Patterson 1999] has been subject to molecular studies, leading to important imbalances in phylogenies and preventing researchers from reliably inferring deep evolutionary relationships.


The supergroup Rhizaria [Cavalier-Smith 2002] is particularly interesting for testing different possible scenarios of eukaryote evolution. This assemblage has only recently been described and is based exclusively on molecular data; nevertheless, it is very well supported in most phylogenies [Parfrey et al. 2006]. It includes very diverse organisms such as filose testate amoebae, cercomonads, chlorarachniophytes (together, core Cercozoa), foraminifers, plasmodiophorids, haplosporidians, gromiids, and radiolarians (see [Adl et al. 2005] for an overview, or [Bhattacharya et al. 1995; Cavalier-Smith 1998b; Keeling 2001; Burki et al. 2002; Longet et al. 2003; Nikolaev et al. 2004]). In contrast to Rhizaria, the monophyly of Chromalveolata is far from undisputed (see [Bodyl 2005], or [Harper et al. 2005; Li et al. 2006; Parfrey et al. 2006; Patron et al. 2007]). Chromalveolates were originally defined by their plastid of red algal origin, which (when present) is believed to have arisen from a single secondary endosymbiosis [16]. This supergroup encompasses many ecologically important photosynthetic protists, including coccolithophorids (belonging to the haptophytes), cryptophytes, , brown seaweeds (together, the chromists) and dinoflagellates (which together with ciliates and apicomplexans form the alveolates) [Cavalier-Smith and Chao 2003a; Keeling 2004].

Using a phylogenomic approach, we recently confirmed the monophyly of Rhizaria and addressed the question of its evolutionary history [Burki and Pawlowski 2006]. The analyses of 85 concatenated nuclear protein sequences pointed to two potential affiliations with other eukaryotes. According to the first hypothesis, Rhizaria are the sister group to an excavate clade defined by G. lamblia, T. vaginalis, and . The second hypothesis suggested that Rhizaria are closely related to stramenopiles, which together with alveolates, haptophytes, and cryptophytes form the supergroup of chromalveolates. Apart from our study, the branching pattern between Rhizaria and other supergroups has been specifically evaluated only by Hackett et al. (2007), who reported a robust relationship between Rhizaria and members of the chromalveolates.

Here, we further address the phylogenetic position of Rhizaria within the eukaryotic tree using an extensive multigene approach. For this purpose, we carried out two expressed sequence tag (EST) surveys of rhizarian species: an undetermined foraminiferan species belonging to the genus Quinqueloculina (574 unique sequences, Accession Numbers: EV435154-EV435825) and Gymnophrys cometa (Cienkowski, 1876) (628 unique sequences, Accession Numbers: EV434532-EV435153), a freshwater protist that has been shown to be part of core Cercozoa [Nikolaev et al. 2003]. Using the EST datasets available for two other rhizarians [Keeling and Palmer 2001; Burki et al. 2006] and data from publicly available protists (TBestDB; http://tbestdb.bcm.umontreal.ca/searches/login.php), we constructed a taxonomically broad dataset of 123 protein alignments amounting to nearly 30,000 unambiguously aligned amino acid positions. Our superalignment includes several representatives of all described eukaryotic supergroups. Our results show an unambiguous relationship between Rhizaria and stramenopiles, confirming the hypothesis we had previously proposed and suggesting the emergence of a new super assemblage of eukaryotes that we propose to name ‘SAR’ (stramenopiles + alveolates + Rhizaria).

4.4 Results

4.4.1 Single-gene analyses and concatenation

Forty-nine eukaryotic species representing all five current supergroups, and for which large amounts of data are available, were selected. We identified 123 genes (see Supplementary Table S1-ch.4) that fulfilled the following criteria: 1) at least one of the four rhizarian species, as well as at least one member each of unikonts, plants, excavates, alveolates, and stramenopiles, was present in every single-gene alignment; 2) orthology in every gene was unambiguous on the basis of bootstrapped single-gene maximum likelihood (ML) trees. This second criterion is particularly important in multigene analyses to avoid mixing distant paralogs in concatenated alignments, because they would dilute the true phylogenetic signal with strong misleading signal, thus preventing the recovery of deep relationships [Delsuc et al. 2005]. Similarly, it is essential to detect and discard putative candidates for endosymbiotic gene transfer (EGT) or horizontal gene transfer (HGT). Hence, we submitted each of our single-gene alignments to ML reconstructions with bootstrap replicates and systematically removed sequences that displayed ambiguous phylogenetic positions, whether due to paralogy or to gene transfer. For example, we found a few cases where B. natans and G. theta sequences actually corresponded to genes encoded in the genome of these species. This restrictive procedure yielded a set of 123 single-gene alignments, each containing at least one rhizarian species, only orthologous sequences, and virtually no genes transferred either from a plastid or from a foreign source.
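Criterion 1 amounts to a simple per-gene taxon-coverage test. The sketch takes the list of required groups and the four rhizarian species from the text. The species-to-group mapping is a small hypothetical example, not the full assignment used in this study.

```python
REQUIRED_GROUPS = ("unikonts", "plants", "excavates", "alveolates", "stramenopiles")


def passes_coverage(taxa_in_gene, group_of, rhizarians):
    """True if a gene has >= 1 rhizarian and >= 1 member of every required group."""
    present = {group_of[t] for t in taxa_in_gene if t in group_of}
    has_rhizarian = any(t in rhizarians for t in taxa_in_gene)
    return has_rhizarian and all(g in present for g in REQUIRED_GROUPS)


RHIZARIANS = {"Reticulomyxa filosa", "Bigelowiella natans",
              "Quinqueloculina sp.", "Gymnophrys cometa"}
# Hypothetical example assignments of species to groups.
GROUP_OF = {
    "Homo sapiens": "unikonts",
    "Arabidopsis thaliana": "plants",
    "Giardia intestinalis": "excavates",
    "Tetrahymena thermophila": "alveolates",
    "Phytophthora sojae": "stramenopiles",
}
gene_a = ["Bigelowiella natans", *GROUP_OF]
gene_b = ["Reticulomyxa filosa", "Homo sapiens", "Arabidopsis thaliana"]
print(passes_coverage(gene_a, GROUP_OF, RHIZARIANS),
      passes_coverage(gene_b, GROUP_OF, RHIZARIANS))
```

Genes passing this filter would then go on to the single-gene ML orthology check described in criterion 2.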

One possible approach to analyzing such a dataset is to build a supermatrix formed by the concatenation of individual genes (for a review see [Delsuc et al. 2005]). After concatenation, our final alignment contained 29,908 unambiguously aligned amino acid positions. Overall, we observed an average of 39% missing data, but the missing sites were not uniformly distributed across taxa (see Supplementary Tables S2-ch.4 and S3-ch.4 for more details). However, several studies have demonstrated that a dataset retains its phylogenetic power as long as a large number of positions are still present in the analysis [Philippe et al. 2004; Wiens 2005; McMahon and Sanderson 2006; Wiens 2006]. For example, Wiens (2005, 2006) demonstrated that including highly incomplete taxa (with up to 90% missing data) in model-based phylogenies, such as likelihood or Bayesian analyses, could dramatically increase accuracy.

Figure 1-ch.4. Best maximum likelihood tree of eukaryotes found using TREEFINDER, with 10 starting trees obtained with the global tree-searching procedure. Numbers at nodes represent the results of the bootstrap analysis (underlined numbers; 100 bootstrap pseudoreplicates were performed) and Bayesian posterior probabilities. Black dots represent values of 100% bootstrap support (BP) and a Bayesian posterior probability (BiPP) of 1.0. Nodes without numbers correspond to support weaker than 50% BP and 0.8 BiPP.


4.4.2 Phylogenetic position of Rhizaria

The ML and Bayesian trees inferred from the complete alignment (Figure 1-ch.4; see also Supplementary Figures S1-ch.4 and S2-ch.4) recover a number of previously observed groups and are in most respects congruent with recently published global eukaryotic phylogenies [Hackett et al. 2007; Nozaki et al. 2007; Patron et al. 2007]. A monophyletic group uniting Metazoa, Fungi, and Amoebozoa (altogether the unikonts) was robustly supported (100% bootstrap support, BP; 1.0 Bayesian posterior probability, BiPP). Green plants, glaucophytes, and rhodophytes grouped together, albeit only weakly supported (56% BP; this node was not recovered in the Bayesian analysis, see Supplementary Figure S2-ch.4). A group composed of haptophytes and cryptophytes, as well as the excavates (without Malawimonas, which failed to branch consistently with the other excavate species), received only moderate support in the ML inference (68% and 61% BP, respectively) but 1.0 BiPP. Finally, alveolates, stramenopiles, and Rhizaria each formed monophyletic groups with 100% BP and 1.0 BiPP. Although most of the recognized eukaryotic supergroups are recovered in our analyses, the relationships among them are generally not well resolved, with two notable exceptions: the union of the unikonts and, much more interestingly, the strongly supported (BP = 100%; BiPP = 1.0) assemblage of stramenopiles, Rhizaria, and alveolates (clade SAR), within which stramenopiles and Rhizaria are robustly clustered together (BP = 88%; BiPP = 1.0) (clade SR). Comparisons of substitution rates between the different lineages were not significant at the 1.25% level, indicating that all species evolve at very similar rates and rendering an artifact caused by long branches unlikely (data not shown).

To further test this unexpected nested position of Rhizaria between alveolates and stramenopiles, we compared different topologies using the approximately unbiased (AU) test, which is considered the least biased and most rigorous test available to date [Shimodaira 2002]. More precisely, we addressed two questions: 1) Are Rhizaria indeed monophyletic with stramenopiles and alveolates? 2) Are Rhizaria specifically related to stramenopiles, to the exclusion of alveolates? Our analyses show that an alternative topology, corresponding to the best topology with Rhizaria forced not to share a common ancestor with the assemblage composed of stramenopiles and alveolates (Supplementary Figure S3-ch.4; Table 1B-ch.4), had a likelihood significantly lower than the best ML tree obtained without constraint (Figure 1-ch.4; Table 1A-ch.4) at the 0.05 significance level (P = 4 × 10⁻⁸). On the other hand, the two other possible positions of Rhizaria within the SAR grouping (Table 1D and 1E-ch.4) could not be significantly rejected (P = 0.112 and P = 0.079, respectively), so that neither a specific relationship between Rhizaria and alveolates nor an early divergence of Rhizaria can be excluded. In addition, we also tested the relationship between Rhizaria and excavates by evaluating all possible trees in which these two groups are monophyletic. None of these trees could be retained in the pool of plausible candidates (data not shown).

Table 1-ch.4. Likelihood AU tests of alternative tree topologies.

Topology | A | B | C | D | E
Tree | Fig. 1-ch.4 | Fig. S3-ch.4 | A(RS) | R(SA) | S(RA)
AU (a) | 1.0 | 4e-008 | 0.895 | 0.112 | 0.079
Δ ln L (b) | -369.2 | 369.2 | -27.4 | 69.4 | 77.5

A, B) Comparison between topology A (best tree, corresponding to Figure 1-ch.4) and the alternative topology B (the best tree when Rhizaria are forced not to be monophyletic with S and A, Figure S3-ch.4). C, D, E) Comparisons between topology C (best tree) and the alternative topologies D and E. Abbreviations: A = alveolates; S = stramenopiles; R = Rhizaria. The underlined number corresponds to the significant P value of the rejected topology. (a) Approximately unbiased test. (b) Log-likelihood difference.

4.5 Discussion

In this study we present the largest dataset currently available for eukaryote phylogeny, combining both an extensive taxon sampling and a large number of amino acid positions. Our analyses of this unique dataset provide strong evidence for the assemblage of Rhizaria, stramenopiles and alveolates. We therefore propose to label this monophyletic clade ‘SAR’. Although this grouping was only weakly suggested in our previous multigene analysis [Burki and Pawlowski 2006], we show here, using a much larger dataset, that it is in fact very robust. We confirm the existence of consistent affinities between assemblages that were thought to belong to different supergroups of eukaryotes, and thus not to share a close evolutionary history. The addition of about 20 relevant taxa of unicellular eukaryotes as well as more than 30 genes (for a total of 123 genes) seems to have stabilized the topology, which consistently displays a monophyletic SAR. Within this newly emerged assemblage, Rhizaria appear to be more closely related to stramenopiles than to alveolates, but topology comparisons failed to exclude the alternatives (i.e. R(SA) or S(RA)). In addition, we clearly reject the putative relationship between Rhizaria and excavates [Cavalier-Smith 1999; Burki and Pawlowski 2006], which had already been convincingly tested in [Rogers et al. 2007].


Interestingly, an association between Rhizaria and stramenopiles could already be observed in 18S rRNA trees representing a very large diversity of eukaryotes (see for example [Polet et al. 2004; Shalchian-Tabrizi et al. 2006a; Shalchian-Tabrizi et al. 2007]). More recently, the analysis of 16 protein sequences from 46 taxa also showed a robust clade consisting of Rhizaria, alveolates, and stramenopiles [Hackett et al. 2007]. However, that work differs significantly from ours in rejecting the placement of Rhizaria as sister to stramenopiles or as sister to all chromalveolates. Apart from the much larger size of our dataset, it is unclear why our data display more flexibility with respect to the position of Rhizaria within the SAR clade. More comprehensive taxon sampling for both Rhizaria and stramenopiles, particularly of early-diverging species (e.g., radiolarians), is likely to shed light on the internal order of divergence within SAR.

These new relationships suggest that the supergroup ‘Chromalveolata’, as originally defined [Cavalier-Smith 1999], does not correctly explain the evolutionary history of organisms bearing plastids derived from a red alga. In fact, our results confirm the lack of support for chromalveolates as a whole (i.e. including haptophytes and cryptophytes) reported in several studies [Parfrey et al. 2006]. The phylogenetic position of the monophyletic group haptophytes + cryptophytes within the eukaryotic tree is uncertain [Harper et al. 2005]. Overall, chromalveolates have been strongly supported by phylogenies of plastid genes and by unique gene replacements in these taxa [Fast et al. 2001; Harper and Keeling 2003; Patron et al. 2004], but the monophyly of all their members has never been robustly recovered with nuclear loci, even using more than 18,000 amino acids [Patron et al. 2007]. Altogether, the unresolved nodes between the chromalveolate lineages have prevented clear conclusions regarding this model of evolution [Li et al. 2006; Parfrey et al. 2006].

The emergence of SAR may further complicate the picture of secondary endosymbioses and calls into question the most parsimonious explanation for the evolution of chlorophyll c-containing plastids (see also [Bachvaroff et al. 2005; Burki and Pawlowski 2006; Shalchian-Tabrizi et al. 2006b; Hackett et al. 2007]). At this stage at least two scenarios are conceivable, but neither is currently favored by the competing topologies, given the uncertain position of the haptophyte + cryptophyte clade. First, a single engulfment of a red alga might have occurred at a very early stage of chromalveolate evolution, and the resulting plastid was secondarily lost in certain lineages, such as ciliates and Rhizaria. Second, stramenopiles (or alveolates, or even haptophytes + cryptophytes, depending on their true position within the tree) may have acquired their secondary plastid in an independent endosymbiotic event involving a red alga. If this latter scenario is correct, minimizing the number of endosymbiotic events, as proposed by the chromalveolate hypothesis, might not correspond to the true history. So far, as many as 11 primary, secondary, and tertiary symbiotic events have been identified (see [Bodyl 2005]). Notably, two independent secondary endosymbioses involving green algae have been recognized in members of the excavates and Rhizaria: Euglenozoa and chlorarachniophytes, respectively [Rogers et al. 2007]. Hence, invoking additional secondary endosymbioses might explain the phylogenetic relationships within eukaryotes better than the chromalveolate hypothesis does.

The new SAR supergroup implies that a major part of protist diversity shares a common ancestor. Indeed, the chromalveolate members alone already account for about half of the recognized species of protists and algae [Cavalier-Smith 2004]. With the addition of rhizarians, a huge variety of organisms with very different ecologies and morphologies are now united within a single monophyletic clade. Finding a synapomorphy that would support the unification of these groups will be the next major challenge in establishing the phylogeny of eukaryotes.

4.6 Materials & methods

4.6.1 Sampling, culture, and construction of cDNA libraries

The miliolids of the genus Quinqueloculina were collected at the locality called Le Boucanet, near La Grande Motte (Camargue, France). They were sorted, picked, and cleaned by hand under a dissecting microscope. The culture of G. cometa was obtained from the culture collection of IBIW RAS (Russia) and maintained as described in [Nikolaev et al. 2003]. Cells were collected by low-speed centrifugation, resuspended in five volumes of TriReagent (Invitrogen, Carlsbad, Calif.), and disrupted using manual pestles and matching microtubes. Total RNA and cDNA were prepared as in [Burki et al. 2006]. EST sequencing of the Quinqueloculina sp. library was performed with the ABI-PRISM Big Dye Terminator Cycle Sequencing Kit and analysed with an ABI-3100 DNA Sequencer (Perkin-Elmer Inc., Wellesley, Mass.), all according to the manufacturer's instructions. The G. cometa library was sequenced by Agencourt Bioscience Corporation (Beverly, Mass.).

4.6.2 Construction of the alignments

We performed TblastN searches against GenBank using as queries a rhizarian dataset made of all translated sequences (translations done with transeq, available at the University of Oslo Bioportal; http://www.bioportal.uio.no) for R. filosa, Quinqueloculina sp., G. cometa, and B. natans. We retrieved and translated all sequences with an e-value cutoff of 10⁻⁴⁰, accounting for 46 new genes out of a total of 126. The remaining 80 genes corresponded to rhizarian proteins putatively homologous to sequences previously used to infer large-scale phylogenies [Rodriguez-Ezpeleta et al. 2005] and available at http://megasun.bch.umontreal.ca/Software/scafos/scafos_download.html. As a rough orthology check, we also added to these alignments the human sequence with the lowest e-value in our TblastN output, to make sure that no closer homologs were known. These 126 genes were used to build a densely sampled dataset by adding all available relevant species. For this purpose, we considered all species in TBestDB, as well as all other bikont taxa for which sufficient sequence data were available, built a local database from them, and ran TblastN searches against it with our rhizarian dataset (e-value threshold 10⁻⁴⁰).
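The e-value filtering and the "lowest e-value human hit" check described above can be illustrated with a short sketch. It assumes BLAST+ tabular output (`-outfmt 6`, 12 tab-separated columns, e-value in column 11); the thesis does not state which output format was used, and the function names and example identifiers are hypothetical.

```python
"""Sketch: e-value filtering of TblastN hits (assumes BLAST+ tabular output, -outfmt 6)."""
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, e-value)

EVALUE_CUTOFF = 1e-40  # cutoff quoted in the text for the rhizarian searches


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Read 12-column tab-separated BLAST lines; column 11 holds the e-value."""
    hits: List[Hit] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        hits.append((cols[0], cols[1], float(cols[10])))
    return hits


def filter_by_evalue(hits: Iterable[Hit], cutoff: float = EVALUE_CUTOFF) -> List[Hit]:
    """Keep hits whose e-value does not exceed the cutoff."""
    return [h for h in hits if h[2] <= cutoff]


def best_hit_per_query(hits: Iterable[Hit]) -> Dict[str, Tuple[str, float]]:
    """Lowest-e-value subject for each query (e.g. the human copy used as an orthology check)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


if __name__ == "__main__":
    # Made-up example lines; all identifiers are hypothetical.
    example = [
        "rpl3_Rfil\tHs_RPL3\t80.1\t300\t50\t2\t1\t300\t1\t300\t1e-120\t400",
        "rpl3_Rfil\tHs_RPL3L\t70.0\t300\t80\t3\t1\t300\t1\t300\t3e-90\t300",
        "orf9_Rfil\tHs_X\t25.0\t100\t70\t5\t1\t100\t1\t100\t2e-05\t40",
    ]
    kept = filter_by_evalue(parse_tabular(example))
    print(best_hit_per_query(kept))
```

Only hits at or below the cutoff survive, and the best remaining hit per query is the one whose identity would be inspected as the closest known homolog.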

To decide on the final set of genes used in this study, we carefully tested the orthology of each of the 126 selected genes by carrying out maximum likelihood (ML) analyses, including bootstrap support, with the program TREEFINDER (JTT, 4 gamma categories, and 100 bootstrap replicates) [Jobb et al. 2004]. For three genes, orthology could not be assessed with enough confidence, and these genes were removed. More generally, taxa displaying suspicious phylogenetic positions were removed from the single-gene datasets. Once this pre-screen was complete, our final taxon sampling comprised 49 species and 123 genes (Supplementary Table S1-ch.4). We concatenated all single-gene alignments into a supermatrix using Scafos [Roure et al. 2007]. Because of the limited data for certain groups, and to maximize the number of genes per taxonomic assemblage, some lineages were represented by different closely related species, always belonging to the same genus (for details see Supplementary Tables S2-ch.4 and S3-ch.4).
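The concatenation step can be pictured with the following minimal sketch. It is not Scafos: it simply joins gene alignments in a fixed order, pads taxa absent from a gene with an assumed missing-data symbol (`?`), and reports the percentage of positions carrying an amino acid, the quantity tabulated in Supplementary Table S2-ch.4. Treating both gaps and `?` as absent is an assumption.

```python
"""Sketch: concatenating single-gene alignments into a supermatrix (not Scafos)."""
from typing import Dict, List

Alignment = Dict[str, str]  # taxon -> aligned amino-acid sequence


def concatenate(genes: List[Alignment], taxa: List[str], missing: str = "?") -> Dict[str, str]:
    """Join gene alignments in the given order, padding absent taxa with missing data."""
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    for aln in genes:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each gene alignment must be non-empty with equal-length rows")
        (length,) = lengths
        for t in taxa:
            parts[t].append(aln.get(t, missing * length))
    return {t: "".join(chunks) for t, chunks in parts.items()}


def percent_characters(row: str, missing_symbols: str = "?-") -> float:
    """Percentage of supermatrix positions in a row that carry an amino acid."""
    present = sum(1 for c in row if c not in missing_symbols)
    return 100.0 * present / len(row)


if __name__ == "__main__":
    gene1 = {"Homo sapiens": "MKVL", "Reticulomyxa filosa": "MK-L"}
    gene2 = {"Homo sapiens": "AG"}  # R. filosa lacks this (hypothetical) gene
    matrix = concatenate([gene1, gene2], ["Homo sapiens", "Reticulomyxa filosa"])
    for taxon, row in matrix.items():
        print(f"{taxon}: {row} ({percent_characters(row):.2f}%)")
```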

4.6.3 Phylogenomic analyses

The concatenated alignment was first analyzed in the maximum likelihood (ML) framework implemented in TREEFINDER, with the global tree-search procedure (10 starting trees) [Jobb et al. 2004]. To double-check our topologies, we also ran RAxML (RAxML-VI-HPC-2.2.3) [Stamatakis 2006], using randomized maximum parsimony (MP) starting trees in multiple inferences and the rapid hill-climbing algorithm. Based on the Akaike Information Criterion (AIC) [Posada and Buckley 2004] computed with ProtTest 1.3 [Abascal et al. 2005], the RtREV + G + F model, which allows for between-site rate variation, was chosen (with 6 gamma categories). The WAG model was also tested and gave the same topologies. To estimate the robustness of the phylogenetic inference, we used the bootstrap method [Felsenstein 1985] with 100 pseudoreplicates in all analyses.
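The model choice rests on the AIC, defined as AIC = 2k − 2 ln L, where k is the number of free parameters and the lowest value is preferred. The sketch below ranks candidate models by ΔAIC; the log-likelihoods and parameter counts are invented for illustration (they are not ProtTest output), and k counts only model-specific parameters, on the assumption that all models are evaluated on the same tree so that branch lengths add the same constant to every score.

```python
"""Sketch: ranking substitution models by AIC (illustrative numbers only)."""
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion; lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def rank_models(candidates: Dict[str, Tuple[float, int]]) -> Dict[str, float]:
    """Return delta-AIC for each model relative to the best one, best first."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    best = min(scores.values())
    return {name: s - best for name, s in sorted(scores.items(), key=lambda kv: kv[1])}


if __name__ == "__main__":
    # Hypothetical values: +G adds one shape parameter, +F adds 19 free frequencies.
    models = {
        "WAG+G": (-1050.0, 1),
        "WAG+G+F": (-1030.0, 20),
        "RtREV+G+F": (-1020.0, 20),
    }
    for name, delta in rank_models(models).items():
        print(f"{name}: delta AIC = {delta:.1f}")
```

With these made-up numbers the extra 19 frequency parameters are "paid for" by the likelihood gain, which is the trade-off the criterion formalizes.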

Bayesian analysis using the WAG + G + F model (4 gamma categories) was performed with the parallel version of MrBayes 3.1.2 [Ronquist and Huelsenbeck 2003]. The inference, starting from a random tree and using four Metropolis-coupled Markov chains (MCMCMC), consisted of 1,000,000 generations with sampling every 100 generations. The average standard deviation of split frequencies was used to assess the convergence of the two runs. Bayesian posterior probabilities were calculated from the majority-rule consensus of the trees sampled after the initial burn-in period, which was determined by checking the convergence of likelihood values across MCMCMC generations (about 50,000 generations, depending on the analysis).
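The convergence diagnostic and the posterior probabilities mentioned above can be sketched as follows. Trees are represented as sets of splits (each split a frozenset of the taxa on one side, assumed already canonicalized), and the 0.10 minimum split frequency reflects what we understand to be the MrBayes default; treat it as an assumption. At least two runs are needed for the standard deviation.

```python
"""Sketch: average standard deviation of split frequencies (ASDSF) and majority-rule splits."""
from collections import Counter
from statistics import stdev
from typing import Dict, FrozenSet, List, Sequence

Split = FrozenSet[str]
Tree = Sequence[Split]


def split_frequencies(trees: Sequence[Tree]) -> Dict[Split, float]:
    """Fraction of sampled trees containing each split (its posterior probability)."""
    counts = Counter(s for tree in trees for s in set(tree))
    return {s: c / len(trees) for s, c in counts.items()}


def asdsf(runs: Sequence[Sequence[Tree]], burnin: int, min_freq: float = 0.10) -> float:
    """Discard the first `burnin` samples of each run, then average the per-split SDs."""
    freqs: List[Dict[Split, float]] = [split_frequencies(run[burnin:]) for run in runs]
    all_splits = set().union(*freqs)  # iterating a dict yields its keys (the splits)
    sds = [
        stdev([f.get(s, 0.0) for f in freqs])
        for s in all_splits
        if max(f.get(s, 0.0) for f in freqs) >= min_freq
    ]
    return sum(sds) / len(sds) if sds else 0.0


def majority_rule_splits(trees: Sequence[Tree]) -> Dict[Split, float]:
    """Splits present in more than half of the post-burn-in trees, with their frequencies."""
    return {s: p for s, p in split_frequencies(trees).items() if p > 0.5}
```

With sampling every 100 generations, a burn-in of 50,000 generations corresponds to `burnin=500` samples per run; values of the ASDSF approaching zero indicate that the two runs sample similar tree distributions.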

The evolutionary rates of the selected species were compared with the relative-rate test as implemented in RRTree [Robinson-Rechavi and Huchon 2000], through pairwise comparisons between ingroups belonging to SAR, haptophytes + cryptophytes, excavates, or plants, with the unikonts taken as the outgroup.
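The logic of a relative-rate test is that two ingroups descending from a common ancestor should be equally distant from an outgroup if they evolved at the same rate. The sketch below is a simplified stand-in, not the RRTree statistic (which uses corrected distances, averages over lineages, and an analytical variance): it uses uncorrected p-distances for single sequences and estimates the standard error of the distance difference by bootstrapping alignment columns.

```python
"""Sketch: a simplified relative-rate comparison (not the RRTree implementation)."""
import random
from statistics import stdev
from typing import List, Tuple

MISSING = "-?"


def p_distance(x: str, y: str) -> float:
    """Proportion of differing residues over columns where neither sequence is missing."""
    pairs = [(a, b) for a, b in zip(x, y) if a not in MISSING and b not in MISSING]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def relative_rate(a: str, b: str, outgroup: str,
                  n_boot: int = 200, seed: int = 1) -> Tuple[float, float]:
    """Return (d(A,O) - d(B,O), bootstrap standard error of that difference)."""
    if not (len(a) == len(b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    diff = p_distance(a, outgroup) - p_distance(b, outgroup)
    rng = random.Random(seed)
    n = len(outgroup)
    replicates: List[float] = []
    for _ in range(n_boot):
        cols = [rng.randrange(n) for _ in range(n)]  # resample columns with replacement
        ra, rb, ro = ("".join(s[i] for i in cols) for s in (a, b, outgroup))
        replicates.append(p_distance(ra, ro) - p_distance(rb, ro))
    return diff, stdev(replicates)
```

A positive difference suggests that lineage A has accumulated more change than lineage B since their divergence; dividing the difference by its standard error gives a rough z-score.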

4.6.4 Tree topology tests

To better assess the phylogenetic position of Rhizaria, we conducted topology comparisons using the approximately unbiased (AU) test [Shimodaira 2002]. For each tested tree, site likelihoods were calculated using CODEML, and the AU test was performed using CONSEL [Shimodaira and Hasegawa 2001] with default scaling and replicate values. To test the monophyly of the new assemblage SAR, we first compared our tree (Figure 1-ch.4) to the best possible tree in which Rhizaria were forced to fall outside SAR, using topological constraints corresponding to a trichotomy of unikonts, stramenopiles + alveolates, and the rest of the groups represented as a multifurcation (Supplementary Figure S3-ch.4). Second, we evaluated the placement of Rhizaria within the SAR clade by testing the three possible branching patterns among Rhizaria, stramenopiles, and alveolates.
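All such tests start from per-site log-likelihoods of the competing trees. The AU test itself relies on a multiscale bootstrap and is best left to CONSEL; as a swapped-in, simpler illustration of the same input data, the sketch below implements the Kishino-Hasegawa (KH) test with RELL resampling of site differences. Note that the KH test is only valid for trees chosen a priori; comparing the ML tree with alternatives biases it, which is one reason the AU test was used here. Parsing of CODEML output is not shown; the site log-likelihoods are assumed to be available as lists.

```python
"""Sketch: Kishino-Hasegawa test on per-site log-likelihoods (a simpler stand-in for AU)."""
import random
from typing import Sequence, Tuple


def kh_test(site_lnl_1: Sequence[float], site_lnl_2: Sequence[float],
            n_boot: int = 1000, seed: int = 1) -> Tuple[float, float]:
    """Return (lnL1 - lnL2, two-sided p-value) from a centred RELL bootstrap."""
    if len(site_lnl_1) != len(site_lnl_2):
        raise ValueError("site likelihood vectors must have the same length")
    diffs = [x - y for x, y in zip(site_lnl_1, site_lnl_2)]
    n = len(diffs)
    delta = sum(diffs)
    mean = delta / n
    centred = [d - mean for d in diffs]  # impose the null hypothesis E[delta] = 0
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_boot):
        resampled = sum(centred[rng.randrange(n)] for _ in range(n))
        if abs(resampled) >= abs(delta):
            extreme += 1
    return delta, extreme / n_boot
```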


4.7 Supplementary material

Supplementary Figure S1-ch.4. Best RAxML tree of eukaryotes. Numbers at nodes represent bootstrap support values (100 bootstrap replicates); black dots indicate 100% support. Nodes with support below 65% were collapsed.


Supplementary Figure S2-ch.4. MrBayes tree. Numbers at nodes represent Bayesian posterior probabilities.


Supplementary Figure S3-ch.4. Best TREEFINDER tree in which Rhizaria were forced not to belong to SAR.


Supplementary Table S1-ch.4. Abbreviated and complete protein names

Abbreviated name  Protein name
actin             Actin
14-3-3            14-3-3 protein
acaa1             Acetyl-Coenzyme A acyltransferase 1
akr1              Aldo-keto reductase family 1
aldhy             Antiquitin
ant1              Adenine nucleotide translocator
arf3              ADP-ribosylation factor 3
arpc1             Adaptor-related protein complex 1
atp6              ATPase, H+ transporting, lysosomal
calm3             Calmodulin
calr              Calreticulin
capz              Capping protein
cct6A             Chaperonin containing TCP1
cct-A             T complex protein 1 alpha subunit
cct-B             T complex protein 1 beta subunit
cct-D             T complex protein 1 delta subunit
cct-E             T complex protein 1 epsilon subunit
cct-N             T complex protein 1 eta subunit
cct-T             T complex protein 1 theta subunit
ctsl1             Cathepsin L1 preproprotein
drg1              Developmentally regulated GTP binding protein
ef2               Elongation factor EF2
fh                Fumarate hydratase precursor
fibri             Fibrillarin
gdi2              GDP dissociation inhibitor 2
gnb               Guanine nucleotide-binding protein, beta-3 subunit
gnb2l             Guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1
gnbpa             Guanine nucleotide binding protein, alpha activating polypeptide O
grc5              60S ribosomal protein L10 QM protein
h3                H3 histone
h4                Histone H4
hla-B             HLA-B associated transcript 1
hmt1              HMT1 hnRNP methyltransferase-like 2
hsp70-C           Heat shock 70kDa protein
hsp70-mt          Heat shock 70kDa protein, mitochondrial form
hsp90             Heat shock 90kDa protein 1
if2b              Eukaryotic translation initiation factor 2b
if2g              Eukaryotic translation initiation factor 2g
if6               Eukaryotic translation initiation factor 6
ino1              D-myo-inositol-3-phosphate synthase
l12e-A            40S ribosomal Protein S12
l12e-D            60S ribosomal Protein L7a
mcm-A             minichromosome family maintenance protein 5
metap2            Methionyl aminopeptidase 2
metk              S-adenosyl-methionine synthetase
ndf1              NADH dehydrogenase (ubiquinone) flavoprotein 1
nop               NOP5/NOP58 protein
nsf1-G            Vacuolar protein sorting factor 4b
nsf1-J            26S proteasome AAA-ATPase regulatory subunit 6
nsf1-K            26S proteasome AAA-ATPase regulatory subunit 6a
nsf1-l            proteasome 26S ATPase subunit 2
nsf1-L            26S proteasome AAA-ATPase regulatory subunit 6b
nsf1-M            26S proteasome AAA-ATPase regulatory subunit 4
nsf2-A            Transitional endoplasmic reticulum ATPase TER ATPase
nsf2-F            Vesicular fusion protein nsf2
oplah             5-oxoprolinase
orf2              putative 28 kDa protein
osgep             O-sialoglycoprotein endopeptidase
pace2-A           XPA binding protein
phb               Prohibitin 2
pmca2             Ca2+-ATPase
pp2A-b            Protein phosphatase 2
psma-B            20S proteasome alpha 1a chain
psma-C            20S proteasome alpha 1b chain
psma-E            20S proteasome alpha 1c chain
psma-F            20S proteasome alpha 3 chain
psma-I            20S proteasome alpha 1e chain
psmb-K            20S proteasome beta 7 chain
psmb-L            20S proteasome beta 6 chain
psmb-M            20S proteasome beta 4 chain
psmc6             Proteasome 26S ATPase subunit 6c
psmd              26S proteasome-associated pad1 homolog
pyk               Pyruvate kinase
rac               Small GTP binding protein Rac1
rad51-A           DNA repair protein RAD51
ran               Ras-related nuclear protein
rap1              RAS-related protein RAP1B
rf1               Eukaryotic peptide chain release factor subunit 1
rpl1              60S ribosomal Protein 1
rpl11b            60S ribosomal Protein 11b
rpl12b            60S ribosomal Protein 12b
rpl15a            60S ribosomal Protein 13
rpl16b            60S ribosomal Protein 16b
rpl17             60S ribosomal Protein 17
rpl18             60S ribosomal Protein 18
rpl19a            60S ribosomal Protein 19a
rpl2              60S ribosomal Protein 2
rpl20             60S ribosomal Protein 20
rpl27             60S ribosomal Protein 21
rpl3              60S ribosomal Protein 3
rpl30             60S ribosomal Protein 30
rpl32             60S ribosomal Protein 32
rpl4B             60S ribosomal Protein 4b
rpl5              60S ribosomal Protein 5
rpl7-A            60S ribosomal Protein 7a
rpl9              60S ribosomal Protein 9
rpp0              60S acidic ribosomal protein P0 L10E
rps1              40S ribosomal Protein 1
rps10             40S ribosomal Protein 10
rps13a            40S ribosomal Protein 13a
rps14             Ribosomal protein S14
rps15             40S ribosomal Protein 15
rps16             40S ribosomal Protein 16
rps18             40S ribosomal Protein 18
rps2              40S ribosomal Protein 2
rps22a            40S ribosomal Protein 22a
rps23             40S ribosomal Protein 23
rps3              40S ribosomal Protein 3
rps4              40S ribosomal Protein 4
rps8              40S ribosomal Protein 8
rps9              Ribosomal protein S9
sap40             40S ribosomal protein SA 40kDa laminin receptor 1
stk38             Serine/threonine kinase 38
suca              Succinyl-CoA ligase alpha chain mitochondrial precursor?
tif               Translation initiation factor
trs               TARS protein
tubulin-A         Alpha-tubulin
tubulin-B         Beta-tubulin
tubulin-G         Gamma-tubulin
ubc               Ubiquitin-conjugating enzyme
vata              Vacuolar ATP synthase catalytic subunit A
vatb              Vacuolar ATP synthase catalytic subunit B
wd                WDR45-like


Supplementary Table S2-ch.4. OTU (Operational Taxonomic Unit) names, number of characters, and percentage of characters included in the final alignment. Characters: number of characters included in the concatenated sequences; % chars: percentage of characters included in the concatenated sequences; Species in chimera: species combined to build chimeric OTUs.

Taxon                        Characters  % chars  Species in chimera
Acanthamoeba castellanii     15112       50.52
Alexandrium                  7004        23.41    Alexandrium fundyense (2%), Alexandrium tamarense (43%)
Arabidopsis thaliana         29383       98.24
Bigelowiella natans          12789       42.76
Blastocystis hominis         11656       38.97
Chlamydomonas                19955       66.72    Chlamydomonas incerta (36%), Chlamydomonas reinhardtii (69%), Chlamydomonas sp. (1%)
Cryptococcus neoformans      26795       89.59
Cryptosporidium              24814       82.96    Cryptosporidium hominis (16%), Cryptosporidium parvum (77%)
Cyanidioschyzon merolae      24066       80.46
Cyanophora paradoxa          13424       44.88
Dictyostelium discoideum     29206       97.65
Drosophila                   29131       97.4     Drosophila melanogaster (92%), Drosophila pseudoobscura (4%), Drosophila yakuba (1%)
Eimeria tenella              8077        27
Emiliania huxleyi            9523        31.84
Euglena gracilis             14840       49.61
Galdieria sulphuraria        9140        30.56
Glaucocystis nostochinearum  10887       36.4
Guillardia theta             10825       36.19
Gymnophrys cometa            4152        13.88
Hartmannella vermiformis     11981       40
Histiona aroides             6350        21.23
Homo sapiens                 29811       99.67
Isochrysis galbana           12531       41.89
Jakoba                       13908       46.5     Jakoba bahamensis (39%), Jakoba libera (41%)
Karlodinium micrum           9119        30.49
Leishmania                   23497       78.56    Leishmania amazonensis (2%), Leishmania tarentolae (2%), Leishmania mexicana (2%), Leishmania major (72%), Leishmania infantum (6%), Leishmania enriettii (2%), Leishmania donovani (6%), Leishmania chagasi (2%), Leishmania braziliensis (15%)
Malawimonas                  13391       44.77    Malawimonas californiana (43%), Malawimonas jakobiformis (51%)
Mastigamoeba balamuthi       15012       50.19
Mus musculus                 29470       98.53
Neurospora crassa            27930       93.38
Oryza sativa                 29192       97.6
Oxyrrhis marina              10169       34
Paramecium                   24742       82.72    Paramecium caudatum (2%), Paramecium tetraurelia (79%)
Pavlova lutheri              10738       35.9
Phaeodactylum tricornutum    21096       70.53
Physcomitrella patens        18075       60.43
Phytophthora                 25938       86.72    Phytophthora infestans (51%), Phytophthora palmivora (1%), Phytophthora parasitica (5%), Phytophthora ramorum (7%), Phytophthora sojae (72%)
Plasmodium                   25454       85.1     Plasmodium berghei (4%), Plasmodium yoelii (71%), Plasmodium falciparum (81%), Plasmodium chabaudi (2%)
Porphyra                     14825       49.56    Porphyra purpurea (1%), Porphyra yezoensis (62%)
Quinqueloculina sp.          6370        21.29
Reclinomonas americana       14886       49.77
Reticulomyxa filosa          9411        31.47
Schizosaccharomyces pombe    27021       90.34
Tetrahymena                  26980       90.2     Tetrahymena pyriformis (8%), Tetrahymena thermophila (92%)
Thalassiosira pseudonana     27578       92.2
Theileria                    21660       72.42    Theileria annulata (62%), Theileria parva (59%)
Toxoplasma gondii            19843       66.35
Trypanosoma                  25063       83.8     Trypanosoma brucei (76%), Trypanosoma cruzi (66%)
Ustilago maydis              26469       88.5

Supplementary Table S3-ch.4. Percentage of missing data per species and per gene. Not shown in this manuscript because of its very large size, but available at: http://www.plosone.org/article/fetchFirstRepresentation.action?uri=info:doi/10.1371/journal.pone.0000790.s006 or upon request.


Chapter 5: Phylogenomics reveals a new ‘Megagroup’ including most photosynthetic eukaryotes 

   F. Burki, Shalchian-Tabrizi K & Pawlowski J

Published in: Biology Letters, 4: 366-369, 2008

 85 Chapter 5: A new megagroup of eukaryotes

5.1 Project description

Following the same general approach as described in chapter 4, we increased the taxon sampling and added a few genes to our concatenated alignment to further improve the resolution of the tree. To our great satisfaction, this work led to better-resolved deep nodes within the tree of eukaryotes.


5.2 Abstract

Advances in the molecular phylogeny of eukaryotes have suggested a tree composed of a small number of supergroups. Phylogenomics recently established the relationships between some of these large assemblages, yet the deepest nodes are still unresolved. Here, we investigate early evolution among the major eukaryotic supergroups using the broadest multigene dataset to date (65 species, 135 genes). Our analyses provide strong support for the clustering of plants, chromalveolates, rhizarians, haptophytes and cryptomonads, thus linking nearly all photosynthetic lineages and raising the question of a possible unique origin of plastids. At its deepest level, the tree of eukaryotes now receives strong support for two monophyletic megagroups comprising most of the eukaryotic diversity.

5.3 Introduction

Resolving the global tree of eukaryotes is one of the most important goals in evolutionary biology. Molecular phylogenies, morphology, and biochemical characteristics have allowed the division of the majority of eukaryotic diversity into five or six putative supergroups (reviewed in [Keeling et al. 2005; Lane and Archibald 2008]); these comprise the opisthokonts and Amoebozoa (united as ‘unikonts’, [Cavalier-Smith 2002]), Plantae (or Archaeplastida), Excavata, Chromalveolata, and Rhizaria (often considered members of the so-called ‘bikonts’, [Stechmann and Cavalier-Smith 2003a]). Recent phylogenomic reconstructions based on large sequence datasets have been used to infer the relationships between some of these large assemblages; notably, Rhizaria have been shown to share a common origin with members of the chromalveolates [Burki et al. 2007; Hackett et al. 2007; Rodriguez-Ezpeleta et al. 2007a]. However, the order of divergence among the deepest nodes remains uncertain, particularly the relationships between plants, chromalveolates, and other photosynthetic lineages (haptophytes, cryptomonads). To investigate early evolution among eukaryotic supergroups, we have assembled the broadest dataset to date (65 species, 135 genes representing 31,921 amino acids) and show that the eukaryotes can be divided into two highly supported monophyletic megagroups and a few less diversified lineages related to the excavates.


Figure 1-ch.5. Bayesian unrooted phylogeny of eukaryotes, with a basal trichotomy representing uncertainties in the relationships between the three groups. The tree was obtained from the consensus of two independent Markov chains run under the CAT model implemented in Phylobayes. The species colour code corresponds to the type of plastid pigments, as follows: purple, chlorophyll a; green, chlorophyll a+b; red, chlorophyll a+c. The stars represent primary, secondary, or tertiary endosymbioses. Underlined numbers at nodes represent posterior probabilities (PP) of the analysis performed with the constant sites removed / the analysis performed with all sites; other numbers represent the results of the ML bootstrap analysis (BS). Node 1, below the line: ML analysis of the full-length alignment // ML analysis with category 7 removed / ML analysis with categories 6 + 7 removed. Black dots correspond to 1.0 PP and 100% BS; black squares correspond to 1.0 PP and the specified BS values. The scale bar represents the estimated number of amino acid substitutions per site.


5.4 Results

We first performed a Bayesian analysis on a species-rich dataset, using the powerful CAT model, which was developed to overcome systematic errors due to homoplasy [Lartillot and Philippe 2004; Lartillot et al. 2007] (Figure 1-ch.5). The tree obtained agrees with previously published studies; it strongly supports the monophyly of unikonts (Amoebozoa, fungi, and animals), excavates, plants, stramenopiles + alveolates + Rhizaria (SAR), and haptophytes + cryptomonads (HC). This latter group appears as sister to plants, with a Bayesian posterior probability (PP) of 1.0 when the constant sites were removed and 0.92 PP with the full-length alignment. Remarkably, the plants + HC clade forms a strongly supported monophyletic megagroup with the SAR assemblage (1.0 PP, node 1), revealing an ancient split in eukaryote evolution and almost entirely resolving the relationships among most ‘bikont’ supergroups.

This new megagroup received relatively low support (73% bootstrap support, BS) in the maximum likelihood (ML) analysis of the complete dataset (Figure 1-ch.5). However, because we are investigating relationships deriving from very ancient splits in the eukaryotic tree, it is likely that multiple substitutions occurred at several sites in our alignment, eroding the true phylogenetic signal and making standard site-homogeneous models based on empirical amino-acid replacement matrices (such as WAG) less accurate. To test this, we investigated the effect of excluding the fastest-evolving sites, which are the most likely to be saturated and thus to cause model violations [Rodriguez-Ezpeleta et al. 2007b]. Not surprisingly, removing the noisiest positions led to a drastic increase in the statistical support for the new megagroup (94% and 97% BS when categories 7 and 6 + 7 were removed, respectively; Figure 1-ch.5).
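The site-exclusion experiment can be illustrated as follows: per-site rates (estimated in the thesis with PAML on the 20 ML topologies) are averaged, sites are binned into rate categories, and the columns of the fastest categories are dropped before re-running the bootstrap. The text does not specify how the categories were defined, so the equal-frequency binning below (category 0 = slowest, n−1 = fastest) is a hypothetical scheme chosen only to make the idea concrete.

```python
"""Sketch: dropping the fastest-evolving rate categories (hypothetical binning scheme)."""
from typing import Dict, List, Sequence, Set


def mean_site_rates(rates_per_topology: Sequence[Sequence[float]]) -> List[float]:
    """Average each site's rate over the topologies it was estimated on."""
    return [sum(col) / len(col) for col in zip(*rates_per_topology)]


def rate_categories(rates: Sequence[float], n_categories: int = 8) -> List[int]:
    """Assign each site to an equal-frequency rate category by rank (0 = slowest)."""
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    cats = [0] * len(rates)
    for rank, site in enumerate(order):
        cats[site] = rank * n_categories // len(rates)
    return cats


def drop_categories(alignment: Dict[str, str], cats: Sequence[int],
                    drop: Set[int]) -> Dict[str, str]:
    """Return a shorter alignment without the columns whose category is in `drop`."""
    keep = [i for i, c in enumerate(cats) if c not in drop]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}
```

Under this scheme, with eight categories, `drop={7}` and `drop={6, 7}` would correspond to the two reduced alignments analyzed above.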

5.5 Discussion

At its deepest level, the tree of eukaryotes presented here displays only three stems: the two highly supported megagroups, which enclose the vast majority of eukaryotic species, and the excavates. If the monophyly of excavates is further confirmed and strong support is recovered for their possible sister position to the new megagroup, we may well be able to provide independent evidence (based on phylogenetic reconstructions) for the concept of two primary clades of eukaryotes, unikonts and bikonts [Stechmann and Cavalier-Smith 2003b; Richards and Cavalier-Smith 2005]. This model, however, would need to be modified, as the widely used dihydrofolate reductase-thymidylate synthase (DHFR-TS) gene fusion is questionable for several reasons (see discussion in [Kim et al. 2006]). Of course, this does not rule out the possibility that some protists, such as Telonemia or the heliozoans, which have not yet been placed with confidence [Shalchian-Tabrizi et al. 2006a; Sakaguchi et al. 2007], might represent additional independent lineages. But generally, we believe that most eukaryotes fall into one of these megagroups.

As we get closer to a fully resolved phylogeny of the eukaryotes, an obvious question of crucial importance is the position of the root. We chose, however, to show an unrooted tree, as the absence of compelling information leaves the rooting of the eukaryotic tree an open question. Over the past few years, independent data have placed the root either between unikonts and bikonts [Stechmann and Cavalier-Smith 2003b] or within excavates, e.g., basal to jakobids [Rodriguez-Ezpeleta et al. 2007a] or on the branch leading to diplomonads/parabasalids [Arisue et al. 2005]. In the absence of evidence for a root within the plants + HC + SAR megagroup, the plausible rooting scenarios together with our tree consistently suggest that this assemblage is holophyletic.

Our results provide convincing support for the clustering of almost all photosynthetic groups in a single clade (with the notable exception of the second-hand green plastids of Euglenozoa, which belong to the excavates) and are consistent with a single primary endosymbiotic event, as also suggested by gene-based models of the import machinery [McFadden and van Dooren 2004]. The best-supported scenario to date for the evolution of species with primary plastids is that a unique endosymbiosis involving a cyanobacterium took place in the last common ancestor of Plantae (see [Bhattacharya et al. 2007]). The trees presented here allow the possibility that the primary plastid was established even earlier, in one of the ancestors of the new megagroup, and was subsequently lost and independently replaced by plastids of secondary origin in several lineages (HC, Rhizaria, alveolates, and stramenopiles), corroborating the hypothesis of an early chloroplast acquisition in eukaryotes based on the phylogeny of the 6-phosphogluconate dehydrogenase gene [Andersson and Roger 2002] (see also [Nozaki 2005] for a more general discussion). We speculate that the high diversity of plastids observed within the new megagroup can be traced back to its last common ancestor and is the consequence of an increased capability of all its members to acquire and retain plastids or plastid-bearing cells.

5.6 Materials & methods

Our multigene dataset was assembled with a custom pipeline, as follows: a) construction of databases made of all existing sequences for species specifically selected for their broad taxonomic distribution and the availability of genomic sequences (downloaded from http://www.ncbi.nlm.nih.gov/ and http://amoebidia.bcm.umontreal.ca/pepdb/searches/welcome.php); b) BLAST searches against these databases, using as queries the single-gene sequences composing our previously described multiple alignments [Burki et al. 2007]; c) retrieval (with a stringent e-value cutoff of 10⁻⁵⁰) of new homologous copies and their addition to the existing single-gene alignments; d) automatic alignment using Mafft [Katoh et al. 2002], followed by manual inspection to extract unambiguously aligned positions; e) orthology testing, in particular for possible lateral or endosymbiotic gene transfer, of each selected gene by single-gene maximum likelihood (ML) reconstructions using Treefinder (WAG, 4 gamma categories; [Jobb et al. 2004]). The final concatenation of all single-gene alignments was done using Scafos [Roure et al. 2007]. Because of the limited data for certain groups, and to maximize the number of genes per taxonomic assemblage, some lineages were represented by different closely related species, always belonging to the same genus (supplementary material, this chapter). Potentially interesting species with complete genomes available, such as the excavates Giardia and Trichomonas or the red alga Cyanidioschyzon, were discarded from our taxon sampling because of their extreme rates of sequence evolution or their demonstrated tendency to lead to systematic errors in phylogenies [Rodriguez-Ezpeleta et al. 2007b].
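Representing a lineage by several congeneric species amounts to building a chimeric OTU gene by gene. The text does not state how the contributing species was chosen for each gene; the sketch below assumes a simple rule (take the congeneric sequence with the most non-missing residues, ties going to the species listed first), and all species names and sequences in the example are placeholders.

```python
"""Sketch: building a genus-level chimeric OTU (selection rule is an assumption)."""
from collections import Counter
from typing import Dict, Sequence, Tuple

MISSING = "-?"


def build_chimera(gene_seqs: Dict[str, Dict[str, str]],
                  species: Sequence[str]) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Return (gene -> chosen sequence, species -> number of genes it contributed)."""
    chimera: Dict[str, str] = {}
    contributions: Counter = Counter()
    for gene, seqs in gene_seqs.items():
        candidates = [sp for sp in species if sp in seqs]
        if not candidates:
            continue  # gene absent for the whole genus: stays missing in the supermatrix
        # max() keeps the first species among equally complete sequences
        best = max(candidates, key=lambda sp: sum(c not in MISSING for c in seqs[sp]))
        chimera[gene] = seqs[best]
        contributions[best] += 1
    return chimera, dict(contributions)


if __name__ == "__main__":
    genes = {
        "rpl3": {"D_mel": "MKV-", "D_yak": "MKVL"},
        "actin": {"D_mel": "AAAA"},
    }
    print(build_chimera(genes, ["D_mel", "D_yak"]))
```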

The concatenated alignment was analyzed in both Bayesian (BI) and maximum likelihood (ML) frameworks, with Phylobayes v.2.3 [Lartillot and Philippe 2004] and RAxML-VI-HPC v.2.2.3 [Stamatakis 2006], respectively. Phylobayes was run using the site-heterogeneous mixture CAT model and two independent Markov chains with a total length of 10,000 cycles, discarding the first 4000 points as burn-in and calculating the posterior consensus on the remaining 6000 trees. Convergence between the two chains was checked, and the chains always led to exactly the same tree, except for uncertainties in the order of divergence among the glaucophytes, the red algae, and HC. To reduce mixing problems of the chains, the constant sites were removed from the alignment in a subsequent analysis. In this case, convergence was much quicker, after only 5000 cycles (burn-in of 1000), and HC was unambiguously positioned as sister to the Plantae. RAxML was used in combination with the WAG amino acid replacement matrix and stationary amino acid frequencies estimated from the dataset. The best ML tree was determined with the PROTMIX implementation, in multiple inferences using 20 randomized maximum parsimony (MP) starting trees. Statistical support was evaluated with 100 bootstrap replicates. Two independent runs were performed on each replicate, using different starting trees (an MP tree and the best ML tree), to prevent the analysis from getting trapped in a local maximum. The tree with the best log-likelihood was selected for each replicate, and the 100 resulting trees were used to calculate the bootstrap proportions. To reduce the computational burden, the PROTMIX solution was chosen with 25 distinct rate categories. To minimize potential systematic errors associated with saturation and homoplasy, the fast-evolving sites were identified using PAML [Yang 1997], given the 20 topologies obtained in the ML analysis. Sites were classified according to their mean site-wise rates, and ML bootstrap values were computed from shorter concatenated alignments with the sites of categories 7 and 6 + 7 removed.
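The bootstrap procedure described above has three parts that can be sketched independently of the tree search itself: drawing pseudoreplicate alignments by resampling columns with replacement, keeping the higher-likelihood result of the two runs per replicate, and counting how often a clade (split) appears among the replicate trees. The RAxML searches are outside this sketch; trees are given as sets of splits, and all function names are illustrative.

```python
"""Sketch: nonparametric bootstrap pseudoreplicates and bootstrap proportions."""
import random
from typing import Dict, FrozenSet, Sequence, Set, Tuple

Split = FrozenSet[str]


def pseudoreplicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, keeping rows aligned."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


def best_of_runs(runs: Sequence[Tuple[float, Set[Split]]]) -> Set[Split]:
    """Keep the splits of the run with the highest log-likelihood."""
    return max(runs, key=lambda run: run[0])[1]


def bootstrap_proportion(replicate_trees: Sequence[Set[Split]], split: Split) -> float:
    """Percentage of replicate trees containing the given split."""
    return 100.0 * sum(split in tree for tree in replicate_trees) / len(replicate_trees)
```

With 100 replicates, a clade recovered in 97 of the selected replicate trees receives 97% BS, as reported for the megagroup after removing categories 6 + 7.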


5.7 Supplementary material

Supplementary Table S1-ch.5. OTU (Operational Taxonomic Unit) names and chimera constructions. Table listing the full species names used to build the concatenated alignment, as well as specifying the species used to construct chimeras. The new species in our alignment, compared to our previously published dataset [Burki et al. 2007], are marked in bold.

Taxon                        Species in chimera
Acanthamoeba castellanii
Alexandrium                  Alexandrium fundyense (2%), Alexandrium tamarense (41%)
Arabidopsis thaliana
Aureococcus anophagefferens
Bigelowiella natans
Blastocystis hominis
Brassica                     Brassica rapa (73%), Brassica napus (2%)
Cercomonas longicauda
Chlamydomonas                Chlamydomonas incerta (32%), Chlamydomonas reinhardtii (71%), Chlamydomonas sp. (1%)
Cryptococcus neoformans
Cryptosporidium              Cryptosporidium hominis (15%), Cryptosporidium parvum (79%)
Cyanophora paradoxa
Dictyostelium discoideum
Drosophila                   Drosophila melanogaster (93%), Drosophila pseudoobscura (4%), Drosophila yakuba (1%)
Eimeria tenella
Emiliania huxleyi
Euglena gracilis
Gymnophrys cometa
Galdieria sulphuraria
Gallus gallus
Glaucocystis nostochinearum
Gracilaria changii
Guillardia theta
Hartmannella vermiformis
Histiona aroides
Homo sapiens
Isochrysis galbana
Jakoba                       Jakoba bahamensis (38%), Jakoba libera (36%)
brevis
Karlodinium micrum
Leishmania                   Leishmania amazonensis (2%), Leishmania tarentolae (2%), Leishmania mexicana (2%), Leishmania major (72%), Leishmania infantum (6%), Leishmania enriettii (2%), Leishmania donovani (6%), Leishmania chagasi (2%), Leishmania braziliensis (15%)
Mastigamoeba balamuthi
Quinqueloculina sp.
Monosiga brevicollis
Mus musculus
Naegleria                    Naegleria gruberi (90%), Naegleria fowleri (1%)
Nematostella vectensis
Neurospora crassa
Oryza sativa
Ostreococcus lucimarinus
Oxyrrhis marina
Paramecium                   Paramecium caudatum (2%), Paramecium tetraurelia (79%)
Pavlova lutheri
Phaeodactylum tricornutum
Phycomyces blakesleeanus
Physcomitrella patens
Phytophthora                 Phytophthora infestans (51%), Phytophthora palmivora (1%), Phytophthora parasitica (5%), Phytophthora ramorum (7%), Phytophthora sojae (72%)
Pinus taeda
Plasmodium                   Plasmodium berghei (4%), Plasmodium yoelii (71%), Plasmodium falciparum (81%), Plasmodium chabaudi (2%)
Porphyra                     Porphyra purpurea (1%), Porphyra yezoensis (62%)
Reticulomyxa filosa
Reclinomonas americana
marylandensis
Schizosaccharomyces pombe
Sorghum bicolor
lipophora
Tetrahymena                  Tetrahymena pyriformis (8%), Tetrahymena thermophila (92%)
Thalassiosira pseudonana
Theileria                    Theileria annulata (62%), Theileria parva (59%)
Toxoplasma gondii
Trichoplax adhaerens
Trypanosoma                  Trypanosoma brucei (76%), Trypanosoma cruzi (66%)
Ustilago maydis
Volvox carteri

Supplementary Table S2-ch.5. Percentage of missing data per species and per gene. Detailed list of the 135 genes used, specifying the amount of missing data per species. Not reproduced in this manuscript because of its very large size, but available at: http://dx.doi.org/10.1098/rsbl.2008.0224 or upon request.


Chapter 6: Early evolution of eukaryotes: two enigmatic heterotrophic groups are related to photosynthetic chromalveolates 

   Burki F, Inagaki Y, Brate J, Archibald JM, Keeling PJ, Cavalier-Smith T, Horak A, Sakaguchi M, Hashimoto T, Klaveness D, Jakobsen KS, Pawlowski J & Shalchian-Tabrizi K

Submitted to: Genome Biology and Evolution

 95 Chapter 6: Deep evolution of eukaryotes

6.1 Project description

We have seen in the previous chapters that phylogenomics is a powerful tool for inferring ancient divergences in the eukaryotic tree. It is also a method of choice for determining the phylogenetic position of species that have remained unplaced in the global phylogeny despite years of effort using single or few genes (see for example [Minge et al. 2009] in the annexes section of this manuscript). In this chapter, we describe our recent analyses of two such orphan groups, telonemids and centrohelid heliozoans. To obtain sufficient genomic data to position these groups using our phylogenomic dataset, we used 454 pyrosequencing to sequence dedicated cDNA libraries from Telonema subtilis and Raphidiophrys contractilis.


6.2 Summary

Understanding the early evolution and diversification of the eukaryotic cell relies on a fully resolved phylogenetic tree [Minge et al. 2008; Roger and Simpson 2009]. In recent years, most eukaryotic diversity has been assigned to six putative supergroups, but the evolutionary origin of a few major ‘orphan’ lineages remains elusive [Keeling et al. 2005; Lane and Archibald 2008]. Two ecologically important orphan groups are the heterotrophic Telonemia and Centroheliozoa: both have been proposed to be related to the photosynthetic cryptomonads and haptophytes [Shalchian-Tabrizi et al. 2006a; Cavalier-Smith and von der Heyden 2007], but molecular phylogenies have failed to provide strong support for any phylogenetic hypothesis. Here we investigate their origins using genome sequence surveys of a telonemid, a centrohelid, a haptophyte, and two cryptomonads. Phylogenetic analyses of 127 genes from 72 species indicate that telonemids and centrohelids are two distinct early diverging lineages that are members of an emerging major group of eukaryotes also comprising cryptomonads and haptophytes. Furthermore, this group is possibly closely related to the SAR clade comprising stramenopiles, alveolates, and Rhizaria [Burki et al. 2007; Hackett et al. 2007]. This links two additional heterotrophic lineages to the predominantly photosynthetic supergroup chromalveolates, providing a framework for interpreting the evolution of eukaryotic cell structures and the diversification of plastids.

6.3 Results and Discussion

6.3.1 Evolutionary origin of Telonemia and Centroheliozoa

The phylum Telonemia encompasses only two formally described heterotrophic zooflagellate species, Telonema subtilis and Telonema antarcticum [Klaveness et al. 2005], but a recent study of environmental sequences identified a large number of unknown representatives of the group [Shalchian-Tabrizi et al. 2007]. Telonemids are of pivotal evolutionary significance because they exhibit a combination of cellular structures that have otherwise been found only separately in different eukaryotic lineages, suggesting they may represent a ‘missing link’ between deeply diverging eukaryotes [Shalchian-Tabrizi et al. 2006a]. Thus far, the molecular evidence for the position of telonemids relative to other eukaryotes has remained inconclusive, although a 3-gene analysis weakly suggested a position close to the plastid-bearing cryptomonads [Shalchian-Tabrizi et al. 2006a].

The heliozoans, on the other hand, are a large and diverse group that has recently been shown to be polyphyletic [Nikolaev et al. 2004; Sakaguchi et al. 2005]. While some heliozoans are now known to belong to Rhizaria or stramenopiles, one group, the Centroheliozoa, represents the last substantial eukaryotic group that has not yet been clearly placed in the tree. Based on weakly supported 18S ribosomal RNA trees and some intriguing ultrastructural similarities, a possible relationship between centrohelids and haptophytes was recently suggested [Cavalier-Smith and von der Heyden 2007], but molecular trees (even those based on as many as seven genes) are generally inconsistent and inconclusive [Sakaguchi et al. 2007].


To infer the phylogenetic position of telonemids and centrohelids, we obtained ~200,000 and ~360,000 sequence reads from Telonema subtilis (a telonemid) and Raphidiophrys contractilis (a centrohelid), respectively, using 454 pyrosequencing. New data for the cryptomonads Guillardia theta and Plagioselmis nannoplanctica, as well as for the haptophyte Imantonia rotunda, were also generated (see Experimental procedures), leading to a much improved sampling of the cryptomonad/haptophyte group compared to earlier phylogenomic studies [Patron et al. 2007; Burki et al. 2008]. These sequences were used to construct a multigene alignment (supermatrix) containing 127 genes (29,235 amino acid positions) and a taxon-rich sampling of 72 species belonging to all supergroups of eukaryotes. Importantly, all species were carefully selected to minimize the impact of heterogeneity in evolutionary rates, which might have caused tree reconstruction artifacts in recently published trees [Cavalier-Smith 2009]. This concatenated dataset was first analyzed with Bayesian (Phylobayes, CAT model [Lartillot and Philippe 2004]) and maximum likelihood (ML) methods (RAxML, RTREV model [Stamatakis 2006]).

Figure 1-ch.6. Previous page. Unrooted Bayesian phylogeny of eukaryotes. The tree was obtained from the consensus between two independent Markov chains run under the CAT model implemented in Phylobayes. Identical relationships were obtained in our “separate” analysis (see text for discussion). The curved dashed lines indicate the alternative branchings recovered in the ML analysis of the concatenated dataset. Black dots correspond to 1.0 posterior probability (PP) and 100% ML bootstrap (BP). Values at nodes represent PP (above) and BP (below) when not maximal. The RELL bootstrap values (RBP) are also shown for the three main nodes. Black squares indicate the constrained bifurcations used in the separate analysis. The thick white bars indicate the groups that were originally included in the chromalveolates. Assemblages with capitalized names correspond to the hypothetical supergroups of eukaryotes. The scale bar represents the estimated number of amino acid substitutions per site.


Figure 1-ch.6 shows an unrooted Bayesian consensus tree, with ML bootstrap values (BP) and Bayesian posterior probabilities (PP) indicated. All main eukaryotic assemblages were recovered with moderate to maximum support and are consistent with the most recent published studies of eukaryote evolution [Burki et al. 2007; Rodriguez-Ezpeleta et al. 2007a; Burki et al. 2008; Minge et al. 2008; Hampl et al. 2009]. Amoebozoa and opisthokonts robustly grouped together (the unikonts) to the exclusion of excavates (recovered as a group only in ML analyses) and a megagroup composed of all other eukaryotes (78% BP; 1.0 PP). Within this megagroup, Plantae were monophyletic, as were the clade of stramenopiles, alveolates and Rhizaria (the SAR group [Burki et al. 2007]) and the haptophytes plus cryptomonads. Remarkably, in all analyses T. subtilis and R. contractilis grouped with cryptomonads and haptophytes in a moderately supported clade (node 1, Figure 1-ch.6: 70% BP; 0.88 PP). Within this group (henceforth referred to as the CCTH group), the relationships are essentially unresolved, and the Bayesian and ML methods yielded different, unsupported alternatives (Figure 1-ch.6). In addition, all analyses placed the CCTH group as sister to the SAR group with moderate support (node 2, Figure 1-ch.6: 65% BP; 0.99 PP), and this major assemblage branched with the plants (78% BP; 1.0 PP).

Previous studies have shown that phylogenomic analyses treating multigene datasets as concatenated alignments may not sufficiently account for the evolutionary specificities of each gene, and can therefore introduce tree reconstruction artifacts [Bapteste et al. 2002; Philippe et al. 2004; Patron et al. 2007]. We therefore conducted a “separate” analysis that takes into account differences in evolutionary tempo and mode across genes. This analysis specifically examined the relationships among 8 major groups: (1) T. subtilis, (2) R. contractilis, (3) cryptomonads, (4) haptophytes, (5) the SAR group, (6) Plantae, (7) excavates, and (8) unikonts (opisthokonts + Amoebozoa). Because not all of the 127 genes used in the concatenation contained at least one representative taxon for each of these groups, a subset of 87 genes (19,270 aa in total) was selected. This analysis recovered the same relationships as the Bayesian analysis of the supermatrix, notably a T. subtilis plus R. contractilis clade (65% RELL BP) that formed a group with haptophytes and cryptomonads (69% RELL BP; Figure 1-ch.6). Furthermore, this approach was consistent with the concatenated analysis in positioning the CCTH group together with the SAR group (92% RELL BP). Topology comparisons using the approximately unbiased (AU) test strongly supported the monophyletic association between the CCTH and SAR lineages: only 19 out of 351 test trees were not rejected at the 5% level, and 18 of these contained a clade comprising T. subtilis, R. contractilis, haptophytes, cryptomonads, and the SAR group, to the exclusion of all other eukaryotes (Table 1-ch.6).
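The RELL (resampling of estimated log-likelihoods) bootstrap avoids re-optimizing every candidate tree for each replicate: per-site log-likelihoods are computed once per fixed topology, and only the alignment sites are resampled. The Python sketch below illustrates this idea; it is not the TotalML/Molphy implementation used in this study, the function name `rell_bootstrap` is hypothetical, and the per-site log-likelihood matrix is assumed to have been produced beforehand (for example with RAxML, which was used for the per-site log-likelihoods of the AU tests).

```python
import random
from typing import List, Sequence

def rell_bootstrap(site_lnl: Sequence[Sequence[float]],
                   n_reps: int = 1000, seed: int = 1) -> List[float]:
    """RELL bootstrap proportion of each candidate tree.

    site_lnl[t][s] is the log-likelihood of alignment site s under tree t,
    computed once with fixed parameters (no re-optimization per replicate).
    """
    rng = random.Random(seed)
    n_trees, n_sites = len(site_lnl), len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_reps):
        # Resample alignment sites with replacement.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(row[s] for s in sample) for row in site_lnl]
        # The tree with the highest resampled log-likelihood wins this replicate
        # (ties go to the lower index).
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / n_reps for w in wins]


# Toy example with three hypothetical trees and six sites.
lnl = [[-2.1, -3.0, -1.7, -4.2, -2.2, -3.3],
       [-2.0, -3.4, -1.9, -4.0, -2.5, -3.1],
       [-2.6, -3.2, -2.4, -4.6, -2.9, -3.8]]
props = rell_bootstrap(lnl, n_reps=500)
print(props)
```

Under this scheme, the RELL support for a group of interest (such as the CCTH-SAR clade) is the summed proportion of all candidate trees that contain it.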


Table 1-ch.6. Test trees not rejected at the 5% level by the AU test.

ID¹ Tree topology² ΔlnL Distance from the ML tree (SE units) P value

1* (Telo,Raph,((Cryp,Hapt),(SAR,(Plan,(Uni+Ex))))) (ML) – 0.899

2* (Telo,Hapt,((Cryp,Raph),(SAR,(Plan,(Uni+Ex))))) 21.2 0.46 0.597

3* (Telo,Raph,((SAR,(Cryp,Hapt)),(Plan,(Uni+Ex)))) 21.9 0.77 0.413

4* (Telo,Cryp,((SAR,(Raph,Hapt)),(Plan,(Uni+Ex)))) 68.1 1.22 0.254

6* (Telo,Raph,(Hapt,(Cryp,(SAR,(Plan,(Uni+Ex)))))) 46.8 1.48 0.177

10* (Telo,Raph,(Hapt,(SAR,(Cryp,(Plan,(Uni+Ex)))))) 69.8 1.65 0.148

8* (Telo,Cryp,((Raph,Hapt),(SAR,(Plan,(Uni+Ex))))) 75.4 1.57 0.127

11* (Telo,Raph,(Hapt,((Cryp,SAR),(Plan,(Uni+Ex))))) 70.3 1.71 0.125

9* (Telo,Hapt,(Raph,((Cryp,SAR),(Plan,(Uni+Ex))))) 80.6 1.59 0.100

12* (Telo,Hapt,(Cryp,(Raph,(SAR,(Plan,(Uni+Ex)))))) 76.6 1.72 0.089

15* (Telo,Raph,(Cryp,(Hapt,(SAR,(Plan,(Uni+Ex)))))) 58.5 1.86 0.089

20* (Telo,Cryp,(Hapt,(Raph,(SAR,(Plan,(Uni+Ex)))))) 86.8 1.95 0.088

23* (Telo,Cryp,(Hapt,(SAR,(Raph,(Plan,(Uni+Ex)))))) 99.2 2.04 0.081

16* (Telo,Raph,(SAR,((Hapt,Cryp),(Plan,(Uni+Ex))))) 56.5 1.89 0.080

26* (Telo,Cryp,(Hapt,((Raph,SAR),(Plan,(Uni+Ex))))) 114.2 2.22 0.078

5* (Telo,Hapt,(SAR,((Raph,Cryp),(Plan,(Uni+Ex))))) 75.8 1.36 0.074

19* (Telo,Hapt,(Cryp,((Raph,SAR),(Plan,(Uni+Ex))))) 97.6 1.91 0.065

17* (Telo,Cryp,(Raph,((Hapt,SAR),(Plan,(Uni+Ex))))) 100.6 1.89 0.053

190 (Telo,Cryp,((Hapt,SAR),((Raph,Plan),(Uni+Ex)))) 253.3 4.07 0.050

The 87 genes used for the test comprise 19,270 amino acid positions in total. We tested the 347 trees containing the Unikonta–Excavata grouping that lay within 5 SE units of the ML tree, plus 4 additional trees of particular interest. ¹Trees in which Telonema, Raphidiophrys, cryptomonads, haptophytes, and SAR form a monophyletic group are marked with asterisks. ²Telo, Telonema; Raph, Raphidiophrys; Cryp, cryptomonads; Hapt, haptophytes; SAR, stramenopiles + alveolates + Rhizaria; Plan, Plantae; Uni+Ex, the grouping of Unikonta plus Excavata.
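The asterisks in Table 1-ch.6 mark a property that can be checked mechanically: a tree contains the CCTH-SAR group when its unrooted topology separates {Telo, Raph, Cryp, Hapt, SAR} from {Plan, Uni+Ex}. The Python sketch below does this for the simple Newick-like strings of the table (no branch lengths or support values). The parser is a hypothetical helper written for illustration, not part of any phylogenetics package, and it assumes that the outermost node has three or more children, as in all trees of the table.

```python
from typing import FrozenSet, List, Set

def clades(newick: str) -> List[FrozenSet[str]]:
    """Leaf sets of all parenthesized groups in a simple Newick string
    (no branch lengths or support values); the outermost group comes last."""
    stack: List[Set[str]] = []
    groups: List[FrozenSet[str]] = []
    name = ""
    for ch in newick.strip().rstrip(";"):
        if ch in "(),":
            if name:                      # close the current leaf label
                stack[-1].add(name)
                name = ""
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                done = stack.pop()
                groups.append(frozenset(done))
                if stack:                 # a group's leaves also belong to its parent
                    stack[-1].update(done)
        elif not ch.isspace():
            name += ch
    return groups

def has_split(newick: str, side: Set[str]) -> bool:
    """True if the unrooted tree separates `side` from all remaining leaves."""
    groups = clades(newick)
    other = groups[-1] - side
    # With a multifurcating outermost node, every inner group corresponds to one edge.
    return any(g == side or g == other for g in groups[:-1])


ccth_sar = {"Telo", "Raph", "Cryp", "Hapt", "SAR"}
tree_1 = "(Telo,Raph,((Cryp,Hapt),(SAR,(Plan,(Uni+Ex)))))"
tree_190 = "(Telo,Cryp,((Hapt,SAR),((Raph,Plan),(Uni+Ex))))"
print(has_split(tree_1, ccth_sar), has_split(tree_190, ccth_sar))  # True False
```

Applied to every row of the table, this check reproduces the 18 asterisked trees and singles out tree 190, in which R. contractilis groups with Plantae.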


6.3.2 An emerging group of great diversity, and the expansion of chromalveolates

T. subtilis and R. contractilis are two heterotrophic unicellular eukaryotes that represent groups that are among the most difficult to place within the tree of eukaryotes [Nikolaev et al. 2004; Shalchian-Tabrizi et al. 2006a; Cavalier-Smith and von der Heyden 2007; Sakaguchi et al. 2007]. By generating large molecular datasets from both lineages, we have provided congruent evidence that telonemids and centrohelids both have evolutionary affinities with haptophytes and cryptomonads, and more generally with the SAR group. Nevertheless, uncertainties remain. Despite the use of a very large dataset, there are three likely reasons why we failed to recover a more highly resolved topology. First, our analyses show that T. subtilis and R. contractilis are not closely related to any known eukaryotic lineage and may have diverged soon after the origin of the CCTH-SAR grouping. If true, relatively few sequence synapomorphies would have accumulated during their brief period of specifically shared common ancestry, and most of this phylogenetic signal would since have been erased during the long period of evolution that followed. Second, one expects statistical support to decrease when early diverging species are added to a phylogeny [Sanderson and Wojciechowski 2000]. Accordingly, removing T. subtilis and R. contractilis from our multigene alignment substantially increased the support both for the haptophyte/cryptomonad/SAR group and for its sister relationship with Plantae (Supplementary Figure 1-ch.6). Finally, T. subtilis and R. contractilis are the only representatives of telonemids and centrohelids for which genomic data are available; similar data from additional representatives of these and other lineages from this part of the tree are needed before their phylogenetic position can be determined conclusively.
Of particular importance are the heterotrophic katablepharid flagellates and the uncultured biliphytes, which have been proposed to be sister to cryptomonads or haptophytes based on a handful of genes [Okamoto and Inouye 2005; Not et al. 2007; Cuvelier et al. 2008; Kim and Graham 2008].

If the relationships between these lineages and the CCTH-SAR clade are confirmed, a new major assemblage is emerging, with important implications for understanding the early evolution of eukaryotes. Recently, multigene analyses [Hackett et al. 2007; Patron et al. 2007] and a shared lateral transfer of a bacterial rpl36 gene to their plastid genomes [Rice and Palmer 2006] have suggested that cryptomonads and haptophytes form a clade. Taken together, the evidence that katablepharids and biliphytes are related to cryptomonads and our finding that telonemids and centrohelids may also be part of this group reveal a phylogenetic group of rapidly growing diversity and importance. Many members of cryptomonads, haptophytes, stramenopiles and alveolates possess chlorophyll-c-containing plastids that are postulated to have originated by a single secondary endosymbiosis of a red alga in the ancestor of all these lineages. This is known as the chromalveolate hypothesis [Cavalier-Smith 1999]. The suggestion that Rhizaria, for which no red algal-derived plastid is known, are more closely related to alveolates and stramenopiles than to cryptomonads and haptophytes has caused some controversy over the chromalveolate hypothesis [Burki et al. 2007; Hackett et al. 2007; Rodriguez-Ezpeleta et al. 2007a; Burki et al. 2008]. The present study provides further support for the monophyly of the SAR clade and, because of our carefully chosen taxon sampling, reduces the likelihood that this grouping is a long-branch artifact [Cavalier-Smith 2009]. More data are needed to completely exclude a relationship between the CCTH group and Plantae, as observed in several recent phylogenies [Patron et al. 2007; Burki et al. 2008; Minge et al. 2008; Hampl et al. 2009]. Yet, if the relationships we present here are true, the chromalveolate assemblage should be expanded to include Rhizaria and at least two additional poorly known lineages for which plastids have never been reported: telonemids and centrohelids.

6.3.3 Implications for plastid evolution

An important question raised by our results is whether expanding the chromalveolates to include more non-photosynthetic lineages is still compatible with the history inferred from plastid phylogenies (e.g., [Iida et al. 2007; Khan et al. 2007]) and from rare genomic events such as endosymbiotic gene replacements (e.g., [Harper and Keeling 2003; Patron et al. 2004]), both of which suggest a photosynthetic ancestry for all chromalveolates. In this scenario, the ancestor of chromalveolates already contained a red algal-derived plastid that was lost on several occasions during the history of the group. The addition of telonemids, centrohelids and Rhizaria increases the number of non-photosynthetic chromalveolate lineages, requiring additional losses of photosynthesis and/or of plastids to explain the observed distribution. However, the recent discoveries of cryptic plastids and of genes of red algal origin in non-photosynthetic species indicate that the chromalveolate hypothesis remains plausible [Tyler et al. 2006; Reyes-Prieto et al. 2008; Slamovits and Keeling 2008]. Thus, the genomes of the lineages we describe in this study may retain important information in the form of remnant algal-derived genes that could shed light on the early events responsible for the great eukaryotic diversity we observe today. Without such evidence, the alternative possibility that plastids were transferred between the CCTH and SAR groups by serial endosymbioses [Sanchez-Puerta and Delwiche 2008; Archibald 2009; Bodyl et al. 2009] cannot be excluded.


6.4 Experimental procedures

6.4.1 Cultures

T. subtilis was grown with Imantonia rotunda as food source in Erdschreiber medium [Foyn 1934] at 16 °C for 14 days under a 14 h/10 h light/dark cycle, and finally kept in the dark for 5-10 days to maximize the proportion of Telonema cells in the culture. About 2 L of T. subtilis culture were harvested by centrifugation at 4000 rpm for 10 min and flash-frozen in liquid nitrogen. R. contractilis was grown monoxenically with the green alga Chlorogonium elongatum as food source, as described earlier [Sakaguchi and Suzaki 1999]. The cells were cultured at 20 °C for 14 days under a 14 h/10 h light/dark cycle. After confirming that most of the C. elongatum cells had been consumed by R. contractilis, ~0.9 L of culture was collected by centrifugation at 500 g for 3 min, and the cells were transferred into RNAlater solution (Ambion, Austin, USA). Plagioselmis nannoplanctica will be described elsewhere as part of an independent project involving .

6.4.2 cDNA library construction and 454 pyrosequencing

Normalized cDNA libraries were constructed by Vertis Biotechnology AG (Germany) according to their Random-Primed (RPD) cDNA protocol. Frozen cells were ground under liquid nitrogen, and total RNA was isolated from the cell powder using the mirVana RNA isolation kit (Ambion). Poly(A)+ RNA was prepared from the total RNA. First-strand cDNA synthesis was primed with an N6 randomized primer, and second-strand cDNA was synthesized according to the classical Gubler-Hoffman protocol [Gubler and Hoffman 1983]. The double-stranded cDNA (dsDNA) was blunt-ended, and 454 adapters A and B were ligated to its 5' and 3' ends. dsDNA carrying both adapter A and adapter B was selected and amplified by PCR (24 cycles) using a proofreading enzyme. To reduce the representation of highly expressed genes, the gene representation was equalized using a method developed by Vertis Biotechnology. For 454 sequencing, cDNA in the 250-600 bp size range was eluted from a preparative agarose gel. Half a plate on a GS FLX instrument was sequenced for T. subtilis by the Norwegian ultra-high throughput sequencing service unit at the University of Oslo, yielding about 210,000 reads. Half a plate for R. contractilis was sequenced by Macrogen Inc. (South Korea), generating about 360,000 reads.

6.4.3 Contig assembly and sequence alignment

All reads were assembled into contigs using the Newbler assembler with default parameters, producing 26,013 contigs for T. subtilis and 30,120 for R. contractilis. Among contigs longer than 200 bp, we searched for sequences with significant similarity to genes recently used in multigene phylogenies, using the following rigorous procedure [Burki et al. 2007; Burki et al. 2008]: a) BLASTP searches against the translated set of T. subtilis and R. contractilis contigs, using as queries the single-gene sequences composing our multiple alignments; b) retrieval (with a stringent e-value cutoff of 10⁻⁴⁰) and addition of the new homologous copies to the existing single-gene alignments; c) automatic alignment using Mafft [Katoh et al. 2002], followed by manual inspection to remove ambiguously aligned positions; d) testing the orthology of each selected gene, in particular for possible lateral or ancestral endosymbiotic gene transfer, by performing single-gene maximum likelihood (ML) reconstructions using Treefinder (WAG, 6 gamma categories; [Jobb et al. 2004]). We retained a set of 127 genes (29,235 aa positions) that did not show any obvious problem of deep paralogy or non-vertical transmission, and 72 species; fast-evolving taxa used previously [Burki et al. 2008] (the rhizarians Reticulomyxa and Quinqueloculina; the stramenopile Blastocystis; the excavates Sawyeria, Leishmania and Trypanosoma) were excluded when more slowly evolving lineages were available. Particular care was taken to distinguish the Imantonia sequences from the Telonema library, and the Chlorogonium sequences from the Raphidiophrys library, since these species were the food sources in the two cultures. Sequences from these species were kept in the single-gene alignments only when they could be attributed unambiguously: a monophyletic haptophyte group (including Imantonia but excluding Telonema) and a monophyletic green algal group (including Chlorogonium but excluding Raphidiophrys) were required before sequences from these species were considered for concatenation. The final concatenation of all single-gene alignments was done using Scafos [Roure et al. 2007].
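The retrieval step b), combined with the 200 bp contig threshold, can be sketched as a simple filter over BLAST results. The Python sketch below assumes that the hits were exported in the 12-column tabular format (legacy BLAST `-m 8` or BLAST+ `-outfmt 6`, with the e-value in the 11th column) and that contig lengths in bp are known. The function name, gene names and contig identifiers are hypothetical, and the subsequent alignment and orthology checks (steps c and d) are not modelled.

```python
from typing import Dict, Iterable, List

def homolog_hits(tabular_lines: Iterable[str],
                 contig_lengths: Dict[str, int],
                 max_evalue: float = 1e-40,
                 min_length: int = 200) -> Dict[str, List[str]]:
    """Map each query gene to the contigs passing the e-value and length filters."""
    hits: Dict[str, List[str]] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])        # 11th column of the 12-column format
        if evalue <= max_evalue and contig_lengths.get(subject, 0) > min_length:
            subjects = hits.setdefault(query, [])
            if subject not in subjects:   # keep each contig once per query
                subjects.append(subject)
    return hits


blast_lines = [
    "rpl3_Homo\tcontig00012\t65.2\t300\t100\t2\t1\t300\t5\t304\t2e-85\t310",
    "rpl3_Homo\tcontig00077\t30.0\t120\t80\t5\t10\t130\t1\t120\t3e-12\t60",
    "actin_Homo\tcontig00003\t80.0\t45\t9\t0\t1\t45\t1\t45\t1e-50\t95",
]
lengths = {"contig00012": 950, "contig00077": 400, "contig00003": 150}
print(homolog_hits(blast_lines, lengths))  # {'rpl3_Homo': ['contig00012']}
```

In this toy input, the second hit fails the e-value cutoff and the third comes from a contig shorter than 200 bp, so only one contig is retained.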
Because of the limited data available for certain groups, and to maximize the number of genes for each taxonomic assemblage, some lineages were represented by different closely related species, always belonging to the same genus.
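The chimera and concatenation steps can be illustrated with a short Python sketch. It is not Scafos, which was the tool actually used; the helper names `to_otu` and `concatenate` are hypothetical, and so is the simplifying assumption that only one congener per genus is present in each gene (the rule Scafos applies when several are available is not modelled). Each species is relabelled with its OTU name, and taxa absent from a gene are padded with `?` so that all rows of the supermatrix keep the same length.

```python
from typing import Dict, List, Mapping

def to_otu(aln: Mapping[str, str], otu_of: Mapping[str, str]) -> Dict[str, str]:
    """Rename species to their OTU label (e.g. a genus-level chimera)."""
    out: Dict[str, str] = {}
    for species, seq in aln.items():
        otu = otu_of.get(species, species)
        if otu in out:
            # Choosing among several congeners for one gene is not modelled here.
            raise ValueError(f"two species map to OTU {otu!r} in the same gene")
        out[otu] = seq
    return out

def concatenate(genes: List[Mapping[str, str]], taxa: List[str],
                missing: str = "?") -> Dict[str, str]:
    """Join single-gene alignments into a supermatrix, padding absent taxa with `missing`."""
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    for aln in genes:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("each gene alignment must be non-empty with equal-length rows")
        width = widths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, missing * width))
    return {t: "".join(p) for t, p in parts.items()}


otu_of = {"Plasmodium falciparum": "Plasmodium", "Plasmodium yoelii": "Plasmodium"}
gene1 = to_otu({"Plasmodium falciparum": "MKV", "Homo sapiens": "MKI"}, otu_of)
gene2 = to_otu({"Plasmodium yoelii": "AL", "Toxoplasma gondii": "SL"}, otu_of)
print(concatenate([gene1, gene2], ["Homo sapiens", "Plasmodium", "Toxoplasma gondii"]))
# {'Homo sapiens': 'MKI??', 'Plasmodium': 'MKVAL', 'Toxoplasma gondii': '???SL'}
```

The Plasmodium row is a chimera, since its first gene comes from P. falciparum and its second from P. yoelii. This is the kind of construction listed in the supplementary OTU tables.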

6.4.4 Phylogenetic analyses

The concatenated alignment was analyzed in both maximum likelihood (ML) and Bayesian (BI) frameworks, with RAxML-VI-HPC v.2.2.3 [Stamatakis 2006] and Phylobayes v.2.3 [Lartillot and Philippe 2004], respectively. RAxML was used with the RTREV amino acid replacement matrix. The best ML tree was determined under the PROTGAMMA implementation from multiple inferences started from 10 randomized maximum parsimony (MP) trees. Statistical support was evaluated with 200 bootstrap replicates. Four independent runs, each from a different MP starting tree, were performed on each replicate to prevent the analysis from becoming trapped in a local optimum. The tree with the best log-likelihood was selected for each replicate, and the 200 resulting trees were used to calculate the bootstrap proportions. To reduce the computational burden, the PROTMIX solution was chosen with 25 distinct rate categories. Phylobayes was run using the site-heterogeneous mixture CAT model and two independent Markov chains with a total length of 8000 cycles; the first 1000 points were discarded as burn-in, and the posterior consensus was calculated on the remaining trees. Convergence between the two chains was assessed by comparing the frequencies of their bipartitions.
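The burn-in, consensus and convergence steps can be sketched in Python as follows. Each sampled tree is represented by its set of bipartitions (splits), and the largest difference in split frequency between the two chains serves as a simple convergence measure (0 means identical frequencies). This is an illustration under stated assumptions, not the Phylobayes code. The splits are assumed to be already extracted from the sampled trees, each in a canonical orientation (here, the side that excludes a fixed reference taxon). The majority-rule threshold (> 0.5) is an assumed choice for the consensus, and all function names are hypothetical.

```python
from collections import Counter
from typing import AbstractSet, Dict, FrozenSet, List, Sequence

# A split is the set of taxa on one side of a bipartition, in a canonical orientation
# (assumed here: the side that excludes a fixed reference taxon).
Split = FrozenSet[str]

def split_frequencies(trees: Sequence[AbstractSet[Split]], burnin: int) -> Dict[Split, float]:
    """Posterior frequency of each split among the trees sampled after the burn-in."""
    kept = trees[burnin:]
    if not kept:
        raise ValueError("no trees left after the burn-in")
    counts = Counter(split for tree in kept for split in tree)
    return {split: n / len(kept) for split, n in counts.items()}

def max_split_difference(chain_a: Sequence[AbstractSet[Split]],
                         chain_b: Sequence[AbstractSet[Split]], burnin: int) -> float:
    """Largest difference in split frequency between two chains (0 means identical)."""
    fa = split_frequencies(chain_a, burnin)
    fb = split_frequencies(chain_b, burnin)
    return max((abs(fa.get(s, 0.0) - fb.get(s, 0.0)) for s in fa.keys() | fb.keys()),
               default=0.0)

def majority_rule_splits(freqs: Dict[Split, float]) -> List[Split]:
    """Splits present in more than half of the post-burn-in trees."""
    return [s for s, f in freqs.items() if f > 0.5]


# Two compatible splits of the taxa A-E (both sides exclude the reference taxon A).
s1, s2 = frozenset({"B", "C"}), frozenset({"D", "E"})
chain_a = [{s1}, {s1}, {s1, s2}, {s1}]
chain_b = [{s2}, {s1, s2}, {s1}, {s1, s2}]
print(max_split_difference(chain_a, chain_b, burnin=1))
print(majority_rule_splits(split_frequencies(chain_b, burnin=1)))
```

In the toy chains, split s1 has frequency 1.0 in both chains, while s2 has frequency 1/3 in one chain and 2/3 in the other, so the maximum difference is 1/3.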

In the separate analysis, we exhaustively examined the 10,395 test trees resulting from the constraints applied to the 8 major groups of eukaryotes. The log-likelihood of each test tree was calculated under the RTREV + Γ model using RAxML. RELL bootstrap values were calculated using TotalML in Molphy v.2.3 [Adachi and Hasegawa 1996]. Out of these 10,395 possible topologies, we subjected 351 test trees to the AU test using CONSEL v.0.1 [Shimodaira and Hasegawa 2001] with default scaling and replicate values. Specifically, we considered the 347 trees containing the unikonts–excavates bifurcation that lay within 5 standard error (SE) units of the ML tree (the number of tested trees was restricted because of the computational burden), and four additional trees constructed by (a) moving T. subtilis to the branch leading to unikonts, (b) moving T. subtilis to the branch leading to excavates, (c) moving R. contractilis to the branch leading to unikonts, and (d) moving R. contractilis to the branch leading to excavates.
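The figure of 10,395 test trees follows directly from combinatorics. For n terminal groups, the number of distinct unrooted, fully bifurcating topologies is (2n − 5)!!, which for n = 8 gives 11 × 9 × 7 × 5 × 3 × 1 = 10,395. The small Python check below confirms this count. The function name is illustrative, and the same formula gives 15 topologies for the 5-group separate analysis described in the supplementary material.

```python
def n_unrooted_topologies(n_groups: int) -> int:
    """Number of distinct unrooted, fully bifurcating trees for n labelled groups:
    (2n - 5)!! = 3 x 5 x ... x (2n - 5)."""
    if n_groups < 3:
        raise ValueError("at least three groups are needed")
    count = 1
    for k in range(3, 2 * n_groups - 4, 2):   # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


print(n_unrooted_topologies(8))  # 10395
print(n_unrooted_topologies(5))  # 15
```

Because this number grows super-exponentially with n, exhaustive enumeration is only practical for a handful of constrained groups. This is why the separate analysis collapses the dataset into 8 predefined groups.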


6.5 Supplementary material

6.5.1 Analyses after removing T. subtilis, R. contractilis, or both

To assess the impact of T. subtilis and R. contractilis on topology and support, analyses were performed using concatenated datasets that did not contain these species (Supp. Figure 1-ch.6), as well as with one or the other removed (Supp. Figures 2 and 3-ch.6). In all cases, the major groups of eukaryotes were recovered as in Figure 1-ch.6, and the relationships among them were very consistent. Interestingly, we observed more robust support for the association between cryptomonads and haptophytes (corresponding to node 1 in Figure 1-ch.6, see the main text) and for the sister position of this grouping to SAR (node 2 in Figure 1-ch.6) when neither species was included (Supp. Figure 1-ch.6). This is consistent with our interpretation of an ancient origin of telonemids and centrohelids. Indeed, if relatively few sequence synapomorphies accumulated during a brief period of shared common ancestry with cryptomonads and haptophytes, and even fewer remain after hundreds of millions of years of divergence, one expects T. subtilis and R. contractilis to branch randomly elsewhere in the tree, thus lowering the statistical support for the whole CCTH and CCTH-SAR groups.

A “separate” analysis was also conducted on the dataset lacking T. subtilis and R. contractilis. Here we specifically examined the relationships among 5 major groups: (1) cryptomonads plus haptophytes, (2) the SAR group, (3) Plantae, (4) excavates, and (5) unikonts (opisthokonts + Amoebozoa). A subset of 123 genes (28,166 aa in total) containing at least one representative taxon for each group of interest was selected from the 127 genes used in the concatenation. The best tree was identical to those of the Bayesian and ML analyses of the supermatrix, and the RELL values were consistently higher than those in Figure 1-ch.6 (Supp. Figure 1-ch.6).

When R. contractilis alone was removed from the alignment (Supp. Figure 2-ch.6), T. subtilis branched within a clade also including cryptomonads and haptophytes (CTH group), and this group was sister to SAR. The Bayesian and ML approaches gave two different unsupported topologies for the position of T. subtilis within the CTH group, a poor resolution also observed in Figure 1-ch.6. Thus, adding R. contractilis, another enigmatic lineage that likely diverged soon after the origin of the CCTH-SAR grouping, did not help in recovering good support for the placement of telonemids. When T. subtilis was removed to see how R. contractilis alone influenced the results (Supp. Figure 3-ch.6), we again recovered the same major eukaryotic groups and relationships, notably an assemblage comprising cryptomonads, haptophytes, and centrohelids (CCH group) and its sister position to SAR. Consistent with our previous observations, the CCH group and the CCH-SAR relationship received the lowest ML support of all analyses in this case (60% BP), indicating once more that only weak phylogenetic signal remains in centrohelid sequences (probably less than in telonemids). This lack of sequence synapomorphies, and the resulting poor phylogenetic signal, is likely the main reason why the many attempts to place centrohelids in the tree of eukaryotes were unsuccessful until this study [Cavalier-Smith and Chao 2003b; Sakaguchi et al. 2005; Cavalier-Smith and von der Heyden 2007; Sakaguchi et al. 2007].

6.5.2 Topology comparisons based on the supermatrices

To better assess the phylogenetic position of T. subtilis and R. contractilis, we conducted topology comparisons using the approximately unbiased (AU) test. For each tested tree, per-site log-likelihoods for the supermatrices were calculated using RAxML [Stamatakis 2006], and the AU tests were performed using CONSEL [Shimodaira and Hasegawa 2001] with default scaling and replicate values. The test trees were constructed from the Bayesian and ML topologies shown in Figure 1-ch.6, Supp. Figure 2-ch.6 and Supp. Figure 3-ch.6 by placing T. subtilis and R. contractilis on different branches (we did not test positions within monophyletic groups that received maximal support) (Supp. Figure 4-ch.6 A-E). These analyses generally confirmed the trends observed in the tree reconstructions. (1) Alternative positions for T. subtilis cannot be rejected, but only if placed within or sister to the CCTH or CTH groups; an exception was a non-rejected position on the branch leading to the red algae when R. contractilis was absent from the alignment (Supp. Figure 4D-ch.6), but this branching was rejected when R. contractilis was present, underlining the importance of taxon sampling (Supp. Figure 4A and B-ch.6). (2) Alternative positions for R. contractilis even outside the CCTH-SAR grouping were kept in the set of plausible trees, namely as sister to or within the excavates (Supp. Figure 4B and E-ch.6); a sister relationship to the red algae was also accepted when T. subtilis was absent (Supp. Figure 4E-ch.6), but it was similarly rejected when both species were analyzed together (Supp. Figure 4A and B-ch.6). Altogether, these analyses again suggest an early origin of telonemids and centrohelids; because several deep branchings could not be rejected for centrohelids, it is possible that this group diverged even earlier.
Finally, we also tested the putative relationship of cryptomonads and haptophytes with Plantae, because several recent phylogenomic studies recovered such a position [Patron et al. 2007; Burki et al. 2008]. The AU tests did not reject this branching, nor did they reject placing the whole CCTH group as sister to Plantae. Importantly, the AU test based on the “separate” analysis described in the main text was more restrictive: it strongly suggested that telonemids, centrohelids, cryptomonads, haptophytes, and SAR are closely related, thus excluding both a close relationship between R. contractilis and excavates and a close relationship between the CCTH group and Plantae.
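The AU test in CONSEL works from the per-site log-likelihoods of each candidate tree and relies on a multiscale bootstrap, which is too involved for a short example. A simpler relative that uses the same input is the RELL bootstrap (resampling of estimated log-likelihoods), whose values (RBP) are reported in Supp. Figure 1-ch.6. The sketch below assumes a per-site log-likelihood table in a TREE-PUZZLE-like layout (a header line with the numbers of trees and sites, then one labelled line per tree); that layout and the function names are assumptions to be checked against the documentation of the RAxML version used.

```python
import random
from typing import List


def parse_site_lnl(text: str) -> List[List[float]]:
    """Parse per-site log-likelihoods in an assumed TREE-PUZZLE-like layout:
    a first line "<n_trees> <n_sites>", then one line per tree holding a
    label followed by n_sites values."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    n_trees, n_sites = int(rows[0][0]), int(rows[0][1])
    matrix = [[float(v) for v in fields[1:]] for fields in rows[1:1 + n_trees]]
    if len(matrix) != n_trees or any(len(r) != n_sites for r in matrix):
        raise ValueError("header does not match the number of trees/sites")
    return matrix


def rell_bootstrap(site_lnl: List[List[float]], replicates: int = 1000,
                   seed: int = 1) -> List[float]:
    """Fraction of RELL replicates in which each tree has the highest likelihood.

    Sites are resampled with replacement and the stored per-site values are
    summed; no tree is re-optimized, which is what makes RELL cheap.
    """
    rng = random.Random(seed)
    n_trees, n_sites = len(site_lnl), len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(replicates):
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(row[s] for s in sites) for row in site_lnl]
        wins[max(range(n_trees), key=totals.__getitem__)] += 1
    return [w / replicates for w in wins]


if __name__ == "__main__":
    # Toy values for two trees and four sites (not data from this study).
    example = "2 4\ntr1 -1.0 -2.0 -3.0 -1.5\ntr2 -1.2 -2.1 -2.5 -1.6\n"
    print(rell_bootstrap(parse_site_lnl(example)))
```

The AU test goes further by repeating this resampling at several alignment-length scales and correcting the resulting bootstrap probabilities, which is why a dedicated program such as CONSEL is used in practice.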

When only one of these two species was considered, the AU tests retained in the pool of candidate trees a relationship between T. subtilis (or R. contractilis) and red algae. Although no obvious relationship with red algae was found in our single-gene tree reconstructions, this signal could be explained by genes of red algal origin that were transferred from a red algal endosymbiont to the nucleus in the ancestor of CCTH-SAR, if the chromalveolate hypothesis is correct [Lane and Archibald 2008]. In this context, it is interesting to note that a close relationship between R. contractilis and red algae was previously observed in 18S rRNA [Cavalier-Smith and Chao 2003b] and α- and β-tubulin phylogenies [Sakaguchi et al. 2005], and was not rejected in an analysis of six housekeeping genes [Sakaguchi et al. 2007]. Yet none of these genes has been shown to have a red algal ancestry in chromalveolate species.

6.5.3 Supplementary Table and Figures

Supplementary Table S1-ch.6. OTU (Operational Taxonomic Unit) names and chimera construction. Percentage of missing data per species and per gene. Detailed list of the 127 genes used, describing the amount of missing data per species. Not shown in this manuscript because of its very large size, but available upon request.
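The missing-data percentages described for Supplementary Table S1-ch.6 can be derived directly from the concatenated alignment. A minimal sketch, assuming each gene is given as a mapping from species name to aligned sequence and that species absent from a gene are padded with '?'; the chimera construction is not modelled, and the taxon names and sequences in the example are toy values.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # species name -> aligned sequence for one gene

MISSING = set("?-X")  # unknown residue, gap, or ambiguous amino acid


def build_supermatrix(genes: List[Alignment]) -> Dict[str, str]:
    """Concatenate per-gene alignments, padding absent species with '?'."""
    species = sorted({sp for gene in genes for sp in gene})
    parts: Dict[str, List[str]] = {sp: [] for sp in species}
    for gene in genes:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("each gene alignment needs sequences of one common length")
        length = lengths.pop()
        for sp in species:
            parts[sp].append(gene.get(sp, "?" * length))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


def percent_missing(seq: str) -> float:
    """Percentage of positions in a row of the supermatrix that are gaps or unknown."""
    return 100.0 * sum(c in MISSING for c in seq) / len(seq)


def percent_species_missing_per_gene(genes: List[Alignment]) -> List[float]:
    """For each gene, the percentage of all species that lack it entirely."""
    n_species = len({sp for gene in genes for sp in gene})
    return [100.0 * (n_species - len(gene)) / n_species for gene in genes]


if __name__ == "__main__":
    toy_genes = [{"taxonA": "MKIL", "taxonB": "MKV-"}, {"taxonB": "AAG", "taxonC": "AVG"}]
    for sp, row in build_supermatrix(toy_genes).items():
        print(sp, row, round(percent_missing(row), 1))
```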


Supplementary Figure 1-ch.6. Phylogeny summarizing the relationships among the major groups of eukaryotes when T. subtilis and R. contractilis are not included in the analysis. This tree was obtained with phylobayes run under the CAT model (consensus between two independent Markov chains), and subsequently schematized in FigTree (http://tree.bio.ed.ac.uk/software/figtree/) with the “Cartoon” option. Black dots correspond to 1.0 posterior probability (PP) and 100% ML bootstrap (BP); otherwise, values at nodes represent PP (above) and BP (below) when not maximal. Black squares show the constrained bifurcations used in the separate analysis, for which RELL bootstrap values (RBP) are indicated.
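Posterior probabilities on a consensus tree such as this one are the frequencies of each bipartition (split) among the trees sampled by the Markov chains, and agreement between two independent chains can be checked by comparing these frequencies split by split. The sketch below computes the largest per-split discrepancy, similar in spirit to the maxdiff statistic reported by phylobayes' bpcomp program, and keeps the splits found in more than half of the pooled samples (majority-rule consensus). It assumes each sampled tree has already been reduced to a set of splits written in a canonical form (for example, always the side that excludes a fixed reference taxon); the taxa in the example are placeholders.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

Split = FrozenSet[str]    # taxa on one side of an internal branch (canonical side)
SampledTree = Set[Split]  # one tree sampled by the MCMC, as its set of splits


def split_frequencies(trees: List[SampledTree]) -> Dict[Split, float]:
    """Posterior probability of each split: fraction of sampled trees containing it."""
    counts = Counter(split for tree in trees for split in tree)
    return {split: n / len(trees) for split, n in counts.items()}


def compare_chains(chain1: List[SampledTree], chain2: List[SampledTree]
                   ) -> Tuple[float, Dict[Split, float]]:
    """Return the largest per-split frequency difference between two chains and
    the majority-rule consensus (splits with pooled frequency above 0.5)."""
    f1, f2 = split_frequencies(chain1), split_frequencies(chain2)
    maxdiff = max(abs(f1.get(s, 0.0) - f2.get(s, 0.0)) for s in set(f1) | set(f2))
    pooled = split_frequencies(chain1 + chain2)
    consensus = {s: p for s, p in pooled.items() if p > 0.5}
    return maxdiff, consensus


if __name__ == "__main__":
    ab, abc, ac = frozenset("AB"), frozenset("ABC"), frozenset("AC")
    chain1 = [{ab, abc}] * 3 + [{ac, abc}]
    chain2 = [{ab, abc}] * 2 + [{ac, abc}] * 2
    print(compare_chains(chain1, chain2))
```

A large discrepancy between chains signals that they have not converged on the same posterior, in which case the consensus values should not be trusted.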


Supplementary Figure 2-ch.6. Tree representing a Bayesian phylogeny of eukaryotes when R. contractilis is removed, obtained from the consensus between two independent Markov chains run under the CAT model implemented in phylobayes. The curved dashed lines indicate the alternative branchings recovered in the ML analysis of the same dataset. Black dots correspond to 1.0 posterior probability (PP) and 100% ML bootstrap (BP); otherwise, values at nodes represent PP (above) and BP (below) when not maximal. The thick white bars mark the groups that were originally included in the chromalveolates. Assemblages indicated by capitalized names correspond to the hypothetical supergroups of eukaryotes. The scale bar represents the estimated number of amino acid substitutions per site.




Supplementary Figure 3-ch.6. Tree representing a Bayesian phylogeny of eukaryotes when T. subtilis is removed, obtained from the consensus between two independent Markov chains run under the CAT model implemented in phylobayes. The curved dashed lines indicate the alternative branchings recovered in the ML analysis of the same dataset. Black dots correspond to 1.0 posterior probability (PP) and 100% ML bootstrap (BP); otherwise, values at nodes represent PP (above) and BP (below) when not maximal. The thick white bars mark the groups that were originally included in the chromalveolates. Assemblages indicated by capitalized names correspond to the hypothetical supergroups of eukaryotes. The scale bar represents the estimated number of amino acid substitutions per site.


Supplementary Figure 4-ch.6. Summary of the AU tests based on the concatenated alignments, showing the alternative branching points that were tested (numbers on branches) and the P-values higher than 0.05. The values in circles correspond to the positions that were not rejected by the AU tests. When both T. subtilis and R. contractilis were present, only one species was moved at a time, the other being left in its inferred position. (A) Bayesian tree as in Figure 1, T. subtilis or R. contractilis successively placed on alternative branches; (B) ML tree as in Figure 1, T. subtilis or R. contractilis successively placed on alternative branches; (C) Bayesian tree as in Figure 1, both T. subtilis and R. contractilis successively placed on alternative branches; (D) Bayesian tree as in Supp. Figure 2, T. subtilis successively placed on alternative branches; (E) Bayesian tree as in Supp. Figure 3, R. contractilis successively placed on alternative branches. Ma: Malawimonas; Tr: ; Di: Discoba; Re: Red algae; Gr: Green algae; Gl: Glaucophytes; Cr: Cryptomonads; Ha: Haptophytes; Te: T. subtilis; Ra: R. contractilis; Un: Unikonts.


Chapter 7: General conclusions and perspectives   


Chapter 7: Conclusions and perspectives

7.1 Achievements

Our work has explored ancient evolution within the eukaryotic tree of life by means of phylogenomics, a new tool that has allowed biologists to infer phylogenetic relationships and revisit the history of life. Like any approach, it has its drawbacks, and one needs to remain critical when constructing and analyzing very large datasets, because statistical support can be high for trees that are not necessarily correct. Yet we believe that the analysis of phylogenomic alignments represents an important way towards the successful reconstruction of the tree of eukaryotes. Compared to phylogenies inferred from much less data, phylogenomics generally allows the weak signal contained in individual genes to be extracted by combining it across many genes. It has thus become possible to address questions that were out of reach not so long ago. I have discussed in this manuscript examples of the early successes of phylogenomics, and many more are to come.

When we started this project four years ago, one of the six supergroups of eukaryotes, Rhizaria, was absent from phylogenomic studies. This was a critical issue, because without it discussions about early eukaryote evolution were necessarily missing an important aspect. Although this major assemblage still suffers from poor genomic representation in databases compared to all other supergroups, our initial efforts allowed us to include Rhizaria in a phylogenomic context (Chapters 2 and 3: [Burki et al. 2006; Burki and Pawlowski 2006]). Soon after, the inference of the unexpected “SAR” group (Chapter 4: [Burki et al. 2007]) and its association with haptophytes, cryptomonads and Plantae into a monophyletic mega-clade of eukaryotes (Chapter 5: [Burki et al. 2008]) defined a new framework that will certainly have important consequences for our understanding of eukaryote evolution (schematized in Figure 1-ch.7).

The placement of taxa whose phylogenetic affinities to the major groups of eukaryotes are still controversial is a particularly important issue. These species are often characterized by cellular properties that, if reliably placed in a robust evolutionary framework, can shed light on crucial events in evolution. We have investigated the origin of two such ‘orphan’ groups, the enigmatic eukaryotic lineages telonemids and centrohelids (Chapter 6). I have also participated in the phylogenomic study of Breviata anathema, another deeply branching anaerobic amoeboflagellate eukaryote that was unplaced in the eukaryotic tree ([Minge et al. 2009], see annexes for the article). Importantly, this nomadic species is regarded as crucial for better defining the nature of the last common ancestor of eukaryotes [Roger and Simpson 2009].


7.2 Origin and spread of chlorophyll-c containing plastids, and the early photosynthetic eukaryotes

A recurrent issue in our work concerns the evolution of plastids. Although we have not investigated plastid evolution per se, the new phylogenetic relationships we described have important implications. In particular, we discussed the role that Rhizaria, telonemids, and centrohelids now play in untangling the much-debated question of the spread of chlorophyll-c containing plastids (somewhat ironically, as such plastids have never been reported in any of these groups). Central to this debate is the chromalveolate hypothesis [Cavalier-Smith 1999], and passionate discussions about whether chromalveolate plastids share a common origin through a single endosymbiotic event will go on until unifying evidence is provided (see for instance the recent exchange [Bodyl et al. 2009; Lane and Archibald 2009]). A crucial point is that the consequences of plastid loss versus plastid gain for a cell are generally not well understood, yet the chromalveolate hypothesis requires a large number of losses. New data are thus needed, and only a concerted effort across disciplines is likely to help evaluate these questions further. Genomics will be crucial, for example by looking for traces of ancient photosynthetic activity (plastid-derived relict genes, possible evidence for plastid targeting, or even a plastid genome) in modern-day non-photosynthetic lineages related to autotrophic ones. The unexpected discovery of new key species, such as the recently characterized Chromera velia [Moore et al. 2008], could also bring decisive answers by filling evolutionary gaps. Finally, advanced biochemical and ultrastructural analyses of similarities and differences in plastid features will be of primary interest to better define the origin(s) of red plastids. In this context, robust phylogenetic trees provide evolutionary frameworks that are essential to rule out incompatible hypotheses.

Hence, it should be stressed that, for the time being, no one really knows how the plastids of red algal origin spread among eukaryotes. A single secondary endosymbiosis followed by numerous losses (as postulated by the chromalveolate hypothesis) could of course be correct, but we believe it should not be taken as the a priori correct scenario. Figure 1-ch.7 describes a new framework for eukaryote evolution, summarizing our work and also taking into account results of others (e.g., [Hackett et al. 2007]). This tree represents the phylogenetic relationships among the host cells, and it is compatible with the chromalveolate hypothesis because all species with chlorophyll-c containing plastids share a common ancestor; however, unlike in the original description [Cavalier-Smith 1999], this ancestor gave rise to a much expanded assemblage that also contains Rhizaria, telonemids, centrohelids and possibly other lineages. Importantly, even if this evolutionary framework is correct, it does not necessarily imply that the chromalveolate scheme is the true scenario for red plastid evolution, and other alternatives remain entirely possible.
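The argument that the chromalveolate hypothesis requires many plastid losses can be made concrete by counting events on a host tree. The sketch below, on a purely hypothetical tree with leaves A to F, compares two extreme scenarios: a single gain above the root followed only by losses (a Dollo-like assumption), and independent gains with no losses. It weighs gains and losses equally, which is precisely the point left open by the poorly understood consequences of plastid loss versus gain, and it does not distinguish secondary from tertiary endosymbioses.

```python
from typing import Set, Tuple, Union

# Hypothetical rooted tree: a leaf is a name, an internal node a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def any_has_trait(node: Tree, present: Set[str]) -> bool:
    """True if at least one leaf below `node` carries the trait."""
    if isinstance(node, str):
        return node in present
    return any(any_has_trait(child, present) for child in node)


def all_have_trait(node: Tree, present: Set[str]) -> bool:
    """True if every leaf below `node` carries the trait."""
    if isinstance(node, str):
        return node in present
    return all(all_have_trait(child, present) for child in node)


def losses_after_single_gain(node: Tree, present: Set[str]) -> int:
    """Minimum number of losses if the trait was gained once, above `node`,
    and never regained (a Dollo-like assumption)."""
    if isinstance(node, str):
        return 0
    total = 0
    for child in node:
        if any_has_trait(child, present):
            # The trait must persist into this child; look for losses deeper.
            total += losses_after_single_gain(child, present)
        else:
            # A clade with no trait-bearing leaf needs one loss on its stem.
            total += 1
    return total


def gains_without_losses(node: Tree, present: Set[str]) -> int:
    """Minimum number of independent gains if the trait is never lost:
    one gain per maximal clade in which every leaf carries the trait."""
    if all_have_trait(node, present):
        return 1
    if isinstance(node, str):
        return 0
    return sum(gains_without_losses(child, present) for child in node)


if __name__ == "__main__":
    toy_tree = ((("A", "B"), "C"), (("D", "E"), "F"))
    with_plastid = {"A", "B", "D", "E"}  # hypothetical leaf states
    print(1 + losses_after_single_gain(toy_tree, with_plastid))  # 1 gain + 2 losses = 3
    print(gains_without_losses(toy_tree, with_plastid))  # 2 independent gains
```

On this toy tree the single-gain scenario needs three events and the gains-only scenario two; giving losses a lower cost than gains can reverse the preference, which is why event counts alone cannot settle the question.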

Figure 1-ch.7. The new tree of eukaryotes, resulting from a consensus of our work and that of others. The position of the root is unknown, but placing it between unikonts and the rest of the major groups (here called “bikonts” for simplicity), as shown on this tree, is a reasonable possibility. Within bikonts, excavates are probably early diverging, followed by a megagroup composed of Plantae and two groups that could be most closely related to each other: CCTH (centrohelids, cryptomonads, telonemids, haptophytes, but also katablepharids and biliphytes) and SAR (stramenopiles, alveolates and Rhizaria). CCTH and SAR are colored to emphasise our contribution to the emergence of these assemblages, but also to stress the importance of this new evolutionary framework for understanding the evolution of plastids. Colored branches correspond to photosynthetic (or mostly photosynthetic) groups and are color-coded according to plastid pigmentation. Broken colored branches mean that cryptic plastids were found in these lineages, but photosynthesis was lost. Numbers in grey circles represent possible events of primary, secondary, and tertiary endosymbiosis (dinoflagellates apart). Mapped on this tree are two hypotheses that both explain the current distribution of chlorophyll-c containing plastids: the chromalveolate hypothesis [Cavalier-Smith 1999] and the hypothesis of Sanchez-Puerta and Delwiche [Sanchez-Puerta and Delwiche 2008], represented by the red and blue curved dashed arrows, respectively. Other scenarios have been proposed but are not shown here [Bodyl et al. 2009]. The spread of secondary plastids of green algal origin is not shown. Question marks in grey circles correspond to possible timings for our proposed ancestral “capacitation” to establish plastids: the shopping bag model for plastid origin [Howe et al. 2008] could have taken place very early on.

For example, an alternative model of plastid evolution that has recently been proposed is mapped on the schematized tree in Figure 1-ch.7 [Sanchez-Puerta and Delwiche 2008]. This model accommodates several new pieces of information, such as the existence of the SAR group (i.e., Rhizaria are within the chromalveolates, yet they do not have red plastids) [Burki et al. 2007; Hackett et al. 2007] or the horizontal gene transfer (HGT) of the rpl36 gene [Rice and Palmer 2006]. It involves a single secondary endosymbiosis with a red alga during the evolution of cryptomonads and haptophytes, and one or two tertiary endosymbioses (some lineages aside) during the evolution of stramenopiles and alveolates. However, given the uncertain phylogenetic relationships we have shown within the CCTH group, and the numerous uninvestigated early-diverging stramenopile and alveolate lineages, it is not yet possible to be more precise about the timing of plastid acquisitions in such scenarios. Multiple tertiary endosymbioses are also invoked in another model that explains the distribution of red algal-derived plastids by serial transfers, its authors insisting that plastid losses should not be favored over plastid gains [Bodyl et al. 2009].

To eliminate hypotheses that are inconsistent with a robust phylogeny, a crucial point for understanding the current distribution of red plastids that remains to be firmly established is the position of the CCTH group. Our latest results showed a sister relationship to SAR (Chapter 6), but this position has proven unstable; in particular, several studies recovered a close relationship with Plantae [Patron et al. 2007; Burki et al. 2008; Hampl et al. 2009]. If the latter position gains convincing support in future studies, even when all potential artifacts are demonstrably removed, it would invalidate the chromalveolate scenario, because a secondary red algal plastid would then need to have been present before red algae ever originated.

More generally, the association of the SAR and CCTH groups with Plantae into a “megagroup” of eukaryotes, regardless of the relationships among these assemblages, contributes to the debate on the origin and evolution of eukaryotic photosynthesis. In Chapter 5, we speculated that the last common ancestor of this clade could have transmitted to its descendants an increased capacity to establish plastids (no photosynthetic species are found within unikonts), resulting in an early split in eukaryote evolution with fundamentally different properties in each part of the tree (Figure 1-ch.7). In that sense, our view is quite consistent with the shopping bag model for the origin of plastids [Howe et al. 2008]. It seems indeed possible that in this last common ancestor (or even earlier, in the ancestor that also gave rise to excavates), the early stages of establishment of a durable endosymbiosis involved a series of unsuccessful attempts, in which the stable symbiont ultimately acquired by the host cell was not the first one it ever took up; it could have been preceded by the continuous uptake of photosynthetic organisms, some of which persisted for some time within the host (kleptoplasty could be an example of such an evolutionary stage). During this process, endosymbiont DNA could have been transferred to and integrated into the nuclear genome, providing a pool of sequences of symbiont origin that could have subsequently facilitated plastid establishment (“capacitation”): primary plastids on the branch leading to Plantae, and secondary and tertiary plastids on several occasions (following models that have yet to be clarified, see above). Importantly, these plastid acquisitions would have been facilitated by the genes transferred earlier from the former endosymbionts (those that preceded primary plastid gain). Alternatively, primary plastids could have been established even earlier, lost, and later replaced by secondary and tertiary plastids (except in Plantae, which preserved primary plastids). Dinoflagellates could be a much more recent example following the same evolutionary pattern, with an apparent “facility” in replacing plastids, providing evidence in support of this hypothetical scheme [Tengs et al. 2000; Nosenko et al. 2006; Gould et al. 2008].

7.3 A molecular time-scale for eukaryote evolution: combining phylogenomics and the continuous microfossil record

A fundamental aspect that is inextricably linked to the origin and history of eukaryotes, but that we have not discussed in this manuscript so far, is the timing of eukaryotic evolution. How ancient are eukaryotes, and when did the major groups diverge? These questions will need to be accurately answered before one can propose a complete scenario for the tree of eukaryotes that also takes this essential temporal facet into account. Of course, numerous attempts have been made. On the one hand, paleontologists have suggested that eukaryotes originated about 1.8 billion years (Gyr) ago based on the fossil record [Zhang 1986], or even 2.7 Gyr ago based on traces of biomarkers (molecular fossils) indicative of eukaryotes [Brocks et al. 1999]. But these dates are highly dependent on the interpretation of early eukaryotic fossils, which is hotly debated. On the other hand, molecular data have also been used to infer a time-scale of eukaryote evolution, but the various studies generally did not confirm the paleontological data, nor did they agree with one another (the estimated origin of eukaryotes varies between more than 2 Gyr ago and about 950 Myr ago depending on the calibration points and molecular markers used [Douzery et al. 2004; Hedges et al. 2004; Berney and Pawlowski 2006]).

In order to reduce the discrepancies between molecular time estimates, and to see how conflicting fossils can fit onto such a molecular time-scale, several steps can be improved: (1) use of a large number of genes and species to decrease phylogenetic errors associated with lack of signal and poor taxon sampling; (2) use of advanced relaxed clock methods; and (3) most importantly, use of multiple accurate calibration intervals to account for paleontological uncertainties. Taking advantage of our phylogenomic dataset, we recently undertook a study that aims to infer a molecular time-scale for eukaryote evolution, addressing all of the above weaknesses of earlier attempts. This is an ongoing project, and only very preliminary (thus incomplete) results are available at the moment. Nevertheless, I decided to briefly describe our project here, as it is for us the logical follow-up of our work on phylogenomics, and more accurate results should be available shortly. It is a collaborative study that, in addition to Jan Pawlowski, also involves Hervé Philippe (leading the dating analyses), and Colomban de Vargas and Ian Probert (who provided the new coccolithophorid species).

A large number of genes and a broad taxon sampling are important for reducing stochastic errors in phylogenetic reconstructions; they are also essential for obtaining the most accurate inferred dates. The dataset we used to investigate the phylogenetic position of telonemids and centrohelids already contained over 100 genes for 72 species belonging to all major groups of eukaryotes, so it was a good starting point on which to build a new alignment (see Chapter 6). Another key aspect of molecular dating is the availability of several accurate calibrations [Hug and Roger 2007]. We added about 20 new species to our taxon sampling, in part selected for their calibration value (i.e., giving access to bifurcations that can be calibrated with fossils). These species typically belonged to plants or metazoans, because these groups represent the most abundant source of macrofossils that can be mapped on trees. However, macrofossils are usually discrete finds that cannot be dissociated from the imperfection of the fossil record, a consequence of the non-preservation or incorrect identification of the earliest fossils of most lineages. Thus, more precise complementary calibration intervals need to be obtained to avoid conflicts associated with calibration errors.

One approach that has been proposed is to use the well-documented Phanerozoic (540 million years (Myr) ago to present) microfossil record of protists as a calibration source [Berney and Pawlowski 2006]. These microfossils have a key advantage: they usually present a continuous record, giving access to the detailed succession of forms in the different stratigraphic levels from their time of appearance. It is thus in principle possible to reliably date the first radiation of certain groups, or the divergence of two lineages that do fossilize, without being affected by the uncertainties related to macrofossils. In this project, we sequenced three protist species at large scale, which allowed us to add two new calibration intervals based on the continuous microfossil record.

Specifically, we sequenced a cDNA library of an early-diverging pennate diatom, Fragilaria pinnata, by 454 pyrosequencing, and two coccolithophorids (haptophytes), Calcidiscus leptoporus and Coccolithus braarudii, by 454 pyrosequencing and the classical Sanger method, respectively. The precise time of appearance of the pennate diatoms is unclear because of a period of poor silica deposition in the Upper Cretaceous [Kooistra and Medlin 1996]. However, they are abundant in the Tertiary and totally absent from the Mid-Cretaceous, 110 Myr ago. Thus, when the gap in the diatom fossil record is taken into account, this time can be conservatively chosen as a reliable upper limit for the divergence of pennate diatoms from their centric ancestor (with a lower limit of 65 Myr, corresponding to the divergence of the raphid pennates). The divergence between C. leptoporus and C. braarudii is well known to have occurred in the Palaeocene [Bown et al. 2007], that is, between the first occurrence of the Coccolithaceae in the basal Danian, 64 Myr ago, and the first occurrence of the Calcidiscaceae, which are now known to range down to the Late Palaeocene, 58 Myr ago. These two mostly unambiguous calibrations were combined with 17 others into a set of 19 calibration intervals (Table 1-ch.8), and used together with 93 species and more than 100 genes to estimate the divergence times of the major eukaryotic clades under a Bayesian relaxed clock.

Table 1-ch.8. Nodes and corresponding calibrations.

Node* | Calibration (Myr)# | Reference
Radiation of extant crown dinoflagellate lineages | 250/210 | [Fensome et al. 1996]
Radiation of pennate diatoms† | 110/65 | [Kooistra and Medlin 1996]
Centric/pennate diatom split | -/185 | [Rothpletz 1896]
R. filosa/Quinqueloculina split | -/525 | [Culver 1991]
Calcidiscus/Coccolithus split† | 64/58 | [Bown et al. 2007]
Radiation of coccolithophores | -/215 | [Bown 1998]
Pinus/Ginkgo split | -/307 | [Magallón and Sanderson 2005]
Radiation of Eudicots | 121/0 | [Magallón and Sanderson 2005]
Gymnosperms/Angiosperms split | 359/299 | [Magallón and Sanderson 2005]
Bryophytes/Angiosperms split | 488/443 | [Magallón and Sanderson 2005]
Neurospora/Schizosaccharomyces split | -/400 | [Padovan et al. 2005]
Mus/Rattus split | 16/12 | [Benton and Donoghue 2007]
Primates/ split | 100/62 | [Benton and Donoghue 2007]
Monotremata/Theria split | 191/162 | [Benton and Donoghue 2007]
Bird/ split | 330/312 | [Benton and Donoghue 2007]
Actinopterygii/Sarcopterygii split | 422/416 | [Benton and Donoghue 2007]
Origin of crown-group Eumetazoa | -/635 | [Peterson and Butterfield 2005]
Acanthamoeba/Hartmannella split | -/750 | [Porter et al. 2003]
Origin of Rhodophytes | -/570 | [Wellman et al. 2003]

* Nodes onto which the calibrations apply. # Calibration intervals in Myr, as deduced from the corresponding references: upper limit to the left of the slash, lower limit to the right; a dash means no known upper value. † The two new calibrated nodes added in this study.
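The interval notation of Table 1-ch.8 maps naturally onto a small data structure. The sketch below parses the 'upper/lower' notation (a dash meaning no upper limit) and lists the calibrated nodes whose age falls outside their interval. It treats both limits as hard bounds, which is a simplification: Bayesian dating programs often allow soft bounds, and which option the analysis used is not stated here. The node ages in the usage example are hypothetical; only the three intervals are copied from the table.

```python
import math
from typing import Dict, List, Optional, Tuple

# (maximum age, minimum age) in Myr; None means no maximum is known.
Interval = Tuple[Optional[float], float]


def parse_interval(text: str) -> Interval:
    """Parse the table notation 'upper/lower', where '-' means no upper limit."""
    upper, lower = (part.strip() for part in text.split("/"))
    return (None if upper == "-" else float(upper), float(lower))


def violated(ages: Dict[str, float], calibrations: Dict[str, Interval]) -> List[str]:
    """Names of calibrated nodes whose age falls outside their interval,
    treating both limits as hard bounds."""
    bad = []
    for node, (upper, lower) in calibrations.items():
        max_age = math.inf if upper is None else upper
        if not lower <= ages[node] <= max_age:
            bad.append(node)
    return bad


if __name__ == "__main__":
    # Three rows copied from Table 1-ch.8.
    calibrations = {
        "Radiation of pennate diatoms": parse_interval("110/65"),
        "Calcidiscus/Coccolithus split": parse_interval("64/58"),
        "Origin of crown-group Eumetazoa": parse_interval("-/635"),
    }
    # Hypothetical node ages (Myr), e.g. one sample from a dating chain.
    ages = {
        "Radiation of pennate diatoms": 120.0,
        "Calcidiscus/Coccolithus split": 61.0,
        "Origin of crown-group Eumetazoa": 700.0,
    }
    print(violated(ages, calibrations))  # ['Radiation of pennate diatoms']
```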


According to our preliminary estimates, the basal radiation of extant eukaryotes took place 1126-1306 Myr ago (mean 1211), and the unikonts (1126-1309 Myr ago, mean 1210) and bikonts (1102-1281 Myr ago, mean 1188) originated at about the same time. The SAR and CCTH groups originated shortly after, 1044-1213 Myr ago (mean 1124) and 1047-1225 Myr ago (mean 1134), respectively. Interestingly, the present-day red algal groups included in our analyses started to diverge 702-924 Myr ago (mean 812), and their split from the green lineage occurred 1020-1191 Myr ago (mean 1101). We also observe a very ancient origin for Rhizaria, 975-1132 Myr ago (mean 1048). For the most basal eukaryotic splits, these results seem to give slightly older dates (but within the same range) than the two most recent reference studies, namely Berney and Pawlowski (2006), who used microfossil calibrations but only the SSU rDNA gene, and Douzery et al. (2004), who used 129 proteins but a limited taxon sampling and only 6 macrofossil calibrations. On the other hand, they strongly disagree with the recent proposal of much younger basal radiations for the major groups, i.e., after the snowball Earth thawed about 635 Myr ago [Cavalier-Smith and Chao 2006; Cavalier-Smith 2009].
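Ranges such as '1126-1306 Myr ago (mean 1211)' summarize the posterior distribution of a node age sampled by the Markov chain. Assuming they are equal-tailed 95% credibility intervals (the interval type is not specified above; highest posterior density intervals are a common alternative), they can be computed from the sampled ages as in the sketch below. The samples in the usage example are synthetic Gaussian draws, not results of this study.

```python
import random
import statistics
from typing import List, Tuple


def summarize_age(samples: List[float]) -> Tuple[float, float, float]:
    """Posterior mean and equal-tailed 95% interval (2.5% and 97.5% quantiles)."""
    cuts = statistics.quantiles(samples, n=40)  # 39 cut points, one every 2.5%
    return statistics.fmean(samples), cuts[0], cuts[-1]


if __name__ == "__main__":
    # Synthetic stand-ins for the sampled ages (Myr) of one node in two chains.
    rng = random.Random(42)
    chain1 = [rng.gauss(1200.0, 45.0) for _ in range(5000)]
    chain2 = [rng.gauss(1200.0, 45.0) for _ in range(5000)]
    for label, chain in (("chain 1", chain1), ("chain 2", chain2)):
        mean, low, high = summarize_age(chain)
        print(f"{label}: {low:.0f}-{high:.0f} Myr (mean {mean:.0f})")
```

Summarizing each chain separately, as in the example, is a quick way to see whether independent chains agree on a node age before pooling them.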

Once again, these dates are only raw results, and much caution is required. Detailed analyses will follow, in particular to assess the confidence of the dates (by comparing several chains and running them longer), and to test different sets of calibrations, different relaxed clock models (correlated vs. uncorrelated), and different topologies or alternative rootings.
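The two families of relaxed clock models mentioned above differ in whether the rate on a branch depends on the rate of its ancestor. The generative sketch below draws rates under one common parameterization of each: an autocorrelated lognormal model, in which a node's log-rate is centred on its parent's and its variance grows with branch duration, and an uncorrelated lognormal model, in which each branch draws its rate independently. It only illustrates the assumptions behind such rate priors; it is not the dating software used in this project, and the tree, durations, and variance parameters are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical node: (name, duration in Myr of the branch leading to it, children).
Node = Tuple[str, float, List["Node"]]


def autocorrelated_rates(root: Node, root_rate: float, sigma2: float,
                         seed: int = 0) -> Dict[str, float]:
    """Autocorrelated lognormal model: each node's rate is drawn around its
    parent's, ln r ~ N(ln r_parent - sigma2*t/2, sigma2*t), so that
    E[r] = r_parent and the spread grows with the branch duration t."""
    rng = random.Random(seed)
    rates: Dict[str, float] = {}

    def visit(node: Node, parent_rate: float) -> None:
        name, t, children = node
        mu = math.log(parent_rate) - sigma2 * t / 2
        rates[name] = math.exp(rng.gauss(mu, math.sqrt(sigma2 * t)))
        for child in children:
            visit(child, rates[name])

    for child in root[2]:
        visit(child, root_rate)
    return rates


def uncorrelated_rates(root: Node, mean_rate: float, sigma: float,
                       seed: int = 0) -> Dict[str, float]:
    """Uncorrelated lognormal model: every branch draws its rate independently,
    ln r ~ N(ln mean_rate - sigma**2/2, sigma**2), so that E[r] = mean_rate."""
    rng = random.Random(seed)
    rates: Dict[str, float] = {}

    def visit(node: Node) -> None:
        name, _, children = node
        rates[name] = rng.lognormvariate(math.log(mean_rate) - sigma ** 2 / 2, sigma)
        for child in children:
            visit(child)

    for child in root[2]:
        visit(child)
    return rates


if __name__ == "__main__":
    tree: Node = ("root", 0.0, [
        ("A", 300.0, [("A1", 200.0, []), ("A2", 200.0, [])]),
        ("B", 500.0, []),
    ])
    print(autocorrelated_rates(tree, root_rate=1.0, sigma2=0.001))
    print(uncorrelated_rates(tree, mean_rate=1.0, sigma=0.3))
```

Under the autocorrelated model, sister lineages such as A1 and A2 inherit similar rates from A; under the uncorrelated model they are no more alike than any two branches, which is why the two priors can yield different date estimates from the same data.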

7.4 Other perspectives

Our work has contributed to a more accurate picture of eukaryote evolution. But some key lineages, such as the apusomonads, collodictyonids, or radiolarians, have not yet been placed within this new framework. Thanks to new mass sequencing technologies, generating genomic data has recently become cheaper, so most of these orphan lineages can be expected to come under investigation rapidly (if they are not already in the pipeline of various laboratories).

Similarly, questions regarding phylogenetic uncertainties at higher taxonomic levels within the major groups are likely to be addressed soon, as more species are being sequenced at large scale. This switch to ‘within-clade’ phylogenomics will, for example, concern the relationships among the members of the CCTH group described in Chapter 6, and the relationships within Amoebozoa or within Rhizaria, where many doubts remain (see our paper “Untangling the Phylogeny of Amoeboid Protists” in annexes).


These are only some directions, but obviously the possibilities are endless. It is indeed a promising time for studying evolution, in particular for reconstructing the tree of life.


Chapter 8: Literature cited


Chapter 8: References

Abascal, F, Zardoya, R & Posada, D. 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 21:2104-2105.

Abu, SM, Li, G & Asiegbu, FO. 2004. Identification of Heterobasidion annosum (S-type) genes expressed during initial stages of conidiospore germination and under varying culture conditions. FEMS Microbiol Lett. 233:205-213.

Adachi, J & Hasegawa, M. 1996. MOLPHY version 2.3: Programs for molecular phylogenetics based on maximum likelihood. Comp Sci Monographs. 28:1-150.

Adl, SM et al. 2005. The new higher level classification of eukaryotes with emphasis on the taxonomy of protists. J Eukaryot Microbiol. 52:399-451.

Altschul, SF, Gish, W, Miller, W, Myers, EW & Lipman, DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403-410.

Andersson, JO & Roger, AJ. 2002. A cyanobacterial gene in nonphotosynthetic protists--an early chloroplast acquisition in eukaryotes? Curr Biol. 12:115-119.

Archibald, JM. 2009. The puzzle of plastid evolution. Curr Biol. 19:R81-88.

Archibald, JM & Keeling, PJ. 2004. Actin and ubiquitin protein sequences support a cercozoan/foraminiferan ancestry for the plasmodiophorid plant pathogens. J Eukaryot Microbiol. 51:113-118.

Archibald, JM, Longet, D, Pawlowski, J & Keeling, PJ. 2003a. A novel polyubiquitin structure in Cercozoa and Foraminifera: evidence for a new eukaryotic supergroup. Mol Biol Evol. 20:62-66.

Archibald, JM, Rogers, MB, Toop, M, Ishida, K & Keeling, PJ. 2003b. Lateral gene transfer and the evolution of plastid-targeted proteins in the secondary plastid-containing alga Bigelowiella natans. Proc Natl Acad Sci USA. 100:7678-7683.

Arisue, N, Hasegawa, M & Hashimoto, T. 2005. Root of the Eukaryota tree as inferred from combined maximum likelihood analyses of multiple molecular sequence data. Mol Biol Evol. 22:409-420.

Armbrust, E et al. 2004. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science. 306:79-86.

Atkins, MS, Teske, AP & Anderson, OR. 2000. A survey of flagellate diversity at four deep-sea hydrothermal vents in the Eastern Pacific Ocean using structural and molecular approaches. J Eukaryot Microbiol. 47:400-411.

Bachvaroff, TR, Sanchez Puerta, MV & Delwiche, CF. 2005. Chlorophyll c-Containing Plastid Relationships Based on Analyses of a Multigene Data Set with All Four Chromalveolate Lineages. Mol Biol Evol. 22:1772-1782.

Baldauf, SL. 1999. A Search for the Origins of Animals and Fungi: Comparing and Combining Molecular Data. Am Nat. 154:S178-S188.

Baldauf, SL. 2003. The deep roots of eukaryotes. Science. 300:1703-1706.

Baldauf, SL & Doolittle, WF. 1997. Origin and evolution of the slime molds (). Proc Natl Acad Sci USA. 94:12007-12012.

Baldauf, SL & Palmer, JD. 1993. Animals and fungi are each other's closest relatives: congruent evidence from multiple proteins. Proc Natl Acad Sci USA. 90:11558-11562.

Baldauf, SL, Roger, AJ, Wenk-Siefert, I & Doolittle, WF. 2000. A -level phylogeny of eukaryotes based on combined protein data. Science. 290:972-977.


Bapteste, E et al. 2002. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc Natl Acad Sci USA. 99:1414-1419.

Bapteste, E & Philippe, H. 2002. The potential value of indels as phylogenetic markers: position of trichomonads as a case study. Mol Biol Evol. 19:972-977.

Barker, S, Higgins, JA & Elderfield, H. 2003. The future of the carbon cycle: review, calcification response, ballast and feedback on atmospheric CO2. Philos Trans R Soc Lond A. 361:1977-1998; discussion 1998-1999.

Bass, D, Moreira, D, López-García, P, Polet, S, Chao, EE, von der Heyden, S, Pawlowski, J & Cavalier-Smith, T. 2005. Polyubiquitin insertions and the phylogeny of Cercozoa and Rhizaria. Protist. 156:149-161.

Beiko, RG & Ragan, MA. 2008. Detecting lateral genetic transfer: a phylogenetic approach. Methods Mol Biol. 452:457-469.

Benton, MJ & Donoghue, PC. 2007. Paleontological evidence to date the tree of life. Mol Biol Evol. 24:26-53.

Berney, C & Pawlowski, J. 2003. Revised small subunit rRNA analysis provides further evidence that Foraminifera are related to Cercozoa. J Mol Evol. 57 Suppl 1:S120-127.

Berney, C Pawlowski, J. 2006. A molecular time-scale for eukaryote evolution recalibrated with the continuous microfossil record. Philos Trans R Soc Lond B Biol Sci. 273:1867-1872.

Bhattacharya, D, Archibald, JM, Weber, AP & Reyes-Prieto, A. 2007. How do endosymbionts become organelles? Understanding early events in plastid evolution. Bioessays. 29:1239-1246.

Bhattacharya, D, Helmchen, T & Melkonian, M. 1995. Molecular evolutionary analyses of nuclear-encoded small subunit ribosomal RNA identify an independent Rhizopod lineage containing the Euglyphina and the Chlorarachniophyta. J Eukaryot Microbiol. 42:65-69.

Bodyl, A. 2005. Do plastid-related characters support the chromalveolate hypothesis? J Phycol. 41:712-719.

Bodyl, A, Stiller, JW & Mackiewicz, P. 2009. Chromalveolate plastids: direct descent or multiple endosymbioses? Trends Ecol Evol. 24:119-121.

Bowler, C et al. 2008. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature. 456:239-244.

Bown, P. 1998. Calcareous Nannofossil Biostratigraphy.

Bown, P, Dunkley Jones, T & Young, J. 2007. Umbilicosphaera jordanii Bown, 2005 from the Paleogene of Tanzania: confirmation of generic assignment and a Paleocene origination for the Family Calcidiscaceae. J Nannoplankton Res. 29:25-30.

Breuker, R. 1997. Cytoskelettenkomponenten des plasmodialen Rhizopoden Reticulomyxa filosa. Dissertation. Ruhr-Universität, Bochum, Germany.

Brinkmann, H & Philippe, H. 1999. Archaea sister group of Bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. Mol Biol Evol. 16:817-825.

Brinkmann, H, van der Giezen, M, Zhou, Y, Poncelin de Raucourt, G & Philippe, H. 2005. An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics. Syst Biol. 54:743-757.

Brocks, JJ, Logan, GA, Buick, R & Summons, RE. 1999. Archean molecular fossils and the early rise of eukaryotes. Science. 285:1033-1036.

Bui, ET, Bradley, PJ & Johnson, PJ. 1996. A common evolutionary origin for mitochondria and hydrogenosomes. Proc Natl Acad Sci USA. 93:9651-9656.

Burki, F, Berney, C & Pawlowski, J. 2002. Phylogenetic position of Gromia oviformis Dujardin inferred from nuclear-encoded small subunit ribosomal DNA. Protist. 153:251-260.

Burki, F, Nikolaev, SI, Bolivar, I, Guiard, J & Pawlowski, J. 2006. Analysis of expressed sequence tags from a naked foraminiferan Reticulomyxa filosa. Genome. 49:882-887.

Burki, F & Pawlowski, J. 2006. Monophyly of Rhizaria and multigene phylogeny of unicellular bikonts. Mol Biol Evol. 23:1922-1930.

Burki, F, Shalchian-Tabrizi, K, Minge, MA, Skjaeveland, A, Nikolaev, SI, Jakobsen, KS & Pawlowski, J. 2007. Phylogenomics reshuffles the eukaryotic supergroups. PLoS ONE. 2:e790.

Burki, F, Shalchian-Tabrizi, K & Pawlowski, J. 2008. Phylogenomics reveals a new 'megagroup' including most photosynthetic eukaryotes. Biol Lett. 4:366-369.

Castresana, J. 2000. Selection of Conserved Blocks from Multiple Alignments for Their Use in Phylogenetic Analysis. Mol Biol Evol. 17:540-552.

Cavalier-Smith, T. 1989. Molecular phylogeny. Archaebacteria and Archezoa. Nature. 339:100-101.

Cavalier-Smith, T. 1993. Kingdom Protozoa and its 18 phyla. Microbiol Rev. 57:953-994.

Cavalier-Smith, T. 1996-1997. Amoeboflagellates and mitochondrial cristae in eukaryotic evolution: megasystematics of the new protozoan subkingdoms Eozoa and Neozoa. Arch Protistenkd. 147:237-258.

Cavalier-Smith, T. 1998a. A revised six-kingdom system of life. Biol Rev Camb Philos Soc. 73:203-266.

Cavalier-Smith, T. 1998b. A revised six-kingdom system of life. Biol Rev Camb Philos Soc. 73:203-266.

Cavalier-Smith, T. 1999. Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J Eukaryot Microbiol. 46:347-366.

Cavalier-Smith, T. 2002. The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int J Syst Evol Microbiol. 52:297-354.

Cavalier-Smith, T. 2003. Protist phylogeny and the high-level classification of Protozoa. Eur J Protistol. 39:338-348.

Cavalier-Smith, T. 2004. Chromalveolate diversity and cell megaevolution: interplay of membranes, genomes and cytoskeleton. In Organelles, Genomes and Eukaryotic Evolution, London: Taylor and Francis.

Cavalier-Smith, T. 2009. Megaphylogeny, cell body plans, adaptive zones: causes and timing of eukaryote basal radiations. J Eukaryot Microbiol. 56:26-33.

Cavalier-Smith, T & Chao, EE. 2003a. Phylogeny and classification of phylum Cercozoa (Protozoa). Protist. 154:341-358.

Cavalier-Smith, T & Chao, EE. 1995. The Opalozoan Apusomonas is Related to the Common Ancestor of Animals, Fungi, and Choanoflagellates. Proc R Soc Lond B Biol Sci. 261:1-6.

Cavalier-Smith, T & Chao, EE. 1996-1997. Sarcomonad ribosomal RNA sequences, rhizopod phylogeny, and the origin of euglyphid amoebae. Arch Protistenkd. 147:227-236.

Cavalier-Smith, T & Chao, EE. 1997. Sarcomonad ribosomal RNA sequences, rhizopod phylogeny and the origin of euglyphid amoebae. Arch Protistenkd. 147:227-236.

Cavalier-Smith, T & Chao, EE. 2003b. Molecular phylogeny of centrohelid heliozoa, a novel lineage of bikont eukaryotes that arose by ciliary loss. J Mol Evol. 56:387-396.

Cavalier-Smith, T & Chao, EE. 2003c. Phylogeny of Choanozoa, Apusozoa, and other protozoa and early eukaryote megaevolution. J Mol Evol. 56:540-563.

Cavalier-Smith, T & Chao, EE. 2006. Phylogeny and megasystematics of phagotrophic heterokonts (kingdom Chromista). J Mol Evol. 62:388-420.

Cavalier-Smith, T, Lewis, R, Chao, EE, Oates, B & Bass, D. 2008. Morphology and phylogeny of Sainouron acronematica sp. n. and the ultrastructural unity of Cercozoa. Protist. 159:591-620.

Cavalier-Smith, T & von der Heyden, S. 2007. Molecular phylogeny, scale evolution and taxonomy of centrohelid heliozoa. Mol Phylogenet Evol. 44:1186-1203.

Chu, KH, Qi, J, Yu, ZG & Anh, V. 2004. Origin and phylogeny of chloroplasts revealed by a simple correlation analysis of complete genomes. Mol Biol Evol. 21:200-206.

Clark, CG & Roger, AJ. 1995. Direct evidence for secondary loss of mitochondria in Entamoeba histolytica. Proc Natl Acad Sci USA. 92:6518-6521.

Copeland, HF. 1938. The Kingdoms of Organisms. Q Rev Biol. 13:383-420.

Cox, CJ, Foster, PG, Hirt, RP, Harris, SR & Embley, TM. 2008. The archaebacterial origin of eukaryotes. Proc Natl Acad Sci U S A.

Culver, SJ. 1991. Early Cambrian Foraminifera from West Africa. Science. 254:689-691.

Cuvelier, ML, Ortiz, A, Kim, E, Moehlig, H, Richardson, DE, Heidelberg, JF, Archibald, JM & Worden, AZ. 2008. Widespread distribution of a unique marine protistan lineage. Environ Microbiol. 10:1621-1634.

de Koning, A, Tartar, A, Boucias, D & Keeling, P. 2005. Expressed sequence tag (EST) survey of the highly adapted green algal parasite, Helicosporidium. Protist. 156:181-190.

de Vargas, C, Zaninetti, L, Hilbrecht, H & Pawlowski, J. 1997. Phylogeny and rates of molecular evolution of planktonic foraminifera: SSU rDNA sequences compared to the fossil record. J Mol Evol. 45:285-294.

Delsuc, F, Brinkmann, H, Chourrout, D & Philippe, H. 2006. Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature. 439:965-968.

Delsuc, F, Brinkmann, H & Philippe, H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361-375.

Deschavanne, PJ, Giron, A, Vilain, J, Fagot, G & Fertil, B. 1999. Genomic signature: characterization and classification of species assessed by chaos game representation of sequences. Mol Biol Evol. 16:1391-1399.

Douzery, EJ, Snell, EA, Bapteste, E, Delsuc, F & Philippe, H. 2004. The timing of eukaryotic evolution: does a relaxed molecular clock reconcile proteins and fossils? Proc Natl Acad Sci U S A. 101:15386-15391.

Driskell, AC, Ané, C, Burleigh, JG, McMahon, MM, O'Meara, BC & Sanderson, MJ. 2004. Prospects for building the tree of life from large sequence databases. Science. 306:1172-1174.

Edwards, SV, Fertil, B, Giron, A & Deschavanne, PJ. 2002. A genomic schism in birds revealed by phylogenetic analysis of DNA strings. Syst Biol. 51:599-613.

Eisen, JA. 1998. Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8:163-167.

Eisen, JA & Fraser, CM. 2003. Phylogenomics: intersection of evolution and genomics. Science. 300:1706-1707.

Embley, TM & Hirt, RP. 1998. Early branching eukaryotes? Curr Opin Genet Dev. 8:624-629.

Embley, TM & Martin, W. 2006. Eukaryotic evolution, changes and challenges. Nature. 440:623-630.

Ewing, B, Hillier, L, Wendl, MC & Green, P. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8:175-185.

Fahrni, JF, Bolivar, I, Berney, C, Nassonova, E, Smirnov, A & Pawlowski, J. 2003. Phylogeny of lobose amoebae based on actin and small-subunit ribosomal RNA genes. Mol Biol Evol. 20:1881-1886.

Falquet, L, Bordoli, L, Ioannidis, V, Pagni, M & Jongeneel, CV. 2003. Swiss EMBnet node web server. Nucleic Acids Res. 31:3782-3783.

Fast, NM, Kissinger, JC, Roos, DS & Keeling, PJ. 2001. Nuclear-Encoded, Plastid-Targeted Genes Suggest a Single Common Origin for Apicomplexan and Dinoflagellate Plastids. Mol Biol Evol. 18:418-426.

Fast, NM, Logsdon, JM & Doolittle, WF. 1999. Phylogenetic analysis of the TATA box binding protein (TBP) gene from Nosema locustae: evidence for a microsporidia-fungi relationship and spliceosomal intron loss. Mol Biol Evol. 16:1415-1419.

Fast, NM, Xue, L, Bingham, S & Keeling, P. 2002. Re-examining alveolate evolution using multiple protein molecular phylogenies. J Eukaryot Microbiol. 49:30-37.

Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool. 27:401-410.

Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 17:368-376.

Felsenstein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution. 39:783-791.

Felsenstein, J. 2004. PHYLIP (phylogeny inference package), version 3.6, University of Washington, Seattle.

Fensome, RA, MacRae, RA, Moldowan, JM, Taylor, FJR & Williams, GL. 1996. The early Mesozoic radiation of dinoflagellates. Paleobiology. 22:329-338.

Fitz-Gibbon, ST & House, CH. 1999. Whole genome-based phylogenetic analysis of free-living microorganisms. Nucleic Acids Res. 27:4218-4222.

Fitzpatrick, DA, Logue, ME, Stajich, JE & Butler, G. 2006. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis. BMC Evol Biol. 6:99.

Flakowski, J, Bolivar, I, Fahrni, J & Pawlowski, J. 2005. Actin phylogeny of foraminifera. Pp. 93-102. J Foramin Res.

Foster, PG. 2004. Modeling compositional heterogeneity. Syst Biol. 53:485-495.

Foyn, B. 1934. Lebenszyklus, Cytologie und Sexualität der Chlorophycee Cladophora suhriana Kützing. Arch Protistenkd. 83:1-56.

Friedman, S, Debrunner-Vossbrinck, BA & Woese, CR. 1987. Ribosomal RNA sequence suggests microsporidia are extremely ancient eukaryotes. Nature. 326:411-414.

Galtier, N & Lobry, JR. 1997. Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J Mol Evol. 44:632-636.

Gee, H. 2003. Evolution: ending incongruence. Nature. 425:782.

Germot, A, Philippe, H & Le Guyader, H. 1997. Evidence for loss of mitochondria in Microsporidia from a mitochondrial-type HSP70 in Nosema locustae. Mol Biochem Parasitol. 87:159-168.

Gooday, AJ. 2002. Organic-walled allogromiids: Aspects of their occurrence, diversity and ecology in marine habitats. Pp. 384-399. J Foramin Res.

Gould, SB, Waller, RF & McFadden, GI. 2008. Plastid evolution. Annu Rev Plant Biol. 59:491-517.

Graybeal, A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst Biol. 47:9-17.

Green, P. 1996. PHRAP documentation.

Gu, X & Zhang, H. 2004. Genome phylogenetic analysis based on extended gene contents. Mol Biol Evol. 21:1401-1408.

Gubler, U & Hoffman, BJ. 1983. A simple and very efficient method for generating cDNA libraries. Gene. 25:263-269.

Guindon, S & Gascuel, O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696-704.

Habura, A, Pawlowski, J, Hanes, SD & Bowser, SS. 2004. Unexpected foraminiferal diversity revealed by small-subunit rDNA analysis of Antarctic sediment. J Eukaryot Microbiol. 51:173-179.

Habura, A, Wegener, L, Travis, JL & Bowser, SS. 2005. Structural and functional implications of an unusual foraminiferal beta-tubulin. Mol Biol Evol. 22:2000-2009.

Hackett, JD, Scheetz, TE, Yoon, HS, Soares, MB, Bonaldo, MF, Casavant, TL & Bhattacharya, D. 2005. Insights into a dinoflagellate genome through expressed sequence tag analysis. BMC Genomics. 6:80.

Hackett, JD, Yoon, HS, Li, S, Reyes-Prieto, A, Rümmele, SE & Bhattacharya, D. 2007. Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of rhizaria with chromalveolates. Mol Biol Evol. 24:1702-1713.

Haeckel, EHPA. 1866. Generelle Morphologie der Organismen: Allgemeine Grundzüge der organischen Formen-wissenschaft, …. Georg Reimer, Berlin.

Hagopian, JC, Reis, M, Kitajima, JP, Bhattacharya, D & de Oliveira, MC. 2004. Comparative analysis of the complete plastid genome sequence of the red alga Gracilaria tenuistipitata var. liui provides insights into the evolution of rhodoplasts and their relationship to other plastids. J Mol Evol. 59:464-477.

Hall, T. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser. 41:95-98.

Hampl, V, Horner, DS, Dyal, P, Kulda, J, Flegr, J, Foster, PG & Embley, TM. 2005. Inference of the Phylogenetic Position of Oxymonads Based on Nine Genes: Support for Metamonada and Excavata. Mol Biol Evol. 22:2508-2518.

Hampl, V, Hug, LA, Leigh, JW, Dacks, JB, Lang, BF, Simpson, AG & Roger, AJ. 2009. Phylogenomic Analyses Support the Monophyly of Excavata and Resolve Relationships among Eukaryotic “Supergroups”. Proc Natl Acad Sci U S A. in press.

Harper, JT & Keeling, PJ. 2003. Nucleus-encoded, plastid-targeted glyceraldehyde-3-phosphate dehydrogenase (GAPDH) indicates a single origin for chromalveolate plastids. Mol Biol Evol. 20:1730-1735.

Harper, JT, Waanders, E & Keeling, PJ. 2005. On the monophyly of chromalveolates using a six-protein phylogeny of eukaryotes. Int J Syst Evol Microbiol. 55:487-496.

Hedges, SB, Blair, JE, Venturi, ML & Shoe, JL. 2004. A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol Biol. 4:2.

Hendy, MD & Penny, D. 1989. A framework for the quantitative study of evolutionary trees. Syst Zool. 38:297-309.

Hirt, RP, Logsdon, JM, Healy, B, Dorey, MW, Doolittle, WF & Embley, TM. 1999. Microsporidia are related to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc Natl Acad Sci USA. 96:580-585.

Ho, SY & Jermiin, L. 2004. Tracing the decay of the historical signal in biological sequence data. Syst Biol. 53:623-637.

Holzmann, M, Habura, A, Giles, H, Bowser, SS & Pawlowski, J. 2003. Freshwater foraminiferans revealed by analysis of environmental DNA samples. J Eukaryot Microbiol. 50:135-139.

Holzmann, M & Pawlowski, J. 2002. Freshwater foraminiferans from Lake Geneva: Past and present. J Foramin Res. 32:344-350.

Hoppenrath, M & Leander, BS. 2006. Ebriid phylogeny and the expansion of the Cercozoa. Protist. 157:279-290.

House, CH & Fitz-Gibbon, ST. 2002. Using homolog groups to create a whole-genomic tree of free-living organisms: an update. J Mol Evol. 54:539-547.

Howe, CJ, Barbrook, AC, Nisbet, RE, Lockhart, PJ & Larkum, AW. 2008. The origin of plastids. Philos Trans R Soc Lond B Biol Sci. 363:2675-2685.

Huelsenbeck, JP. 1997. Is the Felsenstein zone a fly trap? Syst Biol. 46:69-74.

Huelsenbeck, JP, Ronquist, F, Nielsen, R & Bollback, JP. 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science. 294:2310-2314.

Hug, LA & Roger, AJ. 2007. The impact of fossils and taxon sampling on ancient molecular dating analyses. Mol Biol Evol. 24:1889-1897.

Hülsmann, N. 1984. Biology of the genus Reticulomyxa (Rhizopoda). J Protozool. 31:55A.

Iida, K, Takishita, K, Ohshima, K & Inagaki, Y. 2007. Assessing the monophyly of chlorophyll-c containing plastids by multi-gene phylogenies under the unlinked model conditions. Mol Phylogenet Evol. 45:227-238.

Jeffroy, O, Brinkmann, H, Delsuc, F & Philippe, H. 2006. Phylogenomics: the beginning of incongruence? Trends Genet. 22:225-231.

Jermiin, L, Ho, SY, Ababneh, F, Robinson, J & Larkum, AW. 2004. The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst Biol. 53:638-643.

Jobb, G, von Haeseler, A & Strimmer, K. 2004. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol. 4:18.

Jouannic, S, Argout, X, Lechauve, F, Fizames, C, Borgel, A, Morcillo, F, Aberlenc-Bertossi, F, Duval, Y & Tregear, J. 2005. Analysis of expressed sequence tags from oil palm (Elaeis guineensis). FEBS Lett. 579:2709-2714.

Katoh, K, Misawa, K, Kuma, K & Miyata, T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30:3059-3066.

Keeling, P & Palmer, J. 2001. Lateral transfer at the gene and subgenic levels in the evolution of eukaryotic enolase. Proc Natl Acad Sci USA. 98:10745-10750.

Keeling, PJ. 1998. A kingdom's progress: Archezoa and the origin of eukaryotes. Pp. 87-95. Bioessays.

Keeling, PJ. 2001. Foraminifera and Cercozoa are related in actin phylogeny: two orphans find a home? Mol Biol Evol. 18:1551-1557.

Keeling, PJ. 2004. Diversity and evolutionary history of plastids and their hosts. Pp. 1481-1493. Am J Bot.

Keeling, PJ, Burger, G, Durnford, DG, Lang, BF, Lee, RW, Pearlman, RE, Roger, AJ & Gray, MW. 2005. The tree of eukaryotes. Trends Ecol Evol. 20:670-676.

Keeling, PJ & Doolittle, WF. 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol Biol Evol. 13:1297-1305.

Keeling, PJ & Palmer, JD. 2008. Horizontal gene transfer in eukaryotic evolution. Nat Rev Genet. 9:605-618.

Keon, J, Antoniw, J, Rudd, J, Skinner, W, Hargreaves, J & Hammond-Kosack, K. 2005. Analysis of expressed sequence tags from the wheat leaf blotch pathogen Mycosphaerella graminicola (anamorph Septoria tritici). Fungal Genet Biol. 42:376-389.

Khan, H, Parks, N, Kozera, C, Curtis, BA, Parsons, BJ, Bowman, S & Archibald, JM. 2007. Plastid genome sequence of the cryptophyte alga Rhodomonas salina CCMP1319: lateral transfer of putative DNA replication machinery and a test of chromist plastid phylogeny. Mol Biol Evol. 24:1832-1842.

Kim, E & Graham, LE. 2008. EEF2 analysis challenges the monophyly of Archaeplastida and Chromalveolata. PLoS ONE. 3:e2621.

Kim, E, Simpson, AG & Graham, LE. 2006. Evolutionary relationships of apusomonads inferred from taxon-rich analyses of 6 nuclear encoded genes. Mol Biol Evol. 23:2455-2466.

Klaveness, D, Shalchian-Tabrizi, K, Thomsen, HA, Eikrem, W & Jakobsen, KS. 2005. Telonema antarcticum sp. nov., a common marine phagotrophic flagellate. Int J Syst Evol Microbiol. 55:2595-2604.

Knoll, AH. 1992. The Early Evolution of Eukaryotes: A Geological Perspective. Science. 256:622-627.

Kolaczkowski, B & Thornton, JW. 2004. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature. 431:980-984.

Kooistra, WH & Medlin, LK. 1996. Evolution of the diatoms (Bacillariophyta). IV. A reconstruction of their age from small subunit rRNA coding regions and the fossil record. Mol Phylogenet Evol. 6:391-407.

Koonce, M & Schliwa, M. 1985. Bidirectional transport can occur in cell processes that contain single microtubules. J Cell Biol. 100:322-326.

Korbel, JO, Snel, B, Huynen, MA & Bork, P. 2002. SHOT: a web server for the construction of genome phylogenies. Trends Genet. 18:158-162.

Koski, L, Gray, M, Lang, BF & Burger, G. 2005. AutoFACT: An Automatic Functional Annotation and Classification Tool. BMC Bioinformatics. 6:151.

Koski, LB & Golding, GB. 2001. The closest BLAST hit is often not the nearest neighbor. J Mol Evol. 52:540-542.

Kuhn, S, Lange, M & Medlin, LK. 2000. Phylogenetic position of Cryothecomonas inferred from nuclear-encoded small subunit ribosomal RNA. Protist. 151:337-345.

Kumar, S & Rzhetsky, A. 1996. Evolutionary Relationships of Eukaryotic Kingdoms. J Mol Evol. 42:183.

Lake, JA & Rivera, MC. 2004. Deriving the genomic tree of life in the presence of horizontal gene transfer: conditioned reconstruction. Mol Biol Evol. 21:681-690.

Lane, CE & Archibald, JM. 2008. The eukaryotic tree of life: endosymbiosis takes its TOL. Trends Ecol Evol. 23:268-275.

Lane, CE & Archibald, JM. 2009. Reply to Bodyl, Stiller and Mackiewicz: "Chromalveolate plastids: direct descent or multiple endosymbioses?". Trends Ecol Evol.

Lang, BF, Burger, G, O'Kelly, CJ, Cedergren, R, Golding, GB, Lemieux, C, Sankoff, D, Turmel, M & Gray, MW. 1997. An ancestral mitochondrial DNA resembling a eubacterial genome in miniature. Nature. 387:493-497.

Lang, BF, O'Kelly, C, Nerad, TA, Gray, MW & Burger, G. 2002. The closest unicellular relatives of animals. Curr Biol. 12:1773-1778.

Larget, B, Kadane, JB & Simon, DL. 2005a. A Bayesian approach to the estimation of ancestral genome arrangements. Mol Phylogenet Evol. 36:214-223.

Larget, B, Simon, DL, Kadane, JB & Sweet, D. 2005b. A Bayesian analysis of metazoan mitochondrial genome arrangements. Mol Biol Evol. 22:486-495.

Lartillot, N, Brinkmann, H & Philippe, H. 2007. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol. 7 Suppl 1:S4.

Lartillot, N & Philippe, H. 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol. 21:1095-1109.

Lartillot, N & Philippe, H. 2008. Improvement of molecular phylogenetic inference and the phylogeny of Bilateria. Philos Trans R Soc Lond B Biol Sci. 363:1463-1472.

Leander, B & Keeling, P. 2003. Morphostasis in alveolate evolution. Trends Ecol Evol. 18:498-504.

Leander, B & Keeling, P. 2004. Early evolutionary history of dinoflagellates and apicomplexans (Alveolata) as inferred from HSP90 and actin phylogenies. J Phycol. 40:341-350.

Lecointre, G, Philippe, H, Vân Lê, HL & Le Guyader, H. 1993. Species sampling has a major impact on phylogenetic inference. Mol Phylogenet Evol. 2:205-224.

Lee, J & Anderson, O. 1991. Biology of Foraminifera. Academic Press, New York.

Li, S, Nosenko, T, Hackett, JD & Bhattacharya, D. 2006. Phylogenomic Analysis Identifies Red Algal Genes of Endosymbiotic Origin in the Chromalveolates. Mol Biol Evol. 23:663-674.

Lin, YH, McLenachan, PA, Gore, AR, Phillips, MJ, Ota, R, Hendy, MD & Penny, D. 2002. Four new mitochondrial genomes and the increased stability of evolutionary trees of mammals from improved taxon sampling. Mol Biol Evol. 19:2060-2070.

Lonergan, KM & Gray, MW. 1996. Expression of a continuous open reading frame encoding subunits 1 and 2 of cytochrome c oxidase in the mitochondrial DNA of Acanthamoeba castellanii. J Mol Biol. 257:1019-1030.

Longet, D, Archibald, JM, Keeling, PJ & Pawlowski, J. 2003. Foraminifera and Cercozoa share a common origin according to RNA polymerase II phylogenies. Int J Syst Evol Microbiol. 53:1735-1739.

Longet, D, Burki, F, Flakowski, J, Berney, C & Polet, S. 2004. Multigene evidence for close evolutionary relations between Gromia and Foraminifera. Acta Protozool. 43:303-311.

Lopez, P, Casane, D & Philippe, H. 2002. Heterotachy, an important process of protein evolution. Mol Biol Evol. 19:1-7.

Magallón, SA & Sanderson, MJ. 2005. Angiosperm divergence times: the effect of genes, codon positions, and time constraints. Evolution. 59:1653-1670.

Martin, W et al. 2002. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc Natl Acad Sci USA. 99:12246-12251.

Maruyama, S, Misawa, K, Iseki, M, Watanabe, M & Nozaki, H. 2008. Origins of a cyanobacterial 6-phosphogluconate dehydrogenase in plastid-lacking eukaryotes. BMC Evol Biol. 8:151.

McFadden, GI & van Dooren, GG. 2004. Evolution: red algal genome affirms a common origin of all plastids. Curr Biol. 14:R514-516.

McMahon, MM & Sanderson, MJ. 2006. Phylogenetic Supermatrix Analysis of GenBank Sequences from 2228 Papilionoid Legumes. Syst Biol. 55:818-836.

Meisterfeld, R, Holzmann, M & Pawlowski, J. 2001. Morphological and molecular characterization of a new terrestrial allogromiid species: Edaphoallogromia australica gen. et spec. nov. (Foraminifera) from Northern Queensland (Australia). Protist. 152:185-192.

Minge, MA, Silberman, JD, Orr, RJ, Cavalier-Smith, T, Shalchian-Tabrizi, K, Burki, F, Skjæveland, A & Jakobsen, KS. 2009. Evolutionary position of breviate amoebae and the primary eukaryote divergence. Proc R Soc Lond B Biol Sci. 276:597-604.

Mitchell, A, Mitter, C & Regier, JC. 2000. More taxa or more characters revisited: combining data from nuclear protein-encoding genes for phylogenetic analyses of Noctuoidea (Insecta: Lepidoptera). Syst Biol. 49:202-224.

Moore, RB et al. 2008. A photosynthetic alveolate closely related to apicomplexan parasites. Nature. 451:959-963.

Moreira, D, Le Guyader, H & Philippe, H. 1999. Unusually high evolutionary rate of the elongation factor 1 alpha genes from the Ciliophora and its impact on the phylogeny of eukaryotes. Mol Biol Evol. 16:234-245.

Moreira, D, Le Guyader, H & Philippe, H. 2000. The origin of red algae and the evolution of chloroplasts. Nature. 405:69-72.

Moret, BME & Warnow, T. 2005. Advances in Phylogeny Reconstruction from Gene Order and Content Data. Molecular evolution.

Morin, L. 2000. Long branch attraction effects and the status of "basal eukaryotes": phylogeny and structural analysis of the ribosomal RNA gene cluster of the free-living diplomonad Trepomonas agilis. J Eukaryot Microbiol. 47:167-177.

Moustafa, A & Bhattacharya, D. 2008. PhyloSort: a user-friendly phylogenetic sorting tool and its application to estimating the cyanobacterial contribution to the nuclear genome of Chlamydomonas. BMC Evol Biol. 8:7.

Nauss, R. 1949. Reticulomyxa filosa gen. et sp. nov., a new primitive plasmodium. Bull Torrey Bot Club. 76:161-176.

Nikolaev, SI, Berney, C, Fahrni, J, Mylnikov, AP, Aleshin, VV, Petrov, NB & Pawlowski, J. 2003. Gymnophrys cometa and Lecythium sp. are core Cercozoa: evolutionary implications. Acta Protozool. 42:183-190.

Nikolaev, SI, Berney, C, Fahrni, JF & Bolivar, I. 2004. The twilight of Heliozoa and rise of Rhizaria, an emerging supergroup of amoeboid …. Proc Natl Acad Sci USA. 101:8066-8071.

Nosenko, T, Lidie, KL, Van Dolah, FM, Lindquist, E, Cheng, JF & Bhattacharya, D. 2006. Chimeric plastid proteome in the Florida "red tide" dinoflagellate Karenia brevis. Mol Biol Evol. 23:2026-2038.

Not, F, Valentin, K, Romari, K, Lovejoy, C, Massana, R, Töbe, K, Vaulot, D & Medlin, LK. 2007. Picobiliphytes: a marine picoplanktonic algal group with unknown affinities to other eukaryotes. Science. 315:253-255.

Nowack, EC, Melkonian, M & Glöckner, G. 2008. Chromatophore Genome Sequence of Paulinella Sheds Light on Acquisition of Photosynthesis by Eukaryotes. Curr Biol. 18.

Nozaki, H. 2005. A new scenario of plastid evolution: plastid primary endosymbiosis before the divergence of the "Plantae," emended. J Plant Res. 118:247-255.

Nozaki, H, Iseki, M, Hasegawa, M, Misawa, K, Nakada, T, Sasaki, N & Watanabe, M. 2007. Phylogeny of primary photosynthetic eukaryotes as deduced from slowly evolving nuclear genes. Mol Biol Evol. 24:1592-1595.

Nozaki, H et al. 2003. The phylogenetic position of red algae revealed by multiple nuclear genes from mitochondria-containing eukaryotes and an alternative hypothesis on the origin of plastids. J Mol Evol. 56:485-497.

Okamoto, N & Inouye, I. 2005. The katablepharids are a distant sister group of the Cryptophyta: A proposal for Katablepharidophyta divisio nova/ Kathablepharida phylum novum based on SSU rDNA and beta-tubulin phylogeny. Protist. 156:163-179.

Ozkaynak, E, Rueger, D, Drier, E, Corbett, C, Ridge, R, Sampath, T & Oppermann, H. 1990. OP-1 cDNA encodes an osteogenic protein in the TGF-beta family. EMBO J. 9:2085-2093.

Pace, NR. 2006. Time for a change. Nature. 441:289.

Padovan, A, Sanson, G, Brunstein, A & Briones, M. 2005. Fungi Evolution Revisited: Application of the Penalized Likelihood Method to a Bayesian Fungal Phylogeny Provides a New Perspective on Phylogenetic Relationships and Divergence Dates of Ascomycota Groups. J Mol Evol. 60:726-735.

Palmer, JD. 2003. The symbiotic birth and spread of plastids: how many times and whodunit? J Phycol. 39:4-11.

Parfrey, LW, Barbero, E, Lasser, E, Dunthorn, M, Bhattacharya, D, Patterson, DJ & Katz, LA. 2006. Evaluating support for the current classification of eukaryotic diversity. PLoS Genet. 2:e220.

Patron, NJ, Inagaki, Y & Keeling, PJ. 2007. Multiple gene phylogenies support the monophyly of cryptomonad and haptophyte host lineages. Curr Biol. 17:887-891.

Patron, NJ, Rogers, MB & Keeling, PJ. 2004. Gene replacement of fructose-1,6-bisphosphate aldolase supports the hypothesis of a single photosynthetic ancestor of chromalveolates. Eukaryot Cell. 3:1169-1175.

Patterson, DJ. 1999. The Diversity of Eukaryotes. Am Nat. 154:S96-S124.

Pawlowski, J, Bolivar, I, Fahrni, JF, Cavalier-Smith, T & Gouy, M. 1996. Early origin of foraminifera suggested by SSU rRNA gene sequences. Mol Biol Evol. 13:445-450.

Pawlowski, J, Bolivar, I, Fahrni, JF, de Vargas, C, Gouy, M & Zaninetti, L. 1997. Extreme differences in rates of molecular evolution of foraminifera revealed by comparison of ribosomal DNA sequences and the fossil record. Mol Biol Evol. 14:498-505.

Pawlowski, J, Bolivar, I, Fahrni, JF, de Vargas, C & Bowser, SS. 1999. Molecular evidence that Reticulomyxa filosa is a freshwater naked foraminifer. J Eukaryot Microbiol. 46:612-617.

Pawlowski, J, Bolivar, I, Guiard-Maffia, J & Gouy, M. 1994. Phylogenetic position of foraminifera inferred from LSU rRNA gene sequences. Mol Biol Evol. 11:929-938.

Pawlowski, J, Fahrni, J, Brykczynska, U, Habura, A & Bowser, SS. 2002. Molecular data reveal high taxonomic diversity of allogromiid foraminifera in Explorers Cove (McMurdo Sound, Antarctica). Polar Biol. 25:96-105.

Perasso, R, Baroin, A, Qu, LH & Bachellerie, JP. 1989. Origin of the algae. Nature. 339:142-144.

Peterson, KJ & Butterfield, NJ. 2005. Origin of the Eumetazoa: testing ecological predictions of molecular clocks against the Proterozoic fossil record. Proc Natl Acad Sci U S A. 102:9547-9552.

Philip, GK, Creevey, CJ & McInerney, JO. 2005. The Opisthokonta and the Ecdysozoa may not be clades: stronger support for the grouping of plant and animal than for animal and fungi and stronger support for the Coelomata than Ecdysozoa. Mol Biol Evol. 22:1175-1184.

Philippe, H. 2000. Opinion: long branch attraction and protists phylogeny. Protist. 151:307-316.

Philippe, H Adoutte, A. 1998. The molecular phylogeny of Eukaryote: solid facts and uncertainties. In evolutionary relationships among Protozoa. Pp. 25-56. Evolutionary relationships among Protozoa. Chapman and Hall, London.

Philippe, H Germot, A. 2000. Phylogeny of Eukaryotes Based on Ribosomal RNA: Long-Branch At- traction and Models of Sequence Evolution. Mol Biol Evol. 17:830-834.

Philippe, H, Germot, A Moreira, D. 2000a. The new phylogeny of eukaryotes. Curr Opin Genet Dev. 10:596-601.

Philippe, H, Lartillot, N Brinkmann, H. 2005. Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa, and Protostomia. Mol Biol Evol. 22:1246-1253.

Philippe, H Laurent, J. 1998. How good are deep phylogenetic trees? Curr Opin Genet Dev. 8:616-623.

Philippe, H, Lopez, P, Brinkmann, H, Budin, K, Germot, A, Laurent, J, Moreira, D, Müller, M Le Guyader, H. 2000b. Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Philos Trans R Soc Lond B Biol Sci. 267:1213-1221.

Philippe, H, Snell, EA, Bapteste, E, Lopez, P, Holland, PW Casane, D. 2004. Phylogenomics of eu- karyotes: impact of missing data on large alignments. Mol Biol Evol. 21:1740-1752.

Polet, S, Berney, C, Fahrni, J Pawlowski, J. 2004. Small-subunit ribosomal RNA gene sequences of challenge the monophyly of Haeckel's Radiolaria. Protist. 155:53-63.


Porter, SM, Meisterfeld, R & Knoll, AH. 2003. Vase-shaped microfossils from the Neoproterozoic Chuar Group, Grand Canyon: A classification guided by modern testate amoebae. J Paleontol. 409-429.

Posada, D & Buckley, T. 2004. Model Selection and Model Averaging in Phylogenetics: Advantages of Akaike Information Criterion and Bayesian Approaches Over Likelihood Ratio Tests. Syst Biol. 53:793-808.

Prechtl, J, Kneip, C, Lockhart, P, Wenderoth, K & Maier, UG. 2004. Intracellular spheroid bodies of Rhopalodia gibba have nitrogen-fixing apparatus of cyanobacterial origin. Mol Biol Evol. 21:1477-1481.

Pride, DT, Meinersmann, RJ, Wassenaar, TM & Blaser, MJ. 2003. Evolutionary implications of microbial genome tetranucleotide frequency biases. Genome Res. 13:145-158.

Qi, J, Wang, B & Hao, BL. 2004. Whole proteome prokaryote phylogeny without sequence alignment: a K-string composition approach. J Mol Evol. 58:1-11.

Reyes-Prieto, A, Hackett, JD, Soares, MB, Bonaldo, MF & Bhattacharya, D. 2006. Cyanobacterial contribution to algal nuclear genomes is primarily limited to plastid functions. Curr Biol. 16:2320-2325.

Reyes-Prieto, A, Moustafa, A & Bhattacharya, D. 2008. Multiple Genes of Apparent Algal Origin Suggest Ciliates May Once Have Been Photosynthetic. Curr Biol. 18:956-962.

Reyes-Prieto, A, Weber, AP & Bhattacharya, D. 2007. The origin and establishment of the plastid in algae and plants. Annu Rev Genet. 41:147-168.

Ribichich, K, Salem-Izacc, S, Georg, R, Vencio, R, Navarro, L & Gomes, S. 2005. Gene Discovery and Expression Profile Analysis through Sequencing of Expressed Sequence Tags from Different Developmental Stages of the Chytridiomycete Blastocladiella emersonii. Eukaryot Cell. 4:455-464.

Rice, DW & Palmer, JD. 2006. An exceptional horizontal gene transfer in plastids: gene replacement by a distant bacterial paralog and evidence that haptophyte and cryptophyte plastids are sisters. BMC Biol. 4:31.

Richards, TA & Cavalier-Smith, T. 2005. Myosin domain evolution and the primary divergence of eukaryotes. Nature. 436:1113-1118.

Robinson-Rechavi, M & Huchon, D. 2000. RRTree: Relative-Rate Tests between groups of sequences on a phylogenetic tree. Bioinformatics. 16:296-297.

Rodriguez-Ezpeleta, N, Brinkmann, H, Burey, SC, Roure, B, Burger, G, Löffelhardt, W, Bohnert, HJ, Philippe, H & Lang, BF. 2005. Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr Biol. 15:1325-1330.

Rodriguez-Ezpeleta, N, Brinkmann, H, Burger, G, Roger, AJ, Gray, MW, Philippe, H & Lang, BF. 2007a. Toward resolving the eukaryotic tree: the phylogenetic positions of jakobids and cercozoans. Curr Biol. 17:1420-1425.

Rodriguez-Ezpeleta, N, Brinkmann, H, Roure, B, Lartillot, N, Lang, BF & Philippe, H. 2007b. Detecting and overcoming systematic errors in genome-scale phylogenies. Syst Biol. 56:389-399.

Rodriguez-Ezpeleta, N, Philippe, H, Brinkmann, H, Becker, B & Melkonian, M. 2007c. Phylogenetic analyses of nuclear, mitochondrial, and plastid multigene data sets support the placement of Mesostigma in the Streptophyta. Mol Biol Evol. 24:723-731.

Roger, AJ, Clark, CG & Doolittle, WF. 1996. A possible mitochondrial gene in the early-branching amitochondriate protist Trichomonas vaginalis. Proc Natl Acad Sci USA. 93:14618-14622.


Roger, AJ, Sandblom, O, Doolittle, WF & Philippe, H. 1999. An evaluation of elongation factor 1 alpha as a phylogenetic marker for eukaryotes. Mol Biol Evol. 16:218-233.

Roger, AJ & Simpson, AG. 2009. Evolution: revisiting the root of the eukaryote tree. Curr Biol. In press.

Roger, AJ, Svärd, SG, Tovar, J, Clark, CG, Smith, MW, Gillin, FD & Sogin, ML. 1998. A mitochondrial-like chaperonin 60 gene in Giardia lamblia: evidence that diplomonads once harbored an endosymbiont related to the progenitor of mitochondria. Proc Natl Acad Sci USA. 95:229-234.

Rogers, MB, Archibald, JM, Field, M, Li, C, Striepen, B & Keeling, PJ. 2004. Plastid-Targeting Peptides from the Chlorarachniophyte Bigelowiella natans. J Eukaryot Microbiol. 51:529-535.

Rogers, MB, Gilson, PR, Su, V, McFadden, GI & Keeling, PJ. 2007. The Complete Chloroplast Genome of the Chlorarachniophyte Bigelowiella natans: Evidence for Independent Origins of Chlorarachniophyte and Secondary Endosymbionts. Mol Biol Evol. 24:54-62.

Rokas, A & Carroll, SB. 2005. More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy. Mol Biol Evol. 22:1337-1344.

Rokas, A & Holland, PW. 2000. Rare genomic changes as a tool for phylogenetics. Trends Ecol Evol. 15:454-459.

Rokas, A, Krüger, D & Carroll, SB. 2005. Animal evolution and the molecular signature of radiations compressed in time. Science. 310:1933-1938.

Ronquist, F & Huelsenbeck, JP. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19:1572-1574.

Rothpletz, A. 1896. Über die Flysch-Fucoiden und einige andere fossile Algen, sowie über liasische Diatomeen führende Hornschwämme. Zeitschrift der Deutschen Geologischen Gesellschaft. 48.

Roure, B, Rodriguez-Ezpeleta, N & Philippe, H. 2007. SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol Biol. 7 Suppl 1:S2.

Ruiz-Trillo, I, Roger, AJ, Burger, G, Gray, MW & Lang, BF. 2008. A phylogenomic investigation into the origin of Metazoa. Mol Biol Evol. 25:664-672.

Sakaguchi, M, Inagaki, Y & Hashimoto, T. 2007. Centrohelida is still searching for a phylogenetic home: analyses of seven Raphidiophrys contractilis genes. Gene. 405:47-54.

Sakaguchi, M, Nakayama, T, Hashimoto, T & Inouye, I. 2005. Phylogeny of the Centrohelida inferred from SSU rRNA, tubulins, and actin genes. J Mol Evol. 61:765-775.

Sakaguchi, M & Suzaki, T. 1999. Monoxenic culture of the heliozoon Actinophrys sol. Eur J Protistol. 35:411-415.

Sanchez-Puerta, MV & Delwiche, CF. 2008. A hypothesis for plastid evolution in chromalveolates. J Phycol. 1097-1107.

Sanderson, MJ & Wojciechowski, MF. 2000. Improved bootstrap confidence limits in large-scale phylogenies, with an example from Neo-Astragalus (Leguminosae). Syst Biol. 49:671-685.

Sato, N, Ishikawa, M, Fujiwara, M & Sonoike, K. 2005. Mass identification of chloroplast proteins of endosymbiont origin by phylogenetic profiling based on organism-optimized homologous protein groups. Genome Informatics. 16:56-68.

Sen Gupta, B. 1999. Modern Foraminifera. Kluwer, Dordrecht, The Netherlands.


Shalchian-Tabrizi, K et al. 2006a. Telonemia, a new protist phylum with affinity to chromist lineages. Proc R Soc Lond B Biol Sci. 273:1833-1842.

Shalchian-Tabrizi, K, Kauserud, H, Massana, R, Klaveness, D & Jakobsen, KS. 2007. Analysis of environmental 18S ribosomal RNA sequences reveals unknown diversity of the cosmopolitan phylum Telonemia. Protist. 158:173-180.

Shalchian-Tabrizi, K, Minge, MA, Espelund, M, Orr, R, Ruden, T, Jakobsen, KS & Cavalier-Smith, T. 2008. Multigene phylogeny of choanozoa and the origin of animals. PLoS ONE. 3:e2098.

Shalchian-Tabrizi, K, Skanseng, M, Ronquist, F, Klaveness, D, Bachvaroff, TR, Delwiche, CF, Botnen, A, Tengs, T & Jakobsen, KS. 2006b. Heterotachy Processes in Rhodophyte-Derived Secondhand Plastid Genes: Implications for Addressing the Origin and Evolution of Dinoflagellate Plastids. Mol Biol Evol. 23:1504-1515.

Shimodaira, H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492-508.

Shimodaira, H & Hasegawa, M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics. 17:1246-1247.

Simpson, AG. 2003. Cytoskeletal organization, phylogenetic affinities and systematics in the contentious taxon Excavata (Eukaryota). Int J Syst Evol Microbiol. 53:1759-1777.

Simpson, AG, Inagaki, Y & Roger, AJ. 2006. Comprehensive multigene phylogenies of excavate protists reveal the evolutionary positions of "primitive" eukaryotes. Mol Biol Evol. 23:615-625.

Simpson, AG & Roger, AJ. 2004. The real 'kingdoms' of eukaryotes. Curr Biol. 14:R693-696.

Simpson, AGB, Radek, R, Dacks, J & O'Kelly, C. 2002a. How oxymonads lost their groove: an ultrastructural comparison of Monocercomonoides and excavate taxa. J Eukaryot Microbiol. 49:239-248.

Simpson, AGB, Roger, AJ, Silberman, JD, Leipe, DD, Edgcomb, VP, Jermiin, LS, Patterson, DJ & Sogin, ML. 2002b. Evolutionary History of "Early-Diverging" Eukaryotes: The Excavate Taxon is a Close Relative of Giardia. Mol Biol Evol. 19:1782-1791.

Sims, P, Mann, D & Medlin, LK. 2006. Evolution of the diatoms: insights from fossil, biological and molecular data. Phycol. 45:361.

Sjölander, K. 2004. Phylogenomic inference of protein molecular function: advances and challenges. Bioinformatics. 20:170-179.

Skovgaard, A & Daugbjerg, N. 2008. Identity and systematic position of Paradinium poucheti and other Paradinium-like parasites of marine copepods based on morphology and nuclear-encoded SSU rDNA. Protist. 159:401-413.

Slamovits, CH & Keeling, PJ. 2008. Plastid-Derived Genes in the Non-Photosynthetic Alveolate Oxyrrhis marina. Mol Biol Evol.

Smirnov, A, Nassonova, E, Berney, C, Fahrni, JF, Bolivar, I & Pawlowski, J. 2005. Molecular phylogeny and classification of the lobose amoebae. Protist. 156:129-142.

Snel, B, Bork, P & Huynen, MA. 1999. Genome phylogeny based on gene content. Nat Genet. 21:108-110.

Sogin, ML. 1989. Evolution of Eukaryotic and Their Small Subunit Ribosomal RNAs. Integr Comp Biol. 29:487-499.

Sogin, ML. 1991. Early evolution and the origin of eukaryotes. Curr Opin Genet Dev. 1:457-463.


Sogin, ML, Elwood, HJ & Gunderson, JH. 1986. Evolutionary diversity of eukaryotic small-subunit rRNA genes. Proc Natl Acad Sci USA. 83:1383-1387.

Sogin, ML, Gunderson, JH, Elwood, HJ & Alonso, RA. 1989. Phylogenetic meaning of the kingdom concept: an unusual ribosomal RNA from Giardia lamblia. Science. 243:75-77.

Sogin, ML & Silberman, JD. 1998. Evolution of the protists and protistan parasites from the perspective of molecular systematics. Int J Parasitol. 28:11.

Soll, J & Schleiff, E. 2004. Protein import into chloroplasts. Nat Rev Mol Cell Biol. 5:198-208.

Stamatakis, A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 22:2688-2690.

Stechmann, A & Cavalier-Smith, T. 2002. Rooting the eukaryote tree by using a derived gene fusion. Science. 297:89-91.

Stechmann, A & Cavalier-Smith, T. 2003a. Phylogenetic analysis of eukaryotes using heat-shock protein Hsp90. J Mol Evol. 57:408-419.

Stechmann, A & Cavalier-Smith, T. 2003b. The root of the eukaryote tree pinpointed. Curr Biol. 13:R665-666.

Steenkamp, ET, Wright, J & Baldauf, SL. 2006. The protistan origins of animals and fungi. Mol Biol Evol. 23:93-106.

Stefanović, S, Rice, DW & Palmer, JD. 2004. Long branch attraction, taxon sampling, and the earliest angiosperms: Amborella or monocots? BMC Evol Biol. 4:35.

Stiller, JW. 2007. Plastid endosymbiosis, genome evolution and the origin of green plants. Trends Plant Sci. 12:391-396.

Stiller, JW & Hall, BD. 1999. Long-Branch Attraction and the rDNA Model of Early Eukaryotic Evolution. Mol Biol Evol. 16:1270-1279.

Tatusov, R et al. 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 4:41.

Tekle, YI, Grant, J, Cole, JC, Nerad, TA, Anderson, OR, Patterson, DJ & Katz, LA. 2007. A multigene analysis of Corallomyxa tenera sp. nov. suggests its membership in a clade that includes Gromia, Haplosporidia and Foraminifera. Protist. 158:457-472.

Tengs, T, Dahlberg, OJ, Shalchian-Tabrizi, K, Klaveness, D, Rudi, K, Delwiche, CF & Jakobsen, KS. 2000. Phylogenetic analyses indicate that the 19'-hexanoyloxy-fucoxanthin-containing dinoflagellates have tertiary plastids of haptophyte origin. Mol Biol Evol. 17:718-729.

Thompson, J, Higgins, D & Gibson, T. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl Acids Res. 22:4673-4680.

Todo, Y, Kitazato, H, Hashimoto, J & Gooday, AJ. 2005. Simple Foraminifera Flourish at the Ocean's Deepest Point. Science. 307:689.

Tovar, J, Fischer, A & Clark, CG. 1999. The , a novel organelle related to mitochondria in the amitochondrial parasite Entamoeba …. Mol Microbiol. 32:1013-1021.

Tyler, BM et al. 2006. Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science. 313:1261-1266.


Wainright, PO, Hinkle, G, Sogin, ML & Stickel, SK. 1993. Monophyletic origins of the metazoa: an evolutionary link with fungi. Science. 260:340-342.

Wellman, C, Osterloff, P & Mohiuddin, U. 2003. Fragments of the earliest land plants. Nature. 425:282-285.

Whittaker, RH. 1969. New concepts of kingdoms of organisms. Science. 150-160.

Wiens, JJ. 2003. Missing data, incomplete taxa, and phylogenetic accuracy. Syst Biol. 52:528-538.

Wiens, JJ. 2005. Can incomplete taxa rescue phylogenetic analyses from long-branch attraction? Syst Biol. 54:731-742.

Wiens, JJ. 2006. Missing data and the design of phylogenetic analyses. J Biomed Inform. 39:34-42.

Woese, CR. 2002. On the evolution of cells. Proc Natl Acad Sci U S A. 99:8742-8747.

Woese, CR, Kandler, O & Wheelis, ML. 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci USA. 87:4576-4579.

Wylezich, C, Meisterfeld, R, Meisterfeld, S & Schlegel, M. 2002. Phylogenetic analyses of small subunit ribosomal RNA coding regions reveal a monophyletic lineage of euglyphid testate amoebae (Order Euglyphida). J Eukaryot Microbiol. 49:108-118.

Yamamoto, A, Hashimoto, T, Asaga, E, Hasegawa, M & Goto, N. 1997. Phylogenetic Position of the Mitochondrion-Lacking Protozoan Trichomonas tenax, Based on Amino Acid Sequences of Elongation Factors 1. J Mol Evol. 44:98.

Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol. 39:306-314.

Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 13:555-556.

Yoon, HS, Hackett, JD, Ciniglia, C, Pinto, G & Bhattacharya, D. 2004. A molecular timeline for the origin of photosynthetic eukaryotes. Mol Biol Evol. 21:809-818.

Yoon, HS, Hackett, JD, Pinto, G & Bhattacharya, D. 2002. The single, ancient origin of chromist plastids. Proc Natl Acad Sci USA. 99:15507-15512.

Zhang, Z. 1986. Clastic facies microfossils from the Chuanlinggou Formation (1800 Ma) near Jixian, North China. J Micropaleontol. 5.

Zuckerkandl, E & Pauling, L. 1965. Molecules as documents of evolutionary history. J Theor Biol. 8:357-366.


Chapter 9: Annexes  

9.1 Other projects in which I have been involved during my PhD

Evolutionary position of breviate amoebae and the primary eukaryote divergence

by M. A. Minge, Silberman JD, Orr RJS, Cavalier-Smith T, Shalchian-Tabrizi K, Burki F, Skjaeveland A & Jakobsen KS Published in: Proceedings of the Royal Society B, 276: 597-604, 2009

Untangling the Phylogeny of Amoeboid Protists

by J. Pawlowski & Burki F Published in: Journal of Eukaryotic Microbiology, 56: 16-25

9.2 Journal-formatted copy of the published chapters

Analysis of expressed sequence tags from a naked foraminiferan Reticulomyxa filosa

by F. Burki, Nikolaev SI, Bolivar I, Guiard J & Pawlowski J Published in: Genome, 49: 882-887, 2006

Monophyly of Rhizaria and Multigene Phylogeny of Unicellular Bikonts

by F. Burki & Pawlowski J Published in: Molecular Biology and Evolution, 23: 1922-1930, 2006


Phylogenomics Reshuffles the Eukaryotic Supergroups

by F. Burki, Shalchian-Tabrizi K, Minge M, Skjaeveland A, Nikolaev SI, Jakobsen KS & Pawlowski J Published in: PLoS ONE, 2: e790, 2007

Phylogenomics reveals a new ‘megagroup’ including most photosynthetic eukaryotes

by F. Burki, Shalchian-Tabrizi K & Pawlowski J Published in: Biology Letters, 4: 366-369, 2008

9.3 Articles related to our work

Building the Tree of Life, Genome by Genome

by E. Pennisi News Focus published in: Science, 320: 1716-1717, 2008

L’arbre de la vie perd une branche

by A. Vos Published in: Campus, No 88, 4-5, 2008

Downloaded from rspb.royalsocietypublishing.org on 6 April 2009

Proc. R. Soc. B (2009) 276, 597–604 doi:10.1098/rspb.2008.1358 Published online 11 November 2008

Evolutionary position of breviate amoebae and the primary eukaryote divergence

Marianne A. Minge¹, Jeffrey D. Silberman², Russell J. S. Orr³, Thomas Cavalier-Smith⁴, Kamran Shalchian-Tabrizi³, Fabien Burki⁵, Åsmund Skjæveland³ and Kjetill S. Jakobsen¹,*

¹Department of Biology, Centre for Ecological and Evolutionary Synthesis, University of Oslo, 0316 Oslo, Norway
²Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
³Department of Biology, Microbial Evolution Research Group, University of Oslo, 0316 Oslo, Norway
⁴Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK
⁵Department of Zoology and Animal Biology, Molecular Systematics Group, University of Geneva, 1224 Chêne-Bougeries, Switzerland

Integration of ultrastructural and molecular sequence data has revealed six supergroups of eukaryote organisms (excavates, Rhizaria, chromalveolates, Plantae, Amoebozoa and opisthokonts), and the root of the eukaryote evolutionary tree is suggested to lie between unikonts (Amoebozoa, opisthokonts) and bikonts (the other supergroups). However, some smaller lineages remain of uncertain affinity. One of these unassigned taxa is the anaerobic, free-living, amoeboid flagellate Breviata anathema, which is of key significance as it is unclear whether it is a unikont (i.e. possibly the deepest branching amoebozoan) or a bikont. To establish its evolutionary position, we sequenced thousands of Breviata genes and calculated trees using 78 protein sequences. Our trees and specific substitutions in the 18S RNA sequence indicate that Breviata is related to other Amoebozoa, thereby significantly increasing the cellular diversity of this phylum and establishing Breviata as a deep-branching unikont. We discuss the implications of these results for the ancestral state of Amoebozoa and eukaryotes generally, demonstrating that phylogenomics of phylogenetically ‘nomadic’ species can elucidate key questions in eukaryote evolution. Furthermore, mitochondrial genes among the Breviata ESTs demonstrate that Breviata probably contains a modified anaerobic mitochondrion. With these findings, remnants of mitochondria have been detected in all putatively deep-branching amitochondriate organisms.

Keywords: Breviata anathema; bikont; unikont; amoebozoa; excavates; phylogenomics

* Author for correspondence ([email protected]).
Electronic supplementary material is available at http://dx.doi.org/10.1098/rspb.2008.1358 or via http://journals.royalsociety.org.

1. INTRODUCTION

Almost all the millions of eukaryote species belong to only six recognized supergroups of organisms (Baldauf 2003; Keeling 2004; Simpson & Roger 2004; Keeling et al. 2005). Recent molecular and cellular evidence suggests that these in turn may comprise just two superclades: unikonts and bikonts (Stechmann & Cavalier-Smith 2003; Richards & Cavalier-Smith 2005). The exclusively heterotrophic unikont eukaryotes comprise opisthokonts (animals, fungi and immediate unicellular relatives) and Amoebozoa (amoebae with broad pseudopods and slime moulds), while the bikonts comprise photosynthetic Plantae, chromalveolates (chromophyte algae and their non-photosynthetic descendants, e.g. and sporozoan protozoa) and two diverse groups of mainly heterotrophic protozoa (excavates, predominantly flagellates with rigid cell cortex and a specialized feeding groove, and Rhizaria, mostly soft-surfaced cells with elaborate nets or filamentous pseudopods for feeding) (Stechmann & Cavalier-Smith 2002, 2003; Cavalier-Smith 2004; Keeling 2004; Simpson & Roger 2004; Keeling et al. 2005).

Bikonts were defined as all eukaryotes ancestrally having two centrioles and cilia, with the anterior one being the younger and undergoing ciliary transformation to become the posterior with a modified structure in its second cell cycle (Cavalier-Smith 2002). Unikonts were proposed to have had a last common ancestor with only one centriole and one cilium. It has long been known that many unikonts have two centrioles and some even two cilia, but these were considered derived complications. When unikonts have two cilia, the anterior one never transforms into the posterior one. As many bikonts are secondarily uniciliate, the unikont/bikont distinction stresses fundamental differences in centriolar development and inferred ancestral state, not the number of centrioles or cilia per cell, which is evolutionarily more labile. Based on a rare gene fusion and other molecular cladistic characters, as well as basic differences in microtubular cytoskeleton and ciliary development (Cavalier-Smith 2002), the root of the eukaryote tree of life was proposed to lie between bikonts and unikonts (Stechmann & Cavalier-Smith 2002, 2003; Richards & Cavalier-Smith 2005). All recent multigene trees (e.g. Burki et al. 2007; Rodríguez-Ezpeleta et al. 2007) strongly support a bipartition of eukaryotes into unikonts and bikonts and are compatible with the root lying between

Received 23 September 2008; Accepted 15 October 2008
This journal is © 2008 The Royal Society

598 M. A. Minge et al. Evolution of Amoebozoa

them, though a recent paper on just a few genes raises a potential problem for the simplest interpretation of these data (Kim et al. 2006). To test it more thoroughly and better eliminate alternatives, additional putatively derived cladistic characters need to be identified (Rodríguez-Ezpeleta et al. 2007), and other little-studied lineages must be included in multigene analyses. We focus here on the phylogenomics of one such key lineage, the breviate amoeboflagellates (Cavalier-Smith et al. 2004), a group that has defied placement in either unikonts or bikonts or any of the six eukaryotic supergroups, and whose correct placement is likely to illuminate the primary eukaryotic divergence.

Breviata anathema (previously misidentified as Mastigamoeba invertens) is a deeply branching anaerobic amoeboflagellate eukaryote, which has been notoriously difficult to place phylogenetically (Cavalier-Smith et al. 2004; Walker et al. 2006), and has some apparent morphological affinities with unikonts (i.e. its amoeboid cell body and single flagellum) and some with bikonts (two basal bodies); its filose pseudopodia (micrographs, figure 1) differ from those of either group. In single-gene phylogenetic analyses of the small subunit ribosomal RNA gene (18S) and the largest subunit of DNA-dependent RNA polymerase II (RPB1), the position of Breviata is very unstable; it variably associates with the excavates, apusomonads (themselves either excavates or still earlier branching bikonts) and/or planomonads (formerly misidentified as ; see Cavalier-Smith et al. 2008) or with Amoebozoa, but no position is significantly supported (Bolivar et al. 2001; Cavalier-Smith et al. 2004; Walker et al. 2006).

Amoebozoa, the group to which we now show Breviata belongs, is probably one of the earliest branches from the eukaryotic cenancestor and important for deducing its characteristics (Cavalier-Smith 2002; Richards & Cavalier-Smith 2005). Although the name Amoebozoa is old (Lühe 1913), it has only recently been recognized as a phylogenetically coherent group, with many unrelated amoebae now being excluded (Cavalier-Smith 1998; Cavalier-Smith & Chao 2003) and its classification revised (Cavalier-Smith et al. 2004; Nikolaev et al. 2006). Amoebozoa currently include classical naked and testate lobose amoebae, anaerobic (Entamoebae and pelobionts) and mycetozoan slime moulds (Cavalier-Smith et al. 2004), but exclude all amoeboid protozoa with true filopodia (ones that draw the cell forward by contraction), which instead belong to the bikont phylum Cercozoa that includes the chlorarachnean algae (Cavalier-Smith & Chao 2003). However, based on phylogenetic analyses and ultrastructural features, Cavalier-Smith et al. (2004) proposed a new class Breviatea including Breviata and two environmental sequences that clustered together with Breviata in 18S rRNA phylogenies, and postulated breviates as the outgroup to all other Amoebozoa.

As multigene analyses usually generate more robust phylogenetic inferences than single genes (Bapteste et al. 2002; Burki et al. 2007), we constructed a cDNA library from B. anathema, sequenced approximately 4100 clones and reconstructed a global eukaryote phylogeny using approximately 17 300 amino acid characters (figure 2). We also searched our database for mitochondria-related genes, as Breviata is also of special evolutionary interest as an anaerobic/microaerophilic organism with unusual hydrogenosome-like organelles, whose putative mitochondrial nature is controversial (Walker et al. 2006). As is well known, several eukaryote lineages within fungi, Amoebozoa (pelobionts, Entamoeba), ciliates, heterokonts (Blastocystis) and excavates (Heterolobosea, Preaxostyla, parabasalids, diplomonads and Carpediemonas) independently modified their mitochondria into anaerobic energy-generating organelles (hydrogenosomes) or the more degenerate mitosomes (Tielens et al. 2002; van der Giezen & Tovar 2005; Barbera et al. 2007). Since all groups other than breviates that putatively represented descendants of a pre-mitochondrial eukaryotic lineage have now been investigated and shown to contain mitochondrial-related remnants (i.e. organelles or genes) (Hampl et al. 2008), the only remaining known lineage that might be ancestrally amitochondriate is the breviates. However, genes that trace their ancestry to the mitochondrion clearly demonstrate a mitochondrial history for Breviata.

Figure 1. In vivo morphology of B. anathema. Light micrographs of unstained living B. anathema cells. (a) 400× DIC image highlighting the numerous branching pseudopodia and widened cell sheath at the base of the single flagellum. (b) Inset 630× DIC image showing the position of the nucleus containing a centrally located nucleolus. (c) 400× phase-contrast image highlighting the flattened pseudopodial attachments to the substrate. Scale bars, 5 μm.

2. MATERIAL AND METHODS

(a) Library construction and EST sequencing

B. anathema (strain ATCC 50338) was cultured with one or two unidentified bacteria as food in tightly sealed 500 ml tissue culture flasks containing 75 ml ATCC 1773 medium at room temperature (approx. 21°C). Total RNA was isolated from cells harvested by centrifugation using Tri reagent



Karlodinium micrum Alexandrium tamarense Cryptosporidium parvum alveolates Toxoplasma gondii heterokonts 76/80/1.0 Tetrahymena thermophila Phaeodactylum tricornutum 84/81/1.0 Phytophthora sojae Reticulomyxa filosa Rhizaria Bigelowiella natans 65/52/ Pavlova lutheri haptophytes 0.98 Isochrysis galbana 57/–/0.89 cryptophytes bikonts Guillardia theta Cyanophora paradoxa Glaucocystis nostochinearum Glaucophyta –/–/0.72 Cyanidioschyzon merolae Rhodophyta plants Porphyra yezoensis 53/58/1.0 Chlamydomonas reinhardtii Arabidopsis thaliana 83/84/1.0 Reclinomonas americana ‘Seculamonas ecuadoriensis’ 93/74/1.0 Sawyeria marylandensis excavates 93/75/0.77 63/55/– Euglena gracilis Malawimonas jakobiformis Monosiga ovata 77/71/– Capsaspora owczarzaki Homo sapiens Drosophila melanogaster opisthokonts Neurospora crassa Cryptococcus neoformans Fungi

Mortierella verticillata ikonts

Mastigamoeba balamuthi un Entamoeba histolytica –/53/0.66 Physarum polycephalum Dictyostelium discoideum Amoebozoa Acanthamoeba castellani 53/–/– Hartmannella vermiformis 87/88/0.97 Breviata anathema 0.1 Figure 2. A global phylogeny of eukaryotes. Maximum-likelihood tree with bootstrap support values (BV) from an amino acid alignment of 78 concatenated genes (17 283 characters) inferred using RAXML and TREEFINDER (both giving identical topology; the RAXML tree is shown). Bayesian PP support values for bipartitions are also shown if more than 0.50. Filled circles denote support values of 100% BV and 1.0 PP, and dash (K) denotes support value below 50% BV or 0.50 PP. Nodes without denotation received less than 50% BV and less than 0.50 PP.

(Sigma-Aldrich, St Louis, MO, USA). A non-normalized, directional 'microquantity cDNA library' was constructed in the plasmid vector pAGEN-1 by Agencourt Bioscience Corp. (Beverly, MA, USA). Approximately 4100 randomly picked clones were 5′-end sequenced. The EST sequences were subsequently quality checked and assembled into contigs using a Phred/Phrap pipeline at the freely available Bioportal service at the University of Oslo (http://www.bioportal.uio.no).

(b) Multigene alignment construction
BLASTx analyses (http://www.ncbi.nlm.nih.gov/BLAST) of Breviata singletons and contigs were performed to identify gene similarities. Breviata sequences and significant hits (E-value < 1e-5) from a range of other publicly available sequences were added to the existing single-gene alignments (Rodríguez-Ezpeleta et al. 2005; Burki et al. 2007). These hits came from TBestDB (http://tbestdb.bcm.umontreal.ca/searches/login.php) and the NCBI est and nr databases.

Ambiguously aligned characters were selected manually and excluded from the analyses. For each single-gene alignment, orthologous gene copies were identified by manual inspection of phylogenetic trees and bootstrap values (BV) inferred with PhyML (rtREV substitution model, 100 bootstrap replicates; Guindon & Gascuel 2003). Additionally, for taxa with two or more nearly identical sequences, the sequence displaying the shortest branch length on the tree was kept. The final multigene dataset contained 78 genes (17 280 amino acid characters) and 37 taxa. Taxa were chosen to reflect the evolutionary range of eukaryotes, and genes were selected on the basis of those detected in the Breviata library. Details about taxon sampling and genes used in the analyses are given in table S1 in the electronic supplementary material. Three fast-evolving excavates were excluded from the main analyses owing to their long branches (Simpson et al. 2006), which are known to cause long-branch attraction artefacts in phylogenetic trees (Philippe 2000), but were included in an

Proc. R. Soc. B (2009) Downloaded from rspb.royalsocietypublishing.org on 6 April 2009

600 M. A. Minge et al. Evolution of Amoebozoa

[Figure 3 panels: placement of Breviata anathema within Amoebozoa (Entamoeba histolytica, Mastigamoeba balamuthi, Hartmannella vermiformis, Physarum polycephalum, Dictyostelium discoideum, Acanthamoeba castellani) after removing site-rate category 8, categories 7+8, and categories 6+7+8.]

Figure 3. The placement of Breviata within Amoebozoa in three maximum-likelihood phylogenies with BV inferred with RAxML after removing categories of fast-evolving sites. Only the Amoebozoa branch is shown; global trees are shown in figure S2 in the electronic supplementary material. Categories 6, 7 and 8 refer to the sites removed; category 8 comprises the fastest evolving sites. Filled circles denote support values of 100% BV.

additional analysis shown in the supplementary material (see figure S1 in the electronic supplementary material). The impact of fast-evolving sites on the phylogeny was assessed in two steps. First, site rates were estimated with codonML in PAML (Yang 2007) under eight rate categories. Second, a site-removal script was applied to the alignment (S. Kumar, Å. Skjæveland, T. Ruden, A. Botnen & K. Shalchian-Tabrizi 2008, unpublished data). ML bootstrap consensus trees were inferred (as described below) from 100 pseudoreplicate datasets after the three fastest site-rate categories were removed (see figure S2 in the electronic supplementary material). Support for Amoebozoa and for the position of Breviata in optimal trees is shown in figure 3.

(c) Phylogenetic analyses and approximately unbiased test
All phylogenetic analyses were performed on the Bioportal at the University of Oslo (http://www.bioportal.uio.no). The maximum-likelihood phylogeny of the concatenated data was inferred with RAxML MPI v. 2.2.3 (Stamatakis 2006) and TREEFINDER (Jobb et al. 2004). PROTTEST v. 1.3 preferred the rtREV+F evolutionary model under the Akaike information criterion, with four GAMMA rate categories (Posada & Crandall 1998). Topological tree searches were performed with 100 randomly generated starting trees. Bootstrap analysis was performed on 100 pseudoreplicates, with one random starting tree for each replicate and the same evolutionary model as the initial search. In the RAxML analyses, trees were inferred under PROTMIX (Stamatakis 2006).

Bayesian inference used PHYLOBAYES v. 2.3 (Lartillot & Philippe 2004). The settings were the CAT evolutionary model, gamma-distributed across-site rate variation (four discrete rate categories) and a random starting tree. Changes in log likelihood as a function of time were used to assess whether the two parallel chains had reached a stationary state. This was then used to set the burn-in and to compare the frequencies of the bipartitions between several independent runs. The largest discrepancy (maxdiff) between the bipartition frequencies was less than 0.1, so we considered the Markov chain Monte Carlo chains to have converged. The tree and PP values presented in figure 2 are a consensus of the cold chains from the two independent runs.

The approximately unbiased (AU) tests were performed on the dataset that included all sites and on datasets with categories of fast-evolving sites removed (see table S2 in the electronic supplementary material). Site likelihoods were calculated in RAxML. The AU test was then performed with CONSEL (Shimodaira & Hasegawa 2001) using the rtREV evolutionary model and default scaling and replicate values.

3. RESULTS AND DISCUSSION
(a) A global phylogeny including B. anathema
In our phylogeny, Breviata is convincingly placed with Amoebozoa (87/88% BV and 0.97 PP; figure 2). This placement is recovered by both maximum-likelihood methods (RAxML and TREEFINDER, respectively) and Bayesian methods. Removing the fastest evolving sites of the alignment did not influence this placement (figure 3; see


figure S2 in the electronic supplementary material). Removing the fastest evolving sites increased the bootstrap support for Breviata grouping with Amoebozoa to 100 per cent BV (figure 3a). Sequential removal of additional fast-site categories decreased the support for most supergroups, including Amoebozoa, but the Breviata+Amoebozoa relationship was always recovered. In all trees with the fastest evolving sites removed, the clear-cut separation into unikonts and bikonts (with Breviata among the unikonts) was even more strongly supported than in figure 2 (88, 97 and 95% BV; see figure S2 in the electronic supplementary material). An additional phylogeny included three further fast-evolving excavate taxa: Giardia intestinalis, Trichomonas vaginalis and Trimastix pyriformis (see figure S1 in the electronic supplementary material). It also supported the placement of Breviata with Amoebozoa, but somewhat less strongly. Hence, this relationship is robust and not sensitive to the removal of fast-evolving sites or to taxon sampling.

Many single-gene trees suggested an alternative placement of Breviata within bikonts (Cavalier-Smith et al. 2004; Shalchian-Tabrizi et al. 2006; Walker et al. 2006). This placement is not seen in any of the inferred multigene trees. The AU tests also rejected this topology for the reduced datasets from which the fastest evolving sites were successively removed (see table S2 in the electronic supplementary material). Although the grouping of Breviata with Amoebozoa is strong, bootstrap support for placing Breviata as sister to, rather than among, the other amoebozoan taxa is weak. Accordingly, the AU tests did not reject the possibility that Breviata branches among other Amoebozoa as sister to the other anaerobic amoebae (Archamoebae: Entamoeba and Mastigamoeba; see table S2 in the electronic supplementary material). This sister relationship is also supported in two of the trees inferred after removing fast-evolving sites (figure 3a,c).

However, it is more likely that Breviata is sister to the other Amoebozoa, because it lacks four sequence signatures in the 18S rRNA gene that other Amoebozoa all share (Fahrni et al. 2003). These signatures are:
single nucleotide substitutions at positions 385, 777 and 1010;
a 1–2 nucleotide insertion in the loop between positions 1060 and 1064.
If Breviatea were sister to Archamoebae, all four signatures must have reverted to the ancestral state found in all out-groups to Amoebozoa (Fahrni et al. 2003). This is unlikely, as most other Amoebozoa have all four of these signatures, and all have at least two (Fahrni et al. 2003).

Overall, our inferred phylogeny (figure 2) is congruent with other recent global eukaryotic phylogenies (Burki et al. 2007; Rodríguez-Ezpeleta et al. 2007). Several lineages are strongly supported by maximum-likelihood BV and PP values: Holozoa (animals+choanoflagellates), fungi, opisthokonts, Rhodophyta, Glaucophyta, Viridiplantae, Haptophyta, Alveolata, Rhizaria and Heterokonta. Our tree is also congruent with several higher order relationships supported by BV values above 80 per cent. These include Amoebozoa with Breviata (87/88% BV, 0.99 PP) and a grouping of alveolates, heterokonts (stramenopiles) and Rhizaria (84/81% BV; 0.97 PP). The latter is the putative SAR assembly noted in several recent phylogenies (Burki et al. 2007; Hackett et al. 2007; Rodríguez-Ezpeleta et al. 2007). The putative basal bifurcation between unikonts and bikonts is supported by 83/84 per cent BV (1.00 PP). Excavates, excluding Preaxostyla (Trimastix+oxymonads), Eopharyngia (diplomonads and retortamonads) and parabasalids, are monophyletic but with weak bootstrap support (63/55% BV). This clade is not recovered in the Bayesian phylogeny. Plantae are paraphyletic here owing to the inclusion of haptophytes and cryptomonads.

A minority of 18S rRNA analyses have suggested a specific affiliation of Breviata with apusomonads (Walker et al. 2006). However, too few protein-coding genes are available from apusomonads for us to test this hypothesis directly. The phylogenetic position of apusomonads is itself controversial. Ultrastructural and gene fusion evidence suggests a bikont affinity (Karpov & Zhukov 1986; Stechmann & Cavalier-Smith 2002), while two- to six-gene phylogenies place Apusomonas proboscidea as sister to opisthokonts (Kim et al. 2006). When α-tubulin was excluded from the multigene analyses of Kim et al. (2006), the placement of A. proboscidea as sister to Amoebozoa could not be rejected (Kim et al. 2006). Thus, there is no evidence suggesting that Breviata is misplaced in our tree.

(b) Breviate amoebae are unusual amoebozoans
In all our multigene trees, Breviata is placed with Amoebozoa with high support. The precise placement within the group, however, is not consistent among the inferred trees. Some support a sister relationship between Archamoebae and Breviata, while others indicate that Breviata is sister to the remaining Amoebozoa (figures 2 and 3). Notably, the absence of the Amoebozoa-specific substitutions in the 18S sequence indicates that the latter hypothesis is more likely; it is also consistent with the proposal of Cavalier-Smith et al. (2004).

Walker et al. (2006) reasonably argued that Breviata does not belong in any of the other classes of ciliated Amoebozoa, because it is not closely similar in morphology to any of them (Walker et al. 2006). However, their conclusion that it is therefore not an amoebozoan did not take into account another possibility: a common ancestry followed by substantial morphological divergence from the other classes. This now appears to be the case. Indeed, careful observations revealing a unique gait in Breviata locomotion have expanded amoebozoan morphological diversity. These amoebae travel by 'walking' on thin but robust leg-like pseudopodia. The pseudopodia emanate from the anterior of the cell body and adhere to the substratum, while the cell body proceeds forward like a package travelling on a roller conveyor or a 'tractor on treads' (figure 1). The filose 'legs' often remain as trailing filaments before they retract into the cell body. This character distinguishes Breviata from other organisms, as no other eukaryote has even vaguely similar motor movements.

Prior to the addition of Breviatea, Amoebozoa comprised two well-defined subphyla (Cavalier-Smith 1998). The first is the often ciliated Conosa (Mycetozoa, Archamoebae), characterized by a conical microtubular skeleton diverging from the centriole or centrosome. The second is the purely amoeboid Lobosa, which lack cilia, centrioles and cytoplasmic microtubules. Our demonstration that Breviata is an amoebozoan significantly increases the cellular diversity of the phylum. Breviata contributes an unusual pseudopodial morphology, mode of locomotion and rather complex cytoskeleton. In marked contrast to the also anaerobic


Archamoebae, Breviata has two centrioles and a substantially more asymmetric microtubular cytoskeleton. These differences, plus the presence of Golgi stacks in Breviata but not in Archamoebae, justify placing the two in separate classes (Cavalier-Smith et al. 2004), as do the four contrasting rRNA signatures mentioned above. Contrary to Walker et al. (2006), however, these differences are not enough to merit separate phyla. Thus, there are now three broadly different cytoskeletal patterns in Amoebozoa.

(c) Implications for ultrastructural evolution in early eukaryotes
The ancestral cellular structure for Amoebozoa was argued to be a uniciliate, unicentriolar amoeba with a radially symmetric pericentriolar microtubular cone (Cavalier-Smith et al. 2004). This interpretation needs some re-evaluation. The uniciliate Breviata possesses two centrioles, one of which serves as the basal body of the cilium, resulting in an asymmetric cytoskeleton (Walker et al. 2006). Other amoebozoan lineages also have two basal bodies, such as myxogastrids and a few protostelids. The two basal bodies in Breviata therefore do not contradict the inference that Breviata is an amoebozoan, but merely suggest that it is not an Archamoeba (Cavalier-Smith et al. 2004). Some trees excluding faster evolving sites place Breviata as sister to Archamoebae, though the rRNA signatures render this unlikely. If that placement were correct, one could argue more strongly that its second, barren centriole is a derived state. However, our more inclusive trees and the 18S rRNA signatures in combination indicate that Breviata is probably sister to all previously accepted Amoebozoa.

This makes it harder to infer the ancestral state of Amoebozoa, in which there are now:
two groups with two centrioles/basal bodies (Breviata, myxogastrids);
three with one centriole per kinetid (Multicilia, Phalansterium, Archamoebae);
one with a mixture (protostelids).
Thus, a double centriolar ancestral state for Amoebozoa is almost as parsimonious as the single centriolar scenario (Cavalier-Smith et al. 2004). This is especially so as deeply branching opisthokonts (chytrids and choanoflagellates), the sister group to Amoebozoa, have two centrioles. With respect to the cytoskeleton, the marked asymmetry found in B. anathema contrasts with the hypothesized symmetrical ancestral state of Amoebozoa (Cavalier-Smith 2002). This asymmetry could be secondarily derived in B. anathema. It does not imply an affinity to the asymmetric bikonts, since the detailed arrangements of their ciliary roots differ substantially.

Thus, the inclusion of Breviata within Amoebozoa as its most divergent group has important implications for the ultrastructural evolution and likely ancestral state of the cytoskeleton and centrioles, in Amoebozoa and in eukaryotes generally. Our findings make it important to study the cytoskeleton and the pattern of ciliary and centriolar development more thoroughly in B. anathema. Their generality among different breviates should also be tested. Contrasting modes of ciliary development were a key aspect of the original recognition of the primary dichotomy between bikont and unikont eukaryotes (Cavalier-Smith 2002). Such studies are therefore of key significance for clarifying the basic organization of the earliest eukaryote cells. Unfortunately, ciliary development is unstudied for Breviata and for apusomonads. The putative inclusion of apusomonads within unikonts (Kim et al. 2006) is unexpected, given their biciliate nature and the structure of their ciliary roots (Molina & Nerad 1991); biciliate is not necessarily bikont (for the distinction see Cavalier-Smith 2002). This placement needs further confirmation by multigene analyses.

(d) The mitochondria-like organelle in B. anathema was probably derived independently from the other anaerobic lineages
In our Breviata cDNA library, we identified key mitochondria-derived, nuclear-encoded genes that are often seen in amitochondrial taxa. These genes trace their ancestry back to an α-proteobacterial ancestor; here they are cpn60 (see figure S3 in the electronic supplementary material) and tim17 (data not shown). This clearly rejects the possibility that Breviata is a pre-mitochondrial eukaryote. It also suggests that the dense organelles bounded by two membranes seen proximal to the nucleus in Breviata are mitochondria-related organelles. Further investigations of mitochondrial function in Breviata, including a search for hydrogenase and biochemical studies, are now needed. If Breviata is sister to the other Amoebozoa, the anaerobic adaptation of its mitochondria occurred independently of other known cases. However, our multigene trees and AU tests do not exclude the possibility that Archamoebae and Breviata form a single secondarily anaerobic amoebozoan clade.

All extant eukaryotes examined in detail, even anaerobic 'amitochondriate' eukaryotes, have nuclear genes whose phylogenetic history is best explained by entry into the eukaryote lineage with the mitochondrial endosymbiont. It is thus unlikely that the anaerobic nature of Breviata represents the ancestral state of Amoebozoa, even though our data suggest that Breviata may be the deepest diverging amoebozoan lineage. The ancestral amoebozoan must have been at least facultatively aerobic, though it could have been a facultative aerobe/anaerobe, as many have postulated for the ancestral eukaryote (Cavalier-Smith 2006). Possibly aerobic members of the Breviata clade will be discovered.

(e) Phylogenomics of unassigned species resolves key questions in eukaryote evolution
The challenging task of resolving eukaryotic global phylogeny has progressed through phylogenomic analysis of major lineages (e.g. Nikolaev et al. 2004; Rodríguez-Ezpeleta et al. 2005; Burki & Pawlowski 2006; Burki et al. 2007; Patron et al. 2007; Rodríguez-Ezpeleta et al. 2007). Here, we have demonstrated that investigating single, deeply diverging nomadic species is also crucial for understanding the early evolutionary history of major eukaryotic lineages. Placing the previously unaffiliated breviates, with their unique cytoskeletal pattern, in a clade with other Amoebozoa illuminates the evolutionary diversity of Amoebozoa. It also raises new questions concerning the nature of the ancestral amoebozoan and of the unikont–bikont bifurcation suspected to reside at the base of the eukaryote tree.

We thank Cédric Berney for helpful comments on the manuscript, Dag Klaveness for fruitful discussions and Surendra Kumar for the site removal script. The Norwegian Research Council has granted scholarships to K.S.-T., M.A.M. and R.J.S.O. and a research project to K.S.J. T.C.-S. thanks NERC and the Canadian Institute for Advanced Research Evolutionary Biology Program for fellowship support and NERC for research grants. The pan-Canadian


collaboration Protist EST Program (PEP: http://megasun.bch.umontreal.ca/pepdb/pep.html) generated sequence data for some of the species included in the phylogenetic analyses.

REFERENCES
Baldauf, S. L. 2003 The deep roots of eukaryotes. Science 300, 1703–1706. (doi:10.1126/science.1085544)
Bapteste, E. et al. 2002 The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl Acad. Sci. USA 99, 1414–1419. (doi:10.1073/pnas.032662799)
Barbera, M. J., Ruiz-Trillo, I., Leigh, J., Hug, L. A. & Roger, A. J. 2007 The diversity of mitochondrion-related organelles amongst eukaryotic microbes. In Origin of mitochondria and hydrogenosomes (eds W. Martin & M. Müller), pp. 239–268. Heidelberg, Germany: Springer.
Bolivar, I., Fahrni, J. F., Smirnov, A. & Pawlowski, J. 2001 SSU rRNA-based phylogenetic position of the genera Amoeba and Chaos (Lobosea, Gymnamoebia): the origin of gymnamoebae revisited. Mol. Biol. Evol. 18, 2306–2314.
Burki, F. & Pawlowski, J. 2006 Monophyly of Rhizaria and multigene phylogeny of unicellular bikonts. Mol. Biol. Evol. 23, 1922–1930. (doi:10.1093/molbev/msl055)
Burki, F., Shalchian-Tabrizi, K., Minge, M. A., Skjæveland, Å., Nikolaev, S. I., Jakobsen, K. S. & Pawlowski, J. 2007 Phylogenomics reshuffles the eukaryotic supergroups. PLoS ONE 2, e790. (doi:10.1371/journal.pone.0000790)
Cavalier-Smith, T. 1998 A revised six-kingdom system of life. Biol. Rev. 73, 203–266. (doi:10.1017/S0006323198005167)
Cavalier-Smith, T. 2002 The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int. J. Syst. Evol. Microbiol. 52, 297–354.
Cavalier-Smith, T. 2004 Only six kingdoms of life. Proc. R. Soc. B 271, 1251–1262. (doi:10.1098/rspb.2004.2705)
Cavalier-Smith, T. 2006 Origin of mitochondria by enslavement of a photosynthetic purple bacterium. Proc. R. Soc. B 273, 1943–1952. (doi:10.1098/rspb.2006.3531)
Cavalier-Smith, T. & Chao, E. E. Y. 2003 Phylogeny and classification of phylum Cercozoa (Protozoa). Protist 154, 341–358. (doi:10.1078/143446103322454112)
Cavalier-Smith, T., Chao, E. E. Y. & Oates, B. 2004 Molecular phylogeny of Amoebozoa and the evolutionary significance of the unikont Phalansterium. Eur. J. Protistol. 40, 21–48. (doi:10.1016/j.ejop.2003.10.001)
Cavalier-Smith, T., Chao, E. E., Stechmann, A., Oates, B. & Nikolaev, S. 2008 Planomonadida ord. nov. (Apusozoa): ultrastructural affinity with Micronuclearia podoventralis and deep divergences within Planomonas gen. nov. Protist 159, 535–562.
Corliss, J. O. 1984 The kingdom Protista and its 45 phyla. BioSystems 17, 87–126. (doi:10.1016/0303-2647(84)90003-0)
Fahrni, J., Bolivar, I., Berney, C., Nassonova, E., Smirnov, A. & Pawlowski, J. 2003 Phylogeny of lobose amoeba based on actin and small-subunit ribosomal RNA genes. Mol. Biol. Evol. 20, 1881–1886. (doi:10.1093/molbev/msg201)
Guindon, S. & Gascuel, O. 2003 A simple, fast and accurate method to estimate large phylogenies by maximum-likelihood. Syst. Biol. 52, 696–704. (doi:10.1080/10635150390235520)
Hackett, J. D., Yoon, H. S., Li, S., Reyes-Prieto, A., Rummele, S. E. & Bhattacharya, D. 2007 Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of Rhizaria with chromalveolates. Mol. Biol. Evol. 24, 1702–1713. (doi:10.1093/molbev/msm089)
Hampl, V., Silberman, J. D., Stechmann, A., Diaz-Trivino, S., Johnson, P. J. & Roger, A. J. 2008 Genetic evidence for a mitochondriate ancestry in the 'amitochondriate' flagellate Trimastix pyriformis. PLoS ONE 3, e1383. (doi:10.1371/journal.pone.0001383)
Jobb, G., von Haeseler, A. & Strimmer, K. 2004 TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol. Biol. 4. (doi:10.1186/1471-2148-4-18)
Karpov, S. A. & Zhukov, B. F. 1986 Ultrastructure and taxonomic position of Apusomonas proboscidea Alexeieff. Arch. Protistenkd. 131, 13–26.
Keeling, P. 2004 A brief history of plastids and their hosts. Protist 155, 3–7. (doi:10.1078/1434461000156)
Keeling, P. J., Burger, G., Durnford, D. G., Lang, B. F., Lee, R. W., Pearlman, R. E., Roger, A. J. & Gray, M. W. 2005 The tree of eukaryotes. Trends Ecol. Evol. 20, 670–676. (doi:10.1016/j.tree.2005.09.005)
Kim, E., Simpson, A. G. B. & Graham, L. E. 2006 Evolutionary relationships of apusomonads inferred from taxon-rich analyses of 6 nuclear encoded genes. Mol. Biol. Evol. 23, 2455–2466. (doi:10.1093/molbev/msl120)
Lartillot, N. & Philippe, H. 2004 A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109. (doi:10.1093/molbev/msh112)
Molina, F. I. & Nerad, T. A. 1991 Ultrastructure of Amastigomonas bermudensis ATCC-50234 sp. nov. – a new heterotrophic marine flagellate. Eur. J. Protistol. 27, 386–396.
Nikolaev, S. I., Berney, C., Fahrni, J. F., Bolivar, I., Polet, S., Mylnikov, A. P., Aleshin, V. V., Petrov, N. B. & Pawlowski, J. 2004 The twilight of Heliozoa and rise of Rhizaria, an emerging supergroup of amoeboid eukaryotes. Proc. Natl Acad. Sci. USA 101, 8066–8071. (doi:10.1073/pnas.0308602101)
Nikolaev, S. I., Berney, C., Petrov, N. B., Mylnikov, A. P., Fahrni, J. F. & Pawlowski, J. 2006 Phylogenetic position of Multicilia marina and the evolution of Amoebozoa. Int. J. Syst. Evol. Microbiol. 56, 1449–1458. (doi:10.1099/ijs.0.63763-0)
Patron, N. J., Inagaki, Y. & Keeling, P. J. 2007 Multiple gene phylogenies support the monophyly of cryptomonad and haptophyte host lineages. Curr. Biol. 17, 887–891. (doi:10.1016/j.cub.2007.03.069)
Philippe, H. 2000 Opinion: long branch attraction and protist phylogeny. Protist 151, 307–316. (doi:10.1078/S1434-4610(04)70029-2)
Posada, D. & Crandall, K. A. 1998 MODELTEST: testing the model of DNA substitution. Bioinformatics 14, 817–818. (doi:10.1093/bioinformatics/14.9.817)
Richards, T. A. & Cavalier-Smith, T. 2005 Myosin domain evolution and the primary divergence of eukaryotes. Nature 436, 1113–1118. (doi:10.1038/nature03949)
Rodríguez-Ezpeleta, N. et al. 2005 Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr. Biol. 15, 1325–1330. (doi:10.1016/j.cub.2005.06.040)
Rodríguez-Ezpeleta, N., Brinkmann, H., Burger, G., Roger, A. J., Gray, M. W., Philippe, H. & Lang, B. F. 2007 Toward resolving the eukaryotic tree: the phylogenetic positions of jakobids and cercozoans. Curr. Biol. 17, 1420–1425. (doi:10.1016/j.cub.2007.07.036)
Shalchian-Tabrizi, K. et al. 2006 Telonemia, a new protist phylum with affinity to chromist lineages. Proc. R. Soc. B 273, 1833–1842. (doi:10.1098/rspb.2006.3515)
Shimodaira, H. & Hasegawa, M. 2001 CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 16, 296–297.
Simpson, A. G. B., Inagaki, Y. & Roger, A. J. 2006 Comprehensive multigene phylogenies of excavate protists reveal the evolutionary positions of 'primitive' eukaryotes. Mol. Biol. Evol. 23, 615–625. (doi:10.1093/molbev/msj068)
Simpson, A. G. B. & Roger, A. J. 2004 The real 'kingdoms' of eukaryotes. Curr. Biol. 14, R693–R696. (doi:10.1016/j.cub.2004.08.038)
Stamatakis, A. 2006 RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690. (doi:10.1093/bioinformatics/btl446)
Stechmann, A. & Cavalier-Smith, T. 2002 Rooting the eukaryote tree by using a derived gene fusion. Science 297, 89–91. (doi:10.1126/science.1071196)
Stechmann, A. & Cavalier-Smith, T. 2003 The root of the eukaryote tree pinpointed. Curr. Biol. 13, R665–R666. (doi:10.1016/S0960-9822(03)00602-X)
Tielens, A. G. M., Rotte, C., van Hellemond, J. J. & Martin, W. 2002 Mitochondria as we don't know them. TiBS 27, 564–572. (doi:10.1016/S0968-0004(02)02193-X)
van der Giezen, M. & Tovar, J. 2005 Degenerate mitochondria. EMBO Rep. 6, 525–530. (doi:10.1038/sj.embor.7400440)
Walker, G., Dacks, J. B. & Embley, T. M. 2006 Ultrastructural description of Breviata anathema, n. gen., n. sp., the organism previously studied as 'Mastigamoeba invertens'. J. Eukaryot. Microbiol. 53, 65–78. (doi:10.1111/j.1550-7408.2005.00087.x)
Yang, Z. 2007 PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591. (doi:10.1093/molbev/msm088)
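The Minge et al. methods above describe a three-step fast-site removal procedure. First, each alignment column is assigned to one of eight rate categories with codonML in PAML. Next, the fastest categories (8, then 7+8, then 6+7+8) are removed. Finally, maximum-likelihood bootstrap trees are inferred from 100 pseudoreplicates.

The Python sketch below illustrates the first two steps only; it is not the authors' site-removal script (Kumar et al., unpublished). It assumes the per-site categories have already been estimated and are supplied as a list of integers (1 = slowest, 8 = fastest). The helper names, taxon names and toy sequences are hypothetical, and tree inference (RAxML, TREEFINDER) is outside its scope.

```python
import random


def remove_rate_categories(alignment, site_categories, drop):
    """Drop every alignment column whose rate category is in `drop`.

    alignment: dict of taxon name -> aligned sequence (all of equal length)
    site_categories: one integer per column, 1 = slowest ... 8 = fastest
    drop: categories to remove, e.g. {8}, {7, 8} or {6, 7, 8}
    """
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(site_categories)}:
        raise ValueError("every sequence needs exactly one category per column")
    keep = [i for i, cat in enumerate(site_categories) if cat not in drop]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


def bootstrap_pseudoreplicate(alignment, rng):
    """Return one nonparametric bootstrap replicate: columns resampled with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


# Toy amino acid alignment (hypothetical sequences, not data from the study).
toy = {"taxonA": "MKVLAG", "taxonB": "MKILSG", "taxonC": "MRVLAG"}
categories = [1, 2, 8, 7, 3, 8]

without_8 = remove_rate_categories(toy, categories, {8})
without_7_8 = remove_rate_categories(toy, categories, {7, 8})

rng = random.Random(42)
replicates = [bootstrap_pseudoreplicate(without_8, rng) for _ in range(100)]
print(without_8)    # {'taxonA': 'MKLA', 'taxonB': 'MKLS', 'taxonC': 'MRLA'}
print(without_7_8)  # {'taxonA': 'MKA', 'taxonB': 'MKS', 'taxonC': 'MRA'}
```

Passing successively larger `drop` sets mirrors the 8, 7+8 and 6+7+8 series of figure 3. In a real analysis, each reduced alignment and its pseudoreplicates would then go to the tree-inference program.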

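A small parsimony count makes concrete the 18S rRNA signature argument in the Minge et al. results above. The Python sketch below applies Fitch parsimony to one binary signature character on two simplified four-taxon trees. The character is coded 0 for the ancestral state seen in the out-groups and 1 for the amoebozoan signature. One tree has Breviata sister to all other Amoebozoa; the other has Breviata sister to Archamoebae.

Several choices are illustrative assumptions, not an analysis performed in the paper:
- the four-taxon trees;
- the taxon labels;
- coding Archamoebae and the remaining Amoebozoa as both carrying the signature (Fahrni et al. 2003 report that most, not all, other Amoebozoa carry all four).

```python
def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    tree: a taxon name (leaf) or a pair (left_subtree, right_subtree)
    states: dict of taxon name -> observed character state
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    # No shared state: take the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1


# 0 = out-group (ancestral) state, 1 = amoebozoan 18S signature present (simplified coding).
signature = {"outgroup": 0, "Breviata": 0, "Archamoebae": 1, "other_Amoebozoa": 1}

breviata_sister_to_all = ("outgroup", ("Breviata", ("Archamoebae", "other_Amoebozoa")))
breviata_sister_to_archamoebae = ("outgroup", (("Breviata", "Archamoebae"), "other_Amoebozoa"))

steps_all = fitch(breviata_sister_to_all, signature)[1]
steps_arch = fitch(breviata_sister_to_archamoebae, signature)[1]
n_signatures = 4  # three substitutions plus one insertion, each coded the same way here
print(steps_all * n_signatures, steps_arch * n_signatures)  # 4 8
```

On the first tree, each signature needs a single gain on the branch leading to the previously accepted Amoebozoa. On the second, each needs at least two changes, for example a gain followed by a reversal in Breviata. That extra cost is why the authors regard the sister-to-Archamoebae placement as less likely.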
J. Eukaryot. Microbiol., 56(1), 2009 pp. 16–25
© 2009 The Author(s). Journal compilation © 2009 by the International Society of Protistologists
DOI: 10.1111/j.1550-7408.2008.00379.x

Untangling the Phylogeny of Amoeboid Protists¹

JAN PAWLOWSKI and FABIEN BURKI
Department of Zoology and Animal Biology, University of Geneva, Geneva, Switzerland

ABSTRACT. The amoebae and amoeboid protists form a large and diverse assemblage of eukaryotes characterized by various types of pseudopodia. For convenience, the traditional morphology-based classification grouped them together in a macrotaxon named Sarcodina. Molecular phylogenies contributed to the dismantlement of this assemblage, placing the majority of sarcodinids into two new supergroups: Amoebozoa and Rhizaria. In this review, we describe the taxonomic composition of both supergroups and present their small subunit rDNA-based phylogeny. We comment on the advantages and weaknesses of these phylogenies and emphasize the necessity of taxon-rich multigene datasets to resolve phylogenetic relationships within Amoebozoa and Rhizaria. We show the importance of environmental sequencing as a way of increasing taxon sampling in these supergroups. Finally, we highlight the relevance of Amoebozoa and Rhizaria to understanding eukaryotic evolution. We suggest that resolving their phylogenies will be among the main challenges for future phylogenomic analyses.

Key Words. Amoebae, Amoebozoa, eukaryote, evolution, Foraminifera, Radiolaria, Rhizaria, SSU, rDNA.

Corresponding Author: J. Pawlowski, Department of Zoology and Animal Biology, University of Geneva, Sciences III, 30 Quai Ernest Ansermet, 1211 Geneva 4, Switzerland. Telephone number: +41 22 379 30 69; FAX number: +41 22 379 33 40; e-mail: jan.pawlowski@zoo.unige.ch

¹ Invited presentation delivered for the symposium: Advances in Evolutionary Protistology: a Symposium Honoring the Contributions of Tom Cavalier-Smith, 26 July 2008, The International Society of Evolutionary Protistology and the International Society of Protistologists, Dalhousie University, Halifax, NS, Canada.

FROM SARCODINA TO AMOEBOZOA AND RHIZARIA
The amoebae and amoeboid protists form an important part of eukaryotic diversity, amounting to about 15,000 described species (Adl et al. 2007). Among them are several ecologically important taxonomic groups. Lobose naked and testate amoebae are common elements of soil and freshwater microbial communities, and include species of critical medical importance (e.g. Entamoeba histolytica). Radiolarians are among the most abundant and diverse groups of marine holoplankton. Organic-walled and agglutinated benthic foraminiferans dominate the deep-sea meiofauna. Planktonic and large benthic calcareous foraminiferans are among the main calcifying protists, contributing almost 25% of present-day carbonate production in the oceans (Langer 2008). Both Foraminifera and Radiolaria are major groups of microfossils, widely used in paleostratigraphic and paleoclimatic reconstructions.

For convenience, all these taxonomic groups were placed within the class or phylum Sarcodina. This taxon was defined as protists possessing pseudopodia or locomotive protoplasmic flow, with flagella usually restricted to developmental stages (Levine et al. 1980). Depending on the type of pseudopodia, the Sarcodina were further subdivided into two superclasses (Lee, Hutner, and Bovee 1985; Levine et al. 1980). Rhizopodea comprised protists having lobopodia, filopodia, and reticulopodia. Actinopodea comprised all axopodia-bearing protists. Although this system was vigorously criticized on the basis of ultrastructural studies (Patterson 1994), no alternative classifications were proposed until the advent of molecular phylogenies.

The first molecular phylogenies based on small subunit (SSU) rDNA sequences provided strong evidence for the polyphyletic origin of amoeboid protists. The independent branching of Acanthamoeba and Naegleria (Clark and Cross 1988) confirmed the ultrastructural differences between Lobosea and Heterolobosea (Page and Blanton 1985). However, heterogeneity of the evolutionary rate in ribosomal genes strongly influenced the erratic distribution of other amoeboid protists in eukaryotic trees. The most spectacular fast-evolving lineages, such as foraminiferans (Pawlowski et al. 1996), (Amaral Zettler, Sogin, and Caron 1997), pelobionts (Hinkle et al. 1994), entamoebids (Silberman et al. 1999), and mycetozoans, were all affected by long-branch attraction artifacts in early studies (Philippe and Adoutte 1998; Stiller and Hall 1999). The SSU rDNA phylogeny of amoeboid protists could be partially resolved only after two developments: probabilistic methods and new evolutionary models correcting for among-site heterogeneity (Bolivar et al. 2001; Milyutina et al. 2001). Complementing these results, protein-coding genes also became available for a few species (Fahrni et al. 2003; Keeling 2001; Pawlowski et al. 1999). New taxonomic entities of amoeboid protists, such as Amoebozoa and Rhizaria, started to emerge following these improvements (Cavalier-Smith 1998, 2002). Further multigene studies and better taxon sampling in SSU rDNA trees have helped to definitively establish both major groups (Archibald et al. 2003; Bapteste et al. 2002; Burki and Pawlowski 2006; Cavalier-Smith and Chao 2003b; Cavalier-Smith, Chao, and Oates 2004; Longet et al. 2003; Nikolaev et al. 2004; Takishita et al. 2005). Consequently, most sarcodinids were placed within either Amoebozoa or Rhizaria in the new classification of protists (Adl et al. 2005). There are in fact only four taxonomic groups, traditionally included in Sarcodina, that now branch outside these supergroups:
two orders of Heliozoa (i.e. Actinophryida and Centrohelida);
the class Heterolobosea;
the genus Nuclearia.

With the notable exception of Centrohelida, the other three taxa have been confidently placed in one of the other eukaryotic supergroups. Actinophryida branch among stramenopiles in SSU rDNA trees. They appear either as sister to Opalozoa (Cavalier-Smith and Chao 2006) or close to the ultrastructurally similar pedinellid algae (Nikolaev et al. 2004), but the support for either relationship is weak. There are currently no other genes to test these hypotheses, except for a partial sequence of Actinosphaerium actin (Nikolaev et al. 2004). Heterolobosea are usually grouped with Euglenozoa in the taxon Discicristata, based on rDNA and protein sequence data (Baldauf 2003; Cavalier-Smith 2002; Keeling and Doolittle 1996). However, some recent multigene phylogenies suggested that they are more closely related to jakobids (Simpson, Inagaki, and Roger 2006). Nuclearia branches as sister group to Fungi, as first revealed by SSU rDNA trees (Amaral Zettler et al. 2001) and later confirmed by multigene analyses (Steenkamp, Wright, and Baldauf 2006). The position of Centrohelida fluctuates in current phylogenetic trees depending on the genes analyzed. In SSU rDNA trees, centrohelids appeared either as sister to haptophytes (Cavalier-Smith and Chao 2003a) or as sister to rhodophytes

(Sakaguchi et al. 2005). A seven-gene analysis placed them as a the distinction of Vannellida and Dactylopodida, first shown by sister group to a clade comprising Chromalveolates and Plantae, Peglar et al. (2003), is not well supported. This is mainly due to the but without statistical support (Sakaguchi, Inagaki, and Hashi- rapidly evolving sequences of Clydonella, Ripella, Pessonella,and moto 2007). Vexillifera minutissima, which have a tendency to group together probably because of long-branch attraction (see Fig. 1). The genus Cochliopodium, whose position is still unresolved (Kudryavtsev et MOLECULAR PHYLOGENY OF AMOEBOZOA al. 2005), also seems to belong to this clade (Fig. 1). The supergroup Amoebozoa includes all naked and testate lob- The grouping of Archamoebae and Mycetozoa (Dictyost- ose amoebae, which are traditionally classified in the class Lobo- elia1Myxogastria) representing the class Conosea (Cavalier- sea, Carpenter 1861 (Page 1987), as well as the pelobionts, Smith 1998; Smirnov et al. 2005), appears in some but not all entamoebids, and mycetozoans (Cavalier-Smith 1998). In addi- SSU rDNA trees (Nikolaev et al. 2006). The extremely divergent tion to the amoeboid forms, Amoebozoa also comprise the uni- sequences of myxogastrids often branch separately as a sister ciliate zooflagellate Phalansterium solitarium (Cavalier-Smith et group to some Variosea (Cavalier-Smith et al. 2004; Tekle et al. al. 2004) and the multiciliated species Multicilia marina (Niko- 2008). However, the monophyly of Dictyostelia and Myxogastria laev et al. 2006). Finally, the group includes the class Breviatea, is strongly supported by elongation factor (EF) 1A phylogenies introduced by Cavalier-Smith (2004) for the enigmatic free-living (Arisue et al. 2002; Baldauf and Doolittle 1997) and by phyloge- amoeboflagellate Mastigamoeba invertens, redescribed as Brevia- nomic analyses (Bapteste et al. 2002; Minge et al. 2008). 
In a re- ta anathema (Walker, Dacks, and Embley 2006), and recently cent analysis of EF1A and SSU rDNA data including a large taxon shown to be likely in a sister position to all other Amoebozoa in a sampling, this clade also comprises some Protostelida (Ceratio- phylogenomic analysis (Minge et al. 2008). myxa) but most protostelids branch separately (Fiore-Donno et al. The taxon Amoebozoa (Lu¨he 1913) was emended as a phylum unpublished). It has been proposed that Conosea also includes a by Cavalier-Smith (1998). Its taxonomic composition barely group of flagellated amoebozoans (Phalansterium, Multicilia) and changed since its creation (Adl et al. 2005). However, molecular some lobose amoebae (Acramoeba 5 former Gephyramoeba, Fil- evidence for the monophyly of all members is still quite circum- amoeba) that often branch as a paraphyletic assemblage at the stantial. The close relationship between some lobose amoebae and base of Mycetozoa and Archamoebae (Nikolaev et al. 2006). This mycetozoans was first suggested based on similarities of Acan- group partially corresponds to the class Variosea (Cavalier-Smith thamoeba and Dictyostelium mitochondrial genomes (Gray, Bur- 2004) whose monophyly is supported by a conserved motif of ger, and Lang 1999; Iwamoto et al. 1998), but these features are eight nucleotides in the variable region V7. not found in other amoebozoan taxa (Kudryavstev, pers. com- Among the other clades of Amoebozoa, Acanthopodida (in- mun.). The grouping of lobose amoebae together with entamoe- cluding Acanthamoeba and Balamuthia) are the only strongly bids, pelobionts, and mycetozoans was demonstrated in SSU supported group in all types of analyses. This clade, considered as rDNA and actin trees (Bolivar et al. 2001; Fahrni et al. 2003; the order Centramoebida, was included into the class Variosea by Milyutina et al. 2001), yet the support for this clade was very Cavalier-Smith (2004), but there is no support for this relationship weak. 
Much stronger support was obtained in Bayesian analyses in any SSU rDNA trees. There is also no support for the position of the SSU rDNA (Nikolaev et al. 2006) or the concatenated of Thecamoebida, which appears in the most recent SSU rDNA alignment of four genes (Tekle et al. 2008). The monophyly of analyses and has been confirmed by myosin II data (Berney, pers. Amoebozoa was also suggested by the presence of a particular commun.). Moreover, there is neither indication concerning the type of myosin II (Richards and Cavalier-Smith 2005) and con- position of Vermistella antarctica (Moran et al. 2007) nor that of firmed by its phylogenetic analysis (Berney and Cavalier-Smith the clade Mayorella1Dermamoeba. Among the 2007). However, a recent addition of two lobosean amoebae amoebozans, there are also Trichosphaerium spp., which branch (Hartmannella vermiformis and Acanthamoeba castellanii)to as sister group to Myxogastria in some analyses (Tekle et al. the phylogenomic analyses of ESTs from Archamoebae and My- 2008), but this is likely to be due to their extremely fast evolving cetozoa did not improve the support for the monophyly of all SSU rDNA. Amoebozoa (Minge et al. 2008). The lack of strong statistical support for the larger grouping also applies to the phylogenetic relationships within Amoebozoa. Most MOLECULAR PHYLOGENY OF RHIZARIA of the taxonomic groups recognized in recent classifications (Adl et Rhizaria are the most recently recognized supergroup al. 2005; Smirnov et al. 2005) have been based solely on SSU of eukaryotes, commonly including organisms bearing ‘‘root-like rDNA phylogenies. However, very few of these groups are robustly reticulose or filose pseudopodia’’ (Cavalier-Smith 2002). It supported and there is at present no clear evidence for the branching contains the majority of protists that were traditionally classified pattern among them. 
Because the composition and position of some among Rhizopoda (Filosea, Granuloreticulosea) and Actinopoda of these groups greatly depends on the choice of sites and taxa in (Fig. 2). However, in addition to typically amoeboid taxa, such as SSU rDNA analyses, we present here an amoebozoan phylogeny in euglyphids, gromiids, foraminiferans, and radiolarians, Rhizaria the form of a schematized consensus tree with a basal multifurcat- also includes a large diversity of free-living flagellates, amoebo- ion that reflects better the uncertainties (Fig. 1). flagellates, and parasitic protists. Six major clades can be distinguished in this tree: , The supergroup Rhizaria has been established based ex- Flabellinea, Conosea, Variosea, Thecamoebida, and Acanthopod- clusively on molecular data. The first presage for this grouping ida. The clades Tubulinea and Flabellinea comprise the majority of was a clade formed by the euglyphid testate amoebae and the naked and testate lobose amoebae. Tubulinea, which include Tub- photosynthetic chlorarachniophytes (Bhattacharya, Helmchen, ulinida, Arcellinida, Leptomyxida, and incertae sedis genus Echin- and Melkonian 1995). This clade was later enlarged to include amoeba and H. vermiformis, appears in most of SSU rDNA and the zooflagellates Cercomonas, Heteromita, and Thaumatomonas, actin trees (Bolivar et al. 2001; Fahrni et al. 2003; Nikolaev as well as the plasmodiophorid plant parasites (Cavalier-Smith and et al. 2005; Smirnov et al. 2005; Tekle et al. 2008). Tubulinea are Chao 1996/1997), leading to the creation of the phylum Cercozoa relatively well supported and also defined by tubular pseudopodia (Cavalier-Smith 1996/1997). The next important step was the find- and monoaxial cytoplasmic flow (Smirnov et al. 2005). In the case ing that Cercozoa and Foraminifera are related in actin phylogeny of Flabellinea, phylogenetic analyses are much less consistent and (Keeling 2001). This unexpected result was later confirmed by the 18 J. 
EUKARYOT. MICROBIOL., 56, NO. 1, JANUARY– FEBRUARY 2009 discovery of an amino acid insertion between the monomers of the trees is generally poor. The monophyly of Cercozoa is consis- polyubiquitin gene in Cercozoa, Foraminifera, and Plasmodiophor- tently recovered, although not always strongly supported (Bass ida (Archibald and Keeling 2004; Archibald et al. 2003), and an- and Cavalier-Smith 2004; Cavalier-Smith and Chao 2003b). A alyses of the large subunit of RNA polymerase gene (Longet et al. characteristic feature of this group is the insertion of two amino 2003) and SSU rDNA (Berney and Pawlowski 2003). acids in the polyubiquitin protein, but some taxa (Metopion, The taxonomic composition of Cercozoa was progressively ex- Chlorarachnea) harbor only one amino acid insertion (Bass et al. panded by including various zooflagellates (Atkins, Teske, and 2005). Anderson 2000; Ku¨hn, Lange, and Medlin 2000), gromiids The grouping of , Haplosporidia, Foraminifera, (Burki, Berney, and Pawlowski 2002), testate amoebae (Wylezich and the monogeneric clades of Gromia, Paradinium,and et al. 2002), filose and reticulate protists (Nikolaev et al. 2003), Filoreta is not well supported and the relationships among these and radiolarians (Polet et al. 2004). The position of plasmodiopho- different clades are not resolved. All these groups are character- rid plant pathogens among Cercozoa was confirmed by molecular ized by a single amino acid insertion in the polyubiquitin studies (Bulman et al. 2001). The haplosporidian parasites were gene (Bass et al. 2005) and the GA-AG deletion in SSU rDNA placed together with plasmodiophorids and gromiids in the sub- also present in Cercozoa (Cavalier-Smith and Chao 2003b), but phylum (Cavalier-Smith 2003; Cavalier-Smith and which is apparently modified in Foraminifera and subject to am- Chao 2003b). Cercozoa were suggested to be sister group to Ret- biguities in the alignment. 
Moreover, Haplosporidia, Paradinium, aria, composed of Polycystinea, , and Foraminifera Gromia, and Filoreta tenera (C. tenera) share the specific (Cavalier-Smith 1999). Both Cercozoa and formed the stem E23-13-1 (Tekle et al. 2007). However, this stem is new infrakingdom Rhizaria (Cavalier-Smith 2002). It was initially absent in other species of Filoreta ‘‘’’ and its iden- proposed that Rhizaria should also include Apusozoa and Centro- tification in highly divergent foraminiferan sequences is helida, but these lineages are in fact unrelated to Cercozoa and ambiguous. Retaria. A strong support for Rhizaria, composed of all previously The most controversial question is the position of Foraminifera. included taxonomic groups, plus Desmothoracida and Taxopod- In the SSU rDNA trees, Foraminifera were placed either close to ida, was recovered in a combined analysis of actin and SSU rDNA Haplosporidia and Gromiida (Berney, Fahrni, and Pawlowski genes (Nikolaev et al. 2004). A close relationship between 2004; Longet et al. 2004; Nikolaev et al. 2004) or as sister group Cercozoa and Foraminifera was confirmed by the analysis of to Polycystinea (Cavalier-Smith and Chao 2003b). They appeared three cytoskeletal proteins (Takishita et al. 2005) and more re- as sister to polycystine-like and -like clones in a cently by phylogenomic analyses of EST data (Burki and Paw- combined analysis of SSU and LSU rDNA data (Moreira et al. lowski 2006; Burki, Shalchian-Tabrizi, and Pawlowski 2008; 2007), but with very small taxon sampling. Knowing the extreme Burki et al. 2006, 2007). The rhizarian supergroup is growing acceleration of the foraminiferan stem lineage and the relatively continuously by new inclusions, such as the marine flagellate rapid evolution of radiolarian SSU rDNA sequences, this result ebriids (Hoppenrath and Leander 2006), the amoeboid Coral- could well be an artifact of long-branch attraction. The best ev- lomyxa (Tekle et al. 
2007), the parasitic plasmodial Paradinium idence for the position of Foraminifera close to Haplosporidia, (Skovgaard and Daugbjerg 2008), and the soil flagellate Sainou- Gromia, and Filoreta clades is the presence of a polyubiquitin ron (Cavalier-Smith et al. 2008). insertion in all these taxa and its apparent absence in Radiolaria Phylogenetic relationships within Rhizaria have been studied (Bass et al. 2005). However, the grouping of one of two for- using SSU and LSU rDNA, actin, RNA polymerase II (RPB1), aminiferan actin paralogs with the actin of Polycystinea (Tekle et and tubulins (Cavalier-Smith and Chao 2003b; Longet et al. 2003; al. 2007) further complicates the situation. Moreira et al. 2007; Nikolaev et al. 2004; Takishita et al. 2005). The clade Radiolaria (Radiozoa in Cavalier-Smith 1987) is en- Our schematized SSU rDNA tree represents an actual view of tirely composed of various types of radiolarians, including rhizarian phylogeny with indications of an alternative branching Polycystinea (Spumellarida, Nasselarida, ), Acanthar- possibility for Foraminifera (Fig. 2). In general, we can distin- ea, and Taxopodida. Phylogenetic studies of this clade are based guish two major clades (i.e. Cercozoa and Radiolaria) and an as- exclusively on the SSU rDNA sequences (Amaral Zettler and semblage of six more or less diverse clades of uncertain position, Caron 2000; Amaral Zettler, Anderson, and Caron 1999; Amaral which correspond to Phytomyxea, Foraminifera, Haplosporidia, Zettler et al. 1997; Kunitomo et al. 2006; Lopez-Garcia, Rodri- and the genera Paradinium, Gromia, and Filoreta—a new genus guez-Valera, and Moreira 2002; Polet et al. 2004; Takahashi et al. comprising the misidentified Corallomyxa tenera (Tekle et al. 2004; Yuasa et al. 2006). The only protein-coding genes available 2007) and an organism previously identified tentatively as for Radiolaria are three sequences of actin (Nikolaev et al. 2004) ‘‘Reticulamoeba’’ (Bass et al. 2008). 
and five polyubiquitin sequences (Bass et al. 2005). The relation- The clade Cercozoa (corresponding to the subphylum Filosa in ships shown in our tree (Fig. 2) are similar to those obtained by Cavalier-Smith 2003, and ‘‘core’’ Cercozoa in Nikolaev et al. Yuasa et al. (2006) and Kunitomo et al. (2006). We could recover 2004) includes the euglyphids, phaeodarians, desmothoracids, the monophyly of Polycystinea (except Larcopyle that branches chlorarachniophytes, ebriids, and various flagellated genera, within Taxopodida); however, only the relations between which are often able to form filopodia. The relationships among Nasselarida and Collodaria are strongly supported. As in most these taxa have been extensively studied based on the SSU rDNA rhizarian clades, good support is found for each taxonomic group (Bass and Cavalier-Smith 2004; Bass et al. 2005; Cavalier-Smith (with the exception of Taxopodida), but the relationships among and Chao 2003b), but the resolution of the cercozoan SSU rDNA these groups remain unresolved.
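The polyubiquitin character discussed above distinguishes taxa by the number of residues inserted between tandem ubiquitin monomers. Most Cercozoa carry two such residues. Metopion, Chlorarachnea, and the Phytomyxea/Haplosporidia/Foraminifera/Gromia/Filoreta assemblage carry one.

The sketch below illustrates that character as a simple junction scan. It is not the procedure used by Bass et al. (2005). Its main assumption is that monomer boundaries can be recognized by the human-like motifs LRLRGG (end of a monomer) and MQIFV (start of the next). The function names are hypothetical, and the toy sequences are built from human ubiquitin rather than real rhizarian data. Divergent protist sequences would need a proper alignment instead of exact motifs.

```python
import re

# Human ubiquitin (76 residues), used here only to build toy test
# sequences; real protist ubiquitins differ at a few positions.
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
    "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)

# Assumed junction pattern: a monomer ends in ...LRLRGG and the next one
# starts with MQIFV. Up to 10 residues between them are captured as the
# insertion. The lookahead does not consume the next monomer's start, so
# every junction in a tandem array is reported.
JUNCTION = re.compile(r"LRLRGG([A-Z]{0,10}?)(?=MQIFV)")


def insertion_lengths(polyubiquitin: str) -> list[int]:
    """Number of extra residues at each monomer-monomer junction, in order."""
    return [len(m.group(1)) for m in JUNCTION.finditer(polyubiquitin.upper())]


def classify(polyubiquitin: str) -> str:
    """Summarize the junction insertions of one polyubiquitin sequence."""
    lengths = set(insertion_lengths(polyubiquitin))
    if not lengths:
        return "no junction found"
    if len(lengths) > 1:
        return "mixed"
    n = lengths.pop()
    labels = {
        0: "no insertion",
        1: "single amino acid insertion",
        2: "two amino acid insertion",
    }
    return labels.get(n, f"{n}-residue insertion")


if __name__ == "__main__":
    # Toy three-monomer gene with one extra residue at each junction.
    toy = UBIQUITIN + "S" + UBIQUITIN + "S" + UBIQUITIN
    print(insertion_lengths(toy))  # [1, 1]
    print(classify(toy))           # single amino acid insertion
```

The non-greedy capture makes the scan report the shortest insertion that reaches the next monomer start. The separate `classify` step flags genes whose junctions disagree ("mixed") instead of silently picking one value.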

Fig. 1. Phylogeny of Amoebozoa. Maximum likelihood small subunit rDNA tree showing the current knowledge of the evolutionary relationships between and within the main groups of Amoebozoa. The tree was inferred with the program TREEFINDER (Jobb, von Haeseler, and Strimmer 2004) from 1,298 aligned positions, using a GTR+I+G8 model of nucleotide substitutions. It was subsequently schematized by hand to better emphasize the confidences and uncertainties (see text). A RAxML (Stamatakis 2006) tree was also obtained with the same alignment; it differed only by the branching of Mycetozoa within Variosea (not shown). Thick branches denote bootstrap support >90%. The conserved motif of eight nucleotides in the variable region V7 (GGGTGAAG) is indicated on the branches where it is found. Drawings were adapted from the following sources: Thecamoebida, Variosea, and Leptomyxida (Micro*scope, http://starcentral.mbl.edu/microscope/portal.php?pagetitle=index); Mycetozoa, Archamoebida, and Tubulinea (http://www.unige.ch/sciences/biologie/biani/msg/Amoeboids/Eukaryotes.html); Flabellinea (Alexander Kudryavtsev).
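The V7 signature GGGTGAAG that marks Variosea in Fig. 1 is a plain eight-nucleotide substring, so screening sequences for it is straightforward.

The sketch below is hypothetical. The function names and toy sequences are illustrative, and so is the idea of restricting the search to V7 through alignment-column coordinates. V7 boundaries depend on the alignment used, so the default searches the whole sequence. As the text notes, the motif also occurs outside Variosea (e.g. Symphytocarpus impectus), so a positive hit is a signature, not proof of membership.

```python
from typing import Dict, List, Optional

MOTIF = "GGGTGAAG"  # eight-nucleotide V7 signature reported for Variosea


def clean(seq: str) -> str:
    """Uppercase, drop alignment gap symbols, and read RNA as DNA (U -> T)."""
    return seq.upper().replace("-", "").replace(".", "").replace("U", "T")


def has_motif(seq: str, start: int = 0, end: Optional[int] = None) -> bool:
    """True if MOTIF occurs in the aligned columns seq[start:end].

    The window is taken before gaps are removed, so start/end can be
    alignment-column coordinates (hypothetical V7 boundaries); a motif
    interrupted by gap columns is still detected. By default the whole
    sequence is searched.
    """
    return MOTIF in clean(seq[start:end])


def taxa_with_motif(aligned: Dict[str, str], start: int = 0,
                    end: Optional[int] = None) -> List[str]:
    """Names of the sequences carrying the motif, in input order."""
    return [name for name, s in aligned.items() if has_motif(s, start, end)]


if __name__ == "__main__":
    # Toy aligned fragments, not real SSU rDNA sequences.
    aligned = {
        "toy_variosean": "ACGT--GGGTGAAGTTCA",
        "toy_other": "ACGT--GGGAGAAGTTCA",
        "toy_gapped": "ACGTGGG-TGAAGTTCAA",
    }
    print(taxa_with_motif(aligned))  # ['toy_variosean', 'toy_gapped']
```

The window is cut from the aligned sequence before gaps are stripped. This means the coordinates refer to columns shared by all sequences in the alignment. For unaligned sequences, keep the defaults.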

IMPORTANCE OF ENVIRONMENTAL SEQUENCING

Small subunit rDNA-based analyses of environmental DNA samples revealed an extraordinarily large and hidden diversity of protists in all types of examined habitats (reviewed in Epstein and Lopez-Garcia 2008). Dawson and Pace (2002) have proposed that some environmental sequences even represent novel eukaryotic lineages. However, a careful analysis shows that most of the sequences can be placed into one of the existing supergroups of eukaryotes (Berney et al. 2004; Cavalier-Smith 2004). Some of these sequences have been characterized as belonging to Amoebozoa or Rhizaria. Their proportion is relatively small compared with other groups of eukaryotes. Nevertheless, they significantly increase the taxon sampling in some amoebozoan and rhizarian groups. They also help to resolve the phylogenies of these groups by filling the evolutionary gaps that exist between identified species.

A good example is the amoebozoan clade Variosea. This clade is composed of flagellate and amoeboid species and is considered crucial for placing the root of the amoebozoan tree (Nikolaev et al. 2006). However, in most SSU rDNA analyses, Variosea appeared as a series of independent lineages branching at the base of Conosea (Nikolaev et al. 2006; Tekle et al. 2008). The monophyly of Variosea was recovered only after five new environmental sequences were added (Fig. 2). This result agrees with the SSU rDNA signature present in all members of this clade.

The importance of environmental sequencing for revealing hidden diversity is particularly tangible in the case of Cercozoa, Radiolaria, and monothalamous Foraminifera. Environmental surveys of Cercozoa using specific polymerase chain reaction (PCR) primers revealed nine novel cercozoan clades and 168 distinct lineages (Bass and Cavalier-Smith 2004). Some of these lineages have a global geographic distribution (Bass et al. 2007). A study of picoeukaryotic diversity in the Sargasso Sea revealed five new radiolarian clades (Not et al. 2007). Two of these clades are related to Taxopodida, a group known until then from a single described species, Sticholonche zanclea. Finally, Foraminifera-specific environmental surveys revealed two new clades of freshwater species (Holzmann et al. 2003) and a large diversity of monothalamous foraminiferans (Habura et al. 2004, 2008).

Strikingly, some rhizarian groups (i.e. Haplosporidia, Phytomyxea, Foraminifera) are preceded by the divergence of single environmental sequences (Fig. 2). The eukaryotic lineages to which these sequences belong remain to be isolated and characterized. Doing so will certainly provide important information about the ancestors of these groups and about the evolutionary changes that permitted their speciation.

The environmental sequence data, however, must be interpreted with caution. Polymerase chain reaction amplification of the SSU rDNA gene frequently produces chimeric sequences that are not always easy to detect (Berney et al. 2004). Therefore, all sequences that differ slightly from well-established clades should be carefully checked for chimeras using programs such as CHECK CHIMERA (Larsen et al. 1993).

In addition, the taxonomic composition of environmental surveys is well known to be strongly biased by PCR conditions. For example, amoebozoans other than Variosea are rarely found in environmental sequences. Likewise, foraminiferan SSU rDNA is almost impossible to amplify with typical “universal” eukaryotic primers. Hence, the generally low environmental diversity of some Amoebozoa and Rhizaria groups is probably artifactual. It shows the need to search for other genomic markers (perhaps among mitochondrial genes) to obtain a better view of the diversity of these groups.

ADVANTAGES AND LIMITATIONS OF SSU rDNA PHYLOGENIES

Almost all that we know about the phylogeny of Amoebozoa and Rhizaria is based on SSU rDNA sequences. This reliance on a single molecular marker is certainly the most important handicap of amoebozoan and rhizarian phylogenies. Nevertheless, the SSU rDNA has several obvious advantages that make it the most commonly used phylogenetic marker.

First, SSU rDNA is undeniably the most easily amplified nuclear gene, because it occurs in many homogeneous copies and contains highly conserved regions. This is particularly important for amoeboid protists such as foraminiferans and radiolarians, which can hardly be cultivated and have to be amplified from single-cell extractions. It is also one of the reasons why almost all environmental surveys of protists are based solely on SSU rDNA sequences.

Another advantage of the SSU rDNA is that it contains both conserved and variable regions, which allows phylogenetic relationships to be recovered at different taxonomic levels. Of particular interest are the conserved motifs that can be defined as phylogenetic signatures. Some of them have been used to design specific amplification primers, such as the AAC insertion in foraminiferan stem 33 (Pawlowski 2000). Others are used to define larger phylogenetic groupings. Examples are the GA-AG deletion in Cercozoa (Cavalier-Smith and Chao 2003b) and the stem 23-13-1 that defines the clade Gromia + Haplosporidia + “Corallomyxa” (Tekle et al. 2007) (Fig. 2).

Another putatively important conserved motif of eight nucleotides (GGGTGAAG) has not yet been described in the literature. It is found in the variable domain V7 of all representatives of the class Variosea, including the environmental sequences. It is absent from other Amoebozoa, except the myxogastrid Symphytocarpus impectus (Fig. 1). This motif cannot be used in phylogenetic analyses because homologous regions are lacking in other amoebozoans. Even so, its significance is certainly greater than that of the weak bootstrap support for Variosea in SSU rDNA-based trees. Finding similar motifs in other groups of amoeboid protists may provide a still unexploited source of phylogenetic information.

The main weakness of the SSU rDNA is the heterogeneity of substitution rates, and amoeboid protists seem particularly affected by this phenomenon. For instance, the stem lineage of Foraminifera shows an extraordinary acceleration (Pawlowski and Berney 2003). As a consequence, this group was long excluded from phylogenetic reconstructions of eukaryotes, and its position is still highly controversial (Moreira et al. 2007). Exceptional rate variations have also been observed between and within foraminiferan groups (De Vargas and Pawlowski 1998; Pawlowski et al. 1997).

Fast-evolving species are also common in Amoebozoa, in particular among the pelobionts, entamoebids, and myxomycetes. The most spectacular acceleration occurs within the amoebozoan genus Trichosphaerium (Pawlowski and Fahrni 2007; Tekle et al. 2007), which was therefore not included in our analyses. In this genus, 69 of the 609 SSU rDNA sites conserved in almost all (>95%) amoebozoans are modified. This makes its accurate placement practically impossible, even with the best methods and models.

Difficulties in placing the fast-evolving amoebozoan and rhizarian species are only one of the drawbacks of SSU rDNA phylogenies. More generally, support is lacking at different phylogenetic levels, especially for the deep branches of SSU rDNA trees. Among Amoebozoa, only Acanthopodida and Myxogastria receive bootstrap values above 95% (Fig. 1). The situation is slightly better among Rhizaria (Fig. 2), but in both cases the relationships between major clades remain largely unresolved. SSU rDNA sequences will therefore remain extremely valuable as first indicators of phylogenetic affinities. Nevertheless, the phylogenies inferred from them should be treated with considerable caution.

Fig. 2. Phylogeny of Rhizaria. Maximum likelihood small subunit (SSU) rDNA tree showing the current knowledge of the evolutionary relationships between and within the main groups of Rhizaria. The tree was inferred with the program RAxML (Stamatakis 2006) from 1,167 aligned positions, using the GTRGAMMA model of nucleotide substitutions. It was subsequently schematized by hand to better emphasize the confidences and uncertainties (see text). Thick branches denote bootstrap support >90%. The polyubiquitin insertion, the SSU rDNA deletion, and the SSU rDNA E23-13-1 stem are indicated on the branches where they are found. The letter “P” marks the parasitic lineages. The arrow shows an alternative branching pattern for Foraminifera, corresponding to the Retaria hypothesis. Three species names follow the revisions of Bass et al. (2008): Mesofila limnetica (formerly Dimorpha-like), Limnofila borokensis (formerly Gymnophrys cometa), and Nanofila marina (formerly N-Por). Drawings were adapted from the following sources: Cercozoa (Jahn, Bovee, and Jahn 1979; Taylor 1990; John Archibald, pers. commun.); Radiolaria (Haeckel 1862); Gromia and Foraminifera (photos by the authors).

NEW CHALLENGES FOR FUTURE PHYLOGENOMIC STUDIES

Because SSU rDNA phylogenies cannot reliably resolve all relationships between amoeboid protists, it is absolutely necessary to search for other molecular markers. As described above, very few protein-coding genes are available for Amoebozoa and Rhizaria. Genomic data for members of both supergroups are also scarce.

Among Amoebozoa, the genomes of E. histolytica and Dictyostelium discoideum have been sequenced (Eichinger et al. 2005; Loftus et al. 2005). The genomes of some other entamoebids, dictyostelids, and A. castellanii are in progress. EST data are available for Mastigamoeba balamuthi (Bapteste et al. 2002), H. vermiformis, Physarum polycephalum, Hyperamoeba dachnaya, and Hyperamoeba sp. (Watkins and Gray 2008).

Among Rhizaria, the Bigelowiella natans genome has been sequenced but not yet published (Archibald, pers. commun.). A project to sequence the Paulinella chromatophora genome has recently been accepted (Yoon, pers. commun.). EST data are available for five rhizarians. These are two foraminiferans, Reticulomyxa filosa and Quinqueloculina sp., and three cercozoans, Cercomonas, B. natans, and Gymnophrys (Burki and Pawlowski 2006; Burki et al. 2007; Rodriguez-Ezpeleta et al. 2007). A further, still unpublished dataset exists for the reticulate amoeba Filoreta (Lewis, pers. commun.).

Importantly, recently developed high-throughput sequencing technologies, such as the 454 system, will lead to a massive increase in genomic data. Notably, several EST projects on amoebozoan and rhizarian taxa are in progress in order to address important phylogenetic questions (e.g. Gromia sphaerica, Plasmodiophora brassicae, Spongospora subterranea, Vannella sp.).

In the case of Amoebozoa, it is first essential to confirm their monophyly in a broadly sampled tree of eukaryotes. If possible, a molecular synapomorphy for the group should also be found. The hypothetical position of the root of Amoebozoa between Conosea and the other amoebae, as suggested by Nikolaev et al. (2006), needs to be tested. The monophyly of the major groups suggested by SSU rDNA analyses (Tubulinea, Flabellinea, Variosea) should be confirmed, and their relationships need to be established. The position of incertae sedis amoebozoans with fast-evolving SSU sequences (e.g. Trichosphaerium) should be revised.

In the case of Rhizaria, the Retaria hypothesis urgently requires testing. As discussed above, some current SSU rDNA phylogenies contradict this hypothesis (Fig. 2). So does the absence of the polyubiquitin insertion in all tested radiolarian species. Furthermore, Radiolaria generally lack the cercozoan-specific SSU rDNA deletion. In Foraminifera, the site of the deletion lies in a variable region, so no conclusion can be drawn about whether it is ancestral or derived. It cannot be excluded that Foraminifera and Radiolaria cluster together, either as the sister group to other rhizarians or within the rhizarian radiation. This question is particularly important because Foraminifera and Radiolaria have very old and well-preserved fossil records. Their position is therefore crucial for calibrating the tree of eukaryotes.

So far, phylogenomic studies have been extremely efficient in resolving the deep eukaryote phylogeny. They have answered important questions concerning the branching order among the supergroups (Burki et al. 2007, 2008; Hampl, pers. commun.; Rodriguez-Ezpeleta et al. 2007). We expect that the coming genomic data will also be very useful for inferring intra-supergroup phylogenies. In many respects, the molecular study of Amoebozoa and Rhizaria has proven particularly demanding. Working with both groups therefore constitutes a challenging test for the phylogenomic approach. This review reports important advances in the phylogeny of Amoebozoa and Rhizaria. Nevertheless, our understanding of their evolution is still relatively poor, and further progress will depend on access to much larger genomic databases.

ACKNOWLEDGMENTS

The authors thank Cédric Berney, Anne-Marie Fiore-Donno, José Fahrni, Alexey Smirnov, and Alexander Kudryavtsev for comments and discussion. We thank Chitchai Chantangsi and John Archibald for sharing some illustrations. The Swiss National Science Foundation is acknowledged for its generous support of this research through grants 3100-064073.00 and 3100A0-112645, and through SCOPES Joint Research Projects (7SUPJ062342 and IB73A0-111064).

LITERATURE CITED

Adl, S. M., Leander, B. S., Simpson, A. G., Archibald, J. M., Anderson, O. R., Bass, D., Bowser, S. S., Brugerolle, G., Farmer, M. A., Karpov, S., Kolisko, M., Lane, C. E., Lodge, D. J., Mann, D. G., Meisterfeld, R., Mendoza, L., Moestrup, Ø., Mozley-Standridge, S. E., Smirnov, A. V. & Spiegel, F. 2007. Diversity, nomenclature and taxonomy of protists. Syst. Biol., 56:684–689.
Adl, S. M., Simpson, A. G., Farmer, M. A., Andersen, R. S., Anderson, O. R., Barta, J. R., Bowser, S. S., Brugerolle, G., Fensome, R. A., Fredericq, S., James, T. Y., Karpov, S., Kugrens, P., Krug, J., Lane, C. E., Lewis, L. A., Lodge, J., Lynn, D. H., Mann, D. G., Mccourt, R. M., Mendoza, L., Moestrup, Ø., Mozley-Standridge, S. E., Nerad, T. A., Shearer, C. A., Smirnov, A. V., Spiegel, F. W. & Taylor, M. F. J. R. 2005. The new higher-level classification of eukaryotes with emphasis on the taxonomy of protists. J. Eukaryot. Microbiol., 52:399–451.
Amaral Zettler, L. A. & Caron, D. A. 2000. New insights into the phylogeny of the Acantharea based on SSU rRNA gene sequencing. Eur. J. Protistol., 36:34–39.
Amaral Zettler, L. A., Anderson, O. R. & Caron, D. A. 1999. Towards a molecular phylogeny of colonial spumellarian Radiolaria. Mar. Micropaleontol., 36:67–79.
Amaral Zettler, L. A., Sogin, M. L. & Caron, D. A. 1997. Phylogenetic relationships between the Acantharea and the Polycystinea: a molecular perspective on Haeckel's Radiolaria. Proc. Natl. Acad. Sci. USA, 94:11411–11416.
Amaral Zettler, L. A., Nerad, T. A., O'Kelly, C. J. & Sogin, M. L. 2001. The nucleariid amoebae: more protists at the animal-fungal boundary. J. Eukaryot. Microbiol., 48:293–297.
Archibald, J. M. & Keeling, P. J. 2004. Actin and ubiquitin protein sequences support a Cercozoa/Foraminiferan ancestry for the plasmodiophorid plant pathogens. J. Eukaryot. Microbiol., 51:113–118.
Archibald, J. M., Longet, D., Pawlowski, J. & Keeling, P. J. 2003. A novel polyubiquitin structure in Cercozoa and Foraminifera: evidence for a new eukaryotic supergroup. Mol. Biol. Evol., 20:62–66.
Arisue, N., Hashimoto, T., Lee, J. A., Moore, D. V., Gordon, P., Sensen, C. W., Gaasterland, T., Hasegawa, M. & Mueller, M. 2002. The phylogenetic position of the pelobiont Mastigamoeba balamuthi based on sequences of rDNA and translation elongation factors EF-1alpha and EF-2. J. Eukaryot. Microbiol., 49:1–10.
Atkins, M. S., Teske, A. P. & Anderson, O. R. 2000. A survey of flagellate diversity at four deep-sea hydrothermal vents in the Eastern Pacific Ocean using structural and molecular approaches. J. Eukaryot. Microbiol., 47:400–411.
Baldauf, S. L. 2003. The deep root of eukaryotes. Science, 300:1703–1706.

Baldauf, S. L. & Doolittle, W. F. 1997. Origin and evolution of slime molds (Mycetozoa). Proc. Natl. Acad. Sci. USA, 94:12007–12012.
Bapteste, E., Brinkmann, H., Lee, J. A., Moore, D. V., Sensen, C. W., Gordon, P., Duruflé, L., Gaasterland, T., Lopez, P., Müller, M. & Philippe, H. 2002. The analysis of 100 genes support the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl. Acad. Sci. USA, 99:1414–1419.
Bass, D. & Cavalier-Smith, T. 2004. Phylum-specific environmental DNA analysis reveals remarkably high of Cercozoa (Protozoa). Int. J. Syst. Evol. Microbiol., 54:2393–2404.
Bass, D., Richards, T. A., Matthai, L., Marsh, V. & Cavalier-Smith, T. 2007. DNA evidence for global dispersal and probable endemicity of protozoa. BMC Evol. Biol., 7:162.
Bass, D., Moreira, D., Lopez-Garcia, P., Polet, S., Chao, E. E., Herden, S., Pawlowski, J. & Cavalier-Smith, T. 2005. Polyubiquitin insertions and the phylogeny of Cercozoa and Rhizaria. Protist, 156:149–161.
Bass, D., Chao, E. E., Nikolaev, S., Yabuki, A., Ishida, K., Berney, C., Pakzad, U., Wylezich, C. & Cavalier-Smith, T. 2008. Phylogeny of naked filose and reticulose Cercozoa: Granofilosea cl. n. and Proteomyxidea revised. Protist (in press).
Berney, C. & Cavalier-Smith, T. 2007. Myosin II and the evolution of unikonts (abstract V European Congress of Protistology). Protistology, 5:14.
Berney, C. & Pawlowski, J. 2003. Revised small subunit rRNA analysis provides further evidence that Foraminifera are related to Cercozoa. J. Mol. Evol., 57(Suppl. 1):120–127.
Berney, C., Fahrni, J. & Pawlowski, J. 2004. How many novel eukaryotic ‘‘kingdoms’’? Pitfalls and limitations of environmental DNA surveys. BMC Biol., 2:13.
Bhattacharya, D., Helmchen, T. & Melkonian, M. 1995. Molecular evolutionary analyses of nuclear-encoded small subunit ribosomal RNA identify an independent rhizopod lineage containing the Euglyphida and the Chlorarachniophyta. J. Eukaryot. Microbiol., 42:65–69.
Bolivar, I., Fahrni, J., Smirnov, A. & Pawlowski, J. 2001. SSU rRNA-based phylogenetic position of the genera Amoeba and Chaos (Lobosea, Gymnamoebia): the origin of gymnamoebae revisited. Mol. Biol. Evol., 18:2306–2314.
Bulman, S. R., Kühn, S. F., Marshall, J. W. & Schnepf, E. 2001. A phylogenetic analysis of the SSU rRNA from members of the Plasmodiophorida and Phagomyxida. Protist, 152:43–51.
Burki, F. & Pawlowski, J. 2006. Monophyly of Rhizaria and multigene phylogeny of unicellular bikonts. Mol. Biol. Evol., 23:1922–1930.
Burki, F., Berney, C. & Pawlowski, J. 2002. Phylogenetic position of Gromia oviformis Dujardin inferred from nuclear-encoded small subunit ribosomal DNA. Protist, 153:251–260.
Burki, F., Shalchian-Tabrizi, K. & Pawlowski, J. 2008. Phylogenomics reveals a new ‘‘megagroup’’ including most photosynthetic eukaryotes. Biol. Lett., 4:366–369.
Burki, F., Nikolaev, S. L., Bolivar, I., Guaird, J. & Pawlowski, J. 2006. Analysis of expressed sequence tags (ESTs) from a naked foraminiferan Reticulomyxa filosa. Genome, 49:882–887.
Burki, F., Shalchian-Tabrizi, K., Minge, M., Skaeveland, A., Nikolaev, S. I., Jakobsen, K. S. & Pawlowski, J. 2007. Phylogenomics reshuffles the eukaryotic supergroups. PLoS ONE, 2:e790.
Cavalier-Smith, T. 1987. The origin of eukaryote and archaebacterial cells. Ann. N. Y. Acad. Sci., 503:17–54.
Cavalier-Smith, T. 1996/1997. Amoeboflagellates and mitochondrial cristae in eukaryotic evolution: megasystematics of the new protozoan subkingdoms Eozoa and Neozoa. Arch. Protistenkd., 147:237–258.
Cavalier-Smith, T. 1998. A revised six-kingdom system of life. Biol. Rev., 73:203–266.
Cavalier-Smith, T. 1999. Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellates, and sporozoan plastid origins and the eukaryote family tree. J. Eukaryot. Microbiol., 46:347–366.
Cavalier-Smith, T. 2002. The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int. J. Syst. Evol. Microbiol., 52:297–354.
Cavalier-Smith, T. 2003. Protist phylogeny and the high-level classification of Protozoa. Eur. J. Protistol., 39:338–348.
Cavalier-Smith, T. 2004. Only six kingdoms of life. Proc. Roy. Soc. Lond. B, 271:1251–1262.
Cavalier-Smith, T., Levis, R., Chao, E. E., Oates, B. & Bass, D. 2008. Morphology and phylogeny of Sainouron acronematica sp. n. and the ultrastructural unity of Cercozoa. Protist, 159:591–620.
Cavalier-Smith, T. & Chao, E. E.-Y. 1996/1997. Sarcomonad ribosomal RNA sequences, rhizopod phylogeny, and the origin of euglyphid amoebae. Arch. Protistenkd., 147:227–236.
Cavalier-Smith, T. & Chao, E. E.-Y. 2003a. Molecular phylogeny of centrohelid Heliozoa, a novel lineage of bikont eukaryotes that arose by ciliary loss. J. Mol. Evol., 56:387–396.
Cavalier-Smith, T. & Chao, E. E.-Y. 2003b. Phylogeny and classification of phylum Cercozoa (Protozoa). Protist, 154:341–358.
Cavalier-Smith, T. & Chao, E. E.-Y. 2006. Phylogeny and megasystematics of phagotrophic heterokonts (kingdom Chromista). J. Mol. Evol., 62:388–420.
Cavalier-Smith, T., Chao, E. E.-Y. & Oates, B. 2004. Molecular phylogeny of Amoebozoa and the evolutionary significance of the unikont Phalansterium. Eur. J. Protistol., 40:21–48.
Clark, C. G. & Cross, G. A. M. 1988. Small-subunit ribosomal RNA sequence from Naegleria gruberi supports the polyphyletic origin of amebas. Mol. Biol. Evol., 5:512–518.
Dawson, S. C. & Pace, N. R. 2002. Novel kingdom-level eukaryotic diversity in anoxic environments. Proc. Natl. Acad. Sci. USA, 99:8324–8329.
De Vargas, C. & Pawlowski, J. 1998. Molecular versus taxonomic rates of evolution in planktonic foraminifera. Mol. Phylogenet. Evol., 9:463–469.
Eichinger, L., Pachebat, J. A., Glöckner, G., Rajandream, M. A., Sucgang, R., Berriman, M., Song, J., Olsen, R., Szafranski, K., Xu, Q., Tunggal, B., Kummerfeld, S., Madera, M., Konfortov, B. A., Rivero, F., Bankier, A. T., Lehmann, R., Hamlin, N., Davies, R., Gaudet, P., Fey, P., Pilcher, K., Chen, G., Saunders, D., Sodergren, E., Davis, P., Kerhornou, A., Nie, X., Hall, N., Anjard, C., Hemphill, L., Bason, N., Farbrother, P., Desany, B., Just, E., Morio, T., Rost, R., Churcher, C., Cooper, J., Haydock, S., van Driessche, N., Cronin, A., Goodhead, I., Muzny, D., Mourier, T., Pain, A., Lu, M., Harper, D., Lindsay, R., Hauser, H., James, K., Quiles, M., Madan Babu, M., Saito, T., Buchrieser, C., Wardroper, A., Felder, M., Thangavelu, M., Johnson, D., Knights, A., Loulseged, H., Mungall, K., Oliver, K., Price, C., Quail, M. A., Urushihara, H., Hernandez, J., Rabbinowitsch, E., Steffen, D., Sanders, M., Ma, J., Kohara, Y., Sharp, S., Simmonds, M., Spiegler, S., Tivey, A., Sugano, S., White, B., Walker, D., Woodward, J., Winckler, T., Tanaka, Y., Shaulsky, G., Schleicher, M., Weinstock, G., Rosenthal, A., Cox, E. C., Chisholm, R. L., Gibbs, R., Loomis, W. F., Platzer, M., Kay, R. R., Williams, J., Dear, P. H., Noegel, A. A., Barrell, B. & Kuspa, A. 2005. The genome of the social amoeba Dictyostelium discoideum. Nature, 435:43–57.
Epstein, S. & Lopez-Garcia, P. 2008. ‘‘Missing’’ protists: a molecular prospective. Biodivers. Conserv., 17:261–276.
Fahrni, J. H., Bolivar, I., Berney, C., Nassonova, E., Smirnov, A. & Pawlowski, J. 2003. Phylogeny of lobose amoebae based on actin and small-subunit ribosomal RNA genes. Mol. Biol. Evol., 20:1881–1886.
Gray, M. W., Burger, G. & Lang, B. F. 1999. Mitochondrial evolution. Science, 283:1476–1481.
Habura, A., Goldstein, S. T., Broderick, S. & Bowser, S. S. 2008. A bush, not a tree: the extraordinary diversity of cold-water basal foraminiferans extends to warm-water environments. Limnol. Oceanogr., 53:1339–1351.
Habura, A., Pawlowski, J., Hanes, S. D. & Bowser, S. S. 2004. Unexpected foraminiferal diversity revealed by small-subunit rRNA analysis of Antarctic sediment. J. Eukaryot. Microbiol., 51:173–179.
Haeckel, E. 1862. Die Radiolarien (Rhizopoda Radiaria). Eine Monographie. Reimer, Berlin.
Hinkle, G., Leipe, D. D., Nerad, T. A. & Sogin, M. L. 1994. The unusually long small subunit ribosomal RNA of Phreatamoeba balamuthi. Nucleic Acids Res., 22:465–469.
Holzmann, M., Habura, A., Giles, H., Bowser, S. S. & Pawlowski, J. 2003. Freshwater foraminiferans revealed by analysis of environmental DNA samples. J. Eukaryot. Microbiol., 50:135–139.
Hoppenrath, M. & Leander, B. S. 2006. Ebriid phylogeny and the expansion of the Cercozoa. Protist, 157:279–290.
Iwamoto, M., Pi, M., Kurihara, M., Morio, T. & Tanaka, Y. 1998. A ribosomal protein gene cluster is encoded in the mitochondrial DNA of Dictyostelium discoideum: UGA termination codons and similarity of gene order to Acanthamoeba castellanii. Curr. Genet., 33:304–310.

24 J. EUKARYOT. MICROBIOL., 56, NO. 1, JANUARY–FEBRUARY 2009

Jahn, T. L., Bovee, E. C. & Jahn, F. F. 1979. How to know the Protozoa. 2nd ed. Wm. C. Brown Company Publ., Dubuque, IA.
Jobb, G., von Haeseler, A. & Strimmer, K. 2004. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol. Biol., 4:18.
Keeling, P. J. 2001. Foraminifera and Cercozoa are related in actin phylogeny: two orphans find a home? Mol. Biol. Evol., 18:1551–1557.
Keeling, P. J. & Doolittle, W. F. 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol. Biol. Evol., 13:1297–1305.
Kudryavtsev, A., Detlef, B., Schlegel, M., Chao, E. E.-Y. & Cavalier-Smith, T. 2005. 18S Ribosomal RNA gene sequences of Cochliopodium (Himatismenida) and the phylogeny of Amoebozoa. Protist, 156:215–224.
Kühn, S., Lange, M. & Medlin, L. K. 2000. Phylogenetic position of Cryothecomonas inferred from nuclear-encoded small subunit ribosomal RNA. Protist, 151:337–345.
Kunitomo, Y., Sarashina, I., Iijima, M., Endo, K. & Sashida, K. 2006. Molecular phylogeny of acantharian and polycystine radiolarians based on ribosomal DNA sequences, and some comparisons with data from the fossil record. Eur. J. Protistol., 42:143–153.
Langer, M. R. 2008. Assessing the contribution of foraminiferan protists to global ocean carbonate production. J. Eukaryot. Microbiol., 55:163–169.
Larsen, N., Olsen, G. J., Maidak, B. L., McCaughey, M. J., Overbeek, R., Macke, T. J., Marsh, T. L. & Woese, C. R. 1993. The ribosomal database project. Nucleic Acids Res., 21(Suppl.):3021–3023.
Lee, J. J., Hutner, S. H. & Bovee, E. C. 1985. An Illustrated Guide to the Protozoa. Society of Protozoologists, Lawrence, KS.
Levine, N. D., Corliss, J. O., Cox, F. E. G., Deroux, G., Grain, J., Honigberg, B. M., Leedale, G. F., Loeblich III, A. R., Lom, J., Lynn, D., Merinfeld, E. G., Page, F. C., Poljansky, G., Sprague, V., Vavra, J. & Wallace, F. G. 1980. A newly revised classification of the Protozoa. J. Protozool., 27:37–58.
Loftus, B., Andreson, I., Davies, R., Alsmark, U. C., Samuelson, J., Amedeo, P., Roncaglia, P., Berriman, M., Hirt, R. P., Mann, B. J., Nozaki, T., Suh, B., Pop, M., Duchene, M., Ackers, J., Tannich, E., Leippe, M., Hofer, M., Bruchhaus, I., Willhoeft, U., Bhattacharya, A., Chillingworth, T., Churcher, C., Hance, Z., Harris, B., Harris, D., Jagels, K., Moule, S., Mungall, K., Ormond, D., Squares, R., Whitehead, S., Quail, M. A., Rabbinowitsch, E., Norbertczak, H., Price, C., Wang, Z., Guillén, N., Gilchrist, C., Stroup, S. E., Bhattacharya, S., Lohia, A., Foster, P. G., Sicheritz-Ponten, T., Weber, C., Singh, U., Mukherjee, C., El-Sayed, N. M., Petri Jr., W. A., Clark, C. G., Embley, T. M., Barrell, B., Fraser, C. M. & Hall, N. 2005. The genome of the protist parasite Entamoeba histolytica. Nature, 433:865–868.
Longet, D., Archibald, J. M., Keeling, P. J. & Pawlowski, J. 2003. Foraminifera and Cercozoa share a common origin according to RNA polymerase II phylogenies. Int. J. Syst. Evol. Microbiol., 53:1735–1739.
Longet, D., Burki, F., Flakowski, J., Berney, C., Polet, S., Fahrni, J. & Pawlowski, J. 2004. Multigene evidence for close evolutionary relations between Gromia and Foraminifera. Acta Protozool., 43:303–311.
Lopez-Garcia, P., Rodriguez-Valera, F. & Moreira, D. 2002. Towards the monophyly of Haeckel’s Radiolaria: 18S rRNA environmental data support the sisterhood of Polycystinea and Acantharea. Mol. Biol. Evol., 19:118–121.
Milyutina, I. A., Aleshin, V. V., Mikrjukov, K. A., Kedrova, O. S. & Petrov, N. B. 2001. The unusually long small subunit ribosomal RNA gene found in amitochondriate amoeboflagellate palustris: its rRNA predicted secondary structure and phylogenetic implication. Gene, 272:131–139.
Minge, M. A., Silbermann, J. D., Orr, R. J. S., Cavalier-Smith, T., Shalchian-Tabrizi, K., Burki, F., Skjaeveland, A. & Jakobsen, K. S. 2008. Evolutionary position of breviate amoebae and the primary eukaryote divergence. Proc. Roy. Soc. B-Biol. Sci. (in press).
Moran, D. M., Anderson, O. R., Dennett, M. R., Caron, D. A. & Gast, R. J. 2007. A description of seven Antarctic marine gymnamoebae including a new subspecies, two new species and a new genus: Neoparamoeba aestuarina antarctica n.subsp., Platyamoeba oblongata n.sp., Platyamoeba contorta n.sp. and Vermistella antarctica n.gen. n.sp. J. Eukaryot. Microbiol., 54:169–183.
Moreira, D., von der Heyden, S., Bass, D., Lopez-Garcia, P., Chao, E. & Cavalier-Smith, T. 2007. Global eukaryote phylogeny: combined small- and large-subunit ribosomal DNA support monophyly of Rhizaria, Retaria and Excavata. Mol. Phylogenet. Evol., 44:255–266.
Nikolaev, S. I., Berney, C., Petrov, N. B., Mylnikov, A. P., Fahrni, J. F. & Pawlowski, J. 2006. Phylogenetic position of Multicilia marina and evolution of Amoebozoa. Int. J. Syst. Evol. Microbiol., 56:1449–1458.
Nikolaev, S. I., Berney, C., Fahrni, J., Mylnikov, A. P., Aleshin, V. V., Petrov, N. & Pawlowski, J. 2003. Gymnophrys cometa and Lecythium sp. are core Cercozoa: evolutionary implications. Acta Protozool., 42:183–190.
Nikolaev, S. I., Mitchell, E. A. D., Petrov, N. B., Berney, C., Fahrni, J. & Pawlowski, J. 2005. The testate lobose amoebae (order Arcellinida Kent, 1880) finally find their home within Amoebozoa. Protist, 156:191–202.
Nikolaev, S. I., Berney, C., Fahrni, J., Bolivar, I., Polet, S., Mylnikov, A. P., Aleshin, V. V., Petrov, N. B. & Pawlowski, J. 2004. The twilight of Heliozoa and rise of Rhizaria: an emerging supergroup of amoeboid eukaryotes. Proc. Natl. Acad. Sci. USA, 101:8066–8071.
Not, F., Gausling, R., Azam, F., Heidelberg, J. F. & Worden, A. Z. 2007. Vertical distribution of picoeukaryotic diversity in the Sargasso Sea. Environ. Microbiol., 9:1233–1252.
Page, F. C. 1987. The classification of ‘naked’ amoebae (Phylum Rhizopoda). Arch. Protistenkd., 133:199–217.
Page, F. C. & Blanton, L. 1985. The Heterolobosea (Sarcodina: Rhizopoda), a new class uniting the and the (Acrasida). Protistologica, 21:121–132.
Patterson, D. J. 1994. Protozoa: evolution and systematics. In: Hausmann, K. & Hülsmann, N. (ed.), Progress in Protozoology. Proceedings of the IX International Congress of Protozoology, Berlin 1993. Gustav Fischer Verlag, Stuttgart, Jena, NY. p. 1–14.
Pawlowski, J. 2000. Introduction to the molecular systematics of foraminifera. Micropaleontology, 46(Suppl. 1):1–12.
Pawlowski, J. & Berney, C. 2003. Episodic evolution of nuclear small subunit ribosomal RNA gene in the stem lineage of Foraminifera. In: Donoghue, P. C. & Smith, M. P. (ed.), Telling the Evolutionary Time: Molecular Clocks and the Fossil Record. Systematics Assoc. Special Vol. No. 66. Taylor & Francis, London. p. 107–118.
Pawlowski, J. & Fahrni, J. F. 2007. Phylogenetic position of Trichosidae. In: Goodkov, A. V. & Karpov, S. A. (ed.), V European Congress of Protistology Abstracts. Protistology, 5:61–62.
Pawlowski, J., Bolivar, I., Fahrni, J., Cavalier-Smith, T. & Gouy, M. 1996. Early origin of foraminifera suggested by SSU rRNA gene sequences. Mol. Biol. Evol., 13:445–450.
Pawlowski, J., Bolivar, I., Fahrni, J., De Vargas, C. & Bowser, S. S. 1999. Molecular evidence that Reticulomyxa filosa is a freshwater naked foraminifer. J. Eukaryot. Microbiol., 46:612–617.
Pawlowski, J., Bolivar, I., Fahrni, J., De Vargas, C., Gouy, M. & Zaninetti, L. 1997. Extreme differences in rates of molecular evolution of foraminifera revealed by comparison of ribosomal DNA sequences and the fossil record. Mol. Biol. Evol., 14:498–505.
Peglar, M. T., Amaral Zettler, L. A., Anderson, O. R., Nerad, T. A., Gillevet, P. M., Mullen, T. E., Frasca Jr., S., Silberman, J. D., O’Kelly, C. J. & Sogin, M. L. 2003. Two new small-subunit ribosomal RNA gene lineages within the subclass Gymnamoebia. J. Eukaryot. Microbiol., 50:224–232.
Philippe, H. & Adoutte, A. 1998. The molecular phylogeny of Eukaryota: solid facts and uncertainties. In: Coombs, G., Vickerman, K., Sleigh, M. & Warren, A. (ed.), Evolutionary Relationships among Protozoa. Chapman & Hall, London. p. 25–26.
Polet, S., Berney, C., Fahrni, J. & Pawlowski, J. 2004. Small subunit ribosomal RNA sequences of Phaeodarea challenge the monophyly of Haeckel’s Radiolaria. Protist, 155:53–63.
Richards, T. A. & Cavalier-Smith, T. 2005. Myosin domain evolution and the primary divergence of eukaryotes. Nature, 436:1113–1118.
Rodriguez-Ezpeleta, N., Brinkmann, H., Burger, G., Roger, A. J., Gray, M. W., Philippe, H. & Lang, B. F. 2007. Toward resolving the eukaryotic tree: the phylogenetic positions of jakobids and cercozoans. Curr. Biol., 17:1420–1425.

Sakaguchi, M., Inagaki, Y. & Hashimoto, T. 2007. Centrohelida is still searching for a phylogenetic home: analyses of seven Raphidiophrys contractilis genes. Gene, 405:47–54.
Sakaguchi, M., Nakayama, T., Hashimoto, T. & Inouye, I. 2005. Phylogeny of the Centrohelida inferred from SSU rRNA, tubulins, and actin genes. J. Mol. Evol., 61:765–775.
Silberman, J. D., Clark, C. G., Diamond, L. S. & Sogin, M. L. 1999. Phylogeny of the genera Entamoeba and Endolimax as deduced from small-subunit ribosomal RNA sequences. Mol. Biol. Evol., 16:1740–1751.
Simpson, A. G. B., Inagaki, Y. & Roger, A. J. 2006. Comprehensive multigene phylogenies of excavate protists reveal the evolutionary positions of ‘‘primitive’’ eukaryotes. Mol. Biol. Evol., 23:615–625.
Skovgaard, A. & Daugbjerg, N. 2008. Identity and systematic position of Paradinium poucheti and other Paradinium-like parasites of marine copepods based on morphology and nuclear-encoded SSU rDNA. Protist, 159:401–413.
Smirnov, A. V., Nassonova, E. S., Berney, C., Fahrni, J., Bolivar, I. & Pawlowski, J. 2005. Molecular phylogeny and classification of the lobose amoebae. Protist, 156:129–142.
Stamatakis, A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics, 22:2688–2690.
Steenkamp, E. T., Wright, J. & Baldauf, S. L. 2006. The protistan origins of animals and fungi. Mol. Biol. Evol., 23:93–106.
Stiller, J. W. & Hall, B. D. 1999. Long-branch attraction and the rDNA model of early eukaryotic evolution. Mol. Biol. Evol., 16:1270–1279.
Takahashi, O., Yuasa, T., Honda, D. & Mayama, S. 2004. Molecular phylogeny of solitary shell-bearing Polycystinea (Radiolaria). Rev. Micropaleontol., 47:111–118.
Takishita, K., Inagaki, Y., Tsuchiya, M., Sakaguchi, M. & Maruyama, T. 2005. A close relationship between Cercozoa and Foraminifera supported by phylogenetic analyses based on combined amino acid sequences of three cytoskeletal proteins (actin, α-tubulin, β-tubulin). Gene, 362:153–160.
Taylor, F. J. R. 1990. Incertae sedis ebridians. In: Margulis, L., Corliss, J. O., Melkonian, M. & Chapman, D. J. (ed.), Handbook of Protoctista. Jones and Bartlett, Boston. p. 720–721.
Tekle, Y. I., Grant, J., Anderson, O. R., Nerad, T. A., Cole, J. C., Patterson, D. J. & Katz, L. A. 2008. Phylogenetic placement of diverse amoebae inferred from multigene analyses and assessment of clade stability within ‘‘Amoebozoa’’ upon removal of varying rate classes of SSU-rDNA. Mol. Phylogenet. Evol., 47:339–352.
Tekle, Y. I., Grant, J., Cole, J. C., Nerad, T. A., Anderson, O. R., Patterson, D. J. & Katz, L. A. 2007. A multigene analysis of Corallomyxa tenera sp. nov. suggests its membership in a clade that includes Gromia, Haplosporidia and Foraminifera. Protist, 158:457–472.
Walker, G., Dacks, J. B. & Embley, M. T. 2006. Ultrastructural description of Breviata anathema, n.gen., n.sp., the organism previously studied as ‘‘Mastigamoeba invertens’’. J. Eukaryot. Microbiol., 53:65–78.
Watkins, R. F. & Gray, M. W. 2008. Sampling gene diversity across the supergroup Amoebozoa: large EST data sets from Acanthamoeba castellanii, Hartmannella vermiformis, Physarum polycephalum, Hyperamoeba dachnaya and Hyperamoeba sp. Protist, 159:269–281.
Wylezich, C., Meisterfeld, R., Meisterfeld, S. & Schlegel, M. 2002. Phylogenetic analyses of small subunit ribosomal RNA coding regions reveal a monophyletic lineage of euglyphid testate amoebae (order Euglyphida). J. Eukaryot. Microbiol., 49:108–118.
Yuasa, T., Takahashi, O., Dolven, J. K., Mayama, S., Matsuoka, A., Honda, D. & Bjorklund, K. R. 2006. Phylogenetic position of the small solitary phaeodarians (Radiolaria) based on 18S rDNA sequences by single cell PCR analysis. Mar. Micropaleontol., 59:104–114.

Received: 09/01/08, 10/11/08; accepted: 10/12/08

Monophyly of Rhizaria and Multigene Phylogeny of Unicellular Bikonts

Fabien Burki and Jan Pawlowski
Department of Zoology and Animal Biology, University of Geneva, Geneva, Switzerland

Reconstructing a global phylogeny of eukaryotes is an ongoing challenge of molecular phylogenetics. The availability of genomic data from a broad range of eukaryotic phyla has helped to resolve the eukaryotic tree into a topology with a rather small number of large assemblages, but the relationships between these ‘‘supergroups’’ are yet to be confirmed. Rhizaria is the most recently recognized ‘‘supergroup,’’ but, in spite of this important position within the tree of life, their representatives are still missing from global phylogenies of eukaryotes. Here, we report the first large-scale analysis of eukaryote phylogeny including data for 2 rhizarian species, the foraminiferan Reticulomyxa filosa and the chlorarachniophyte Bigelowiella natans. Our results confirm the monophyly of Rhizaria (Foraminifera + Cercozoa), with very high bootstrap support in all analyses. The overall topology of our trees agrees with the current view of eukaryote phylogeny, with a basal division into ‘‘unikonts’’ (Opisthokonts and Amoebozoa) and ‘‘bikonts’’ (Plantae, alveolates, stramenopiles, and excavates). As expected, Rhizaria branch among bikonts; however, their phylogenetic position is uncertain. Depending on the data set and the type of analysis, Rhizaria branch as sister group to either stramenopiles or excavates. Overall, the relationships between the major groups of unicellular bikonts are poorly resolved, despite the use of 85 proteins and the largest taxonomic sampling for this part of the tree available to date. This may be due to an acceleration of evolutionary rates in some bikont phyla or be related to their rapid diversification in the early evolution of eukaryotes.

Introduction and ultrastructural data (Simpson, Radek, et al. 2002). Resolving the structure of the phylogenetic tree of Overall, the vast majority of the known diversity of eukar- eukaryotes is of crucial importance for understanding the yotes seems to be distributed among only 5 to 6 major divi- major evolutionary steps that could possibly explain the sions that are probably all monophyletic, referred to as the relationships between species. During the last 2 decades, plants, excavates, chromalveolates, Rhizaria (all belonging the advances in molecular systematics led to establishing to the assemblage of the so-called ‘‘bikonts’’) and the ‘‘uni- new monophyletic assemblages and helped in drawing the konts,’’ which comprise the Opisthokonts and Amoebozoa relations between the numerous lineages recognized on (Keeling et al. 2005). Identifying these natural supergroups the basis of morphological and ultrastructural data. At first, raised the new challenge of understanding the relationships based almost exclusively on the small-subunit rRNA (SSU among them, which, for most of the eukaryotic tree, has yet rRNA) gene (Sogin et al. 1989; Sogin 1991; Kumar and to be confirmed. Rzhetsky 1996; Pawlowski et al. 1996; Sogin and Silberman Rhizaria (Cavalier-Smith 2002) is a recently emerged 1998), molecular phylogenies of eukaryotes were subse- supergroup of eukaryotes enclosing organisms as diverse as quently tested with protein-coding genes (Yamamoto filose testate amoebae, cercomonads, chlorarachniophytes, et al. 1997; Moreira et al. 1999; Philippe et al. 2000). Foraminifera, plasmodiophorids, haplosporidians, gromiids, Despite their important role in the early days of molecular and radiolarians (Adl et al. 2005). The first hints for the evo- phylogenetics, single-gene phylogenies are now known to lutionary meaning of the group came from SSU rRNA– be highly sensitive to variation of evolutionary rates, which based phylogenies (Bhattacharya et al. 
1995; Cavalier- often led to false representation of early eukaryotic evolu- Smith and Chao 1997). Rapidly, the phylum Cercozoa was tion (Stiller and Hall 1999; Morin 2000; Philippe 2000; created to accommodate this new assemblage (Cavalier- Philippe and Germot 2000). Smith 1998). Further molecular studies confirmed the het- Over time, the accumulation of protein sequences erogeneity of this phylum, with various protists being in- from a large variety of eukaryotes has made it possible to cluded in it (Burki et al. 2002; Cavalier-Smith and Chao test single-gene phylogenies using combined data (Baldauf 2003; Polet et al. 2004). Protein data indicated a relationship et al. 2000). A new view of global phylogeny of eukaryotes between Foraminifera and Cercozoa (Keeling 2001; Archi- emerged from a growing number of evidence based on sev- bald, Longet, et al. 2003; Longet et al. 2003), and a combined eral different kinds of mutually reinforcing data, such as 1) analysis of SSU rRNA and actin confirmed their relation multiple gene phylogenies (Bapteste et al. 2002; Yoon et al. with Radiolaria (Nikolaev et al. 2004). Finally, a study 2002; Philippe et al. 2004; Hampl et al. 2005; Harper et al. of a single or double amino acid insertion in the protein 2005; Philippe et al. 2005; Rodriguez-Ezpeleta et al. 2005; polyubiquitin suggests that Radiolaria represent the most Simpson et al. 2006; Steenkamp et al. 2006), 2) individual basal branch of Rhizaria, followed by Foraminifera and phylogenies converging on the same relationships (Fast Cercozoa (Bass et al. 2005). et al. 2002; Simpson, Roger, et al. 2002; Longet et al. Despite their now well-accepted taxonomic status, the 2004), 3) discrete characters (Baldauf and Palmer 1993; Rhizaria are still missing in most of the multigene phylog- Keeling and Palmer 2001; Stechmann and Cavalier-Smith enies published to date (Bapteste et al. 2002; Philippe et al. 2002; Archibald, Longet, et al. 2003), and 4) morphological 2004; Hampl et al. 
2005; Rodriguez-Ezpeleta et al. 2005; Steenkamp et al. 2006). Until recently, the only available Key words: eukaryotes, unicellular bikonts, Rhizaria, multigene rhizarian genomic information was an expressed sequence phylogeny. tag (EST) data set for the chlorarachniophyte Bigelowiella E-mail: [email protected]. natans, comprising about 3,500 sequences (Keeling and Mol. Biol. Evol. 23(10):1922–1930. 2006 doi:10.1093/molbev/msl055 Palmer 2001). Some of these sequences have been used in Advance Access publication July 7, 2006 studies with other purposes than exploring the phylogenetic

Ó The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] Monophyly and Evolutionary Relationship of Rhizaria 1923 position of Rhizaria (de Koning et al. 2005; Harper et al. Table S1, Supplementary Material online). To decide on the 2005) or are even absent from the final trees because of final set of genes used in this study, we checked for orthol- a suspected artifactual position (Simpson et al. 2006). ogy between all the retrieved sequences for each selected To include Rhizaria in multigene phylogenies of genes by first carrying out a Neighbor-Joining (NJ) analysis eukaryotes, we have recently conducted an EST project with the program PROTDIST 3.6 (Felsenstein 2004), al- on the freshwater naked foraminiferan Reticulomyxa filosa, lowing us to discard very distant paralogous genes. To re- which led to approximately 1,600 high-quality sequences fine our selection, we then constructed for each gene (Burki et al. forthcoming). Combining the available geno- a maximum likelihood (ML) tree using PHYML (JTT 1 mic information, we assembled in this study a data set of 85 F 1 C4) (Guindon and Gascuel 2003) so that we were able orthologous proteins for 37 eukaryotic species, including to keep genes only where clear orthology between species the 2 rhizarian species R. filosa and B. natans, in order could be identified. to 1) confirm the monophyly of Rhizaria when using a large number of protein-coding genes and 2) infer the phyloge- Phylogenetic Analyses netic position of this supergroup within eukaryotes. We concatenated all genes into alignments that were analyzed with both ML and Bayesian Inference (BI). ML analyses utilized the programs PHYML (Guindon and Materials and Methods Gascuel 2003) and TREEFINDER (Jobb et al. 2004). Fol- Construction of the Alignment lowing the Akaike Information Criterion (AIC) (Posada and Using our R. 
filosa ESTs as queries, we performed Buckley 2004) computed with ProtTest 1.2.6 (Abascal et al. BlastX searches against the UniProt protein database on 2005), the RtREV 1 F 1 C model allowing between- the Swiss Institute of Bioinformatics server to find suffi- site rate variation was chosen (calculations were done with ciently conserved genes in a broad taxonomic sampling of 8 gamma categories). Coming right after according to the eukaryotes. A homemade perl script linking the Blast out- AIC, the WAG model was also tested and gave the same put and the seqret program from the EMBOSS package topologies. To estimate the robustness of the phylogenetic (http://emboss.ch.embnet.org/EMBOSSDOC/programs inference, we used the bootstrap method (Felsenstein 1985) /html/seqret.html) allowed us to retrieve and store in differ- with 100 pseudoreplicates generated and analyzed with ent files (each corresponding to a different R. filosa gene) all PHYML and TREEFINDER. ÿ40 sequences from the database with an e-value , 10 . This Bayesian analyses using the WAG 1 F 1 C4 model relatively stringent cutoff was defined in order to avoid the were performed with the parallel version of MrBayes 3.1.2 integration of paralogous genes. The homologous proteins (Ronquist and Huelsenbeck 2003). Each inference, starting were then aligned with ClustalW (Thompson et al. 1994) from a random tree and using 4 Metropolis-coupled and kept for further analyses if they 1) showed a reasonable Markov Chain Monte Carlo (MCMCMC), consisted of taxonomic distribution and 2) were conserved enough 1,000,000 generations with sampling every 100 genera- across all eukaryotes. tions. The average standard deviation of split frequencies To increase the number of eukaryotes represented, we was used to assess the convergence of the 2 runs. 
downloaded all available nucleotide sequences from GenBank through the taxonomy browser at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov) for the stramenopiles, ciliates, alveolates, Entamoeba, Physcomitrella, Rhodophyta, Strongylocentrotus, Schistosoma, Giardia, Trichomonas, Alexandrium, and B. natans. We searched for homology between this constructed data set and our R. filosa ESTs by performing local TBlastX searches (threshold < 10^-40) and added the resulting matching sequences to our alignments. At this point, only genes found either in both R. filosa and B. natans or in R. filosa alone had been retained. To increase the number of genes, we repeated the BLAST and selection procedures, this time using the B. natans sequences as queries. Overall, this resulted in a data set of homologous aligned genes containing, for Rhizaria, both R. filosa and B. natans, only R. filosa, or only B. natans, in addition to all other eukaryotic species. Alignments were checked by eye and refined manually with BioEdit 7.0.5 (Hall 1999), and ambiguously aligned positions were removed with Gblocks (Castresana 2000).

Because of the limited data for certain groups, and to maximize the number of genes per taxonomic assemblage, some higher taxa were represented by different closely related species: Paramecium, Phytophthora, Cryptosporidium, Rhodophyta, and Theileria (for details see Supplementary

Bayesian posterior probabilities were calculated from the majority-rule consensus of the trees sampled after the initial burn-in period, as determined by checking the convergence of likelihood values across MCMCMC generations (roughly 20,000–50,000 generations, depending on the analysis).

In subsequent analyses, amino acid positions were successively removed from the complete alignment (CA) according to their substitution rates. Site-wise substitution rates were computed with the program CODEML from the PAML package (Yang 1997), given the 15 possible trees that resolve the bikonts when alveolates, stramenopiles, Rhizaria, and excavates are defined as a multifurcation, under the WAG model with all parameters estimated (12 gamma categories). Based on the substitution rates, expressed as number of substitutions per site, we defined several categories of sites, ranging from the fastest-evolving to slower-evolving sites. Seven different alignments were generated, each with one category plus all faster categories removed (see fig. 2 for details). PHYML and CODEML were executed on the Vital-IT computational facilities at the Swiss Institute of Bioinformatics (http://www.vital-it.ch). The parallel (MPI) version of MrBayes was run on the freely available University of Oslo Bioportal (http://www.bioportal.uio.no).

1924 Burki and Pawlowski
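The site-removal procedure described above can be outlined in a short Python sketch. It is not the authors' pipeline: it takes per-site rates as given (in the study these came from CODEML; parsing that output is not shown), it assumes that a label such as "Without 3.80" means every site with an estimated rate above 3.80 was dropped, and the function names and toy rates are hypothetical. It also checks the count of candidate trees: resolving four groups (alveolates, stramenopiles, Rhizaria, excavates) as a rooted binary tree gives (2n − 3)!! = 5 × 3 × 1 = 15 topologies.

```python
from math import prod
from typing import Dict, Iterator, List, Sequence, Tuple


def n_rooted_binary_trees(n_leaves: int) -> int:
    """Number of fully resolved rooted trees on n labelled leaves: (2n - 3)!!."""
    if n_leaves < 2:
        return 1
    # (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3)
    return prod(range(1, 2 * n_leaves - 2, 2))


def successive_removal(
    site_rates: Sequence[float], thresholds: Sequence[float]
) -> Iterator[Tuple[float, List[int]]]:
    """For each threshold, from highest to lowest, yield the indices of the
    sites whose rate does not exceed it (assumed meaning of "Without t")."""
    for t in sorted(thresholds, reverse=True):
        yield t, [i for i, r in enumerate(site_rates) if r <= t]


def keep_columns(alignment: Dict[str, str], keep: List[int]) -> Dict[str, str]:
    """Return a new alignment restricted to the given column indices."""
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


# Toy example with hypothetical per-site rates (not data from the study).
rates = [0.2, 3.9, 1.1, 3.75, 3.25, 0.5]
thresholds = [3.80, 3.70, 3.60, 3.50, 3.40, 3.30, 3.20]
toy_alignment = {"taxonA": "MKVLAA", "taxonB": "MRVIAG"}

print(n_rooted_binary_trees(4))  # candidate bikont topologies
for t, kept in successive_removal(rates, thresholds):
    print(f"Without {t:.2f}: {len(kept)} sites", keep_columns(toy_alignment, kept))
```

Because every threshold is applied to the full set of rates, the resulting alignments are nested, which matches the description of each alignment having one category plus all faster categories removed.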

Testing Phylogenies

Phylogenetic hypotheses were tested using the approximately unbiased (AU) test (Shimodaira 2002). For each tested tree, site likelihoods were calculated using CODEML, and the AU test was performed using CONSEL (Shimodaira and Hasegawa 2001) with default scaling and replicate values.

Results

Sequences and Alignments

Thirty-seven eukaryotic species representing a broad taxonomic sampling, and for which a large amount of data is available, were selected. From our initial data set, we retained 85 proteins (see Supplementary Table S2, Supplementary Material online) according to the following criteria: 1) at least 19 of the 37 species (>50%) could be retrieved, 2) at least one of the 2 rhizarian species was present, and 3) orthology among all species was unambiguous on the basis of ML trees. To minimize missing data in Rhizaria, sequences were shortened by removing all sites present in neither R. filosa nor B. natans, leading to a final concatenated alignment of 13,258 amino acid positions (CA). Overall, average missing data across the alignment amounted to 21%, ranging from none in Homo sapiens and Drosophila melanogaster (0%) to a maximum in Alexandrium tamarense (79.55%) (for a detailed list see Supplementary Table S1, Supplementary Material online).

We also analyzed a reduced alignment from which genes not found in our R. filosa EST survey were removed, leaving 9,947 amino acid positions (R. filosa no-missing-data alignment, or NMDA). This was done for 2 reasons. First, R. filosa is our organism of main interest, so we wanted an alignment without any missing data for this species. Second, the B. natans EST data set contains many sequences encoding plastid-targeted proteins, mostly of chlorophyte green algal origin but also of streptophyte algal, red algal, or even bacterial origin (Archibald, Rogers, et al. 2003). Although quite a few of these ESTs have already been annotated (Archibald, Rogers, et al. 2003; Rogers et al. 2004), it was crucial to avoid mixing host genes with nonannotated endosymbiont-derived or laterally transferred genes. Based on separate phylogenetic analyses of each selected gene, we were able to discard many questionable B. natans genes (i.e., B. natans genes that suspiciously branched very close to plants), but one might still argue that some genes represented in our CA by B. natans as the only rhizarian species originated through secondary endosymbiosis or lateral gene transfer. Considering the NMDA, in which every B. natans sequence is matched by an orthologous rhizarian sequence from R. filosa, therefore gives higher confidence in our results (see below).

Phylogenetic Position of Rhizaria

The analyses of the CA and the NMDA give trees of generally similar structure (fig. 1 and Supplementary Material online), congruent with global eukaryotic phylogenies inferred in previous EST-based studies (Philippe et al. 2004). In all analyses, 3 major assemblages of species can be distinguished. The first comprises animals, fungi, and Amoebozoa, that is, the "unikonts" of Stechmann and Cavalier-Smith (2003). The second is composed of green plants and rhodophytes, which form a strongly supported grouping of the primary photosynthetic eukaryotes (Rodriguez-Ezpeleta et al. 2005). The third includes all other unicellular "bikonts" (stramenopiles, alveolates, rhizarians, and excavates). These 3 major assemblages are strongly supported in the analysis of the CA, but, with the exception of the MrBayes analysis, support is generally weaker for the NMDA (Supplementary Figs. S1, S3, and S5, Supplementary Material online). Although most eukaryotic supergroups, including Rhizaria, are recovered in all analyses, their relationships are not well resolved. In particular, the assemblage of unicellular bikonts appears as an unresolved radiation of 4 supergroups (fig. 1).

The phylogenetic position of Rhizaria varied depending on both the type of alignment and the method of analysis. In the ML (PHYML) analysis of the CA (Supplementary Fig. S2, Supplementary Material online), Rhizaria branch as sister group to stramenopiles, whereas in the Bayesian analysis (Supplementary Fig. S4, Supplementary Material online), they branch as sister group to excavates. The latter topology was also found in the ML analysis using TREEFINDER, but in this case the ciliates branched between Rhizaria and excavates (not shown). Both ML and Bayesian methods show Rhizaria branching as sister group to excavates in the analysis of the NMDA, but bootstrap support for this and other groupings was rather weak (Supplementary Figs. S3 and S5, Supplementary Material online).

To better examine the position of Rhizaria, we successively removed some fast-evolving lineages that could introduce systematic bias, a known risk in analyses of large-scale data sets (Brinkmann et al. 2005; Jeffroy et al. 2006). In particular, to avoid a long-branch attraction (LBA) artifact (Felsenstein 1978), we reanalyzed our data without excavates or without ciliates, which appeared particularly unstable in our analyses. These changes in species composition affected the rhizarian position differently depending on both the alignment studied and the method used. After removing both Giardia and Trichomonas, or all excavates at once, the topology of the CA tree remained unchanged (see Supplementary Fig. S2, Supplementary Material online), whereas the NMDA topology changed drastically, recovering the relationship between Rhizaria and stramenopiles (data not shown). When ciliates were removed, Rhizaria branched as sister group to excavates in ML analyses of both the CA and the NMDA. Finally, because R. filosa has a slightly longer branch than B. natans (see fig. 1), we tested whether B. natans alone prefers the excavate or the stramenopile position by reconstructing a TREEFINDER tree (not shown). Interestingly, it branched as sister to stramenopiles, which prevents us from ruling out the possibility that the relationship between Rhizaria and excavates is due to the rapid evolutionary rates of foraminifers.

This observed instability could indicate the presence in the data of 2 opposing signals of similar strength (a phylogenetic and a nonphylogenetic signal) that prevent phylogenetic methods from finding the true evolutionary tree (N. Rodriguez-Ezpeleta and H. Philippe, personal communication). One way to eliminate the nonphylogenetic signal and extract the true evolutionary information is to remove potentially saturated fast-evolving sites. To do this, we divided the fastest-evolving amino acid positions of the CA into categories according to their evolutionary rates and inferred ML trees from alignments successively shortened by removing one class of sites at a time. Figure 2 (A, B, C, and D) shows the 4 different topologies we obtained, and figure 2E their occurrence. The relationships clearly depended on both the alignment and the method. PHYML gave a mixture of topologies B and C, whereas TREEFINDER mostly found topology D but also found topology B when the 5 fastest categories were removed. Based on these comparisons, no particular topology can be favored, as no clear pattern appears.

Additionally, to assess the confidence of the topology comparisons, we performed the AU test, which is considered the least biased and most rigorous test available to date (Shimodaira 2002). Specifically, the only 4 different topologies obtained during this study (i.e., the topologies in fig. 2) were tested on the CA, the NMDA, and the 7 alignments resulting from the removal of classes of sites. Rows 2 to 5 of table 1 correspond to the comparison of the 4 trees on the CA and show that no topology can be rejected, although topology D is just above the limit at the 0.05 significance level. For the NMDA, the AU test significantly rejects topologies B and D (rows 6–9 of table 1), keeping only solutions in which Rhizaria are directly related to excavates. For the remaining alignments, all topologies pass the test (no rejection) except topology D, which is either rejected for the shortest alignments or just above the rejection limit.

Monophyly and Evolutionary Relationship of Rhizaria 1925

[Figure 1: ML tree of the 37 taxa (scale bar 0.1). Groups: Excavates (Trichomonas, Giardia, Trypanosoma, Leishmania); Rhizaria (Reticulomyxa, Bigelowiella); Stramenopiles (Thalassiosira, Phaeodactylum, Phytophthora); Alveolates (Theileria, Plasmodium, Cryptosporidium, Toxoplasma, Alexandrium, Paramecium, Tetrahymena); Plants (Oryza, Arabidopsis, Physcomitrella, Rhodophyta); Animals (Mus, Homo, Xenopus, Danio, Strongylocentrotus, Drosophila, Anopheles, Schistosoma, Caenorhabditis); Fungi (Neurospora, Gibberella, Yarrowia, Debaryomyces, Ustilago, Cryptococcus); Amoebozoa (Entamoeba, Dictyostelium).]

FIG. 1.—Consensus ML phylogenetic tree obtained with TREEFINDER from the analysis of the complete data set (CA). One hundred bootstrap replicates were performed (bootstrap support values are shown at nodes), and unresolved nodes correspond to relationships recovered in fewer than 50 replicates.
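The first two gene-selection criteria and the missing-data percentages reported under Sequences and Alignments can be expressed as a short sketch. It is illustrative only: the helper names, the symbols treated as missing ("-", "?", "X"), and the toy sequences are assumptions, and criterion 3 (orthology checked on ML trees) was a judgment step that is not modelled. "Overall" missing data is computed here as missing cells over all cells of the matrix, which is one plausible reading of the 21% average reported in the text.

```python
from typing import Dict, Set

# The two rhizarian species in the data set.
RHIZARIANS = ("Reticulomyxa filosa", "Bigelowiella natans")
MISSING = set("-?X")  # assumed symbols for gaps / unknown residues


def keep_gene(species_present: Set[str], min_species: int = 19) -> bool:
    """Criteria 1 and 2: at least `min_species` of the 37 species retrieved,
    and at least one of the two rhizarians present. Criterion 3 (unambiguous
    orthology judged from ML trees) is a manual step and is not modelled."""
    return len(species_present) >= min_species and any(
        r in species_present for r in RHIZARIANS
    )


def missing_fraction(seq: str) -> float:
    """Fraction of positions in one aligned sequence that are missing."""
    return sum(c in MISSING for c in seq) / len(seq) if seq else 0.0


def overall_missing(alignment: Dict[str, str]) -> float:
    """Missing cells divided by all cells of the alignment matrix."""
    total = sum(len(s) for s in alignment.values())
    missing = sum(sum(c in MISSING for c in s) for s in alignment.values())
    return missing / total if total else 0.0


# Hypothetical toy alignment (taxon names from the study, sequences invented).
toy = {"Homo": "MKVLA", "Alexandrium": "M--?X", "Reticulomyxa": "MK-LA"}
for taxon, seq in toy.items():
    print(f"{taxon}: {missing_fraction(seq):.0%} missing")
print(f"overall: {overall_missing(toy):.1%}")
```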

[Figure 2, panels A–D: schematic trees of the 4 alternative topologies, each including Excavates, Rhizaria, Ciliates, Apicomplexa + dino, Stramenopiles, Plants, Animals, Fungi, and Amoebozoa. As labeled in table 1: A, Rhiz. sister to exc. chromal.; B, Rhizaria sister to stramenopiles; C, Rhizaria sister to excavates; D, Rhizaria sister to ciliates + excavates.]

E
            Complete   No R. filosa  Without  Without  Without  Without  Without  Without  Without
            alignment  missing data  3.80     3.70     3.60     3.50     3.40     3.30     3.20
Aln length  13258      9947          13006    12827    12689    12590    12507    12434    12370
PHYML       B          A             B        C        B        C        B        B        C
TreeFinder  D          A             D        D        D        D        D        B        B
MrBayes     C          A             -        -        -        -        -        C        -

FIG. 2.—Results of the fast-evolving site removal analysis. (A, B, C, and D) The 4 different topologies obtained after successively excluding classes of sites (see Materials and Methods for details). The length of each triangle corresponds to the branch length of the fastest-evolving lineage in that group, and its width is proportional to the number of taxa included in our analyses. (E) Summary of the data sets analyzed, giving for each class (columns) the alignment length in amino acids and the topology obtained with PHYML, TREEFINDER, and MrBayes (rows).
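The statement that PHYML gave a mixture of topologies B and C while TREEFINDER mostly recovered D can be checked by tallying the rows of Fig. 2E. The sketch assumes our reading of the flattened figure rows into nine columns (CA, NMDA, then Without 3.80 through Without 3.20), with "-" assumed to mark data sets that were not analyzed with MrBayes.

```python
from collections import Counter
from typing import Dict

# Rows of Fig. 2E; columns: CA, NMDA, Without 3.80, 3.70, 3.60, 3.50, 3.40,
# 3.30, 3.20. "-" is assumed to mean the data set was not analyzed.
FIG2E: Dict[str, str] = {
    "PHYML": "BABCBCBBC",
    "TREEFINDER": "DADDDDDBB",
    "MrBayes": "CA-----C-",
}


def tally(row: str) -> Counter:
    """Count how often each topology (A-D) was recovered, ignoring '-'."""
    return Counter(t for t in row if t != "-")


for method, row in FIG2E.items():
    print(method, dict(tally(row).most_common()))
```

With these rows, PHYML yields B five times, C three times, and A once; TREEFINDER yields D six times, B twice, and A once.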

Table 1
Likelihood AU Test of Alternative Tree Topologies

Alignment                         Tree topology                              ΔlnL(a)   AU test
Complete (CA)                     Rhiz. sister to exc. chromal. (fig. 2A)       50.5   0.147
Complete (CA)                     Rhiz. sister to stram. (fig. 2B)              14.6   0.412
Complete (CA)                     Rhiz. sister to exc. (fig. 2C)               −14.6   0.802
Complete (CA)                     Rhiz. sister to ciliates + exc. (fig. 2D)     69.9   0.051
No missing data R. filosa (NMDA)  Rhiz. sister to exc. chromal. (fig. 2A)      −15.5   0.702
No missing data R. filosa (NMDA)  Rhiz. sister to stram. (fig. 2B)              63.2   0.028*
No missing data R. filosa (NMDA)  Rhiz. sister to exc. (fig. 2C)                15.5   0.438
No missing data R. filosa (NMDA)  Rhiz. sister to ciliates + exc. (fig. 2D)     71.1   0.044*
Without 3.80                      Rhiz. sister to exc. chromal. (fig. 2A)       42.4   0.183
Without 3.80                      Rhiz. sister to stram. (fig. 2B)              15.8   0.386
Without 3.80                      Rhiz. sister to exc. (fig. 2C)               −15.8   0.816
Without 3.80                      Rhiz. sister to ciliates + exc. (fig. 2D)     71.4   0.052
Without 3.70                      Rhiz. sister to exc. chromal. (fig. 2A)       67.6   0.227
Without 3.70                      Rhiz. sister to stram. (fig. 2B)              18.1   0.360
Without 3.70                      Rhiz. sister to exc. (fig. 2C)               −18.1   0.821
Without 3.70                      Rhiz. sister to ciliates + exc. (fig. 2D)     37.6   0.063
Without 3.60                      Rhiz. sister to exc. chromal. (fig. 2A)       50.7   0.138
Without 3.60                      Rhiz. sister to stram. (fig. 2B)              11.2   0.420
Without 3.60                      Rhiz. sister to exc. (fig. 2C)               −11.2   0.793
Without 3.60                      Rhiz. sister to ciliates + exc. (fig. 2D)     69.2   0.057
Without 3.50                      Rhiz. sister to exc. chromal. (fig. 2A)       52.5   0.117
Without 3.50                      Rhiz. sister to stram. (fig. 2B)              14.7   0.401
Without 3.50                      Rhiz. sister to exc. (fig. 2C)               −14.7   0.805
Without 3.50                      Rhiz. sister to ciliates + exc. (fig. 2D)     74.7   0.036*
Without 3.40                      Rhiz. sister to exc. chromal. (fig. 2A)       57.0   0.093
Without 3.40                      Rhiz. sister to stram. (fig. 2B)              15.5   0.374
Without 3.40                      Rhiz. sister to exc. (fig. 2C)               −15.5   0.818
Without 3.40                      Rhiz. sister to ciliates + exc. (fig. 2D)     76.8   0.040*
Without 3.30                      Rhiz. sister to exc. chromal. (fig. 2A)       58.1   0.104
Without 3.30                      Rhiz. sister to stram. (fig. 2B)              12.5   0.409
Without 3.30                      Rhiz. sister to exc. (fig. 2C)               −12.5   0.775
Without 3.30                      Rhiz. sister to ciliates + exc. (fig. 2D)     80.1   0.022*
Without 3.20                      Rhiz. sister to exc. chromal. (fig. 2A)       60.3   0.095
Without 3.20                      Rhiz. sister to stram. (fig. 2B)              14.5   0.372
Without 3.20                      Rhiz. sister to exc. (fig. 2C)               −14.5   0.794
Without 3.20                      Rhiz. sister to ciliates + exc. (fig. 2D)     84.1   0.023*

NOTE.—Asterisks mark the significant P values (P < 0.05) of rejected topologies. Abbreviations are as follows: Rhiz. = Rhizaria; exc. = excavates; chromal. = chromalveolates; stram. = stramenopiles.
a Log likelihood difference.
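The decision rule applied to table 1 can be made explicit with a short sketch that flags every topology whose AU P value falls below the 0.05 significance level used in the text. The P values are copied from table 1; the sketch only applies the threshold and does not recompute the AU test itself, which requires CONSEL's multiscale bootstrap on site likelihoods. The variable and function names are hypothetical.

```python
from typing import Dict, List

ALPHA = 0.05

# AU-test P values from table 1, topologies A-D (fig. 2A-D) per alignment.
AU_P: Dict[str, Dict[str, float]] = {
    "CA":           {"A": 0.147, "B": 0.412, "C": 0.802, "D": 0.051},
    "NMDA":         {"A": 0.702, "B": 0.028, "C": 0.438, "D": 0.044},
    "Without 3.80": {"A": 0.183, "B": 0.386, "C": 0.816, "D": 0.052},
    "Without 3.70": {"A": 0.227, "B": 0.360, "C": 0.821, "D": 0.063},
    "Without 3.60": {"A": 0.138, "B": 0.420, "C": 0.793, "D": 0.057},
    "Without 3.50": {"A": 0.117, "B": 0.401, "C": 0.805, "D": 0.036},
    "Without 3.40": {"A": 0.093, "B": 0.374, "C": 0.818, "D": 0.040},
    "Without 3.30": {"A": 0.104, "B": 0.409, "C": 0.775, "D": 0.022},
    "Without 3.20": {"A": 0.095, "B": 0.372, "C": 0.794, "D": 0.023},
}


def rejected(p_values: Dict[str, float], alpha: float = ALPHA) -> List[str]:
    """Topologies whose AU P value is below alpha, in alphabetical order."""
    return sorted(t for t, p in p_values.items() if p < alpha)


for aln, pv in AU_P.items():
    print(f"{aln:13s} rejected: {rejected(pv) or 'none'}")
```

Run as is, it reports rejections only for the NMDA (topologies B and D) and for topology D in the four shortest alignments, consistent with the description in the Results.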

Discussion

Our data provide new multigene evidence for the close evolutionary relationship between Foraminifera and Cercozoa. The grouping of the foraminiferan R. filosa with the chlorarachniophyte B. natans receives strong bootstrap support in all our analyses, and these 2 species branch together in every topology we obtained (fig. 2). A relationship between these 2 phyla was previously suggested by analyses of actin (Keeling 2001; Flakowski et al. 2005), polyubiquitin (Archibald, Longet, et al. 2003; Bass et al. 2005), RNA polymerase (Longet et al. 2003), and the SSU rRNA gene (Berney and Pawlowski 2003; Cavalier-Smith and Chao 2003). With more than 80 analyzed genes, our study strongly confirms these single-gene analyses, providing compelling evidence for the monophyly of Rhizaria. However, as this supergroup is very heterogeneous (Adl et al. 2005), the phylogenetic position of other putative rhizarians, especially the polycystine and acantharian radiolarians, still needs to be confirmed with multigene data.

Although the monophyly of Rhizaria (Cercozoa + Foraminifera) is established by our data, their phylogenetic position in the eukaryotic tree remains uncertain. Our analyses yielded two competing hypotheses on the relationships between Rhizaria and other eukaryotes, preventing an unequivocal conclusion. According to the first hypothesis, Rhizaria are sister group to excavates. Several lines of evidence support this hypothesis: 1) all phylogenetic reconstruction methods used in this study recover this association when the alignment with no missing data for R. filosa (NMDA) is analyzed; 2) if ciliates are removed from the taxon sampling, this grouping is also recovered with the complete alignment; 3) topology comparisons never reject trees in which Rhizaria are specifically related to excavates, and these are always the best trees examined. Finally, this relationship was previously suggested on the basis of secondary symbioses with green algae in some excavates (Euglena) and some rhizarians (chlorarachniophytes), a scenario known as the cabozoan hypothesis (Cavalier-Smith 1999).

More unexpected is the second hypothesis, which places Rhizaria as sister group to stramenopiles. This grouping is recovered by many of the ML analyses (fig. 2), and none of these trees can be statistically rejected (table 1). Moreover, Rhizaria also branch with stramenopiles when fast-evolving excavate sequences are removed, as well as when the less divergent B. natans sequence is kept in isolation. If this configuration is confirmed by additional evidence, such as discrete characters or phylogenomic analyses of other, less rapidly evolving rhizarians, it would have important implications for the chromalveolate hypothesis (Harper et al. 2005). This hypothesis is based, among other things, on a specific model of plastid evolution in which both stramenopiles and alveolates (with the exception of ciliates) have a plastid derived from a single endosymbiotic event with a red alga in their common ancestor (Cavalier-Smith 1999; Harper et al. 2005). A sister relationship between Rhizaria and stramenopiles would complicate this picture, implying either that stramenopiles acquired their secondary plastid in an independent endosymbiotic event, or that the single engulfment of a red alga occurred very early in chromalveolate evolution and the resulting plastid was secondarily lost in certain lineages, such as ciliates and Rhizaria. Although such a scenario is certainly less parsimonious than the chromalveolate or cabozoan hypotheses, neither of those is actually strongly supported by multigene data.

The uncertainty concerning the phylogenetic position of Rhizaria reflects the general difficulty of resolving the phylogeny in this part of the eukaryotic tree. Except for plants, whose position seems well established, the relationships among all other groups of bikonts remained unresolved. This is not surprising, given that even analyses of larger data sets, with more than 100 proteins, failed to resolve the phylogeny of bikonts properly (Bapteste et al. 2002). For example, chromalveolates were strongly supported in multigene phylogenies only when no other unicellular bikonts were present in the analyses (Rodriguez-Ezpeleta et al. 2005), and other phylogenetic analyses provided only mixed support for this plastid-based view of eukaryotic relationships (Yoon et al. 2002, 2004). Despite this lack of clear support, the union of chromalveolate taxa has been potentially confirmed by a gene replacement in which the cytosolic GAPDH gene was duplicated and retargeted to the plastid uniquely in these taxa (Fast et al. 2001; Harper and Keeling 2003). Nevertheless, none of these studies was directly concerned with the overall phylogeny of bikonts, which resulted in relatively limited taxon sampling of unicellular bikonts and a lack of detailed analysis of their relationships. By adding Rhizaria and all available sequence data on stramenopiles, alveolates, and excavates, we included in our analyses all major bikont phyla except haptophytes, cryptophytes, and centrohelids. However, even with such exhaustive sampling, we were unable to resolve the relationships among these taxa.

The obvious question is why multigene analyses cannot reliably resolve the phylogeny of unicellular bikonts. It has been proposed that the lack of resolution observed in other EST-based phylogenies is due to mutational saturation, phylogenetic incongruence, or rapid diversification (Philippe et al. 2004). Indeed, single-gene phylogenies have shown that some excavates (Philippe et al. 2000), foraminifers (Pawlowski et al. 1996), and ciliates (Philippe and Adoutte 1998) can evolve exceptionally rapidly, and it cannot be excluded that large parts of these genomes show accelerated rates of evolution. In our trees, this is particularly well illustrated by the ciliates (Tetrahymena + Paramecium). Although several lines of evidence indicate that ciliates share a common ancestor with apicomplexans and dinoflagellates (Cavalier-Smith 1993; Fast et al. 2002; Leander and Keeling 2003, 2004), in our analyses they often branch as sister group to excavates (fig. 2D); this branching, however, is either rejected by the AU test or falls just above the rejection limit (table 1), suggesting an artifactual position.

The accelerated rates of evolution in some unicellular bikonts, which potentially erase the phylogenetic signal, are probably the main source of problems when inferring their evolutionary relationships. However, other possible causes cannot be discarded. One of them could be a rapid diversification of eukaryotes, as suggested by some authors (Cavalier-Smith 2002). Indeed, the lack of resolution in early animal phylogeny compared with the well-resolved phylogeny of fungi (also observed in our data; see fig. 1) has been interpreted as indirect evidence for the Cambrian explosion (Rokas et al. 2005). However, it is not clear why such rapid diversification would occur in the unicellular bikonts but not in other eukaryotes. Alternatively, the position of the root of the eukaryotic tree between unikonts and bikonts, which rests mainly on a single gene fusion (Stechmann and Cavalier-Smith 2002), may be incorrect. Some authors suggest that the root could instead lie on the branch leading to opisthokonts or to the common ancestor of diplomonads and parabasalids (Arisue et al. 2005). If so, the unicellular bikonts would be paraphyletic, and their phylogeny would be particularly difficult to resolve.

To conclude, resolving the phylogeny of bikonts will probably require several additional efforts. As illustrated by our study, the addition of new higher-level taxa, such as Rhizaria, is not sufficient, although it may help to solidify relationships within particular supergroups. It is doubtful whether better resolution can be achieved solely by increasing the number of analyzed genes (more EST or whole-genome data). In fact, the analysis of selected slowly evolving genes may be more informative than the analysis of large databases, as has been shown in the case of chromalveolates (Harper et al. 2005). Searching for new genomic signatures may also be an essential complement to multigene analyses. Finally, proper rooting of the eukaryotic tree will be crucial for an accurate interpretation of the relationships between unicellular bikonts and for a better understanding of the deep phylogeny of eukaryotes.

Supplementary Material

Supplementary Tables S1 and S2 and Figures S1, S2, S3, S4, and S5 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors would like to thank José Fahrni, Jackie Guiard, Ignacio Bolivar, and Sergey Nikolaev for technical assistance and constructive discussions; Juan Montoya and Naiara Rodriguez-Ezpeleta for useful suggestions; Bruno Nyffeler and Jacques Rougemont for help with the Vital-IT server; Kamran Shalchian-Tabrizi for help with the Bioportal server; and Vassilios Ioannidis for help with programming. This research was supported by Swiss National Science Foundation grants 3100A0-100415 and 3100A0-112645 (J.P.).

Literature Cited

Abascal F, Zardoya R, Posada D. 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21:2104–5.
Adl SM, Simpson AGB, Farmer MA, et al. (28 co-authors). 2005. The new higher level classification of eukaryotes with emphasis on the taxonomy of protists. J Eukaryot Microbiol 52:399–451.
Archibald JM, Longet D, Pawlowski J, Keeling PJ. 2003. A novel polyubiquitin structure in Cercozoa and Foraminifera: evidence for a new eukaryotic supergroup. Mol Biol Evol 20:62–6.
Archibald JM, Rogers MB, Toop M, Ishida KI, Keeling PJ. 2003. Lateral gene transfer and the evolution of plastid-targeted proteins in the secondary plastid-containing alga Bigelowiella natans. Proc Natl Acad Sci USA 100:7678–83.
Arisue N, Hasegawa M, Hashimoto T. 2005. Root of the Eukaryota tree as inferred from combined maximum likelihood analyses of multiple molecular sequence data. Mol Biol Evol 22:409–20.
Baldauf SL, Palmer JD. 1993. Animals and fungi are each other's closest relatives: congruent evidence from multiple proteins. Proc Natl Acad Sci USA 90:11558–62.
Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972–7.
Bapteste E, Brinkmann H, Lee JA, et al. (11 co-authors). 2002. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc Natl Acad Sci USA 99:1414–9.
Bass D, Moreira D, López-García P, Polet S, Chao E, von der Heyden S, Pawlowski J, Cavalier-Smith T. 2005. Polyubiquitin insertions and the phylogeny of Cercozoa and Rhizaria. Protist 156:149–61.
Berney C, Pawlowski J. 2003. Revised small subunit rRNA analysis provides further evidence that Foraminifera are related to Cercozoa. J Mol Evol 57:S120–7.
Bhattacharya D, Helmchen T, Melkonian M. 1995. Molecular evolutionary analyses of nuclear-encoded small subunit ribosomal RNA identify an independent Rhizopod lineage containing the Euglyphida and the Chlorarachniophyta. J Eukaryot Microbiol 42:65–9.
Brinkmann H, van der Giezen M, Zhou Y, Poncelin de Raucourt G, Philippe H. 2005. An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics. Syst Biol 54:743–57.
Burki F, Berney C, Pawlowski J. 2002. Phylogenetic position of Gromia oviformis Dujardin inferred from nuclear-encoded small subunit ribosomal DNA. Protist 153:251–60.
Burki F, Nikolaev S, Bolivar I, Guiard J, Pawlowski J. 2006. Analysis of expressed sequence tags (ESTs) from a naked foraminiferan Reticulomyxa filosa. Genome. Forthcoming.
Castresana J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17:540–52.
Cavalier-Smith T. 1993. Kingdom protozoa and its 18 phyla. Microbiol Rev 57:953–94.
Cavalier-Smith T. 1998. A revised six-kingdom system of life. Biol Rev Camb Philos Soc 73:203–66.
Cavalier-Smith T. 1999. Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J Eukaryot Microbiol 46:347–66.
Cavalier-Smith T. 2002. The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int J Syst Evol Microbiol 52:297–354.
Cavalier-Smith T, Chao EE. 1997. Sarcomonad ribosomal RNA sequences, rhizopod phylogeny and the origin of euglyphid amoebae. Arch Protistenkd 147:227–36.
Cavalier-Smith T, Chao EE. 2003. Phylogeny and classification of phylum Cercozoa (Protozoa). Protist 154:341–58.
de Koning A, Tartar A, Boucias D, Keeling P. 2005. Expressed sequence tag (EST) survey of the highly adapted green algal parasite, Helicosporidium. Protist 156:181–90.
Fast NM, Kissinger JC, Roos DS, Keeling PJ. 2001. Nuclear-encoded, plastid-targeted genes suggest a single common origin for apicomplexan and dinoflagellate plastids. Mol Biol Evol 18:418–26.
Fast NM, Xue L, Bingham S, Keeling P. 2002. Re-examining alveolate evolution using multiple protein molecular phylogenies. J Eukaryot Microbiol 49:30–7.
Felsenstein J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 27:401–10.
Felsenstein J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–91.
Felsenstein J. 2004. PHYLIP (phylogeny inference package). Version 3.6. Seattle, WA: University of Washington.
Flakowski J, Bolivar I, Fahrni J, Pawlowski J. 2005. Actin phylogeny of Foraminifera. J Foraminifer Res 35:93–102.
Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696–704.
Hall T. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–8.
Hampl V, Horner DS, Dyal P, Kulda J, Flegr J, Foster PG, Embley TM. 2005. Inference of the phylogenetic position of oxymonads based on nine genes: support for Metamonada and Excavata. Mol Biol Evol 22:2508–18.
Harper JT, Keeling PJ. 2003. Nucleus-encoded, plastid-targeted glyceraldehyde-3-phosphate dehydrogenase (GAPDH) indicates a single origin for the chromalveolate plastids. Mol Biol Evol 20:1730–5.
Harper JT, Waanders E, Keeling PJ. 2005. On the monophyly of chromalveolates using a six-protein phylogeny of eukaryotes. Int J Syst Evol Microbiol 55:487–96.
Jeffroy O, Brinkmann H, Delsuc F, Philippe H. 2006. Phylogenomics: the beginning of incongruence? Trends Genet 22:225–31.
Jobb G, von Haeseler A, Strimmer K. 2004. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol 4:18.
Keeling PJ. 2001. Foraminifera and Cercozoa are related in actin phylogeny: two orphans find a home? Mol Biol Evol 18:1551–7.
Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, Pearlman RE, Roger AJ, Gray MW. 2005. The tree of eukaryotes. Trends Ecol Evol 20:670–6.
Keeling P, Palmer J. 2001. Lateral transfer at the gene and subgenic levels in the evolution of eukaryotic enolase. Proc Natl Acad Sci USA 98:10745–50.
Kumar S, Rzhetsky A. 1996. Evolutionary relationships of eukaryotic kingdoms. J Mol Evol 42:183–93.
Leander B, Keeling P. 2003. Morphostasis in alveolate evolution. Trends Ecol Evol 18:498–504.
Leander B, Keeling P. 2004. Early evolutionary history of dinoflagellates and apicomplexans (Alveolata) as inferred from HSP90 and actin phylogenies. J Phycol 40:341–50.
Longet D, Archibald JM, Keeling PJ, Pawlowski J. 2003. Foraminifera and Cercozoa share a common origin according to RNA polymerase II phylogenies. Int J Syst Evol Microbiol 53:1735–9.
Longet D, Burki F, Flakowski J, Berney C, Polet S, Fahrni J, Pawlowski J. 2004. Multigene evidence for close evolutionary relations between Gromia and Foraminifera. Acta Protozool 43:303–11.
Moreira D, Le Guyader H, Philippe H. 1999. Unusually high evolutionary rate of the elongation factor 1 alpha genes from the Ciliophora and its impact on the phylogeny of eukaryotes. Mol Biol Evol 16:234–45.
Morin L. 2000. Long branch attraction effects and the status of "basal eukaryotes": phylogeny and structural analysis of the ribosomal RNA gene cluster of the free-living diplomonad Trepomonas agilis. J Eukaryot Microbiol 47:167–77.
Nikolaev SI, Berney C, Fahrni JF, Bolivar I, Polet S, Mylnikov AP, Aleshin VV, Petrov NB, Pawlowski J. 2004. The twilight of Heliozoa and rise of Rhizaria, an emerging supergroup of amoeboid eukaryotes. Proc Natl Acad Sci USA 101:8066–71.
Pawlowski J, Bolivar I, Fahrni J, Cavalier-Smith T, Gouy M. 1996. Early origin of foraminifera suggested by SSU rRNA gene sequences. Mol Biol Evol 13:445–50.
Philippe H. 2000. Opinion: long branch attraction and protist phylogeny. Protist 151:307–16.
Philippe H, Adoutte A. 1998. The molecular phylogeny of Eukaryota: solid facts and uncertainties. In: Coombs G, Vickerman K, Sleigh M, Warren A, editors. Evolutionary relationships among Protozoa. London: Chapman and Hall. p 25–56.
Philippe H, Germot A. 2000. Phylogeny of eukaryotes based on ribosomal RNA: long-branch attraction and models of sequence evolution. Mol Biol Evol 17:830–4.
Philippe H, Lartillot N, Brinkmann H. 2005. Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa, and Protostomia. Mol Biol Evol 22:1246–53.
Philippe H, Lopez P, Brinkmann H, Budin K, Germot A, Laurent J, Moreira D, Muller M, Le Guyader H. 2000. Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Proc R Soc Lond B Biol Sci 267:1213–21.
Philippe H, Snell EA, Bapteste E, Lopez P, Holland PWH, Casane D. 2004. Phylogenomics of eukaryotes: impact of missing data on large alignments. Mol Biol Evol 21:1740–52.
Polet S, Berney C, Fahrni J, Pawlowski J. 2004. Small-subunit ribosomal RNA gene sequences of Phaeodarea challenge the monophyly of Haeckel's Radiolaria. Protist 155:53–63.
Rogers MB, Archibald JM, Field M, Li C, Striepen B, Keeling PJ. 2004. Plastid-targeting peptides from the chlorarachniophyte Bigelowiella natans. J Eukaryot Microbiol 51:529–35.
Rokas A, Kruger D, Carroll SB. 2005. Animal evolution and the molecular signature of radiations compressed in time. Science 310:1933–8.
Ronquist F, Huelsenbeck JP. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–4.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–7.
Simpson AGB, Inagaki Y, Roger AJ. 2006. Comprehensive multigene phylogenies of excavate protists reveal the evolutionary positions of "primitive" eukaryotes. Mol Biol Evol 23:615–25.
Simpson AGB, Radek R, Dacks J, O'Kelly C. 2002. How oxymonads lost their groove: an ultrastructural comparison of Monocercomonoides and excavate taxa. J Eukaryot Microbiol 49:239–48.
Simpson AGB, Roger AJ, Silberman JD, Leipe DD, Edgcomb VP, Jermiin LS, Patterson DJ, Sogin ML. 2002. Evolutionary history of "early-diverging" eukaryotes: the excavate taxon Carpediemonas is a close relative of Giardia. Mol Biol Evol 19:1782–91.
Sogin M. 1991. Early evolution and the origin of eukaryotes. Curr Opin Genet Dev 1:457–63.
Sogin ML, Gunderson J, Elwood H, Alonso R, Peattie D. 1989. Phylogenetic meaning of the kingdom concept: an unusual ribosomal RNA from Giardia lamblia. Science 243:75–7.
Sogin ML, Silberman JD. 1998. Evolution of the protists and protistan parasites from the perspective of molecular systematics. Int J Parasitol 28:11–20.
Stechmann A, Cavalier-Smith T. 2002. Rooting the eukaryote tree by using a derived gene fusion. Science 297:89–91.
Stechmann A, Cavalier-Smith T. 2003. The root of the eukaryote tree pinpointed. Curr Biol 13:R665–6.
Steenkamp ET, Wright J, Baldauf SL. 2006. The protistan origins of animals and fungi. Mol Biol Evol 23:93–106.
Stiller JW, Hall BD. 1999. Long-branch attraction and the rDNA model of early eukaryotic evolution. Mol Biol Evol 16:1270–9.
Thompson J, Higgins D, Gibson T. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–80.
Yamamoto A, Hashimoto T, Asaga E, Hasegawa M, Goto N. 1997. Phylogenetic position of the mitochondrion-lacking protozoan Trichomonas tenax, based on amino acid sequences of elongation factors 1α and 2. J Mol Evol 44:98–105.
Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13:555–6.
Yoon HS, Hackett JD, Pinto G, Bhattacharya D. 2002. The single, ancient origin of chromist plastids. Proc Natl Acad Sci USA
99:15507–12. Posada D, Buckley T. 2004. Model selection and model averaging Yoon HS, Hackett JD, Ciniglia C, Pinto G, Bhattacharya D. 2004. in phylogenetics: advantages of Akaike information criterion A molecular timeline for the origin of photosynthetic eukar- and Bayesian approaches over likelihood ratio tests. Syst Biol yotes. Mol Biol Evol 21:809–18. 53:793–808. Rodriguez-Ezpeleta N, Brinkmann H, Burey SC, Roure B, Burger G, Loffelhardt W, Bohnert HJ, Philippe H, Lang BF. 2005. Martin Embley, Associate Editor Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr Biol 15:1325–30. Accepted July 4, 2006

Phylogenomics Reshuffles the Eukaryotic Supergroups

Fabien Burki1*, Kamran Shalchian-Tabrizi3, Marianne Minge3, Åsmund Skjæveland3, Sergey I. Nikolaev2, Kjetill S. Jakobsen3, Jan Pawlowski1

1 Department of Zoology and Animal Biology, University of Geneva, Geneva, Switzerland, 2 Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland, 3 Department of Biology, University of Oslo, Oslo, Norway

Background. Resolving the phylogenetic relationships between eukaryotes is an ongoing challenge of evolutionary biology. In recent years, the accumulation of molecular data has led to a new evolutionary understanding, in which all eukaryotic diversity has been classified into five or six supergroups. Yet the composition of these large assemblages and their relationships remain controversial.

Methodology/Principal Findings. Here, we report the sequencing of expressed sequence tags (ESTs) for two species belonging to the supergroup Rhizaria and present the analysis of a unique dataset combining 29,908 amino acid positions with an extensive taxon sampling of 49 mainly unicellular species representative of all supergroups. Our results show a very robust relationship between Rhizaria and two main clades of the supergroup chromalveolates: stramenopiles and alveolates. We confirm the existence of consistent affinities between assemblages that were thought to belong to different supergroups of eukaryotes, and thus not to share a close evolutionary history.

Conclusions. This well-supported phylogeny has important consequences for our understanding of the evolutionary history of eukaryotes. In particular, it questions a single red algal origin of the chlorophyll-c containing plastids among the chromalveolates. We propose the abbreviated name 'SAR' (Stramenopiles+Alveolates+Rhizaria) for this new super assemblage of eukaryotes, which comprises the largest diversity of unicellular eukaryotes.

Citation: Burki F, Shalchian-Tabrizi K, Minge M, Skjæveland Å, Nikolaev SI, et al. (2007) Phylogenomics Reshuffles the Eukaryotic Supergroups. PLoS ONE 2(8): e790. doi:10.1371/journal.pone.0000790

Academic Editor: Geraldine Butler, University College Dublin, Ireland
Received June 17, 2007; Accepted July 26, 2007; Published August 29, 2007
Copyright: © 2007 Burki et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This research was supported by the Swiss National Science Foundation grants 3100A0-100415 and 3100A0-112645 (JP), and by research grant no. 118894/431 from the Norwegian Research Council (KSJ).
Competing Interests: The authors have declared that no competing interests exist.
* To whom correspondence should be addressed. E-mail: Fabien.Burki@zoo.unige.ch

PLoS ONE | www.plosone.org 1 August 2007 | Issue 8 | e790 Reshuffling the Supergroups

INTRODUCTION
A well-resolved phylogenetic tree describing the relationships among all organisms is one of the most important challenges of modern evolutionary biology. A current hypothesis for the tree of eukaryotes proposes that all diversity can be classified into five or six putative very large assemblages, the so-called 'supergroups' (reviewed in [1] and [2]). These comprise the 'Opisthokonta' and 'Amoebozoa' (often united in the 'Unikonts'), 'Archaeplastida' or 'Plantae', 'Excavata', 'Chromalveolata', and 'Rhizaria'. However, the supergroup concept as a whole has been shown to be only moderately supported [3], and the evolutionary links among these groups are yet to be confirmed. These uncertainties may be due to the limited amount of data available for most parts of eukaryotic diversity. In particular, only a small fraction of unicellular eukaryote diversity [4] has been the subject of molecular studies. This has led to important imbalances in phylogenies and has prevented researchers from reliably inferring deep evolutionary relationships.

The supergroup Rhizaria [5] is particularly interesting for testing different possible scenarios of eukaryote evolution. This assemblage has only recently been described and is based exclusively on molecular data; nevertheless, it is very well supported in most phylogenies [3]. It includes very diverse organisms such as filose testate amoebae, cercomonads, chlorarachniophytes (together, core Cercozoa), foraminifers, plasmodiophorids, haplosporidians, gromiids, and radiolarians (see [2] for an overview, or [6–11]). In contrast to Rhizaria, the monophyly of Chromalveolata is far from undisputed (see [12], or [3,13–15]). Chromalveolates were originally defined by their plastid of red algal origin which, when present, is believed to have arisen from a single secondary endosymbiosis [16]. This supergroup encompasses many ecologically important photosynthetic protists, including coccolithophorids (belonging to the haptophytes), cryptophytes, diatoms, brown seaweeds (together, the chromists) and dinoflagellates (which, together with ciliates and apicomplexans, form the alveolates) [17,18].

Using a phylogenomic approach, we recently confirmed the monophyly of Rhizaria and addressed the question of its evolutionary history [19]. The analyses of 85 concatenated nuclear protein sequences led to two potential affiliations with other eukaryotes. According to the first hypothesis, Rhizaria was the sister group to an excavate clade defined by G. lamblia, T. vaginalis, and Euglenozoa. The second hypothesis suggested that Rhizaria are closely related to stramenopiles, which together with alveolates, haptophytes, and cryptophytes form the supergroup of chromalveolates. Apart from our study, the branching pattern between Rhizaria and other supergroups has been specifically evaluated only by Hackett et al. (2007), who reported a robust relationship between Rhizaria and members of the chromalveolates.

Here, we further address the phylogenetic position of Rhizaria within the eukaryotic tree using an extensive multigene approach. For this purpose, we carried out two expressed sequence tag (EST) surveys of rhizarian species. The first is an undetermined foraminiferan species belonging to the genus Quinqueloculina (574 unique sequences, Accession Numbers: EV435154-EV435825). The second is Gymnophrys cometa (Cienkowski, 1876) (628 unique sequences, Accession Numbers: EV434532-EV435153), a freshwater protist that has been shown to be part of core Cercozoa [20]. Using novel EST datasets for two rhizarians [21,22] and data from publicly available protists (TBestDB; http://tbestdb.bcm.umontreal.ca/searches/login.php), we constructed a taxonomically broad dataset of 123 protein alignments amounting to nearly 30,000 unambiguously aligned amino acid positions. Our superalignment includes several representatives of all described eukaryotic supergroups. Our results show an unambiguous relationship between Rhizaria and stramenopiles. This confirms the hypothesis we had previously proposed and suggests the emergence of a new super assemblage of eukaryotes that we propose to name 'SAR' (stramenopiles+alveolates+Rhizaria).

RESULTS
Single-gene analyses and concatenation
We selected 49 eukaryotic species, representative of all five current supergroups, for which large amounts of data are available. We identified 123 genes (see Table S1) that fulfilled the following criteria: 1) at least one of the four rhizarian species, as well as at least one member each of unikonts, plants, excavates, alveolates, and stramenopiles, was present in every single-gene alignment; 2) orthology in every gene was unambiguous on the basis of single-gene bootstrapped maximum likelihood (ML) trees. This second criterion is particularly important in multigene analyses, in order to avoid mixing distant paralogs in concatenated alignments. Such paralogs would dilute the true phylogenetic signal with strong conflicting signals, thus preventing the recovery of deep relationships [23]. Similarly, it is essential to detect and discard putative candidates for endosymbiotic gene transfer (EGT) or horizontal gene transfer (HGT). Hence, we submitted each of our single-gene alignments to ML reconstructions with bootstrap replicates. We then systematically removed sequences that displayed ambiguous phylogenetic positions, whether due to paralogy or to gene transfer. For example, we found a few cases where B. natans and G. theta sequences actually corresponded to genes encoded in the nucleomorph genome of these species. This restrictive procedure yielded a set of 123 single-gene alignments. Each contained at least one rhizarian species and only orthologous sequences, with virtually no gene transferred either from a plastid or from a foreign source.

One possible approach to analysing such a dataset is to build a supermatrix formed by the concatenation of individual genes (for a review see [23]). After concatenation, our final alignment contained 29,908 unambiguously aligned amino acid positions. Overall, we observed an average of 39% missing data, but the missing sites were not uniformly distributed across taxa (see Tables S2 and S3 for more details). However, several studies have demonstrated that a dataset retains its phylogenetic power as long as a large number of positions are still present in the analysis [24–27]. For example, Wiens [26,27] demonstrated that including highly incomplete taxa (with up to 90% missing data) in model-based phylogenies, such as likelihood or Bayesian analyses, could dramatically increase accuracy.

Phylogenetic position of Rhizaria
The ML and Bayesian trees inferred from the complete alignment (Figure 1; see also Figures S1 and S2) recover a number of groups observed previously. In most aspects, they are congruent with recently published global eukaryotic phylogenies [14,28,29]. A monophyletic group uniting Metazoa, Fungi, and Amoebozoa (altogether the unikonts) was robustly supported (100% bootstrap support, BP; 1.0 Bayesian posterior probability, BiPP). Green plants, glaucophytes, and rhodophytes came together, albeit only weakly supported (56% BP; this node was not recovered in the Bayesian analysis, see Figure S2). The group composed of haptophytes and cryptophytes, and the excavates (without Malawimonas, which failed to branch consistently with the other excavate species), received only moderate support for their union in the ML inference (68% and 61% BP, respectively) but 1.0 BiPP. Finally, alveolates, stramenopiles, and Rhizaria each formed monophyletic groups with 100% BP and 1.0 BiPP. Although most of the recognized eukaryotic supergroups are recovered in our analyses, the relationships among them are generally not well resolved. There are two notable exceptions. The first is the union of the unikonts. The second, much more interesting, is the strongly supported (BP = 100%; BiPP = 1.0) assemblage of stramenopiles, Rhizaria, and alveolates (clade SAR), within which stramenopiles and Rhizaria are robustly clustered together (BP = 88%; BiPP = 1.0) (clade SR). Comparisons of substitution rates between the different lineages were not significant at the 1.25% level. This indicates that all species evolve at very similar rates, which makes a long-branch artefact unlikely (data not shown).

To further test this unexpected nested position of Rhizaria between alveolates and stramenopiles, we compared different topologies with the approximately unbiased (AU) test, which is considered the least biased and most rigorous test available to date [30]. More precisely, we evaluated two questions: 1) Are Rhizaria indeed monophyletic with stramenopiles and alveolates? 2) Are Rhizaria specifically related to stramenopiles, to the exclusion of alveolates? We first considered an alternative topology: the best topology with Rhizaria forced not to share a common ancestor with the assemblage composed of stramenopiles and alveolates (Figure S3; Table 1B). Its likelihood was significantly lower than that of the best ML tree obtained without constraint (Figure 1; Table 1A) at the 0.05 significance level (P = 4e-008). On the other hand, the two other possible positions for Rhizaria within the SAR grouping (Table 1D, E) could not be significantly rejected (P = 0.112 and P = 0.079, respectively). A specific relationship between Rhizaria and alveolates, or an early divergence of Rhizaria, therefore cannot be excluded. In addition, we tested the relationship between Rhizaria and excavates by evaluating all possible trees in which these two groups are monophyletic. None of these trees could be retained in the pool of plausible candidates (data not shown).

DISCUSSION
In this study we present the largest dataset currently available for eukaryote phylogeny, combining an extensive taxon sampling with a large number of amino acid positions. Our analyses of this unique dataset provide strong evidence for the assemblage of Rhizaria, stramenopiles and alveolates. We therefore propose to label this monophyletic clade 'SAR'. This grouping was only weakly suggested in our previous multigene analysis [19]; using a much larger dataset, we show here that it is in fact very robust. We confirm the existence of consistent affinities between assemblages that were thought to belong to different supergroups of eukaryotes, and thus not to share a close evolutionary history. We added about 20 relevant taxa of unicellular eukaryotes and more than 30 genes (for a total of 123 genes). These additions seem to have stabilized the topology, which now consistently displays a monophyletic SAR. Within this newly emerged assemblage, Rhizaria appear to be more closely related to stramenopiles than to alveolates, but topology comparisons failed to discard the alternative possibilities (i.e. R(SA) or S(RA)). In addition, we clearly reject the putative relationship between Rhizaria and excavates [16,19], which has already been convincingly tested in [31].

Interestingly, an association between Rhizaria and stramenopiles could already be observed in 18S rRNA trees representing a very large diversity of eukaryotes (see for example [32–34]). More recently, the analysis of 16 protein sequences from 46 taxa also showed a robust clade consisting of Rhizaria, alveolates, and stramenopiles [29]. However, this work significantly differs from ours


by rejecting the association of Rhizaria as sister to stramenopiles or as sister to all chromalveolates. Beyond the difference in dataset size, it is unclear why our data display more flexibility regarding the position of Rhizaria within the SAR clade. More comprehensive taxon sampling for both Rhizaria and stramenopiles, particularly of early-diverging species (e.g. radiolarians), is likely to shed light on the internal order of divergence within SAR.

These new relationships suggest that the supergroup 'Chromalveolata', as originally defined [16], does not correctly explain the evolutionary history of organisms bearing plastids derived from a red alga. In fact, our results confirm the lack of support that chromalveolates as a whole (i.e. including haptophytes and cryptophytes) received in several studies [3]. The position of the monophyletic group haptophytes+cryptophytes within the eukaryotic tree is uncertain [13]. As a whole, chromalveolates have been strongly supported by phylogenies of plastid genes and by unique gene replacements in these taxa [35–37]. However, the monophyly of all their members has never been robustly recovered with nuclear loci, even with more than 18,000 amino acids (Patron et al. 2007). Overall, the unresolved nodes between the chromalveolate lineages have prevented clear conclusions about this model of evolution [3,15].

Figure 1. Best maximum likelihood tree of eukaryotes found using TREEFINDER, with 10 starting trees obtained with the global tree searching procedure. Numbers at nodes represent the results of the bootstrap analysis (underlined numbers; 100 bootstrap pseudoreplicates were performed) and Bayesian posterior probabilities. Black dots represent 100% bootstrap support (BP) and Bayesian posterior probabilities (BiPP) of 1.0. Nodes without numbers correspond to supports weaker than 50% BP and 0.8 BiPP.
doi:10.1371/journal.pone.0000790.g001
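The support values in Figure 1 come from nonparametric bootstrapping with 100 pseudoreplicates. The sketch below illustrates the two steps involved. It assumes the alignment is held as a Python dict of equal-length sequences. First, whole alignment columns are resampled with replacement to build each pseudoreplicate. Second, a clade is scored by the percentage of replicate trees that contain it. The function names and toy data are hypothetical. Inferring a tree from each replicate, done with TREEFINDER or RAxML in the study, is deliberately left out.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Build one nonparametric bootstrap pseudoreplicate.

    `alignment` maps taxon names to aligned sequences of equal length.
    The same randomly drawn column indices are applied to every taxon,
    so whole alignment columns are resampled with replacement.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def clade_support(replicate_clades, clade):
    """Percentage of replicate trees that contain `clade`.

    `replicate_clades` holds, for each replicate tree, the set of its clades
    written as frozensets of taxon names. Inferring those trees is outside
    the scope of this sketch.
    """
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


if __name__ == "__main__":
    rng = random.Random(42)
    toy_alignment = {  # hypothetical toy alignment, not data from the study
        "Bigelowiella": "MKVLAG",
        "Phytophthora": "MKILAG",
        "Toxoplasma": "MRVLSG",
    }
    print(bootstrap_replicate(toy_alignment, rng))

    # Hypothetical clade sets recovered from four replicate trees
    sr = frozenset({"Rhizaria", "stramenopiles"})
    ra = frozenset({"Rhizaria", "alveolates"})
    replicate_trees = [{sr}, {sr}, {ra}, {sr}]
    print(clade_support(replicate_trees, {"Rhizaria", "stramenopiles"}))  # 75.0
```

Each pseudoreplicate keeps the original number of columns but draws them with replacement, so some sites appear several times and others not at all. A clade backed by many independent sites tends to survive this perturbation, and a high BP value expresses exactly that.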


The emergence of SAR may complicate the picture of secondary endosymbioses. It also calls into question the most parsimonious explanation for the evolution of chlorophyll-c containing plastids (see also [19,29,38,39]). At this stage, at least two scenarios are conceivable. Neither can presently be favoured, because competing topologies leave the position of the haptophyte and cryptophyte clade uncertain. First, a single engulfment of a red alga might have occurred at a very early stage of chromalveolate evolution, and the resulting plastid was secondarily lost in certain lineages, such as ciliates and Rhizaria. Second, stramenopiles (or alveolates, or even haptophytes+cryptophytes, depending on their real position within the tree) may have acquired their secondary plastid in an independent endosymbiosis event from a red algal organism. If this latter scenario is correct, minimizing the number of endosymbiosis events, as proposed by the chromalveolate hypothesis, might not reflect the true history of symbiogenesis. So far, as many as 11 primary, secondary, and tertiary symbiotic events have been identified (see [12]). Notably, two independent secondary endosymbiosis events involving green algae have been recognized in members of excavates and Rhizaria: Euglenozoa and chlorarachniophytes [31], respectively. Hence, multiple secondary endosymbioses might explain the phylogenetic relationships within eukaryotes better than the chromalveolate hypothesis does.

The new SAR supergroup implies that the major part of protist diversity shares a common ancestor. Indeed, chromalveolate members alone already account for about half of the recognized species of protists and algae [40]. With the addition of rhizarians, a huge variety of organisms with very different ecologies and morphologies are now united within a single monophyletic clade. Finding a synapomorphy that supports the unification of these groups will be the next challenging step in establishing eukaryote phylogeny.

MATERIALS AND METHODS
Sampling, culture and construction of cDNA libraries
The miliolids of the genus Quinqueloculina were collected at Le Boucanet, near La Grande Motte (Camargue, France). They were sorted, picked, and cleaned by hand under a dissecting microscope. The culture of G. cometa was obtained from the culture collection of IBIW RAS (Russia) and maintained as described in [20]. Cells were collected by low-speed centrifugation, resuspended in five volumes of TriReagent (Invitrogen, Carlsbad, Calif.), and broken using manual pestles adapted to microtubes. Total RNA and cDNA were prepared as in [21]. EST sequencing of the Quinqueloculina sp. library was performed with the ABI-PRISM Big Dye Terminator Cycle Sequencing Kit and analysed with an ABI-3100 DNA Sequencer (Perkin-Elmer Inc., Wellesley, Mass.), all according to the manufacturer's instructions. The G. cometa library was sequenced by Agencourt Bioscience Corporation (Beverly, Mass.).

Construction of the alignments
We performed TblastN searches against GenBank. As queries, we used a rhizarian dataset made of all translated sequences for R. filosa, Quinqueloculina sp., G. cometa, and B. natans (translations were done with transeq, available at the University of Oslo Bioportal; http://www.bioportal.uio.no). We retrieved and translated all sequences with an e-value cutoff of 10^-40, accounting for 46 new genes out of a total of 126. The remaining 80 genes corresponded to rhizarian proteins putatively homologous to sequences previously used to infer large-scale phylogenies [41], available at http://megasun.bch.umontreal.ca/Software/scafos/scafos_download.html. To roughly check for orthology, we also added to these alignments the human sequence with the lowest e-value in our TblastN output, to make sure that no closer homologs were known. These 126 genes were used to build a very well-sampled dataset by adding all available relevant species. For this purpose, we considered all species in TBestDB as well as all other bikont taxa for which sufficient sequence data were available. From these we built a local database, against which we ran TblastN searches with our rhizarian dataset (e-value threshold 10^-40).

To decide on the final set of genes used in this study, we carefully tested the orthology of each of the 126 selected genes. For each gene, we carried out maximum likelihood (ML) analyses, including bootstrap support, with the program TREEFINDER (JTT, 4 gamma categories and 100 bootstrap replicates) [42]. For three genes, orthology could not be assessed with enough confidence, and these genes were removed. More generally, taxa displaying suspicious phylogenetic positions were removed from the single-gene datasets. Once this pre-screen was complete, our final sampling comprised 49 species and 123 genes (Table S1). We concatenated all single-gene alignments into a supermatrix using Scafos [43]. Some groups had limited data. For these, and to maximize the number of genes per taxonomic assemblage, some lineages were represented by different closely related species, always belonging to the same genus (for details see Tables S2 and S3).

Phylogenomic analyses
The concatenated alignment was first analyzed in the maximum likelihood (ML) framework implemented in TREEFINDER, with the global tree searching procedure (10 starting trees) [42]. To double-check our topologies, we also ran RAxML (RAxML-VI-HPC-2.2.3) [44], using randomized maximum parsimony (MP) starting trees in multiple inferences and the rapid hill-climbing algorithm. Following the Akaike Information Criterion (AIC) [45] computed with ProtTest 1.3 [46], we chose the RtREV+G+F model, which allows between-site rate variation (calculations were done with 6 gamma categories). The WAG model was also tested and gave the same topologies. To estimate the robustness of the phylogenetic inference, we used the bootstrap method [47] with 100 pseudoreplicates in all analyses.

Bayesian analysis using the WAG+G+F model (4 gamma categories) was performed with the parallel version of MrBayes 3.1.2 [48]. The inference started from a random tree and used four Metropolis-coupled Markov chain Monte Carlo (MCMCMC) chains. It consisted of 1,000,000 generations, with sampling every 100 generations. The average standard deviation of split frequencies was used to assess the convergence of the two runs. Bayesian posterior probabilities were calculated from the majority-rule consensus of the trees sampled after the initial burn-in period. The burn-in was determined by checking the convergence of likelihood values across MCMCMC generations (corresponding to 50,000 generations, depending on the analysis).

The evolutionary rates of the selected species were calculated with the relative-rate test as implemented in RRTree [49]. We made pairwise comparisons of two ingroups belonging to SAR, haptophytes+cryptophytes, excavates or plants, with the unikonts taken as outgroup.

Tree topology tests
To better assess the phylogenetic position of Rhizaria, we conducted topology comparisons using the approximately unbiased (AU) test [30]. For each tested tree, site likelihoods were calculated using CODEML, and the AU test was performed using CONSEL [50] with default scaling and replicate values. To test the monophyly of the new assemblage SAR, we first compared our tree (Figure 1) to the best possible tree in which Rhizaria were forced to be outside SAR (Figure S3). The topological constraints for this tree corresponded to a trichotomy of unikonts, stramenopiles+alveolates, and the rest of the groups represented as a multifurcation. Secondly, we evaluated the placement of Rhizaria within the SAR clade by testing the three possible branching patterns between Rhizaria, stramenopiles, and alveolates.

SUPPORTING INFORMATION
Figure S1 Best RAxML tree of eukaryotes. Numbers at nodes represent the results of the bootstrap analysis; black dots mean values of 100% (100 bootstrap replicates were performed). Nodes with support under 65% were collapsed.
Found at: doi:10.1371/journal.pone.0000790.s001 (3.29 MB TIF)
Figure S2 MrBayes tree. Numbers at nodes represent the Bayesian posterior probabilities.
Found at: doi:10.1371/journal.pone.0000790.s002 (3.37 MB TIF)
Figure S3 Best TREEFINDER tree in which Rhizaria were forced not to belong to SAR.
Found at: doi:10.1371/journal.pone.0000790.s003 (3.34 MB TIF)
Table S1 Abbreviated and complete protein names.
Found at: doi:10.1371/journal.pone.0000790.s004 (0.05 MB XLS)
Table S2 OTU (Operational Taxonomic Unit) names, number of characters, and percentage of characters included in the final alignment.
Found at: doi:10.1371/journal.pone.0000790.s005 (0.02 MB XLS)
Table S3 Percentage of missing data per species and per gene.
Found at: doi:10.1371/journal.pone.0000790.s006 (0.06 MB XLS)

ACKNOWLEDGMENTS
The authors would like to thank José Fahrni and Jackie Guiard for technical assistance, Juan Montoya for useful suggestions and constructive discussions, and Jacques Rougemont for help with the Vital-IT server. Analyses were done at the University of Oslo Bioportal (http://www.bioportal.uio.no) and at the Vital-IT computational facilities of the Swiss Institute of Bioinformatics (http://www.vital-it.ch).

Author Contributions
Conceived and designed the experiments: FB JP. Performed the experiments: FB MM. Analyzed the data: KS FB MM JP. Contributed reagents/materials/analysis tools: SN AS. Wrote the paper: KS FB KJ JP.

Table 1. Likelihood AU Tests of Alternative Tree Topologies.
Tree topology | Description | AUᵃ | Δln Lᵇ
A | Fig. 1 | 1.0 | -369.2
B | Fig. S3 | 4e-008* | 369.2
C | A(RS) | 0.895 | -27.4
D | R(SA) | 0.112 | 69.4
E | S(RA) | 0.079 | 77.5
A, B) Comparison between topology A (best tree, corresponding to Figure 1) and the alternative topology B (the best tree when Rhizaria are forced not to be monophyletic with S and A, Figure S3).
C, D, E) Comparisons between topology C (best tree) and the alternative topologies D and E.
Abbreviations: A = alveolates; S = stramenopiles; R = Rhizaria.
*Significant P value, corresponding to the rejected topology.
ᵃApproximately unbiased test.
ᵇLog-likelihood difference.
doi:10.1371/journal.pone.0000790.t001

REFERENCES
1. Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, et al. (2005) The tree of eukaryotes. Trends Ecol Evol 20: 670–676.
2. Adl SM, Simpson AGB, Farmer MA, Andersen RA, Anderson OR, et al. (2005) The New Higher Level Classification of Eukaryotes with Emphasis on the Taxonomy of Protists. J Eukaryot Microbiol 52: 399–451.
3. Parfrey LW, Barbero E, Lasser E, Dunthorn M, Bhattacharya D, et al. (2006) Evaluating Support for the Current Classification of Eukaryotic Diversity. PLoS Genet 2: e220.
4. Patterson DJ (1999) The diversity of eukaryotes. Am Nat 154: S96–S124.
5. Cavalier-Smith T (2002) The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int J Syst Evol Microbiol 52: 297–354.
6. Bhattacharya D, Helmchen T, Melkonian M (1995) Molecular evolutionary analyses of nuclear-encoded small subunit ribosomal RNA identify an independent Rhizopod lineage containing the Euglyphida and the Chlorarachniophyta. J Eukaryot Microbiol 42: 65–69.
7. Burki F, Berney C, Pawlowski J (2002) Phylogenetic Position of Gromia oviformis Dujardin inferred from Nuclear-Encoded Small Subunit Ribosomal DNA. Protist 153: 251–260.
8. Cavalier-Smith T (1998) A revised six-kingdom system of life. Biol Rev Camb Philos Soc 73: 203–266.
9. Keeling PJ (2001) Foraminifera and Cercozoa Are Related in Actin Phylogeny: Two Orphans Find a Home? Mol Biol Evol 18: 1551–1557.
10. Longet D, Archibald JM, Keeling PJ, Pawlowski J (2003) Foraminifera and Cercozoa share a common origin according to RNA polymerase II phylogenies. Int J Syst Evol Microbiol 53: 1735–1739.
11. Nikolaev SI, Berney C, Fahrni JF, Bolivar I, Polet S, et al. (2004) The twilight of Heliozoa and rise of Rhizaria, an emerging supergroup of amoeboid eukaryotes. Proc Natl Acad Sci USA 101: 8066–8071.
12. Bodyl A (2005) Do plastid-related characters support the chromalveolate hypothesis? J Phycol 41: 712–719.
13. Harper JT, Waanders E, Keeling PJ (2005) On the monophyly of chromalveolates using a six-protein phylogeny of eukaryotes. Int J Syst Evol Microbiol 55: 487–496.
14. Patron NJ, Inagaki Y, Keeling PJ (2007) Multiple gene phylogenies support the monophyly of cryptomonad and haptophyte host lineages. Curr Biol 17: 887–891.
15. Li S, Nosenko T, Hackett JD, Bhattacharya D (2006) Phylogenomic Analysis Identifies Red Algal Genes of Endosymbiotic Origin in the Chromalveolates. Mol Biol Evol 23: 663–674.
16. Cavalier-Smith T (1999) Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J Eukaryot Microbiol 46: 347–366.
17. Cavalier-Smith T, Chao E (2003) Phylogeny and classification of phylum Cercozoa (Protozoa). Protist 154: 341–358.
18. Keeling PJ (2004) Diversity and evolutionary history of plastids and their hosts. Am J Bot 91: 1481–1493.
19. Burki F, Pawlowski J (2006) Monophyly of Rhizaria and Multigene Phylogeny of Unicellular Bikonts. Mol Biol Evol 23: 1922–1930.
20. Nikolaev SI, Berney C, Fahrni J, Mylnikov AP, Aleshin VV, et al. (2003) Gymnophrys cometa and Lecythium sp. are core Cercozoa: evolutionary implications. Acta Protozool 42: 183–190.
21. Burki F, Nikolaev S, Bolivar I, Guiard J, Pawlowski J (2006) Analysis of expressed sequence tags (ESTs) from a naked foraminiferan Reticulomyxa filosa. Genome 49: 882–887.
22. Keeling P, Palmer J (2001) Lateral transfer at the gene and subgenic levels in the evolution of eukaryotic enolase. Proc Natl Acad Sci USA 98: 10745–10750.
23. Delsuc F, Brinkmann H, Philippe H (2005) Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet 6: 361–375.
24. McMahon MM, Sanderson MJ (2006) Phylogenetic Supermatrix Analysis of GenBank Sequences from 2228 Papilionoid Legumes. Systematic Biol 55: 818–836.
25. Philippe H, Snell EA, Bapteste E, Lopez P, Holland PWH, et al. (2004) Phylogenomics of Eukaryotes: Impact of Missing Data on Large Alignments. Mol Biol Evol 21: 1740–1752.
26. Wiens JJ (2006) Missing data and the design of phylogenetic analyses. J Biomed Inform 39: 34–42.
27. Wiens JJ (2005) Can Incomplete Taxa Rescue Phylogenetic Analyses from Long-Branch Attraction? Systematic Biol 54: 73–742.

PLoS ONE | www.plosone.org 5 August 2007 | Issue 8 | e790 Reshuffling the Supergroups

28. Nozaki H, Iseki M, Hasegawa M, Misawa K, Nakada T, et al. (2007) Phylogeny of Primary Photosynthetic Eukaryotes as Deduced from Slowly Evolving Nuclear Genes. Mol Biol Evol.
29. Hackett JD, Yoon HS, Li S, Reyes-Prieto A, Rummele SE, et al. (2007) Phylogenomic Analysis Supports the Monophyly of Cryptophytes and Haptophytes and the Association of ‘Rhizaria’ with Chromalveolates. Mol Biol Evol. pp msm089.
30. Shimodaira H (2002) An approximately unbiased test of phylogenetic tree selection. Systematic Biol 51: 492–508.
31. Rogers MB, Gilson PR, Su V, McFadden GI, Keeling PJ (2007) The Complete Chloroplast Genome of the Chlorarachniophyte Bigelowiella natans: Evidence for Independent Origins of Chlorarachniophyte and Euglenid Secondary Endosymbionts. Mol Biol Evol 24: 54–62.
32. Polet S, Berney C, Fahrni J, Pawlowski J (2004) Small-subunit ribosomal RNA gene sequences of Phaeodarea challenge the monophyly of Haeckel’s Radiolaria. Protist 155: 53–63.
33. Shalchian-Tabrizi K, Eikrem W, Klaveness D, Vaulot D, Minge MA, et al. (2006) Telonemia, a new protist phylum with affinity to chromist lineages. Proc R Soc Lond 273: 1833–1842.
34. Shalchian-Tabrizi K, Kauserud H, Massana R, Klaveness D, Jakobsen KS (2007) Analysis of Environmental 18S Ribosomal RNA Sequences reveals Unknown Diversity of the Cosmopolitan Phylum Telonemia. Protist 158: 173–180.
35. Fast NM, Kissinger JC, Roos DS, Keeling PJ (2001) Nuclear-Encoded, Plastid-Targeted Genes Suggest a Single Common Origin for Apicomplexan and Dinoflagellate Plastids. Mol Biol Evol 18: 418–426.
36. Harper J, Keeling P (2003) Nucleus-Encoded, Plastid-Targeted Glyceraldehyde-3-Phosphate Dehydrogenase (GAPDH) Indicates a Single Origin for Chromalveolate Plastids. Mol Biol Evol 20: 1730–1735.
37. Patron NJ, Rogers MB, Keeling PJ (2004) Gene Replacement of Fructose-1,6-Bisphosphate Aldolase Supports the Hypothesis of a Single Photosynthetic Ancestor of Chromalveolates. Eukaryotic Cell 3: 1169–1175.
38. Bachvaroff TR, Sanchez Puerta MV, Delwiche CF (2005) Chlorophyll c-Containing Plastid Relationships Based on Analyses of a Multigene Data Set with All Four Chromalveolate Lineages. Mol Biol Evol 22: 1772–1782.
39. Shalchian-Tabrizi K, Skanseng M, Ronquist F, Klaveness D, Bachvaroff TR, et al. (2006) Heterotachy Processes in Rhodophyte-Derived Secondhand Plastid Genes: Implications for Addressing the Origin and Evolution of Dinoflagellate Plastids. Mol Biol Evol 23: 1504–1515.
40. Cavalier-Smith T (2004) Chromalveolate diversity and cell megaevolution: interplay of membranes, genomes and cytoskeleton. In Organelles, Genomes and Eukaryotic Evolution; Hirt R, Horner DS, eds. London: Taylor and Francis, 32.
41. Rodriguez-Ezpeleta N, Brinkmann H, Burey SC, Roure B, Burger G, et al. (2005) Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr Biol 15: 1325–1330.
42. Jobb G, von Haeseler A, Strimmer K (2004) TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol 4: 18.
43. Roure B, Rodriguez-Ezpeleta N, Philippe H (2007) SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol Biol 7 Suppl 1: S2.
44. Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.
45. Posada D, Buckley T (2004) Model Selection and Model Averaging in Phylogenetics: Advantages of Akaike Information Criterion and Bayesian Approaches Over Likelihood Ratio Tests. Systematic Biol 53: 793–808.
46. Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21: 2104–2105.
47. Felsenstein J (1985) Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783–791.
48. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.
49. Robinson-Rechavi M, Huchon D (2000) RRTree: Relative-Rate Tests between groups of sequences on a phylogenetic tree. Bioinformatics 16: 296–297.
50. Shimodaira H, Hasegawa M (2001) CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17: 1246–1247.

Biol. Lett. (2008) 4, 366–369
doi:10.1098/rsbl.2008.0224
Published online 3 June 2008
Evolutionary biology

Phylogenomics reveals a new ‘megagroup’ including most photosynthetic eukaryotes

Fabien Burki1,*, Kamran Shalchian-Tabrizi2 and Jan Pawlowski1
1Department of Zoology and Animal Biology, University of Geneva, 1211 Geneva 4, Switzerland
2Microbial Evolution Research Group, Department of Biology, University of Oslo, 1066 Blindern, 0316 Oslo, Norway
*Author for correspondence ([email protected]).

Advances in molecular phylogeny of eukaryotes have suggested a tree composed of a small number of supergroups. Phylogenomics recently established the relationships between some of these large assemblages, yet the deepest nodes are still unresolved. Here, we investigate early evolution among the major eukaryotic supergroups using the broadest multigene dataset to date (65 species, 135 genes). Our analyses provide strong support for the clustering of plants, chromalveolates, rhizarians, haptophytes and cryptomonads, thus linking nearly all photosynthetic lineages and raising the question of a possible unique origin of plastids. At its deepest level, the tree of eukaryotes now receives strong support for two monophyletic megagroups comprising most of the eukaryotic diversity.

Keywords: eukaryote evolution; deep phylogeny; phylogenomics; endosymbiosis; root; megagroup

Abbreviations: HC, monophyletic grouping of haptophytes and cryptomonads; SAR, monophyletic grouping of stramenopiles, alveolates and Rhizaria

Electronic supplementary material is available at http://dx.doi.org/10.1098/rsbl.2008.0224 or via http://journals.royalsociety.org.

1. INTRODUCTION
Resolving the global tree of eukaryotes is one of the most important goals in evolutionary biology. Molecular phylogenies, morphology and biochemical characteristics have allowed the division of the majority of eukaryotic diversity into five or six putative supergroups (reviewed in Keeling et al. (2005) and Lane & Archibald (2008)); these comprise the opisthokonts and Amoebozoa (united as ‘unikonts’; Cavalier-Smith 2002), Plantae (or Archaeplastida), Excavata, Chromalveolata and Rhizaria (often considered as members of the so-called ‘bikonts’; Stechmann & Cavalier-Smith (2003a)). Recent phylogenomic reconstructions based on large sequence datasets have been used to infer the relationships between some of these large assemblages, and notably Rhizaria have been shown to share a common origin with members of the chromalveolates (Burki et al. 2007; Hackett et al. 2007; Rodríguez-Ezpeleta et al. 2007a). However, the order of divergence among the deepest nodes remains uncertain, particularly the relationships between plants, chromalveolates and other photosynthetic lineages (haptophytes and cryptomonads). To investigate early evolution among eukaryotic supergroups, we have assembled the broadest dataset to date (65 species, 135 genes representing 31 921 amino acids) and show that the eukaryotes can be divided into two highly supported monophyletic megagroups and a few less diversified lineages related to the excavates.

2. MATERIAL AND METHODS
Our multigene dataset was assembled according to a custom pipeline, as follows: (i) construction of databases made of all existing sequences for species specifically selected for their broad taxonomic distribution and availability of genomic sequences (downloaded from http://www.ncbi.nlm.nih.gov/ and http://amoebidia.bcm.umontreal.ca/pepdb/searches/welcome.php), (ii) BLAST searches against these databases using as queries the single-gene sequences composing our previously described multiple alignments (Burki et al. 2007), (iii) retrieval (with a stringent e-value cut-off at 10⁻⁵⁰) and addition of the new homologous copies to the existing single-gene alignments, (iv) automatic alignment using MAFFT (Katoh et al. 2002), followed by manual inspection to extract unambiguously aligned positions, (v) testing of orthology, in particular possible lateral or endosymbiotic gene transfer, for each of the selected genes by performing single-gene maximum-likelihood (ML) reconstructions using TREEFINDER with the Whelan and Goldman model (WAG, four gamma categories; Jobb et al. 2004), and (vi) final concatenation of all single-gene alignments using SCaFoS (Roure et al. 2007). Owing to the limited data for certain groups and to maximize the number of genes per taxonomic assemblage, some lineages were represented by different closely related species always belonging to the same genus (electronic supplementary material). Potentially interesting species with full genomes available, such as the excavates Giardia and Trichomonas or the red alga Cyanidioschyzon, were excluded from our taxon sampling owing to their extreme rate of sequence evolution or their demonstrated tendency to lead to systematic errors in phylogenies (Rodríguez-Ezpeleta et al. 2007b).
The concatenated alignment was analysed using both bayesian (BI) and ML frameworks, with PHYLOBAYES v. 2.3 (Lartillot & Philippe 2004) and RAxML-VI-HPC v. 2.2.3 (Stamatakis 2006), respectively. PHYLOBAYES was run using the site-heterogeneous mixture CAT model and two independent Markov chains with a total length of 10 000 cycles, discarding the first 4000 points as burn-in and calculating the posterior consensus on the remaining 6000 trees. Convergence between the two chains was checked, and both chains always recovered exactly the same tree, except for uncertainty in the order of divergence among the glaucophytes, the red algae and haptophytes+cryptomonads (HC). To reduce mixing problems of the chains, the constant sites were removed from the alignment in a subsequent analysis. In this case convergence was much quicker, after only 5000 cycles (burn-in of 1000), and HC was unambiguously positioned as sister to the Plantae. RAxML was used in combination with the WAG amino acid replacement matrix and stationary amino acid frequencies estimated from the dataset. The best ML tree was determined with the PROTMIX implementation, from multiple inferences using 20 randomized maximum parsimony (MP) starting trees. Statistical support was evaluated with 100 bootstrap replicates. Two independent runs were performed on each replicate, using a different starting tree (MP and the best ML tree), to prevent the analysis from getting trapped in a local maximum. The tree with the best log likelihood was selected for each replicate, and the 100 resulting trees were used to calculate the bootstrap proportions. To reduce the computational burden, the PROTMIX solution was chosen with 25 distinct rate categories. To minimize potential systematic errors associated with saturation and homoplasy, the fast-evolving sites were identified using PAML (Yang 1997), given the 20 topologies obtained in the ML analysis. Sites were classified according to their mean site-wise rates, and ML bootstrap values were computed from shorter concatenated alignments with the sites corresponding to categories 7 and 6+7 removed.

Received 21 April 2008. Accepted 2 May 2008. This journal is © 2008 The Royal Society.
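Both support measures described in the Methods are frequencies:

- A posterior probability (PP) is the fraction of trees that contain a group, counted over the trees sampled by a Markov chain after the burn-in.
- A bootstrap proportion (BS) is the fraction of trees that contain the group, counted over trees inferred from alignments whose columns were resampled with replacement.

The Python sketch below assumes each tree has already been reduced to the set of taxon groups it contains. For unrooted trees, each split could, for example, be stored as the side that excludes a fixed reference taxon. The function names and the toy data are hypothetical, and this is not PHYLOBAYES or RAxML code. It also computes the frequency of a single group rather than building a full consensus tree.

```python
import random
from typing import Dict, FrozenSet, Sequence, Set

Clade = FrozenSet[str]  # a group of taxon names
Tree = Set[Clade]       # a tree reduced to the groups it contains (assumption)


def clade_frequency(trees: Sequence[Tree], clade: Clade) -> float:
    """Fraction of the supplied trees that contain `clade`."""
    if not trees:
        raise ValueError("no trees supplied")
    return sum(clade in tree for tree in trees) / len(trees)


def posterior_probability(chain: Sequence[Tree], clade: Clade, burn_in: int) -> float:
    """Discard the first `burn_in` sampled trees, then count the clade in the rest."""
    return clade_frequency(chain[burn_in:], clade)


def bootstrap_replicate(matrix: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Nonparametric bootstrap: draw alignment columns with replacement.

    The same column indices are used for every taxon, so rows stay aligned.
    """
    width = len(next(iter(matrix.values())))
    columns = [rng.randrange(width) for _ in range(width)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in matrix.items()}


if __name__ == "__main__":
    group = frozenset({"taxon_a", "taxon_b"})
    toy_chain = [set(), set(), {group}, {group}, set(), {group}]
    print(posterior_probability(toy_chain, group, burn_in=2))  # 0.75 (3 of 4 trees)
```

With the settings reported in the Methods, `chain` would hold the 10 000 trees sampled by one run and `burn_in` would be 4000. For bootstrap support, `clade_frequency` would be applied to the 100 trees inferred from 100 resampled alignments.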

[Figure 1: unrooted Bayesian phylogeny of 65 eukaryotes; labelled groups: alveolates, stramenopiles, Rhizaria, plants, haptophytes, cryptomonads, Amoebozoa, fungi, animals, excavates; node 1 (plants+HC+SAR) 1.0/1.0 PP, 73//94/97 BS; scale bar 0.1 substitutions per site.]

Figure 1. Bayesian unrooted phylogeny of eukaryotes, with a basal trichotomy representing uncertainties in the relationships between the three groups. The tree was obtained from the consensus between two independent Markov chains, run under the CAT model implemented in PHYLOBAYES. The species colour code corresponds to the type of plastid pigments, as follows: purple, chlorophyll a; green, chlorophyll a+b; and red, chlorophyll a+c. The asterisks (*, **, ***) indicate primary, secondary or tertiary endosymbiosis, respectively. Underlined numbers at nodes represent the PP of the analysis performed with the constant sites removed/analysis performed with all sites; other numbers represent the result of the ML bootstrap analysis (BS). For node 1, the values below the line are: ML analysis of the full-length alignment//ML analysis with category 7 removed/ML analysis with categories 6+7 removed. Black dots correspond to 1.0 PP and 100% BS; black squares correspond to 1.0 PP and the specified values of BS. The scale bar represents the estimated number of amino acid substitutions per site.
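Both site-exclusion analyses summarized in figure 1 amount to deleting alignment columns before re-analysis. In one, the constant sites are removed for the Bayesian run; in the other, the fastest-evolving rate categories (7, or 6+7) are removed for ML. The Python sketch below assumes per-site rates have already been estimated (the paper used PAML for this step). It bins those rates into equal-sized, rank-ordered categories, with the highest number being the fastest. That binning rule, the number of categories and all function names are assumptions for illustration, not the authors' procedure. A column counts as constant here when its non-missing residues show at most one state.

```python
from typing import Dict, Iterable, List, Sequence

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def rank_categories(rates: Sequence[float], n_categories: int) -> List[int]:
    """Hypothetical binning: rank sites by rate and split them into equal-sized
    groups labelled 1 (slowest) to n_categories (fastest)."""
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    categories = [0] * len(rates)
    for rank, site in enumerate(order):
        categories[site] = 1 + rank * n_categories // len(rates)
    return categories


def keep_columns(matrix: Alignment, keep: List[int]) -> Alignment:
    """Build a shorter alignment from the listed column indices."""
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in matrix.items()}


def remove_categories(matrix: Alignment, categories: Sequence[int],
                      drop: Iterable[int]) -> Alignment:
    """Delete every column whose rate category is in `drop` (e.g. {7} or {6, 7})."""
    width = len(next(iter(matrix.values()), ""))
    if len(categories) != width:
        raise ValueError("need exactly one rate category per alignment column")
    drop_set = set(drop)
    return keep_columns(matrix, [i for i, c in enumerate(categories) if c not in drop_set])


def remove_constant_sites(matrix: Alignment, missing: str = "-?X") -> Alignment:
    """Delete columns whose non-missing residues show at most one state."""
    rows = list(matrix.values())
    width = len(rows[0]) if rows else 0
    variable = [i for i in range(width)
                if len({row[i] for row in rows if row[i] not in missing}) > 1]
    return keep_columns(matrix, variable)
```

The shorter alignments returned by `remove_categories` and `remove_constant_sites` would then be re-analysed exactly like the full-length matrix.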

3. RESULTS
We first performed a bayesian analysis on a species-rich dataset, using the powerful CAT model that has been developed to overcome systematic errors due to homoplasy (Lartillot & Philippe 2004; Lartillot et al. 2007; figure 1). The tree obtained is in agreement with previously published studies. It strongly supports monophyletic groupings of unikonts (Amoebozoa, fungi and animals), excavates, plants, stramenopiles+alveolates+Rhizaria (SAR) and HC. This latter group appears as sister to plants, with 1.0 Bayesian posterior probability (PP) when the constant sites were removed and 0.92 PP with the full-length alignment. Remarkably, the plants+HC clade forms a strongly supported monophyletic megagroup with the SAR assemblage (1.0 PP, node 1). This reveals an ancient split in eukaryote evolution and almost entirely resolves the relationships within most ‘bikont’ supergroups.
This new megagroup received relatively low support (73% bootstrap support, BS) in the ML analysis of the complete dataset (figure 1). However, because we are investigating relationships deriving from very ancient splits in the eukaryotic tree, multiple substitutions have probably occurred at several sites in our alignment. Such substitutions decrease the true phylogenetic signal and make standard site-homogeneous models based on empirical matrices of amino acid replacement (such as WAG) less accurate. To test this, we investigated the effect of excluding the fastest-evolving sites, which are more likely to be saturated and thus to cause model violations (Rodríguez-Ezpeleta et al. 2007b). Not surprisingly, the removal of the noisiest positions led to a drastic increase in the statistical support for the new megagroup (94 and 97% BS when categories 7 and 6+7 were removed, respectively; figure 1).

4. DISCUSSION
At its deepest level, the tree of eukaryotes presented here displays only three stems: the two highly supported megagroups, enclosing the vast majority of eukaryotic species, and the excavates. The monophyly of excavates still needs further confirmation, and so does their possible sister position to the new megagroup. If strong support is found for both, we may well be able to provide independent evidence, based on phylogenetic reconstructions, for the concept of the two primary clades of eukaryotes, unikonts and bikonts (Stechmann & Cavalier-Smith 2003b; Richards & Cavalier-Smith 2005). This model, however, would need to be modified, as the widely used dihydrofolate reductase-thymidylate synthase gene fusion is questionable for several reasons (see discussion in Kim et al. 2006). Of course, this does not rule out the possibility that some protists might represent additional independent lineages. Examples are Telonemia and the centrohelid heliozoans, which have not yet been placed with confidence (Shalchian-Tabrizi et al. 2006; Sakaguchi et al. 2007). Overall, however, we believe that most eukaryotes fall into one of these megagroups.
As we get closer to a fully resolved phylogeny for the eukaryotes, an obvious question of crucial importance is the position of the root. We chose, however, to show an unrooted tree, as the absence of compelling information leaves the rooting of the eukaryotic tree an open question. Over the past few years, independent data have proposed a root lying either between unikonts and bikonts (Stechmann & Cavalier-Smith 2003b) or within excavates. Proposed roots within excavates lie, for example, basal to jakobids (Rodríguez-Ezpeleta et al. 2007a) or on the branch leading to diplomonads/parabasalids (Arisue et al. 2005). There is no evidence for rooting the eukaryotes within the plants+HC+SAR megagroup. Taken together with our tree, the plausible rooting scenarios therefore consistently suggest that this assemblage is holophyletic.
Our results bring convincing support for the clustering of almost all photosynthetic groups in a unique clade. The notable exception is the second-hand green plastids of the Euglenozoa, which belong to the excavates. The results also support a single primary endosymbiotic event, as suggested by gene-based models of the import machinery (McFadden & van Dooren 2004). The strongest scenario to date for the evolution of primary plastid-containing species is that a unique endosymbiosis involving a cyanobacterium took place in the last common ancestor of Plantae (see Bhattacharya et al. 2007). The trees presented here allow the possibility that the primary plastid was established even earlier, in one of the ancestors of the new megagroup. In that case, it was subsequently lost and independently replaced by plastids of secondary origin in several lineages (HC, Rhizaria, alveolates and stramenopiles). This would corroborate the hypothesis of an early chloroplast acquisition in eukaryotes, based on the phylogeny of the 6-phosphogluconate dehydrogenase gene (Andersson & Roger 2002; see also Nosaki (2005) for a more general discussion). We speculate that the high observable diversity of plastids within the new megagroup can be traced back to its last common ancestor. In our view, it reflects an increased capability of all its members to accept and keep plastids or plastid-bearing cells.

The authors would like to thank the Vital-it server (www.vital-it.ch) and the Bioportal platform (www.bioportal.uio.no) for allowing us to perform phylogenomic analyses. They are also very grateful to the Canadian consortium PEP, which has made several EST projects publicly available through its database (http://tbestdb.bcm.umontreal.ca/searches/welcome.php). This research was supported by the Swiss National Science Foundation grant 3100A0-112645 (J.P.).

REFERENCES
Andersson, J. O. & Roger, A. J. 2002 A cyanobacterial gene in nonphotosynthetic protists—an early chloroplast acquisition in eukaryotes? Curr. Biol. 12, 115–119. (doi:10.1016/S0960-9822(01)00649-2)
Arisue, N., Hasegawa, M. & Hashimoto, T. 2005 Root of the eukaryota tree as inferred from combined maximum likelihood analyses of multiple molecular sequence data. Mol. Biol. Evol. 22, 409–420. (doi:10.1093/molbev/msi023)
Bhattacharya, D., Archibald, J. M., Weber, A. P. & Reyes-Prieto, A. 2007 How do endosymbionts become organelles? Understanding early events in plastid evolution. Bioessays 29, 1239–1246. (doi:10.1002/bies.20671)
Burki, F., Shalchian-Tabrizi, K., Minge, M., Skjaeveland, A., Nikolaev, S. I., Jakobsen, K. S. & Pawlowski, J. 2007 Phylogenomics reshuffles the eukaryotic supergroups. PLoS One 2, e790. (doi:10.1371/journal.pone.0000790)
Cavalier-Smith, T. 2002 The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int. J. Syst. Evol. Microbiol. 52, 297–354.
Hackett, J., Yoon, H., Li, S., Reyes-Prieto, A., Rümmele, S. & Bhattacharya, D. 2007 Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of rhizaria with chromalveolates. Mol. Biol. Evol. 24, 1702–1713. (doi:10.1093/molbev/msm089)


Jobb, G., von Haeseler, A. & Strimmer, K. 2004 TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol. Biol. 4, 18. (doi:10.1186/1471-2148-4-18)
Katoh, K., Misawa, K., Kuma, K. & Miyata, T. 2002 MAFFT version 5.25: multiple sequence alignment program. Nucl. Acids Res. 30, 3059–3066. (doi:10.1093/nar/gkf436)
Keeling, P. J., Burger, G., Durnford, D. G., Lang, B. F., Lee, R. W., Pearlman, R. E., Roger, A. J. & Gray, M. W. 2005 The tree of eukaryotes. Trends Ecol. Evol. 20, 670–676. (doi:10.1016/j.tree.2005.09.005)
Kim, E., Simpson, A. G. & Graham, L. E. 2006 Evolutionary relationships of apusomonads inferred from taxon-rich analyses of 6 nuclear encoded genes. Mol. Biol. Evol. 23, 2455–2466. (doi:10.1093/molbev/msl120)
Lane, C. E. & Archibald, J. M. 2008 The eukaryotic tree of life: endosymbiosis takes its TOL. Trends Ecol. Evol. 23, 268–275. (doi:10.1016/j.tree.2008.02.004)
Lartillot, N. & Philippe, H. 2004 A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109. (doi:10.1093/molbev/msh112)
Lartillot, N., Brinkmann, H. & Philippe, H. 2007 Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol. Biol. 7(Suppl. 1), S4. (doi:10.1186/1471-2148-7-S1-S4)
McFadden, G. I. & van Dooren, G. G. 2004 Evolution: red algal genome affirms a common origin of all plastids. Curr. Biol. 14, R514–R516. (doi:10.1016/j.cub.2004.06.041)
Nosaki, H. 2005 A new scenario of plastid evolution: plastid primary endosymbiosis before the divergence of the “Plantae”, emended. J. Plant Res. 111, 247–255. (doi:10.1007/s10265-005-0219-1)
Richards, T. A. & Cavalier-Smith, T. 2005 Myosin domain evolution and the primary divergence of eukaryotes. Nature 436, 1113–1118. (doi:10.1038/nature03949)
Rodríguez-Ezpeleta, N., Brinkmann, H., Burger, G., Roger, A. J., Gray, M. W., Philippe, H. & Lang, B. F. 2007a Toward resolving the eukaryotic tree: the phylogenetic positions of jakobids and cercozoans. Curr. Biol. 17, 1420–1425. (doi:10.1016/j.cub.2007.07.036)
Rodríguez-Ezpeleta, N., Brinkmann, H., Roure, B., Lartillot, N., Lang, B. F. & Philippe, H. 2007b Detecting and overcoming systematic errors in genome-scale phylogenies. Syst. Biol. 56, 389–399. (doi:10.1080/10635150701397643)
Roure, B., Rodriguez-Ezpeleta, N. & Philippe, H. 2007 SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol. Biol. 7(Suppl. 1), S2. (doi:10.1186/1471-2148-7-S1-S2)
Sakaguchi, M., Inagaki, Y. & Hashimoto, T. 2007 Centrohelida is still searching for a phylogenetic home: analyses of seven Raphidiophrys contractilis genes. Gene 405, 47–54. (doi:10.1016/j.gene.2007.09.003)
Shalchian-Tabrizi, K. et al. 2006 Telonemia, a new protist phylum with affinity to chromist lineages. Proc. R. Soc. B 273, 1833–1842. (doi:10.1098/rspb.2006.3515)
Stamatakis, A. 2006 RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690. (doi:10.1093/bioinformatics/btl446)
Stechmann, A. & Cavalier-Smith, T. 2003a Phylogenetic analysis of eukaryotes using heat-shock protein Hsp90. J. Mol. Evol. 57, 408–419. (doi:10.1007/s00239-003-2490-x)
Stechmann, A. & Cavalier-Smith, T. 2003b The root of the eukaryote tree pinpointed. Curr. Biol. 13, R665–R666. (doi:10.1016/S0960-9822(03)00602-x)
Yang, Z. 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556.

NEWSFOCUS

EVOLUTION
Building the Tree of Life, Genome by Genome
Cheaper sequencing has put many more genes into the hands of researchers trying to sort out the degree of relatedness of a menagerie of organisms

1716 27 JUNE 2008 VOL 320 SCIENCE www.sciencemag.org Published by AAAS

Phylogenetic studies have gone ’omic. Whereas researchers used to be satisfied comparing one gene, or a few, to sort out the branching of the tree of life, the push now among those building phylogenies is to consider whole genomes—at the very least, dozens of genes and thousands of DNA bases—in establishing kinships among flora and fauna. In this way, evolutionary biology is joining the bandwagon of data-intensive studies pioneered by genomics.
Thanks to one such phylogenomic analysis reported on page 1763, bird guides may never be the same. According to this new avian family tree, grebes will share a section with flamingos, not loons. Dull brown nightjars and iridescent hummingbirds would now go together. Even parrots and songbirds share a closer kinship than has been appreciated, says Shannon Hackett, an ornithologist at The Field Museum of Natural History in Chicago, Illinois.
She and more than a dozen colleagues constructed the new genealogy after analyzing 32,000 bases from 19 genes in 169 species. More than just rearranging which birds perch on what branches of the tree, the results raise questions about the evolution of flight; some birds that don’t fly are unexpectedly grouped with those that do. “It’s the most impressive paper in the higher level phylogeny of birds to come along in a long time,” says Joel Cracraft, an evolutionary biologist at the American Museum of Natural History in New York City. “It will be used by avian systematists and non-avian systematists for a very long time.”
The bird work follows two phylogenomic studies published over the past 3 months that have shaken up perceived evolutionary relationships among animals and more broadly among eukaryotes. In the former effort, a team led by Casey Dunn, now at Brown University, has rearranged the animal kingdom such that comb jellies, not sponges, are among the earliest fauna. In the latter, a European team now divides eukaryotes into two megagroups, not a half-dozen. Together, the three trees speak to the potential of phylogenomics. “We are just beginning to understand what large sequence data sets have to say about the evolution of life on Earth,” says Hackett.

Entering the genome age
The term “phylogenomics” was coined by Jonathan Eisen a decade ago to describe incipient efforts to integrate evolutionary thinking into genomic analyses and vice versa. What this evolutionary biologist at the University of California, Davis, had in mind was using information about the relatedness of newly sequenced organisms to help sort out gene function and identify comparable stretches of DNA in genomes that have been deciphered. But the term has been “kidnapped,” says Eisen jokingly, by the likes of Hackett and others to describe large-scale efforts to build family trees based on lots of molecular data.
Systematists may like the label, but there’s no agreement about how many genes it takes to make an evolutionary tree phylogenomic. “We would say our study is phylogenomic because we have sampled many different genes from many different chromosomes across a subset of avian species, but others would say we still sampled a small portion of the genome,” Hackett points out. And ornithologist Michael Sorenson of Boston University applies an even tougher standard: “I would reserve the term for what lies ahead, i.e., comparisons of whole genomes.”
In traditional molecular phylogeny, researchers pick out a short stretch of one gene, often a mitochondrial gene, count up the sequence differences between species in that stretch, and use sophisticated computer programs to come up with the hierarchy of evolutionary relationships between the species. Most simply, the fewer the differences, the more closely related two species were considered to be.
Gradually, however, researchers realized “that single-gene trees are prone to errors and that many genes are necessary,” explains Jose Castresana of the CSIC Institute of Molecular Biology of Barcelona in Spain. Because genes can evolve at different rates, it’s not always possible to pinpoint the true time a species under consideration diverged from a common ancestor by looking just at the changes in one gene from that species. In some cases, there are too few changes to provide statistically reliable results. Other times, the transfer of a gene from one species to another causes phylogenetic chaos.
When Hackett, her postdoc Sushma Reddy, Rebecca Kimball of the University of Florida, Gainesville, and colleagues started their avian project in 2003, collaborators first did a computer simulation to determine how much and what kind of DNA sequence would enable them to figure out the early history of birds. The simulation directed the team to collect at least 20,000 bases from introns and intergenic regions, where mutations occur frequently enough for there to be significant differences in the various lineages. At first, the researchers sampled only about 75 species, but after realizing how much more robust results would be with a larger number, they doubled it.
“The ultimate goal is to provide the rest of the ornithological community with the roots and base of the tree that they can leaf out more effectively,” Hackett says. Traditionally, avian systematists have had trouble sorting out those early days of bird evolution, notes Harvard University ornithologist Scott Edwards. The new results are “bold in setting an agenda for future research,” he says.
In agreement with previous avian phylogenies, Hackett, Kimball, and Reddy found that the South American bird family tinamous, along with ratites—kiwis, ostriches, and the like—split off close to the base of the bird tree. Slightly later, chickens, ducks, and their kin branched away from the main group of birds. The subsequent history of birds has been enigmatic, but the new work offers some clarity. Songbirds, for example, are a sister group to parrots, and the two groups encompass all the descendants from their most recent common ancestor. Hummingbirds descended from nightjars, evolving bright colors and a diurnal lifestyle along the way.
One of the more controversial results is that tinamous, all capable of flight, belong in the same group as the flightless ratites. This “can change the way people look at the evolution of flight,” Hackett says. Grouping the birds together suggests either that flightlessness evolved multiple times, not once in the ancestor to this group, or that flight evolved more than once in birds, showing up independently in the tinamous and in other flying birds. “This result flies in the face of many other kinds of data,” says Edwards.

Treed. An in-depth comparison of DNA showed that Western tanagers, parrots, and falcons (left to right) are closer kin than expected.
CREDITS (LEFT TO RIGHT): GARY KRAMER/U.S. FISH AND WILDLIFE SERVICE; FRANK KRAHMER/ZEFA/CORBIS; WIKIPEDIA

Shaky branches
The phylogenomics study of eukaryotes, conducted by Fabien Burki and Jan Pawlowski, and Kamran Shalchian-Tabrizi of the University of Oslo, Norway, also upsets old assumptions. Interested in deciphering the deep roots of eukaryotes, which include protists, plants, animals, and fungi, they combed the public databases, coming up with 135 genes from 65 species to compare. Based on the pattern of differences in the sequences, they and their colleagues came up with three early branches, two containing almost all eukaryotes and one tentatively placed branch representing excavates, protists that include Euglena and Giardia.
Unlike past analyses based on just a few eukaryotic genes, or just one, this phylogenomic effort, published online 3 June in Biology Letters, brought all photosynthetic organisms—save Euglena and its relatives—into one group. The researchers suggest that the cyanobacterium that gave rise to the modern chloroplasts seen in plants and in green and red algae was acquired much earlier in eukaryotic evolution than had been thought, though more data are needed to confirm this idea, says Burki.
That plants now group with dinoflagellates, diatoms, or freshwater flagellates—all previously considered independent “supergroups”—has raised some eyebrows. “I think this is untenable,” says Patrick Keeling at the University of British Columbia in Vancouver, Canada. Nonetheless, he adds, “this paper represents one of the right ways we should be going to resolve the tree of eukaryotes.” The challenge is to include more organisms in future studies. In doing so, “it’s entirely possible that strong support for many relationships will evaporate,” he notes.
When Dunn and his colleagues wanted to tackle the animal kingdom, they couldn’t find enough publicly available DNA sequence for the many species they needed to examine. So they sequenced 39.9 million [...] little-studied creatures, including water bears, comb jellies, sea spiders, and a variety of worms. These data, combined with existing information, enabled them to evaluate 150 genes from 71 animals.
In some cases, the major branches of the new animal family tree confirmed researchers’ suspicions. For example, based on a suite of similar traits seen in the animals, morphologists have long thought that mollusks all stem from a common ancestor. Yet there is no single unifying trait among the phylum, which includes scallops, squid, chitons, and snails. Many, but not all, have a toothlike structure called a radula, and a subset have no shell, even though mollusk means “thin-shelled.” Moreover, the molecular data did not back up the premise that all traditional mollusks belong together. Dunn’s new tree shows that the mollusks are one big family, however. “It’s nice to have tied [this relatedness] down,” says Dunn.
But the conclusion that comb jellies are the oldest animals is a surprise, says Dunn, who adds that the reaction has ranged from “ ‘That is so cool’ to ‘There is no way.’ ” Dunn himself calls that result provisional and sees his 10 April Nature paper as just the beginning. Thanks to new sequencing technologies, “within a year or two, we’ll be seeing studies that have 10 times as many genes from 10 times as many taxa,” he predicts.

Rooting animals. After sequencing DNA from 29 animals, researchers concluded that comb jellies (above) are likely the most primitive known animals and that nudibranchs (left) and other mollusks are really true kin.

And he’s not the only one to soon be awash in data. Burki is generating more sequences for his work with eukaryotes, and Hackett and colleagues are expanding their data set as well. “Phylogenomics is becoming the rule,” says Hervé Philippe, who develops new phylogenetic techniques at the University of Montreal, Canada. Philippe looks forward to more phylogenomics studies that use gene order, even gene content or intron positions, to infer relationships—approaches that will become “more natural when complete genomes are available,” he says.
Philippe and others caution, however, that more data don’t always guarantee better family trees. “It will be important to reanalyze [data sets] with many different and emerging methods to see if the results change at all,” says Edwards. And a few scientists question whether, even then, the full tree of life can really be resolved. But, Edwards argues, “phylogenomics is our best shot.”

CREDITS: CASEY BROWN both at the University of Geneva, Switzerland, bases from 29 of nature’s more peculiar and –ELIZABETH PENNISI

www.sciencemag.org SCIENCE VOL 320 27 JUNE 2008 1717 Published by AAAS

research | genomics

The tree of life loses a branch

The accumulation of genetic data and improvements in analytical techniques are upending the classification of the living world. Fabien Burki, of the Department of Zoology and Animal Biology, is contributing to this effort.

Sawing a branch off the tree of life is not an everyday occurrence. Yet that is what Fabien Burki proposes. He is a doctoral student in the team of Jan Pawlowski, professor in the Department of Zoology and Animal Biology. In an article published in PLoS One in August, the young Geneva researcher sets out a new classification of the eukaryotes (organisms whose cells have a nucleus). These are currently divided into five supergroups:

- plants (green algae, red algae, land plants…)
- unikonts (including, among others, animals and fungi)
- excavates (made up mainly of parasitic organisms)
- chromalveolates (represented by several large groups of single-celled organisms with chloroplasts of secondary origin)
- rhizaria (comprising amoeboid single-celled organisms, among them the foraminifera and the radiolarians, most of which have a shell)

Fabien Burki believes the last two assemblages, chromalveolates and rhizaria, should be merged into a new supergroup named SAR.

Burki's work is based on genetic data sampled from 49 species drawn from the five original supergroups. In total, nearly 30,000 amino acids were run through two networks of high-performance computers: the Bioportal of the University of Oslo and Vital-IT of the Swiss Institute of Bioinformatics. Amino acids are the essential building blocks of proteins, which are themselves a reflection of the genetic code. This phylogenetic data set is the most complete in the world for eukaryotes. From it, the Geneva researcher derived the statistically "best tree" possible, in which species are linked to one another according to their evolutionary relationships.

Fabien Burki very nearly lost the race, however. A competing team, made up of researchers from several universities in Canada, published its results the same week. Happily for science, the two teams reached the same conclusion, apart from a few details.

"Reticulomyxa filosa" is a foraminiferan belonging to the rhizaria supergroup.

Pruned branches

Fabien Burki's pruning of the tree of life is not the first, and certainly not the last. In less than ten years, the venerable sylvan monument has constantly changed shape. Leaving aside its predecessors from past centuries, it is the classification proposed in 1969 by the American Robert Whittaker that entered biology teaching and has remained there to this day. It shows broad roots representing the bacteria (the prokaryotes, to be precise), topped by a sturdy trunk formed by the protists (all single-celled eukaryotes). Three majestic branches crown the whole: plants, fungi and animals.

In 2003, however, genetics had for some time been undermining this rather hierarchical view of things, and biologists agreed to redraw the tree of life. Considering only the eukaryotes here, phylogeneticists erased anything resembling roots or a trunk, keeping only branches of equal importance. In its first version, the crown of the tree of life was organized into eight supergroups, only two of which contain multicellular organisms: the plants and the opisthokonts. It is in the latter, alongside animals and fungi, that Homo sapiens hides, one twig among a multitude of others (see Campus no. 70, May-June 2004).

This fine egalitarian profusion did not last. From 2004 onward, the lumberjacks of phylogenomics set to work. That year, thanks to growing genetic data on increasingly diverse organisms, two branches were cut, followed by a third in 2005. During one of these metamorphoses, the amoebae, which had formed a supergroup in their own right, were merged into the one containing animals and fungi. A new assemblage was created: the unikonts. In other words, there are not enough genetic differences between the amoeba, the bolete mushroom and the human being to justify classifying them in separate supergroups. A small lesson in humility.

Figure: Classification of eukaryotic organisms. Before: PLANTS (green algae, red algae, land plants…), RHIZARIA (foraminifera, radiolarians…), CHROMALVEOLATES (haptophytes and cryptophytes, alveolates, stramenopiles), UNIKONTS (fungi, animals), EXCAVATES (parasitic organisms…). After: PLANTS, SAR (stramenopiles, alveolates, Rhizaria), UNIKONTS, EXCAVATES. According to new genetic data, the organisms classified in the Rhizaria branch (all single-celled) are not as isolated in evolutionary terms as was thought. They appear to belong to the Chromalveolates branch, thus creating a new supergroup, SAR.

A first stable tree

"These incessant readjustments reflect the progress made in genomics and in its associated technologies," explains Fabien Burki. "For our work, we sequenced genes from three species belonging to the rhizaria supergroup for the first time: two foraminifera (Reticulomyxa filosa and Quinqueloculina) and an amoeboid protist (Gymnophrys cometa). These are long and difficult operations. To collect enough material, the organisms have to be grown in culture, which is not always possible. We nevertheless managed to provide a substantial amount of phylogenomic data for a supergroup that had sorely lacked it until then. We also benefited from the work of our Canadian competitors, who recently made public sequence data for an additional rhizarian species as well as for many others spread across the eukaryotic tree. This allowed us to complete our taxon sampling. All these additional data make it possible to re-evaluate the positions of the branches of the tree of life and place them more accurately than before. In our case, the rhizaria branch suddenly found itself right in the middle of the chromalveolates. Merging these two groups was the logical next step. The tree we obtain is one of the very first to be stable."

For the Geneva researcher, the task now is to add key species that would pin down the position of branches and sub-branches that are still floating. The excavate supergroup, moreover, will undoubtedly continue to cause researchers many problems. It is composed mainly of parasites, which by definition are able to evolve rapidly to adapt to changes in their host's system. This is enough to throw biologists off the trail.

Another uncertainty remains to be resolved: the position of the haptophytes and cryptophytes, which the merger of rhizaria with chromalveolates has made somewhat uncertain. These single-celled organisms attract the general interest of scientists because some of them are involved in the carbon cycle and therefore in the evolution of the climate. The catch is that the genomes of these creatures are full of peculiarities that pose problems for researchers.

For Fabien Burki, the next step will be to add species to his analysis for which well-documented fossil traces also exist. The aim is to date the various branching points of this new tree of life. The rate at which certain mutations behind the genetic differences between species can occur is known, or can be estimated. Stable reference points for calibrating the eukaryotic tree are still lacking, however. Paleontology will therefore have to supply them, notably in the form of microfossils that contributed to the formation of sedimentary rocks over the last five hundred million years.

Anton Vos

http://www.unige.ch/sciences/biologie/biani/msg/

Campus N° 88 Université de Genève
