<<

University of Massachusetts Amherst ScholarWorks@UMass Amherst

Open Access Dissertations

2-2011

Diversity of and Their Genomes

Laura Ellen Wegener Parfrey University of Massachusetts Amherst

Follow this and additional works at: https://scholarworks.umass.edu/open_access_dissertations

Part of the Ecology and Evolutionary Biology Commons

Recommended Citation Wegener Parfrey, Laura Ellen, "Diversity of Eukaryotes and Their Genomes" (2011). Open Access Dissertations. 345. https://scholarworks.umass.edu/open_access_dissertations/345

This Open Access Dissertation is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Open Access Dissertations by an authorized administrator of ScholarWorks@UMass Amherst. For more information, please contact [email protected].

DIVERSITY OF EUKARYOTES AND THEIR GENOMES

A Dissertation Presented

by

LAURA E. WEGENER PARFREY

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

February 2011

Program in Organismic and Evolutionary Biology

© Copyright by Laura E. Wegener Parfrey 2011

All Rights Reserved

DIVERSITY OF EUKARYOTES AND THEIR GENOMES

A Dissertation Presented

by

LAURA E. WEGENER PARFREY

Approved as to style and content by:

______Laura A. Katz, Chair

______Benjamin B. Normark, Member

______Robert L. Dorit, Member

______Samuel S. Bowser, Member

______Elizabeth R. Dumont, Program Director Organismic and Evolutionary Biology

DEDICATION

To James for his continued respect and support, despite my persistent lack on interest in cute, furry research subjects.

ACKNOWLEDGEMENTS

I am extremely grateful to the numerous people who helped me complete my

Dissertation. I want to thank Laura Katz for inviting me to her lab and providing guidance in all aspects of my academic development. I am indebted to her for all her advice and encouragement. I would like to thank the members of the lab that have provided assistance, insightful conversations, and good times through the years, especially Yonas Tekle, Maiko Tamura, Micah Dunthorn, Daniel Lahr, and Mary

Doherty. Thanks to Jessica Grant for a tremendous amount of support in the through the years. I also acknowledge Judith Wopereis and Dick Briggs at Smith College for help in the microscopy facilities, and Tony Caldanaro for technical assistance.

I would also like to thank many people who contributed their time and knowledge to my development as a scientist and as a person. I thank my Dissertation committee,

Sam Bowser, Ben Normark, and Rob Dorit, for their helpful comments on manuscripts, career advice, and good conversations. Sue Goldstein, Sam, and Andrea Habura, and

Paddy Patterson were instrumental in my development as a field biologist. Many in the

OEB community made my graduate experience much more fulfilling, enlightening, and enjoyable, especially Dana Moseley, Susannah Lerman, and Sharlene Santana. Of course, thanks to Penny Jaques for much guidance and help navigating the ways of

UMass, and more importantly actively seeking to improve the of all OEB students.

I would like to thank my family for being supportive and stressing the importance of education, and especially James Parfrey.

Finally, this research would not have been possible without the financial support from many sources. I thank the University of Massachusetts Amherst for teaching

v

assistantship support and a UMass Graduate School Fellowship. Support also came from a Sigma Xi Grant in aid of research, a Joseph A. Cushman award from the Cushman

Foundation for Foraminiferal Research, and a miniPEET grant from the Society of

Systematic Biologists. This work was presented at numerous meetings with travel support supplied by the International Society of Protistologists, American Society of

Microbiology, Society for Integrative and Comparative Biology, Cushman Foundation for Foraminiferal Research, UMass Amherst, and the OEB program. Research was also supported by an NSF grant to Laura A Katz (NSF AToL 043115).

vi

ABSTRACT

DIVERSITY OF EUKARYOTES AND THEIR GENOMES

FEBRUARY 2011

LAURA E. WEGENER, B.S., UNIVERSITY AT ALBANY

Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Laura A. Katz

My dissertation addresses two aspects of eukaryotic , 1) the organization of eukaryotic diversity and 2) genomic variation in . The bulk of eukaryotic diversity is microbial with and representing just two of the estimated 75 lineages of eukaryotes. Among these microbial lineages, there are many examples of dynamic genome processes. Elucidating the origin and evolution of genome features requires a robust phylogenetic framework for eukaryotes. Taxon-rich molecular analyses provide a mechanism to test hypothesized evolutionary relationships and enable placement of diverse taxa on the tree of . These analyses result in a well-resolved eukaryotic tree of life. Relaxed molecular clock analyses of this taxon-rich dataset place the origin on eukaryotes in the Paleoproterozoic, and suggest that all of the major lineages of eukaryotes diverged before the Neoproterozoic. This robust scaffold of the tree of eukaryotes is also used to elucidate common themes in across eukaryotes. Mapping dynamic genome features onto this tree demonstrates that they are widespread in eukaryotes, and suggests that a common mechanism underlies genome plasticity. Foraminifera, a diverse lineage of marine amoebae, provide a good model system for investigating genome dynamics because they amplify portions of their genome

vii

and go through ploidy cycles during their life cycle. Assessment of nuclear dynamics in one of Foraminifera, laticollaris strain CSH, reveals that genome content varies according the life cycle stage and food source, which may differentially impact organismal fitness. The inclusion of diverse microbial eukaryotes enables better resolution of eukaryotic relationships and improves our understanding the dynamic nature of eukaryotic genomes.

viii

TABLE OF CONTENTS ACKNOWLEDGEMENTS ...... v ABSTRACT...... vii LIST OF TABLES ...... xii LIST OF FIGURES ...... xiii CHAPTER 1. EVALUATING SUPPORT FOR THE CURRENT CLASSIFICATION OF EUKARYOTIC DIVERSITY...... 1

1.1 Abstract...... 1 1.2 Introduction ...... 2

1.2.1 Eukaryotic Classification...... 2 1.2.2 The Supergroups ...... 3 1.2.3 Our Approach...... 7

1.3 Methods...... 8

1.3.1 Stability of ...... 8 1.3.2 Membership support ...... 8 1.3.3 Supergroup Monophyly ...... 9

1.4 Results ...... 10

1.4.1 Taxonomic Instability ...... 10 1.4.2 Varying Support for Membership within and Monophyly of Targeted Supergroups ..11 1.4.3 Decreased Support for Monophyly of Supergroups as Outgroups in Other Studies ....12

1.5 Discussion...... 14

1.5.1 Supergroup Robustness...... 14 1.5.2 Alternative hypotheses...... 15 1.5.3 Why is Eukaryotic Taxonomy so Difficult? ...... 16 1.5.4 Towards a Robust Scaffold of the Eukaryotic Tree of Life...... 20

1.6 Conclusion...... 23 2. THE DYNAMIC NATURE OF EUKARYOTIC GENOMES...... 33

2.1 Abstract...... 33 2.2 Introduction ...... 34 2.3 Varying DNA content during nuclear cycles ...... 37 2.4 Intraspecific variation in DNA content ...... 42 2.5 Synthesis...... 45 3. BROADLY SAMPLED MULTIGENE ANALYSES YIELD A WELL- RESOLVED EUKARYOTIC TREE OF LIFE ...... 51

ix

3.1 Abstract...... 51 3.2 Introduction ...... 52 3.3 Methods...... 55

3.3.1 Gene Sequencing ...... 55 3.3.2 Data set Assembly...... 57 3.3.3 Creation of Subdata Matrices...... 61 3.3.4 Phylogenetic Analyses...... 63 3.3.5 Topology Testing...... 65

3.4 Results and Discussion...... 65

3.4.1 Robust Topology of the Eukaryotic Tree of Life...... 65 3.4.2 Orphan Lineages ...... 69 3.4.3 Photosynthetic Lineages ...... 70 3.4.4 Relationships Within the Well-Sampled and ...... 75

3.5 Conclusions ...... 76 4. GENOME DYNAMICS ARE INFLUENCED BY FOOD SOURCE IN ALLOGROMIA LATICOLLARIS STRAIN CSH (FORAMINIFERA) ...... 85

4.1 Abstract...... 85 4.2 Introduction ...... 86 4.3 Materials and Methods ...... 88

4.3.1 Cultures...... 88 4.3.2 Fixation and DAPI Staining...... 88 4.3.3 Imaging and Measurement...... 89 4.3.4 Genome Size Estimate ...... 90

4.4 Results ...... 91

4.4.1 Nuclear Dynamics during the Life Cycle ...... 91 4.4.2 Impact of Food Source...... 92 4.4.3 Ploidy Levels and Genome Size of Allogromia laticollaris CSH ...... 93

4.5 Discussion...... 94 5. ELUCIDATING THE TIMING OF EUKARYOTIC DIVERSIFICATION ..... 104

5.1 Abstract...... 104 5.2 Introduction ...... 104

5.2.1 fossil record ...... 105 5.2.2 Challenges molecular clock analyses...... 106 5.2.3 Eukaryotic phylogeny and the root of eukaryotes ...... 108 5.2.4 My approach ...... 110

5.3 Methods...... 110

5.3.1 Alignments...... 110

x

5.3.2 Phylogenetic analysis...... 111 5.3.3 Molecular dating analyses...... 111 5.3.4 Assessing the position of the root ...... 113 5.3.5 Calibration points...... 114

5.4 Results ...... 116

5.4.1 Timing of eukaryotic origins and radiation of extant lineages ...... 116 5.4.2 Impact of including Proterozoic calibration constraints ...... 117 5.4.3 Impact of root position on divergence time estimates ...... 119

5.5 Discussion...... 120

5.5.1 Molecular time scale of eukaryotic evolution...... 120 5.5.2 Discrepancy between these and previous molecular clock studies...... 121 5.5.3 Neoproterozoic origin of eukaryotes rejected...... 122 5.5.4 Ecological setting for the origin and diversification of eukaryotes ...... 123

5.6 Acknowledgements...... 124 APPENDIX: PRODUCTS RESULTING FROM THIS DISSERTATION...... 133 BIBLIOGRAPHY...... 134

xi

LIST OF TABLES

Table Page

1.1. Summary of Eukaryotic Supergroups...... 24

1.2. Support for membership and supergroup monophyly from ‘’ targeted molecular genealogies...... 25

1.3. Support for membership and supergroup monophyly from ‘’ targeted molecular genealogies...... 26

1.4. Support for membership and supergroup monophyly from ‘Excavata’ targeted molecular genealogies...... 27

1.5. Support for membership and supergroup monophyly from Opisthokonta targeted molecular genealogies...... 28

1.6. Support for membership and supergroup monophyly from ‘Plantae’ targeted molecular genealogies...... 29

1.7. Support for membership and supergroup monophyly from ‘Rhizaria’ targeted molecular genealogies...... 30

3.1. Support for major of eukaryotes in analyses containing varying levels of taxon inclusion and missing data...... 78

5.1. Calibration constraints for dating the eukaryotic tree of life ...... 125

5.2. Details of gene and taxon sampling...... 126

xii

LIST OF FIGURES

Figure Page

1.1. Trends in the taxonomy of eukaryotes...... 31

1.2. Trends in supergroup taxonomy...... 32

2.1. Distribution of genomic features...... 47

2.2. Exemplar microbial eukaryotes...... 48

2.3. Diversity of nuclear cycles...... 49

2.4. Range of intraspecific variation...... 50

3.1. Most likely eukaryotic tree of life reconstructed using all 451 taxa and all 16 genes (SSU-rDNA plus 15 protein genes)...... 79

3.2. Most likely eukaryotic tree of life reconstructed with 10:16, which includes 88 taxa and 16 genes (SSU-rDNA plus 15 protein genes)...... 81

3.3. Maximum likelihood tree of Rhizaria reconstructed with 103 Rhizaria taxa and 16 genes...... 82

3.4. Maximum likelihood tree of Excavata with 75 taxa and 16 genes...... 83

3.5. Summary of major findings—the evolutionary relationships among major lineages of eukaryotes...... 84

4.1. Nuclear dynamics during the life cycle of Allogromia laticollaris CSH...... 99

4.2. Genome size in Allogromia laticollaris CSH is estimated to be 83 ± 29 MB...... 100

4.3. Significant positive correlation between nuclear size and DNA content with size throughout life cycle...... 101

4.4. Food source has a significant impact on nuclear dynamics and DNA content in Allogromia laticollaris CSH...... 102

4.5. Food source impacts ploidy levels, and presumably fecundity of Allogromia laticollaris CSH...... 103

5.1. Most likely tree constructed with 15 proteins and 109 taxa in RAxML...... 129

xiii

5.2. Summary of mean divergence dates for the major clades of eukaryotes...... 130

5.3. Time calibrated tree of eukaryotes using Phanerozoic calibration points...... 131

5.4. Time calibrated tree of eukaryotes using All calibration points...... 132

xiv

CHAPTER 1

EVALUATING SUPPORT FOR THE CURRENT CLASSIFICATION OF

EUKARYOTIC DIVERSITY

1.1 Abstract

Perspectives on the classification of eukaryotic diversity have changed rapidly in recent years, as the four eukaryotic groups within the five classification — plants, animals, fungi, and — have been transformed through numerous permutations into the current system of six ‘supergroups’. The intent of the supergroup classification system is to unite microbial and macroscopic eukaryotes based on phylogenetic inference. This supergroup approach is increasing in popularity in the literature and is appearing in introductory biology textbooks. We evaluate the stability and support for the current six supergroup classification of eukaryotes. We assess three aspects of each supergroup: 1) the stability of its taxonomy, 2) the support for monophyly

(single evolutionary origin) in molecular analyses targeting a supergroup, and 3) the support for monophyly when a supergroup is included as an outgroup in phylogenetic studies targeting other taxa. Our analysis demonstrates that supergroup taxonomies are unstable and that support for groups varies tremendously, indicating that the current classification scheme of eukaryotes is likely premature. We highlight several trends contributing to the instability, and discuss the requirements for establishing robust clades within the eukaryotic tree of life.

1

1.2 Introduction

1.2.1 Eukaryotic Classification

Biological research is based on the shared history of living things. Taxonomy — the science of classifying organismal diversity — is the scaffold on which biological knowledge is assembled and integrated into a cohesive structure. A comprehensive eukaryotic taxonomy is a powerful research tool in evolutionary , medicine, and many other fields. As the foundation of much subsequent research, the framework must, however, be robust. Here we test the existing framework by evaluating the support for and stability of the classification of eukaryotic diversity into six supergroups.

Eukaryotes ( containing nuclei) encompass incredible morphological diversity from picoplankton of only two microns in size to the blue whale and giant sequoia that are eight orders of magnitude larger. Many evolutionary innovations are found only in eukaryotes, some of which are present in all lineages (e.g., the cytoskeleton, nucleus) and others that are restricted to a few lineages (e.g., multicellularity, photosynthetic []). These and other eukaryotic features evolved within microbial eukaryotes (protists) that thrived for hundreds of millions of years before they gave rise independently to multicellular eukaryotes, the familiar plants, animals, and fungi (Javaux et al. 2001). Thus, elucidating the origins of novel eukaryotic traits requires a comprehensive phylogeny — an inference of organismal relationships — that includes the diverse microbial lineages.

Higher-level classifications have historically emphasized the visible diversity of large eukaryotes, as reflected by the establishment of the , , and fungal kingdoms. In these schemes the diverse microbial eukaryotes have generally been placed

2

in one (Protista; Copeland 1956; Dobell 1911; Whittaker 1969) (or Protoctista; Margulis and Schwartz 1988) or two ( and ; Cavalier-Smith 1998) groups (Fig.

1.1; but see also Adl et al. 2005; Patterson 1999). However, this historic distinction between macroscopic and microscopic eukaryotes does not capture adequately their complex evolutionary relationships or the vast diversity within the microbial world.

In the past decade, the emphasis in high-level taxonomy has shifted away from the historic kingdoms and towards a new system of six supergroups that aims to portray evolutionary relationships between microbial and macrobial lineages. The supergroup concept is gaining popularity as evidenced by several reviews (Baldauf 2003; Keeling et al. 2005) and inclusion in forthcoming editions of introductory biology textbooks. In addition, the International Society of Protozoologists recently proposed a formal reclassification of eukaryotes into six supergroups, though acknowledging uncertainty in some groups (Adl et al. 2005).

1.2.2 The Supergroups

Below we introduce the six supergroups in alphabetical order (Table 1.1). The supergroup ‘Amoebozoa’ was proposed in 1996 (Cavalier-Smith 1996/97). Original evidence for the group was drawn from molecular genealogies and morphological characters such as eruptive and branched tubular mitochondrial cristae.

However, no clear synapomorphy — shared derived character — exists for Amoebozoa.

In fact, amoeboid organisms are not restricted to the ‘Amoebozoa’, but are found in at least four of the six supergroups.

3

The ‘Amoebozoa’ include a diversity of predominantly amoeboid members such as Dictyostelium discoideum (cellular ), which is a model for understanding multicellularity (Williams et al. 2005). histolytica, an amitochondriate amoebae (Pelobiont), is the cause of amoebic dysentery, an intestinal infection with global health consequences (Stauffer and Ravdin 2003).

‘Chromalveolata’ was introduced as a parsimonious, albeit controversial, explanation for the presence of plastids of red algal origin in photosynthetic members of the 'Alveolata' and ‘Chromista’ (Cavalier-Smith 1999). Under this hypothesis, the last common ancestor of the chromalveolates was a heterotroph that acquired by engulfing a red alga and retaining it as a (Delwiche 1999; Keeling 2004). The

Alveolata include , , and and its monophyly is well supported by morphology and molecules. ‘Chromista’ was created as a kingdom to unite diverse microbial lineages with red algal plastids (and their nonphotosynthetic descendants; Cavalier-Smith 1982; Cavalier-Smith 1998), but no clear synapomorphy unites this .

The supergroup ‘Chromalveolata’ includes microbes with critical roles in the environment and in human health. Numerous key discoveries emerged from studies of the model (: Alveolata) including self-splicing RNAs and the presence of (Prescott 1994). Phytophthora (: ‘Chromista’), a soil-dwelling organism, is the causative agent of the Irish Potato Famine (May and

Ristaino 2004) whereas (Apicomplexa: Alveolata) is the causative agent of malaria (Sherman 1998).

4

‘Excavata’ is a supergroup composed predominately of heterotrophic whose ancestor is postulated to have had a synapomorphy of a conserved ventral feeding groove (Simpson and Patterson 1999). Most members of ‘Excavata’ are free-living heterotrophs, but there are notable exceptions that are pathogens. For example,

(Diplomonada) causes the intestinal infection giardiasis, and vaginalis

(Parabasalia) is the causative agent of a sexually transmitted disease (Kreler 1991-1995).

Kinetoplastids, such as (), have unique molecular features such as extensive RNA editing of mitochondrial genes that is templated by minicircle DNA

(Smith 1996).

Opisthokonta includes animals, fungi, and their microbial relatives. This supergroup emerged from molecular gene trees (Wainright et al. 1993) and is united by the presence of a single posterior in many constituent lineages (Cavalier-Smith and Chao 1995). Molecular studies have expanded microbial membership of the group and revealed a potential molecular synapomorphy, an insertion in the Elongation Factor

1a gene (Baldauf and Palmer 1993; Steenkamp et al. 2006).

Opisthokonts include many biological model organisms (Drosophila,

Saccharomyces). Vast amounts of research have been conducted on members of this supergroup and much textbook science is based on inferences from these lineages. Other notable include Encephalitozoon (: Fungi), a causative agent of diarrhea, which has one of the smallest known nuclear genomes at 2.9 MB (Keeling and Slamovits 2004). Also included within the Opisthokonta are the

(e.g. Monosiga) that are the sister to animals (King 2004).

5

The supergroup ‘Plantae’ was erected as a kingdom in 1981 (Cavalier-Smith

1981) to unite the three lineages with primary plastids: (including land plants), rhodophytes, and . Under this hypothesis a single ancestral primary endosymbiosis of a cyanobacterium gave rise to the plastid in this supergroup (McFadden

2001). The term ‘Plantae’ has been used to describe numerous subsets of photosynthetic organisms, but in this manuscript will only be used in reference to the supergroup.

Well known ‘Plantae’ genera include Arabidopsis, a model angiosperm, and

Porphyra (red alga), the edible seaweed nori. Within the ‘Plantae,’ there have been numerous independent origins of multicellularity including: ()

(Michod et al. 2006), the land plants, and .

‘Rhizaria’ emerged from molecular data in 2002 to unite a heterogeneous group of flagellates and amoebae including: cercomonads, foraminifera, diverse , and former members of the polyphyletic (Cavalier-Smith 2002).

‘Rhizaria’ is an expansion of the ‘’ (Cavalier-Smith 1998) that was also recognized from molecular data (Bhattacharya 1995; Cavalier-Smith and Chao 1997).

‘Cercozoa’ and foraminifera appear to share a unique insertion in ubiquitin (Archibald et al. 2003) but there is a paucity of non-molecular characters uniting members of

‘Rhizaria’.

‘Rhizaria’ encompasses a diversity of forms, including a heterotrophic

Cercomonas (Cercomonada: ‘Cercozoa’), and a photosynthetic chromatophora, (Silicofilosea: ‘Cercozoa’) that represents a recent endosymbiosis of a cyanobacterium (Marin et al. 2005; Yoon et al. 2006). Some members of the ‘Rhizaria’,

6

notably the shelled foraminifera, also have a substantial fossil record that can be used to determine the age of sediments (Loeblich and Tappan 1987).

1.2.3 Our Approach

To assess the robustness of the six proposed supergroups we compare formal taxonomies and track group composition and nomenclature across time (Figs. 1.1 and

1.2). We also evaluate support for the six supergroups by analyzing published molecular genealogies that either target a specific supergroup or aim to survey all supergroups. Our focus on molecular genealogies is limited. We recognize that supergroups have, in many cases, been defined by suites of characters such as flagellar apparatus in ‘Excavata’

(Cavalier-Smith 2002; Simpson 2003) and Opisthokonta (Cavalier-Smith and Chao

1995), and that groups are more robust when supported by multiple data types (see

Discussion). Use of genealogies is further complicated because a genealogy is the reconstruction of the history of a gene, and may or may not be congruent with phylogenies, which depict the history of organisms (Doyle 1992; Maddison 1997).

Despite these factors, our treatment of molecular genealogies is warranted given the prevalence of molecular analyses in the literature that seeks support for supergroups and the reliance on these gene trees in establishing taxonomy.

For each genealogy we evaluate the taxon sampling for the targeted supergroup

(Membership; Tables 1.2-1.7) and the monophyly of all supergroups with at least two member taxa (Supergroup monophyly; Tables 1.2-1.7). Monophyletic clades — those that include an ancestor and all of its descendants (Hennig 1966) — are scored (+; Tables

1.2-1.7). We assess support for supergroups when they are targeted by specific studies and when they are included as outgroups in studies targeting other supergroups. A

7

conservative measure of outgroup monophyly was used because we required only two member lineages be present. In contrast, focal supergroups had broader taxonomic sampling.

1.3 Methods

1.3.1 Stability of Taxonomy

To assess the stability of supergroup taxonomies over time, we selected three classification schemes for each supergroup and tracked both the stability of taxa membership (solid and dashed lines; Figs. 1.1 and 1.2) and the fate of newly created taxon names (*; Fig. 1.2). In sampling representative taxonomies, we aimed to capture a diversity of authors and opinions. In the case of Opisthokonta and ‘Chromalveolata’ we are aware of only one formal, peer-reviewed classification scheme (Adl et al. 2005).

Given the lack of equivalency in ranks between taxonomies, we have chosen to display three levels with the intention of listing equivalent levels clearly.

1.3.2 Membership support

Within each supergroup, we assess the support for each member taxon by documenting its inclusion in molecular genealogies (Tables 1.2-1.7). Member taxa were chosen because they are historically a well-supported group, usually with an ultrastructural identity. The are such a group, and share a haptonema

(Patterson 1999). We included members that represent a broad interpretation of the supergroup. For example, ‘Rhizaria’ member taxa include groups (e.g. apusomonads) originally placed in ‘Rhizaria’, but later removed. We considered a taxon to be a

8

supported member of its supergroup (filled circles; Tables 1.2-1.7) when it falls within a monophyletic clade containing a majority of the supergroup members. A taxon that falls outside of its supergroup clade, or on the occasion that a majority of members do not form a monophyletic clade, is considered unsupported in that genealogy (open circles;

Tables 1.2-1.7).

The inclusion of a genealogy requires that it be found in a paper that specifically addresses one of the supergroups or analyzes broad eukaryotic diversity. The genealogies must also include adequate sampling — two member taxa per supergroup — from at least two of the six supergroups to allow for the comparison of supergroup monophyly. In cases where multiple gene trees are presented we display the authors’ findings as multiple entries when the trees are not congruent or as a single entry when the trees are concordant. Due to the lack of monophyly in virtually all analyses, we have evaluated the support for several hypothesized subgroups within the ‘Excavata’ (geometric shapes;

Table 1.4).

1.3.3 Supergroup Monophyly

To assess monophyly of supergroups, we used the set of genealogies described above to evaluate the molecular support for the supergroups sensu Adl et al 2005 (Tables

1.2-1.7). We analyzed the monophyly (Hennig 1966) of each supergroup in trees having at least two member taxa present (+/-; Tables 1.2-1.7). We do not indicate the method of tree construction. Although the algorithm used is important, we did not find a clear correlation between supported groups and algorithm used. We were also liberal in accepting any level of support (e.g. bootstrap values and posterior probabilities ranged

9

from 4-100%) when determining monophyly in part because there is debate over acceptable cutoff values (Efron et al. 1996; Felsenstein 2004; Lewis et al. 2005).

1.4 Results

1.4.1 Taxonomic Instability

There is considerable instability in taxonomies of the six putative supergroups

(Fig. 1.2). Causes of the rapid revisions in eukaryotic taxonomy over short time periods include: (1) nomenclatural ambiguity, (2) ephemeral and poorly-supported higher level taxa and (3) classification schemes erected under differing taxonomic philosophies. For example, taxonomy of the ‘Amoebozoa,’ a term originally introduced by Lühe in 1913

(Lühe 1913) to encompass a very different assemblage of organisms, has changed considerably in ten years (Fig. 1.2A). ‘Variosea’ was created as a subclade within the

‘Amoebozoa’ in 2004 to group taxonomically unplaced genera of amoebae with

“exceptionally varied phenotype” (Cavalier-Smith et al. 2004). Rarely supported by morphology or molecular evidence (Cavalier-Smith 2004; Fahrni et al. 2003;

Kudryavtsev et al. 2005; Smirnov et al. 2005), this taxon was excluded from subsequent classifications (Adl et al. 2005; Smirnov et al. 2005) but is still discussed in the literature

(Adl et al. 2005; Kudryavtsev et al. 2005). Similarly, the excavate taxon ‘Loukozoa’

(Cavalier-Smith 1998) has been continually redefined to include a variety of taxa bearing a ventral groove (Fig. 1.2B) and finally abandoned (Simpson 2003). The taxonomy of

‘Rhizaria’ has emerged largely from molecular genealogies, and has varied partly in response to shifting topology of gene trees that change with taxon sampling and the method of tree construction (Cavalier-Smith 1998; Cavalier-Smith 2002; Cavalier-Smith and Chao 2003a; Philippe and Adoutte 1998).

10

The taxonomy of ‘Plantae’ is destabilized by the complex history of the term.

Used since Haeckel’s time (Haeckel 1866), ‘Plantae’ has been redefined numerous times to describe various collections of photosynthetic organisms, leading to major discrepancies between taxonomic schemes (Fig. 1.2C; e.g., Margulis and Schwartz 1988;

Whittaker 1969). The term ‘’ was recently introduced to alleviate confusion over ‘Plantae’, but this synonym is not widely used.

The stability of two supergroups, ‘Chromalveolata’ and Opisthokonta cannot be assessed at this time as only a single formal taxonomy exists (Adl et al. 2005). Other classification schemes of eukaryotes segregate animals and fungi as separate kingdoms, and place microbial opisthokonts in the kingdom Protozoa (Fig. 1.1; Cavalier-Smith

1998; Cavalier-Smith 2002). Similarly, chromalveolate members are often divided between the polyphyletic kingdoms ‘Chromista’ and ‘Protozoa’ (Fig. 1.1; e.g., Cavalier-

Smith 2002; Cavalier-Smith 2004).

1.4.2 Varying Support for Membership within and Monophyly of Targeted

Supergroups

Several supergroups are generally well supported when targeted in molecular systematic studies. Strikingly, the monophyly of both the original and expanded

Opisthokonta members is strongly supported in all investigations targeting the group (10 of 10, Table 1.5). Two other supergroups are also well supported: ‘Rhizaria’ monophyly recovered in 11 of 14 studies focusing on this supergroup (Table 1.7) and ‘Amoebozoa’ retained in five of seven topologies (Table 1.2). However, support for these groups is

11

expected given that they were recognized from molecular gene trees (Cavalier-Smith

1996/97; Cavalier-Smith 2002).

‘Excavata’ rarely form a monophyletic group in molecular systematic studies targeting this supergroup (two of nine; Table 1.4). Moreover, the position of putative members , , Parabasalia and Diphylleia vary by analysis (Table

1.4). Three distinct subclades, all of which are supported by ultrastructural characters

(Simpson 2003) are generally recovered (Fornicata [six of six], Preaxostyla [six of six], and [five of eight]; Table 1.4).

Support for two supergroups varies depending on the type of character used: plastid or nuclear. The monophyly of 'Plantae' and 'Chromalveolata' are well supported by plastid characters: four of four plastid analyses (Table 1.6) and six of nine (Table 1.3), respectively. The ‘Plantae’ clade is monophyletic in only three of six analyses using nuclear genes, including EF2 (Moreira et al. 2000) and a 100+ gene analysis that included very limited taxon sample (Rodríguez-Ezpeleta et al. 2005). Nuclear loci never support

'Chromalveolata' (zero of six; Table 1.3), though and often form a clade to the exclusion of haptophytes and cryptophytes (e.g., Harper et al. 2005; Longet et al. 2004; Nishi et al. 2005).

1.4.3 Decreased Support for Monophyly of Supergroups as Outgroups in Other

Studies

For each genealogy we also assessed the monophyly of the supergroups included as outgroups. Overall, we find that support for the monophyly of a given supergroup is

12

stronger when targeted and support decreases when the same supergroup is included as an outgroup in other studies.

This trend is particularly unexpected given our less stringent requirements for monophyly of outgroups: a minimum of only two members need be included, while targeted groups had broader taxon sampling (see Methods). A priori it would seem that the lower stringency could allow a limited sample of supergroup members to substitute for overall supergroup monophyly, thereby increasing the occurrence of supergroup monophyly for outgroup taxa. However, this scenario is realized only in the groups that receive poor support, ‘Excavata’ and ‘Chromalveolata’ assessed by nuclear genes.

‘Excavata’ is monophyletic more frequently when members are included as outgroups

(seven of 30; Tables 1.2, 1.3, and 1.5-1.7 versus two of nine; Table 1.4). Taxonomic sampling of these lineages is often considerably lower in non-targeted analysis and monophyly reflects that of the subclades ‘Discicristata’ or ‘Fornicata’ (such as in (Fahrni et al. 2003; Keeling et al. 1998; Reece et al. 2004); Table 1.7, but see (Cavalier-Smith

2003; Hampl et al. 2005) for two exceptions). ‘Chromalveolata’ is monophyletic in 10 of

45 nuclear gene trees targeting other taxonomic areas (Tables 1.2 and 1.4-1.7).

Intriguingly, in all ten of the cases where nuclear genes support monophyletic

‘Chromalveolata’, only alveolates and stramenopiles are included (Tables 1.2-1.7).

In contrast, the remaining supergroups are monophyletic less often when included as outgroups. For example, 'Opisthokonta’ was recovered in all studies targeting this supergroup, but in 33 of 41 studies that target other groups (Tables 1.3-1.7). Similarly, both the ‘Amoebozoa’ and 'Rhizaria' are monophyletic less often when their members are included as outgroups in studies targeting the remaining five supergroups (15 of 35 and

13

eight of 15, respectively: Tables 1.3-1.7 and 1.2-1.6). When included as an outgroup

‘Plantae’ plastids usually form a monophyletic clade (eight of nine analyses, Table 1.3) but support is much lower in nuclear gene trees (11 of 42, Tables 1.2-1.5, 1.7).

1.5 Discussion

Our analysis reveals varying levels of stability and support for the six supergroups. Below, we assess the status of each supergroup, describe factors that contribute to the instability, and propose measures to improve reconstruction of an accurate eukaryotic phylogeny.

1.5.1 Supergroup Robustness

Robust taxa — those consistently supported by multiple data sets — are emerging and include the supergroup Opisthokonta. This group of animals, fungi, and their microbial relatives receives consistent support in molecular genealogies. This supergroup was monophyletic in 43 of 51 trees we analyzed (Tables 1.2-1.7).

Opisthokonta is also united by additional types of data: most members share a single posterior flagellum, contain plate-like cristae in mitochondria, and have an insertion within the Elongation Factor 1α gene (Baldauf and Palmer 1993; Cavalier-Smith and

Chao 1995; Patterson 1999; Steenkamp et al. 2006).

The remaining five supergroups receive varying degrees of support from molecular genealogies. ‘Amoebozoa’ and ‘Rhizaria’ received high support in analyses that targeted them (Tables 1.2 and 1.7 respectively), but formed monophyletic clades less often when included as outgroups. The two photosynthetic clades ‘Chromalveolata’ and

14

‘Plantae’ receive differential support depending on the origin of the gene: high support in plastid genealogies but low in nuclear gene trees (Tables 1.3 and 1.6, see Results).

Molecular support for the ‘Excavata’ as a whole is lacking from well-sampled gene trees

(Table 1.4).

Although the six supergroups are not consistently supported by molecular genealogies, some nested clades are emerging as robust groups. For example, a sister relationship between Alveolata and Stramenopila is often recovered. It is this relationship makes ‘Chromalveolata’ appear monophyletic in nuclear genealogies when only these clades are included (e.g., (Baldauf et al. 2000; Nikolaev et al. 2004)). There is also growing support for several subgroups within the poorly supported ‘Excavata’ (i.e.

Fornicata, Preaxostyla; Table 1.4).

1.5.2 Alternative hypotheses

Although it is clear from our analysis that eukaryotic supergroups are not well supported, no alternative high-level groupings emerge from molecular genealogies.

Rather, there is support for lower-level groups, such as the 'Excavata’ subgroups discussed above and perhaps also alveolates + stramenopiles. This suggests that either there are no higher-level groupings to be founds, or there is as yet inadequate data to resolve these clades. We feel that lack of taxon sampling is the key to resolution.

Further evidence against the six supergroup view of eukaryotic diversity is the existence of ‘nomadic’ taxa ⎯ lineages that do not have a consistent sister group, but instead wander between various weakly supported positions. Some ‘nomadic’ taxa are acknowledged (of unknown taxonomic position) such as ,

15

Breviata, and Apusomonadidae (Adl et al. 2005; Patterson 1999). Other taxa that have been assigned to supergroups also appear to be ‘nomadic’ including Haptophyta (putative member of ‘Chromalveolata’) and Malawimonas (putative member of ‘Excavata’). For example, the haptophytes variously branch with Centrohelida and red algae (Cavalier-

Smith et al. 2004), sister to a clade of ‘Rhizaria’ and Heterolobosea (Fahrni et al. 2003), sister to cryptophytes (Harper et al. 2005), and in a basal polytomy (Nikolaev et al.

2004). These ‘nomadic’ taxa may either represent independent, early diverging lineages or their phylogenetic position cannot yet be resolved with the data available. Again, we feel that taxon sampling is the key in order to distinguish between these possibilities.

1.5.3 Why is Eukaryotic Taxonomy so Difficult?

The variable support for relationships is in part attributable to the inherent difficulty of deep phylogeny, the chimeric nature of eukaryotes, misidentified organisms, and conflicting approaches to taxonomy. Here we elaborate on these destabilizing trends and provide illustrative examples.

Challenges of deep phylogeny: Reconstructing the history of eukaryotic lineages requires extraction of phylogenetic signal from the noise that has accumulated over many hundreds of millions of years of divergent evolutionary histories. There is doubt whether resolution of divergences this deep can be resolved with molecular data (Penny et al.

2001). Additionally, the nature of the relationships may also pose a significant challenge.

For example, a rapid radiation of major eukaryotic lineages has been proposed (Philippe et al. 2000) and is the most difficult scenario to resolve because of the lack of time to accumulate synapomorphies at deep nodes.

16

Further, phylogenetic relationships can be obscured by heterogeneous rates of evolution and divergent selection pressures. For example, genes in many parasitic lineages of eukaryotes experience elevated rates of evolution. If not properly accounted for, these fast lineages will group together due to long-branch attraction (Felsenstein

1978; Felsenstein 1988). This was the case for Microsporidia, intracellular parasites of animals; early small subunit rDNA (SSU; HomolGene#6629) genealogies placed the

Microsporidia at the base of the tree with other amitochondriate taxa including Giardia and Entamoeba (Sogin 1991). These parasites were united under the hypothesis (Cavalier-Smith 1983). More recent analyses with appropriate models of evolution (Van de Peer et al. 2000) and those using protein-coding genes (Keeling et al.

2000) place the Microsporidia within fungi, and falsify Archezoa. This example demonstrates the importance of phylogenetic methods in the interpretation of eukaryotic diversity. In our analysis we find no clear correlation between method of tree building and group stability. Arguments about phylogenetic inference have been discussed extensively (Embley et al. 2003; Gribaldo and Philippe 2002; Inagaki et al. 2004; Penny et al. 2001; Philippe and Forterre 1999; Philippe and Germot 2000; Roger 1999; Susko et al. 2002) and increasingly sophisticated algorithms are being developed to compensate for the difficulties (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003;

Yang 1997).

The chimeric nature of eukaryotes: Reconstructing the history of eukaryotic lineages is complicated by horizontal transfer of genes and organelles (Andersson 2005;

Huang et al. 2004; Katz 1999; Katz 2002; Roger 1999). For example, ‘Chromalveolata’ plastid genes tell one story, consistent with a single transfer from red algae, which is not

17

currently supported by available nuclear genes (Table 1.3). There is also a growing body of evidence for aberrant lateral gene transfers in eukaryotes (reviewed in Andersson

2005; Katz 2002).

Instability due to misidentification: Misidentification destabilizes taxonomy because all efforts to classify a misidentified organism reach erroneous conclusions.

Cases of misidentification lead to inaccurate conclusions and require considerable effort to remedy. There is a rigorous standard for identifying microbial eukaryotes but this standard is not always upheld. For example, the putative ‘Amoebozoa’ species

invertens" that always branched outside the ‘Amoebozoa’ clade (Bolivar et al. 2001; Cavalier-Smith 2004; Cavalier-Smith et al. 2004) was misidentified (Walker et al. 2006); it has now been properly described as anathema and is not yet placed within any of the supergroups (Walker et al. 2006).

Inaccurate conclusions about organismal relationships can also result from contamination by contaminating DNA (e.g., symbionts and parasites). The results of subsequent molecular genealogies are therefore wrong and misleading. For example, opalinids, multinucleated flagellates that inhabit the lower digestive track of Anurans, were placed in the stramenopiles (: ‘Chromalevolata’) based on ultrastructural data (Patterson 1985). However, the first molecular sequences for this group placed them within fungi ( ranarum; GB#AF141969 and virguloidea;

GB#AF141970) (Guillou et al. 1999; Karpov et al. 2001). These sequences were later shown to belong to zygomycete fungal contaminants, not to the opalinids. Subsequent isolates ( intestinalis; GB#AY576544-AY576546) yielded genealogies congruent with the ultrastructural data, placing P. intestinalis within the stramenopiles

18

(Kostka et al. 2004). To avoid setbacks and confusion due to misidentification we propose that all analyses of eukaryotic diversity include of a vouchering system for strains, images and DNAs.

Conflicting approaches to taxonomy: Our evaluation of the stability of taxonomy for supergroups reveals a rapidly changing landscape (Figs. 1.1 and 1.2). The instability in higher-level classifications of eukaryotes reflects the diversity of philosophical approaches, the exploratory state of taxonomy, and premature taxon naming. Many researchers seek schemes based on monophyletic groupings so that their taxonomies reflect evolutionary relationships (Adl et al. 2005; Lipscomb 1984; Patterson 1999;

Simpson 1997). In contrast, others employ a taxonomic philosophy in which evolutionary relatedness and monophyly are just one criterion from a set of group characteristics (Cavalier-Smith 2002). —taxon without all descendants—is tolerated in these system, and paraphyletic taxa are designated as such (see Cavalier-

Smith 1998).

In many cases, classification schemes that are separated by two years or less vary substantially from one another (e.g., Fig. 1.2A and B). New groups and fluctuating group composition result in numerous cases of homonymy (two concepts linked to one name), synonymy (one concept linked to two names), and redefinition of existing terms. For example, at the highest level the terms Amoebozoa, Opisthokonta, and Plantae were all introduced under different definitions (Copeland 1956; Haeckel 1866; Lühe 1913) before being applied to supergroups. The term Plantae is an extreme case of homonymy having referred to numerous groups of photosynthetic organisms over the past century and a half

19

(Fig. 1.2C). The rapidly changing taxonomic landscape makes it difficult for non- specialists as well specialists to follow the current debate over supergroups.

1.5.4 Towards a Robust Scaffold of the Eukaryotic Tree of Life

Taxonomic sampling: Perhaps the most critical aspect of the current state of eukaryotic systematics is the very limited taxonomic sampling to date. This is particularly problematic as the supergroup literature is often derived from a resampled pool of genes and taxa. More than 60 lineages of microbial eukaryotes have been identified by ultrastructure (Patterson 1999), yet only about half of these have been included in molecular analyses. Furthermore, even when these lineages are included, they are generally represented by a single species. Such sparse sampling increases the risk of long branch attraction as discussed above, such as occurred for Giardia, and may cause artefactual relationships (Hendy and Penny 1989). Further, analyses of sequences from newly sampled lineages have altered or expanded supergroup definitions (e.g., nucleariids in Opisthokonta (Amaral-Zettler et al. 2001) and in ‘Rhizaria'

(Polet et al. 2004). Thus, statements of monophyly may be premature when taxonomic sampling is low.

There is tension between increasing the number of taxa versus the numbers of genes. Several theoretical works have demonstrated the diminishing returns of increased number of genes relative to increased taxon sampling (Bapteste et al. 2002; Cummings and Meyer 2005; Cummings et al. 1995), but see (Rokas and Carroll 2005). In addition, increasing taxon sampling can lead to shifts in molecular tree topology (DeBry 2005;

Hillis 1998; Hillis et al. 2003). These results provide incentive to concentrate sequencing

20

efforts on obtaining more taxa and a moderate number of genes. We recommend increasing the lineages sampled, and the number of diverse taxa within lineages. We are optimistic that as data becomes available from a greater diversity of taxa, eukaryotic phylogeny will become increasingly more resolved.

Multiple Character Sets: We further anticipate that support for clades will increase as additional character sets are incorporated. Phylogenies based on single characters, whether genes, morphology, or ultrastructure, are subject to biases in the data and are not reliable by themselves. Hence, multiple character sets should be used to corroborate results. Ultrastructural apomorphies combined with molecular genealogies have proven to be good indicators of phylogeny at the level below supergroups (Simpson

2003; Taylor 1999). This approach has bolstered support for Fornicata and Preaxostyla, which are consistently recovered in molecular genealogies and have defining ultrastructural characters.

Multigene and genome scale molecular systematics provide another powerful tool for resolving ancient splits in the tree of life. The NSF initiative “Assembling the Tree of

Life” provides evidence of this shift in systematics research whereby all proposals involve multigene or genome (organellar) sequencing to establish robust phylogenetic hypotheses (see http://www.nsf.gov/pubs/2005/nsf05523/nsf05523.htm; Rodríguez-

Ezpeleta et al. 2005). The EuTree consortium (www.eutree.org) aims to increase the sampled diversity of eukaryotes by focusing on understudied lineages in our multigene project to assemble the tree of life.

An example of multigene study is analysis of genes involved in clade specific functions. This approach has been employed in testing ‘Plantae’ and ‘Chromalveolata’

21

(e.g., Nosenko et al. 2006). A single endosymbiosis (of a cyanobacterium in ‘Plantae’ and red alga in ‘Chromalveolata’) predicts that the systems that facilitate controlled exchange of metabolic intermediates between the symbiotic partners be shared by putative members of these two supergroups (Bhattacharya et al. 2004). This prediction has been supported by analyses of the plastid import machinery (McFadden and van

Dooren 2004) and antiporters that transport fixed carbons across the plastid membranes

(Weber et al. 2006). However, taxon sampling has been limited in these studies.

Currently, increased sampling of genomes from diverse photosynthetic eukaryotes is yielding additional genes for clade specific predictions (Sanchez-Puerta et al. 2005; Yoon et al. 2005).

A conservative approach to taxonomy: Because taxonomy is the foundation for much of the dialog and research in evolutionary biology, there must be an unambiguous taxonomic system in which one term is linked to one concept. In contrast to this ideal, homonymy and redefinition are prevalent in the taxonomy of eukaryotes, often as the result of premature introduction or redefinition of taxa (see above; Fig. 1.2). Emerging hypotheses benefit the community by sparking new research to test the hypothesis, but they also introduce ambiguity. To alleviate the confusion we suggest introducing hypotheses as informal groups and using inverted commas to indicate the existence of a caveat, as done in many uncertain groups and in this manuscript. These steps will inform the community that group composition is likely to change, alleviate quick taxon turnover, and promote stable taxa that are more resistant to compositional change.

As increasing amounts of data become available, well-supported nodes emerge and classifications tend to stabilize, such as is occurring for the ordinal framework for

22

angiosperms (Group 1998; GroupII 2003). Similarly, we expect that this conservative approach, combined with increased sampling of taxa and genes, will promote the future stabilization of eukaryotic classification.

1.6 Conclusion

Although the level of support varies among groups, the current classification of eukaryotes into six supergroups is being adopted broadly by the biological community

(i.e. evidenced by its appearance in biology textbooks). The supergroup Opisthokonta and a number of nested clades within supergroups are supported by most studies.

However, support for ‘Amoebozoa’, ‘Chromalveolata’, ‘Excavata’, ‘Plantae’, and

‘Rhizaria’ is less consistent. The supergroups, and eukaryotic taxonomy in general, are further destabilized by considerable fluidity of taxa, taxon membership, and ambiguous nomenclature as revealed by comparison of classification schemes.

The accurate reconstruction of the eukaryotic tree of life requires: 1) a more inclusive sample of microbial eukaryotes; 2) distinguishing emerging hypotheses from taxa corroborated by multiple data sets; and 3) a conservative, mutually agreed upon approach to establishing taxonomies. Analyses of these types of data from a broad, inclusive sampling of eukaryotes is likely to lead to a robust scaffold for the eukaryotic tree of life.

23

Table 1.1. Summary of Eukaryotic Supergroups

24

Table 1.2. Support for membership and supergroup monophyly from ‘Amoebozoa’ targeted molecular genealogies

Membershipa Supergroup Monophylyb Year Gene My Dc Tu Am Fl Pe Ma En Rs Brc A C E O P R 1995 SSU       — —   — 1997 SSU       — —  — — 2001 SSU           — —  — — 2002 Multi      —   2003             — 2003 SSU           —   — — 2004 SSU            — —  — 

2000 Multi      —   2002 SSU        —   — — 2004 SSU           — — —  —  a Membership coding:  indicates the member taxon falls within the supergroup ‘Amoebozoa’;  indicates that the member taxon is excluded from the ‘Amoebozoa’ clade, or no clade is formed. Papers below blank line survey eukaryotic diversity, and are included in all analyses. Member taxa: My: , Dc: Dictyosteliids, Tu: (Lobosea, Gymnamoebea sensu stricto), Am: , Fl: Flabellinea (, Glycostylea), Pe: , Ma: Mastigamoebidae, En: , Rs: residua, Br: Breviata = “Mastigamoeba invertans sensu NCBI”. b Supergroup Monophyly:  indicates monophyly, — indicates group is para- or polyphyletic, and blank indicates insufficient data available. Supergroup definition based on Adl et al 2005: A: ‘Amoebozoa’, C: ‘Chromalveolata’, E: ‘Excavata’, O: Opisthokonta, P: ‘Plantae’, R: ‘Rhizaria’ c The position of Breviata, Br, was not considered when scoring the monophyly of ‘Amoebozoa’ as this organism was misidentified and affiliations are unknown (see text). d Some nodes were constrained in this analysis.

25

Table 1.3. Support for membership and supergroup monophyly from ‘Chromalveolata’ targeted molecular genealogies

Membershipa Supergroup Monophylyb Year Gene Locc Al St Ha Cr A C E O P R 1999 Multi Pla    —  2002 Multi Pla      2003 GAPDH Pla       2004 Multi Pla     —  2004 Multi Pla       2004 FBA II Pla      — 2005 Multi Pla       2005 PsbA Pla     —  2005 Multi Pla       2003 GAPDH Nuc     — — — — 2003 Hsp90 Nuc    — —  — 2005 Hsp90 Nuc     — — — — — 2005 Multi Nuc     — — —  — 2005 β-tub Nuc    — — — 2005 SSU Nuc     — —

2000 Multi Nuc     —   2002 SSU Nuc      —   — — 2004 SSU Nuc     — — —  —  a Member taxa: Al: Alveolata, St: Stramenopiles (), Ha: Haptophyta, Cr: . b Monophyletic ‘Plantae’ from plastid genealogies includes secondarily derived plastids. c Loc: Location (genome) from which gene of interest originated. Pla: Plastid genome, Nuc: Nuclear genome, Mit: Mitochondrial genome. See Table 1.2 for further notes.

26

Table 1.4. Support for membership and supergroup monophyly from ‘Excavata’ targeted molecular genealogies

Membershipa Supergroup Monophyly Year Gene Di Rt Cp Tr Ox Ht Eu Ml Jk Pa Dyb A C E O P R 2001 SSU       — — —   2002 SSU        — — —  —  2002 SSU          — — —  —  2002 Tub        — —  — — 2003 SSU           — — —  —  2003 SSU            —   —  2003 SSU       — —  — 2005 Multi            —   2005 Multi       

2000 Multi       —   2002 SSU       —   — — 2004 SSU        — — —  —  a Member taxa: Di: Diplomonadida, Rt: Retortamonadida, Cp: , Tr: , Ox: Oxymonadida, Ht: Heterolobosea, Eu: Euglenozoa, Ml: Malawimonas, Jk: Jakobida, Pa: Parabasalia, Dy: Diphylleia. Hypothesized subgroups:  Fornicata clade (Di+Rt+Cp) monophyletic,  Preaxostyla clade (Ox+Tr) monophyletic,  Discicristata clade (Ht+ Eu) monophyletic. b The position of Diphylleia, Dy, was not considered when scoring the monophyly of ‘Excavata’ as the inclusion of this organism within Excavata is controversial, and has been removed from recent classifications (see text) See Table 1.2 for further notes.

27

Table 1.5. Support for membership and supergroup monophyly from Opisthokonta targeted molecular genealogies

Membershipa Supergroup Monophyly Year Gene Mt Fu Cf Cy Ic Co Nu Mi Apb A C E O P R 1993 SSU    —   1993 βTub   —  1993 Actin   — — —  1995 SSU     — —   — 1996 SSU        — —   — 1998 SSU      —  — 2003 SSU          — —  —  2003 HSP90     — —  — 2003 EF1α      — 2006 Multi            — 

2000 Multi     —   2002 SSU        — —   — — 2004 SSU         — — —  —  a Member taxa: Mt: Metazoa, Fu: Fungi, Cf: Choanomonada, Cy: chytrids, Ic: Ichthyosporea (DRIPs), Cl: , Nu: , Mi: , Ap: apusomonads. b The position of apusomonads, Ap, was not considered when scoring the monophyly of Opisthokonta as this organism is highly variable, and it has been removed from recent classifications (see text) See Table 1.2 for further notes.

28

Table 1.6. Support for membership and supergroup monophyly from ‘Plantae’ targeted molecular genealogies

Membershipa Supergroup Monophyly Year Gene Loc Gr Rh Gl A C E O P R 2002 Genome Pla    —  2004 Multi Pla    —  2004 Genome Pla    —  2005 Multi Pla      1995 SSU Nuc    — —  —  1997 Actin Nuc     — —  2000 EF2 Nuc   — — —   2001 RPB1 Nuc   — —  — 2005 Multi Nuc    —    — 2005 Multi Nuc      

2000 Multi Nuc      —   2002 SSU Nuc     —   — — 2004 SSU Nuc    — — —  —  a Member taxa: Gr: Chloroplastida = (Green algae, including land plants), Rd: Rhodophyceae (Red algae), Gl: Glaucophyta See Table 1.2 for general notes and Table 1.3 for plastid specific notes.

29

Table 1.7. Support for membership and supergroup monophyly from ‘Rhizaria’ targeted molecular genealogies

Membershipa Supergroup Monophyly Year Gene Ce Ch Eg Pt Gr Fo Hs Ph Ac Po Ds Rs A C E O P R 1997 SSU      — —  — — 1998 αTub    —  — — 2001 Actin     — — — —  2003 SSU         —  —  2003 SSU          — —  —  2003 SSU        —   2003 RPB1      — — —   2004 Ubq/Act      — —  —  2004 Act        — — —   2004 RBP1      — — —   2004 SSU            —  2004 Act/SSU              —   —  2004 SSU        — — — — — 2004 Actin     — —  

2002 SSU         —   — — 2004 SSU         — — —  —  aMember taxa: Ce: , Ch: Chlorarachniophyta, Eg: euglyphids, Pt: (plasmophorids), Ph: Phaeodarea, Gr: , Fo: Foraminifera, Hs: Haplosporidia (), Po: Polycystinea, Ac: Acantharia, Ds: desmothoracids, Rs: residua, Ap: apusomonads. See Table 1.2 for further notes.

30

Figure 1.1. Trends in the taxonomy of eukaryotes. A comparison of four representative taxonomies illustrates trends within eukaryotic taxonomy over the past 50 years (Adl et al. 2005; Cavalier-Smith 1998; Margulis and Schwartz 1988; Whittaker 1969). Movement of taxa is traced from earlier to more recent taxonomies with solid and dashed lines. A solid line indicates all members of a group (left of line) are incorporated into the subsequent group (right of line). Dashed lines indicate that a subset of members (left) is incorporated into subsequent groups (right).

31

Figure 1.2. Trends in supergroup taxonomy. A comparison of three formal classifications illustrates trends within A) ‘Amoebozoa’ (Adl et al. 2005; Cavalier-Smith 2004; Smirnov et al. 2005), B) ‘Excavata’ (Cavalier- Smith 2002; Cavalier-Smith 2003), C) ‘Plantae’ (Adl et al. 2005; Cavalier-Smith 1998; Whittaker 1969), and D) ‘Rhizaria’ (Adl et al. 2005; Cavalier-Smith 1998; Cavalier- Smith 2002). A majority of solid, horizontal lines would indicate temporal stability of supergroup classification. For visual simplicity we do not indicate groups newly included in the supergroups or taxonomic restructuring within subgroups. * indicates a newly introduced term. ‘Chromalveolata’ and Opisthokonta are not included because only one formal taxonomy exists for both groups. See Fig. 1.1 for further notes.

32

CHAPTER 2

THE DYNAMIC NATURE OF EUKARYOTIC GENOMES

2.1 Abstract

Analyses from diverse eukaryotes reveal that genomes are dynamic, sometimes dramatically so. In numerous lineages across the eukaryotic tree of life, DNA content varies within individuals throughout life cycles and among individuals within species.

Novel genome features are discovered and our understanding of the extent of genome dynamism continues to grow as more genomes are sequenced. Though most completed eukaryotic genomes are from animals, fungi, and plants, these lineages represent only three of the 70+ lineages of eukaryotes. Here, we discuss the diverse genomic strategies in exemplar eukaryotic lineages, including several microbial eukaryotes, to reveal dramatic variation that challenges established views of genome evolution. For example, in the life cycle of some members of the 'radiolaria' ploidy increases from haploid (N) to approximately 1000N, while intrapopulation variability of the enteric parasite Entamoeba ranges from 4N to 40N. Variation has also been found within our own species, with substantial differences in both gene content and lengths between individuals. Data on the dynamic nature of genomes shift the perception of the genome from being fixed and characteristic of a species (typological) to plastic due to variation within and between species.

33

2.2 Introduction

Genomes are traditionally perceived as having fixed karyotypes within eukaryotic species (e.g., Hartwell et al. 2004; Lewin 2000). Under such a model, closely related individuals (within populations or species) share nearly identical genomes and genomic integrity is maintained through the cell cycle. This static notion of the genome is challenged by studies in a variety of lineages that demonstrate considerable variation in genomic DNA content throughout organismal life cycles and among members of a single species. As discussed below, data from diverse eukaryotic lineages reveal extensive intra- and interspecific variation in genome content. These data, along with recognition of the widespread influence of epigenetics on the genome (Bird 2007; Cerutti and Casas-

Mollano 2006; Katz 2006; Richards 2006), illuminate an increasingly dynamic picture of the eukaryotic genome.

Examining the phylogenetic distribution of genome dynamics enhances interpretation of the evolution of genome features. The phylogenetic framework of eukaryotes has shifted from the five (Margulis and Schwartz 1988; Whittaker 1969) or six (Cavalier-Smith 2002) kingdom systems emphasizing plants, animals, and fungi towards systems recognizing that these macroscopic groups are only three of an estimated

70 lineages (Patterson 1999), the rest are diverse microbial lineages. The current view of eukaryotic classification divides eukaryotes into six “supergroups” that encompass both macrobial and microbial members (Adl et al. 2005; Baldauf et al. 2000; Keeling et al.

2005), although this classification is likely premature as some groups are poorly supported (Parfrey et al. 2006). The diverse microbial lineages employ many genomic strategies and are known to provide extreme examples of some, such as genome

34

processing in some ciliates that generates up to 25,000,000 somatic

(McGrath and Katz 2004; Raikov 1982; Zufall et al. 2005). Thus, the consideration of microbial eukaryotes expands understanding of both the distribution and types of genome dynamics.

To elucidate the range of variation in eukaryotic genomes we focus on two aspects of intraspecific genome dynamics: (1) variation in nuclear cycles within lineages, and (2) variation in genome content among individuals within species. Selected examples are presented to highlight the broad phylogenetic distribution of dynamic features of eukaryotic genomes (bold lineages Fig. 2.1; images of representative taxa Fig.

2.2). Elsewhere there are extensive discussions of related topics such as the evolution of sex and meiosis (Archetti 2004; Kondrashov 1994; Mable and Otto 1998; Nuismer and

Otto 2004) and changes in ploidy associated with speciation (genome duplication, hybridization, etc; e.g. Bennett 2004; Otto and Whitton 2000).

We first consider changes in ploidy levels and genome content during the life cycles of exemplar lineages. To facilitate discussion, we present a generalized nuclear cycle for eukaryotes that depicts the progression through meiosis and karyogamy with intervening rounds of , such as occurs in the alternation of generations in plants

(Fig. 2.3A). This generalized nuclear cycle provides a common framework and terminology that enable comparison among diverse eukaryotic lineages. Meiosis and karyogamy, which respectively halve and double genome ploidy (coded yellow and red in Fig. 2.3), are the basis of the sexual haploid-diploid cycle that underlies the typified nuclear cycles of plants, animals, and fungi (Fig. 2.3A-C). Variation in nuclear cycles can be imparted through elimination of some stages (e.g. asexual eukaryotes reproduce

35

via mitosis alone, and in many lineages haploid nuclei do not undergo mitosis).

Our focus is on the diversity of nuclear cycles that deviate from textbook versions of haploid-diploid cycles (Fig. 2.3D-I).

A second aspect of dynamic genomes explored here is the heterogeneity of DNA content among conspecific individuals (♦ Fig. 2.1 and Fig. 2.4). In contrast to the perception that single nucleotide polymorphisms (SNPs) are responsible for intraspecific differentiation, emerging data demonstrate a greater range of variation. This intraspecific genomic variation ranges from insertions and deletions of short stretches of DNA through population differences in karyotype and ploidy. Examples from animals, fungi and microbes are used to illustrate intraspecific genomic variability.

Underlying the diversity of genomic processes are similarities that suggest an ancient origin of the basic features of the genome propagation including meiosis and mitosis. We argue that eukaryotes share mechanisms, possibly epigenetic, to maintain the integrity of genetic material for transmission to future generations while also allowing extensive variation within life cycles and among conspecific individuals. From the observed diversity several trends in genome dynamics emerge such as: 1) elevated ploidy levels in large cells coupled with diverse mechanisms for reducing ploidy (e.g.

Foraminifera, Phaeodarea, and the bacterium Epulopiscium fishelsoni); 2) dynamism in the somatic genome in lineages with sequestered germlines (e.g. ciliates and animals); and 3) ploidy cycles in asexual lineages (e.g. Giardia, Entamoeba, and Amoeba). We further suggest that intraspecific variation in genome content is a widespread source of phenotypic variation.

36

2.3 Varying DNA content during nuclear cycles

DNA content elevated over the diploid level is the most common manifestation of dynamic eukaryotic genomes (Fig. 2.1), and even occurs in some and archaea.

Two mechanisms by which organisms increase their DNA content are endoreplication resulting in and genome amplification. There are two fates of polyploid cells:

(1) terminal polyploidy and (2) cyclic polyploidy that is later reduced by a non-meiotic mechanism (Fig. 2.1 ⇑ and ↑↓, respectively). Terminal polyploidy occurs in differentiated somatic tissues in many multicellular taxa, including red algae, green algae, and animals, and in the somatic nuclei of ciliates (Fig. 2.3D and E). In lineages characterized by cyclic polyploidy, increasing genome content is a phase in the life cycle and ploidy levels are reduced prior to either sexual or asexual reproduction (Fig. 2.3G-I).

These groups have diverse strategies for reducing ploidy level, such as genome segregation in which one polyploid nucleus yields thousands of haploid daughter nuclei

(Raikov 1982). Genomic DNA levels can also be reduced during life cycles of some lineages when portions of the genome are eliminated in response to stress or during development (Fig. 2.3D-F). Below we introduce examples of these types of dynamic genomes, though in many cases the molecular details of genome processes have yet to be elucidated.

Ciliates (Fig. 2.2A) are microbial eukaryotes that undergo extensive genomic processing through genome amplification and chromosome fragmentation during the development of the somatic (Fig. 2.3D; Katz 2001; Prescott 1994). During conjugation two cells exchange meiotically produced micronuclear-derived gametic nuclei that fuse to become the zygotic nucleus. Both the macronucleus and the germline

37

, which does not undergo genome processing, differentiate from the zygotic nucleus following conjugation. The nuclear cycle of the germline micronucleus is similar to animals in that it is sequestered from the somatic genome, though in ciliates this all occurs within a single cell. Macronuclear genomes are highly polyploid, with a range of

60 copies of each chromosome in Tetrahymena to as many as 1000 copies of a chromosome in . The parental macronucleus degrades during conjugation, but influences the developing macronucleus through epigenetic processes (Mochizuki and Gorovsky 2005). The somatic macronculear genomes of the model ciliates

Tetrahymena (Eisen et al. 2006) and (Aury et al. 2006) have recently been completed, and planned micronuclear genome sequencing will elucidate further genome processing in this lineage.

Selective elimination of germline-limited DNA during somatic development also occurs in widely distributed lineages of animals, including hagfish (Vertebrata), dicyemid worms (Mesozoa), ascarids (Nematoda), and copepods (Crustacea) (Awata et al. 2006;

Kloc and Zagrodzinska 2001; Redi et al. 2001; Zufall et al. 2005). In animals this process is referred to as chromatin diminution and results in the loss of 15 – 95% of germline DNA (Kloc and Zagrodzinska 2001). Following fertilization in copepods, the genome is endoreplicated five to ten-fold. Roughly half of the germline genome, predominately highly repetitive heterochromatin, is then eliminated during diminution

(Fig. 2.3E; Beermann 1977). The resulting diminuted somatic nuclei (diploid) contain the same amount of DNA as the haploid sperm cells (Drouin 2006; Wyngaard et al.

2001). The broad phylogenetic distribution of somatic chromosome processing in animals indicates that extensive modification of the somatic genome may be ancient in

38

lineages with sequestered germline genomes. We anticipate that numerous additional examples will be discovered.

In the plant flax (Linum usitatissimum), reduction of DNA content occurs as a response to stress in the lifetime of the individual (Fig. 2.3F; Cullis 2005). Several studies find that the amount of DNA in the genome can be reduced by 15% when the plant is grown under stress (Fig. 2.3F), and this reduction appears to effect high, middle, and low repetitive classes of DNA equally (Cullis 1973b; Cullis 1980; Evans et al. 1966).

Thus, it is possible that coding regions are lost in addition to highly repetitive, non- coding DNA. The genomic response to stress is non-random, as the same stress induced genomic changes occur repeatedly (Cullis 2005). Intriguingly, an increase in homologous recombination has also been measured in Arabidopsis when this model plant is exposed to stress by ultraviolet light or pathogens (Molinier et al. 2006). In the case of flax, Cullis (2005) argues that genomic changes are reversible as after six generations in native conditions the full genome complement returns in new tissues. These observations suggest that this response may be epigenetic in nature.

Foraminifera, a diverse group of amoebae (Fig. 2.2B), have dynamic genomes that provide examples of genome segregation and possibly genomic processing. Nuclear cycles have been studied in fewer than 1% of extant Foraminifera species (Goldstein

1999), but are generally characterized by an alternation between multinucleated diploid agamonts and uninucleate haploid gamonts (Fig. 2.3G). The large agamonts reproduce by meiosis and multiple , yielding numerous haploid gamonts. The single gamontic nucleus then increases in size and DNA content as the organism grows, reaching 400 µm in diameter in some species (Bowser et al. 2006; Goldstein 1997). It is

39

not known whether DNA content increases through polyploidization, by differentially amplifying some portion of the genome, or through a combination of the two. Just prior to gametogenesis nuclear products, predominately nucleoli plus some DNA, are expelled from the gamontic nucleus and degraded in a process referred to as ‘Zerfall’ (Føyn 1936).

This ‘nuclear cleansing’ during ‘Zerfall’ may facilitate a return to the haploid genome before reproduction (Bowser et al. 2006). The retained DNA proliferates through many rounds of mitosis (Arnold 1955; Føyn 1936) to generate hundreds to thousands of small

(~ 5 µm) haploid gametes (Bé and Anderson 1976; Goldstein 1997).

Amoeba proteus (Fig. 2.2C) is reported to reduce the DNA levels in its nucleus by depolyploidization in the absence of mitosis (Fig. 2.3H). A. proteus, an asexual lobose amoebae, contains over 500 chromosomes and is believed to be polyploid (Raikov

1982). Though there are no direct measurements, several indirect methods have been applied to show ploidy variations (Afonkin 1986). Cytofluorimetric data shows that the

DNA content of A. proteus increases and decreases by up to 2.9 fold during interphase of the cell cycle (Makhlin et al. 1979). DNA synthesis is reported to occur in bursts, with one major peak right after mitosis and another smaller peak immediately before subsequent mitosis and (Ord 1968). Taken together these studies suggest that A. proteus becomes polyploid during interphase and that prior to reproduction, this surplus of DNA is eliminated by depolyploidization to recover the haploid genome.

Under this hypothesis, the genome is then replicated and passed on to daughter A. proteus cells by mitosis (Fig. 2.3H). Intriguingly, A. proteus is a congener of Amoeba dubia, the with the largest reported genome at 670,000 MB (Friz 1968), and may share mechanisms for managing huge amounts of DNA during its life cycle with A. proteus.

40

Phaeodarea (Fig. 2.2D), a member of the polyphyletic ‘radiolaria’ that is currently placed within the ‘Cercozoa’, reduce ploidy levels in their large polyploid nucleus by genome segregation (Fig. 2.3I; Grell 1953; Raikov 1982). While the complete nuclear cycle of this lineage has not been fully elaborated, in part as no member of this clade has yet to be maintained in culture, Phaeodarea members are reported to be asexual and have a single nucleus that increases in ploidy by endomitosis. Ploidy is reduced prior to reproduction as the chromatin in this large (~100 µm in diameter) nucleus condenses into thousands of polytene “chromosomes” that segregate along microtubules, each becoming a nucleus (Fig. 2.3I; Grell and Ruthmann 1964). These polytene

“chromosomes” later break down into 10-12 smaller chromosomes, presumably the haploid genome complement. The subsequent nuclei undergo rounds of mitosis yielding thousands of daughter nuclei that are individually packaged into biflagellate spores along with other organelles. Production of spores consumes the entire of the parent, and allows alternation between a large vegetative cell and small reproductive cells

(Raikov 1982).

Although not the focus of this manuscript, variation in ploidy levels is also reported from bacteria and archaea. DNA content in Escherichia coli varies from two, four, or eight genome equivalents in stationary phase up to 11 genome equivalents per cell in early exponential phase (Akerlund et al. 1995). Similar variability is reported for

Methanococcus jannaschii, Micrococcus radiodurans, PCC6301,

Desulfovibrio gigas, Borrelia hermsii, Azotobacter vinlandii (Bendich and Drlica 2000).

The giant bacterium Epulopiscium fishelsoni (up to 600 µm in length) has a 3,000-fold range in ploidy variation that corresponds to a similar range in cell size (Bresler and

41

Fishelson 2003), suggesting that polyploidy is associated with large cell sizes in this of life as well. Ploidy levels are reduced in during reproduction E. fishelsoni when numerous daughter cells are produced simultaneously in the mother cell (Angert

2005).

2.4 Intraspecific variation in DNA content

Genomes also vary dramatically among individuals within species of diverse eukaryotic lineages (♦ Fig. 2.1). The portion of the genome that varies ranges from polyploidization of the entire genome to insertions and deletions of megabase stretches of genomic DNA (Fig. 2.4). Such variation contrasts markedly with the SNP variants that are the focus of many current studies of genomic variation within species.

Substantial levels of among individual variation in DNA content have recently been found in humans, where the phenomenon is called copy number variation (CNV;

Freeman et al. 2006; Redon et al. 2006). Redon, (2006) found insertions and deletions of kilobase to megabase segments of DNA that lead to polymorphisms in the presence/absence of chromosome regions and genes contained within them. The scale of this variation is enormous; a survey of 270 individuals found that 360 MB (13% of the genome) varied (Redon et al. 2006), presenting a marked contrast to the genomic conservation symbolized by the 99% similarity in orthologous sequences between humans and chimps (Mikkelsen et al. 2005). Geneticists are struggling to conceptualize the species genome and redefine ‘normal’ in this context as they search for the disease implications of this variability (Kehrer-Sawatzki 2007).

42

The ciliate macronucleus displays intraspecific variability on the level of whole chromosomes. The macronucleus is inherited by amitosis during asexual reproduction, an imprecise mechanism resembling binary fission or budding in which chromosomes are replicated and distributed to daughter nuclei without a mitotic spindle. Amitosis can lead to differential inheritance of alleles or paralogs (Robinson and Katz 2007) and contribute to overall elevated rates of protein evolution seen in ciliates (Katz et al. 2004; Zufall and

Katz 2007). In both cases, epigenetic mechanisms likely play a role in regulating genome dynamics. Hence, ciliates within a population may have identical, or very similar, germline nuclei while their somatic nuclei can vary in the presence/absence of chromosomes.

Populations of Entamoeba (Fig 2.2E), the causative agent of amoebic dysentery in humans (Stauffer and Ravdin 2003), demonstrate heterogeneity in nuclear ploidy due to varying levels of endomitosis. Entamoeba alternates between infective resting cysts with four haploid nuclei and metabolically active trophozoites with one or more nuclei.

DNA levels within a population of trophozoites exhibit continuous variation from 4N to

40N, and this variation is present both within individuals and among the nuclei of separate individuals (Lohia 2003). Populations can be synchronized to 4N by starvation, but achieve the same 10-fold range of variation within two hours of addition of serum (Lohia 2003).

The Giardia (Fig. 2.2F), a causative agent of diarrhea in humans, is an anaerobic flagellate with a hypervariable karyotype whereby the number and lengths of chromosomes vary among isolates (Adam 2000; Hou et al. 1995). Though Giardia is a putative asexual lineage it has a ploidy cycle due to endoreplication and reduction

43

(Bernander et al. 2001). Analysis of Giardia chromosomes by pulse field gel electrophoresis reveals several size variants for each chromosome and there is no fixed karyotype for this species (Le Blancq and Adam 1998). While the core portions of the chromosomes appear to be stable, there is considerable variation in the subtelomeric region (Adam 2000; Le Blancq and Adam 1998). Heterogeneity in karyotypes contrasts starkly with the almost complete lack of nucleotide heterogeneity in protein-coding genes between strains (Lasek-Nesselquist et al. 2009; Morrison et al. 2007; Teodorovic et al.

2007). As Giardia is asexual, one would expect large amounts of allelic variation to accumulate.

Intraspecific DNA sequence variation has also been found in Arbuscular

Mycorrhizal Fungi (AMF), which supply essential nutrients to plant roots (Smith and

Read 1997). Vegetative AMF cells contain numerous nuclei, as do fungi in general; however AMF are unusual in that hundreds to thousands of nuclei appear to be transferred to each spore (Pawlowska 2005). Within individual and within spore genetic variation is documented for rDNA and protein coding genes in several species of AMF

(reviewed in Pawlowska 2005). These results contradict the expected clonal population structure and haploid genome expected for fungi considered ancient asexuals (Judson and

Normark 1996). It remains to be seen whether this variation is harbored in each nuclei as either duplicated genes or polyploid genomes (Pawlowska and Taylor 2004) or in genetically distinct nuclei (each presumably haploid) that are passed to each spore (Hijri and Sanders 2005; Kuhn et al. 2001).

44

2.5 Synthesis

We demonstrate that genomes are dynamic within and between eukaryotic lineages. This dynamism elaborates the generalized ‘textbook’ view of nuclear cycles

(Fig. 2.3A-C) through addition of ploidy cycles, modification of genome content during development or in response to stress, and/or generation of individual variation in karyotypes (Fig. 2.3D-I and 2.4). For example, cyclic changes in ploidy occur during the life cycle in many ‘large’ organisms, such as Phaeodarea, Foraminifera, and large prokaryotes. Ploidy cycles may be advantageous as they allow both reproduction via haploid gametes or spores, which is suggested to reduce the mutational load (Kondrashov

1997), and the high levels of DNA that are correlated with large vegetative cells

(Kondorosi et al. 2000; Mortimer 1958). These nuclear features are broadly distributed

(Fig. 2.1) and likely involve shared mechanisms. The observed variation in genome content is underlain by conservation in at least some of the molecular machinery regulating meiosis and mitosis—two basic components of the nuclear cycle—across the diversity of eukaryotes (Nurse 1990; Ramesh et al. 2005). Hence, the broad phylogenetic distribution of ploidy cycles suggest that mode of genome propagation evolves quickly, but within the constraints of the conserved regulatory framework that was likely present in the last common ancestor of eukaryotes. The limited data on ploidy levels in prokaryotes hint that the molecular machinery may be even more ancient.

Data on the population level variation in DNA content are changing further perceptions on the nature of the eukaryotic genome. Recognized cases of intraspecific genomic heterogeneity are already widespread in eukaryotes (Fig. 2.4), with examples coming from genome array studies on humans and limited data on microbial eukaryotes

45

such as Giardia and Entamoeba. Intraspecific genome variability is likely a broad phenomenon that awaits detection in many more groups, and may be a common source of phenotypic variation.

Elucidating the scope of genome dynamics is essential for understanding the relationship between genomes and phenotypes. Recent studies suggest that genome features can affect evolution by altering population genetic parameters such as effective population size (Lynch 2007) and rates of molecular evolution (Zufall et al. 2006).

Understanding the wide range of ploidy levels found in diverse eukaryotes may also shed light on the intolerance of vertebrate cells to such fluctuations, where departure from diploidy often leads to cancer and aneuploid defects such as trisomy 21 (Ganem et al.

2007).

Though much remains to be discovered about the mechanisms behind genome dynamics during life cycles and within species, candidate mechanisms are most likely epigenetic. Epigenetic phenomena such as methylation, acetylation and genome scanning through RNAi (e.g. Cerutti and Casas-Mollano 2006; Goldberg et al. 2007;

Mochizuki and Gorovsky 2005) may enable cells to differentiate between germline and somatic genomes, even in the context of a single nucleus. Under such scenarios, nuclei mark 'germline' genetic information for transmission to subsequent generation. Such a distinction between germline and soma may be key in enabling variation in genomes within life cycles and among individuals within populations while also maintaining integrity between generations.

46

Figure 2.1. Distribution of genomic features. Dynamic genomes are widespread across the eukaryotic tree of life. Occurrence of three metrics of genome dynamism are plotted onto our cartoon of the eukaryotic tree of life, the topology of which is derived from our interpretation of multigene genealogies (Parfrey et al. 2006; Rodríguez-Ezpeleta et al.). (p) indicates paraphyly in radiolaria and green algae. Symbols indicate that the feature is reported in at least one taxon within the lineage. ⇑: Somatic polyploidy, ↑↓: Cyclic polyploidy, ♦: Intraspecific genome variation.

47

Figure 2.2. Exemplar microbial eukaryotes. Images of microbial organisms discussed reveal morphological diversity in addition to genomic diversity described in the text. Approximate size is give for reference. (A) Uronychia (ciliate) – 150µm, (B) Ammonia (Foraminifera) – 300 µm, (C) – 300 µm, (D) Aulacantha (Phaeodarea) – 200 µm, (E) Entamoeba – 25 µm, (F) Giardia – 12 µm. All except D are images of live organisms, D is a drawing from Haeckel (1862) from the library of Kurt Stueber (http://caliban.mpiz- koeln.mpg.de/~stueber/haeckel/radiolarien). All images used with permission from micro*scope (http://starcentral.mbl.edu/microscope/portal.php).

48

Figure 2.3. Diversity of nuclear cycles. Depiction of changes in DNA content through the nuclear cycles of nine lineages of eukaryotes. Horizontal axis and colors corresponds to the nuclear cycle stage as shown in the inset diagram, while the vertical axis measure approximate DNA content within the nucleus. Gray shading represents periods of multicellularity or multinuclearity. Inset diagram in panel A is the generalized nuclear cycle as exemplified by plants. Arrows represent progression of genome through karyogamy (red), mitosis as a diploid (blue), meiosis (yellow), and mitosis as a haploid (green). Amitosis in ciliates is black. The nuclear cycle of organisms may include some or all components of the generalized nuclear cycle. A dashed arrow indicates the absence of intervening steps. Panels D and E depict the fate of the somatic genomes, therefore they are dead ends.

49

Figure 2.4. Range of intraspecific variation. Individuals within a species (and population) do not have identical genomes. Intraspecific genomes can be different from the level of a few nucleotides, to chromosomes, to ploidy of the whole genome. We plot examples of this variation discussed in the text along a gradient of variation. The number of nucleotides involved increases from left to right.

50

CHAPTER 3

BROADLY SAMPLED MULTIGENE ANALYSES YIELD A WELL-RESOLVED

EUKARYOTIC TREE OF LIFE

3.1 Abstract

An accurate reconstruction of the eukaryotic tree of life is essential to identify the innovations underlying the diversity of microbial and macroscopic (e.g., plants and animals) eukaryotes. Previous work has divided eukaryotic diversity into a small number of high-level “supergroups,” many of which receive strong support in phylogenomic analyses. However, the abundance of data in phylogenomic analyses can lead to highly supported but incorrect relationships due to systematic phylogenetic error. Furthermore, the paucity of major eukaryotic lineages (19 or fewer) included in these genomic studies may exaggerate systematic error and reduce power to evaluate hypotheses. Here, we use a taxon-rich strategy to assess eukaryotic relationships. We show that analyses emphasizing broad taxonomic sampling (up to 451 taxa representing 72 major lineages) combined with a moderate number of genes yield a well-resolved eukaryotic tree of life.

The consistency across analyses with varying numbers of taxa (88–451) and levels of missing data (17–69%) supports the accuracy of the resulting topologies. The resulting stable topology emerges without the removal of rapidly evolving genes or taxa, a practice common to phylogenomic analyses. Several major groups are stable and strongly supported in these analyses (e.g., SAR, Rhizaria, Excavata), whereas the proposed supergroup ‘Chromalveolata’ is rejected. Furthermore, extensive instability among

51

photosynthetic lineages suggests the presence of systematic biases including endosymbiotic gene transfer from symbiont (nucleus or plastid) to host. Our analyses demonstrate that stable topologies of ancient evolutionary relationships can be achieved with broad taxonomic sampling and a moderate number of genes. Finally, taxon-rich analyses such as presented here provide a method for testing the accuracy of relationships that receive high bootstrap support (BS) in phylogenomic analyses and enable placement of the multitude of lineages that lack genome scale data.

3.2 Introduction

Perspectives on the structure of the eukaryotic tree of life have shifted in the past decade as molecular analyses provide hypotheses for relationships among the approximately 75 robust lineages of eukaryotes. These lineages are defined by ultrastructural identities (Patterson 1999)—patterns of cellular and subcellular organization revealed by electron microscopy—and are strongly supported in molecular analyses (Parfrey et al. 2006; Yoon et al. 2008). Most of these lineages now fall within a small number of higher level clades, the supergroups of eukaryotes (Adl et al. 2005;

Keeling et al. 2005; Simpson and Roger 2004). Several of these clades—Opisthokonta,

Rhizaria, and Amoebozoa—are increasingly well supported by phylogenomic (Burki et al. 2008; Hampl et al. 2009; Rodríguez-Ezpeleta et al. 2007a) and phylogenetic (Parfrey et al. 2006; Pawlowski and Burki 2009), analyses, whereas support for ‘Archaeplastida’ predominantly comes from some phylogenomic studies (Rodríguez-Ezpeleta et al. 2005) or analyses of plastid genes (Parfrey et al. 2006; Yoon et al. 2002). In contrast, support for ‘Chromalveolata’ and Excavata is mixed, often dependent on the selection of taxa

52

included in analyses (Rodríguez-Ezpeleta et al. 2005). We use quotation marks throughout to note groups where uncertainties remain. Moreover, it is difficult to evaluate the overall stability of major clades of eukaryotes because phylogenomic analyses have

19 or fewer of the major lineages and hence do not sufficiently sample eukaryotic diversity (Burki et al. 2008; Hampl et al. 2009; Rodríguez-Ezpeleta et al. 2007a), whereas taxon-rich analyses with 4 or fewer genes yield topologies with poor support at deep nodes (Cavalier-Smith 2004; Parfrey et al. 2006; Yoon et al. 2008).

Estimating the relationships of the major lineages of eukaryotes is difficult because of both the ancient age of eukaryotes (1.2–1.8 billion years; Knoll et al. 2006) and complex gene histories that include heterogeneous rates of molecular evolution and paralogy (Gribaldo and Philippe 2002; Maddison 1997; Tekle et al. 2009). A further issue obscuring eukaryotic relationships is the chimeric nature of the eukaryotic genome—not all genes are vertically inherited due to lateral gene transfer (LGT) and endosymbiotic gene transfer (EGT)—that can also mislead efforts to reconstruct phylogenetic relationships (Andersson 2005; Rannala and Yang 2008; Tekle et al. 2009). This is especially true among photosynthetic lineages that comprise ‘Chromalveolata’ and

‘Archaeplastida’ where a large portion of the host genome (approximately 8–18%) is derived from the plastid through EGT (Lane and Archibald 2008; Martin et al. 2002;

Martin and Schnarrenberger 1997; Moustafa et al. 2009; Tekle et al. 2009).

There is a long-standing debate among systematists as to the relative benefits of increasing gene or taxon sampling (Cummings and Meyer 2005; Hillis et al. 2003; Rokas and Carroll 2005). Both approaches improve phylogenetic reconstruction by alleviating either stochastic or systematic phylogenetic error (e.g., Hedtke et al. 2006; Rokas and

53

Carroll 2005). Stochastic error results from too little signal in the data (e.g., single to few gene trees) to estimate relationships and results in poorly resolved trees with low support, especially at deep levels (Rokas and Carroll 2005; Swofford et al. 1996). The problems of stochastic error are amplified for deep relationships, such as relationships among major clades of eukaryotes (Roger and Hug 2006). Many researchers opt to increase the number of genes, exemplified by phylogenomic studies, which alleviates stochastic error and yields well-resolved trees that are highly supported (Burki et al. 2007; Rokas and Carroll

2005). However, analyses of many genes are still vulnerable to systematic error and often include very few lineages.

Systematic error results from biases in the data that mislead phylogenetic reconstruction, yielding incorrect sister group relationships that do not reflect historical relationships; the most well known of these is long-branch attraction (Felsenstein 1978).

Incongruence can also arise from conflicts between gene trees and species trees resulting from population genetic processes or the chimeric nature of eukaryotic genomes

(Maddison 1997; Rannala and Yang 2008). Systematic errors can be detected and eliminated by several methods that are often combined, including using more realistic models of sequence evolution (e.g., Rodríguez-Ezpeleta et al. 2007b), removing rapidly evolving genes and/or taxa that cause errors (Brinkmann et al. 2005), and by increasing taxonomic sampling (Hedtke et al. 2006; Zwickl and Hillis 2002). Increased taxon sampling has been shown to improve phylogenetic accuracy even when the additional taxa contain large amounts of missing data (Philippe et al. 2004; Wiens 2005; Wiens and

Moen 2008). In contrast, the abundance of data in phylogenomic studies can yield highly supported, but incorrect relationships caused by these systematic biases (Hedtke et al.

54

2006; Jeffroy et al. 2006; Philippe et al. 2004; Rokas and Chatzimanolis 2008). Taxon- rich analyses provide a method for testing the accuracy of relationships that receive high

BS support in phylogenomic analyses (Heath et al. 2008; Zwickl and Hillis 2002).

Here, we assess the eukaryotic tree of life by analyzing 16 genes from a broadly sampled data set that includes 451 diverse taxa from 72 lineages. We aim to overcome both stochastic and systematic phylogenetic error by assessing two measures of clade robustness: (i) statistical support (bootstrap), and (ii) the stability of clades across analyses with varying numbers of taxa and levels of missing data. We demonstrate that extensive taxon sampling coupled with selection of a modest number of well-sampled genes counteracts systematic error and correctly places many rapidly evolving lineages without the removal of genes or taxa. Furthermore, this approach enables us to place the numerous lineages that have only a few genes sequenced, and to assess support for the hypothesized clades of eukaryotes with a more inclusive sampling of diverse lineages.

3.3 Methods

3.3.1 Gene Sequencing

Ovammina opaca and Ammonia sp. T7 were collected from a salt marsh on

Cabretta Island, Georgia with assistance from Susan T. Goldstein (University of

Georgia). DNA was isolated from 60 cells each that were individually picked, washed, and purged of food items overnight using a plant DNeasy kit (Qiagen). Gromia sp.

Antarctica DNA was isolated from one cell undergoing gametogenesis and generously provided by Sam Bowser and Andrea Habura (Wadsworth Center). DNA for all other taxa was obtained from American Type Culture Collection (ATCC; Table S1,

55

supplementary data available from http://www.sysbio.oxfordjournals.org/) and accessions have been photodocumented (http://eutree.lifedesks.org/). Small subunit ribosomal DNA

(SSU-rDNA) was amplified with previously described primers (Medlin et al. 1988) and 3 additional primers were used to generate overlapping sequences from each clone

(Snoeyenbos-West et al. 2002). Hsp90 was amplified with CAC CTG ATG TCT YTN

ATH ATH AAY and CTG GCG AGA NAN RTT NAR NGG, and reamplified with nested primers TCT CTG ATC ATC AAY RCN TTY TAY and AGA GAT GTT NAR

NGG NAN RTC. Primers for actin, alpha-tubulin and beta-tubulin are from Tekle et al.

(2008). Phusion DNA Polymerase (Finnzymes Inc.), a strict proofreading enzyme, was used to amplify the genes of interest and Invitrogen Zero Blunt Topo cloning kits were used for cloning. Sequencing of cloned plasmid DNA was accomplished using vector- or gene-specific primers and the BigDye terminator kit (Applied Biosystems). Sequences were run on an ABI 3100 automated sequencer. We have fully sequenced 1–4 clones of each gene and surveyed up to 10 clones per taxon in order to detect paralogs.

Stephanopogon apogon SSU-rDNA is extremely large and we were unable to amplify it using standard methods. Instead, we amplified 3 overlapping fragments that were then combined for use in our analyses. All new sequences, including any paralogs identified, have been deposited in GenBank (GQ377645–GQ377715 and HM244866–HM244878).

Cultures of microbial eukaryotes for expressed sequence tag (EST) sequencing were obtained from ATCC or the Culture Collection of Algae and Protozoa (Table S1) and grown in Corning culture flasks according to supplier’s recommended protocols.

Cultures of Heteromita sp. were kindly provided by Linda Amaral Zettler and subsequently deposited at ATCC (ATCC PRA-74). Cultures were harvested and pooled

56

as needed to obtain approximately 2x107 cells. Cells were pelleted and messenger RNA

(mRNA) was extracted using the Qiagen Oligotex direct mRNA protocol. The resultant mRNA was quantitated by NanoDrop and/or Agilent Bioanalyzer RNA chip.

Complementary DNA was generated using the ClonTech SMART cDNA construction protocol and ligated into the Lucigen pSMART vector (Diplonema papillatum) or the

ClonTech pDNRlib vector (all others). Electrocompetent cells were transformed using the ligation products and plated on Luria broth-kanamycin agar. Clones were grown in

96-well polypropylene 2.0 mL deep well growth blocks containing 1.2 mL superbroth

(with 30µL/mL kanamycin) per well and plasmid DNA was prepared using a modified alkaline lysis procedure adapted for automation (GenomicSolutions RevPrep Orbit or

Beckman BiomekFX). Approximately 10,000 clones from each library were sequenced bidirectionally with vector primers using Sanger cycle sequencing (Applied Biosystems

BigDye Terminator chemistry). Paired reads from the same clone were trimmed using custom Perl scripts and assembled based on sequence overlap using phrap

(www.phrap.org). Clustering was done after assembly of paired reads, by TGICL (Pertea et al. 2003), and was used to group highly similar sequences that were extremely likely to be copies of the same gene. The size of a cluster thus reflects number of transcripts of a particular gene (gene copy number and expression level).

3.3.2 Data set Assembly

Taxa and genes were selected to maximize taxonomic diversity and evenness given the availability of molecular data. This strategy was used to improve phylogenetic accuracy by breaking up long branches with dense sampling across the eukaryotic tree

57

(Hillis 1998). The classifications systems of Patterson (1999) and Adl et al. (2005) were used as guides as we aimed to sample eukaryotic diversity by including representatives of as many lineages defined by ultrastructural identities as possible (Table S2). These lineages have generally proven to be robust as they are well supported in molecular analyses (e.g., Adl et al. 2005; Parfrey et al. 2006; Yoon et al. 2008), including the current study, and they represent monophyletic groups that serve as a proxy for taxonomic diversity. Our data set has representatives from 72 lineages, including 53 of the 71 lineages plus 7 of 200 unplaced genera as defined in Patterson (Patterson 1999).

Additionally, we include 3 unplaced lineages isolated more recently, Malawimonas jakobiformis (O'Kelly and Nerad 1999), Breviata anathema (Walker et al. 2006), and

ATCC strain 50646 (an isolate given the candidate name “Soginia anisocystis” that has yet to be described formally). We use an updated classification (Adl et al. 2005) to designate lineages in Amoebozoa and Rhizaria that belonged to the single unsupported clade (Ramicristate) from Patterson 1999 (Table S2). In order to maximize taxon evenness along with breadth, we chose limited but diverse members from within lineages where possible (e.g., we included 15 phylogenetically distant animals).

To maximize gene sampling for diverse taxa, we include markers historically targeted by polymerase chain reaction–based analyses (e.g., SSU-rDNA, actin, elongation factor 1α; Table S3) plus commonly sequenced ESTs (e.g., ribosomal proteins, 14-3-3;

Table S3). The comprehensively sampled SSU-rDNA and the historical markers facilitate inclusion of many additional taxa for which only these genes have been characterized

(Table S4). The minimum sequence data required for inclusion were nearly full-length

SSU-rDNA, which provided the core of information necessary for phylogenetic

58

placement with large amounts of missing data (Wiens and Moen 2008).

SSU-rDNA sequences were hand curated for target taxa by removing introns, unalignable regions, nonnuclear rDNAs, and misannotated sequences. This alignment was crucial to overall accuracy because nearly half of the target taxa are represented only by SSU-rDNA, thus several alignment and masking methods were assessed to ensure the robustness of the SSUrDNA alignment. SSU-rDNA sequences were aligned by HMMER

(Eddy 2001), version 2.1.4 with default settings, taking secondary structure into account.

HMMER used a set of previously aligned sequences to model the secondary structure of a sequence. The training alignment for building the model, consisting of all available

SSU-rDNA eukaryote sequences (as of December 2008) aligned according to their secondary structure, was downloaded from the European Ribosomal Database (Wuyts et al. 2002). An additional SSU-rDNA alignment was constructed in MAFFT 6 implemented in SeaView (Galtier et al. 1996) with the E-INS-i algorithm (Katoh and Toh

2008). Both alignments were further edited manually in MacClade v4.08 (Maddison and

Maddison 2005)To assess the effect of rate heterogeneity on the SSU-rDNA topologies, we partitioned the data matrices into 8 rate classes using the general time-reversible

(GTR) model with invariable sites and rate variation among sites following a discrete gamma distribution, as implemented in HyPhy version .99b package (Kosakovsky Pond et al. 2005). We then ran analyses without the fastest and two fastest rate classes, resulting in 1197 and 1019 characters, respectively. However, the reduced data sets resulted in less resolution in the backbone without improving apparent the long-branch attraction. Thus, we used the alignment generated in MAFFT and masked with GBlocks

(Talavera and Castresana 2007) and by eye in MacClade, resulting in 867 unambiguously

59

aligned characters.

Assembly of the protein data set relied on a custom built pipeline and database that combined Perl and Python scripts to identify homologs from diverse eukaryotes. Our goal in developing this pipeline was to ensure that we captured the broadest possible set of sequences given the tremendous heterogeneity among microbial eukaryotes. All available protein and EST data from our target taxa (Table S4) were downloaded from

GenBank in January 2009 and ESTs were analyzed in all 6 translated frames to identify correct sequences for our alignment. A fasta file of 6 sequences representing the six

“supergroups” was created for each target gene and used to query our database of target taxa by BLASTp. Results were limited by length, e-value, and identity, and all sequences with greater than 1% divergence within each taxon were retained for assessment of paralogy. The resulting sequences were aligned with ClustalW (Thompson et al. 1994) and the resulting single gene alignments were assessed by eye to remove nonhomologous sequences.

The inferred amino acid sequences for each of the protein genes from our data pipeline were combined with the new sequences generated for this study and again aligned in Clustal W (Thompson et al. 1994). The alignment was adjusted by eye in

MacClade (Maddison and Maddison 2005). As these alignments included all paralogs extracted from the pipeline, individual gene trees were examined to choose appropriate orthologs. For example, in cases where paralogs formed a monophyletic group, the shortest branch sequence was retained. When paralogs fell into multiple locations on the tree, we aimed to maintain orthologous groups that included the greatest taxonomic representation. The individual gene alignments were then concatenated to build a 16

60

gene, 451-taxon matrix with 6578 unambiguously aligned characters, including SSU- rDNA. All other data sets were constructed by removing taxa and/or genes from this matrix. All data matrices are available at TreeBASE (submission ID S10562).

3.3.3 Creation of Subdata Matrices

We created an array of data matrices by subsampling our full data matrix of 16 genes (15 protein-coding genes plus SSU-rDNA) and 451 taxa (denoted all:16) in order to assess the impact of taxon sampling, missing data, and gene sampling. First, seven data sets were created to assess the impact of missing data and taxon sampling (summarized in

Table 3.1). The least inclusive of these contained 16 genes and all 88 taxa that had at least 10 of the 16 genes (10:16), which resulted in 17% missing data. Similarly, the 6:16 and 4:16 matrices include all taxa with at least 6 and 4 of the targeted 16 genes, respectively. SSU-rDNA is ubiquitously sampled in our data set and many phylogenetic hypotheses are based on SSU-rDNA genealogies. To address the concern that SSU-rDNA was driving our results, we deleted it from each of the 16 gene data sets resulting in 9:15,

5:15, 3:15, and all:15 matrices.

To assess the relative importance of gene versus taxon sampling, we compared our full analysis to data sets with taxon sampling based a recent phylogenomic analysis

(Hampl et al. 2009) and phylogenetic analysis (Yoon et al. 2008). We also analyzed a data set of the 4 genes used by Yoon et al. 2008 (actin, alpha tubulin, beta tubulin, and

SSU-rDNA) with our taxon sampling (Table S5; all:4 gene). Although a thorough test of the impact of gene sampling would require a large number of analyses of data sets with

61

genes systematically deleted, we feel that this approach provides insight into the contributions of genes and taxa.

Photosynthetic lineages have chimeric genomes that are composed of genes originating both from the host eukaryote, the endosymbiotic plastid (through EGT), and, in cases of secondary or greater endosymbiosis, from the symbiont nucleus. If genes of multiple origins were retained in our concatenated data set, the resulting conflicting signal between host, symbiont, and plastid could mislead phylogenetic reconstruction.

This chimerism may contribute to the instability observed for photosynthetic lineages without clear sister groups (red algae, green algae, glaucocystophytes, , and haptophytes). Thus, we used 2 methods to detect discordance among loci that could indicate EGT. First, the 16 genes from representatives of each of these photosynthetic lineages were analyzed by top BLASTp hit. We scored the first 2 lineages hit, with red algae, green algae, plants, or glaucophytes taken as evidence for EGT. Nine genes showed some evidence of EGT, and these were removed to create non-EGT data sets

(5:non-EGT and 3:non-EGT; Table S6). The second approach was to use Concaterpillar to identify protein-coding genes with discordant histories (Leigh et al. 2008), which could be caused by EGT or LGT. Repeated runs yielded different results, indicating an absence of supported discordances. Nevertheless, we analyzed several gene sets identified by

Concaterpillar as concordant, including (i) the largest set of concordant genes plus SSU- rDNA (3: cater 7 gene; Table S6), (ii) a 13-gene data set that excluded the 3 genes that were not concordant with any others (5: cater 13 gene; Table S6). To target discordance caused by EGT, we ran Concaterpillar on photosynthetic lineages alone and analyzed the largest concordant gene set (5: cater 9 gene; Table S6).

62

3.3.4 Phylogenetic Analyses

Genealogies for this study were constructed almost exclusively in RaxML. The

MPI version of RaxML 7.0.4 with rapid bootstrapping was used (Stamatakis et al. 2008).

The SSU-rDNA partition was analyzed with GTR+gamma as this was the best fitting model available in RAxML, according to MrModelTest (Nylander 2004). ProtTest

(Abascal et al. 2005) was used to select the appropriate model of sequence evolution for the amino acid data using the 9:15 data set. The WAG amino acid replacement matrix was found to be the best-fitting model for the concatenated data, but the rtREV amino acid replacement matrix was the best for some of the individual partitions and both WAG and rtREV were among the top 3 models for all but 1 gene (and with similar likelihood scores). We ran our data under both WAG and rtREV models and found consistent results, indicating that our interpretations are robust to at least this level of model choice.

The results presented are from the WAG analyses and the rtREV analyses differed only in level of BS for key nodes (usually ±5 points). In initial analyses, the appropriate number of independent bootstrap replicates was determined for each data set using bootstopping criteria in RAxML 7.0.4 as implemented on Cyberinfrastructure for

Phylogenetic Research (CIPRES) portal 2 (Miller et al. 2009). All analyses stopped after

200 or fewer replicates, except all:16, which stopped after 400 replicates. In later analyses, using the MPI version of RAxML, which does not implement a bootstopping criterion, 200 rapid bootstrap replicates followed by a full maximum-likelihood search was used for all analyses except all:16, for which 600 bootstrap replicates were run.

Because of the computational cost of the all:16 analysis, this was run as 6 separate analyses: 100 bootstraps followed by a full maximum likelihood search and 5 other runs

63

of 100 bootstraps each. These data were combined in RAxML to complete the analysis.

We found no significant difference in comparisons between fast and slow RAxML bootstrap methods (Fig. S1i), which we tested because the fast bootstrapping method in

RAxML can produce misleading results particularly for long-branch taxa (Leigh 2008).

The results of rapid bootstrapping are shown.

To investigate the stability of our tree topology under different analytic methods, select data sets were analyzed with Bayesian approaches and Parsimony (Fig. S1s–v).

Parsimony analysis of 10:16, implemented in Paup* (Swofford 2002), yielded a less resolved version of the RAxML topology (i.e. Excavata as a polytomy) that is generally concordant with the more resolved tree obtained by maximum-likelihood methods. The one exception was the misplacement of some rapidly evolving lineages (including

Giardia, Microsporidia, Foraminifera, and Entamoeba). PhyloBayes was run on the 9:15 data set using the CAT model with recoded amino acids. The amino acids were recoded using the Dayhoff (6) model, based on the chemical properties of the amino acids.

PhyloBayes was stopped after building 2 chains of > 13,000 trees with a maxdiff of 0.26, which indicates weak convergence, but that the chains disagreed on at least one clade

26% of the time. A burn-in of 100 trees was removed and the posterior probabilities were calculated after sampling every other tree. The topology of the consensus tree is consistent with, though less well resolved than the results from RAxML. The parallel version of MrBayes 3.1.4 was used to analyze the 10:16 data matrix using the GTR+I+γ

(for nucleotide partition) and WAG (for amino acid partition) models of sequence evolution (Ronquist and Huelsenbeck 2003). Six simultaneous MCMCMC chains were run for 5,600,000 generations, sampling every 1000 generations. An average standard

64

deviation of split frequencies of <0.1 indicated weak convergence. Stationarity was determined by plotting the maximum-likelihood values of the 2 runs, and 10,756 trees were retained. The resulting topology is the same as shown in Fig. 2, except that Breviata nests within Amoebozoa sister to Mastigamoeba + Entamoeba. Most nodes are strongly supported: posterior probability equals 1.00 for Amoebozoa, Opisthokonta, Rhizaria, and

SAR, and 0.66 for Excavata and “Unikonta.”

3.3.5 Topology Testing

We performed the approximately unbiased (AU) test (Swofford 2002) as well as the more conventional Kishono-Hasegawa and Shimodaira-Hasegawa tests, as implemented in Consel 0.1j (Shimodaira and Hasegawa 2001) to test the monophyly of

“Chromalveolata,” “Archaeplastida,” and “Chromista.” The most likely trees with these groups constrained to be monophyletic were built, and the site likelihood values for each constrained topology and the unconstrained topology were estimated using RAxML 7.0.4

(Table S7). In addition, we explored in Paup* v4.08b (Swofford 2002) the number of

Bayesian trees that were consistent with these hypotheses (Table S7).

3.4 Results and Discussion

3.4.1 Robust Topology of the Eukaryotic Tree of Life

Many major clades were consistently recovered across our analyses (Fig. 3.1 and

Table 3.1). These stable groups receive moderate to strong support in analyses with limited missing data (Fig. 3.2) and less support as missing data increases. The

Opisthokonta, which includes animals and fungi, and the heterogeneous clade Rhizaria

65

are recovered in all analyses with strong support (Fig. 3.1 and Table 3.1). Excavata are recovered in all analyses with moderate support (Fig. 3.1 and Table 3.1). Amoebozoa receives low to moderate support in all but our most inclusive analysis (all:16) where most members form a clade with the exception of Mastigamoebidae + Entamoeba that form a separate clade with Breviata, Diphylleia and Centroheliozoa (Fig. 3.1 and Table

3.1). Both Rhizaria and Amoebozoa are heterogeneous assemblages of organisms with diverse body plans (Pawlowski and Burki 2009; Tekle et al. 2009) that were created based on molecular analyses (Parfrey et al. 2006). There are no defining morphological features or molecular signatures for Rhizaria, which now encompasses nearly 30 of the

75 lineages with ultrastructural identities (Pawlowski and Burki 2009). Excavata was hypothesized in part on the basis of ultrastructural characters associated with the ventral feeding groove (Simpson 2003), but is generally polyphyletic in phylogenetic (Parfrey et al. 2006; Simpson et al. 2006) and phylogenomic analyses unless rapidly evolving taxa and characters are removed from the analyses (Hampl et al. 2009; Rodríguez-Ezpeleta et al. 2007a). We also find strong support for the clade of stramenopiles, alveolates, plus

Rhizaria (SAR; (Burki et al. 2007; Burki et al. 2008; Hackett et al. 2007)) and a sister relationship between stramenopiles and Rhizaria (Fig. 3.2 and Table 3.1). This latter finding is at odds with many phylogenomic analyses (Burki et al. 2008; Hampl et al.

2009; Rodríguez-Ezpeleta et al. 2007a) that find stramenopiles and alveolates are sister to one another.

In contrast, the relationships among photosynthetic lineages and the position of most orphan lineages (e.g., Breviata and Centroheliozoa) remain unresolved, as discussed below. Furthermore, the root of the eukaryotic tree of life has been hypothesized to be

66

between a clade containing Amoebozoa and Opisthokonta (‘Unikonta’) and all remaining eukaryotes (Stechmann and Cavalier-Smith 2003), although there is conflict among evidence (reviewed in (Roger and Simpson 2009; Tekle et al. 2009)). In our analyses, we find at best moderate support for ‘Unikonta’ (Table 3.1), but concatenated analyses such as these cannot resolve the root.

In exploring the tradeoffs between increasing taxonomic sampling and decreasing missing data, we analyzed varying combinations of genes and taxa using almost exclusively a maximum-likelihood approach implemented in the software RAxML 7.0.4

(Stamatakis et al. 2005). Node support was highest when we included taxa with 10 or more of our targeted 16 genes (10:16, with 17% missing data and 88 taxa; Fig. 3.2 and

Table 3.1). As taxa are added, node support decreases (Table 3.1, BS in Fig. 3.1) due to the diminishing amount of character data available to estimate a growing number of relationships (i.e., 211 of 451 taxa are represented by SSU-rDNA only). Put another way, stochastic error increases with increasing missing data because the signal-to-noise ratio is decreasing. The mosaic structure of missing data in phylogenomic studies using ESTs is known to decrease phylogenetic accuracy (Hartmann and Vision 2008). However, Wiens and Moen (2008) found that taxa with large amounts of missing data (up to 90%) could be accurately placed so long as there is a shared core of informative data. The ubiquitous

SSU-rDNA plus a few well-sampled protein genes likely provide such a core of informativeness in this study.

In addition to allowing assessment of the phylogenetic diversity of eukaryotes, a strength of this taxon rich analysis is that it enables us to assess clade stability by comparing tree topologies across analyses that vary in numbers of taxa and genes

67

included. Much of the topology remains consistent across all analyses: supported clades

(Table 3.1) and most clades with ultrastructural identities (bold lineages Fig. 3.1; Table

S2) are recovered regardless of the number of genes/level of missing data included. We argue that this is strong evidence that these clades are accurately reconstructed—they reflect true relationships. The ability to accurately place so many lineages that are represented only by SSU-rDNA demonstrates the robustness of these analyses.

We tested the hypothesis that SSU-rDNA was driving our results, as this gene is ubiquitously sampled but is not present in phylogenomic analyses. However, the 15- protein data sets yielded similar topologies that were again robust to varying taxonomic representation (Table 3.1). We also looked for supported incongruences among loci using

Concaterpillar (Leigh et al. 2008) on the 15 protein-coding genes. Repeated runs yielded varying gene sets, suggesting there are no well-supported incongruences. Analyses of several of these gene sets yielded a topology consistent with that depicted in Figures 3.1 and 3.2, although support was low in analyses with few genes (Table S6). Here again, the placement of photosynthetic lineages was unstable, suggesting that they may be responsible for discordance among loci.

We also assessed the extent to which choice of these particular 16 genes versus the breadth of our taxon sampling impacted the generation of stable topologies by comparing with previously published studies. Using our 16 genes and a taxon set comparable with Hampl (2009) that included only 48 taxa representing 19 lineages, we generated a highly supported tree similar to what we find using broader taxon sampling

(Table S5). Indeed, with our 16 genes and this Hampl-like data set, we recover monophyletic Excavata with 82% BS, whereas this clade is only monophyletic after

68

removal of rapidly evolving lineages in the phylogenomic analysis (Hampl et al. 2009).

In contrast, using the broader taxon set of Yoon (2008) generates a topology that is less well supported at many nodes, and Excavata is polyphyletic (Table S5). Finally, using all our taxa and the 4 genes from Yoon (2008), generates poorly supported topologies (Table

S5). Together, these analyses demonstrate that it is an interaction of gene choice and taxon sampling that yields well-resolved trees.

The ability of our taxon-rich approach to place lineages known to be problematic for phylogenetic reconstruction into correct territories, including Microsporidia, Giardia and ciliates (e.g., Hampl et al. 2009; Hirt et al. 1999; Yoon et al. 2008; Zufall et al. 2006), is a testament to the role of sufficient gene and taxon sampling in accurately reconstructing relationships. Other analyses with fewer taxa and/or genes routinely remove rapidly evolving taxa and/or sites so that these clades “behave” (Hackett et al.

2007; Hampl et al. 2009; Rodríguez-Ezpeleta et al. 2007a). However, removal of taxa weakens the credibility of the process and support for taxonomic hypotheses while also decreasing the power of interpretation of the resulting phylogenetic trees (Hillis 1998).

3.4.2 Orphan Lineages

Our taxon-rich analyses enable inclusion of numerous unplaced lineages that have only limited molecular data. Some of these remain orphans (i.e., without clear sister taxa) including Breviata, Centroheliozoa, Ancyromonas, and Micronuclearia, as their position is unstable and support values are very low (Table S8). These taxa may be either independent lineages or their sister taxa may not yet be sequenced. Consistent with other analyses, we find support for the sister relationships of with

69

Opisthokonta (Cavalier-Smith and Chao 2003b), and the nonphotosynthetic kathablepharids with cryptomonads (Okamoto and Inouye 2005). is consistently basal to green algae (including plants), albeit with low support (Table S8), which is in contrast to the hypothesis that this lineage is sister to cryptomonads

(Shalchian-Tabrizi et al. 2006). Several unplaced lineages represented only by SSU- rDNA are placed within robust groups, but often on long branches and with low support

(Paramyxea, Mikrocytos; Table S8). We believe that their placement is artifactual, either due to long-branch attraction or the lack of a sequenced sister lineage. In support of this hypothesis, these taxa also bounce around in analyses of SSU-rDNA alone with and without rapidly evolving sites (as described in Methods section).

3.4.3 Photosynthetic Lineages

Our analyses do not resolve the placement of many lineages with photosynthetic ancestry including the green algae, red algae (rhodophytes), glaucocystophytes, haptophytes, and cryptomonads. Notably, there is no support in any analysis for

‘Archaeplastida’ (‘Plantae’) or ‘Chromalveolata’ (Tables 3.1 and S6) or the nested hypothesis ‘Chromista’ (stramenopiles, cryptomonads, and haptophytes). These hypothesized clades rest on the assertion that plastid acquisition is a rare event, happening once in the ‘Archaeplastida’ (primary acquisition of a cyanobacterium in the ancestor of red algae, green algae and glaucocystophtes; (Cavalier-Smith 1981) and once in ‘Chromalveolata’ (secondary acquisition of a red algal plastid in the ancestor of stramenopiles, alveolates, haptophytes, and cryptomonads; (Cavalier-Smith 1999). We hypothesize that the lack of resolution among the photosynthetic lineages (e.g.,

70

cryptomonads, haptophytes, glaucocystophytes, red algae, and green algae) is due to conflicting signal following endosymbiotic gene transfer from plastid genomes or from the nuclei of secondary (or tertiary) eukaryotic (Lane and Archibald

2008; Martin and Schnarrenberger 1997; Tekle et al. 2009). We discuss this hypothesis and alternatives below.

Our analyses, like many others (Cavalier-Smith 2004; Kim and Graham 2008;

Parfrey et al. 2006; Rodríguez-Ezpeleta et al. 2007a) find polyphyletic ‘Chromalveolata’ and thus falsify the chromalveolate hypothesis as it was originally proposed.

Furthermore, ‘Chromalveolata’ and the nested hypothesis ‘Chromista’ (stramenopiles, cryptomonads, and haptophytes) are rejected by the AU test (P = 0.007 and P < 0.001, respectively) and other statistical methods, and this topology was not found among the

10,756 trees in Bayesian analyses (Table S7). A single endosymbiotic event at the base of the chromalveoate lineages necessitates that the descendant lineages be monophyletic, although not everyone agrees with this interpretation (Keeling 2009). Instead, our analyses are consistent with alternative hypotheses that postulate multiple secondary endosymbioses of red algal plastids in the ancestors of ‘Chromalveolata’ (Bodył et al.

2009; Grzebyk et al. 2003; Howe et al. 2008).

Recent findings indicate that plastid acquisition is not as rare as once assumed, challenging the central tenet that plastid acquisition is much more difficult than loss. Two independent primary endosymbioses that may be first steps toward organelles have been detailed in the testate amoeba Paulinella chromatophora (Nakayama and Ishida 2009) and the Rhopalodia gibba (Kneip et al. 2008). Further, numerous secondary endosymbiotic events are also known in lineages such as , ,

71

and kathablepharids (Archibald 2009), and there is evidence for tertiary endosymbiosis in (Moustafa et al. 2009) and dinoflagellates (Archibald 2009). Thus, plastid acquisition is more common across the eukaryotic tree of life than previously believed.

The possibility that plastid acquisition may have occurred multiple times will make a stable resolution of photosynthetic lineages difficult (Bodył et al. 2009; Lane and

Archibald 2008).

As the stramenopiles and alveolates (two putative members of the

‘Chromalveolata’) form a well-supported clade including Rhizaria (SAR), we suggest it is time to abandon the chromalveolate hypothesis. Although some argue for expanding the chromalveolate concept to include Rhizaria and other heterotrophic assemblages of eukaryotes as descendants of an ancestor with a red algal symbiont (Keeling 2009), we do not think this revision is warranted due to the large number of losses and replacement of plastids that this would necessitate. Instead, multiple endosymbioses are a much more parsimonious scenario and are consistent with the monophyly of former chromalveolate lineages in analyses of plastid genes (Bodył 2005; Yoon et al. 2008; Parfrey et al. 2006).

Similarly, the mere handful of genes that are potentially of photosynthetic origin in heterotrophic lineages such as ciliates (16 genes from a total of 27,446 in the complete genome; Reyes-Prieto et al. 2008) or the basal marina (8 genes from 9876 ESTs; Slamovits and Keeling 2008) are more consistent with the “you are what you eat” hypothesis (Doolittle 1998) than the chromalveolate hypothesis.

A single primary plastid acquisition at the base of ‘Archaeplastida’ is the prevailing view (Archibald 2009; Gould et al. 2008; Keeling 2009). The Archaeplastida hypothesis is supported by many shared features of plastids and their integration into the

72

host cell, including plastid protein import machinery, conserved gene order, and metabolic pathways (Gould et al. 2008; Larkum et al. 2007; McFadden). Although analyses of few genes do not generally support ‘Archaeplastida’ (Kim and Graham 2008;

Parfrey et al. 2006), support is strong in some phylogenomic analyses (Rodríguez-

Ezpeleta et al. 2005) though see (Hampl et al. 2009). It has been suggested that 100+ genes are necessary to recover ‘Archaeplastida’ with strong support (Rodríguez-Ezpeleta et al. 2005).

The Archaeplastida hypothesis is not supported in our analyses (Tables 3.1 and S6 and Figs. 3.1 and 3.2) or those of others (Hampl et al. 2009; Kim and Graham 2008;

Parfrey et al. 2006; Yoon et al. 2008). Here, the ‘Archaeplastida’ lineages red algae, green algae, and glaucocystophytes are never monophyletic, but instead generally form a poorly supported cluster with the secondarily photosynthetic haptophytes and cryptomonads plus other nonphotosynthetic lineages (Table 3.1 and Figs. 3.1 and 3.2).

This lack of resolution is not simply a by-product of our overall approach as the same analyses yield relatively well-supported nodes for much of the rest of the tree (Table 3.1 and Figs. 3.1 and 3.2), and recover groups with ultrastructural identities with strong support, including photosynthetic lineages (e.g., green algae including land plants; Fig.

3.2). The confounding effects of EGT (from plastid or nucleus of secondary ) may explain the lack of resolution and failure to recover ‘Archaeplastida’.

Being aware of these issues, we attempted to identify conflicting signal and remove genes impacted by EGT both by inspection of individual genes using BLAST analyses and by assessing concordant data sets identified by Concaterpillar (Table S6 and Fig. S1m–r).

These approaches failed to yield robust placement of the problematic photosynthetic

73

lineages (Table S6). For example, we hypothesized that the secondarily photosynthetic haptophytes and cryptomonads were branching within ‘Archaeplastida’ due to EGT; however, ‘Archaeplastida’ remains polyphyletic in analyses without haptophytes and cryptomonads (Table S6). In contrast to the ‘Archaeplastida’, other lineages with photosynthetic ancestry are robustly placed in clades containing both photosynthetic and heterotrophic lineages (e.g., dinoflagellates within alveolates, diatoms within stramenopiles, and euglenids as sister to kinetoplastids). This may reflect differential timing of endosymbiotic events as ancient events will be more difficult to reconstruct than recent secondary transfers because (i) more genes in the plastid were available for transfer early and (ii) more time for subsequent confounding events will have elapsed.

Alternatively, nonmonophyly of ‘Archaeplastida’ may be reflective of the true host histories if there were multiple endosymbiotic events in the ancestors of red algae, green algae, and glaucocystophytes. Many scenarios are consistent with both the nonmonophyly of ‘Archaeplastida’ and the similarities of the plastids of these lineages

(Larkum et al. 2007; Palmer 2003; Stiller 2003). Two of these are (i) multiple primary endosymbioses of closely related followed by a convergent path of plastid reduction plus extinction of intervening cyanobacterial lineages and (ii) a single primary endosymbiosis into one lineage followed by ancient secondary endosymbioses into the remaining ‘Archaeplastida’ lineages. Such scenarios, as well as a single primary acquisition, are also consistent with the well-supported monophyly of plastid genes with respect to cyanobacteria (Rodríguez-Ezpeleta et al. 2005) plus possibly the confounding data on the divergent Rubisco genes in red and green algae (Delwiche and Palmer 1996).

Furthermore, the phylogenetic position of ‘Archaeplastida’ lineages may be difficult to

74

resolve because their sister groups have not yet been sequenced, or are extinct. The unstable position of these lineages across our analyses mimics the patterns observed in orphan lineages (Table S8) in support of this hypothesis. Under these scenarios, phylogenomic analyses that recover ‘Archaeplastida’ may be picking up misleading EGT signal of genes independently transferred from the plastid to the host nucleus of these three lineages.

We suspect that resolving relationships among photosynthetic groups will require more intensive taxon and more careful gene sampling to disentangle signals from host and symbiont genomes, coupled with the recognition that plastid genes may be derived from several sources (Larkum et al. 2007). These data, combined with methods that distinguish between conflicting phylogenetic signal (Ahmadinejad et al. 2007; Leigh et al. 2008) or gene-tree species-tree reconciliation (Akerborg et al. 2009; Wehe et al.

2008), are likely required to elucidate the history of photosynthetic lineages.

3.4.4 Relationships Within the Well-Sampled Rhizaria and Excavata

We subsampled the data set to estimate relationships within 2 diverse clades,

Excavata and Rhizaria, for which we had large numbers of taxa. We analyzed a 97-taxon data set of Rhizaria that included all lineages with previously published data plus additional multigene data for 12 taxa added for this study (Table S1). Three major clades are strongly supported, though the relationships among them are unresolved: i) Cercozoa, ii) Foraminifera plus Polycystinea and (formerly classified with Phaeodarea as radiolarians), and (iii) the parasitic Haplosporidia and Plasmodiophorida with Gromia and vampyrellids (Fig. 3.3; Bass et al. 2009). We show that Theratromyxa, a nematode

75

eating soil amoeba, is related to vampyrellid amoebae (Fig. 3.3; 100% BS), and together they are sister to the plant parasites plasmodiophorids (100% BS). The SSU-rDNA sequence for Theratromyxa is identical to an amoeba isolated from Siberia where it was identified as impatiens (EU567294; Bass et al. 2009).

The topology within the Excavata is consistent with previous hypotheses and clades with ultrastructural identities (Simpson 2003), when contaminant EST data originally mislabeled as strix are excluded (Slamovits and Keeling 2006).

Excavata is often polyphyletic in other analyses because Malawimonas branches outside the other clades of Excavata (Hampl et al. 2009; Rodríguez-Ezpeleta et al. 2007a), whereas in analyses of fewer genes Excavata members fall into 2 or 3 clades (Parfrey et al. 2006; Simpson et al. 2006). Although Malawimonas nests robustly within Excavata in our analyses, it does not have a stable sister group and may represent an independent lineage (Fig. 3.4). Our analyses confirm that (unplaced in Patterson 1999) branches within Heterolobosea (Cavalier-Smith and Nikolaev 2008; Yubuki and Leander

2008) and suggests that another enigmatic flagellate, ATCC 50646 (tentatively named

Soginia anisocystis) is a basal member of Heterolobosea.

3.5 Conclusions

The robust tree of life emerging from this study demonstrates the benefits of improved taxon sampling for reconstructing deep phylogeny as our analyses produce stable topologies that include a broad representation of eukaryotes. The current study, combined with insights from other studies referenced herein, has refined the eukaryotic tree of life from over 70 major lineages (Patterson 1999) to ~16 major groups (Fig. 3.5,

76

http://eutree.lifedesks.org/). Most significantly, we attribute the stability of major clades

(e.g., Excavata, Amoebozoa, Opisthokonta, and SAR) to broader taxonomic sampling combined with analyses of sufficient characters (16 genes or 6578 characters). In our view, inclusion of more taxa coupled with carefully chosen genes is necessary to further resolve the 16 or so major lineages of microbial eukaryotes for which sister group relationships remain uncertain.

77

Table 3.1. Support for major clades of eukaryotes in analyses containing varying levels of taxon inclusion and missing data.

Supported clades 15 10: 16 10: 16 6: 16 4: 16 all: 15 9: 15 5: 3: 15 all:

Opisthokonta 99 97 97 69 100 99 85 19 Rhizaria 100 99 94 82 100 100 47 29 SAR 97 98 63 22 100 100 32 19 Rhizaria + stramenopiles 94 94 57 26 92 96 29 18 Excavata 83 77 65 6 84 76 44 19 Amoebozoa 59 46 49 nm 68 56 44 5 ‘Unikonta' 63 39 21 nm 54 50 15 3

Weak/unsupported hypotheses

‘Archaeplastida’ nm nm nm nm nm nm nm nm ‘Chromalveloata' nm nm nm nm nm nm nm nm Cryptomonads + haptophytes 33 50 nm 29 38 56 22 25 Haptophytes + SAR nm nm 15 nm nm nm nm nm Alveolates + stramenopiles nm nm nm nm nm nm nm nm Red algae + green algae nm nm nm nm nm nm nm nm Red, Green, Glauco, Hapto, Crypt 47 32 nm 9 39 27 16 8

Dataset statistics Number of taxa 88 111 160 451 88 111 160 240 Number of lineages 26 30 45 72 26 30 45 54 % Missing data (characters) 17 25 38 69 19 28 43 59

Supported clades are stable across analyses, albeit with decreasing support as the percentage of missing data increases. Varying bootstrap support values are represented with warmer colors corresponding to higher levels of support in RAxML analyses. nm= not monophyletic. Column headings describe the data sets. For example, ‘10: 16’ includes all taxa that have at least ten of the 16 genes, with a total of 88 taxa representing 26 lineages and containing 17% missing data. The ‘all: 15’ includes the protein coding genes from all taxa and contains 59% missing data. See Table S2 for lineages and Figure S1a-h for individual trees.

78

Figure 3.1. Most likely eukaryotic tree of life reconstructed using all 451 taxa and all 16 genes (SSU-rDNA plus 15 protein genes). Major nodes in this topology are robust to analyses of subsets of taxa and genes, which include varying levels of missing data (Table 3.1). Clades in bold are monophyletic in analyses with 2 or more members except in all:15 in which taxa represented by a single gene were sometimes misplaced. Numbers in boxes represent support at key nodes in analyses with increasing amounts of missing data (10:16, 6:16, 4:16, and all:16 analyses; see Table 3.1 for more detail). Given uncertainties around the root of the eukaryotic tree of life (see text), we have chosen to draw the tree rooted with the well-supported clade

79

Opisthokonta. Dashed line indicates alternate branching pattern seen for Amoebozoa in other analyses. Long branches, indicated by //, have been reduced by half. †Preaxostyla is not monophyletic likely because of contaminating Streblomastix strix ESTs. The six lineages labeled by * represent taxa that are misplaced, probably due to LBA, listed from top to bottom with expected clade in parentheses. These are Protoopalina japonica (Stramenopiles), octopiana (Apicomplexa), Mikrocytos mackini (Haplosporidia), Centropyxis laevigata (Tubulinea), Marteilioides chungmuensis (unplaced), and spiniferum (Amoebozoa).

80

Figure 3.2. Most likely eukaryotic tree of life reconstructed with 10:16, which includes 88 taxa and 16 genes (SSU-rDNA plus 15 protein genes). Thickened lines receive >95% bootstrap support. Other notes as in Methods and Figure 3.1.

81

Figure 3.3. Maximum likelihood tree of Rhizaria reconstructed with 103 Rhizaria taxa and 16 genes. The SSU-rDNA partition was analyzed with GTR+gamma and proteins with rtREV. Thickened lines receive >80% bootstrap support in all analyses. Node support in boxes from Rhizaria:4-gene, Rhizaria:16-gene, all:16 analyses. Taxa with new data are bold. Dash lines indicate non-monophyly.

82

Figure 3.4. Maximum likelihood tree of Excavata with 75 taxa and 16 genes. The SSU-rDNA partition was analyzed with GTR+gamma and proteins with rtREV. See Figure 3.3 for other notes.

83

Figure 3.5. Summary of major findings—the evolutionary relationships among major lineages of eukaryotes. Clades have been collapsed into those that we view to be strongly supported. The many polytomies represent uncertainties that remain.

84

CHAPTER 4

GENOME DYNAMICS ARE INFLUENCED BY FOOD SOURCE IN

ALLOGROMIA LATICOLLARIS STRAIN CSH (FORAMINIFERA)

4.1 Abstract

Across the eukaryotic tree of life, genomes vary within populations and within individuals during their life cycle. Understanding intraspecific genome variation in diverse eukaryotes is key to elucidating the factors that underlie this variation. Here we characterize genome dynamics during the life cycle of Allogromia laticollaris strain

CSH, a member of the Foraminifera, using fluorescence microscopy and reveal extensive variation in nuclear size and DNA content. Both nuclear size and DNA content are tightly correlated across a 700-fold range in cell volume. In contrast to models in yeast where nuclear size is determined solely by cell size, the relationship in A. laticollaris

CSH differs according to both life cycle stage and food source. Feeding A. laticollaris

CSH a diet that includes algae results in a two fold increase in DNA content in reproductive cells compared to a diet of bacteria alone. This difference in DNA content likely corresponds to increased fecundity, as reproduction occurs through segregation of the polyploid nucleus into numerous daughter nuclei. Environmentally mediated variation in DNA content may be a widespread phenomenon, as it has been previously reported in the plant flax and the flagellate . We hypothesize that DNA content is influenced by food in other single-celled eukaryotes with ploidy cycles, and that this

85

genome dynamic may enable these eukaryotes to maximize fitness across changing environmental conditions.

4.2 Introduction

Emerging data on intraspecific genome variation challenge the traditional view that genomes are stable within species. Across the eukaryotic tree of life, genomic DNA content varies among individuals in populations (Cheeseman et al. 2009; Parfrey et al.

2008; Redon et al. 2006), and even within individuals over the course of the life cycle

(Cohen and Segal 2009; Kondrashov 1997; Parfrey et al. 2008). The most common changes in genome content within the life cycle are elevated ploidy level and differential amplification or elimination of portions of the genome (Parfrey et al. 2008; Raikov

1982). The recognition that genome content varies widely over the course of life cycles poses new challenges as we seek to elucidate the factors that regulate genome content within individuals.

Foraminifera present a unique system for studying nuclear dynamics because these single-celled organisms undergo large changes in nuclear and cell size during their life cycle. For example, some Foraminifera grow from 5 µm gametes to adults several centimeters in diameter (Goldstein 1999). Foraminifera are a diverse lineage of eukaryotes with ~10,000 extant species and a fossil record that dates back to the

Cambrian (Sen Gupta 1999). They are defined by their dynamic network of pseudopodia

(Bowser and Travis 2002) and are placed within the major eukaryotic clade Rhizaria

(Pawlowski and Burki 2009). Additionally, Foraminifera have a range of nuclear

86

features that enable the study of genome dynamics, such as genome processing, polyploidy, and multinuclearity (Arnold 1984; Goldstein 1999).

The life cycles of Foraminifera vary widely among species, but generally alternate between uninucleate and multinucleate life cycle phases. These stages are classically thought to be haploid and diploid but polyploidy is reported in some species (Arnold

1955; Arnold 1984; Goldstein 1999; Raikov 1982). All Foraminifera that have been studied, including A. laticollaris CSH, go through Zerfall (German for decay), a process of DNA elimination and degradation in the uninucleate cell prior to reproduction (Føyn

1936; Goldstein 1997). The elimination of DNA during Zerfall is likely evidence of genome processing—portions of the genome must be differentially amplified during the development of the nucleus prior to being eliminated (Parfrey and Katz 2010).

Here we assess the nature of variation in genome content in a clonal line of

Allogromia laticollaris CSH throughout its life cycle. We measure changes in two nuclear metrics, nuclear size and DNA content, to elucidate genome dynamics. Cultures were grown on two food sources, enabling assessment of the impact of environmental factors on genome variation within a constant genetic background. Allogromia laticollaris CSH is a good model system for investigating genome dynamics because its complex life cycle has been described (Lee and McEnery 1970; McEnery and Lee 1976), and this description hints at variation beyond haploidy and diploidy. We use measurements of cell size, nuclear size and DNA content to address the following hypotheses: 1) genome content remains constant throughout the life cycle and 2) variation in nuclear size and genome content are independent of food source.

87

4.3 Materials and Methods

4.3.1 Cultures

Allogromia laticollaris CSH was originally isolated as a clonal culture from Cold

Spring Harbor in 1960 (Lee et al. 1969). Allogromia laticollaris CSH has a much different life cycle than A. laticollaris described and studied by Arnold (1955), and these isolates almost certainly represent different species. Our culture was obtained from

Jeffrey Travis (University at Albany) and was grown in 75cm2 tissue culture flasks in

Erdschrieber media and transferred monthly (modified from Travis and Allen 1981).

Cultures were propagated with their native bacterial flora and the alga Isochrysis sp. on the bench top. Five bacteria-only cultures were derived from the algal fed population by isolating and washing 1 to 50 cells and growing them in 6-well dishes. Cultures could not be successfully grown on bacteria in the absence of flocculent material. Thus, bacteria-only cultures were supplemented with 0.5% wheat extract (wheat grains autoclaved in water) to facilitate bacterial growth, and to provide flocculent material for

A. laticollaris CSH. Once cultures were established, they were transferred to 75cm2 tissue culture flasks and propagated under the same conditions as algae fed cultures.

4.3.2 Fixation and DAPI Staining

Cells were fixed in four batches from 2008 and 2009 from a total of six bacteria fed populations and four algae fed populations. Cells were fixed in 1.5mL microcentrifuge tubes on ice for 30 minutes in 0.5% Triton X100 and 3% formaldehyde in PHEM buffer (10mM EGTA, 2mM MgCI2, 60mM Pipes, 25mM Hepes, pH6.9). The cells were concentrated by centrifuging for 5 minutes at 3000 rpm then washed once with

88

PBS and once with PBS with 1% BSA. Staining was done at 37°C for 1 hour in the dark in hybridization buffer (5µg/mL DAPI, 10% dextran sulfate, 0.2% BSA, 0.01% polyadenylic acid in 10xSET) according to a protocol from Joan Bernhard (Woods Hole

Oceanographic Institute). Subsequently, cells were washed twice in PBS and affixed to

Superfrost (Fisher) microscope slides with 1-2 drops of SlowFade®Gold (Invitrogen) to minimize loss of fluorescent signal.

4.3.3 Imaging and Measurement

Organisms were photomicrographed under UV light (excitation filter BP330-385 and barrier filter BA420) and transmitted light with an Olympus BX70 epifluorescent microscope and Olympus DP70 digital camera. We are mindful of the caveats introduced by measuring DNA content by DAPI staining (e.g., Kapuscinski 1995), and made every effort to standardize the staining and photodocumentation protocols. Photographs were taken with a 0.091 second exposure time one day after staining with all procedures standardized to allow comparison between experimental batches. Cross sectional area of cells and nuclei, which both approximate spheres, were measured from images by outlining the perimeter in the freely available ImageJ64 (http://rsbweb.nih.gov/ij/).

Multiple images were taken per cell as necessary so that each nucleus and the cell body were in focus for all measurements. Measurements of the cell were of the cell body, not of the outer thecate test (shell) of A. laticollaris CSH. In pilot experiments, both the cell body and the test were measured. The relationship between cell body and nuclei was the same as the relationship between test and nuclei, thus only the cell body was analyzed. In instances where juvenile cells were within the parental test, the cell body of the juvenile

89

was measured and analyzed. Gametic and zygotic nuclei do not have distinct cell bodies, thus they were excluded from further analyses. DNA content was measured as the fluorescence intensity integrated across the area of the nucleus with background fluorescence of the cell subtracted out. DAPI intensity is not correlated with batch

(ANOVA: algae p=0.34, and bacteria p=0.44).

Allogromia laticollaris CSH cells are all roughly spherical and life cycle stages are indistinguishable with light microscopy. Life cycle stages are differentiated by nuclear number: one in Agamont I and multiple in Agamont II (Lee and McEnery 1970;

McEnery and Lee 1976); Fig. 4.1). There were roughly 20 cells that could not be confidently assigned as either Agamont I or II because they either had two nuclei, appeared to be post Zerfall Agamont I cells, or cells with later stage zygotic nuclei.

These were excluded from further analyses. Microscope slides were also surveyed to search for cells in rare life cycle stages. Nuclear size and cell size were log-transformed as they did not conform to the assumptions of normality or homogeneity of variance.

Statistics were calculated in Jmp7 (SAS corporation), including regressions, ANOVA,

ANCOVA, and summary statistics. Graphs were made in Excel (Microsoft).

4.3.4 Genome Size Estimate

Thirty-three nuclei from four cells that could confidently be identified as gametic were used to calculate haploid genome size. The timing of nuclear fusion and development are staggered and occur earlier on the periphery. Thus, nuclei in the interior of cells undergoing sexual reproduction that could confidently be identified as gametes were used to generate a measure of haploid nuclei. Three cell types of known genome

90

size, Saccharomyces cerivisiae, the author’s cheek cells, and Allium cepa, were used to calculate genome size (Regression of genome size (MB) by mean DNA content: y=10435x1.69, R2=.99; Fig. 4.2). Data from these standards of known genome size were also used to assess the variability in the intensity of DAPI staining (DNA content) introduced by our DAPI staining method. The coefficient of variation for measured DNA content ranged from 0.19 for S. cerivisiae to 0.35 for cheek cells. This uncertainty was taken into account when calculating genome size by multiplying our estimate for A. laticollaris CSH by 0.35 to establish a probable range of genome size.

4.4 Results

4.4.1 Nuclear Dynamics during the Life Cycle

Several thousand A. laticollaris CSH cells were examined by fluorescence microscopy. From this sample, 610 that captured the breadth of life cycle stages were photodocumented and measured (Fig. 4.1). Our results are consistent with the life cycle outlined in McEnery and Lee 1976. The most common life cycle phase is an asexual alternation between uninucleate (Agamont I) and multinucleate adults (Agamont II; Fig.

4.1A and C). Division occurs through multiple fissions such that one parental cell gives rise to multiple daughter cells (Fig. 4.1B and D). A new test forms around each daughter cell and juvenile cells emerge from the empty parental test shortly after division. In addition, gametic nuclei can be produced during a rare sexual phase (Fig. 4.1E and F;

(McEnery and Lee 1976). Sexual reproduction in A. laticollaris CSH is apogamic and karyogamy occurs within the parental test to form zygotic nuclei. These zygotic nuclei are subsequently partitioned into Agamont II daughter cells.

91

Our data indicate that both Agamont I and II become polyploid during cell growth, in contrast to the original description of both of these life cycle stages as diploid

(McEnery and Lee 1976). We infer polyploidy because DNA content, as measured by the intensity of DAPI staining, is positively correlated with cell size for both life cycle stages (Fig. 4.3A). This suggests that DNA content increases continuously with cell growth for all cell types. Nuclear and cell size are positively correlated throughout the life cycle as well (Fig. 4.3B), demonstrating that nuclei increase continually as cells grow. This is an allometric power relationship with an exponent of 0.91 (regression: p<.001; Fig. 4.3B), thus the nucleus is growing at a slightly lower rate than the cell. The ratio of nuclear to cell area (N/C ratio) differs according to life cycle stage, such that the multiple nuclei of Agamont II take up a proportionally greater amount of cell space (N/C ratio = 0.031 ± .019) than does the single Agamont I nucleus (N/C ratio = 0.014 ± .011,

ANOVA comparing Agamont I and II: f = 193, p < .001; Fig. 4.4A and B).

4.4.2 Impact of Food Source

Surprisingly, food source has a significant impact on both nuclear size and DNA content in Allogromia laticollaris CSH. In cultures growing with algae, the nuclear to cell area relationship is nearly isometric (power exponent of 1.09 for Agamont I and 0.95 for Agamont II) and the N/C ratio remains constant across the entire range of cell area

(Fig. 4.4A). In contrast, the N/C ratio in A. laticollaris CSH fed bacteria alone is negatively correlated with cell size (p<.001; Fig. 4.4B). This negative trend corresponds to slower nuclear growth relative to cell size in bacteria fed cells (power exponent = 0.75 for Agamont I and 0.82 for Agamont II).

92

Food source also impacts the amount of nuclear DNA in A. laticollaris CSH.

Cultures growing on algae maintain the same amount of DNA per unit area (DNA/area), as measured by intensity of DAPI staining, across the range of nuclear area (e.g. regression Agamont I p = .66; Fig. 4.4C). Conversely, bacteria fed cultures show a significant decrease in the relationship between DNA/area and nuclear size (Fig. 4.4D).

The shift in this relationship is observed for both Agamont I and II cells, but is more pronounced among Agamont I cells (Fig. 4.4C and D). These results suggest that DNA is continuously synthesized under all conditions, but that the rate of DNA synthesis is much higher when A. laticollaris CSH is fed algae and bacteria.

4.4.3 Ploidy Levels and Genome Size of Allogromia laticollaris CSH

We use haploid gametic nuclei to calculate the haploid genome complement, and then estimate ploidy levels across the life cycle to reveal that ploidy extends well beyond diploid in both life cycle phases. Assuming that DNA is added only through polyploidization, ploidy levels reach 300±105N in Agamont I cells and 100±35N in

Agamont II cells (Fig. 4.5). The impact of food source on DNA content is clear when estimated ploidy levels are compared (Fig. 4.5A). Estimated ploidy levels increase at a faster rate in algae fed cells such that total nuclear DNA content is two fold higher in the largest size class of Agamont I cells (Fig. 4.5A). Gametic nuclei were also compared to other cells of known genome size, suggesting that the haploid genome size of Allogromia laticollaris CSH is 83MB ± 29MB (Fig. 4.2).

93

4.5 Discussion

Our analysis of genome dynamics in Allogromia laticollaris CSH reveals extensive variation in nuclear size and content both during the life cycle and in response to changing food source. Given these data we reject the hypotheses that 1) genome content and nuclear size are constant through the life cycle (Figs. 4.3 and 4.4) and 2) that nuclear and genome dynamics are independent of food source (Fig. 4.4). The impact of food source is striking and leads to differences in the relationship between nuclear size,

DNA content, and cell growth (Fig. 4.4). This shift is likely to be biologically significant, as it results in a significant reduction in DNA content and N/C ratio in bacteria fed cells when compared to their algae fed counterparts for the largest size class where reproduction occurs (Figs. 4.4 and 4.5).

Our findings on the nuclear dynamics in A. laticollaris CSH call into question the generality of the model of nuclear size regulation developed in fission and budding yeast.

In Schizosaccharomyces pombe and Saccharomyces cerevisiae, nuclear size is regulated exclusively by cell size and not by DNA content as demonstrated by measurements during the cell cycle and across numerous mutant lines (Jorgensen et al. 2007; Neumann and Nurse 2007). The ratio of nuclear to cell volume remains constant across all conditions, including in multinucleate mutants (Jorgensen et al. 2007; Neumann and

Nurse 2007). The nuclear dynamics of A. laticollaris CSH depart from the yeast model in two ways. First, the cell to nucleus ratio differs according to life cycle stage; with the combined nuclei of the multinucleate Agamont II cells taking up a significantly larger proportion of the cell area (Fig. 4.4A and B). Second, nuclear size does depend in part on

94

DNA content in A. laticollaris CSH as seen in the concomitant decrease of both DNA content and N/C ratio with increasing cell size in bacteria fed cells (Fig. 4.4B and D).

Our results also enable us to reinterpret the nuclear cycle of Allogromia laticollaris CSH. We show that DNA content increases continuously during cell growth in both life cycle stages (Fig. 4.3) and this increase is expected to be primarily due to polyploidization. The argument for polyploidy is strongest for Agamont I cells. Here, the numerous daughter nuclei generated from a single nucleus without detected DNA synthesis (McEnery and Lee 1976) imply that the Agamont I nucleus already contained many genome copies. A portion of the elevated DNA content in Agamont I nuclei may be due to differential amplification, with amplified loci subsequently eliminated during

Zerfall. Zerfall reportedly occurs just at one stage in the life cycle of A. laticollaris CSH

(Agamont I: McEnery and Lee 1976) and in other Foraminifera (Gamont: Føyn 1936;

Goldstein 1997), implying that differential amplification occurs only in the uninucleate stage. Consistent with this hypothesis, we see a slight, but significant, decrease in

DNA/area as nuclei grow in Agamont II nuclei but not in Agamont I. It is also possible that that differential amplification of specific loci occurs at multiple points in the life cycle, a hypothesis that will be tested in future studies using methods such as quantitative

PCR.

The disparity in the intensity of DAPI staining according to food source is probably due to varying levels of polyploidization, differential amplification, or both.

Alternatively, it is possible that the observed shift in DAPI intensity reflects changes in genome architecture rather than DNA amount as DAPI binds preferentially to AT-rich regions of heterochromatin. In our view, it is unlikely that all of the variation observed is

95

due to changes in genome architecture given the magnitude of change in DAPI (110-fold range in DNA content across nuclei) and its specific association with changing diet.

Further, changing food source in A. laticollaris CSH changes the nature of the relationship between DNA/area and nuclear size rather than simply shifting the curve to a higher intercept (Fig. 4.4C and D). In algae fed cells overall DNA content increases isometrically to maintain constant DNA/area (Fig. 4.4C), where as DNA content increases at a lower rate in bacteria fed cells, resulting in a negative relationship between

DNA/area and nuclear size (Fig. 4.4D). This difference can be visualized in the comparison of estimated ploidy levels for Agamont I cells (Fig. 4.5A). Thus, different food sources appear to alter the rate of DNA synthesis over the course of cell growth.

We predict that A. laticollaris CSH cultures growing on algae reach a higher ploidy level in reproductive cells, which will in turn lead to higher fecundity compared to bacterial fed cells. McEnery et al. (1976) previously demonstrated that bacteria alone are a poor food source for A. laticollaris CSH as cultures fed only bacteria either died or grew more slowly than algae fed cultures. We were able to grow cultures on bacteria with the addition of flocculent material in the form of autoclaved wheat grains. The reduction in DNA synthesis observed here in bacteria fed cultures suggests that bacteria alone are a poor food source. In large cells, where reproduction occurs (McEnery and

Lee 1976), algal fed Agamont I cells have twice as much DNA as their counterparts fed bacteria (Fig. 4.5A). We hypothesize that this difference results in twice as many offspring produced by multiple fission (Fig. 4.5B), and that the ability to alter ploidy levels in response to food quality may enable A. laticollaris CSH to maximize its fitness across a range of environmental conditions.

96

Alternatively, the observed difference in nuclear dynamics may signify increased physiological demands associated with preying upon algae as of algae require the construction of complex vacuolar system (Bowser et al. 1985). The differences in the relationship between DNA/area and nuclear size in algae fed versus bacteria fed cells may result from changes in patterns of differential amplification.

Varying DNA content in response to environmental conditions may be a widespread mechanism for maximizing fitness in a changing landscape. Numerous lineages across the eukaryotic tree of life go through ploidy cycles, during which they can reach 1000 genome copies or higher (Parfrey and Katz 2010; Parfrey et al. 2008; Raikov

1982). These genome copies are then segregated into haploid or diploid daughter cells, gametes and spores respectively. It is plausible that the degree of polyploidy varies with environmental factors such that when food is plentiful organisms attain a higher ploidy level and subsequently produce more offspring. This may be analogous to animals and plants allocating more resources towards reproduction in favorable conditions (e.g.,

Eldridge and Krapu 1988; Jorgensen et al. 2008; Stutzman 1995; van Huis et al. 2008;

Vanni and Lampert 1992).

The unexpected influence of food source on nuclear size and DNA content in A. laticollaris CSH may foreshadow additional complexity yet to be uncovered in the nuclear dynamics of diverse eukaryotes. Changes in DNA content in response to the environment are well established in the plant flax (Linum usitatissimum) and distantly related unicellular alga Euglena. Flax loses 15% of its DNA repeatably when grown under stressful conditions, and then maintains this lower genomic content during subsequent generations (Cullis 2005). In Euglena gracilis, pH differences in culturing

97

media cause DNA content to vary by 50% (Cook 1981). This variation may be due changes in ploidy level (Raikov 1982). Further, in several clades of ciliates, starvation causes a reduction in macronuclear DNA amounts (Adl and Berger 1997; Cullis 1973a;

Raikov 1982). These examples demonstrate that DNA content can change in response to the environment, and the repeatability of these changes suggests that the ability to shift genome content is heritable. The broad distribution of dynamic genome processes suggests that mechanisms enabling genome plasticity arose in the ancestor of eukaryotes

(Parfrey and Katz 2010; Parfrey et al. 2008).

98

Figure 4.1. Nuclear dynamics during the life cycle of Allogromia laticollaris CSH. DAPI stained individuals representing all life cycle stages observed. (A) Agamont I, (B) Juvenile Agamont II within parental test, (C) Agamont II, (D) Many juvenile Agamont I cells within parental test, (E) cell filled with gametes, (F) Cell filled with zygotes. All scale bars 50 µm. Redrawn from (McEnery and Lee 1976).

99

Figure 4.2. Genome size in Allogromia laticollaris CSH is estimated to be 83 ± 29 MB. This was calculated from a regression of known genome size on the intensity of DAPI staining for Saccharomyces ceriviseae, Allium cepa, and human cells along the A. laticollaris CSH gametes. This figure demonstrates the linear relationship between genome size and intensity of DAPI staining.

100

Figure 4.3. Significant positive correlation between nuclear size and DNA content with cell size throughout life cycle. N= 610; includes all cells. (A) DNA content as measured by intensity of DNA staining across the entire nucleus are tightly correlated. (B) Nuclear and cell area are related by a power function. There is no difference in this relationship between Agamont I and II (ANCOVA for difference in slope: p = 0.37).

101

Figure 4.4. Food source has a significant impact on nuclear dynamics and DNA content in Allogromia laticollaris CSH. (A and B) Scatter plots of N/C ratio versus cell area demonstrate that N/C ratio is negatively correlated with cell size in A. laticollaris CSH fed bacteria, while there is a constant relationship for cells fed algae. (A) Algae fed cells: N/C ratio remains constant as cell size increases. (B) Bacteria fed cells: N/C ratio decreases significantly with increasing cell size. (C and D) Scatter plots of DNA content per unit area (as measured by DAPI intensity per pixel) versus nuclear size of individual nuclei reveals that A. laticollaris CSH feeding on algae produce more DNA than cells grown on bacteria. The regression slopes of C are significantly different than D (ANCOVA difference in slope p < .001). (C) There is a constant relationship between DNA per unit area and nuclear size in Agamont I grown on algae (N = 628). (D) DNA per unit area decreases significantly with nuclear size in bacteria fed cells (N = 1581). The slopes for Agamont I and II are not statistically different (ANCOVA difference in slope p = 0.36).

102

Figure 4.5. Food source impacts ploidy levels, and presumably fecundity of Allogromia laticollaris CSH. (A) Estimates of ploidy are much greater for A. laticollaris CSH growing on algae (N = 197, R2 = 0.63, p < .001) than cultures growing bacteria (N = 404, R2 = 0.44, p < .001) (ANCOVA, R2 = 0.55, p < .001). Ploidy calculated using gametes as a haploid baseline. (B) Cartoon depicting the hypothesis that differences in DNA content will lead to differences in number of offspring produced. Images are representative cells from algae and bacteria fed cultures that are DAPI stained, Scale bar = 50 µm.

103

CHAPTER 5

ELUCIDATING THE TIMING OF EUKARYOTIC DIVERSIFICATION

5.1 Abstract

Although the macroscopic plants, animals, and fungi are the most familiar eukaryotes, the bulk of eukaryotic diversity is microbial. Elucidating the timing and patterns of the diversification within the more than 70 microbial lineages is key to understanding the tempo and mode of eukaryotic evolution. Here, I use taxon-rich multigene data combined with fossils of diverse eukaryotes to estimate the timing of the origin of eukaryotes and divergence times of major lineages in a relaxed molecular clock framework. These analyses suggest that eukaryotes arose between 1918 and 1652 million years ago (Ma). The oceans during this time were sulfidic and characterized by nitrogen limitation, in contrast to the phosphorous limitation that prevailed during the Archean and in modern Phanerozoic oceans. Estimates here indicate that all of the major clades of eukaryotes had diverged before 1100 Ma, prior to the oxygenation of Proterozoic oceans and the establishment of modern geochemical conditions.

5.2 Introduction

The timing of the origin of eukaryotes and the tempo of eukaryotic diversification remain outstanding questions in evolutionary biology. A range of dates differing by two billion years have been proposed for the origin of eukaryotes from the fossil record and molecular clock analyses. Putative biomarkers of eukaryotes have been found in 2.7

104

billion year old rocks (Ga; Brocks et al. 1999) and fossil acritarchs—eukaryotes that cannot be attributed to modern groups—at 1800 Ma (Knoll et al. 2006). These dates for the origin of eukaryotes are called into question by molecular clock studies that place the origin of eukaryotes at 850 – 1250 Ma (Berney and Pawlowski 2006; Douzery et al.

2004), and by alternative interpretations of that place first fossil eukaryotes at 850 Ma

(Cavalier Smith 2010; Cavalier-Smith 2002).

5.2.1 Proterozoic fossil record

The record of microfossils that are unambiguously eukaryotic extends back to the

Mesoproterozoic, around 1800 Ma (Javaux et al. 2003; Knoll et al. 2006; Porter 2004).

Acritarchs from this time period are assigned to eukaryotes because of their complex ultrastructure, size, and inferred behavioral characteristics, although their affinities among eukaryotes remain obscure (Knoll et al. 2006). Fossils that are consistent with a eukaryotic origin have recently been found in much older rocks, including a 3.2 Ga formation in South Africa, although a bacterial origin cannot be ruled out (Javaux et al.

2010). The diversity and abundance of eukaryotic microfossils increase dramatically in the late Neoproterozoic (Javaux 2007; Knoll 1994; Knoll et al. 2006). Eukaryotic fossils that can be assigned to extant taxonomic groups begin to appear around 1200 Ma

(Butterfield 2000; Butterfield 2004) and become more abundant in rocks 850 Ma and younger (Knoll et al. 2006; Porter et al. 2003). The hypothesis that eukaryotes arose only

850 million years ago is based on the appearance of diverse lineages in the fossil record at this time and the rise of fully oxygenated oceans; this hypothesis discounts older

105

eukaryotic fossils and other evidence for Mesoproterozoic eukaryotes (Cavalier Smith

2010; Cavalier-Smith 2002).

5.2.2 Challenges molecular clock analyses

Molecular clock analyses are based on the observation that genetic distance is correlated absolute divergence times (for more information see Drummond et al. 2006;

Welch and Bromham 2005). However, real sequence data often violate the assumption of a strict molecular clock because rates of evolution vary across the tree and among genes (Drummond et al. 2006; Roger and Hug 2006; Welch and Bromham 2005). Many programs now implement relaxed clock methods and incorporate heterogeneity in evolutionary rates and complex models of sequence evolution (Drummond and Rambaut

2007; Rutschmann 2006; Sanderson 2003; Thorne and Kishino 2002; Yang and Rannala

2006).

Most molecular clock methods estimate divergence times on a fixed topology that is assumed to be the correct tree (e.g. Sanderson 2003; Thorne and Kishino 2002). This assumption is problematic because some unresolved nodes often remain even when overall phylogenetic relationships are robustly reconstructed. For example, although general outline of the eukaryotic tree is becoming clear, relationships among major clades remain uncertain and poorly supported (Fig. 5.1; Parfrey et al. 2010). The program

BEAST does not require a fixed topology, but instead co-estimates phylogeny and divergence times, enabling topological uncertainty to be incorporated into molecular clock studies (Drummond et al. 2006). Co-estimating phylogeny and divergence times may even lead to more accurate gene trees because it is a more appropriate model of

106

evolution, and in any case allows unsupported clades to be detected through examination of posterior probabilities (Drummond et al. 2006).

Molecular clocks must be calibrated to relate genetic divergence to absolute time and this process generally uses information from the fossil record (Donoghue and Benton

2007). Early molecular clock methods assumed that this prior was known with 100% certainty (hard priors), an assumption that is inherently faulty (Forest 2009; Graur and

Martin 2004; Yang and Rannala 2006). Uncertainty surrounding calibration points comes from both paleontological and phylogenetic sources (Donoghue and Benton 2007;

Ho and Phillips 2009; Rutschmann et al. 2007). Major sources of error include 1) stratigraphic uncertainty as to the absolute and relative age of a fossil, 2) taxonomic uncertainty in assigning fossils to extant groups, and 3) the discrepancy between first fossil occurrence of a taxon and its evolutionary emergence, especially for lineages with low preservation potential. Current methods incorporate uncertainty by specifying calibration points as distributions rather than points (Drummond et al. 2006; Thorne and

Kishino 2002; Yang and Rannala 2006). The process of calibrating molecular clocks has also been greatly improved with the recognition that single calibration points are insufficient (Forest 2009; Graur and Martin 2004; Hug and Roger 2007; Rutschmann et al. 2007). In this analysis, I use 22 calibration points derived from the fossil record of many eukaryotic lineages (Table 5.1). Calibration points are specified as prior distributions to incorporate the many sources of uncertainty discussed above.

107

5.2.3 Eukaryotic phylogeny and the root of eukaryotes

The limited molecular data for eukaryotes forced a trade off in previous molecular clock analyses between many taxa and calibration points in analyses of a single gene

(Berney and Pawlowski 2006) and analyses of many genes but a small number of taxa and calibration points (Douzery et al. 2004; Hedges et al. 2004). Thus, the availability of taxon and gene-rich datasets coupled with flexible molecular clock methods make this an ideal time to revisit the age of eukaryotes.

Estimates of relationships among major lineages of eukaryotes have begun to stabilize in recent years (Adl et al. 2005; Burki et al. 2009; Hampl et al. 2009; Parfrey et al. 2010). The majority of the >70 lineages of eukaryotes (Patterson 1999) fall within four major groups, Opisthokonta, Excavata, Amoebozoa, and SAR (stramenopiles, alveolates, and Rhizaria; Burki et al. 2009; Hampl et al. 2009; Parfrey et al. 2010), although the placement of photosynthetic lineages remains controversial (Baurain et al.

2010; Lane and Archibald 2008; Parfrey et al. 2010). The improvements to eukaryotic phylogeny have been driven by increased availability of multigene molecular data from diverse eukaryotes. Broad taxonomic sampling coupled with sufficient gene sampling increases phylogenetic accuracy (Heath et al. 2008; Hillis 1998; Parfrey et al. 2010;

Zwickl and Hillis 2002). Greater taxonomic sampling is also expected to yield more reliable estimates of divergence dates both because of the improvements to the underlying phylogeny and branch lengths and because more nodes are then available to be calibrated with fossil data (Heath et al. 2008).

Rooted phylogenies are crucial for interpreting the evolutionary events in the history of a lineage and are a requirement for molecular clock analyses (Renner et al.

108

2008; Sanderson and Doyle 2001). However, the root of the eukaryotic tree of life if difficult to determine because the common methods for rooting phylogenies are vulnerable to artifacts caused by rate heterogeneity among lineages of eukaryotes and the vast distance between eukaryotes and archaea or bacteria (e.g. Embley and Martin 2006;

Roger and Hug 2006). Although there are numerous hypotheses (Arisue et al. 2005;

Cavalier-Smith 2010; Nozaki 2005; Rogozin et al. 2009; Stechmann and Cavalier-Smith

2003) the position of the root remains an open debate (Embley and Martin 2006; Koonin

2010; Roger and Hug 2006; Roger and Simpson 2009). The most popular hypothesis of recent years places the root of eukaryotes between the Opisthokonta + Amoebozoa

(‘unikonts’) and the remaining eukaryotes (‘’; Keeling et al. 2005; Stechmann and

Cavalier-Smith 2003), and previous molecular clock analyses of eukaryotes rooted the tree between these groups (Berney and Pawlowski 2006; Douzery et al. 2004; Hug and

Roger 2007). However, several lines of evidence contradict the ‘unikont’/ ‘’ split

(Arisue et al. 2005; Roger and Simpson 2009) and alternative roots have been suggested, including within ‘Archaeplastida’ (Nozaki 2005; Rogozin et al. 2009), at the

Opisthokonta (Arisue et al. 2005) or along the lineage leading to Euglenozoa (Cavalier-

Smith 2010). Here, I assess the impact different positions of the root on estimates of the age of eukaryotes. The root is either estimated in BEAST using the molecular clock criterion (Drummond et al. 2006) or placed between Opisthokonta and the rest of eukaryotes as suggested by gene tree-species tree (GT-ST) reconciliation methods

(Burleigh et al. unpublished).

109

5.2.4 My approach

I test hypotheses on the origin and diversification of eukaryotes using relaxed molecular clock methods to estimate divergence dates under a suite of conditions.

Taxon-rich multigene trees are used to estimate dates, with rate heterogeneity across the tree and among genes incorporated into the model. I assess the impact of two factors on estimated divergence times, 1) the position of the root of the eukaryotic tree, and 2) inclusion of calibration points for Proterozoic fossils that are less certain. Finally, I explore the impact of taxon sampling by analyzing two data sets with varying numbers of taxa.

5.3 Methods

5.3.1 Alignments

Alignments are derived from the 15 protein-coding gene alignment found in

Parfrey et al. 2010 (15:10). Taxon sampling was chosen to sample eukaryotic diversity, with emphasis on lineages that have fossil data, and was based on the most reduced taxon set of Parfrey et al. 2010 (Table 5.2). Modifications were made to improve the stability of the topology and to minimize rate heterogeneity across the tree. Streblomastix strix

ESTs have been removed, as these are from a (Parfrey et al. 2010). Rapidly evolving taxa (e.g. Encephalitozoon cunciculi) and orphans (e.g. Breviata anathema) were removed. Additional taxa were included to take advantage of available fossil data resulting in a 109-taxon alignment (36% missing data). These taxa had between three and 15 of the target 15 genes (109-taxon dataset; Table 5.2). A 91-taxon alignment was created by removing taxa with long branches or a lot of missing data (91-taxon dataset;

110

Table 5.2). The amount of missing data in these matrices is within the range found to be permissible in Bayesian analyses, which have been shown to reliably estimate phylogenies with up to 90% missing data (Wiens and Moen 2008).

5.3.2 Phylogenetic analysis

BEAST requires a reasonable starting tree to analyze complex datasets, so the initial topology was obtained in RAxML. 200 bootstrap replicates followed by an exhaustive maximum likelihood search were done using the MPI version of RaxML 7.0.4 with rapid bootstrapping and the WAG + gamma model (Stamatakis et al. 2008). The best fitting amino acid substitution matrix available in BEAST was WAG for all partitions as estimated in ProtTest (Abascal et al. 2005). This resulted in a highly supported topology consistent with that found in Parfrey et al. 2010 (Fig. 5.1).

5.3.3 Molecular dating analyses

Dating analyses were performed in Beast v1.5.4 (Drummond and Rambaut 2007).

One of the strengths of BEAST is the flexibility it provides in the specification of models of molecular evolution, molecular clock, as well as tree topology and calibration points.

This flexibility introduces a multitude of model options to choose among. I ran preliminary analyses to assess the impact of several of these options, including type of molecular clock, partitioning genes, and tree topology on eukaryotic divergences times.

Initially, parameters were deemed a poor fit for the data if the likelihood values did not converge across four runs of 10 million generations. If competing models both converged the best model was chosen by comparing likelihoods with Bayes factors

111

(Suchard et al. 2001). Convergence and Bayes factors were assessed in Tracer v1.5

(distributed with BEAST v1.5.4; Drummond and Rambaut 2007). The conditions the resulted in convergence or higher likelihood scores were used in all subsequent analyses.

Each gene was analyzed as a separate partition for both site models and molecular clock models because the unpartitioned model did not converge. The WAG amino acid substitution matrix was used in all cases, as it was the best fitting model available in

BEAST, as determined by PROTTEST (Abascal et al. 2005). A model of amino acid substitution that included gamma-distributed rate classes was found to be a better fit for the data, models with and without gamma correction yielded similar dates and topologies.

Including a gamma correction resulted in a 10-fold computational cost, thus the gamma correction was employed in only a few cases for comparison.

The uncorrelated lognormal (UCL) relaxed clock model was found to be the best clock model for these data. Analyses employing either a strict molecular clock or an uncorrelated exponential relaxed clock did not converge. The UCL relaxed clock is expected to perform better on datasets with deep divergences and rate heterogeneity across the tree because the standard deviation parameter captures the variation in rates across the tree (Drummond et al. 2006). The coefficient of variation of the UCL clock ranges from 0.3 to 2.5 for different genes, which indicates that rates vary between 30% and 250% across the tree depending on the gene. Thus, these data are not clock-like.

Finally, allowing BEAST to modify tree topology resulted in a highly supported topology that was broadly consistent with other analyses (Burki et al. 2009; Hampl et al. 2009;

Parfrey et al. 2010).

112

Estimating the divergence times of all eukaryotes is a challenging problem and resulted in poor chain mixing, as measured by estimated sample size (ESS) in Tracer. A two-pronged approach was taken to improve ESS values in BEAST analyses. First, initial analyses of four chains run for 10 million generations each were run. The best

RAxML tree was used as the starting tree, and branch heights were set to 360.0 so as that all nodes were all older than the calibration constraints assigned to them. Priors for all parameters (excluding CCs) were left at default settings. Five million generations were removed from each of these chains as burnin (as determined by convergence of likelihood values in Tracer v1.5.4) and chains and trees were combined in LogCombiner v1.5.4 (distributed with BEAST; Drummond et al. 2006). The initial runs yielded a robust tree topology with realistic branch heights that was used as a starting tree for subsequent analyses. Operator values and prior distributions on substitution rates were informed from the results from the initial runs. In the second phase of the analysis, 16 runs of 10 million generations each were conducted. These runs were combined by

LogCombiner after one million generations was removed from each as burnin (as determined by Tracer).

5.3.4 Assessing the position of the root

The impact of the position of the root on divergence times was assessed by comparing molecular clock analyses with the root constrained to the Opisthokonta, as suggested by gene tree-species tree (GT-ST) reconciliation methods to analyses where the root was determined by BEAST using the molecular clock criterion. The root was

113

constrained to Opisthokonta by constraining all non-opisthokonts to be a monophyletic ingroup.

5.3.5 Calibration points

All calibration constraints (CC) were established with paleontological data, and errors arising from stratigraphy and clade assignment taken into account when specifying the prior distribution around these calibrations (Table 5.1). Fourteen CCs were assigned based on Phanerozoic fossils and eight additional CCs were added from fossils of

Proterozoic age. Together these CCs are broadly distributed across the eukaryotic tree of life (Table 5.1). The Proterozoic fossil record is much more sparse, and the taxonomic assignment of some Proterozoic fossils has been called into questions (e.g. Cavalier-

Smith 2002). Thus, I assessed the impact of these older fossils by analyzing the data with only the 14 Phanerozoic CCs (Phan analyses) or with Phanerozoic and Proterozoic CCs

(All analyses).

Calibration constraints were specified as prior distributions of probable ages in

BEAST using the BEAUTi application v1.5.4 (Drummond and Rambaut 2007). The minimum age of the constraint (offset) was derived from a conservative reading of the fossil record. I used radiometric dates when available, and set the minimum constraint to the youngest edge of the confidence interval. Thus, the minimum age of the CC for

Arcellinida is 736 Ma because arcellinid fossils are found in rocks younger than 742 ± 6

Ma (Table 5.1; Porter et al. 2003). For fossils assigned to geological stages, I used the upper boundary of the stage according to the 2009 International Stratigraphy Chart published by the International Commission on Stratigraphy. For example, angiosperm

114

pollen is first found in Valanginian rocks (Crane et al. 1995) and was constrained to a minimum date of 133.9 Ma. Prior distributions were set in one of two ways depending of the level of uncertainty. For clades with good fossil records where the maximum age of the clade is unlikely to be substantially earlier than its first occurrence (e.g. angiosperms), the prior distribution was set to include 95% of the probable age of the clade. In contrast,

Proterozoic records and fossils of groups with a poor fossilization potential provide only minimum dates for the origin of the lineage, but there is no information on the maximum clade age (e.g. ). In these cases the prior distribution was specified with a very long tail, as assessed in BEAUTi, that extended back to ~3.5 Ga.

Selected calibration points are discussed here and the reader is referred to the references in Table 5.1 for remaining dates. Moyeria, a , was widely distributed in the Ordovician and Silurian, with the earliest occurrence in the Caradocian (Gray and

Boucot 1989), dated at 450 Ma. Moyeria is thought to have been photosynthetic based on the patterning of its pellicle, indicating that the secondary endosymbiosis of a green alga occurred early in euglenid history (Leander et al. 2001).

Fossils of the earliest red alga, Bangiomorpha, occur in the lower section of the

Hunting Formation, which is bracketed by radiometric dates of 1267±2 Ma and 723±3

Ma. The Pb-Pb dates of carbonates where the fossils occur yield a much narrow range of

1198±24 Ma (Butterfield 2000), but this date remains unpublished and radiometric dating of carbonates can be unreliable. Given the importance of the Bangiomorpha calibration as potentially the oldest fossil by more than 400 Myr, I ran the All analyses with the constraint for Bangiomorpha set to 1174 Ma or 720 Ma. Exclusion of Bangiomorpha pushed the estimate for the age of the root younger, by roughly 150 Ma. The true date of

115

Bangiomorpha fossils is likely closer to the maximum age of the Hunting formation because they are found in the lower sections of the formation (100 m of the 1000 m think section), and the biostratigraphy is consistent with the Mesoproterozoic (Butterfield

2000).

The calibration constraint for diatoms is based on the earliest diatoms from the

Myogok Formation in Korea, which is Valanginian and Hauterivian in age (Harwood et al. 2007). A date of 133.9 Ma is used to represent the upper Valanginian boundary. The calibration constraint of this node is younger than in other clock analyses (e.g. Berney and Pawlowski 2006) because I do not rely on Pyxidicula, a putative diatom from the

Toarcian (Rothpletz 1896) for which the material has been lost. This older date is included in the 95% HPD of the prior.

I include a calibration constraint for ciliates in the All analyses that is based on the presence of gammacerane (Summons and Walter 1990). Tetrahymenol, the precursor of gammacerane, is commonly found in ciliates, though it has also been found in bacteria

(Kleemann et al. 1990). This calibration constraint was included despite the possibility of bacterial origin because the 736 Ma constraint is much younger than the date estimated for ciliates (~1150 Ma) without this constraint in the Phan analyses.

5.4 Results

5.4.1 Timing of eukaryotic origins and radiation of extant lineages

Taxon-rich analyses of multiple genes reveal remarkable stability in the divergence dates across the eukaryotic tree of life. Analyses of varying taxon sets, calibration schemes, and root positions place the age of extant eukaryotes between 1918

116

Ma and 1652 Ma (Fig. 5.2). The precision of these estimates is measured by 95 % highest posterior densities (HPD) in BEAST, which are akin to confidence interval. The probable age for extant eukaryotes across all analyses is 2037 – 1518 Ma when the 95%

HPD is taken into account. A similar age range was also observed in preliminary analyses with limited taxon sampling and slightly different calibration constraints (71 taxa, root age 1850 – 1550 Ma), further suggesting that these estimates are robust to varying analysis conditions.

The major clades of eukaryotes had all diverged prior to 1100 Ma. These analyses also suggest that the major clades of eukaryotes arose within similar time frames, as they have overlapping 95% most probable date ranges (roughly equivalent to confidence intervals; Figs. 5.2-5.4). The mean estimated ages of Amoebozoa,

Opisthokonta, and SAR fall between 1690 and 1231 Ma (Fig. 5.2). The Excavata appear to diverge slightly earlier than the other clades at 1766 to 1387 Ma, although this may reflect the prevalence of rapidly evolving taxa within Excavata (Hampl et al. 2009;

Philippe et al. 2000) rather than a more ancient divergence. The 95% HPD intervals are wider for clades with a paucity of calibration points, such as the Excavata and

Amoebozoa (Figs. 5.3 and 5.4).

5.4.2 Impact of including Proterozoic calibration constraints

Inclusion of Proterozoic fossils as calibration constraints modestly shifted estimated divergence times. Datasets were analyzed with two sets of calibrations to assess the impact of including older Proterozoic fossils. Phan analyses included 14 calibration constraints from the Phanerozic fossil record of diverse lineages, while All

117

analyses included eight additional constraints of Proterozoic age (Table 5.1). Estimates for the root of extant eukaryotes were 1725-1652 in Phan analyses (95% HPD range 1883

-1518 Ma). When Proterozoic calibration points are included the range of probable dates for the origin of eukaryotes is estimated at 1918 -1843Ma (95% HPD range 2037-1700

Ma; Figs. 5.2 and 5.4). The shift in the age of the root and most other nodes from the

Phan to All analyses was smaller in the 109-taxon (shift ~120 Ma older) dataset relative to the 91-taxon dataset (shift 250 Ma older). This may reflect an increase in stability with greater taxon sampling.

Inclusion of Proterozoic calibration constraints shifted age estimates100 to 200 million years older for major clades of eukaryotes (Fig 5.2). The largest shifts occur for

Amoebozoa and Excavata (Fig 5.2), which is likely due to a combination of the long branches found in these clades and the paucity of fossil data (only one constraint for each clade; Amoebozoa is only constrained in All analyses). Differences in divergence times are small for nested clades, e.g. the range of 95% highest probability density (HPD) for

Alveolata shifts from 1390 - 1189 Ma to 1458 - 1267 Ma with the inclusion of

Proterozoic calibration points in 109-taxon analysis (Figs. 5.3 and 5.4). The differing calibration schemes did have a dramatic impact on the estimated age of the red algae

(Figs. 5.3 and 5.4). Estimated dates for the origin of red algae are 1103 - 860 Ma with

Phanerozoic calibration points (Fig. 5.3) and 1320 - 1182 Ma when this node is constrained to be older than 1174 Ma, in accordance with the Bangiomorpha red algal fossil (Table 5.1 and Fig. 5.4).

The small differences in estimated node age between the calibration sets is driven by the sequence data, rather than the prior distribution of calibration constraints.

118

Analyses of priors alone with no sequence data yield dates for major nodes that are 547-

1060 million years younger than analyses with data. For example, the root age is 1655

Ma in the 91-taxon Phan analysis, but only 743 Ma when this analysis is run without data.

5.4.3 Impact of root position on divergence time estimates

Variation in the position of the root has little impact on estimated divergence dates across eukaryotes, especially for the date of the root. The estimated date for the root differed by between 2 and 56 Myr in analyses rooted on Opisthokonta versus analyses rooted by the molecular clock in BEAST (Fig. 5.2). Similarly, estimated ages of the major clades were not strongly impacted by the position of the root, and node ages were shifted by 125 to 37 million years (Fig. 5.2). The possible exception to this trend is the age of Amoebozoa in the All 109-taxon analyses, where the node age shifts from

1419 Ma to 1606 Ma in molecular clock rooted analysis. This difference is likely due to the sister relationship of Amoebozoa and Opisthokonta that is reconstructed in all MC rooted analyses, but disrupted in Opisthokonta root analyses. However, no strongly supported alternative root emerged from the analyses in which BEAST determined the root based on the molecular clock. The position of the root varied across runs. The root fell within Excavata or between unikonts, Excavata, SAR, SAR + haptophytes, Excavata

+ unikonts and the rest of eukaryotes. There was also variation in the position of the root among replicate runs of different datasets. Given these results, the analyses rooted on

Opisthokonta in accordance with the GT-ST results (Burleigh et al, In Prep) are shown

(Figs 5.3 and 5.4).

119

5.4.4 Topology of eukaryotic tree

The dated trees produced by BEAST through co-estimation of phylogeny and dates are consistent the broad outlines of eukaryotic diversity recovered in other analyses

(Burki et al. 2009; Hampl et al. 2009; Parfrey et al. 2010). Several relationships emerge in the BEAST topologies that are not present in the RAxML analyses (Fig. 5.1). RAxML analyses divide Excavata into two clades when all major lineages are included (109-taxon set; Fig. 5.1); however, Excavata is monophyletic in all BEAST analyses (unless the root of eukaryotes falls within Excavata). Second, BEAST consistently places haptophytes as the sister clade of SAR ( Figs. 5.3 and 5.4). Finally, cryptomonads consistently branch as sister to kathablepharids and within the clade of putatively primary photosynthetic eukaryotes: red algae, green algae (including plants), and glaucophytes.

5.5 Discussion

5.5.1 Molecular time scale of eukaryotic evolution

The molecular clock analyses presented here suggest that eukaryotes arose between 1918 and 1843 Ma when both Phanerozoic and Proterozoic calibration points are considered. Including fossils that reflect the deep events of eukaryotic evolution should paint a more accurate picture of eukaryotic diversification, especially since the chosen fossils are widely accepted in the field and prior distributions were assigned in a conservative manner that accounts for the uncertainty surrounding these fossils. The stability of estimated dates when Proterozoic CCs are removed fosters greater confidence in the robustness of these results. This estimate for the timing of the origin of extant eukaryotes is in line with early fossil evidence of eukaryotes (Knoll et al. 2006; Porter

120

2006), although there are acritarchs such as that are older than 1850 Ma (Knoll et al. 2006). These estimates are much younger than the Archean fossils described by

Javaux et al. (Javaux et al. 2010) and the biomarkers from 2.7 Ga rocks (Brocks et al.

1999). If these Archean fossils do represent eukaryotes, they may reflect the presence of stem lineages of eukaryotes hundreds of millions of years prior to the radiation of extant clades.

Our estimates suggest that the major lineages of eukaryotes (Opisthokonta, SAR,

Excavata and Amoebozoa) diverged when the oceans were still sulfidic, prior to 1100 Ma

(Fig. 5.2). The origination of major clades of eukaryotes in the late Mesoproterozoic suggests these lineages were present for hundreds of millions of years before radiating in the Neoproterozoic. The abundance and diversity of eukaryotic microfossils increases rapidly in the late Neoproterozoic, and this pattern is observed in acritarchs, biomarkers, and fossils that can be assigned to extant clades (Knoll 1994; Knoll In review; Knoll et al.

2006; Leiming and Xunlai 2007; Porter 2006). Again, it is likely that stem groups were present well before recognizable members of the clades emerged. A similar pattern of long stem preceding crown diversification is also seen in animal and plants and this may be a consistent pattern in evolution (Knoll, in review).

5.5.2 Discrepancy between these and previous molecular clock studies

Previous molecular clock studies yielded widely different dates for the root of extant eukaryotes, from 3970 Ma to 1100 Ma (Roger and Hug 2006). In a recent analysis of SSU-rDNA from broadly sampled eukaryotes, Berney et al. (2006) placed the origin of eukaryotes at 1100 Ma. They had numerous CCs specified as either minimum or

121

maximum divergence dates (Berney and Pawlowski 2006). They found that including

Proterozoic calibration points such as Bangiomorpha (red algae at 1200 Ma),

Paleovaucheria (xanthophyte stramenopile at 1000 Ma), and Proterocladus

(Cladophoracean green algae at 750 Ma) shifted their estimates of the origin and diversification of eukaryotes by 1 to 2.5 billion years (Table 1 in Berney and Pawlowski

2006). Removing maximum constraints also caused dramatic shifts in date estimates.

The variance in dates observed by Berney and Pawlowski (2006) may be due to rate heterogeneity in SSU-rDNA across the tree (e.g. Philippe et al. 2000), the specification of calibration constraints as uniform distributions, or a combination of the two.

The observed discrepancy in the age of eukaryotes when Proterozoic CCs are included (Berney and Pawlowski 2006) sharply contrasts with the stability of dates seen in my analyses. This stability in molecular clock estimation is likely due to the increased taxon and gene sampling. This assertion is consistent wit the smaller observed difference between Phan and All CCs in the 109-taxon dataset as compared to the 91-taxon analyses. A further stabilizing factor may be the use of gamma-distributed priors to specify CCs rather than uniform priors. This method of calibration may prevent estimated ages from being pulled backward when ancient prior dates are included.

5.5.3 Neoproterozoic origin of eukaryotes rejected

Our analyses demonstrate that that eukaryotes arose in the Paleoproterozoic or earliest Mesoproterozoic, and this conclusion is robust to the interpretation of the

Proterozoic fossil record (Fig. 5.2). Some consider Proterozoic fossils, especially of the putative red algae Bangiomorpha, controversial (Berney and Pawlowski 2006; Cavalier-

122

Smith 2002). I tested the hypothesis that eukaryotes arose in the Neoproterozoic

(Cavalier Smith 2010; Cavalier-Smith 2002) by analyzing my datasets without any calibration points derived from Proterozoic fossils (Phan analyses). Even in the absence of Proterozoic calibrations, the estimated age for the last common ancestor of eukaryotes ranges between 1883 and 1518 Ma (Figs. 5.2 and 5.3). Thus, Cavalier-Smith’s (2002) view that eukaryotes originated only 850 million years ago is rejected even when calibration points questioned by Cavalier-Smith are removed.

5.5.4 Ecological setting for the origin and diversification of eukaryotes

The chemistry of mid Proterozoic oceans (1850-1250 Ma) was much different than the conditions that prevailed during either the Archean (4000-2500 Ma) or

Phanerozoic (542 Ma to present) Eons (Anbar and Knoll 2002). Geochemical data suggest that the oceans became sulfidic with the rise of atmospheric oxygen 2000 to 1800

Ma (Anbar and Knoll 2002; Canfield 1998). As a result, nitrogen rather than phosphorous became the limiting nutrient for biological productivity for the only time in

Earth’s history (Anbar and Knoll 2002). This shift altered the availability of iron and many trace metals that are biologically essential (Anbar and Knoll 2002) and potentially caused the unusually low and stable levels of primary productivity observed in the middle

Proterozoic (Anbar and Knoll 2002). Nitrogen limitation is argued to have major consequences for the evolution of eukaryotic organisms, because extant eukaryotes, especially primary producers, are not able to thrive without abundant nitrogen (Anbar and

Knoll 2002). These changes may have had widespread consequence for the early evolution of eukaryotes. It is possible that early lineages of eukaryotes originated and

123

diverged in restricted environments such as shallow, near shore waters or freshwater systems that were not impacted by the sulfidic conditions of the deep ocean.

5.6 Acknowledgements

Calibration points have been established in consultation with the eukaryotic paleontologist Andrew Knoll. Thanks to Dan Lahr, Laura Katz, and Andrew Knoll for helpful discussions.

124

Table 5.1. Calibration constraints for dating the eukaryotic tree of life Calibration Taxon Major Clade Fossil Eon1 Constraint2 Refs

Amniota Opisthokonta Westlonthania Phan 328.3 (4,3) 1 Angiosperms Archaeplastida Oldest angio pollen Phan 133.9 (2,2.5) 2 Ascomycetes Opisthokonta Paleopyrenomycites Phan 400 (4,20) 3 Basidiomycetes Opisthokonta Basidiomycete clamps Phan 330 (4,20) 4 Earliest Haptophytes Heterococcolith Phan 203.6 (2,8) 5 Diatoms Stramenopiles Pyxidicula Phan 133.9 (4,7) 6 Dinoflagellates Alveolates Earliest gonyaulacales Phan 240 (2,5) 7 Embryophytes Archaeplastida Land plant spores Phan 471 (2,3) 8 Endopterygota Opisthokonta Mecoptera Phan 284.4 (5,5) 9 Euglenids Excavata Moyeria Phan 450 (2,40) 10 Oldest multichambered Foraminifera Rhizaria forams Phan 525 (3,100) 11 Gonyaulacales Alveolates Gonyaulacaceae split Phan 196 (2,2) 7 Spirotrichs Alveolates Oldest Phan 444 (2.5,100) 12 Earliest Trachaeophytes Archaeplastida trachaeophytes Phan 425 (4,2.5) 13 LOEMs, Animals Opisthokonta biomarkers Protero 632 (2,300) 14, 15 Arcellinida Amoebozoa Paleoarcella Protero 736 (2,300) 16 Opisthokonta Protero 555 (3,40) 17 Ciliates Alveolates Gammacerane Protero 736 (2.5,300) 18 Florideophyceae Archaeplastida Doushantuo red algae Protero 550 (2.5,100) 19 Green algae Greens Palaeastrum Protero 700 (2.5,300) 20 Red algae Archaeplastida Bangiomorpha Protero 1174 (3,250) 21 Photosynthetic stramenopiles Stramenopiles Miaohephyton Protero 550 (2,250) 22

1Eon: Phan = Phanerozoic, Protero. = Proterozoic, Proterozoic calibrations are excluded from Phan analyses. 2Calibration constraints are specified using a gamma distribution with a minimum date in Ma based on the fossil record parameters as indicated: Min (shape, scale). Refs: 1=(Smithson and Rolfe 1990); 2=(Crane et al. 1995); 3=(Taylor et al. 1999); 4=(Krings et al. 2010); 5=(Bown 1998); 6=(Harwood et al. 2007); 7=(Fensome et al. 1999); 8=(Rubinstein et al. 2010); 9=(Dostál and Prokop 2009); 10=(Gray and Boucot 1989); 11=(Scott et al. 2003); 12=(Lipps 1993); 13=(Kenrick and Crane 1997); 14=(Cohen et al. 2009); 15=(Love et al. 2009); 16=(Porter et al. 2003); 17=(Martin et al. 2000); 18=(Summons and Walter 1990); 19=(Xiao et al. 2004); 20=(Butterfield et al. 1994); 21=(Butterfield 2000); 22=(Xiao et al. 1998).

125

Table 5.2. Details of gene and taxon sampling

3

-

α 3 - tub Lineage Taxon1 tub 14 40S Actin α β Ef1 Ef2 Enolase Grc5 Hsp70cyt Hsp90 MetK Rps22a Rps23a Tsec61 Sum

Alveolates Alexandrium tamarense 1 1 1 1 1 1 1 1 1 1 1 1 1 14 Alveolates Chilodonella uncinata 1 1 1 1 1 1 1 1 1 10 Alveolates Crypthecodinium cohnii 1 1 1 1 5 Alveolates tenella 1 1 1 1 1 1 1 1 1 10 Alveolates Heterocapsa rotundata 1 1 1 1 1 6 Alveolates brevis 1 1 1 1 1 1 1 1 1 1 11 Alveolates Nyctotherus ovalis 1 1 1 1 1 1 1 1 1 1 1 12 Alveolates Oxyrrhis marina 1 1 1 1 1 6 Alveolates Paramecium tetraurelia 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Alveolates marinus 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Alveolates 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Alveolates Sterkiella histriomuscorum 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Alveolates Stylonychia lemnae 1 1 1 1 1 6 Alveolates Tetrahymena thermophila 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Alveolates parva 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Alveolates 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Amoebozoa castellanii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Amoebozoa hemisphaerica 1 1 3 Amoebozoa Dictyostelium discoideum 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Amoebozoa Entamoeba histolytica 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Amoebozoa vermiformis 1 1 1 1 1 1 1 1 1 1 1 1 1 14 Amoebozoa Mastigamoeba balamuthi 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Amoebozoa Physarum polycephalum 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Amoebozoa sp. ATCC 50933 1 1 1 4 Animals Aphrocallistes vastus 1 1 1 1 5 Animals Apis mellifera 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Animals Aplysia californica 1 1 1 1 1 1 1 1 1 1 1 1 1 14 Animals Branchiostoma floridae 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Animals Caenorhabditis elegans 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Animals Capitella capitata 1 1 1 1 1 1 1 1 1 1 11 Animals Drosophila melanogaster 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Animals Gallus gallus 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Animals Homo sapiens 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Animals Mnemiopsis leidyi 1 1 1 1 1 1 1 1 1 1 1 12 Animals Nematostella vectensis 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Animals Oscarella carmela 1 1 1 1 1 1 1 1 1 1 1 12 Animals Schistosoma mansoni 1 1 1 1 1 1 1 1 1 1 1 1 1 14 Choanoflagellida Monosiga brevicollis 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Cryptophyta Goniomonas2 1 1 1 1 1 1 1 1 1 1 11 Cryptophyta theta 1 1 1 1 1 1 1 1 1 10 Euglenozoa Bodo saltans 1 1 1 1 1 1 1 8 Euglenozoa Diplonema papillatum 1 1 1 1 1 1 1 1 1 1 1 1 1 14 Euglenozoa Entosiphon sulcatum 1 1 1 4 Euglenozoa Euglena gracilis 1 1 1 1 1 1 1 1 1 1 1 1 1 14 Euglenozoa Euglena longa 1 1 1 1 1 1 1 1 1 1 1 1 1 14 Euglenozoa major 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Euglenozoa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Continue on the next page

126

3

-

α 3 - tub Lineage Taxon1 tub 14 40S Actin α β Ef1 Ef2 Enolase Grc5 Hsp70cyt Hsp90 MetK Rps22a Rps23a Tsec61 Sum

Fornicata Carpediemonas membranifera 1 1 1 1 5 Fornicata Giardia lamblia ATCC 50803 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Fornicata barkhanus 1 1 1 1 1 1 1 1 1 1 1 12 Fungi Allomyces macrogynus 1 1 1 1 1 1 1 1 1 1 1 1 13 Fungi Candida albicans 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Fungi Glomus intraradices 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Fungi Phanerochaete chrysosporium 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Fungi Saccharomyces cerevisiae 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Fungi Schizosaccharomyces pombe 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Fungi Spizellomyces punctatus 1 1 1 1 1 1 1 1 1 1 1 12 Fungi Ustilago maydis 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Glaucophytes Cyanophora paradoxa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Glaucophytes Glaucocystis nostochinearum 1 1 1 1 1 1 1 1 1 1 1 1 13 Haptophytes Emiliania huxleyi 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Haptophytes Isochrysis galbana 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Haptophytes Pavlova lutheri 1 1 1 1 1 1 1 1 1 1 1 1 13 Haptophytes Prymnesium parvum 1 1 1 1 1 1 1 1 1 1 1 1 13 Heterolobosea gruberi 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Heterolobosea marylandensis 1 1 1 1 1 1 1 1 1 1 1 1 13 Ichthyosporea Amoebidium parasiticum 1 1 1 1 1 1 1 1 1 1 11 Ichthyosporea owczarzaki 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Ichthyosporea 1 1 1 1 1 1 1 1 1 1 1 1 1 14 Jakodidae libera 1 1 1 1 1 1 1 1 1 1 1 12 Jakodidae americana 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Jakodidae ‘Seculamonas ecuadoriensis' 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Kathablepharidae marina 1 1 1 1 1 1 7 Malawimonas Malawimonas californiana 1 1 1 1 1 1 1 1 1 1 1 1 12 Malawimonas Malawimonas jakobiformis 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Parabasalidea Trichomonas vaginalis 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Preaxosytla sp. 1 1 1 1 1 6 Preaxosytla Streblomastix strix 1 1 1 1 5 Preaxosytla Trimastix pyriformis 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Rhizaria natans 1 1 1 1 1 1 1 1 1 1 1 12 Rhizaria Corallomyxa tenera 1 1 1 1 1 1 7 Rhizaria Gromia3 1 1 3 Rhizaria Heteromita4 1 1 1 1 1 1 1 1 1 10 Rhizaria Ovammina opaca 1 1 1 4 Rhizaria brassicae 1 1 1 4 Rhizaria filosa 1 1 1 1 1 1 1 1 1 10 Rhodophyta Chondrus crispus 1 1 1 1 1 1 1 1 1 1 1 1 13 Rhodophyta Cyanidioschyzon merolae 1 1 1 1 1 1 7 Rhodophyta Gracilaria changii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 14 Rhodophyta Porphyra yezoensis 1 1 1 1 1 1 1 1 1 1 1 1 1 14 Stramenopiles Apodachlya brachynema 1 1 1 1 1 1 7 Stramenopiles Aureococcus anophagefferens 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Stramenopiles Ectocarpus siliculosus 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Stramenopiles 1 1 1 1 1 1 7 Stramenopiles Phaeodactylum tricornutum 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Continue on the next page

127

3

ase

-

α 3 - tub Lineage Taxon1 tub 14 40S Actin α β Ef1 Ef2 Enol Grc5 Hsp70cyt Hsp90 MetK Rps22a Rps23a Tsec61 Sum

Stramenopiles Phytophthora infestans 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Stramenopiles Thalassiosira pseudonana 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Viridiplantae Acetabularia acetabulum 1 1 1 1 1 1 1 1 1 10 Viridiplantae Arabidopsis thaliana 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Viridiplantae Chlamydomonas reinhardtii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Viridiplantae Dunaliella salina 1 1 1 1 1 1 1 1 1 1 1 12 Viridiplantae Ginkgo biloba 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Viridiplantae Mesostigma viride 1 1 1 1 1 1 1 1 1 1 1 1 1 14 Viridiplantae Micromonas pusilla 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Viridiplantae Oryza sativa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Viridiplantae Ostreococcus tauri 1 1 1 1 1 1 1 1 1 1 1 1 13 Viridiplantae Physcomitrella patens 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 Viridiplantae Volvox carteri 1 1 1 1 1 1 1 1 1 1 1 1 1 1 15 Viridiplantae Welwitschia mirabilis 1 1 1 1 1 1 1 1 1 1 1 1 1 14

1Taxa in bold are included in both the 91-taxon and 109-taxon analyses. 2chimera of truncata and Goniomonas cf. pacifica 3chimera of G. oviformis and Gromia sp. Antarctica 4chimera of. H. globosa and Heteromita sp. ATCC PRA-74

128

Figure 5.1. Most likely tree constructed with 15 proteins and 109 taxa in RAxML. Amino acids were analyzed under the WAG + gamma model. 200 bootstrap replicates were done. This topology was used as the starting tree in initial BEAST analyses. Thickened branches indicate bootstrap values of 95 or greater. This unrooted tree is drawn rooted on Opisthokonta.

129

Figure 5.2. Summary of mean divergence dates for the major clades of eukaryotes. Error bars represent 95% highest posterior density (HPD), akin to a 95% confidence interval.

130

Figure 5.3. Time calibrated tree of eukaryotes using Phanerozoic calibration points. BEAST analysis with 15-proteins and 109 taxa. Root was constrained to Opisthokonta. Nodes are at mean divergence times and gray bars represent 95% HPD. Geological time scale is on top and vertical bars demarcate Periods. Details of calibration points in Table 5.1.

131

Figure 5.4. Time calibrated tree of eukaryotes using All calibration points. BEAST analysis with 15-proteins and 109 taxa. Other notes as in Figure 5.3.

132

APPENDIX

PRODUCTS RESULTING FROM THIS DISSERTATION

Thesis chapters

Parfrey LW and Katz LA. 2010. Genome dynamics are influenced by food source in Allogromia laticollaris strain CSH (Foraminifera). Genome Biol Evol. 2:678-685.

Parfrey LW, Grant J, Tekle YI, Lasek-Nesselquist E, Morrison H, Sogin ML, Patterson DJ and Katz LA. 2010. Broadly sampled multigene analyses yield a well-resolved eukaryotic tree of life. Syst Biol. 59(5):518-533.

Parfrey LW, Lahr DJG, Katz LA. 2008. The dynamic nature of eukaryotic genomes. Mol Biol Evol. 25(4): 787-794.

Parfrey LW, Barbero E, Lasser E, Dunthorn MS, Patterson DJ, Bhattacharya D, Katz LA. 2006. Evaluating support for the current classification of eukaryotic diversity. PLoS Genet. 2: 2062-2073

Other publications Tekle YI, Parfrey LW, Katz LA. 2009. Molecular data are transforming hypotheses on the origin and diversification of eukaryotes. BioScience 59: 471–481.

Parfrey LW and Katz LA. 2010. Dynamic genomes of eukaryotes and the maintenance of genomic integrity. Microbe Magazine 5(4): 156-163.

Vazquez D, Parfrey LW and Katz LA. In Press. Putting animals in their place within a context of eukaryotic innovations. In Key Transitions in Animal Evolution (Ed. Rob DeSalle).

Patterson DJ, Bhattacharya D, Cole J, Dunthorn MS, Logsdon JL, Katz LA and Parfrey LW. 2007. Classifying protists. Microbiology Today. February: 44-45.

133

BIBLIOGRAPHY

Abascal, F., R. Zardoya, and D. Posada. 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21:2104-2105.

Adam, R. D. 2000. The Giardia lamblia genome. Int J Parasitol 30:475-484.

Adl, S. M., and J. D. Berger. 1997. Timing of life cycle morphogenesis in synchronous samples of Sterkiella histriomuscorum 1. The vegetative cell cycle. Eur J Protistol 33:99-109.

Adl, S. M., A. G. B. Simpson, M. A. Farmer, R. A. Andersen, O. R. Anderson, J. R. Barta, S. S. Bowser, G. Brugerolle, R. A. Fensome, S. Fredericq, T. Y. James, S. Karpov, P. Kugrens, J. Krug, C. E. Lane, L. A. Lewis, J. Lodge, D. H. Lynn, D. G. Mann, R. M. McCourt, L. Mendoza, O. Moestrup, S. E. Mozley-Standridge, T. A. Nerad, C. A. Shearer, A. V. Smirnov, F. W. Spiegel, and M. Taylor. 2005. The new higher level classification of eukaryotes with emphasis on the taxonomy of protists. J Euk Microbiol 52:399-451.

Afonkin, S. J. 1986. Spontaneous depolyploidization of cells in ameba clones with increased nuclear-DNA content. Arch Protistenkd 131:101-112.

Ahmadinejad, N., T. Dagan, and W. Martin. 2007. Genome history in the symbiotic hybrid Euglena gracilis. Gene 402:35-39.

Akerborg, O., B. Sennblad, L. Arvestad, and J. Lagergren. 2009. Simultaneous Bayesian gene tree reconstruction and reconciliation analysis. Proc Natl Acad Sci, USA106:5714-5719.

Akerlund, T., K. Nordstrom, and R. Bernander. 1995. Analysis of cell-size and DNA content in exponentially growing and stationary-phase batch cultures of Escherichia coli. J Bacteriol 177:6791-6797.

Amaral-Zettler, L. A., T. A. Nerad, C. J. O'Kelly, and M. L. Sogin. 2001. The nucleariid amoebae: More protists at the animal-fungal boundary. J Euk Microbiol 48:293- 297.

Anbar, A. D., and A. H. Knoll. 2002. Proterozoic ocean chemistry and evolution: A bioinorganic bridge? Science 297:1137-1142.

Andersson, J. O. 2005. Lateral gene transfer in eukaryotes. Cell Mol Life Sci 62:1182- 1197.

Angert, E. R. 2005. Alternatives to binary fission in bacteria. Nat Rev Microbiol 3:214- 224.

134

Archetti, M. 2004. Loss of complementation and the logic of two-step meiosis. J Evol Bio 17:1098-1105.

Archibald, J. M. 2009. The Puzzle of Plastid Evolution. Curr Biol 19:R81-R88.

Archibald, J. M., D. Longet, J. Pawlowski, and P. J. Keeling. 2003. A novel polyubiquitin structure in Cercozoa and Foraminifera: Evidence for a new eukaryotic supergroup. Mol Biol Evol 20:62-66.

Arisue, N., M. Hasegawa, and T. Hashimoto. 2005. Root of the eukaryota tree as inferred from combined maximum likelihood analyses of multiple molecular sequence data. Mol Biol Evol 22:409-420.

Arnold, Z. M. 1955. Life history and cytology of the foraminiferan Allogromia laticollaris. University of California Publications in Zoology 61:167-252.

Arnold, Z. M. 1984. The gamontic karyology of the saccamminid foraminifer Psammophaga simplora Arnold. J Foramin Res 14:171-186.

Aury, J. M., O. Jaillon, L. Duret, B. Noel, C. Jubin, B. M. Porcel, B. Segurens, V. Daubin, V. Anthouard, N. Aiach, O. Arnaiz, A. Billaut, J. Beisson, I. Blanc, K. Bouhouche, F. Camara, S. Duharcourt, R. Guigo, D. Gogendeau, M. Katinka, A. M. Keller, R. Kissmehl, C. Klotz, F. Koll, A. Le Mouel, G. Lepere, S. Malinsky, M. Nowacki, J. K. Nowak, H. Plattner, J. Poulain, F. Ruiz, V. Serrano, M. Zagulski, P. Dessen, M. Betermier, J. Weissenbach, C. Scarpelli, V. Schachter, L. Sperling, E. Meyer, J. Cohen, and P. Wincker. 2006. Global trends of whole- genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444:171-178.

Awata, H., T. Noto, and H. Endoh. 2006. Peculiar behavior of distinct chromosomal DNA elements during and after development in dicyemid mesozoan Dicyema japonicum. Chromosome Res 14:817-830.

Baldauf, S. L. 2003. The deep roots of eukaryotes. Science 300:1703-1706.

Baldauf, S. L., and J. D. Palmer. 1993. Animals and fungi are each other's closest relatives: congruent evidence from multiple proteins. Proc Natl Acad Sci, USA90:11558-11562.

Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972-977.

135

Bapteste, E., H. Brinkmann, J. A. Lee, D. V. Moore, C. W. Sensen, P. Gordon, L. Durufle, T. Gaasterland, P. Lopez, M. Muller, and H. Philippe. 2002. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc Natl Acad Sci, USA 99:1414-1419.

Bass, D., E. E. Y. Chao, S. Nikolaev, A. Yabuki, K. I. Ishida, C. Berney, U. Pakzad, C. Wylezich, and T. Cavalier-Smith. 2009. Phylogeny of Novel Naked Filose and Reticulose Cercozoa: cl. n. and Revised. 160:75-109.

Baurain, D., H. Brinkmann, J. Petersen, N. Rodriguez-Ezpeleta, A. Stechmann, V. Demoulin, A. J. Roger, G. Burger, B. F. Lang, and H. Philippe. 2010. Phylogenomic evidence for separate acquisition of plastids in cryptophytes, haptophytes, and stramenopiles. Mol Biol Evol 27:1698-1709.

Bé, A. W. H., and O. R. Anderson. 1976. Gametogenesis in planktonic foraminifera. Science 192:890-892.

Beermann, S. 1977. The diminution of heterochromatic chromosomal segments in Cyclops (Crustacea, Copepoda). Chromosoma 60:297-344.

Bendich, A. J., and K. Drlica. 2000. Prokaryotic and eukaryotic chromosomes: what's the difference? Bioessays 22:481-486.

Bennett, M. D. 2004. Perspectives on polyploidy in plants - ancient and neo. Biol J Linn Soc 82:411-423.

Bernander, R., J. E. D. Palm, and S. G. Svard. 2001. Genome ploidy in different stages of the Giardia lamblia life cycle. Cellular Microbiology 3:55-62.

Berney, C., and J. Pawlowski. 2006. A molecular time-scale for eukaryote evolution recalibrated with the continuous microfossil record. Proc Roy Soc B 273:1867- 1872.

Bhattacharya, D., Helmchen, T., Melkonian, M. 1995. Molecular evolutionary analyses of nuclear-encoded small subunit ribosomal RNA identify an independent rhizopod lineage containing the euglyphina and the chlorarachniophyta. J Euk Microbiol 42:65-69.

Bhattacharya, D., H. S. Yoon, and J. D. Hackett. 2004. Photosynthetic eukaryotes unite: endosymbiosis connects the dots. Bioessays 26:50-60.

Bird, A. 2007. Perceptions of epigenetics. Nature 447:396-398.

136

Bodył, A. 2005. Do plastid-related characters support the chromalveolate hypothesis? J Phycol 41:712-719.

Bodył, A., J. W. Stiller, and P. Mackiewicz. 2009. Chromalveolate plastids: direct descent or multiple endosymbioses? Trends Ecol Evol 24:119-121.

Bolivar, I., J. F. Fahrni, A. Smirnov, and J. Pawlowski. 2001. SSU rRNA-based phylogenetic position of the genera Amoeba and (Lobosea, Gymnamoebia): The origin of gymnamoebae revisited. Mol Biol Evol 18:2306- 2314.

Bown, P. R. 1998. Calcareous nannofossil biostratigraphy. Kluwer Academic Publishers, London

Bowser, S. S., A. Habura, and J. Pawlowski. 2006. Molecular evolution of Foraminiferain Genome Evolution in Eukaryotic Microbes (L. A. Katz, and D. Bhattacharya, eds.). Oxford University Press, Oxford.

Bowser, S. S., S. M. McGee-Russell, and C. L. Rieder. 1985. Digestion of prey in Foraminifera is not anomalous: A correlation of light microscopic, cytochemical, and HVEM techniques to study phagotrophy in two allogromiids. Tissue Cell 17:823-839.

Bowser, S. S., and J. L. Travis. 2002. Reticulopodia: Structural and behavioral basis for the suprageneric placement of Granuloreticulosan protists. J Foramin Res 32:440- 447.

Bresler, V., and L. Fishelson. 2003. Polyploidy and polyteny in the gigantic eubacterium Epulopiscium fishelsoni. Mar Biol 143:17-21.

Brinkmann, H., M. Van der Giezen, Y. Zhou, G. P. De Raucourt, and H. Philippe. 2005. An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics. Syst Biol 54:743-757.

Brocks, J. J., G. A. Logan, R. Buick, and R. E. Summons. 1999. Archean molecular fossils and the early rise of eukaryotes. Science 285:1033-1036.

Burki, F., K. Shalchian-Tabrizi, M. Minge, Å. Skjæveland, S. I. Nikolaev, K. S. Jakobsen, and J. Pawlowski. 2007. Phylogenomics reshuffles the eukaryotic supergroups. PLoS ONE 2:e790.

Burki, F., K. Shalchian-Tabrizi, and J. Pawlowski. 2008. Phylogenomics reveals a new 'megagroup' including most photosynthetic eukaryotes. Biology Letters 4:366- 369.

137

Burki, F., Y. Inagaki, J. Brate, J. M. Archibald, P. J. Keeling, T. Cavalier-Smith, M. Sakaguchi, T. Hashimoto, A. Horak, S. Kumar, D. Klaveness, K. S. Jakobsen, J. Pawlowski, and K. Shalchian-Tabrizi. 2009. Large-scale phylogenomic analyses reveal that two enigmatic protist lineages, and Centroheliozoa, are related to photosynthetic chromalveolates. Genome Biol Evol 2009:231-238.

Butterfield, N. J. 2000. Bangiomorpha pubescens n. gen., n. sp.: implications for the evolution of sex, multicellularity, and the Mesoproterozoic/Neoproterozoic radiation of eukaryotes. Paleobiol 26:386-404.

Butterfield, N. J. 2004. A vaucheriacean alga from the middle Neoproterozoic of Spitsbergen: implications for the evolution of Proterozoic eukaryotes and the . Paleobiol 30:231-252.

Butterfield, N. J., A. H. Knoll, and K. Swett. 1994. Paleobiology of the Neoproterozoic Svanbergfjellet Formation, Spitsbergen. Fossils and Strata 34:1-84.

Canfield, D. E. 1998. A new model for Proterozoic ocean chemistry. Nature 396:450- 453.

Cavalier-Smith, T. 1981. Eukaryote kingdoms - seven or nine. Biosystems 14:461-481.

Cavalier-Smith, T. 1982. The origins of plastids. Biol J Linn Soc 17:289-306.

Cavalier-Smith, T. 1983. A 6-kingdom classification and a unified phylogeny. Pages 1027-1034 in Endocytobiology II: Intracellular space as oligogenetic ecosystem (H. E. A. Schenk, and W. Schwemmler, eds.). Walter de Gruyter, Berlin.

Cavalier-Smith, T. 1996/97. Amoeboflagellates and mitochondrial cristae in eukaryote evolution: Megasystematics of the new protozoan subkingdoms Eozoa and Neozoa. Arch Protistenkd 147:237-258.

Cavalier-Smith, T. 1998. A revised six-kingdom system of life. Biol Rev Cambridge Philosophic Soc 73:203-266.

Cavalier-Smith, T. 1999. Principles of protein and lipid targeting in secondary : Euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J Euk Microbiol 46:347-366.

Cavalier-Smith, T. 2002. The phagotrophic origin of eukaryotes and phylogenetic classification of protozoa. Int J Syst Evol Microbiol 52:297-354.

Cavalier-Smith, T. 2003. The excavate protozoan phyla Metamonada Grasse emend. (, Parabasalia, Carpediemonas, Eopharyngia) and Loukozoa emend. (Jakobea, Malawimonas): their evolutionary affinities and new higher taxa. Int J Syst Evol Microbiol 53:1741-1758.

138

Cavalier-Smith, T. 2004. Only six kingdoms of life. Proc Roy Soc B 271:1251-1262.

Cavalier Smith, T. 2010. Deep phylogeny, ancestral groups and the four ages of life. Philos Trans R Soc B 365:111-132.

Cavalier-Smith, T. 2010. Kingdoms Protozoa and Chromista and the eozoan root of the eukaryotic tree. Biology Letters 6:342-345.

Cavalier-Smith, T., and E. E. Chao. 1995. The opalozoan is related to the common ancestor of animals, fungi, and choanoflagellates. Proc Roy Soc B 261:1-6.

Cavalier-Smith, T., and E. E. Chao. 1997. Sarcomonad ribosomal RNA sequences, rhizopod phylogeny, and the origin of euglyphid amoebae. Arch Protistenkd 147:227-236.

Cavalier-Smith, T., and E. E. Y. Chao. 2003a. Phylogeny and classification of Cercozoa (Protozoa). Protist 154:341-358.

Cavalier-Smith, T., and E. E. Y. Chao. 2003b. Phylogeny of , , and other protozoa and early eukaryote megaevolution. J Mol Evol 56:540-563.

Cavalier-Smith, T., E. E. Y. Chao, and B. Oates. 2004. Molecular phylogeny of Amoebozoa and the evolutionary significance of the unikont . Eur J Protistol 40:21-48.

Cavalier-Smith, T., and S. Nikolaev. 2008. The zooflagellates Stephanopogon and are a clade (Class Percolatea: Phylum ). J Euk Microbiol 55:501-509.

Cerutti, H., and J. A. Casas-Mollano. 2006. On the origin and functions of RNA- mediated silencing: from protists to man. Curr Genet 50:81-99.

Cheeseman, I. H., N. Gomez-Escobar, C. K. Carret, A. Ivens, L. B. Stewart, K. K. A. Tetteh, and D. J. Conway. 2009. Gene copy number variation throughout the Plasmodium falciparum genome. BMC Genomics 10:353.

Cohen, S., and D. Segal. 2009. Extrachromosomal circular DNA in eukaryotes: Possible involvement in the plasticity of tandem repeats. Cytogenet Genome Res 124:327- 338.

Cohen, P. A., A. H. Knoll, and R. B. Kodner. 2009. Large spinose microfossils in rocks as resting stages of early animals. Proc Natl Acad Sci, USA 106:6519-6524.

139

Cook, J. R. 1981. Variation of DNA levels in Euglena related to pH of culture-medium. J Protozool 28:148-150.

Copeland, H. F. 1956. Classification of Lower Organisms. Pacific Books, Palo Alto, CA, USA

Crane, P. R., E. M. Friis, and K. R. Pedersen. 1995. The origin and early diversification of angiosperms Nature 374:27-33.

Cullis, C. A. 1973a. DNA amounts in nuclei of Paramecium bursaria Chromosoma 40:127-133.

Cullis, C. A. 1973b. DNA differences between flax genotypes. Nature 243:515-516.

Cullis, C. A. 1980. DNA sequence organization in the flax genome. Biochemica et Biophysica Acta 652:1-15.

Cullis, C. A. 2005. Mechanisms and control of rapid genomic changes in flax. Ann Bot 95:201-206.

Cummings, M. P., and A. Meyer. 2005. Magic bullets and golden rules: Data sampling in molecular phylogenetics. Zoology 108:329-336.

Cummings, M. P., S. P. Otto, and J. Wakeley. 1995. Sampling properties of DNA- sequence data in phylogenetic analysis. Mol Biol Evol 12:814-822.

DeBry, R. W. 2005. The systematic component of phylogenetic error as a function of taxonomic sampling under parsimony. Syst Biol 54:432-440.

Delwiche, C. F. 1999. Tracing the tread of plastid diversity through the tapestry of life. Am Nat 154:S164-S177.

Delwiche, C. F., and J. D. Palmer. 1996. Rampant horizontal transfer and duplication of rubisco genes in eubacteria and plastids. Mol Biol Evol 13:873-82.

Dobell, C. C. 1911. The principles of protistology. Arch Protist 23:269-310.

Donoghue, P. C. J., and M. J. Benton. 2007. Rocks and clocks: calibrating the Tree of Life using fossils and molecules. Trends Ecol Evol 22:424-431.

Doolittle, W. F. 1998. You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet 14:307-311.

Dostál, O., and J. Prokop. 2009. New fossil (Diaphanopterodea: Martynoviidae) from the Lower Permian of the Boskovice Basin, southern Moravia. GeoBios 42:495-502.

140

Douzery, E. J. P., E. A. Snell, E. Bapteste, F. Delsuc, and H. Philippe. 2004. The timing of eukaryotic evolution: Does a relaxed molecular clock reconcile proteins and fossils? Proc Natl Acad Sci, USA101:15386-15391.

Doyle, J. J. 1992. Gene trees and species trees: molecular systematics as one-character taxonomy. Syst Bot 17:144-163.

Drouin, G. 2006. Chromatin diminution in the copepod Mesocyclops edax: diminution of tandemly repeated DNA families from somatic cells. Genome 49:657-665.

Drummond, A., and A. Rambaut. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214.

Drummond, A. J., S. Y. W. Ho, M. J. Phillips, and A. Rambaut. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biology 4:699-710.

Eddy, S. R. 2001. HMMER: Profile hidden markov models for biological sequence analysis. Available from: http://hmmer.janelia.org/.

Efron, B., E. Halloran, and S. Holmes. 1996. Bootstrap confidence levels for phylogenetic trees Proc Natl Acad Sci, USA93:13429-13434.

Eisen, J. A., R. S. Coyne, M. Wu, D. Y. Wu, M. Thiagarajan, J. R. Wortman, J. H. Badger, Q. H. Ren, P. Amedeo, K. M. Jones, L. J. Tallon, A. L. Delcher, S. L. Salzberg, J. C. Silva, B. J. Haas, W. H. Majoros, M. Farzad, J. M. Carlton, R. K. Smith, J. Garg, R. E. Pearlman, K. M. Karrer, L. Sun, G. Manning, N. C. Elde, A. P. Turkewitz, D. J. Asai, D. E. Wilkes, Y. F. Wang, H. Cai, K. Collins, A. Stewart, S. R. Lee, K. Wilamowska, Z. Weinberg, W. L. Ruzzo, D. Wloga, J. Gaertig, J. Frankel, C. C. Tsao, M. A. Gorovsky, P. J. Keeling, R. F. Waller, N. J. Patron, J. M. Cherry, N. A. Stover, C. J. Krieger, C. del Toro, H. F. Ryder, S. C. Williamson, R. A. Barbeau, E. P. Hamilton, and E. Orias. 2006. Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. Plos Biology 4:1620-1642.

Eldridge, J. L., and G. L. Krapu. 1988. The influence of diet quality on clutch size and laying pattern in Mallards. Auk 105:102-110.

Embley, T. M., M. van der Giezen, D. S. Horner, P. L. Dyal, and P. Foster. 2003. Mitochondria and hydrogenosomes are two forms of the same fundamental . Philos Trans Roy Soc Lond B-Biol Sci 358:191-202.

Embley, T. M., and W. Martin. 2006. Eukaryotic evolution, changes and challenges. Nature 440:623-630.

Evans, G. M., A. Durrant, and H. Rees. 1966. Associated nuclear changes in the induction of flax genotrophs. Nature 212:697-699.

141

Fahrni, J. F., I. Bolivar, U. Berney, E. Nassonova, A. Smirnov, and J. Pawlowski. 2003. Phylogeny of lobose amoebae based on actin and small-subunit ribosomal RNA genes. Mol Biol Evol 20:1881-1886.

Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 27:401-410.

Felsenstein, J. 1988. Phylogenies from molecular sequences: inference and reliability. Annu Rev Genet 22:521-565.

Felsenstein, J. 2004. Inferring Phylogenies. Sinauer Associates, Sunderland, MA.

Fensome, R. A., J. F. Saldarriaga, and F. Taylor. 1999. Dinoflagellate phylogeny revisited: reconciling morphological and molecular based phylogenies. Grana 38:66-80.

Forest, F. 2009. Calibrating the Tree of Life: fossils, molecules and evolutionary timescales. Ann Bot 104:789-794.

Føyn, B. 1936. Über die Kernverhaltnisse der Foraminifere Myxotheca arenilega Schaudinn. Arch Protistenkd 87:272-295.

Freeman, J. L., G. H. Perry, L. Feuk, R. Redon, S. A. McCarroll, D. M. Altshuler, H. Aburatani, K. W. Jones, C. Tyler-Smith, M. E. Hurles, N. P. Carter, S. W. Scherer, and C. Lee. 2006. Copy number variation: New insights in genome diversity. Genome Res 16:949-961.

Friz, C. T. 1968. The biochemical composition of the free-living Amoebae Chaos chaos, Amoeba dubia and Amoeba proteus. Comp Biochem Physiol 26:81-90.

Galtier, N., M. Gouy, and C. Gautier. 1996. SeaView and Phylo_win, two graphic tools for sequence alignment and molecular phylogeny. Comput Appl Biosci 12:543- 548.

Ganem, N. J., Z. Storchova, and D. Pellman. 2007. Tetraploidy, aneuploidy and cancer. Curr Opin Genet Dev 17:157-162.

Goldberg, A. D., C. D. Allis, and E. Bernstein. 2007. Epigenetics: A landscape takes shape. Cell 128:635-638.

Goldstein, S. T. 1997. Gametogenesis and the antiquity of reproductive pattern in the Foraminiferida. J Foramin Res 27:319-328.

Goldstein, S. T. 1999. Foraminifera: A biological overview. Pages 37-56 in Modern Foraminifera (B. K. Sen Gupta, ed.) Kluwer, Dordrecht.

142

Gould, S. B., R. R. Waller, and G. I. McFadden. 2008. Plastid evolution. Ann Rev Plant Biol 59:491-517.

Graur, D., and W. Martin. 2004. Reading the entrails of chickens: molecular timescales of evolution and the illusion of precision. Trends Genet 20:80-86.

Gray, J., and A. J. Boucot. 1989. Is Moyeria a euglenoid? Lethaia 22:447-456.

Grell, K. G. 1953. Die Chromosomen von Aulacantha scolymantha Haeckel. . Arch Protistenkd 99:1-54.

Grell, K. G., and A. Ruthmann. 1964. Über die Karyologie des Radiolars Aulacantha scolymantha und die Feinstruktur seiner Chromosomen. Chromosoma 15:158- 211.

Gribaldo, S., and H. Philippe. 2002. Ancient phylogenetic relationships. Theoretical Population Biology 61:391-408.

Group, A. P. 1998. An ordinal classification for the families of flowering plants. Ann Missouri Bot Gard:531-553.

GroupII, A. P. 2003. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants. Botanical Journal of the Linnean Society:399-436.

Grzebyk, D., O. Schofield, C. Vetriani, and P. G. Falkowski. 2003. The mesozoic radiation of eukaryotic algae: The portable plastid hypothesis. J Phycol 39:259- 267.

Guillou, L., M.-J. Chretinnot-Dinet, S. Boulben, S. Y. Moon-Van der Staay, and D. Vaulot. 1999. Symbiomonas scintillans gen. et sp. nov. Picophagus flagellatus gen. et sp. nov. (Heterokonta): two new heterotrophic flagellates of picoplanctonic size. . Protist 150:383-398.

Hackett, J. D., H. S. Yoon, S. Li, A. Reyes-Prieto, S. E. Rummele, and D. Bhattacharya. 2007. Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of 'Rhizaria' with chromalveolates. Mol Biol Evol 8:1702-13.

Haeckel, E. H. P. A. 1866. Generelle morphologie der organismen. Allgemeine grundzüge der organischen formen-wissenschaft, mechanisch begründet durch die von Charles Darwin reformirte descendenztheorie. G. Reimer, Berlin.

143

Hampl, V., D. S. Horner, P. Dyal, J. Kulda, J. Flegr, P. G. Foster, and T. M. Embley. 2005. Inference of the phylogenetic position of based on nine genes: Support for Metamonada and Excavata. Mol Biol Evol 22:2508-2518.

Hampl, V., L. Hug, J. W. Leigh, J. B. Dacks, B. F. Lang, A. G. B. Simpson, and A. J. Roger. 2009. Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic "supergroups". Proc Natl Acad Sci, USA106:3859-3864.

Harper, J. T., E. Waanders, and P. J. Keeling. 2005. On the monophyly of chromalveolates using a six-protein phylogeny of eukaryotes. Int J Syst Evol Microbiol 55:487-496.

Hartmann, S., and T. J. Vision. 2008. Using ESTs for phylogenomics: Can one accurately infer a from a gappy alignment? BMC Evol Biol 8:95.

Hartwell, L. H., L. Hood, M. L. Goldberg, A. E. Reynolds, L. M. Silver, and R. C. Veres. 2004. Genetics: The Study of Biological Information. Pages 1-12 in Genetics: From Genes to Genomes. McGraw Hill, New York.

Harwood, D. M., V. A. Nikolaev, and D. M. Winter. 2007. Cretaceous records of diatom evolution, radiation, and expansion. Pages 33-59 in Pond Scum to Carbon Sink: Geological and Environmental Applications of the Diatoms, Paleontological Society Short Course, October 27, 2007 (S. Starratt, ed.) Paleontological Society Papers.

Heath, T. A., S. M. Hedtke, and D. M. Hillis. 2008. Taxon sampling and the accuracy of phylogenetic analyses. J Syst Evol 46:239-257.

Hedges, S. B., J. E. Blair, M. L. Venturi, and J. L. Shoe. 2004. A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol Biol 4:2.

Hedtke, S. M., T. M. Townsend, and D. M. Hillis. 2006. Resolution of phylogenetic conflict in large data sets by increased taxon sampling. Syst Biol 55:522-529.

Hendy, M. D., and D. Penny. 1989. A framework for the quantitative study of evolutionary trees. Syst Zool 38:297-309.

Hennig, W. 1966. Phylogenetic Systematics. University of Illinois Press, Urbana

Hermann, T. N. 1990. Organic world one billion years ago. Nauka, Leningrad

Hijri, M., and I. R. Sanders. 2005. Low gene copy number shows that arbuscular mycorrhizal fungi inherit genetically different nuclei. Nature 433:160-163.

144

Hillis, D. M. 1998. Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst Biol 47:3-8.

Hillis, D. M., D. D. Pollock, J. A. McGuire, and D. J. Zwickl. 2003. Is sparse taxon sampling a problem for phylogenetic inference? Syst Biol 52:124-126.

Hirt, R. P., J. M. Logsdon, B. Healy, M. W. Dorey, W. F. Doolittle, and T. M. Embley. 1999. Microsporidia are related to Fungi: Evidence from the largest subunit of RNA polymerase II and other proteins. Proc Natl Acad Sci, USA96:580-585.

Ho, S. Y. W., and M. J. Phillips. 2009. Accounting for calibration uncertainty in phylogenetic estimation of evolutionary divergence times. Syst Biol 58:367-380.

Hou, G. Y., S. M. Leblancq, E. Yaping, H. X. Zhu, and M. G. S. Lee. 1995. Structure of a frequently rearranged ribosomal RNA encoding chromosome in Giardia lamblia. Nucleic Acids Res 23:3310-3317.

Howe, C. J., A. C. Barbrook, R. E. R. Nisbet, P. J. Lockhart, and A. W. D. Larkum. 2008. The origin of plastids. Philos Trans R Soc B 363:2675-2685.

Huang, J. L., N. Mullapudi, T. Sicheritz-Ponten, and J. C. Kissinger. 2004. A first glimpse into the pattern and scale of gene transfer in the Apicomplexa. Int J Parasitol 34:265-274.

Huelsenbeck, J. P., and F. Ronquist. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754-755.

Hug, L. A., and A. J. Roger. 2007. The impact of fossils and taxon sampling on ancient molecular dating analyses. Mol Biol Evol 24:1889-1897.

Inagaki, Y., E. Susko, N. M. Fast, and A. J. Roger. 2004. Covarion shifts cause a long- branch attraction artifact that unites Microsporidia and Archaebacteria in EF-1 alpha phylogenies. Mol Biol Evol 21:1340-1349.

Javaux, E. J. 2007. The early eukaryotic fossil record. Pages 1-19 in Eukaryotic Membranes and Cytoskeleton: Origins and Evolution. (ed. Jekely, G). Landes Biosciences and Springer, Netherlands.

Javaux, E. J., A. H. Knoll, and M. Walter. 2003. Recognizing and interpreting the fossils of early eukaryotes. Origins of Life and Evolution of the Biosphere 33:75-94.

Javaux, E. J., A. H. Knoll, and M. R. Walter. 2001. Morphological and ecological complexity in early eukaryotic ecosystems. Nature 412:66-69.

Javaux, E. J., C. P. Marshall, and A. Bekker. 2010. Organic-walled microfossils in 3.2- billion-year-old shallow-marine siliciclastic deposits. Nature 463:934-939.

145

Jeffroy, O., H. Brinkmann, F. Delsuc, and H. Philippe. 2006. Phylogenomics: the beginning of incongruence? Trends Genet 22:225-231.

Jorgensen, H. B., K. Hedlund, and J. A. Axelsen. 2008. Life-history traits of soil collembolans in relation to food quality. Applied Soil Ecology 38:146-151.

Jorgensen, P., N. P. Edgington, B. L. Schneider, I. Rupes, M. Tyers, and B. Futcher. 2007. The size of the nucleus increases as yeast cells grow. Mol Biol Cell 18:3523-3532.

Judson, O. P., and B. B. Normark. 1996. Ancient asexual scandals. Trends Ecol Evol 11:A41-A46.

Kapuscinski, J. 1995. DAPI: A DNA-Specific Fluorescent Probe. Biotech Histochem 70:220-233.

Karpov, S., M. L. Sogin, and J. D. Silberman. 2001. Rootlet , taxonomy, and phylogeny of bicosoecids based on 18S rRNA gene sequences. Protistology 2:34- 47.

Katoh, K., and H. Toh. 2008. Recent developments in the MAFFT multiple sequence alignment program. Briefings in Bioinformatics 9:286-298.

Katz, L. A. 1999. The tangled web: gene genealogies and the origin of eukaryotes. Am Nat 154:S137-S145.

Katz, L. A. 2001. Evolution of nuclear dualism in ciliates: a reanalysis in light of recent molecular data. Int J Syst Evol Microbiol 51:1587-1592.

Katz, L. A. 2002. Lateral gene transfers and the evolution of eukaryotes: theories and data. Int J Syst Evol Microbiol 52:1893-1900.

Katz, L. A. 2006. Genomes: Epigenomics and the future of genome sciences. Curr Biol 16:R996-R997.

Katz, L. A., J. G. Bornstein, E. Lasek-Nesselquist, and S. V. Muse. 2004. Dramatic diversity of ciliate histone H4 genes revealed by comparisons of patterns of substitutions and paralog divergences among eukaryotes. Mol Biol Evol 21:555- 562.

Keeling, P. J. 2004. Diversity and evolutionary history of plastids and their hosts. Am J Bot 91:1481-1493.

Keeling, P. J. 2009. Chromalveolates and the Evolution of Plastids by Secondary Endosymbiosis. J Euk Microbiol 56:1-8.

146

Keeling, P. J., G. Burger, D. G. Durnford, B. F. Lang, R. W. Lee, R. E. Pearlman, A. J. Roger, and M. W. Gray. 2005. The tree of eukaryotes. Trends Ecol Evol 20:670- 676.

Keeling, P. J., J. A. Deane, and G. I. McFadden. 1998. The phylogenetic position of alpha- and beta-tubulins from the Chlorarachnion host and Cercomonas (Cercozoa). J Euk Microbiol 45:561-570.

Keeling, P. J., M. A. Luker, and J. D. Palmer. 2000. Evidence from Beta-tubulin phylogeny that Microsporidia evolved within the fungi. Mol Biol Evol 17:23-31.

Keeling, P. J., and C. H. Slamovits. 2004. Simplicity and complexity of microsporidian genomes. Eukaryot Cell 3:1363-1369.

Kehrer-Sawatzki, H. 2007. What a difference copy number variation makes. Bioessays 29:311-313.

Kenrick, P., and P. R. Crane. 1997. The origin and early evolution of plants on land. Nature 389:33-39.

Kim, E., and L. E. Graham. 2008. EEF2 analysis challenges the monophyly of Archaeplastida and Chromalveolata. PLoS ONE 3:e2621.

King, N. 2004. The unicellular ancestry of animal development. Developmental Cell 7:313-325.

Kleemann, G., K. Poralla, G. Englert, H. Kjosen, N. Liaaen-jensen, S. Neunlist, and M. Rohmer. 1990. Tetrahymanol from the phototrophic bacterium Rhodopseudomonas palustris: first report of a gammacerane triterpene from a prokaryote. J General Microbiol 136:2551-2553.

Kloc, M., and B. Zagrodzinska. 2001. Chromatin elimination - an oddity or a common mechanism in differentiation and development? Differentiation 68:84-91.

Kneip, C., C. Voss, P. J. Lockhart, and U. G. Maier. 2008. The cyanobacterial endosymbiont of the unicellular algae Rhopalodia gibba shows reductive genome evolution. BMC Evol Biol 8:30.

Knoll, A. H. In review. The multiple origins of complex multicellularity. Annual Review.

Knoll, A. H. 1994. Proterozoic and early cambrian protists: evidence for accelerating evolutionary tempo. Proc Natl Acad Sci, USA 91:6743-6750.

Knoll, A. H., E. J. Javaux, D. Hewitt, and P. Cohen. 2006. Eukaryotic organisms in Proterozoic oceans. Philos Trans R Soc B 361:1023-1038.

147

Koonin, E. V. 2010. The origin and early evolution of eukaryotes in the light of phylogenomics. Genome Biology 11.

Kondorosi, E., F. Roudier, and E. Gendreau. 2000. Plant cell-size control: growing by ploidy? Curr Opin Plant Biol 3:488-492.

Kondrashov, A. S. 1994. The asexual ploidy cycle and the origin of sex. Nature 370:213- 216.

Kondrashov, A. S. 1997. Evolutionary genetics of life cycles. Annu Rev Ecol Syst 28:391-435.

Koonin, E. V. 2010. The origin and early evolution of eukaryotes in the light of phylogenomics. Genome Biol 11:209.

Kosakovsky Pond, S. L., S. D. Frost, and S. V. Muse. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21:676-679.

Kostka, M., V. Hampl, I. Cepicka, and J. Flegr. 2004. Phylogenetic position of Protoopalina intestinalis based on SSU rRNA gene sequence. Mol Phylogenet Evol 33:220-224.

Kreler, J. 1991-1995. Parasitic Protozoa, Second Edition edition. Academic Press.

Krings, M., N. Dotzler, J. Galtier, and T. N. Taylor. 2010. Oldest fossil basidiomycete clamp connections. Mycoscience.

Kudryavtsev, A., D. Bernhard, M. Schlegel, E. E. Y. Chao, and T. Cavalier-Smith. 2005. 18S ribosomal RNA gene sequences of Cochliopodium () and the phylogeny of Amoebozoa. Protist 156:215-224.

Kuhn, G., M. Hijri, and I. R. Sanders. 2001. Evidence for the evolution of multiple genomes in arbuscular mycorrhizal fungi. Nature 414:745-748.

Lane, C. E., and J. M. Archibald. 2008. The eukaryotic tree of life: endosymbiosis takes its TOL. Trends Ecol Evol 23:268-275.

Larkum, A. W. D., P. J. Lockhart, and C. J. Howe. 2007. Shopping for plastids. Trends Plant Sci 12:189-195.

Lasek-Nesselquist, E., D. M. Welch, R. C. A. Thompson, R. F. Steuart, and M. L. Sogin. 2009. Genetic exchange within and between assemblages of Giardia duodenalis. J Euk Microbiol 56:504-518.

148

Leander, B. S., R. P. Witek, and M. A. Farmer. 2001. Trends in the evolution of the euglenid pellicle. Evoution 55:2215-2235.

Le Blancq, S. M., and R. D. Adam. 1998. Structural basis of karyotype heterogeneity in Giardia lamblia. Mol Biochem Parasitol 97:199-208.

Lee, J. J., and M. E. McEnery. 1970. Autogamy in Allogromia laticollaris (Foraminifera). J Protozool 17:184-195.

Lee, J. J., M. E. McEnery, and H. Rubin. 1969. Quantitative Studies on Growth of Allogromia Laticollaris (Foraminifera). J Protozool 16:377-395.

Leigh, J. W. 2008. Congruence in Phylogenomic Data: Exploring Artifacts in Deep Eukaryotic Phylogeny in Biochemistry Dalhousie, Halifax, NS.

Leigh, J. W., E. Susko, M. Baumgartner, and A. J. Roger. 2008. Testing congruence in phylogenomic analysis. Syst Biol 57:104-115.

Leiming, Y., and Y. Xunlai. 2007. Radiation of Meso-Neoproterozoic and early Cambrian protists inferred from the microfossil record of China. Palaeogeography, Palaeoclimatology, Palaeoecology 254:350-361.

Leliaert, F., O. De Clerck, H. Verbruggen, C. Boedeker, and E. Coppejans. 2007. Molecular phylogeny of the Siphonocladales (Chlorophyta : Cladophorophyceae). Mol Phylogenet Evol 44:1237-1256.

Lewin, B. 2000. Genes VII. Oxford University Press, Oxford.

Lewis, P. O., M. T. Holder, and K. E. Holsinger. 2005. Polytomies and Bayesian phylogenetic inference. Syst Biol 54:241-253.

Lipps, H. J. 1993. Fossil Prokaryotes and Protists. Blackwell Scientific Publications., Boston.

Lipscomb, D. L. 1984. Methods of systematic analysis: the relative superiority of phylogenetic systematics. Origins of Life 13:235-.

Liu, H., S. Aris-Brosou, I. Probert, and C. de Vargas. 2010. A Time line of the environmental genetics of the haptophytes. Mol Biol Evol 27:161-176.

Loeblich, A. R., Jr., and H. Tappan. 1987. Foraminiferal Genera and their Classification. Van Nostrand Reinhold Co., New York.

Lohia, A. 2003. The cell cycle of Entamoeba histolytica. Mol Cell Biochem 253:217- 222.

149

Longet, D., F. Burki, J. Flakowski, C. Berney, S. Polet, J. Fahrni, and J. Pawlowski. 2004. Multigene evidence for close evolutionary relations between Gromia and Foraminifera. Acta Protozool 43:303-311.

Love, G. D., E. Grosjean, C. Stalvies, D. A. Fike, J. P. Grotzinger, A. S. Bradley, A. E. Kelly, M. Bhatia, W. Meredith, C. E. Snape, S. A. Bowring, D. J. Condon, and R. E. Summons. 2009. Fossil steroids record the appearance of Demospongiae during the Cryogenian period. Nature 457:718-U5.

Lucking, R., S. Huhndorf, D. H. Pfister, E. R. Plata, and H. T. Lumbsch. 2009. Fungi evolved right on track. Mycologia 101:810-822.

Lühe, M. 1913. Erstes Urreich der Tierein Handbuch der Morphologie der Wirbellosen Tiere (A. Lang, ed.) G. Fischer, Jena.

Lynch, M. 2007. Origins of Genome Architecture Sinauer Associates Sunderland, MA.

Mable, B. K., and S. P. Otto. 1998. The evolution of life cycles with haploid and diploid phases. Bioessays 20:453-462.

Maddison, D. R., and W. P. Maddison. 2005. MacClade version 4.08: an analysis of phylogeny and character evolution. Sinauer Associates, Sunderland, MA.

Maddison, W. P. 1997. Gene trees in species trees. Syst Biol 46:523-536.

Makhlin, E., M. Kudryavtseva, and B. Kudryavtsev. 1979. Peculiarities of changes in DNA content of Amoeba proteus nuclei during interphase - cytofluorometric study. Expt Cell Res 118:143-150.

Margulis, L., and K. Schwartz. 1988. Five Kingdoms: An Illustrated Guide to the Phyla of Life on Earth (2nd edition). W. H. Freeman and Company, New York.

Marin, B., E. C. M. Nowack, and M. Melkonian. 2005. A plastid in the making: Primary endosymbiosis. Protist 156:425-432.

Martin, M. W., D. V. Grazhdankin, S. A. Bowring, D. A. D. Evans, M. A. Fedonkin, and J. L. Kirschvink. 2000. Age of Neoproterozoic bilatarian body and trace fossils, White Sea, Russia: Implications for metazoan evolution. Science 288:841-845.

Martin, W., T. Rujan, E. Richly, A. Hansen, S. Cornelsen, T. Lins, D. Leister, B. Stoebe, M. Hasegawa, and D. Penny. 2002. Evolutionary analysis of Arabidopsis, cyanobacterial, and genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc Natl Acad Sci, USA 99:12246- 12251.

150

Martin, W., and C. Schnarrenberger. 1997. The evolution of the Calvin cycle from prokaryotic to eukaryotic chromosomes: a case study of functional redundancy in ancient pathways through endosymbiosis. Curr Genet 32:1-18.

May, K. J., and J. B. Ristaino. 2004. Identity of the mtDNA haplotype(s) of Phytophthora infestans in historical specimens from the Irish Potato Famine. Mycol Res 108:471-479.

McEnery, M. E., and J. J. Lee. 1976. Allogromia laticollaris: A Foraminiferan with an unusual apogamic metagenic life cycle. J Protozool 23:94-108.

McFadden, G. I. 2001. Primary and secondary endosymbiosis and the origin of plastids. J Phycol 37:951-959.

McFadden, G. I., and G. G. van Dooren. 2004. Evolution: Red algal genome affirms a common origin of all plastids. Curr Biol 14:R514-R516.

McGrath, C. L., and L. A. Katz. 2004. Genome diversity in microbial eukaryotes. Trends Ecol Evol 19:32-38.

Medlin, L., H. J. Elwood, S. Stickel, and M. L. Sogin. 1988. The characterization of enzymatically amplified eukaryotes 16S-like ribosomal RNA coding regions. Gene 71:491-500.

Michod, R. E., Y. Viossat, C. A. Solari, M. Hurand, and A. M. Nedelcu. 2006. Life- history evolution and the origin of multicellularity. J Theor Biol 239:257-272.

Mikkelsen, T. S., L. W. Hillier, E. E. Eichler, M. C. Zody, D. B. Jaffe, S. P. Yang, W. Enard, I. Hellmann, K. Lindblad-Toh, T. K. Altheide, N. Archidiacono, P. Bork, J. Butler, J. L. Chang, Z. Cheng, A. T. Chinwalla, P. deJong, K. D. Delehaunty, C. C. Fronick, L. L. Fulton, Y. Gilad, G. Glusman, S. Gnerre, T. A. Graves, T. Hayakawa, K. E. Hayden, X. Q. Huang, H. K. Ji, W. J. Kent, M. C. King, E. J. Kulbokas, M. K. Lee, G. Liu, C. Lopez-Otin, K. D. Makova, O. Man, E. R. Mardis, E. Mauceli, T. L. Miner, W. E. Nash, J. O. Nelson, S. Paabo, N. J. Patterson, C. S. Pohl, K. S. Pollard, K. Prufer, X. S. Puente, D. Reich, M. Rocchi, K. Rosenbloom, M. Ruvolo, D. J. Richter, S. F. Schaffner, A. F. A. Smit, S. M. Smith, M. Suyama, J. Taylor, D. Torrents, E. Tuzun, A. Varki, G. Velasco, M. Ventura, J. W. Wallis, M. C. Wendl, R. K. Wilson, E. S. Lander, and R. H. Waterston. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69-87.

Miller, M. A., M. T. Holder, R. Vos, P. E. Midford, T. Liebowitz, L. Chan, P. Hoover, and T. Warnow. 2009. The CIPRES Portals. www.phylo.org/sub_sections/portal/

151

Mochizuki, K., and M. A. Gorovsky. 2005. A Dicer-like protein in Tetrahymena has distinct functions in genome rearrangement, chromosome segregation, and meiotic prophase. Genes Dev 19:77-89.

Molinier, J., G. Ries, C. Zipfel, and B. Hohn. 2006. Transgeneration memory of stress in plants. Nature 442:1046-1049.

Moreira, D., H. Le Guyader, and H. Philippe. 2000. The origin of red algae and the evolution of . Nature 405:69-72.

Morrison, H. G., A. G. McArthur, F. D. Gillin, S. B. Aley, R. D. Adam, G. J. Olsen, A. A. Best, W. Z. Cande, F. Chen, M. J. Cipriano, B. J. Davids, S. C. Dawson, H. G. Elmendorf, A. B. Hehl, M. E. Holder, S. M. Huse, U. U. Kim, E. Lasek- Nesselquist, G. Manning, A. Nigam, J. E. J. Nixon, D. Palm, N. E. Passamaneck, A. Prabhu, C. I. Reich, D. S. Reiner, J. Samuelson, S. G. Svard, and M. L. Sogin. 2007. Genomic minimalism in the early diverging intestinal parasite Giardia lamblia. Science 317:1921-1926.

Mortimer, R. K. 1958. Radiobiological and genetic studies on a polyploid series (haploid to hexaploid) of Saccharomyces cerevisiae. Radiation Res 9:312-326.

Moustafa, A., B. Beszteri, U. G. Maier, C. Bowler, K. Valentin, and D. Bhattacharya. 2009. Genomic Footprints of a Cryptic Plastid Endosymbiosis in Diatoms. Science 324:1724-1726.

Nakayama, T., and K. Ishida. 2009. Another acquisition of a primary photosynthetic organelle is underway in Paulinella chromatophora. Curr Biol 19:R284-R285.

Neumann, F. R., and P. Nurse. 2007. Nuclear size control in fission yeast. J Cell Biol 179:593-600.

Nikolaev, S. I., C. Berney, J. F. Fahrni, I. Bolivar, S. Polet, A. P. Mylnikov, V. V. Aleshin, N. B. Petrov, and J. Pawlowski. 2004. The twilight of and rise of Rhizaria, an emerging supergroup of amoeboid eukaryotes. Proc Natl Acad Sci, USA101:8066-8071.

Nishi, A., K. Ishida, and H. Endoh. 2005. Reevaluation of the evolutionary position of opalinids based on 18S rDNA,and alpha- and beta-tubulin gene phylogenies. J Mol Evol 60:695-705.

Nosenko, T., K. L. Lidie, F. M. Van Dolah, E. Lindquist, F. Cheng, and D. Bhattacharya. 2006. Chimeric plastid proteome in Florida "red tide" dinoflagellate Karenia brevis Mol Biol Evol 23:2026-2038.

Nozaki, H. 2005. A new scenario of plastid evolution: plastid primary endosymbiosis before the divergence of the "Plantae," emended. J Plant Res 118:247-255.

152

Nuismer, S. L., and S. P. Otto. 2004. Host-parasite interactions and the evolution of ploidy. Proc Natl Acad Sci, USA101:11036-11039.

Nurse, P. 1990. Universal control mechanism regulating onset of M-phase. Nature 344:503-508.

Nylander, J. A. 2004. MrModelTest. Distributed by the author. Evolutionary Biology Centre, Uppsala University.

O'Kelly, C. J., and T. A. Nerad. 1999. Malawimonas jakobiformis n. gen., n. sp ( n. fam.): A Jakoba-like heterotrophic nanoflagellate with discoidal mitochondrial cristae. J Euk Microbiol 46:522-531.

Okamoto, N., and I. Inouye. 2005. The are a distant sister group of the Cryptophyta: A proposal for Katablepharidophyta divisio nova/Kathablepharida phylum novum based on SSU rDNA and beta-tubulin phylogeny. Protist 156:163- 179.

Ord, M. J. 1968. Synthesis of DNA through cell cycle of Amoeba proteus. J Cell Sci 3:483-491.

Otto, S. P., and J. Whitton. 2000. Polyploid incidence and evolution. Annu Rev Genet 34:401-437.

Palmer, J. D. 2003. The symbiotic birth and spread of plastids: how many times and whodunit? J Phycol 39:4-11.

Parfrey, L. W., E. Barbero, E. Lasser, M. Dunthorn, D. Bhattacharya, D. J. Patterson, and L. A. Katz. 2006. Evaluating support for the current classification of eukaryotic diversity. Plos Genetics 2:2062-2073.

Parfrey, L. W., J. Grant, Y. I. Tekle, E. Lasek-Nesselquist, H. G. Morrison, M. L. Sogin, D. J. Patterson, and L. A. Katz. 2010. Broadly sampled multigene analyses yield a well-resolved eukaryotic tree of life. Syst Biol.

Parfrey, L. W., and L. A. Katz. 2010. Dynamic genomes of eukaryotes and the maintenance of genomic integrity. Microbe Magazine 5:156-163.

Parfrey, L. W., D. J. G. Lahr, and L. A. Katz. 2008. The dynamic nature of eukaryotic genomes. Mol Biol Evol 25:787-794.

Patterson, D. J. 1985. The fine structure of Opalina ranarum (family ): opalinid phylogeny and classification. Protistologica 4:413-428.

Patterson, D. J. 1999. The diversity of eukaryotes. Am Nat 154:S96-S124.

153

Pawlowska, T. E. 2005. Genetic processes in arbuscular mycorrhizal fungi. FEMS Microbiology Letters 251:185-192.

Pawlowska, T. E., and J. W. Taylor. 2004. Organization of genetic variation in individuals of arbuscular mycorrhizal fungi. Nature 427:733-737.

Pawlowski, J., and F. Burki. 2009. Untangling the Phylogeny of Amoeboid Protists. J Euk Microbiol 56:16-25.

Penny, D., B. J. McComish, M. A. Charleston, and M. D. Hendy. 2001. Mathematical elegance with biochemical realism: The covarion model of molecular evolution. J Mol Evol 53:711-723.

Pertea, G., X. Q. Huang, F. Liang, V. Antonescu, R. Sultana, S. Karamycheva, Y. Lee, J. White, F. Cheung, B. Parvizi, J. Tsai, and J. Quackenbush. 2003. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 19:651-652.

Peterson, K. J., J. A. Cotton, J. G. Gehling, and D. Pisani. 2008. The Ediacaran emergence of bilaterians: congruence between the genetic and the geological fossil records. Philos Trans R Soc B 363:1435-1443.

Philippe, H., and A. Adoutte. 1998. The molecular phylogeny of Eukaryota: solid facts and uncertainties. Pages 25-56 in Evolutionary Relationships Among Protozoa (G. H. Coombs, K. Vickerman, M. A. Sleigh, and A. Warren, eds.). Kluwer Academic Publishers, Dodrecht.

Philippe, H., and P. Forterre. 1999. The rooting of the universal tree of life is not reliable. J Mol Evol 49:509-23.

Philippe, H., and A. Germot. 2000. Phylogeny of eukaryotes based on ribosomal RNA: long-branch attraction and models of sequence evolution. Mol Biol Evol 17:830- 834.

Philippe, H., P. Lopez, H. Brinkmann, K. Budin, A. Germot, J. Laurent, D. Moreira, M. Muller, and H. Le Guyader. 2000. Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Proc Roy Soc B 267:1213-1221.

Philippe, H., E. A. Snell, E. Bapteste, P. Lopez, P. W. H. Holland, and D. Casane. 2004. Phylogenomics of eukaryotes: Impact of missing data on large alignments. Mol Biol Evol 21:1740-1752.

Polet, S., C. Berney, J. Fahrni, and J. Pawlowski. 2004. Small-subunit ribosomal RNA gene sequences of Phaeodarea challenge the monophyly of Haeckel's Radiolaria. Protist 155:53-63.

154

Porter, S. M. 2004. The fossil record of early eukaryotic diversification. Paleontological Society Papers 10:35-50.

Porter, S. M. 2006. Heterotrophic Eukaryotes. Pages 1-21 in Neoproterozoic Geobiology and Paleobiology (S. Xiao, and A. J. Kaufman, eds.). Springer, Netherlands.

Porter, S. M., R. Meisterfeld, and A. H. Knoll. 2003. Vase-shaped microfossils from the Neoproterozoic Chuar Group, Grand Canyon: A classification guided by modern testate amoebae. J Paleontol 77:409-429.

Prescott, D. M. 1994. The DNA of ciliated protozoa. Microbiol Rev 58:233-267.

Raikov, I. B. 1982. The Protozoan Nucleus: Morphology and Evolution. Springer, Netherlands.-Verlag, Wien Ramesh, M. A., S.-B. Malik, and J. M. Longsdon. 2005. A phylogenomic inventory of meiotic genes: evidence for sex in Giardia and an early eukaryotic origin of meiosis. Curr Biol 15:185-191.

Rannala, B., and Z. H. Yang. 2008. Phylogenetic inference using whole genomes. Annu Rev Genomics Hum Genet 9:217-231.

Redi, C. A., S. Garagna, H. Zacharias, M. Zuccotti, and E. Capanna. 2001. The other chromatin. Chromosoma 110:136-147.

Redon, R., S. Ishikawa, K. R. Fitch, L. Feuk, G. H. Perry, T. D. Andrews, H. Fiegler, M. H. Shapero, A. R. Carson, W. W. Chen, E. K. Cho, S. Dallaire, J. L. Freeman, J. R. Gonzalez, M. Gratacos, J. Huang, D. Kalaitzopoulos, D. Komura, J. R. MacDonald, C. R. Marshall, R. Mei, L. Montgomery, K. Nishimura, K. Okamura, F. Shen, M. J. Somerville, J. Tchinda, A. Valsesia, C. Woodwark, F. T. Yang, J. J. Zhang, T. Zerjal, J. Zhang, L. Armengol, D. F. Conrad, X. Estivill, C. Tyler- Smith, N. P. Carter, H. Aburatani, C. Lee, K. W. Jones, S. W. Scherer, and M. E. Hurles. 2006. Global variation in copy number in the human genome. Nature 444:444-454.

Reece, K. S., M. E. Siddall, N. A. Stokes, and E. M. Burreson. 2004. Molecular phylogeny of the Haplosporidia based on two independent gene sequences. J Parasitol 90:1111-1122.

Renner, S. S., G. W. Grimm, G. M. Schneeweiss, T. F. Stuessy, and R. E. Ricklefs. 2008. Rooting and Dating Maples (Acer) with an Uncorrelated-Rates Molecular Clock: Implications for North American/Asian Disjunctions. Syst Biol 57:795-808.

Reyes-Prieto, A., A. Moustafa, and D. Bhattacharya. 2008. Multiple genes of apparent algal origin suggest ciliates may once have been photosynthetic. Curr Biol 18:956-962.

155

Richards, E. J. 2006. Inherited epigenetic variation - revisiting soft inheritance. Nat Rev Genet 7:395-401.

Robinson, T., and L. A. Katz. 2007. Non-mendelian inheritance of paralogs of two cytoskeletal genes in the ciliate Chilodonella uncinata. Mol Biol Evol 24:2495- 2503.

Rodríguez-Ezpeleta, N., H. Brinkmann, S. C. Burey, B. Roure, G. Burger, W. Loffelhardt, H. J. Bohnert, H. Philippe, and B. F. Lang. 2005. Monophyly of primary photosynthetic eukaryotes: Green plants, red algae, and glaucophytes. Curr Biol 15:1325-1330.

Rodríguez-Ezpeleta, N., H. Brinkmann, G. Burger, A. J. Roger, M. W. Gray, H. Philippe, and B. F. Lang. 2007a. Toward resolving the eukaryotic tree: The phylogenetic positions of jakobids and cercozoans. Curr Biol 17:1420-1425.

Rodríguez-Ezpeleta, N., H. Brinkmann, B. Roure, N. Lartillot, B. F. Lang, and H. Philippe. 2007b. Detecting and overcoming systematic errors in genome-scale phylogenies. Syst Biol 56:389-399.

Roger, A. J. 1999. Reconstructing early events in eukaryotic evolution. Am Nat 154:S146-S163.

Roger, A. J., and L. A. Hug. 2006. The origin and diversification of eukaryotes: problems with molecular phylogenetics and molecular clock estimation. Philos Trans Roy Soc B 361:1039-1054.

Roger, A. J., and A. G. B. Simpson. 2009. Evolution: Revisiting the Root of the Eukaryote Tree. Curr Biol 19:R165-R167.

Rogozin, I. B., M. K. Basu, M. Csuros, and E. V. Koonin. 2009. Analysis of rare genomic changes does not support the unikont-bikont phylogeny and suggests cyanobacterial as the point of primary radiation of eukaryotes. Genome Biol Evol 2009:99-113.

Rokas, A., and S. B. Carroll. 2005. More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy. Mol Biol Evol 22:1337-1344.

Rokas, A., and S. Chatzimanolis. 2008. From gene-scale to genome-scale phylogenetics: the data flood in, but the challenges remain. Pages 1-12 in Methods in Molecular Biology: Phylogenomics (W. J. Murphy, ed.) Humana Press Inc., Totowa, NJ.

Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572-1574.

156

Rothpletz, A. 1896. Über die Flysch-Fucoiden und einige andere fossile Algen, sowie über liasische diatomeen führende Hornschwämme. Zeitschrift der Deutschen Geologishen Gesellschaft 52:154-160.

Rubinstein, C. V., P. Gerrienne, G. S. de la Puente, R. A. Astini, and P. Steemans. 2010. Early Middle Ordovician evidence for land plants in Argentina (eastern Gondwana). New Phytologist 188:365-369.

Rutschmann, F. 2006. Molecular dating of phylogenetic trees: A brief review of current methods that estimate divergence times. Divers Distrib 12:35-48.

Rutschmann, F., T. Eriksson, K. Abu Salim, and E. Conti. 2007. Assessing calibration uncertainty in molecular dating: The assignment of fossils to alternative calibration points. Syst Biol 56:591-608.

Sanchez-Puerta, M. V., T. R. Bachvaroff, and C. F. Delwiche. 2005. The complete plastid genome sequence of the Emiliania huxleyi : a comparison to other plastid genomes. DNA Res 12:151-156.

Sanderson, M. J. 2003. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19:301-302.

Sanderson, M. J., and J. A. Doyle. 2001. Sources of error and confidence intervals in estimating the age of angiosperms from rbcL and 18S rDNA data. Am J Bot 88:1499-1516.

Scott, D. B., F. Medioli, and R. Braund. 2003. Foraminifera from the Cambrian of Nova Scotia: The oldest multichambered foraminifera. Micropaleontology 49:109-126.

Sen Gupta, B. K. 1999. Introduction to Modern Foraminifera. Pages 1-6 in Modern Foraminfera (B. K. Sen Gupta, ed.) Kluwer, Dordrecht.

Shalchian-Tabrizi, K., W. Eikrem, D. Klaveness, D. Vaulot, M. A. Minge, F. Le Gall, K. Romari, J. Throndsen, A. Botnen, R. Massana, H. A. Thomsen, and K. S. Jakobsen. 2006. Telonemia, a new protist phylum with affinity to chromist lineages. Proc Roy Soc B 273:1833-1842.

Sherman, I. W. (ed) 1998. Malaria. ASM Press, Washington DC.

Shimodaira, H., and M. Hasegawa. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246-1247.

Silberfeld, T., J. W. Leigh, H. Verbruggen, C. Cruaud, B. de Reviers, and F. Rousseau. 2010. A multi-locus time-calibrated phylogeny of the (Heterokonta, , Phaeophyceae): Investigating the evolutionary nature of the "brown algal crown radiation". Mol Phylogenet Evol 56:659-674.

157

Simpson, A. G. B. 1997. The identity and composition of the Euglenozoa. Arch Protistenkd 148:318-328.

Simpson, A. G. B. 2003. Cytoskeletal organization, phylogenetic affinities and systematics in the contentious taxon Excavata (Eukaryota). Int J Syst Evol Microbiol 53:1759-1777.

Simpson, A. G. B., Y. Inagaki, and A. J. Roger. 2006. Comprehensive multigene phylogenies of excavate protists reveal the evolutionary positions of "primitive" eukaryotes. Mol Biol Evol 23:615-625.

Simpson, A. G. B., and D. J. Patterson. 1999. The ultrastructure of Carpediemonas membranifera (Eukaryota) with reference to the "Excavate hypothesis". Eur J Protistol 35:353-370.

Simpson, A. G. B., and A. J. Roger. 2004. The real 'kingdoms' of eukaryotes. Curr Biol 14:R693-R696.

Slamovits, C. H., and P. J. Keeling. 2006. A high density of ancient spliceosomal introns in excavates. BMC Evol Biol 6:8.

Slamovits, C. H., and P. J. Keeling. 2008. Plastid-derived genes in the nonphotosynthetic Oxyrrhis marina. Mol Biol Evol 25:1297-1306.

Smirnov, A., E. Nassonova, C. Berney, J. Fahrni, I. Bolivar, and J. Pawlowski. 2005. Molecular phylogeny and classification of the lobose amoebae. Protist 156:129- 142.

Smith, P. 1996. Molecular Biology of Parasitic Protozoa. Oxford University Press, Oxford.

Smith, S. E., and D. J. Read. 1997. Mycorrhizal Symbiosis. Academic, San Diego.

Smithson, T. R., and W. D. I. Rolfe. 1990. Westlothiana Gen. nov. - naming the earliest known Scottish Journal of Geology 26:137-138.

Snoeyenbos-West, O. L. O., T. Salcedo, G. B. McManus, and L. A. Katz. 2002. Insights into the diversity of and ciliates (Class : Spirotrichea) based on genealogical analyses of multiple loci. Int J Syst Evol Microbiol 52:1901- 1913.

Sogin, M. L. 1991. Early evolution and the origin of eukaryotes. Curr Opin Genet Dev 1:457-463.

158

Stamatakis, A., P. Hoover, and J. Rougemont. 2008. A Rapid Bootstrap Algorithm for the RAxML Web-Servers. Syst Biol 57:758-771.

Stamatakis, A., T. Ludwig, and H. Meier. 2005. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21:456-463.

Stauffer, W., and J. I. Ravdin. 2003. Entamoeba histolytica: an update. Curr Opin Infect Dis 16:479-485.

Stechmann, A., and T. Cavalier-Smith. 2002. Rooting the eukaryote tree by using a derived gene fusion. Science 297:89-91.

Stechmann, A., and T. Cavalier-Smith. 2003. The root of the eukaryote tree pinpointed. Curr Biol 13:R665-R666.

Steenkamp, E. T., J. Wright, and S. L. Baldauf. 2006. The protistan origins of animals and fungi. Mol Biol Evol 23:93-106.

Stiller, J. W. 2003. Weighing the evidence for a single origin of plastids. J Phycol 39:1283-1285.

Stutzman, P. 1995. Food quality of gelatinous colonial chlorophytes to the fresh-water zooplankters Daphina pulicaria and Diatomus oregonensis. Freshwater Biol 34:149-153.

Suchard, M. A., R. E. Weiss, and J. S. Sinsheimer. 2001. Bayesian selection of continuous-time Markov chain evolutionary models. Mol Biol Evol 18:1001- 1013.

Summons, R. E., and M. R. Walter. 1990. Molecular fossils and microfossils of prokaryotes and protists from Proterozoic sediments. Am J Sci 290-A:212-244.

Susko, E., Y. Inagaki, C. Field, M. E. Holder, and A. J. Roger. 2002. Testing for differences in rates-across-sites distributions in phylogenetic subtrees. Mol Biol Evol 19:1514-1523.

Swofford, D. 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). version 4.0b8. Sinauer Assoc., Sunderland MA.

Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pages 407-514 in Molecular Systematics (D. M. Hillis, C. Moritz, and B. K. Mable, eds.). Sinauer Assoc., Sunderland MA.

159

Talavera, G., and J. Castresana. 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56:564-577.

Taylor, F. J. R. 1999. Ultrastructure as a control for protistan molecular phylogeny. Am Nat 154:S125-S136.

Taylor, T. N., T. Hass, and H. Kerp. 1999. The oldest fossil ascomycetes. Nature 399:648-648.

Tekle, Y. I., J. Grant, O. R. Anderson, T. A. Nerad, J. C. Cole, D. J. Patterson, and L. A. Katz. 2008. Phylogenetic placement of diverse amoebae inferred from multigene analyses and assessment of clade stability within 'Amoebozoa' upon removal of varying rate classes of SSU-rDNA. Mol Phylogenet Evol 47:339-352.

Tekle, Y. I., L. W. Parfrey, and L. A. Katz. 2009. Molecular data are transforming hypotheses on the origin and diversification of eukaryotes. Bioscience 59:471- 481.

Teodorovic, S., J. M. Bravermanj, and H. G. Elmendorf. 2007. Unusually low levels of genetic variation among Giardia lamblia isolates. Eukaryot Cell 6:1421-1430.

Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673-4680.

Thorne, J. L., and H. Kishino. 2002. Divergence time estimation and rate evolution with multilocus data sets. Syst Biol 51:689-702.

Travis, J. L., and R. D. Allen. 1981. Studies on the motility of the foraminifera .1. Ultrastructure of the reticulopodial network of Allogromia laticollaris (Arnold). J Cell Biol 90:211-221.

Van de Peer, Y., A. Ben Ali, and A. Meyer. 2000. Microsporidia: accumulating molecular evidence that a group of amitochondriate and suspectedly primitive eukaryotes are just curious fungi. Gene 246:1-8. van Huis, A., G. Woldewahid, K. Toleubayev, and W. van der Werf. 2008. Relationships between food quality and fitness in the desert locust, Schistocerca gregaria, and its distribution over habitats on the Red Sea coastal plain of Sudan. Entomologia Experimentalis Et Applicata 127:144-156.

Vanni, M. J., and W. Lampert. 1992. Food quality effects on life-history traits and fitness in the generalist herbivore Daphnia. Oecologia 92:48-57.

160

Wainright, P. O., G. Hinkle, M. L. Sogin, and S. K. Stickel. 1993. Monophyletic origins of the metazoa: an evolutionary link with fungi. Science 260:340-342.

Walker, G., J. B. Dacks, and T. M. Embley. 2006. Ultrastructural description of Breviata anathema, n. Gen., n. Sp., the organism previously studied as "Mastigamoeba invertens". J Euk Microbiol 53:65-78.

Weber, A., M. Linka, and D. Bhattacharya. 2006. A hypothesis to explain the establishment of the plastid in algae and plants. Eukaryot Cell 5:609-612.

Wehe, A., M. S. Bansal, J. G. Burleigh, and O. Eulenstein. 2008. DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics 24:1540-1541.

Welch, J. J., and L. Bromham. 2005. Molecular dating when rates vary. Trends Ecol Evol 20:320-327.

Whittaker, R. H. 1969. New concepts of kingdoms of organisms. Science 163:150-160.

Wiens, J. J. 2005. Can incomplete taxa rescue phylogenetic analyses from long-branch attraction? Syst Biol 54:731-742.

Wiens, J. J., and D. S. Moen. 2008. Missing data and the accuracy of Bayesian phylogenetics. J Syst Evol 46:307-314.

Williams, J. G., A. A. Noegel, and L. Eichinger. 2005. Manifestations of multicellularity: Dictyostelium reports in. Trends Genet 21:392-398.

Wuyts, J., Y. Van de Peer, T. Winklemans, and R. De Wachter. 2002. The European database on small subunit ribosomal RNA. Nucleic Acids Res 30.

Wyngaard, G. A., R. Domangue, S. Gerken, and E. M. Rasch. 2001. A mechanistic role for chromatin diminution in the evolution of genome size and life history characteristics in . Am Zool 41:1632-1632.

Xiao, S., and L. Dong. 2006. On the morphological and ecological history of Proterozoic macroalgae. Pages 57-90 in Neoproterozoic Geobiology and Paleobiology (S. Xiao, and A. J. Kaufman, eds.). Springer, Netherlands.

Xiao, S. H., A. H. Knoll, X. L. Yuan, and C. M. Pueschel. 2004. Phosphatized multicellular algae in the Neoproterozoic Doushantuo Formation, China, and the early evolution of florideophyte red algae. Am J Bot 91:214-227.

Yang, Z. H. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13:555-556.

161

Yang, Z. H., and B. Rannala. 2006. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol Biol Evol 23:212-226.

Yoon, H. S., J. Grant, Y. I. Tekle, M. Wu, B. C. Chaon, J. C. Cole, J. M. Logsdon, D. J. Patterson, D. Bhattacharya, and L. A. Katz. 2008. Broadly sampled multigene trees of eukaryotes. BMC Evol Biol 8:14.

Yoon, H. S., J. D. Hackett, C. Ciniglia, G. Pinto, and D. Bhattacharya. 2004. A molecular timeline for the origin of photosynthetic eukaryotes. Mol Biol Evol 21:809-818.

Yoon, H. S., J. D. Hackett, G. Pinto, and D. Bhattacharya. 2002. The single, ancient origin of chromist plastids. Proc Natl Acad Sci, USA99:15507-15512.

Yoon, H. S., J. D. Hackett, F. M. Van Dolah, T. Nosenko, L. Lidie, and D. Bhattacharya. 2005. Tertiary endosymbiosis driven genome evolution in dinoflagellate algae. Mol Biol Evol 22:1299-1308.

Yoon, H. S., A. Reyes-Prieto, M. Melkonian, and D. Bhattacharya. 2006. Minimal plastid genome evolution in the Paulinella endosymbiont. Curr Biol 16:R670-R672.

Yubuki, N., and B. S. Leander. 2008. Ultrastructure and molecular phylogeny of Stephanopogon minuta: An enigmatic microeukaryote from marine interstitial environments. Eur J Protistol 44:241-253.

Zuckerkandl, E., and L. Pauling. 1962. Molecular diease, evolution, and genetic heterogeneity. Pages 189-225 in Horizons in Biochemistry (M. Kasha, and B. Pullman, eds.). Academic Press, New York.

Zufall, R. A., and L. A. Katz. 2007. Micronuclear and macronuclear forms of beta- tubulin genes in the ciliate Chilodonella uncinata reveal insights into genome processing and protein evolution. J Euk Microbiol 54:275-282.

Zufall, R. A., C. L. McGrath, S. V. Muse, and L. A. Katz. 2006. Genome architecture drives protein evolution in ciliates. Mol Biol Evol 23:1681-1687.

Zufall, R. A., T. Robinson, and L. A. Katz. 2005. Evolution of developmentally regulated genome rearrangements in eukaryotes. J Exp Zool Part B 304B:448-455.

Zwickl, D. J., and D. M. Hillis. 2002. Increased taxon sampling greatly reduces phylogenetic error. Syst Biol 51:588-598.

162