
The Origins of Multicellularity: Correlation between Morphological and Genomic Complexity in

by

Ben Qin, BS

A Dissertation

In

Biology

Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

Approved

Sean Rice Chair of Committee

Michael San-Francisco

Zhixin Xie

John Zak

Kai Zhang

Mark Sheridan Dean of the Graduate School

May 2017

Copyright 2017, Ben Qin

Texas Tech University, Ben Qin, May 2017

Acknowledgements

This work was completed with immense help from my advisor, Dr. Sean Rice, who guided me in laying down the framework of this research, walked me through the evolutionary theory behind the numbers I collected, and encouraged me to overcome obstacles in the research process. I also want to thank the other members of my committee, Dr. Michael San-Francisco, Dr. Zhixin Xie, Dr. John Zak, and Dr. Kai Zhang, for their insightful guidance from a professional perspective.

My friends in our lab and department enlightened me with their novel ideas. In particular, Ryan Vazquez showed me how to apply random sampling to the question of the origins of multicellularity. This statistical problem had puzzled me for years, and his suggestion gave a beautiful answer.


Table of Contents

Acknowledgements

Abstract

List of Tables

List of Figures

List of Abbreviations

1. Introduction

1.1 The origin of the question

1.2 PIC does not support Ne as an explanation

1.3 Could morphological complexity play a role?

1.4 Differences between prokaryotes and eukaryotes

1.5 How did multicellularity evolve in eukaryotes?

2. Prokaryotes

2.1 Cell type justification criteria

2.2 Data mining

2.3 Phylogenies

2.4 Raw data comparison

2.5 Diagnostic test

2.6 Correlation

2.7 Regressions of cell types on genomic traits

2.8 Discussion

3. Eukaryotes

3.1 Cell type justification criteria

3.2 Data mining

3.3 Phylogenies

3.4 Raw data comparison

3.5 Diagnostic test

3.6 Correlation

3.7 Regressions of cell types on genomic traits

3.8 Discussion

4. Eukaryotic Multicellularity

4.1 Two different paths to multicellularity

4.2 Independent origins of divisional multicellularity

4.2.1 Fungi

4.2.2 &

4.2.3 Stramenopiles

4.2.4 Green and

4.2.5

4.3 Independent origins of aggregative multicellularity

4.3.1 Capsaspora

4.3.2

4.3.3

4.3.4 Copromyxa

4.3.5 Acrasis

4.3.6 Sorogena

4.3.7 Guttulinopsis

4.3.8 Sorodiplophrys

4.4 Phylogenies

4.5 Data comparison

4.6 Statistical analysis

4.7 Discussions

4.7.1 Divisional multicellularity arises from the simplest unicellular ancestors.

4.7.2 Aggregation and division based multicellularity show different patterns.

5. Conclusion

5.1 What is new in this research

5.2 Prokaryotes and eukaryotes

5.3 multicellularity

5.4 Future study

5.4.1 Incorporating more data

5.4.2 Prokaryotic multicellularity

5.4.3 of cooperation

5.5 Concluding remarks

Bibliography

Appendices

A. Cell Type Number of Prokaryotes

B. Genomic Data of Prokaryotes

C. Cell Type Number of Eukaryotes

D. Genomic Data of Eukaryotes

E. Branch Values for Resampling Test

F. References for Cell Types


Abstract

The 's remarkable is a testament to the evolution of organismal complexity. The fact that some kinds of complexity, including multicellularity, have arisen many times suggests that there are recurring selective pressures favoring greater complexity, but our knowledge of the mechanisms that allow complexity to increase is still far from complete. Most work to date has focused on a few groups, primarily multicellular animals and plants.

In this work, I study the evolution of phenotypic complexity across the entire tree of life. I use cell type number as a measure of phenotypic complexity. Unlike previous researchers, I focus mostly on unicellular taxa, both because they can exhibit great complexity on their own and because they set the stage for the evolution of multicellularity. This means that cell type number is counted over all life cycle stages of a species. As a result, a single-celled species can be "polycellular", in the sense of expressing multiple cell types throughout its life cycle.

I collected cell type data for 83 prokaryotes and 45 eukaryotes that met two criteria. First, genome-scale data must be available for genomic traits that might correlate with phenotypic complexity, such as coding gene number and number of transcription factors. Second, because ignoring phylogenetic relationships can easily lead to incorrect conclusions, I used only species for which good phylogenies exist.

A natural expectation is that organismal complexity will correlate with genomic complexity, and some researchers have indeed reported a positive correlation between these two kinds of complexity. However, most of the work done so far did not take phylogeny into account, so any correlations observed may be artifacts.

Using phylogenetic independent contrasts, I found that among prokaryotes (67 bacteria and 16 archaea), cell type number is indeed positively correlated with genome size, protein number, coding gene number and, especially, transcription factor number. The correlation between cell type number and transcription factors was much stronger than for any of the other genomic variables. Thus, prokaryotes seem to build their complex structures in the intuitively expected manner, with the help of more genes and proteins and, especially, an increased capacity for complex gene regulation.

Surprisingly, the eukaryotes (45 species spread across the eukaryote tree) showed a completely different pattern. None of the genomic characters showed a statistically significant correlation with phenotypic complexity. Furthermore, the correlations between cell type number and both gene and protein number, though not significant, were actually negative. Eukaryotes therefore seem to achieve their organismal complexity not by simply accumulating genes or proteins, but in a way not seen in prokaryotes.

Intrigued by this phenomenon, I went on to test the role of these genomic traits in the evolution of eukaryotic multicellularity. First, I distinguished two types of multicellularity, divisional and aggregative. Mapping these onto the eukaryotic tree of life, I identified 19 independent origins of multicellularity, 11 divisional and 8 aggregative. I then used the data for unicellular eukaryotes to estimate the character states (cell type number, gene number, etc.) for all internal (unicellular) branches of the eukaryote tree.

I found that division-based multicellular groups consistently arose from simple ancestors, that is, from branches with fewer than average cell types, proteins, and genes. Aggregation-based multicellularity did not show this tendency.

In summary, this study contributes to our understanding of organismal complexity and the origins of multicellularity in at least three ways. First, it generated an annotated dataset of microbial cell type number spanning the tree of life. Second, it revealed that prokaryotes and eukaryotes achieved their phenotypic complexity in fundamentally different ways: the former did so by recruiting more genes and proteins, especially those involved in gene regulation, while the latter did not follow this path.


Third, this study showed that the most complex modern multicellular organisms evolved from the simplest unicellular ancestors, and that this pattern holds for both morphological and genomic complexity.


List of Tables

2.1 Branch length diagnostic tests for two trees

2.2 Regression of prokaryote cell types on genomic traits in two trees

3.1 Branch length diagnostic tests for two eukaryote trees

3.2 Regression of eukaryote cell types on genomic traits in two trees

4.1 Sampling simulation test on the origins of divisional and aggregative multicellularity in two trees

A.1 Cell type number of 83 prokaryotes

B.1 Genomic information of 83 prokaryotes

C.1 Cell type number of 45 eukaryotes

D.1 Genomic information of 45 eukaryotes

E.1 Branch values for resampling test


List of Figures

1.1 C-values of some plants and animals

1.2 The proposed relationship between genome size and πs

1.3 Reevaluation of the relationship between genome size and Ne with PIC

1.4 True relationship revealed by PIC is opposite to the raw data

1.5 Life cycle of Caulobacter

1.6 Regression of non-coding DNA on coding DNA for both prokaryotes and eukaryotes

1.7 Multicellularity can be achieved through division or aggregation

2.1 Summary of cell type number in prokaryotes

2.2 Tree of life from protein domain organization

2.3 Single most-parsimonious tree of life

2.4 Mirror trees comparing the evolution of prokaryote cell type number and genome size

2.5 Mirror trees comparing the evolution of prokaryote cell type number and coding genes

2.6 Mirror trees comparing the evolution of prokaryote cell type number and total gene number

2.7 Mirror trees comparing the evolution of prokaryote cell type number and coding percent

2.8 Mirror trees comparing the evolution of prokaryote cell type number and regulatory genes

2.9 Mirror trees comparing the evolution of prokaryote cell type number and transcription factors

2.10 Regression of prokaryote cell type on genome size

2.11 Regression of prokaryote cell type on coding genes

2.12 Regression of prokaryote cell type on total genes

2.13 Regression of prokaryote cell type on coding percent

2.14 Regression of prokaryote cell type on regulatory genes

2.15 Regression of prokaryote cell type on transcription factors

3.1 Summary of cell type number in unicellular eukaryotes

3.2 Most likely eukaryotic tree of life reconstructed from 16 genes

3.3 Most likely eukaryotic tree of life reconstructed from 150 most even genes

3.4 Mirror trees comparing the evolution of eukaryote cell type number and genome size

3.5 Mirror trees comparing the evolution of eukaryote cell type number and protein number

3.6 Mirror trees comparing the evolution of eukaryote cell type number and gene number

3.7 Mirror trees comparing the evolution of eukaryote cell type number and GC content

3.8 Mirror trees comparing the evolution of eukaryote cell type number and transcription factors

3.9 Regression of eukaryote cell types on genome size

3.10 Regression of eukaryote cell types on protein number

3.11 Regression of eukaryote cell types on gene number

3.12 Regression of eukaryote cell types on GC content

3.13 Regression of eukaryote cell types on transcription factor number

3.14 Regression of coding gene number on genome size

4.1 Fungal phylogeny showing the appearances of multicellular lineages

4.2 Maximum likelihood phylogeny of animals and choanoflagellates

4.3 Phylogeny of stramenopiles showing the incidences of multicellularity

4.4 Phylogeny of Chlorophytes showing the UTC group as a multicellular lineage

4.5 Putative phylogeny of streptophytes showing the appearance of multicellularity

4.6 Modified phylogenies of red algae

4.7 Life cycle of Capsaspora

4.8 Multicellular structure built by aggregation of free-living Fonticula cells

4.9 Life cycle of

4.10 Life cycle of Copromyxa protea

4.11 Life cycle of

4.12 Life cycle of Sorogena

4.13 Life stages of Guttulinopsis vulgaris

4.14 Life stages of Sorodiplophrys stercorea

4.15 Eukaryotic tree of life showing cell type evolution along the branches

4.16 Comparison of cell type number

4.17 Comparison of genome size

4.18 Comparison of protein number

4.19 Comparison of gene number

4.20 Comparison of GC percentage

4.21 Comparison of transcription factor number

E.1 Node position in the resampling test


List of Abbreviations

A: aggregation

BS: bootstrap

D: division

df: degrees of freedom

ECM: extracellular matrix

Ka: number of non-synonymous substitutions per non-synonymous site

Kb: kilobase pairs

Ks: number of synonymous substitutions per synonymous site

L-form: cell wall deficient form

LSR: least squares regression

Mb: megabase pairs

ML: maximum likelihood

MP: maximum parsimony

NC: protein-coding DNA

NCBI: National Center for Biotechnology Information

ncDNA: non-coding DNA

Ne: effective population size

NNC: non-coding DNA

Pfam: database of protein families

PIC: phylogenetic independent contrast

SAR: the monophyletic group of stramenopiles, alveolates, and Rhizaria

TF: transcription factor

UTC: the monophyletic group of Ulvophyceae, Trebouxiophyceae, and Chlorophyceae

πs: silent-site diversity


Chapter 1

Introduction

1.1 The origin of the question

The great variation in genome complexity has long attracted attention, and a huge amount of effort has gone into explaining it. At first glance, smaller genomes were usually found in smaller, so-called “micro-”, organisms, which were regarded as small in size, simple in structure, and early to appear in evolutionary history. But as more species were studied, it became clear that organismal complexity does not always agree with genome size. A well-known example is that the onion’s genome is more than 10 times larger than ’s. This disconnect is called the C-value paradox [1] (Fig 1.1).

Naturally, multiple hypotheses were developed to address the C-value paradox. For example, avian genome size is the smallest and least variable among , presumably because genome size correlates with cell size, which in turn correlates with metabolic rate and flight ability. Recent research has supported this suggestion in passerine birds, and it appears that flightless birds contain more DNA per cell than those that fly. Some studies on bats also point in this direction [2].

Further investigation of the mitochondrial DNA of flightless birds found that selection was relaxed after the loss of flight [3]. This led us to speculate that excess DNA might accumulate when selection is weak. If true, species with longer branches in a phylogeny should have more compact genomes, since a longer branch implies more directional selection. I found that this pattern can be seen in the amphibian phylogeny and in some other animals such as tunicates, but it does not seem to be general.


Fig 1.1 C-values of some plants and animals. The number of species in each group is given in parentheses, and the vertical line indicates the average C-value [1].

1.2 PIC does not support Ne as an explanation

The research on selection in flightless birds described above used the Ka/Ks ratio as a measure of the strength of selection, which can also be reflected by the effective population size (Ne): selection usually dominates in populations with large Ne, preventing “junk DNA” from building up. This hypothesis came to the forefront when researchers’ focus moved from animals to all cellular life forms, especially microbes.

Single-celled organisms generally have both streamlined genomes and large population sizes. This led researchers to suspect that Ne might play a key role in shaping genome architecture [4]: selection in large populations would be effective at purging excess, nonfunctional DNA, while a small Ne would allow such DNA to accumulate through genetic drift. Incidentally, in population genetics, population size is believed to influence how often a species reproduces sexually: organisms with small populations, such as vertebrates, rely on sex to create variation, while in bacteria and protozoa, the large population size means that mutation alone is enough to generate the variation needed for selection. Could genome structure be determined, or at least affected, by a similar mechanism?

[Figure: log-log plot of genome size (Mb) against silent-site diversity (πs) for prokaryotes, unicellular/oligocellular eukaryotes, invertebrates, plants, and vertebrates]

Fig 1.2 The proposed relationship between genome size and πs [5].

When genome size is plotted against silent-site diversity, which is related to effective population size (πs = 4Neµ for diploids, or 2Neµ for haploids, where µ is the mutation rate per site per generation), there is indeed a tendency for more DNA to be associated with smaller Ne [5] (Fig 1.2). However, on closer inspection, this tendency seems to be an artifact of the separation of prokaryotes, uni-/oligocellular eukaryotes, and multicellular species into distinct clusters.
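Because πs is proportional to Ne, the relation can be inverted to estimate Ne once a per-site mutation rate is assumed. The sketch below assumes neutral equilibrium and a known µ; the function name `ne_from_silent_diversity` and the input values are hypothetical and serve only as illustration:

```python
def ne_from_silent_diversity(pi_s: float, mu: float, ploidy: int = 2) -> float:
    """Estimate effective population size from silent-site diversity.

    Assumes neutral equilibrium, where pi_s = 2 * ploidy * Ne * mu,
    i.e. 4*Ne*mu for diploids and 2*Ne*mu for haploids.
    """
    if ploidy not in (1, 2):
        raise ValueError("ploidy must be 1 (haploid) or 2 (diploid)")
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return pi_s / (2 * ploidy * mu)


# Illustrative values only, not data from this study.
print(ne_from_silent_diversity(0.01, 1e-8, ploidy=2))  # diploid
print(ne_from_silent_diversity(0.01, 1e-8, ploidy=1))  # haploid
```

For the same πs, a haploid population implies twice the Ne of a diploid one, because each haploid individual carries half as many gene copies.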

Within each of these groups, Ne does not appear to correlate with genome size. This could be a case in which phylogeny distorts the true relationship. A later analysis revisited this issue using phylogenetic independent contrasts (PIC) and did not detect any correlation between Ne and genome size [6] (Fig 1.3).
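How pooling well-separated groups can produce a trend that is absent within each group can be shown with a few made-up numbers. The two clusters below are hypothetical and chosen only for illustration; within each, x and y are negatively related, yet the pooled correlation is strongly positive because one cluster sits above the other on both axes:

```python
from math import sqrt


def pearson(xs, ys):
    """Ordinary Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


# Two hypothetical clusters (e.g. "small" and "large" organisms).
group_a = ([1, 2, 3], [3, 1, 2])
group_b = ([11, 12, 13], [13, 11, 12])

within_a = pearson(*group_a)
within_b = pearson(*group_b)
pooled = pearson(group_a[0] + group_b[0], group_a[1] + group_b[1])

print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, pooled: {pooled:.2f}")
# within A: -0.50, within B: -0.50, pooled: 0.96
```

A phylogeny imposes exactly this kind of grouping, which is why a method that accounts for shared ancestry is needed.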


Fig 1.3 Reevaluation of the relationship between genome size and Ne with PIC. A: ordinary least squares regression (OLS) of Lynch & Conery; r² = 0.64, P < 0.0001. B: PIC with all branch lengths set to 1.0; r² = 0.08, P = 0.128 [6].


PIC was developed by Felsenstein to measure the correlation between traits while correcting for phylogenetic relationships [7]. Consider two traits, X and Y [6] (Fig 1.4). They may appear positively correlated in the raw data, and given the sample size (number of data points), the result may be statistically significant. But this correlation is actually generated by the taxa’s shared evolutionary history: their common ancestor had large values of X and Y, and its descendants inherited this “correlation”, which at the outset was only a single large X paired with a single large Y. After that ancestor, changes in X and Y are negatively correlated all the way to the present, but because of the strong initial signal, the raw data still incorrectly point to a positive relationship.

Why does conventional statistics without PIC lead to an incorrect conclusion? Because a conventional correlation test assumes that all data points are independent of one another, but because of the topology of a phylogenetic tree, they are not (except in the case of a star phylogeny, which is rare and usually unrealistic). In a typical bifurcating tree, terminal taxa share many internal branches (shared evolutionary history) with each other, so the true number of independent values is smaller than the number of data points. The effective sample size, and with it the statistical power, is therefore lower than it appears, and a “significant” result may disappear or even reverse. In other words, the conventional correlation test is not applicable, since its assumption of independence is violated.
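The contrast calculation itself is short. The following is a minimal sketch of Felsenstein's independent contrasts under a Brownian-motion model for a fully bifurcating tree, followed by a correlation computed through the origin. The nested-tuple tree format, the function names, and the four-tip tree with its trait values are all hypothetical choices for illustration; this is not the code used in this study:

```python
from math import sqrt

# A node is (label, branch_length). For a tip, label is the species name;
# for an internal node, label is a list of its two children. branch_length
# is the length of the branch leading to the node (0.0 for the root).


def contrasts(node, x, y, out):
    """Felsenstein's independent contrasts for traits x and y (dicts by tip name).

    Appends standardized contrasts (cx, cy) to `out` and returns the node's
    trait estimates plus its (possibly lengthened) branch length.
    """
    label, length = node
    if isinstance(label, str):  # tip: observed values, original branch length
        return x[label], y[label], length
    left, right = label
    x1, y1, v1 = contrasts(left, x, y, out)
    x2, y2, v2 = contrasts(right, x, y, out)
    s = sqrt(v1 + v2)  # standard deviation of the difference under Brownian motion
    out.append(((x1 - x2) / s, (y1 - y2) / s))
    # Ancestral estimate: average of the children weighted by 1/branch length.
    w1, w2 = 1 / v1, 1 / v2
    xa = (w1 * x1 + w2 * x2) / (w1 + w2)
    ya = (w1 * y1 + w2 * y2) / (w1 + w2)
    # Lengthen the parent branch to reflect uncertainty in the estimate.
    return xa, ya, length + v1 * v2 / (v1 + v2)


def origin_correlation(pairs):
    """Correlation of contrasts computed through the origin."""
    sxy = sum(cx * cy for cx, cy in pairs)
    sxx = sum(cx * cx for cx, _ in pairs)
    syy = sum(cy * cy for _, cy in pairs)
    return sxy / sqrt(sxx * syy)


# Hypothetical four-tip tree ((A,B),(C,D)) with all branch lengths set to 1.
tree = ([([("A", 1.0), ("B", 1.0)], 1.0), ([("C", 1.0), ("D", 1.0)], 1.0)], 0.0)
x = {"A": 10, "B": 14, "C": 1, "D": 5}
y = {"A": 14, "B": 10, "C": 5, "D": 1}

pairs = []
contrasts(tree, x, y, pairs)
print(len(pairs), round(origin_correlation(pairs), 3))  # 3 0.256
```

With these hypothetical values, an ordinary Pearson correlation of the raw tip values gives r ≈ 0.67, because the two clades differ strongly in both traits. The contrasts count the split between (A,B) and (C,D) only once and expose the negative relationship within each clade, giving r ≈ 0.26 from n − 1 = 3 contrasts. The sketch assumes positive branch lengths; a zero-length branch would cause a division by zero.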


Fig 1.4 True relationship revealed by PIC is opposite to the raw data. A and B: From the trait values at the tips of the tree, ordinary least-squares linear regression gives a positive correlation (P=0.02). Note that the terminal taxa along show a negative pattern, and the values in parentheses are estimated by PIC. C: PIC returns a non-significant conclusion (P=0.82, all branch lengths set to 1, ancestral contrast in red)[6].


1.3 Could morphological complexity play a role?

If Ne cannot explain the variation in genome architecture, what can? Single-celled organisms often have relatively simple but variable morphology, so it is natural to look for a relationship between their physical structure and their genomic structure. To do this, I need a way to quantify organismal complexity that can be applied to unicellular organisms.

The number of cell types that an organism can produce is commonly used as a measure of organismal complexity in plants and animals. For example, previous research on metazoans indicated that larger cell type number correlates with higher metabolic rate, larger body size and later evolutionary appearance [8, 9]. However, these studies focused on the complexity of multicellular organisms, not single-celled microbes. Furthermore, most of these studies did not include genomic information, and none accounted for evolutionary history by using phylogenetic independent contrasts.

In addition to bringing in organismal complexity, I wanted to consider aspects of the genome beyond just genome size. Gene (and protein) number, the number of regulatory components such as transcription factors, the coding percent and GC content may also be associated with organismal complexity.

When looking for patterns between genomic and organismal complexity, combining multicellular and unicellular species in a single analysis might create an artifact. Some multicellular groups have more than 200 cell types, complicated structures and massive amounts of DNA in the form of transposable elements. However, these features were often acquired later in their evolutionary history, after the appearance of multicellularity. Therefore, in order to study the origin of morphological complexity, I first focus on uni-/oligocellular species, and later discuss how multicellularity evolved.

Relatively simple morphology, combined with abundant genomic data, makes unicellular microbes an ideal group to start with. I estimated organismal complexity by counting the number of cell types a species can have in its life cycle. There has been a concern that cell type number can only be applied to multicellular species [10], but many unicellular organisms actually have more than one life cycle stage [11] (Fig 1.5). The term “polycellular” is used here to describe unicellular species that exhibit multiple cell types at different life cycle stages or in different environments.

Fig 1.5 Life cycle of Caulobacter. The life cycle of this unicellular prokaryote contains a swarmer cell stage, a stalk stage, and a predivisional cell stage[11].

1.4 Differences between prokaryotes and eukaryotes

Prokaryotes and eukaryotes are fundamentally different in many respects, so they were addressed separately in this study. Previous theoretical research [12] (Fig 1.6) suggested that in prokaryotes, non-coding DNA increased linearly with protein-coding DNA until hitting a “complexity ceiling”, which restricted prokaryotes to a very simple multicellular stage. Eukaryotes, on the other hand, can make use of non-coding DNA (ncDNA) to achieve a higher level of complexity, and the theoretical model showed that ncDNA increased quadratically with coding DNA in eukaryotes.
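The difference between linear and quadratic scaling can be made concrete by fitting both forms to the same data and comparing residuals. This is a simplified sketch, not the model of [12]: both fits are forced through the origin, and the coding and non-coding values below are hypothetical numbers generated to grow quadratically:

```python
def fit_through_origin(xs, ys, power):
    """Least-squares fit of ys ≈ c * xs**power with no intercept; returns (c, RSS)."""
    fx = [x ** power for x in xs]
    c = sum(f * y for f, y in zip(fx, ys)) / sum(f * f for f in fx)
    rss = sum((y - c * f) ** 2 for f, y in zip(fx, ys))
    return c, rss


coding = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical coding DNA (Mb)
noncoding = [0.5 * x ** 2 for x in coding]  # hypothetical ncDNA growing quadratically

c_lin, rss_lin = fit_through_origin(coding, noncoding, 1)
c_quad, rss_quad = fit_through_origin(coding, noncoding, 2)
print(f"linear RSS = {rss_lin:.3f}, quadratic RSS = {rss_quad:.3f}")
```

Under a quadratic relationship, the linear fit leaves systematic residuals (underestimating at the extremes, overestimating in the middle), whereas the quadratic fit absorbs them; real genome data would of course also contain noise.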


Fig 1.6 Regression of non-coding DNA on coding DNA for both prokaryotes and eukaryotes. In prokaryotes, non-coding DNA (NNC) increases linearly with protein-coding DNA (NC), while in eukaryotes, ncDNA grows much faster. The solid line depicts the theoretical lower limit of ncDNA for eukaryotes [12].

However, this tentative explanation for the C-value paradox was purely mathematical: it did not incorporate direct measurements of complexity or account for phylogenetic relationships.

1.5 How did multicellularity evolve in eukaryotes?

It has been proposed that multicellularity evolved more than 25 times across the tree of life, and that at least 10 of these origins were “complex” in the sense of having “sustained cell-to-cell interconnection and communication” [13]. These events are mainly found in animals, plants, fungi, and green and . Interestingly, all these groups develop their complex multicellular structures by dividing from a single or .


Fig 1.7 Multicellularity can be achieved through division or aggregation [14]. An asterisk marks groups in which multicellularity (bold) arose more than once, and the pie chart shows the proportion of multicellular taxa (black) in that lineage (the number of divisional events in Fungi and brown algae is adjusted later in this research).


Another way to achieve multicellularity is through cell aggregation, as seen in organisms such as Dictyostelium. These multicellular organisms may not be as complex as those that develop through division, but they are just as common across the tree of life [14] (Fig 1.7).

In this study, I revisited the distinction between division and aggregation as routes to multicellularity, paying particular attention to the number of independent origins of each type. In addition, I incorporated data on cell type number and genomic features to investigate how multicellularity evolved in eukaryotes.

Since I had already collected data on cell type number and genomic information across the tree of life, I could insert the multicellular taxa into the phylogeny and visualize the sections and branches of the tree from which they originated. With this approach, I address questions such as:

Did the two types of multicellularity evolve through the same mechanism?

Did having more genes help multicellular species arise?

Did a larger genome lead to a more complex structure?

Is multicellularity correlated with polycellularity?

Answering these questions will contribute to our understanding of the evolution of complexity in general, and of multicellularity in particular.


Chapter 2

Prokaryotes

Prokaryotes, a large paraphyletic group, appeared before the morphologically more complex eukaryotes, and for this reason, this research begins with prokaryotes.

Prokaryotes include both bacteria, which colonize almost every corner of the biosphere, and archaea, which include many . They are named for their lack of a membrane-bound nucleus, but this structural simplicity does not mean that they cannot be complex. Quite the contrary, prokaryotes show vast diversity and complexity in morphology, and ecology, and in this study, their life cycle stages under various conditions were used as a measure of this complexity.

2.1 Cell type justification criteria

The criteria for defining cell types need to be both morphologically and evolutionarily meaningful. I therefore defined cell types in the context of an organism’s life cycle: each cell type represents a distinct morphology that the organism’s genome should be able to construct.

Specifically, I looked for definite life cycle stages in which the cell consistently exhibits a distinct shape. If the cell takes on a different morphology as the organism enters the next life cycle stage, I recognize it as having another cell type. For instance, the Pseudomonas putida cells are “long and cylindrical during exponential growth”, but once starved, the cells will predictably shrink to round or coccoid shapes, as small as one fifth of the original size[15]. In this case, Pseudomonas putida is counted as having two cell types.


The presence or absence of some external cellular structures, mainly flagella and pili, can also define a cell type. These structures are often used by parasitic prokaryotes to locate, attach to, and enter the host, so they are usually shed once invasion is complete. For example, Neisseria meningitidis is an intracellular parasite that has pili outside the host (human) cell, but the pili are subsequently lost inside the host cell[16]. Thus, Neisseria meningitidis has two cell types, even though there may be no other significant change in shape before and after entering the host.

Additionally, spores and cysts are both considered distinct cell types. In the absence of other information, a species that is non-spore-forming and non-motile is considered to have one cell type, because the former implies that it does not go through a dormant stage in its life cycle, while the latter indicates that it lacks the appendages described above.

For variation within a species, a variant strain is not treated as a new cell type unless a morphological change is involved. An example of the former is Haemophilus influenzae, which “can be divided into six capsular types, a-f, and noncapsulate strains”[17]; this alone does not involve a change in shape and therefore does not count as new cell types. The latter is seen in Oceanobacillus iheyensis, whose filamentous form is regarded as a cell type in addition to the predominant rod form, because the two forms have different shapes[18].

If a species enters the next life cycle stage or switches to a new environment without changing in morphology, it would not be recorded as having another cell type, even though the cell might behave differently in physiology, or , etc. Take Clostridium acetobutylicum for example: in vigorous growth phase the cell produces organic (mainly butyric and acetic) acids and molecular hydrogen, while in the slow growth phase it produces solvents like butanol, ethanol, and acetone[19]. This physiological and biochemical switching is not characterized by morphological change, and therefore the cell type number of Clostridium acetobutylicum is calculated based on other aspects.


By contrast, some prokaryotes change shape when the environment deteriorates or changes. A new cell type is recognized if the species can persist in such a condition and shows a consistent change in shape, unless the change is a passive, irregular response to purely physical stress such as osmotic pressure or culture aging. For example, Agrobacterium tumefaciens forms an L-form (cell wall deficient) on glucose agar[20], and the L-form is counted as a cell type since its emergence is, at least to some extent, consistent and spontaneous. A counterexample is Mycoplasma genitalium, which develops bleb and  as the culture ages[21]. These signs of disruption are not considered new cell types, because they are not true shape changes and their occurrence is not reliable (unlike Mycoplasma pulmonis, which reliably turns spherical in phase[22]).

Similarly, budding, branching and polyploidy are not regarded as novel structures, as illustrated in pili-like appendages of [23], and budding of Campylobacter jejuni[24].

Overall, the criteria come down to two questions: whether there is a morphological change, and whether that change is reliably repeatable rather than a random response to stress.
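These criteria were applied by reading the literature species by species, not by software. Purely as an illustration, they can be written as a small decision rule. The record type `CandidateForm`, its fields, and the functions below are hypothetical simplifications of judgments that in practice required reading each source; the counts printed reflect only the single extra form encoded for each species.

```python
from dataclasses import dataclass

@dataclass
class CandidateForm:
    """One observed form of a species besides its predominant form (hypothetical record)."""
    name: str
    changes_shape: bool            # differs morphologically from the predominant form
    reliably_repeatable: bool      # appears consistently, not sporadically
    passive_stress_response: bool  # e.g. damage from osmotic pressure or culture aging

def is_new_cell_type(form: CandidateForm) -> bool:
    """A form counts only if it changes shape, does so reliably,
    and is not a passive response to physical stress."""
    return (form.changes_shape
            and form.reliably_repeatable
            and not form.passive_stress_response)

def count_cell_types(extra_forms: list[CandidateForm]) -> int:
    """The predominant form always counts as one cell type;
    each additional form is added only if it passes the criteria."""
    return 1 + sum(is_new_cell_type(f) for f in extra_forms)

# Cases discussed above, encoded with the hypothetical fields.
oceanobacillus = [CandidateForm("filamentous", True, True, False)]
clostridium = [CandidateForm("solventogenic phase", False, True, False)]
mycoplasma_gen = [CandidateForm("blebbed", False, False, True)]
print(count_cell_types(oceanobacillus), count_cell_types(clostridium),
      count_cell_types(mycoplasma_gen))  # 2 1 1
```

Encoding the rule this way makes explicit that all three conditions must hold at once; a shape change alone (as in a stress-damaged culture) is not sufficient.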

2.2 Data mining

The initial species list of prokaryotes and their regulatory gene numbers came from Croft and colleagues' study of prokaryotic complexity[25], which includes all major groups of prokaryotes. For each of these species, I reviewed the literature on its morphology and life cycle individually, and identified the cell type number for 67 bacteria and 16 archaea. As Fig 2.1 shows, most prokaryotes have 1-2 cell types, with a maximum of 6; details are given in Table A.1.

Once the cell type numbers of the 83 prokaryotes were known, other genomic features, such as genome size, the number of coding and total genes, and coding percentage, were taken from a study by Hou and Lin[26]. In addition, an online database[27] provided transcription factor predictions for all organisms in this chapter, from two separate libraries, SUPERFAMILY and Pfam.

Fig 2.1 Summary of cell type number in prokaryotes (x-axis: cell type number, 1 - 6; y-axis: number of cases).

2.3 Phylogenies

Phylogenetic independent contrasts (PIC) rely on a phylogeny, but parts of the prokaryote phylogeny remain disputed. To make sure that my results are not merely artifacts of a particular tree, I carried out all analyses on two recent phylogenies. The first tree was generated by Fukami-Kobayashi et al.[28] from protein domain organization repertoires (Fig 2.2); the second was constructed by Lienau et al.[29] from horizontal gene transfer and sequence evolution data (Fig 2.3). All results reported below were consistent across both trees.

Contrary to the traditional view, Fukami-Kobayashi et al. recovered a monophyletic prokaryote clade containing both bacteria and archaea, with eukaryotes as its outgroup. Although this is unlikely to reflect the true relationship, it does not affect the analyses here: the PICs in this chapter depend only on the topology within prokaryotes, so treating eukaryotes (perhaps mistakenly) as an outgroup has no bearing on the contrasts.


Fig 2.2 Tree of life from domain organization[28]. As in the original publication, branches are color-coded according to NCBI taxonomy.

The other prokaryote tree, from Lienau et al., supported the conventional monophyly of archaea and eukaryotes, but unconventionally placed the ε-proteobacteria outside the Proteobacteria. Beyond that, the two phylogenies differed in several other respects; for example, Lienau et al.'s tree placed Chlamydia, Spirochetes and Chlorobium as basal groups, whereas in the tree of Fukami-Kobayashi et al. the basal groups were Deinococcus and Synechocystis.

Fig 2.3 Single most-parsimonious tree of life based on genome-scale horizontal gene transfer and sequence evolution data[29]. More than 90% of nodes received 100% bootstrap (BS) support. Branches are color-coded as in Fig 2.2, with black branches marking the differences.


The trees disagreed further within the archaea and toward the tips of the bacterial branches, so a result is more convincing if PICs on both trees lead to the same conclusion.

2.4 Raw data comparison

Before applying PIC, it is informative to compare pairs of traits mapped onto mirror trees (Figs 2.4 - 2.9, using Lienau et al.'s tree). In each figure, one genomic trait is shown side by side with cell type number, with darker branches representing larger trait values. Each trait was divided into six bins: six because the most complex prokaryote in this study has 6 cell types, and the genomic features were given the same number of bins for consistency.
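The binning behind the mirror-tree shading can be sketched as follows. This is a minimal illustration, assuming that a value lying exactly on a bin boundary falls into the upper bin (the legends do not say which side is inclusive). The edge lists are copied from the legends of Fig 2.4 (genome size) and Fig 2.8 (regulatory genes); the function name `bin_index` and the example values are hypothetical.

```python
from bisect import bisect_right

# Interior bin boundaries taken from the mirror-tree legends.
GENOME_SIZE_EDGES_KB = [1000, 2000, 3000, 4000, 5000]   # Fig 2.4: < 1000 kb ... > 5000 kb
REGULATORY_GENE_EDGES = [50, 100, 200, 300, 500]        # Fig 2.8: < 50 ... > 500

def bin_index(value: float, edges: list[float]) -> int:
    """Return the 1-based bin (1 .. len(edges) + 1) that `value` falls into.

    bisect_right counts how many edges are <= value, so a value exactly on a
    boundary is placed in the upper bin (an assumption)."""
    return bisect_right(edges, value) + 1

print(bin_index(580, GENOME_SIZE_EDGES_KB))    # 1
print(bin_index(4200, GENOME_SIZE_EDGES_KB))   # 5
print(bin_index(150, REGULATORY_GENE_EDGES))   # 3
```

With five interior edges, every trait is split into exactly six classes, matching the six cell-type classes.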

Generally speaking, prokaryotes with more cell types tended to have higher values of each genomic trait. This association is particularly strong for gene number and transcription factor number. Note that this comparison uses raw data, not PIC, so any apparent pattern might be an artifact of common ancestry. The next step is therefore the PIC analysis described in the following sections.


Fig 2.4 Mirror trees comparing the evolution of prokaryote cell type number and genome size. Legend: cell type 1 - 6; genome size < 1000 kb, 1000 - 2000 kb, 2000 - 3000 kb, 3000 - 4000 kb, 4000 - 5000 kb, > 5000 kb.


Fig 2.5 Mirror trees comparing the evolution of prokaryote cell type number and coding genes. Legend: cell type 1 - 6; coding genes < 1000, 1000 - 2000, 2000 - 3000, 3000 - 4000, 4000 - 5000, > 5000.


Fig 2.6 Mirror trees comparing the evolution of prokaryote cell type number and total gene number. Legend: cell type 1 - 6; total genes < 1000, 1000 - 2000, 2000 - 3000, 3000 - 4000, 4000 - 5000, > 5000.


Fig 2.7 Mirror trees comparing the evolution of prokaryote cell type number and coding percent. Legend: cell type 1 - 6; coding percent < 75%, 75 - 80%, 80 - 85%, 85 - 90%, 90 - 95%, > 95%.


Fig 2.8 Mirror trees comparing the evolution of prokaryote cell type number and regulatory genes. Legend: cell type 1 - 6; regulatory genes < 50, 50 - 100, 100 - 200, 200 - 300, 300 - 500, > 500.


Fig 2.9 Mirror trees comparing the evolution of prokaryote cell type number and transcription factors. Legend: cell type 1 - 6; transcription factors < 50, 50 - 100, 100 - 200, 200 - 300, 300 - 500, > 500.


2.5 Diagnostic test

PIC estimates the values at internal nodes under the assumption that the rate of evolution (per unit branch length) is constant across the phylogeny[30]; specifically, trait evolution is modeled as Brownian motion[7].

Under Brownian motion, a character state increases or decreases randomly and independently of its previous value, with changes drawn from a normal distribution with mean zero. Because these independent changes add up, the expected variance of the total change grows linearly with time, so the typical distance travelled is proportional to the square root of time. On a phylogeny, the distance corresponds to the absolute value of a contrast (the difference between the paired values at a node), while time (number of generations) is represented by branch length, measured in units of expected variance of change[31]. Because rates of evolution can vary across branches, violating the Brownian motion assumption, I first applied a diagnostic test of whether the rates were sufficiently constant to justify the use of phylogenetic independent contrasts.
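The square-root scaling can be checked with a short simulation, which is not part of the original analysis. It is a minimal sketch in which the step standard deviation `sigma`, the replicate count, and the seed are arbitrary illustrative choices. For a displacement X_t ~ N(0, sigma^2 t), the expected absolute displacement is sigma * sqrt(2t / pi), so quadrupling the time should roughly double it.

```python
import math
import random

def mean_abs_displacement(times, n_reps=2000, sigma=1.0, seed=1):
    """Simulate n_reps independent Brownian-motion lineages (Gaussian random walks
    starting at 0, one zero-mean step per generation) and return the mean
    absolute displacement reached at each generation listed in `times`."""
    rng = random.Random(seed)
    t_max = max(times)
    totals = {t: 0.0 for t in times}
    for _ in range(n_reps):
        x = 0.0
        for step in range(1, t_max + 1):
            x += rng.gauss(0.0, sigma)      # independent change, mean zero
            if step in totals:
                totals[step] += abs(x)
    return {t: totals[t] / n_reps for t in times}

sim = mean_abs_displacement([100, 400])
for t, d in sim.items():
    expected = math.sqrt(2 * t / math.pi)   # E|X_t| when X_t ~ N(0, t) and sigma = 1
    print(t, round(d, 2), round(expected, 2))
print(round(sim[400] / sim[100], 2))        # close to sqrt(400 / 100) = 2
```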

The test plots the absolute value of each standardized contrast (the contrast divided by its standard deviation) against that standard deviation. Under Brownian motion with appropriate branch lengths, these two variables are expected to be uncorrelated. If the correlation turns out to be significant, the trait may not be evolving by Brownian motion under the current branch lengths, so the assumption is violated. In that case a branch length transformation (e.g., logarithmic) is needed, applied until the two variables are no longer correlated.
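In this study the test was run in Mesquite's PDAP package. A minimal stand-alone sketch of the same idea, assuming the raw contrasts and their standard deviations have already been computed, is an ordinary Pearson correlation between |contrast| / SD and SD. The function name and the toy numbers are hypothetical. The two-tailed p value can be obtained from the returned t statistic, for example with `2 * scipy.stats.t.sf(abs(t), df)`. With 82 contrasts from 83 species, df = n - 2 = 80, which is consistent with the df reported in Table 2.1.

```python
import math

def diagnostic_test(raw_contrasts, sds):
    """Pearson correlation between |standardized contrast| and its standard deviation.

    raw_contrasts[i]: difference between the two daughter values at node i.
    sds[i]: sqrt(v1 + v2), the square root of the summed (corrected) daughter
            branch lengths at that node.
    Returns (r, t, df); under Brownian motion with adequate branch lengths,
    r should not differ significantly from zero.
    """
    y = [abs(c) / s for c, s in zip(raw_contrasts, sds)]   # |standardized contrasts|
    x = list(sds)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))   # undefined if |r| == 1
    return r, t, df

# Hypothetical contrasts whose standardized size does not track their SD:
r, t, df = diagnostic_test([1.0, -2.2, 2.7, 4.4], [1.0, 2.0, 3.0, 4.0])
print(round(r, 3), df)  # 0.135 2
```

A strongly positive r would indicate that contrasts on long branches are too large relative to their expected variance, which is the signal that prompts a branch length transformation.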

I used Mesquite 2.75[32] and its PDAP package[33, 34] to perform the diagnostic test and then the PIC. In both trees, all branch lengths were set to 1. Using equal branch lengths across the entire tree, unless further standardization is required, is a conventional approach for PIC, and it was also used in Whitney & Garland's paper examining Lynch's results[6].
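The contrasts were computed by PDAP. For readers unfamiliar with the algorithm, the following is a minimal re-implementation of Felsenstein's (1985) independent contrasts on a binary tree, using the same equal-branch-length setting (all lengths 1). The nested-tuple tree format and the function name `pic` are assumptions made for illustration, and PDAP's implementation may differ in detail.

```python
import math

def pic(node):
    """Felsenstein's independent contrasts for a binary tree.

    A tip is (value, branch_length); an internal node is ((left, right), branch_length).
    Returns (node_value, corrected_branch_length, contrasts), where contrasts is a
    list of (raw_contrast, sd) pairs, one per internal node.
    """
    content, length = node
    if not isinstance(content, tuple):          # tip: trait value and its branch length
        return content, length, []
    left, right = content
    x1, v1, c1 = pic(left)
    x2, v2, c2 = pic(right)
    raw = x1 - x2                               # contrast between the two daughters
    sd = math.sqrt(v1 + v2)                     # expected SD of the contrast under BM
    # Ancestral estimate: weighted mean, each daughter weighted by 1 / its branch length.
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen this node's branch to account for error in the ancestral estimate.
    v = length + v1 * v2 / (v1 + v2)
    return x, v, c1 + c2 + [(raw, sd)]

# ((A:1, B:1):1, C:1) with hypothetical trait values A=2, B=4, C=6; root branch unused.
tree = (((((2.0, 1.0), (4.0, 1.0)), 1.0), (6.0, 1.0)), 0.0)
root_value, _, contrasts = pic(tree)
print([round(c / sd, 3) for c, sd in contrasts])   # [-1.414, -1.897]
```

The standardized contrasts (raw contrast divided by its SD) are the quantities used both in the diagnostic test and in the regressions that follow.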


In this test, p > 0.05 (no significant relationship) indicates that the data do not depart significantly from Brownian motion, i.e., that the branch lengths are adequately standardized (Table 2.1). To check for any further pattern, I also created a bacterial subset (excluding the archaea) and tested it separately. Table 2.1 shows that all traits passed the test on the Fukami-Kobayashi et al. tree, and all but one trait (coding percent) passed on the Lienau et al. tree.

Table 2.1 Branch length diagnostic tests for two prokaryote trees

Trait                     Fukami-Kobayashi et al.   Lienau et al.
Cell types                0.10                      0.11
Genome size               0.10                      0.73
Gene number
  Coding genes            0.096                     0.58
  Total genes             0.095                     0.56
Regulatory components
  Regulatory genes        0.26                      0.98
  Transcription factors   0.15                      0.70
Coding percent            0.41                      0.038*

Two-tailed p values are shown; * marks the significant value (p < 0.05); df=80.

2.6 Correlation

Having established that PIC can be applied to these data, I tested for correlation between cell type number and the genomic traits. As Table 2.2 shows, on both phylogenies cell type number was strongly correlated with all genomic features except coding percent. The same pattern was found in the bacterial subset (data not shown), but not in the archaeal subset, owing to its small sample size (n=16, data not shown). Least-squares regression (LSR) confirmed that all the correlations in this section are positive.


Table 2.2 Regression of prokaryote cell types on genomic traits in two trees

                          Fukami-Kobayashi et al.    Lienau et al.
Trait                     slope      p value         slope      p value
Genome size               0.00024    0.0038*         0.00033    0.00009*
Gene number
  Coding genes            0.00028    0.0026*         0.00039    0.00007*
  Total genes             0.00027    0.0025*         0.00039    0.00007*
Regulatory components
  Regulatory genes        0.0046     0.00002*        0.0054     0.000002*
  Transcription factors   0.0046     0.00008*        0.0057     0.000003*
Coding percent            0.056      0.074           0.027      0.38

The regression slopes and two-tailed p values are shown; * marks significant values (p < 0.05); df=81.

The correlations remained significant when the branch lengths were standardized in other ways (data not shown), and coding percent likewise remained uncorrelated with cell type number.

Comparing the two phylogenies, the more recent tree from Lienau et al. returned smaller p values for the significant correlations and a larger p value for the non-significant one. In contrast, when the bacterial subset was analyzed alone, the correlations had smaller p values on Fukami-Kobayashi et al.'s tree than on Lienau et al.'s tree (data not shown).

For coding percent, setting all branch lengths to one did not adequately standardize the data, and satisfying the diagnostic test required transforming the branch lengths to an unrealistic, extreme extent (data not shown). Even so, no significant relationship could be detected between coding percent and cell type number.


2.7 Regressions of cell types on genomic traits

The regressions of cell type number on each genomic trait are shown below (Figs 2.10 - 2.15). All use independent contrasts on the Lienau et al. phylogeny, and the trend lines are constrained to a zero intercept.
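Because the sign of each contrast depends on the arbitrary order in which the two daughters are subtracted, regressions on independent contrasts are forced through the origin. A minimal sketch of such a fit on standardized contrasts follows; the function name and toy data are hypothetical, not the study's values. With only the slope estimated, df = n - 1, which matches df = 81 for the 82 prokaryote contrasts in Table 2.2; a two-tailed p value can be obtained as `2 * scipy.stats.t.sf(abs(t), df)`.

```python
import math

def regression_through_origin(x, y):
    """Least-squares fit of y = b * x with no intercept.

    x, y: standardized contrasts of the predictor (e.g. a genomic trait)
    and the response (cell type number). Returns (slope, t, df).
    """
    n = len(x)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    slope = sxy / sxx
    rss = sum((b - slope * a) ** 2 for a, b in zip(x, y))   # residual sum of squares
    df = n - 1                                             # one parameter estimated
    se = math.sqrt(rss / df / sxx)                         # standard error of the slope
    return slope, slope / se, df

# Toy contrasts (hypothetical).
slope, t, df = regression_through_origin([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9])
print(round(slope, 4), round(t, 1), df)
```

Flipping the sign of any paired (x, y) contrast leaves the fitted slope and t unchanged, which is exactly the invariance that motivates the zero intercept.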

Fig 2.10 Regression of prokaryote cell type on genome size (y-axis: cell types; x-axis: genome size, kb).

Fig 2.11 Regression of prokaryote cell type on coding genes (y-axis: cell types; x-axis: coding genes).


Fig 2.12 Regression of prokaryote cell type on total genes (y-axis: cell types; x-axis: total genes).

Fig 2.13 Regression of prokaryote cell type on coding percent (y-axis: cell types; x-axis: coding percent).


Fig 2.14 Regression of prokaryote cell type on regulatory genes (y-axis: cell types; x-axis: regulatory genes).

Fig 2.15 Regression of prokaryote cell type on transcription factors (y-axis: cell types; x-axis: transcription factors).


2.8 Discussion

These results imply that prokaryotes build complex structures by recruiting more genes and regulatory components, and therefore have larger genomes. This is especially noteworthy for transcription factors (TFs): prokaryotes have far fewer TFs than eukaryotes, yet TFs show among the strongest correlations with cell type number even though they are not abundant. Note also that the p values become progressively smaller from genome size to gene number to regulatory components. This may indicate that the correlations are driven primarily by regulatory components, and appear in gene number and genome size because these three types of genomic features are themselves correlated.

Thus, building complex structures in prokaryotes appears to require a complex regulatory network at the genetic level. This could explain why their genomic traits, such as genome size, gene number, and especially TF number, increase with complexity.

However, if this trend continued as a prokaryote became more complex, the number of regulatory components required would eventually become unrealistic, because regulatory genes themselves require regulation. This limit has been called a "complexity ceiling", and it may be the reason that prokaryotes cannot construct multicellular colonies as complex as those of eukaryotes[12].

In Fig 1.6, extending the prokaryotic line toward higher complexity would bring hypothetical prokaryotes up against this "complexity ceiling"; in reality, that region is occupied by eukaryotes. So how did eukaryotes solve this problem? This is the topic of the next chapter.


Chapter 3

Eukaryotes

Eukaryotes diverged from prokaryotes around two billion years ago, long after life had originated. They are characterized by a set of iconic synapomorphies. These derived eukaryotic traits include & , , a double-membrane-bound nucleus, and an elaborated cytoskeleton, produced by the expansion (via gene duplication) of the actin and tubulin gene families[35].

3.1 Cell type justification criteria

Eukaryotes are often viewed as structurally more complex than prokaryotes, so it is not surprising that, on average, more cell types could be identified per species. However, this also leaves more room for debate about cell type number. For consistency, I applied the same criteria used for prokaryotes, so the main focus remained morphological change. Some features of eukaryotic natural history, however, make it necessary to discuss certain cases in detail.

Some species complete in their host, and the resulting zygote can develop into an ookinete. For example, the ookinete of Plasmodium falciparum passes through the midgut epithelium of the mosquito[36], whereas in a closely related species, Theileria parva, there is no ookinete stage, and it is the zygote that penetrates the epithelial cells of the tick's gut[37]. Sometimes the zygote and ookinete are not distinguished in the literature, in which case they are treated as one cell type, as in Babesia bovis[38].


The unsporulated oocyst of Eimeria tenella contains a single zygote, so the two are counted as one cell type; the same applies to its sporoblast and sporocyst. In contrast, the second-generation merozoite is much larger (four times longer) than the first-generation merozoite and is therefore counted as a second cell type[39].

In anisogamous species, macro- and microgametocytes (or macro- and microgametes) are counted as distinct cell types. Gametocytes and gametes are also treated as distinct cell types if the literature consistently distinguishes them. Gametes are also counted as two distinct cell types in , since only one of its two gametes has a structure called the Strahlenkörper, used to enter the host cell[38].

3.2 Data mining

For eukaryotes, the availability of genomic data was the bottleneck limiting sample size, because eukaryotic genome sequences, being larger, were less available than those of prokaryotes. To assemble the species list, I started with the phylogeny of Burki et al.[40] and then added taxa whose genome sequences were available in NCBI[41] and whose phylogenetic position had been established, allowing them to be grafted onto the tree.

A total of 45 eukaryotes were chosen, with genomic data (genome size, protein number, gene number, and GC content) taken from NCBI at the end of 2014. Unlike for prokaryotes, coding gene number, regulatory gene number, and coding percent were not available; protein number and GC content were used instead. Transcription factor numbers came from the same online database[27] used for prokaryotes, supplemented by a publication of de Mendoza et al.[42]. However, TF data were insufficient for some eukaryotic species, so the sample size is smaller for analyses involving transcription factors.

The cell type number of each eukaryote was also gathered individually from the literature on that species (Table C.1, Fig 3.1). Compared with prokaryotes, eukaryotes showed greater structural diversity, with cases distributed more evenly across cell type numbers. Fourteen cell types were identified for , making it the most complex of these 45 microbes; this draws attention to the group Apicomplexa, which contains all taxa with more than ten life cycle stages.

Fig 3.1 Summary of cell type number in unicellular eukaryotes (x-axis: cell type number, 1 - 14; y-axis: number of cases).

3.3 Phylogenies

As with the prokaryotes, PIC analysis was performed separately on two recent eukaryote phylogenies. The first tree took the phylogeny constructed by Parfrey & Grant[43] as a foundation and added taxa for which data were available but which were not included in that study. Specifically, the taxa Sterkiella, Ostreococcus, and were grafted on using Derelle & Lang's tree[44], and Cryptococcus was inserted using the phylogenetic information in Burki et al.[40]. The second tree was based on Katz & Grant[45], with the missing taxa (Aureococcus, Emiliania, Galdieria, , and ) added according to their positions in the first tree.

The second tree differed from the first in only three places: 1) within Amoebozoa, Acanthamoeba was the outgroup; 2) red algae were placed as sister to ; and 3) Rhizaria were set as the outgroup to Stramenopiles and Alveolates.


Fig 3.2 Most likely eukaryotic tree of life reconstructed from 16 genes[43]. Branches are color-coded as in the original publication.


Fig 3.3 Most likely eukaryotic tree of life reconstructed from the 150 most even genes[45]. Most branches have ≥ 80% bootstrap (BS) support. Branches are color-coded as in the original publication.


3.4 Raw data comparison

As in Chapter 2 for prokaryotes, I first present a raw data comparison of eukaryotic character evolution (Figs 3.4 - 3.8, using Katz and Grant's phylogeny). In the figure legends, all traits are binned into six categories except GC content, which has five because of its narrower data range.

Unlike in prokaryotes, cell type evolution in eukaryotes did not appear to correlate with any of the genomic traits. Although this "mismatch" is based on raw data without PIC, it hints that the morphological evolution of eukaryotes may be very different from what the prokaryote analysis would lead us to expect.

Fig 3.4 Mirror trees comparing the evolution of eukaryote cell type number and genome size. Legend: cell type 1 - 2, 3 - 4, 5 - 6, 7 - 8, 9 - 10, > 10; genome size < 20 Mb, 20 - 40 Mb, 40 - 60 Mb, 60 - 80 Mb, 80 - 100 Mb, > 100 Mb.


Fig 3.5 Mirror trees comparing the evolution of eukaryote cell type number and protein number. Legend: cell type 1 - 2, 3 - 4, 5 - 6, 7 - 8, 9 - 10, > 10; proteins < 5000, 5000 - 10000, 10000 - 20000, 20000 - 30000, 30000 - 40000, > 40000.


Fig 3.6 Mirror trees comparing the evolution of eukaryote cell type number and gene number. Legend: cell type 1 - 2, 3 - 4, 5 - 6, 7 - 8, 9 - 10, > 10; genes < 5000, 5000 - 10000, 10000 - 20000, 20000 - 30000, 30000 - 40000, > 40000.


Fig 3.7 Mirror trees comparing the evolution of eukaryote cell type number and GC content. Legend: cell type 1 - 2, 3 - 4, 5 - 6, 7 - 8, 9 - 10, > 10; GC content 20 - 30%, 30 - 40%, 40 - 50%, 50 - 60%, 60 - 70%.


Fig 3.8 Mirror trees comparing the evolution of eukaryote cell type number and transcription factors. Legend: cell type 1 - 2, 3 - 4, 5 - 6, 7 - 8, 9 - 10, > 10; transcription factors: no data, < 50, 50 - 100, 100 - 150, 150 - 200, 200 - 600, > 600.

3.5 Diagnostic test

As for prokaryotes, a diagnostic test was necessary before running the PIC. For both eukaryotic trees, the Brownian motion assumption was not upheld when all branch lengths were set to 1.0. I therefore applied Nee's branch length transformation once, after which both trees passed the test (Table 3.1).
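Nee's arbitrary branch lengths, as the method is commonly described, set the height of each node to the logarithm of the number of tips descended from it, so every tip sits at height 0 and each branch length is a difference of log tip counts. The sketch below follows that description, assuming the natural log (a different base only rescales all branch lengths uniformly, which leaves standardized-contrast correlations unchanged). The tree format and the function name are hypothetical, and Mesquite/PDAP may differ in detail.

```python
import math

def nee_branch_lengths(tree):
    """Nee's arbitrary branch lengths for a rooted tree.

    Tips are strings; internal nodes are tuples of children. Node height is
    ln(number of descendant tips), so every tip has height ln(1) = 0.
    Returns {child_node: branch_length} for every non-root node.
    """
    lengths = {}

    def n_tips(node):
        return 1 if isinstance(node, str) else sum(n_tips(c) for c in node)

    def visit(node):
        if isinstance(node, str):
            return
        height = math.log(n_tips(node))
        for child in node:
            # Branch length = parent height - child height.
            lengths[child] = height - math.log(n_tips(child))
            visit(child)

    visit(tree)
    return lengths

for node, length in nee_branch_lengths((("A", "B"), "C")).items():
    print(node, round(length, 3))
```

Because every root-to-tip path sums to the log of the total tip count, the transformed tree is ultrametric, and branches near the root are lengthened relative to the all-equal setting.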


Table 3.1 Branch length diagnostic tests for two eukaryote trees

Trait                     Parfrey & Grant   Katz & Grant
Cell types                0.19              0.16
Genome size               0.24              0.46
Proteins                  0.38              0.73
Genes                     0.42              0.77
Transcription factors     0.21              0.22
GC content                0.25              0.41

Two-tailed p values are shown; df=34 for transcription factors and df=42 for other traits.

3.6 Correlation

Interestingly, none of the eukaryotic genomic features correlated with cell type number on either tree (Table 3.2). This implies that, unlike prokaryotes, eukaryotes did not achieve structural complexity by accumulating regulatory genes, but by some other, as yet unknown, route.

Table 3.2 Regression of eukaryote cell types on genomic traits in two trees

                          Parfrey & Grant            Katz & Grant
Trait                     slope        p value       slope        p value
Genome size               -0.0000047   0.64          -0.0000035   0.74
Proteins                  -0.00004     0.24          -0.000037    0.30
Genes                     -0.000042    0.21          -0.000038    0.27
Transcription factors     -0.0014      0.36          -0.0014      0.37
GC content                -0.031       0.45          -0.023       0.58

The regression slopes and two-tailed p values are shown; df=35 for transcription factors and df=43 for other traits.


3.7 Regressions of cell types on genomic traits

As for prokaryotes, I regressed eukaryote cell type number on each genomic trait (Figs 3.9 - 3.13). The phylogeny is from Katz and Grant, and the trend lines are constrained to a zero intercept.

Fig 3.9 Regression of eukaryote cell types on genome size (y-axis: cell types; x-axis: genome size, kb).

Fig 3.10 Regression of eukaryote cell types on protein number (y-axis: cell types; x-axis: proteins).


Fig 3.11 Regression of eukaryote cell types on gene number (y-axis: cell types; x-axis: genes).

Fig 3.12 Regression of eukaryote cell types on GC content (y-axis: cell types; x-axis: GC content).


Fig 3.13 Regression of eukaryote cell types on transcription factor number (y-axis: cell types; x-axis: transcription factors).

3.8 Discussion

Interestingly, all of the regression slopes were negative, though none was statistically significant, whereas they were positive in prokaryotes. This suggests that the different pattern observed in eukaryotes is not simply an artifact of the smaller sample size, but reflects a real distinction between eukaryotes and prokaryotes in the evolution of morphological complexity.

The graphs and table imply that eukaryotes achieved their complexity in a way completely different from prokaryotes. This finding is consistent with other work, including Hou & Lin[26] and Ahnert et al.[12], and patterns observed in prokaryotic genomes often seem not to carry over to eukaryotes. For instance, protein-coding gene number increases with genome size at different rates in prokaryotes and eukaryotes (Fig 3.14). However, previous research on this phenomenon did not employ PIC. The present study is thus the first to show a notable difference in genome evolution between prokaryotes and eukaryotes when phylogeny is taken into account.


Fig 3.14 Regression of coding gene number on genome size[26]. Note the different patterns for eukaryotes and non-eukaryotes; this plot uses raw data, without PIC.

Up to this point, my study has focused on single-celled organisms analyzed with PIC, but what about multicellular species? Multicellular taxa include the most complex life forms on earth, some having many more cell types than any of the unicellular organisms studied. The next step is thus to study the relationship between polycellularity, genome structure, and multicellularity.


Chapter 4

Eukaryotic Multicellularity

The evolution of multicellularity in eukaryotes has attracted much attention because multicellular eukaryotes include the most complex life forms known. Despite this interest, the initial stages in the evolution of eukaryotic multicellularity are still not well understood. Given that multicellularity has arisen many times in eukaryotes, we can ask what properties of a branch make it more likely to give rise to multicellular taxa.

4.1 Two different paths to multicellularity

To become multicellular, a eukaryotic cell (spore or zygote) can either divide into many daughter cells or aggregate with other individual cells[46]. Accordingly, this study sorts multicellular species into two categories: division-based and aggregation-based groups.

Based on cellular structure and life history, I identified 11 divisional groups: Chytrids, Pezizomycotina, Agaricomycotina, Animals+Choanoflagellates, Brown algae, Diatoms, Chrysophytes, the UTC group, Streptophytes, Bangiales+Floridean algae, and Compsopogonales. Unlike the classification proposed by Niklas & Newman[13], this identification classifies Choanoflagellates (along with Animals) and Diatoms as clades containing multicellular species.

In addition to the divisional multicellular groups, I identified 8 clades in which multicellularity arises through the aggregation of free-living cells[14, 47]. These aggregation-based multicellular groups are Guttulinopsis in Cercozoa, Sorodiplophrys in Labyrinthulids, Sorogena in ciliates, Acrasids in Heterolobosea, Copromyxa in Tubulinea, Dictyostelids, Capsaspora, and Fonticula (Fig 4.15).


4.2 Independent origins of divisional multicellularity

4.2.1 Fungi

Among these 11 division-based multicellular clades, 3 are fungal lineages: Chytridiomycota, Pezizomycotina (Ascomycota), and Agaricomycotina (Basidiomycota). Chytrids can be distinguished from the other two taxa (Dikarya) by the presence of a flagellum and the absence of conidia or regular septa[48]. As for , they only form septa when releasing gametes or detaching dead hyphae, so they were not counted as a multicellular lineage separate from Chytridiomycota. Within Dikarya, Pezizomycotina and Agaricomycotina both have unicellular sister groups, such as budding yeasts for the former and Ustilago for the latter, indicating that Pezizomycotina and Agaricomycotina represent separate origins of multicellularity (Fig 4.1).

Fig 4.1 Fungal phylogeny (Rozella, Microsporidia, Chytridiomycota, Entomophthoromycotina, Kickxellomycotina, Pezizomycotina, Agaricomycotina) showing the appearances of multicellular lineages (colored in red; modified from Stajich et al.[48]).


4.2.2 Animals & Choanoflagellates

Within the lineage composed of Animals+Choanoflagellates, some Choanoflagellates such as Salpingoeca rosetta can form multicellular colonies through division. However, the phylogeny shows that these colonial species do not form a single clade within the Choanoflagellates but are instead interspersed with many species, such as Monosiga brevicollis, that are not known to form colonies[49]. Considering the extreme similarity between colonial Choanoflagellates and , I decided to treat the evolution of multicellularity in Animals+Choanoflagellates as a single event, because it is unlikely that such closely resembling traits evolved independently more than 5 times (Fig 4.2). The ability to form structured colonies therefore appears to have been present in the common ancestor of Choanoflagellates and Metazoans, and to have been maintained to different degrees, and sometimes lost, in descendant groups[50]. Note that counting multiple origins of multicellularity in these groups would actually strengthen the conclusions presented below.


Fig 4.2 Maximum likelihood (ML) phylogeny of Animals and Choanoflagellates[49]. Species with a known ability to form colonies are shown in red.

4.2.3 Stramenopiles

Brown algae such as kelp are typical multicellular species, and some Chrysophytes (e.g., Phaeothamnion) can form branching colonies. Because many unicellular clades separate these two groups on the phylogeny[13] (Fig 4.3), they were treated as two separate origins of multicellularity.


Fig 4.3 Phylogeny of stramenopiles (Eustigmatophytes, Synurophytes, Chrysophytes, Dictyochophytes, Diatoms, Bolidomonas, Pelagophytes, Phaeophytes, Giraudyiopsis, Tetrasporopsis, Xanthophytes, Pinguiophytes, Phaeothamnion) showing the incidences of multicellularity (red)[13].

Most diatoms are unicellular, but some form highly structured multicellular colonies of various shapes (fans, stars, zigzags, filaments, etc.), and some of these even show cell differentiation and branching. However, these colonial diatoms remain microscopic, and none of them approaches the complexity of kelp. This research treats these diatoms as a single multicellular lineage, even though they may occupy different parts of the phylogeny. As with the Choanoflagellates, counting multiple origins of multicellularity within the diatoms would only strengthen my final conclusions.

4.2.4 Green algae and plants

Within the Chloroplastida (including the UTC group and streptophytes), there is a fundamental difference in how cells are connected in the green algae and in the land plants. For instance, the volvocine algae in the UTC group communicate through intercellular cytoplasmic strands: the outer cell wall layer of a unicellular progenitor holds the multicellular colony together, and the inner cell wall layer develops into an expanded extracellular matrix (ECM), leaving intercellular strands as bridges between individual cells. By contrast, in streptophytes (specifically ) cell-cell connection is achieved by plasmodesmata[13] (Fig 4.4). Together with the presence of intervening unicellular groups, this suggests two independent origins of multicellularity.

Fig 4.4 Phylogeny of Chlorophytes (Pyramimonadales, Mamiellales, coccoid prasinophytes, Pseudoscourfieldiales, Ulvophyceae, Trebouxiophyceae, Chlorophyceae) showing the UTC group as a multicellular lineage[13].

Within streptophytes, multicellularity evolved once in the early stage[51] (Fig 4.5).

Fig 4.5 Putative phylogeny of streptophytes (Mesostigma, Klebsormidiales, Charales, Zygnematales, land plants; also shown: Ostreococcus, Chlamydomonas) showing the appearance of multicellularity[51].


4.2.5 Red algae

Multicellularity very likely emerged twice in red algae, though the resolution of the tree in Fig 4.15 is not sufficient to tell the two events apart. A more detailed phylogeny[52] (Fig 4.6) shows that multicellularity arose in Compsopogonales and in the common ancestor of Bangiales and Florideophyceae. The presence of major unicellular clades between these two groups suggests that they represent separate origins of multicellularity.

Fig 4.6 Modified phylogenies of red algae[52]. All three trees contain two independent multicellular lineages (red), because none of these putative topologies places Compsopogonophyceae as sister to Bangiophyceae + Florideophyceae; this agrees with other research[13].

4.3 Independent origins of aggregative multicellularity

In the free-living stage, the organisms in the aggregation-based group are mostly amoeba-like, with the exception of Sorogena. When aggregating, they all form a sorocarpic structure. This raises the question of whether these structures are homologous or analogous. Below I discuss their life cycles one by one to explain why I treated them as independent origins.


4.3.1 Capsaspora

Fig 4.7 Life cycle of Capsaspora[53]. A: with filopodia. B: Cells during transition to cystic stage. C: Cystic cells. D-E: Aggregating cells. F: Extracellular matrix between aggregating cells.

Among the aggregative clades, Capsaspora is the closest lineage to animals. Capsaspora remains unicellular for most of its life cycle, but its cells can aggregate into a multicellular structure[53] (Fig 4.7). During aggregation, orthologs of genes associated with multicellularity are upregulated[53]. Some TFs thought to be unique to multicellular metazoans are also found in Capsaspora, but their function in Capsaspora is still unclear[54].


4.3.2 Fonticula

Fig 4.8 Multicellular structure built by aggregation of free-living Fonticula cells[55]. A: spore-bearing fruiting body supported by an extracellular stalk. B-C: spores are released and germinate into amoebae. D: switching between amoebae and cysts (spores). E-F: amoebae aggregate and are covered by a stalk of extracellular matrix. G: upper amoebae transform into spores. H: mature spores migrate upwards, forming the fruiting body.

Fonticula, sister to fungi, is a basal branch from . Its free-living amoebae can encyst and be converted into spores, either singly or in aggregates[55] (Fig 4.8). Its unique way of forming an extracellular stalk, together with its phylogenetic position, indicates that it represents an independent origin of multicellularity.


4.3.3 Dictyostelium

Fig 4.9 Life cycle of Dictyostelium discoideum[56]. A: Starting from spores, free-living social amoebae prey on bacteria and aggregate upon food shortage. The colony then goes through mound, finger, slug, and “Mexican hat” stages, and finally turns into a fruiting body. B: Scanning electron micrographs showing the life cycle.

Dictyostelids are a typical aggregation-based multicellular lineage, and their cell aggregation has been studied extensively[56].


4.3.4 Copromyxa

Fig 4.10 Life cycle of Copromyxa protea[57]. Note that the amoebae encyst to aggregate (E-I). The semi-transparent cells represent trophic amoebae and the solid-colored cells are encysted.

Copromyxa is the other amoebozoan that forms multicellular colonies through aggregation[57] (Fig 4.10). Unlike in dictyostelids, the aggregation of Copromyxa cells involves encystment, and the mature multicellular colony comprises cysts, whereas dictyostelid colonies are composed directly of amoebae. This fundamental difference makes Copromyxa an independent multicellular lineage.


4.3.5 Acrasis

Fig 4.11 Life cycle of Acrasis rosea[57]. The stalk cells are solid-colored with dark blue, while the spores are in solid light blue, and semi-transparent light blue represents amoebae. Note that the amoebae from the sorogen encyst to become the growing basal stalk, which elevates the remaining sorogen (aggregation of amoebae, g-l). Also, at the final stages, the sorogen cells also encyst to make spores (k-l).

Acrasis, a member of the Excavata, also goes through encystment in its life cycle[57] (Fig 4.11). However, here encystment occurs after aggregation, to build the stalk, which is the reverse of the encyst-then-aggregate sequence in Copromyxa.


4.3.6 Sorogena

Fig 4.12 Life cycle of Sorogena[58]. A-C: trophic cells germinate, feed, divide (C’), and encyst (C”). D-H: starvation triggers cells to shrink in size and to aggregate. I-K: a sheath wraps the colony and turns into the stalk. L-Q: the sorogen ascends and the fruiting body takes shape.

Within the aggregation-based group, the ciliate Sorogena is the only non-amoeboid taxon. Its sheath (later the stalk) is an extracellular matrix secreted by the cells[58], a characteristic similar to Fonticula.


4.3.7 Guttulinopsis

Fig 4.13 Life stages of Guttulinopsis vulgaris[59]. A: multicellular fruiting body with a sorus (scale bar: 250 μm); B: three sori projecting from one stalk (scale bar: 250 μm); C: complete image of fruiting body in A (scale bar: 50 μm); D: irregular shaped spores (scale bar: 10 μm); E: lobose amoeba and round spore (scale bar: 10 μm).

Guttulinopsis is an unusual member of Rhizaria, lacking fine pseudopodia (Fig 4.13, E) and a flagellated stage (very unusual for Rhizaria). Its life cycle includes a sorocarpic fruiting body that produces irregularly shaped spores, and its stalk is made of amoeboid and encysted cells[59].


4.3.8 Sorodiplophrys

Fig 4.14 Life stages of Sorodiplophrys stercorea[60]. A: golden sorocarp; B: sorocysts; C, D: amoeba; E: amoebae with anastomosing pseudopodia. Arrows indicate swellings in pseudopods. Scale bar: 400 μm in A, 10 μm in B-E.

The last group, Sorodiplophrys, belongs to Stramenopiles, and its typical golden macroscopic sorocarp stands out from other aggregation-based species (Fig 4.14, A). Its stalk contains both dead and degenerated cells[57].

4.4 Phylogenies

The two eukaryotic phylogenies used in the previous chapter were also employed here to analyze the evolutionary history of multicellularity. The 19 multicellular clades were mapped onto both trees, keeping track of which groups are divisional[13] and which are aggregative[14, 61].

Importantly, division- and aggregation-based multicellularity are always phylogenetically separated, with no cases of one type arising within a clade that encompasses the other. This supports the idea that they represent different routes to multicellularity, rather than different stages in a single process.


[Figure: eukaryotic tree with cell types as the mapped character; squared-change parsimony reconstruction (squared length 507.7315493); branch shading classes 1 to 2, 2 to 4, 4 to 6, 6 to 10, and 10 to 14 cell types.]

Fig 4.15 Eukaryotic tree of life showing cell type evolution along the branches. The tree is based on Katz and Grant’s 2015 phylogeny, as in Fig 3.3. Grey branches are those that gave rise to divisional (striped branches, red taxon names) and aggregative (striped branches, blue taxon names) multicellularity. Note that the grey branches themselves are not necessarily multicellular.


Among the 89 internal and external branches, 11 gave rise to division-based multicellularity and 8 gave rise to aggregation-based multicellularity. These numbers were the same for both trees, with the only exception being transcription factor number: because of limited genomic data, the TF analysis included only 61 branches in total, 6 of them divisional and 4 aggregative.

4.5 Data comparison

For all morphological and genomic features, Mesquite 2.75[32] was used to estimate the parsimonious ancestral character state of each internal branch. Because this analysis used the two phylogenies from the previous chapter, Nee's branch length transformation was also applied to both trees before ancestral state reconstruction. This allowed me to estimate the ancestral states, with respect to cell type number and genomic traits, of the branches from which multicellularity evolved.
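Mesquite's own code is not reproduced here. As an illustration only, the following Python sketch assumes the squared-change parsimony criterion (the "Parsimony reconstruction (Squared)" option): internal-node values are chosen to minimize the sum, over all branches, of the squared change along each branch divided by its length. Setting the derivative of that sum to zero makes every internal node a weighted average of its neighbours (weights 1/length), which gives a linear system that NumPy can solve. The function name `squared_change_parsimony` and the edge-list input format are hypothetical; Nee's branch length transformation would have to be applied to the lengths beforehand, and multicellular tips would simply be left out of `edges` and `tip_values`.

```python
import numpy as np


def squared_change_parsimony(edges, tip_values):
    """Estimate internal-node states minimizing sum((x_a - x_b)**2 / length).

    edges: list of (node_a, node_b, branch_length) with positive lengths.
    tip_values: dict mapping tip name -> observed trait value.
    Returns a dict mapping each internal node name -> estimated value.
    """
    nodes = {n for a, b, _ in edges for n in (a, b)}
    internal = sorted(nodes - set(tip_values))
    index = {name: i for i, name in enumerate(internal)}
    A = np.zeros((len(internal), len(internal)))
    rhs = np.zeros(len(internal))
    for a, b, length in edges:
        w = 1.0 / length  # short branches pull their end states closer together
        for here, there in ((a, b), (b, a)):
            if here not in index:
                continue  # tips are fixed at their observed values
            i = index[here]
            A[i, i] += w
            if there in index:
                A[i, index[there]] -= w  # neighbouring internal node (unknown)
            else:
                rhs[i] += w * tip_values[there]  # neighbouring tip (known)
    solution = np.linalg.solve(A, rhs)
    return {name: float(solution[index[name]]) for name in internal}
```

For a single internal node joined to tips with values 1, 3, and 8 by branches of length 1, 1, and 2, the estimate is the weighted mean (1 + 3 + 8/2) / (1 + 1 + 1/2) = 3.2.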

It is important to note that the character states of the multicellular clades themselves were not included in the estimation of ancestral character states. To see why, note that some metazoans have many times more cell types than any unicellular organism, but these cell types arose long after multicellularity had evolved, whereas we are interested in the states of the unicellular ancestors that gave rise to multicellularity. Including them in the estimates for ancestral branches would artificially inflate the estimated number of cell types for the ancestor. In a sense, this analysis estimates the ancestral states of multicellular clades by looking at the states of their closest unicellular relatives.

Once the multicellular clades were mapped onto the trees, I compared the character states of branches that gave rise to multicellular clades with the states of the background branches (branches that did not give rise to multicellularity; Figs 4.16-4.21).


Fig 4.16 Comparison of cell type number in background branches and branches that gave rise to multicellularity (rows: division, aggregation, background; x-axis: cell types). Data points were internal and external branch values from the Katz & Grant phylogeny. There were 70 background branches (green), 8 aggregative branches (blue), and 11 divisional branches (red).

Fig 4.17 Comparison of genome size (Mb) in background branches and branches that gave rise to multicellularity.


Fig 4.18 Comparison of protein number in background branches and branches that gave rise to multicellularity.

Fig 4.19 Comparison of gene number in background branches and branches that gave rise to multicellularity.


Fig 4.20 Comparison of GC percentage in background branches and branches that gave rise to multicellularity.

Fig 4.21 Comparison of transcription factor number in background branches and branches that gave rise to multicellularity.


4.6 Statistical analysis

Surprisingly, multicellular eukaryotes tended to arise from branches with fewer cell types, proteins, genes, and transcription factors, and with smaller genome size, than background branches. This was the opposite of my initial expectation. Moreover, previous research has generally suggested that genomic complexity increases with morphological complexity[54, 62], though those analyses were all done within multicellular groups, so they did not address which traits initially gave rise to multicellularity.

To test this pattern, for each trait and each multicellular group, I used a Python program to randomly sample (with replacement) the same number of branches from the trees and calculated the one-tailed probability of obtaining values more extreme than the observed data (Table 4.1).
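The original Python program is not reproduced in this dissertation. The following is a minimal sketch of how such a resampling test could work, under stated assumptions: (i) the null pool is every branch of the tree, including the focal ones; (ii) the test statistic is the mean trait value of the sampled branches; and (iii) "more extreme" is taken in the direction of the observed difference (`tail="lower"` or `tail="upper"`). The function name `resampling_test` and its parameters are hypothetical.

```python
import random


def resampling_test(tree_values, focal_values, n_reps=100_000, tail="lower", seed=None):
    """One-tailed Monte Carlo test by resampling branches with replacement.

    tree_values:  trait values for every branch in the tree (the null pool).
    focal_values: values for the branches that gave rise to multicellularity.
    Returns (focal mean minus tree mean, one-tailed p value).
    """
    if tail not in ("lower", "upper"):
        raise ValueError("tail must be 'lower' or 'upper'")
    rng = random.Random(seed)
    k = len(focal_values)
    observed = sum(focal_values) / k
    tree_mean = sum(tree_values) / len(tree_values)
    extreme = 0
    for _ in range(n_reps):
        draw = rng.choices(tree_values, k=k)  # k branches, with replacement
        sample_mean = sum(draw) / k
        if tail == "lower":
            extreme += sample_mean <= observed
        else:
            extreme += sample_mean >= observed
    return observed - tree_mean, extreme / n_reps
```

A p value near zero means that random sets of branches of the same size almost never have a mean as low (or as high) as the branches that gave rise to multicellularity.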

The traits significantly associated with multicellularity were cell type number, protein number, and coding gene number. Protein number and gene number are not independent of each other, but recall from the previous chapter that both are negatively (though not significantly) correlated with cell type number, which suggests that cell type number captures a different aspect of complexity from gene or protein number.


Table 4.1 Sampling simulation test on the origins of divisional and aggregative multicellularity in two trees

                        Parfrey & Grant                                  Katz & Grant
Trait                   Division         Aggregation     Total           Division         Aggregation     Total
Cell types              -1.32 (0.040)*   -0.76 (0.22)    -1.09 (0.029)*  -1.37 (0.038)*   -0.80 (0.22)    -1.13 (0.029)*
Genome size             -9804 (0.14)     -1756 (0.45)    -6415 (0.18)    -10533 (0.12)    -1542 (0.47)    -6747 (0.17)
Proteins                -4502 (0.034)*   6379 (0.049)*   79 (0.52)       -4738 (0.027)*   6381 (0.049)*   -56 (0.51)
Genes                   -4826 (0.028)*   6221 (0.055)    -175 (0.48)     -5071 (0.022)*   6210 (0.054)    -321 (0.46)
GC content              3.98 (0.12)      -5.67 (0.072)   -0.08 (0.46)    3.74 (0.14)      -5.71 (0.075)   -0.24 (0.45)
Transcription factors   -27.90 (0.39)    2.27 (0.32)     -15.83 (0.47)   -27.83 (0.39)    2.50 (0.31)     -15.70 (0.47)

Monte Carlo sampling for each trait was performed 100,000 times; significant values (p < 0.05) are marked with *. Numbers outside parentheses are the mean differences between the branches giving rise to multicellularity and the background branches; numbers in parentheses are one-tailed probabilities. Negative values indicate that the branches giving rise to multicellularity had lower values than the average for the tree.


4.7 Discussion

4.7.1 Divisional multicellularity arises from the simplest unicellular ancestors

The most striking result is that divisional multicellular clades evolved from relatively “simple” ancestors, with fewer cell types and fewer proteins/genes than the average of background branches. This is particularly interesting because divisional multicellular groups include the most complex multicellular organisms, such as metazoans and vascular plants.

In Chapter 3, cell type number had a negative (though not significant) relationship with all genomic traits, so a eukaryote with fewer cell types would be expected to have higher values for genomic traits. However, the ancestors leading to divisional multicellularity have low values not only for cell type number but also for genomic traits, meaning that they are simple in terms of both morphological and genomic complexity.

Fig 4.15 also confirms that the divisional lineages tended to have sister or neighboring taxa with lower levels of “polycellularity”. Furthermore, neither divisional nor aggregative multicellularity arose from Apicomplexa, the group with the largest cell type numbers among the unicellular organisms studied.

4.7.2 Aggregation- and division-based multicellularity show different patterns

Aggregative multicellularity is significantly associated only with protein number, and, interestingly, the association runs in the opposite direction to that for divisional multicellularity. Aggregative clades tend to arise from ancestors with more proteins (p = 0.049) and genes (p = 0.054) than average, whereas divisional clades are associated with fewer proteins and genes than average. This further supports the idea that aggregation and division should be treated as biologically distinct routes to multicellularity.

Overall, division and aggregation differ in: 1) cell behavior, with multicellularity achieved either through division from a single cell or through aggregation of several free-living cells; 2) phylogenetic position, with neither type nested within a clade of the other; and 3) the genome architecture of their ancestral branches, with morphologically and genomically simple branches giving rise to divisional but not aggregative groups.


Chapter 5

Conclusion

As we enter the age of genomics, it is interesting to ask: “How does morphological evolution connect with genome evolution?” My motivation in this research was to address this issue.

5.1 What is new in this research

Previously, most work on trends in morphological evolution has focused on multicellular organisms, largely leaving microbes out. This is partly because morphological complexity is traditionally measured as cell type number, a measure by which every unicellular species scores the same; therefore unicellular species were seldom discussed. However, if we want to understand how complex morphology evolved from simple structures, we must start with unicellular microbes.

Furthermore, there are many databases for genomic features, such as genome size, gene number, and transcription factor number, but there is no comprehensive database for microbial cell type number. Some studies[9] listed the cell type numbers of typical species, most of which were multicellular. However, even for the relatively few multicellular organisms that had been assigned a cell type number, earlier publications did not provide justification or criteria for those numbers. This has made their results difficult to interpret and reproduce.

Third, most previous research into large-scale genomic evolution has ignored phylogeny, so its results could be artifacts of shared ancestry: closely related species are not statistically independent data points. Statistical analyses comparing multiple species should therefore be carried out in a phylogenetic framework.


With these issues in mind, in this dissertation I redefined the cell type number of a single-celled species as the number of morphologically distinct life cycle stages, and then used this as a measure of morphological complexity. In total, this research collected cell type numbers for 83 prokaryotes and 45 eukaryotes, and for the first time, detailed justification was provided for each species.

Throughout this research, all results were based on phylogenetically independent contrasts (PIC). Also, at least two phylogenies were used for each of the prokaryote, eukaryote, and multicellularity analyses, and the conclusions were consistent across phylogenies.
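The contrasts follow Felsenstein's (1985) algorithm. The sketch below is a minimal Python rendering of that algorithm, not the software used in this dissertation. It assumes a rooted, fully bifurcating tree with positive branch lengths, encoded as nested tuples (a hypothetical format chosen for brevity): a tip is `(name, length)` and an internal node is `((left, right), length)`. It also includes a regression through the origin, the usual way to relate two sets of contrasts, and omits refinements such as diagnostic checks of the standardized contrasts.

```python
import math


def independent_contrasts(node, values):
    """Felsenstein's contrasts for one trait on a rooted bifurcating tree.

    node: (tip_name, branch_length) or ((left, right), branch_length).
    values: dict mapping tip name -> trait value.
    Returns (node_value, adjusted_branch_length, list_of_standardized_contrasts).
    """
    label, length = node
    if isinstance(label, str):  # a tip: its value is observed directly
        return values[label], length, []
    left, right = label
    x1, v1, c1 = independent_contrasts(left, values)
    x2, v2, c2 = independent_contrasts(right, values)
    contrast = (x1 - x2) / math.sqrt(v1 + v2)  # standardized by expected variance
    node_value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # weighted average
    # Lengthen this node's branch to reflect uncertainty in its estimated value.
    adjusted = length + v1 * v2 / (v1 + v2)
    return node_value, adjusted, c1 + c2 + [contrast]


def slope_through_origin(x_contrasts, y_contrasts):
    """Least-squares slope of y contrasts on x contrasts, forced through zero."""
    numerator = sum(x * y for x, y in zip(x_contrasts, y_contrasts))
    return numerator / sum(x * x for x in x_contrasts)
```

Because both traits are traversed in the same order, each x contrast is paired with the y contrast from the same node, which is what the regression through the origin requires.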

5.2 Prokaryotes and eukaryotes

In prokaryotes, this study found that morphological and genome evolution were correlated. Cell type number is positively correlated with genome size, gene number, and transcription factor (TF) number. Among these genomic features, TF number shows the strongest connection with cell type number: its p value is more than an order of magnitude smaller than those of the other genomic features. This result is consistent across 2 phylogenies and various branch length adjustments, and holds when archaea are excluded. Therefore, in prokaryotes, increasing morphological complexity is achieved in the manner we would expect, with the help of more genes, proteins, and especially regulatory components.

Surprisingly, unlike in prokaryotes, morphological complexity in eukaryotes is not associated with any of the genomic features studied. In fact, the relationships were even slightly negative (though not significant), implying that this lack of correlation was not due to smaller sample size. Thus, simply accumulating genes or proteins, or adding regulatory components, does not appear to be how single-celled eukaryotes achieved their morphological complexity.


5.3 Eukaryote multicellularity

To study the evolution of multicellularity, I estimated the cell type number and genomic quantities for all of the internal branches of the phylogeny of unicellular eukaryotes (replicating the process on two recent eukaryote phylogenies). I then divided the multicellular eukaryotes into two categories, based on how initial multicellular units arise, and mapped each multicellular clade onto the tree. I could then see what trait values, in unicellular ancestors, are associated with the initial evolution of multicellularity. I used a Monte Carlo resampling simulation to test the significance of results.

The most striking result was that the divisional multicellular organisms, which include the most complex organisms on earth, arose from ancestors with a low level of morphological and genomic complexity. Direct comparison and the Monte Carlo resampling test both showed that these organisms originated from ancestors with significantly fewer cell types, proteins, and genes than the average values of the background branches.

In contrast to the divisional group, the aggregative group of multicellular organisms did not display a clear evolutionary pattern. Though the differences were not always significant, aggregation-based species appear to arise from ancestors with more proteins, genes, and transcription factors. As in the comparison of cell type number between prokaryotes and eukaryotes, this suggests that the two groups achieve multicellularity through fundamentally different molecular mechanisms, and perhaps for different adaptive reasons.

To understand why some division-based groups have evolved more complex multicellular units than aggregation-based colonies, it is important to note the distinctive reproduction of the former: each generation passes through a single-cell stage. Because all cells in a colony derive from one zygote or spore, this stage minimizes variation within colonies while maximizing variation between colonies, shifting the main focus of selection from individual cells to multicellular colonies. Thus, colony development by division (rather than aggregation) maximizes the potential for cooperation among all cells in building more advanced structures.
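The bottleneck argument can be illustrated with a deliberately simple toy model. The model is an assumption of this sketch, not a result of the dissertation: each cell carries one heritable trait value drawn at random, a divisional colony copies its single founder (mutation during development is ignored), and an aggregative colony collects unrelated free-living cells. The hypothetical functions `between_colony_share` and `simulate` report what fraction of the total trait variance lies between colony means; the closer this is to 1, the more selection can act on colonies rather than on cells within them.

```python
import random
from statistics import mean, pvariance


def between_colony_share(colonies):
    """Fraction of total trait variance that lies between colony means."""
    all_cells = [x for colony in colonies for x in colony]
    grand = mean(all_cells)
    between = sum(len(c) * (mean(c) - grand) ** 2 for c in colonies) / len(all_cells)
    return between / pvariance(all_cells)


def simulate(mode, n_colonies=200, colony_size=20, seed=0):
    """Toy model of colony formation by 'division' or 'aggregation'."""
    if mode not in ("division", "aggregation"):
        raise ValueError("mode must be 'division' or 'aggregation'")
    rng = random.Random(seed)
    colonies = []
    for _ in range(n_colonies):
        if mode == "division":
            founder = rng.gauss(0.0, 1.0)
            colonies.append([founder] * colony_size)  # clonal: all cells copy the founder
        else:
            # Aggregation: independent, unrelated free-living cells.
            colonies.append([rng.gauss(0.0, 1.0) for _ in range(colony_size)])
    return between_colony_share(colonies)
```

Under these assumptions, the divisional case places essentially all variance between colonies, whereas in the aggregative case the between-colony share is only about 1/colony_size, leaving most variation within colonies where cell-level selection, and potential cheating, can act.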

By contrast, aggregation-based multicellular organisms are formed by free-living individual cells, which may be quite different from one another. This mode of formation therefore increases variation, and thereby selection, within a multicellular colony. This increases the potential for the evolution of “cheating” strategies, which benefit an individual cell while hurting the colony. This might explain why aggregation-based groups failed to achieve complexity similar to that of division-based groups.

5.4 Future study

5.4.1 Incorporating more data

The two phylogenies chosen for the prokaryote PIC analyses had quite different topologies, but the PIC results were consistent with each other. Similar consistency was achieved across all three phylogenies tested in eukaryotes (two trees were shown). Such robustness adds confidence to the conclusions of this research, but further examination might lead to new discoveries. In particular, there is still uncertainty in the basal branching patterns of both prokaryotes and eukaryotes, so it will be worth repeating these tests on new phylogenies as they become available.

Similarly, new knowledge of microbial life cycles and genomes would allow this topic to be analyzed with more information. This is particularly true for transcription factors, because the TF counts in my current dataset are incomplete.

One portion of this study focused on the evolution of eukaryotic multicellularity, using PIC to reveal which kinds of branches could give rise to multicellular lineages. The same approach could be applied to prokaryotes in the future.
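A minimal sketch of Felsenstein's independent contrasts [7], written only to illustrate the computation that PIC performs; it is not the software used for the analyses reported here. The nested-tuple tree encoding, the function names, the four-taxon tree, and the trait values (`tf_count`, `cell_types`) are all hypothetical. Each internal node yields one contrast, the difference between its two children divided by the square root of their summed branch lengths; the contrasts of two traits are then related by regression through the origin, as described by Garland et al. [31].

```python
import math

# Hypothetical tree encoding: a leaf is (name, branch_length); an internal
# node is ((left_child, right_child), branch_length). Child branch lengths must be > 0.


def independent_contrasts(node, traits):
    """Felsenstein's (1985) standardized independent contrasts.

    traits maps trait name -> {leaf name -> value}.
    Returns (node_values, effective_branch_length, contrasts), where contrasts
    holds one {trait name -> standardized contrast} dict per internal node.
    """
    content, length = node
    if isinstance(content, str):  # leaf: observed values, no contrasts
        return {t: traits[t][content] for t in traits}, length, []
    left, right = content
    x_l, v_l, c_l = independent_contrasts(left, traits)
    x_r, v_r, c_r = independent_contrasts(right, traits)
    scale = math.sqrt(v_l + v_r)
    # Difference between children, scaled by its expected SD under Brownian motion.
    here = {t: (x_l[t] - x_r[t]) / scale for t in traits}
    # Inverse-variance weighted estimate of the ancestral value ...
    anc = {t: (x_l[t] / v_l + x_r[t] / v_r) / (1 / v_l + 1 / v_r) for t in traits}
    # ... whose branch is lengthened to reflect the uncertainty of that estimate.
    extra = v_l * v_r / (v_l + v_r)
    return anc, length + extra, c_l + c_r + [here]


def through_origin(contrasts, x, y):
    """Slope and correlation of contrasts in y on contrasts in x, through the origin."""
    sxy = sum(c[x] * c[y] for c in contrasts)
    sxx = sum(c[x] ** 2 for c in contrasts)
    syy = sum(c[y] ** 2 for c in contrasts)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Toy tree ((A,B),(C,D)) with unit branch lengths; trait values are made up.
ab = ((("A", 1.0), ("B", 1.0)), 1.0)
cd = ((("C", 1.0), ("D", 1.0)), 1.0)
tree = ((ab, cd), 0.0)
traits = {
    "tf_count": {"A": 10, "B": 30, "C": 50, "D": 90},
    "cell_types": {"A": 1, "B": 2, "C": 2, "D": 4},
}

_, _, cs = independent_contrasts(tree, traits)
slope, r = through_origin(cs, "tf_count", "cell_types")
print(f"{len(cs)} contrasts, slope = {slope:.4f}, r = {r:.3f}")
```

Because the sign of each contrast depends on which child is listed first, both traits must use the same child ordering; flipping the signs of both members of a pair leaves the through-origin slope and correlation unchanged.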

5.4.2 Prokaryotic multicellularity

Currently, three groups of multicellular prokaryotes are recognized: Cyanobacteria, Actinomyces, and Myxobacteria [57]. Among these, cyanobacteria are well known for multicellular structures that arise from cell division. By contrast, Myxobacteria behave like aggregative multicellular eukaryotes, alternating between free-living and aggregating phases of the life cycle. The third group, Actinomyces, contains some members that resemble multicellular fungi, while others remain unicellular. Moreover, these three groups are phylogenetically distant, indicating that they represent independent origins of multicellularity.

These instances of multicellularity were not included in this study because the sample of multicellular prokaryotes was too small. However, if more multicellular prokaryotes are discovered, it will be exciting to perform the same PIC analysis as in this dissertation. This could become possible through natural-history studies of prokaryotes (especially Archaea) that uncover new multicellular lineages, or by splitting any of the three current groups within which multicellularity arose more than once.

Based on the results of this study, I predict that multicellular prokaryotes originated from branches with more genes and transcription factors, the opposite of what I found in division-based multicellular eukaryotes. This prediction follows from Chapter 2 of this dissertation, which found that in prokaryotes, increases in cell type number are associated with increases in gene number, especially of regulatory genes. This remains a hypothesis, and I look forward to testing it.
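One way the prediction above could be tested is sketched here under stated assumptions: given an estimated trait value for every branch of a prokaryote tree (how those estimates are obtained is left open), compare the branches that gave rise to multicellular lineages with randomly drawn sets of branches of the same size. The function name, inputs, and values below are hypothetical, and this is not presented as the procedure used in this dissertation.

```python
import random
import statistics


def origin_randomization_test(branch_values, origin_ids, n_iter=10_000, seed=0):
    """One-sided randomization test for 'origin' branches (hypothetical helper).

    branch_values: mapping branch id -> trait value, e.g. an estimated
        ancestral gene or TF count at the base of each branch.
    origin_ids: ids of the branches that gave rise to multicellular lineages.
    Returns (observed_mean, p_value), where p_value estimates how often a random
    set of the same size has a mean at least as large as the observed one.
    """
    rng = random.Random(seed)
    ids = list(branch_values)
    k = len(origin_ids)
    observed = statistics.fmean(branch_values[b] for b in origin_ids)
    hits = 0
    for _ in range(n_iter):
        draw = rng.sample(ids, k)  # random branches, drawn without replacement
        if statistics.fmean(branch_values[b] for b in draw) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)


# Made-up values for ten branches; b8 and b9 stand in for origin branches.
values = [800, 950, 1100, 1300, 1450, 1600, 2100, 2300, 2600, 3000]
branches = {f"b{i}": float(v) for i, v in enumerate(values)}
obs, p = origin_randomization_test(branches, ["b8", "b9"])
print(f"observed mean = {obs:.0f}, one-sided p = {p:.3f}")
```

To test the opposite direction (origins from ancestors with fewer genes, as found here for division-based eukaryotes), the same function can be applied to negated values.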

5.4.3 Evolution of cooperation

Multicellularity is a form of cooperation, so studying how multicellular colonies evolve may shed light on the evolution of cooperation in general. Interestingly, in some animals, such as siphonophores, individuals form colonies that behave like a single organism. This resembles the construction of a multicellular organism, and the underlying mechanisms may also be similar.

For example, the independent origins of eusocial lineages could be examined within a PIC framework, paying particular attention to which branches give rise to eusociality. At the same time, insights from eusocial animals could benefit research on multicellularity.

5.5 Concluding remarks

Overall, this study assembled a novel dataset of cell types for 128 microbes, quantifying complexity by the number of life cycle stages, with a detailed justification for each species. Based on this dataset, phylogenetically independent contrasts showed that morphological complexity increased with genome complexity (especially its regulatory components) in prokaryotes, but not in eukaryotes.

The origins of multicellularity could also be traced through the relationship between morphological and genomic complexity. This study showed that, in my data, the most complex multicellular organisms came from the simplest ancestors in terms of both morphology and genome. Together, these findings contribute to our understanding of the evolution of morphology, genomic architecture, and multicellularity, and may prompt new ideas in these areas.

Bibliography

1. Gregory, T.R., The C-value enigma in plants and animals: a review of parallels and an appeal for partnership. Ann Bot, 2005. 95(1): p. 133-146.

2. Andrews, C.B., S.A. Mackenzie, and T.R. Gregory, Genome size and wing parameters in passerine birds. Proc Biol Sci, 2009. 276(1654): p. 55-61.

3. Shen, Y.Y., et al., Relaxation of selective constraints on avian mitochondrial DNA following the degeneration of flight ability. Genome Res, 2009. 19(10): p. 1760-1765.

4. Lynch, M. and J.S. Conery, The origins of genome complexity. Science, 2003. 302(5649): p. 1401-1404.

5. Lynch, M. and B. Walsh, The origins of genome architecture. 2007: Sinauer Associates Sunderland.

6. Whitney, K.D. and T. Garland, Jr., Did genetic drift drive increases in genome complexity? PLoS Genet, 2010. 6(8): p. e1001080.

7. Felsenstein, J., Phylogenies and the comparative method. American Naturalist, 1985: p. 1-15.

8. McCarthy, M.C. and B.J. Enquist, Organismal size, metabolism and the evolution of complexity in metazoans. Evolutionary Ecology Research, 2005. 7(5): p. 681-696.

9. Valentine, J.W., A.G. Collins, and C.P. Meyer, Morphological Complexity Increase in Metazoans. Paleobiology, 1994. 20(2): p. 131-142.

10. Haygood, R. and S.T.-N.Y. Investigators, Proceedings of the SMBE Tri-National Young Investigators' Workshop 2005. Mutation rate and the cost of complexity. Mol Biol Evol, 2006. 23(5): p. 957-63.

11. Goley, E.D., A.A. Iniesta, and L. Shapiro, Cell cycle regulation in Caulobacter: location, location, location. J Cell Sci, 2007. 120(Pt 20): p. 3501-7.

12. Ahnert, S.E., T.M. Fink, and A. Zinovyev, How much non-coding DNA do eukaryotes require? J Theor Biol, 2008. 252(4): p. 587-92.

13. Niklas, K.J. and S.A. Newman, The origins of multicellular organisms. Evol Dev, 2013. 15(1): p. 41-52.

14. Parfrey, L.W. and D.J. Lahr, Multicellularity arose several times in the evolution of eukaryotes (response to DOI 10.1002/bies.201100187). Bioessays, 2013. 35(4): p. 339-47.

15. Givskov, M., et al., Responses to nutrient starvation in Pseudomonas putida KT2442: analysis of general cross-protection, cell shape, and macromolecular content. J Bacteriol, 1994. 176(1): p. 7-14.

16. Nassif, X., et al., Interactions of pathogenic Neisseria with host cells. Is it possible to assemble the puzzle? Mol Microbiol, 1999. 32(6): p. 1124-32.

17. Oliva, B., et al., Morphological and biochemical variations of Haemophilus influenzae type b induced by pH and temperature changes. New Microbiol, 2001. 24(2): p. 117-24.

18. Lu, J., Y. Nogi, and H. Takami, Oceanobacillus iheyensis gen. nov., sp. nov., a deep-sea extremely halotolerant and alkaliphilic species isolated from a depth of 1050 m on the Iheya Ridge. FEMS Microbiol Lett, 2001. 205(2): p. 291-7.

19. Welch, R.W., F.B. Rudolph, and E.T. Papoutsakis, Purification and characterization of the NADH-dependent butanol dehydrogenase from Clostridium acetobutylicum (ATCC 824). Arch Biochem Biophys, 1989. 273(2): p. 309-18.

20. Beltra, R., J.R. De Lecea, and C. De La Rosa, A Comparative Study of Chemical Fractions Isolated from Agrobacterium tumefaciens and from its Stable L-Form. Microbiology, 1972. 73(1): p. 185-188.

21. Tully, J.G., et al., Evaluation of culture media for the recovery of Mycoplasma hominis from the human urogenital tract. Sex Transm Dis, 1983. 10(4 Suppl): p. 256-60.

22. Deeb, B.J. and G.E. Kenny, Characterization of Mycoplasma pulmonis variants isolated from rabbits. I. Identification and properties of isolates. J Bacteriol, 1967. 93(4): p. 1416-24.

23. Sako, Y., et al., Aeropyrum pernix gen. nov., sp. nov., a novel aerobic hyperthermophilic archaeon growing at temperatures up to 100 degrees C. Int J Syst Bacteriol, 1996. 46(4): p. 1070-7.

24. Thomas, C., D.J. Hill, and M. Mabey, Morphological changes of synchronized Campylobacter jejuni populations during growth in single phase liquid culture. Lett Appl Microbiol, 1999. 28(3): p. 194-8.

25. Croft, L.J., et al., Is prokaryotic complexity limited by accelerated growth in regulatory overhead? Genome Biology, 2003. 5(1): p. 1-26.

26. Hou, Y. and S. Lin, Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes. PLoS One, 2009. 4(9): p. e6978.

27. Wilson, D., et al., DBD––taxonomically broad transcription factor predictions: new content and functionality. Nucleic Acids Research, 2008. 36(suppl 1): p. D88-D92.

28. Fukami-Kobayashi, K., et al., A tree of life based on protein domain organizations. Mol Biol Evol, 2007. 24(5): p. 1181-9.

29. Lienau, E.K., et al., The mega-matrix tree of life: using genome-scale horizontal gene transfer and sequence evolution data as information about the vertical . , 2011. 27: p. 417-427.

30. Purvis, A. and A. Rambaut, Comparative analysis by independent contrasts (CAIC): an Apple Macintosh application for analysing comparative data. Comput Appl Biosci, 1995. 11(3): p. 247-51.

31. Garland, T., P.H. Harvey, and A.R. Ives, Procedures for the analysis of comparative data using phylogenetically independent contrasts. Systematic Biology, 1992. 41(1): p. 18-32.

32. Maddison, W.P. and D.R. Maddison, Mesquite: a modular system for evolutionary analysis. 2011. Version 2.75. 2010.

33. Midford, P., T. Garland Jr, and W. Maddison, PDAP Package of Mesquite. Version 1.16 (2011). 2010.

34. Garland, T., et al., Phylogenetic analysis of covariance by computer simulation. Systematic Biology, 1993. 42(3): p. 265-292.

35. Wickstead, B. and K. Gull, The evolution of the cytoskeleton. J Cell Biol, 2011. 194(4): p. 513-25.

36. Sinden, R.E., et al., Gametocyte and gamete development in Plasmodium falciparum. Proc R Soc Lond B Biol Sci, 1978. 201(1145): p. 375-99.

37. Dolan, T., Theileriasis: a comprehensive review. Revue Scientifique et Technique, Office International des épizooties, 1989. 8(1): p. 11-78.

38. Homer, M.J., et al., Babesiosis. Clin Microbiol Rev, 2000. 13(3): p. 451-69.

39. Olsen, O.W., Animal parasites: their life cycles and ecology. 1986: Courier Corporation.

40. Burki, F., et al., Phylogenomics reshuffles the eukaryotic supergroups. PLoS One, 2007. 2(8): p. e790.

41. NCBI Resource Coordinators, Database resources of the National Center for Biotechnology Information. Nucleic Acids Res, 2016. 44(Database issue): p. D7-D19.

42. de Mendoza, A., et al., Transcription factor evolution in eukaryotes and the assembly of the regulatory toolkit in multicellular lineages. Proc Natl Acad Sci U S A, 2013. 110(50): p. E4858-66.

43. Parfrey, L.W., et al., Broadly sampled multigene analyses yield a well-resolved eukaryotic tree of life. Syst Biol, 2010. 59(5): p. 518-33.

44. Derelle, R. and B.F. Lang, Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Mol Biol Evol, 2012. 29(4): p. 1277-89.

45. Katz, L.A. and J.R. Grant, Taxon-rich phylogenomic analyses resolve the eukaryotic tree of life and reveal the power of subsampling by sites. Syst Biol, 2015. 64(3): p. 406-15.

46. Bonner, J.T., The origins of multicellularity. Integrative Biology Issues News and Reviews, 1998. 1(1): p. 27-36.

47. Sharpe, S.C., et al., Timing the origins of multicellular eukaryotes through phylogenomics and relaxed molecular clock analyses, in Evolutionary Transitions to Multicellular Life. 2015, Springer. p. 3-29.

48. Stajich, J.E., et al., The fungi. Curr Biol, 2009. 19(18): p. R840-5.

49. Nitsche, F., et al., Higher level taxonomy and molecular phylogenetics of the Choanoflagellatea. J Eukaryot Microbiol, 2011. 58(5): p. 452-62.

50. Carr, M., et al., Molecular phylogeny of choanoflagellates, the sister group to Metazoa. Proc Natl Acad Sci U S A, 2008. 105(43): p. 16641-16646.

51. Timme, R.E., T.R. Bachvaroff, and C.F. Delwiche, Broad phylogenomic sampling and the sister lineage of land plants. PLoS One, 2012. 7(1): p. e29696.

52. Yoon, H.S., et al., Defining the major lineages of red algae (Rhodophyta). Journal of Phycology, 2006. 42(2): p. 482-492.

53. Sebe-Pedros, A., et al., Regulated aggregative multicellularity in a close unicellular relative of metazoa. Elife, 2013. 2: p. e01287.

54. Sebe-Pedros, A., et al., Unexpected repertoire of metazoan transcription factors in the unicellular holozoan Capsaspora owczarzaki. Mol Biol Evol, 2011. 28(3): p. 1241-1254.

55. Brown, M.W., F.W. Spiegel, and J.D. Silberman, Phylogeny of the "forgotten" cellular slime mold, Fonticula alba, reveals a key evolutionary branch within Opisthokonta. Mol Biol Evol, 2009. 26(12): p. 2699-2709.

56. Myre, M.A., Clues to gamma-secretase, huntingtin and Hirano body normal function using the model organism Dictyostelium discoideum. J Biomed Sci, 2012. 19(1): p. 41.

57. Brown, M.W. and J.D. Silberman, The non-dictyostelid sorocarpic amoebae, in Dictyostelids: Evolution, Genomics and Cell Biology, Romeralo, Baldauf, and Escalante, Editors. 2013, Springer Berlin Heidelberg. p. 219-242.

58. Blanton, R. and L. Olive, Ultrastructure of aerial stalk formation by the ciliated protozoan Sorogena stoianovitchae. Protoplasma, 1983. 116(2): p. 125-135.

59. Brown, M.W., et al., Aggregative multicellularity evolved independently in the eukaryotic supergroup Rhizaria. Curr Biol, 2012. 22(12): p. 1123-7.

60. Tice, A.K., et al., Sorodiplophrys stercorea: Another Novel Lineage of Sorocarpic Multicellularity. Journal of Eukaryotic Microbiology, 2016. 63(5): p. 623-628.

61. Gao, F. and L.A. Katz, Phylogenomic analyses support the bifurcation of ciliates into two major clades that differ in properties of nuclear division. Mol Phylogenet Evol, 2014. 70: p. 240-243.

62. Niklas, K.J., E.D. Cobb, and A.K. Dunker, The number of cell types, information content, and the evolution of complex multicellularity. Acta Societatis Botanicorum Poloniae, 2014. 83(4): p. 337-347.

Appendix A

Cell Type Number of Prokaryotes

Table A.1 Cell type number of 83 prokaryotes

Species | Cell type | Ref. | Justification

Aeropyrum pernix | 1 | [23] | Irregular cocci, may have pili-like appendages but no flagella

Agrobacterium tumefaciens | 2 | [20] | L-form under glucose agar plus the normal form

Aquifex aeolicus | 1 | [63] | Non-spore-forming rod-shaped bacteria

Archaeoglobus fulgidus | 2 | [64] | “Nonmotile, irregular coccoid to disc shaped, and 0.3 to 1.0 μm wide”

Bacillus halodurans | 2 | [65] | Flagellated motile rod, spore-forming

Bacillus subtilis | 5 | [66] | Motile, matrix-producing, sporulating, and competent cells, plus spore

Bifidobacterium longum | 2 | [67] | "Cell shapes ranged from long and thick–rods with protuberances to long and thin–rods with blunted ends"

Borrelia burgdorferi | 3 | [68] | Spirochetes, round bodies, and biofilm-like colonies: Cell-Wall Deficient (CWD)

Bradyrhizobium japonicum | 2 | [69] | Within the root nodule, the free-living bacterium divides and differentiates into a bacteroid

Brucella melitensis | 1 | [70] | Uncapsulated, nonmotile, non-spore-forming coccobacilli

Brucella suis | 1 | [70] | Uncapsulated, nonmotile, non-spore-forming coccobacilli

Buchnera aphidicola | 1 | [71] | Round or slightly oval cells lacking flagella; no morphologically distinct resting stage or endospores

Campylobacter jejuni | 4 | [24] | Helical form in logarithmic phase; cell elongated in stationary phase; long rod-shaped and coccoid in decline phase (budding not included)

Caulobacter crescentus | 3 | [72] | Stalked cell, predivisional cell with a flagellum and pili, and motile swarmer cell

Chlamydia muridarum | 2 | [73] | The elementary bodies are electron-dense, circular, and smaller; the reticulate-like bodies are less electron-dense and have a less rigid outer membrane; formerly called strain SFPD and grouped as a strain of C. trachomatis

Chlamydia trachomatis | 2 | [74] | Elementary body and reticulate body

Chlamydophila pneumoniae | 2 | [75] | Elementary body and reticulate body

Chlorobium tepidum | 1 | [76] | Non-motile rods

Clostridium acetobutylicum | 3 | [19] | Vegetative rod, clostridial form with capsule, and spore

Clostridium perfringens | 2 | [77] | Vegetative rod and spore

Clostridium tetani | 2 | [78] | Vegetative rod and endospore

Corynebacterium efficiens | 1 | [79] | Non-motile, non-spore-forming rod

Deinococcus radiodurans | 1 | [80] | Spherical, non-motile, non-spore-forming, 1.5-3.5 μm in diameter

Escherichia coli | 2 | [81] | "A life cycle for E. coli that involves two phenotypically different cell types in two different environments"

Fusobacterium nucleatum | 1 | [82, 83] | [82] Non-spore-forming, non-motile, no flagellum; [83] spindle form, may branch

Haemophilus influenzae | 2 | [17] | Cell elongated up to 10 times under stressful temperature or pH

Halobacterium sp. | 1 | [84] | Rod shaped

Helicobacter pylori | 3 | [85] | Helical bacillus with multiple flagella at one end in lag and exponential phases, U-shaped precoccoid in the early-decline phase, and coccoid (no flagella) in the late-decline phase

Lactobacillus plantarum | 1 | [86] | "Non-motile, non-spore-forming straight rods with rounded ends, usually 0.9-1.2 μm by 3-8 μm, that occur singly, in pairs or in short chains."

Lactococcus lactis | 1 | [87] | Non-motile, non-spore-forming, 0.5-1.5 μm ovoid shape

Leptospira interrogans | 1 | [88] | Flexible, spiral-shaped, has internal flagella

Listeria innocua | 2 | [89] | Flagellated at room temperature, non-flagellated at 37°C

Listeria monocytogenes | 2 | [90, 91] | [90] Flagellated at 20°C, non-flagellated at 37°C; [91] elongated and flagella lost under stress

Methanobacterium thermoautotrophicum | 1 | [92, 93] | [92] Non-motile, non-spore-forming, no flagella; [93] rod at the optimal temperature (65°C); at 75°C the cell wall becomes thicker and abnormal; at 45°C the cell becomes spheroidal with a wall up to 5 times thicker

Methanococcus jannaschii | 1 | [94] | "Only one cell type observed"

Methanopyrus kandleri | 1 | [95] | "Gram-positive motile rods, occurring singly and in chains. No spores formed." Micro-cells (10% of the length of normal cells) may be present, but whether they are stunted cells or an intentional differentiation is unknown

Methanosarcina acetivorans | 2 | [96] | Coccus and communal cyst

Methanosarcina mazei | 5 | [97] | Cocci in complex cycle, resting form in limit cycle, 3 staining types in sarcinal colony of both cycles

Mycobacterium leprae | 2 | [98] | Straight or curved nonmotile rods, 1-8 μm in length and 0.2-0.5 μm in width

Mycobacterium tuberculosis | 4 | [99] | Typical rod shape plus cell-wall-free phase to escape host ; during the latter phase, filamentous and spherical forms appeared (V and Y shapes may be formed because of budding)

Mycoplasma genitalium | 1 | [21] | Mostly flask shape; stages of rounding, blebbing, vacuolization, and disruption appear as cultures age

Mycoplasma penetrans | 1 | [100] | "Most organisms were rod-like with flask-shaped features. Only occasional branching structures were found."

Mycoplasma pneumoniae | 2 | [101] | Rounded organisms in early incubation, filamentous forms appear later

Mycoplasma pulmonis | 2 | [22] | A predominance of filamentous and budding forms in growth phase, mainly swollen round forms in death phase

Neisseria meningitidis | 2 | [16, 102] | [102] Non-spore-forming diplococcus; [16] pili are present and then lost during the life cycle

Oceanobacillus iheyensis | 3 | [18] | Predominantly rod-shaped, filamentous forms also present; forms endospores

Pasteurella multocida | 1 | [103] | Non-motile, non-spore-forming, rod-shaped

Pseudomonas aeruginosa | 6 | [104] | Five sessile stages plus planktonic stage

Pseudomonas putida | 2 | [15] | "Cells are long and cylindrical during exponential growth"; when starved, cells change into round or coccoid shapes, accompanied by shrinking up to 1/5 of the original size

Pyrobaculum aerophilum | 3 | [105] | Cylinder-shaped rods; cells form a terminal sphere (like a golf club) in stationary phase, at high nitrite, or at pH > 8; spheres can enlarge until the cell becomes a coccus

Pyrococcus abyssi | 2 | [106] | Highly motile coccus; atypical sizes and shapes occurred under stress

Pyrococcus furiosus | 1 | [107] | Slightly irregular cells, monopolar polytrichous flagellated

Pyrococcus horikoshii | 1 | [108] | Irregular cocci of 0.8-2 μm with a polar tuft of flagella

Ralstonia solanacearum | 3 | [109] | Normal cells have a bacillar form; under starvation, coccoid and filamentous forms appear

Rickettsia prowazekii | 2 | [110] | Several sections revealed chains of rickettsiae; large, pleomorphic forms were also encountered; these were extremely irregular in shape but possessed the same structural components

Salmonella typhimurium | 2 | [111] | Rod shaped; appendages formed when contacting host cells

Shewanella oneidensis | 2 | [112] | Filamentous at 3°C, short rod at 22°C

Shigella flexneri | 1 | [113] | "It is a small, uncapsulated, non-motile Gram negative nonsporulating, facultative anaerobic "

Sinorhizobium meliloti | 2 | [114] | Free-living bacterium and symbiotic bacteroid

Staphylococcus aureus | 1 | [115, 116] | [115] Non-spore-forming; [116] "Spherical bacterium (coccus)"

Staphylococcus epidermidis | 1 | [117] | Cells are cocci, 0.5-1.0 μm in diameter, arranged mostly in pairs and tetrads

Streptococcus agalactiae | 1 | [118] | Non-motile and non-spore-forming

Streptococcus mutans | 1 | [119, 120] | [119] Non-spore-forming; [120] non-motile

Streptococcus pneumoniae | 1 | [121] | Non-motile, nonsporulating diplococcus, 0.5-1.25 μm in diameter

Streptococcus pyogenes | 1 | [121] | Non-motile, nonsporulating round-to-ovoid coccus, 0.6-1 μm in diameter

Streptomyces avermitilis | 5 | [122] | Endospore, forespore, and exospore, plus aerial and substrate mycelia

Streptomyces coelicolor | 6 | [123, 124] | [123] 2 stages in both aerial and substrate growth, plus spore; [124] spore-shaped body formed in certain media

Sulfolobus solfataricus | 2 | [125] | Irregular cocci form pili and aggregate under UV (but not other stresses), and this may be triggered by double-strand breaks

Sulfolobus tokodaii | 1 | [126] | Cells are irregular cocci, variable in size (usually 0.5–0.8 μm in diameter), with some flattened and uneven surfaces

Synechocystis sp. | 1 | [127] | Spherical shape

Thermoanaerobacter tengcongensis | 1 | [128] | Nonmotile, nonsporulating rods, 1-10 μm long, 0.5-0.6 μm wide

Thermoplasma acidophilum | 1 | [129] | Irregular cell without cell wall; some strains contain crystal-like (inside) or fibrous structures (on surface)

Thermoplasma volcanium | 1 | [130] | Irregular shape

Thermotoga maritima | 2 | [131] | Motile due to a single flagellum; rods become spheres during the stationary growth phase

Treponema pallidum | 1 | [132] | Spiral shape, 6-15 μm long and 0.2 μm wide

Tropheryma whipplei | 2 | [133] | "An intracellular form located within of infected cells and an extracellular form that appeared as massive aggregates of twist bacteria embedded in an extracellular matrix"

Ureaplasma urealyticum | 2 | [134] | Cells single or in pairs; short (not long) filaments can be seen; no long chains of cocci; filaments may arise because "cytoplasmic division lags behind genome replication"

Vibrio cholerae | 1 | [135] | Curved rod with a flagellum

Vibrio vulnificus | 2 | [136] | Cells change from rod to coccus shape at low temperature

Wigglesworthia brevipalpis | 1 | [137] | Rod shape without flagella, 4-5 µm long

Xanthomonas campestris | 1 | [138] | Nonsporulating, rod shaped

Xylella fastidiosa | 2 | [139] | Predominantly single, straight rods, with long filamentous strands under some culture conditions

Yersinia pestis | 2 | [140, 141] | [140] No flagella, forms envelope at 37°C; [141] from 26°C to 37°C, cell mass quadruples and cell length doubles

Appendix B

Genomic Data of Prokaryotes

Table B.1 Genomic information of 83 prokaryotes

Species | Genome size (kb) | Coding genes | Total genes | Coding percent | Regulatory genes | TFs

Aeropyrum pernix | 1669.7 | 2694 | 2751 | 89.17 | 19 | 39

Agrobacterium tumefaciens | 5673.5 | 5301 | 5366 | 89.6 | 392 | 319

Aquifex aeolicus | 1590.8 | 1553 | 1603 | 92.68 | 39 | 29

Archaeoglobus fulgidus | 2178.4 | 2407 | 2456 | 91.2 | 85 | 88

Bacillus halodurans | 4202.4 | 4066 | 4172 | 86.05 | 252 | 242

Bacillus subtilis | 4214.6 | 4106 | 4225 | 88.3 | 247 | 238

Bifidobacterium longum | 2260.3 | 1729 | 1802 | 86.43 | 78 | 90

Borrelia burgdorferi | 1519.9 | 1639 | 1677 | 86.04 | 12 | 4

Bradyrhizobium japonicum | 9105.8 | 8317 | 8371 | 86.76 | 560 | 522

Brucella melitensis | 3294.9 | 3198 | 3264 | 86.67 | 169 | 167

Brucella suis | 3315.2 | 3273 | 3337 | 84.98 | 164 | 159

Buchnera aphidicola | 654.2 | 555 | 590 | 84.14 | 7 | 4

Campylobacter jejuni | 1641.5 | 1654 | 1707 | 95.41 | 27 | 23

Caulobacter crescentus | 4016.9 | 3737 | 3794 | 90.56 | 237 | 202

Chlamydia muridarum | 1080.5 | 911 | 954 | 90.58 | 10 | 3

Chlamydia trachomatis | 1042.5 | 894 | 937 | 90.83 | 10 | 4

Chlamydophila pneumoniae | 1229.9 | 1110 | 1151 | 89.22 | 10 | 4

Chlorobium tepidum | 2154.9 | 2252 | 2308 | 88.46 | 34 | 26

Clostridium acetobutylicum | 4132.9 | 3848 | 3955 | 86.93 | 221 | 211

Clostridium perfringens | 3085.7 | 2723 | 2849 | 85.1 | 121 | 116

Clostridium tetani | 2873.3 | 2432 | 2504 | 85.96 | 95 | 111

Corynebacterium efficiens | 3219.5 | 2998 | 3069 | 90.83 | 121 | 133

Deinococcus radiodurans | 3284.2 | 3102 | 3159 | 87.71 | 99 | 95

Escherichia coli | 4738.8 | 4359 | 4468 | 88.53 | 275 | 266

Fusobacterium nucleatum | 2174.5 | 2067 | 2129 | 90.05 | 53 | 55

Haemophilus influenzae | 1830.1 | 1709 | 1785 | 87.24 | 63 | 63

Halobacterium sp. | 2571 | 2605 | 2657 | 85.2 | 69 | 67

Helicobacter pylori | 1643.8 | 1491 | 1531 | 90.69 | 9 | 8

Lactobacillus plantarum | 3308.3 | 3051 | 3137 | 85.25 | 183 | 235

Lactococcus lactis | 2365.6 | 2266 | 2346 | 85.88 | 106 | 145

Leptospira interrogans | 4691.2 | 4725 | 4766 | 78.18 | 117 | 65

Listeria innocua | 3093.1 | 3061 | 3145 | 89.97 | 175 | 194

Listeria monocytogenes | 2731.8 | 2832 | 2888 | 83.64 | 177 | 203

Methanobacterium thermoautotrophicum | 1751.4 | 1846 | 1891 | 91.12 | 45 | 42

Methanococcus jannaschii | 1739.9 | 1770 | 1813 | 87.23 | 28 | 44

Methanopyrus kandleri | 1695 | 1691 | 1734 | 89.09 | 15 | 23

Methanosarcina acetivorans | 5751.5 | 4540 | 4609 | 74.28 | 109 | 191

Methanosarcina mazei | 4096.3 | 3371 | 3438 | 75.83 | 76 | 115

Mycobacterium leprae | 3268.2 | 2720 | 2768 | 76.87 | 43 | 41

Mycobacterium tuberculosis | 4403.8 | 4189 | 4238 | 90.37 | 160 | 168

Mycoplasma genitalium | 580.1 | 480 | 519 | 91.47 | 2 | 3

Mycoplasma penetrans | 1358.6 | 1037 | 1072 | 89.09 | 16 | 12

Mycoplasma pneumoniae | 816.4 | 688 | 728 | 88.69 | 2 | 3

Mycoplasma pulmonis | 963.9 | 782 | 814 | 90.73 | 4 | 3

Neisseria meningitidis | 2272.4 | 2025 | 2097 | 78.7 | 40 | 41

Oceanobacillus iheyensis | 3630.5 | 3496 | 3588 | 85.38 | 183 | 176

Pasteurella multocida | 2257.5 | 2014 | 2090 | 90.34 | 68 | 60

Pseudomonas aeruginosa | 6264.4 | 5566 | 5642 | 89.59 | 484 | 435

Pseudomonas putida | 6181.9 | 5350 | 5445 | 87.19 | 353 | 388

Pyrobaculum aerophilum | 2222.4 | 2605 | 2655 | 88.78 | 33 | 61

Pyrococcus abyssi Orsay | 1765.1 | 1784 | 1881 | 92.04 | 40 | 72

Pyrococcus furiosus | 1908.3 | 2065 | 2115 | 91.38 | 39 | 80

Pyrococcus horikoshii | 1738.5 | 2061 | 2112 | 91.07 | 33 | 70

Ralstonia solanacearum | 5810.9 | 5120 | 5189 | 87.78 | 345 | 208

Rickettsia prowazekii | 1111.5 | 834 | 870 | 76.03 | 12 | 9

Salmonella typhimurium | 4951.4 | 4554 | 4692 | 87.28 | 304 | 290

Shewanella oneidensis | 5131.4 | 4778 | 4908 | 83.89 | 233 | 243

Shigella flexneri | 4599.4 | 4073 | 4195 | 78.05 | 216 | 217

Sinorhizobium meliloti | 6691.7 | 6205 | 6270 | 86.25 | 449 | 217

Staphylococcus aureus | 2841.1 | 2659 | 2738 | 84.6 | 98 | 122

Staphylococcus epidermidis | 2564.6 | 2485 | 2561 | 84.13 | 72 | 95

Streptococcus agalactiae | 2160.3 | 2124 | 2225 | 88.14 | 102 | 111

Streptococcus mutans | 2030.9 | 1960 | 2040 | 87.06 | 111 | 131

Streptococcus pneumoniae | 2038.6 | 2043 | 2113 | 87.57 | 80 | 99

Streptococcus pyogenes | 1852.4 | 1696 | 1774 | 85.34 | 78 | 89

Streptomyces avermitilis | 9119.9 | 7673 | 7761 | 86.39 | 617 | 605

Streptomyces coelicolor | 9054.8 | 8215 | 8299 | 89.13 | 704 | 720

Sulfolobus solfataricus | 2992.2 | 2994 | 3050 | 84.32 | 67 | 115

Sulfolobus tokodaii | 2694.8 | 2826 | 2878 | 83.36 | 68 | 111

Synechocystis sp. | 3947 | 3564 | 3614 | 86.69 | 111 | 84

Thermoanaerobacter tengcongensis | 2689.4 | 2588 | 2652 | 87.42 | 115 | 111

Thermoplasma acidophilum | 1564.9 | 1478 | 1549 | 87.02 | 28 | 51

Thermoplasma volcanium | 1584.8 | 1526 | 1575 | 86.19 | 26 | 44

Thermotoga maritima | 1860.7 | 1846 | 1895 | 93.88 | 59 | 53

Treponema pallidum | 1138 | 1031 | 1082 | 93.15 | 14 | 6

Tropheryma whipplei | 927.3 | 808 | 861 | 86.34 | 6 | 9

Ureaplasma urealyticum | 751.7 | 611 | 647 | 92.43 | 3 | 4

Vibrio cholerae | 4033.5 | 3828 | 3951 | 87.42 | 223 | 205

Vibrio vulnificus | 5126.8 | 4537 | 4676 | 85.73 | 308 | 268

Wigglesworthia brevipalpis | 697.7 | 611 | 651 | 88.47 | 12 | 9

Xanthomonas campestris pv. campestris | 5076.2 | 4181 | 4241 | 84.8 | 225 | 181

Yersinia pestis | 4600.8 | 4090 | 4187 | 83.19 | 196 | 191

Genome size, coding genes, total genes, and coding percent are from Hou & Lin [26]; regulatory genes are from Croft et al. [25]; TF numbers were collected from the DBD online database [27].

Appendix C

Cell Type Number of Eukaryotes

Table C.1 Cell type number of 45 eukaryotes

Species | Cell type | Ref. | Justification

Acanthamoeba castellanii | 2 | [142] | An actively feeding, dividing trophozoite and a dormant cyst

Aureococcus anophagefferens | 1 | [143, 144] | [143] Spherical cell without flagellum; [144] cyst or resting stage unknown

Babesia bovis | 9 | [38] | Zygote (ookinete), sporoblast, sporozoite, schizont, merozoite, trophozoite, gametocyte, and two gametes (one has Strahlenkörper, which are kept in the zygote to enter the host cell; the other does not)

Batrachochytrium dendrobatidis | 4 | [145] | Zoospore (flagellated), encysted (has cell wall and resorbed flagellum), young / germling (has branching rhizoids), sporangium / thallus (about to cleave into zoospores). Sexual stage not found

Blastocystis hominis | 6 | [146] | Vacuolar, granular, multivacuolar, avacuolar, amoeboid, and cyst

Candida albicans | 4 | [147] | Spheroidal yeast element, elongated ovoid cell, pseudohypha element, and hyphal element

Capsaspora owczarzaki | 3 | [53] | Filopodial stage (amoebae with filopodia), cystic stage (rounded, without filopodia), aggregative stage (without filopodia)

Chlamydomonas reinhardtii | 2 | [148, 149] | [148] Vegetative cells and gametes are similar (counted as one), plus zygospore (hypnozygote, a dormant form without flagellum); [149] the 2 types of gametes (mating types) are indistinguishable (heterothallic isogamous)

Chlorella variabilis | 2 | [150] | Vegetative cell and autospore

Cryptococcus neoformans | 6 | [151] | Chlamydospore, basidium, basidiospore, yeast cell, hyphal form, blastospore

Cryptosporidium parvum | 12 | [152, 153] | [152] Sporozoite, trophozoite, 2 types of meronts, merozoite, 2 gamonts, zygote, thick and thin cysts; [153] the 2 types of merozoites have different morphology, plus an extracellular gamont-like stage

Cyanidioschyzon merolae | 1 | [154] | Long oval shape, 3-4 µm long, 1-1.5 µm wide at the stiff end

Dictyostelium discoideum | 6 | [155] | Spore, myxamoebae, prestalk, prespore, stalk, macrocyst/giant cell (Schaap: personal communication)

Eimeria tenella | 12 | [39] | Sporozoite, trophozoite, schizont, 1st and 2nd generations of merozoites, unsporulated oocyst, sporulated oocyst, sporoblast/sporocyst, macro- and microgametes, macro- and microgametocytes

Emiliania huxleyi | 3 | [156] | Coccolithophore; non-motile coccolith-bearing cells (c-cells), naked non-motile cells (n-cells), and motile scale-bearing swarmers (s-cells, gametes)

Encephalitozoon cuniculi | 5 | [157, 158] | [157] Sporont, sporoblast, spore; [158] sporoplasm and meront

Entamoeba histolytica | 2 | [159] | Trophozoite and cyst

Galdieria sulphuraria | 1 | [154] | Subspherical cells, 3-4 µm in diameter; form autospores, but these are not a different cell type

Giardia intestinalis / lamblia | 2 | [160] | Trophozoite and cyst; the excyzoite is not counted

Guillardia theta | 1 | [161] | 7-11 µm long, 4-6 µm deep, 4.5-6.5 µm wide, slightly dorsoventrally compressed
Leishmania major | 7 | [162] | Amastigote, paramastigote, and procyclic, nectomonad, leptomonad, metacyclic, and haptomonad promastigotes
Micromonas pusilla | 1 | [163] | "Cells 1-3 µm long, with a single posterior flagellum... Sexual reproduction unknown."
Monosiga brevicollis | 2 | [164] | Sessile trophic cell and zoospore (Karpov: personal communication)
Naegleria gruberi | 3 | [165] | Amoeboid form, flagellate form, and cyst
Nannochloropsis gaditana | 2 | [166, 167] | [166] Vegetative/mother cell (the mother cell is larger than the vegetative cell but similar in shape) and autospore; [167] reproduction is exclusively by binary fission; no zoospore or sexual reproduction detected
Neospora caninum | 4 | [168] | Tachyzoite, bradyzoite, oocyst, and sporocyst
Ostreococcus tauri | 1 | [169] | Coccoid cell 1 µm long, with no flagella or body scales
Sterkiella histriomuscorum / Oxytricha trifallax | 4 | [170] | Starved, encysted, excysting, and vegetative cells
Paramecium tetraurelia | 1 | [171] | Non-cyst-forming
Perkinsus marinus | 3 | [172] | Trophozoites or aplanospores, prezoosporangia or hypnospores, and planonts or zoospores
Phaeodactylum tricornutum | 5 | [173, 174] | [173] Three morphotypes: fusiform, triradiate, and oval (oval may be the resting stage); a rounded cell is also detected under stress or during oval-cell aggregation; size remains after …; no sexual reproduction known; [174] another morphotype, cruciform, was confirmed
Phytophthora sojae | 8 | [175, 176] | [175] Zoospore, cyst, mycelium (hyphae), and sporangium (supported by a sporangiophore, which is not counted as a cell type); [176] oospore, oogonium, antheridium, and chlamydospore
Plasmodium falciparum | 11 | [36, 177] | [177] Sporozoite, merozoite, trophozoite, schizont, 2 gametocytes, ookinete, and oocyst; [36] macro- and microgametes, and zygote
Reticulomyxa filosa | 3 | [178] | Somatic (vegetative) cell and two types of resting stage: an uncovered stage (dispersed by water), formed throughout the year under deteriorating conditions, and a covered/walled cyst (dispersed by water and air), formed additionally in summer
Salpingoeca rosetta | 5 | [179] | "Three solitary cell types (slow swimmers, fast swimmers, and thecate cells) and two colonial forms (rosettes and chains)…" Rosettes are similar to slow swimmers but stain differently; there may be additional cryptic types
Schizosaccharomyces pombe | 1 | [180] | One cell type, cylindrical in shape
Tetrahymena thermophila | 1 | [181] | T. pyriformis does not form cysts; the trophozoite is simply the vegetative cell
Thalassiosira pseudonana | 2 | [182] | Somatic cell and auxospore
Theileria parva | 11 | [37] | Sporozoite, macroschizont, microschizont, merozoite, piroplasm, 2 gametes, zygote, motile kinete, sporoblast, and …
Toxoplasma gondii | 14 | [183-185] | [183] Tachyzoite, bradyzoite/cyst, sporozoite, oocyst, and schizonts A-E; [184] 2 gametocytes and 2 gametes; [185] sporocyst
Trachipleistophora hominis | 5 | [186] | Meront, sporont, sporoblast, spore, and sporoplasm
Trichomonas vaginalis | 1 | [187] | Trophozoite divides by binary fission; no cyst
Trypanosoma cruzi | 4 | [188] | Metacyclic trypomastigote, amastigote, bloodstream trypomastigote, and epimastigote
Ustilago maydis | 4 | [189] | Teliospore, unicellular haploid form, filamentous dikaryon, and cylindrical fragmented hyphal cell (Banuett: personal communication)
Volvox carteri | 7 | [190] | Males and females are genetically but not morphologically distinct. Somatic cell, bisexual gonidium, androgonidia, overwintering coated zygote (zygospore), germling (flagellated zoospore), plus macro- and microgametes

The cell type numbers for Leishmania major and Tetrahymena thermophila were taken from their respective sister taxa because cell type information for these species was lacking.

Appendix D

Genomic Data of Eukaryotes

Table D.1 Genomic information of 45 eukaryotes

Species | Genome size (kb) | Proteins | Genes | GC content (%) | TFs
Acanthamoeba castellanii | 101080 | 35520 | 36515 | 58.4 |
Aureococcus anophagefferens | 56660 | 11520 | 11522 | 69.5 | 93
Babesia bovis | 8170 | 3703 | 3773 | 41.6 |
Batrachochytrium dendrobatidis | 24320 | 8700 | 8700 | 39.3 | 145
Blastocystis hominis | 18820 | 6020 | 6020 | 45.2 |
Candida albicans | 27560 | 14217 | 14217 | 33.4 | 185
Capsaspora owczarzaki | 44430 | 14453 | 14793 | 53.8 |
Chlamydomonas reinhardtii | 120190 | 14412 | 14354 | 63.9 | 213
Chlorella variabilis | 46160 | 9780 | 9780 | 67.1 | 133
Cryptococcus neoformans | 19057 | 6475 | 6617 | 48.6 | 194
Cryptosporidium parvum | 9100 | 3805 | 3887 | 30.2 | 30
Cyanidioschyzon merolae | 16537 | 4803 | 6170 | 54.9 | 66
Dictyostelium discoideum | 34130 | 13267 | 13892 | 22.4 | 116
Eimeria tenella | 51860 | 8599 | 8657 | 51.3 |
Emiliania huxleyi | 167680 | 38554 | 38549 | 65.7 | 420
Encephalitozoon cuniculi | 2498 | 1996 | 2029 | 47.3 | 35
Entamoeba histolytica | 20840 | 8163 | 8342 | 24.3 | 69
Galdieria sulphuraria | 13710 | 7174 | 6723 | 37.7 |
Giardia intestinalis | 11210 | 6502 | 6583 | 49.2 | 24
Guillardia theta | 87150 | 24822 | 24923 | 53.1 |
Leishmania major | 32834 | 8316 | 9686 | 60.3 | 27
Micromonas pusilla | 21960 | 10242 | 10248 | 65.9 | 107
Monosiga brevicollis | 41630 | 9171 | 9174 | 54.4 | 151
Naegleria gruberi | 40960 | 15711 | 16620 | 33.1 | 142
Nannochloropsis gaditana | 33990 | 3554 | 3558 | 54.3 |
Neospora caninum | 57550 | 6936 | 7084 | 54.8 |
Ostreococcus tauri | 12920 | 7662 | 7668 | 59.4 | 78
Paramecium tetraurelia | 72090 | 39580 | 39581 | 28.1 | 481
Perkinsus marinus | 86610 | 23654 | 29745 | 47.4 |
Phaeodactylum tricornutum | 25050 | 9488 | 9479 | 48.9 | 131
Phytophthora sojae | 82600 | 26489 | 28142 | 54.6 | 197
Plasmodium falciparum | 23266 | 5337 | 5512 | 20.4 | 18
Reticulomyxa filosa | 76420 | 39963 | 39965 | 35 |
Salpingoeca rosetta | 55440 | 11706 | 11787 | 56 |
Schizosaccharomyces pombe | 12589 | 5133 | 6991 | 34.6 | 100
Sterkiella histriomuscorum / Oxytricha trifallax | 37430 | 810 | 810 | 28.4 |
Tetrahymena thermophila | 103010 | 24725 | 24725 | 22.3 | 119
Thalassiosira pseudonana | 32440 | 11673 | 11771 | 46.9 | 169
Theileria parva | 8350 | 4061 | 4141 | 34 | 15
Toxoplasma gondii | 63000 | 7987 | 8155 | 52.3 | 23
Trachipleistophora hominis | 8500 | 3212 | 3253 | 34.1 |
Trichomonas vaginalis | 176420 | 59679 | 60815 | 32.8 | 1500
Trypanosoma cruzi | 89940 | 19607 | 25183 | 51.7 | 43
Ustilago maydis | 19680 | 6522 | 6631 | 53.8 | 192
Volvox carteri | 137680 | 14436 | 14437 | 56 | 148

TF numbers were obtained from the DBD online database [27] and de Mendoza et al. [42]; all other information was obtained from NCBI [41].

Appendix E

Branch Values for Resampling Test

Table E.1 Branch values for resampling test

Node | Cell type | Genome size (kb) | Protein | Gene | GC content (%) | TF
2 | 3.67 | 47015 | 15995 | 16546 | 44.8 | 171
3 | 3.71 | 30831 | 10559 | 10924 | 45.5 | 143
5 | 3.9 | 24321 | 8664 | 9016 | 43.9 | 137
6 | 3.88 | 22885 | 8313 | 8693 | 43.3 | 143
7 | 3.84 | 20862 | 7806 | 8275 | 43.2 | 151
8 | 3.09 | 16771 | 7307 | 8180 | 38.5 | 146
9 | 1 | 12590 | 5133 | 6991 | 36 | 100
12 | 4 | 14470 | 8689 | 9219 | 33.5 | 185
13 | 4.49 | 20077 | 7084 | 7365 | 47.6 | 174
15 | 6 | 19050 | 6475 | 6617 | 48.5 | 194
17 | 4 | 19860 | 6548 | 6671 | 53.7 | 192
20 | 4 | 24130 | 8700 | 8700 | 39.3 | 145
21 | 4.81 | 8835 | 3678 | 3770 | 41.3 |
22 | 5 | 8500 | 3212 | 3253 | 34.1 |
23 | 5 | 2500 | 1996 | 2029 | 47.3 | 35
25 | 3.12 | 36980 | 11674 | 11907 | 51.4 | 138
27 | 2 | 27970 | 14453 | 14793 | 53.8 | 143
31 | 3.37 | 44710 | 10869 | 10979 | 53.3 | 133
32 | 2 | 41710 | 9203 | 9233 | 54.8 | 151
33 | 5 | 55440 | 11731 | 11798 | 53.8 | 109
34 | 3.66 | 50174 | 17056 | 17643 | 44.7 | 177
35 | 3.31 | 36220 | 19798 | 20288 | 38.9 | 148
36 | 3.64 | 32013 | 15433 | 15741 | 31.4 | 121
38 | 6 | 34210 | 13315 | 13361 | 22.5 | 116
40 | 2 | 20770 | 8163 | 8342 | 24.3 | 69
43 | 2 | 39450 | 35520 | 36515 | 58.4 | 213
44 | 3.67 | 52157 | 17349 | 17956 | 44.9 | 180
45 | 3.33 | 64795 | 21090 | 22499 | 44.2 | 318
46 | 3.78 | 59019 | 18503 | 20303 | 44.9 | 222
47 | 4.93 | 60503 | 15475 | 18391 | 52.1 | 97
48 | 7 | 32860 | 8316 | 9686 | 59.7 | 27
49 | 4 | 89630 | 19607 | 25183 | 51.7 | 43
52 | 3 | 36300 | 15711 | 16620 | 33.1 | 142
53 | 1.94 | 87582 | 30213 | 31013 | 41.8 | 655
54 | 2 | 11190 | 6502 | 6583 | 49.2 | 24
55 | 1 | 178350 | 59679 | 60815 | 32.8 | 1500
56 | 3.72 | 53874 | 17454 | 18025 | 45.3 | 174
57 | 2.62 | 60275 | 16485 | 16799 | 51 | 180
58 | 2.49 | 56711 | 15181 | 15474 | 51.3 | 168
60 | 2.44 | 56109 | 13166 | 13331 | 56.8 | 150
62 | 3.07 | 74101 | 12797 | 12862 | 60.5 | 157
63 | 3.84 | 96392 | 13695 | 13725 | 60.2 | 170
64 | 2 | 105420 | 14489 | 14488 | 63.8 | 213
65 | 7 | 96392 | 14436 | 14437 | 56 | 148
66 | 2 | 42220 | 9780 | 9780 | 67.1 | 133
68 | 1.34 | 26500 | 10097 | 10190 | 61.1 | 106
69 | 1 | 12570 | 7990 | 8110 | 59 | 78
70 | 1 | 21750 | 10269 | 10288 | 65.9 | 107
73 | 1.25 | 22035 | 7542 | 7972 | 38.8 |
74 | 1 | 16550 | 4803 | 6170 | 55 | 66
75 | 1 | 13420 | 7174 | 6723 | 17.5 |
79 | 2.1 | 111973 | 29313 | 29402 | 58.1 | 326
80 | 3 | 155940 | 38554 | 38549 | 65.7 | 420
81 | 1 | 87150 | 24822 | 24923 | 53.1 | 287
82 | 4.34 | 56049 | 18187 | 18764 | 43.8 |
83 | 3 | 101870 | 39963 | 39965 | 35 |
86 | 4.5 | 55161 | 17704 | 18299 | 43.8 | 151
87 | 4.5 | 47380 | 14137 | 14593 | 49.1 |
88 | 4.33 | 48348 | 14007 | 14474 | 51.1 | 137
89 | 3.93 | 46487 | 12939 | 13319 | 52.1 | 131
90 | 3.32 | 44421 | 12252 | 12489 | 54.1 | 129
92 | 3.44 | 34770 | 11444 | 11553 | 49.9 | 143
93 | 2 | 32440 | 11673 | 11771 | 46.7 | 169
94 | 5 | 27450 | 10408 | 10398 | 48.8 | 131
98 | 1 | 56660 | 11520 | 11522 | 69.5 | 93
101 | 2 | 33990 | 3554 | 3558 | 54.3 | 63
102 | 8 | 79330 | 26489 | 28142 | 54.6 | 197
105 | 6 | 18740 | 6020 | 6020 | 45.2 |
106 | 5.52 | 54838 | 17090 | 17886 | 40.1 | 133
107 | 3.42 | 69360 | 24767 | 25049 | 32.1 |
109 | 1.81 | 81477 | 29706 | 29805 | 27.5 | 273
110 | 1 | 103000 | 24770 | 24784 | 22.3 | 119
111 | 1 | 72070 | 39580 | 39581 | 28.1 | 481
113 | 4 | 63450 | 24578 | 24578 | 31.4 |
114 | 7.03 | 49245 | 13811 | 14943 | 40.3 | 88
115 | 3 | 86060 | 23654 | 29745 | 47.4 | 44
116 | 7.78 | 45038 | 12099 | 13023 | 39.9 | 77
117 | 12 | 9100 | 3805 | 3887 | 30.3 | 30
118 | 8.32 | 43030 | 10780 | 11530 | 40.3 | 67
119 | 9.48 | 51434 | 9043 | 9408 | 47.8 |
120 | 9.22 | 56192 | 8190 | 8444 | 50.9 |
121 | 14 | 62970 | 7987 | 8155 | 52.3 | 23
122 | 4 | 57550 | 6936 | 7084 | 54.9 |
123 | 12 | 51860 | 8599 | 8657 | 51.3 |
124 | 9.59 | 25593 | 6586 | 6942 | 34.2 | 39
125 | 11 | 23270 | 5337 | 5512 | 19.4 | 18
126 | 9.81 | 16250 | 4593 | 4785 | 36.2 |
127 | 9 | 8180 | 1719 | 1742 | 41.6 |
128 | 11 | 8350 | 4061 | 4141 | 34 | 15

Branches that gave rise to aggregation-based multicellularity are shown in blue, and those that gave rise to division-based multicellularity are shown in red.

Fig E.1 Node positions in the resampling test. The test was performed on both trees; the tree from Katz & Grant is shown. Branches that gave rise to aggregation-based multicellularity are shown in blue, and those that gave rise to division-based multicellularity are shown in red.

Appendix F

References for Cell Types

15. Givskov, M., et al., Responses to nutrient starvation in Pseudomonas putida KT2442: analysis of general cross-protection, cell shape, and macromolecular content. J Bacteriol, 1994. 176(1): p. 7-14.

16. Nassif, X., et al., Interactions of pathogenic Neisseria with host cells. Is it possible to assemble the puzzle? Mol Microbiol, 1999. 32(6): p. 1124-32.

17. Oliva, B., et al., Morphological and biochemical variations of Haemophilus influenzae type b induced by pH and temperature changes. New Microbiol, 2001. 24(2): p. 117-24.

18. Lu, J., Y. Nogi, and H. Takami, Oceanobacillus iheyensis gen. nov., sp. nov., a deep-sea extremely halotolerant and alkaliphilic species isolated from a depth of 1050 m on the Iheya Ridge. FEMS Microbiol Lett, 2001. 205(2): p. 291-7.

19. Welch, R.W., F.B. Rudolph, and E.T. Papoutsakis, Purification and characterization of the NADH-dependent butanol dehydrogenase from Clostridium acetobutylicum (ATCC 824). Arch Biochem Biophys, 1989. 273(2): p. 309-18.

20. Beltra, R., J.R. De Lecea, and C. De La Rosa, A Comparative Study of Chemical Fractions Isolated from Agrobacterium tumefaciens and from its Stable L-Form. Microbiology, 1972. 73(1): p. 185-188.

21. Tully, J.G., et al., Evaluation of culture media for the recovery of Mycoplasma hominis from the human urogenital tract. Sex Transm Dis, 1983. 10(4 Suppl): p. 256-60.

22. Deeb, B.J. and G.E. Kenny, Characterization of Mycoplasma pulmonis variants isolated from rabbits. I. Identification and properties of isolates. J Bacteriol, 1967. 93(4): p. 1416-24.

23. Sako, Y., et al., Aeropyrum pernix gen. nov., sp. nov., a novel aerobic hyperthermophilic archaeon growing at temperatures up to 100 degrees C. Int J Syst Bacteriol, 1996. 46(4): p. 1070-7.

24. Thomas, C., D.J. Hill, and M. Mabey, Morphological changes of synchronized Campylobacter jejuni populations during growth in single phase liquid culture. Lett Appl Microbiol, 1999. 28(3): p. 194-8.

25. Croft, L.J., et al., Is prokaryotic complexity limited by accelerated growth in regulatory overhead? Genome Biology, 2003. 5(1): p. 1-26.

26. Hou, Y. and S. Lin, Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes. PLoS One, 2009. 4(9): p. e6978.

27. Wilson, D., et al., DBD––taxonomically broad transcription factor predictions: new content and functionality. Nucleic acids research, 2008. 36(suppl 1): p. D88-D92.

36. Sinden, R.E., et al., Gametocyte and gamete development in Plasmodium falciparum. Proc R Soc Lond B Biol Sci, 1978. 201(1145): p. 375-99.

37. Dolan, T., Theileriasis: a comprehensive review. Revue Scientifique et Technique, Office International des épizooties, 1989. 8(1): p. 11-78.

38. Homer, M.J., et al., Babesiosis. Clin Microbiol Rev, 2000. 13(3): p. 451-69.

39. Olsen, O.W., Animal parasites: their life cycles and ecology. 1986: Courier Corporation.

41. NCBI Resource Coordinators, Database resources of the National Center for Biotechnology Information. Nucleic Acids Res, 2016. 44(Database issue): p. D7-D19.

42. de Mendoza, A., et al., Transcription factor evolution in eukaryotes and the assembly of the regulatory toolkit in multicellular lineages. Proc Natl Acad Sci U S A, 2013. 110(50): p. E4858-66.

53. Sebe-Pedros, A., et al., Regulated aggregative multicellularity in a close unicellular relative of metazoa. Elife, 2013. 2: p. e01287.

63. Larson, M.A., et al., Hyperthermophilic Aquifex aeolicus initiates primer synthesis on a limited set of trinucleotides comprised of cytosines and guanines. Nucleic Acids Res, 2008. 36(16): p. 5260-9.

64. Beeder, J., et al., Archaeoglobus fulgidus Isolated from Hot North Sea Oil Field Waters. Appl Environ Microbiol, 1994. 60(4): p. 1227-31.

65. Takami, H., Y. Nogi, and K. Horikoshi, Reidentification of the keratinase-producing facultatively alkaliphilic Bacillus sp. AH-101 as Bacillus halodurans. Extremophiles, 1999. 3(4): p. 293-6.

66. Lopez, D., H. Vlamakis, and R. Kolter, Generation of multiple cell types in Bacillus subtilis. FEMS Microbiol Rev, 2009. 33(1): p. 152-63.

67. Shigwedha, N. and L. Jia, Bifidobacterium in human GI tract: screening, isolation, survival and growth kinetics in simulated gastrointestinal conditions, in Lactic Acid Bacteria - R & D for Food, Health and Livestock Purposes, M. Kongo, Editor. 2013, INTECH Open Access Publisher. p. 670.

68. Sapi, E., et al., Evaluation of in-vitro antibiotic susceptibility of different morphological forms of Borrelia burgdorferi. Infect Drug Resist, 2011. 4: p. 97-113.

69. Ozawa, T. and T. Tsuji, Inhibition of growth of Bradyrhizobium japonicum bacteroid by spermidine and spermine in yeast extract. Soil Science and Plant Nutrition, 1992. 38(2): p. 375-379.

70. Savas, L., et al., Prospective evaluation of 140 patients with brucellosis in the southern region of Turkey. Infectious Diseases in Clinical Practice, 2007. 15(2): p. 83-88.

71. Munson, M.A., P. Baumann, and M.G. Kinsey, Buchnera gen. nov. and Buchnera aphidicola sp. nov., a taxon consisting of the mycetocyte-associated, primary endosymbionts of aphids. International Journal of Systematic and Evolutionary Microbiology, 1991. 41(4): p. 566-568.

72. Johnson, R.C. and B. Ely, Isolation of spontaneously derived mutants of Caulobacter crescentus. Genetics, 1977. 86(1): p. 25-32.

73. Fox, J.G., et al., Antigenic specificity and morphologic characteristics of Chlamydia trachomatis, strain SFPD, isolated from hamsters with proliferative ileitis. Lab Anim Sci, 1993. 43(5): p. 405-10.

74. Beatty, W.L., R.P. Morrison, and G.I. Byrne, Persistent chlamydiae: from cell culture to a paradigm for chlamydial pathogenesis. Microbiol Rev, 1994. 58(4): p. 686-99.

75. Campbell, L.A. and C.C. Kuo, Chlamydia pneumoniae--an infectious risk factor for atherosclerosis? Nat Rev Microbiol, 2004. 2(1): p. 23-32.

76. Wahlund, T.M., et al., A thermophilic green sulfur bacterium from New Zealand hot springs, Chlorobium tepidum sp. nov. Archives of microbiology, 1991. 156(2): p. 81-90.

77. Sarker, M.R., et al., Comparative experiments to examine the effects of heating on vegetative cells and spores of Clostridium perfringens isolates carrying plasmid genes versus chromosomal enterotoxin genes. Appl Environ Microbiol, 2000. 66(8): p. 3234-40.

78. Bisset, K.A., The sporulation of clostridium tetani. J Gen Microbiol, 1950. 4(1): p. 1-3.

79. Fudou, R., et al., Corynebacterium efficiens sp. nov., a glutamic-acid-producing species from soil and vegetables. Int J Syst Evol Microbiol, 2002. 52(Pt 4): p. 1127-31.

80. Joshi, H.M. and R.S. Toleti, Nutrition induced pleomorphism and budding mode of reproduction in Deinococcus radiodurans. BMC Res Notes, 2009. 2: p. 123.

81. Savageau, M.A., Escherichia coli habitats, cell types, and molecular mechanisms of gene control. American Naturalist, 1983: p. 732-744.

82. Bolstad, A.I., H.B. Jensen, and V. Bakken, Taxonomy, biology, and periodontal aspects of Fusobacterium nucleatum. Clin Microbiol Rev, 1996. 9(1): p. 55-71.

83. Hampp, E.G., D.B. Scott, and R.W. Wyckoff, Morphological characteristics of oral as revealed by the electron microscope. J Bacteriol, 1960. 79: p. 716-28.

84. Mormile, M.R., et al., Isolation of Halobacterium salinarum retrieved directly from halite brine inclusions. Environ Microbiol, 2003. 5(11): p. 1094-102.

85. Worku, M.L., et al., The relationship between Helicobacter pylori motility, morphology and phase of growth: implications for gastric colonization and pathology. Microbiology, 1999. 145(Pt 10): p. 2803-11.

86. Bringel, F., et al., Lactobacillus plantarum subsp. argentoratensis subsp. nov., isolated from vegetable matrices. Int J Syst Evol Microbiol, 2005. 55(Pt 4): p. 1629-34.

87. Batt, C.A., Lactococcus, in Encyclopedia of food microbiology, C.A. Batt and M.-L. Tortorello, Editors. 2014, Academic press. p. 439.

88. Johnson, R.C., Leptospira, in Medical Microbiology, S. Baron, Editor. 1996, University of Texas Medical Branch at Galveston.

89. Grundling, A., et al., Listeria monocytogenes regulates flagellar motility gene expression through MogR, a transcriptional repressor required for virulence. Proc Natl Acad Sci U S A, 2004. 101(33): p. 12318-23.

90. Peel, M., W. Donachie, and A. Shaw, Temperature-dependent expression of flagella of Listeria monocytogenes studied by electron microscopy, SDS-PAGE and western blotting. J Gen Microbiol, 1988. 134(8): p. 2171-8.

91. Zaika, L.L. and J.S. Fanelli, Growth kinetics and cell morphology of Listeria monocytogenes Scott A as affected by temperature, NaCl, and EDTA. J Food Prot, 2003. 66(7): p. 1208-15.

92. Zeikus, J.G. and R.S. Wolfe, Methanobacterium thermoautotrophicus sp. n., an anaerobic, autotrophic, extreme thermophile. J Bacteriol, 1972. 109(2): p. 707-15.

93. Zeikus, J.G. and R.S. Wolfe, Fine structure of Methanobacterium thermoautotrophicum: effect of growth temperature on morphology and ultrastructure. J Bacteriol, 1973. 113(1): p. 461-7.

94. Jones, W., et al., Methanococcus jannaschii sp. nov., an extremely thermophilic methanogen from a submarine hydrothermal vent. Archives of Microbiology, 1983. 136(4): p. 254-261.

95. Kurr, M., et al., Methanopyrus kandleri, gen. and sp. nov. represents a novel group of hyperthermophilic methanogens, growing at 110°C. Archives of Microbiology, 1991. 156(4): p. 239-247.

96. Sowers, K.R., S.F. Baron, and J.G. Ferry, Methanosarcina acetivorans sp. nov., an Acetotrophic Methane-Producing Bacterium Isolated from Marine Sediments. Appl Environ Microbiol, 1984. 47(5): p. 971-8.

97. Robinson, R.W., Life Cycles in the Methanogenic Archaebacterium Methanosarcina mazei. Appl Environ Microbiol, 1986. 52(1): p. 17-27.

98. Shinnick, T.M., Mycobacterium leprae, in The Prokaryotes, M. Dworkin, et al., Editors. 2006, Springer Science & Business Media. p. 934-944.

99. Michailova, L., et al., Morphological variability and cell-wall deficiency in Mycobacterium tuberculosis 'heteroresistant' strains. Int J Tuberc Lung Dis, 2005. 9(8): p. 907-14.

100. Lo, S.C., et al., Mycoplasma penetrans sp. nov., from the urogenital tract of patients with AIDS. Int J Syst Bacteriol, 1992. 42(3): p. 357-64.

101. Boatman, E.S. and G.E. Kenny, Morphology and ultrastructure of Mycoplasma pneumoniae spherules. J Bacteriol, 1971. 106(3): p. 1005-15.

102. Ala'Aldeen, D.A. and D.P. Turner, Neisseria meningitidis, in Principles and practice of clinical bacteriology, S.H. Gillespie and P.M. Hawkey, Editors. 2006, John Wiley & Sons, Ltd. p. 205.

103. Bisgaard, M., Fowl cholera, in Poultry diseases, M. Pattison, Editor. 2008, Elsevier Health Sciences. p. 149.

104. Sauer, K., et al., Pseudomonas aeruginosa displays multiple phenotypes during development as a biofilm. J Bacteriol, 2002. 184(4): p. 1140-54.

105. Volkl, P., et al., Pyrobaculum aerophilum sp. nov., a novel nitrate-reducing hyperthermophilic archaeum. Appl Environ Microbiol, 1993. 59(9): p. 2918-26.

106. Erauso, G., et al., Pyrococcus abyssi sp. nov., a new hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent. Archives of Microbiology, 1993. 160(5): p. 338-349.

107. Fiala, G. and K.O. Stetter, Pyrococcus furiosus sp. nov. represents a novel genus of marine heterotrophic archaebacteria growing optimally at 100°C. Archives of Microbiology, 1986. 145(1): p. 56-61.

108. Gonzalez, J.M., et al., Pyrococcus horikoshii sp. nov., a hyperthermophilic archaeon isolated from a hydrothermal vent at the Okinawa Trough. Extremophiles, 1998. 2(2): p. 123-30.

109. Alvarez, B., M.M. Lopez, and E.G. Biosca, Survival strategies and pathogenicity of Ralstonia solanacearum phylotype II subjected to prolonged starvation in environmental water microcosms. Microbiology, 2008. 154(Pt 11): p. 3590-8.

110. Anderson, D.R., et al., Comparison of the ultrastructure of several rickettsiae, ornithosis virus, and Mycoplasma in tissue culture. J Bacteriol, 1965. 90(5): p. 1387-404.

111. Ginocchio, C.C., et al., Contact with epithelial cells induces the formation of surface appendages on Salmonella typhimurium. Cell, 1994. 76(4): p. 717-24.

112. Abboud, R., et al., Low-temperature growth of Shewanella oneidensis MR-1. Appl Environ Microbiol, 2005. 71(2): p. 811-6.

113. Niyogi, S.K., Shigellosis. J Microbiol, 2005. 43(2): p. 133-43.

114. Mergaert, P., et al., Eukaryotic control on bacterial cell cycle and differentiation in the Rhizobium-legume symbiosis. Proc Natl Acad Sci U S A, 2006. 103(13): p. 5230-5.

115. Le Loir, Y., F. Baron, and M. Gautier, Staphylococcus aureus and food poisoning. Genet Mol Res, 2003. 2(1): p. 63-76.

116. Bennett, R., Staphylococcus aureus, in Guide to Foodborne Pathogens, R.G. Labbé and G. Santos, Editors. 2001, John Wiley and Sons: New York, NY. p. 203.

117. Schleifer, K.H. and W.E. Kloos, Isolation and characterization of Staphylococci from human skin I. Amended descriptions of Staphylococcus epidermidis and Staphylococcus saprophyticus and descriptions of three new species: Staphylococcus cohnii, Staphylococcus haemolyticus, and Staphylococcus xylosus. International Journal of Systematic and Evolutionary Microbiology, 1975. 25(1): p. 50-61.

118. Avais, M., et al., Dose dependent antibody response to composite formalin-inactivated Staphylococcus aureus, Streptococcus agalactiae and Escherichia coli in rabbits. International Journal of Agriculture and Biology (Pakistan), 2007.

119. Luppens, S.B. and J.M. ten Cate, Effect of biofilm model, mode of growth, and strain on Streptococcus mutans protein expression as determined by two-dimensional difference gel electrophoresis. J Proteome Res, 2005. 4(2): p. 232-7.

120. Lee, B.S., et al., Bactericidal effects of diode laser on Streptococcus mutans after irradiation through different thickness of dentin. Lasers Surg Med, 2006. 38(1): p. 62-9.

121. Patterson, M.J., Streptococcus, in Medical Microbiology, S. Baron, Editor. 1996, University of Texas Medical Branch at Galveston: Texas.

122. Filippova, S.N., et al., [Endospore formation by Streptomyces avermitilis in submerged culture]. Mikrobiologiia, 2005. 74(2): p. 204-14.

123. Del Sol, R., et al., Characterization of changes to the cell surface during the life cycle of Streptomyces coelicolor: atomic force microscopy of living cells. J Bacteriol, 2007. 189(6): p. 2219-25.

124. Daza, A., et al., Sporulation of several species of Streptomyces in submerged cultures after nutritional downshift. J Gen Microbiol, 1989. 135(9): p. 2483-91.

125. Frols, S., et al., UV-inducible cellular aggregation of the hyperthermophilic archaeon Sulfolobus solfataricus is mediated by pili formation. Mol Microbiol, 2008. 70(4): p. 938-52.

126. Suzuki, T., et al., Sulfolobus tokodaii sp. nov. (f. Sulfolobus sp. strain 7), a new member of the genus Sulfolobus isolated from Beppu Hot Springs, Japan. Extremophiles, 2002. 6(1): p. 39-44.

127. Marbouty, M., et al., Characterization of the Synechocystis strain PCC 6803 penicillin-binding proteins and cytokinetic proteins FtsQ and FtsW and their network of interactions with ZipN. J Bacteriol, 2009. 191(16): p. 5123-33.

128. Xue, Y., et al., Thermoanaerobacter tengcongensis sp. nov., a novel anaerobic, saccharolytic, thermophilic bacterium isolated from a hot spring in Tengcong, China. Int J Syst Evol Microbiol, 2001. 51(Pt 4): p. 1335-41.

129. Yasuda, M., et al., Morphological variation of new Thermoplasma acidophilum isolates from Japanese hot springs. Appl Environ Microbiol, 1995. 61(9): p. 3482-5.

130. Segerer, A., T.A. Langworthy, and K.O. Stetter, Thermoplasma acidophilum and Thermoplasma volcanium sp. nov. from solfatara fields. Systematic and Applied Microbiology, 1988. 10(2): p. 161-171.

131. Huber, R., et al., Thermotoga maritima sp. nov. represents a new genus of unique extremely thermophilic eubacteria growing up to 90°C. Archives of Microbiology, 1986. 144(4): p. 324-333.

132. Lafond, R.E. and S.A. Lukehart, Biological basis for syphilis. Clin Microbiol Rev, 2006. 19(1): p. 29-49.

133. La Scola, B., et al., Description of Tropheryma whipplei gen. nov., sp. nov., the Whipple's disease bacillus. Int J Syst Evol Microbiol, 2001. 51(Pt 4): p. 1471-9.

134. Razin, S., et al., Morphology of Ureaplasma urealyticum (T-mycoplasma) organisms and colonies. J Bacteriol, 1977. 130(1): p. 464-71.

135. Wai, S.N., et al., Vibrio cholerae O1 strain TSI-4 produces the exopolysaccharide materials that determine colony morphology, stress resistance, and biofilm formation. Appl Environ Microbiol, 1998. 64(10): p. 3648-55.

136. Oliver, J.D., L. Nilsson, and S. Kjelleberg, Formation of nonculturable Vibrio vulnificus cells and its relationship to the starvation state. Appl Environ Microbiol, 1991. 57(9): p. 2640-4.

137. Aksoy, S., Wigglesworthia gen. nov. and Wigglesworthia glossinidia sp. nov., taxa consisting of the mycetocyte-associated, primary endosymbionts of tsetse flies. Int J Syst Bacteriol, 1995. 45(4): p. 848-51.

138. Manicom, B. and F. Wallis, Further characterization of Xanthomonas campestris pv. mangiferaeindicae. International Journal of Systematic and Evolutionary Microbiology, 1984. 34(1): p. 77-79.

139. Wells, J.M., et al., Xylella fastidiosa gen. nov., sp. nov: gram-negative, xylem-limited, fastidious plant bacteria related to Xanthomonas spp. International Journal of Systematic and Evolutionary Microbiology, 1987. 37(2): p. 136-143.

140. Chen, T.H. and S.S. Elberg, Scanning electron microscopic study of virulent Yersinia pestis and Yersinia pseudotuberculosis type 1. Infect Immun, 1977. 15(3): p. 972-7.

141. Hall, P.J., et al., Effect of Ca2+ on morphology and division of Yersinia pestis. Infect Immun, 1974. 9(6): p. 1105-13.

142. Marciano-Cabral, F. and G. Cabral, Acanthamoeba spp. as agents of disease in humans. Clin Microbiol Rev, 2003. 16(2): p. 273-307.

143. Sieburth, J.M., P.W. Johnson, and P.E. Hargraves, Ultrastructure and ecology of Aureococcus anophageferens gen. et sp. nov. (Chrysophyceae): the dominant picoplankter during a bloom in Narragansett Bay, Rhode Island, summer 1985. Journal of Phycology, 1988. 24(3): p. 416-425.

144. DeYoe, H.R., et al., Description and characterization of the algal species Aureoumbra lagunensis gen. et sp. nov. and referral of Aureoumbra and Aureococcus to the Pelagophyceae. Journal of Phycology, 1997. 33(6): p. 1042-1048.

145. Berger, L., et al., Life cycle stages of the amphibian chytrid Batrachochytrium dendrobatidis. Dis Aquat Organ, 2005. 68(1): p. 51-63.

146. Stenzel, D.J. and P.F. Boreham, Blastocystis hominis revisited. Clin Microbiol Rev, 1996. 9(4): p. 563-84.

147. Merson-Davies, L.A. and F.C. Odds, A morphology index for characterization of cell shape in Candida albicans. J Gen Microbiol, 1989. 135(11): p. 3143-52.

148. Harris, E.H., Chlamydomonas as a Model Organism. Annu Rev Plant Physiol Plant Mol Biol, 2001. 52: p. 363-406.

149. Sager, R. and S. Granick, Nutritional control of sexuality in Chlamydomonas reinhardi. J Gen Physiol, 1954. 37(6): p. 729-42.

150. Hoshina, R., M. Iwataki, and N. Imamura, Chlorella variabilis and Micractinium reisseri sp. nov. (Chlorellaceae, Trebouxiophyceae): Redescription of the endosymbiotic green algae of Paramecium bursaria (Peniculia, Oligohymenophorea) in the 120th year. Phycological Research, 2010. 58(3): p. 188-201.

151. Lin, X. and J. Heitman, The biology of the Cryptococcus neoformans species complex. Annu Rev Microbiol, 2006. 60: p. 69-105.

152. Current, W.L. and L.S. Garcia, Cryptosporidiosis. Clin Microbiol Rev, 1991. 4(3): p. 325-358.

153. Thompson, R.C., et al., Cryptosporidium and cryptosporidiosis. Adv Parasitol, 2005. 59: p. 77-158.

154. Merola, A., et al., Revision of Cyanidium caldarium. Three species of acidophilic algae. Plant Biosystem, 1981. 115(4-5): p. 189-195.

155. Schaap, P., et al., Molecular phylogeny and evolution of morphology in the social amoebas. Science, 2006. 314(5799): p. 661-3.

156. Laguna, R., et al., Induction of phase variation events in the life cycle of the marine coccolithophorid Emiliania huxleyi. Appl Environ Microbiol, 2001. 67(9): p. 3824-31.

157. Terada, S., et al., Microsporidan hepatitis in the acquired immunodeficiency syndrome. Ann Intern Med, 1987. 107(1): p. 61-2.

158. Green, L.C., P.J. Didier, and E.S. Didier, Fractionation of sporogonial stages of the microsporidian Encephalitozoon cuniculi by Percoll gradients. J Eukaryot Microbiol, 1999. 46(4): p. 434-8.

159. Mukherjee, C., C.G. Clark, and A. Lohia, Entamoeba shows reversible variation in ploidy under different growth conditions and between life cycle phases. PLoS Negl Trop Dis, 2008. 2(8): p. e281.

160. Birkeland, S.R., et al., Transcriptome analyses of the Giardia lamblia life cycle. Mol Biochem Parasitol, 2010. 174(1): p. 62-5.

161. Hill, D.R. and R. Wetherbee, Guillardia theta gen. et sp. nov. (Cryptophyceae). Canadian Journal of Botany, 1990. 68(9): p. 1873-1876.

162. Gossage, S.M., M.E. Rogers, and P.A. Bates, Two separate growth phases during the development of Leishmania in sand flies: implications for understanding the life cycle. Int J Parasitol, 2003. 33(10): p. 1027-34.

163. Manton, I. and M. Parke, Further observations on small green flagellates with special reference to possible relatives of Chromulina pusilla Butcher. Journal of the Marine Biological Association of the United Kingdom, 1960. 39(2): p. 275-298.

164. Karpov, S., Variability of Monosiga ovata (Choanoflagellida, Monosigidae) in the culture. Zoologichesky Zhurnal, 1980. 59(2): p. 296-299.

165. Fritz-Laylin, L.K., et al., The genome of Naegleria gruberi illuminates early eukaryotic versatility. Cell, 2010. 140(5): p. 631-42.

166. Fawley, K.P. and M.W. Fawley, Observations on the diversity and ecology of freshwater Nannochloropsis (Eustigmatophyceae), with descriptions of new taxa. Protist, 2007. 158(3): p. 325-36.

167. Lubián, L.M., Nannochloropsis gaditana spec. nov., una nueva Eustigmatophyceae marina. Lazaroa, 1982. 4: p. 287-293.

168. Dubey, J.P., G. Schares, and L.M. Ortega-Mora, Epidemiology and control of neosporosis and Neospora caninum. Clin Microbiol Rev, 2007. 20(2): p. 323-67.

169. Chrétiennot-Dinet, M., et al., A new marine picoeucaryote: Ostreococcus tauri gen. et sp. nov. (Chlorophyta, Prasinophyceae). Phycologia, 1995. 34(4): p. 285-292.

170. Grisvard, J., et al., Differentially expressed genes during the encystment-excystment cycle of the ciliate Sterkiella histriomuscorum. Eur J Protistol, 2008. 44(4): p. 278-86.

171. Corliss, J.O. and S.C. Esser, Comments on the role of the cyst in the life cycle and survival of free-living protozoa. Trans Am Microsc Soc, 1974. 93(4): p. 578-93.

172. Dungan, C., Binding specificities of mono- and polyclonal antibodies to the protozoan oyster pathogen Perkinsus marinus. Dis Aquat Organ, 1993. 15: p. 9-22.

173. Martino, A.D., et al., Genetic and phenotypic characterization of Phaeodactylum tricornutum (Bacillariophyceae) accessions. Journal of Phycology, 2007. 43(5): p. 992-1009.

174. He, L., X. Han, and Z. Yu, A rare Phaeodactylum tricornutum cruciform morphotype: culture conditions, transformation and unique fatty acid characteristics. PLoS One, 2014. 9(4): p. e93922.

175. Savidor, A., et al., Cross-species global proteomics reveals conserved and unique processes in Phytophthora sojae and Phytophthora ramorum. Mol Cell Proteomics, 2008. 7(8): p. 1501-16.

176. Hansen, E. and P. Hamm, Identification of Phytophthora spp. known to attack conifers in the Pacific Northwest. 1987.

177. Greenwood, B.M., et al., Malaria: progress, perils, and prospects for eradication. J Clin Invest, 2008. 118(4): p. 1266-76.

178. Gothe, G., K. Boehm, and E. Unger, Different resting stages of the plasmodial rhizopod Reticulomyxa filosa. Acta Protozoologica, 1997. 36(1): p. 23-29.

179. Dayel, M.J., et al., Cell differentiation and morphogenesis in the colony-forming choanoflagellate Salpingoeca rosetta. Dev Biol, 2011. 357(1): p. 73-82.

180. Fantes, P.A., Control of cell size and cycle time in Schizosaccharomyces pombe. J Cell Sci, 1977. 24: p. 51-67.

181. Sauvant, M.P., D. Pepin, and E. Piccinni, Tetrahymena pyriformis: a tool for toxicological studies. A review. Chemosphere, 1999. 38(7): p. 1631-69.

182. Alverson, A.J., et al., The model marine diatom Thalassiosira pseudonana likely descended from a freshwater ancestor in the genus Cyclotella. BMC Evol Biol, 2011. 11: p. 125.

183. Dubey, J.P., Advances in the life cycle of Toxoplasma gondii. Int J Parasitol, 1998. 28(7): p. 1019-24.

184. Dubey, J.P. and J.K. Frenkel, Cyst-induced toxoplasmosis in cats. J Protozool, 1972. 19(1): p. 155-77.

185. Dubey, J.P., D.S. Lindsay, and C.A. Speer, Structures of Toxoplasma gondii tachyzoites, bradyzoites, and sporozoites and biology and development of tissue cysts. Clin Microbiol Rev, 1998. 11(2): p. 267-99.

186. Field, A.S., et al., Myositis associated with a newly described microsporidian, Trachipleistophora hominis, in a patient with AIDS. J Clin Microbiol, 1996. 34(11): p. 2803-11.

187. Schwebke, J.R. and D. Burgess, Trichomoniasis. Clin Microbiol Rev, 2004. 17(4): p. 794-803.

188. Magalhaes, A.D., et al., Trypanosoma cruzi alkaline 2-DE: Optimization and application to comparative proteome analysis of flagellate life stages. Proteome Sci, 2008. 6: p. 24.

189. Banuett, F. and I. Herskowitz, Discrete developmental stages during teliospore formation in the corn smut fungus, Ustilago maydis. Development, 1996. 122(10): p. 2965-76.

190. Kirk, D.L., Germ-soma differentiation in Volvox. Dev Biol, 2001. 238(2): p. 213-23.
