WESLEYAN UNIVERSITY

Ecological dimensions of significance in diversification of Bacillus subtilis and Bacillus licheniformis

Sarah M. Kopac

I would first like to thank my thesis committee, Annie Burke, Michael Singer, Danny Krizanc, and especially my adviser, Fred Cohan. They have all shaped my thinking and ability to critically look at the work of others as well as my own, not only through classes but also through their personal mentoring. Fred especially helped me learn how to develop my ideas, has encouraged me to explore my interests in astrobiology and has given me the great gift of acknowledging my management skills, which I would not have touted on my own.

Second, I credit my family with always supporting me and encouraging my love of science. It began with studying the Challenger, the Apollo program, and with classic science fiction movies, and has endured to become a meaningful career. My mother has always been there to commiserate with me when I am overwhelmed, my dad has been there for career and travel advice, and my sister has been there to commiserate when mom and dad get annoying. All three of them, throughout these six years, never ceased to remind me that I am special no matter what, and I think they actually believed it.

My grandmother has also been an unswaying source of encouragement. She is the grandmother everyone wants to have; always believing that I have been wronged and have never been the perpetrator of a wrong, reminding me to take care of myself, asking if I have enough money to pay the rent, and sending me cards without any occasion. She is simply the best.

My friends and former roommates Nikki Evensen and Erica Stankard will always have a very special place in my heart. Swapping complaints about research with Nikki has been my consolation that I am not alone in this. Erica has been my shortest friend for a long time and can always, always make me laugh, nearly never answers the phone except when I am crying or it is 2am, and has never doubted me.

My Wesleyan friends, especially Michelle Kraczkowski, have actually made parts of this whole process enjoyable. Kate Miller and Sarah GW are notable for their skills in dancing, and (lack of) trivia knowledge. Jane Wiedenbeck is an inspiration, endlessly energetic, wise, and losing her phone, all of which I love her for. Those who I have worked with in the Cohan lab, especially Melanie Koren, Menherit Goodwyn, Janine Petito and Steffers Aracena, have made science more satisfying. They have all enriched my life and continue to do so.

Tod Osier and Jen Klug were my advisers from Fairfield University and are responsible for the majority of the good advice that I’ve received in my life. They’re supportive and wonderful, and I hope I never lose touch with them.

Lastly, Craig Breitsprecher has been invaluable in his support and endless patience in the last year. He has expanded my cultural knowledge during a phase when most people become very closed off from the world. He is also utterly ridiculous and makes me laugh, encourages me not to back down, and is the best friend I could ask for. Besides Rosie.


Table of contents

Abstract
Introduction
Chapter 1: Ecological dimensions associated with bacterial speciation
Chapter 2: Bacillus subtilis-licheniformis growth in high copper and high boron conditions
Chapter 3: Genomic differences and their effects on independent and competitive growth
Conclusions
References
Appendix


The bacterial domain includes thousands of known species, and likely orders of magnitude more that have not yet been discovered. Little is known about the causes of bacterial diversification or about the environmental factors associated with recent divergences. Ecotypes are bacterial populations that are theorized to be ecologically distinct, representing the most recent products of speciation. Our study utilizes a unique environmental soil gradient with dimensions such as salinity, boron, and copper decreasing westward along the transect, and other dimensions remaining stable or in random flux along the gradient. We isolated 620 strains of the Bacillus subtilis-licheniformis clade from this gradient and sequenced a protein-coding gene from each strain. These sequences were used to demarcate ecotypes from the sample set, and the ecotypes were tested for associations with the twelve environmental factors measured for the soil samples. We found thirty-one ecotypes in our sample set, twenty-three in the B. subtilis subclade and nine in the B. licheniformis subclade, including nine previously unidentified ecotypes. The ecotypes are significantly heterogeneous in their associations with iron, F(18, 365)=1.6704, p=0.04239, and boron, F(18, 244)=1.6767, p=0.04401, in the soil, as well as the soil pH, F(18, 365)=1.6466, p=0.04699, and the proportion of clay present in the soil, F(18, 99)=1.7226, p=0.04753. We further explored the ecological heterogeneity of ecotypes and strains within ecotypes by testing their growth tolerance to high levels of boron or copper in the soil. Ecotypes were marginally significantly different in their growth in high boron media, F(7, 61)=0.0566, p=0.0566, and strains nested within ecotypes were significantly different, F(13, 61), p=0.0087. In high copper media, ecotypes did not show differences in growth, F(24, 131), p=0.21975, but strains did, F(73, 82), p=0.00001. This suggests that strains are quickly evolving with respect to their associations with copper and boron, and might gain and lose tolerances within the lifetime of an ecotype.

Rapid diversification, even within ecotypes, is supported by a separate project where we compared the genome content of four strains of Bacillus subtilis subspecies spizizenii within a single ecotype. Gene annotations and genome comparisons with RAST showed that the strains differed in gene content, including genes for carbohydrate usage, specifically in utilization of maltose, maltodextrin and myo-inositol. Only one strain, G1A4, had genes that were non-paralogous to genes in the other strains; these were five genes for maltose and maltodextrin utilization. Strain G1A4 also had two paralogous genes for maltose and maltodextrin utilization in addition to the genes all of the strains shared; the strain G1A3 had three paralogous genes for the utilization of myo-inositol in addition to those in the core genome. To determine if differences in gene content reflect differences in ecology, strains were tested in monoculture and in competition for their growth in media with maltose, maltodextrin or myo-inositol as the sole energy source. The strain G1A4, predicted to perform best on maltose and maltodextrin, did outperform the other strains. G1A4 also performed better on glucose, indicating that the strain was superior for reasons besides the extra maltose/maltodextrin genes (due to either the five other unique genes with known functions or one of the dozens of unique genes with unknown functions). The strain G1A3, predicted to perform best on myo-inositol, did not perform the best, even when data were corrected for the strains’ growth differences in the glucose control. These findings demonstrate the value of combining genomic analyses with growth experiments. Differences in the ecology of strains can be difficult to determine from sequence data alone. But taken together, the results indicate that carbohydrates are one factor associated with very recent speciation events in bacteria.


Thousands of species of bacteria have been identified in the bacterial domain. These species are divergent in lifestyle and physical structure, including Deinococcus radiodurans, which floats on dust particles and can survive higher levels of radiation than any other known organism (Tempest and Moseley 1982), Aquifex species that tolerate temperatures of up to 95°C (Madigan, Martinko et al. 2005), organisms that can live in single-species ecosystems hundreds of meters under the surface of the earth (Chivian, Brodie et al. 2008), and thioautotrophic organisms that need neither organic carbon nor sunlight to live (Stewart and Cavanaugh 2006). Diversity also exists between more closely related lineages, most notably as ecological diversity within taxonomic species (Welch, Burland et al. 2002, Sikorski and Nevo 2005, Walk, Alm et al. 2007, Walk, Alm et al. 2009, Cohan and Kopac 2011, Luo, Walk et al. 2011). It is accepted that bacterial diversity is associated with environmental factors; light levels, temperature, and phosphorus concentrations in the environment have been linked to divergences in Synechococcus and Prochlorococcus (Allewalt, Bateson et al. 2006, Johnson, Zinser et al. 2006, Martiny, Coleman et al. 2006, Bhaya, Grossman et al. 2007, Martiny, Tai et al. 2009). In this work, we provide evidence that the diversification of Bacillus subtilis and Bacillus licheniformis is associated with environmental dimensions including resource-related factors and growth inhibitors.

Bacterial diversity is so extensive that its limits have not yet been discovered (Cole, Konstantinidis et al. 2010, Caporaso, Paszkiewicz et al. 2012, Kyrpides, Hugenholtz et al. 2014). As of 2009, there were approximately 1,000 prokaryotic species with sequenced genomes (Wu, Hugenholtz et al. 2009); that number had increased to 1,918 (with an additional 989 “permanent draft” genomes) as of 2012 (Pagani, Liolios et al. 2012); 3,285 bacterial or archaeal type strains currently have a genome project finished or under way (Kyrpides, Hugenholtz et al. 2014). In addition to named bacterial species, many strains have been identified by environmental DNA and are the only representative in their taxon. For example, in one case, deep sequencing of subsurface soils showed that the most prolific organism is one in a new phylum (Castelle, Hug et al. 2013); the other organisms found were also surprisingly divergent from even the most similar known strains, possibly representing sister species to already known species (Castelle, Hug et al. 2013). Although we are uncovering more of the diversity in bacteria, it is clear that the rarefaction curve has not yet been saturated.

Methods for the accurate identification of bacteria by molecular techniques have existed for barely 40 years (Johnson 1973). In reality, science has only just begun to characterize the variety of bacterial species. It’s estimated that over 9,000 new species of bacteria must be identified to fully sample the bacterial diversity that exists (Caporaso, Paszkiewicz et al. 2012), and with the most extensive of present experiments finding fewer than a dozen new species, it might be another 100 years before we uncover the limits of bacterial diversity. Our exploration has been limited both by access to some bacterial habitats (for example, Antarctic ice sheets) and by the need for large amounts of DNA for sequencing (making cultivability a prerequisite for the identification of a new sample). The advent of Next Generation sequencing technology has confirmed that bacteria inhabit even environments once thought too harsh to harbor life (Connon and Giovannoni 2002, Sogin, Morrison et al. 2006); even the atmosphere is occupied by microbial cells passing through (Wainwright, Wickramasinghe et al. 2003, Maki, Puspitasari et al. 2014) and others that actually participate in nucleating hailstones (Šantl-Temkiv, Finster et al. 2013).


Microbial diversity is reflected not only by the environments which species colonize but also by their methods of obtaining energy. Similar to eukaryotes, bacteria can be sorted into autotrophs and heterotrophs, but within these categories bacteria have many more variations of strategies and pathways than plants and animals do (Madigan, Martinko et al. 2005). Bacterial autotrophs, for example, have seven types of pigments primarily responsible for photosynthesis (bacteriochlorophylls) in addition to those used in plants (chlorophyll a and b). This is astounding compared to the two types of chlorophyll found in all plant lineages (plants have a variety of carotenoids to absorb energy from wavelengths that chlorophyll cannot, but no plant can survive without chlorophyll) (Madigan, Martinko et al. 2005). Chemolithotrophy, the ability to gain energy from inorganic compounds, is also a strategy unique to bacteria. Chemolithotrophs are often found in extreme environments and include groups that can oxidize hydrogen, sulfide, elemental sulfur, ammonium, nitrite, or ferrous iron (Madigan, Martinko et al. 2005).

In addition to having more variety in the approach to autotrophy, bacteria are also more diverse than the eukaryotes in methods to break down glucose during respiration (Madigan, Martinko et al. 2005). Some bacteria respire without oxygen; these bacteria are called anaerobes, because they do not use O2 as their final electron acceptor. Such microorganisms transfer electrons to nitrogen, sulfur, and carbon compounds. In some environments, anaerobic bacteria are key in the cycling of nitrate, reducing the nitrate (NO3-) unusable by animals all the way to nitrogen gas (N2).

In “macro”, or multicellular, organisms, there is a long history of the study of speciation, which the field of microbiology lacks. Theories of speciation in animals and plants concern reproductive and ecological diversification. Ernst Mayr championed the Biological Species Concept based on the principle of reproductive isolation, which dictated that two lineages could only be named separate species when their offspring were no longer viable (Mayr 1969, Mayr 1982). Speciation was seen as a process of increasing sexual isolation created through pre- and post-reproductive barriers.

Ecology was given a back seat in the Biological Species Concept. In the BSC, ecological divergences might occur before, during, or after sexual isolation, but are not considered enough in themselves to push two lineages to become different species. The focus on physical and behavioral reproductive phenotypes is one of the points of contention in critiques of Mayr’s work. It is this very focus on the population’s ability to interbreed which makes Mayr’s theory difficult and unrealistic to apply to the bacterial world, since bacteria have asexual reproduction and are generally able to acquire genes from very distant relatives. Horizontal gene transfer events are usually limited to one or a few genes, meaning that bacterial hybrids are not as ill-fit to parental environments as are hybrids in sexually reproducing species (Cohan 2010). In addition, rates of recombination, which could be thought of as analogous to the recombining of genes during crossing over in sexually reproducing organisms, can be similar to mutation rates (Vos and Didelot 2008), signifying that mutations might play just as big a role in shaping bacterial populations as recombination; in fact, recombination rates might not be high enough to homogenize the diversity of different populations (Cohan 2010) even though they might confer a larger change in the genetic code.

As of late, other models of species and speciation have become prominent, which apply to both eukaryotes and prokaryotes. Two examples are the theory of ecological speciation and theories of rapid speciation which have been described in several different organisms (Mallet 2008). These theories highlight ecology as the main driving force in the divergence of populations. In ecological speciation, populations are associated with two different ecologies, and divergent selection pushes each population to become better adapted to its environment (Nosil 2012). Hybrids are selected against because they are expected to be less fit than parent populations in both environments. As in the BSC, divergent populations are expected to move towards reproductive isolation, but different interpretations of the theory disagree on whether this is required (Nosil 2012).

Models of rapid speciation downplay the significance of reproductive isolation even more. James Mallet published evidence that ecological speciation is occurring rapidly, even in sympatric populations (Mallet 2008). He defined species based on phenotypic differences related to their natural ecology, which could then sometimes (but not always) lead to reproductive isolation. This model of speciation can be applied to bacteria with fewer conditionals than the Biological Species Concept; bacterial populations can participate in HGT with one another while still being considered separate species. In addition, this theory fits well with the high levels of mutation (as compared to multicellular organisms) observed in bacterial populations, as well as the low rates of recombination (Wiedenbeck and Cohan 2011).

Mallet’s theory also recognizes the importance of the interplay between adaptive genes and selection. Sexual isolation is not a prerequisite for the divergence of two populations because the impact of gene flow will be kept to low levels by selection; introduced genes will be eliminated if unfavorable (Mallet 2008). Hybrids can interbreed with either population without causing the respective populations to homogenize relative to ecological adaptations because of the strength of ecological selection against maladaptive genes.

Applying this more rapid model of speciation, Mallet identified a dozen examples of closely related populations which still engaged in gene flow, but had divergent phenotypes correlated with their different ecologies (Mallet 2008). For example, two closely related populations of a single snail species were divergent in shell shapes for protection against waves, according to the distance the snails lived from the high tide line. Mallet also noted the divergence between a population of killer whales that are nomadic and feed on fish, and a population that is established in a single location and is thus able to hunt seals (Mallet 2008).

Frequently occurring speciation can be thought of as a modification of Nosil’s and others’ characterization of speciation as a continuum where the continuum consists of many small divergences in ecology.

There are varying standards for detecting speciation in population genetics studies, but according to theories of rapid speciation, we only need to expect populations to have some divergence and to exhibit ecological differences (Mallet 2008, Cohan 2010). Thus, groups presently identified as “sub”species or “cryptic” species might be reassigned to full species status.

A model of frequent ecological speciation applies much more soundly to bacteria than does the Biological Species Concept. As previously mentioned, bacterial diversity is not recombined in every generation as it is in animals, making gene flow a less pressing matter in terms of the possibility of homogenizing newly divergent populations. On the other side of bacterial reproduction, horizontal gene transfer, occurring relatively infrequently, can supply niche specific genes which, if acquired by a population but not its sister population, could stimulate divergent selection to occur.

If, indeed, all a lineage needs to diverge from a parent population is a set of ecological adaptations, this could occur very quickly in bacteria. A migrant bacterium that lands in a new environment could acquire many of the genes it needs to survive from the microflora already present. This is one of the dangers of frequent antibiotic use; if one’s natural microbiome community becomes antibiotic resistant, then the community can pass on those genes to invading pathogens.

There is plenty of evidence that bacteria adapt to ecological conditions, either abiotic or biotic. Even early microbiologists associated different abiotic environmental factors (including temperature, soil substrates and pH) with different species. Growth parameters related to abiotic factors are still considered important to characterizing a species. Recently, the biotic influence of the larger microbiological community has itself been recognized as a significant factor shaping a microbe’s environment. The overall metabolic function of a community may be linked to the dominant taxa (Arumugam, Raes et al. 2011, Muegge, Kuczynski et al. 2011, Wu, Chen et al. 2011). Interestingly, the community type of one microbiome can be predictive of the microbiome elsewhere on the body (e.g. oral vs gut) of the same individual (Ding and Schloss 2014).

Several models for bacterial speciation have been described, including the Stable Ecotype Model and the Nano-Niche Model from the Cohan lab. In the Stable Ecotype Model, bacterial ecotypes are thought to be analogous to the species-level taxonomic unit in Animalia (Cohan 2003); that is, groups that are cohesive, ecologically distinct, and founded only once, as well as irreversibly separate (de Queiroz 2005). Similarly, ecotypes are defined to be founded only once and have limited diversity (Koeppel, Perry et al. 2008). New ecotypes are founded when a strain or individual accesses a new ecological dimension through a genetic change via mutation or a horizontal gene transfer event. Thus, ecotypes are ecologically distinct by definition. The mechanism of ecotype formation ensures that the diversity within new ecotypes is extremely low, typically being limited to the genetic lineage of a single individual cell. In the Stable Ecotype Model, low diversity is maintained throughout an ecotype’s existence by periodic selection events, wherein a strain or individual has a mutation or acquires a gene that makes the recipient more competitive within its current ecology (Cohan 2005). The lineage of this individual cell therefore can out-compete those of the other cells in the ecotype. This process results in homogenized populations; neutral diversity throughout the genome is eliminated because the sequence of every gene is affected by periodic selection (Cohan 2005). This is significant for the demarcation of ecotypes, as any gene, not just niche-specific ones, can be used to predict ecotypes (Koeppel, Perry et al. 2008, Cohan 2010, Connor, Sikorski et al. 2010). The ecologies of demarcated ecotypes can then be elucidated, avoiding the problems of predicting the genes and ecological associations involved in ecological divergence a priori.
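The purging of neutral diversity by periodic selection can be illustrated with a small numerical sketch. It is not taken from the cited models; it assumes a deterministic haploid selection recursion with no recombination, and the selective advantage, starting frequency, and neutral haplotype frequencies are illustrative values only. The function names are hypothetical.

```python
def periodic_selection(neutral_freqs, s=0.1, x0=1e-6, generations=400):
    """Sweep one adaptive lineage through an asexual population.

    neutral_freqs: starting frequencies of neutral haplotypes (sum to 1).
    s: selective advantage of the adaptive mutant (assumed value).
    x0: starting frequency of the mutant, which arises on haplotype 0.
    Returns (mutant frequency, neutral haplotype frequencies) after the sweep.
    """
    x = x0  # frequency of the adaptive lineage
    for _ in range(generations):
        # Standard haploid selection recursion: x' = x(1 + s) / (1 + s*x)
        x = x * (1 + s) / (1 + s * x)
    # Without recombination, every descendant of the mutant carries
    # haplotype 0. The rest of the population (1 - x) is approximated as
    # keeping its original neutral proportions (x0 is negligibly small).
    swept = [(1 - x) * f for f in neutral_freqs]
    swept[0] += x
    return x, swept


def simpson_diversity(freqs):
    """Probability that two randomly drawn cells carry different haplotypes."""
    return 1 - sum(f * f for f in freqs)


start = [0.25, 0.25, 0.25, 0.25]  # four equally common neutral haplotypes
x, end = periodic_selection(start)
print("diversity before sweep:", simpson_diversity(start))
print("diversity after sweep: ", simpson_diversity(end))
```

Under these assumptions the neutral haplotype linked to the adaptive mutation rises to near fixation along with it, so diversity at a locus unrelated to the adaptation collapses. This is the reason the model allows any gene, not only niche-specific ones, to be used for demarcation.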

The Nano-Niche Model is based on the same principles of ecotype formation and periodic selection as the Stable Ecotype Model, but in the Nano-Niche Model closely related ecotypes are not shielded from each other’s periodic selection events. Closely related ecotypes might differ only in their ecology by the proportion of resources that they use. For example, all ecotypes might have the ability to use maltose, glucose, and maltodextrin in the environment, but one ecotype might specialize on maltose, another on glucose, and a third on maltodextrin. (A similar scenario is discussed in Chapter 3.) In this case, a periodic selection event involving a gene for a resource that all three of the Nano-ecotypes use (for instance, phosphorus), would homogenize the populations of all three of the ecotypes, and the divergences in carbohydrate specialization would be lost. The Nano-Niche Model does not preclude the existence of the Stable Ecotype Model, and both might be applicable to different types of bacteria, perhaps according to their ecology or evolutionary history.


Conflicting theories maintain that periodic selection events would not sweep through an entire population before recombination spread the adaptive gene through the ecotype. This would prevent the homogenization of genomes within the ecotype, and would require researchers to use only adaptive genes to demarcate ecotypes. However, Ecotype Simulation has been successful in demarcating ecologically distinct clades using housekeeping genes. Independent experiments have shown that the rate of recombination in bacteria is relatively low compared to the rate of mutation, meaning that recombination is probably not frequent enough to prevent periodic selection events from moving through the population.

Another alternative theory to the ecotype models has sprung up from studies of genome comparisons. Seeing that no two genomes are exactly alike, Ford Doolittle drew the conclusion that each cell is ecologically distinct and can be treated as its own species. When classifying the genomic differences between a sister and mother cell, however, the majority of differences are not related to ecology but rather to the lifetime “experiences” of the cell, such as viral insertion sequences or genes received via horizontal gene transfer. These genomic differences don’t necessarily translate into ecological differences, though (Touchon, Hoede et al. 2009, Kopac, Wiedenbeck et al. 2014). Even if genes with known bacterial functions are received through conjugation or transduction, the gene might not be expressed and could be destined for degradation in subsequent daughter cells.

Examples of diversification in bacteria abound, and of special interest are recent diversifications of close relatives where the groups differ in one or a handful of environmental dimensions. Such instances have been noted within species groups and within genera (the diversification events being more recent in the latter). The Prochlorococcus genus, for example, contains clades associated with different light levels (Hess, Rocap et al. 2001, Martiny, Tai et al. 2009), temperature (Johnson, Zinser et al. 2006, Martiny, Tai et al. 2009), and nutrient/phosphorus levels (Johnson, Zinser et al. 2006, Martiny, Coleman et al. 2006, Martiny, Tai et al. 2009). Genome comparisons also suggest that there is diversification within these clades in the bacteria’s ability to respond to stressors (Coleman, Sullivan et al. 2006). In another case, strains of Mastigocladus laminosus, a thermophile, have diversified and also show evidence that incoming genes from a different ecological population would be detrimental to diversifying populations, as predicted. Lineages of this species have diversified to inhabit different temperatures in their hot spring environment; in the lab, strains grew best at the temperature regime most similar to their sample site and grew poorly in the other temperature regime (Miller, Williams et al. 2009). Similarly, thermophilic Synechococcus populations are associated with different temperatures, and genomes indicate that they differ in their abilities to use nitrogen, phosphate, and iron (Bhaya, Grossman et al. 2007). Perhaps the strongest support for rapid diversification of bacteria comes from experimental evolution, such as the strains of Pseudomonas fluorescens which diversified into three different “morphs” to occupy different niches in a single 25mL microcosm over the course of a week (Rainey and Travisano 1998, Koeppel, Wertheim et al. 2013). The LTEE (Long-Term Evolution Experiment) project in Richard Lenski’s lab also produced strains distinctly different from the original strains in that they could use citrate as a carbon source, although this diversification took far longer (31,000 generations) than in the Rainey experiment (Blount, Borland et al. 2008).

Whether one accepts the Stable Ecotype Model or a theory that is marked by a stronger role of recombination, ecologically distinct populations should be genetically distinct, able to be discerned by sequence data. Bacterial species have generally been defined by their 16S rDNA sequences, but using this highly conserved gene sequence has led to taxonomically recognized species that contain many ecologically distinct groups. Escherichia coli is an example of a taxonomically recognized species that contains multiple phylogenetic clades each associated with a unique ecology (Welch, Burland et al. 2002). Common knowledge dictates that E. coli is a commensal bacterium which lives in the human gut, but the lay population also recognizes the species as a dangerous contaminant of chicken, spinach, and tomatoes. In reality, pathogenic strains of the bacterium differ in tissue tropisms and virulence (Manning, Motiwala et al. 2008). Adding to the diversity in this species, some clades of E. coli strains are associated with hosts other than humans, including other mammals or birds (Walk, Alm et al. 2009), while other E. coli strains aren’t associated with host tissues at all, and are commonly found in bodies of freshwater or living on beaches (Walk, Alm et al. 2007, Cohan and Kopac 2011).

At present, many sequence-based analyses are used to predict and demarcate ecologically distinct groups of bacteria. Comparison of 16S sequences, although convenient because the sequence is universal across bacteria, rarely offers the resolution needed to distinguish ecologically distinct groups. Sequences of protein-coding genes generally offer a much higher resolution, and several such genes have been identified as universally conserved “standards” in the field (Santos and Ochman 2004). Phylogenies based on the more quickly evolving of these genes have enough resolution to distinguish ecotypes (see the Cohan lab’s publications Koeppel et al. 2008 and Connor et al. 2010; the sequences of all three protein-coding genes generally agree on the structure of the tree). However, as noted in the publications mentioned above, a single-gene phylogeny can be affected by recombination. Multilocus sequence typing is often favored over single-gene analyses for this reason. Genome sequences represent the highest level of phylogenetic resolution, but direct comparisons can confound the ecological information in the results; we expect that even ecologically interchangeable strains will have detectable differences in their genomes caused by viral insertions, recent HGT events, transposons, and even rearrangement of genes (Coleman, Sullivan et al. 2006, Touchon, Hoede et al. 2009, Paul, Dutta et al. 2010, Kopac and Cohan 2011). Horizontal gene transfer events can also cause false positives in the demarcation of ecologically distinct groups by gene content comparisons, but this type of analysis is not obscured by transposons, viral inserts, or gene rearrangements. Ultimately, even functional comparisons of genomes must be confirmed by experimental evidence for two groups to be confirmed as ecologically distinct (see Chapter 3).

High-resolution phylogenies of close relatives might provide the appropriate clades for ecotypes, but it’s not intuitive at which level of clade the groups should be demarcated (we expect that ecotypes contain some ecologically neutral diversity, which would have arisen since the last periodic selection event) (Cohan and Perry 2007, Koeppel, Perry et al. 2008). The key is to find the nodes of the phylogeny associated with ecological divergences (Nosil 2012). There are several algorithms aimed at solving this problem. The two highlighted here, Ecotype Simulation and AdaptML, have successfully identified ecologically distinct groups, are free, and are open source (Perry, Koeppel et al. 2006, Hunt, David et al. 2008, Francisco, Cohan et al. 2012).

AdaptML divides phylogenies into groups (which will be referred to here as ecotypes) associated with different “habitat spectra” (Hunt, David et al. 2008); habitat spectra are a combination of the habitats from which strains were sampled, proportional to the number of strains from each habitat. The input requires a phylogeny of samples with habitat associations; the advantage of AdaptML is that the nature of the method enables the program to confirm the ecological uniqueness of each ecotype. Since ecotypes are demarcated by habitat spectra, transitions between different habitat spectra can be mapped to ancestors in the phylogeny. However, this method also limits the program to finding only those ecotypes that differ in the ecological dimensions measured, requiring scientists to predict the environmental associations of strains a priori (Kopac and Cohan 2011). We theorize that this causes AdaptML to lump multiple ecotypes into one group when data on an important ecological dimension is missing.
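The definition of a habitat spectrum given above can be made concrete with a short sketch. This only illustrates the bookkeeping behind the definition (the proportion of a clade’s strains sampled from each habitat); it is not AdaptML’s inference procedure, and the function name, habitat labels, and clades are hypothetical.

```python
from collections import Counter


def habitat_spectrum(strain_habitats):
    """Return the proportion of sampled strains that came from each habitat."""
    counts = Counter(strain_habitats)
    total = sum(counts.values())
    return {habitat: n / total for habitat, n in counts.items()}


# Hypothetical clades: each entry is the habitat a strain was isolated from.
clade_a = ["high_salinity", "high_salinity", "high_salinity", "low_salinity"]
clade_b = ["low_salinity", "low_salinity", "high_salinity", "low_salinity"]

spectrum_a = habitat_spectrum(clade_a)
spectrum_b = habitat_spectrum(clade_b)
print(spectrum_a)
print(spectrum_b)
```

Two clades whose strains differ only in a dimension that was never recorded (for example, a soil factor that was not measured) would receive identical spectra under this bookkeeping, which is the limitation described above.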

Similar to AdaptML, Ecotype Simulation demarcates clades that are ecologically distinct, but it does not require any ecological data as input; it requires only a DNA sequence alignment. As with AdaptML, ecotype predictions are more accurate with higher-resolution input data. The ecological distinctness of ecotypes predicted by Ecotype Simulation needs to be confirmed experimentally or with an independent analysis. The advantage of working without ecological input is that Ecotype Simulation is not limited to demarcating ecotypes differing in ecological dimensions that the researchers have predicted, which could be the reason why, using the same sample set, Ecotype Simulation finds smaller ecotypes within the ecotypes that AdaptML predicts. However, as this thesis will explain, strains in predicted ecotypes are not ecologically homogeneous for all dimensions, which might indicate that diversification is happening very quickly, even within putative ecotypes.

As discussed above, several environmental factors have been confirmed to be associated with bacterial diversification. For example, associations with light level differed between clades within Synechococcus and Prochlorococcus, while clades of Bacillus subtilis, Bacillus simplex, Mastigocladus laminosus, Synechococcus, and Prochlorococcus were differently associated with temperature within their respective genera. Lastly, both Synechococcus and Prochlorococcus have populations with different phosphorus associations. It is possible that adaptations to some environmental factors are easier than others for bacteria to gain and lose. Antibiotic resistance, of particular concern at present, is a trait bacteria can easily gain and lose, merely through the acquisition of a plasmid with the appropriate genes. In contrast, cell wall type (Gram-positive, Gram-negative, or lack of a cell wall) is a trait that rarely changes along a lineage (Woese 1987).

We selected salinity as one candidate environmental dimension that ecotypes might be associated with, due to an interesting conflict of facts. There is evidence that salinity is associated with deep divergences in the bacterial phylogeny (Lozupone and Knight 2007), making it one of those traits for which bacteria rarely gain and lose adaptations. Even when saline and non-saline environments are in close geographical proximity (for example, in an estuary), the adjacent saline and non-saline environments harbor different bacterial communities (Benlloch, López-López et al. 2002, Langenheder, Kisand et al. 2003, Walsh, Papke et al. 2005, Schapira, Buscot et al. 2009, Hollister, Engledow et al. 2010, Berthe, Ratajczak et al. 2013). Salinity is a structuring factor in these communities as well as in bacterial evolution, possibly due to the cell-wide nature of its effects; all proteins are affected by a saline environment. A proteomic analysis of one saline mat community showed that predicted proteins from the mat community were more acidic than those from non-saline communities and that mat proteins had a higher aspartate usage (Kunin, Raes et al. 2008), both theorized to promote proper protein folding at high salinities (Soppa 2006).

Conversely, our model organism, Bacillus subtilis, is exposed to rapid changes in salinity in the soil, specifically when it rains. Transcriptomic analyses confirm that Bacillus subtilis responds to heightened salinity with altered expression levels of 5-8% of all genes (Steil, Hoffmann et al. 2003, Hahne, Mader et al. 2010), including genes transcribed only during periods of salt stress in the cell (Steil, Hoffmann et al. 2003, Hahne, Mader et al. 2010). It is unclear why Bacillus subtilis is able to respond to changes in salinity when salinity tolerance is associated with ancient adaptations.

Another way to look at adaptations along the bacterial phylogeny is to classify environmental dimensions as resources or stressors. Both categories are represented in the examples of bacterial diversification discussed above, but these two types of environmental associations have not been compared in cases of diversification of very close relatives. The present work provides evidence of one instance where resource-based adaptations are associated with diversification in Bacillus subtilis subsp. spizizenii (see Chapter 3). In addition, we have found that tolerance-based adaptations, specifically to boron and copper in the soil, are associated with very recent diversification, i.e., diversification occurring within putative ecotypes of the Bacillus subtilis-licheniformis clade (Chapter 2). Other tolerance-based adaptations were not associated with recent diversification of Bacillus subtilis-licheniformis ecotypes, but might be associated with deeper branching within the phylogeny. A genome study of very close relatives in this sample set is anticipated, and will compare these categories of traits in more depth.

Bacterial diversification is a topic intrinsic to astrobiology, a field which probes the origin and evolutionary path of life on Earth, and the possibility of life on extraterrestrial bodies such as Mars or one of the hundreds of planets recently discovered by Kepler (Mix 2006). One of NASA’s goals is preventing humans and spacecraft from contaminating extraterrestrial bodies with Earth-based microbial life, both to ensure the validity of any scientific experiments being conducted and to prevent contamination of these pristine bodies (Office of Planetary Protection website). (This is in contrast to the general public’s concern that the opposite will occur, with extraterrestrial life forms arriving on Earth and harming life here.) Contamination is of real concern, as protocols developed to clean spacecraft construction areas eliminate most but not all bacteria (Venkateswaran, Vaishampayan et al. 2014). The species that remained after cleaning procedures naturally tended to be hardier and belonged to the Actinobacteria and Alphaproteobacteria, both groups known to survive extreme environments that could represent those on other planets (La Duc, Vaishampayan et al. 2012). A later study found eighty-two strains of bacteria determined to be “extreme tolerant” (able to survive under more than one extreme condition) sampled from clean rooms intended to be sterile for the construction of spacecraft (Venkateswaran, Vaishampayan et al. 2014). In addition, microbes can survive the trip into space, at least aboard the International Space Station (Venkateswaran, Vaishampayan et al. 2014); this is of concern for astronauts, as biofilm formation and antibiotic resistance arise more frequently than on Earth in the low-nutrient, low-gravity environment (Kim, Tengra et al. 2013, Schiwon, Arends et al. 2013).

Astrobiology also includes the study of life in extreme environments. Extreme environments where bacteria live include marine hydrothermal vents, arctic ice, highly saline soils or lakes, hot springs, and highly acidic or alkaline waters. The inability to cultivate many of these organisms stems from the difficulty of replicating the extreme conditions of their natural environment; in some cases, minerals from these environments do not remain intact in laboratory conditions either. However, some of these organisms, including Acidithiobacillus species (Kelly and Wood 2000), Halobacillus species (Moreno, Piubeli et al. 2012), and numerous Actinobacteria genera (Joshi, Kanekar et al. 2008), can be grown in the laboratory.

Acidithiobacillus ferrooxidans could be used as a tool to predict the characteristics of a dangerous or inaccessible environment (Kopac and Cohan 2011). Some bacteria are associated with the production of complex minerals that are characteristic of elements in the environment; for example, Acidithiobacillus ferrooxidans is an iron oxidizer and is associated with formation of the minerals jarosite and schwertmannite, which are hydrous minerals containing iron and sulfate groups (Wang, Bigham et al. 2006). There is a movement to include detectors for these minerals on the next Mars rover, because their detection would give astrobiologists an estimate of the amount of water in the environment (Gilmore, Bornstein et al. 2008). However, these complex hydrous minerals can be degraded by transportation (T. Kading, personal comm.) and require lengthy, careful experiments to characterize. Using ecotypes associated with production of these minerals as markers for the minerals might be faster and more reliable than studying the minerals themselves.

In this thesis, I discuss several different approaches to determining which environmental factors are associated with bacterial speciation. Two of the approaches are based on DNA sequences, while the third explores the ability of ecotypes to grow in media spiked with specific geochemical components found in the environment where the ecotypes were isolated. Combining these techniques, we can piece together a more in-depth view of bacterial speciation.

Chapter 1 describes a project in which I compared the environmental associations of ecotypes predicted from DNA sequence data. We isolated several hundred strains of Bacillus of the licheniformis-subtilis clade, found thirty-two ecotypes, and subsequently determined their associations with 12 environmental factors. The ecotypes were significantly different in their associations with pH, iron, boron and the proportion of clay in the soil, indicating that the groups might specialize to different concentration ranges of these geochemical factors. In addition, it seems that putative ecotypes might contain even smaller ecologically divergent groups.

In Chapter 2, I test a subset of the strains from Chapter 1 for their ability to tolerate boron and copper, two stressors naturally present at varying levels at our sample site. We theorized that ecotypes might speciate to be able to tolerate high levels of these toxins, and tested their ability to grow with and without them. I found that strains were significantly different in their ability to grow at high levels of boron, as well as without boron added to the media. Putative ecotypes did not differ significantly in their ability to tolerate high boron. Putative ecotypes followed the same trend with copper, leading us to theorize that adaptations to tolerate these two elements are occurring rapidly within the ecotype groupings that we predicted.

In Chapter 3, I present a comparison of four samples chosen randomly from a single ecotype demarcated in Connor et al. 2010. The ecologies of the samples were compared through genome content comparisons and corresponding growth tests. The four sample genomes were compared to each other and to a type strain for their metabolic capabilities, as predicted by annotations from the RAST database. All of the strains had unique genes that were not found in the genomes of the other strains. Focusing on carbohydrate metabolism, two strains had extra genes for pathways already present in all strains; one strain had extra copies of genes for maltodextrin metabolism, and the other strain had unique genes for maltose metabolism. Monoculture and qPCR co-culture tests showed that the strain with unique genes had a competitive advantage on the carbon source of interest, but the strain with only extra copies of genes did not. The experiments in this chapter were done in collaboration with Zhang Wang and Martin Wu, as well as with an undergraduate and a BA/MA student, Jane Wiedenbeck, from the Cohan lab.


Introduction

As explained above, ecotypes are bacterial populations that have ecological associations distinct from those of closely related populations. Each ecotype is a clade thought to have diverged from its sister clade along some environmental dimension. It is not clear what features define ecology for a free-living bacterium. Owing to historically based microbiology methods, our knowledge of the ecology of many species is limited to the range of temperatures they can tolerate and a few substrates they can metabolize. Many more factors must be identified to explain the extensive diversification of bacteria. Macrobiologists observe organisms in their natural habitats to link species to their unique ecologies, as with Darwin’s finches. However, it is generally not possible to observe bacteria in their natural habitats.

Furthermore, undiscovered species most likely outnumber those which have been identified in the past (Wu, Hugenholtz et al. 2009, Cole, Konstantinidis et al. 2010, Caporaso, Paszkiewicz et al. 2012, Kyrpides, Hugenholtz et al. 2014), although with metagenomic tools researchers are uncovering this diversity (Sogin, Morrison et al. 2006), including the discovery of specimens so divergent from known strains that they must be considered new phyla (Castelle, Hug et al. 2013).

Taxonomically named species were not demarcated taking ecological variation into consideration and thus encompass a breadth of ecological diversity (Welch, Burland et al. 2002, Walk, Alm et al. 2009, Cohan and Kopac 2011, Luo, Walk et al. 2011, Berthe, Ratajczak et al. 2013), making them poor tools for predicting environmental dimensions linked to speciation.

Hunt et al. (2008) published an algorithm, AdaptML, to confirm the ecologically distinct groups in a sample set, but the program relies on the user to define the environmental dimensions of importance (and cannot differentiate groups differing in a parameter not predicted by the researcher) (Connor, Sikorski et al. 2010, Cohan and Kopac 2011). The Cohan laboratory and colleagues use an approach wherein clues from the natural environment of bacteria are used to distinguish which parameters are important in the diversification of ecotype lineages, bypassing the problems of predicting environmental associations a priori and of ecologically heterogeneous taxonomic groups (Cohan, Koeppel et al. 2006, Perry, Krizanc et al. 2006, Ward, Bateson et al. 2006, Bhaya, Grossman et al. 2007, Connor, Sikorski et al. 2010, Stefanic, Decorosi et al. 2012).

Genomes offer the highest resolution for identifying the most recently speciated groups (Kopac and Cohan 2011, Kopac, Wiedenbeck et al. 2014), but protein-coding gene sequences offer enough resolution to identify recently speciated groups with unique ecologies within named taxonomic species (Sikorski and Nevo 2005, Perry, Krizanc et al. 2006, Sikorski, Brambilla et al. 2008, Connor, Sikorski et al. 2010).

Horizontal gene transfer and genetic mutations are two important contributors to bacterial diversification. Large population sizes in bacteria allow them to access virtually every SNP possible in every generation, which likely contributes to their extensive diversification. These mutations could provide the adaptation needed for an individual to begin a periodic selection event in its ecotype. It is less likely (but not impossible) that a point mutation could tweak an individual’s ecology to access a new niche, beginning an ecotype formation event. Horizontal gene transfer (HGT) is another mechanism that might provide the change in environmental associations needed to found a new ecotype. HGT can pass on niche-transcending genes as well, which would initiate a periodic selection event in the recipient’s ecotype. Due in part to horizontal gene transfer, some adaptations are relatively easy for bacteria to acquire, and occur frequently. Antibiotic resistance and pathogenicity are traits that a lineage can gain and lose multiple times. Other adaptations seem to be more evolutionarily unlikely, as they occur infrequently in the history of a lineage (Johnson, Zinser et al. 2006, Lozupone and Knight 2007). Known transitions between a saline environment and a non-saline environment occur only a few times in the entire bacterial tree, possibly because such a transition involves a suite of changes in the genome, involving many genes and pathways instead of a single gene that can be acquired easily. The difficulty or evolutionary frequency of a transition might reflect the ease with which a trait can be acquired, including how many genes the phenotype requires, how many other cellular pathways the phenotype interacts with, and even how evolutionarily distant the donor of the acquired gene(s) is, since gene transfer might be more likely (Majewski and Cohan 1999), and transferred genes more compatible (Lawrence and Ochman 1997), among close relatives.

Bacterial speciation has important implications in the fields of astrobiology and medicine. Scientists will be able to better predict the evolution of an ecotype if the limiting environmental factors associated with its parent lineage are known, and this extends to infectious bacteria. A well-defined ecology of a pathogen might lead to more accurate predictions of whether an infectious agent will expand its host range; for example, most ecology-related genes in E. coli do not fall into the core genome (those genes shared by all strains), since the core genome is derived from a comparison of strains with many different ecologies (Welch, Burland et al. 2002, Touchon, Hoede et al. 2009). Using the same logic, merely being able to accurately identify ecotypes within a named species of pathogen could make vaccine design more efficient by defining the core genome of the ecotype (Kopac and Cohan 2011).

One of the major topics of interest in astrobiology is the evolution of the earliest bacterial lineages into the species that exist today. Little is known about the early diversification of the first million years’ worth of cells, but more recent diversifications might inform us about the past. For example, temperature and light level are two factors associated with the diversification of several different groups (Hess, Rocap et al. 2001, Sikorski and Nevo 2005, Johnson, Zinser et al. 2006, Perry, Krizanc et al. 2006, Bhaya, Grossman et al. 2007, Sikorski, Brambilla et al. 2008, Martiny, Tai et al. 2009, Miller, Williams et al. 2009), and ancient bacteria might have taken similar evolutionary paths, since these paths seem to involve relatively easy adaptations. By studying very recent speciation events, we hope to identify broad classes of environmental parameters often associated with diversification, for example pH or carbohydrate utilization. These divergences might be fundamental to the evolution of bacterial diversity, as far back as the first species split. Similarly, it is becoming clear that predicting the survival and evolutionary path of microbes associated with spacecraft is necessary for any future extraterrestrial expeditions. Studies have shown that NASA clean rooms are not actually sterile (La Duc, Vaishampayan et al. 2012, Venkateswaran, Vaishampayan et al. 2014); as basic microbiology textbooks teach, it is only reasonable to assume the death of nearly all of the bacterial cells on a given surface. Alarmingly, the species found after stringent NASA sterilization procedures are known for surviving in harsh conditions, meaning that they might already be equipped for survival in space and/or extraterrestrial environments. Thus it is important that evolutionary predictions be made for scenarios where microbes do “hitch a ride” on a spacecraft and are transferred to an extraterrestrial environment. We can make these predictions based on the ecotype’s ecological associations. The questions of Will this species survive? and How will it survive? are not new in microbiology, but by combining phylogenetic and environmental techniques, we can provide more accurate answers.

For the present project, we hypothesized that ecotypes of Bacillus subtilis and Bacillus licheniformis have diversified in association with environmental dimensions in soil. In particular, we expected that we would find ecotypes with adaptations to stressors such as metals and high salinity, since Bacillus subtilis and Bacillus licheniformis can survive in environments with high salinity (Steil, Hoffmann et al. 2003, Hahne, Mader et al. 2010) or metal content (Moore, Gaballa et al. 2005, Moore and Helmann 2005). Our approach was to quantitatively characterize soil components along a geochemical gradient from which strains of Bacillus subtilis and Bacillus licheniformis were isolated. Strains of this clade are common in Death Valley soils and exist as spores during the dry season, making it possible to isolate many strains without culturing non-spore-formers. The clade has two major lineages: that of B. licheniformis and that of B. subtilis. Other named species included in the clade are B. vallismortis, B. amyloliquefaciens, B. sonorensis, B. tequilensis, B. atrophaeus, and B. mojavensis; the subspecies B. subtilis subsp. subtilis, B. subtilis subsp. spizizenii, and B. subtilis subsp. inaquosorum were also analyzed. We plotted our sample transects adjacent to the dry lakebed of Lake Manly, to incorporate a gradient of soil salinity from the upper tolerance limit for our organism to soil with a normal (average) salinity level. Our transects also encompassed varying levels of other soil components including pH, lead, nitrate, manganese, potassium, zinc, boron, the proportion of organic matter in the soil and the proportions of sand, silt, and clay in the soil.

Some of the soil components varied in the same pattern as soil salinity, decreasing farther from the lakebed. Other soil dimensions did not follow this pattern, with maximum levels in the middle of the transect. This allowed us to test for adaptations to a continuum of salinity levels (and those of salinity co-variants) while simultaneously testing for independent associations with other soil dimensions. We tested ecotypes for their associations with 12 chemical and geological characteristics of the soil; if two or more ecotypes had associations with significantly different concentrations or levels of a characteristic, then this parameter was considered to be associated with early diversification of the ecotypes we sampled. Our methods are unique in using quantitative measures of an environmental dimension; previous studies often characterized bacterial ecologies through a comparison of the absence vs. presence of a dimension (for example, see Lozupone and Knight 2007, Sikorski and Nevo 2007). By quantifying dimensions in the environment, we may be better able to identify the differences that distinguish the ecologies of closely related groups. Here, we elucidate the ecologies of 33 ecotypes of the Bacillus subtilis-licheniformis clade with respect to 12 environmental factors.

Methods

Sample collection

Samples were collected from Death Valley in July 2010 and May 2011. Four sample transects were chosen for their location adjacent to the salt pan created by the drying of Lake Manly; each of the four transects runs perpendicular to the salt pan and parallel to the others. In 2010, 50 mL of soil was sampled from sites 2.5 cm below the surface at 60 different locations along each transect; these locations were distributed among 20 levels of increasing distance from the salt pan, with three replicates at each level. In the following year, roughly 470 mL of soil was collected from additional levels along two transects; these levels were intermediate to the ones established in 2010, as our intent was to gain higher resolution of samples along the gradient. Since the gradient has existed for hundreds of years, we did not expect any major changes in the overall environment between sample collections.


Figure 1. Sample transects in Death Valley adjacent to the dry lakebed of Lake Manly (right) and alluvial fan extending from Panamint Mountains. The geochemical gradient decreases from right to left. Pins with labels denote the western and eastern limits of each transect; twenty sample sites were spaced approximately evenly along each transect.

Isolation of strains

Soil samples were sifted through 2 mm mesh to remove any large particles; soil was sifted either on-site or in the lab (aseptically) within one month of sampling. Strains of Bacillus subtilis-Bacillus licheniformis were isolated from soil samples according to protocols previously used by our lab (see Connor, Sikorski et al. 2010). As stated in the introduction, the following species and subspecies belonging to the Bacillus subtilis-licheniformis clade were included in the isolations and subsequent analysis: B. vallismortis, B. amyloliquefaciens, B. sonorensis, B. tequilensis, B. atrophaeus, and B. mojavensis, and the subspecies B. subtilis subsp. subtilis, B. subtilis subsp. spizizenii, and B. subtilis subsp. inaquosorum. After isolation, strains were stored on AK agar at 20°C.


Analysis of soil dimensions

Soil composition was determined by the Water, Soil and Plant Testing Lab at Colorado State University. Samples were sifted before being analyzed. Not all soil samples collected in 2010 were analyzed for all parameters due to the limited amount of soil collected and the destructive nature of some of the tests. The ANOVAs were performed using STATISTICA Academic software (StatSoft, Inc. (2013). STATISTICA (data analysis software system), version 12. www.statsoft.com); environmental dimensions were treated as dependent variables and ecotypes were treated as independent (predictive) variables. Tests for all soil dimensions were performed separately.
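The structure of each test (one soil dimension as the dependent variable, ecotype as the grouping factor) can be illustrated with a minimal one-way ANOVA sketch. The analyses in this chapter were run in STATISTICA, not with this code; the function name, ecotype labels, and pH values below are hypothetical, and the sketch stops at the F statistic without computing a p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` maps a group label (here, an ecotype) to the list of
    measurements of one soil dimension for that group.
    """
    values = [v for g in groups.values() for v in g]
    n_total = len(values)
    k = len(groups)
    grand_mean = sum(values) / n_total

    # Variation of group means around the grand mean
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of individual measurements around their own group mean
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical soil pH at the sites where strains of each ecotype were isolated
ph_by_ecotype = {
    "PE1": [7.0, 8.0, 9.0],
    "PE2": [8.0, 9.0, 10.0],
    "PE3": [10.0, 11.0, 12.0],
}
f_stat, df_between, df_within = one_way_anova_f(ph_by_ecotype)
```

Running the same function once per soil dimension mirrors the statement that tests for all soil dimensions were performed separately.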

DNA extraction and sequencing

DNA extractions were performed using the Epicentre DNA Extraction kit following the manufacturer's instructions. Extractions were performed by both the Cohan lab and Alejandro Rooney's lab at the USDA. Sequencing of samples from this project was done by Alejandro Rooney's lab on a Sanger platform.

Sequence curation and alignment

Raw data for forward and reverse sequences were aligned and the resulting consensus sequence was checked for quality; this pairwise alignment and quality checking were done using BioEdit with ClustalW. Sample sequences were required to have quality reads for both the forward and reverse sequences for inclusion in the final dataset, unless otherwise noted. Samples with sequence reads shorter than 600 bp were discarded.

DNA sequences were aligned using MEGA5's MUSCLE alignment algorithm with default parameters. In order to maximize the amount of data that could be used within the constraints of the Ecotype Simulation algorithm, the alignment created with MUSCLE was also edited by hand to increase accuracy of alignment in areas of repeats and uncalled bases. This step greatly increased the number of nucleotide sites available for analysis by Ecotype Simulation.

Sixty sequences included in the alignment were not from strains isolated in this project. These include sequences of, at most, two strains from every Bacillus ecotype previously identified using Ecotype Simulation (called Reference Strains) (Perry, Koeppel et al. 2006, Cohan and Koeppel 2008, Koeppel, Perry et al. 2008, Connor, Sikorski et al. 2010, Stefanic, Decorosi et al. 2012), as well as Species Type Strains from ecotypes whenever possible. The Bacillus halodurans strain C125 was used as an outgroup. In order to maximize the length of Type Strain sequences (previously obtained from partial sequencing of a gene), we used the local BLAST included in BioEdit to find the gyrA gene in the genomes of any Type Strains that were recently fully sequenced (the query being the longest gene sequence in our sample set). Thus we were able to extend the length of the gyrA sequence for several Type Strains.

Phylogenetic analysis

The DNA alignment, including the Type and Reference Strains, was analyzed by PhyML, a program using a maximum likelihood approach. Our analysis used parameters assuming the HKY85 model of nucleotide substitution, an optimized Ts/Tv ratio and parametric bootstrap analysis (further details from the analysis are included in the Appendix).

Identification of Ecotypes by Ecotype Simulation and AdaptML

The DNA alignment was input for analysis by Ecotype Simulation. Parameters were determined using the Ecotype Simulation 2 algorithm. These parameters were used with the original Ecotype Simulation (Cohan and Koeppel 2008, Koeppel, Perry et al. 2008) to test clades predicted to be ecotypes, chosen manually with the ML tree as a reference. A likelihood ratio test was used to determine whether the clade in question contained members from one ecotype or from more than one.

Before running the binning program, gaps and ambiguous nucleotides were removed from the sequence alignment with a program that identified each gap or ambiguous nucleotide; once one was found, the program determined whether fewer than 10% of the total sequences in the alignment had a gap or ambiguous nucleotide at that position. If so, the program assigned the most common nucleotide for that position. If more than 10% of the sequences lacked a definitive nucleotide at that site, then the ambiguous nucleotide was not changed and that position, for all sequences, was ignored in subsequent steps.
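The column-filtering rule described above can be sketched as follows. This is an illustration only, not the program used in this project: the function name is hypothetical, the alignment is assumed to be non-empty, equal-length, and uppercase, any character other than A, C, G, or T is treated as unresolved, and a column with exactly 10% unresolved characters is dropped (the text does not specify that boundary case).

```python
from collections import Counter

UNAMBIGUOUS = set("ACGT")


def clean_alignment(seqs, max_fraction=0.10):
    """Fill rare gaps/ambiguities with the column consensus; drop other columns."""
    n = len(seqs)
    kept_columns = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        resolved = [c for c in column if c in UNAMBIGUOUS]
        unresolved = n - len(resolved)
        if unresolved == 0:
            kept_columns.append(column)
        elif unresolved / n < max_fraction:
            # Rare gap or ambiguity: assign the most common nucleotide
            consensus = Counter(resolved).most_common(1)[0][0]
            kept_columns.append(
                [c if c in UNAMBIGUOUS else consensus for c in column]
            )
        # Otherwise the position is ignored for every sequence
    return ["".join(col[j] for col in kept_columns) for j in range(n)]


# Hypothetical 11-sequence alignment: one gap in column 2, two Ns in column 5
alignment = ["ACGTA"] * 9 + ["A-GTN", "ACGTN"]
cleaned = clean_alignment(alignment)
```

In this toy alignment the single gap (1 of 11 sequences) is filled with the consensus nucleotide, while the last column (2 of 11 sequences unresolved) is removed from every sequence.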

The input for the binning program in ES2 was the alignment of 680 sequences, each 828 nt in length, entered into the file numbers.dat. The PCR error was set to zero, the default; the default number of bin levels was also used (13 bins, ranging in sequence identity from 0.65 to 1.0). After the number of sequences in each bin was determined, this was input into the original Ecotype Simulation algorithm to determine the parameters omega, sigma, and npop: the rate of ecotype formation, the rate of periodic selection, and the number of ecotypes in the sample set, respectively.

To test whether a clade was consistent with being a single ecotype, clades were chosen and analyzed in the following manner: a separate binning analysis was done in the original ES, using only the strains in the focus clade, and once the frequency of strains in each bin was determined, this was input into the acinas program in Ecotype Simulation to determine if the clade was consistent with being one ecotype. For this determination, the sigma and omega values for the entire sample set were used as input, along with an npop range between 1 and the number of subclades in the clade (or a best guess). The likelihood value at 1.5x precision for an npop of 1 was compared to the highest likelihood value (or the next highest, if npop=1 was associated with the highest value) using a log likelihood ratio test. If the null model was accepted by the test, the clade in question was considered to be consistent with one ecotype; if the test rejected the null hypothesis, then the clade tested was considered to incorporate more than one putative ecotype. In the case that the null was rejected, smaller clades within the tested set of samples were tested individually.
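The comparison described above can be sketched as a generic likelihood ratio test. This is an illustration, not the code used by Ecotype Simulation: it assumes the inputs are natural-log likelihoods, that the statistic is referred to a chi-square distribution with one degree of freedom, and that the significance level is 0.05; none of these settings is stated in the text, and the function name and example values are hypothetical.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt, alpha=0.05):
    """Compare a null model (npop = 1) with an alternative model.

    Returns (statistic, p_value, reject_null). Assumes one degree of
    freedom, for which the chi-square tail probability is
    erfc(sqrt(statistic / 2)).
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value, p_value < alpha


# Hypothetical log-likelihoods: npop = 1 versus the best-supported npop
stat_a, p_a, reject_a = likelihood_ratio_test(-105.0, -100.0)
stat_b, p_b, reject_b = likelihood_ratio_test(-100.5, -100.0)
```

Under these assumptions, a rejected null corresponds to a clade that is treated as containing more than one putative ecotype and is then subdivided for further testing.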

Ecotypes were generally limited to having a monophyletic structure, except in cases where many samples stemmed from the basal node. The analysis began at the top of the tree. When we tested a clade and found that it was a single ecotype, we tested all of the members of the clade originating from the first clade’s next most common ancestor for inclusion in that ecotype, to ensure that the ecotype was limited to the original clade.

Results

The soil characteristics quantified showed a general trend where levels of elements or soil traits decreased moving west along transects, away and up from the salt pan (Fig. 2). However, some soil dimensions, including pH, the proportion of organic matter in the soil, phosphorus and iron, did not follow any clear trend along the transects; no soil dimensions showed a trend of increasing levels toward the more western end of the transect.

Figure 2. Gradient of soil boron, copper, and electrical conductivity levels in soil along the T sample transect. Levels of elements and electrical conductivity were measured, and the highest value found in any sample was used to express the levels of the other samples as proportions of that highest value. Copper concentrations ranged from 0.11 to 0.8 ppm, boron from 0.48 to 63.9 ppm, and electrical conductivity from 0.3 to 58.8 mmhos/cm.
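The scaling described in the Figure 2 caption amounts to dividing each measurement by the largest value observed for that soil dimension. A minimal sketch follows; the function name is hypothetical, and the middle boron value is invented for illustration (only the minimum and maximum come from the caption).

```python
def as_proportion_of_max(values):
    """Express each measurement as a proportion of the highest value."""
    highest = max(values)
    return [v / highest for v in values]


# Boron in ppm: caption minimum, a hypothetical intermediate value, caption maximum
boron_ppm = [0.48, 31.95, 63.9]
boron_scaled = as_proportion_of_max(boron_ppm)
```

Scaling each dimension this way puts boron, copper, and electrical conductivity on a common 0 to 1 axis despite their different units.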

Sequences from the sample strains, reference strains and type strains were demarcated into thirty-one ecotypes, twenty-three in the B. subtilis subclade and nine in the B. licheniformis subclade (Fig. 3). In total, there are nine previously unidentified ecotypes (eight of which have more than five strains); these ecotypes are independent of any previously discovered species or subspecies. Some type and reference strains previously put into separate ecotypes were put in a single combined ecotype. Every strain was assigned an ecotype demarcation except for the outgroup (for the full list of strains and their designations, see Appendix).

Four environmental factors (iron, nitrate, pH, and the proportion of clay in the soil) were found to be associated with diversification in the full sample set; that is, ecotypes differed in the level or concentration of these factors (Table 1). Several factors tested were found not to be significantly associated with speciation, including electrical conductivity (EC), manganese, copper, boron, lead, potassium, the proportion of organic matter in the soil, zinc, the proportion of sand, and the proportion of silt in the soil. To reduce the amount of variation and focus on the closest relatives, ecotypes from the subtilis and licheniformis subclades were tested separately for associations. In these tests, ecotypes in the subtilis subclade differed from one another in their associations with nitrate, electrical conductivity (salinity), the proportion of clay in the soil, and pH. The ecotypes in the licheniformis subclade were found to be heterogeneous for their associations with manganese. Taking all tests into consideration, copper, boron, lead, potassium, the proportion of organic matter in the soil, zinc, the proportion of sand, and the proportion of silt in the soil were the factors that did not sort out among ecotypes in either subclade.

A.

B.

Figure 3 (page 32 and 33). Phylogenetic and Ecotype Simulation analysis of samples from the Bacillus subtilis-licheniformis clade. The phylogeny is based on an 828 base pair alignment of the gyrA gene sequence using Bacillus halodurans as an outgroup. The two main lineages, A. and B., represent the subtilis and licheniformis subclades, respectively. Each bracket represents a clade consistent with being a single ecotype; ecotypes consisting of only a single strain are not bracketed. Ecotypes including either type or reference strains of named species or previously identified ecotypes are labeled with the representative strain. Previously unidentified ecotypes consisting entirely of strains from this project have the prefix “New PE”.

Figure 4. Heterogeneity in associations with iron across ecotypes of the B. subtilis subclade. Bar graphs are depicted as the fraction of samples in each ecotype associated with the different levels of iron in the soil, with 100% being the total number of samples in a given ecotype. The parameters of the high, medium high, medium low and low categories were created by sorting all strains by their iron associations and then dividing the strains in order to create four bins with an equal number of strains in each bin.

Figure 5. Ecotype heterogeneity for associations with boron in the B. subtilis subclade. Bar graphs are depicted as the fraction of samples associated with each level of boron, with 100% being the total number of samples in a given ecotype. The parameters of the high, medium high, medium low and low categories were created by sorting all strains by their boron associations and then dividing the strains in order to create four bins with an equal number of strains in each bin.
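The binning described in the captions of Figures 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run: the function names, the ecotype labels "PE1" and "PE2", and the soil levels are hypothetical, and ties are broken by input order, which the captions do not specify.

```python
from collections import Counter

LABELS = ["low", "medium low", "medium high", "high"]


def quartile_bins(levels):
    """Sort strains by soil level and cut the ranking into four equal-count bins."""
    n = len(levels)
    order = sorted(range(n), key=lambda i: levels[i])  # stable sort: ties keep input order
    bins = [None] * n
    for rank, i in enumerate(order):
        bins[i] = LABELS[min(rank * 4 // n, 3)]
    return bins


def ecotype_fractions(ecotypes, bins):
    """Fraction of each ecotype's strains falling in each bin (each ecotype sums to 1)."""
    counts = {}
    for eco, b in zip(ecotypes, bins):
        counts.setdefault(eco, Counter())[b] += 1
    return {
        eco: {b: c / sum(cnt.values()) for b, c in cnt.items()}
        for eco, cnt in counts.items()
    }


# Hypothetical soil levels for eight strains from two hypothetical ecotypes.
levels = [1.0, 8.0, 3.0, 6.0, 2.0, 7.0, 4.0, 5.0]
ecotypes = ["PE1", "PE2", "PE1", "PE2", "PE1", "PE2", "PE1", "PE2"]
bins = quartile_bins(levels)
fractions = ecotype_fractions(ecotypes, bins)
print(fractions)
```

Because the cut points come from the pooled ranking of all strains, an ecotype whose bars are concentrated in one or two bins is one whose strains were found at a narrow range of levels.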

Table 1. Results of significant ANOVA tests on ecological dimension v. all ecotypes. Asterisks indicate that results are significant.

ANOVA tests of association between all samples and environmental dimensions

Soil dimension    df         F         P value
Fe                18, 365    1.6704    0.04239*
B                 18, 244    1.6767    0.04401*
pH                18, 365    1.6466    0.04699*
% clay            18, 99     1.7226    0.04753*

Table 2. Results of significant ANOVA tests on ecological dimension v. subtilis ecotypes. Asterisks indicate that results are significant; a dagger indicates that results are marginally significant.

ANOVA tests of association between subtilis samples and environmental dimensions

Soil dimension    df         F         P value
B                 12, 159    1.6884    0.07381†
% clay            12, 63     2.0102    0.03783*
pH                13, 227    1.7456    0.05315†

Table 3. Results of significant ANOVA tests on ecological dimension v. licheniformis ecotypes.

ANOVA tests of association between licheniformis samples and environmental dimensions

Soil dimension    df         F         P value
NO3               5, 137     1.9667    0.08741†
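For readers unfamiliar with how the df and F columns in Tables 1 to 3 arise, the sketch below computes a one-way ANOVA F statistic with ecotype as the grouping factor. It is an assumption that the tests were one-way ANOVAs of this form; the function name and the toy data are hypothetical, and the P value (which requires the F distribution) is not computed here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)                      # number of groups (ecotypes)
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical soil levels for strains from three hypothetical ecotypes.
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```

Under this reading, degrees of freedom of 18 and 365 would correspond to 19 groups and 384 observations.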

Discussion

Our analysis demarcated the sample set into twenty-one ecotypes in B. subtilis, eight of which (New PE c1,c1a, New PE d1, New PE f1, New PE g1, New PE i1,h1,j1, New PE k1, New PE n1o1, and New PE q,r,ra,s) were previously unidentified, either as ecotypes or as taxonomic species. In the B. licheniformis subclade, samples were demarcated into eight clades, one of which was a previously unidentified ecotype, New PE RR-SS; in addition, the previously identified PE 4 was split into two separate clades. Our analysis indicates that iron, boron, the proportion of clay in the soil, and pH are environmental factors associated with diversification in Bacillus subtilis and Bacillus licheniformis ecotypes. We found that the ecotypes within B. subtilis also differed significantly in their associations with the proportion of clay, and marginally significantly in their associations with pH and boron. Ecotypes in the licheniformis subclade are heterogeneous in their associations with nitrate.

Some of the factors associated with speciation in our entire sample set can be categorized as resources, or resource-related conditions. For example, organic materials are a component of clay by definition (Guggenheim and Martin 1995), and can interact with and dissociate ions in the soil, making them available to bacteria (Bergaya, Theng et al. 2006). Clay, being difficult for water to diffuse into, is also associated with the water activity of the soil. Since the proportion of clay in the soil is a measure encompassing all of these components, its association with speciation might represent ecotypes’ associations with a combination of these factors. Soil pH is a factor that might also be related to a bacterium’s ability to acquire resources. Detritus can be associated with low local pH, which would be advantageous for Bacillus to adapt to or be able to detect. Another characteristic making lower pH a good environment for bacteria is the increase in solubility of cations (Washington State Department of Ecology, 2014), making the ions readily available to cells. Iron is crucial for the functioning of Bacillus subtilis cells; as in other organisms, it is a co-factor in some enzymes, and is also necessary for gene regulation (Hoffmann, Schutz et al. 2002, Ollinger, Song et al. 2006).

In the past, projects exploring the speciation of the Bacillus subtilis-licheniformis clade in contrasting environments have found that the diversification of ecotypes is associated with each of the environmental contrasts, as predicted. One contribution to this finding could be that only one gene was used for the analysis, unlike the three gene concatenation (Connor, Sikorski et al. 2010) used in the past. Although the phylogeny found with the one gene is generally consistent with a concatenation of three or more genes, the resolution for ecotype demarcation is not as powerful. In this study we found associations with four of twelve environmental factors, but the experimental design also had an important departure from that of previous studies in that samples were taken from along an environmental gradient, instead of contrasting environments.

Contrasting environments might be characterized by one factor (e.g. solar exposure) but actually be correlated with many other factors (e.g. temperature, plant growth). Therefore, by studying ecotypes’ environmental associations along a multi-faceted geochemical gradient over an area with soil components that vary independently of the gradient, we have the advantage of being able to 1) discern environmental associations with higher resolution with respect to levels of environmental dimensions, as well as 2) better tease out the important environmental factors from those that happen to co-occur with the significant ones.

From the perspective of microbiology, we were expecting that the soil factors representing physiological challenges to microbes would be associated with speciation; that is, that some ecotypes would have evolved higher tolerances for challenging elements such as manganese, zinc, lead, and copper. It is likely that tolerances to the microbial stressors are more ancient than ecotypes and were acquired somewhere earlier in the tree; for example, the levels of these factors in the soil might be low enough that the general stress response protects the cells (Petersohn, Brigulla et al. 2001). Adaptations might date back to the split between the B. subtilis and B. licheniformis subclades, characteristic of their divergent metabolisms (Clements, Miller et al. 2002). A similar pattern of progressive adaptations was found in the Prochlorococcus phylogeny, with populations sharing adaptations to light, indicating an acquisition by a shared ancestor, but differing in temperature adaptations (Johnson, Zinser et al. 2006, Martiny, Tai et al. 2009). In the case of our samples, the last common ancestor to all of the samples might have acquired adaptations to lead, zinc and manganese, but associations with the level of nitrate, iron, boron, pH and the proportion of clay in the soil were acquired by different ecotypes later. Of the factors that were unassociated with speciation, manganese, zinc, and lead can be growth inhibitors at high concentrations, and might require multiple mutations for resistance, or may be difficult to accommodate due to their interactions with numerous enzymes. It is logical that these adaptations are prerequisites for cells to survive, while adaptations to other factors that are less inhibitory to growth (pH, proportion of clay in the soil, iron and nitrate) or unique to certain locations (boron) are more facultative and might only be acquired by some lineages.

Conversely, the acquisition and loss of adaptations to some environmental factors might be happening within ecotypes that have already begun to diverge into new groups. This would result in heterogeneity within ecotypes, which appears to be the case in Prochlorococcus ecotypes with respect to phosphorus (Martiny, Tai et al. 2009). For example, resistance to metals such as copper can be transferred on plasmids (Bruins, Kapil et al. 2000), allowing adaptations to occur quickly. Growth tests support this theory, finding that strains within ecotypes were heterogeneous in their ability to grow in high copper and boron media (see Chapter 2).

Death Valley National Park offers the opportunity to study bacterial adaptations to many parameters in an accessible area. Although it lacks the chemical extremes of some other environments, Death Valley is both the hottest and the driest place in North America, certainly classifying it as an extreme habitat. Deserts are used as examples of the limits of microbial life (Robinson, Wierzchos et al. 2014) and as analogs for extraterrestrial environments (Des Marais 2001), approximating the low water availability and high solar exposure that might be found on other planets. Our sample site has the advantage of incorporating a natural gradient of geochemical factors in an already stressful environment. By studying this system, we can observe the evolution of bacteria challenged to adapt to multiple environmental stressors simultaneously and gain insight into how bacteria might evolve in a similar system on an extraterrestrial body.

We have found differences in the ecology of our sample ecotypes, but do not claim to have characterized their ecologies completely. There might be other, unmeasured environmental conditions important in the speciation of these closely related ecotypes. Similar to the human gut microbiome (Arumugam, Raes et al. 2011, Muegge, Kuczynski et al. 2011, Wu, Chen et al. 2011, Ding and Schloss 2014), soil ecotypes might be associated with bacterial community types, which our study, at present, is blind to. In addition, our recent genomic work comparing strains within a single ecotype found that carbohydrate resource partitioning is an element in the most recently divergent strains (Kopac and Wang 2014, see Chapter 3); this is in agreement with our finding that iron, clay and pH, all related to resource gathering, are associated with speciation. An effort to sequence the genomes of a subset of the samples from this project is currently underway, which might allow for finer ecotype demarcation as well as ecological insights into strains through gene content.

Introduction

Some bacterial species are excellent at exploiting resources that are difficult to metabolize or toxic to other organisms (Hallberg, Gonzalez-Toril et al. , Johnson, Stallwood et al. 2006, Becerra, Lopez-Luna et al. 2009, Morozkina, Slutskaia et al. 2010, Cavicchioli, Amils et al. 2011). Such environments include acid mine drainages (Hallberg, Gonzalez-Toril et al. , Duquesne, Lebrun et al. 2003, Hallberg, Coupland et al. 2006, Johnson, Stallwood et al. 2006, Wang, Bigham et al. 2006, Becerra, Lopez-Luna et al. 2009), oceanic hydrothermal vents (Sogin, Morrison et al. 2006, Petersen, Zielinski et al. 2011, Connelly, Copley et al. 2012), and oil slicks (Hazen, Dubinsky et al. 2010, Kostka, Prakash et al. 2011, Gutierrez, Singleton et al. 2013); the microbial diversity in these environments is often low, but the few species that can survive have adapted to the extreme challenges in their habitats (Okibe, Gericke et al. 2003, Wang, Bigham et al. 2006, Zhou, Bo et al. 2007, Blöthe, Akob et al. 2008, Fernández-Remolar, Gómez et al. 2008, Becraft, Cohan et al. 2011, Robinson, Wierzchos et al. 2014). Adaptation to environments with factors that act as stressors on diverse groups can be a survival strategy for avoiding competition from other species, which can subsequently lead to diversification. We theorized that two bacterial stressors, boron and copper, would be associated with speciation in the bacterium Bacillus subtilis, a generalist for carbon resources (Kopac, and Wang et al. 2014, Meyer, Weidmann et al. 2014) and abiotic environmental factors in the soil (Perry, Koeppel et al. 2006, Sikorski and Nevo 2007, Connor, Sikorski et al. 2010, Maughan and Nicholson 2011).

Extreme environments and their stressors are of interest to astrobiologists, bioprospectors, and microbiologists alike. The common interests of these fields are the adaptations allowing some bacteria to withstand extreme conditions while others cannot.

Microbial processing of ores takes advantage of certain species’ natural ability to perform redox reactions to recover metals in mine water runoffs (Johnson 2006, Vera, Schippers et al. 2013) and is commonly called “bioleaching”. One method of bioleaching is “heap bioleaching”; as the man-made heaps of ore are chemically altered by bacteria via redox reactions, the environment becomes less “extreme” and the bacterial community structure changes (Remonsellez, Galleguillos et al. 2009). Few of the species colonizing the original heap, or the processes that they perform, are well understood, and it remains a mystery as to why some species are able to survive the low pH and high ion concentrations while other species cannot (Okibe, Gericke et al. 2003, Becerra, Lopez-Luna et al. 2009). The diversity of genetically similar groups of acidophiles is not well characterized; no more than a handful of species have been identified from most extremophilic genera, and isolates of these often differ in phenotypes, indicating that cryptic diversity might exist (Kelly, McDonald et al. 2000, Kelly and Wood 2000).

The first microbial cells on earth also persisted in harsh environments until the planet was transformed by multicellular organisms; theories of the origin of the first cell place the event in anoxic conditions, deep in clay sediments below the surface of the ocean, in hydrothermal vents, or otherwise challenging environments (Mix 2006). Being able to accommodate these environments was a prerequisite for early bacteria, and lithotrophic metabolisms are thought to date back to this time (Hegler, Posth et al. 2008). Modern organisms in similar habitats might use metabolic pathways derived from their ancient predecessors to accommodate similar physical and chemical stressors.

Many microbial genes have been linked to survival in unfavorable environments. For example, starvation and environmental stressors invoke the global stress response in Bacillus (Petersohn, Brigulla et al. 2001). The general stress response up-regulates a set of roughly 101 genes controlled by the transcription factor σB (Petersohn, Brigulla et al. 2001), accounting for 20% of the total protein production of the cell, compared to their baseline production of 1% in a non-stressed cell (Bernhardt, Volker et al. 1997). Bacillus relies upon the general stress response for initial protection against environmental stressors such as salinity (Hecker and Volker 2001), but also has stressor-specific responses. For example, high concentrations of some metal ions can cause the disruption of proteins, DNA, and metabolic processes (Bruins, Kapil et al. 2000, Chillappagari, Seubert et al. 2010), but resistance to metal ions can be mediated by an efflux pump that removes the dangerous ions from the cell (Moore and Helmann 2005, Wu, Monchy et al. 2011).

Quite differently, high salinity environments invoke a complex series of responses to quickly deal with the osmotic shock and prepare the cell to adjust to the new environment for an extended time. A Bacillus cell begins striving for an isosmotic state by preferentially accumulating K+ via the KtrAB and KtrCD membrane uptake systems and producing organic compatible solutes or osmoprotectants, molecules such as proline that appear to increase the water availability in a cell (Cayley, Lewis et al. 1992); without this initial step cells cannot survive the initial osmotic shock (Holtmann, Bakker et al. 2003). The second phase of the cellular response represents a unique transcription profile from the initial response (Steil, Hoffmann et al. 2003).

Bacteria have been successful in adapting to many environments, due in part to their large population sizes, which generate mutations of virtually every locus in the genome (Cohan and Koeppel 2008). Unlike animals, bacteria also have the ability to incorporate genes from distant relatives into their genomes (Cohan 1994) (this is more difficult in animals, as the recipient’s genetic background is likely to disrupt the incoming gene’s function). This is not to say that bacteria are able to genetically adapt to any environment; adapting to some environments requires more than one gene or even more than one genetic pathway. As an example, for a bacterial species to survive in a saline environmen