IDENTIFYING ECOLOGICAL DIFFERENTIATION IN MICROBIAL COMMUNITIES ACROSS TAXONOMIC SCALES

BY

NICHOLAS YOUNGBLUT

DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Microbiology in the Graduate College of the University of Illinois at Urbana-Champaign, 2014

Urbana, Illinois

Doctoral Committee:

Associate Professor Rachel J. Whitaker, Chair Professor William W. Metcalf Professor Gary J. Olsen Professor James M. Slauch ABSTRACT Microbial diversity, both genetic and phenotypic, is a product of evolutionary and ecological processes interacting at multiple levels of biological organization. High- throughput sequencing is providing a means to resolve the vast amount of genetic diversity in microbial assemblages. However, a current fundamental challenge is associating genetic diversity, whether at the community or population scale, to phenotypic variation that defines ecological roles and dictates how microbial community diversity and ecosystem functioning respond to environmental change. In this body of work, I have used sequencing-based techniques to identify ecological differentiation at taxonomic scales ranging from a whole bacterial community to a single population of methanogenic . At the broadest taxonomic scale, I used a whole-ecosystem disturbance along with 16S rRNA 454-pyrosequencing to identify ecologically relevant distinctions among taxonomic groups defined at various taxonomic scales. The findings showed that bacterial lineages require different taxonomic definitions to capture ecological patterns. At a more refined taxonomic breadth, I used a culture-independent approach to elucidate how methanogen community diversity was distributed within and between a set of freshwater lakes and also identified the ecological processes dictating this spatial distribution of diversity. Finally, at the highest resolution, I employed a comparative genomics approach to identify genomic signals of adaptive evolution in a Methanosarcina mazei population in order to link genetic variation to the ecological processes that define spatial and temporal distributions of methanogen populations. Together, this work has helped to resolve the interdependencies between genetic diversity and ecological processes, which concomitantly act to create and maintain microbial diversity across time and space.

ii TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION ...... 1

1.1 A MICROBIAL WORLD ...... 1

1.2 THE IMPORTANCE OF MICROBIAL DIVERSITY ...... 2

1.3 MICROBIAL COMMUNITY ASSEMBLY ...... 4

1.4 DEFINING UNITS OF DIVERSITY ...... 7

1.5 FRESHWATER LAKES: A MODEL EXPERIMENTAL SYSTEM FOR DETERMINING

ECOLOGICAL DISTINCTIONS AMONG MICROBIAL CLADES ...... 12

1.6 THE GENETIC BASIS OF ECOLOGICAL VARIATION ...... 16

1.7 METHANOSARCINA: A MODEL SYSTEM FOR DETERMINING THE GENETIC BASIS

OF ECOLOGICAL VARIATION IN ARCHAEA ...... 21

1.8 OVERVIEW OF DISSERTATION ...... 25

1.9 REFERENCES ...... 27

CHAPTER 2: LINEAGE-SPECIFIC RESPONSES OF MICROBIAL COMMUNITIES TO A WHOLE-ECOSYSTEM DISTURBANCE ...... 40

2.1 ABSTRACT ...... 40

2.2 INTRODUCTION ...... 41

2.3 EXPERIMENTAL PROCEDURES ...... 43

2.4 RESULTS ...... 48

2.5 DISCUSSION ...... 51

2.6 ACKNOWLEDGEMENTS ...... 55

2.7 REFERENCES ...... 56

2.8 TABLES AND FIGURES ...... 60

CHAPTER 3: DIFFERENTIATION BETWEEN SEDIMENT AND HYPOLIMNION METHANOGEN COMMUNITIES IN HUMIC LAKES ...... 68

3.1 ABSTRACT ...... 68

3.2 INTRODUCTION ...... 69

3.3 EXPERIMENTAL PROCEDURES ...... 70

3.4 RESULTS ...... 74

3.5 DISCUSSION ...... 78

iii 3.6 ACKNOWLEDGEMENTS ...... 83

3.7 REFERENCES ...... 84

3.8 TABLES AND FIGURES ...... 89

CHAPTER 4: GENOMIC DIFFERENTIATION BETWEEN TWO COEXISTING POPULATIONS OF METHANOSARCINA MAZEI ...... 98

4.1 ABSTRACT ...... 98

4.2 INTRODUCTION ...... 99

4.3 EXPERIMENTAL PROCEDURES ...... 100

4.4 RESULTS ...... 104

4.5 DISCUSSION ...... 112

4.6 CONCLUSIONS ...... 117

4.7 ACKNOWLEDGEMENTS ...... 117

4.8 REFERENCES ...... 118

4.9 TABLES AND FIGURES ...... 123

CHAPTER 5: CONCLUDING REMARKS AND FUTURE STUDIES ...... 132

5.1 ASSESSING ECOLOGICAL DIFFERENTIATION AT MULTIPLE SCALES OF

BIOLOGICAL ORGANIZATION ...... 132

5.2 MORE ACCURATE ASSESSMENTS OF ECOLOGICAL DIFFERENTIATION AMONG

MEMBERS OF HIGHLY DIVERSE COMMUNITIES ...... 133

5.3 FURTHER DEFINING THE PROCESSES OF COMMUNITY ASSEMBLY DRIVING

THE SPATIOTEMPORAL DISTRIBUTION OF FRESHWATER METHANOGENS ...... 134

5.4 LINKING GENETIC VARIATION TO ECOLOGICAL DIFFERENTIATION IN

METHANOSARCINA ...... 135

5.5 REFERENCES ...... 138

APPENDIX A ...... 140

A.1 REFERENCES ...... 140

A.2 TABLES AND FIGURES ...... 140

iv APPENDIX B ...... 151

B.1 TABLES AND FIGURES ...... 151

APPENDIX C ...... 159

C.1 TABLES AND FIGURES ...... 159

v CHAPTER 1: INTRODUCTION

1.1 A microbial world Microbial life is fundamental for human and global health. For ~3.7 billion of Earth’s ~4.6 billion year history, microbial life has not only persisted, but also evolved and diversified to populate every conceivable habitat, from hydrothermal vents to the surface of human skin (25,63). and Archaea have become key players in all biogeochemical cycles by engineering a wide array of cellular machinery for conserving energy from most redox reactions thermodynamically favorable for life (41). Much of this vast array of cellular machinery has yet to be discovered and potentially adapted for industrial or clinical purposes. The evolution of these cellular machines has lead to dramatic environmental change at the global scale throughout Earth’s history, and in the process, gave rise to a biosphere suitable for the evolution of multicellular eukaryotes (61). Many plants and animals are completely dependent on microbes for continued survival, and humans are no exception. We rely on microbial life to maintain our health, degrade our sewage and other waste, promote the growth of our crops, and perform a myriad of other activities relevant to our continued economic prosperity and biological survival (90). Bacteria and Archaea have become so fundamental to human and global health in part because of their diversity and shear abundance. Even at local spatial scales, microbial diversity can be staggering, with a gram of soil often estimated to contain thousands of operationally defined ‘’ of Bacteria and Archaea (122). When extrapolating to the global scale, Bacteria and Archaea are estimated to have a total standing cell count of 1030 (162). In regards to taxonomic diversity, microbes dominate the tree of life, while multicellular organisms are relegated to only one section in one of the three domains of life (113,115,166). What is the ecological relevance of this diversity? More specifically, how much of this vast diversity is necessary for the stable maintenance of processes carried out by microbial life? In addition, what ecological and evolutionary forces generate and maintain this diversity? Elucidating the answers to these questions is necessary for

1 developing accurate models of how environmental stimuli alter the diversity of microbial assemblages, which in turn alter the environmental processes mediated by microbial life. To this end, determining the processes that create and maintain microbial community diversity will allow for accurate predictions of how this diverse assemblage will change over spatial and temporal scales in response to future environmental scenarios. In conjunction, defining the relationship between microbial community diversity and functions mediated by microbial life will produce a holistic model of interactions between the environment and microbial assemblages. Considering the multitude of functions mediated by microbial life, this holistic predictive framework will likely be necessary for understanding the full ramifications of natural and anthropogenic environmental change at local and global scales.

1.2 The importance of microbial diversity The current state of ecosystem simulation models highlights the need for a full conceptualization of how microbial diversity affects ecosystem functioning. Ecosystem simulation models are designed to explain and predict how the rates of these ecosystem processes change in response to varying environmental conditions. Currently, most do not incorporate microbial composition, the interactions among microbial taxa, or the effects of environmental stimuli on the community (150). Instead, microbial communities are often treated as a ‘black box’, where input and output are quantified while ignoring the mechanisms causing biochemical transformations of the input that ultimately produce the output (124,146). However, many studies have shown microbial communities from the same general environment (e.g., lakes, gut, or soil) are not functionally equivalent (124,143,152). These studies illustrate that ecosystem processes are not just determined by environmental conditions as assumed by many ecosystem simulation models. On the contrary, microbial community composition can also influence ecosystem process rates. While many studies have found positive associations between community diversity and function, the nature of the relationship can vary substantially (110). For instance, a study assessing organic matter decomposition by saprophytic fungi found a continuous linear diversity-function relationship (i.e., increased community diversity increased decomposition rates), while another study quantifying bacterial community

2 respiration rates from leaf litter decomposition found a diversity-function relationship that was linear at low levels of diversity but leveled off at higher diversity levels (11,148). What could be the cause of such varied findings when studying diversity- function relationships? A possible factor dictating the diversity-function relationship is the degree to which various taxa overlap in their functional role. Functional overlap causes redundancy and may have a large impact on stability of ecosystem processes and diversity-function relationships. This concept of functional redundancy is often assumed to be prevalent in most microbial communities because of their high level of taxonomic diversity relative to the number of perceived functional roles (79). Theory on biodiversity-function relationships can help explain the apparently convoluted association between diversity and function (119). If taxa do not overlap in functional roles no matter how diverse the community, then a continuously linear relationship between function and diversity is predicted. Alternatively, if taxa do overlap in function, as community diversity increases and all unoccupied functional roles become filled, function will no longer increase with increasing diversity. A plot of diversity relative to function (e.g., net methane flux) would be linear at low levels of diversity and flat at higher levels of diversity. In a third scenario, the incorporation of generalist and specialist ecological strategies allows taxa to occupy many or very few functional roles. Thereby, the amount of functional overlap is idiosyncratic and will depend on which specific taxa are added. For instance, functional overlap will be much higher when a community is comprised of mostly generalists relative to one comprised of the same number of specialists. The predicted diversity-function relationship is nonmonotonic (i.e., function can increase and decrease with increasing diversity). According to this theoretical framework, diversity-function relationships can be predicted if the functional breadth of taxa (e.g., generalists and specialists) and the prevalence of functional redundancy in a microbial community can be determined. However, determining this relationship at the microbial level is challenging for a number of reasons. For instance, the vast majority of microbial life has never been cultured, which prevents controlled characterization of ecologically relevant phenotypic characteristics (141). Additionally, the high diversity of most microbial communities coupled with a lack of distinguishing morphological characteristics hinders the

3 identification and monitoring of particular taxa in situ in order to their ecologies. High throughput molecular methods that target a specific phylogenetic maker (e.g., the 16S rRNA gene) have allowed researchers to (nearly) fully sample bacterial and archaeal communities, which is necessary to accurately determine how they change in composition and relative abundance over space and time. However, phylogenetic markers such as the 16S rRNA gene inherently provide little or no insight into function. Additionally, there is currently no coherent framework for identifying a fundamental unit of diversity (i.e., species) that can be used to assess community change (45). Arbitrarily defining taxonomic units in the absence of a species concept can obscure how diversity relates to function because fine taxonomic resolutions will artificially subdivide ecologically coherent clades and thus inflate estimations of functional redundancy. Conversely, course taxonomic resolutions will artificially aggregate clades with divergent ecologies. This question of how to use molecular methods to assess microbial community change in a manner relevant to determining its affect on functions mediated by microbial life is a fundamental hurdle to the progress of the microbial ecology.

1.3 Microbial community assembly In order to open the ‘black box’ of microbial diversity, a better understanding of the ecological and evolutionary forces that define how communities assemble is needed. Community assembly is defined as the process by which taxa colonize and persist in a locality to form a community (60). The key processes to understanding community assembly are dispersal, selection by abiotic parameters or biotic interactions, birth and death rates, and speciation (89). These processes can occur in tandem and at a hierarchy of different spatial and temporal scales. For instance, this hierarchy can include dispersal limitation at large geographic scales, patchy distributions of taxa in a region due to environmental gradients, and local biotic interactions causing the coexistence or exclusion of taxa depending on fitness trade-offs such as predation resistance versus growth rates (102). Dispersal (i.e., the ability of an organism to move from between locations) can impact community assembly. High dispersal rates link community composition and species abundances and evolutionary processes such as local adaptation across localities,

4 whereas low dispersal rates unlink these local processes and allow ecological and evolutionary forces to occur independently in each locality. The argument has been made that the process of dispersal is largely unimportant for microbial community assembly based on the presumption that microbes have a cosmopolitan distribution due to their small size and ability to persist in dormant states (43,86). However, at least some bacteria and archaea appear to be dispersal limited (116,161). Detecting dispersal limitation can depend on the spatial or evolutionary time scale assessed. For instance, Martiny et al. found marine Prochlorococcus diversity to correlate with environmental gradients at course taxonomic levels, while dispersal limitation was only evident at the finest taxonomic scales (94). Besides linking community dynamics, high rates of dispersal can also support a population density higher than the environment could maintain in the absence of regional taxon movement. This ‘mass effects’ phenomenon can occur if an environment with a relatively large carrying capacity (source) provides migrants to a ‘harsh’ habitat where death rates exceed birth rates (sink). At the microbial scale, these source-sink models have regularly been used to explain the evolution of bacterial virulence (18,31,118,139). Taxa that have the ability to disperse from the regional pool into a locality may subsequently face the filtering mechanism of selection by environmental parameters (e.g., pH or temperature). The concept of habitat filtering is based on the premise that taxa in the regional pool or a single locality (if assessing temporal community dynamics) vary in fitness. Local conditions can selectively promote the growth of certain taxa, while others are outcompeted or simply cannot grow in the harsh conditions (102). This concept appears very tractable at the microbial scale, given the metabolic and phenotypic variation among taxonomic groups, and studies do often find environmental gradients to explain much of the spatial or temporal variation among microbial communities or populations (10,73,95,153). Spatially and temporally varying abiotic gradients create a fitness landscape wherein organisms can interact competitively or cooperatively. These biotic interactions create another filter for community assembly in which colonization and persistence of microbial taxa are dependent on a network of interactions that include inter- and intra- specific competition and mutualism. Competitive exclusion theory dictates that species

5 competing for space or resources (i.e., niche) in the same geographic locality cannot stably coexist, thus leading to the local extinction of one species (52). Therefore, competitive exclusion can act as a biotic filter that prevents colonization or persistence of taxa in a locality. Despite the occurrence of competitive exclusion at local scales, competitors could stably coexist at regional scales if they show a trade-off between competitive ability and ability to colonize new patches (56). Biotic filters can extend beyond the same trophic level to include predation and parasitism. At the local scale, predation can lead to extinction unless negative frequency- dependent selection (i.e., taxa at low abundances are selected for) or another stabilizing mechanism occurs (42). Dispersal among patches allows for the reintroduction of both prey and predator in a locality where one or both had gone extinct (64). Regional dispersal of microbial prey (i.e., bacteria and archaea) along with predators (e.g., viruses and protozoans) has been shown to promote regional coexistence (75,138,165). Importantly, rates of dispersal can have dramatic effects on the spatial distribution of both predators and prey (28,75). Diversity mediated by abiotic or biotic interactions assumes that taxa vary in relative fitness, and thus the environmental conditions provide a selective advantage to certain individuals. An alternative hypothesis to environmental selection presumes that all organisms in the same trophic level are equally fit, and thereby nullifying the concept of environmental selection. Instead, ‘neutral’ mechanisms of dispersal, stochastic variation in birth and death rates, and speciation govern community assembly (62). Many studies of both macrobial and microbial communities have found evidence of neutral processes at least partially dictating community assembly (16,39,84,123). Importantly, neutral processes are not necessarily seen as mutually exclusive to selective (deterministic) processes of community assembly. Both have often been shown to partially explain observed patterns of biodiversity, which may be attributable to a hierarchy of deterministic and neutral processes occurring at various geographic, temporal, and taxonomic scales (39,94,95,97). The relative impacts of geographic barriers and environmental heterogeneity on local and regional microbial community diversity is a very active field of research that is being accelerated by next-generation sequencing technologies (26,50,89). Recent studies

6 are unraveling the selective and neutral processes dictating the diversity of bacterial communities distributed regionally across lakes and streams (104), globally across the world’s oceans (50), and among different habitats in and on the human body (25). Although studies of archaeal biogeography have been pivotal to the understanding of microbial community assembly (58,125,161), much less attention has been given to community assembly dynamics of archaeal communities. To better understand the processes of community assembly dictating spatiotemporal distributions of freshwater methanogenic archaea, Milferstedt et al. assessed how methanogen community diversity varied across a set of lakes during the course of a year. The findings indicated that both niche and neutral processes had played significant roles. In Chapter 3, I further resolve the intra- and inter-lake distributions of these methanogen communities and identify how processes of community assembly have maintained this biogeographic pattern. In summary, community assembly is a conceptual framework for understanding how biodiversity changes across spatial and temporal scales and can involve a hierarchy of biotic and abiotic selective filters dictating the colonization and persistence of taxa among patches in a region. In addition, determining the influence of neutral and selective processes in governing community change can provide insight on diversity-function relationships. For instance, if neutral processes predominate, functional redundancy is likely highly prevalent in the community. A caveat of using this conceptual framework is that it is contingent on accurate assessments of community diversity across space and time. However, a species concept at the microbial level is currently lacking, and arbitrarily defined taxonomic units can obscure assessments of both spatiotemporal changes in community diversity and diversity-function relationships.

1.4 Defining units of diversity Quantitative study of community diversity cannot be performed without first defining a unit of diversity to quantify. Boundaries for these units of diversity could be placed anywhere on the continuum of recursively subdividing branches in the tree of life, or instead, be based on traits that are phylogenetically incongruent (e.g., Gram staining). However, if taxonomic boundaries do not coincide with the phenotypic or genetic

7 characteristics necessary to understand evolutionary and ecological processes governing community assembly, then the chosen units of diversity could possibly be misleading to the study of ecology and evolution. Most taxonomic assemblages (e.g., phylum, class, etc.) are considered arbitrary constructs used to imperfectly categorize life. However, the taxonomic level of species is considered a fundamental unit of biological organization, but what aspects make this taxonomic classification any different from courser taxonomic levels (e.g., ) or finer levels (e.g., strain/subspecies)? Many definitions of species exist, but the most widely used is the biological species concept proposed by Ernst Mayr, which defines species as potentially interbreeding populations capable of producing fertile offspring (96). Thus, species boundaries delimit the extent to which any allele can move among genomic backgrounds through genetic recombination (via sexual reproduction). The implications of this boundary are many-fold. For instance, although an allele may confer a stronger fitness advantage in the genomic background of another species, transfer beyond members of the same species is prohibited. In addition, adaptive mutations that arise in separate species may confer additive fitness effects if brought into the same genomic background, but species boundaries prevent this occurrence. Essentially, an allele is trapped within the species in which it arose, and thus its evolutionary fate (e.g., extinction or persistence) is contingent upon its host species. Mayr’s species concept appears unsuited for microbial life, because bacteria and archaea reproduce asexually. In totally clonal organisms, any new alleles that arise are completely restricted to the organism containing the mutation and all of its descendants. Therefore, every new mutation would form a new species according to Mayr’s species concept! Natural biological organization in the form of genetic clusters with similar phenotypes can still arise among clonal organisms simply through the processes of selection coupled with mutation (21). As a first step in this process, mutation in a clonal population leads to genetic diversification in the lineage. Beneficial mutations arise rarely and cause a purge of all members in the lineage co-occupying the niche but lacking the selective advantage (i.e., a selective sweep) (3,54). The resulting clonal population would be devoid of neutral genomic diversity, and all members are equal in fitness, which allows for a new round of neutral mutation accumulation in the population until the next

8 selective sweep. This process of periodic selection limits neutral diversification and thus maintains a genetically cohesive lineage that could be considered a fundamental unit of diversity. While bacteria and archaea do not exchange genetic material during reproduction, they can recombine via the processes of conjugation, transformation, and transduction (46,156). Homologous recombination rates have been shown to decrease log-linearly with increasing genetic divergence among some bacterial strains (92), which should produce genotypic clusters with distinct evolutionary trajectories (i.e., species) (33,155). However, many studies have shown instances of genetic exchange among highly unrelated taxa, even across the domains of life (105). Furthermore, these exchanges are often adaptive and have ecological and evolutionary implications for the recipient (120). Thus, the boundary of genetic exchange fundamental to Mayr’s species concept is not definite among all bacteria and archaea. How then can microbial community diversity be measured and compared while maintaining an agnostic view toward the concept of a microbial species? To further complicate the problem, most microorganisms have never been cultivated and are only described by their 16S rRNA sequence(s), and therefore, genome-wide rates of recombination, mutation, and selection cannot be assessed. As a work-around for these challenges of quantifying microbial diversity, an operational definition of a microbial species has largely been adopted, which is often defined simply by a cutoff of 97% sequence identity of the 16S rRNA gene (65,140). This cutoff was initially chosen because all microbes with ≥70% DNA-DNA hybridization (a traditional, but theoretically dubious species cutoff) also shared 16S rRNA genes of ≥97% similarity (126). Grouping microbial life into operational taxonomic units (OTUs) based on this universally conserved gene and a standard cutoff does at least aid in cross-comparisons of community diversity among studies and has been widely used to provide insight into spatial and temporal distributions of microbial diversity (20,47,87,106). However, besides the arbitrariness of the cutoff, relatedness as assessed by one or a few largely vertically inherited genes does not necessarily reflect the dynamics of recombination, mutation, genetic drift, and selective pressures occurring genome-wide (80). Furthermore, sequence data of phylogenetic markers often provide

9 little to no intrinsic information on an organism’s ecological role in an environment. Therefore, an OTU may contain multiple populations that have diverged by adaptive evolution into multiple niches or through the neutral processes of genetic drift or limited recombination relative to mutation rates. Alternatively, an OTU may be too restrictive to where populations are artificially subdivided. Inaccurately defining units of diversity can lead to erroneous perceptions of spatial and temporal distribution of diversity, the processes maintaining diversity, and how diversity corresponds to function (e.g., the prevalence of functional redundancy). For instance, Jaspers and Overman showed that various freshwater strains of Brevundimonas alba possessing identical 16S rRNA sequences varied considerably in the carbon substrates that could be utilized (67). This metabolic specialization may indicate adaptive evolution of specialists, or it could suggest that the clade has a low level of functional redundancy, which could affect functional stability during environmental disturbance. These potential insights into the relationship between community diversity and function would be missed at courser taxonomic resolutions. As highlighted by the work of Jaspers and Overman, resolving ecologically coherent (i.e., occupying the same niche) clades can create a taxonomic framework that can improve the understanding of microbial community assembly and function. How can molecular tools be best used to distinguish ecologically distinct taxa in a microbial community? Testing hypotheses of niche differentiation can be a straightforward affair with current molecular methods used in microbial ecology. To illustrate, a study could assess the spatial or temporal occurrence or abundance patterns of taxa at different taxonomic resolutions to discern clades with coherent patterns. Correlating abiotic or biotic data to these observed patterns supports hypotheses of ecological differentiation (e.g., sister clades vary in soil pH tolerances). Manipulating environmental parameters in replicated field or laboratory experiments can further test hypotheses substantiated by observational work. Lack of correlations may suggest that neutral processes govern spatiotemporal distributions, or that the niche axes differentiating clades have not been measured. Quantifying microbial diversity by ecologically relevant units instead of relatively arbitrary groupings defined by genetic similarity should expedite the answering of many

10 fundamental questions about microbial diversity and its relationship to function such as i) How prevalent is functional redundancy in microbial communities? ii) How does community diversity affect functional stability and ecosystem process rates? iii) How does habitat heterogeneity lead to biogeographic distributions of microbial taxa? Many studies have employed this methodological framework for investigations of ecological differentiation among microbial taxa along spatial or temporal gradients. For instance, a recent example of resource partitioning at a fine genetic scale was performed on Vibrionaceae isolates obtained from the same seawater samples that were differentially fractionated into what the authors presumed to be varying habitats ranging from free-living (<1 µm size fraction) to zooplankton-associated (>63 µm fraction) (66). The results showed that in most cases, the inferred niche of each isolate was consistent within phylogenetic clusters approximating the diversity of described Vibrio species. However, a notable exception was found among Vibrio splendidus isolates, which showed niche differentiation at much finer taxonomic scales, suggesting rapid adaptive divergence. This fine-scale adaptive divergence of just one defined species illustrates the need to assess ecological differentiation at varying scales of taxonomic resolution. In fact, many studies of spatiotemporal change in microbial community diversity either explicitly or implicitly use multiple taxonomic scales to resolve ecologically relevant variation among clades (100,103,144,151). Disturbance, either natural or anthropogenic, provides a means of discerning ecological distinction among taxa in the relevant context of environmental change. A disturbance can be defined as a causal event that alters selective pressures among community members beyond baseline variation, and thereby leading to altered community or population structure (59). Using disturbance to resolve ecological differentiation among taxa can be accomplished by determining phylogenetic clusters showing similar response to the disturbance, with ‘response’ being any phenotypic trait that exceeds normal amounts of variation as a result of the disturbance (1,132). Various methods can be used to quantify similarity of response, such as correlational methods using Pearson or Spearman correlations of taxa presence-absence or abundance data over the disturbance time series (6,121,169). More complex methods that account for multi- taxa interactions or interactions between taxa and environmental parameters (e.g., nitrate

11 or temperature) are increasingly being utilized (17,127,144). Assessing the distribution of responses provides a quantitative framework for producing dynamic mathematical models of community response and recovery to environmental disturbance (42). In summary, quantifying community change over space and time is necessary to define ecologically relevant scales of microbial diversity, which in turn is essential to understanding the processes of community assembly and the associations between community diversity and function. However, quantifications of community diversity cannot be performed without first defining a unit of diversity. Species are considered a naturally defined taxonomic unit where boundaries delineate taxa with cohesive evolutionary trajectories, but a species concept applicable to all bacteria and archaea is currently lacking (37,38). Culture-independent studies assessing microbial community diversity often use an operational species definition, which could lead to erroneous conclusions of total diversity in a locality or spatiotemporal dynamics among sites. Determining niche partitioning among lineages with molecular data is a pragmatic methodology being employed to resolve ecologically cohesive and distinct clades, and thereby create a taxonomic organization of microbial life that will facilitate the investigation of microbial community assembly and the associations between community diversity and function. Environmental disturbance can be a useful model system for assessing niche partitioning within lineages, and is relevant for the understanding of how functioning of microbial communities may be affected by environmental change.

1.5 Freshwater lakes: a model experimental system for determining ecological distinctions among microbial clades Freshwater lakes contribute significantly to atmospheric carbon flux and are also seen as sentinels of regional and global environmental change (9,164). Bacteria and archaea play a central role in the biologically mediated cycling of most molecules in freshwater systems (36,160), which is why their ecological roles in freshwater systems have been studied extensively (107,137). Freshwater systems provide a tractable experimental system for assessing ecological distinction among microbial clades and testing community assembly theory at the microbial level. In particular, lakes have discrete geographic boundaries, well-characterized trophic levels, and are amenable to

12 experimental manipulation, which is why they have become a major experimental system for developing and testing ecological theory for both macro- and microorganisms. Freshwater lakes display varying degrees of temporal and spatial heterogeneity of nutrients, light, pH, temperature, suspended particles, etc. (160). Competition for these spatially and temporally varying resources provides avenues for niche differentiation among microbes (4,108). For example, Jones et al. found evidence of carbon substrate niche partitioning when assessing freshwater bacterial taxa occurrence patterns among lakes with varying quantities of terrestrially-derived and phytoplankton-derived dissolved organic carbon (DOC) (70). Patterns of taxon occurrence along the focal DOC gradient were robust to defining OTUs at different sequence similarity cutoffs (95% to 99.5%), suggesting that the clades defined at the 95% cutoff were largely coherent in how DOC content governed their spatial distribution. A major driver of intra-lake physiochemical heterogeneity among temperate lakes (i.e., within latitudes spanning the Arctic Circle and the Tropic of Cancer) is seasonal stratification of the water column. Stratification occurs because light is often highly attenuated in deeper depths of the water column, which prevents homogenous heating by the sun along the entire water column (160). This heating gradient will cause the warmer water to float upon the denser and colder water, and thereby producing two distinctly stratified water layers referred to as the epilimnion and hypolimnion for the shallow and deep layers, respectively. The epilimnion is characteristically oxygenated, nutrient limiting, and receives high levels of light, while the hypolimnion is often the antithesis in these characteristics. This heterogeneity within the water column has been shown to be a major driver of bacterial and archaeal community structuring in freshwater lakes (14,114,117,128,131). Therefore, environmental selection induced by gradients formed through stratification can create and maintain ecologically distinct microbial clades. Shade et al. tested this concept by using dialysis tubing to perform a transplant experiment of hypolimnion water moved to the epilimnion and vice versa. The authors showed that some bacterial taxa could persist in both layers and were therefore considered generalists, while others were layer-preferential or layer-specialists (129). Such water layer preferences have been used to help distinguish other closely related bacterial clades inhabiting lakes (68,107).

13 An integral component of water column stratification is its impermanent nature. Water column overturn (i.e., mixing) disrupts established thermal stratifications, which can greatly affect physical and chemical gradients in the water column and has been shown to be major driver of seasonal succession of both macro- and microorganisms (130). Mixing of the epilimnion and hypolimnion can occur seasonally due to declined heating in the winter months or when wind and rain supersedes the thermal energy maintaining stratification. Importantly, climate change has been shown to influence the timing and frequency of mixing events (74,77). Thus, resolving microbial clades by how they respond to overturn events can lead to a taxonomic framework for effectively studying how freshwater microbial community diversity and function respond to environmental change. Many studies have established the importance of water column overturn on structuring freshwater microbial communities and have developed water column overturn into a model system for understanding microbial community response (i.e., resistance) and recovery (i.e., resilience) to naturally occurring disturbances. Most of the focus has been on the bacterial component of microbial communities; however, recent progress has been made toward understanding the impact of water column mixing on freshwater archaea (101). Observational studies of community response to natural disturbances have been instrumental to showing that natural mixing events do drive microbial community structure. For instance, Jones et al. found that typhoon-induced mixing largely homogenized bacterial community diversity throughout the water column of a small lake in Taiwan, although the communities were able to re-stratify over the course of days to weeks, suggesting that the communities were resilient to periodic typhoon disturbances (69). In contrast, Yannarell et al. found communities of cyanobacteria in a hypersaline Bahamian lagoon to remain altered in composition one month after a category-4 hurricane, although nitrogen fixation rates did recover to pre-disturbance levels. Recovery of function without recovery of community composition suggested significant levels of functional redundancy and illustrates how diversity-function relationships can be inferred through measuring response to disturbance. Experimental studies have built on observational data by using replicated artificial disturbances to resolve signal from noise for both community resistance and resilience

14 along with the relative importance of changes in physiochemical parameters caused by mixing (34). For example Weithoff et al. manipulated multiple plastic cylindrical mesocosms (~1 m in diameter and 1.5-15 m in depth) to show that the biomass and composition of phytoplankton and bacteria in eutrophic (i.e., high levels of phosphorous) lakes are strongly affected by mixing through increased nutrient availability (159). Shade et al. adapted this mesocosm design to test for the specific abiotic drivers of mixing on bacterial community structure and found oxygen and nutrient concentrations to be important drivers of community change (133). Some clades showed distinct responses (i.e., changing relative abundances) to treatments, suggesting ecological distinction. However, the molecular fingerprinting methodology employed (ARISA) prevented accurate inferences of phylogenetic relatedness among OTUs. Interestingly, the authors also found the bacterial community composition in all treatments in the experiment to have low resistance (i.e., changed due to treatment) but showed high resilience (i.e., returned back to control communities within 10 days), suggesting that freshwater bacterial communities are robust to mixing disturbance. While artificially partitioned, replicated experimental designs provide statistically sound measurements of ecosystem processes, the findings do not necessarily scale to whole ecosystems (49). Shade et al. tested bacterial community resistance and resilience to a controlled whole-ecosystem disturbance induced by mixing an entire lake during a period of stratification (134). Following bacterial community over time with 454 pyrosequencing of 16S rRNA amplicons revealed that the community did change significantly during the disturbance, but as observed in replicated experimental designs, the community was very resilient and recovered within 11 days. Response did vary considerably among taxonomic groups. In Chapter 2, I assess the coherence of responses among closely related taxa defined at various taxonomic resolutions to discern taxonomic boundaries that partitioned ecologically differentiated taxa. In summary, freshwater lakes provide a tractable experimental system for developing and testing community assembly theory and determining diversity-function relationships at the microbial level. In particular, mixing disturbances are a relevant system for discerning ecological distinctions among closely related taxa, which is necessary for identifying units of diversity that are relevant to understanding how

15 environmental perturbation can alter microbial community diversity, and in the process, potentially affect functions mediated by members of the community.

1.6 The genetic basis of ecological variation From the specific genetic complement of each community member to the network of competitive and mutualistic interactions occurring among populations, a hierarchy of evolutionary and ecological processes dictates how communities change in taxa composition and relative abundances over time and space. Therefore, multiple resolutions of biological organization need to be investigated in order to provide a holistic model of how communities change in both diversity and function. As discussed in the prior two sections of this chapter, community-level approaches provide a ‘top-down’ methodology for reverse engineering biological systems at the complex, but all-inclusive community scale. However, this methodology cannot directly discern how biological organization at the cellular level influences ecological interactions at the community level. Various ‘omics’ data such as genomics, transcriptomics, and proteomics are providing effective methodology for determining the genetic basis of adaptive evolution among microorganisms (15,142). These technological advances are leading to a new synthesis of ecology and genetics, with the former often revealing traits involved in adaptation, and the latter defining the genetic basis of such traits. This synthesis is facilitating the answering of a multitude of questions regarding how processes occurring at multiple levels of organization (e.g., genomic, population, and community/environment) result in distributions of microbial diversity across space and time (30,40,88). These questions include: What are the relative roles of geography and ecology in facilitating genomic differentiation? How do host-mobile element interactions create and maintain genetic or ecological diversity within a population? Do adaptive loci most often arise through polymorphism in conserved core genomic regions or are they acquired in highly variable regions? Answering these questions with robust inferences requires the comparison of many closely related taxa, because genetic signatures of ecological diversification is eroded by recombination, back mutations, genomic rearrangements and gene decay across larger evolutionary distances. Yet, even among very closely related taxa,

16 identifying genomic signatures of adaptive evolution is convoluted, because selection occurs in a background of recombination, genetic drift, and mutation. Combined, these processes can produce a mosaic genomic landscape that obscures signals of diversifying selection. In addition to evolutionary signatures inferred through direct sequence comparison of homologous genomic regions, gene gain and loss is often prevalent in microbial genome dynamics, producing an often expansive variable component in addition to the ‘core’ genome conserved among members in the population (82). Gene gain and loss in microorganisms can occur rapidly over short evolutionary distances, allowing for rapid adaptive evolution relative to polymorphisms in homologous loci (154). These large gene pools within variable genomes can provide a diverse repertoire of genetic components to be combined and utilized for adaptation (98). Both gene acquisition and loss have been implicated in adaptive evolution (23,81). Many studies have found rapid flux of genes seemingly associated with adaptive divergence among populations (29,76,83). Rapid adaptation through gain or loss of gene content has often been found amongst bacterial pathogens, with recently acquired adaptive genes usually confined to ‘pathogenicity islands’ (71). Pathogenicity islands are considered a subclass of DNA elements generally referred to as ‘genomic islands,’ which have been implicated in rapidly providing novel adaptive functions for symbioses, novel metabolic processes, antibiotic resistance, and viral resistance in addition to pathogenicity (71). Genomic islands comprise a number of different mobile elements including conjugative elements, conjugative transposons, integrated plasmids, and some prophages (35). Adaptive evolution through the rapid gain and loss of genomic islands can cause repeated niche invasions among closely related taxa (35). In addition, the rapid and repeatable switching of ecological niches can hinder the identification of irreversibly separated populations (i.e., species) that diverged through niche differentiation, because the divisive mechanism of niche differentiation can be rapidly nullified by convergence on the same niche (163). Gene duplications, transfers, and loss can convolute the evolutionary histories of genes and increases the difficulty of determining the history of gene content dynamics that lead to ecological differentiation among taxa. For instance, the sparse distribution of

17 a gene among members of a population may have resulted simply from a combination of duplication and loss, or alternatively gene transfer may have lead to the presence of the gene in certain taxa. Various computational methods have been used to infer the true evolutionary history of a homologous group of genes, which range from simplistic parsimony criteria of gene gain and loss to more recent advances in modeling gene loss, duplication, and transfer by reconciling the gene tree phylogeny and an inferred organismal or ‘species’ phylogeny (often inferred from a set of orthologous genes shared among all taxa) (5,19,22). Accurate determination of the processes leading to the observed distribution of genes among a population is necessary for determining the association between gene content dynamics and ecological differentiation. Variation in gene content is often implicated in adaptive evolution of microbial populations, which may be a biased finding due to the tractability of associating gene content with phenotypic variation relative to determining how sequence variation has lead to adaptation. Long-term evolution experiments have shown that the accumulation of nucleotide polymorphisms can facilitate adaptation (8,24). Sequence heterogeneity can be extensive within a microbial population in a natural environment, but the extent to which this variation relates to phenotypic variation that lead to adaptive divergence can be difficult to quantify for a number of reasons. For instance, adaptive divergence may occur through a large number of mutations, each with small fitness effects (7). This dispersed signal can create difficulty for significantly distinguishing loci under positive selection from neutral variation with many standard methods in population genetics. In general, direct tests of selection on specific loci often have low power to detect positive selection for a number of reasons. For instance, identifying an overrepresentation of non-synonymous versus synonymous substitutions via calculation of dN/dS can identify adaptive loci (167). However, loci containing both segments under positive selection (e.g., an active site) and segments under purifying selection will likely fail to show a strong signal of positive selection when dN/dS is calculated for the whole gene. An additional hindrance to detecting adaptive loci is the inability to discern signals of selection from other processes such as population demographics. For instance, Tajima’s D is a test of non-neutral evolution that can identify loci under selection by comparing observed allele frequencies versus what is expected in a neutrally evolving population;

18 however, rapid population expansion and contraction will also shift allele frequencies away from neutral expectations (145). A major factor in resolving the process of adaptive genomic differentiation is the strength of selection versus the frequency of recombination (135). In a clonal population, all loci in a genome are linked, and thereby selection that increases an allele’s frequency in the population will also increase the frequencies alleles present in the genome. Conversely, neutrally evolving alleles not in the genomic context of the adaptive allele will decline in frequency in the population and eventually be purged. This process of ‘genetic hitch-hiking’ will rapidly segregate (fix) neutrally evolving loci in a (nearly) clonal population evolving under divergent selection (112). Recombination unlinks loci by introducing neutral polymorphisms into the genome or can introduce the locus under selection into a neutrally evolving genomic background. Through this process, recombination allows neutral diversity to persist, which is antagonistic to the purging of neutral diversity by selection. This process also creates mosaic genomic landscapes with regions under different selective pressures. Therefore, the degree of fixation can vary across the genome, with regions of high fixation indicative of positive selection. Scans of fixation using the fixation index (FST) or other related measures can help narrow the potential regions harboring adaptive loci (99,158). Alternatively, such regions of high fixation could be the result of genetic drift following the establishment of physical or genetic barriers to gene flow (111). Therefore, careful assessment of genome-wide rates of recombination and experimental validation of candidate adaptive loci are required (40). Except for highly clonal microbial lineages (e.g., Bacillus cereus (55)), recombination has likely played a defining role in adaptive evolution and must be taken into account when detangling the genetic basis of ecological differentiation. Closely related bacteria occupying distinct niches can often display very different rates of recombination (55). Clades with specialized ecological strategies generally have lower recombination rates than generalist clades (33). Therefore, propensity to recombine in a lineage may dictate the ecological strategy employed; or conversely, the ecological strategy may in turn alter rates of recombination. Recombination can also facilitate adaption by combining moderately adaptive alleles into the same genome, which can

19 increase selective pressure for the combined adaptive advantage and thus decreases the chance that either allele will be lost due to genetic drift (33). Because the genetic basis for ecological divergence in microbial populations may include many aspects of genome dynamism (e.g., mutation, recombination; gene duplication and loss), studies often investigate many of these aspects to detangle potentially convoluted evolutionary signals of adaptive evolution. For example, a study of 26 Streptococcus genomes utilized methods to detect gene gain and loss, recombination, and selection to reveal that both recombination and selection contributed heavily to adaptive evolution of the clade (83). Some of the taxa showed large amounts of gene gain (>20% of the total gene content in the genome). Moreover, genome-wide scans of dN/dS showed selection pressure to be more pronounced in certain lineages. Using an alternative approach combining comparative proteomics and genomics, Denef et al. found two closely related clades of Leptospirillum to have different spatial distributions in the same ecosystem caused by preferences to colonizing biofilms either early or late in development (30). Assessments of recombination and selection suggested that both contributed to this niche specialization. Interestingly, recent recombination between the early and late colonizers appeared to increase ecological specialization among subclades. In summary, determining the genetic basis for ecological differentiation among taxa can provide insight into how evolutionary processes occurring at the cellular level affect ecological processes occurring at the community level. Multiple levels of genome dynamics (e.g., mutation, recombination, gene duplication and loss) may have played considerable roles in the adaptive evolution of microbial taxa. A full conceptualization of the association between genetic variation and ecological diversification has not been reached, but more studies of microbial populations will aid in determining the generality of associations between genetic and ecological variation. Importantly, focus has been biased toward pathogenic bacteria, while other environmentally, industrially, and clinically relevant clades of bacteria and archaea have been understudied (33,149).

20 1.7 Methanosarcina: a model system for determining the genetic basis of ecological variation in archaea Methanosarcina play a key role in the global carbon cycle through the production of methane from a number of different C1 and C2 compounds. Biogenic methane production, which largely originates from methanogenic archaea, holds implications for the environment as well as industrial applications, given that methane is a greenhouse gas

~25 times more potent than carbon dioxide (kg CH4/kg CO2) and is also a commercially viable biofuel (72,85). Approximately two-thirds of biogenic methane is derived from acetate, although the only methanogens that can metabolize this substrate belong to the Methanosarcinales order (44). In addition, Methanosarcina is the most metabolically versatile of all methanogens, which in part has also made it the most genetically tractable (78). Despite the importance and experimental tractability of Methanosarcina, a conceptualization of how genetic, evolutionary, and ecological processes interact to produce and maintain observed distributions of the genus among various environments is lacking. This conceptual understanding will contribute to the construction of predictive models for effectively detailing how the genus and potentially other methanogen clades will behave under future environmental scenarios in terms of both community composition and functional processes. A step towards this goal is determining the genetic basis for ecological differentiation among members of Methanosarcina in order to link genotypic and phenotypic variation to ecological processes. Many factors make Methanosarcina a technically feasible model system for determining the genetic basis of ecological differentiation. First, isolation of Methanosarcina can be readily accomplished using established cultivation methodology, which is a necessity to obtain complete genomes without the complications and difficulties accompanying metagenomic or single-cell sequencing. Second, the tractability to cultivate and genetically manipulate Methanosarcina strains has produced a wealth of knowledge on the physiology and biochemistry of multiple members in the genus and can be used to directly test hypotheses of relationships between genetic variation and physiology (78). Third, complete genomes of multiple Methanosarcina strains (M. barkeri strain Fusaro, M. acetivorans strain C2A, and M. mazei strain Gö1) are publically available, which have provided a great deal of information on genetic

21 variation within the genus (32,48,91). Fourth, curated metabolic models have been produced from all three published genomes, which can facilitate the linking of genetic and metabolic variation (12,13,51). Associating genetic variation with ecological differentiation through comparative means necessitates that taxa vary by at least one of these two factors. Members of Methanosarcina vary greatly across multiple levels of biological organization, from genetic to physiological to ecological. Isolates have been obtained from various environments throughout the world, which include freshwater, brackish water, marine sediment and kelp, soil, bioreactors, sands, and digestive tracts (44,147), which has resulted in large variations in optimal growth temperature (20°C for M. lacustris Z-7289 to 57°C for M. thermophila CHTI-55) and optimal salinity (<0.1 M for multiple strains to 0.6 M for M. semesiae MD1 & M. siciliae T4/M). Culture morphologies include cocci, pseudosarcinae, and sheathed rods (147). Gas vesicles and flagella are modes of motility for some members, while others are non-motile (44). Polar lipids can contain glucose, galactose, mannose, myo-inositol, ethanolamine, serine, and glycerol, depending upon the strain (147). While Methanosarcina is the most metabolically diverse methanogen genus, not all members can metabolize all substrates utilized by members of the genus. For instance, M. barkeri strain Fursaro can utilize all known methanogen substrates except formate, while M. soligelidi cannot use methyl amines, formate, or dimethyl sulfide, and M. horonobensis sp. nov. cannot use monomethylamine, H2:CO2, or formate (2,136,157).

Interestingly, M. acetivorans strain C2A cannot use H2:CO2 because it does not express the needed hydrogenases, although the genes are present in the genome, suggesting a recent loss of function when the strain specialized on specific substrates (53). Gain or loss of primary metabolic pathways may be a common mode of evolution in the genus. Comparative genomics of M. acetivorans C2A, M. barkeri Fusaro, and M. mazei Gö1 have revealed dynamic genomes that have diverged through multiple evolutionary processes. Genome size varies greatly, from ~4.1 Mb for M. mazei to ~5.9 Mb for M. acetivorans C2A (91). Maeder et al. found low levels of collinearity across all three genomes, which contributed to the hypothesis that i) the M. acetivorans C2A genome expanded due to uniformly distributed gene or operon insertions ii) localized inversions

22 and transpositions caused gene loss in M. barkeri Fursaro iii) the M. mazei Gö1 genome represents an ancestral state (91). This hypotheses was generally supported by a study that modeled archaeal genome expansion and contraction through gene gain and loss (27). Insertion sequences and transposons may contribute to adaptive evolution in many Methanosarcina strains. The genome of M. mazei Gö1 contains many insertion sequences and transposons, which may have facilitated horizontal gene transfer from certain clades of anaerobic bacteria (32). Interestingly, M. barkeri Fursaro upregulated the expression of transposases during exposure to oxygen, suggesting that at least some members of Methanosarcina may commonly use tranposases to facilitate adaptive evolution during environmental stress (168). Many genes of known functional importance are not ubiquitous among Methanosarcina members. For instance, M. barkeri Fursaro uniquely contains an operon predicted to encode the necessary genes for gas vesicle formation. M. barkeri Fursaro also uniquely possesses bacteria-like genes required for N-acetylmuramic acid synthesis and a P450-specific ferredoxin reductase, suggesting horizontal gene transfer from bacteria can be an important source of ecological innovation for members of Methanosarcina (91). Loss and acquisition of plasmids may be an additional source of adaptive loci, given that M. barkeri Fursaro has a 34.6 kb plasmid containing four genes associated with methanochondroitin synthesis and that plasmids are not found in all Methanosarcina spp. (32). CRISPRs (clustered regularly interspaced palindromic repeats) present a possible additional mode of rapid adaptive evolution in Methanosarcina. This mechanism of adaptive immunity against both viruses and plasmids has been shown to rapidly evolve in populations of archaea and maintain diversity in a population through an uneven distribution of immunity against members of the viral community (57). CRISPRs are present in all three published Methanosarcina genomes, with each containing one or two CRISPR subtypes as defined by Makarova et al. (93). Importantly, the same subtypes are not present in all three, indicating CRISPRs have been gained and loss over the course of Methanosarcina evolution (109). Therefore, CRISPRs are likely selected for; otherwise, a pattern of repeated loss would be expected. Nickel et al. used northern blot and differential RNA-seq analysis to show low expression levels of the two CRISPR loci in

23 Methanosarcina mazei except under high salt conditions (109). The authors hypothesized that that DNA damage during high salt stress mimicked viral infection, which upregulated expression of the CRISPR loci. A more complete understanding of how CRISPR evolution is associated with ecological differentiation in Methanosarcina is hindered by the lack of published genomes for members of the genus. Comparison of closely related isolates is required to answer such questions as: What is the rate of CRISPR spacer acquisition in natural populations? Is diversity maintained in Methanosarcina populations by variation in virus immunity and susceptibility among members of the population? A side benefit of continued study of Methanosarcina CRISPR diversity is the potential identification and characterization of Methanosarcina viruses, of which no isolates exist and very little is known. A lack of closely related Methanosarcina genomes has also obstructed the study of how selective pressures and rates of recombination vary across the genome. A population genomics approach using many very closely related isolates (i.e., differing by ~0-1% of nucleotide positions in the core genome) is needed to observe genomic signals of incipient adaptive evolution, which have lead to the large degree of genetic and physiological variation among members of Methanosarcina. In Chapter 4, this question is addressed through a comparative genomics approach applied to a very closely related population of Methanosarcina mazei obtained from the same estuary environment. In summary, Methanosarcina holds potential as a model system for determining evolutionary processes that create genomic variation impacting ecological processes, which in turn, feedback to the genomic level in a recurring process that can lead to irreversible genetic division of a microbial population. Genomic comparisons across only a few model strains in the genus suggest that many evolutionary processes such as changes in gene content have driven adaptive evolution in the genus. Comparative genomic analysis of closely very closely related taxa is needed to identify genomic signatures of incipient ecological differentiation, which will help to resolve the relationships between genetic variation and ecological processes.

24 1.8 Overview of dissertation Molecular surveys of microbial community diversity face the difficulty of organizing high-throughput sequencing data into meaningful groupings for investigations of how the processes of community assembly dictate spatiotemporal distributions of community diversity. Response to environmental disturbance provides a means to group microbial sequence data by a meaningful ecological characteristic and also provides insight into the potential prevalence of functional redundancy in a community. In Chapter 2, I used 16S rRNA amplicon 454 pyrosequencing data obtained from a time course of a whole-ecosystem disturbance in order to investigate the degree to which clades defined at various taxonomic resolutions show incoherent responses to the disturbance. The goal of this methodology was to discern taxonomic boundaries that partition ecologically differentiated taxa. Methanogen communities in the water columns of freshwater lakes have been considered small communities of potentially dormant or dead individuals, with little to no ecological relevance. However, recent studies have challenged this paradigm. Although water column methanogen communities may play significant roles in freshwater ecosystems, little is known of their spatial and temporal distributions of diversity and how such diversity relates to that found in the sediment. Furthermore, a basic understanding is lacking of how methanogen community diversity in the water columns of freshwater lakes is dictated by environmental selection, dispersal, and stochastic demographics. In Chapter 3, I utilized culture independent approaches to determine how methanogen community diversity in the water column and sediment of five lakes varies over space, time, and at different taxonomic scales. Moreover, I determined the processes of community assembly driving the spatiotemporal distribution of diversity. Methanosarcina plays a key role in the global carbon cycle. Yet, knowledge is lacking on how genomic-level variation in Methanosarcina can impact its ecological roles and environmental distributions. Determining the genetic basis of how Methanosarcina taxa have become ecologically differentiated will help link cellular systems-level models and community dynamics caused by environmental change. In Chapter 4, I apply population genomics approaches to the comparative analysis of 56

25 genomes of Methanosarcina mazei isolates obtained from the same geographic region to determine the genetic basis of ecological differentiation within this clade. While varied in content, the chapters of this dissertation share unifying themes. Foremost, each primarily utilized molecular methods to investigate processes that determine how microbial life is distributed across space and time. Because these processes can act at varying levels of biological organization, multiple taxonomic resolutions were employed both within and among studies. A central goal of this body of work was to resolve units of diversity that are relevant to quantifying how environmental change influences microbial community diversity and functional processes. In Chapter 5, I briefly summarize these unifying themes and layout future prospects for each project.

26 1.9 References

1. Allison SD, Martiny JBH. Colloquium Paper: Resistance, resilience, and redundancy in microbial communities. Proc Natl Acad Sci. 2008; 105: 11512– 11519.

2. Anderson KL, Apolinario EE, Sowers KR. Desiccation as a Long-Term Survival Mechanism for the Archaeon Methanosarcina barkeri. Appl Environ Microbiol. 2012; 78: 1473–1479.

3. Atwood KC, Schneider LK, Rryan FJ. Selective Mechanisms in Bacteria. Cold Spring Harb Symp Quant Biol. 1951; 16: 345–355.

4. Auguet J-C, Nomokonova N, Camarero L, Casamayor EO. Seasonal Changes of Freshwater Ammonia-Oxidizing Archaeal Assemblages and Nitrogen Species in Oligotrophic Alpine Lakes. Appl Environ Microbiol. 2011; 77: 1937–1945.

5. Bansal MS, Alm EJ, Kellis M. Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss. Bioinformatics. 2012; 28: i283– i291.

6. Barberán A, Bates ST, Casamayor EO, Fierer N. Using network analysis to explore co-occurrence patterns in soil microbial communities. ISME J. 2011; 6: 343–351.

7. Barrett RDH, Hoekstra HE. Molecular spandrels: tests of adaptation at the genetic level. Nat Rev Genet. 2011; 12: 767–780.

8. Barrick JE, Yu DS, Yoon SH, et al. Genome evolution and adaptation in a long- term experiment with Escherichia coli. Nature. 2009; 461: 1243–1247.

9. Bastviken D, Tranvik LJ, Downing JA, Crill PM, Enrich-Prast A. Freshwater Methane Emissions Offset the Continental Carbon Sink. Science. 2011; 331: 50– 50.

10. Bates ST, Berg-Lyons D, Caporaso JG, Walters WA, Knight R, Fierer N. Examining the global distribution of dominant archaeal populations in soil. ISME J. 2011; 5: 908–917.

11. Bell T, Newman JA, Silverman BW, Turner SL, Lilley AK. The contribution of species richness and composition to bacterial services. Nature. 2005; 436: 1157– 1160.

12. Benedict MN, Gonnerman MC, Metcalf WW, Price ND. Genome-scale metabolic reconstruction and hypothesis testing in the methanogenic archaeon Methanosarcina acetivorans C2A. J Bacteriol. 2012; 194: 855–865.

27 13. Bizukojc M, Dietz D, Sun J, Zeng A-P. Metabolic modelling of syntrophic-like growth of a 1,3-propanediol producer, Clostridium butyricum, and a methanogenic archeon, Methanosarcina mazei, under anaerobic conditions. Bioprocess Biosyst Eng. 2010; 33: 507–523.

14. Boucher D, Jardillier L, Debroas D. Succession of bacterial community composition over two consecutive years in two aquatic systems: a natural lake and a lake-reservoir. FEMS Microbiol Ecol. 2006; 55: 79–97.

15. Brockhurst MA, Colegrave N, Rozen DE. Next-generation sequencing as a tool to study microbial evolution. Mol Ecol. 2011; 20: 972–980.

16. Caruso T, Chan Y, Lacap DC, Lau MC, McKay CP, Pointing SB. Stochastic and deterministic processes interact in the assembly of desert microbial communities on a global scale. ISME J. 2011; 5: 1406–1413.

17. Chaffron S, Rehrauer H, Pernthaler J, Mering C von. A global network of coexisting microbes from environmental and whole-genome sequence data. Genome Res. 2010; 20: 947–959.

18. Chattopadhyay S, Feldgarden M, Weissman SJ, Dykhuizen DE, van Belle G, Sokurenko EV. Haplotype diversity in “source-sink” dynamics of Escherichia coli urovirulence. J Mol Evol. 2007; 64: 204–214.

19. Chen K, Durand D, Farach-Colton M. NOTUNG: a program for dating gene duplications and optimizing gene family trees. J Comput Biol. 2000; 7: 429–447.

20. Chu H, Fierer N, Lauber CL, Caporaso JG, Knight R, Grogan P. Soil bacterial diversity in the Arctic is not fundamentally different from that found in other biomes. Environ Microbiol. 2010; 12: 2998–3006.

21. Cohan FM, Perry EB. A systematics for discovering the fundamental units of bacterial diversity. Curr Biol. 2007; 17: 373–386.

22. Cohen O, Ashkenazy H, Belinky F, Huchon D, Pupko T. GLOOME: gain loss mapping engine. Bioinformatics. 2010; 26: 2914–2915.

23. Coleman ML, Chisholm SW. Ecosystem-specific selection pressures revealed through comparative population genomics. Proc Natl Acad Sci. 2010; 107: 18634– 18639.

24. Conrad TM, Lewis NE, Palsson BØ. Microbial laboratory evolution in the era of genome-scale science. Mol Syst Biol. 2011; 7: 509.

25. Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R. Bacterial Community Variation in Human Body Habitats Across Space and Time. Science. 2009; 326: 1694–1697.

28 26. Costello EK, Stagaman K, Dethlefsen L, Bohannan BJ, Relman DA. The application of ecological theory toward an understanding of the human microbiome. Science. 2012; 336: 1255–1262.

27. Csűrös M, Miklós I. Streamlining and Large Ancestral Genomes in Archaea Inferred with a Phylogenetic Birth-and-Death Model. Mol Biol Evol. 2009; 26: 2087–2095.

28. Declerck SAJ, Winter C, Shurin JB, Suttle CA, Matthews B. Effects of patch connectivity and heterogeneity on metacommunity structure of planktonic bacteria and viruses. ISME J. 2013; 7: 533–542.

29. Den Bakker H, Cummings C, Ferreira V, et al. Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss. BMC Genomics. 2010; 11: 688.

30. Denef VJ, Kalnejais LH, Mueller RS, et al. Proteogenomic basis for ecological divergence of closely related bacteria in natural acidophilic microbial communities. Proc Natl Acad Sci. 2010; 107: 2383–2390.

31. Dennehy JJ, Friedenberg NA, McBride RC, Holt RD, Turner PE. Experimental evidence that source genetic variation drives pathogen emergence. Proc R Soc B Biol Sci. 2010; 277: 3113–3121.

32. Deppenmeier U, Johann A, Hartsch T, et al. The genome of Methanosarcina mazei: evidence for lateral gene transfer between bacteria and archaea. J Mol Microbiol Biotechnol. 2002; 4: 453–461.

33. Didelot X, Maiden MC. Impact of recombination on bacterial evolution. Trends Microbiol. 2010; 18: 315–322.

34. Diehl S, Berger S, Ptacnik R, Wild A. Phytoplankton, light, and nutrients in a gradient of mixing depths: field experiments. Ecology. 2002; 83: 399–411.

35. Dobrindt U, Hochhut B, Hentschel U, Hacker J. Genomic islands in pathogenic and environmental microorganisms. Nat Rev Microbiol. 2004; 2: 414–424.

36. Dodds WK. Freshwater Ecology: Concepts and Environmental Applications. San Diego: Academic Press; 2002. 592 p.

37. Doolittle WF, Zhaxybayeva O. On the origin of prokaryotic species. Genome Res. 2009; 19: 744–756.

38. Doolittle WF. Population Genomics: How Bacterial Species Form and Why They Don’t Exist. Curr Biol. 2012; 22: R451–R453.

29 39. Dumbrell AJ, Nelson M, Helgason T, Dytham C, Fitter AH. Relative roles of niche and neutral processes in structuring a soil microbial community. ISME J. 2010; 4: 337–345.

40. Ellison CE, Hall C, Kowbel D, et al. Population genomics and local adaptation in wild isolates of a model microbial eukaryote. Proc Natl Acad Sci. 2011; 108: 2831–2836.

41. Falkowski PG, Fenchel T, DeLong EF. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science. 2008; 320: 1034–1039.

42. Faust K, Raes J. Microbial interactions: from networks to models. Nat Rev Microbiol. 2012; 10: 538–550.

43. Fenchel T, Finlay BJ. The Ubiquity of Small Species: Patterns of Local and Global Diversity. BioScience. 2004; 54: 777–784.

44. Ferry JG. Methanogenesis: ecology, physiology, biochemistry & genetics. New York: Chapman & Hall; 1993.

45. Fraser C, Alm EJ, Polz MF, Spratt BG, Hanage WP. The Bacterial Species Challenge: Making Sense of Genetic and Ecological Diversity. Science. 2009; 323: 741–746.

46. Fraser C, Hanage WP, Spratt BG. Recombination and the nature of bacterial speciation. Science. 2007; 315: 476–480.

47. Fuhrman JA, Steele JA, Hewson I, et al. A latitudinal diversity gradient in planktonic marine bacteria. Proc Natl Acad Sci. 2008; 105: 7774–7778.

48. Galagan JE, Nusbaum C, Roy A, et al. The Genome of M. acetivorans Reveals Extensive Metabolic and Physiological Diversity. Genome Res. 2002; 12: 532– 542.

49. Gardner RH. Scaling relations in experimental ecology. New York: Columbia University Press; 2001.

50. Ghiglione J-F, Galand PE, Pommier T, et al. Pole-to-pole biogeography of surface and deep marine bacterial communities. Proc Natl Acad Sci. 2012; 109: 17633– 17638.

51. Gonnerman MC, Benedict MN, Feist AM, Metcalf WW, Price ND. Genomically and biochemically accurate metabolic reconstruction of Methanosarcina barkeri Fusaro, iMG746. Biotechnol J. 2013; 8: 1070–1079.

52. Grime JP. Competitive Exclusion in Herbaceous Vegetation. Nature. 1973; 242: 344–347.

30 53. Guss AM, Kulkarni G, Metcalf WW. Differences in Hydrogenase Gene Expression between Methanosarcina acetivorans and Methanosarcina barkeri. J Bacteriol. 2009; 191: 2826–2833.

54. Guttman DS, Dykhuizen DE. Detecting selective sweeps in naturally occurring Escherichia coli. Genetics. 1994; 138: 993–1003.

55. Hanage WP, Fraser C, Spratt BG. The impact of homologous recombination on the generation of diversity in bacteria. J Theor Biol. 2006; 239: 210–219.

56. Hastings A. Disturbance, coexistence, history, and competition for space. Theor Popul Biol. 1980; 18: 363–373.

57. Held NL, Herrera A, Cadillo-Quiroz H, Whitaker RJ. CRISPR Associated Diversity within a Population of Sulfolobus islandicus. PLoS ONE. 2010; 5: e12988.

58. Held NL, Whitaker RJ. Viral biogeography revealed by signatures in Sulfolobus islandicus genomes. Environ Microbiol. 2009; 11: 457–466.

59. Helmus MR, Keller WB, Paterson MJ, Yan ND, Cannon CH, Rusak JA. Communities contain closely related species during ecosystem disturbance. Ecol Lett. 2010; 13: 162–174.

60. HilleRisLambers J, Adler PB, Harpole WS, Levine JM, Mayfield MM. Rethinking Community Assembly through the Lens of Coexistence Theory. Annu Rev Ecol Evol Syst. 2012; 43: 227–248.

61. Holland HD. The oxygenation of the atmosphere and oceans. Philos Trans R Soc B Biol Sci. 2006; 361: 903–915.

62. Hubbell SP. The Unified Neutral Theory of Biodiversity and Biogeography. New York: Princeton University Press; 2001. 448 p.

63. Huber JA, Welch DBM, Morrison HG, et al. Microbial Population Structures in the Deep Marine Biosphere. Science. 2007; 318: 97–100.

64. Huffaker CB. Experimental studies on predation: dispersion factors and predator- prey oscillations. Hilgardia. 1958; 27: 343–383.

65. Hugenholtz P, Goebel BM, Pace NR. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. J Bacteriol. 1998; 180: 4765– 4774.

66. Hunt DE, David LA, Gevers D, Preheim SP, Alm EJ, Polz MF. Resource partitioning and sympatric differentiation among closely related bacterioplankton. Sci N Y NY. 2008; 320: 1081–1085.

31 67. Jaspers E, Overmann J. Ecological Significance of Microdiversity: Identical 16S rRNA Gene Sequences Can Be Found in Bacteria with Highly Divergent Genomes and Ecophysiologies. Appl Env Microbiol. 2004; 70: 4831–4839.

68. Jezbera J, Jezberová J, Brandt U, Hahn MW. Ubiquity of Polynucleobacter necessarius subspecies asymbioticus results from ecological diversification. Environ Microbiol. 2011; 13: 922–931.

69. Jones SE, Chiu CY, Kratz TK, Wu JT, Shade A, McMahon KD. Typhoons initiate predictable change in aquatic bacterial communities. Limnol Oceanogr. 2008; 53: 1319–1326.

70. Jones SE, Newton RJ, McMahon KD. Evidence for structuring of bacterial community composition by organic carbon source in temperate lakes. Environ Microbiol. 2009; 11: 2463–2472.

71. Juhas M, Van Der Meer JR, Gaillard M, Harding RM, Hood DW, Crook DW. Genomic islands: tools of bacterial horizontal gene transfer and evolution. FEMS Microbiol Rev. 2009; 33: 376–393.

72. Karl DM, Beversdorf L, Björkman KM, Church MJ, Martinez A, DeLong EF. Aerobic production of methane in the sea. Nat Geosci. 2008; 1: 473–478.

73. Kent AD, Yannarell AC, Rusak JA, Triplett EW, Mcmahon KD. Synchrony in aquatic microbial community dynamics. ISME J. 2007; 1: 38–47.

74. Kernan MR, Battarbee RW, Moss B. Climate change impacts on freshwater ecosystems. Chichester, West Sussex, UK; Hoboken, NJ: Wiley-Blackwell; 2010.

75. Kerr B, Neuhauser C, Bohannan BJM, Dean AM. Local migration promotes competitive restraint in a host–pathogen “tragedy of the commons.”Nature. 2006; 442: 75–78.

76. Kettler GC, Martiny AC, Huang K, et al. Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet. 2007; 3: e231.

77. Kirillin G. Modeling the impact of global warming on water temperature and seasonal mixing regimes in small temperate lakes. Boreal Env Res. 2010; 15: 279– 293.

78. Kohler PRA, Metcalf WW. Genetic manipulation of Methanosarcina spp. Front Microbiol. 2012; 3: 259.

79. Konopka A. What is microbial community ecology? ISME J. 2009; 3: 1223–1230.

80. Konstantinidis KT, Tiedje JM. Genomic insights that advance the species definition for prokaryotes. Proc Natl Acad Sci U S A. 2005; 102: 2567–2572.

32 81. Koskiniemi S, Sun S, Berg OG, Andersson DI. Selection-Driven Gene Loss in Bacteria. PLoS Genet. 2012; 8: e1002787.

82. Lapierre P, Gogarten JP. Estimating the size of the bacterial pan-genome. Trends Genet. 2009; 25: 107–110.

83. Lefébure T, Stanhope MJ. Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol. 2007; 8: R71.

84. Leibold MA, McPeek MA. Coexistence of the niche and neutral perspectives in community ecology. Ecology. 2006; 87: 1399–1410.

85. Lelieveld J, Crutzen PJ, Dentener FJ. Changing concentration, lifetime and climate forcing of atmospheric methane. Tellus B. 1998; 50: 128–150.

86. Lennon JT, Jones SE. Microbial seed banks: the ecological and evolutionary implications of dormancy. Nat Rev Microbiol. 2011; 9: 119–130.

87. Ley RE, Peterson DA, Gordon JI. Ecological and evolutionary forces shaping microbial diversity in the human intestine. Cell. 2006; 124: 837–848.

88. Lieberman TD, Michel J-B, Aingaran M, et al. Parallel bacterial evolution within multiple patients identifies candidate pathogenicity genes. Nat Genet. 2011; 43: 1275–1280.

89. Lindström ES, Langenheder S. Local and regional factors influencing bacterial community assembly. Environ Microbiol Rep. 2012; 4: 1–9.

90. Madigan MT, Martinko JM, Stahl DA, Clark DP. Brock biology of microorganisms. 13th ed. San Francisco: Benjamin Cummings; 2010.

91. Maeder DL, Anderson I, Brettin TS, et al. The Methanosarcina barkeri Genome: Comparative Analysis with Methanosarcina acetivorans and Methanosarcina mazei Reveals Extensive Rearrangement within Methanosarcinal Genomes. J Bacteriol. 2006; 188: 7922–7931.

92. Majewski J. Sexual isolation in bacteria. FEMS Microbiol Lett. 2001; 199: 161– 169.

93. Makarova KS, Haft DH, Barrangou R, et al. Evolution and classification of the CRISPR–Cas systems. Nat Rev Microbiol. 2011; 9: 467–477.

94. Martiny AC, Tai APK, Veneziano D, Primeau F, Chisholm SW. Taxonomic resolution, ecotypes and the biogeography of Prochlorococcus. Environ Microbiol. 2008; 11: 823–832.

33 95. Martiny JBH, Eisen JA, Penn K, Allison SD, Horner-Devine MC. Drivers of bacterial β-diversity depend on spatial scale. Proc Natl Acad Sci. 2011; 108: 7850– 7854.

96. Mayr E. Animal species and evolution. Cambridge, Mass: Belknap Press; 1973.

97. McGill BJ, Maurer BA, Weiser MD. Empirical evaluation of neutral theory. Ecology. 2006; 87: 1411–1423.

98. Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R. The microbial pan- genome. Curr Opin Genet Dev. 2005; 15: 589–594.

99. Meirmans PG, Hedrick PW. Assessing population structure: FST and related measures. Mol Ecol Resour. 2011; 11: 5–18.

100. Melendrez MC, Lange RK, Cohan FM, Ward DM. Influence of Molecular Resolution on Sequence-Based Discovery of Ecological Diversity among Synechococcus Populations in an Alkaline Siliceous Hot Spring Microbial Mat. Appl Environ Microbiol. 2011; 77: 1359–1367.

101. Milferstedt K, Youngblut N, Whitaker R. Spatial structure and persistence of methanogen populations in humic bog lakes. ISME. 2010; 4: 764–776.

102. Morin PJ. Community ecology. Chichester, West Sussex; Hoboken, NJ: Wiley; 2011.

103. Morowitz MJ, Denef VJ, Costello EK, et al. Strain-resolved community genomic analysis of gut microbial colonization in a premature infant. Proc Natl Acad Sci U S A. 2011; 108: 1128–1133.

104. Nelson CE, Sadro S, Melack JM. Contrasting the influences of stream inputs and landscape position on bacterioplankton community structure and dissolved organic matter composition in high-elevation lake chains. Limnol Oceanogr. 2009; 54: 1292.

105. Nelson KE, Clayton RA, Gill SR, et al. Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature. 1999; 399: 323–329.

106. Nemergut DR, Costello EK, Hamady M, et al. Global patterns in the biogeography of bacterial taxa. Environ Microbiol. 2011; 13: 135–144.

107. Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S. A Guide to the Natural History of Freshwater Lake Bacteria. Microbiol Mol Biol Rev. 2011; 75: 14–49.

108. Newton RJ, Jones SE, Helmus MR, McMahon KD. Phylogenetic Ecology of the Freshwater acI Lineage. Appl Environ Microbiol. 2007; 73: 7169– 7176.

34 109. Nickel L, Weidenbach K, Jäger D, et al. Two CRISPR-Cas systems in Methanosarcina mazei strain Gö1 display common processing features despite belonging to different types I and III. RNA Biol. 2013; 10: 779–791.

110. Nielsen UN, Ayres E, Wall DH, Bardgett RD. Soil biodiversity and carbon cycling: a review and synthesis of studies examining diversity–function relationships. Eur J Soil Sci. 2011; 62: 105–116.

111. Nosil P, Feder JL. Genomic divergence during speciation: causes and consequences. Philos Trans R Soc B Biol Sci. 2012; 367: 332–342.

112. Nosil P, Funk DJ, Ortiz-Barrientos D. Divergent selection and heterogeneous genomic divergence. Mol Ecol. 2009; 18: 375–402.

113. Olsen GJ, Lane DJ, Giovannoni SJ, Pace NR, Stahl DA. Microbial ecology and evolution: a ribosomal RNA approach. Annu Rev Microbiol. 1986; 40: 337–365.

114. Ovreas L, Forney L, Daae FL, Torsvik V. Distribution of bacterioplankton in meromictic Lake Saelenvannet, as determined by denaturing gradient gel electrophoresis of PCR-amplified gene fragments coding for 16S rRNA. Appl Environ Microbiol. 1997; 63: 3367.

115. Pace NR. Mapping the Tree of Life: Progress and Prospects. Microbiol Mol Biol Rev. 2009; 73: 565–576.

116. Papke RT, Ramsing NB, Bateson MM, Ward DM. Geographical isolation in hot spring cyanobacteria. Environ Microbiol. 2003; 5: 650–659.

117. Pernthaler J, Glöckner F-O, Unterholzner S, Alfreider A, Psenner R, Amann R. Seasonal Community and Population Dynamics of Pelagic Bacteria and Archaea in a High Mountain Lake. Appl Environ Microbiol. 1998; 64: 4299–4306.

118. Perron GG, Gonzalez A, Buckling A. Source–sink dynamics shape the evolution of antibiotic resistance and its pleiotropic fitness cost. Proc R Soc B Biol Sci. 2007; 274: 2351–2356.

119. Peterson G, Allen CR, Holling CS. Ecological resilience, biodiversity, and scale. Ecosystems. 1998; 1: 6–18.

120. Polz MF, Alm EJ, Hanage WP. Horizontal gene transfer and the evolution of bacterial and archaeal population structure. Trends Genet. 2013; 29: 170–175.

121. Qin J, Li R, Raes J, et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010; 464: 59–65.

122. Quince C, Curtis TP, Sloan WT. The rational exploration of microbial diversity. ISME J. 2008; 2: 997–1006.

35 123. Ramette A, Tiedje JM. Multiscale responses of microbial life to spatial distance and environmental heterogeneity in a patchy ecosystem. Proc Natl Acad Sci. 2007; 104: 2761–2766.

124. Reed HE, Martiny JBH. Testing the functional significance of microbial composition in natural communities. FEMS Microbiol Ecol. 2007; 62: 161–170.

125. Reno ML, Held NL, Fields CJ, Burke PV, Whitaker RJ. Biogeography of the Sulfolobus islandicus pan-genome. Proc Natl Acad Sci U S A. 2009; 106: 8605– 8610.

126. Rosselló-Mora R, Amann R. The species concept for prokaryotes. FEMS Microbiol Rev. 2001; 25: 39–67.

127. Ruan Q, Dutta D, Schwalbach MS, Steele JA, Fuhrman JA, Sun F. Local similarity analysis reveals unique associations among marine bacterioplankton species and environmental factors. Bioinforma Oxf Engl. 2006; 22: 2532–2538.

128. Selig U, Hübener T, Heerkloss R, Schubert H. Vertical gradient of nutrients in two dimictic lakes – influence of phototrophic sulfur bacteria on nutrient balance. Aquat Sci. 2004; 66: 247–256.

129. Shade A, Chiu C-Y, McMahon KD. Differential bacterial dynamics promote emergent community robustness to lake mixing: an epilimnion to hypolimnion transplant experiment. Environ Microbiol. 2010; 12: 455–466.

130. Shade A, Chiu C-Y, McMahon KD. Seasonal and Episodic Lake Mixing Stimulate Differential Planktonic Bacterial Dynamics. Microb Ecol. 2010; 59: 546–554.

131. Shade A, Jones SE, McMahon KD. The influence of habitat heterogeneity on freshwater bacterial community composition and dynamics. Environ Microbiol. 2008; 10: 1057–1067.

132. Shade A, Peter H, Allison SD, et al. Fundamentals of microbial community resistance and resilience. Front Aquat Microbiol. 2012; 3: 417.

133. Shade A, Read JS, Welkie DG, Kratz TK, Wu CH, McMahon KD. Resistance, resilience and recovery: aquatic bacterial dynamics after water column disturbance. Environ Microbiol. 2011; 13: 2752–2767.

134. Shade A, Read JS, Youngblut ND, et al. Lake microbial communities are resilient after a whole-ecosystem disturbance. ISME J. 2012; 6: 2153–2167.

135. Shapiro BJ, David LA, Friedman J, Alm EJ. Looking for Darwin’s footprints in the microbial world. Trends Microbiol. 2009; 17: 196–204.

36 136. Shimizu S, Upadhye R, Ishijima Y, Naganuma T. Methanosarcina horonobensis sp. nov., a methanogenic archaeon isolated from a deep subsurface Miocene formation. Int J Syst Evol Microbiol. 2011; 61: 2503–2507.

137. Sigee DC. Freshwater microbiology biodiversity and dynamic interactions of microorganisms in the aquatic environment. Chichester: Wiley; 2005. 544 p.

138. Snyder JC, Wiedenheft B, Lavin M, et al. Virus movement maintains local virus population diversity. Proc Natl Acad Sci. 2007; 104: 19102–19107.

139. Sokurenko EV, Gomulkiewicz R, Dykhuizen DE. Source–sink dynamics of virulence evolution. Nat Rev Microbiol. 2006; 4: 548–555.

140. Stackebrandt E, Goebel BM. Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. Int J Syst Bacteriol. 1994; 44: 846–849.

141. Staley JT, Konopka A. Measurement of in Situ Activities of Nonphotosynthetic Microorganisms in Aquatic and Terrestrial Habitats. Annu Rev Microbiol. 1985; 39: 321–346.

142. Stapley J, Reger J, Feulner PGD, et al. Adaptation genomics: the next generation. Trends Ecol Evol. 2010; 25: 705–712.

143. Strickland MS, Lauber C, Fierer N, Bradford MA. Testing the functional significance of microbial community composition. Ecology. 2009; 90: 441–451.

144. Sun Y, Cai Y, Mai V, et al. Advanced computational algorithms for microbial community analysis using massive 16S rRNA sequence data. Nucleic Acids Res. 2010; 38: e205–e205.

145. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989; 123: 585–595.

146. Tiedje JM, Asuming-Brempong S, Nüsslein K, Marsh TL, Flynn SJ. Opening the black box of soil microbial diversity. Appl Soil Ecol. 1999; 13: 109–122.

147. Timmis KN, Mcgenity TJ, Meer JR, Lorenzo V. Handbook of hydrocarbon and lipid microbiology. Berlin: Springer; 2010.

148. Tiunov AV, Scheu S. Facilitative interactions rather than resource partitioning drive diversity-functioning relationships in laboratory fungal communities. Ecol Lett. 2005; 8: 618–625.

149. Toft C, Andersson SGE. Evolutionary microbial genomics: insights into bacterial host adaptation. Nat Rev Genet. 2010; 11: 465–475.

37 150. Treseder KK, Balser TC, Bradford MA, et al. Integrating microbial ecology into ecosystem models: challenges and priorities. Biogeochemistry. 2012; 109: 7–18.

151. Turnbaugh PJ, Hamady M, Yatsunenko T, et al. A core gut microbiome in obese and lean twins. Nature. 2009; 457: 480–484.

152. Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature. 2006; 444: 1027–1031.

153. Van der Gucht K, Cottenie K, Muylaert K, et al. The power of species sorting: Local factors drive bacterial community composition over a wide range of spatial scales. Proc Natl Acad Sci. 2007; 104: 20404–20409.

154. Van Passel MW, Marri PR, Ochman H. The emergence and fate of horizontally acquired genes in Escherichia coli. PLoS Comput Biol. 2008; 4: e1000059.

155. Vos M, Didelot X. A comparison of homologous recombination rates in bacteria and archaea. ISME J. 2009; 3: 199–208.

156. Vos M. Why do bacteria engage in homologous recombination? Trends Microbiol. 2009; 17: 226–232.

157. Wagner D, Schirmack J, Ganzert L, Morozova D, Mangelsdorf K. Methanosarcina soligelidi sp. nov., a desiccation- and freeze-thaw-resistant methanogenic archaeon from a Siberian permafrost-affected soil. Int J Syst Evol Microbiol. 2013; 63: 2986–2991.

158. Weir BS, Cockerham CC. Estimating F-Statistics for the Analysis of Population Structure. Evolution. 1984; 38: 1–14.

159. Weithoff G, Lorke A, Walz N. Effects of water-column mixing on bacteria, phytoplankton, and rotifers under different levels of herbivory in a shallow eutrophic lake. Oecologia. 2000; 125: 91–100.

160. Wetzel RG. Limnology: lake and river ecosystems. 3rd ed. San Diego: Academic Press; 2001. 1006 p.

161. Whitaker RJ, Grogan DW, Taylor JW. Geographic Barriers Isolate Endemic Populations of Hyperthermophilic Archaea. Science. 2003; 301: 976–978.

162. Whitman WB, Coleman DC, Wiebe WJ. Prokaryotes: the unseen majority. Proc Natl Acad Sci U S A. 1998; 95: 6578–6583.

163. Wiedenbeck J, Cohan FM. Origins of bacterial diversity through horizontal genetic transfer and adaptation to new ecological niches. FEMS Microbiol Rev. 2011; 35: 957–976.

38 164. Williamson CE, Dodds W, Kratz TK, Palmer MA. Lakes and streams as sentinels of environmental change in terrestrial and atmospheric processes. Front Ecol Environ. 2008; 6: 247–254.

165. Winter C, Matthews B, Suttle CA. Effects of environmental variation and spatial distance on Bacteria, Archaea and viruses in sub-polar and arctic waters. ISME J. 2013; 7: 1507–1518.

166. Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc Natl Acad Sci. 1977; 74: 5088–5090.

167. Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol Evol. 2000; 15: 496–503.

168. Zhang W, Culley DE, Nie L, Brockman FJ. DNA microarray analysis of anaerobic Methanosarcina barkeri reveals responses to heat shock and air exposure. J Ind Microbiol Biotechnol. 2006; 33: 784–790.

169. Zhou J, Deng Y, Luo F, He Z, Tu Q, Zhi X. Functional Molecular Ecological Networks. mBio. 2010; 1: e00169–10.

39 CHAPTER 2: LINEAGE-SPECIFIC RESPONSES OF MICROBIAL COMMUNITIES TO A WHOLE-ECOSYSTEM DISTURBANCE1,2

2.1 Abstract A great challenge facing microbial ecology is how to define ecologically relevant taxonomic units. To address this challenge, we investigated how changing the definition of operational taxonomic units (OTUs) influences the perception of ecological patterns in microbial communities as they respond to a dramatic environmental change. We used pyrosequenced tags of the bacterial V2 16S rRNA region, as well as clone libraries constructed from the cytochrome oxidase C gene ccoN to provide additional taxonomic resolution for the common freshwater genus Polynucleobacter. At the most highly resolved taxonomic scale, we show that distinct genotypes associated with the abundant Polynucleobacter lineages exhibit divergent spatial patterns and dramatic changes over time, while the, also abundant, Actinobacteria OTUs are highly coherent. This clearly demonstrates that different bacterial lineages demand different taxonomic definitions to capture ecological patterns. Based on the temporal distribution of highly resolved taxa in the hypolimnion, we demonstrate that change in the population structure of a single genotype can provide additional insight into the mechanisms of community-level responses. These results highlight the importance and feasibility of examining ecological change in microbial communities across taxonomic scales while also providing valuable insight into the ecological characteristics of ecologically coherent groups in this system.

1 This chapter appeared in its entirety in the journal: Applied and Environmental Microbiology. This article is reprinted with the permission of the publisher and is available from http://www.ncbi.nlm.nih.gov/pubmed with the PMID: 23064335. 2 Nicholas D. Youngblut and Dr. Rachel J. Whitaker conceived the experiments, analyzed the data, and wrote the paper. Nicholas D. Youngblut created the figures and tables. Jordan S. Read, Dr. Ashley Shade, and Dr. Katherine D. McMahon provided geochemical and nucleotide sequence data and also contributed to the data analysis.

40 2.2 Introduction Essential measures of community change assess the relative abundances and distributions of species through space and time (5). This poses difficulty for microbial ecologists, since quantification of community change has been hindered by both methodological limitations of sampling the vast richness of microbial communities and the lack of a species definition that identifies distinctive and unique ecological units (8,9,31). The sampling challenge may soon be overcome by next generation sequencing of the conserved 16S rRNA gene (2,4,7). However, microbiologists are still faced with defining distinct units of diversity that are ecologically comparable to macrobial species. To define units of diversity, microbial ecologists rely on clustering of 16S rRNA sequences into operational taxonomic units (OTUs). Ecological studies of microbial communities from a diversity of environmental systems ranging from the human microbiome (7) to deep sea hydrothermal vents (18) typically classify OTUs using a single definition, most commonly where they share >97% sequence identity (21). However, it has become increasingly clear that using different taxonomic designations of OTUs can alter the perception of spatial or temporal community change and that important ecological dynamics are occurring among taxa distinguished at a higher resolution. For instance, changing 16S rRNA OTU designations from a wide variety of habitats uncovered different biogeographical patterns at different taxonomic scales (27). A single sequence similarity cutoff that is too low could falsely lump individuals that may respond differently to environmental stimuli, while a very high sequence similarity cutoff for producing OTUs could split taxa with false boundaries, resulting in the appearance of ecologically redundant taxa. The variation in ecology and evolutionary history of different lineages will produce ecological distinction at different taxonomic resolutions, which means that choosing a single OTU definition for any study will influence results in ways that cannot be anticipated a priori. Examining the ways in which microbial communities respond to environmental change at different taxonomic scales will provide insight into the level of divergence at which microbial taxa are ecologically distinct or redundant (24). In particular, a dramatic environmental change such as ecological disturbance provides a useful tool to examine microbial response to environmental change (1). Similar responses to disturbance

41 indicate ecological coherence and possible redundancy while divergent responses indicate distinctiveness among taxa. This approach may provide insight for yet- uncultivable microorganisms, especially those for which we have limited understanding of their ecological roles or metabolic capabilities. Freshwater lakes have become model systems for studying the affects of disturbance on microbial communities (20,26,40). It has been demonstrated that overturn, or mixing, of the stratified water column due to seasonal changes in temperature (26,38– 40) or sporadic disturbances due to storm activity (20) overturns the stratified water column and creates a dramatic disturbance for the microbial community. Water column mixing destroys the physical, chemical, and biological gradients present when the water column is stratified into two thermal layers (45). The microbial communities in freshwater lakes are complex and diverse but tend to be dominated by ubiquitous and relatively uncharacterized groups of Actinobacteria and the Betaproteobacteria genus Polynucleobacter. These two groups appear to occupy different niches in lakes (28), and exhibit sharply contrasting seasonal dynamics (30,47). Both groups are known to circumscribe distinct clades with unique distribution patterns (13,29). The differences in levels of microdiversity within these groups or in the ways that these groups respond to ecological disturbance has not been previously examined. Here, we focus on an experimental whole-ecosystem disturbance in which a stratified water column of a dystrophic temperate lake was mixed mechanically (35). In a previous study on this system, the 16S rRNA gene was used to profile bacterial community composition with OTU designations at 97% 16S identity and phylotyping (i.e., using a sequence dataset with a pre-defined for classifying sequences) (41). These community-level analyses revealed changes in richness, taxonomic composition, and relative abundance of taxa before, during, and after the mixing event. The communities were ultimately resilient, recovering within twenty days. Here, we asked how altering the taxonomic resolution applied to the sequence dataset uncovers new ecological patterns of microbial response to this dramatic whole ecosystem disturbance focusing on comparisons between the two most prevalent lineages of Actinobacteria and Polynucleobacter.

42 2.3 Experimental procedures Artificial lake mixing From July 3 to July 10 2008, the highly stratified North Sparling Bog was mechanically mixed through buoyant forces (35). The changes in chemical conditions and aggregate measures of the bacterial community over the course of artificial lake mixing are described in detail elsewhere (41). In brief, the water column was de-stratified over the course of eight days, changing the hypolimnion temperature from 6°C to 20°C. This large, rapid increase in temperature is assumed to be unprecedented in the entire history of the lake, and the temperature remained elevated until the water column became isothermal in early October 2008. The epilimnion cooled from 24.2°C to 20.3°C and returned to normal lake conditions five days after the artificial mixing event on July 10. The dissolved oxygen (DO) concentration in the hypolimnion changed from below detection to near 3 mg/L, and the epilimnion decreased from 7.0 to 3.2 mg/L. DO concentrations within the hypolimnion returned to below detection 3 days after mixing stopped, while the DO in the epilimnion did not rebound until July 15. De-stratification altered concentrations of iron, sulfur, nitrogen, carbon dioxide, and methane within the water column – all of which rebounded within twenty days of mixing (41). Twelve samples were collected from two water layers, the epilimion (0 m) and hypolimnion (4 m) over six time points through the course of the disturbance experiment: before the mixing treatment began (“-9”), immediately after de-stratification was achieved (“Mix”), and then at four post-mixing time points at 3, 7, 11, and 20 days (41). Water samples were collected on a 0.2 µm Supor-200 nylon membrane filters (Pall Life Sciences, Ann Arbor, MI, USA) using a peristaltic pump that was rinsed between samples.

Amplification and sequencing DNA extraction from the filters was performed with the FastPrep Biogene kit (MP Biomedials, Solon, OH, USA). Pyrosequencing was performed using previously described methods (7,41). Briefly, the V2 region of the 16S rRNA gene was amplified by PCR using the primers 27F and 338R with attached multiplex, error-correcting barcodes

43 (16). Pyrosequencing was conducted using primer A on a 454 Life Science Genome Sequencer FLX instrument (Roche). In order to resolve the genetic diversity of the genus Polynucleobacter beyond 16S rRNA level, we used the protein-encoding gene cytochrome C oxidase, cbb3-type, subunit I (ccoN). DNA was extracted from filtered integrated water column samples using the MO BIO UltraClean Fecal DNA Kit (Mo Bio Laboratories Inc., Carlsbad, CA, USA). The primers Pnuc0453F (5’-CAGYCAATTTGCCATCGTTAC-3’) and Pnuc0453R (5’-GTCATGATGCCGTTGATC-3’) were designed using the genome sequence of Polynucleobacter necessarius subsp. necessarius STIR1 (NC_010531) and Polynucleobacter necessarius subsp. asymbioticus QLW-P1DMWA-1 (NC_009379). The gene fragment was amplified using a PCR with a final volume of 30 µl containing the final concentrations of 1.5 mM MgCl2, 0.2 mM dNTPs (each), 0.4 µM primers (each), and 0.05 U of Phusion High-Fidelity DNA polymerase F-530 (Finnzymes, MA, USA). Thermocycler conditions consisted of an initial denaturation for 3 minutes at 98°C, followed by 30 cycles of 30 seconds at 98°C, 25 seconds at 58.1°C, and 25 seconds at 72°C, with a final extension of 10 minutes at 72°C. PCR products were cloned using the Zero Blunt TOPO PCR for Sequencing Kit following the standard kit protocol (Invitrogen, Carlsbad, CA, USA). All clones were submitted to the WM Keck Center for Comparative and Functional Genomics at the University of Illinois for Sanger sequencing of both ends of the vector insert.

Sequence analysis Pyrosequencing reads were quality controlled based on quality scores, sequence length, primer mismatches and length of homopolymers using QIIME v1.1.0 (3). The filtered sequence dataset was aligned with a Needleman-Wunsch pairwise alignment against a reconstruction of the SILVA SEED database as implemented in the software Mothur v1.15.0 (36). A random subsample of 1217 total reads for each sample was selected for further analysis in order to equalize the number of quality reads in each sample. No chimeras were found using the chimera.slayer command in Mothur v1.19.0 with the SILVA Gold reference dataset (12). We identified <0.5% of the 16S rRNA dataset to be potential chimeras with Perseus (as implemented in Mothur), a database-

44 independent method to detect chimeras (33). Both the ccoN and 16S rRNA datasets were hierarchically clustered in Mothur using the average-neighbor clustering algorithm. Sequence identity cutoffs for defining OTUs were chosen by comparing the diversity among all samples using the Bray-Curtis index (i.e., temporal and spatial beta diversity), and then comparing how these Bray-Curtis values changed based on the cutoff used, with the largest changes between cutoffs chosen as the taxonomic levels for our analyses (Figure A.1, and Table A.1). Only OTUs with ≥50 reads in the dataset were used for all analyses (except number of OTUs in Figure 2.4) in order to reduce the number of zeros (i.e., absences) in the dataset, which can cause false correlations. This cutoff was chosen as a compromise between the number of absences in the dataset and the number of OTUs (quantified at the 99.5% cutoff) removed from the dataset (Figure A.2). The number of OTUs assigned to each taxonomic level is listed in (Table A.2). 16S rRNA reads were classified with the classify.seqs command in Mothur, which utilized a naïve Bayesian classifier based on the reconstructed SILVA SEED database with SILVA classifications as a training dataset for classifying sequences from the phylum to genus level. OTUs were assigned a taxonomic classification where 95% of an OTU’s reads could be assigned to one classification. Otherwise, OTUs were labeled as ‘*’ if they contained a mixture of classifications or were not labeled if the majority of reads could not be classified with confidence (bootstrap of >50). A detailed comparison of overlap between each taxonomic classification and OTU shows that the groups most often corresponded (Figure A.3) although some were inconsistent (usually classifications were split into multiple hierarchical clusters). In addition, OTUs were classified using the new freshwater taxonomy (27). Figures based on this classification scheme are shown in Figures A.4 & A.5. ccoN sequences were assembled into contiguous fragments and manually checked for sequencing errors using Sequencher v4.7 (Gene Codes Corporation, Ann Arbor, MI, USA). The resulting 351 sequences had an average length of 788 nucleotides and were manually aligned with MacClade v4.08 (22). All 77 unique ccoN gene sequence types are available at GenBank under Accession numbers JX944304 to JX944380.

45 Accounting for possible PCR and sequencing errors We took care to ensure that sequence processing did not bias our results at the finest level of taxonomic resolution (99.5%). We found very few potential chimeras in the 16S rRNA dataset with with Chimera Slayer and Perseus (implemented in Mothur) (12,36). Furthermore, we did not perform any analyses that relied on rare divergent subtypes. We found 87 of 95 OTUs to be present in multiple samples, which suggested little to no signal of chimeras in the dataset if they occurred. A second concern is that errors introduced into the sequence by DNA polymerase that may be amplified depending upon when they occurred. The 99.5% cutoff allowed for ~1 SNP error since the median fragment length of the 16S rRNA gene dataset was 202nt (min: 166 bp, max: 227 bp) and the average-neighbor clustering algorithm employed has been shown to more accurately produce clusters that are reflective of genetic distance than other clustering algorithms. This algorithm reduced the number of singleton or doubleton taxa that varied from other OTUs by 1 SNP to <1% of OTUs (37). Thus, any erroneous OTUs should be aggregated with a ‘true’ OTU. Also, PCR errors should occur randomly among samples, but most OTUs were present in multiple samples, which suggests little to no influence of PCR error in their distribution. Finally, the majority of our analyses focused on change in abundance among taxa and relied on an abundance cutoff of ≥50 total reads in the dataset, which greatly reduced the prevalence of less abundant, and potentially erroneous reads. Two additional types of errors result from the sequencing process. The first occurs from errors, again by the DNA polymerase, but this time during the bead amplification. This is unlikely to influence our results again due to the clustering algorithm and abundance cutoffs used throughout our analysis. The second results from changes in abundance due to pseudoamplification of sequences during the bead amplification process (11). Sanger sequencing of a different phylogenetic marker (ccoN) confirmed that changes in dynamics of a single lineage did not result from pseudoamplification.

46 Statistics Patterns of change in abundance among taxa were compared using Spearman partial rank-order correlations while conditioning for depth (0 m or 4 m) or time (rank- order of time points). Significance values for correlations were adjusted using the false discovery rate (Q-value) to account for multiple hypothesis testing (43). We defined strong, significant correlations as |ρ| > 0.6, Q-value < 0.05 for comparing temporal and spatial changes of taxon abundance. Partial correlations were calculated using the Stats packages in R. Correlations in abundance among taxa were visualized with Cytoscape (42). Collector’s curves were produced based on the observed richness with the collect.single command in Mothur. The scripts used to compare the hierarchical clustering and phylotyping methods at each taxonomic scale (Figures A.3 & A.5) are available at http://www.life.illinois.edu/whitaker/whitaker_lab.html. Heat maps, bar plots, and line plots were created in R with the Heatplus, APE, and ggplot2 packages (32,34,46).

Selecting sequence identity cutoffs for defining OTUs To select sequence identity cutoffs for defining OTUs, we first calculated all pairwise Bray-Curtis values among all samples (six time points and two water depths) using OTUs defined at a particular sequence identity cutoff. This procedure was repeated at all possible sequence identity cutoffs. We compared the Bray-Curtis values obtained at each sequence identity cutoff (i.e., all Bray-Curtis values calculated using OTUs defined at one cutoff versus all Bray-Curtis values calculated using OTUs defined at another cutoff) by performing a Mantel test for each pairwise comparison of sequence identity cutoffs. The smallest Mantel values correspond to the largest change in beta diversity (i.e., Bray-Curtis values) when sliding the taxonomic scale. These pairwise comparisons of all sequence identity cutoffs were visualized using non-metric multidimensional scaling (NMDS) (Figure A.1). We found certain ranges of sequence identity cutoffs (e.g., 99%-93%) to cluster, suggesting that the largest changes in beta-diversity occurred at certain incremental increases in taxonomic resolution. The sequence identity cutoffs: 75%, 81%, 85%, 87%, 93%, and 99.5% were chosen because they produced the lowest

47 Mantel values when compared to their incrementally broader cutoff, which resulted in the ‘gaps’ between the clusters seen the in the NMDS plot (Table A.1).

2.4 Results Ecological patterns vary across taxonomic scales The relative abundance of taxa defined at increasingly stringent OTU definitions (from 75%- 99.5%) were calculated for twelve samples that differ in space (epilimnion and hypolimnion) and time (six different times) through the whole lake mixing experiment. Collector’s curves produced from OTUs defined at each cutoff (either all samples or each time point separately) showed complete or nearly complete sampling of the standing diversity except at the 99.5% percent cutoff (data not shown). Figure 2.1 shows a qualitative assessment of the change in relative abundance of taxa when they are binned at different taxonomic scales. Throughout the text we will refer to OTUs using numerical codes derived from the hierarchies on the right side of Figure 2.1. For example, OTU 3.3.3.80.189 was assigned to Moraxellaceae within the , while OTU 3.3 represents all reads classified as Gammaproteobacteria. As parental OTUs were split into daughter OTUs, new patterns emerged (Figure 2.1) in some lineages. For example, the sister taxa OTU 3.3.3.80.189 and 3.3.3.80.93 in the Gammaproteobacteria defined by 87% sequence divergence, showed different spatial distributions being found predominantly in the epilimnion and the hypolimnion, respectively. As shown in Figure 2.1, while most sister taxa show similar responses to mixing disturbance, differences occurred in both space and time and at all taxonomic levels. For example OTU 3.3 (Gammaproteobacteria) differed from OTU 3.5 (Betaproteobacteria) in their temporal change in abundance across the time course of the mixing experiment. The extent of difference in patterns between sister taxa were quantified using change in rank order of taxon abundances (number of reads) between samples (Figure 2.2, Figure A.6). Overall, a total of 34.9% of sister taxa displayed strong positive correlations (ρ ≥ 0.6, Q < 0.05), while 13.7% showed strong negative correlations (ρ ≥ - 0.6, Q < 0.05) when ranked for abundance in the twelve samples from six time points in two water layers. As shown in Figure 2.2, strong positive correlations and negative

48 correlations occurred at most taxonomic scales (75%, 85%, 93% and 99.5%). The 93% cutoff showed the greatest number of strong negative correlations among sister OTUs, particularly in the Betaproteobacteria (Figure 2.2, Table 2.1). The highest resolution of 99.5% cutoff showed the largest number of strong positive correlations among taxa when controlling for depth or time, which suggested similar ecological preferences among groups resolved at this scale (Figure 2.2). The majority of these strong positive correlations originated from the OTU 5 lineage, which was only composed of one abundant OTU at each cutoff until OTU 5;6;6;6;6 split at the 99.5% cutoff into 7 abundant and often temporally or spatially correlated OTUs (Figure A.6). In contrast, some taxa exhibited ecological distinction at the finest taxonomic scale. For example, the two most abundant OTUs in the dataset, OTUs 3.5.10.10.12.1733 and 3.5.10.10.12.1694, which are assigned to the PnecC subcluster within the Polynucleobacter genus (15,28), displayed a negative correlation when controlling for time, suggesting a preference for specific water layers (Figure 2.2, Table 2.1). These two Polynucleobacter OTUs are within the OTU 3 lineage, in which many of the negative correlations among sister taxa occurred, suggesting that ecological distinction is prevalent throughout the lineage’s hierarchy (Table 2.1).

Further resolving the distinction between Polynucleobacter genotypes We further investigated the differences between closely related taxa observed in the dominant Polynucleobacter-affiliated OTU 3.5.10.10.12. To do this, we used a heat map to view the change in relative abundance of reads within this OTU over the course of the mixing experiment and found a population dominated by two relatively highly abundant OTUs, with each segregated in different water layers (Figure 2.3A). Both OTUs were highly abundant in both layers during the artificial mixing of the water column but quickly rebounded to the relative abundances in each layer seen prior to mixing. These prevalent OTUs differed by one nucleotide out of 202 (~99.5% identical), providing an example of ecological differentiation occurring at a particularly fine scale. To test whether we could further resolve these two prevalent Polynucleobacter OTUs beyond the single nucleotide change in the 16S rRNA marker, we compared the rRNA patterns to those of a protein-encoding marker ccoN, which encodes a fragment of

49 cytochrome-C oxidase, cbb3-type, subunit I (ccoN) specifically found within the genus Polynucleobacter. We obtained a total of 351 partial gene sequences from the surface epilimnion population. The relative abundances of each unique sequence observed in the epilimnion of each sample were visualized with a heat map and compared to the 16S rRNA heat map (Figure 2.3B). The alternative phylogenetic marker ccoN resolved a similar pattern of succession as observed in Figure 3A, demonstrating robust patterns across these two different markers. There was a single dominant genotype in the epilimnion before mixing. As seen using the 16S rRNA marker, a new abundant genotype was introduced at mixing which we infer is linked to the prevalent 16S rRNA genotype from the hypolimnion. Following the mixing event, this second genotype decreased in frequency over time as the relative abundances of both types return to their pre-disturbance distribution. The ccoN gene was chosen as a marker for differentiating populations, and so it is unlikely that sequence variation between the genotypes contributed to their ecological differentiation. In agreement with this assumption, we found the ccoN fragment for the two dominant genotypes to differ by 7 amino acids (262 AA total), which were all confined to the middle of the gene where no functional domains are present according to the NCBI Conserved Domain Database (23).

Genotype specificity of response to lake mixing One of the most striking changes observed previously in the microbial community in response to mixing was a reduction in phylogenetic diversity and number of OTUs observed in the hypolimnion on the day of mixing (Figure 2.4 and (41)). Our analysis of highly resolved taxa (above) shows that this drop in richness co-occurs with a dramatic increase in a single Polynucleobacter-affiliated OTU (3.5.10.10.12.1733) that we identified previously as specific to the hypolimnion. This OTU increased by more than two fold (163 reads to 381 16S rRNA reads) from Day -9 to Mix, which corresponded to the OTU 3 increasing from 49% of the total sample reads at Day -9 to 67% percent at Day Mix. The increase in number of sequences affiliated with this one resolved taxon dramatically decreased the observed phylogenetic diversity and richness of this community.

50 Highly resolved taxa show difference in response to disturbance in the hypolimnion Previous work had reported that the bacterial communities in this experiment were not resistant to disturbance but were resilient over the course of 20 days (41). We quantified how splitting OTUs into highly resolved taxa altered assessments of community resistance (from the pre-disturbance time point, Day -9) and resilience (return to pre-disturbance) in terms of their presence and absence profiles in both water layers through time (Table 2.2). This is a slightly different definition of resistance and resilience than used in the community-level analyses of (41), which assessed changes in relative abundances in addition to presence and absence at a single taxonomic scale. We tested for a bias toward an increased number of absences for OTUs with lower total abundances, which may influence our measurements of resistance and resilience, especially when subdividing OTUs at increasingly fine taxonomic resolutions. We found a weak correlation between OTU abundance and number of absences (Spearman, ρ=- 0.42, P=0.02), indicating that our measurements of resistance and resilience were not substantially influenced by such a bias. In addition, the number of absences decreased minimally when increasing our OTU abundance cutoff beyond ≥50 reads, suggesting that we have mitigated most spurious absences caused by low OTU abundance (Figure A.2). Using this definition of resistance and resilience, the majority of taxa were resistant to mixing disturbance when binned at the 75%, 81%, 85% and 87% level (Pattern A), although the community profile of “not resistant but resilient” (Pattern B) was seen for a minority of individual taxa. One striking difference between the patterns of community response to mixing was identified in the hypolimnion where we observed that 42% of the 99.5%-resolved taxa displayed a pattern of “introduced at Mix” (Pattern E). These OTUs were not categorized as resistant because they were not observed before mixing, and also were not resilient because they did not return to undetectable levels after 20 days (Figure 2.1 – bold labels). All but three (out of 9) of these OTUs were detected in the epilimnion prior to mixing.

2.5 Discussion We have shown that assessing microbial community sequence diversity through time and space across different taxonomic scales uncovers novel patterns of response to

51 environmental change. In addition, we have shown that there are differences in the ecological coherence (i.e., similar patterns in space and time) at different taxonomic scales between abundant Actinobacteria and Polynucleobacter OTUs in this freshwater community (Figure 2.2, Table 2.1, Table A.6) in ways that support the ecological distinction between these two abundant and ubiquitous but uncharacterized groups of microorganisms. These results demonstrate, the importance of varying taxonomic resolution in culture-independent studies of microbial communities in order to identify ecologically distinct units that correspond to environmental change. When split at the 99.5% sequence identity cutoff, most OTUs demonstrated a coherent response to environmental change. In particular, OTUs such as 5.6.6.6.6 (Actinobacteria) resolved a large number of daughter OTUs (seven) with significant positive correlations in their stable abundance patterns through space and time (all except for OTU 5.6.6.6.6.780) (Figure A.6). This suggested that at least in response to this disturbance, these OTUs were highly redundant. Previous work has shown that the Actinobacteria have stable populations through resistance to desiccation, predation, and dramatic variations in organic carbon load; therefore, this disturbance may not present ecological parameters that differentiate these closely related lineages if they exist (10,30,44). Such high levels of coherence between finely resolved taxa have been seen in plant and animal datasets and were suggested to result from low competition relative to environmental variables (17). This apparent redundancy in ecological responses at the finest taxonomic resolution may indicate neutral divergences within most OTUs at this taxonomic level (6). Alternatively, closely related, non-redundant taxa may co-vary because they are interdependent or because each co-varies with independent environmental stimuli that change in the same way in response to disturbance. It is important to note that ecological redundancy among taxa observed in this study may not hold for other changes in environmental stimuli, and further work with different disturbances in a diversity of systems will help to understand the robustness of this pattern. In contrast to the majority of taxa, OTU 3.5.10.10.12 (assigned to the Polynucloebacter genus within the Proteobacteria phylum) had an overall abundance very similar to OTU 5.6.6.6.6 but contained only four daughter OTUs at the 99.5%

52 cutoff. Two of these OTUs were prevalent and were found in opposite thermal layers, indicating that in our experimental system, this OTU was comprised of two non- redundant taxa. This niche diversification may have resulted from strong competitive interactions that drive ecological differentiation within this group. Recently, a study found niche differentiation among Polynucleobacter PnecC OTUs that were >99% identical by 16S rRNA. This specialization occurred along gradients of pH, DOC, and other important parameters spanning a large number of freshwater habitats (19). In the same study, there were strains that preferred either oxic or anoxic layers of a lake water column, suggesting that Polynucleobacter taxa are commonly ecologically distinct in other lake ecosystems as well as in our humic lake. Again, it is important to note that the divergence among the Polynucleobacter OTUs was in response to this particular disturbance. Alternative scenarios of environmental change may cause divergence between other taxa, while potentially not producing the divergence between the Polynucleobacter strains observed in this study. The dynamics of highly resolved taxa can provide additional insight into the possible mechanisms of community change. Previous work found that there was a decrease in hypolimnion richness during the mixing (41), and attributed this decrease to a reduction in the abundances of strict anaerobic taxa when oxygen was introduced. Here, we additionally found that increase in a single Polynucleobacter lineage, OTU 3.5.10.19.12.1733, appeared to contribute to the richness decrease. Although we cannot quantify the extent to which the two mechanisms contributed to the measured decrease, the rapid change in abundance of this Polynucleobacter OTU during the mixing is consistent with the opportunistic lifestyle of Polynucleobacter lineages, which have been shown to be highly susceptible to specific changes in grazing, nutrient source and concentrations, and oxygen requirements (28). There is also evidence that Polynucleobacter lineages can be very dynamic over relatively short temporal windows. For example, in a similarly dystrophic lake the PnecC subcluster of Polynucleobacter changed from as low as ~5% to as high as ~60% of the total bacteria abundance in the span of days or weeks (14). We found some OTU-level exceptions to the community-level overall pattern of resilience when defining OTUs at the 99.5% sequence identity cutoff. Only at this

53 taxonomic resolution was there a large number (42%) of OTUs to be absent before mixing began, but were found continuously afterward (Table 2.2). Most of these OTUs were detected in the epilimnion prior to mixing, which suggests that these OTUs were introduced during mixing and then persisted in the lower water layer. Persistence of these OTUs may have been due to increased temperature or other environmental variables that were different pre- and post-mix in the hypolimnion but was not correlated with a change in oxygen concentration which quickly returned to premixing levels (41). Although several new patterns may be resolved at finer taxonomic resolutions, many studies do not assess dynamics of taxa at a fine scale because of perceived problems associated with sequencing error (e.g., reference (25)). We demonstrated coherence of patterns detected using 454-tag pyrosequencing of the V2 region of the 16S rRNA gene and a significantly more divergent marker (ccoN), suggesting that for Polynucleobacter, the 16S rRNA gene is sufficient for understanding dynamics at a more resolved taxonomic level. The same may be true for other genera and other protein- encoding markers, and should be explored for taxa of interest in other environments. 16S rRNA pyrosequencing provides the ability to deeply sample diverse microbial communities. Through analysis of change in relative abundance among taxa resolved at various taxonomic scales, these datasets can be used to identify ecologically relevant units of diversity in various environmental and experimental systems. Identifying such units will improve our understanding of how particular lineages and the entire community respond to environmental change. Our study clearly demonstrates that defining ecologically relevant taxonomic units can provide novel perspective to microbial dynamics that are otherwise unobserved in a community-level analysis. While closely related sequences are commonly assumed to be redundant evolutionary variants, we show that ecological changes occur at this scale. In conclusion, this work has provided a non-arbitrary method for defining ecologically-relevant taxonomic units informed by disturbance responses, has clearly demonstrated that different bacterial lineages may demand different taxonomic unit definitions, and finally, has showed that re-defining ecologically relevant taxonomic units brings novel perspective to microbial dynamics that are otherwise unobserved in a community-level analysis.

54

2.6 Acknowledgements We thank M. Dell’Aringa and B.K. Dalsing for help with clone library construction, B.L. Dalsing and M. Milferstedt for the ccoN primer design, R. Knight and N. Fierer for performing the 454-tag pyrosequencing, A. Kent and S. Paver for critical feedback and discussion of methods and results, the North Temperate Lakes Microbial Observatory summer 2008 field crew for help with the mixing experiment and sample collection, and the UW-Trout Lake Station for logistical support. Funding was provided by the North Temperate Lakes Microbial Observatory NSF grant No. MCB-0702653 and the North Temperate Lakes Long Term Ecological Research NSF grant No. DEB- 0822700. A. Shade is a Gordon and Betty Moore Foundation Fellow of the Life Sciences Research Foundation.

55 2.7 References 1. Allison SD, Martiny JBH. Colloquium Paper: Resistance, resilience, and redundancy in microbial communities. Proceedings of the National Academy of Sciences. 2008; 105: 11512–11519.

2. Bates ST, Berg-Lyons D, Caporaso JG, Walters WA, Knight R, Fierer N. Examining the global distribution of dominant archaeal populations in soil. ISME J. 2011; 5: 908–917.

3. Caporaso JG, Kuczynski J, Stombaugh J, et al. QIIME allows analysis of high- throughput community sequencing data. Nature methods. 2010; 7: 335–336.

4. Caporaso JG, Lauber CL, Walters WA, et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci USA. 2011; 108 Suppl 1: 4516–4522.

5. Chiarucci A, Bacaro G, Scheiner SM. Old and new challenges in using species diversity for assessing biodiversity. Philosophical Transactions of the Royal Society B: Biological Sciences. 2011; 366: 2426 –2437.

6. Cohan FM, Perry EB. A systematics for discovering the fundamental units of bacterial diversity. Current Biology. 2007; 17: 373–386.

7. Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R. Bacterial Community Variation in Human Body Habitats Across Space and Time. Science. 2009; 326: 1694–1697.

8. Curtis TP, Sloan WT. Prokaryotic diversity and its limits: microbial community structure in nature and implications for microbial ecology. Current Opinion in Microbiology. 2004; 7: 221–226.

9. Dobrindt U. Whole genome plasticity in pathogenic bacteria. Current Opinion in Microbiology. 2001; 4: 550–557.

10. Glockner FO, Zaichikov E, Belkova N, et al. Comparative 16S rRNA Analysis of Lake Bacterioplankton Reveals Globally Distributed Phylogenetic Clusters Including an Abundant Group of Actinobacteria. Appl Environ Microbiol. 2000; 66: 5053–5065.

11. Gomez-Alvarez V, Teal TK, Schmidt TM. Systematic artifacts in metagenomes from complex microbial communities. The ISME Journal. 2009; 3: 1314–1317.

12. Haas BJ, Gevers D, Earl AM, et al. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research. 2011; 21: 494 –504.

56 13. Hahn MW, Pöckl M, Wu QL. Low intraspecific diversity in a Polynucleobacter subcluster population numerically dominating bacterioplankton of a freshwater pond. Appl Environ Microbiol. 2005; 71: 4539–4547.

14. Hahn MW, Scheuerl T, Jezberová J, et al. The Passive Yet Successful Way of Planktonic Life: Genomic and Experimental Analysis of the Ecology of a Free- Living Polynucleobacter Population. PLoS ONE. 2012; 7: e32772.

15. Hahn MW. Isolation of strains belonging to the cosmopolitan Polynucleobacter necessarius cluster from freshwater habitats located in three climatic zones. Applied and Environmental Microbiology. 2003; 69: 5248–5254.

16. Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature methods. 2008; 5: 235–237.

17. Houlahan JE, Currie DJ, Cottenie K, et al. Compensatory dynamics are rare in natural ecological communities. Proceedings of the National Academy of Sciences. 2007; 104: 3273 –3277.

18. Huber JA, Mark Welch DB, Morrison HG, et al. Microbial population structures in the deep marine biosphere. Science (New York, NY). 2007; 318: 97–100.

19. Jezbera J, Jezberová J, Brandt U, Hahn MW. Ubiquity of Polynucleobacter necessarius subspecies asymbioticus results from ecological diversification. Environ Microbiol. 2011; 13: 922–931.

20. Jones SE, Chiu CY, Kratz TK, Wu JT, Shade A, McMahon KD. Typhoons initiate predictable change in aquatic bacterial communities. Limnology and Oceanography. 2008; 53: 1319–1326.

21. Konstantinidis KT, Ramette A, Tiedje JM. The bacterial species definition in the genomic era. Phil Trans R Soc B. 2006; 361: 1929–1940.

22. Maddison D, Maddison W. MacClade 4. manual. 2000; 1–492.

23. Marchler-Bauer A, Anderson JB, Cherukuri PF, et al. CDD: a Conserved Domain Database for protein classification. Nucl Acids Res. 2005; 33: D192–D196.

24. Martiny AC, Tai APK, Veneziano D, Primeau F, Chisholm SW. Taxonomic resolution, ecotypes and the biogeography of Prochlorococcus. Environmental microbiology. 2008; 11: 823–832.

25. Martiny JBH, Eisen JA, Penn K, Allison SD, Horner-Devine MC. Drivers of bacterial β-diversity depend on spatial scale. PNAS. 2011; 108: 7850–7854.

57 26. Nelson CE. Phenology of high-elevation pelagic bacteria: the roles of meteorologic variability, catchment inputs and thermal stratification in structuring communities. ISME J. 2008; 3: 13–30.

27. Nemergut DR, Costello EK, Hamady M, et al. Global patterns in the biogeography of bacterial taxa. Environmental Microbiology. 2011; 13: 135–144.

28. Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S. A Guide to the Natural History of Freshwater Lake Bacteria. Microbiology and Molecular Biology Reviews. 2011; 75: 14–49.

29. Newton RJ, Jones SE, Helmus MR, McMahon KD. Phylogenetic Ecology of the Freshwater Actinobacteria acI Lineage. Applied and Environmental Microbiology. 2007; 73: 7169–7176.

30. Newton RJ, Kent AD, Triplett EW, McMahon KD. Microbial community dynamics in a humic lake: differential persistence of common freshwater phylotypes. Environmental Microbiology. 2006; 8: 956–970.

31. Philippe H, Brinkmann H, Lavrov DV, et al. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 2011; 9: e1000602.

32. Ploner A. Heatplus: A heat map displaying covariates and coloring clusters [Internet]. 2011. Available from: http://CRAN.R-project.org/package=Heatplus.

33. Quince C, Lanzen A, Davenport RJ, Turnbaugh PJ. Removing noise from pyrosequenced amplicons. BMC Bioinformatics. 2011; 12: 38.

34. R Development Core Team. R: A Language and Environment for Statistical Computing [Internet]. Vienna, Austria: 2010. Available from: http://www.R- project.org.

35. Read JS, Shade A, Wu CH, Gorzalski A, McMahon KD. “Gradual Entrainment Lake Inverter” (GELI): A novel device for experimental lake mixing. Limniology Oceanography Methods. 2011; 9: 14–28.

36. Schloss PD, Westcott SL, Ryabin T, et al. Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Appl Environ Microbiol. 2009; 75: 7537– 7541.

37. Schloss PD, Westcott SL. Assessing and improving methods used in OTU-based approaches for 16S rRNA gene sequence analysis. Applied and Environmental Microbiology. 2011; 2810–2820.

38. Shade A, Chiu C-Y, McMahon KD. Differential bacterial dynamics promote emergent community robustness to lake mixing: an epilimnion to hypolimnion transplant experiment. Environmental Microbiology. 2010; 12: 455–466.

58 39. Shade A, Chiu C-Y, McMahon KD. Seasonal and Episodic Lake Mixing Stimulate Differential Planktonic Bacterial Dynamics. Microb Ecol. 2010; 59: 546–554.

40. Shade A, Kent AD, Jones SE, Newton RJ, Triplett EW, McMahon KD. Interannual Dynamics and Phenology of Bacterial Communities in a Eutrophic Lake. Limnology and Oceanography. 2007; 52: 487–494.

41. Shade A, Read JS, Youngblut ND, et al. Lake microbial communities are resilient after a whole-ecosystem disturbance. ISME J. 2012; 6: 2153–2167.

42. Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. 2003; 13: 2498–2504.

43. Strimmer K. fdrtool: Estimation and Control of (Local) False Discovery Rates [Internet]. 2009. Available from: http://CRAN.R-project.org/package=fdrtool.

44. Tarao M, Jezbera J, Hahn MW. Involvement of cell surface structures in size- independent grazing resistance of freshwater Actinobacteria. Applied and Environmental Microbiology. 2009; 75: 4720–4726.

45. Wetzel RG. Limnology: lake and river ecosystems. 3rd ed. San Diego: Academic Press; 2001. 1006 p.

46. Wickham H. ggplot2: elegant graphics for data analysis. Springer New York; 2009. 210 p.

47. Wu QL, Hahn MW. High predictability of the seasonal dynamics of a species-like Polynucleobacter population in a freshwater lake. Environmental Microbiology. 2006; 8: 1660–1666.

59 2.8 Tables and figures

75% (Phylum) - 81% (Class) - - 85% (Order) - - - 87% (Family) Epilimnion Hypolimnion - - - - 93% (Genus) 0 m 4 m - - - - - 99.5% (Genus) 2 (Proteobacteria) 3 (Proteobacteria) - 3 (Gammaproteobacteria) - - 3 () - - 80 (Methylococcales) - - - 93 (Methylococcaceae) - - - 189 (Moraxellaceae) - 5 (Betaproteobacteria) - - 5 (Methylophilales) - - - 5 (Methylophilaceae) - - - - 24 (Methylophilaceae) - - - - 517 (*) - - 10 (Burkholderiales) - - - 10 (Burkholderiaceae) - - - - 11 - - - - 12 (Polynucleobacter) - - - - - 1694 - - - - - 1733 - - - - - 1797 - - - - - 1899 - - - - 14 (Polynucleobacter) - - - 12 (Comamonadaceae) - - - - 15 (Curvibacter) - - - - - 96 (Comamonadaceae) - - - - - 104 (*) - - - - - 171 (Curvibacter) - - - - - 303 (Curvibacter) - - - - - 320 (Curvibacter) - - - - 64 (Rhodoferax) - - - 14 (Oxalobacteraceae) - - 27 (*) - 19 (Alphaproteobacteria) - - 20 (Rhizobiales) - - - 23 - - - - 32 - - - - - 1262 (Beigerinkia) - - - - - 2264 (*) - - - - - 2295 (*) - - 37 (Alphaproteobacteria) - - 75 (Sphingomonadales) - - 76 (Alphaproteobacteria) - - 197 (Rickettsiales) - 181 (Alphaproteobacteria) 4 (Actinobacteria) - 4 (Actinobacteria) - - 4 (Acidimicrobiales) - - - 4 (Acidimicrobiales) - - - - 4 (Iamia) - - - - 123 5 (Actinobacteria) - 6 (Actinobacteria) - - 6 (Actinomycetales) - - - 6 (Actinomycetales) - - - - 6 - - - - - 685 - - - - - 690 - - - - - 698 - - - - - 765 - - - - - 769 - - - - - 772 - - - - - 780 10 (Verrucomicrobia) - 11 (Opitutae) - - 12 (Puniceicoccales) - - - 13 (Puniceicoccaceae) - - - - 16 - - - - - 1047 - - - - - 1054 16 (Bacteroidetes) - 17 (Sphingobacteria) - - 18 (Sphingobacteriales) - - - 21 (Chitinophagaceae) - - - - 30 (Sediminibacterium) - - - - 302 (Sediminibacterium) - - - - 430 (Sediminibacterium) - 146 (Sphingobacteria) 18 (Acidobacteria) - 20 (Holophagae) - - 22 (Holophagales) - - - 25 (Holophagaceae) - - - - 34 (Geothrix) - - - - - 1494 - - - - - 1542 - - - - - 1548 19 (Bacteroidetes) 21 (Bacteroidetes) - 24 (Bacteroidetes) - - 138 - - - 157 - - - - 255 - - - - - 1156 - - - - - 1159 26 (TM7) 28 (Proteobacteria) 69 (Chlorobi) 3 7 3 7 -9 -9 11 20 11 20 Mix Mix

% relative abundance by OTU (across the row)

0% 20% 40% 60% 80% Figure 2.1

60 Figure 2.1 (cont.): Divergent ecological patterns between parent and daughter taxa. The heat map shows relative abundances (number of 16S rRNA reads) of each OTU defined at a 75%, 81%, 85%, 87%, 93%, or 99.5% sequence identity cutoff. Relative abundances are normalized by OTU to show relative change in abundance of each OTU over time and depth. OTUs were assigned a taxonomic classification if 95% of the reads in an OTU were classified identically or labeled as ‘*’ if mixed. OTUs were left unlabeled if the reads they contained could not be classified with confidence to the specified taxonomic resolution. Bold labels highlight OTUs defined at the 99.5% cutoff that show the “opportunistic starting at mix” pattern as shown in Table 2.1. A similar figure using the freshwater classifications from (28) is shown in Figure A.2.

61 Temporal Patter ns Spatial Patterns 0.6

0.4

0.2

0.0

0.2

0.4 (# strong correlations) / comparisons) 75% 85% 93% 75% 85% 93%

99.5% 99.5% Figure 2.2: Summarizing correlations of ecological preference among sister taxa. Correlations of change in abundance of sister OTUs when controlling for depth (temporal patterns) or time (spatial patterns) were summed at each cutoff and represented in the bar graph as the number of strong correlations among sister OTUs normalized by the total number of comparisons made among sister OTUs. Negative and positive correlations are shown in red or blue, respectively. The 81% and 87% cutoffs are not shown because no strong correlations were found. See Figure A.6 for network diagram displaying correlations among OTUs.

62 Epilimnion Hypolimnion A) (00 mm) (4 4.5 m) m % relative abundance by taxon (across the row) the (across taxon by abundance relative %

0.005 010203040506070 3 7 3 7 -9 -9 11 20 11 20 Mix Mix B) Epilimnion (0 m) 50

0.02 row) the (across taxon by abundance relative % 7 3* 12 20 010203040 -10 Mix Figure 2.3

63 Figure 2.3 (cont.): Assessing ecological differences among highly related genotypes of Polynucleobacter using two phylogenetic markers. Each heat map and dendrogram depicts the relative abundance of each de-replicated sequence (genotype) over time and the genetic similarity among genotypes using A) the V2 16S rRNA pyrosequencing data B) the protein-encoding phylogenetic marker (ccoN). Abundances are relative to the total number of 16S rRNA sequences in each sample. Genotypes represent A) unique 16S rRNA sequences that were classified as belonging to the Polynucleobacter genus B) unique ccoN sequences amplified with primers specific for the Polynucleobacter genus. The two dominant genotypes in the 0 m and 4 m depths are outlined in each heat map. ‘*’ and striped boxes signify that a ccoN clone library was not constructed for Day 3 at 0 m.

64 Hypolimnion (4 m)

22 Phylogenetic

20 Diversity

18

16

14 380

360 Number of OTUs

340 PD (Total)

320 Number of OTUs 3 (Proteobacteria) 300 3.5.10.10.12.1733 280

260

800 AbundanceProteobacteria

600

400

200

9 Mix 3 7 11 20 Days since 'Mix'

Figure 2.4: A bloom of one highly resolved OTU corresponds to a drop in community diversity. The top and middle plots show change in phylogenetic diversity and richness (number of OTUs) in the hypolimnion, with OTUs defined at the 99.5% sequence identity cutoff used for both analyses (Figure 2.3). The lower plot shows relative abundance (number of reads) of OTU 3.5.10.10.12.1733, the most abundant OTU defined at the 99.5% cutoff, and OTU 3, the most abundant OTU in the dataset.

65 Cutoff Pattern OTU Identifier Classification 85% Temporal 3;5;27 & 3;5;5 Betaproteobacteria 93% Temporal 3;5;10;10;11 & 3;5;10;10;14 Polynucleobacter 93% Temporal 16;17;18;21;30 & 16;17;18;21;430 Sediminibacterium 85% Spatial 3;5;27 & 3;5;5 Betaproteobacteria 93% Spatial 3;5;10;10;11 & 3;5;10;10;14 Polynucleobacter 93% Spatial 3;5;10;12;15 & 3;5;10;12;64 Comamonadaceae 93% Spatial 3;5;5;5;24 & 3;5;5;5;517 Methylophilaceae 93% Spatial 16;17;18;21;30 & 16;17;18;21;430 Sediminibacterium 99.5% Spatial 3;5;10;10;12;1694 & 3;5;10;10;12;1733 Polynucelobacter

Table 2.1: OTUs within each lineage that showed a strong negative correlation. The table lists the OTUs in each lineage (i.e., 75% cutoff OTUs) that show strong negative correlations (as summarized in Figure 2.2 – blue bars) See Figure 2.1 for the OTU classification methodology. See Figure A.6 for network diagram displaying correlations among OTUs.

66 Presence/Absence Pattern Key Pattern -9 Mix 3 7 11 20 A) resistant B) not resistant, but resilient C) not resistant, not resilient

D) transient E) opportunistic starting at Mix F) opportunistic after Mix G) sporadic*

Water Layer Level/Cutoff A B C D E F G 75% 67 11 11 0 0 0 11 81% 58 8 8 0 0 0 25 Epilmnion 85% 61 6 6 0 6 0 22 (0 m) 87% 52 5 5 0 14 0 24 93% 48 4 8 4 8 0 28 99.5% 69 3 0 3 7 0 17 75% 50 33 0 0 0 0 17 81% 53 33 0 0 0 0 13 Hypolimnion 85% 55 18 0 0 9 5 14 (4 m) 87% 52 16 0 0 8 4 20 93% 45 17 0 0 7 3 17 99.5% 22 17 0 0 42 0 19

Table 2.2: Patterns of resistance and resilience at different taxonomic resolutions. Response patterns to the lake mixing disturbance were defined by presence (black box in key) or absence (white box in key) before mixing, immediately after mixing, or the following post-Mix time points (see pattern key in table). The values are percentages of OTUs in the community that fall in each category. The “sporadic” pattern is defined as taxa that were only present in some of the later post-Mix time points (Days 3, 7, 11, & 20).

67 CHAPTER 3: DIFFERENTIATION BETWEEN SEDIMENT AND HYPOLIMNION METHANOGEN COMMUNITIES IN HUMIC LAKES3,4

3.1 Abstract The traditional view of carbon cycling within the pelagic zone of freshwater lakes has consisted of methane production within the anoxic sediment, followed by diffusive flux and ebullition through the water column. Methanogenic archaea have been shown to be present within the water columns of freshwater lakes; however, little is known about whether these methanogenic communities are distinct from those in the sediment or how these communities change over space and time. We used the methanogen-specific phylogenetic marker mcrA to perform a three-year study focusing on the community structure of methanogens within the sediment and anoxic hypolimnion water layer of five humic lakes in WI, USA. The hypolimnion and sediment communities were distinct in composition, richness, and phylogenetic diversity. Hypolimnion communities displayed a temporally stable biogeographic pattern among lakes, which was driven by both lake- specific environmental variables and barriers to dispersal. We conclude that the hypolimnion comprised communities of methanogens that are distinct from those in the sediment, differentiated among lakes, and likely have unique ecological roles and evolutionary trajectories in these anaerobic environments.

3 This chapter appeared in its entirety in the journal: Environmental Microbiology. This article is reprinted with permission of the publisher and is available from http://www.ncbi.nlm.nih.gov/pubmed with PMID: 24237594. 4 Nicholas D. Youngblut and Dr. Rachel J. Whitaker conceived the experiments and wrote the paper. Nicholas D. Youngblut, Dr. Rachel J. Whitaker and Mark Dell’Aringa analyzed the data. Nicholas D. Youngblut and Mark Dell’Aringa performed the experiments. Nicholas D. Youngblut created the figures and tables.

68 3.2 Introduction The biogenic production of methane in freshwater environments is an important component of the global carbon cycle. Approximately 25% of the terrestrial greenhouse gas sink is estimated to be offset by methane emissions from freshwater environments (3). Methanogenic archaea are obligate anaerobes that produce an estimated 1.4 x 1014 g of carbon in the form of methane each year in freshwater habitats and account for 10- 50% of total carbon mineralization in these environments (2,3). Methanogens are diverse in their substrate utilization and affinities, along with other ecological characteristics; however, methanogen community composition, abundance, and physiological variation is often not directly factored into models of biogenic methane efflux (39,41). The traditional view of freshwater methanogenesis in lakes dictates that significant biogenic production of methane only occurs in the sediments of the pelagic zone (53), although some evidence exists of methane production within the water column depths proximal to the sediment (58). Recent molecular and culture-based evidence of active methanogen communities throughout the water columns of different lakes has challenged this paradigm. First, (5) used RT-PCR of the methanogen-specific phylogenetic marker mcrA along with water column profiles of methane concentrations to support their hypothesis of metabolically active methanogens in the anoxic hypolimnion of a deep meromictic lake (5). These authors used mcrA sequence data to show that the broad community composition of methanogens in the water column differed from the sediment. Second, Grossert et al. provided multiple lines of evidence suggesting that methanogenesis within the water column of an oligotrophic lake was a large contributor to high concentrations of methane in aerobic portions of the water column (20). If methanogens do in fact stably inhabit water columns of freshwater lakes, how are these communities assembled? We suggest two possible hypotheses to explain the methanogen community structure in our focal lakes. First, the hypolimnion communities are sinks sustained by high migration from the sediment source community. This hypothesis predicts either no discernable segregation of diversity between sediment and hypolimnion samples or the hypolimnion appearing as a subset of the sediment if the hypolimnion community is continually maintained by randomized colonization from the

69 larger source sediment community. Second, the hypolimnion and sediment communities are distinct communities, and this distinction resulted from either neutral or deterministic processes. For instance, the neutral process of dispersal limitation would allow the hypolimnion and sediment communities to diverge due to endemic speciation and differences in local birth and death rates (22). In contrast, environmental selection specific to the hypolimnion would select for certain methanogen taxa in the hypolimnion, thus causing divergence from the sediment community. Substrate concentration, temperature, oxygen tolerance, affinity for suspended particles, capacity for transport on gas vesicles, or specific symbiosis with ciliates or other protists may have contributed to selecting specific methanogen communities in the water column (11,18). In addition to these possible intra-lake dynamics, distinctions among the water column communities of our focal lakes could result from one or both mechanisms of limited inter-lake dispersal and lake-specific selection. Milferstedt et al. used culture-independent techniques to assess the spatial diversity and temporal stability in the hypolimnia of five humic lakes (33). The authors observed partially unique and temporally stable methanogen communities in the water columns of each lake. For this study, we expanded upon this work by using culture- independent methods to compare the water column to the sediment from a set of five lakes over time and across taxonomic scales in order to test our hypotheses on the origins and maintenance of methanogen diversity in the water columns of freshwater lakes.

3.3 Experimental procedures Lake characteristics We focused on five dystrophic humic lakes located in Vilas County in Northern Wisconsin, USA. All five lakes are associated with the North Temperate Lakes Long- Term Ecological Research Network (NTL-LTER). Three lakes: Trout Bog (TB), North Sparkling Bog (NSB), and South Sparkling Bog (SSB) are located within ~4.5 km of each other, with NSB and SSB separated by ~90 m. The other two lakes Mary Lake (MA) and Rose Lake (RL) are separated by ~50 m and are ~30 km northwest of the other three lakes. Each lake contains darkly stained, acid water, and the shore vegetation consists predominantly of Sphagnum spp. and Vaccinium spp. The lakes vary in surface

70 area from 0.44 to 1.43 hectares (1.1 to 3.5 acres), in maximum depth from 4.5 to 21.5 meters, and in average pH from 4.3 to 6.5 (Milferstedt et al., 2010) (Table B.1). MA has a monomictic mixing regime, while the other four are dimictic.

Biological sample collection Samples new to this study were obtained on May 29-30, 2009 and May 30, 2010 (Table 3.1). Consistent with (33), all samples were collected at the deepest point of each lake. Sediment samples were collected with an Eckman dredge and contained ~6 in of surface sediment along with small amounts of sediment from a flocculent layer that extended ~1 m above the sediment surface. Sediment samples were homogenized and immediately flash frozen in 15 ml conical tubes. Integrated water samples were collected using the standard North Temperate Lake Microbial Observatory (NTL-MO) protocol. Briefly, samples were collected with connected 2 m PVC pipe segment with stop valves at 1 m intervals and combined in a clean carboy. Depth-discrete water samples (see Table 3.1 & Figure 3.1) were collected at 4, 7, 13, and 19 m depths from Mary Lake on July 8, 2008 using a Van Dorn and then transferred to clean Nalgene containers. All water samples were immediately returned to the lab for further processing. Cellular and particulate matter was collected with vacuum filtration on 0.2 µm Supor-200 nylon membrane from 200-250ml of sample volume (Pall Life Sciences, Ann Arbor, USA). The filters and sediment samples were temporarily stored at -20°C until transported on ice to storage at -80°C.

Water column profiling Dissolved oxygen and temperature were measured at 1 m intervals through the water column with a YSI 550 dissolved oxygen meter (Yellow Springs, OH) (Table B.1). For methane profiling, depth-discrete water samples were collected every 2 m using a Van Dorn. Within a carboy, approximately 2 L of sample water was equilibrated with 50 ml of headspace by shaking for 2 minutes. Headspace samples were stored in pre- evacuated 4 ml BD Vacutainers (BD Biosciences, San Jose, CA, USA) at room temperature and processed within 5 days. Methane was measured using a Hewlett Packard 5890 Series II gas chromatograph with flame ionization detection. Methane

71 profiles of MA and SSB were obtained in August 2011 to test for evidence of methane production within the hypolimnion (Figure B.1).

DNA extraction and clone library construction DNA extraction was performed using an Ultra Clean Fecal DNA kit (MO BIO Laboratories Inc., Carlsbad, CA, USA) with either one half of a frozen filter or 0.5 g of sediment. A fragment of the methyl-coenzyme M reductase (mcrA) gene was amplified using the primers mcrF and mcrR (29) in a PCR with a final volume of 30 µl containing the final concentrations of 1.5 mM MgCl2, 0.2 mM dNTPs (each), 0.4 µM primers (each), and 0.05 U of Phusion High-Fidelity DNA polymerase F-530 (Finnzymes, MA, USA). Thermocycler conditions consisted of an initial denaturation for 3 minutes at 98°C, followed by 30 cycles of 30 seconds at 98°C, 15 seconds at 59°C, and 15 seconds at 72°C, with a final extension of 10 minutes at 72°C. PCR products were cloned using the Zero Blunt TOPO PCR for Sequencing Kit following the standard kit protocol (Invitrogen, Carlsbad, CA, USA). All clones were submitted to the WM Keck Center for Comparative and Functional Genomics at the University of Illinois for Sanger sequencing of one end of the vector insert. All mcrA sequences have been submitted to GenBank under accession numbers KF194358 - KF195483.

Sequence analysis The chromatograms of all Sanger reads were manually inspected for sequencing errors and edited in Sequencher 4.7 (Gene Code Corporation, Ann Arbor, MI, USA). Reads were aligned manually in MacClade v4.08 (30). Pairwise sequence identity calculation, clustering of reads into alleles of 100% sequence identity, and calculation of richness (number of observations), the Chao1 index, the Shannon-evenness index, phylogenetic diversity (PD), pairwise genetic distance, Bray-Curtis values, and abundance-based Sorenson values were done using Mothur v1.28.0 (42). Maximum likelihood (ML) phylogenies (GTR+Γ model) were inferred with RAxML (47) and visualized using FigTree v1.3.1. Type strains used for phylogenetic reference were selected from BLASTn (v 2.2.27+; default parameters) hits produced by querying the GenBank nt database with all mcrA alleles (8). Arlecore v3.5.1.3 and Mothur v 1.28.0

72 were used for calculating all pairwise FST and weighted-Unifrac values between samples, respectively (15,42).

The standard effect size of mean phylogenetic distance (MPDSES) was calculated from the mcrA ML phylogeny and a table of allele counts per sample with the picante R package v1.3-0 (27). The null communities used for calculating MPDSES were constructed by randomizing i) allele abundances across samples, which maintained occurrence frequencies among samples (“Frequency” model) ii) allele abundances within samples, which maintained each sample’s richness (“Richness” model) iii) allele abundances across samples and within samples, while maintaining allele occurrence frequencies and richness (“Independent Swap” model) iv) taxa labels across the phylogeny (“Taxa Labels” model). AdaptML v1.0 (24) was run with the mcrA ML phylogeny and the following settings: init_hab_num=20, collapse_thresh=0.10 converge_thresh=0.001, rateopt=avg. We assessed the robustness of our AdaptML analysis by varying the initial number of habitats in the model and the collapse threshold parameter. AdaptML consistently inferred the same number of habitats from the provided mcrA phylogeny. In addition, we repeated the analysis multiple times using randomized layer assignments (i.e., sediment or water column) for each allele to determine whether AdaptML was discerning phylogenetic signal from noise. The randomizations lead to consistent inference of only one ancestral habitat, suggesting that AdaptML was accurately modeling the phylogenetic signal in the dataset.

Statistics All statistical tests not previously listed and plot creation were done with R (R Development Core Team). The Ecodist R package was used for the multiple regression on matrices (19). Prior to the analysis, redundant environmental variables (i.e., strongly co-varying parameters: ρ > 0.6; Table B.1) were grouped using the varclus function in Hmisc R package (21). ANOSIM and Mantel tests were performed with the Vegan package, which was also used for creating the PCoA plots and fitting environment variables (36). All other plots were created with using the ggplot2 R package (37,57).

73 3.4 Results Methane profiles In August of 2010, we profiled methane concentrations in the water columns of Mary Lake and South Sparkling Bog to test for methane maxima at intermediate intervals as seen in (20) (Figure B.1). In contrast to (20) we observed profiles indicating that most methane in the water column originated from the sediment at this time point.

Sequencing data We obtained matched samples from the hypolimnion and sediment of five lakes: Mary Lake (MA), Rose Lake (RL), Trout Bog (TB), North Sparkling Bog (NSB), and South Sparkling Bog (SSB) (Table 3.1) in May 2009. Throughout the manuscript, lake abbreviations appended with an ‘H’ or ‘S’ will designate samples were taken from the integrated hypolimnion or dredged sediment, respectively. These samples in addition to two additional sediment samples in May 2010 resulted in a total of 900 mcrA sequences from 12 samples. When combined with the Milferstedt et al. dataset (33), the merged dataset comprised 2132 mcrA sequences from 27 samples spanning October 2007 to May 2010 (Table 3.1). Of the 2132 total sequences, 571 were unique sequence types (alleles). The majority of alleles in the combined dataset were from the Methanomicrobiales order (Figure 3.3; Figure B.2) (18). Sampling of mcrA allelic diversity was more complete in the hypolimnion samples relative to the sediment samples (Figure B.3). In accordance with (33), mcrA was not detected by PCR in any epilimnion samples, although repeated attempts were made with samples collected from MA and RL in May 2010.

Differentiation between sediment and water column We first tested for discernable community differentiation among matched sediment and hypolimnion samples. Between matched samples, we found the overlap of unique alleles (defined by 100% sequence identity) to be very low, with a range of 2-10% of alleles shared between layers (Table 3.2). Weighing allele overlap by abundance with the commonly used Bray-Curtis index also showed highly differentiated communities (range of 0.75 to 0.96) (Table 3.2). Additionally, the communities of each layer still appeared highly differentiated when accounting for overlap of undetected alleles in either

74 layer by using the abundance-based Sorenson index (range of 0.59 to 0.93) (10). Beta diversity measures that incorporated genetic relatedness also showed significant differentiation among layers of each lake, with fixation index (FST) values ranging from 0.09 to 0.2 (P < 0.01 for all comparisons) and weighted-Unifrac values ranging from 0.22 to 0.54 (P < 0.001 for all comparisons). The distinction between the hypolimnion and sediment communities was particularly clear when assessing community differentiation along the water column of MA. Samples were collected at 4, 7, 13, 19 m in July 2008 (Table 3.1). Composition and allele abundances varied along the water column (Figure 3.1). This was most evident at the 4 m depth, which contained a highly abundant allele (21 of 58 reads; 36%) that was not detected in any of the other depths. The fact that this allele was also detected in four other integrated hypolimnion samples (MAH 5/1/08 & 7/1/08, RLH 5/30/09, SSBH 4/30/08) but never in any sediment samples further suggests specificity of this allele for the water column. Notably, this allele was seen in MAH 7/1/08, an integrated sample (2- 20m) taken 7 days prior, suggesting that the apparent absence of this genotype in the 7, 13, or 19 m depths was not due to under sampling of diversity in those samples. In addition to comparisons of allele composition between samples, we compared matched hypolimnion and sediment samples using measures of richness, evenness, raw genetic distance, and phylogenetic relatedness. We found the hypolimnion of all lakes except TB to possess more limited richness (number of alleles and Chao1 estimator), greater unevenness (Shannon evenness index), and less genetic (Dxy) and phylogenetic diversity (PD) (Figure 3.2) (16,35). In contrast to the other lakes, the sediment and hypolimnion samples from TB appeared nearly identical in alpha diversity.

Environmental selection contributes to hypolimnion specificity We mapped the abundances of the more prevalent alleles (≥ 3 total sequences) observed in the matched sediment and hypolimnion samples onto a phylogeny inferred from the dataset (Figure 3.3). Alleles located in the Methanosarcinales order were predominantly detected in sediment samples. Within the Methanomicrobiales order, we observed some phylogenetic clustering of alleles predominantly found in the hypolimnion or sediment (Figure 3.3).

75 Selection for certain functionally coherent groups in a community will produce a community containing taxa that are more closely related on average (i.e., phylogenetic clustering) than null communities created by random assembly of alleles from all sampled communities (i.e., the regional taxa pool) (49). We compared each community to four different null communities where various aspects of the assembled community were held constant (see 3.3 Experiment Procedures). This analysis has previously been used to provide evidence of environmental selection in microbial mat communities (1). In the study, the authors observed negative effect sizes of the statistical test (Z values) to range from ca. -2 to -9, which was broader than the range of negative Z values observed here (- 1.78 to -3.53). We found MAH, NSBH, and RLH to show significant signals of phylogenetic clustering (P < 0.005, or in one case P < 0.05) (Table B.2). SSBH only displayed significant clustering using a null community where frequency of taxa among samples was held constant (“Frequency” null community). This signal was robust to the inherent phylogenetic uncertainty (i.e., low bootstrap support) in our analysis (Figure B.4). In addition, we found the persistent taxa in each lake (i.e., observed in all time points as in (33)) to recapitulate this signal of phylogenetic clustering much more than the transient taxa, suggesting environmental selection has played a significant role in defining the hypolimnion community (Figure B.5). In contrast to the hypolimnion samples, no sediment samples showed significant clustering (Table B.2).

Biogeographic patterns among hypolimnion methanogen communities We assessed between-lake community differentiation by performing pairwise comparisons among all samples (i.e., all layers, lakes, and times) using the weighted- Unifrac index and visualized the highly multidimensional data using principal coordinates analysis (PCoA) (Figure 3.4A). We found significant clustering by lake for the hypolimnion samples, while sediment samples clustered together regardless of lake

(ANOSIM, R=0.71, P<0.001). The same pattern was also observed when using FST instead of the weighted-Unifrac index (data not shown). Interestingly, these findings showed that while the hypolimnion communities were differentiated between lakes, the sediment samples were not.

76 We examined whether dispersal limitation or environmental selection resulted in the biogeographic patterns we observed. We did not find the mean relative taxon abundance to increase monotonically with frequency of observation among samples as would be expected if the communities from each lake were randomly assembled from a common source meta-community (Figure B.6) (46). In addition, a Mantel test comparing geographic distance to weighted-Unifrac values showed no rank-order correlation (Mantel, ρ = 0.12, P < 0.14). We did resolve a significant correlation between environmental distance (i.e., all measured environmental parameters after combining co- varying parameters; see 3.3 Experiment Procedures) and weighted-Unifrac values when TB was excluded (Mantel; ρ = 0.26; P < 0.02), but not when TB was included (Mantel; ρ = 0.07; P < 0.17). In contrast, we did find a strong and significant correlation between community differentiation among lakes and geographic distance when we compared communities using the Bray-Curtis index (Mantel, ρ = 0.61, P < 0.001), but no correlation with environmental distance was found when using Bray-Curtis values either with TB (Mantel; ρ = 0.11; P < 0.09) or without TB (Mantel; ρ = 0.16; P < 0.07). The Bray-Curtis index only compares overlap of alleles (weighted by abundance) regardless of phylogenetic relatedness, whereas weighted-Unifrac incorporates phylogenetic distance. Therefore, the difference in signal when using weighted-Unifrac or Bray-Curtis values indicated that while specific alleles showed endemism, lake-specific environmental selection predominantly selected for specific courser phylogenetic groups. To identify the statistical significance and relative importance of each environmental parameter in defining the lake-specific biogeographic pattern, we performed a multiple regression on matrices (MRM) to compare weighted-Unifrac values and each individual parameter while keeping all others constant (28,31). Overall, our MRM model explained a large and significant proportion of the beta diversity among hypolimnion communities (R2 = 0.44; P < 0.002). The minimum dissolved oxygen and maximum temperature measured in each lake’s water column were the only parameters to significantly explain community differentiation (Table B.3). Fitting these variables onto an ordination of weighted-Unifrac values showed both parameters to increase along the

77 primary axis, which explained much of the between-lake variance of community differentiation (Figure 3.4B). Based on the evidence of selection mediating within- and between-lake community differentiation, we used AdaptML to test for phylogenetic signals of local adapted “ecotypes” at either the within- or between-lake scale (24). Briefly, AdaptML estimates the number of ecologically cohesive clades (ecotypes) and models the transition among ecotypes through the evolutionary history based on the phylogenetic signal of where each allele was detected. Overall, five ecotypes (red, green, blue, yellow, and teal) were identified with varying probabilities of being detected in a particular environment (Figure 3.5A). Of the inferred ecotypes, only the red ecotype showed a high probability of detection in the sediment (79%) (Figure 3.5B). While each of the other four ecotypes had similar probabilities of detection in the hypolimnion (~70%), each showed varying probabilities of detection among lakes, with MA and RL, TB, NSB, and MA being most predominant in the green, blue, yellow, and teal ecotypes, respectively. Again these data supported the differentiation between the hypolimnion and sediment communities and between the hypolimnion samples from different lakes.

3.5 Discussion We have shown through culture-independent techniques that the hypolimnion and sediment of our focal freshwater lakes harbor distinct methanogen communities. Our results suggest that environment-specific selection contributed to the observed distinction between hypolimnion and sediment communities. In addition, our data support a scenario where a hierarchy of processes produced a biogeographic pattern among hypolimnion communities: lake-specific alleles resulted from dispersal limitation between lakes, while environmental gradients among lakes selected for specific courser phylogenetic groups. We originally suggested two hypotheses that would lead to differentiation between sediment and hypolimnion communities: one neutral and one deterministic. We predicted that random colonization of the hypolimnion through high migration from the sediment community would make the hypolimnion community appear as a subset of the source community composition. In contrast to this scenario, we found the hypolimnion samples to be largely composed of unique alleles not found in the matched sediment

78 samples, and little overlap of alleles was observed between the two environments (Table 3.2). This random colonization hypothesis is further disputed by our depth-resolved sampling of MA, which showed an abundant taxon specific to the shallowest depth (Figure 3.1). In contrast to this neutral explanation for the observed difference between sediment and hypolimnion communities, our multiple lines of evidence parsimoniously favor a deterministic process – one where environmental selection lead to vertical niche separation of methanogen taxa along the water column and sediment. First, the depth- resolved sampling of MA showed an abundant and unique taxon that was not found within the sediment, suggesting adaption to the shallow depths of the hypolimnion (Figure 3.1). Second, we observed phylogenetic clustering of alleles found predominantly in either the hypolimnion or sediment (Figure 3.3). Quantitative assessment of phylogenetic clustering in each of the five lakes revealed significant clustering in four of the hypolimnion samples: MAH, RLH, NSBH, and SSBH (Table B.2), indicating selection for particular ecologically coherent clades. Furthermore, the persisting taxa of MAH, NSBH, and SSBH were more related than ‘transient’ taxa (i.e., intermittently detected) as would be expected if seasonally persistent selective constraints were maintaining these common taxa (33) (Figure B.5). Third, AdaptML inferred an ecotype with a high bias for detection in the sediment (‘sediment-ecotype’), while the other four ecotypes were hypolimnion-biased (Figure 3.5). This inference is consistent with a scenario of adaptive evolution in the hypolimnion environment. What phenotypic characteristics may have lead to hypolimnion or sediment specificity? Methanosarcinales alleles were predominantly found in the sediment, while both hypolimnion and sediment-specific phylogenetic groups were observed in the Methanomicrobiales (Figure 3.3). Differences in substrate utilization, substrate affinity, and growth yield may have localized Methanosarcinales to the sediment. By possessing cytochromes, members of Methanosarcinales growing on H2 can obtain greater than twice the growth yields of any known Methanomicrobiales isolate, but require at least a ten-fold higher H2 partial pressure (48). H2 partial pressures lower than this threshold have been observed in the sediment and water column of Mary Lake (44), suggesting

Methanomicrobiales would outcompete Methanosarcinales for H2 in our focal lakes.

79 However, acetoclastic methanogenesis often dominates the hydrogenotrophic pathway in lake sediments, and only members of Methanosarcinales can utilize acetate (18,56). Endosymbiotic associations with anaerobic ciliates provide another source of relevant niche differentiation between Methanosarcinales and Methanomicrobiales. Methanomicrobiales are the only methanogens shown to be endosymbionts of anaerobic ciliates found in freshwater sediments (50). These ciliates use hydrogenosomes to ferment pyruvate and in conjunction produce hydrogen for their endosymbiotic methanogens, which therefore negates potential competition for H2 between endosymbiotic and free-living methanogens (e.g., members of Methanosarcinales), regardless of H2 partial pressures in the sediment. In addition, multiple endosymbiotic replacements have occurred during the evolution of freshwater anaerobic ciliates (50), which may have lead to the repeated evolution of sediment-specificity among the Methanomicrobiales alleles (Figure 3.3). Inferring the phenotypic traits that confer layer-specific adaptations within the Methanomicrobiales is more difficult. Methanoregula (6), the genus to which the majority of alleles belong, currently only contains two characterized isolates (40). Moreover, ecological plasticity may be high in the genus, considering that the two existing Methanoregula type strains differ considerably in their known ecologies. For instance, Methanoregula boonei 6A8 was isolated from a peat bog, has an optimum pH of ca. 5 and is only known to grow on H2:CO2 (6), while Methanoregula formicicum SMSP was isolated from a sludge digester, has an optimum pH of ca. 7, and can use formate in addition to H2:CO2 (59). Relatively few alleles grouped with characterized isolates in other Methanomicrobiales genera, which further hindered inference of possible ecological roles (Figure 3.3). In addition, low bootstrap support for some nodes in the Methanomicrobiales clade increased uncertainty of the actual phylogenetic distribution of layer specificities. Finally, caution must be taken when inferring ecologies among even closely related microbial taxa because many studies have found a great deal of ecological variation among microbial taxa defined at very fine taxonomic resolutions (4,7,24–26). At the between-lake scale, we found a biogeographic pattern among the hypolimnion samples but not among sediment samples (Figure 3.4A), and this biogeographic distribution significantly co-varied with environmental diversity when

80 TBH was excluded (Figure 3.4B; Table B.3). Also, this biogeographic pattern did not appear to follow a model of random colonization from a common source community as would be expected from a neutral model (Figure 3.4A; Figure B.6) (46). Of the measured environmental variables, only the minimum dissolved oxygen (DO) and maximum temperature of the water column showed significant covariance with the hypolimnion community diversity (46) (Figure 3.4B; Table B.3). Both factors may have directly or indirectly influenced methanogen community diversity. Minimum DO may be a proxy for redox potentials permissible to methanogenesis – possibly occurring in microenvironments more reduced than our macro-scale measurements. Maximum temperature may increase rates of cellular processes, leading to higher growth rates of water column methanogens either directly or indirectly though other trophic stages in the freshwater lake carbon cycle. For instance, chlorophyll-a has been shown to co-vary with methanogen community composition (Ofiţeru et al., 2010), and increased warming has been shown to elevate the proportion of carbon originating from primary production that was emitted as methane from freshwater environments (60). While we did find significant co-variation between environmental factors and hypolimnion methanogen community diversity, much of the variance in beta diversity remained unexplained. These factors may include other environmental parameters. For instance, total organic carbon, organic matter composition, algal biomass, and chlorophyll-a have been shown to significantly explain spatial heterogeneity of methanogen composition in freshwater sediments (17,46,54). Dispersal limitation may have also contributed to the observed biogeographic pattern among lakes. We found a significant correlation between geographic distance and community similarity only when using a measurement of beta diversity that did not account for phylogenetic relatedness. Theses findings suggest that endemism of specific alleles resulted from low rates of dispersal between lakes. The same explanatory scenario has been hypothesized for marine Prochlorococcus populations and Sulfolobus populations in hot springs (33,55). While our results suggest distinct hypolimnion communities adapted to their local environment, we are still left with the question of how certain methanogens persist and grow in anoxic water columns (5,33). Methanogenesis is considered to require a reduction-oxidation (redox) potential of -300 mv (23), but the minimum redox

81 measurement obtained in our focal lakes was -89 mv. Spatial heterogeneity in redox at the micro-scale, such as anaerobic microsites in suspended particles may allow for methanogenesis in unfavorable macro environments. This explanation as has been suggested for oxic soil environments and ocean water columns (12,49,51), and is feasible given our findings if either particles are persistently suspended in the water column (low within-lake dispersal) or strong selective pressures occur within the water column. An alternative explanation is periodic methanogenesis during transiently favorable conditions in redox, substrate concentrations, or other variables. In line with this possibility, seasonal inputs of allochthonous organic carbon by sedimentation processes have been shown to stimulate methanogenesis in freshwater sediments (9,45). Although we have found evidence of distinct methanogen communities adapted to the water column environment, we did not find a measureable build-up of methane in the water column that would indicate a significant production of methane (Figure B.1). Anaerobic oxidation of methane (AOM) may have occurred in tandem with water column methanogenesis, preventing the accumulation of methane in the water column (13,14,38,43); however, we did not detect any mcrA alleles of close association to known anaerobic methane oxidizing clades. Another explanation is the coupling of water column methanogenesis to microaerophilic methanotrophy (34), which seems most feasible if methanogenesis were occurring in anaerobic microsites. Lastly, along with favorable redox potentials, high rates of methanogenesis in the water column may be periodic along diurnal or seasonal timescales. Our findings hold implications for future studies of freshwater methanogen diversity. The unlinked community dynamics between the hypolimnion and sediment communities may alter outcomes at ecological and evolutionary scales. Therefore, models of freshwater carbon cycling may need to incorporate water column methanogen diversity if such diversity is found to affect ecosystem function. Further research of methanogen community dynamics in the water columns of freshwater lakes is required to fully elucidate how freshwater methanogen communities will behave under future environmental scenarios.

82 3.6 Acknowledgements We would like to thank K. Milferstedt for help with clone library construction and sequencing, S. Paver for help with sampling and NTL-MO metadata analysis, N. Lottig for help with methane concentration measurements, and all other students and staff at the UW Madison Trout Lake Station for assistance with sampling. Funding was provided by NSF via the North Temperate Lake Microbial Observatory (MCB-0702653). The authors declare no conflict of interest.

83 3.7 References 1. Armitage DW, Gallagher KL, Youngblut ND, Buckley DH, Zinder SH. Millimeter-scale patterns of phylogenetic and trait diversity in a salt marsh microbial mat. Front Microbiol. 2012; 3: 293.

2. Bastviken D, Cole JJ, Pace ML, van de Bogert MC. Fates of methane from different lake habitats: Connecting whole-lake budgets and methane emissions. J Geophys Res. 2008; 113: 1–14.

3. Bastviken D, Tranvik LJ, Downing JA, Crill PM, Enrich-Prast A. Freshwater Methane Emissions Offset the Continental Carbon Sink. Science. 2011; 331: 50– 50.

4. Bhaya D, Grossman AR, Steunou A-S, et al. Population level functional diversity in a microbial community revealed by comparative genomic and metagenomic analyses. ISME J. 2007; 1: 703–713.

5. Biderre-Petit C, Jézéquel D, Dugat-Bony E, et al. Identification of microbial communities involved in the methane cycle of a freshwater meromictic lake. FEMS Microbiol Ecol. 2011; 77: 533–545.

6. Bräuer SL, Cadillo-Quiroz H, Ward RJ, Yavitt JB, Zinder SH. Methanoregula boonei gen. nov., sp. nov., an acidiphilic methanogen isolated from an acidic peat bog. Int J Syst Evol Microbiol. 2011; 61: 45–52.

7. Cadillo-Quiroz H, Didelot X, Held NL, et al. Patterns of gene flow define species of thermophilic Archaea. PLoS Biol. 2012; 10: e1001265.

8. Camacho C, Coulouris G, Avagyan V, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009; 10: 421.

9. Chan OC, Claus P, Casper P, Ulrich A, Lueders T, Conrad R. Vertical distribution of structure and function of the methanogenic archaeal community in Lake Dagow sediment. Environ Microbiol. 2005; 7: 1139–1149.

10. Chao A, Chazdon RL, Colwell RK, Shen T-J. Abundance-Based Similarity Indices and Their Estimation When There Are Unseen Species in Samples. Biometrics. 2006; 62: 361–371.

11. Conrad R, Klose M, Noll M. Functional and structural response of the methanogenic microbial community in rice field soil to temperature change. Environ Microbiol. 2009; 11: 1844–1853.

12. De Angelis MA, Lee C. Methane production during zooplankton grazing on marine phytoplankton. Limnol Oceanogr. 1994; 39: 1298–1308.

84 13. Deutzmann JS, Schink B. Anaerobic Oxidation of Methane in Sediments of Lake Constance, an Oligotrophic Freshwater Lake. Appl Environ Microbiol. 2011; 77: 4429–4436.

14. Eller G, Känel L, Krüger M. Cooccurrence of Aerobic and Anaerobic Methane Oxidation in the Water Column of Lake Plußsee. Appl Environ Microbiol. 2005; 71: 8925–8928.

15. Excoffier L, Lischer HEL. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour. 2010; 10: 564–567.

16. Faith DP. Conservation evaluation and phylogenetic diversity. Biol Conserv. 1992; 61: 1–10.

17. Fan X, Wu QL. Intra-habitat differences in the composition of the methanogenic archaeal community between the Microcystis-dominated and the macrophyte- dominated bays in Taihu Lake. Geomicrobiol J. accepted.

18. Ferry JG. Methanogenesis: ecology, physiology, biochemistry & genetics. New York: Chapman & Hall; 1993.

19. Goslee SC, Urban DL. The ecodist Package for Dissimilarity-based Analysis of Ecological Data. Jstatssoft. 2007; 22: 1–19.

20. Grossart H-P, Frindte K, Dziallas C, Eckert W, Tang KW. Microbial methane production in oxygenated water column of an oligotrophic lake. Proc Natl Acad Sci U S A. 2011; 108: 19657–19661.

21. Harrell FE. Hmisc: Harrell Miscellaneous [Internet]. 2013. Available from: http://CRAN.R-project.org/package=Hmisc.

22. Hubbell SP. The Unified Neutral Theory of Biodiversity and Biogeography. New York: Princeton University Press; 2001. 448 p.

23. Hungate RE. Hydrogen as an intermediate in the rumen fermentation. Arch Microbiol. 1967; 59: 158–164.

24. Hunt DE, David LA, Gevers D, Preheim SP, Alm EJ, Polz MF. Resource partitioning and sympatric differentiation among closely related bacterioplankton. Sci N Y NY. 2008; 320: 1081–1085.

25. Jaspers E, Overmann J. Ecological Significance of Microdiversity: Identical 16S rRNA Gene Sequences Can Be Found in Bacteria with Highly Divergent Genomes and Ecophysiologies. Appl Env Microbiol. 2004; 70: 4831–4839.

85 26. Jezbera J, Jezberová J, Brandt U, Hahn MW. Ubiquity of Polynucleobacter necessarius subspecies asymbioticus results from ecological diversification. Environ Microbiol. 2011; 13: 922–931.

27. Kembel SW, Cowan PD, Helmus MR, et al. Picante: R tools for integrating phylogenies and ecology. Bioinformatics. 2010; 26: 1463–1464.

28. Lichstein JW. Multiple regression on distance matrices: a multivariate spatial analysis tool. Plant Ecol. 2007; 188: 117–131.

29. Luton PE, Wayne JM, Sharp RJ, Riley PW. The mcrA gene as an alternative to 16S rRNA in the phylogenetic analysis of methanogen populations in landfill. Microbiology. 2002; 148: 3521–3530.

30. Maddison D, Maddison W. MacClade 4. manual. 2000; 1–492.

31. Manly BFJ. Randomization and regression methods for testing for associations with geographical, environmental and biological distances between populations. Res Popul Ecol. 1986; 28: 201–218.

32. Martiny AC, Tai APK, Veneziano D, Primeau F, Chisholm SW. Taxonomic resolution, ecotypes and the biogeography of Prochlorococcus. Environ Microbiol. 2008; 11: 823–832.

33. Milferstedt K, Youngblut N, Whitaker R. Spatial structure and persistence of methanogen populations in humic bog lakes. ISME. 2010; 4: 764–776.

34. Miller JA, Kalyuzhnaya MG, Noyes E, Lara JC, Lidstrom ME, Chistoserdova L. Labrys methylaminiphilus sp. nov., a novel facultatively methylotrophic bacterium from a freshwater lake sediment. Int J Syst Evol Microbiol. 2005; 55: 1247–1253.

35. Nei M. Molecular Evolutionary Genetics. Columbia University Press; 1987. 512 p.

36. Oksanen J, Blanchet FG, Kindt R, et al. vegan: Community Ecology Package [Internet]. 2012. Available from: http://CRAN.R-project.org/package=vegan.

37. R Development Core Team. R: A Language and Environment for Statistical Computing [Internet]. Vienna, Austria: 2010. Available from: http://www.R- project.org.

38. Raghoebarsing AA, Pol A, van de Pas-Schoonen KT, et al. A microbial consortium couples anaerobic methane oxidation to denitrification. Nature. 2006; 440: 918– 921.

39. Riley WJ, Subin ZM, Lawrence DM, et al. Barriers to predicting changes in global terrestrial methane fluxes: analyses using CLM 4 Me, a methane biogeochemistry model integrated in CESM. Biogeosciences Discuss. 2011; 8: 1733–1807.

86 40. Sakai S, Ehara M, Tseng I-C, et al. Methanolinea mesophila sp. nov., a hydrogenotrophic methanogen isolated from rice field soil, and proposal of the archaeal family Methanoregulaceae fam. nov. within the order Methanomicrobiales. Int J Syst Evol Microbiol. 2011; 62: 1389–1395.

41. Scandella BP, Varadharajan C, Hemond HF, Ruppel C, Juanes R. A conduit dilation model of methane venting from lake sediments. Geophys Res Lett. 2011; 38: 1–6.

42. Schloss PD, Westcott SL, Ryabin T, et al. Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Appl Environ Microbiol. 2009; 75: 7537– 7541.

43. Schubert CJ, Vazquez F, Lösekann-Behrens T, Knittel K, Tonolla M, Boetius A. Evidence for anaerobic oxidation of methane in sediments of a freshwater system (Lago di Cadagno). FEMS Microbiol Ecol. 2011; 76: 26–38.

44. Schütz H, Conrad R, Goodwin S, Seiler W. Emission of hydrogen from deep and shallow freshwater environments. Biogeochemistry. 1988; 5: 295–311.

45. Schwarz JI, Eckert W, Conrad R. Response of the methanogenic microbial community of a profundal lake sediment (Lake Kinneret, Israel) to algal deposition. Limnol Oceanogr. 2008; 53: 113–121.

46. Sloan WT, Lunn M, Woodcock S, Head IM, Nee S, Curtis TP. Quantifying the roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol. 2006; 8: 732–740.

47. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinforma Oxf Engl. 2006; 22: 2688–2690.

48. Thauer RK, Kaster A-K, Seedorf H, Buckel W, Hedderich R. Methanogenic archaea: ecologically relevant differences in energy conservation. Nat Rev Microbiol. 2008; 6: 579–591.

49. Van der Maarel MJEC, Sprenger W, Haanstra R, Forney LJ. Detection of methanogenic archaea in seawater particles and the digestive tract of a marine fish species. FEMS Microbiol Lett. 1999; 173: 189–194.

50. Van Hoek AH, van Alen T, Sprakel VS, et al. Multiple acquisition of methanogenic archaeal symbionts by anaerobic ciliates. Mol Biol Evol. 2000; 17: 251–258.

51. Von Fischer JC, Hedin LO. Separating methane production and consumption with a field-based isotope pool dilution technique. Glob Biogeochem Cycles. 2002; 16: 1–13. idn

87

52. Webb CO, Ackerly DD, McPeek MA, Donoghue MJ. Phylogenies and Community Ecology. Annu Rev Ecol Syst. 2002; 33: 475–505.

53. Weimer W, Lee G. Some considerations of the chemical limnology of meromictic Lake Mary. Limnol Oceanogr. 1973; 18: 414–425.

54. West WE, Coloso JJ, Jones SE. Effects of algal and terrestrial carbon on methane production rates and methanogen community structure in a temperate lake sediment. Freshw Biol. 2012; 57: 949–955.

55. Whitaker RJ, Grogan DW, Taylor JW. Geographic Barriers Isolate Endemic Populations of Hyperthermophilic Archaea. Science. 2003; 301: 976–978.

56. Whiticar M., Faber E, Schoell M. Biogenic methane formation in marine and freshwater environments: CO2 reduction vs. acetate fermentation—Isotope evidence. Geochim Cosmochim Acta. 1986; 50: 693–709.

57. Wickham H. ggplot2: elegant graphics for data analysis. Springer New York; 2009. 210 p.

58. Winfrey M, Zeikus J. Microbial methanogenesis and acetate metabolism in a meromictic lake. Appl Environ Microbiol. 1979; 37: 213–221.

59. Yashiro Y, Sakai S, Ehara M, Miyazaki M, Yamaguchi T, Imachi H. Methanoregula formicica sp. nov., a methane-producing archaeon isolated from methanogenic sludge. Int J Syst Evol Microbiol. 2011; 61: 53–59.

60. Yvon-Durocher G, Montoya JM, Woodward G, Jones JI, Trimmer M. Warming increases the proportion of primary production emitted as methane from freshwater mesocosms. Glob Change Biol. 2011; 17: 1225–1234.

88 3.8 Tables and figures

Mary LakeMary LakeRose LakeRose Lake 20 20 15 15 10 10 % Relative Abundance % Relative Abundance 5 5 0 0 4 m 7 m 4 m 7 m 7 m 7 m 13 m 19 m 13 m 19 m 11 m 11 m 2-20m 2-20m 2-11 m 2-11 m Figure 3.1: Genotype depth-specificity. The dendrogram depicts relative similarity of the mcrA alleles (pairwise genetic distance). The heat map shows relative abundances of alleles in depth-discrete and integrated hypolimnion samples from MA in July 2008, with each row representing an individual allele. The red lines highlight an allele that comprised 36% (21/58) of the total abundance in the MA 4 m sample. Additionally, this allele was not found in any of the 8 sediment samples in the dataset.

89 ●

60 Richness 50 ● ●

40 ● ● ● 30 ● ● 20 ● 160 ●

● 120 ● Chao

● 80 ● ● ● ● 40 ● ● ● ● Evenness ● ● ● Shannon Layer 0.9 ● ● HypolimnionH 0.8 ● ● ● SedimentS

0.7 ● ● 7 ● ● ● ●

6 PD 5 ● ● 4 ● ● 0.25 ● ●

0.20 ● ● Dxy ● 0.15 ● ● ● 0.10 ● RL TB MA SSB NSB Figure 3.2: Reduced alpha diversity in most hypolimnion samples. The alpha diversity measures used were number of OTUs (‘Richness’), the Chao1 index (‘Chao’), the Shannon-evenness index (‘Shannon Evenness’), phylogenetic diversity (‘PD’), and pairwise genetic distance (‘Dxy’).

90

Figure 3.3: Phylogenetic distribution of hypolimnion and sediment specificity. Alleles from matched hypolimnion and sediment samples were rarefied to the lowest total abundance of any sample (57 in TBH 5/29/09). The maximum likelihood phylogeny of all mcrA alleles that occurred a total of ≥3 times was inferred with RAxML (GTR+Γ model; rooted on Methanopyrus kandleri AV19). Bootstrap values >50 (100 bootstrap replicates) are shown. The blue and red bars represent the total abundance of each allele in the hypolimnion or sediment, respectively. Error bars display standard deviations of abundances from 1000 permutations of rarefying samples.

91

Figure 3.4

92 Figure 3.4 (cont.): Between-lake community diversity. mcrA allele counts and relatedness among samples were quantified with pairwise comparison of samples with the weighted- Unifrac metric. A) Colors designate all hypolimnion samples for each lake, with all sediment samples designated in the same group regardless of lake origin. Ellipses highlight the dispersion among samples in each grouping (lake and/or layer). B) Only weighted-Unifrac values from MAH, NSBH, RLH, and SSBH samples were used. The environmental variables found to correlate significantly with community diversity (i.e., maximum temperature and minimum dissolved oxygen) are shown as arrows. Arrow direction and length indicates the direction and magnitude of increase for each variable.

93 A)

* * *

Layer Lake

*

* * * * * * * *

*

MA RL TB Sediment NSB SSB Hypolimnion B) Lake Layer ecotype

0.00 0.25 0.50 0.75 0.00 0.25 0.50 0.75

Figure 3.5

94 Figure 3.5 (cont.): Hypolimnion ecotypes originate from the sediment. Ancestral ecotypes inferred by AdaptML were mapped onto a maximum likelihood phylogeny of the mcrA dataset (A), and a bar chart shows the probability of each inferred ecotype being detected in a given lake and layer (B). Node colors in (A) indicate the inferred ecotype at that ancestor. Asterisks highlight basal nodes with bootstrap support ≥70. The inner and outer rings indicate respectively the lake and layer where each extant taxon was detected. The key in (A) also applies to the bar colors in (B).

95 67#8,1)9:)!"#$% !"#$%&'()*"+,) -".,) -".,/-"0,1)2*345) ;,<7,'=,;>) !"#!$#"%& '()*&+(,-& './& 0"0& !"#!$#"%& 12)34&56(),789:&;2:& 15;/& !!0& !"#!$#"%& 52<34&56(),789:&;2:& 55;/& =%& !"#!>#"%& ?)2<3&;2:& ?;/& =$& !#@#"=& '()*&+(,-& './& AB& !#@#"=& 12)34&56(),789:&;2:& 15;/& @>& !#@#"=& ?)2<3&;2:& ?;/& $B& @#A"#"=& 52<34&56(),789:&;2:& 55;/& !!>& @#A"#"=& ?)2<3&;2:& ?;/& B"& $#!#"=& '()*&+(,-& './& >B& >#A"#"=) 12)34&56(),789:&;2:) 15;/) !!") %#!#"=) '()*&+(,-) './) =>) %#!#"=) C2D-&+(,-) C+/) =") %#A#"=) 52<34&56(),789:&;2:) 55;/) $!) ?@A@BA& C"10)-".,& CDEF#& GA& ?@A@BA& C"10)-".,& CDE?#& 5G& ?@A@BA& C"10)-".,& CDE3>#& A>& ?@A@BA& C"10)-".,& CDE3H#& IB& G@5H@BH) 691+J)!$"1.%&'()K9() 6!KL) I5) G@5H@BH) 691+J)!$"1.%&'()K9() 6!K!) ?I) G@5H@BH) !97+J)!$"1.%&'()K9() !!KL) AG) G@5H@BH) !97+J)!$"1.%&'()K9() !!K!) ??) G@5H@BH) M197+)K9() MKL) G?) G@5H@BH) M197+)K9() MK!) ??) G@>B@BH) C"10)-".,) CDL) HF) G@>B@BH) C"10)-".,) CD!) H5) G@>B@BH) N9;,)-".,) N-L) ?F) G@>B@BH) N9;,)-".,) N-!) AF) G@>B@3B) C"10)-".,) CD!) IH) G@>B@3B) !97+J)!$"1.%&'()K9() !!K!) G>) !&E/E&F2)&4*6278G9829H&I5E&F2)&D-J8G-93H&(9J&IKLGE&F2)&J-634MJ8DN)-3-&D(G67-D& 0&!"#$%D-O<-9N-D&F)2G&34-&J-634MJ8DN)-3-&D(G67-D&P-)-&(9(7*Q-J&89J-6-9J-937*& F)2G&(77&234-)&D(G67-D& A&!"#$%D-O<-9N-D&F)2G&D(G67-D&89&R27J&()-&92S-7&32&348D&D3

96 !"#$% &''$'$(%()"*$+% ,*"-./0*12(% 34*$5(45% 6378% 9$2:)1$+.;52<*"=88% !"# $%&$#'()*+# ),$-# ),-.# ),).# ),-/# 012# (%-)#'3*+# ),.&# ),.4# ),3)# ),33# 56# /%&(#'$*+# ),.-# ),.)# ),(-# ),4/# 112# /%$-#'-*+# ),.(# ),73# ),(3# ),4-# 82# -%.$#'-*+# ),77# ),7(# ),(7# ),33# 9#:#;#),)(####99#:#;#),))(# ! # # # Table 3.2: Matched sediment and hypolimnion samples show significant beta diversity. “Alleles shared” (i.e., shared non-unique mcrA sequences) is the number of alleles shared between matched samples over the total number of genotypes for both samples (shared alleles counted once). “Sorenson” signifies the abundance-based Sorenson index (10).

Significance values were ascertained for FST and weighted-Unifrac metrics, with significance values for all comparisons denoted by asterisks.

97 CHAPTER 4: GENOMIC DIFFERENTIATION BETWEEN TWO COEXISTING POPULATIONS OF METHANOSARCINA MAZEI5

4.1 Abstract Ecological differentiation among organisms can dictate their varying prevalence among heterogeneous environments. However, the origins and maintenance of the genetic variation that leads to ecological differentiation is often not known. Linking genomic changes to environmental conditions through high-resolution comparative genomics is a promising methodology to examine the evolution of ecological differentiation. We assessed how genetic variation is distributed among Methanosarcina mazei isolates obtained from three locations in the Columbia River Estuary in Oregon, USA. 454-tag pyrosequencing of the mcrA gene showed a highly variable distribution of methanogens among sites. Whole-genome analysis of 56 isolates averaging <1% nucleotide divergence revealed two distinct, co-existing clades, which we referred to as the ‘mazei-T’ and ‘mazei-WC’ clades. Genomic analysis showed that these two clades differ in gene content and fixation of allelic variants that point to potential differences in primary metabolism and also interactions with foreign genetic elements. Laboratory growth experiments demonstrated a physiological difference between clades in regards to substrate turnover of trimethylamine, which supports a hypothesis of ecological differentiation. These findings will improve our understanding of the interactions between genetic and ecological diversity of Methanosarcina in natural environments.

5 Nicholas Youngblut and Rachel Whitaker designed the experiments and analyzed the data. Nicholas Youngblut, Joe Wirth, and Maya Errabolu performed the experiments. Holly Simon and Mariya Smit collected samples and provide the geochemical data.

98 4.2 Introduction Comparative genomic studies of microbial populations are revealing variation even among very closely related bacterial and archaeal strains. The model of the microbial pan-genomes suggest that to rapidly adapt to spatially and temporally varying environmental conditions, microbial populations often possess highly variable and frequently horizontally transferred gene complements (i.e., the pan-genome) that can facilitate adaptive evolution more rapidly than mutation rates would allow (9,14,24). If this hypothesis is correct, the distribution of gene variation along natural environmental gradients can reveal the genetic basis of ecological differentiation among strains (16,48,58). Exploring genetic variation in the context of core genome will provide insight into the evolutionary mechanisms through which microbial diversity is generated. Furthermore, placing this genomic variation in its natural context will provide an ecological basis of how diversity is maintained in the natural microbial world. Methanogenic archaea comprise a phylogenetically and ecologically diverse assemblage distributed across a wide range of environmental conditions (21). An estimated 1 billion tons of methane are produced yearly by methanogens (53), which makes this microbial assemblage a key player in the global carbon cycle. Most methanogens can only grow on a very limited number of substrates, mainly by reducing

CO2 with H2. In contrast, Methanosarcina is the only known genus that can utilize all identified methanogenic pathways (28). This genus, along with other members of the Methanosarcinales, uniquely possesses cytochromes and conserves energy from methane production via respiration (52). This diversity and potential plasticity in resource utilization may provide the Methanosarcina with an advantage in natural environments and/or a basis for ecological differentiation. Genetic and biochemical studies of three model Methanosarcina strains (M. acetivorans C2A (49), M. barkeri Fursaro (25), and M. mazei Gö1 (18)) have revealed a great deal on the functional roles of cellular processes occurring in this unique group of archaea. However, whether these strains represent the natural diversity in the Methanosarcina and how this functional variation among strains evolved and is distributed in natural populations is largely unknown. Linking genetic variation to ecological roles is necessary to understand Methanosarcina

99 diversity in nature and their contributions to the anaerobic ecosystems in which they inhabit. In this study, we employed a comparative genomics approach to identify genomic signatures of incipient adaptive evolution in a population of Methanosarcina mazei. Our goal was to link genetic variation to its potential ecological implications. The experimental system consisted of three sites along the Columbia River Estuary that varied in salinity along with other geochemical parameters. We hypothesized that niche partitioning though adaptive genomic evolution played a role in creating and maintaining diversity in this Methanosarcina population.

4.3 Experimental procedures Sample collection On July 22, 2011, sediment samples were collected immediately off the shore at three locations in the Columbia River Estuary: the mouth of Young’s Bay, near an inlet of Young’s Bay, and within Baker Bay (Table C.1). Samples were collected in sterile 50 ml Corning tubes and stored on ice until processed. Approximately 0.5 L of sediment from each site was sent AgSource (Umatilla, OR) for geochemical analysis (Table C.1).

Nucleic acid extraction and mcrA amplicon 454-pyrosequencing DNA was extracted from approximately 1 g from each sediment sample with the PowerSoil DNA Isolate Kit using the standard protocol (MoBio, Carlsbad, CA). Adapters and barcodes were added to the mcrA-specific primers mcrF and mcrR (35) for multiplex 454 pyrosequencing. The gene fragment was amplified using a PCR with a final volume of 30 µl containing the final concentrations of 0.2 mM dNTPs (each), 0.5 µM primers (each), and 0.03 U of Phusion DNA Polymerase F-530 (Finnzymes, MA, USA). Thermocycler conditions consisted of an initial denaturation for 3 minutes at 98°C, followed by 30 cycles of 30 seconds at 98°C, 15 seconds at 59°C, and 15 seconds at 72°C, with a final extension of 10 minutes at 72°C. Triplicate PCR reactions were pooled, and gel bands of the estimated amplicon size were excised and purified with the Wizard DNA Purification Kit (Promega, Madison, WI). Purified amplicons were submitted to the WM Keck Center for Comparative and Functional Genomics at the

100 University of Illinois at Urbana-Champaign for 454 pyrosequencing on a 454 GSFLX+ Sequencer (Roche, Branford, CT). mcrA sequence analysis Mothur v1.24.0 was used for 454 pyrosequencing read barcode and primer removal along with sequence quality filtering (45). Sequences that were <200 bp in length, contained homopolymers >8 bp long, had >1 error in the barcode, or >1 error in the primer were discarded. Chimeric sequences were identified and removed with the Mothur implementation of Uchime (17). Combined, quality filtering removed 12.5% (8898 of 71097) of the sequences. Sequences were clustered into 295 operational taxonomic units (OTUs) at a 95% sequence identity cutoff with CD-HIT-454 v4.6 (23). A reference mcrA dataset was constructed from select mcrA sequence fragments of cultured methanogens in the Functional Gene pipeline and repository (FunGene; http://fungene.cme.msu.edu), mcrA gene sequences from each sequenced isolate in this study, and all sequenced Methanosarcina genomes. Amino acid sequences of the reference mcrA dataset was aligned with mafft v7.037b and the reverse-translated with PAL2NAL v14 (27,51). A maximum likelihood phylogeny was inferred from nucleotide alignment with RAxML v7.2.6 (GTR-Γ model; 100 bootstrap replicates) (50). Representative sequences for each environmental mcrA OTU were inserted into the reference phylogeny using RAxML. The number of sequences within each OTU at their sample origin was mapped onto the tree with iTOL v2 (34).

Culture isolation Direct plating with agar overlays under strictly anaerobic conditions was used for initial Methanosarcina strain cultivation. Three sediment dilutions (100, 10-1, and 10-2) were plated on bicarbonate-buffered high-salt media (38) or freshwater PIPES-buffered media (1 µM KPO4, 10 µM NH4Cl, 4 µM resazurin, 40 mM PIPES buffer, 1:100 trace elements solution, 1:100 vitamin solution, 1X base salts). The trace element solution consisted of 5.8 mM N(CH2CO2H)3, 2 mM Fe(NH4)2(SO4)2, 1.1 mM Na2SeO3, 0.4 mM

CoCl2Ÿ6H2O, 0.6 mM MnSO4ŸH2O, 0.4 mM Na2MoO4Ÿ2H2O, 0.3 mM Na2WO4Ÿ2H2O,

0.3 mM ZnSO4Ÿ7H2O, 0.4 mM NiCl2Ÿ6H2O, 0.16 mM H3BO3, 40 µM CuSO4Ÿ5H2O. The

101 vitamin solution consisted of 73 µM p-aminobenzoic acid, 81 µM nicotinic acid, 42 µM calcium pantothenate, 49 µM pyridoxine HCl, 27 µM riboflavin, 30 µM thiamine HCl, 20

µM biotin, 11 µM folic acid, 24 µM α-lipoic acid, 3.7 µM vitamin B12. The base salts consisted of 342 mM NaCl, 14.8 mM MgCl2Ÿ6H2O, 1 mM CaCl2Ÿ2H2O, and 6.71 mM KCl. The media were supplemented with 40 mM acetate, 60 mM methanol, or 50 mM trimethyamine for a total of 18 pairwise dilution-media-substrate combinations for each sample. Cultures were incubated at 37°C. Isolated colonies picked from the direct plating were subjected to 1-3 rounds of colony purification, with media containing ampicilin (10 µg/ml) and erythromycin (1 µg/ml) in the first round, while the subsequent rounds contained rifampicin (10 µg/ml) and erythromycin (1 µg/ml). Each colony-purified culture was screened for bacterial contamination with the bacteria-biased 16S rRNA primers B27F (5’-AGAGTTTGATCCTGGCTCAG-3’) and 1492R (5’- GGYTACCTACGACTT-3’) and in some cases verified using light microscopy.

Genomic sequencing and assembly Genomic DNA extracted from each culture using the UltraClean Microbial DNA Isolation Kit (MoBio, Carlsbad, CA). Multiplexed libraries were prepared using the Nextera XT DNA Sample Prep Kit (lllumina, San Diego, CA) without performing the bead normalization step, and instead the libraries were quantified with a Qubit fluorometer (Life Technologies, Carlsbad, CA) and normalized by dilution with molecular grade water. Normalized libraries were pooled and submitted to the WM Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana- Champaign for paired-end sequencing with a HiSeq2000 sequencer (Illumina, San Diego, CA). The paired-end reads were quality-filtered with the FASTX Toolkit v0.0.13 using a q-value cutoff of 30 over 95% of the read length. Filtered reads were randomly subsampled to one million read pairs per sample. Genomic assembly and scaffolding was performed with a modified version of the A5 assembly pipeline (54), in which IDBA-UD was used instead of IDBA for the actual assembly (42). BLASTn was used to identify scaffolds potentially containing contamination (e.g., regions with a high number of hits to E. coli). The percentage of total scaffold length of any assembly that was identified as

102 contamination and removed varied from 0-3%. Gaps in scaffolds were filled in silico with GapFiller (8), with an average of 68% of gaps closed per assembly. Sequel was used to correct on average 39 base miscalls and/or erroneous indels in each assembly (44). To estimate the number of gene clusters missing from any particular draft genome, we compared draft assemblies produced from just Illumina HiSeq2000 paired- end reads from the reference stains M. mazei WWM610, M. mazei C16, and M. mazei LYC versus the closed versions of these genomes, which had been assembled with multiple sequencing methods including paired-end 454 pyrosequencing data, cosmid paired-end reads, and Sanger sequencing to fill gaps. We found that <2% of coding sequences were missing when comparing the draft assemblies to their corresponding closed genomes, indicating that artificial gene absence was relatively minor among the draft isolate genomes.

Core and variable gene analysis Genes were called and annotated using the Rapid Annotation using Subsystem Technolog (RAST) server (3). The ITEP toolkit was used to group genes from all isolates (or isolates and type strains) into putative orthologs through Markov Chain Clustering (via the MCL program) of BLASTp maximum bitscore ratios (0.4 cutoff, 2.0 inflation parameter) (6,19). The ITEP toolkit was further used along with custom Perl scripts to investigate gene content variation among genomes.

Quantification of dN/dS, percent sequence identity, and FST values for all core genes was performed with SNAP, Mothur, and Arlecore v3.5.1.3, respectively (20,30,45). Ranger-DTL was used to infer gene transfer events from ML trees inferred from each gene cluster present in at least 4 copies in the population (4).

Alignment and phylogenetic inference A whole genome alignment (WGA) of isolate genomes identified as M. mazei and all reference M. mazei genomes was created with mugsy v1.2.3 (1). RAxML (GTR-Γ model; 100 bootstrap replicates) was used to infer a ‘species’ tree was inferred from all ‘core’ (found in all taxa) local collinear blocks (LCBs) in the WGA.

103 Gene clusters were aligned with mafft v7.037b and the reverse-translated with PAL2NAL v14 (27,51). A maximum likelihood phylogeny was inferred from nucleotide alignment with RAxML v7.2.6 (GTR-Γ model; 100 bootstrap replicates) (50).

Statistics and plotting All statistical evaluations were performed in R (43). The circular genome plots were created with Circos (32), and all other plots were produced with R using the ggplot2 package (56).

Methane production assays Methane production, as proxy for culture growth, was monitored using a Hewlett Packard 5890 Series II gas chromatograph with flame ionization detection (Hewlett- Packard, Wilmington, DE). The maximum growth rates inferred from the resulting methane production curves were compared among isolates. The low-throughput nature of this method limited the number of cultures that can be compared in the same experiment. In addition, isolates were originally isolated on different media (high salt and freshwater) and substrates (trimethylamine, methanol, and acetate), which could be confounding factors. To control for this, we performed direct pairwise comparisons between isolates from different clades isolated on the same media and substrate when possible. Two to four isolates were compared in any given round of methane production monitoring. Each isolate was grown in its ‘native’ medium (high salt or freshwater) in triplicate or quadruplicate. In addition, we found no growth in cultures inoculated in media lacking substrate and balch tubes containing all substrates but lacking inoculum.

4.4 Results A spatially heterogeneous distribution of Methanosarcinales We sampled three sites in the Columbia River Slough, near Astoria, Oregon that differed in various geochemical parameters (Table C.1). These sampling sites will be referred to as Young’s Bay Back (YBB) near the freshwater inlet of Young’s River, Young’s Bay Mouth (YBM) near the mouth of Young’s Bay, and Baker Bay (BB) just off of a peer on the northwest side of Baker Bay. YBB and YBM were separated by ~4

104 km and were ~20 km and ~23 km from BB, respectively. These samples establish a salinity gradient with salt-water intrusion ranging from highest in BB to lowest in YBB. A similar gradient of pH was also observed, with a pH of 7.5, 6.6, and 6.3 at BB, YBM, and YBB, respectively (Table C.1). To place our comparative population genomics study in the context of the entire methanogen community, we performed amplicon 454 pyrosequencing with primers targeting the methyl coenzyme M reductase subunit A (mcrA) gene, a phylogenetic marker found in all known methanogenic archaea. The 62,199 reads that remained following quality control had a median length of 216 bp and were evenly distributed among the three samples (YBM: 20,303 reads; YBB: 18,970 reads; BB: 22,926 reads). A maximum likelihood phylogeny of the reads grouped into OTUs defined at a 95% sequence similarity cutoff revealed large differences in community composition and structure (Figure 4.1). A single OTU in the Methanomicrobiales order dominated both sites in Young’s Bay, while the Baker Bay site harbored a relatively large fraction of Methanosarcina OTUs (49% of all reads in the sample), which were rather evenly distributed across the genus (Figure 4.1). Of the OTUs falling into Methanosarcinales clades with taxonomically classified strains, OTUs in the Lacustris, Barkeri, and Thermophila genera dominated, with variable relative abundances among sites. We obtained isolates on each of our 18 sample-media-substrate combinations used in our isolation scheme for a total of 128 colony-purified isolates (Figure C.1). Many of our isolates possessed mcrA alleles that showed close relationships to OTUs from the 454 pyrosequencing amplicons, although culturing biases appear to have change the relative abundances of strains that were successfully isolated (Figure 4.1; Figure C.2). Over 40% of these isolates (56 of 128 isolates) possessed nearly identical mcrA sequences and fell within the Methanosarcina mazei clade (Figure C.2), although M. mazei appeared to have a low abundance in the mcrA amplicon dataset. Most of these M. mazei isolates originated from the Young’s Bay samples (YBM: 26, YBB: 26, BB: 4). We focused on this large set of closely related isolates to resolve the distribution of genomic variation in this heterogeneous environment.

105 WGA phylogeny indicates two coexisting clades Illumina paired-end sequencing and de novo genome assembly of all 56 Methanosarcina mazei isolates produced draft assemblies with median values for the number of scaffolds, N50, and maximum scaffold length being 167 scaffolds, 43.7 kb, and 155 kb, respectively (Table 4.1, Table C.2). Genome length and number of CDS called by RAST were very consistent among isolates, with a median genome length and number of CDS of 4.08 Mb and 3941, respectively (3). A whole genome alignment (WGA) of these draft assemblies with the seven existing closed M. mazei genomes resulted in 1331 ‘core’ (found in all taxa) local collinear blocks (LCBs) ≥500 bp in length, representing 71% of the median genome length (2.89 of 4.08 Mb). This highly fragmented WGA appeared to be partially a result of loss of synteny within the M. mazei clade. Large rearrangements and inversions were observed in a WGA of the closed M. mazei reference genomes (Figure C.3A). A dendrogram of double cut and join distance values (a measure of synteny) showed that loss of synteny increased with genetic distance, and also M. mazei TMA was relatively highly divergent in genome organization (Figure C.3B). A maximum likelihood phylogeny inferred from all core LCBs extracted from the WGA of all M. mazei isolates and reference strains revealed that the isolates were distributed between two major, well-resolved clades, with the first harboring the reference strains M. mazei WWM610 and M. mazei C16 (‘mazei-WC’ clade), and the second containing M. mazei TMA (‘mazei-T’ clade) (Figure 4.2). M. mazei WWM610, M. mazei C16, and M. mazei TMA were originally isolated from Wisconsin (USA), the North Sea, and Japan, respectively (2,7). Each clade possessed an average sequence identity of ~99.71%, with ~99.4% between clades. Within both of these clades, many isolates possessed nearly clonal core genomes, with on the order of tens to hundreds of SNPs differentiating these strains. These two major clades were each largely composed of a comparable number of isolates from both Young’s Bay samples, which are only separated by ~4 km (Table C.1). Mapping isolation metadata (originating sample, medium, and substrate) on to the phylogeny revealed that strains in mazei-WC were primarily isolated on acetate, while no specific bias for any substrate was observed among the mazei-T isolates. In order to test for niche conservatism in regards to growth

106 on either medium or substrate, we calculated the phylogenetic D statistic, which is a measure of whether the phylogenetic distribution of a trait significantly departs from a random distribution or a pattern of clumping expected under a Brownian evolution model (22). D values of 1 indicate over-dispersion of trait distributions, while values of <0 imply significant phylogenetic conservatism. We found evidence of niche conservatism when defining the niche by isolation media (D < -0.12), but not by substrate (D > 0.44).

Gene flow between clades Of the putative orthologous genes present among the 52 M. mazei isolates (BB isolates excluded), 2145 were found in a single copy in all genomes (‘core’) and 4110 varied in copy number and/or were absent in some isolates (‘variable’). To explore the evolutionary impact of horizontal gene transfer within and between the clades, we employed a tree reconciliation approach with Ranger-DTL (4), which infers recombination by determining incongruences between the ‘species’ tree (the WGA phylogeny) and a gene tree (inferred for each core gene). We found that poorly supported nodes in either the species or gene trees, as often occur among highly similar sequences, greatly inflated the number of inferred gene transfers. To mitigate this artifact, we used a clone-corrected species tree, where all approximately clonal taxa were collapsed to one representative taxon. In addition, we performed tree-reconciliation only on core gene trees with significant bootstrap support (>50). We found evidence of inter-clade transfer in 27 of these gene clusters (100%), indicating continued gene flow between the clades. Manual determinations of the incongruencies between these gene trees and the WGA phylogeny suggested that at least most of the inter-clade transfers had likely occurred.

Partitioned variation across the core genome To assess the level of genetic segregation (fixation) between mazei-T and mazei-

WC, we calculated FST values for all core genes and mapped them onto the reference genome M. mazei C16 (Figure 4.3). Genes showing a high degree of fixation (FST > 0.9) tended to co-locate in certain regions of the genome (Figure 4.3 – red line plot). A high degree of fixation associated with high genetic divergence is indicative of loci that may be under divergent selection. When mapping the average divergence of genes onto

107 the genome, we found one region (~0.03-0.08 Mb on M. mazei C16) in which many divergent genes were co-located with genes having high FST values (Figure 4.3). In fact, many of the same genes in this region were both divergent and (nearly) fixed. Genes with high fixation in this region were annotated as S layer and outer membrane proteins, a ribonuclease, a phosphomannomutase, four subunits of the molybdenum-containing formylmethanofuran dehydrogenase (FmdA, FmdB, FmdD, and FmdE), a molybdate transporter ATP-binding protein, and multiple hypothetical proteins. To test for evidence of diversifying selection between clades, we calculated dN/dS for each core gene and mapped on all genes with a dN/dS value >1.1 that we did not find to be a false signal caused by artificial frameshifts in the alignment (Figure 4.3 – purple bars). Only 4 of these genes also had high FST values (>0.50), and each of these had hypothetical annotations (Table 4.2).

Clade-specific gene content We assessed the distribution of variable genes across the strains we isolated. A maximum parsimony tree based on the presence and absence of particular genes within a genome is congruent with the core gene phylogeny (Figure 4.2). Except for the highly similar strain clusters, the majority of branch length in this phylogeny is on the terminal branches indicated a high level of strain specific gene content. While the absence of particular genes from a subset of strains may be an artifact of draft assemblies, tests of our assembly pipeline on reference M. mazei genomes indicated that only a small proportion (< 2%) of coding sequences (CDS) were likely missing from any particular isolate genome (see 4.3 Methods). Still, to account for this possibility and to mitigate errors caused by artificial gene loss, we defined genes specific to the mazei-WC or mazei-T clade as those found in the majority of strains in one clade but absent from the other. We compared the number of genes in this category to the variable genes that appeared specific to a sampling location, isolation media or substrate. None or very few (≤ 2) genes were specific to location, media, or substrate that were not also specific to a clade, suggesting that evolutionary history rather than location or these potential ecological differences contribute most to evolution of the variable genome.

108 Using this cutoff, we found 60 and 192 gene clusters specific to the mazei-WC and mazei-T clades, respectively (Table 4.3). Of these genes, 75% (45 genes) and 72% (139 genes) had ambiguous ‘hypothetical’ annotations in the mazei-WC and mazei-T clades, respectively. We found that the two clades showed a very uneven distribution of variable genes, with mazei-T possessing ~2.2 times more than mazei-WC (Figure 4.2). Mapping genes specific to mazei-WC and mazei-T onto the reference genomes in each clade revealed that many of these genes co-located within variable regions throughout the genome (Figure 4.4). To test for association between these regions and signatures of mobile genetic elements, we used IslandViewer to identify putative genomic islands based on multiple sequence composition criteria (33). For genes specific to mazei-WC, we found one instance of co-location of multiple clade-specific genes and a putative genomic island. Eight such regions were observed for mazei-T. Most of these regions were flanked by one or a combination of mobile element proteins, transposases, integrases, and tRNAs, suggesting they were introduced by mobile genetic elements. Clade-specific genes may have been acquired from closely related taxa or more divergent taxa as have often been observed in many microbial strains (13,41). We used BLASTp to determine the best match of each clade-specific gene in NCBI’s non- redundant protein database. Hits to the Bacteria domain comprised 75% and 67% of the best hits for genes specific to mazei-WC and mazei-T, respectively. More specifically, taxa in the Proteobacteria and Firmicutes phyla composed at total of ~40% of the best hits in each clade, with many hits to members of the Deltaprotoebacteria, Clostridia and Bacilli.

Clade-specific defense systems Among the genes specific to each clade were CRISPR-associated genes and restriction-modification (RM) genes (Table 4.3). Both are associated with systems known to confer cellular defense against foreign genetic elements and can also modulate gene transfer (5,11,37). Therefore, clade-specific gene complements for either system may have created inter-clade barriers to gene flow or clade-biased gene flow from gene pools external to the population.

109 The CRISPR-associated genes included Cas1, Cas2, Cas3, Cas6, Csc1, and a CRISPR locus-related DNA binding protein. The CRISPR locus-related DNA binding protein was divergent from any CAS genes in NCBI’s non-redundant protein database, with limited similarity to Csa3 in Halococcus sp. (BLASTp e-value of 1e-31). These genes were all co-located and in the same arrangement in 18 mazei-T isolates. One of the isolates was missing Csc1 and Cas3, which did not appear to be a draft genome artifact, since the remaining 5 CAS genes were >10 kb from the end of a contig. The clade- specificity of these CAS genes appeared to be a result of differences in CRISPR subsystems between the clades (Figure 4.5). Specifically, Subtype III-B was found in the majority of mazei-WC members but only 2 members of mazei-T, while Subtype I-D was solely present in members of the mazei-T clade. Additionally, Subtype C, a putative new subtype of Type III CRISPR systems, was found in all members of the mazei-T clade except for the most basal members of the clade. These basal members instead possessed Subtype III-A and or III-B CRISPR systems. We assessed whether the spacer content in the repeat spacer arrays of these strains corresponded to the difference in CRISPR subsystems between clades. We quantified the number of unique CRISPR spacers shared between isolates grouped by clade, origin, isolation media or substrate, or CRISPR subtype (Table C.3). Of the 2240 unique spacer sequences, 3.4% and 4.2% were specific to the mazei-WC and mazei-T clades, respectively. No spacers were specific to YBB isolates. We found 4.1% to be specific to YBM isolates; however, these spacers were also specific to the mazei-T clade and all fell into the nearly clonal clade of 15 YBM isolates. No spacers were specific to isolates from a specific location or isolates grown on a specific medium or substrate. In contrast, 16% of spacers were specific to a subtype. In summary, spacer content appeared largely dictated by CRISPR subtypes and their partial segregation between clades. RM genes were highly prevalent among the isolates, with 88 gene clusters annotated as RM genes, and all were single copy. Additional partitioning of defense against mobile genetic elements each clade contained a different suite of restriction modification systems. The Type I RM system genes specific to mazei-WC were adjacent when mapped to the reference genomes and located adjacently to a putative genomic

110 island. The Type I, subunit R gene specific to mazei-T was adjoining to S and M subunits, and the Type III, subunit R gene was adjacent to an M subunit.

A putative clade-specific metabolic gene cassette A set of genes specific to mazei-WC were annotated as CoB-CoM heterodisulfide reductase subunits A & C (HdrAC), formate dehydrogenase subunits A & B (FdhAB), and a methylviologen-reducing hydrogenase (MvhD). These genes along with hdrB were immediately adjacent and always found in a conserved order (fdhAB, mvhD, hdrABC). Gene overlap was found between fdhA and fdhB (10 bp) along with mvhD and hdrA (1 bp), while mvhD and fdhB were separated by 2 bp and gaps of 170 bp and 136 bp existed between the hdr genes (Figure 4.6A). These data suggest the gene set is monocystronic. This putative operon was found in all members of mazei-WC including M. mazei C16 but excluding M. mazei WWM610 and the 2 most closely related isolates to M. mazei WWM610, suggesting a recent loss in this subclade. Additionally, this gene set was also found in a sparsely distributed set of Methanosarcina strains, namely M. horonobensis HB1, M. naples 100, and M. calensis Cali. This sparse distribution among the Methanosarcina spp. indicates that either this putative operon was in the last common ancestor of the genus and lost in many separate events or was gained in separate events in multiple clades throughout the genus. Maximum likelihood phylogenies of each of these 6 genes were either congruent with a concatenated core gene phylogeny, or held low bootstrap support for nodes displaying incongruences. This finding supported the hypothesis that the gene cluster was present in the last common ancestor of Methanosarcina and lost in separate events among most members of the genus. All 52 M. mazei isolates possessed both of the Hdr paralogs found in other Methanosarcina spp. (Figure 4.6B). Unrooted maximum likelihood phylogenies of each Hdr homolog showed that the HdrABC adjacent to FdhAB and MvhD appeared to be a third paralog (HdrA3B3C3) (Figure C.4; Figure 4.6). In addition, the genes were distant paralogs that seemed to have diverged prior to the inception of the Methanosarcina genus. Although divergent, the 4Fe-4S, FAD, and zinc-binding domains of genes found in biochemically-characterized HdrABC genes appeared to be conserved in HdrA3B3C3.

111 We found the MvhD adjacent to HdrA3B3C3 to be highly divergent from the characterized delta subunit of methylviologen-reducing hydrogenase in M. maripaludis, with only 40% sequence identity. However, all 4 cysteines were conserved in the 2Fe-2S binding domain. FdhB was homologous to FdhB1 in M. maripaludis (BLASTp evalue of 2e-101) although divergent with 43% sequence identity. The FdhB homologs in Methanosarcinales form a distinct clade from any methanogen clade lacking cytochromes (Figure C.5). In contrast to FdhB, FdhA does not appear to be homologous to any characterized FdhA and had no homology to any coding sequences outside of the Methanosarcinales.

Differences in substrate utilization rates between clades We tested for the possibility of niche differentiation between the mazei-WC and mazei-T clades by directly comparing methane production rates of isolates from each clade when growing on acetate, methanol, or trimethylamine. Optical density could not be used to directly assess growth rates because the cultures were polymorphic and often formed macroscopic aggregates even in the high salt medium (0.4 M NaCl). In an attempt to control for growth biases introduced during isolation, we performed direct comparisons between isolates from each clade that, when possible, were obtained from the same location and isolated on the same medium and substrate (Figure 4.7). We found no significant differences in maximum methane production rates between the mazei-WC and mazei-T clades when isolates were grown on acetate or methanol (Table C.4). However, the mazei-WC and mazei-T isolates showed significantly different maximum methane production rates on trimethylamine (Table C.4), with mazei-WC isolates generally showing substantially higher rates than mazei-T isolates.

4.5 Discussion Variation within the methanogen community The mcrA 454-pyrosequencing amplicon dataset showed that methanogen community composition was highly varied across the region. The single dominant Methanomicrobiales OTU observed in both Young’s Bay samples may be a result of

112 lower pH at these sites relative to BB (Figure 4.1; Table C.1). Methanomicrobiales taxa have been shown to dominate in other acid sediments and soils (12,39). Our isolation strategy targeted Methanosarcinales diversity. While the strains isolated through this strategy do not represent the methanogen community based on the mcrA amplicon dataset, the differences in types of Methanosarcina isolated from each sample may be ecologically significant. The low number of M. mazei isolates obtained from Baker Bay may be a result of the high abundance of other Methanosarcina clades in that site (Figure 4.1). Specifically, M. mazei may have been outcompeted by these other clades in Baker Bay. Geochemical and methanogen composition data show that BB is much different than either YBM or YBB (Table C.1). The existence of two closely related, but distinct clades of M. mazei in both YBM and YBB suggested that consistent niche partitioning was maintaining diversity of M. mazei in each of the two environments. The WGA phylogeny showed that the isolate population also consisted of three reference strains isolated from various locations across the globe, suggesting that the initial bifurcation of mazei-T and mazei-WC in addition to the division between the subclades containing M. mazei WWM610 and M. mazei C16 likely occurred before colonization of BB, YBM, and YBB. Additionally, the very low number of variable genes specific to isolates from a particular sampling location indicated that local geographic barriers did not substantially contribute to evolution of the variable genome. In contrast, the observed signal of trait conservatism in regards to growth on freshwater or high salt media did suggest niche partitioning within each clade – likely in regards to salinity tolerance. We hypothesize that the maintenance of diversity was likely due to ecological differentiation of the mazei-WC and mazei-T clades, but these ecological barriers have not lead to complete species barriers, given our evidence of continued gene flow between mazei-T and mazei-WC. Divergent selection for adaptive loci in a background of recombination can create heterogeneous genomic landscapes, where genetic differentiation accumulates in some regions, while the homogenizing effects of gene flow limit divergence in other regions (40). The high prevalence of gene fixation in only certain regions of the core genome supports this scenario (Figure 4.3). Consistent with the maintenance of diversity through ecological differentiation, we did observe a clade-

113 specific difference in regards to methane production rates during growth on TMA (Figure 4.7), which shows that these clades are phenotypically distinct. By investigating the suite of variable genes and core alleles that are fixed between the two clades, we have identified some possible sources of ecological differentiation the mazei-WC and mazei-T clades. These potential drivers are described in the following.

Genomic signatures of a potentially adaptive gene cassette We identified a gene cassette putatively encoding HdrA3B3C3 with FdhAB and MvhD among the mazei-WC isolates and certain type strains (Figure 4.6). The hdrABC paralog along with mvhD and fdhAB were highly conserved within the mazei-WC clade and appeared to be monocystronic, suggesting that the gene set was functional and formed a complex. Methanococcus maripaludis has been experimentally shown to grow on formate in a pathway coupling formate oxidation to the reduction of CO2 to formyl- MFR by additionally coupling the reaction to the reduction of the CoM-CoB heterodisulfide through flavin-based electron bifurcation (15). The pathway involves FdhAB for formate oxidation and a methylviologen-reducing hydrogenase acting as the adaptor subunit for electron bifurcation by HdrABC. We hypothesize that the gene cassette found in out isolates either functions in the same manner to confer growth on formate, or the pathway may primarily run in reverse to produce format as a precursor for purine biosynthesis (55). This metabolic difference between mazei-T and mazei-WC may have played a role in maintaining genomic diversity in this population. As additional support for this hypothesized pathway, four subunits of the molybdenum-containing formylmethanofuran dehydrogenase (Fmd) were fixed or nearly fixed between mazei-T and mazei-WC. Fmd is one of two isozymes that are known to catalyze the reduction of CO2 to formyl-MFR (46). The Fwd operon was also present among all M. mazei strains, but each gene in the operon had low FST values (<0.37). Polymorphisms in FmdC and FmdF, the other genes in the FmdEFACDB operon, were both also (nearly) fixed, but these genes were absent from a strain in mazei-T, and therefore had not been included in our analysis of core gene fixation. While polymorphisms in the Fmd operon were (nearly) fixed between clades, both synonymous and nonsynonymous genes were (nearly) fixed, leading to low dN/dS values (< 0.3). This

114 signal of weak purifying selection on the Fmd operon may be due to changing selective pressures (i.e., initially positive but changed to weak purifying) from when the gene cassette was initially gained in mazei-WC or lost in mazei-T. Interestingly, multiple genes involved in molybdenum and tungstate transport along with molybdopterin synthesis were also divergent and nearly fixed between clades. Two of these genes were located ~1 Mb downstream of the Fmd operon in M. mazei C16, suggesting that the fixation was not due to linkage. Given that Fmd requires either molybdenum or tungstate to function (47), the introduction or loss of the mazei-WC clade-specific cassette may have altered cellular requirements for molybdopterin. While multiple lines of evidence support our hypothesized role of the putative fdhAB-mvhD-hdrABC cassette in formate catabolism or anabolism, this hypothesis is tenuous for a number of reasons. First, the genes encoded on the mazei-WC clade- specific cassette were divergent from known, characterized homologs, indicating that function may not be conserved between homologs. This is especially true for the putative fdhA, which did not showed homology to any characterized homologs, unlike the other five genes in the putative cassette. Second, The fdhB may actually encode an F420- reducing hydrogenases (FrhB), given that both of these subunits contain prosthetic groups for F420, FAD, and a [4Fe4S] cluster (26). Third, the formate transporter (FdhC) does not appear to be present in any M. mazei strain (57). Fourth, no Methanosarcina strain has been shown to grow on formate, including all of the reference Methanosarcina strains that we found to contain the fdhAB-mvhD-hdrABC cassette. Finally, neither FdhAB paralog in Methanococcus maripaludis appears to function in the production of formate for purine biosynthesis (57). A potential alternative role of HdrABC in this cassette may be as a ferredoxin:CoB-S-S-CoM heterodisulfide oxidoreductase as proposed for HdrA1B1C1 in Methanosarcina acetivorans C2A – specifically during methylotrophic growth (10). As a subfunction to HdrA1B1C1, HdrA3B3C3 may be expressed specificity during growth on TMA, which could explain the observed growth differences on TMA between the clades (Figure 4.7). In this scenario, MvhD may still act as an adaptor subunit between Hdr and ferredoxin. However, the functional role of FdhAB is not clear, given that formate would

115 not be used in this pathway, and Fpo would likely be the acting F420-reducing hydrogenase.

Influence of mobile genetic elements The majority of variable genes in this dataset are strain specific indicating their rapid movement through these genomes. Numerous genes specific to mazei-T or mazei- WC co-localized with identified genomic islands, which suggested that they were introduced via mobile elements (Figure 4.4). This process appeared to occur asymmetrically between clades, with the mazei-T clade possessing a much larger number of putative genomic islands (Figure 4.4). This large discrepancy between clades may also explain the fact that members of mazei-T contained more total genes than members of mazei-WC. Higher rates of gene loss in mazei-WC could also create the imbalance. However, the hypothesis of a higher rate of gene gain in mazei-T was supported by the fact that M. mazei TMA has 191 more coding sequences (CDS) than any M. mazei strain falling outside of the mazei-WC and mazei-T clades. The prevalence, diversity, and distribution of defense systems in these strains are consistent with mobile elements being active in these populations. Interestingly, the mazei-T and mazei-WC clades differed in genes associated with both RM systems and CRISPR systems (Table 4.3). The clade specificity of both innate and adaptive immune systems against foreign DNA suggests that the maintenance of diversity within the YBB and YBM environments may at least partially have resulted from differences in susceptibility to viral pathogens or other genetic mobile elements. The correspondence between specific CRISPR spacers and the presence or absence of a particular CRISPR- CAS system supports this conclusion. Integrated proviruses have been observed in other Methanosarcina strains (31); yet, no viruses targeting Methanosarcina have been isolated, and little is known of host-virus interactions in the genus. The identified genomic islands in both clades had hallmarks of integrated viral or plasmid DNA (e.g., flanking mobile element proteins, integrases, and tRNAs), and thus further supported the hypothesis that mobile elements have mediated genomic evolution in this population. In addition to acting as cellular defense mechanisms against foreign DNA, the RM and CRISPR systems specific to each population may have been mediating inter-

116 clade gene flow and thereby maintained the differentiation among coexisting strains (11). The RM systems in these genomes may have also been major instigators of genome rearrangements (29). We suggest that in addition to maintaining genomic diversity the high prevalence of RM systems may explain the rapid erosion of synteny reported here (Figure C.3) and across the Methanosarcina genus (36).

4.6 Conclusions Through a combination of culture-based and culture-independent techniques, we demonstrate significant genetic variation within a Methanosarcina mazei population. In addition, we were able to assess the likelihood of various processes that maintained this genetic and phenotypic variation. Through understanding the genetic basis for ecological differentiation in our focal population, we move closer to a conceptualization of how genomic diversity interacts with environmental heterogeneity to dictate methanogen community composition across spatial scales.

4.7 Acknowledgements We would like to thank everyone involved in the Genomes to Life: Biological Systems Research on the Role of Microbial Communities in Carbon Cycling project, especially James Henriksen, Matt Benedict, Judy Luke, and Sarah Reinhart. We would also like to that the other members of Dr. William Metacalf’s lab that provided assistance with this work. This material is based on work supported in part by the DOE under grant No. DE-SC0005348.

117 4.8 References 1. Angiuoli SV, Salzberg SL. Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics. 2011; 27: 334–342.

2. Asakawa S, Akagawa-Matsushita M, Morii H, Koga Y, Hayano K. Characterization of Methanosarcina mazei TMA isolated from a paddy field soil. Curr Microbiol. 1995; 31: 34–38.

3. Aziz RK, Bartels D, Best AA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008; 9: 75.

4. Bansal MS, Alm EJ, Kellis M. Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss. Bioinformatics. 2012; 28: i283– i291.

5. Barrangou R, Fremaux C, Deveau H, et al. CRISPR Provides Acquired Resistance Against Viruses in Prokaryotes. Science. 2007; 315: 1709–1712.

6. Benedict M, Henriksen J, Metcalf W, Whitaker R, Price N. ITEP: an integrated toolkit for exploration of pan-genomes. submitted.

7. Blotevogel KH, Fischer U, Lüpkes KH. Methanococcus frisius sp.nov., a new methylotrophic marine methanogen. Can J Microbiol. 1986; 32: 127–131.

8. Boetzer M, Pirovano W. Toward almost closed genomes with GapFiller. Genome Biol. 2012; 13: R56.

9. Boucher Y, Cordero OX, Takemura A, et al. Local Mobile Gene Pools Rapidly Cross Species Boundaries To Create Endemicity within Global Vibrio cholerae Populations. mBio. 2011; 2: e00335–10.

10. Buan NR, Metcalf WW. Methanogenesis by Methanosarcina acetivorans involves two structurally and functionally distinct classes of heterodisulfide reductase. Mol Microbiol. 2010; 75: 843–853.

11. Budroni S, Siena E, Hotopp JCD, et al. Neisseria meningitidis is structured in clades associated with restriction modification systems that modulate homologous recombination. Proc Natl Acad Sci. 2011; 108: 4494–4499.

12. Cadillo-Quiroz H, Yashiro E, Yavitt JB, Zinder SH. Characterization of the Archaeal Community in a Minerotrophic Fen and Terminal Restriction Fragment Length Polymorphism-Directed Isolation of a Novel Hydrogenotrophic Methanogen. Appl Environ Microbiol. 2008; 74: 2059–2068.

13. Cohan FM, Koeppel AF. The Origins of Ecological Diversity in Prokaryotes. Curr Biol. 2008; 18: R1024–R1034.

118 14. Coleman ML, Chisholm SW. Ecosystem-specific selection pressures revealed through comparative population genomics. Proc Natl Acad Sci. 2010; 107: 18634– 18639.

15. Costa KC, Lie TJ, Xia Q, Leigh JA. VhuD Facilitates Electron Flow from H2 or Formate to Heterodisulfide Reductase in Methanococcus maripaludis. J Bacteriol. 2013; 195: 5160–5165.

16. Denef VJ, Kalnejais LH, Mueller RS, et al. Proteogenomic basis for ecological divergence of closely related bacteria in natural acidophilic microbial communities. Proc Natl Acad Sci. 2010; 107: 2383–2390.

17. Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics. 2011; 27: 2194–2200.

18. Eggen RIL, Geerling ACM, Groot PWJD, Ludwig W, De Vos WM. Methanogenic Bacterium Gö1: An Acetoclastic Methanogen that is Closely Related to Methanosarcina frisia. Syst Appl Microbiol. 1992; 15: 582–586.

19. Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002; 30: 1575–1584.

20. Excoffier L, Lischer HEL. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour. 2010; 10: 564–567.

21. Ferry JG. Methanogenesis: ecology, physiology, biochemistry & genetics. New York: Chapman & Hall; 1993.

22. Fritz SA, Purvis A. Selectividad en el Riesgo de Extinción y Tipos de Amenaza en Mamíferos: una Nueva Medida de la Intensidad de la Señal Filogenética en Atributos Binarios. Conserv Biol. 2010; 24: 1042–1051.

23. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next- generation sequencing data. Bioinforma Oxf Engl. 2012; 28: 3150–3152.

24. Holden MTG, Feil EJ, Lindsay JA, et al. Complete genomes of two clinical Staphylococcus aureus strains: Evidence for the rapid evolution of virulence and drug resistance. Proc Natl Acad Sci U S A. 2004; 101: 9786–9791.

25. Kandler O, Hippe H. Lack of peptidoglycan in the cell walls of Methanosarcina barkeri. Arch Microbiol. 1977; 113: 57–60.

26. Kaster A-K, Goenrich M, Seedorf H, et al. More Than 200 Genes Required for Methane Formation from H2 and CO2 and Energy Conservation Are Present in Methanothermobacter marburgensis and Methanothermobacter thermautotrophicus. Archaea. 2011; 2011: 1–23.

119 27. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013; 30: 772–780.

28. Kendall MM, Boone DR. The order methanosarcinales. The prokaryotes. 2006; 3: 244–256.

29. Kobayashi I. Behavior of restriction–modification systems as selfish mobile elements and their impact on genome evolution. Nucleic Acids Res. 2001; 29: 3742–3756.

30. Korber B. HIV signature and sequence variation analysis. Comput Anal HIV Mol Seq. 2000; 4: 55–72.

31. Krupovic M, Forterre P, Bamford DH. Comparative analysis of the mosaic genomes of tailed archaeal viruses and proviruses suggests common themes for virion architecture and assembly with tailed viruses of bacteria. J Mol Biol. 2010; 397: 144–160.

32. Krzywinski M, Schein J, Birol İ, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009; 19: 1639–1645.

33. Langille MGI, Brinkman FSL. IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics. 2009; 25: 664–665.

34. Letunic I, Bork P. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 2011; 39: W475–W478.

35. Luton PE, Wayne JM, Sharp RJ, Riley PW. The mcrA gene as an alternative to 16S rRNA in the phylogenetic analysis of methanogen populations in landfill. Microbiology. 2002; 148: 3521–3530.

36. Maeder DL, Anderson I, Brettin TS, et al. The Methanosarcina barkeri Genome: Comparative Analysis with Methanosarcina acetivorans and Methanosarcina mazei Reveals Extensive Rearrangement within Methanosarcinal Genomes. J Bacteriol. 2006; 188: 7922–7931.

37. Marraffini LA, Sontheimer EJ. CRISPR Interference Limits Horizontal Gene Transfer in Staphylococci by Targeting DNA. Sci N Y NY. 2008; 322: 1843–1845.

38. Metcalf WW, Zhang JK, Shi X, Wolfe RS. Molecular, genetic, and biochemical characterization of the serC gene of Methanosarcina barkeri Fusaro. J Bacteriol. 1996; 178: 5797–5802.

39. Milferstedt K, Youngblut N, Whitaker R. Spatial structure and persistence of methanogen populations in humic bog lakes. ISME. 2010; 4: 764–776.

120 40. Nosil P, Funk DJ, Ortiz-Barrientos D. Divergent selection and heterogeneous genomic divergence. Mol Ecol. 2009; 18: 375–402.

41. Ochman H, Lawrence JG, Groisman EA. Lateral gene transfer and the nature of bacterial innovation. Nature. 2000; 405: 299–304.

42. Peng Y, Leung HC, Yiu S-M, Chin FY. IDBA-UD: a de novo assembler for single- cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012; 28: 1420–1428.

43. R Development Core Team. R: A Language and Environment for Statistical Computing [Internet]. Vienna, Austria: 2010. Available from: http://www.R- project.org.

44. Ronen R, Boucher C, Chitsaz H, Pevzner P. SEQuel: improving the accuracy of genome assemblies. Bioinformatics. 2012; 28: i188–i196.

45. Schloss PD, Westcott SL, Ryabin T, et al. Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Appl Environ Microbiol. 2009; 75: 7537– 7541.

46. Schmitz RA, Albracht SPJ, Thauer RK. A molybdenum and a tungsten isoenzyme of formylmethanofuran dehydrogenase in the thermophilic archaeon Methanobacterium wolfei. Eur J Biochem. 1992; 209: 1013–1018.

47. Schmitz RA, Albracht SPJ, Thauer RK. Properties of the tungsten-substituted molybdenum formylmethanofuran dehydrogenase from Methanobacterium wolfei. FEBS Lett. 1992; 309: 78–81.

48. Shea PR, Beres SB, Flores AR, et al. Distinct signatures of diversifying selection revealed by genome analysis of respiratory tract and invasive bacterial populations. Proc Natl Acad Sci. 2011; 108: 5039–5044.

49. Sowers KR, Baron SF, Ferry JG. Methanosarcina acetivorans sp. nov., an Acetotrophic Methane-Producing Bacterium Isolated from Marine Sediments. Appl Environ Microbiol. 1984; 47: 971–978.

50. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinforma Oxf Engl. 2006; 22: 2688–2690.

51. Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006; 34: W609–W612.

121 52. Thauer RK, Kaster A-K, Seedorf H, Buckel W, Hedderich R. Methanogenic archaea: ecologically relevant differences in energy conservation. Nat Rev Microbiol. 2008; 6: 579–591.

53. Thauer RK. Biochemistry of methanogenesis: a tribute to Marjory Stephenson:1998 Marjory Stephenson Prize Lecture. Microbiology. 1998; 144: 2377–2406.

54. Tritt A, Eisen JA, Facciotti MT, Darling AE. An Integrated Pipeline for de Novo Assembly of Microbial Genomes. PLoS ONE. 2012; 7: e42304.

55. White RH. Purine biosynthesis in the domain Archaea without folates or modified folates. J Bacteriol. 1997; 179: 3374–3377.

56. Wickham H. ggplot2: elegant graphics for data analysis. Springer New York; 2009. 210 p.

57. Wood GE, Haydock AK, Leigh JA. Function and Regulation of the Formate Dehydrogenase Genes of the Methanogenic Archaeon Methanococcus maripaludis. J Bacteriol. 2003; 185: 2548–2554.

58. Zhang Q, Lambert G, Liao D, et al. Acceleration of emergence of bacterial antibiotic resistance in connected microenvironments. Science. 2011; 333: 1764– 1767.

122 4.9 Tables and figures

Figure 4.1: The phylogenetic tree is a maximum likelihood inference (GTR-Γ model) of full-length mcrA alleles from all isolates and type strains. mcrA fragment alleles obtained by amplicon 454-pryosequencing of each sample were inserted into the tree with the RAxML EPA algorithm. Leaf labels are colored by dataset. The number of isolates and pyrosequencing reads obtained at each location are depicted as bars adjacent to each leaf label. The 454-pyrosequencing alleles were hierarchically clustered at a cutoff of 95% sequence identity and only alleles detected at least 20 times are shown.

123 M. mazei SarPi M. mazei LYC M. mazei Go1 M. mazei S.6 YBB.H.A.2.6 2 4/11 YBB.H.M.2.7 5 42/74 YBB.H.A.2.8 20 56 YBB.H.A.2.1 21/124 0/397 YBM.H.A.2.6 47 69 0/188 M. mazei WWM610 YBM.H.A.1A.4 4 40/60 77 1/306 YBM.H.A.2.1 16 YBM.H.A.0.1 25 mazei-WC 5/94 9/672 YBM.F.A.1B.4 4 43/62 YBM.F.A.1A.3 3 YBM.F.A.1B.3 6 1/10 M. mazei C16 YBB.H.A.2.5 7 0/8 YBB.H.A.1A.2 1 0/14 YBB.H.T.1A.1 3 37/58 YBB.H.T.1A.2 2 3/193 BB.F.T.2.6 37 YBB.F.A.2.12 10 0/11 1/15 YBB.F.A.2.7 1 5/113 YBB.F.A.2.5 2 30/69 YBB.F.A.1A.3 1 YBM.F.M.0.5 2 0/15 YBB.F.A.2.3 7 0/14 YBB.F.A.2.6 4 0/11 YBB.F.T.2.1 36 25/100 YBB.H.A.1A.1 112 2/375 YBB.F.A.1B.1 39 94 5/412 93 YBM.F.A.2.8 22 2/267 94 YBB.F.A.1A.1 153 BB.F.A.2.4 13 118/137 mazei-T YBM.H.M.2.1 6 11/1487 M. mazei TMA YBM.H.M.0.1 30 96 YBB.H.M.1B.1 20 4/34 6/182 YBB.H.M.1B.5 7 0.002 substitutions per site YBB.H.M.1B.2 2 0 44/141 1/867 YBB.F.T.1A.2 4 0 300 changes YBB.F.T.1A.4 5 40/53 YBB.F.T.1A.1 2 YBB.H.A.2.4 70 14/113 7/657 YBB.H.M.1A.1 29 BB.F.T.0.2 41 YBM.H.M.2.2 2 0/2 YBM.H.M.1A.2 0 0/9 YBM.H.T.2.3 2 0/5 0/390 Young’s Bay, Mouth (YBM) YBM.H.M.2.3 3 0/90 YBM.H.T.2.1 1 Young’s Bay, Back (YBB) BB.F.A.2.3 8 0/33 Baker Bay (BB) YBM.H.A.1A.6 18 0/77 3/273 YBM.H.A.1A.3 7 0/25 22/210 Freshwater YBM.H.M.1A.3 23 0/75 High Salt YBM.H.M.2.4 14 1/38 YBM.H.T.2.5 2 Mean Sequence Identity Acetate YBM.H.A.2.7 61 0/61 (’core’ genome) 0/73 YBM.H.M.1A.1 0 7/225 Within mazei-WC 99.709% Methanol YBM.H.A.2.3 4 0/71 TMA YBM.H.A.2.8 5 1/10 Within mazei-T 99.711% YBM.H.A.1A.1 2 Between clades 99.361%

Figure 4.2: The tree on the left is a maximum likelihood inference (GTR-Γ model; 100 bootstrap replicates) from whole genome alignments for all Methanosarcina mazei isolates (N = 56) and type strains. Node labels are provided for nodes with bootstrap support <100. Isolation metadata is mapped onto the tree, with columns showing sample location, isolation media, and substrate used for isolation, respectively (see the legend). The tree on the right is a maximum parsimony inference of gene presence-absence (100 bootstrap replicates). Node labels indicate (genes specific to and found in all descendants) / (genes specific to and found in any descendants). Red node labels highlight nodes with a bootstrap support of <70. Bold leaf labels highlight isolates used for methane production assays (see Figure 4.7). ‘core’ genome refers to all LCBs containing all isolates and reference strains.

124

Figure 4.3: Genes and genomic regions of each M. mazei isolate mapped onto the genome of M. mazei C16. cdc6 locations in M. mazei C16 are marked along the chromosomal axis. The outer ring with numbered labels designates LCBs shared among M. mazei C16, M. mazei WWM610, M. mazei TMA (as in Figure 4.4). The second-most inner ring designates core and variable regions among all M. mazei isolates, with black indicating core regions, dark shades of blue indicating presence in most genomes, and light shades indicating presence in few genomes. The red and green line plot shows the density of core genes with an FST value >0.9 in a 10kb window (maximum density of 5) and a mean sequence identify of <97% (maximum density of 6), respectively. The scatterplots of red or green dots displays the Fst value for each core gene (value range of -0.1 to 1) and mean sequence identity values (value range of 73% to 100%), respectively. The purple bars are gene clusters that had a manually verified dN/dS value of >1.1.

125

C16! WWM610!

TMA!

Figure 4.4: mazei-WC and mazei-T clade-specific gene clusters mapped onto closed reference genomes: M. mazei C16 and M. mazei WWM610 for mazei-WC, and M. mazei TMA, for mazei-T. The outer ring with numbered labels designates LCBs shared among the three genomes. The second-most inner ring designates clade-specific core and variable regions. Black regions are found in all isolates and references in the mazei-WC or mazei-T clades (‘core’), while shades of blue indicate how many genomes in the clade contain that region, with darker shades indicating presence in more genomes. The red bars refer to clade-specific gene locations. The green bars highlight regions identified as potential genomic islands.

126 Methanosarcina mazei Sar Pi

mazei-WC

mazei-T 0.001 Substitutions per site

YBM I-A YBB I-B BB I-C Freshwater I-D High Salt I-E

Acetate III-A Methanol III-B TMA III-C*

Type I Type III

Figure 4.5: CRISPR subtypes present in each isolate are mapped onto the phylogeny from Figure 4.2. ‘*’ refers to a putative novel CRISPR system subtype.

127 A) fdhA fdhB mvhD hdrA hdrB hdrC

238725 244624

B) M. mazei SarPi M. mazei LYC M. mazei Go1 M. mazei S 6 YBB.H.A.2.6 YBB.H.A.2.8 YBB.H.M.2.7 YBB.H.A.2.1 YBM.H.A.2.6 WWM610 YBM.H.A.1A.4 YBM.H.A.2.1 YBM.H.A.0.1 YBM.F.A.1B.4 YBM.F.A.1A.3 YBM.F.A.1B.3 M. mazei C16 YBB.H.A.2.5 YBB.H.T.1A.1 YBB.H.T.1A.2 YBB.H.A.1A.2 BB.F.T.2.6 YBB.F.A.2.12 YBM.F.M.0.5 YBB.F.A.1A.3 YBB.F.A.2.6 YBB.F.A.2.7 YBB.F.A.2.3 YBB.F.T.2.1 YBB.F.T.2.1 YBB.H.A.1A.1 YBB.F.A.1B.1 YBM.F.A.2.8 YBB.F.A.1A.1 BB.F.A.2.4 YBM.H.M.2.1 M. mazei TMA YBM.H.M.0.1 YBB.H.M.1B.1 YBB.H.M.1B.5 YBB.H.M.1B.2 YBB.F.T.1A.2 YBB.F.T.1A.4 YBB.F.T.1A.1 YBB.H.A.2.4 YBB.H.M.1A.1 BB.F.T.0.2 YBM.H.M.2.2 YBM.H.T.2.3 YBM.H.M.1A.2 YBM.H.T.2.5 YBM.H.A.1A.6 YBM.H.M.1A.3 YBM.H.A.1A.3 YBM.H.T.2.1 1 copy YBM.H.M.2.3 YBM.H.A.1A.1 YBM.H.M.1A.1 YBM.H.A.2.7 BB.F.A.2.3 YBM.H.A.2.3 YBM.H.M.2.4 YBM.H.A.2.8 HdrE FdhB FdhA HdrD MvhD HdrB1 HdrB2 HdrB3 HdrC1 HdrC2 HdrC3 HdrA1 HdrA2 HdrA3 Figure 4.6: A) The putative gene cassette orientation and position in the M. mazei C16 genome. B) Gene copy number of Hdr, Fdh, and mvhD paralogs mapped onto a concatenated core gene phylogeny of all 52 M. mazei isolates and reference strains. Black box outlines highlight genes that were not called by RAST but were identified by TBLASTn.

128 Round 1 Round 2 Round 3 Round 4 0.08

0.06 Acetate

0.04

0.02

ling per hour 0.08 b

0.06 MeOH Clade TMA 0.04 WC

0.02 ate of methane dou r 0.08 um m 0.06 TMA Maxi

0.04

0.02 1 2 3 4 5 6 7 8 9 10 11 YBB.H.A.2.5 YBB.H.A.2.5 YBB.H.A.2.1 YBB.H.A.2.4 YBM.H.A.0.1 YBM.H.A.0.1 YBM.H.A.2.1 YBM.H.A.2.7 YBB.H.T.1A.1 YBB.H.T.1A.1 YBB.H.A.1A.2 YBB.H.A.1A.2 YBB.H.A.1A.1 YBM.H.A.1A.1 YBM.H.A.1A.1 YBB.H.M.1B.2 Figure 4.7: The boxplots show the distributions of maximum rate of methane production (concentration doubling per hour) for 3-4 replicates of each culture on each of the three substrates. Maximum methane production rates were calculated from the maximum slope of logistic curves fit onto time series of methane concentrations spanning >600 hours. Numbers refer to isolates in Figure 4.2 (bold labels). Rounds denote separate growth experiments directly comparing isolates from each clade while controlling for isolation source, media, and substrate.

129 :%;3,+,( *+,-'.(#/( "0%//#$1( =#&%$($'24&<( *+,-'.(#/( !"#$%&'(!) *+,-'.(#/("0%//#$1" 0#2&34" *56(78-9 $'24&<(78-9 7:-9 >)? >)?@A8- !"#$%$&$'&$( ')* '+' ,,$' ),* *$-+ (./) -$/0 !"#$%$&$'"$( '*+ '0/ */$. '.- *$-( (.0. -$/0 !"#$%$&$'"$* ',' './ *,$- '00 *$-+ (/-) -$/0 !"#$%$&$)$. )(. )0( (-$) '-. ($/, (.+. -$/, !"#$%$#$-$+ '0* '.- **$) '+' *$-, (/(* -$/, !"#$1$&$-$' '+, '.+ **$. ',) ($/, (.', -$/0 !"#$1$&$'&$' ',) '., (/$/ )0' *$-* (./. -$/0 !"#$1$&$'&$( )-+ )'* (0$, ''' *$-. (/(+ -$/0 !"#$1$&$'&$* ')( ',* 0,$/ *(( ($// (.+) -$/, !"#$1$&$'&$0 )(' )(, (0$- ,, *$-. (/0* -$/, !"#$1$&$)$' )/* ()- )+$+ ''0 *$-- (.0/ -$/, !"#$1$&$)$( ',' )-0 *)$/ ')+ *$-/ (/+( -$/, !"#$1$&$)$0 )-* )() (,$- '(0 *$-' (.., -$/, !"#$1$&$)$, (.( *0) *)$+ '(- *$)+ *--0 -$/* !"#$1$&$)$. ((, (.' ))$' 0/ *$-. (/(* -$/, !"#$1$#$-$' ',0 '/' *)$) )+' *$-0 (/,) -$/. !"#$1$#$'&$' ',+ '.* **$. '(- *$-/ (/+, -$/, !"#$1$#$'&$) '++ '.' +'$* ',, *$-. (/(+ -$/0 !"#$1$#$'&$( )+, ),- ),$0 /) *$-. (/** -$/, !"#$1$#$)$' )0( (', ('$) '-- *$'/ *-+* -$/, !"#$1$#$)$) '+* )-' **$* )', *$-. (/(* -$/0 !"#$1$#$)$( '*) '/. *,$( '00 ($/. (.*) -$/, !"#$1$#$)$* )0- )00 )0$/ /( *$-. (/+, -$/, !"#$1$2$)$' '+/ '/* *0$, '(' *$-. (/(/ -$/, !"#$1$2$)$( '*. ',( **$/ ',0 ($/, (.(/ -$/, !"#$1$2$)$+ '*) '.0 */$* '0) *$-/ (/+) -$/, ""$%$&$)$( './ )-- *'$' ')+ *$-. (/*) -$/, ""$%$&$)$* '*0 '0/ +($0 (-- *$)- *-0. -$/, ""$%$2$-$) '(0 ',. +-$0 ),, *$-/ (/*( -$/0 ""$%$2$)$0 )(0 )*+ ('$0 /* *$-0 (/'/ -$/0 !""$%$&$'&$' ),- )., (-$) ')' *$'' *-+( -$// !""$%$&$'&$( '*, )'- *0$+ '0( *$-, (/') -$/0 !""$%$&$'"$' '0, '/0 *($, '(( *$-. (/*+ -$/, !""$%$&$)$') )-/ )'0 (+$( ')* *$-. (/)+ -$/0 !""$%$&$)$( '.+ '/0 *'$( '(- *$-. (/() -$/0 !""$%$&$)$+ '*0 ',) *0$+ '+/ *$-, (/)/ -$/, !""$%$&$)$0 '0- )-( *'$, )') *$-* (.// -$/0 !""$%$&$)$, '*) ',' **$) ',( *$-( (.,- -$/0 !""$%$2$'&$' '+0 '.* +-$* )', *$'0 *-)) -$/, !""$%$2$'&$) '0( '.. *($. ''+ *$'0 *-)- -$/, !""$%$2$'&$* '(/ ',* +'$- (/' *$'0 *-). -$/, !""$%$2$)$' '0) '/. *)$/ ',- *$'' (//. -$/, !""$1$&$'&$' (*/ (.' )-$( 0, *$-' (/+* -$// !""$1$&$'&$) '.) '/' *-$. /0 *$'- (/,) -$/, !""$1$&$)$' '+/ '.+ **$0 )++ ($/. (.+/ -$/, !""$1$&$)$* '+, '.) */$- (-+ *$') (//- -$/, !""$1$&$)$+ '.0 )(0 *)$, '-/ *$-/ (/0. -$/, !""$1$&$)$0 '0, '/. *)$/ '', *$-' (./- -$/, !""$1$&$)$. ))) )). (-$* '-/ *$-' (./' -$/, !""$1$#$'&$' '0, )', *,$+ (-0 *$') (/,- -$/0 !""$1$#$'"$' )'' ))( ()$) ',. *$'( (//' -$/, !""$1$#$'"$) '(0 ',, +($) ))/ *$'( (/,( -$/0 !""$1$#$'"$+ '(0 '.' +*$( '.( *$') (/,. -$/, !""$1$#$)$, '// )'' (+$. :%;3,+,(''' *$-- (.0* -$/, !""$1$2$'&$' '). *+,-'.(#/('0) +0$0 )')"0%//#$1( =#&%$($'24&<(*$-, *+,-'.(#/((/0+ -$/, !"#$%&'(!)!""$1$2$'&$) *+,-'.(#/("0%//#$1"'.+ 0#2&34"'/* *56(78-9(,$+ $'24&<(78-9'-/ 7:-9*$-/ >)?(/0+ >)?@A8--$/, !"#$%$&$'&$(:323,+, ')*')( '+' ,,$')-$( ),*0, *$-+($/, (./)(.', -$/0 !"#$%$&$'"$(:'13%2 '*+'0, '0/'/0 */$.*($( '.-'++ *$-(*$-. (.0.(/*' -$/0-$/, !"#$%$&$'"$*:%;3,+, ','(.( './*0) *,$-,,$' '00*(( *$-+*$)+ (/-)*-0. -$/0 !"#$%$&$)$. )(. )0( (-$) '-. ($/, (.+. -$/, Table!"#$%$#$-$+!"#$%&'(!)(8'BC( 4.1: Assembly'0*"'13,'2&("%,D$' statistics '.-3"#$%&3#2(,'13%of the 56 M.**$) mazei'+'3"#$%&3#2("+-"&.%&' isolates.*$-, (/(*13$+&3#2 -$/,0#$#2B !"#$1$&$-$' '+,!"#343!56789:3";<3 '.+ **$. ',) ($/, (.', -$/0 !"#$1$&$'&$' =#56>?@',) '.,1343?F8?3:;G>3KDLF;(/$/ )0'&343;JD>;>D *$-* (./.'$--AB-- -$/0C !"#$1$&$'&$( )-+!""34!56789:3";<3=7D;E3 )'* (0$, ''' *$-. (/(+ -$/0 !"#$1$&$'&$* ;73F7GD>@')( Mean, ',*%343MED:?N;>DE3KDLF;0,$/ *((#343KD>?;75G ($// (.+)'$--AH-' -$/, !"#$1$&$'&$0 )('""343";IDE9:3";1.1,-$// where the value!""$%$&$'&$( was not a'*, result of sequence)'- alignment*0$+ errors.'0( Bold *$-,values highli(/') ght-$/0 core genes that !""$%$&$'"$' '0, '/0 *($, '(( *$-. (/*+ -$/, !""$%$&$)$') )-/ )'0 (+$( ')* *$-. (/)+ -$/0 also had FST values >0.5. !""$%$&$)$( '.+ '/0 *'$( '(- *$-. (/() -$/0 !""$%$&$)$+ '*0 ',) *0$+ '+/ *$-, (/)/ -$/, !""$%$&$)$0 '0- )-( *'$, )') *$-* (.// -$/0 !""$%$&$)$, '*) ',' **$) ',( *$-( (.,- -$/0 !""$%$2$'&$' '+0 '.* +-$* )', *$'0 *-)) -$/, !""$%$2$'&$) '0( '.. *($. ''+ *$'0 *-)- -$/, !""$%$2$'&$* '(/ ',* +'$- (/' *$'0 *-). -$/, !""$%$2$)$' '0) '/. *)$/ ',- *$'' (//. -$/, !""$1$&$'&$' (*/ (.' )-$( 0, *$-' (/+* -$// !""$1$&$'&$) '.) '/' *-$. /0 *$'- (/,) -$/, !""$1$&$)$' '+/ '.+ **$0 )++ ($/. (.+/ -$/, !""$1$&$)$* '+, '.) */$- (-+ *$') (//- -$/, !""$1$&$)$+ '.0 )(0 *)$, '-/ *$-/ (/0. -$/, !""$1$&$)$0 '0, '/. *)$/ '', *$-' (./- -$/, !""$1$&$)$. ))) )). (-$* '-/ *$-' (./' -$/, !""$1$#$'&$' '0, )', *,$+ (-0 *$') (/,- -$/0 !""$1$#$'"$' )'' ))( ()$) ',. *$'( (//' -$/, !""$1$#$'"$) '(0 ',, +($) ))/ *$'( (/,( -$/0 !""$1$#$'"$+ '(0 '.' +*$( '.( *$') (/,. -$/, !""$1$#$)$, '// )'' (+$. ''' *$-- (.0* -$/, !""$1$2$'&$' '). '0) +0$0 )') *$-, (/0+ -$/, !""$1$2$'&$) '.+ '/* (,$+ '-/ *$-/ (/0+ -$/, :323,+, ')( '+' )-$( 0, ($/, (.', -$/0 :'13%2 '0, '/0 *($(130 '++ *$-. (/*' -$/, :%;3,+, (.( *0) ,,$' *(( *$)+ *-0. -$/0

!"#$%&'(!)(8'BC( "'13,'2&("%,D$' 3"#$%&3#2(,'13% 3"#$%&3#2("+-"&.%&' 13$+&3#2 0#$#2B !"#343!56789:3";<3 =#56>?@ 1343?F8?3:;G>3KDLF; &343;JD>;>D '$--AB-- C !""34!56789:3";<3=7D;E3 ;73F7GD>@ %343MED:?N;>DE3KDLF; #343KD>?;75G '$--AH-' ""343";IDE9:3";< 23432#& '$--AH-) Clade& Number&of& Present&in&all& specificity genes Annotation members? 144 hypothetical-protein Y-(7) 7 CRISPR9associated-protein 4 glycosyl-transferase 3 DNA-helicase Y-(1) 3 vrlJ,-vrlQ,-&-vrlP 2 chaperone-protein-dnaK 2 DNA-sulfur-modification-protein-(dndB-&-dndD) 2 duf324-domain9containing-protein 2 sensory-transduction-histidine-kinase- 1 3'9phosphoadenosine-5'9phosphosulfate-sulfurtransferase-dndC 1 ATPase-involved-in-DNA-repair,-sbcC 1 cell-surface-protein 1 dipeptide-transport-system-permease-protein-dppB-(tc-3.a.1.5.2) 1 dolichol9phosphate-mannosyltransferase-(ec-2.4.1.83) 1 endonuclease-III-(ec-4.2.99.18) 1 grpE-protein-hsp970-cofactor mazei9T 1 helicase-(snf2/rad54-family) 1 huntingtin-interacting-protein-e9like-protein Y 1 hypothetical-protein-bvu-3741 1 lead,-cadmium,-zinc-and-mercury-transporting-ATPase-(ec-3.6.3.3)-(ec-3.6.3.5) 1 methyltransferase-(ec-2.1.1.9) 1 oligopeptide-ABC-transporter,-periplasmic-oligopeptide9binding-protein-oppA-(tc-3.a.1.5.1) 1 peptidase-c14,-caspase-catalytic-subunit-p20 1 putative-ATP9binding-protein 1 putative-serine/threonine-protein-kinase 1 signal-peptidase-I-(ec-3.4.21.89) 1 TPR-repeat 1 transposase Y 1 type-I-restriction9modification-system,-restriction-subunit-R-(ec-3.1.21.3) 1 type-III-restriction-enzyme,-res-subunit 1 ubiE/coq5-methyltransferase 1 ycfA9like Y 45 hypothetical-protein Y-(6) 3 type-I-restriction9modification-system-(subunits-M,-R,-&-S) Y-(2) 2 CoB9CoM-heterodisulfide-reductase-(hdrA-&-hdrC) 2 formate-dehydrogenase-(fdhA-&-fdhB) 1 DNA9cytosine-methyltransferase-(ec-2.1.1.37) 1 endonuclease mazei9WC 1 internalin 1 methyl9viologen9reducing-hydrogenase-(mvhD) 1 retron9type-RNA9directed-DNA-polymerase-(ec-2.7.7.49) 1 tetratricopeptide-TPR-2 1 t/g9specific-DNA-glycosylase(-ec:3.2.2.9-) 1 transposase Y Table 4.3: Genes only found in all members of the mazei-T or mazei-WC clade after accounting for artificial absences (i.e., present in at least half of a nearly clonal group of isolates). Numbers in parentheses in the fourth column refer to the number of genes.

131 CHAPTER 5: CONCLUDING REMARKS AND FUTURE STUDIES

5.1 Assessing ecological differentiation at multiple scales of biological organization Both human health and global health are dependent on a multitude of functional roles performed by a highly diverse microbial assemblage. A fundamental goal of microbial ecology is determining the influence of environmental change on the diversity and functioning of microbial community, and in turn, how changes in microbial community functioning alter the environment. Although genetic diversity can now be rapidly assessed with high-throughput sequencing, linking such data to the ecological and evolutionary processes dictating spatiotemporal variation in methanogen community diversity and functioning remains a fundamental challenge. For instance, microbial life is currently often categorized into arbitrarily defined taxonomic assemblages that may convolute and mislead investigations into how microbial community diversity changes through time and space and may also obscure the relationship between community diversity and function. High-throughput sequencing of 16S rRNA amplicons allows for a complete sampling of all diversity (at least at one locus) in a microbial community, and thereby permits more accurate monitoring of changes in community diversity over time and space. However, this approach intrinsically provides little information on ecologically relevant physiological variation among microbial taxa. In contrast, comparative genomics of closely related taxa can reveal genomic footprints of adaptive evolution and niche partitioning, but this approach is limited to a small subset of the total taxonomic and genomic diversity in complex microbial communities. This ‘bottom-up’ comparative genomics method and the ‘top-down’ community-level approach involving single marker sequencing (e.g., 16S rRNA 454 pyrosequencing) can provide complementary data on how to best organize microbial life into units of diversity based on the evolutionary and ecological processes defining them. However, this task may not be fully accomplished until a synthesis of both approaches are reached, so that cellular systems-level variation is directly linked to spatiotemporal changes in community diversity and function. The work that I have presented in this dissertation contributes to this synthesis by investigating of ecological differentiation at multiple levels of biological organization, from community-level patterns driven by environmental disturbance to genomic

132 footprints of adaptive evolution within a single population. In the next sections, I outline how each study can be built upon to further move towards the fundamental goal of categorizing microbial life into groupings most relevant to the study of the relationship between environmental change, community diversity, and community function.

5.2 More accurate assessments of ecological differentiation among members of highly diverse communities By investigating changes in abundance of clades defined at various taxonomic resolutions, I was able to discern that ecological differentiation in regards to response to disturbance occurred at various taxonomic levels. The basic methodology of investigating spatiotemporal distributions of taxa defined at different levels as described in Chapter 2 is applicable to many studies investigating microbial diversity using next-generation sequencing of the 16S rRNA gene or other marker. This methodology could benefit from a replicated experimental design in order to quantify random variation in sampling, nucleotide extraction, sequencing, and other experimental noise from actual differences in how taxa change in abundance over time. However, findings derived from artificial, replicated designs should still be compared to whole-ecosystem manipulations or observations to determine the whether processes observed in a controlled and replicated system scale to whole ecosystems. In niche theory, a Hutchinsonian niche can be highly multidimensional, with each dimension corresponding different environmental conditions and resources (e.g., temperature-dependent growth rates) that define an organism’s relative fitness (6). Using environmental disturbance to discern ecological variation among taxa will likely not partition taxa along one niche axis. Again, a replicated design where individual environmental parameters are altered independently could be employed for this end. As a follow-up to the work described in Chapter 2, I did engineer a replicated mesocosm design for this purpose. However, the initial trial of the experiment was never implemented in a fully replicated design as a result of logistical issues. The argument could be made that the vast number of niche axes needed to be tested in complex communities would be prohibitive to replicated designs. However, many studies suggest that only a few environmental variables largely drive

133 spatiotemporal changes in microbial community diversity (4,7,11), which would retain the feasibility of a replicated design approach. However, if only a few niche axes partition a community, then competitive exclusion should limit total community diversity, which is not observed in natural environments. High rates of migration and dormancy may inflate levels of diversity beyond the number of niches available (8). In addition, niche differentiation among very closely related taxa may occur along unmeasured micro-gradients. The relative importance of these processes may at least be partially inferred from DNA and RNA stable isotope probing (2), which can more directly test for functional differentiation and metabolic activity of multiple taxa in a microbial community.

5.3 Further defining the processes of community assembly driving the spatiotemporal distribution of freshwater methanogens At a more limited taxonomic scope from that of Chapter 2, I used a culture- independent approach to infer the processes of community assembly dictating freshwater methanogen community composition within and between a set of humic lakes. The evidence I presented in Chapter 3 suggests that methanogens in the water column comprised a distinct community that was subjected to differing community assembly processes than communities found in the sediment. This work highlights how little is known of spatiotemporal distributions of freshwater methanogens and the processes driving those distributions. The hypotheses supported by my work need to be further evaluated with more elaborate studies. For instance, the niches of water column- inhabiting methanogens could be further resolved by determining whether they are free- living, particle associated, symbiotic, or employ multiple lifestyles. Such insight would help resolve the potential importance of specific environmental drivers such as predation occurred either on free-living members or on anaerobic ciliates containing endosymbiotic methanogens. During the course of this study, I made multiple attempts to answer this question through selective filtering of particle-associated and free-living size fractions, followed by PCR screening of each fraction with methanogen-specific primers. My preliminary results suggested that both free-living and particle-attached/symbiotic lifestyles are employed by methanogens inhabiting the water column.

134 A main finding from this study was that freshwater methanogen diversity appeared to be partially driven by differences in temperature and oxygen concentrations among lakes. As with the study outlined in Chapter 2, replicated experimental designs could help test the influence of these specific environmental variables on freshwater methanogen diversity. Mesocosms that encapsulate the entire water column down to the sediment as used in Shade et al. would be needed to concurrently test both sediment and water column communities (14). Much of the among-lake variation in community composition was not explained by the measured environmental parameters, suggesting that other unmeasured environmental gradients may also dictate spatiotemporal distributions of freshwater methanogens. Additionally, geographic barriers appeared to also partially dictate spatial distributions of water column methanogens. Dispersal limitation may commonly play an important role in the community assembly of freshwater methanogens, given that methanogen taxa can vary significantly in oxygen and desiccation tolerance (5,1,15); both of which could influence dispersal rates. The relative contribution of these selective and neutral processes could be more robustly tested with a sampling design encompassing a large regional set of lakes as done in Yannarell et al. or an intercontinental sampling scheme as often used to assess the drivers of microbial biogeography (12,17,19). Measuring a swath of environmental parameters that may be affecting freshwater methanogen diversity as suggest by other studies (e.g., organic matter composition and algal biomass) would be necessary to determine whether spatial variation in community composition correlated to any specific environmental variables and/or geographic distance (3,16).

5.4 Linking genetic variation to ecological differentiation in Methanosarcina In Chapter 4, I used a comparative population genomics approach to identify genomic variation segregated within a Methanosarcina mazei population in order to develop hypotheses for how adaptive evolution has occurred in this clade. I’ve identified many sources of genetic segregation between the two major clades in the population, which may have been adaptive. First, the clade-specific compositions of both restriction- modification systems and CRISPR subtypes suggested that mobile element defense was

135 clade-specific. Both systems can also modulate recombination rates, and possibly created inter-clade barriers to gene flow. Second, a putative gene cassette was present in only one clade, indicating potentially relevant phenotypic variation. The high rates of fixation around loci potentially affected directly by the functioning of the gene cassette are consistent with the hypothesis that the cassette was adaptive. Finally, we observed phenotypic variable between the clades in terms of methane production rates when grown on trimethylamine, which could likely be an ecologically relevant distinction. Although the two clades were distinct in gene content, phenotype, and harbored nearly fixed genomic regions, inter-clade gene flow was still apparent and suggested that complete species barriers had not formed. This study is a work in progress, and many questions still remain. A major unknown is the actual role of the putatively adaptive gene cassette. I plan to perform growth experiments on formate to determine whether growth only occurs for isolates possessing the gene cassette. Low concentrations of formate will be used to mitigate the possibility of using a concentration inhibitory for growth as may have occurred when assaying growth on formate among characterized M. mazei strains. The alternative hypothesis that the cassette is used for formate biosynthesis can be tested as in (18), where cultures were grown in medium lacking formate and purines. Results supporting either hypothesis could be further supported by insertion of the cassette into a naïve strain and assaying for catabolic or anabolic use of exogenous formate. The methods used to detect signatures of selection among core genes may have not been representative of actual selection pressures, which may have lead to misinterpretations of selection occurring across the core genome. Kryazhimsky and Plotkin found dN/dS to not relate monotonically with the selection coefficient when only members of a single population were compared, but this issue was not present when comparing members of two populations (10). Therefore, to more accurately quantify putative selection in the M. mazei population, I plan to use another population of isolates obtained in the study for inter-population calculations of dN/dS. As of yet, the analysis of recombination within the M. mazei population has been limited, which may skew interpretations of inter-clade gene flow and linkage of loci across the genome. A major issue with the tree reconciliation approach taken was the

136 artificial inflation of inferred gene transfer events due to poorly supported trees. I plan to use a recently developed algorithm for tree reconciliation that mitigates this issue (13). As an additional approach, I plan to employ genome-wide scans of recombination using non-parametric tests as in (9). The investigation of CRISPR diversity in the M. mazei population was cursory in Chapter 4. However, a full characterization of the diversity and evolution in the M. mazei population and across the Methanosarcina genus is currently in progress by a former member of the Whitaker lab. Lastly, many of the Methanosarcinales isolates obtained in the study (N=72) have not been investigated, although the genomes of each are sequenced and assembled into relatively contiguous draft assemblies. Although the M. mazei isolates comprised the largest collection of highly related taxa in the culture collection, other clades of many highly related taxa were found. For instance, the culture collection included a clade of 21 isolates most closely related to Methanosarcina sp. MTP4, a deeply branching Methanosarcina. Also, two clades potentially representing new species of Methanolobus comprised 11 and 4 isolates, respectively. Applying similar approaches as used in Chapter 4 to these clades may help elucidate the applicability of that study’s findings to other Methanosarcinales populations.

137 5.5 References 1. Anderson KL, Apolinario EE, Sowers KR. Desiccation as a Long-Term Survival Mechanism for the Archaeon Methanosarcina barkeri. Appl Environ Microbiol. 2012; 78: 1473–1479.

2. Dumont MG, Murrell JC. Stable isotope probing—linking microbial identity to function. Nat Rev Microbiol. 2005; 3: 499–504.

3. Fan X, Wu QL. Intra-habitat differences in the composition of the methanogenic archaeal community between the Microcystis-dominated and the macrophyte- dominated bays in Taihu Lake. Geomicrobiol J. accepted.

4. Fierer N, Jackson RB. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A. 2006; 103: 626–631.

5. Horne AJ, Lessner DJ. Assessment of the oxidant tolerance of Methanosarcina acetivorans. FEMS Microbiol Lett. 2013; 343: 13–19.

6. Hutchinson GE. Concluding remarks. 1957. p. 415–427.

7. Jones RT, Robeson MS, Lauber CL, Hamady M, Knight R, Fierer N. A comprehensive survey of soil acidobacterial diversity using pyrosequencing and clone library analyses. ISME J. 2009; 3: 442–453.

8. Jones SE, Lennon JT. Dormancy contributes to the maintenance of microbial diversity. Proc Natl Acad Sci U S A. 2010; 107: 5881–5886.

9. Krause DJ, Didelot X, Cadillo-Quiroz H, Whitaker RJ. Recombination shapes genome architecture in an organism from the Archaeal domain. submitted.

10. Kryazhimskiy S, Plotkin JB. The Population Genetics of dN/dS. PLoS Genet. 2008; 4: e1000304.

11. Lozupone CA, Knight R. Global patterns in bacterial diversity. Proc Natl Acad Sci. 2007; 104: 11436–11440.

12. Martiny JBH, Eisen JA, Penn K, Allison SD, Horner-Devine MC. Drivers of bacterial β-diversity depend on spatial scale. Proc Natl Acad Sci. 2011; 108: 7850– 7854.

13. Nguyen TH, Ranwez V, Pointet S, Chifolleau A-MA, Doyon J-P, Berry V. Reconciliation and local gene tree rearrangement can be of mutual profit. Algorithms Mol Biol. 2013; 8: 12.

14. Shade A, Read JS, Welkie DG, Kratz TK, Wu CH, McMahon KD. Resistance, resilience and recovery: aquatic bacterial dynamics after water column disturbance. Environ Microbiol. 2011; 13: 2752–2767.

138 15. Wagner D, Schirmack J, Ganzert L, Morozova D, Mangelsdorf K. Methanosarcina soligelidi sp. nov., a desiccation- and freeze-thaw-resistant methanogenic archaeon from a Siberian permafrost-affected soil. Int J Syst Evol Microbiol. 2013; 63: 2986–2991.

16. West WE, Coloso JJ, Jones SE. Effects of algal and terrestrial carbon on methane production rates and methanogen community structure in a temperate lake sediment. Freshw Biol. 2012; 57: 949–955.

17. Whitaker RJ, Grogan DW, Taylor JW. Geographic Barriers Isolate Endemic Populations of Hyperthermophilic Archaea. Science. 2003; 301: 976–978.

18. Wood GE, Haydock AK, Leigh JA. Function and Regulation of the Formate Dehydrogenase Genes of the Methanogenic Archaeon Methanococcus maripaludis. J Bacteriol. 2003; 185: 2548–2554.

19. Yannarell AC, Triplett EW. Within-and between-lake variability in the composition of bacterioplankton communities: investigations using multiple spatial scales. Appl Environ Microbiol. 2004; 70: 214–223.

139 APPENDIX A

A.1 References 1. Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S. A Guide to the Natural History of Freshwater Lake Bacteria. Microbiology and Molecular Biology Reviews. 2011; 75: 14–49.

A.2 Tables and figures

2D Stress: 0.03 100 99.5

Phylum

76 - 75 99 - 93 80 - 77

92 - 87 86 - 85 Class Genus

Family Order 84 - 81

Figure A.1

140 Figure A.1 (cont.): Taxonomic scaling relationships among 12 microbial communities sampled over time in two water layers. Non-metric multidimensional scaling (NMDS) of pairwise Spearman’s correlation coefficients (ρ) between the set of 12 samples ranked by Bray-Curtis dissimilarity at different taxonomic scales. Black triangles represent Linnaean taxonomy (phylotyping using the SILVA taxonomy) and gray triangles represent levels of hierarchical clustering based on sequence identity. Numbers show the range of sequence identity covered by each cluster of points. See Table A.2 for the Mantel values (ρ) of all comparisons between stepwise changes in hierarchical clustering cutoffs.

141 4000 ●

3000

● ●

2000 ● OTUs

N ● ● Absences

● 1000

● ● ●

●● ● ● ● ● ●● ● ● ● ● ●● ● ●●● ● ●●● ● ●●●●●●● ● ●●●●●●● ●●●● ●●●● ●●●● ●●●●● ●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

20 40 60 80 100 120 140 Abundance Cutoff

Figure A.2: Loss of OTU data with an increasingly stringent abundance cutoff. The plot shows the number of OTUs defined at a 99.5% cutoff and the total number of absences in of all OTUs in all samples. The dashed line indicates the abundance cutoff chosen for further analyses (≥50 total reads assigned to an OTU).

142 81% - Class 81% - Class

32 ● ● 26 ● ●

127 ●

19 ● 4 ●

16 ● ● 104 ●

11 ● 4 ● ●

24 ● ●

2 ● Number of Number of 42 ● ● ● sequences sequences ● ● 69 17 ● ● 80 ● 99 ● 286 ● 358 146 ● ● ● 21 ● ● 619 ● 778 20 ● ● 1078 10 ● ● 1359 3 ● ● ● ● ● 1664 ● 2100 ● 2376 ● ● ● 28 ● 3002 19 ● ● ● 3215 ● 4065 181 ● 4181 18 ● ● ● ● 5289 5273 5 ● ● ● ● 6673 ● 3 ● ● 6 ● ●

5 ● ● ia r ia* ia r Acidobacteria r

ia Actinobacteria ia r ia r r Bacteriodetes ia*

r Chlorobi TM7* Chlorobia Opitutae Bacteroidia Sphingobacte Bacteroidetes* Holophagae Gamma Proteobacte Alpha Epsilon Beta Actinobacte unclassified ucomicrobia r Firmicutes er V Bacteroidetes Acidobacte Bacte Proteobacte Actinobacte TM7 Chlorobi Protobacteria TM7 85% - Order Verrucomicrobia 87% - Family

176 ● 205 ●

138 ● 157 ● 25 22 ● ● 54 ● 40 ● 4 ● ● 4 ● 163 ● ● 142 13 ● 18 ● 21 ●

207 ● ● 242 ● ●

● ● 12 ● 3 93 ● ● 3 ● ● ● Number of Number of sequences 189 ● sequences 80 ● ● ● ● 73 230 ● 59 197 ● ● 88 ● ● 260 ● 208 ● 75 44 ● ● 560 ● 447 37 ● 23 ● ● ● ● ● 975 ● ● 776 20 ● ● ● ● ● ● 89 ● 1504 303 ● ● 1195 76 ● ● ● 2147 30 ● ● ● 1704 259 ● 2904 5 ● ● 2303 27 ● ● ● ● ● 14 ● ● 3776 2993 5 ● ● ● ● ● ● 10 ● ● ● ● ● ● ● ● ● 4761 3773 10 ● ● 12 ● ● ● ● ● 6 ● ● 6 ● ● ia* r ia*

r Acidobacteria iales r Actinobacteria aceae aceae ales r r r iaceae iales aceae r illales r r r Bacteriodetes ycetales* ycetales m ellaceae m yromonadaceae inckiaceae x r ylophilaceae ylococcaceae ylophilales ylococcales Chlorobi h obiales* obiales ettsiales a h h h h kholde p kholde z z k r r r r

Firmicutes o Mo SAR11 Sphingomonadaceae Beije Rhi Acetobacte Rhodocyclaceae Met Oxalobacte Comamonadaceae Bu Acidimicrobiales* Iamiaceae Actino unclassified Holophagaceae P Chitinophagaceae Chlorobiaceae Puniceicoccaceae Met Sinobacte Betaproteobacte Met Bu Acidimicrobiales Actino unclassified Rhi Rhodospi Rhodocyclales Holophagales Bacteroidales Sphingobacte Chlorobiales Puniceicoccales Met Xanthomonadales Pseudomonadales Ric Sphingomonadales Alphaproteobacte Caulobacte Protobacteria TM7 Verrucomicrobia 93% - Genus 99.5% - Genus

349 ● 2146 ● 255 ● 2125 ● ● 34 ● 1220 527 ● 123 ● 1159 ● 4 ● ● 1156 ● 269 ● 1542 ● 1494 16 ● ● 1548 ● 108 ● 698 ● 3 ● ● 685 ● ● 323 ● 769 765 ● 313 ● 690 ● 384 ● 780 ● 312 ● Number of 772 ● ● Number of 32 ● ● ● sequences 1054 1047 ● sequences 144 ● ● 57 ● ● 1381 ● ● 30 514 583 ● ● 201 65 ● 2041 ● ● 99 ● 24 ● ● ● 430 2295 1262 ● ● 208 64 ● ● ● ● 747 2264 ● 15 ● ● ● 1000 ● ● 357 14 ● ● ● 1150 2425 ● ● ● 546 11 ● ● 222 ● 1640 320 ● 12 ● ● ● ● 775 ● 2216 303 ● ● ● 442 ● 171 ● ● 1044 430 ● ● 2879 96 ● ● ● 30 ● 104 ● 1353 ● 3628 1899 ● 302 ● 1797 ● ● 1701 6 ● 1733 ● 1694 ● ium r aceae* ium r r ax r e ix f r ax inckia ucleobacter r r ylophilaceae* ylobacter n e h h ix vibacter f r r inckia ucleobacter r ylophilaceae* aludibacter oly elagibacter n h vibacter P unclassified Rhodo Comamonadaceae* Acetobacte Chelatococcus Beije Rhodocyclaceae* Met Cu Met Steroidobacter Acinetobacter P Geoth P Chitinophagaceae* Sediminibacte Chlorobiaceae* Puniceicoccaceae* Iamia r oly elagibacter P unclassified Sediminibacte Iamia Chlorobiaceae* Puniceicoccaceae* Geoth Beije P Steroidobacter Rhodocyclaceae* Met Rhodo Cu Comamonadaceae*

Figure A.3

143 Figure A.3 (cont.): Mapping Linnaean taxonomic classifications to OTUs. The x-axis labels are Linnaean classifications as identified using established SILVA classifications to determine phylotypes. The y-axis labels are OTUs created by average-neighbor hierarchical clustering. The dendrograms show the hierarchical relatedness of classifications or OTUs. Classifications are colored by phylum. The size of each dot is proportional to how many sequences that fall into a classification also fall into a particular OTU (see plot legends; note that scaling varies among plots). Each plot maps a classification level with its corresponding hierarchical clustering cutoff based on Figure 2.1. Bold OTU numbers highlight OTUs that contain reads classified as two different taxonomic classifications (not including ‘unclassified’).

144 75% (Phylum) - 81% (Lineage/Clade) - - 85% (Lineage/Clade) - - - 87% (Lineage/Clade) Epilimnion Hypolimnion - - - - 93% (Clade/Tribe) 0 m 4 m - - - - - 99.5% (Tribe) 2 (Proteobacteria) 3 (Proteobacteria) - 3 (*) - - 3 (LiUU-3-334) - - 80 (*) - - - 93 (*) - - - 189 (Acin) - 5 (*) - - 5 (*) - - - 5 (*) - - - - 24 (LD28) - - - - 517 (*) - - 10 (*) - - - 10 (Pnec) - - - - 11 (*) - - - - 12 (PnecC) - - - - - 1694 - - - - - 1733 - - - - - 1797 - - - - - 1899 - - - - 14 (Pnec) - - - 12 (*) - - - - 15 (*) - - - - - 96 (Lhab-A4) - - - - - 104 (Lhab-A1) - - - - - 171 (betI-A) - - - - - 303 (betI-A) - - - - - 320 (Lhab-A3) - - - - 64 (Rhodo) - - - 14 (*) - - 27 - 19 (*) - - 20 (*) - - - 23 (*) - - - - 32 (*) - - - - - 1262 (alfI-B2) - - - - - 2264 (alfI-B2) - - - - - 2295 (alfI-A1) - - 37 (*) - - 75 (*) - - 76 (alfVI) - - 197 (alfV-A) - 181 (alfVIII) 4 (Actinobacteria) - 4 (acV) - - 4 (acV-A) - - - 4 - - - - 4 (acV-A1) - - - - 123 (acV-A2) 5 (Actinobacteria) - 6 (acI) - - 6 (acI-B) - - - 6 - - - - 6 - - - - - 685 (acI-B2) - - - - - 690 (acI-B3) - - - - - 698 (acI-B3) - - - - - 765 (acI-B2) - - - - - 769 (acI-B2) - - - - - 772 (acI-B3) - - - - - 780 (acI-B4) 10 (Verrucomicrobia) - 11 - - 12 - - - 13 - - - - 16 - - - - - 1047 - - - - - 1054 16 (Bacteroidetes) - 17 (bacI) - - 18 (bacI-A) - - - 21 - - - - 30 (bacI-A1) - - - - 302 (bacI-A1) - - - - 430 (bacI-A2) - 146 (bacI) 18 (Proteobacteria) - 20 - - 22 - - - 25 - - - - 34 - - - - - 1494 - - - - - 1542 - - - - - 1548 19 (Bacteroidetes) 21 (Bacteroidetes) - 24 (bacVI) - - 138 (*) - - - 157 (*) - - - - 255 (*) - - - - - 1156 (*) - - - - - 1159 (*) 26 (Proteobacteria) 28 (Actinobacteria) 69 (Proteobacteria) 3 7 3 7 -9 -9 11 20 11 20 Mix Mix

% relative abundance by OTU (across the row)

0% 20% 40% 60% 80% Figure A.4: The heat map as shown in Figure 2.1 with OTUs labeled by a lake-specific taxonomy as described in (1).

145

75% - Phylum 81% - Clade

26 ● 127 ● ● ●

24 ● ● ● 19 ● ●

20 ● 16 ● 32 ●

4 ● ● ● 4 ● ●

2 ● 104 ●

● ● Number of Number of 69 11 ● ● sequences sequences

● 99 ● ● 60 21 ● ● 17 ● 359 ● 210 146 ● 10 ● ● ● 779 ● 451 ● ● 1360 3 ● ● ● ● 783 28 ● ● ● ● 2102 ● 1206 19 ● ● ● ● ● ● ● 3005 ● 1720 18 ● ● ● ● 4069 181 ● ● 2325 5294 3022 3 ● ● ● 5 ● ● ● ● ● ● 6680 ● ● 3809

5 ● ● 6 ● ● ia ia r r Actinobacteria

Bacteriodetes bacVI* bacVI−B bacI−A bacI−B1 acV−A acI−B alfV−A gamI* alfIV* LiUU−3−334* Acin alfI−A1 alfI−B alfVI* betIV−A alfVIII* betI−B betI−A Pnec unclassified ucomicrobia r

er Protobacteria Bacteroidetes V Proteobacte Actinobacte unclassified Verrucomicrobia

85% - Clade 87% - Clade

176 ● ● ● 205 ● ● ●

● ● ● 138 ● ● ● 157 22 ● 25 ● 4 ● 40 ● 163 ● 4 ● 13 ● ● 142 ● 21 ● 12 ● ● 242 ● 18 ● 6 ● ● 207 ● Number of 3 ● ● Number of 6 ● ● sequences sequences 93 ● ● 3 ● ● ● 60 ● 60 189 ● 80 ● ● ● ● 210 ● 210 230 ● 197 ● ● ● 451 ● 450 44 ● ● 37 ● ● ● 782 ● 782 23 ● ● ●

● ● ● ● ● ● 1204 20 1205 89 ● ●

● ● ● 1717 76 ● 1718 303 ● 2321 259 ● ● 2323 30 ● ● ● ●

● ● ● 3016 27 ● ● 3018 5 ● ● ●

● 3802 5 ● ● ● 3805 12 ● ● ● 10 ● ● ● ● 10 ● ● ● ● bacVI* bacVI−B bacI−A bacI−B1 acV−A acI−B alfV−A gamI* alfIV* LiUU−3−334* Acin alfI−A1 alfI−B alfVI* betIV−A alfVIII* Pnec betI−B betI−A unclassified bacVI* bacVI−B bacI−A bacI−B1 acV−A acI−B alfV−A gamI* alfIV* LiUU−3−334* Acin alfI−A1 alfI−B alfVI* betIV−A alfVIII* Pnec betI−B betI−A unclassified Actinobacteria Bacteriodetes Protobacteria Verrucomicrobia

93% - Tribe 99.5% - Tribe

349 ● ● 2146 ● 2125 ● 255 ● ● 1220 ● 34 ● 527 ● 123 ● 1159 ● ● 1156 ● ● 4 ● 1542 ● 269 ● 1494 ● 16 ● 1548 ● 698 ● ● ● 3 685 ● 323 ● 769 ● ● ● 313 ● 765 ● 690 ● ● 384 780 ● 312 ● 772 ● Number of ● Number of 32 ● ● ● ● 1054 sequences 1047 ● sequences ● 144 1381 ● ● ● 30 514 ● 57 583 ● ● ● 2041 65 ● ● 200 ● 99 2295 ● 517 ● ● ● 429 1262 ● ● 208 24 ● 2264 ● ● 1000 ● ● 357 64 ● ● ● 744 2425 ● ● ● 15 ● ● ● ● ● 1145 222 ● ● 546 ● 14 ● 320 ● 775 ● 1633 303 ● ● 12 ● ● ● ● ● ● ● ● 2207 171 ● 1044 442 ● ● 96 ● 430 ● ● 2868 104 ● ● ● 1353 1899 ● 30 ● ● 3614 1797 ● ● 1701 302 ● ● 1733 ● 6 ● ● ● ● 1694 ● edo edo bacI−A2 bacI−A1 P LD12 LD28 alfI−A1* alfI−B2 Rhodo Lhab−A1 Lhab−A3 betI−A* Lhab−A4 PnecC acV−A1 acI−B2 acI−B4 acI−B3 acI−B* unclassified bacI−B1* bacI−A2 bacI−A1 P acV−A2 acV−A1 acI−B4 acI−B2 acI−B* acI−B3 LD12 Acin* alfI−A1* alfI−B* alfI−B2 LD28 Rhodo betI−A* Lhab−A3 Lhab−A1 Lhab−A4 PnecA PnecC unclassified

Figure A.5

146 Figure A.5 (cont.): Mapping freshwater lake specific taxonomic classifications to OTUs. The x-axis labels are classifications as defined in (1). The y-axis labels are OTUs created by average neighbor hierarchical clustering. The dendrograms show the hierarchical relatedness of classifications or OTUs. Classifications are colored by phylum. The size of each dot is proportional to how many sequences that fall into a classification also fall into a particular OTU (see figure legends; note that scaling varies among plots). Each plot maps a classification level with its corresponding hierarchical clustering cutoff based on Figure A.1. Bold OTU numbers highlight OTUs that contain reads classified as two different taxonomic classifications (not including ‘unclassified’).

147

Figure A.6: Correlations of ecological preference among sister taxa within each lineage. Nodes represent taxa that are colored by increasingly fine taxonomic scale (hierarchical clustering, purple: 75%, blue: 81%, orange: 85%, green: 87%, yellow: 93%, cyan: 99.5%). Node size is proportional to the square root of total taxon abundance. Node labeling was performed identically to Figure 2.1. Correlations among taxa are depicted as lines connecting nodes with solid lines representing correlations controlling for depth (temporal patterns) and dotted lines corresponding to correlations controlling for time (spatial patterns). The color of the line indicates whether the correlation is positive (ρ>0.6, Q<0.05, green) or negative (ρ<-0.6, Q<0.05, red).

148

Sequence' identity'cutoff' Mantel'(ρ) comparison 100#<%>#99.5 0.95 99.5#<%>#99 0.83 99#<%>#98 0.94 98#<%>#97 0.94 97#<%>#96 0.94 96#<%>#95 0.94 95#<%>#94 0.94 94#<%>#93 0.94 93#<%>#92 0.93 92#<%>#91 0.95 91#<%>#90 0.94 90#<%>#89 0.94 89#<%>#88 0.94 88#<%>#87 0.95 87#<%>#86 0.91 86#<%>#85 0.95 85#<%>#84 0.92 84#<%>#83 0.94 83#<%>#82 0.95 82#<%>#81 0.94 81#<%>#80 0.90 80#<%>#79 0.94 79#<%>#78 0.94 78#<%>#77 0.94 77#<%>#76 0.93

Table A.1: Mantel values (ρ) from the analysis shown in Figure A.1. Bold labels highlight the 6 lowest values, which were used for choosing the sequence identity cutoffs (75%, 81%, 85%, 87%, 93%, or 99.5%) for defining OTUs.

149 Sequence'identity' Number'of'OTUs' Number'of'OTUs' cutoff with'≥50'reads with'≥1'reads 75% 12 112 81% 15 187 85% 22 267 87% 25 314 93% 29 543 99.50% 36 2566

Table A.2: Summary of OTUs by taxonomic level. The table shows the number of OTUs at each sequence identity cutoff with either ≥50 reads or ≥1 read in the dataset.

150 APPENDIX B

B.1 Tables and figures

Mary Lake South Sparkling Bog

0 ● ●

● ●

● ●

● ●

● ●

5 ● ●

● ●

● ●

● ●

10 ●

● Depth (m) ●

15 ●

20 ●

0 500 1000 0 500 1000 CH4 (uM) Methane (μM)"

Figure B.1: CH4 profiles for the water columns of Mary Lake and South Sparkling Bog. For replicated measurements (closed circles), the points are the mean of 3 replicates, and the error bars are the mean ±1 standard deviation. Open circles indicate that only a single measurement was taken at that depth.

151

Figure B.2: Most mcrA genotypes fall into the methanomicrobiales order. The maximum likelihood phylogeny was inferred from the dataset mcrA alignment (plus type strains) using RAxML (GTR+Γ model; rooted on jannaschii DSM2661). Minor clades were collapsed. Symbol size denotes bootstrap support, and symbols are only mapped to nodes with a bootstrap support of ≥70 (100 bootstrap replicates). Numbers in parentheses indicate the number of mcrA sequences from the dataset that fall into the particular taxonomic group. The branch length key units are substitutions per site.

152 60

40 H ●●● ●●● ●●● ● ●●● ●● ●●● ●●● ●●● ●● ●●● ●● ●● ●●● ●● ●●● ●●●● ●● ●● ●●●● ●●●●● ●●●● ●●●● ●●●● ●●●● ●●● ●●● ●●● ●●● ●●●● ●●● ●●● ●●● ●●● ●● ●●●● ●● ●●● ●●● ●● ●●● ●●●● ●● ●● ●●● ●● ●●● ●●● 20 ● ●● ●●● ●● ●● ●●● ● ●●● ●●● ●● ●● ●●● ● ●● ●●● ● ●●● ●● ●● ●● ●●● ● ●● ●●● ●●●●●●●● ● ●● ●● ●●●●●●●●● ● ●●●●● ●●●●●●●●● ● ●●● ●●●●●●●●● Lake ●● ●●● ●●●●●●●● ●● ●●● ●●●●●●● ● ●●● ●●●●●●● ● ●● ●●●●●● ● ●●● ●●●●● ● ●●● ●●●● ●● ●●●● MA ●● ●● ●●●● ●●● ●● 0 ● ● NSB ● RL ● ●● ●● ●● ● ●● SSB ●● 60 ●● ● ● ●● TB ●● ●● ●● ●● ● ● ● ● ● ●● ● ● ●●● ●● ●● ●● ● ●● ●● ● ● ●● ● ●● ●● ● ●● ●● ● ● ●● ● ●● ●● ● ● ●● ● ●● ●● ● ● ● ● ●● ●● ● ● ●● 40 ● ●● ●● ● ● ●● ●● ● ●● ●● ●● ● ● ●●● ● S ● ●● ●● ●● ● ●● ●●● Number of OTUs (100% Sequence Identity) Number of OTUs ● ● ● ● ● ●● ●● ●● ● ● ● ● ●● ●●●●● ● ● ●● ●●● ●●●● ● ●● ●● ●● ●●●● ● ● ● ●● ●●●● ● ● ●● ●● ●●● ● ● ● ●● ●●● ● ●● ●● ●● ●●● ● ● ● ●●●●● ● ● ●● ●●●● ● ●● ● ●●● ● ●● ●●● ● ● ● ●● ● ● ● ●● ● ● ●● ●● ● ● ● ●●● ● ●● ● ●●● ● ● ●● ●● 20 ● ● ● ●●● ● ● ● ●● ● ● ●●● ●● ● ●●● ●● ● ●●● ●● ● ●● ●●●●●●● ●●● ●●● ●●●●●● ●●●●● ●●●●● ●●●● ●●● ●●● ●●● ●●● ●● ● ● ● ● ● 0 ● 0 25 50 75 Number Sampled

Figure B.3: Greater saturation of genotype diversity in the hypolimnion than the sediment. Rarefaction curves for all mcrA alleles in the dataset grouped by lake and layer. The errors bars delimit 95% confidence intervals. ‘H’ and ‘S’ indicate hypolimnion and sediment samples, respectively.

153 A) B) MPD Z value distribution MPD P value distribution 2 ● 1.00

● Frequency Frequency ● ● 0.75 0 ● ● ● ● ● ● ● ● ● ● ● ● 0.50 −2 ● ●

−4 ● 0.25 ● ● ●

● ● ● ● 0.00 ● ● 2 1.00 Richness ● ● 0.75 Richness 0 ● ● ● ● ● ● ● ● −2 ● ● 0.50 ● ● ● ● ● ● ● −4 0.25 ●

0.00 ● ● ● 2 1.00 Independent Z values Independent

P values ● ● ● ● ● Swap 0.75

0 ● Swap

● ● ● ● ● ● ● ● −2 ● ● 0.50 ● ● −4 ● 0.25

● ● ● 0.00 ● ● ● 2 1.00 ● Taxa Label

● Taxa Label ● ● 0 ● ● ● 0.75 ● ● ●

● ● ● ● ● −2 0.50 ● ● ● ● ● ● ● ● −4 ● 0.25

0.00 ● ● ● ● RLS TBS RLH TBH MAS MAH SSBS RLS TBS NSBS SSBH RLH TBH NSBH MAS MAH SSBS NSBS SSBH NSBH

Figure B.4: Variability of MPDSES effect size and significance among bootstrap replicates. Distributions of MPDSES z values (A) and P values (B) for 100 maximum likelihood (GTR+Γ model) bootstrap replicate trees. The null models used were the same as Table .2.

154 2 Persisting taxa

1

0 Branch length Branch

2 Transient taxa

1

0 MAH NSBH RLH SSBH TBH

Figure B.5: Branch length distribution among either persisting taxa or all taxa. The same maximum likelihood phylogeny as in Table B.2 was used to quantify the branch length distance among either persisting taxa (i.e., detected in a lake’s hypolimnion on all time points) or among transient taxa.

155 All Hypolimnion Matched

0.6

0.4 Frequency

0.2

0.0

0.000 0.025 0.050 0.075 0.100 0.000 0.025 0.050 0.075 0.100 0.000 0.025 0.050 0.075 0.100 Mean relative taxon abundance Figure B.6. The mcrA dataset does not appear to fit a neutral model of community assembly. Mean relative abundance of each unique mcrA allele (N = 571) was plotted against the frequency of the allele among all samples (“All” plot), just hypolimnion samples (“Hypolimnion” plot), or just matched hypolimnion and sediment samples (“Matched” plot).

156

'"(5.1.)<$.+$2"&12$)-=/ ##)#$ ##)($ ##)*$ #$)#$ -).$ -)&$ -)#$ #()$, #()(( ##),$ (#).$ (')$$ (()&$ (#)#$ #+)&$ (#)'$ ($)($ #')-$ #')-$ '$"7)<$.+$2"&12$)-=/ .)$* ').# ')'& .)'$ -)$* *),+ *),$ -)'# -)+$ &)-& #*)$+ +).# +)&# #$)&$ +)&, +),# #$)(( ')$$ ,).+ '575.1.)<$.+$2"&12$)-=/ -),$ -)'$ -)'$ -),$ $)&$ #)*$ $)+$ *)+$ *)+$ *).$ -),$ -)#$ -)#$ -)&$ -)*$ -)*$ -)*$ -)($ -)($ '"(5.1.)%9)-.:;!/ ')($ ')($ ')#' ,)(( ().* &),$ *)#( ')-' ,)($ &)(* ,)*+ +),# #$),$ ,)#' .)+* .)+$ +)#$ #$)+- +)&- '$"7)%9)-.:;!/ *)$$ ()-' ()&* -)-- $)*' #)#+ $).* *)(# (),, $)+* #)*# ()-# *)*+ #)-- *)*# ()-- *)(' ()$. #)++ integrated hypolimnion waterhypolimnion integrated

'575.1.)%9)-.:;!/ $)$$ $)$$ $)$$ $)## $)$, $)$' $)$' $)$, $)$$ $)$$ $)$& $)$* $)$( $)$* $)(( $)*$ $)*$ $)($ $)(# +8 &)&, -).- -)&+ -).& &)#$ -)+$ -)'$ &)(# -)&* &)#, -).+ &)#$ &)($ -)-* -)(& -)*$ *)+& -),' -)'$ North Sparkling Bog was used as an as usedarbitrary SparklingBog North was

%56&"74$)-#./ *#)+$ $)($ $)$$ -)'$ *#)+$ $)$$ -)'$ $)($ -)'$ *#)+$ $)$$ *#)+$ *()#$ $)($ $)$$ $)($ -)'$ *#)+$ *()#$ 0123"4$)"2$")-,"/ #)($ $)-- $)-' #)$# #)($ $)-' #)$# $)-- #)$# #)($ $)-' #)($ #)-* $)-- $)-' $)-- #)$# #)($ #)-* Environmental Environmental data.

'"()*$+&,)-./ (#)&$ .)$$ -)&$ ')+$ (#)&$ -)&$ ')+$ .)$$ ')+$ (#)&$ -)&$ (#)&$ #*)$$ .)$$ -)&$ .)$$ ')+$ (#)&$ #*)$$

.1: %"&$ #$%#&%$' #$%#&%$' #$%#&%$' #$%#,%$' #%-%$. #%-%$. #%-%$. -%*$%$. -%*$%$. &%#%$. ,%*$%$. '%#%$. '%#%$. '%*%$. &%(+%$+ &%(+%$+ &%(+%$+ &%*$%$+ &%*$%$+ Table Table B values pointdistance. mean,maximumMinimum, reference of for and calculating fromoxygentemperature calculated water (DO)and dissolved water column were wastaken m at profiles pH from 1 measured intervals. samples. !"#$ !" //0 1/0 20 !" 1/0 20 //0 20 !" 1/0 !" 34 //0 1/0 //0 20 !" 34

157 Frequency Richness Independent1Swap Taxa1Labels MAH $3.53** $2.34** $2.96** $2.38** MAS 0.49 $0.03 0.5 $0.06 NSBH $3.36** $2.39** $2.62** $2.51** NSBS 1.43 1.01 1.38 1.07 RLH $2.26* $2.97** $2.26** $3.05** RLS $0.11 $1.21 $0.36 $1.29 SSBH $1.78* $0.54 $0.88 $0.54 SSBS $0.66 $0.79 $0.71 $0.81 TBH 0.96 0.54 0.93 0.54 TBS $0.41 $0.98 $0.46 $0.91 *7P7<70.05;7**7P7<70.005

Table B.2: Varied phylogenetic dispersion among hypolimnion and sediment samples.

The standard effect size of mean phylogenetic distance (MPDSES) was calculated for each matched hypolimnion and sediment sample. Significance was ascertained using permutation tests (999 permutations) with four different null models: i) altering taxon richness, but maintaining total taxon frequencies (‘Frequency’) ii) altering taxon frequencies, but maintaining sample richness (‘Richness’) iii) shuffling taxa abundances among samples while maintaining total taxon frequencies and sample richness with the Independent Swap algorithm (‘Independent Swap’) iv) shuffling taxon labels across the phylogeny (‘Taxa Labels’).

R2# P# pH! ! 0.011! 0.962! Depth! 0.012! 0.613! Distance! 0.012! 0.958! Minimum!DO! 0.060# 0.006# Mean!DO! 0.020! 0.263! Maximum!DO! 0.009! 0.630! Minimum!Temperature! 0.030! 0.070! Mean!Temperature! Maximum!Temperature! 0.117! # 0.001! # Surface!Area! ! ! ! Table B.3: Multiple regression on matrices (MRM) assessing the significance and strength of variance in beta diversity among hypolimnion communities (excluding TBH) explained by each environmental parameter. Mean temperature and lake surface area were excluded because they co-varied strongly (ρ > 0.6) with maximum temperature and lake depth, respectively. Bold values highlight significance (P < 0.01).

158 APPENDIX C

C.1 Tables and figures

Freshwater High Salt 40 Initial cultures 30

20

10 Acetate 0 40 Methanol TMA sequenced

30 Cultures Number of cultures 20

10

0 BB BB YBB YBB YBM YBM Figure C.1: Distribution of isolates obtained on all sample-media-substrate pairwise combinations. ‘Initial cultures’ refers to the number of cultures that grew in liquid media following initial colony picking. ‘cultures sequenced’ refers to the cultures that were selected for genomic sequencing.

159 Methanosarcinales

11

Legend:Legend: Methanococcus aeolicus Nankai33 OTUs Methanococcus voltae A3 Dataset Selection media Methanococcus vannielii SB 1.F.A1.F.A Methanococcus maripaludis strain S2 Methanococcus maripaludis C6 Baker Bay 1.F.M1.F.M Methanococcus maripaludis C5 1.F.T1.F.T Methanococcus maripaludis C7 Young’s Bay 2.F.A2.F.A Msarcina semesiae MD1 (back) 2.F.M2.F.M R (3) LL (1)(1) 2.F.T2.F.T FFFF (1)(1) 3.F.A3.F.A Young’s Bay W (1) (Mouth) 3.F.M3.F.M HH (2) 3.F.T3.F.T JJJJ (2)(2) G (4) 1.H.A1.H.A IIII (1)(1) 1.H.M1.H.M Mvorans hollandica WWM590 11 10 1.H.T1.H.T Msarcina calensis cali Legend: Methanococcus aeolicus Nankai3 Methanococcus aeolicus Nankai33 Legend:2.H.A2.H.A N (1) Methanococcus voltae A3 Methanococcus voltae A3 MethanococcusQ (1) voltae A3 CulturesDataset2.H.M Selection media Dataset2.H.M Selection media Methanococcus vannielii SB Methanococcus vannielii SB MethanococcusMsarcina MTP4 vannielii SB 2.H.T2.H.T Methanococcus maripaludis strain S2 Methanosarcina mazei Acetate1.F.A1.F.A Methanococcus maripaludis strain S2 MethanococcusMsarcina WH1 maripaludis strain S2 3.H.A3.H.A Methanococcus maripaludis C6 MethanococcusMsarcina thermophila maripaludis TM1 C6 Baker Bay MeOH1.F.M1.F.M Msarcina thermophila TM1 Methanococcus maripaludis C5 3.H.M Methanococcus maripaludis C5 MethanococcusO (2) maripaludis C5 3.H.M1.F.T3.H.M TMA1.F.T1.F.T Methanococcus maripaludis C7 Methanococcus maripaludis C7 MethanococcusLLLL (1)(1) maripaludis C7 3.H.T3.H.T Acetate2.F.A2.F.A Msarcina semesiae MD1 Young’s Bay Methanosphaera stadtmanae DSM 3091 MsarcinaNN (1) semesiae MD1 MeOH2.F.M2.F.M Methanobrevibacter smithii ATCC 35061REE (3) (2) (back) EE (2) Methanobacterium thermoautotrophicumLGGL (1)(1) (2) str Delta H TMA2.F.T2.F.T GG (2) FF (1) Candidatus MethanoregulaFFDFF (1) (1)(1) boonei 6A8 3.F.A Acetate3.F.A3.F.A W (1) Young’s Bay MethanocorpusculumWC (1)(1) labreanum Z MeOH3.F.M3.F.M HH (2) (Mouth) MethanospirillumJHHJ (2)(2) (2) hungatei JF1 JJ (2) TMA3.F.T3.F.T CandidatusJJSJJS (2) (2) (2)(2) Methanosphaerula palustris E19c MethanoculleusGBB (4) (2) marisnigri JR1 1.H.ABlank1.H.A BBBB (2)(2) II (1) Uncultured methanogenicIIIIII (2) (2)(1)(1) archaeon RCI 1.H.M 1.H.M1.H.A1.H.M Mvorans hollandica WWM590 MethanosaetaMvoransEE (1)(1) hollandica harundinacea WWM590 1.H.T1.H.M Msarcina calensis cali 1.H.T MethanosaetaMsarcinaTT (1)(1) calensis concilii cali GP6 N (1) 2.H.A1.H.T2.H.A MethanosaetaNOO (1) (1) thermophila PT f (7) QMsarcina (1) barkeri fusaro 2.H.M2.H.A2.H.M Msarcina barkeri fusaro Msarcina MTP4 n (3) Msarcina MTP4mazei Go1 2.H.T 2.H.T2.H.M2.H.T Msarcina WH1 i (5) Msarcina WH1napels 100 3.H.A2.H.T Msarcina thermophila TM1 3.H.A MethanococcoidesMsarcina thermophilaacetivorans burtonii C2ATM1 DSM 6242 HighSalt 454-Pyro 3.H.M3.H.A k (2) OMsarcina (2) siciliae T4.M Freshwater 3.H.M Msarcina siciliae T4.M l (8) LLMsarcinaLL (1)(1) siciliae C2J 3.H.T3.H.M3.H.T Msarcina siciliae C2J NN (1) c (7) NNMsarcina (1) siciliae HI350 3.H.T EE (2) d (1) EEKEEK (1)(1) (2)(2) GG (2) e (5) GGU (1) (2) Figure C.2: Only the Methanosarcinales subtree of the mcrA locus tree from Figure 4.1. D (1) h (3)DVV (1)(1) C (1) MethanosarcinaCPP (1)(1) barkeri str fusaro J (2) MethanosarcinaJAAJAA (2)(2) (1)(1) mazei strain Goe1 S (2) The distribution of isolates is mapped onto the phylogeny alongside the number of reads MethanosarcinaSGS (2)(1)(2) acetivorans str C2A BB (2) m (2)BBPPBBPP (2)(1)(2)(1) I (2) b (10)IXIX (2) (2) (1)(1) E (1) g (1)EBEB (1)(3)(1)(3) T (1) in each 454-pyrosequencing OTU. The red arrow points to the 56 M. mazei isolates as a (38)THT (1)(5)(1) OO (1) j (6)OODD (1)(1) Msarcina barkeri fusaro MsarcinaYoY (1)(1) barkeri fusaro Msarcina mazei Go1 pMsarcinaM (1) (3) mazei Go1 Msarcina napels 100 shown in Figure 4.2. MsarcinaKKKK (3)(3) napels 100 Msarcina acetivorans C2A MsarcinaMM (1) acetivorans C2A Msarcina siciliae T4.M MsarcinaQQ (1) siciliae T4.M Msarcina siciliae C2J MsarcinaAA (33)(33) siciliae C2J Msarcina siciliae HI350 MsarcinaZZ (2)(2) siciliae HI350 K (1) KMsarcinaK (1)(1) mazei C16 U (1) UCC (1) (1) VV (1)(1) PP (1)(1) AAAA (1)(1) G (1) PPPP (1)(1) XX (1)(1) BB (3)(3) H (5) DD (1) YY (1)(1) M (3) KKKK (3)(3) MM (1) QQ (1) AA (33)(33) ZZ (2)(2) Msarcina mazei C16 CC (1)

160 A) B)

SarPi M. mazei C16

LYC LYC M. mazei WWM610

Go1 M. mazei TMA S6 M. mazei LYC

M. mazei Go1 WWM610 C16 M. mazei S_6 TMA TMA M. mazei SarPi

1.0

Figure C.3: A) A whole genome alignment of all seven closed Methanosarcina mazei genomes. Colored regions are local co-linear blocks (LCBs), which are regions lacking rearrangement of homologous sequence. Identical colors and connecting lines identify LCBs found in multiple genomes. Blocks below the genome position scale represent inversions relative to the reference M. mazei SarPi. White bars present within blocks represent regions of low sequence identity, with larger white bars indicating lower sequence identity. B) A dendrogram created by hierarchical clustering (average neighbor algorithm) double cut and join distance, which is a measure of genome synteny.

161

Figure C.4: Unrooted maximum likelihood phylogenies of all HdrA, HdrB, and HdrC homologs for all Methanosarcinales and Methanocella strains. Red and blue branches denote clades of genes solely found within Methanosarcina or Methanocella, respectively.

162 fdhB B) homologs

Methanomicrobiales!

77 A) 19 Methanoculleus_marisnigri_JR1_frhB Methanoculleus_bourgensis_MS2_fdhB fdhA 34 Methanocella_arvoryzae_MRE50_fdhB 0.1 99 Methanocella_paludicola_SANAE_fdhB 18 Methanoculleus_bourgensis_MS2_fdhB 70 Methanoculleus_marisnigri_JR1_frhB homologsMethanosarcina mazei 3 F A 2 8 Methanoculleus_bourgensis_MS2_fdhB 44 Methanoculleus_marisnigri_JR1_frhB Methanofollis_liminatans_DSM_4140_frhB

Methanosarcina mazei C16 99 78 Methanoplanus!

Methanocorpusculum_labreanum_Z_fdhB 100 hypothetical_protein_Methanocorpusculum_labreanum_Z Methanosarcina mazei (20 isolates) 98 Methanothermobacter! 50 16 Methanothermus_fervidus_DSM_2088_fdhB Methanosarcina sp (4 isolates) Methanobacterium_sp._Maddingley_MBC34_frhB 50 Methanobrevibacter_smithii_ATCC_35061_fdhB 55 79 81 82 Methanobrevibacter_ruminantium_M1_fdhB Methanosphaera_stadtmanae_DSM_3091_FdhB 100 Methanosarcina sp 2 H T 1A 6 Methanobacterium_formicicum_DSM_3637_fdhB

100 36 77 72 Methanococcus! Methanosarcina sp 1 H T 1A 1 90 85 Methanococcus_vannielii_SB_frhB 71 63 Methanococcus_voltae_A3_frhB 67 65 Methanococcus_aeolicus_Nankai-3_frhB Methanosarcina sp 1 H A 2 2 Methanothermococcus_okinawensis_IH1_fdhB 76 Methanothermococcus_okinawensis_IH1_fdhB Methanocaldococcus! Methanosarcina mazei Naples 100 100 41 86 48 87 ! 71 Methanosarcina horonobensis HB 1 82 Methanocaldococcus_villosus_KIN24-T80_fdhB 100

Methanosarcina sp 1 F T 1B 5 33 Methanococcus! 97 97 Methanococcus_maripaludis_C5_fdhB 60 98 Methanococcus_vannielii_SB_frhB uncultured archaeon Methanococcus_voltae_A3_frhB 100 100 Methanocella! Methanolobus psychrophilus R15

Methanosarcina calensis Cali 92 Methanobacterium! 66

Methanosarcina_sp_(4 isolates)_ fdhB Methanosarcina_sp_2_H_T_1A_8_ fdhB Methanosarcina_sp_2_H_A_1B_4_ fdhB Methanosarcina_sp_1_H_T_1A_1_ fdhB Methanosarcina_sp_Naples_100_ fdhB 51 Methanosarcina_mazei_C16_ fdhB Methanosarcinales! Methanosarcina_mazei_(22 isolates)_ fdhB 100 Methanosarcina_sp_1_F_T_1B_5_ fdhB Methanosarcina_horonobensis_HB_1_ fdhB 100 Methanosarcina_calensis_Cali_ fdhB Methanococcoides_sp_1_F_T_1A_2_ fdhB 99 Methanolobus_psychrophilus_R15_fdhB uncultured_archaeon_fdhB Methanopyrus_kandleri_AV19_frhB Figure C.5: Unrooted maximum likelihood phylogenies of all FdhA and FdhB homologs.

163 YBM YBB BB Distance)from)YBM)(km) 0 Distance)from)YBB)(km) 4 0 Distance)from)BB)(km) 20 24 0 Latitude 46.17469 46.15968 46.28551 Longitude 123.84933 123.80651 124.05187 pH 6.6 6.3 7.5 soluble)salts)(mmhos) 2.47 2.88 2.41 Organic)matter)(%) 5.4 8.8 1.8 P)(ppm) 6 15 17 K))(ppm) 480 462 454 Ca)(meq) 10 11.6 7.1 mg)(meq) 8.1 10.8 8.1 N03)(ppm) 3 2.5 2 NH4)(ppm) 150 170 15.5 S)(ppm) 32.4 160 132 B)(ppm) 2.1 1.2 2.4 Zn)(ppm) 12 7.7 5.2 Mn)(ppm) 243 162 20 Cu)(ppm) 3.6 3.3 2.4 Fe)(ppm) 214 254 130 Total)bases 19.3 23.6 16.4 Class SILT)LOAM SILT)LOAM SANDY)LOAM Sand 19.6 27.1 53.6 Silt 68.6 57.3 36.6 Clay 11.8 15.6 9.8

Table C.1: Geographic distance between samples and measured geochemical parameters at each sampling site.

164 Maximum( Number(of( scaffold( Total(length( Number(of( Isolate(ID Number(of(scaffolds contigs N50((kb) length((kb) (Mb) CDS CDS/1kb YBM.F.A.1A.3 124 151 77.1 274 4.05 3892 0.96 YBM.F.A.1B.3 145 169 49.8 180 4.03 3868 0.96 YBM.F.A.1B.4 171 189 47.0 166 4.05 3902 0.96 YBM.F.A.2.8 238 263 30.2 108 3.97 3858 0.97 YBM.F.M.0.5 164 180 44.2 151 4.07 3934 0.97 YBM.H.A.0.1 157 185 44.8 172 3.97 3817 0.96 YBM.H.A.1A.1 172 187 39.9 261 4.04 3898 0.96 YBM.H.A.1A.3 205 214 36.7 111 4.08 3935 0.96 YBM.H.A.1A.4 123 174 67.9 433 3.99 3852 0.97 YBM.H.A.1A.6 231 237 36.0 77 4.08 3964 0.97 YBM.H.A.2.1 294 320 25.5 116 4.00 3869 0.97 YBM.H.A.2.3 171 206 42.9 125 4.09 3953 0.97 YBM.H.A.2.6 204 232 37.0 136 4.01 3887 0.97 YBM.H.A.2.7 383 462 42.5 130 4.25 4006 0.94 YBM.H.A.2.8 337 381 22.1 69 4.08 3934 0.97 YBM.H.M.0.1 176 191 42.2 251 4.06 3972 0.98 YBM.H.M.1A.1 175 184 44.8 130 4.09 3957 0.97 YBM.H.M.1A.2 155 181 51.4 177 4.08 3935 0.96 YBM.H.M.1A.3 257 270 27.6 92 4.08 3944 0.97 YBM.H.M.2.1 263 317 31.2 100 4.19 4054 0.97 YBM.H.M.2.2 154 201 44.4 217 4.08 3934 0.96 YBM.H.M.2.3 142 198 47.3 166 3.98 3842 0.97 YBM.H.M.2.4 260 266 26.9 93 4.08 3957 0.97 YBM.H.T.2.1 159 194 46.7 131 4.08 3939 0.97 YBM.H.T.2.3 148 173 44.9 176 3.97 3839 0.97 YBM.H.T.2.5 142 186 49.4 162 4.09 3952 0.97 BB.F.A.2.3 189 200 41.1 125 4.08 3942 0.97 BB.F.A.2.4 146 169 53.6 300 4.20 4068 0.97 BB.F.T.0.2 136 178 50.6 277 4.09 3943 0.96 BB.F.T.2.6 236 245 31.6 94 4.06 3919 0.96 YBB.F.A.1A.1 270 287 30.2 121 4.11 4053 0.99 YBB.F.A.1A.3 147 210 46.5 163 4.07 3912 0.96 YBB.F.A.1B.1 167 196 43.7 133 4.08 3945 0.97 YBB.F.A.2.12 209 216 35.3 124 4.08 3925 0.96 YBB.F.A.2.3 185 196 41.3 130 4.08 3932 0.96 YBB.F.A.2.5 146 172 46.5 159 4.07 3929 0.97 YBB.F.A.2.6 160 203 41.7 212 4.04 3899 0.96 YBB.F.A.2.7 142 171 44.2 173 4.03 3870 0.96 YBB.F.T.1A.1 156 184 50.4 217 4.16 4022 0.97 YBB.F.T.1A.2 163 188 43.8 115 4.16 4020 0.97 YBB.F.T.1A.4 139 174 51.0 391 4.16 4028 0.97 YBB.F.T.2.1 162 198 42.9 170 4.11 3998 0.97 YBB.H.A.1A.1 349 381 20.3 67 4.01 3954 0.99 YBB.H.A.1A.2 182 191 40.8 96 4.10 3972 0.97 YBB.H.A.2.1 159 185 44.6 255 3.98 3859 0.97 YBB.H.A.2.4 157 182 49.0 305 4.12 3990 0.97 YBB.H.A.2.5 186 236 42.7 109 4.09 3968 0.97 YBB.H.A.2.6 167 198 42.9 117 4.01 3890 0.97 YBB.H.A.2.8 222 228 30.4 109 4.01 3891 0.97 YBB.H.M.1A.1 167 217 47.5 306 4.12 3970 0.96 YBB.H.M.1B.1 211 223 32.2 178 4.13 3991 0.97 YBB.H.M.1B.2 136 177 53.2 229 4.13 3973 0.96 YBB.H.M.1B.5 136 181 54.3 183 4.12 3978 0.97 YBB.H.M.2.7 199 211 35.8 111 4.00 3864 0.97 YBB.H.T.1A.1 128 162 56.6 212 4.07 3965 0.97 YBB.H.T.1A.2 185 194 37.5 109 4.09 3965 0.97 Minimum 123 151 20.3 67 3.97 3817 0.96 Median 167 196 43.3 155 4.08 3941 0.97 Maximum 383 462 77.1 433 4.25 4068 0.96

Isolate(ID(key:( sediment(sample isolation(media isolation(substrate dilution colony YBM3=3Young's3Bay3 (Mouth) H3=3high3salt3media A3=3acetate 1.00E+00 # YBB3=Young's3Bay3(near3 an3inlet) F3=3freshwater3media M3=3methanol 1.00EH01 BB3=3Baker's3Bay T3=3TMA 1.00EH02

Table C.2: Genome assembly contiguity statistics for all 56 M. mazei isolates

165 Number/of/specific*,/ Number/of/specific*,/unique/ Category Group unique/spacers spacers/(%/of/total) mazei&WC 75 3.35% Clade mazei&T 93 4.15% YBM 92 4.11% Site YBB 0 0.00% H 0 0.00% Media F 0 0.00% A 0 0.00% Substrate M 0 0.00% T 0 0.00% I&A 63 2.81% I&B 3 0.13% I&C 0 0.00% I&D 25 1.12% Subtype I&E 84 3.75% III&A 38 1.70% III&B 75 3.35% III&C 69 3.08% Total 2240 100.00% *CspacersCfoundCinC>50%CofCisolatesCinCgroupCandCnotCfoundCinCotherCgroups

Table C.3: The number of unique spacers unique to each group in each category.

166 acetate Df Sum&Sq Mean&Sq F&value Pr(>F) clade 1 0.000 0.000 0.564 0.458 round 3 0.003 0.001 14.338 0.000 *** clade:round 3 0.000 0.000 1.320 0.283 Residuals 36 0.002 0.000 methanol Df Sum&Sq Mean&Sq F&value Pr(>F) clade 1 0.000 0.000 1.423 0.243 round 3 0.003 0.001 11.608 0.000 *** clade:round 3 0.002 0.001 6.232 0.002 ** Residuals 27 0.002 0.000 trimethylamine Df Sum&Sq Mean&Sq F&value Pr(>F) clade 1 0.001 0.001 11.895 0.002 ** round 3 0.001 0.000 4.732 0.008 ** clade:round 3 0.000 0.000 1.617 0.205 Residuals 32 0.003 0.000 Signif.&codes:&&0&‘***’&0.001&‘**’&0.01&‘*’&0.05&‘.’&0.1&‘&’&1

Table C.4: Nested ANOVA tables for assessing significant treatment effects of clade (mazei-WC and mazei-T) and round (Rounds 1-4).

167