MODERN STROMATOLITE MICROBIAL DIVERSITY IN YELLOWSTONE NATIONAL

PARK, WYOMING IN THE CONTEXT OF MICROBIAL ALPHA AND BETA

DIVERSITY ALONG YELLOWSTONE GEOTHERMAL OUTFALLS

by Charles Pepe-Ranney c Copyright by Charles Pepe-Ranney, 2013 All Rights Reserved A thesis submitted to the Faculty and the Board of Trustees of the Colorado School of Mines in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Environmental Science and Engineering).

Golden, Colorado Date

Signed: Charles Pepe-Ranney

Signed: Dr. John Spear Thesis Advisor

Golden, Colorado Date

Signed: Dr. John McCray Professor and Head Department of Civil and Environmental Engineering

ii ABSTRACT

To better understand the diversity of mechanisms for stromatolite morphogenesis as well as the diversity of microorganisms and microbial metabolisms associated with stro- matolites it is imperative to describe the biological attributes of the full diversity of ”living” stromatolite-like geobiological structures worldwide. It is of additional interest to describe geographically and geochemically comparable ecosystems but lacking stromatolites to stromatolite-forming environments to unravel the geochemical and microbiological as- pects of stromatolite morphogenesis. Here we present a living stromatolite system in a Yellowstone National Park (YNP) hot spring that exhibits features in contrast to many popularly studied modern stromatolite analogs. Most notably, the YNP stromatolites are more finely laminated than living marine stromatolites and may be a more suitable textural analog to finely laminated stromatolites found in the rock record. The YNP stromatolites are composed of silica-encrusted cyanobacterial mats. The predominant lithofacies of the YNP stromatolite is comprised of silica-encrusted filaments and is distinctly laminated. The laminated quality of the main lithofacies is due to an alternating–possibly on a diurnal cycle–growth orientation of filamentous cyanobacteria. Two cyanobacterial mat types grow on the stromatolite surfaces and are preserved as two distinct lithofacies. One mat is present when the stromatolites are submerged or at the water-atmosphere interface and the other when stromatolites protrude from the hot spring. The lithofacies created by the encrustation of submerged mats constitutes the bulk of the stromatolites, is comprised of silica-encrusted filaments, and is distinctly laminated. To better understand the cyanobacterial membership and community structure differences between the mats, we collected mat samples from each type. Molecular methods revealed that submerged mat cyanobacteria were predominantly one novel phylotype while the exposed mats were predominantly heterocystous phylotypes (Chlorogloeopsis HTF and

iii Fischerella). The cyanobacterium dominating the submerged mat type does not belong in any of the subphylum groups of cyanobacteria recognized by the Ribosomal Database Project and has also been found in association with travertine stromatolites in a Southwest Japan hot spring. Cyanobacterial membership profiles indicate that the heterocystous phylotypes are ’rare biosphere’ members of the submerged mats. The heterocystous phylotypes likely emerge when the water level of the hot spring drops. Environmental pressures tied to water level such as sulfide exposure and possibly oxygen tension may inhibit the heterocystous types in submerged mats.

+ In contrast to living marine examples where the interplay of pCO2 and [Ca2] is the main influence on microbial lithification, the lithifcation in the YNP system is driven by a rapid decrease in silica solubility between high-temperature subsurface water emitted from the hot spring vent and lower-temperature surface water. The continuously favor- able conditions for rapid lithification in the hot spring system coupled to a cyanobacterial diurnal growth cycle that manifests itself as fine laminations in the stromatolite lithofacies leads to a growth rate for the YNP stromatolite on the order of centimeters of deposition in several months. The YNP system displays a perhaps overlooked mechanism for stroma- tolite morphogenesis and is compelling both for the presence of a seemingly unstudied cyanobacterium as well as a finely-laminated modern stromatolite analog of which there are few other examples.

iv TABLE OF CONTENTS

ABSTRACT...... iii

LIST OF FIGURES ...... ix

LIST OF TABLES ...... xiii

LIST OF ABBREVIATIONS ...... xiv

ACKNOWLEDGMENTS ...... xv

CHAPTER 1 INTRODUCTION ...... 1

CHAPTER 2 BACKGROUND ...... 3

2.1 Methods for investigations of microbial diversity via surveys of phylogenetic marker genes ...... 4

2.1.1 Preparation of environmental samples for amplicon sequencing surveys of marker genes ...... 4

2.1.1.1 DNA extraction bias ...... 5

2.1.1.2 PCR bias ...... 5

2.1.1.3 Additional considerations ...... 10

2.1.1.4 Sanger sequencing versus NGS ...... 16

2.1.2 Sequence Quality Control ...... 18

2.1.2.1 Chimera Checking ...... 19

2.1.2.2 Sequence Analysis Software ...... 20

2.1.3 Clustering algorithms ...... 21

2.2 Eco-physiology and ecology of Cyanobacteria in YNP ...... 23

2.2.1 Early studies of cyanobacterial diversity ...... 24

v 2.2.2 Synechococcus microdiversity and phylogeny ...... 28

2.2.3 Nitrogen fixation at Octopus Spring ...... 29

2.3 Microbial Diversity of Modern Stromatolites ...... 30

2.3.1 Phylum level diversity of living stromatolites ...... 32

2.3.2 Beta diversity of living stromatolites ...... 33

2.3.3 Methods ...... 37

2.3.3.1 Phylum Level Diversity ...... 37

2.3.3.2 Phylogenetic trees ...... 40

2.3.3.3 Unifrac Principal Coordinates Plot ...... 40

CHAPTER 3 REVIEW AND ANALYSIS OF THE TAXONOMIC ORGANIZATION OF CYANOBACTERIA IN SSU RRNA GENE DATABASES ...... 41

3.1 Results ...... 45

3.1.1 Review of taxonomic organizations ...... 45

3.1.1.1 Broad groups ...... 46

3.1.1.2 Genera ...... 47

3.1.2 Increase in known diversity by addition of sequences to public databases ...... 55

3.1.3 Analyses of the width of taxonomic levels ...... 55

3.1.4 Consistency of high resolution classifications with common clustering methods ...... 60

3.1.5 Additional considerations ...... 61

3.2 Discussion ...... 64

3.3 Methods ...... 65

3.3.1 Cumulative cyanobacterial sequences ...... 65

vi 3.3.2 Sequence alignment and clustering ...... 66

CHAPTER 4 CYANOBACTERIAL CONSTRUCTION OF HOT SPRING SILICEOUS STROMATOLITES IN YELLOWSTONE NATIONAL PARK ...... 67

4.1 Results ...... 70

4.1.1 Predominant Cyanobacterial Phylotypes ...... 72

4.1.2 Major Non-cyanobacterial Phylotypes ...... 77

4.1.3 Alpha Diversity ...... 78

4.2 Discussion ...... 85

4.2.1 Surface Community Structure ...... 85

4.2.2 Phylogeny of Stromatolite Builder ...... 96

4.2.3 Comparisons to other living stromatolite studies ...... 96

4.2.4 Major Non-cyanobacterial Phylotypes ...... 98

4.3 Summary ...... 98

4.4 Methods ...... 99

4.4.1 Sample Collection ...... 99

4.4.2 Microscopy, SEM and Thin-sectioning ...... 100

4.4.3 DNA Extraction, PCR, Cloning, Capillary Sequencing ...... 100

4.4.4 Sanger Sequence Assembly and Quality Trimming ...... 100

4.4.5 PCR and Pyrosequencing ...... 101

4.4.6 Sequence Analysis ...... 101

4.5 Supplementary Methods ...... 103

CHAPTER 5 TRENDS IN MICROBIAL COMMUNITY ALPHA AND BETA DIVERSITY IN HOT-SPRING OUTFALLS AT YELLOWSTONE NATIONAL PARK, WYOMING ...... 108

vii 5.1 Results ...... 109

5.1.1 Beta Diversity ...... 109

5.1.1.1 Emphasis on cyanobacterial membership and structure . . 115

5.2 Alpha Diversity ...... 117

5.3 Discussion ...... 117

5.3.0.2 Temperature influence on alpha and beta diversity . . . . . 122

5.4 Methods ...... 123

5.4.1 DNA Extraction, PCR and Sequencing ...... 123

5.4.2 Data analysis ...... 123

CHAPTER 6 CONCLUSION AND FUTURE WORK ...... 125

REFERENCES CITED ...... 127

APPENDIX - COPYRIGHT AGREEMENT AND CO-AUTHOR PERMISSIONS FOR PUBLISHED CONTENT ...... 155

viii LIST OF FIGURES

2.1 Plot of total character frequencies in SSURef104NR Silva database ...... 8

2.2 Rescaled plot of total character frequencies ...... 9

2.3 Base frequencies for E. coli positions 515 to 534 in the SSURef104NR Silva database...... 11

2.4 Base frequencies for E. coli positions 907 to 927 in the SSURef104NR Silva database...... 12

2.5 Base frequencies for E. coli positions 515 to 534 in the SSURef104NR Silva database with only Archaeal sequences...... 13

2.6 Base frequencies for E. coli positions 907 to 927 in the SSURef104NR Silva database with only Archaeal sequences...... 14

2.7 Analysis of 515F and 907R SSU rRNA primers ...... 15

2.8 Cartoon of barcoded PCR primers...... 17

2.9 Cyanobacterial morphotype diversity - Temperature ...... 26

2.10 Cyanobacterial morphotype diversity - pH...... 27

2.11 Cartoon depicting diurnal cycling in a microbial mat...... 31

2.12 Phylum-level diversity of selected living stromatolite microbial diversity studies...... 34

2.13 Diversity of living stromatolite sequences in full tree of nearly full-length SSU rRNA gene sequences ...... 35

2.14 Diversity of living stromatolite sequences from selected studies ...... 36

2.15 Principal coordinate plot of unweighted Unifrac distances between selected stromatolite sequence libraries ...... 38

ix 2.16 Principal coordinate plot of unweighted Unifrac distances between selected stromatolite sequence libraries and a library from the Guerrero Negro hypersaline mats...... 39

3.1 Cumulative sum of cyanobacterial 16S sequences submitted to public sequence databases...... 43

3.2 Cumulative sum of cyanobacterial 16S OTUs submitted to public sequence databases. 3% and 5% are typically used to denote “”-level and “genus”-level microbial taxa, respectively...... 44

3.3 Silva Guide tree of cyanobacterial 16S gene sequences (SSURef111) colored and collapsed by “class” annotations. Width of clades denotes number of sequences within the clade...... 48

3.4 Greengenes Guide tree of cyanobacterial 16S gene sequences colored and collapsed by “class annotations. Width of clade denotes number of sequences within the clade...... 49

3.5 Counts of sequences in broadest suphylum cyanobacterial groups in the Silva (SSURef111 taxonomic hierarchy.) ...... 50

3.6 Counts of sequences in the cyanobacterial classes in the Greengenes taxonomic hierarchy...... 51

3.7 Counts of sequences in RDP cyanobacterial ”groups“...... 52

3.8 Counts of sequences in Silva genera...... 53

3.9 Counts of sequences in Greengenes genera...... 54

3.10 Cultured representation in cyanobacterial OTUs...... 56

3.11 Cumulative taxonomic level-specific cumulative histograms of minimum inter-taxon identity for taxonomic groups in the Silva SSURef111 SSU rRNA gene database...... 57

3.12 Cumulative taxonomic level-specific cumulative histograms of minimum inter-taxon identity for taxonomic groups in the Greengenes SSU rRNA gene database...... 58

3.13 Cumulative taxonomic level-specific cumulative histograms of minimum inter-taxon identity for taxonomic groups in the RDP SSU rRNA gene database...... 59

x 3.14 NMI scores for Silva and Greengenes genus as well as RDP group annotations over a range of OTU assignment sequence identity thresholds. . 62

3.15 Purity scores for Silva and Greengenes genus as well as RDP group annotations over a range of OTU assignment sequence identity thresholds. . 63

4.1 Selected images of stromatolite facies ...... 69

4.2 A. The bar chart depicts the distribution of pyrosequencing reads into phyla for each pyrosequence library. B. Rank abundance plot of cyanobacteria 97% identity OTUs in the combined PSFM and PCDM libraries respectively. . 73

4.3 The heat-map and dendrogram shown here depict the distribution and abundance of nonsingleton/doubleton cyanobacterial reads from the pyrosequence libraries in this study. The dendrogram is not a phylogeny but a furthest-neighbour clustering of all pairwise substitution per site comparisons of the sequences...... 86

4.4 Maximum likelihood tree depicting the phylogenetic placement of Chlorogloeopsis and Fischerella phylotypes found in this study...... 87

4.5 A. Location of the stromatolite building phylotype in a broad, unrooted, maximum likelihood phylogeny of cyanobacteria. Groups are those named and recognized by RDP directed by study done by Wilmotte and Herdman [1]. Nodes with bootstrap support ¡50% were collapsed and thus polytomies denote ambiguous branching order of descendant nodes. B. Histogram of all identity values as determined by USearch between SFM-seq and cyanobacterial sequences in SSUParc106...... 88

4.6 Phylogeny of all cyanobacterial 16S SSU rRNA gene sequences from eight molecular studies of modern stromatolite studies as reviewed by Foster and Green [2]...... 89

4.7 Distribution of 16S rRNA gene non-cyanobacterial pyrosequences into 97% identity OTUs. OTUs with less than or equal to five total 16S rRNA gene sequences summed across every pyrosequence library are omitted from the figure...... 90

4.8 Rarefaction curves for pyrosequence libraries and just cyanobacterial sequences for combined PCDM and PSFM libraries respectively. Sequences were clustered at 97% identity using UClust. Points in curves represent the average observed OTUs for 10 re-samplings. Libraries were re-sampled at 100 sequence intervals...... 91

xi 5.1 Principal coordinate plot of Unifrac distances between 16S gene sequence libraries presented by Osburn et al. [3]. Points are colored by environment types described by Osburn et al. [3]...... 111

5.2 Heatmap of OTU abundance distributions for all 16S gene sequence libraries discussed in this study...... 112

5.3 Principal coordinate plot of weighted Unifrac distances between 16S gene sequence libraries presented by Osburn et al. [3] and Imperial Geyser outfall sample libraries...... 113

5.4 Principal coordinate plot of Unifrac distances between 16S gene sequence libraries presented by Osburn et al. [3]. Points are colored by type of geochemical platform...... 114

5.5 Heatmap of cyanobacterial OTU abundance distributions for all 16S gene sequence libraries discussed in this study with at least 100 cyanobacterial sequences ...... 116

5.6 Effective Shannon diversity of all 16S gene sequence libraries. Points grouped by environment type and colored by temperature...... 118

5.7 Effective Shannon diversity of all 16S gene sequence libraries plotted against temperature...... 119

5.8 Effective Shannon diversity of cyanobacteria in all 16S gene sequence libraries with at least 100 cyanobacterial sequences...... 120

xii LIST OF TABLES

2.1 Summary of commonly used clustering software in microbial diversity studies . 22

3.1 Summary of Nomenclature Sources for Major SSU rRNA Databases ...... 42

3.2 Database versions used in this study...... 45

4.1 16S library information for stromatolite samples ...... 73

4.2 Sequences in SSUParc106 and/or Genbank with at least 97% identity to SFM-seq ...... 76

4.3 Best BLAST hit in SSURef104 to major non-cyanobacterial OTUs ...... 78

4.4 Alpha diversity calculations for pyrosequence libraries...... 86

xiii LIST OF ABBREVIATIONS

YNP ...... Yellowstone National Park

SSU ...... Small Subunit rRNA ...... Ribosomal Ribonucleic Acid

NGS ...... Next Generation Sequencing

SLP ...... Single Linkage Preclustering

HC ...... Hierarchical Clustering

GHC ...... Greedy Heuristic Clustering

BC ...... Bayesian Clustering

PCoA ...... Principal Coordinates Analysis

RDP ...... Ribosomal Database Project

OTU ...... Operational Taxonomic Unit

NMI ...... Normalized Mutual Information

xiv ACKNOWLEDGMENTS

I would first and foremost like to thank my lab buddy Chase Williamson for many non- science lunch conversations and for humoring many distractions. Thanks to my advisor John spear for allowing me the freedom to learn often at the expense of short-term pro- ductivity. Thanks to Halley Eydal for her friendship and guidance. Thanks to Shannon Ulrich for great company on trips to Yellowstone. Jackson Lee and Chuck Roberson have been invaluable sounding boards for bioinformatics ideas. Thanks to Junk Munakata-Marr for being an excellent teacher and a great manuscript editor. Lastly, I would like to thank Jason Sahl for all his help along the way.

xv CHAPTER 1 INTRODUCTION

This dissertation presents a modern stromatolite analog study at a hot spring in Yellowstone National Park (YNP), Wyoming. The study employed molecular methods to characterize the microbial community associated with the hot spring stromatolites. In order to properly contextualize the data, hot spring microbial diversity from other geothermal features with YNP was also studied. Lastly, as Cyanobacteria are this study’s primary taxonomic focus, the current state of Cyanobacterial small subunit (SSU) ribosomal RNA (rRNA) systematics is reviewed. The chapters are organized as follows:

• Chapter 2 - Review and re-analysis of published living stromatolite microbial diversity studies. Additional review of the microbiology of YNP hot spring outfall cyanobacterial mats.

• Chapter 3 Review of the cyanobacterial taxonomic organization is SSU rRNA gene databases including implications of reference taxonomic systems on microbial diversity studies.

• Chapter 4 Presents SSU rRNA gene diversity of the living YNP stromatolite. Includes analyses of the living YNP stromatolite microbiology in the context of previous studies of YNP hot spring cyanobacteria and other living stromatolite analogs.

• Chapter 5 Presents microbial diversity data from cyanobacterial mat samples taken at YNP siliceous and calcareous hot spring outfalls. Microbial diversity results are presented in the context of previous microbial diversity studies done at YNP geothermal features.

1 Stromatolites are the most and perhaps only conspicuous micro-fossils that can be seen with the naked eye. Although many biosignatures exist though which geobiologists can study the history of life, the charismatic nature of stromatolites evokes a special curiosity in microbiologists and geologists. Here we present a finely laminated living stromatolite that bears striking resemblance to many stromatolites in the rock record.

2 CHAPTER 2 BACKGROUND

The primary subject of this dissertation is the microbial diversity of a living stromatolite-analog in Yellowstone National Park (YNP); to mark the significance of the findings, results must be presented within the proper context. A wealth of scientific literature describes many facets of stromatolites in the rock record. In addition, living stromatolites have been extensively studied by geologists and microbiologists. Here we limit the background contextual information to studies of modern stromatolite analogues that employed molecular techniques to catalogue microbial diversity with specific emphasis on Small Subunit (SSU) Ribosomal Ribonucleic Acid (rRNA) gene surveys similar to the methods presented in this dissertation. Additionally, we present background information concerning the history of cyanobacterial mat research done in YNP. Modern stromatolites are found in a diverse array of geochemical conditions and the relevant ecological counterparts to a survey of a modern stromatolite’s diversity is not necessarily other modern stromatolite analogs but rather similar ecosystems within a more narrowly defined geographical and geochemical scope. The stromatolites presented here are constructed by a cyanobacterial mat ecosystem in a YNP hot-spring. To better understand the ecology and eco-physiology of the microbial members of this dissertation’s primary subject it is necessary to describe and discuss the differences and commonalities between the YNP stromatolite and the numerous cyanobacterial communities in YNP in addition to the contrasting attributes of the YNP stromatolite microbial community to other living stromatolite analogs. First, however, it is necessary to review the methods for surveying microbial diversity employed in our studies of the YNP stromatolite and other cyanobacterial ecosystems in YNP.

3 2.1 Methods for investigations of microbial diversity via surveys of phylogenetic marker genes

Microbiologists have long noted that the number of microbes counted in situ far exceeds counts come to by culture based methods [4]. For instance, Staley and Konopka [5] coined the phrase “The Great Plate Count Anomaly” to describe the discrepancy between in situ observations and culture-based studies. Many theories have been put forth from the absence of siderophores [6] in culture media to microbial senescence [7] to explain the significant number of uncultured microbes. Whatever the cause or combination of causes, the majority of microbes in an environment are not surveyed by culture-based techniques. Classical ecological metrics used to describe diversity rely on taxa abundance distributions [8], but, unlike insects, plants and animals, for instance, microbes generally do not have distinguishable morphological features even under the microscope to categorize them into discrete taxonomical groups. Therefore, observation of microbes in situ is not a suitable method for discerning taxa abundance distributions. Further, as described above, culture-based analyses and classifications of microbes do not always follow phylogeny and miss a great number of the in situ microbial population. It is therefore necessary to utilize culture-independent, molecular methods to survey microbes in nature to conduct meaningful diversity based ecological studies [9]. Here we provide a brief introduction of molecular methods to provide context for the literature and results presented afterwards.

2.1.1 Preparation of environmental samples for amplicon sequencing surveys of marker genes

Regardless of the ultimate sequencing platform to be used, marker gene surveys from environmental samples follow the same two preliminary steps:

• DNA extraction

4 • PCR amplification of gene of interest.

Although each step is routine in many molecular laboratories, artifacts abound and can easily manifest themselves the final results.

2.1.1.1 DNA extraction bias

Willner et al. [10] studied how different DNA extraction methods affect microbial community profiles from human lung samples as well as mock communities. The basic extraction methods differed in the lysis technique, chemical or mechanical, as well as in the reagents used post-lysis. Differences in the microbial community of the lung samples themselves proved much greater than differences due to extraction variation although analysis of mock community results showed reproducible biases due to extraction method. As expected, mechanical lysis techniques recover more DNA from more robust cell envelopes as in the case of gram-positive . Rantakokko-Jalava and Jalava [11] previously reported mechanical lysis was more effective at recovering DNA from gram-positive bacteria and also noted mechanical as opposed to chemical lysis recovered greater diversity in general. Regardless of the extraction protocol it appears that extraction method bias is reproducible and therefore would not necessarily affect relative comparison of microbial communities prepared by similar techniques. Interestingly, Willner et al. [10] finds proprietary kits to yield the most reproducible results.

2.1.1.2 PCR bias

PCR bias can be categorized into two types:

• Primer bias,

• Amplification bias.

5 Primer bias is the preferential annealing of the PCR primers to specific templates. Frank et al. [12] showed that even single mismatches in the complimentary sequence of the primer and template can lead to significant bias against the non-matching template. Therefore it is imperative to understand the limitations of any selected primer pair to correctly interpret the absence of a particular marker gene from a dataset. For instance, commonly used SSU rRNA primers were systematically biased against the Verrucomicrobia phylum which led to its underrepresentation in soil microbial diversity studies [13]. While many protein-coding genes are utilized to assay the diversity of functional guilds [14, e.g.], most microbial diversity studies employ surveys of the SSU rRNA gene. The SSU rRNA gene is an excellent marker for microbial diversity because all known organisms posses the gene [15], it is highly conserved for stretches that allows for primer targets that are found in broad groups (even all three domains of life [16, e.g.]) and there are established and curated collections of annotated SSU rRNA genes that provide context for bioinformatic analysis of SSU rRNA gene libraries [17–19]. Additionally, much work has been done to understand what regions of the SSU rRNA gene are most appropriate for ecological analyses and how different regions might affect relative alpha (diversity within a community) and beta (shared diversity between communities) diversity indices between environments [20, 21]. Moreover, as curated SSU rRNA gene sequences are added to databases, primer specificity is constantly being re-assessed and improved upon [16, 22, 23] allowing researchers to critically evaluate past results and improve current methods. Primer design and analysis generally involves looking for and assessing primer targets in a multiple sequence alignment. The first degenerate SSU rRNA primers were indeed generated in this way [24] . As collections of sequence information grow [25], however, they become more difficult to handle and impossible to analyze by eye. For instance, to generate a multiple sequence alignment of all high-quality, near full-length

6 SSU rRNA databases from all three domains of life such that every column is unambiguously homologous the Silva database curators [18] expand the alignment to 50,000 positions. Much of the alignment is made up of low frequency columns. 2.1 and 2.2 depict the character frequencies of the SSURefNR104 Silva database alignment over the first 2000 alignment positions (of 50,000 total).

7 Figure 2.1: Plot of total character frequencies in SSURef104NR Silva database over the first 2000 alignment columns that include the 515F primer targets. The conserved positions where the primer anneals to the target have nearly no gaps whereas adjacent positions that represent a high degree of indels in the alignment have very few characters.

8 Figure 2.2: Rescaled plot of total character frequencies to show typical frequencies of low total character frequency columns. High character frequency bars are clipped in this figure.

9 Analyzing primers using this alignment requires that first the target columns in the extremely “gappy“ alignment be identified and then that the frequencies of all characters found in the alignment at the target positions are calculated. Working through this challenging informatics problem can yield interesting results. 2.3 and 2.4 show the character frequencies for the target columns for two popularly used three-domain SSU rRNA primers, 515F and 907R (named by the position of the primers relative to the Escherichia coli 16S gene), in the SSURef104 Silva database and 2.5 and 2.6 show the character frequencies for just Archaeal sequences. Clearly the actual character frequencies are at odds with the primers in several positions. Re-designing the primers to account for the new information yields are primer pair that is significanly improved with respect to its coverage of overall diversity as depicted in 2.7. Amplification bias is generally attributed to PCR saturation [12]. PCR saturation occurs when template reaches such a high concentration that is competes for primer targets with the primer oligos. Keeping PCR cycles to as small a number as downstream applications will allow is generally effective at inhibiting PCR saturation [26]. Schloss et al. [26] found that amplification bias is not significant especially with respect to the relative significance of other sample preparation artifacts.

2.1.1.3 Additional considerations

A large scale study of bias introduced by sequencing centers showed that extensive artifacts are introduced at the level of sequencing preparation [26]. So large, in fact, that differences between individual samples correlated more with the center where the samples were sequenced than with the sample metadata [26]. In the case where differences between samples is expected to be subtle, the variance in results introduced by sequencing preparation casts doubt as to whether amplicon sequencing is appropriate for finding high-resolution differences between microbial communities.

10 1.0 T C A 0.8 G

0.6

0.4 Frequency

0.2

0.0 G T GCC A GC M GCCGCGG TA A Traditional 515F sequence

Figure 2.3: Base frequencies for E. coli positions 515 to 534 in the SSURef104NR Silva database.

11 1.0 T C A 0.8 G

0.6

0.4 Frequency

0.2

0.0 CCG T C AATT CC TTTRA G TTT Traditional 907R sequence

Figure 2.4: Base frequencies for E. coli positions 907 to 927 in the SSURef104NR Silva database.

12 1.0 T C A 0.8 G

0.6

0.4 Frequency

0.2

0.0 G T GCC A GC M GCCGCGG TAA Traditional 515F sequence

Figure 2.5: Base frequencies for E. coli positions 515 to 534 in the SSURef104NR Silva database with only Archaeal sequences.

13 1.0 T C A 0.8 G

0.6

0.4 Frequency

0.2

0.0 CCG T C AATT CC TTTRA G TTT Traditional 907R sequence

Figure 2.6: Base frequencies for E. coli positions 907 to 927 in the SSURef104NR Silva database with only Archaeal sequences.

14 Figure 2.7: Analysis of 515F and 907R SSU rRNA primers. “m” denotes the modified version of the primer to incorporate current knowledge of the target region character fre- quencies. Analysis was conducted using the RDP ProbeMatch tool.

15 Additionally, the results presented in Schloss et al. [26] highlight the necessity of analyzing data in the light of preparation biases to ensure that observed ecological trends cannot be simply attributed to sample preparation artifacts.

2.1.1.4 Sanger sequencing versus NGS

Sanger sequencing technology is based on the inclusion of terminating and fluorescent nucleotides by a polymerase when replicating a single strand of DNA [27]. Pyrosequencing exploits the release of pyrophosphate during the action of the polymerase which in turn yields an illumination event via luciferase that is ultimately recorded on camera [28]. The Illumina technology also utilizes a fluorescent PCR terminator that is reversible in this case allowing for detection of each polymerase added base. While the ultimate signal and instrumentation is drastically different depending on the sequencing technology used, microbial ecology laboratories do not typically own and operate the sequencing machines themselves electing to purchase sequencing services on an as needed basis. Therefore, the preparation of sequencing template is carried out at the sequencing center as part of the sequencing service. Preliminary steps, however, are still performed by the researcher. For the microbial ecologist the most fundamental difference between preparing samples for sanger sequencing versus NGS is in the method of separating individual amplicons from a pool of PCR product. With sanger sequencing, amplicons are separated from each other by cloning whereas NGS amplicon sequencing studies utilize emulsion PCR techniques. Emulsion PCR is generally part of the sequencing services provided by a sequencing center. Another key difference between sanger and NGS is the use of barcoded primers in NGS. Barcoded primers carry unique tags at the 5’ end that eventually denote the sample from which the amplicon originated [29]. Barcoded primers allow researches to pool PCR product from many different samples and separate out sample-specific libraries in silico post sequencing (see 2.8).

16 Figure 2.8: Cartoon of barcoded PCR primers.

17 2.1.2 Sequence Quality Control

In the process of sequencing a DNA molecule differences from the true molecular sequence that ultimately become part of the final sequence analyzed by the researcher can originate at each step in the full workflow from PCR through the actual interpretation of the signal collected and produced the sequencing technology. Fortunately, extensive re-sequencing of known templates can provide a thorough understanding of sequencing error [30, 31, e.g.]. With an understanding of sequencing error, researchers can assign quality estimates to base calls [32], identify and discard poor sequencing reads [30] and in some cases correct errors with high confidence [33]. The primary use of any sequencing technology is for genomic shotgun sequencing where the depth of coverage for any genomic stretch is many-fold and therefore sequencing errors are generally easy to mitigate. When sequencing amplicons, each template is sequenced once (only one-fold coverage) and quality control of sequence libraries is significantly more involved. Quality control of sanger sequences involved critical examination of quality scores [32] for base calls and in the case of amplicon reads, the identification of probable chimeras [34]. Quality scores are generally expressed as a log-score and are derived from base-calling algorithms specific to the nature of sanger sequencing signal and tuned to characteristic error signatures established for the technology by analyzing the error-profiles derived sequencing known templates. Quality control in Next Generation Sequencing (NGS) amplicon libraries was thoroughly studied when the extent of the “rare-biosphere” [35] was suggested to be an artifact of sequencing errors that inflated diversity estimates [36, 37] . Huse et al. [30] identified key criteria for finding probable low quality sequencing reads from pyrosequencing technology. In a typical amplicon sequencing study, reads are screened with the criteria outlined by Huse et al. [30] in a preliminary quality control step. Subsequently, pyrosequencing reads can be further improved by filtering the underlying

18 flowgram signals through a noise-removal algorithm. Noise-removal algorithms exploit the error-profiles of the pyrosequencing technology to find the most likely set of sequences that emit the observed signal. Quince et al. [38] published the first noise-removal algorithm and a later faster, heuristic version was published by Reeder and Knight [39]. Errors due to the polymerase used in PCR can account for a significant number of the final sequencing errors and can be corrected for if the error-profile for the polymerase is known [40] however researchers generally elect to use a high fidelity polymerase for template amplification [3, e.g.] instead of computationally fixing polymerase errors post-sequencing. Illumina sequencing technology is cheaper per base than pyrosequencing and may become the preferred technology for amplicon sequencing studies [41] but the quality control parameters and software for the Illumina platform is not currently as mature a for pyrosequencing. Instead of noise-removal it has been proposed to tie low abundance probable error containing reads to their more abundant true sources [42] in a process called Single Linkage Preclustering . Combining parameters defined by Huse et al. [30] in addition to noise-removal or SLP can drastically reduce the error-rate in DNA sequences and significanly improve the researcher’s confidence in diversity estimates.

2.1.2.1 Chimera Checking

During PCR amplification, DNA strands that were generated in a previous cycle but not assembled through the entire template sequence can re-anneal to new templates in a subsequent step and produce a full amplicon originating from two (or more) different templates. Amplicons from more than one template are called “chimeras”. As artifacts of PCR and not representative of a true gene from the DNA template, chimeras should be discarded from sequence datasets [34].

19 Unfortunately it can be difficult to identify chimeric sequences [40, 43, 44] especially when the templates or “parent” sequences are closely related. Chimera detection algorithms generally rely on analyzing query sequences in the context of non-chimeric reference sequences [43, e.g.]. Although sometimes the reference databases do not represent a sufficient amount of diversity to act as useful context for the analysis of all query sequences and may contain chimeras themselves [34]. Recently, to circumvent the need for a reference database, chimera detection methods that assume high abundance reads in query sequence datasets to be likely non-chimeric and suitable as reference sequences have been developed. Chimera detection algorithms that find chimeras de novo without the need of a reference database by assuming chimera parents are in the query libraries themselves have proven to be sensitive and specific [40, 44]. Researchers can leverage the two general chimera-checking paradigms to robustly control for PCR artifacts in sequence libraries.

2.1.2.2 Sequence Analysis Software

Two sequence analysis software packages, QIIME [45] and Mothur [46] have emerged as the most utilized environments within which microbial ecologists analyze biological amplicon sequence libraries. Each package is developed with a drastically different philosophy but provide similar sets of functions. The QIIME development team hosts code on a freely accessible website [47]. Moreover, bleeding edge repositories of the code base can be accessed and installed. QIIME mostly utilizes the Python programming language and also interfaces significantly with the collection of bioinformatics modules available in PyCogent [48]. QIIME solicits code from the community at-large and is organized as a true open-source project. Notably, QIIME leverages the ability of Python to easily communicate with other software by electing to incorporate application controllers or “wrappers” rather than re-write algorithms that are already publicly available. For instance, QIIME “wraps” the RDP-Classifier [49] and

20 UClust [50] which are themselves standalone programs. In one sense QIIME is mostly a collection of parsers and input/output controllers that allows biologists to easily navigate the confusing maze of bioinformatics file formats. Mothur, on the other hand, is written from the ground-up and developers re-write Mothur specific implementations of published algorithms (e.g. the RDP Classifier). This makes the dependency list for Mothur significanly smaller than QIIME’s and allows for the Mothur software to be easily installed. Mothur is written in C++ which is generally faster than an on-the-fly interpreter language like Python but lacks the option parsing and readability conveniences of a high-level language like Python which makes the Mothur interface more difficult to use. A thorough understanding of each software program and more importantly a fundamental understanding of the basic steps in microbial ecology data analysis allows the research to mix and match the two programs and creatively generate, interpret and visualize results.

2.1.3 Clustering algorithms

Most of the preliminary literature that assessed NGS as a tool for microbial diversity studies focused on methods for identifying and possibly correcting sequencing errors [30, e.g.]. Recently, however, downstream data analysis methods and in particular, sequence clustering algorithms have been evaluated based on their ability to reproduce biologically relevant sequence groupings [51–57]. Similar to sequencing error, the magnitude of the effect the chosen clustering algorithm can have on diversity metrics depends on the metrics being applied. As a general rule, metrics based on taxa abundance distributions are more susceptible to clustering error. These metrics would include the Chao, Simpson and Shannon alpha diversity metrics or the theta and Bray-Curtis beta diversity metrics [58, for review see]. On the other hand, when clustering is utilized principally as a method to dereplicate or reduce a sequence dataset into a more manageable representative collection as in a typical Unifrac [59] based

21 analysis, the clustering error is negligible [60, 61]. Commonly used clustering algorithms by microbial ecologists can be categorized into three groups. Hierarchical Clustering (HC) , Greedy Heuristic Clustering (GHC) , and Bayesian Clustering (BC) . HC is a standard method for clustering distance matrices and allows for many different cluster definitions with respect to how items in the matrix are grouped together. Schloss and Westcott [53] suggest that the “average-neighbor” algorithm produces the most biologically relevant results for HC. GHC involves defining cluster centroids or seeds that non-centroids are either recruited to if they fall within a given distance threshold or become centroids themselves. BC is a broad category, there are two available BC algorithms to date for clustering biological sequences; Cheng et al. [54] utilizes a stochastic search clustering method whereas Hao et al. [55] utilizes a Gaussian mixture model and a birth-death process. 2.1 summarizes different software that employ the three above algorithms.

Table 2.1: Summary of commonly used clustering software in microbial diversity studies

Type of clustering algorithm Software HC Mothur [46], ESPIRIT-Tree [62] GHC UClust [63], CD-Hit [64] BC CROP [55], BEBaC [54]

‘ One of the difficulties in evaluating clustering approaches is finding a ground truth by which to measure the performance of the software. Researchers and software developers have utilized taxonomic annotations [51, 56], simulated datasets [54] and tallies of false positives/negatives and true positives/negatives in final cluster groupings [52, 53] to determine how clustering software programs compare to each other. There is little consensus concerning which software is best or which performance metric is most informative however there is one conclusion that is consistent among more than one reviewer. GHC algorithms perform significanly worse than both HC and BC.

22 As GHC is heuristic in nature it is not surprising that it makes the most mistakes but the magnitude that GHC can inflate diversity estimates is somewhat alarming. For instance, Sun et al. [56] estimate that GHC can overestimate the number of “species” level clusters by nearly four times. Unfortunately, unlike for multiple sequence alignment [65], there is no standard benchmark dataset or metric for cluster algorithm performance analysis so it is difficult to synthesize the results all clustering performance studies. It is likely that algorithm developers tune their software for their metric and test dataset of choice and therefore their algorithms perform better than those not similarly tuned. Also, often alternative algorithms are not properly represented. For example, Wang et al. [51] utilize a secondary-structure aware, covariance model based aligner [66] but do not mask out alignment columns not matching the covariance model that the alignment software purposefully leaves unaligned. This omission likely leads to inflated computed pairwise distances between sequences and an inaccurate evaluation of the combination of alignment algorithm and clustering method.

2.2 Eco-physiology and ecology of Cyanobacteria in YNP

Cyanobacteria in YNP have been studied at least since 1897 [67]. Early studies were observational in nature but many early findings have been confirmed by modern methods. Recently state-of-the-art techniques have been used to investigate everything from cyanobacterial diversity and phylogeny [68] to expression and translation of nitrogen-fixing genes and proteins [69, 70]. The unifying themes in cyanobacterial biology research at YNP include the ecological and physiological responses of cyanobacteria to sulfide, temperature and nitrogen. Although many cyanobacterial phylotypes exist in YNP, a cosmopolitan unicellular, rod morphotype commonly referred to as Synechococcus has been the focus of many studies.

23 2.2.1 Early studies of cyanobacterial diversity

Cyanobacteria in YNP were first extensively cataloged by Copeland [71] although an earlier publication [67] described an “algal stalactite” and several cyanobacterial morphotypes associated with the structure. Tilden [67] attributes the stalactite to the growth of cyanobacterial filaments into drops of condensation from the ceiling of a shallow cave over an active geyser. Similar “bio”-stalactites and affects of water droplets were characterized by Spear et al. [72] over one-hundred years later albeit not in Yellowstone but a geothermal mine adit and the microbes of interest were not cyanobacteria. Setchell [73] described cyanobacteria and their apparent upper temperature limits found in several YNP hot-springs including hot-springs found at Mammoth, and the Norris, Upper and Lower geyser basins. Setchell [73] notes the temperature heterogeneity of many hot-springs where a few lateral centimeters can encompass a 10-15 circC change in temperature and observes that tufts of growth at hot-spring margins were considerably cooler than water at the spring’s source. Setchell [73] also finds that non-phototrophic life continues beyond the temperatures where cyanobacteria are found but cyanobacteria can be found in water up to 77 circC and that cyanobacteria can be found at higher temperatures in siliceous versus calcerous hot-springs. The upper temperature limits for life were of particular interest to YNP bacteriologists from the early to mid-1900s and are still of interest today [74]. Early studies of phototroph diversity generally involved identifying and counting morphotypes [71]. Temperature effects on cyanobacterial richness and membership was a common theme amongst studies [71, 75] but the influence of pH was also noted [71]. Mann and Schlichting [75] describes the changes in relative abundance of five Synechococcus morphotypes and one Phormidium morphotype at two springs in the Firehole Loop Drive region. Mann and Schlichting [75] shows that morphtypes exhibit temperature optima across different springs and additionally catalogue 39 distinct

24 species of phototrophs including diatoms and their apparent temperature limits. Similarly to Setchell [73], Mann and Schlichting [75] shows that unicellular cyanobacteria and a thin filament called Phormidium by both Setchell [73] and Mann and Schlichting [75] are found at the upper temperature limits for cyanobacteria. Recent molecular data [68, 76] suggests that the thin filaments described in early studies of YNP cyanobacterial diversity is likely a green, non-sulphur bacterium and not Phormidium [77]. Copeland [71] offers the most extensive review of morphotypes found in YNP hot-springs. 2.9 and 2.10 summarize the findings of Copeland [71] with respect to the number of species/morphotypes at different temperatures and pH. There is a significant decrease in cyanobacterial richness with temperature. Additionally, Copeland [71] describes the lack of cyanobacteria at low pH. Both ecological findings have been confirmed by later studies [78]. The first study of Cyanobacterial diversity in YNP to not utilize morphotype counting methods was perhaps Inman [79]. Inman [79] studied the absorption spectra of thermophilic cyanobacteria and found the spectra of Chlorophyll a and b from hot-spring samples to not differ much from that of plants, however, and unidentified peak at 538 um was observed. Brock and Brock [80] also carried out a study more physiological in nature by taking mat samples along hot-spring outfalls and measuring RNA, protein and pigment content per unit area, Brock and Brock [80] shows the general temperature optimum for cyanobacterial mats at specific springs. RNA, protein and pigment content peak at nearly the same temperature, 60 circC. Castenholz [81] was the first to focus on the influence of sulfide on cyanobacteria membership in hot-springs. Castenholz [81] also conducted experimental investigations of sulfide tolerance with YNP hot-spring derived cyanobacterial cultivars. Castenholz [81] found that Spirulina morphotypes dominated membership in sulfidic hot-springs where pH was suitable for cyanobacterial growth. At similar pH but without sulfide, hot-springs are dominated by Synechococcus morphotypes except in the case where

25 Figure 2.9: Cyanobacterial morphotype diversity - Temperature: based on data presented in Copeland [71].

26 Figure 2.10: Cyanobacterial morphotype diversity - pH: based on data presented in Copeland [71].

27 spring flow is rapid in which case heterocystous filaments are most common. Experiments using photosystem II inhibitors and measurements of labeled bicarbonate uptake showed that a Spirulina isolate from YNP could likely utilize sulfide as a photoreductant to photosystem I thus mitigating its photosystem II inhibition [81].

2.2.2 Synechococcus microdiversity and phylogeny

Octopus Spring has been the epicenter for Synechococcus research in YNP since the 1970s [76, reviewed in]. Early studies at Octopus Spring produced a limited and reproducible few Synechococcus strains that could be easily cultured from outfall microbial mats suggesting that the Synechococcus population at Octopus Spring was fairly uniform [76, 82]. In fact, coupled with a low number of morphotypes observed by microscopy, traditional culture based studies conveyed that a single type of Synechococcus dominated the mats at Octopus Spring. A direct dilution culturing technique [83], however, showed that previous culture based techniques were biased towards microbial “weeds” that took best to culture conditions and were hiding a wealth of microdiversity [76, 82]. Novel cultures with Synechococcus-like morphology and cultures of aerobic chemoorganotophs produced by direct dilution of inocula from environmental samples exhibited a wide array of temperature optima that correlated well with conditions from where samples were collected [76]. Moreover, molecular evidence suggested that cultures derived from direct dilution techniques were members of the in situ predominant groups and that traditional enrichment culture techniques produced the same isolates over and over [82] but did not reliably yield predominant in situ phylotypes. Although the novel Synechococcus cultures still resembled strains within the Synechococcus genus by morphology, rRNA gene sequences revealed the morphological similarity to be in stark contrast to a significantly different phylogeny between novel cultures and easily grown Synechococcus “weeds”.

28 One of the more astounding developments in microbiology in the last ten years has been the discovery of extensive genome plasticity in microbial world even among very closely related strains. Studies Vibrio species associated with different size particles in the ocean has shown that microbes with highly similar rRNA genes (99% identity) can be part of ecologically differentiated populations [84]. A study of Vibrio genome diversity found that almost every Vibrio splendidus isolate in the study was genotypically unique [85]. While recent studies of distinct ecotypes within closely related phylogenetic groups have benefited from modern sequencing techniques, studies done at Octopus Spring in YNP found evidence for inter-species ecotypes at the infancy of culture-independent, microbial community profiling techniques [86]. Observations of community structure over temperature gradients at Octopus Spring revealed that five closely related cyanobacterial phylotypes occupied different defined temperature niches. Similarly, seemingly closely related green non-sulfur bacteria appeared to prefer different temperature defined niches [86]. As an aside Ward et al. [76] speculates that even collections of identical 16S gene sequences might include multiple ecotypes. Recent studies of microbial population pan-genomes [87, e.g.] nearly twenty years after the above Octopus Spring studies would seem support such speculation.

2.2.3 Nitrogen fixation at Octopus Spring

Although early studies of cyanobacterial diversity speculated that non-heterotrophic cyanobacteria would be competitively inhibited by heterocystous, nitrogen-fixing counterparts in nitrogen deplete hot-springs [71, e.g.], genomic evidence from two ecologically relevant Synechococcus isolates showed the isolates possessed the full suite of genes necessary for nitrogen-fixation [69]. Subsequent studies showed the Octopus Spring Synechococcus were fixing nitrogen and elucidated the diurnal cycling of nitrogenous gene transcription and protein abundance in the Octopus Spring microbial mat [69, 70]. Heterocystous cyanobacteria utilize differentiated, nitrogen-fixing cells at

29 uniform intervals along a filament to spatially separate the extremely oxygen-sensitive nitrogen fixing apparatus from oxygen evolution due to photosynthesis [88]. Not all nitrogen-fixing cyanobacteria are heterocystous, however. Many non-heterosystous cyanobacteria fix nitrogen and employ other strategies for segregating oxygen evolution and nitrogen fixation. For instance, temporal separation between oxygen evolution and nitrogen fixation involves fixing nitrogen at night after oxygen is consumed or during low light when respiration rates exceed oxygen production. The temporal separation of nitrogen fixation and oxygen evolution has been observed in many studies [89, for review see] but some cyanobacteria can fix nitrogen during the day without heterocysts. For instance, the marine cyanobacterium Trichodesmium fixes nitrogen during the day although the mechanisms that allow for aerobic nitrogen fixation by Trichodesmium are still somewhat unclear [89]. Additionally, a unicellular, uncultivated cyanobacterium, UCYN-A, does not possess photosystem II and also fixes nitrogen in the light [90]. Steunou et al. [69] showed that the Synechococcus ecotypes at Octopus Spring temporally separates nitrogen fixation and oxygenic photosynthesis. Nif transcript levels peaked in the evening and nitrogen fixation peaked during low light in the morning and evening. Nitrogen-fixation is not the only light affected activity at the Octopus Spring microbial mats. Light fluctuations are mirrored by a diurnal fluctuation in the oxygen profile of the mat which likely goes (nearly?) anoxic at night. In the day the Synechococcus secrete photosynthate and at night the cyanobacteria ferment storage products synthesized in the light and secrete fermentation products such as acetate [91]. 2.11 summarizes the day/night cycling due to cyanobacteria in microbial mats.

2.3 Microbial Diversity of Modern Stromatolites

The two most popularly studied living stromatolite systems are found in Shark Bay, Australia [92–97] and Highborne Cay, Bahamas [98–102]. Stromatolites have also been

30 Day Night

Photosynthate Acetate

O2

pH

Nitrogen Fixation

Figure 2.11: Cartoon depicting diurnal cycling in a microbial mat.

31 observed in YNP [103, 104], New Zealand [105] and Ruidera National Park, Spain [106]. The stromatolites at YNP and Frying Pan Lake, New Zealand are siliceous, however most modern analogs are found in carbonate platforms. The definition of stromatolite specifies that stromatolites are laminated [107] and all modern stromatolite analogs satisfy this definition to some extent. For instance, at Highborne Cay, Bahamas laminations are caused by seasonal fluctuations and intermittent hiatal periods of stromatolite surface microbial mats [108]. The different capacities of surface mats to trap sediment and the textural differences of the mats themselves produce distinct facies once lithified and buried by later generations of surface microbes. At Shark Bay, Australia laminations are similarly due to microbial action, in contrast, there does not appear to be distinct mat types [109] but a single surface community that migrates outward to out-pace sedimentation and precipitation [96]. Regardless of the mechanism of lamination, modern stromatolites are all created by microbial action [2] and the characterization of the microbial phylotypes is therefore of interest to microbiologists and geologists alike to further understand the microbial underpinnings of stromatolite morphogenesis. Below is a brief review of key living stromatolite microbial diversity studies.

2.3.1 Phylum level diversity of living stromatolites

For this review, microbial diversity studies of stromatolites from Shark Bay [92, 109, 110], Highborne Cay [100, 111] and Ruidera [106] have been selected as model studies for their respective environments. The accession numbers for the sequences selected in each study are reviewed in [2]. The Microbial diversity surveys of living stromatolites reviewed here have produced SSU rRNA sequences spanning 30 phyla. The most abundantly recovered phyla are , Cyanobacteria, and Planctomycetes in that order 2.12. It is difficult to make quantitative comparisons between studies as researchers have employed a variety

32 of probes with different taxonomic coverage. Further, samples from further inside the stromatolites versus the exterior would likely yield more anaerobic phylotypes in SSU rRNA sequence libraries. For example, interior samples taken by [92] showed both methanogen and sulfate-reducing phylotypes. Archaeal sequences were also derived from The Shark Bay stromatolite samples collected by Papineau et al. [109]. Proteobacteria, Cyanobacteria, Planctomycetes, Chloroflexi, Firmicutes, Bacteroidetes and Acidobacteria were found in each library 2.12. Although never comprising over 1.2% of sequences from any individual group in any single library, eleven candidate phyla (phylum level groups without a cultured representative) are represented across all libraries. In fact, the seven libraries cover a broad diversity of life. 2.13 shows the phyla represented in the selected libraries amongst the full diversity of nearly full-length SSU rRNA sequences and 2.14 shows the same tree with non-stromatolite associated sequences removed. The only phyla in the total dataset that is found in each Highborne Cay library but not from the Ruidera or Shark Bay libraries is Spirochaetes. No phyla found in all Shark Bay libraries are not found in either the Highborne Cay libraries or the Ruidera library. The Ruidera study [106] also does not exhibit any phyla unique to its library. The Ruidera library has the lowest phylum-level diversity with only eight phyla.

2.3.2 Beta diversity of living stromatolites

Although the selected stromatolite rRNA gene sequence libraries span multiple years and were conducted by different groups, there is still a signal correlated with location (Highborne Cay, Bahamas; Shark Bay, Australia; Ruidera National Park, Spain) exhibited by each library as shown in a 2.15. The beta diversity metric used in 2.15 is the unweighted Unifrac [59] distance and is a metric that does not factor relative abundance of taxa. Unifrac has been used similarly to compare SSU rRNA gene libraries from different studies and therefore different regions along the SSU rRNA gene [112] to show

33 Figure 2.12: Phylum-level diversity of selected living stromatolite microbial diversity stud- ies. First author of study and publication year are noted on the x-axis. The artificial stromatolite from the Havemann and Foster [111] (denoted 2008b in the figure)) is shown as a separate bar from the samples taken in situ.

34 Figure 2.13: The SSURef102NR Silva database [18] guide tree encompassing the di- versity of all nearly full-length, high quality SSU rRNA sequences submitted to public databases. red clades are those that possess sequences from living stromatolite studies

35 Figure 2.14: Diversity of living stromatolite sequences from selected studies. Clade anno- tations correspond to location of sequence in the Silva database [18] SSURef102 guide tree.

36 that salinity was a key parameter in shaping the membership of microbial communities on a global scale. The samples depicted in 2.15 also fall along a salinity gradient. The Shark Bay stromatolites are found in a hypersaline environment, the Highborne Cay stromatolites in a marine environment and the Ruidera stromatolites are in a freshwater lake. To see if the relative effects of geography and salinity could be delineated in the beta diversity values depicted in 2.15 an additional sample was added to the dataset from a non-stromatolitic but hypersaline library [113]. It appears as if the hypersaline environments cluster together suggesting that salinity may be most responsible for shaping the microbial membership of the living stromatolites (see 2.16). Although further studies would have to be done, these beta diversity analyses pose an interesting hypothesis that salinity beyond geography and the property of growing on or in a stromatolite is most influential with respect to shaping stromatolite microbial membership and questions whether there is a stromatolite defining suite of microbes in any system.

2.3.3 Methods

The following are data analysis methods used in the review of modern stromatolite microbial diversity studies (above)

2.3.3.1 Phylum Level Diversity

Sequences from the reviewed studies were collected from Silva SSUParc111 and taxonomically annotated using the RDP Classifier [49] with the QIIME [45] application controller. The RDP Classifier was re-trained with the Greengenes taxonomic hierarchy and a non-redundant set of SSU rRNA sequences from the Greengenes database [17] with corresponding taxonomic annotations [114] . Using Arb [115] with the Arb-compatible SSURef111 database from Silva [18] Phylum level clades in the database guide tree that matched taxonomic annotations (above) were highlighted.

37 Figure 2.15: Principal coordinate plot of unweighted Unifrac distances between selected stromatolite sequence libraries. Blue denotes Shark Bay, Australia; red Ruidera National Park, Spain; and orange Highborne Cay, Bahamas.

38 Figure 2.16: Principal coordinate plot of unweighted Unifrac distances between selected stromatolite sequence libraries and a library from the Guerrero Negro hypersaline mats [113]. Green denotes Shark Bay, Australia; orange Ruidera National Park, Spain; red Guerrero Negro, Mexico; and blue Highborne Cay, Bahamas.

39 2.3.3.2 Phylogenetic trees

As each study utilized different primers, it is difficult to reconstruct a phylogeny of all the sequences as the alignment coordinates would not sufficiently overlap. Recent advances in tree insertion algorithms have allowed researchers to place sequences into backbone reference trees [116, e.g.]. In this way the relationship between inserted sequences is assessed in the context of longer reference sequences for which a phylogeny is known. Sequences were inserted into the SSURef102NR Silva database [18] using the parsimony insertion algorithm available through Arb [115].

2.3.3.3 Unifrac Principal Coordinates Plot

Sequences from the studies reviewed above were collected from the Silva SSUParc111 SSU rRNA database [18]. Sequences were clustered with UClust [63] with the application controller available in QIIME [45]. Representative sequences from each cluster were (3% identity OTU) were inserted into the Silva SSURef102NR guide tree to create the phylogeny necessary to calculate Unifrac distances [117]. Prior to clustering all SSU rRNA libraries were randomly sampled 10 times to a depth of 120 sequences to account for sequencing depth heterogeneity. The principal coordinates analysis plot represents the variance due to random sampling by displaying a halo around each sample point which encompasses the extents in two-dimensional space that samples moved around [61, for details see] (Halo not visible in 2.15 or 2.16. PCoA visualization of Unifrac Distance matrices was performed using QIIME.

40 CHAPTER 3 REVIEW AND ANALYSIS OF THE TAXONOMIC ORGANIZATION OF CYANOBACTERIA IN SSU RRNA GENE DATABASES

Databases of annotated biological sequences are crucial to the interpretations of metagenomic data. Whether it is a study of metagenomes [118], metatranscriptomes [119], metaproteomes [77] or surveys of SSU rRNA genes [120], distributing biological sequences into hierarchically organized categories is a key step towards describing the biological underpinnings of observed phenomena and the similarities between samples collected over space and time. Additionally, there exists great opportunity to conduct insightful studies using entirely publicly available database sequences [121]. At the rate that sequence information is being produced by biologists there are bound to be misannotations and mistakes that slip by curation when a sequence enters a database. For instance, Tripp et al. [122] found that 23S rRNA gene sequences were misannotated and placed in the NCBI non-redundant protein coding sequence database and that misannotation bred a misannotated group in Pfam. It is imperative that database users understand the organization and curation of any database they choose in order to correctly interpret results. It is also appropriate to periodically review database structure and content. Here we discuss the taxonomic organization of cyanobacterial sequences in SSU rRNA databases. Cyanobacterial systematics has a convoluted history [123]. Originally thought of as plants, cyanobacteria were described and named according to the botanological code. As such, many cyanobacterial names are not officially recognized by the bacteriological code [124]. Further confusion stems from a significant number of cyanobacterial cultivars being misannotated and improperly added as members of existing genera (discussed in Wilmotte and Herdman [1]). As the names of cyanobacterial cultivars can

41 often be unreliable, it is difficult to evaluate the relative molecular phylogeny of key genera. The SSU rRNA gene has long been established as a useful phylogenetic marker [9]. Although novel cyanobacterial SSU rRNA gene sequences and diversity are steadily accumulating in public databases (see 3.1 and 3.2) there has not been a recent effort to establish a well-supported topology of the 16S SSU rRNA gene cyanobacterial phylogeny. Past studies of cyanobacterial 16S phylogeny have elicited conflicting results and the use of different collections of sequences makes it difficult to reconcile topological differences. Moreover, the consensus state of the organismal, systematic taxonomic outline for cyanobacteria is in flux [125, 126]. The unsettled nature of cyanobacterial phylogeny and systematics seems to have bled over into the SSU rRNA databases as there is little organizational consistency aspects of cyanobacterial between three major SSU rRNA gene databases, the Ribosomal Database Project (RDP) [127], Silva [18] and Greengenes [17]. Most differences between the SSU rRNA databases are likely due to the inconsistent sources from which each database derives its taxonomic nomenclature for cyanobacteria. 3.1 summarizes the foundation of each of the three aforementioned cyanobacterial taxonomic hierarchies.

Table 3.1: Summary of Nomenclature Sources for Major SSU rRNA Databases

Database Foundation for Cyanobacterial Taxonomic Nomenclature RDP Wilmotte and Herdman [128] Silva Castenholz [125], Euzeby [129] Greengenes Komarek and Hauer [130]

Culture-independent surveys have been, and, with the dropping cost of DNA sequencing are projected to remain, popular methods to probe microbial ecology. A standard step when analyzing SSU rRNA libraries is to “classify“ the likely organisms from which the sequences were derived [49]. Classifications can act as ecological units for categorical comparisons between samples [131], a taxonomic annotation to an

42 Figure 3.1: Cumulative sum of cyanobacterial 16S sequences submitted to public se- quence databases.

43 Figure 3.2: Cumulative sum of cyanobacterial 16S OTUs submitted to public sequence databases. 3% and 5% are typically used to denote “species”-level and “genus”-level microbial taxa, respectively.

44 Operational Taxonomic Unit (OTU) of note [120], as well as a preliminary binning method for collecting sequences from a particular group for a study of biogeography [132]. Methods for classifying sequences range from simple nearest neighbor strategies using searching algorithms such as BLAST [133], to progressive refinements of BLAST results that take into account numerous closely matching reference sequences [134], and alignment-independent algorithms based on overlaps of nmer-catalogues derived from reference and query sequences [49]. Regardless of the algorithm chosen for classification, the quality of the underlying ”training“ database will influence the reliability of classification results [114, e.g.]. We present an analysis of cyanobacterial 16S gene taxonomic organization that will inform researchers of the limitations of current systems as well as help researchers to understand the relationships between taxonomic groups recognized by major SSU rRNA gene databases.

3.1 Results

The following are results of investigations of SSU rRNA databases. 3.2 summarizes which database versions were used for each analysis.

Table 3.2: Database versions used in this study.

Database Version Silva SSURef111 RDP Full Release Version 10 and RDP Classifier training set Version 9 Greengenes Sequences used by McDonald et al. [135]

3.1.1 Review of taxonomic organizations

The different nomenclatural foundations for the RDP, Silva and Greengenes SSU rRNA databases (see 3.1 has led to three starkly different taxonomic organizations.

45 3.1.1.1 Broad groups

Silva, based on Castenholz [125], distributes cyanobacteria into five “subsections” and one subgroup that contains the Chlorogloeocystis genus. Additionally, SSURef111 shows many basal clades outside of typical cyanobacterial diversity classified as cyanobacteria that are seemingly candidate classes. The candidate classes will be omitted from the following discussion but reviewed later. As shown in 3.3, the subsections are not phylogenetically coherent in the Silva SSURef111 guide tree. Subsection IV appears to be the only coherent group. SubsectionI contains the most sequences (3312) followed by subsections III (1903) and IV (1595). Comparatively, subsections II, V and SubgroupI have significanly fewer sequences (199, 116, 7, respectively; see 3.5). Greengenes organizes cyanobacterial 16S gene sequences into four classes in addition to two candidate classes that will similarly omitted from this discussion. The Greengenes classes appear to be more phylogenetically coherent than the Silva subsections based on the Greengenes guide tree (see 3.4). Polyphyly is still apparent in the major groups although the Nostocphycideae class is monophyletic. Similarly to the Silva distribution, the number of sequences per major group for Greengenes is uneven (shown in 3.6). The Synechococcophycideae class possesses the most sequences (2072) followed by the Oscillatoriophycideae (985) and then the Nostocphycideae (940). Only 5 sequences are categorized in the Gloeobacterophycideae. The RDP only organizes cyanobacteria into two levels beyond phylum. The broader of the two levels has 14 groups and the level below has 15 groups. Essentially, 14 of the 15 broad groups span both subphylum levels and only one broad group is further divided into two sub-groups. The RDP does not supply a guide tree. The immediate subphylum groups in the RDP and the level after is just called “group” (e.g. FamilyI;GroupI). 3.7 summarizes the distribution of sequences into the RDP taxonomic hierarchy for Cyanobacteria. RDP adheres to a strict six-rank taxonomic hierarchy. Within the hierarchy Cyanobacteria/Chloroplast sequences are organized into a Phylum,

46 Cyanobacteria alone is a class and following is the “family” group. The highest resolution classification which would typically correspond to the genus level (level six) for other bacterial sequences in the RDP is where the “group” names fall.

3.1.1.2 Genera

Silva recognizes 64 genus names. Synechococcus possesses the most sequences in the Silva system, 1363, followed by Prochlorococcus (1102 sequences). There is a significantly skewed distribution of sequences per genus in the Silva database (see 3.8). Greengenes recognizes less genera, 52, but shows a similarly skewed distribution of sequences per genus. The most abundant genus annotation in the Greengenes database is Prochlorococcus, 1706 sequences, followed by Nostoc, 249 sequences, and Dolichospermum, 233 sequences (see 3.9). Silva does not incorporate the Dolichospermum name in its taxonomic hierarchy. There are comparatively fewer Synechococcus annotations in Greengenes where Synechococcus is the twelfth most abundant annotation, 54 sequences, versus Silva where it is second (see above). The union of the Silva and Greengenes list of genus names is 74 names long but the intersection of the databases is only 42 genera. Greengenes includes 10 genus names that Silva does not whereas Silva includes 22 names not found in Greengenes. The RDP, as discussed above, does not subdivide its immediate subphylum groups with one limited exception being the subdivision of FamilyII into groups IIa and IIb. The RDP does not incorporate any of the historical cyanobacterial genus names in its taxonomic system.

47 Figure 3.3: Silva Guide tree of cyanobacterial 16S gene sequences (SSURef111) colored and collapsed by “class” annotations. Width of clades denotes number of sequences within the clade.

48 Figure 3.4: Greengenes Guide tree of cyanobacterial 16S gene sequences colored and collapsed by “class annotations. Width of clade denotes number of sequences within the clade.

49 Figure 3.5: Counts of sequences in broadest suphylum cyanobacterial groups in the Silva (SSURef111 taxonomic hierarchy.)

50 Figure 3.6: Counts of sequences in the cyanobacterial classes in the Greengenes taxo- nomic hierarchy.

51 Figure 3.7: Counts of sequences in RDP cyanobacterial ”groups“.

52 Figure 3.8: Counts of sequences in Silva genera.

53 Figure 3.9: Counts of sequences in Greengenes genera.

54 3.1.2 Increase in known diversity by addition of sequences to public databases

A significant number of cyanobacterial 16S rRNA gene sequences get submitted to public databases yearly (3.1). Additionally, new 16S gene sequence submissions add to the known diversity of cyanobacteria 16S gene sequence space (3.2). 3.2 shows that not only are new cyanobacterial sequences being incorporated in SSU rRNA gene sequences databases but that the new sequences are forming novel OTUs at both 97% and 95% sequence identity thresholds. Further, the majority of cyanobacterial 16S rRNA gene 97% and 95% sequence identity OTUs do not possess a cultured representative (see 3.10). The accumulation of diversity within Cyanobacteria tracks well with the accumulation of 16S rRNA gene diversity within more broad groups like bacterial phyla and the Archaeal domain [9, 136].

3.1.3 Analyses of the width of taxonomic levels

As expected, 3.11 and 3.12 show that higher-resolution taxa in the hierarchy of the Silva and Greengenes taxonomic systems are generally narrower phylogenetically. Notably, however, significant overlap between the groups on different levels in the taxonomic hierarchy (e.g. some genera have the same minimum inter-taxon identity as subsections of classes) can be observed in the overlapping level-specific cumulative histograms of minimum inter-taxon identity (metric for phylogenetic group width [53]) (see 3.11 and 3.12). The Silva system appears to have wider groups as compared to corresponding levels in the Greengenes system. The RDP groups are more consistent in width to each other than any taxon level in the Silva or Greengenes taxonomic hierarchies (3.13). The RDP Classifier training set which is a subset of the full RDP database meant to be used for training the RDP Classifier algorithm appears to be consistent in terms of general taxonomic group phylogenetic width with the full RDP database.

55 Figure 3.10: Cultured representation in cyanobacterial OTUs.

56 Figure 3.11: Cumulative taxonomic level-specific cumulative histograms of minimum inter- taxon identity for taxonomic groups in the Silva SSURef111 SSU rRNA gene database.

57 Figure 3.12: Cumulative taxonomic level-specific cumulative histograms of minimum inter- taxon identity for taxonomic groups in the Greengenes SSU rRNA gene database.

58 Figure 3.13: Cumulative taxonomic level-specific cumulative histograms of minimum inter- taxon identity for taxonomic groups in the RDP SSU rRNA gene database.

59 3.1.4 Consistency of high resolution classifications with common clustering methods

Normalized Mutual Information (NMI) has recently been used to evaluate the consistency of OTU assignments with taxonomic annotations in an effort to compare clustering algorithms and bioinformatics methods for 16S rRNA gene sequence analysis [56]. Here NMI was used similarly, not to evaluate clustering, but to measure the consistency between database cyanobacterial taxonomic annotations and typical 16S rRNA gene OTU assignment methods (see 3.14). Additionally, we employed a purity metric to further assess how annotations track with OTU assignment. The NMI scores for the Silva database genus annotations peaked at the 97% identity threshold for OTU assignment and did not reach as high of values as either Greengenes genus or RDP group annotations. The Silva NMI peak was placed at a higher sequence identity clustering threshold suggesting the Silva genera are more narrow phylogenetic groups than the Greengenes genera or the RDP groups. The NMI scores were inconsistent with the cumulative histograms of minimum inter-taxon identity for the Silva database that showed the Silva taxonomic groups to be generally wider than corresponding groups in Greengenes. The Greengenes genus NMI scores peaked at the 95% sequence identity threshold for OTU assignment and reached approximately as high a value as the NMI peak for the RDP Classifier training set. The higher NMI scores for Greengenes and the RDP Classifier training set as compared to the Silva genus scores suggests that Greengenes and RDP taxonomic annotations are more consistent with standard OTU assignment methods than Silva. The full RDP training set showed lower NMI scores than the RDP Classifier subset but peaked at the same point, 91% sequence identity. The RDP “Groups” appear significanly wider than Silva or Greengenes genera based on clustering and annotation consistency. Purity can be use to assess the homogeneity of clusters with respect to an independent classification. As shown in 3.15 purity values for clusters support the finding

60 that Silva genera are generally narrower in phylogenetic scope than the Greengenes genera or the RDP groups. Purity scores for all three database systems begin to drop at the point NMI scores peak. The Silva purity scores do not reach a value 1.0 even at high identity clustering thresholds suggesting that some sequences sharing high sequence identity are placed into different taxonomic groups in the Silva system.

3.1.5 Additional considerations

Both the Silva and Greengenes databases recognize candidate cyanobacterial classes that fall outside what is generally considered the cyanobacterial clade rooted by Gloeobacter violaceus (3.3 and 3.4). Silva recognizes five such candidate classes and Greengenes recognizes two. The ’description’ field for cyanobacterial candidate class sequences in the Silva SSURef111 database includes many entries related to feces samples. It appear that the candidate class microbes are commonly found in gut microbiomes. In fact, one of the first studies to discuss the peculiar phylogenetic placement of sequences near but not in Cyanobacteria is a study of mouse gut microbiomes [137]. The types of environments where the candidate classes are found suggest that the 16S sequences were derived from a non-phototroph and could represent the closest non-phototrophic relative to oxygenic phototrophs although it is not clear that the candidate classes should be considered part of the Cyanobacteria phylum. The RDP, Silva and Greengenes all incorporate Chloroplast within the Cyanobacteria. It should be noted that simple grouping algorithms that collapse annotations into group counts at a given taxonomic level may count chloroplast sequences as part of the cyanobacteria group. This may or may not be desirable to the researcher depending on the application. Also, the candidate classes discussed above may be counted as cyanobacteria given their placement in the Silva and Greengenes taxonomic hierarchies.

61 Figure 3.14: NMI scores for Silva and Greengenes genus as well as RDP group annota- tions over a range of OTU assignment sequence identity thresholds.

62 Figure 3.15: Purity scores for Silva and Greengenes genus as well as RDP group anno- tations over a range of OTU assignment sequence identity thresholds.

63 3.2 Discussion

To produce the taxa abundance distributions that are a pillar of biological diversity research [8] organisms must first be placed and counted into ecologically relevant categories. The phylogenetic groups within cyanobacteria based on the taxonomic systems of major SSU rRNA gene sequence databases are not of uniform width within any given taxonomic level of resolution. This is not a curation mistake but likely due to uneven sampling of diversity by microbial ecologists. When classifying SSU rRNA gene sequences from a culture-independent study of a microbiome the resulting taxonomic counts will share the bias of the reference data utilized by the classification algorithm. Therefore, database bias should be considered when utilizing categorical counts based on cyanobacterial taxonomic annotations to surmise taxa abundance distributions. Although it has been noted before that the major SSU rRNA gene sequence databases do not have entirely overlapping namespaces, to our knowledge namespace discrepancies have not been discussed beyond the phylum level. Cyanobacterial SSU rRNA gene sequences are perhaps the most severe example of database organization differences as the broadest phylogenetic groups within Cyanobacteria for the RDP, Silva and Greengenes share no namespace at all. At higher levels of resolution the Silva and Greengenes databases have names in common but still many are unique to one database or the other and RDP employs an entirely different organization. It is difficult to assess the performance of the databases with respect to taxonomic curation of cyanobacteria relative to each other without an accepted ground truth by which to judge them. However, we can surmise that the RDP and Greengenes have taxonomic classifications that are more consistent with standard OTU assignment methods utilized in microbial diversity studies. Additionally, the Silva database shows lower purity scores for taxonomic annotations even within closely related groups. Keeping the low purity scores in mind and the narrow genus groupings suggested by NMI scores versus the lower minimum intra-taxon identities as opposed to

64 corresponding Greengenes taxonomic groups, it appears that the Silva database has more non-matching taxonomic annotations between pairs of closely matching sequences and perhaps identical taxonomic annotations between divergent sequences. This is not likely a curation mistake, however, but rather a product of forcing an inflexible taxonomic system on a non-matching phylogeny and/or utilizing sequence annotations for cultured strains from original sequence submissions that are notoriously error-prone. Nonetheless, database curators provide an essential service without which metagenomic studies would not be possible and they tackle the difficult problem of reconciling microbial systematics with molecular phylogenies.

3.3 Methods

The following are the bioinformatic methods used to produce the analyses and figures above.

3.3.1 Cumulative cyanobacterial sequences

To find the cumulative number of cyanobacterial 16S gene sequences being deposited in public databases yearly all sequences that were classified as “Cyanobacteria” by at least two of the three databases reviewed base on annotations in Silva SSURef111 were selected. The submittal date for each selected sequence was taken from the “submit date” field for Silva SSURef111 entries. To estimate the cumulative yearly sum of new cyanobacterial OTUs cyanobacterial sequences (above) were clustered using distances based on the Silva SSURef111 alignment and clustered at the indicated thresholds with an average neighbor algorithm. The minimum submit date for each OTU was considered the date the OTU was first deposited. Mothur [46] was used to calculate distances (each indel counted as one column, terminal gaps were considered) and cluster as well as to trim the alignment to coordinates such that all sequences fully overlapped before distance calculations.

65 3.3.2 Sequence alignment and clustering

All alignments were done using the NAST [138] aligner incorporated into the Mothur [46] software. The Silva reference alignment for bacteria as made available on the Mothur webpage was used as the alignment template. Identity values were calculated as follows:

Identities columns Where one indel counted as one column and terminal gaps were considered. Hierarchical clustering was done using the average-neighbor algorithm in the Mothur software. Prior to distance calculations all multiple sequence alignments were trimmed to alignment coordinates such that all sequences overlapped from beginning to end.

66 CHAPTER 4 CYANOBACTERIAL CONSTRUCTION OF HOT SPRING SILICEOUS STROMATOLITES IN YELLOWSTONE NATIONAL PARK

This chapter was published in the journal Environmental Microbiology (2012):

• Pepe-Ranney, C., Berelson, W.M., Corsetti, F.A., Treants, M., and Spear, J.R. (2012). Cyanobacterial construction of hot spring siliceous stromatolites in Yellowstone National Park. Environmental Microbiology, 14(5), pp1182-1197.

CPR did the field work, molecular laboratory work, bioinformatic analysis of 16S sequence libraries and wrote the manuscript. WMB, WMC and JRS helped edit the manuscript. JRS aided with field work and discovered the stromatolite. MT aided with construction of clone libraries. Stromatolites are laminated, accretionary structures commonly interpreted as a lithified manifestation of microbial life [107] and are observed in rocks as old as 3.4 billion years [139]. Stromatolites were common in shallow marine environments of Precambrian age, but became much more rare after the diversification of metazoan life, ca. 542 million years ago [140]. “Living” marine stromatolites found in Shark Bay, Western Australia [94, 95, 109] and Highborne Cay, Bahamas [98–100, 108, 141–143] lend insight into the biological significance of ancient forms. The microstructure of the modern marine examples, however, is rather coarse compared to most ancient forms [94, 144, 145]. Berelson et al. [145] reported finely laminated stromatolites growing in a hot spring in Yellowstone National Park (YNP), Wyoming, that constitute a somewhat better textural analogue to ancient stromatolites versus the modern marine examples. Berelson et al. [145] focused on growth rates and a formal description of the stromatolites from a geological perspective. Here, we focus on the molecular biology, environmental context,

67 and evolutionary importance of the microbial communities building the finely-laminated stromatolites from YNP. Most ancient stromatolites are composed of calcium carbonate, whereas the YNP examples described here are siliceous. Early lithification is crucial for a mat to form a stromatolite over time, regardless of the geochemistry of the system. In YNP, early lithification occurs as silica precipitates in association with the mats, thus forming an excellent textural, if not geochemical, analogue for many finely-laminated ancient stromatolites. Moreover, living [146] and ancient [147, 148] carbonate stromatolites appear to be formed by alternating layers of cyanobacterial filaments oriented vertically and prostrate, similar to the stromatolites discussed in this study. Thus, the mat behaviors are similar, even if the early lithification is provided by silica or carbonate. The living, siliceous stromatolites of this study grow around the rim of a hot spring in shallow water centimeters in depth. The stromatolites are built by accretion of silica-encrusted cyanobacterial mats. The bulk of the structure is comprised of silica tubes that are the remains of filamentous cyanobacteria [145] (see 4.1H). Fine, distinct laminations are the result of the uniform but alternating growth orientation of cyanobacterial sheaths, closely resembling a pattern described for another siliceous sinter at an unspecified YNP hot spring [94]. Specifically, silica-encrusted filaments in a single layer are either oriented sub-normal or sub-parallel to the surface and the textural differences between normal and parallel orientation gives rise to light and dark laminations visible to the naked eye [145]( 4.1A). The microstructure also resembles that described by Jones et al. [149] insofar as it is comprised of lithified filaments in alternating orientation. The objective of this study is to present the molecular phylogenetic makeup of the cyanobacteria in stromatolite-building mats. Molecular tools have been used to elucidate microbial diversity of other modern analogs. For instance, community characterizations of microbial populations in stromatolites by 16S rRNA gene sequence surveys have

68 Figure 4.1: Selected images of stromatolite facies. All scale bars are 10 m unless oth- erwise indicated. A. Cross-section of stromatolite from which CDM samples in August of 2006 were collected. Gradations in scale bar across bottom of image are millimetres. B. Cartoon depicting the location of facies in cross-section A. C. Stromatolite from image Ain situ. D. Autofluorescence image of predominant morphotype in SFM. E. Thin sec- tion image of Main Body lithofacies. F. Autofluorescence image of predominant phylotype in CDM samples from August of 2006. G. Thin section image of Drape lithofacies. H. Electron micrograph of silica tubes that comprise Main Body lithofacies.

69 shown that microbial community membership correlates well with stromatolite shape in the Shark Bay system [109], and that species richness increases with, and may be linked to, lithification in the Highborne Cay system [100]. In this study, information from a 16S rRNA gene sequence molecular survey of stromatolite mats was used to (1) investigate the cyanobacterial membership profiles of stromatolite-building mats and (2) describe the phylogeny of stromatolite-associated cyanobacteria. We report on rare biosphere membership relationships between different surface mat types and a novel cyanobacterial phylotype that is constructing the predominant stromatolite lithofacies. Although microbial cell morphology is commonly conservative (the specific genetic component of Precambrian stromatolites will likely never be known), understanding the molecular taxonomic composition of a modern example that most closely resembles the texture of the ancient stromatolites represents an important step in understanding stromatolite morphogenesis.

4.1 Results

Centimeter scale living stromatolites with sub-millimeter lamination grow around the rim of a hot spring in upper Hayden Valley in Yellowstone National Park (YNP) [145]. The water in the hot spring is 56 circC during the day and the pH is 5.7. pH does not fluctuate significantly, however temperature varies with wind and weather conditions [145]. The general water chemistry of the hot spring has been reported by Spear et al. [150] (see Obsidian Pool Prime). We measured combined nitrogen on-site to assess whether the hot spring communities could rely on nitrogen from hot spring water or had to fix

+ − − atmospheric nitrogen for growth. The NH4 , NO2 and NO3 concentrations in the hot spring were all below the detection limit (0.20 ppm as N, 0.080 ppm as N and 0.10 ppm as N, respectively) when measured in October of 2010. The sulfide concentration of water near the perimeter of the north end of the hot spring was measured in August 2008 at 0.224 ppm. Most stromatolites appear to be growing from the rim of the pool inward,

70 but isolated stromatolites away from the shore grow upward and outward in all directions. The stromatolites grow by accretion of silica-encrusted cyanobacterial mats [145] and our study focuses on yet-to-be encrusted mats. Two morphologically distinct cyanobacterial surface mats have been observed as part of the stromatolites; one possesses short coccoid chains and branching-filamentous cyanobacterial morphotypes in addition to silica diatom frustrules and is found forming a layer of the stromatolite when the structures protrude from the water. The other has non-branching cyanobacterial filaments lacking heterocysts and is found when stromatolites are submerged or at the water surface. The mats with coccoid chains and branching morphotypes could be easily smeared off of stromatolites and were texturally less cohesive and fabric-like than the non-branching filamentous mats which tightly adhered to surfaces as a cohesive fabric of uniform thickness. Here we call the exposed Mat type with Cocci/Diatom morphotypes CDM and the Submerged Mat type with the non-heterocystous Filamentous morphotype SFM. To study the microbial communities of surface mats with molecular methods, we collected samples of each mat type. Specifically: two samples were of the CDM, each from an exposed but moist mat on a stromatolite elevated just above the water (see 4.1C); and two samples were collected on separate visits to the hot spring from the SFM on mostly submerged stromatolites. Samples of the SFM were collected in different seasons (winter and summer). Greater coverage of the cyanobacterial mats around the hot-spring have been observed in all winter trips to the field site. It appears as though the lower light intensities in addition to the lower ambient temperatures and possibly reduced grazing during the winter are more suitable for the cyanobacteria. All samples were collected in the daytime and it is not known how community structure shifts over the short-term (i.e. throughout the day and night). Microstructure observations by thin section and scanning electron microscopy (SEM) revealed two lithofacies inside the stromatolites [145] that correspond to the surface

71 mats. For clarity, this study discusses samples from the two types of surface mats on the stromatolites (CDM and SFM above). Each type is eventually silica-encrusted and preserved as a distinct lithofacies. Morphotype similarities between each mats cyanobacteria and each lithofacies’ silica forms (see 4.1D-G) allowed us to discern the relationships between mats and lithofacies. The bulk of the stromatolites interiors can be categorized into a lithofacies composed of non-branching silica filaments (called the Main Body Facies by Berelson et al. [145]) ( 4.1E) which corresponds to the SFM. The second lithofacies, referred to by Berelson et al. [145] as the Drape Facies, corresponds to the CDM ( 4.1G). Bacterial small sub-unit (SSU) rRNA gene libraries of pyrosequences and nearly full-length Sanger sequences were produced for diversity and phylogeny investigations, respectively. Pyrosequence library names and descriptions are summarized in 4.1 and methods; all library names indicate the mat type of the librarys sample (CDM or SFM). Cyanobacteria comprise the most abundantly found phylum in each pyrosequence library (4.2A) and the cyanobacterial 16S rRNA sequences from each surface sample are of predominantly one phylotype (see 4.2B). When mat samples are viewed under the microscope it appears cyanobacteria comprise a greater fraction of biomass than representation in 16S rRNA gene sequence libraries indicate. The dominant phylotype of the SFM (4.2B) corresponds to the dominant SFM morphotype by qualitative comparisons of phylotype and morphotype distributions and is likely preserved as the silica tubes, essentially building the stromatolites.

4.1.1 Predominant Cyanobacterial Phylotypes

CDM Samples. Libraries PCDM0806a and PCDM0806b are dominated by one major and one minor phylotype (4.2B). 4.3 shows the distribution of non-singleton/doubleton cyanobacterial sequences in PCDM0806a and PCDM0806b. As shown in 4.2B, two lineages possess the majority of the cyanobacterial membership

72 Table 4.1: Sample and corresponding library information.

Mat Sequence Sequence Sample Collection facies info library description date # of seqs, name avg length(s.d) CDM 1800, 233(17) PCDM0806a Surface mat above water August 2006 surface with coccoidal and filamentous, hetero- cystous, true-branching cyanobacterial morpho- types CDM 2063, 233(25) PCDM0806b same as above August 2006 SFM 2044, 225(25) PSFM0209 Surface mat at water’s February 2009 surface dominated by filamentous cyanobacte- rial morphotype without heterocysts SFM 1706, 217(28) PSFM0809 Same as above August 2009

Figure 4.2: A. The bar chart depicts the distribution of pyrosequencing reads into phyla for each pyrosequence library. B. Rank abundance plot of cyanobacteria 97% identity OTUs in the combined PSFM and PCDM libraries respectively.

73 in each CDM pyrosequence library (99% and 98%). The most abundant phylotype in both samples (85% and 92% of total cyanobacterial sequences) is shown as the red cluster of 4.3; the average pairwise identity in this cluster is 99.22%. All sequences in the red clade have the same nearest neighbor by BLAST (97.86% - 99.57% identity) in the Silva SSURef104 database [18], a database of near full-length, annotated and curated SSU rRNA gene sequences. The nearest-neighbor is a 16S sequence from a cultivar in the Pasteur Culture Collection (PCC) annotated as Chlorogloeopsis HTF sp. PCC 7518 (Accession X68780) [151]. Cyanobacteria in the Chlorogloeopsis HTF are heterocystous and are arranged in short chains or aggregates of coccoidal cells [125]. Morphotypes from CDM samples fitting this description have been observed by microscopy (4.1). The second most abundant phylotype in the CDM libraries (14% and 6% of total cyanobacterial sequences in the pyrosequence libraries) has 100% sequence identity to a 16S rRNA gene sequence from an unpublished study of microbial communities in Australia’s Great Artesian Basin (Accession AF407696). The sequence also has 99.09% identity by BLAST to Fischerella (Mastigocladus laminosus) cultivars isolated from Costa Rica hot springs (Accession DQ786171, [152]). True-branching filaments characteristic of the Fischerella genus [125] have been observed by microscopy in our CDM samples. 4.4 shows the phylogenetic placement of near full-length sequences that represent the two most abundant phylotypes of CDM pyrosequence libraries.

SFM Samples. 93% and 82%, respectively, of cyanobacterial sequences in the SFM libraries PSFM0809 and PSFM0209 fall into one cluster with 98.77% average intra-cluster identity; this cluster is colored blue in 4.3. Each sequence in this dominant cluster has the same nearest neighbor (98.6% to 100% identity) by BLAST in Silva SSURef104. The nearest-neighbor is an unpublished sequence from a travertine hot spring in SW Japan (Accession AB518480) and is not classified beyond the phylum level by Greengenes [17], the Ribosomal Database Project (RDP) [19], or Silva. The SW Japan hot spring displays carbonate geobiological structures interpreted to be

74 stromatolites, although the noted sequence is unpublished and it is unclear whether it is associated with a microstructure organization that is comparable to the YNP stromatolites. Some rare sequences (<3% of total sequences) in the SFM libraries are the predominant CDM phylotype. Conversely, no sequences from the CDM samples fall into the predominant cluster of the SFM samples (4.3). To assess the phylogeny of the predominant phylotype of PSFM0809 and PSFM0209, nearly full-length 16S rRNA gene sequences were produced (Sanger sequenced) from the samples. One sequence, with optimal length (1364 bp) and quality (average QScore, 54.4), that shared 98.61% identity with the most abundant PSFM pyrosequence library sequence and up to 100% identity with other sequences in the predominant cluster (blue cluster 4.3) was selected to represent the predominant SFM phylotype. This near full-length representative sequence will hereafter be referred to as SFM-seq. Sequences in Genbank [153] and SSUParc106, a curated database of all SSU rRNA gene sequences (short and long) from public repositories of biological sequences, that share high identity (97% or greater) to SFM-seq are summarized in 4.2 and include one cultivar ”Microcoleus” sp. PCC 8701 (Accession AY768403), that was isolated from a sulfur hot-spring in Amelie Les Bains, Fance (PCC) in addition to several other travertine mat sequences from a Japan hot-spring. We determined the phylogenetic placement of SFM-seq and its SW Japan travertine near-neighbor by reconstructing their phylogeny in the context of a broad reference set of cyanobacterial 16S rRNA gene sequences. Although cyanobacteria have historically been classified into five subsections, the topology of cyanobacterial 16S phylogeny has shown more than five basal lineages [1, 154, 155]. We chose to investigate the phylogeny of SFM-seq in the context of the cyanobacterial groups recognized by the RDP [19] that are based on the topology described by Wilmotte and Herdman [1]. The phylogenetic placement of SFM-seq and the SW Japan travertine sequence is outside all

75 Table 4.2: Sequences in SSUParc106 and/or Genbank with at least 97% identity to SFM- seq

%ID Accesion Full name Study title Description to Number of in in in PubmedID SFM-seq top hit SSUParc106 SSUParc106 SSUParc106

99.91 FJ885932 uncultured cyanobac- Variable community structures in mi- Uncultured cyanobacterium clone terium crobial mats of the mesothermic O1UDE03 16S ribosomal RNA border of hot spring pools in YNP gene, partial sequence.

99.81 AY768403 Microcoleus sp. PCC Cyanobacterial natural products Microcoleus sp. PCC 8701 16S 8701 genes: a source of novel genes for ribosomal RNA gene, partial se- creating a metabolically engineered quence. microbe

99.02 AB614537 N/A Laminated microbial mat on Uncultured bacterium gene for 16S Nagano-yu travertine3 rRNA, partial sequence, clone: S1- 1014

98.62 AB614557 N/A Laminated microbial mat on Uncultured bacterium gene for 16S Nagano-yu travertine3 rRNA, partial sequence, clone: S2- 524

98.51 EF660480 uncultured bacterium A survey of the alkalithermophilic Uncultured bacterium clone F26 prokaryotic diversity from the hot 16S ribosomal RNA gene, partial spring waters in Coamo Puerto Rico sequence.

98.51 EF660494 uncultured bacterium A survey of the alkalithermophilic Uncultured bacterium clone F60 prokaryotic diversity from the hot 16S ribosomal RNA gene, partial spring waters in Coamo Puerto Rico sequence.

98.32 AB614561 N/A Laminated microbial mat on Uncultured bacterium gene for 16S Nagano-yu travertine3 rRNA, partial sequence, clone: S2- 604

98.21 AB518480 uncultured bacterium Microbial process forming daily lam- Uncultured bacterium gene for 16S ination in an aragonitetravertine, Na- ribosomal RNA, partial sequence, gayu hot spring, SW Japan clone: H2

98.12 AB614539 N/A Laminated microbial mat on Uncultured bacterium gene for 16S Nagano-yu travertine3 rRNA, partial sequence, clone: S1- 1064

98.02 AB614535 N/A Laminated microbial mat on Uncultured bacterium gene for 16S Nagano-yu travertine3 rRNA, partial sequence, clone: S1- 434

97.91 DQ146324 uncultured Nostoc Watering, fertilization, and slurry in- Uncultured Nostoc sp. isolate 16710791 sp. oculation promote recovery of bio- DGGE band 24 16S ribosomal RNA logical crust function in degraded gene, partial sequence. soils

97.71 DQ146332 uncultured Lyngbya Watering, fertilization, and slurry in- Uncultured Lyngbya sp. isolate 16710791 sp. oculation promote recovery of bio- DGGE band 14 16S ribosomal RNA logical crust function in degraded gene, partial sequence. soils

97.01 DQ146331 uncultured Lyngbya Watering, fertilization, and slurry in- Uncultured Lyngbya sp. isolate 16710791 sp. oculation promote recovery of bio- DGGE band 13 16S ribosomal RNA logical crust function in degraded gene, partial sequence. soils

1 %ID is the default identity definition in USearch version 4.1.93 2 %ID is the identity definition for NCBI BLAST 3 Study title in Genbank 4 Sequence description in Genbank

76 recognized subphylum cyanobacteria groups in the RDP taxonomy suggesting the sequences constitute a novel basal lineage in the cyanobacteria (4.5A). An analysis of the diversity of cyanobacteria found in eight other 16S rRNA surveys of living stromatolites reviewed by Foster and Green [2][95, 98, 100, 106, 111, 156, 157] shows that a wide diversity of cyanobacteria are found to be associated with living stromatolites (4.6). There does not appear to be any clear relationship between the major phylotypes found in this study and the cyanobacteria 16S rRNA sequences generated in surveys of other modern stromatolite analogues. There does appear to be a faint correlation between cyanobacterial membership with qualitative salinity (freshwater versus marine/hypersaline) in living stromatolite systems. Salinity has shown to have a high correlation with community membership on the global scale [112]. No published 16S rRNA gene sequences from living stromatolite systems show significant identity to SFM-seq (see 4.6).

4.1.2 Major Non-cyanobacterial Phylotypes

As shown in 4.2, the pyrosequence libraries have overlapping membership at the phylum level in most of the abundantly found phyla. One clear difference is the low OP10 membership of 16S rRNA gene sequences in the SFM libraries. Only one OP10 16S rRNA gene sequence was recovered in each SFM library. OP10 is found in relatively higher abundance in the CDM libraries (7% and 3% of total sequences, 4.7). However, when examining the distribution of sequences in this abundantly found phyla at higher resolution, there are several differences in the membership and structure of the CDM versus the SFM. Both the SFM and the CDM have members in the Bacteroidetes, OP10, Chlorobi and Chlorflexi groups, yet the most abundant operational taxonomic units (OTUs) with each of those phyla are not consistent between the SFM and the CDM (see 4.7). For instance, 82 and 92% of of Chlorobi sequences in PCDM0806a and PCDM0806b are in the same 97% identity OTU but none of the SFM sequences fall into

77 this OTU. Interestingly Chlorflexi appears to be a major member of many cyanobacterial mats in YNP based on 16S rRNA gene surveys. [3, 68] but only comprises a maximum of 9% of sequences in any pyrosequence library generated in this study. The most abundant non-cyanobacterial phylum in the pyrosequence libraries is Deinococcus-Thermus (4.7 and 4.1.3). The most abundant 97% identity OTU within Deinococcus-Thermus is closely related to Meiothermus silvanus an aerobic heterotroph isolated from hot-springs in northern Portugal (Tenreiro 1995).

4.1.3 Alpha Diversity

Rarefaction curves show richness of the CDM is less than the SFM (4.8). This trend holds for cyanobacterial specific richness as well. 4.4 summarizes the alpha diversity predictions of each sample as calculated by CatchAll [158]. The slopes of rarefaction curves for all pyrosequence libraries seem to be approaching asymptotes indicating the majority of the diversity in each sample has been recovered. Parametric richness estimates of entire pyrosequence libraries from each sample, however, indicate a significant amount of OTUs still have not been observed. In contrast to the total alpha diversity, coverage of just unique cyanobacterial sequences is greater. Only 13 unique cyanobacterial singletons are in the combined pyrosequence libraries. Extrapolating from the distribution of cyanobacterial reads combined for each mat type using CatchAll, only an additional 6 and 13 cyanobacterial sequences from the CDM and SFM, respectively, are predicted to be unobserved in the dataset (4.4).

Table 4.3: Best BLAST hit in SSURef104 to major non-cyanobacterial OTUs

BLAST Accesion Full name Type Study title Description OTU ID1 %ID to Number of in strain? in in PubmedID top hit2 top hit2 SSURef104 SSURef104 SSURef104

3 90.35 CP000686 Roseiflexus sp. Complete sequence Roseiflexus sp. RS- RS-1 of Roseiflexus sp. 1, complete genome. RS-1 Continued on Next Page. . .

78 Table 4.3: Continued.

– Continued BLAST Accesion Full name Type Study title Description OTU ID1 %ID to Number of in strain? in in PubmedID top hit2 top hit2 SSURef104 SSURef104 SSURef104

4 89.92 FN545867 uncultured Phylogenetic Uncultured bacterium bacterium diversity of unculture partial 16S rRNA bacteria in a hot gene, isolate S2R-57 spring 2 from the Rupi basin, Bulgaria

7 96.90 EU037345 uncultured Tracking the influ- Uncultured bacterium 18996186 bacterium ence of long-term clone G3DCM-199 chromium pollution 16S ribosomal on soil bacterial RNA gene, partial community structures sequence. by comparative anal- yses of 16S rRNA gene phylotypes

11 100.00 FM164634 Novosphingobium Studies on Culture Novosphingobium mathurense dependant and mathurensis partial culture independant 16S rRNA gene, bacterial diversity of isolate TD IW 05 a natural spring water from mid-western Ghats region in India.

12 95.50 DQ295908 uncultured Cohn’s Crenothrix Uncultured bacterium 16452171 bacterium is a filamentous clone Cr-cl9 16S ri- methane oxi- bosomal RNA gene, dizer with an partial sequence. unusual methane monooxygenase

14 99.19 AM749771 Bacteroidetes bac- Isolation of novel Bacteroidetes 18422642 terium P373 bacteria, including a bacterium P373 candidate division, partial 16S rRNA from geothermal gene, isolate P373 soils in New Zealand

15 95.67 AB078065 Flexibacter ruber Y Phylogenetic Flexibacter ruber 12469298 structure of the gene for 16S rRNA, genera Flexibacter, strain:IFO 16677. Flexithrix, and Microscilla deduced from 16S rRNA sequence analysis

16 97.91 AJ308502 sulfidic hot spring Biodiversity within hot Sulfidic hot spring bacterium NPE spring microbial mat bacterium NPE communities: molec- partial 16S rRNA ular monitoring of en- gene, strain NPE richment cultures

19 100.00 GQ354911 uncultured beta Comparison Uncultured beta proteo community structure proteobacterium bacterium between two clone 4-199 16S geochemically ribosomal RNA gene, distinct springs at partial sequence. Alum Rock Park

22 98.72 AY929064 Saprospira sp. Isolation, group Saprospira sp. PdY3 PdY3 behavior and 16S ribosomal algae-lysing RNA gene, partial characteristics of a sequence. novel Saprospira sp. PdY3 with algicidal activity against cyanobacteria

31 100.00 AY574979 Desulfovibrio Y Desulfovibrio putealis Desulfovibrio putealis 15653861 putealis sp. nov., a novel 16S ribosomal sulfate-reducing bac- RNA gene, partial terium isolated from sequence. a deep subsurface aquifer Continued on Next Page. . .

79 Table 4.3: Continued.

– Continued BLAST Accesion Full name Type Study title Description OTU ID1 %ID to Number of in strain? in in PubmedID top hit2 top hit2 SSURef104 SSURef104 SSURef104

34 94.14 EF648024 uncultured Bacterial community Uncultured bacterium bacterium dynamics of aerobic clone HB13 16S ri- activated sludge bosomal RNA gene, during different partial sequence. produced water treatment

38 95.02 EU037195 uncultured Culture-independent Uncultured bacterium bacterium Bacterial Diversity clone MO21 16S ri- Analysis of Hot bosomal RNA gene, Spring Microbial Mat partial sequence. from Khir Ganga, India

39 99.56 EU662525 uncultured Productivity-Diversity Uncultured bacterium Relationships from bacterium clone Chemolithoau- MC1B 16S 128 16S totrophically Based ribosomal RNA gene, Sulfidic Karst partial sequence. Systems

44 98.39 AY753393 uncultured Phylogenetic and Uncultured bacterium 17651138 bacterium functional diversity of SK47 16S ribosomal bacteria in biofilms RNA gene gene, par- from metal surfaces tial sequence. of an alkaline district heating system

45 98.22 FJ204422 groundwater Oxidative and Groundwater biofilm biofilm bacterium ultraviolet C treat- bacterium H2 16S ri- H2 ment of culturable bosomal RNA gene, biofilm communities partial sequence. harvested and identified from groundwater wells

61 100.00 AY945920 uncultured Thauera and Azoar- Uncultured bacterium 16420635 bacterium cus as functionally clone DR-29 16S ri- important genera in a bosomal RNA gene, denitrifying quinoline- partial sequence. removal bioreactor as revealed by microbial community structure comparison

63 91.74 FJ793172 uncultured Diversity of Archaea Uncultured bacterium bacterium and Bacteria in the clone TDB50 16S ri- Tao Dam Hot Spring bosomal RNA gene, in Thailand partial sequence.

68 99.55 EU266920 uncultured Depth-resolved Uncultured Bac- 18083871 Bacteroidetes quantification of teroidetes/Chlorobi Chlorobi group anaerobic toluene group bacterium bacterium degraders and clone D25 47 small aquifer microbial subunit ribosomal community patterns RNA gene, partial in distinct redox zones of a tar oil contaminant plume

69 88.52 EU134014 uncultured Novelty and unique- Uncultured bacterium 18606799 bacterium ness patterns of rare clone FFCH517 16S members of the soil ribosomal RNA gene, biosphere partial sequence.

90 99.08 DQ413062 uncultured Comparative Uncultured bacterium bacterium Analysis of Microbial clone 2 16S riboso- Communities from mal RNA gene, par- Culture-dependent tial sequence. and -independent Approaches in an Anaerobic/Aerobic SBR Reactor Continued on Next Page. . .

80 Table 4.3: Continued.

– Continued BLAST Accesion Full name Type Study title Description OTU ID1 %ID to Number of in strain? in in PubmedID top hit2 top hit2 SSURef104 SSURef104 SSURef104

100 98.31 AY510244 uncultured beta Bacterial diversity Uncultured beta 16329854 proteo and ecosystem func- proteobacterium bacterium tion of filamentous clone LKC3 102B.27 microbial mats from 16S ribosomal aphotic (cave) sulfidic RNA gene, partial springs dominated sequence. by chemolithoau- totrophic ”Epsilonproteobacte- ria”

110 93.00 GU390832 uncultured Dark hot springs: Uncultured bacterium bacterium community structure clone AS053 B18 of biofilms from the 16S ribosomal sulfidic caves of RNA gene, partial Acquasanta Terme sequence. (Italy)

112 86.64 EU861972 uncultured soil The effects of Uncultured soil 18764871 bacterium chronic nitrogen bacterium clone fertilization on alpine bac2nit96 16S tundra soil microbial ribosomal RNA gene, communities: partial sequence. implications for carbon and nitrogen cycling

135 99.14 FJ936833 uncultured 16S rDNA analysis Uncultured bacterium bacterium of environmental clone kab116 16S ri- samples from bosomal RNA gene, volcano mud taken partial sequence. at Avachinsky (Kamtchatka)

144 95.40 DQ256728 Chitinimonas Y Chitinimonas koreen- Chitinimonas 16902004 koreensis sis sp. nov., isolated koreensis strain from greenhouse soil R2A43-10 16S in Korea ribosomal RNA gene, partial sequence.

152 99.17 FJ484056 uncultured Novel microbial diver- Uncultured Thermus- Thermus- sity retrieved by an Deinococcus group Deinococcus group autonomous rover in bacterium clone bacterium the world’s deepest Z100B57 16S phreatic sinkhole ribosomal RNA gene, partial sequence

156 98.44 FJ484056 uncultured Novel microbial diver- Uncultured Thermus- Thermus- sity retrieved by an Deinococcus group Deinococcus group autonomous rover in bacterium clone bacterium the world’s deepest Z100B57 16S phreatic sinkhole ribosomal RNA gene, partial sequence

166 100.00 AY861866 uncultured Hydrogen and bioen- Uncultured Thermus 15671178 Thermus sp. ergetics in the Yel- sp. clone OPPB146 lowstone geothermal 16S ribosomal ecosystem RNA gene, partial sequence.

167 98.83 FJ382211 uncultured Potentially Uncultured bacterium 19581474 bacterium pathogenic bacteria clone 4S 1h05 16S in shower water and ribosomal RNA gene, air of a stem cell partial sequence. transplant unit

171 99.13 AY293405 uncultured Partitioning of Min- Uncultured Chlorobi Chlorobi eralogical, Geochem- bacterium clone bacterium ical, and Microbial SM2B09 16S Systems in Travertine ribosomal RNA gene, Terraces at Yellow- partial sequence. stone Hot Springs Continued on Next Page. . .

81 Table 4.3: Continued.

– Continued BLAST Accesion Full name Type Study title Description OTU ID1 %ID to Number of in strain? in in PubmedID top hit2 top hit2 SSURef104 SSURef104 SSURef104

175 97.35 FJ915153 Acidisphaera sp. Microorganisms Acidisphaera sp. nju- nju-AMDS1 in sediment of an AMDS1 16S riboso- extreme acidic river mal RNA gene, par- with a mean pH of tial sequence. 2.8. China: Anhui province, TongLing, JiGuanShan mine

185 93.12 FN563324 uncultured Impact of a changing Uncultured bacterium bacterium HRT, OLR and partial 16S rRNA substrate on gene, clone HAW- prokaryotes involved RM37-2-M-1755d-B in biomethanizitaion in a mesophilic and fuzzy logic controlled 2 phase biogas reactor

190 96.15 EF205453 uncultured Y Bacterial community Uncultured bacterium 19023516 bacterium composition in clone YCB25 16S ri- thermophilic bosomal RNA gene, microbial mats from partial sequence. five hot springs in central Tibet

192 98.96 X84211 Meiothermus Y Thermus silvanus sp. T. silvanus 16S rRNA 7547281 silvanus nov. and Thermus gene chliarophilus sp. nov., two new species related to Thermus ruber but with lower growth temperatures

193 98.68 EU283461 uncultured Sphin- Model Environment Uncultured Sphin- gomonas sp. For Early Earth gomonas sp. clone Harbors Deeply A553 16S ribosomal Divergent Divisions RNA gene, partial sequence.

194 92.77 EU134994 uncultured Novelty and unique- Uncultured bacterium 18606799 bacterium ness patterns of rare clone FFCH4697 16S members of the soil ribosomal RNA gene, biosphere partial sequence.

213 87.30 EU234165 uncultured Bacterial commu- Uncultured bacterium bacterium nities in penicillin clone B37 16S ri- G production bosomal RNA gene, wastewater and the partial sequence. receiving river by using culture and unculture techniques

214 98.76 GU208313 uncultured Study on Bacterial Uncultured prokary- prokaryote 16S rDNA Diversity ote clone Ftt3-1 16S in Sediment of Dong- ribosomal RNA gene, ping Lake partial sequence.

216 98.79 FN667251 uncultured Bacterial diversity at Uncultured compost compost different stages of the bacterium partial 16S bacterium composting process rRNA gene, clone FS983

229 100.00 X84211 Meiothermus Y emphThermus T. silvanus 16S rRNA 7547281 silvanus silvanus sp. nov. gene and Thermus chliarophilus sp. nov., two new species related to Thermus ruber but with lower growth temperatures Continued on Next Page. . .

82 Table 4.3: Continued.

– Continued BLAST Accesion Full name Type Study title Description OTU ID1 %ID to Number of in strain? in in PubmedID top hit2 top hit2 SSURef104 SSURef104 SSURef104

248 99.58 AF542054 Thermomonas hy- Y Thermomonas Thermomonas 12747412 drothermalis hydrothermalis sp. hydrothermalis 16S nov., a new slightly ribosomal RNA gene, thermophilic gamma- partial sequence. proteobacterium isolated from a hot spring in central Portugal

272 92.98 DQ146482 Desulfatirhabdium Y Desulfatirhabdium Desulfatirhabdium 18175693 butyrativorans butyrativorans gen. butyrativorans strain nov., sp. nov., a HB1 16S ribosomal butyrate-oxidizing, RNA gene, partial sulfate-reducing sequence. bacterium isolated from an anaerobic bioreactor

276 90.13 CP000804 Roseiflexus Y Complete sequence Roseiflexus castenholzii of Roseiflexus cas- castenholzii DSM tenholzii DSM 13941 13941, complete genome.

286 97.86 AM180886 uncultured In situ detection of Uncultured 17504489 Acidobacteria novel Acidobacteria Acidobacteria bacterium in microbial mats bacterium partial 16S from a chemolithoau- rRNA gene, clone totrophically based SS LKC22 UA44 cave ecosystem (Lower Kane Cave, WY, USA)

287 95.48 AB240316 uncultured Analysis of microbial Uncultured bacterium bacterium community structure gene for 16S rRNA, in rhizosphere of partial sequence, Phragmites clone: RB205.

295 93.59 AY921893 uncultured Comparative Uncultured Chlo- 15845853 Chloroflexi metagenomics roflexi bacterium bacterium of microbial clone AKYH632 16S communities ribosomal RNA gene, partial sequence.

314 96.23 EU490278 uncultured Microbial biodiversity Uncultured bacterium 19278453 bacterium of thermophilic com- clone ERB-E5 16S ri- munities in hot min- bosomal RNA gene, eral soils of Tramway partial sequence. Ridge, Mount Ere- bus, Antarctica

318 90.16 DQ066432 estrogen- 17beta-estradiol- Estrogen-degrading 17310711 degrading degrading bacteria bacterium KC2 16S bacterium KC2 isolated from ribosomal RNA gene, activated sludge partial sequence.

323 100.00 AJ871169 Meiothermus Meiothermus timidus Meiothermus timidus 15796977 timidus sp. nov., a new partial 16S rRNA slightly thermophilic gene, strain RQ-10 yellow-pigmented species

328 99.55 EU887785 uncultured PCR and RFLP anal- Uncultured Catel- Catellibacterium ysis of anaerobes of libacterium sp. clone sp. Nisargruna biogas 6IISN 16S ribosomal plant RNA gene, partial sequence.

335 97.57 EU246826 uncultured A polyphasic Uncultured Bac- Bacteroidetes description of teroidetes bacterium bacterium bacteria diversity clone Cobs2TisG7 in Pocillopora 16S ribosomal meandrina at RNA gene, partial Palmyra Atoll sequence. Continued on Next Page. . .

83 Table 4.3: Continued.

– Continued BLAST Accesion Full name Type Study title Description OTU ID1 %ID to Number of in strain? in in PubmedID top hit2 top hit2 SSURef104 SSURef104 SSURef104

336 100.00 AY861797 uncultured beta Hydrogen and bioen- Uncultured beta 15671178 proteo ergetics in the Yel- proteobacterium bacterium lowstone geothermal clone OPPB064 16S ecosystem ribosomal RNA gene, partial sequence.

337 99.58 AF026985 unidentified beta Novel division level Unidentified beta pro- 9440526 proteobacterium bacterial diversity in teobacterium OPB37 OPB37 a Yellowstone hot 16S ribosomal RNA spring gene, complete se- quence.

338 99.19 AM749768 candidate division Isolation of novel Candidate division 18422642 OP10 bacterium bacteria, including a OP10 bacterium P488 candidate division, P488 partial 16S from geothermal rRNA gene, isolate soils in New Zealand P488

352 84.23 AJ536886 uncultured Comparative Uncultured bacterium bacterium molecular analysis partial 16S rRNA of bacterial diversity gene, clone in soil samples of JG30-KF-AS1 uranium wastes by using different 16S rDNA amplification primer pairs

364 97.39 EF648160 uncultured Bacterial community Uncultured bacterium bacterium dynamics of aerobic clone XJ118 16S ri- activated sludge bosomal RNA gene, during different partial sequence. produced water treatment

367 100.00 AF027008 unidentified Novel division level Unidentified Cy- 9440526 Cytophagales bacterial diversity in tophagales OPB73 OPB73 a Yellowstone hot 16S ribosomal RNA spring gene, complete sequence.

383 92.07 AB252958 uncultured Identification of Uncultured bacterium bacterium bacteria in an gene for 16S rRNA, iron-oxidation biofilm partial sequence, at Shibayama clone: 3. lagoon, Ishikawa, Japan

384 100.00 EU037204 uncultured Culture-independent Uncultured bacterium bacterium Bacterial Diversity clone MO67 16S ri- Analysis of Hot bosomal RNA gene, Spring Microbial Mat partial sequence. from Khir Ganga, India

386 93.50 FJ936809 uncultured 16S rDNA analysis Uncultured bacterium bacterium of environmental clone kab92 16S ri- samples from bosomal RNA gene, volcano mud taken partial sequence. at Avachinsky (Kamtchatka)

390 100.00 GQ844388 uncultured Microbial diversity in Uncultured bacterium bacterium the anaerobic tank of clone BP-B53 16S ri- a full-scale produced bosomal RNA gene, water treatment plant partial sequence.

393 100.00 FJ936846 uncultured 16S rDNA analysis Uncultured bacterium bacterium of environmental clone kab129 16S ri- samples from bosomal RNA gene, volcano mud taken partial sequence. at Avachinsky (Kamtchatka) Continued on Next Page. . .

84 Table 4.3: Continued.

– Continued BLAST Accesion Full name Type Study title Description OTU ID1 %ID to Number of in strain? in in PubmedID top hit2 top hit2 SSURef104 SSURef104 SSURef104

395 92.77 AM749768 candidate division Isolation of novel Candidate division 18422642 OP10 bacterium bacteria, including a OP10 bacterium P488 candidate division, P488 partial 16S from geothermal rRNA gene, isolate soils in New Zealand P488

397 90.46 AF445665 uncultured Microbial 16S Uncultured CFB Bacteroidetes rRNA diversity group bacterium bacterium within travertine clone SM1C08 16S depositional facies ribosomal RNA gene, from 73 degrees partial sequence. to 25 degrees C at Angel Terrace, Mammoth Hot Springs, Yellowstone National Park, USA

400 99.58 AF407700 uncultured Phylogenetic charac- Uncultured bacterium bacterium terization of microbial clone G13 16S ri- communities from bosomal RNA gene, Australia’s Great partial sequence. Artesian Basin

402 100.00 GU559782 uncultured Novel Clostridium Uncultured bacterium bacterium populations involved clone C24.4 16S ri- in anaerobic bosomal RNA gene, degradation of partial sequence. Microcystis blooms

405 99.52 AY340841 uncultured Simple organic Uncultured bacterium 19712316 bacterium electron donors clone SR FBR L83 support diverse 16S ribosomal sulfate-reducing RNA gene, partial communities in sequence. fluidized-bed reactors treating acidic metal- and sulfate-containing wastewater

410 95.95 AB252956 uncultured Identification of Uncultured bacterium bacterium bacteria in an gene for 16S rRNA, iron-oxidation biofilm partial sequence, at Shibayama clone: 456. lagoon, Ishikawa, Japan

1 See 4.7 for relative distribution of above OTUs 2 Based on BLAST searching against the Silva SSURef104 SSU rRNA gene database

4.2 Discussion 4.2.1 Surface Community Structure

By 16S rRNA gene sequence composition, the Submerged, Filamentous morphotype Mat (SFM) possesses most of the cyanobacterial diversity of the Cocci, Diatom Mat (CDM) and identical sequences to the most abundant sequence of the CDM (4.3). It has been postulated that rare members of microbial communities have the metabolic

85 Figure 4.3: The heat-map and dendrogram shown here depict the distribution and abun- dance of nonsingleton/doubleton cyanobacterial reads from the pyrosequence libraries in this study. The dendrogram is not a phylogeny but a furthest-neighbour clustering of all pairwise substitution per site comparisons of the sequences.

Table 4.4: Alpha diversity calculations for pyrosequence libraries.

OTU Phylotypes Library Total # of Estimated Estimated Lower Upper % Name(s) Observed Total Coverage Confidence Confidence identity OTUs OTUs (%) Bound1 Bound1

100 Cyanobacteria PCDM0806a 15 17.8 84 15.8 25.2 and PCDM0806b 97 all PCDM0806a 69 178.5 39 102.7 424.6 97 all PCDM0806b 77 160.8 48 117.8 249 100 Cyanobacteria PSFM0809 32 41.3 77 36.1 53.2 and PSFM0209 97 all PSFM0809 117 198.3 59 161.7 265.2 97 all PSFM0209 108 236 46 169.7 373.5 1 95% confidence bound. For details see Bunge [158].

86 Figure 4.4: Maximum likelihood tree depicting the phylogenetic placement of Chlorogloeopsis and Fischerella phylotypes found in this study. We are using Group IV cyanobacteria as the outgroup for this tree. Gloeobacter violaceus (PCC 7421), com- monly used as an outgroup in cyanobacteria phylogenies, falls in Group IV according in the RDP taxonomic organization of cyanobacteria. Outlined sequences were generated in this study. The asterisk * denotes sequences that were placed into the tree by pplacer. Values at nodes are bootstrap support percentages. Nodes without support values were created by pplacer insertion. Polytomies denote undefined descendant branching order based on bootstrap values (i.e. nodes with bootstrap support < 50% were collapsed). Filled clades have cultured representation.

87 Figure 4.5: A. Location of the stromatolite building phylotype in a broad, unrooted, maxi- mum likelihood phylogeny of cyanobacteria. Groups are those named and recognized by RDP directed by study done by Wilmotte and Herdman [1]. Nodes with bootstrap support < 50% were collapsed and thus polytomies denote ambiguous branching order of de- scendant nodes. B. Histogram of all identity values as determined by USearch between SFM-seq and cyanobacterial sequences in SSUParc106.

88 Figure 4.6: Phylogeny of all cyanobacterial 16S SSU rRNA gene sequences from eight molecular studies of modern stromatolite studies as reviewed by Foster and Green [2]. Accession numbers and short descriptions for reference sequences used to build the backbone tree are shown. Arrow indicates position of SFM-seq tree. The qualitative salinity for each stromatolite derived 16S rRNA gene sequence is also shown. Clades with bootstrap values below 50% were collapsed and therefore polytomies indicate ambiguous branching order. Gloebacter violaceus PCC 7421 is used as the outgroup.

89 Figure 4.7: Distribution of 16S rRNA gene non-cyanobacterial pyrosequences into 97% identity OTUs. OTUs with less than or equal to five total 16S rRNA gene sequences summed across every pyrosequence library are omitted from the figure.

90 Figure 4.8: Rarefaction curves for pyrosequence libraries and just cyanobacterial se- quences for combined PCDM and PSFM libraries respectively. Sequences were clustered at 97% identity using UClust. Points in curves represent the average observed OTUs for 10 re-samplings. Libraries were re-sampled at 100 sequence intervals.

91 machinery to emerge with environmental changes [35]. In this stromatolite system, it appears that the Chlorogloeopsis HTF and Fischerella (Mastigocladus laminosus) phylotypes emerge from the SFM following a shift in environmental conditions. If so, each progression from the Main Body lithofacies to the Drape lithofacies [145] records an emergence event of Chlorogloeopsis HTF and Fischerella from rare to predominant members. Chlorogloeopsis HTF and Fischerella are found in hot-springs across the globe and both genera exploit nitrogen-limited waters where they can out-compete non-nitrogen-fixing cyanobacteria [78]. Fischerella can also be found in high abundance in YNP where there is sufficient combined nitrogen to make nitrogen fixation unnecessary [159]. Synechococcus mats are found in many hot-spring outfalls in YNP, but Fischerella can predominate in waters below 58 circC where the pH is low and/or sulfide levels are moderate [81]. Chlorogloeopsis HTF (originally Mastigocladus HTF) was isolated by Castenholz [160] from Icelandic hot-springs where it was found to exhibit a higher upper-temperature limit than previously described Mastigocladus types. Similarly to Fischerella, in YNP Chlorogloeopsis HTF predominates in lower pH springs where Synechococcus is absent [161]. While the SFM preserves a limited reservoir of the predominant CDM phylotypes, the CDM does not appear to possess the predominant SFM cyanobacterium (4.3). It is possible that the tube-building phylotype of the SFM would be observed in the CDM with deeper sequencing, although alpha diversity measurements indicate the sequencing effort has recovered most of expected unique cyanobacterial sequences (4.8, 4.4). Therefore, we interpret progressions from Drape to Main Body lithofacies to include a non-conformity, or period of undetermined length for which there is no corresponding record of deposition, as it is unknown how long it takes the SFM to reestablish itself on the stromatolites. Measured growth rates on artificial growth substrates over the short term have thus far proven to be faster than rates determined by radiometric dating of

92 stromatolites with multiple generations of Drape lithofacies [145]. There appear to be periods of unknown length where the stromatolites are not accreting and these no-growth periods may coincide with the time it takes to reestablish the SFM. Berelson et al. [145] describe 14C Csignatures in the YNP stromatolites that suggest the fixed carbon of the CDM has a greater atmospheric to subsurface CO2 input ratio than the SFM. Specifically, the Drape lithofacies shows a generally younger carbon age than the Main Body lithofacies indicating the CDM autotrophs receive more carbon input from younger atmospheric CO2 as compared to SFM autotrophs that receive a greater carbon input from older, radiocarbon-dead subsurface CO2 that vents from the hot spring. One explanation is that the CDM community exists on the stromatolites when water level in the hot spring drops and stromatolites protrude out of the pool. Indeed, the CDM has only been observed when stromatolites extend above the waters surface whereas the SFM is found on more submerged stromatolites. Nitrogen availability, temperature, sulfide and pH have all proven to be influential in selecting cyanobacterial phylotypes in hot-spring communities [78] and could potentially change with water level in this system. Chlorogloeopsis HTF and Fischerella are both heterocystous [125]. The SFM cyanobacterium, contrastingly, is not phylogenetically related to heterocystous cyanobacteria (4.5A) and heterocysts have proven to be a phylogenetically coherent trait [126, 162]. Stal et al. [163], in a study of the cyanobacterial mats in hypersaline ponds of Guererro Negro, Baja California Sur, reported a similar trend in relative abundance differences of heterocystous versus non-heterocystous cyanobacteria in relation to water level to that observed in this study. Nitrogen fixation and it’s relationship to oxygen tension could potentially explain the community structuring of submerged versus non-submerged mats provided combined nitrogen levels are low enough that the cyanobacteria must fix nitrogen. Presumably, lower diffusion rates in the submerged mat could allow for oxygen accumulation during the day beyond atmosphere saturation as

93 observed in the submerged Baja mats [163]. High oxygen tension would inhibit nitrogen fixation by heterocystous types that fix nitrogen during the day [164] giving the advantage to the tube-building phylotype which would presumably fix nitrogen at night once oxygen had been sufficiently depleted by respiration. Although the NH4+ and NO3−content of the stromatolite hot-spring was below 0.20 and 0.10 ppm, respectively, cyanobacterial hot-spring communities in YNP have been shown to survive without fixing nitrogen in waters with combined nitrogen levels of 0.15 ppm [159]. Therefore, it is unknown if either mat type is fixing nitrogen. Additionally, it has not been shown that the SFM accumulates oxygen in the light similarly to the marine Baja mats. Defined temperature niches in Synechococcus species have been observed within Octopus Spring (YNP) cyanobacterial mats [76] and inter-genus niches have been observed along a temperature gradient in Hunter’s Hot Springs, Oregon [78]. Similar temperature adaptations may be selecting for the cyanobacterial phylotypes in the YNP stromatolites. The exposed mats would experience greater temperature fluctuations throughout a 24-hour cycle and with the changing seasons. Also, the CDM would experience lower temperatures on average than the SFM. Fischerella and Chlorogloeopsis HTF are considered to be high-temperature tolerant nitrogen fixers, however, and have been observed in waters up to 58 circC and 64 circC respectively [78]. Fischerella has been found to predominate at temperatures 52 circC but below 56 circC in a Mammoth Hot-Springs outfall channel, YNP and White Creek, YNP [159]. The proximity of the apparent upper temperature limit for Fischerella to this study’s water temperature (56 C) may account for its lower abundance in submerged mats, though closely related cyanobacteria have been observed to occupy different temperature niches [76, 165], and therefore ribotype identity of a particular cyanobacterium probably is not indicative its temperature optimum. Nevertheless, the water temperature of this study’s hot-spring is within the range of temperatures where Fischerella and Chlorogloeopsis HTF communities have been previously observed. In fact, temperatures

94 of 60 circC have been used to enrich for Chlorogloeopsis HTF in the laboratory [161]. It does not appear that a water temperature of 56 circC would select against Chlorogloeopsis HTF but it may be inhibitory to Fischerella. In contrast to temperature, the correlation of sulfide with the cyanobacterial restructuring is more consistent with field observations of Fischerella and Chlorogloeopsis HTF. The sulfide levels in the stromatolite hot-spring are fairly high near the vent 0.60 ppm [150], and could inhibit some cyanobacteria [78, 166]. Chlorogloeopsis HTF and Fischerella appear to have a low and moderate tolerance to sulfide, respectively [78], and we would expect the submerged mats to experience more consistent exposure to the sulfidic water. Sulfide concentrations of 0.15 ppm were sufficient to inhibit photosystem II function by 50% in a Fischerella strain isolated from the Boiling River, YNP [166], however, the strain was isolated from a low sulfide environment and sulfide tolerance has not been proven to be reliably extrapolated to a cyanobacterium based on the phenotype of its close relatives [166]. It should also be noted that the sulfide concentration at the north end of the stromatolite hot-spring has been measured lower (0.224 ppm) than near the vent. Thus far, studies of the YNP stromatolites have been limited to samples collected from the north end. Stromatolite structures do grow on the southern rim, closer to the vent, however it is unknown at this time if the ”Drape” lithofacies is found in the south end stromatolites or if the sulfide levels of the south end are higher than the north end. The pH of the stromatolite hot-spring ( 5.7) is too low for many cyanobacteria, but both Chlorogloeopsis HTF and Fischerella have been found at lower pH and seem to be acid tolerant [78]. The pH of the CDM may be higher than the SFM because pH affects due to carbon fixation would be less mitigated by the surrounding water, although respiration would dampen that difference. Regardless, as with temperature, Chlorogloeopsis HTF and Fischerella would not appear to be inhibited by the pH of the surrounding spring based on published field observations and therefore presumably pH

95 is not strongly influencing cyanobacterial community structure in this system.

4.2.2 Phylogeny of Stromatolite Builder

The nearest-neighbor in SSURef104 to the sequence SFM-seq representing the predominant SFM cyanobacterium that is building the bulk of the stromatolites and is responsible for the fine laminations is not classified beyond the phylum level in Greengenes, RDP or the Silva databases. As shown in 4.5B, SFM-seq does not share genus-level identity, 95% [167], with any other sequence in Silva SSURef104. Moreover, the phylogeny of SFM-seq shows it does not belong in the cyanobacterial sub-phylum groups currently recognized by the RDP (4.5A). SFM-seq and its SW Japan stromatolite neighbor constitute a heretofore undocumented lineage within the cyanobacteria showing molecular studies have not recovered the full diversity of cyanobacterial 16S rRNA genes and that there is captured cyanobacterial diversity in sequence databases that remains undescribed from a phylogenetic standpoint.

4.2.3 Comparisons to other living stromatolite studies

Other molecular studies of living stromatolites have elucidated the complexity of stromatolite microbial membership and shown the correlation of geochemistry and stromatolite characteristics with diversity [100, 109]. To our knowledge, no other living stromatolite system has characteristics such that the growth of the stromatolites can be attributable to one or any specific phylotype(s). In contrast, the stromatolites presented here are predominantly built by one cyanobacterial phylotype. Furthermore, this cyanobacterial phylotype constitutes a novel and basal lineage in the cyanobacterial 16S rRNA gene phylogeny. It should also be noted that molecular studies have not shown any strong commonalities in the microbial members of the geographically distinct living stromatolites in Shark Bay, Australia and Exuma Sound, Bahamas [109]. These two living systems exhibit different lamination patterns and local conditions, and therefore, common

96 microbial membership would not be expected and would not necessarily reflect significance of any particular phylotype with respect to stromatolite growth. As shown in 4.6, a wide diversity of cyanobacterial 16S rRNA sequences have been recovered from living stromatolites. In the YNP examples and some SW Japan stromatolites, lamination and growth can be attributed to cyanobacteria [168]. Yet the full significance of such a commonality is unknown without more detailed information on the origins and geographical extent of the Japan travertine cyanobacterium that displays such close relation to SFM-seq. Several studies have focused on carbonate stromatolites in SW Japan hot springs [168–170]. A study by Takashima and Kano [168] show the presence of filamentous cyanobacteria on SW Japan carbonate stromatolites in the Shionoha hot spring and attributed lamination to cyanobacterial metabolism effects. While the above studies all discuss travertine stromatolites that exhibit daily lamination, it appears that the lamination is distinct both in texture and in the influencing microbes depending if the travertine is aragonite or calcite [169] and where the travertine deposit is in relation to the hot spring vent [169, 170]. Therefore, it may be only happenstance that the two systems share the presence of a particular cyanobacterium. Regardless, we describe and produce a nearly full-length 16S rRNA gene sequence from a cyanobacterium that is a stromatolite-building microbe. Additionally, particular SW Japan travertine stromatolites [168] and the YNP examples show that cyanobacterial growth rhythms can create laminae in conditions suitable for rapid lithification (for instance, in silica supersaturated hot springs). Additionally, cyanobacteria have been historically associated with stromatolites in the rock record. This is in part due to the presumed effect on pCO2 by carbon fixation and also to the conspicuous and easily identifiable morphologies of cyanobacteria [109]. In contrast, recent studies have highlighted the roles of non-cyanobacterial members in the

+ interplay between Ca2 and pCO2 concentrations that lead to CaCO3 precipitation or

97 dissolution [142, 143] and shown that carbonate stromatolites are not formed under the influence of cyanobacteria alone. Likewise, molecular analyses by Papineau et al. [109] found the fraction of cyanobacteria in clone libraries of 16S rRNA genes to be less than expected in living stromatolites found at Shark Bay, Australia. Similarly, Walter et al. [103] proposed that anoxygenic phototrophs and not cyanobacteria were influencing the growth of some living stromatolites in YNP. Here, in contrast, we present living YNP stromatolite growth mainly attributable to cyanobacteria.

4.2.4 Major Non-cyanobacterial Phylotypes

In carbonate systems, microbial effects on the budget of carbonate associated solutes drive precipitation. In contrast, a system where accretion is driven by cooling and evaporation of Si super-saturated water or by binding of detrital solids, the construction of stromatolites would be more associated with the surface chemistry of microbial cells and mats. It may be that while the non-cyanobacterial members play crucial roles in the health and function of this study’s mat ecosystem(s), they cannot be directly associated with stromatolite construction. Clearly the SFM and CDM possess different non-cyanobacterial microbial members (see 4.7), but whether these differences in membership are caused by the shift in the composition of the primary-producers in the mats or environmental effects or both is not known at this time.

4.3 Summary

Our molecular investigation of the cyanobacterial mats on stromatolites in YNP elucidated the biological significance of the two stromatolite lithofacies as described by Berelson et al. [145]. Specifically, transitions from the Main Body lithofacies to the Drape lithofacies mark past emergence events of Chlorogloeopsis HTF and Fischerella from rare members of the community to the large majority of surface biomass. The cyanobacterial community re-structuring is likely driven by physiological traits under selective environmental pressures that vary with water level, including sulfide exposure

98 and potentially oxygen tension. Transition from the CDM back to the SFM seems to require the reestablishment of the predominant cyanobacterium that may coincide with a period of no accretion. The SFM is dominated by a non-heterocystous, filamentous, novel cyanobacterium that is preserved as silica tubes that are the constructive microstructure behind the stromatolites lamination. This cyanobacterium is interestingly also found to be associated with travertine stromatolites in SW Japan hot springs. The YNP living stromatolites are new, compelling analogues for ancient stromatolites and display distinct microbiological characteristics in comparison to popularly studied living marine examples.

4.4 Methods 4.4.1 Sample Collection

4.1 summarizes the sample information. Surface samples were collected by scraping mats from stromatolites using sterile razor blades. Samples were placed in cryovials and frozen in liquid nitrogen for transport and kept at -80 circC for long term storage. 16S rRNA gene sequence libraries from each sample were named as follows: the first letter of the library name denotes the sequencing technology used to create the library (P for pyrosequencing); the next 3 letters of each sample name denotes its mat type; and four numbers indicate the date of the sample (MMYY). For example, the pyrosequence library from a SFM mat sample or incipient SFM collected in August of 2009 is labeled PSFM0809. Lastly, samples of the same surface mat type collected on the same date are lettered. NH4+, NO2−and NO3−content of the hot spring were measured on-site using colorimetric assays with CHEMetrics kit numbers K-1403, K-7004 and K-6913 (CHEMetrics, Calverton, VA) for NH4+, NO2−and NO3−respectively. Sulfide was measured on-site using the methylene blue colorimetric assay (K-9510, CHEMetrics). A V-2000 Multi-analyte Photometer (CHEMetrics) was used in the field to read the sample

99 ampoules.

4.4.2 Microscopy, SEM and Thin-sectioning

A portion of each sample was thawed in the lab, fixed in 2% paraformaldehyde in 0.2 um filtered hot spring water, homogenized and collected on a 0.2 um black polycarbonate membrane filter (Millipore, Billerica, MA). Cyanobacteria on the membrane filter were viewed by autofluorescence using a HQ:R fluorescence filter (Chroma Technology Corp, Rockingham, VA) and a Leica DM RXA microscope. SEM micrographs were taken with a Hitachi TM-1000 Tabletop Microscope. For thin-sectioning, stromatolite samples were dried, impregnated with petropoxy, and cut with a diamond saw. The cut surface was polished flat, cemented to a microscope slide with additional petropoxy, and ground down to a thickness of 40 microns for petrographic analysis.

4.4.3 DNA Extraction, PCR, Cloning, Capillary Sequencing

DNA was extracted using the MoBio Powersoil DNA extraction kit (MoBio, Carlsbad, CA). A one-minute bead-beating step was employed for lysis in place of the 10 min vortexing step outlined in the manufacturers protocol. For Sanger capillary sequencing, bacterial primers 8F and 1492R [24] were used to amplify 16S rRNA genes from environmental DNA samples. PCR, cloning/transformation and sequencing were completed as described by Sahl et al. [171].

4.4.4 Sanger Sequence Assembly and Quality Trimming

Bases were called from each chromatogram using Phred [31, 32] and reads for each clone were assembled using Phrap (Phil Green, www.phrap.org). Phred and Phrap were wrapped by a custom Python script that was also developed to trim the reads by quality score (details in Supplementary Methods).

100 4.4.5 PCR and Pyrosequencing

For pyrosequencing, bacterial 16S rRNA genes were amplified from environmental DNA using 8F and 338R [172] primers. PCR was conducted with Promega PCR MasterMix (Promega). The PCR program was as follows: Initial denaturation for 2 min at 94 circC followed by 30 cycles of annealing (52 circC for 20 sec), elongation (72 circC for 20 sec) and denaturation (94 circC for 1 min). The amplicon region has proven to yield reliable results in phylogenetic studies even with short reads [20, 21]. Eight base barcodes specific to each sample were attached to reverse primers by a 2 base linker to allow for post-sequencing binning of reads by sample (for general method overview see Hamady et al. [29]). Additionally, the primers included adapter sequences to be compatible with Roches 454 GSFLX sequencing platform. Amplicons were normalized using the SeQualprep kit (Invitrogen), pooled, and gel purified (Montage DNA gel extraction kit, Millipore) prior to being sent to the sequencing facility (Anschutz Medical Campus, University of Colorado Denver).

4.4.6 Sequence Analysis

Quality Control of Pyrosequences. Pyrosequences were binned by barcodes and initially quality filtered based on vital parameters identified by Huse et al. [30] using the split libraries.py script in the QIIME software package [45]. Specifically, sequences with ambiguous characters, errors in the barcode or primer, sequence length less than 120 nt or greater than 275 nt, average quality score below 27, or homopolymer runs greater than 6 nt were discarded. The remaining sequences were denoised using DeNoiser version 0.851 [39]. Chimeras were identified by ChimeraSlayer [43] and discarded.

Taxonomic Classifications. The phylogenetic content of each sample was determined from the pyrosequence libraries. Taxonomic classifications for pyrosequences were made by recruiting pyrosequences to classified reference sequences in the Silva SSURef102 NR database using BLAST [133] and extrapolating

101 reference annotations of the top BLAST hit for each pyrosequence. Quality filtering of reference sequences is described in Supplementary Methods.

Alpha Diversity Measurements. Richness calculations for all sequences in a given sample were calculated from Operational Taxonomic Unit (OTU) counts after clustering reads at 97% identity. For richness estimates of just the cyanobacterial phylotypes, reads were clustered at 100% identity. All clustering of sequences for alpha diversity analyses was done using UClust (Edgar, 2010) via QIIME [45] with input order to UClust determined by read abundance. Rarefaction curves (4.8) were generated using QIIME. Parametric estimates of richness were determined using CatchAll [173, 173]. The richness estimates and corresponding 95% confidence bounds from the parametric model that fit the data best are reported in 4.4.

Sequence Alignment Multiple sequence alignments of nearly full-length, and short-length cyanobacterial 16S rRNA gene sequences were constructed using SSU-align [174, 175] with a custom, cyanobacteria-specific covariance model constructed from sequences annotated as cyanobacteria in the Silva taxonomy field of the Silva SSURef102 NR database [18] (details in Supplementary Methods). All alignments were filtered with SSU-align using posterior probabilities of aligned characters calculated by SSU-align. Specifically, positions where less than 95% of characters had posterior alignment probabilities of 95% or greater were masked for phylogenetic reconstruction. All alignments were visually inspected before further analysis. Phylogenetic reconstructions. Only nearly-full length sequences were used to construct phylogenies. RAxML [176] version 7.2.7 was used to reconstruct phylogenies depicted in 4.4 and 4.2. All RAxML trees were calculated utilizing the gamma distribution for rate heterogeneity and the GTR model of nucleotide substitution. The final topology of each phylogeny represents the best topology as determined by likelihood scores for 10 and 25 separate RAxML searches starting from random parsimony trees for 4.4 and

102 4.2, respectively. The initial rearrangement setting for the best tree search and number of rate categories for the gamma distribution were selected by RAxML for each run. Bootstrap replicates were found using the rapid-bootstrapping algorithm of RAxML [177] and the number of bootstrap replicates necessary for each phylogeny was determined using the RAxML frequency or the extended majority rule criteria [178]. SH-like branch support values [179] were also calculated for backbone trees using RAxML. pplacer [116] was used to insert short and redundant sequences into backbone RAxML trees. Sequences to be inserted by pplacer were aligned to the same covariance model as the backbone sequences (see Methods:Sequence analysis:Sequence Alignment) for each tree and were subject to the same column masking as the backbone sequences. Nodes below indicated support values were collapsed using a custom Python script that incorporated tree manipulation modules from PyCogent [48]. Selecting Reference Sequences. Two sets of Cyanobacterial reference SSU rRNA sequences were selected to portray the phylogeny of environmental sequences. Each reference set is discussed at length in Supplementary Methods. Accession Numbers. Nearly full-length sequences generated in this study to represent SFM-seq and the Chlorogloeopsis HTF and Fischerella phylotypes are deposited in Genbank under Accession numbers JF303685, JF303684 and JF303683 respectively. We have made the presented pyrosequence information available via our lab website (http://inside.mines.edu/ jspear/resources.html).

4.5 Supplementary Methods

Quality trimming capillary sequences The algorithm defining high quality subsequences of assemblies involved sliding a 25-character-long window across an untrimmed assembled sequence starting at both ends. The first base in the first window that had an average quality score above 25 where the initial quality score was 25 or greater determined the beginning and end point of the high quality subsequence. Bases

103 falling outside the high quality region were discarded.

Quality filtering reference dataset for taxonomic classification Sequences of poor quality or high possibility of being chimeric were removed from the reference dataset using ARB [115]. Sequence quality was determined by the “seq qual slv” and “align qual slv” fields in the Silva [18] SSURef102 NR and the “pintail slv” field was used to determine the possibility that reads were chimeric. Specifically, references sequences with seq qual slv scores less than 50, align qual slv scores less than 50 or pintail slv scores less than 40 were removed before recruitment similarly to how sequences are quality controlled before being used in the GAST classification system [134]. Selecting sequences and secondary structure information for cyanobacteria-specific covariance model Aligned sequences in SSURef102 NR used for training the covariance model were the single highest quality sequences in each 95% identity cluster of all sequences in the Silva SSURef102 NR alignment. Clustering was done with UClust through QIIME with input order to UClust determined by sequence length, longest sequences first. Sequence quality was determined by the Silva database “seq quality slv” field. Secondary structure information for the alignment was inferred from secondary structure metadata (Sequence Associated information (SAI) entries “Helix” and “Helix NR”) also incorporated in SSURef102 NR. Pseudoknots were removed using modules available in PyCogent [48] as described by [180] optimizing for maximum number of basepairs. Selecting reference sequences for phylogenies Two large sets of cyanobacterial reference SSU rRNA sequences (discussed below) were selected to portray the phylogeny of environmental sequences at different levels of resolution each is discussed below. Nearly full-length environmental sequences from clone libraries generated in this study that represented the major cyanobacterial OTUs found in our pyrosequence survey were selected by BLASTing the pyrosequence reads in each cyanobacterial OTU against the full-length libraries and then selecting the best

104 representative based on sequence quality, length and identity to OTU reads.

Broad spectrum reference set. For broad resolution analyses, all sequences classified as cyanobacteria by the RDP [19] and present in the Silva SSURef102 database were initially selected. Sequences that did not stretch from Escherichia coli position 8-1492 in the SSURef102 alignment were culled from the dataset. From this culled set, sequences were binned by taxonomic annotations into the 14 subphylum groups recognized in the RDP taxonomic nomenclature. Sequences that were unclassified below the phylum rank were discarded. To remove redundancy, each subphylum was clustered at 97% identity except for Group IIa sequences which were clustered at 99% identity to account for a higher number of available sequences. UClust [63] and QIIME [45] were used for clustering and UClust input order was determined by sequence length, longest first. The highest quality sequence as determined by its value in the Silva SSURef102 seq quality slv field was selected as the representative sequence in each cluster. From this reduced, high-quality representative set, a multiple alignment was created and well aligned positions were selected (see Methods:Sequence Analysis:Sequence alignment) for phylogenetic reconstruction using FastTree with default parameters [181]. From the FastTree phylogeny, clades with the greatest descending branch length for which 95% of sequences shared the same subphylum level taxonomic annotation were selected as the defining clades for the consensus subphylum annotation. Only one clade was selected for each subphylum group. Sequences that fell outside the defining clades or had taxonomic annotations that did not match the consensus taxonomy of the defining clades were considered misannotations and discarded. After the initial selection of subphylum, group-defining sequences was completed, the process was iterated changing the definition of defining nodes from 95% identical taxonomies to monophyletic. Each iteration worked from the defining set determined in the previous step until no sequences needed to be removed. The final near-full length,

105 broad-resolution, annotated, cyanobacterial reference set after iterative refinement included 244 sequences.

Reference sequences from previous molecular studies of living stromatolites. 16S rRNA sequences generated in eight molecular studies of living stromatolites were used to construct a reference set of cyanobacterial sequences associated with stromatolite growth. The eight previous studies were reviewed by Foster etal. (2011). The studies as a whole encompass 16S rRNA surveys of stromatolites from Shark Bay, Australia; Highborne Cay, Bahamas and Ruidera Pools, Spain. Sequences were downloaded from the Silva website [18] based on the accession numbers given in each paper and inserted into the SSURef104 NR guide tree using the parsimony insertion tool in ARB [115]. All sequences that fell in the cyanobacteria clade of the guide tree and had alignment quality values greater than 75 and Pintail quality values greater than 50 were exported from ARB and clustered at 97% identity using UClust with input order determined by sequence length. Sets of sequences from each study were clustered separately. Seed sequences were selected to represent each 97OTU. Nearly-full length reference sequences from the SSURef104 NR were selected based on their relationship to the stromatolite cyanobacterial sequences by navigating the SSURef104 NR guide tree in ARB. The nearly-full length sequences plus the nearly full-length 16 rRNA sequences from the major phylotypes in this study were aligned and masked using SSU-Align [174] with the described cyanobacterial covariance model (above and Methods:Sequence Analysis:Alignment). The nearly full-length reference sequences were used to construct the tree and seed sequences from the 97identity OTUs constructed from the eight molecular surveys discussed above were aligned with the cyanobacterial covariance model using SSU-Align and inserted into the tree using pplacer [116] in maximum likelihood mode.

Distribution of cyanobacterial sequences. Environmental sequences generated in this study were considered to have come from cyanobacteria if their nearest neighbors –

106 found by BLAST searching against the Silva SSURef102 NR 16S database – were annotated as cyanobacteria in the Silva taxonomic nomenclature. Cyanobacterial sequences were clustered at 100% identity and one sequence in each cluster was chosen as the cluster representative. To allow for a more legible plot, singleton and doubleton data points were removed from 4.3. The cyanobacterial representative set was aligned and masked as described (Methods:Sequence Analysis:Sequence alignment). To assess shared cyanobacterial diversity between surface samples, a dendrogram displaying the clustering pattern of the cyanobacteria representative set based on the multiple sequence alignment was constructed using the furthest neighbor, hierarchical clustering algorithm in hcluster (Damian Eads, code.google.com/p/scipy-cluster). Additionally, a heatmap displaying relative abundance of cyanobacterial sequences in the dendrogram was constructed with custom Python scripts that incorporated Python modules from Matplotlib [182]. The distances used for clustering were substitutions per site values between sequences in the alignment. All pairwise distances were calculated using a custom Python script and modules in PyCogent. The resulting figure is included as 4.3.

Rank abundance plots. Rank abundance distributions of pyrosequence libraries (4.2B) were constructed from OTU counts after clustering sequences at 97identity using UClust. Read abundance dictated the order of sequences considered by UClust with more abundant reads considered first. OTU counts of libraries representing the same mat type (SFM and CDM) were combined.

Sequence Identity Histogram. Sequence identity values shown in 4.5B were calculated by using UClust to globally align all sequences in the Silva SSURef104 database to SFM-seq (“SFM-seq” discussed in results). The histogram plot was created using Matplotlib.

107 CHAPTER 5 TRENDS IN MICROBIAL COMMUNITY ALPHA AND BETA DIVERSITY IN HOT-SPRING OUTFALLS AT YELLOWSTONE NATIONAL PARK, WYOMING

Many studies of outfall microbial diversity in hot-springs at Yellowstone National Park. Wyoming (YNP) have focused on a few or even a single outfall [68–70, 82, 183, 184]. Comparatively few recent studies have focused on multiple outfalls covering a significant geographical distance in the Yellowstone geothermal complex [150, 185–187]. Though few in number, marker gene surveys of multiple geothermal sites have described many interesting ecological phenomena. Ross et al. [186] used Sanger sequencing technology and culture independent methods to survey 16S rRNA genes in YNP orange mats commonly found in hot-spring outfalls and demonstrated the extent of a recently discovered and described Acidobacterial chlorophototroph throughout the park. Also using Sanger technology Boyd et al. [187] and Hamilton et al. [185] probed for diagnostic markers of fermentation and nitrogen fixation, respectively, to further understand the ecology of microbial functional guilds over a wide array of hot-springs and found that dispersal limitation is a key ecological consideration for the guilds studied. Although not an outfall study, Spear et al. [150] observed an over-representation of hydrogen-oxidizing microbes near hot-spring vents and hypothesized and later confirmed that hydrogen is a thermodynamically favorable electron-donor in hot-spring systems. While Sanger sequencing technology has been used to make many key findings at YNP, Next Generation Sequencing (NGS) technology affords microbial ecologists the opportunity to greatly expand the number of environments that can be characterized by culture independent methods in a single sequencing run. Miller et al. [68] used NGS methods to survey two YNP outfalls, Rabbit and White Creek, via the bacterial SSU rRNA gene and showed that in the two outfalls temperature is highly influential in

108 structuring and determining the microbial membership. Osburn et al. [3] utilized NGS to characterize different environments that were further analyzed for hydrogen-isotope fractionation in lipids as an analog for isotope signatures found in the rock record. NGS-based SSU rRNA gene surveys can be used to probe many outstanding questions based on previous studies in YNP. For example, it is unclear if the Synechococcus-like phylotypes found in studies of the alkaline, chloride Octopus Spring and its outfalls predominate in other outfalls. The qualitatively similar texture, color, geochemistry and temperature of outfalls in other alkaline chloride springs like those in the Sentinel Meadow would suggest that that the microbiology of Octopus Spring can be reliably extrapolated. Additionally, it is not known how microbial membership changes in siliceous versus calcareous geochemical outfall platforms or if temperature is influential in determining microbial membership and structure across many different environments. Further, microbial structure and membership profiles are useful as foundations upon which new ecological hypotheses can be advanced. Here we describe trends in microbial alpha and beta diversity from samples covering six qualitative environment types (streamers, high and low temperature cyanobacterial mats, yellow biofilms, carbonate biomats, and sulfur-rich sediments) in addition to core samples taken from microbial mats along the Imperial Geyser outfall to address the above questions and provide context for further investigations of microbial ecology in YNP hot-springs.

5.1 Results

The following results incorporate data from Osburn et al. [3] as well as an unpublished dataset from core samples of microbial mats along the Imperial Geyser outfall.

5.1.1 Beta Diversity

The data presented in Osburn et al. [3] incorporates samples taken from six discrete environment types that span multiple temperatures and geochemistry. As shown in 5.1, a principal coordinate plot of unweighted Unifrac [117] distances between samples, the

109 microbial membership (as determined by surveying SSU rRNA genes) clusters by environment type. The membership clustering is further depicted in 5.2, a heatmap of OTU abundance profiles. Some environments appear to be highly skewed with low richness and possess specific OTUs that are diagnostic for the environment type while other communities are more even in their OTU abundance distribution and have many overlapping members with other environments. To probe whether temperature is correlated with microbial community structure a principal coordinate plot of weighted Unifrac distances between all samples including the samples taken from the Imperial Geyser outfall was constructed with points colored by the temperature at the sampling site. A strong correlation between temperature and the first principal coordinate is shown in 5.3 and the first principal coordinate accounts for a significant amount of the variance in Unifrac distances. The temperature correlation suggests that temperature is an influential parameter for shaping microbial membership and structure. Temperature has been demonstrated as strongly influencing microbial membership structure previously in YNP at the geothermal White and Rabbit creeks. Interestingly there appears to be a secondary membership clustering by platform geochemistry, siliceous versus calcareous, of the data presented in Osburn et al. [3] (5.4). To our knowledge a formal description of microbes endemic to either silica or carbonate geothermal features has not been presented. The clustering correlated with geochemical platform does not appear to hold for beta diversity indices that consider relative abundance (see 5.2). A possible cause is that the carbonate-platform communities have several low abundance lineages in common that are not found in the silica-platform samples. If so, the abundance weighted versus unweighted beta diversity distances could be significanly different.

110 Figure 5.1: Principal coordinate plot of Unifrac distances between 16S gene sequence li- braries presented by Osburn et al. [3]. Points are colored by environment types described by Osburn et al. [3]. red - carbonate, blue - high temperature orange mat, yellow - unclas- sified, aquamarine - yellow biofilm, green - streamers and purple - sulfur rich sediments.

111 Figure 5.2: Heatmap of OTU abundance distributions for all 16S gene sequence libraries discussed in this study. Only top 35 OTUs across all samples are shown. Colors of sample names on x-axis correspond to coloring of 5.1 where applicable. Both dendrograms are constructed from Bray-Curtis distances.

112 Figure 5.3: Principal coordinate plot of Unifrac distances between 16S gene sequence libraries presented by Osburn et al. [3] and Imperial Geyser outfall sample libraries. Points are colored by temperature.

113 Figure 5.4: Principal coordinate plot of Unifrac distances between 16S gene sequence li- braries presented by Osburn et al. [3]. Points are colored by type of geochemical platform. red - calcareous, blue - siliceous.

114 5.1.1.1 Emphasis on cyanobacterial membership and structure

Cyanobacterial membership was specifically investigated to determine the distribution of thermophilic Synechococcus-like phylotypes studied at Octopus Spring [76]. As shown in 5.5, a heatmap of cyanobacterial OTU abundance distributions for libraries that had at least 100 cyanobacterial 16S gene sequences, different environments harbor a suite of cyanobacterial members. The types and abundances of cyanobacteria in a given sample appears to be influenced primarily by temperature (5.5). Further, the most abundant members appear to be strongly correlated with temperature and there is a correlation in abundant members with geochemical platform, carbonate or silica. At high temperatures in the “high temperature orange mats” and the “yellow biofilms” the most abundant cyanobacterial OTU is closely related to environmental 16S gene sequences recovered by Meyer-Dombard et al. [188] and also shares high sequence identity ( 98%) with the 16S gene from a Synechococcus strain (OH34) isolated from inocula collected at 65 ◦C from an Oregon hot-spring [189]. Lower temperature “high temperature orange mats” in addition to many Imperial Geyser 16S gene sequence libraries were dominated by a cyanobacterial phylotype that shows high sequence identity ( 99%) with Synechococcus sp. JA-2-3b isolated from Octopus spring [183] at a site ranging in temperature from 51-61 ◦C. The “low temperature orange mats” are dominated by a phylotype sharing high sequence identity with Leptolyngbya sp. FYG cultured from a YNP hot-spring [190]. The libraries from “carbonate” environment types had one abundant OTU in particular that was found in only one non-carbonate library. This OTU shares high sequence identity from another sequence generated from the same location as part of an unpublished study but beyond that, the carbonate specific cyanobacterial OTU does not share 95% or greater sequence identity with any sequence in the Genbank nucleotide collection [153].

115 Figure 5.5: Heatmap of cyanobacterial OTU abundance distributions for all 16S gene sequence libraries discussed in this study with at least 100 cyanobacterial sequences. Only top 15 OTUs across all samples are shown. Colors of sample names on x-axis correspond to coloring of 5.1 where applicable. Both dendrograms are constructed from Bray-Curtis distances.

116 5.2 Alpha Diversity

Miller et al. [68] previously showed that temperature had a significant impact on microbial alpha diversity in YNP microbial mats but the study was limited to two geothermal creeks. To determine if alpha diversity was similarly dependent on temperature across a wider array of geothermal features we plotted the effective Shannon diversity (exp(Shannon)) of each environment type discussed in Osburn et al. [3] as well as the Imperial Geyser samples (5.6). Additionally, temperature versus effective Shannon Diversity was plotted (5.7). There did not appear to be a strong correlation between environment type and alpha diversity with many environment types showing a wide range of effective Shannon diversities (5.6). The magnitude of the temperature/diversity correlation seems fairly small as many samples at lower relative temperatures exhibit low diversity. A sharp increase in the range of diversities observed around 70 ◦C corresponds with the switch from chemotrophy to phototrophy as a means for primary production. The yellow biofilm libraries, however, exhibit similarly low alpha diversities to chemotrophic streamer communities but they possess many cyanobacteria (5.2). The alpha diversity of just cyanobacterial sequences tracks with the diversities of the entire community 5.8.

5.3 Discussion

Three cyanobacterial OTUs appear to dominate in three separate temperature regimes in the alkaline chloride geothermal outfalls studied. At high temperatures in orange mats and yellow biofilms, a Synechococcus-like cyanobacterium predominates. This particular phylotype has been found in YNP before [188] however its temperature niche has not been discussed to our knowledge. The high-temperature OTU was found in abundance in three of the geothermal features studied (Imperial Geyser, Octopus Spring and Bison Pool) and was observed in at least low abundance in all environment types but not all samples. The phylotype appears to be specifically well suited to high

117 Figure 5.6: Effective Shannon diversity of all 16S gene sequence libraries. Points grouped by environment type and colored by temperature.

118 Figure 5.7: Effective Shannon diversity of all 16S gene sequence libraries plotted against temperature.

119 Figure 5.8: Effective Shannon diversity of cyanobacteria in all 16S gene sequence li- braries with at least 100 cyanobacterial sequences.

120 temperatures for phototrophs and may constitute a thermophilic lineage on the global scale as a close relative was cultured from a thermal spring in Oregon [189]. A second Synechococcus phylotype is dominant at mid-temperatures and appears to be a part of the same OTU as well-characterized Octopus Spring cultivars [69, 70, 183]. Early culture-based studies of Octopus Spring concluded that a single Synechococcus strain dominated the entire mat [82]. Misleading similarities in morphology between phylogenetically distant types and culture techniques that did not produce cultivars from the most abundant in situ phylotypes contributed to the incorrect finding [82] while molecular techniques elucidated the true diversity and abundant cyanobacteria at Octopus. The geographical extent of the Octopus Spring phylotypes has not previously been discussed perhaps because it is intuitive that the lessons learned from Octopus Spring will hold for very similar environments throughout YNP. Indeed, molecular surveys from this study confirm the presence of Octopus Spring OTUs throughout YNP in alkaline chloride geothermal outfalls. Unexpectedly, a fourth conspicuous OTU not previously described is found in abundance in carbonate phototrophic mats. This OTU has one unpublished relative in Genbank also derived from a carbonate hot-spring in YNP (accession AY195603). While the Octopus Spring Synechococcus cultivars are useful model organisms from which much can be extrapolated, there still exists unstudied and possibly even unobserved cyanobacterial phylotypes that act as key ecological units in many systems. An OTU of Leptolyngbya-related sequences predominates in the samples studied at the lowest temperatures. All three dominant OTUs are found in the Imperial Geyser outfall which has not received the same attention as Octopus Spring with respect to studies of cyanobacterial temperature ecotypes but it looks as if Imperial Geyser could serve as a useful foil to Octopus Spring microbiology. Lastly, the secondary clustering of 16S sequence library membership by underlying geochemical platform (silica versus carbonate) poses the hypothesis that microbes are

121 endemic to siliceous versus calcareous geothermal features and vice versa. Silica and carbonate platform endemism has not yet been addressed in YNP to our knowledge.

5.3.0.2 Temperature influence on alpha and beta diversity

While temperature was shown to have a strong influence on the beta and alpha diversity at Rabbit and White Creek [68], across multiple geothermal features and environment types temperature appears to only affect beta diversity. It may be that the sampling scope of our study is not sufficient to confidently assess temperature effects on diversity. Snapshot community profiles of limited biomass and single time-point temperature measurements do not definitively represent the spatial diversity and overall physiological conditions of YNP phototrophic mats. Setchell [73] discusses the problems associated with measuring the living temperatures of hot-spring inhabitants where unpredictable water currents can cause significant fluctuations in water temperatures at a single site and temperatures can increase and decrease 10-15 ◦C over centimeters laterally. Additionally, Berelson et al. [145] showed that the temperatures fluctuate with ambient conditions. The samples taken for Osburn et al. [3] were not meant for a YNP-wide study of temperature/alpha diversity relationships but do pose two interesting hypotheses

• Phototrophic communities at the upper temperature limit for oxygenic phototrophic growth can exhibit similarly low diversities as high-temperature chemotrophy based communities

• relatively lower temperature ecosystems do not show higher diversity but rather a greater range in alpha diversities.

The first hypothesis, if true, would suggest that temperature, below a certain limit, and not the products of photosynthesis allows for the diversity bloom seen in mesothermic YNP ecosystems as compared to high temperature ecosystems dominated

122 by chemolithoautotrophs. Miller et al. [68] proposed that cyanobacteria and green non-sulfur bacteria would compete for carbon dioxide in microbial mat ecosystems. Carbon dioxide may be limiting in hot water. Perhaps the low diversity of the yellow biofilms and streamer communities is a function of carbon dioxide which becomes more soluble in cooler downstream waters.

5.4 Methods

Samples from Osburn et al. [3] are named with a prefix that denotes the sample location. B denotes Bison, IG Imperial Geyser, BS Boulder Spring, NG Narrow Gauge and OS is Octopus Spring. Osburn et al. [3] includes a comprehensive description of all samples. Library names with the IM prefix are the unpublished Imperial Geyser samples. Samples were collected at four locations along the outfall. The number after the prefix denotes which location the samples were collected with number 1 being the location closest to the outfall source and number 4 being the furthest. The four sites span a distance on the order of 100 meters. Water temperature was recorded from each site.

5.4.1 DNA Extraction, PCR and Sequencing

A thorough description of laboratory methods can be found in Osburn et al. [3]. Identical laboratory methods were used to produce the samples from Imperial Geyser. Briefly, for each sample SSU rRNA genes were amplified from community DNA using barcoded 16S primers that amplified the 515-907 (E. coli coordinates) region of the SSU rRNA gene. The primer targets were bioinformatically analyzed and literature primers improved per the results of the analysis prior to PCR. Sequencing for all samples was done with Roche 454 technology on the Titanium platform.

5.4.2 Data analysis

All sequence flowgrams were screened using parameters suggested by Huse et al. [30]. Poor quality reads were discarded. Reads passing quality control standards were

123 “de-noised” using the Qiime 454 flowgram denoising algorithm [39]. The steps towards assigning reads to OTUS were as follows: Reads were aligned using the bacterial and archaeal reference alignments corresponding to the Silva database alignment [18] as distributed on the Mothur [46] website and the Mothur NAST aligner. Pairwise distances between reads were calculated treating all indels as a single column and considering leading and trailing gaps although all alignments were trimmed to alignment coordinates such that all sequences overlapped from beginning to end prior to computing distances. Finally, distances were clustered by the average-neighbor algorithm and reads were grouped in 97% average identity OTUs. All plots were created using Matplotlib [182]. Shannon diversity [8] values were calculated with QIIME and converted to effective values as described by Jost [191] to represent diversity in values that correspond to number of evenly abundant classes (in this case OTUs). Unifrac values were calculated in QIIME [45]. The tree required for Unifrac calculations was computed from a multiple sequence alignment (methods above) using FastTree [181] and the tree was rooted between archaeal and bacterial clades. Bray-Curtis distances used to create dendrograms that flank heatmaps were calculated with scipy hcluster [192] archaeal and bacterial clades.

124 CHAPTER 6 CONCLUSION AND FUTURE WORK

Modern marine stromatolites in Shark Bay, Australia and Highborne Cay, Bahamas are intriguing models for pH regulated stromatolite morphogenesis in marine environments. Unfortunately, the coarse texture of the marine stromatolite analogs is not consistent with finely-laminated stromatolites in the rock-record. The YNP stromatolite presented in this dissertation is finely laminated and is therefore is a better textural analog to ancient stromatolites. Molecular studies of the YNP living stromatolite show that a single cyanobacterial ribotype constructs the stromatolite. The stromatolite-building cyanobacterium is not found in other YNP hot springs nor has its rRNA gene phylogeny been previously described. The fluctuating water level of the hot spring where the YNP stromatolite is found elicits a profound re-structuring of the stromatolite microbial community. The current state of cyanobacterial systematics is not sufficient for describing the YNP stromatolite building cyanobacterium. Moreover, cyanobacterial systematics between major curators of rRNA gene databases is highly inconsistent. The cyanobacterial systematics inconsistency between SSU rRNA gene reference databases could manifest in conflicting ecological conclusions drawn from microbial diversity studies where taxonomic annotations of environmental sequences are used as ecological units. The genomic content of the deep-branching and novel cyanobacterial rRNA genotype that builds the YNP stromatolites will expand the known diversity of the cyanobacterial pangenome. Additionally, the stromatolite-building cyanobacterium constitutes a novel genus within cyanobacteria. Future work will focus on understanding the physiology of the stromatolite-building cyanobacterium though genomics. In situ studies will utilize genomic information to determine the ecology of the stromatolite-building

125 cyanobacterium as it relates to measurable environmental variables including pH, temperature and light adaptation. Laboratory studies will establish the physiology of the stromatolite building cyanobacterium such that it can be officially recognized taxonomically within the bacteriological code.

126 REFERENCES CITED

[1] A. Wilmotte and M. Herdman. Phylogenetic relationships among the cyanobacteria based on 16S rRNA sequences. In Bergey’s Manual of Systematic Bacteriology. Volume 1: The Archaea and the Deeply Branching and Phototropic Bacteria, pages 487–493. Second edition, 2001.

[2] Jamie Foster and Stefan Green. Microbial diversity in modern stromatolites. In Vinod Tewari and Joseph Seckbach, editors, STROMATOLITES: Interaction of Microbes with Sediments, Cellular Origin, Life in Extreme Habitats and Astrobiology, volume 18, pages 385–405. Springer Netherlands, 2011. URL http://dx.doi.org/10.1007/978-94-007-0397-1.

[3] Magdalena R. Osburn, Alex L. Sessions, Charles Pepe-Ranney, and John R. Spear. Hydrogen-isotopic variability in fatty acids from yellowstone national park hot spring microbial communities. Geochimica et Cosmochimica Acta, 75(17): 4830–4845, September 2011. ISSN 0016-7037. doi: 10.1016/j.gca.2011.05.038. URL http://www.sciencedirect.com/science/article/pii/S0016703711003152.

[4] A. S. Razumov. The direct method of calculation of bacteria in water: comparison with the koch method. Mikrobiologija, 1:131146, 1932.

[5] J T Staley and A Konopka. Measurement of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats. Annual review of microbiology, 39:321–346, 1985. ISSN 0066-4227. doi: 10.1146/annurev.mi.39.100185.001541. PMID: 3904603.

[6] Anthony D’Onofrio, Jason M. Crawford, Eric J. Stewart, Kathrin Witt, Ekaterina Gavrish, Slava Epstein, Jon Clardy, and Kim Lewis. Siderophores from neighboring organisms promote the growth of uncultured bacteria. Chemistry & Biology, 17(3):254–264, March 2010. ISSN 1074-5521. doi: 10.1016/j.chembiol.2010.02.010. URL http://www.cell.com/chemistry-biology/abstract/S1074-5521(10)00079-7.

[7] John R. Postgate. The Outer Reaches of Life. Cambridge University Press, 1st edition, March 1994. ISBN 0521440106.

[8] Anne E. Magurran. Measuring Biological Diversity. Wiley-Blackwell, December 2003. ISBN 0632056339.

127 [9] Norman R Pace. Mapping the tree of life: progress and prospects. Microbiology and molecular biology reviews: MMBR, 73(4):565–576, December 2009. ISSN 1098-5557. doi: 10.1128/MMBR.00033-09. PMID: 19946133.

[10] Dana Willner, Joshua Daly, David Whiley, Keith Grimwood, Claire E. Wainwright, and Philip Hugenholtz. Comparison of DNA extraction methods for microbial community profiling with an application to pediatric bronchoalveolar lavage samples. PLoS ONE, 7(4):e34605, April 2012. doi: 10.1371/journal.pone.0034605. URL http://dx.doi.org/10.1371/journal.pone.0034605.

[11] Kaisu Rantakokko-Jalava and Jari Jalava. Optimal DNA isolation method for detection of bacteria in clinical specimens by broad-range PCR. Journal of clinical microbiology, 40(11):4211–4217, November 2002. ISSN 0095-1137. PMID: 12409400.

[12] Jeremy A. Frank, Claudia I. Reich, Shobha Sharma, Jon S. Weisbaum, Brenda A. Wilson, and Gary J. Olsen. Critical evaluation of two commonly-used primers for amplification of bacterial 16S rRNA genes – frank et al., 10.1128/AEM.02272-07 – applied and environmental microbiology. http://aem.asm.org/cgi/content/abstract/AEM.02272-07v1. URL http://aem.asm.org/cgi/content/abstract/AEM.02272-07v1.

[13] Gaddy T. Bergmann, Scott T. Bates, Kathryn G. Eilers, Christian L. Lauber, J. Gregory Caporaso, William A. Walters, Rob Knight, and Noah Fierer. The under-recognized dominance of verrucomicrobia in soil bacterial communities. Soil biology & biochemistry, 43(7):1450–1455, July 2011. ISSN 0038-0717. doi: 10.1016/j.soilbio.2011.03.012. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3260529/. PMID: 22267877 PMCID: PMC3260529.

[14] Eric S. Boyd, John R. Spear, and John W. Peters. [FeFe] hydrogenase genetic diversity provides insight into molecular adaptation in a saline microbial mat community. Applied and Environmental Microbiology, 75(13):4620–4623, July 2009. ISSN 0099-2240. doi: 10.1128/AEM.00582-09. PMID: 19429563 PMCID: 2704818.

[15] G J Olsen and C R Woese. Ribosomal RNA: a key to phylogeny. The FASEB Journal: Official Publication of the Federation of American Societies for Experimental Biology, 7(1):113–23, January 1993. ISSN 0892-6638. URL http://www.ncbi.nlm.nih.gov/pubmed/8422957. PMID: 8422957.

128 [16] William A Walters, J Gregory Caporaso, Christian L Lauber, Donna Berg-Lyons, Noah Fierer, and Rob Knight. PrimerProspector: de novo design and taxonomic analysis of barcoded polymerase chain reaction primers. Bioinformatics (Oxford, England), 27(8):1159–1161, April 2011. ISSN 1367-4811. doi: 10.1093/bioinformatics/btr087. PMID: 21349862.

[17] TZ DeSantis, P Hugenholtz, N Larsen, M Rojas, EL Brodie, K Keller, T Huber, D Dalevi, P Hu, and GL Andersen. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and Environmental Microbiology, 72(7):5069–5072, July 2006. ISSN 0099-2240. doi: 10.1128/AEM.03006-05. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=8&SID=3CooBp1d9F27dhjDnjk&page=1&doc=1.

[18] E Pruesse, C Quast, K Knittel, BM Fuchs, WG Ludwig, J Peplies, and FO Glockner. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. NUCLEIC ACIDS RESEARCH, 35(21):7188–7196, December 2007. ISSN 0305-1048. doi: 10.1093/nar/gkm864. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=16&SID=3CooBp1d9F27dhjDnjk&page=1&doc=1.

[19] J. R. Cole, Q. Wang, E. Cardenas, J. Fish, B. Chai, R. J. Farris, A. S. Kulam-Syed-Mohideen, D. M. McGarrell, T. Marsh, G. M. Garrity, and J. M. Tiedje. The ribosomal database project: improved alignments and new tools for rRNA analysis. Nucleic Acids Research, 37(Database issue):D141–D145, January 2009. ISSN 0305-1048. doi: 10.1093/nar/gkn879. PMID: 19004872 PMCID: 2686447.

[20] Zongzhi Liu, Catherine Lozupone, Micah Hamady, Frederic D Bushman, and Rob Knight. Short pyrosequencing reads suffice for accurate microbial community analysis. Nucleic Acids Research, 35(18):e120, 2007. ISSN 1362-4962. doi: 10.1093/nar/gkm541. URL http://www.ncbi.nlm.nih.gov/pubmed/17881377. PMID: 17881377.

[21] Zongzhi Liu, Todd Z. DeSantis, Gary L. Andersen, and Rob Knight. Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers. Nucl. Acids Res., 36(18):e120, October 2008. doi: 10.1093/nar/gkn491. URL http://nar.oxfordjournals.org/cgi/content/abstract/36/18/e120.

129 [22] GC Baker, JJ Smith, and DA Cowan. Review and re-analysis of domain-specific 16S primers. JOURNAL OF MICROBIOLOGICAL METHODS, 55(3):541–555, December 2003. ISSN 0167-7012. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=1&SID=4CJkg6JG42Hep3CH8Gj&page=1&doc=1.

[23] Anna Klindworth, Elmar Pruesse, Timmy Schweer, Jrg Peplies, Christian Quast, Matthias Horn, and Frank Oliver Glckner. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic acids research, October 2012. ISSN 1362-4962. doi: 10.1093/nar/gks808. PMID: 22933715.

[24] DJ Lane. 16S/23S rRNA sequencing. In E Stackebrandt and M Goodfellow, editors, Nucleic Acid Techniques in Bacteria Systematics, page 329. Wiley and Sons, Chichester, UK, 1991.

[25] Norman R. Pace. Mapping the tree of life: Progress and prospects. Microbiology and Molecular Biology Reviews, 73(4):565–576, December 2009. ISSN 1092-2172, 1098-5557. doi: 10.1128/MMBR.00033-09. URL http://mmbr.asm.org/content/73/4/565.

[26] Patrick D. Schloss, Dirk Gevers, and Sarah L. Westcott. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-Based studies. PLoS ONE, 6(12):e27310, December 2011. doi: 10.1371/journal.pone.0027310. URL http://dx.doi.org/10.1371/journal.pone.0027310.

[27] Barry L. Karger and Andras Guttman. DNA sequencing by capillary electrophoresis. Electrophoresis, 30(Suppl 1):S196–S202, June 2009. ISSN 0173-0835. doi: 10.1002/elps.200900218. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2782523/. PMID: 19517496 PMCID: PMC2782523.

130 [28] Marcel Margulies, Michael Egholm, William E. Altman, Said Attiya, Joel S. Bader, Lisa A. Bemben, Jan Berka, Michael S. Braverman, Yi-Ju Chen, Zhoutao Chen, Scott B. Dewell, Lei Du, Joseph M. Fierro, Xavier V. Gomes, Brian C. Godwin, Wen He, Scott Helgesen, Chun He Ho, Gerard P. Irzyk, Szilveszter C. Jando, Maria L. I. Alenquer, Thomas P. Jarvie, Kshama B. Jirage, Jong-Bum Kim, James R. Knight, Janna R. Lanza, John H. Leamon, Steven M. Lefkowitz, Ming Lei, Jing Li, Kenton L. Lohman, Hong Lu, Vinod B. Makhijani, Keith E. McDade, Michael P. McKenna, Eugene W. Myers, Elizabeth Nickerson, John R. Nobile, Ramona Plant, Bernard P. Puc, Michael T. Ronan, George T. Roth, Gary J. Sarkis, Jan Fredrik Simons, John W. Simpson, Maithreyan Srinivasan, Karrie R. Tartaro, Alexander Tomasz, Kari A. Vogt, Greg A. Volkmer, Shally H. Wang, Yong Wang, Michael P. Weiner, Pengguang Yu, Richard F. Begley, and Jonathan M. Rothberg. Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437(7057): 376–380, July 2005. ISSN 0028-0836. doi: 10.1038/nature03959. URL http://www.nature.com/nature/journal/v437/n7057/abs/nature03959.html.

[29] Micah Hamady, Jeffrey J. Walker, J. Kirk Harris, Nicholas J. Gold, and Rob Knight. Error-correcting barcoded primers allow hundreds of samples to be pyrosequenced in multiplex. Nature methods, 5(3):235–237, March 2008. ISSN 1548-7091. doi: 10.1038/nmeth.1184. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3439997/. PMID: 18264105 PMCID: PMC3439997.

[30] SM Huse, JA Huber, HG Morrison, ML Sogin, and D Mark Welch. Accuracy and quality of massively parallel DNA pyrosequencing. GENOME BIOLOGY, 8(7), 2007. ISSN 1474-760X. doi: 10.1186/gb-2007-8-7-r143. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=1&SID=2CK18MD1Kdo57ELPg74&page=1&doc=2.

[31] B Ewing and P Green. Base-calling of automated sequencer traces using phred. II. error probabilities. Genome Research, 8(3):186–194, March 1998. ISSN 1088-9051. URL http://www.ncbi.nlm.nih.gov/pubmed/9521922. PMID: 9521922.

[32] B Ewing, L Hillier, M C Wendl, and P Green. Base-calling of automated sequencer traces using phred. i. accuracy assessment. Genome Research, 8(3):175–185, March 1998. ISSN 1088-9051. URL http://www.ncbi.nlm.nih.gov/pubmed/9521921. PMID: 9521921.

[33] Jens Reeder and Rob Knight. Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nat Meth, 7(9):668–669, 2010. ISSN 1548-7091. doi: 10.1038/nmeth0910-668b. URL http://dx.doi.org/10.1038/nmeth0910-668b.

131 [34] Kevin E. Ashelford, Nadia A. Chuzhanova, John C. Fry, Antonia J. Jones, and Andrew J. Weightman. At least 1 in 20 16S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies. Applied and Environmental Microbiology, 71(12):7724–7736, December 2005. ISSN 0099-2240, 1098-5336. doi: 10.1128/AEM.71.12.7724-7736.2005. URL http://aem.asm.org/content/71/12/7724.

[35] Mitchell L. Sogin, Hilary G. Morrison, Julie A. Huber, David Mark Welch, Susan M. Huse, Phillip R. Neal, Jesus M. Arrieta, and Gerhard J. Herndl. Microbial diversity in the deep sea and the underexplored rare biosphere. Proceedings of the National Academy of Sciences of the United States of America, 103(32): 12115–12120, August 2006. ISSN 0027-8424. doi: 10.1073/pnas.0605127103. PMID: 16880384 PMCID: 1524930.

[36] Victor Kunin, Anna Engelbrektson, Howard Ochman, and Philip Hugenholtz. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology, 12(1):118–123, January 2010. ISSN 1462-2920. doi: 10.1111/j.1462-2920.2009.02051.x. URL http://www.ncbi.nlm.nih.gov/pubmed/19725865. PMID: 19725865.

[37] Jens Reeder and Rob Knight. The ’rare biosphere’: a reality check. Nature Methods, 6(9):636–637, 2009. ISSN 1548-7091. doi: 10.1038/nmeth0909-636. URL http://www.nature.com/nmeth/journal/v6/n9/abs/nmeth0909-636.html.

[38] C Quince, A Lanzen, TP Curtis, RJ Davenport, N Hall, IM Head, LF Read, and WT Sloan. Accurate determination of microbial diversity from 454 pyrosequencing data. NATURE METHODS, 6(9):639–U27, September 2009. ISSN 1548-7091. doi: 10.1038/NMETH.1361. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=1&SID=1CiEE1EGbkfhjHepi@5&page=1&doc=1.

[39] Jens Reeder and Rob Knight. Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nat Meth, 7(9):668–669, 2010. ISSN 1548-7091. doi: 10.1038/nmeth0910-668b. URL http://dx.doi.org/10.1038/nmeth0910-668b.

[40] Christopher Quince, Anders Lanzen, Russell J Davenport, and Peter J Turnbaugh. Removing noise from pyrosequenced amplicons. BMC bioinformatics, 12:38, 2011. ISSN 1471-2105. doi: 10.1186/1471-2105-12-38. PMID: 21276213.

132 [41] J. Gregory Caporaso, Christian L. Lauber, William A. Walters, Donna Berg-Lyons, Catherine A. Lozupone, Peter J. Turnbaugh, Noah Fierer, and Rob Knight. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proceedings of the National Academy of Sciences, June 2010. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1000080107. URL http://www.pnas.org/content/early/2010/06/02/1000080107.

[42] Susan M Huse, David Mark Welch, Hilary G Morrison, and Mitchell L Sogin. Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environmental Microbiology, 12(7):1889–1898, July 2010. ISSN 1462-2920. doi: 10.1111/j.1462-2920.2010.02193.x. URL http://www.ncbi.nlm.nih.gov/pubmed/20236171. PMID: 20236171.

[43] Brian J Haas, Dirk Gevers, Ashlee M Earl, Mike Feldgarden, Doyle V Ward, Georgia Giannoukos, Dawn Ciulla, Diana Tabbaa, Sarah K Highlander, Erica Sodergren, Barbara Meth, Todd Z DeSantis, Joseph F Petrosino, Rob Knight, and Bruce W Birren. Chimeric 16S rRNA sequence formation and detection in sanger and 454-pyrosequenced PCR amplicons. Genome Research, 21(3):494–504, March 2011. ISSN 1549-5469. doi: 10.1101/gr.112730.110. URL http://www.ncbi.nlm.nih.gov/pubmed/21212162. PMID: 21212162.

[44] Robert C. Edgar, Brian J. Haas, Jose C. Clemente, Christopher Quince, and Rob Knight. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, 27(16):2194–2200, August 2011. ISSN 1367-4803, 1460-2059. doi: 10.1093/bioinformatics/btr381. URL http://bioinformatics.oxfordjournals.org/content/27/16/2194.

[45] Gregory Caporaso, Justin Kuczynski, Jesse Stombaugh, Kyle Bittinger, Frederic Bushman, Elizabeth Costello, Noah Fierer, Antonio Pena, Julia Goodrich, Jeffrey Gordon, Gavin Huttley, Scott Kelley, Dan Knights, Jeremy Koenig, Ruth Ley, Catherine Lozupone, Daniel McDonald, Brian Muegge, Meg Pirrung, Jens Reeder, Joel Sevinsky, Peter Turnbaugh, William Walters, Jeremy Widmann, Tanya Yatsunenko, Jesse Zaneveld, and Rob Knight. QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7(5):335–336, April 2010. ISSN 1548-7091. URL http://dx.doi.org/10.1038/nmeth.f.303.

[46] Patrick D. Schloss, Sarah L. Westcott, Thomas Ryabin, Justine R. Hall, Martin Hartmann, Emily B. Hollister, Ryan A. Lesniewski, Brian B. Oakley, Donovan H. Parks, Courtney J. Robinson, Jason W. Sahl, Blaz Stres, Gerhard G. Thallinger, David J. Van Horn, and Carolyn F. Weber. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75 (23):7537–7541, December 2009. ISSN 0099-2240, 1098-5336. doi: 10.1128/AEM.01541-09. URL http://aem.asm.org/content/75/23/7537.

133 [47] QIIME. http://sourceforge.net/projects/qiime/. URL http://sourceforge.net/projects/qiime/.

[48] Rob Knight, Peter Maxwell, Amanda Birmingham, Jason Carnes, J Gregory Caporaso, Brett Easton, Michael Eaton, Micah Hamady, Helen Lindsay, Zongzhi Liu, Catherine Lozupone, Daniel McDonald, Michael Robeson, Raymond Sammut, Sandra Smit, Matthew Wakefield, Jeremy Widmann, Shandy Wikman, Stephanie Wilson, Hua Ying, and Gavin Huttley. PyCogent: a toolkit for making sense from sequence. Genome Biology, 8(8):R171, 2007. ISSN 1465-6906. doi: 10.1186/gb-2007-8-8-r171. URL http://genomebiology.com/2007/8/8/R171.

[49] Qiong Wang, George M. Garrity, James M. Tiedje, and James R. Cole. Naive bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol., 73(16):5261–5267, August 2007. doi: 10.1128/AEM.00062-07. URL http://aem.asm.org/cgi/content/abstract/73/16/5261.

[50] Robert C. Edgar. Search and clustering orders of magnitude faster than BLAST. Bioinformatics, August 2010. doi: 10.1093/bioinformatics/btq461. URL http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btq461v1.

[51] Xiaoyu Wang, Yunpeng Cai, Yijun Sun, Rob Knight, and Volker Mai. Secondary structure information does not improve OTU assignment for partial 16s rRNA sequences. The ISME Journal, 6(7):1277–1280, 2012. ISSN 1751-7362. doi: 10.1038/ismej.2011.187. URL http://www.nature.com/ismej/journal/v6/n7/full/ismej2011187a.html.

[52] Patrick D. Schloss. Secondary structure improves OTU assignments of 16S rRNA gene sequences. The ISME Journal, January . ISSN 1751-7362. doi: 10.1038/ismej.2012.102. URL http: //www.nature.com/ismej/journal/vaop/ncurrent/full/ismej2012102a.html.

[53] Patrick D. Schloss and Sarah L. Westcott. Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis. Applied and Environmental Microbiology, 77(10):3219–3226, May 2011. ISSN 0099-2240, 1098-5336. doi: 10.1128/AEM.02810-10. URL http://aem.asm.org/content/77/10/3219.

[54] Lu Cheng, Alan W. Walker, and Jukka Corander. Bayesian estimation of bacterial community composition from 454 sequencing data. Nucleic Acids Research, March 2012. ISSN 0305-1048, 1362-4962. doi: 10.1093/nar/gks227. URL http://nar.oxfordjournals.org/content/early/2012/03/09/nar.gks227.

134 [55] Xiaolin Hao, Rui Jiang, and Ting Chen. Clustering 16S rRNA for OTU prediction: a method of unsupervised bayesian clustering. Bioinformatics (Oxford, England), 27 (5):611–618, March 2011. ISSN 1367-4811. doi: 10.1093/bioinformatics/btq725. PMID: 21233169.

[56] Yijun Sun, Yunpeng Cai, Susan M Huse, Rob Knight, William G Farmerie, Xiaoyu Wang, and Volker Mai. A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis. Briefings in bioinformatics, 13(1):107–121, January 2012. ISSN 1477-4054. doi: 10.1093/bib/bbr009. PMID: 21525143.

[57] Maksim Sipos, Patricio Jeraldo, Nicholas Chia, Ani Qu, A. Singh Dhillon, Michael E. Konkel, Karen E. Nelson, Bryan A. White, and Nigel Goldenfeld. Robust computational analysis of rRNA hypervariable tag datasets. PLoS ONE, 5 (12):e15220, December 2010. doi: 10.1371/journal.pone.0015220. URL http://dx.doi.org/10.1371/journal.pone.0015220.

[58] CA Lozupone, M Hamady, ST Kelley, and R Knight. Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities. APPLIED AND ENVIRONMENTAL MICROBIOLOGY, 73 (5):1576–1585, March 2007. ISSN 0099-2240. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=49&SID=3CC3DGbBEC1ifpEdDPl&page=1&doc=1.

[59] C Lozupone and R Knight. UniFrac: a new phylogenetic method for comparing microbial communities. APPLIED AND ENVIRONMENTAL MICROBIOLOGY, 71 (12):8228–8235, December 2005. ISSN 0099-2240. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=25&SID=3CC3DGbBEC1ifpEdDPl&page=1&doc=5.

[60] Micah Hamady and Rob Knight. Microbial community profiling for human microbiome projects: Tools, techniques, and challenges. Genome Research, April 2009. ISSN 1088-9051. doi: 10.1101/gr.085464.108. URL http://www.ncbi.nlm.nih.gov/pubmed/19383763. PMID: 19383763.

[61] Catherine Lozupone, Manuel E Lladser, Dan Knights, Jesse Stombaugh, and Rob Knight. UniFrac: an effective distance metric for microbial community comparison. The ISME journal, 5(2):169–172, February 2011. ISSN 1751-7362. doi: 10.1038/ismej.2010.133. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3105689/. PMID: 20827291 PMCID: PMC3105689.

135 [62] Yunpeng Cai and Yijun Sun. ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time. Nucleic Acids Research, May 2011. ISSN 0305-1048, 1362-4962. doi: 10.1093/nar/gkr349. URL http://nar.oxfordjournals.org/content/early/2011/05/19/nar.gkr349.

[63] Robert C Edgar. Search and clustering orders of magnitude faster than BLAST. Bioinformatics (Oxford, England), 26(19):2460–2461, October 2010. ISSN 1367-4811. doi: 10.1093/bioinformatics/btq461. URL http://www.ncbi.nlm.nih.gov/pubmed/20709691. PMID: 20709691.

[64] Weizhong Li and Adam Godzik. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics (Oxford, England), 22(13):1658–1659, July 2006. ISSN 1367-4803. doi: 10.1093/bioinformatics/btl158. URL http://www.ncbi.nlm.nih.gov/pubmed/16731699. PMID: 16731699.

[65] J D Thompson, F Plewniak, and O Poch. BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics (Oxford, England), 15(1):87–88, January 1999. ISSN 1367-4803. PMID: 10068696.

[66] EP Nawrocki, DL Kolbe, and SR Eddy. Infernal 1.0: inference of RNA alignments. BIOINFORMATICS, 25(10):1335–1337, May 2009. ISSN 1367-4803. doi: 10.1093/bioinformatics/btp157. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=1&SID=2EDLLC1NNEB@fk5mPE7&page=1&doc=2.

[67] Josephine E. Tilden. On some algal stalactites of the yellowstone national park. Botanical Gazette, 24(3):194–199, September 1897. ISSN 0006-8071. doi: 10.2307/2464569. URL http://www.jstor.org/stable/2464569. ArticleType: research-article / Full publication date: Sep., 1897 /.

[68] Scott R. Miller, Aaron L. Strong, Kenneth L. Jones, and Mark C. Ungerer. Bar-coded pyrosequencing reveals shared bacterial community properties along the temperature gradients of two alkaline hot springs in yellowstone national park. Appl. Environ. Microbiol., 75(13):4565–4572, July 2009. doi: hpi10.1128/AEM.02792-08h/pi. URL http://aem.asm.org/cgi/content/abstract/75/13/4565.

136 [69] Anne-Soisig Steunou, Sheila I Jensen, Eric Brecht, Eric D Becraft, Mary M Bateson, Oliver Kilian, Devaki Bhaya, David M Ward, John W Peters, Arthur R Grossman, and Michael Kuhl. Regulation of nif gene expression and the energetics of n2 fixation over the diel cycle in a hot spring microbial mat. ISME J, 2 (4):364–378, March 2008. ISSN 1751-7362. URL http://dx.doi.org/10.1038/ismej.2007.117.

[70] Anne-Soisig Steunou, Devaki Bhaya, Mary M. Bateson, Melanie C. Melendrez, David M. Ward, Eric Brecht, John W. Peters, Michael Khl, and Arthur R. Grossman. In situ analysis of nitrogen fixation and metabolic switching in unicellular thermophilic cyanobacteria inhabiting hot spring microbial mats. Proceedings of the National Academy of Sciences of the United States of America, 103(7):2398–2403, February 2006. doi: 10.1073/pnas.0507513103. URL http://www.pnas.org/content/103/7/2398.abstract.

[71] Joseph J. Copeland. Yellowstone thermal myxophyceae. Annals of the New York Academy of Sciences, 36(1):4223, 1936. ISSN 1749-6632. doi: 10.1111/j.1749-6632.1936.tb56976.x. URL http://onlinelibrary.wiley.com/doi/ 10.1111/j.1749-6632.1936.tb56976.x/abstract.

[72] John R. Spear, Hazel A. Barton, Charles E. Robertson, Christopher A. Francis, and Norman R. Pace. Microbial community biofabrics in a geothermal mine adit. Appl. Environ. Microbiol., 73(19):6172–6180, October 2007. doi: 10.1128/AEM.00393-07. URL http://aem.asm.org/cgi/content/abstract/73/19/6172.

[73] W. A. Setchell. THE UPPER TEMPERATURE LIMITS OF LIFE. Science, 17(441): 934–937, June 1903. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.17.441.934. URL https://pubget.com/paper/17844131/THE_UPPER_TEMPERATURE_LIMITS_OF_LIFE.

[74] Kazem Kashefi and Derek R. Lovley. Extending the upper temperature limit for life. Science, 301(5635):934–934, August 2003. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1086823. URL http://www.sciencemag.org/content/301/5635/934.

[75] James E. Mann and Harold E. Schlichting. Benthic algae of selected thermal springs in yellowstone national park. Transactions of the American Microscopical Society, 86(1):2, January 1967. ISSN 00030023. doi: 10.2307/3224417. URL http://www.jstor.org/discover/10.2307/3224417?uid=29631&uid= 3739568&uid=2129&uid=2134&uid=2&uid=70&uid=3&uid=29630&uid=67&uid= 62&uid=3739256&sid=21101487261007.

137 [76] David M. Ward, Michael J. Ferris, Stephen C. Nold, and Mary M. Bateson. A natural view of microbial biodiversity within hot spring cyanobacterial mat communities. Microbiol. Mol. Biol. Rev., 62(4):1353–1370, December 1998. URL http://mmbr.asm.org/cgi/content/abstract/62/4/1353.

[77] Courtney S Schaffert, Christian G Klatt, David M Ward, Mark Pauley, and Laurey Steinke. Identification and distribution of high-abundance proteins in the octopus spring microbial mat community. Applied and environmental microbiology, 78(23): 8481–8484, December 2012. ISSN 1098-5336. doi: 10.1128/AEM.01695-12. PMID: 23001677.

[78] David M. Ward and Richard W. Castenholz. Cyanobacteria in geothermal habitats. In Brian A. Whitton and Malcolm Potts, editors, The Ecology of Cyanobacteria, pages 37–59. Springer Netherlands, January 2002. ISBN 978-0-7923-4735-4, 978-0-306-46855-1. URL http://link.springer.com/chapter/10.1007/0-306-46855-7_3.

[79] O. L. Inman. STUDIES ON THE CHLOROPHYLLS AND PHOTOSYNTHESIS OF THERMAL ALGAE FROM YELLOWSTONE NATIONAL PARK, CALIFORNIA, AND NEVADA. The Journal of General Physiology, 23(6):661–666, July 1940. ISSN 0022-1295. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2237965/. PMID: 19873181 PMCID: PMC2237965.

[80] T. D. Brock and M. Louise Brock. Temperature optima for algal development in yellowstone and iceland hot springs. , Published online: 12 February 1966; | doi:10.1038/209733a0, 209(5024):733–734, February 1966. ISSN $footerJournalISSN. doi: 10.1038/209733a0. URL http://www.nature.com/nature/journal/v209/n5024/abs/209733a0.html.

[81] Richard W. Castenholz. The effect of sulfide on the blue-green algae of hot springs II. yellowstone national park. Microbial Ecology, 3(2):79–105, June 1977. ISSN 0095-3628. doi: 10.1007/BF02010399. URL http://www.springerlink.com/content/p472158298731828/.

[82] M J Ferris, A L Ruff-Roberts, E D Kopczynski, M M Bateson, and D M Ward. Enrichment culture and microscopy conceal diverse thermophilic synechococcus populations in a single hot spring microbial mat habitat. Applied and Environmental Microbiology, 62(3):1045–1050, March 1996. ISSN 0099-2240. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC167868/. PMID: 11536748 PMCID: PMC167868.

138 [83] D. K. Button, Frits Schut, Pham Quang, Ravonna Martin, and Betsy R. Robertson. Viability and isolation of marine bacteria by dilution culture: Theory, procedures, and initial results. Applied and Environmental Microbiology, 59(3):881–891, March 1993. ISSN 0099-2240. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC202203/. PMID: 16348896 PMCID: PMC202203.

[84] Dana E. Hunt, Lawrence A. David, Dirk Gevers, Sarah P. Preheim, Eric J. Alm, and Martin F. Polz. Resource partitioning and sympatric differentiation among closely related bacterioplankton. Science, 320(5879):1081–1085, May 2008. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1157890. URL http://www.sciencemag.org/content/320/5879/1081.

[85] Janelle R. Thompson, Sarah Pacocha, Chanathip Pharino, Vanja Klepac-Ceraj, Dana E. Hunt, Jennifer Benoit, Ramahi Sarma-Rupavtarm, Daniel L. Distel, and Martin F. Polz. Genotypic diversity within a natural coastal bacterioplankton population. Science, 307(5713):1311–1313, February 2005. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1106028. URL http://www.sciencemag.org/content/307/5713/1311.

[86] M J Ferris and D M Ward. Seasonal distributions of dominant 16S rRNA-defined populations in a hot spring microbial mat examined by denaturing gradient gel electrophoresis. Applied and environmental microbiology, 63(4):1375–1381, April 1997. ISSN 0099-2240. PMID: 9097434.

[87] Claudio Donati, N. Luisa Hiller, Herv Tettelin, Alessandro Muzzi, Nicholas Croucher, Samuel Angiuoli, Marco Oggioni, Julie Dunning Hotopp, Fen Hu, David Riley, Antonello Covacci, Tim Mitchell, Stephen Bentley, Morgens Kilian, Garth Ehrlich, Rino Rappuoli, E. Richard Moxon, and Vega Masignani. Structure and dynamics of the pan-genome of streptococcus pneumoniae and closely related species. Genome Biology, 11(10):R107, October 2010. ISSN 1465-6906. doi: 10.1186/gb-2010-11-10-r107. URL http://genomebiology.com/2010/11/10/R107/abstract.

[88] Michael T. Madigan, John M. Martinko, David Stahl, and David P. Clark. Brock Biology of Microorganisms. Benjamin Cummings, 13 edition, December 2010. ISBN 032164963X.

[89] Jonathan P Zehr. Nitrogen fixation by marine cyanobacteria. Trends in microbiology, 19(4):162–173, April 2011. ISSN 1878-4380. doi: 10.1016/j.tim.2010.12.004. PMID: 21227699.

139 [90] Pia H Moisander, Roxanne A Beinart, Ian Hewson, Angelicque E White, Kenneth S Johnson, Craig A Carlson, Joseph P Montoya, and Jonathan P Zehr. Unicellular cyanobacterial distributions broaden the oceanic n2 fixation domain. Science (New York, N.Y.), 327(5972):1512–1514, March 2010. ISSN 1095-9203. doi: 10.1126/science.1185468. PMID: 20185682.

[91] S. C. Nold and D. M. Ward. Photosynthate partitioning and fermentation in hot spring microbial mat communities,. Applied and Environmental Microbiology,, 62, (12,):4598,, December 1996. URL http://europepmc.org/articles/PMC1389010/?report=abstract,.

[92] Falicia Goh, Michelle A Allen, Stefan Leuko, Tomohiro Kawaguchi, Alan W Decho, Brendan P Burns, and Brett A Neilan. Determining the specific microbial populations and their spatial distribution within the stromatolite ecosystem of shark bay. ISME J, 3(4):383–396, December 2008. ISSN 1751-7362. URL http://dx.doi.org/10.1038/ismej.2008.114.

[93] RP Reid, NP James, IG Macintyre, CP Dupraz, and RV Burne. Shark bay stromatolites: Microfabrics and reinterpretation of origins. FACIES, 49:299–324, 2003. ISSN 0172-9179. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=2&SID=3FNF8d2Mdmlj@igGNcO&page=1&doc=6.

[94] Paul Hoffman and M.R. Walter. Chapter 6.1 stromatolite morphogenesis in shark bay, western australia. In Stromatolites, volume Volume 20, pages 261–271. Elsevier, 1976. ISBN 0070-4571. URL http://www.sciencedirect.com/science/ article/B8G3P-4T2BBNJ-P/2/418b2e9f0a647cae4fffe9423e15444a.

[95] Brendan P Burns, Falicia Goh, Michelle Allen, and Brett A Neilan. Microbial diversity of extant stromatolites in the hypersaline marine environment of shark bay, australia. Environmental microbiology, 6(10):1096–1101, October 2004. ISSN 1462-2912. doi: 10.1111/j.1462-2920.2004.00651.x. PMID: 15344935.

[96] AR Chivas, T Torgersen, and HA Polach. Growth-rates and holocene development of stromatolites for shark bay, western australia. Australian Journal of Earth Sciences, 37(2):113–121, June 1990. ISSN 0812-0099. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=1&SID=3D8K1kdPN2oibm8b9cF&page=1&doc=1.

[97] Brendan P Burns, Falicia Goh, Michelle Allen, and Brett A Neilan. Microbial diversity of extant stromatolites in the hypersaline marine environment of shark bay, australia. Environmental Microbiology, 6(10):1096–1101, October 2004. ISSN 1462-2920. doi: 10.1111/j.1462-2920.2004.00651.x. URL http://onlinelibrary. wiley.com/doi/10.1111/j.1462-2920.2004.00651.x/abstract.

140 [98] Jamie S Foster, Stefan J Green, Steven R Ahrendt, Stjepko Golubic, R Pamela Reid, Kevin L Hetherington, and Lee Bebout. Molecular and morphological characterization of cyanobacterial diversity in the stromatolites of highborne cay, bahamas. ISME J, 3(5):573–587, January 2009. ISSN 1751-7362. URL http://dx.doi.org/10.1038/ismej.2008.129.

[99] Miriam S. Andres and R. Pamela Reid. Growth morphologies of modern marine stromatolites: A case study from highborne cay, bahamas. Sedimentary Geology, 185(3-4):319–328, March 2006. ISSN 0037-0738. doi: 10.1016/j.sedgeo.2005.12.020. URL http://www.sciencedirect.com/science/ article/B6V6X-4JB9N1N-2/2/2d4da8cc2398f59f45cacd44e8bf69ca.

[100] LK Baumgartner, JR Spear, DH Buckley, NR Pace, RP Reid, C Dupraz, and PT Visscher. Microbial diversity in modern marine stromatolites, highborne cay, bahamas. ENVIRONMENTAL MICROBIOLOGY, 11(10):2710–2719, October 2009. ISSN 1462-2912. doi: 10.1111/j.1462-2920.2009.01998.x. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=1&SID=4CAAFHLjiIlnco2Ll5m&page=1&doc=1.

[101] R. Pamela Reid, Ian G. Macintyre, Kathleen M. Browne, Robert S. Steneck, and Timothy Miller. Modern marine stromatolites in the exuma cays, bahamas: Uncommonly common. Facies, 33(1):1–17, December 1995. ISSN 0172-9179, 1612-4820. doi: 10.1007/BF02537442. URL http://apps.webofknowledge.com/InboundService.do?UT= A1995TA21500001&IsProductCode=Yes&mode=FullRecord&product=WOS&SID= 2CpBF9DmhKohMMhcA6G&smartRedirect=yes&SrcApp=Nature&DestFail=http%3A% 2F%2Fwww.webofknowledge.com%3FDestApp%3DCEL%26DestParams%3D% 253Faction%253Dretrieve%2526mode%253DFullRecord%2526product%253DCEL% 2526UT%253DA1995TA21500001%2526customersID%253DNature%26e%3DTE_ct9Ziw. oqna1K_wv4Pl5HSZrAt2Vr8z.uJJp_Z58Fx68SHCIlDLQxxoQt5DUz%26SrcApp% 3DNature%26SrcAuth%3DNature&action=retrieve&Init=Yes&SrcAuth= Nature&customersID=Nature&Func=Frame.

[102] J. J. Dravis. Hardened subtidal stromatolites, bahamas. Science, 219(4583): 385–386, January 1983. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.219.4583.385. URL http://apps.webofknowledge.com/InboundService.do?UT= A1983PY51600020&IsProductCode=Yes&mode=FullRecord&product=WOS&SID= 2CpBF9DmhKohMMhcA6G&smartRedirect=yes&SrcApp=Nature&DestFail=http%3A% 2F%2Fwww.webofknowledge.com%3FDestApp%3DCEL%26DestParams%3D% 253Faction%253Dretrieve%2526mode%253DFullRecord%2526product%253DCEL% 2526UT%253DA1983PY51600020%2526customersID%253DNature%26e% 3DkkuP0wdh199ZPKcWlTsu8rDeg.xyFGo4hG3udJRmnJ69_ffeYEYMVhwidzLpREUF% 26SrcApp%3DNature%26SrcAuth%3DNature&action=retrieve&Init=Yes&SrcAuth= Nature&customersID=Nature&Func=Frame.

141 [103] Malcolm R. Walter, John Bauld, and Thomas D. Brock. Siliceous algal and bacterial stromatolites in hot spring and geyser effluents of yellowstone national park. Science, 178(4059):402–405, October 1972. doi: 10.1126/science.178.4059.402. URL http://www.sciencemag.org/cgi/content/abstract/178/4059/402.

[104] Stanley M. Awramik and James P. Vanyo. Heliotropism in modern stromatolites. Science, 231(4743):1279–1281, March 1986. doi: 10.1126/science.231.4743.1279. URL http://www.sciencemag.org/cgi/content/abstract/231/4743/1279.

[105] Brian Jones, Robin W. Renaut, and Kurt O. Konhauser. Genesis of large siliceous stromatolites at frying pan lake, waimangu geothermal field, north island, new zealand. Sedimentology, 52(6):12291252, 2005. ISSN 1365-3091. doi: 10.1111/j.1365-3091.2005.00739.x. URL http://onlinelibrary.wiley.com/doi/ 10.1111/j.1365-3091.2005.00739.x/abstract.

[106] Fernando Santos, Arantxa Pea, Balbina Nogales, Elena Soria-Soria, M Angeles Garca Del Cura, Juan Antonio Gonzlez-Martn, and Josefa Antn. Bacterial diversity in dry modern freshwater stromatolites from ruidera pools natural park, spain. Systematic and Applied Microbiology, 33(4):209–221, June 2010. ISSN 1618-0984. doi: 10.1016/j.syapm.2010.02.006. URL http://www.ncbi.nlm.nih.gov/pubmed/20409657. PMID: 20409657.

[107] Stanley M. Awramik, Kathleen Grey, Richard B. Hoover, Gilbert V. Levin, Alexei Y. Rozanov, and G. Randall Gladstone. Stromatolites: biogenicity, biosignatures, and bioconfusion. In Astrobiology and Planetary Missions, volume 5906, pages 59060P–9, San Diego, CA, USA, 2005. SPIE. URL http://link.aip.org/link/?PSI/5906/59060P/1.

[108] R P Reid, P T Visscher, A W Decho, J F Stolz, B M Bebout, C Dupraz, I G Macintyre, H W Paerl, J L Pinckney, L Prufert-Bebout, T F Steppe, and D J DesMarais. The role of microbes in accretion, lamination and early lithification of modern marine stromatolites. Nature, 406(6799):989–992, August 2000. ISSN 0028-0836. doi: 10.1038/35023158. URL http://www.ncbi.nlm.nih.gov/pubmed/10984051. PMID: 10984051.

[109] Dominic Papineau, Jeffrey J. Walker, Stephen J. Mojzsis, and Norman R. Pace. Composition and structure of microbial communities from stromatolites of hamelin pool in shark bay, western australia. Appl. Environ. Microbiol., 71(8):4822–4832, August 2005. doi: 10.1128/AEM.71.8.4822-4832.2005. URL http://aem.asm.org/cgi/content/abstract/71/8/4822.

142 [110] Brendan P. Burns, Roberto Anitori, Philip Butterworth, Ruth Henneberger, Falicia Goh, Michelle A. Allen, Raquel Ibaez-Peral, Peter L. Bergquist, Malcolm R. Walter, and Brett A. Neilan. Modern analogues and the early history of microbial life. Precambrian Research, 173(1-4):10–18, September 2009. ISSN 0301-9268. doi: 10.1016/j.precamres.2009.05.006. URL http://www.sciencedirect.com/science/ article/B6VBP-4WJBC11-1/2/5b5e91dfb7d1e41052ece9516fbdeec6.

[111] Stephanie A. Havemann and Jamie S. Foster. Comparative characterization of the microbial diversities of an artificial microbialite model and a natural stromatolite. Appl. Environ. Microbiol., 74(23):7410–7421, December 2008. doi: hpi10.1128/AEM.01710-08h/pi. URL http://aem.asm.org/cgi/content/abstract/74/23/7410.

[112] CA Lozupone and R Knight. Global patterns in bacterial diversity. Proceedings of the National Academy of Sciences of the United States Of, 104(27):11436–11440, July 2007. ISSN 0027-8424. doi: 10.1073/pnas.0611525104. URL http://apps.isiknowledge.com/full_record.do?product=WOS&search_mode= GeneralSearch&qid=8&SID=4BdEoidOOj5jlDNA7h9&page=1&doc=10.

[113] Ruth E. Ley, J. Kirk Harris, Joshua Wilcox, John R. Spear, Scott R. Miller, Brad M. Bebout, Julia A. Maresca, Donald A. Bryant, Mitchell L. Sogin, and Norman R. Pace. Unexpected diversity and complexity of the guerrero negro hypersaline microbial mat. Applied and Environmental Microbiology, 72(5):3685–3695, May 2006. ISSN 0099-2240, 1098-5336. doi: 10.1128/AEM.72.5.3685-3695.2006. URL http://aem.asm.org/content/72/5/3685.

[114] Jeffrey J Werner, Omry Koren, Philip Hugenholtz, Todd Z DeSantis, William A Walters, J Gregory Caporaso, Largus T Angenent, Rob Knight, and Ruth E Ley. Impact of training sets on classification of high-throughput bacterial 16s rRNA gene surveys. The ISME journal, 6(1):94–103, January 2012. ISSN 1751-7370. doi: 10.1038/ismej.2011.82. PMID: 21716311.

[115] Wolfgang Ludwig, Oliver Strunk, Ralf Westram, Lothar Richter, Harald Meier, Yadhukumar, Arno Buchner, Tina Lai, Susanne Steppi, Gangolf Jobb, Wolfram Frster, Igor Brettske, Stefan Gerber, Anton W Ginhart, Oliver Gross, Silke Grumann, Stefan Hermann, Ralf Jost, Andreas Knig, Thomas Liss, Ralph Lssmann, Michael May, Bjrn Nonhoff, Boris Reichel, Robert Strehlow, Alexandros Stamatakis, Norbert Stuckmann, Alexander Vilbig, Michael Lenke, Thomas Ludwig, Arndt Bode, and Karl-Heinz Schleifer. ARB: a software environment for sequence data. Nucleic Acids Research, 32(4):1363–1371, 2004. ISSN 1362-4962. doi: 10.1093/nar/gkh293. URL http://www.ncbi.nlm.nih.gov/pubmed/14985472. PMID: 14985472.

143 [116] Frederick Matsen, Robin Kodner, and E Virginia Armbrust. pplacer: linear time maximum-likelihood and bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11(1):538, 2010. ISSN 1471-2105. doi: 10.1186/1471-2105-11-538. URL http://www.biomedcentral.com/1471-2105/11/538.

[117] Catherine Lozupone, Manuel E Lladser, Dan Knights, Jesse Stombaugh, and Rob Knight. UniFrac: an effective distance metric for microbial community comparison. The ISME journal, 5(2):169–172, February 2011. ISSN 1751-7362. doi: 10.1038/ismej.2010.133. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3105689/. PMID: 20827291 PMCID: PMC3105689.

[118] Elizabeth A. Dinsdale, Robert A. Edwards, Dana Hall, Florent Angly, Mya Breitbart, Jennifer M. Brulc, Mike Furlan, Christelle Desnues, Matthew Haynes, Linlin Li, Lauren McDaniel, Mary Ann Moran, Karen E. Nelson, Christina Nilsson, Robert Olson, John Paul, Beltran Rodriguez Brito, Yijun Ruan, Brandon K. Swan, Rick Stevens, David L. Valentine, Rebecca Vega Thurber, Linda Wegley, Bryan A. White, and Forest Rohwer. Functional metagenomic profiling of nine biomes. Nature, 452(7187):629–632, April 2008. ISSN 0028-0836. doi: 10.1038/nature06810. URL http://dx.doi.org/10.1038/nature06810.

[119] Jack A. Gilbert, Folker Meyer, Lynn Schriml, Ian R Joint, Martin Mhling, and Dawn Field. Metagenomes and metatranscriptomes from the l4 long-term coastal monitoring station in the western english channel. Standards in Genomic Sciences, 3(2):183–193, October 2010. ISSN 1944-3277. doi: 10.4056/sigs.1202536. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3035373/. PMID: 21304748 PMCID: PMC3035373.

[120] Jeffrey J. Werner, Dan Knights, Marcelo L. Garcia, Nicholas B. Scalfone, Samual Smith, Kevin Yarasheski, Theresa A. Cummings, Allen R. Beers, Rob Knight, and Largus T. Angenent. Bacterial community structures are unique and resilient in full-scale bioenergy systems. Proceedings of the National Academy of Sciences, February 2011. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1015676108. URL http://www.pnas.org/content/early/2011/02/15/1015676108.

[121] John Christian Gaby and Daniel H. Buckley. A global census of nitrogenase diversity. Environmental Microbiology, 13(7):17901799, 2011. ISSN 1462-2920. doi: 10.1111/j.1462-2920.2011.02488.x. URL http://onlinelibrary.wiley.com/ doi/10.1111/j.1462-2920.2011.02488.x/abstract.

144 [122] H. James Tripp, Ian Hewson, Sam Boyarsky, Joshua M. Stuart, and Jonathan P. Zehr. Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies. Nucleic Acids Research, July 2011. ISSN 0305-1048, 1362-4962. doi: 10.1093/nar/gkr576. URL http://nar.oxfordjournals.org/content/early/2011/07/19/nar.gkr576.

[123] Richard W. Castenholz. SPECIES USAGE, CONCEPT, AND EVOLUTION IN THE CYANOBACTERIA (BLUE-GREEN ALGAE). Journal of Phycology, 28(6): 737–745, 1992. doi: 10.1111/j.0022-3646.1992.00737.x. URL http://dx.doi.org/10.1111/j.0022-3646.1992.00737.x.

[124] Aharon Oren and Brian J. Tindall. Nomenclature of the cyanophyta/cyanobacteria/cyanoprokaryotes under the international code of nomenclature of prokaryotes. Algological Studies, 117(1):39–52, 2005. doi: 10.1127/1864-1318/2005/0117-0039.

[125] R. W. Castenholz. Phylum BX. cyanobacteria. oxygenic photosynthetic bacteria. In G. M. Garrity, D. R. Boone, and R. W. Castenholz, editors, The Archaea and the Deeply Branching and Phototropic Bacteria, volume 1, pages 473–599. Springer-Verlag, New York, 2nd edition, 2001.

[126] Lucien Hoffmann, Jiri Komarek, and Jan Kastovsky. System of cyanoprokaryotes (cyanobacteria) state in 2004. Algological Studies, 117:95–115(21), October 2005. doi: doi:10.1127/1864-1318/2005/0117-0095. URL http://www.ingentaconnect. com/content/schweiz/algo/2005/00000117/00000001/art00007.

[127] J. R. Cole, B. Chai, R. J. Farris, Q. Wang, S. A. Kulam, D. M. McGarrell, G. M. Garrity, and J. M. Tiedje. The ribosomal database project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucl. Acids Res., 33(suppl 1):D294–296, January 2005. doi: 10.1093/nar/gki038. URL http://nar.oxfordjournals.org/cgi/content/abstract/33/suppl_1/D294.

[128] A Wilmotte and M. Herdman. Phylogenetic relationships among the cyanobacteria based on 16S rRNA sequences.

[129] J. P. Euzeby. List of prokaryotic names with standing in nomenclature, 1997. URL http://www.bacterio.cict.fr/.

[130] Jiri Komarek and T. Hauer. Cyanodb.cz - on-line database of cyanobacterial genera. URL http://www.cyanodb.dz.

[131] Brandie D Wagner, Charles E Robertson, and J Kirk Harris. Application of two-part statistics for comparison of sequence variant counts. PloS one, 6(5):e20296, 2011. ISSN 1932-6203. doi: 10.1371/journal.pone.0020296. PMID: 21629788.

145 [132] Sara Freitas, Stephen Hatosy, Jed A Fuhrman, Susan M Huse, David B Mark Welch, Mitchell L Sogin, and Adam C Martiny. Global distribution and diversity of marine verrucomicrobia. The ISME journal, 6(8):1499–1505, August 2012. ISSN 1751-7370. doi: 10.1038/ismej.2012.3. PMID: 22318305.

[133] S F Altschul, W Gish, W Miller, E W Myers, and D J Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–10, October 1990. ISSN 0022-2836. doi: 10.1006/jmbi.1990.9999. URL http://www.ncbi.nlm.nih.gov/pubmed/2231712. PMID: 2231712.

[134] Susan M. Huse, Les Dethlefsen, Julie A. Huber, David Mark Welch, David A. Relman, and Mitchell L. Sogin. Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag sequencing. PLoS Genet, 4(11):e1000255, November 2008. doi: 10.1371/journal.pgen.1000255. URL http://dx.doi.org/10.1371/journal.pgen.1000255.

[135] Daniel McDonald, Morgan N Price, Julia Goodrich, Eric P Nawrocki, Todd Z DeSantis, Alexander Probst, Gary L Andersen, Rob Knight, and Philip Hugenholtz. An improved greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. The ISME Journal, 6(3):610–618, March 2012. ISSN 1751-7362. doi: 10.1038/ismej.2011.139. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3280142/. PMID: 22134646 PMCID: PMC3280142.

[136] Charles E Robertson, J Kirk Harris, John R Spear, and Norman R Pace. Phylogenetic diversity and ecology of environmental archaea. Current opinion in microbiology, 8(6):638–642, December 2005. ISSN 1369-5274. doi: 10.1016/j.mib.2005.10.003. PMID: 16236543.

[137] Ruth E. Ley, Fredrik Bckhed, Peter Turnbaugh, Catherine A. Lozupone, Robin D. Knight, and Jeffrey I. Gordon. Obesity alters gut microbial ecology. Proceedings of the National Academy of Sciences of the United States of America, 102(31):11070 –11075, 2005. doi: 10.1073/pnas.0504978102. URL http://www.pnas.org/content/102/31/11070.abstract.

[138] T. Z. DeSantis, P. Hugenholtz, K. Keller, E. L. Brodie, N. Larsen, Y. M. Piceno, R. Phan, and G. L. Andersen. NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucl. Acids Res., 34(suppl 2): W394–399, July 2006. doi: 10.1093/nar/gkl244. URL http://nar.oxfordjournals.org/cgi/content/abstract/34/suppl_2/W394.

146 [139] Abigail C Allwood, Malcolm R Walter, Balz S Kamber, Craig P Marshall, and Ian W Burch. Stromatolite reef from the early archaean era of australia. Nature, 441 (7094):714–718, June 2006. ISSN 1476-4687. doi: 10.1038/nature04764. URL http://www.ncbi.nlm.nih.gov/pubmed/16760969. PMID: 16760969.

[140] Stanley M. Awramik and James Sprinkle. Proterozoic stromatolites: The first marine evolutionary biota. Historical Biology, 13(4):241–253, 1999. ISSN 0891-2963. doi: 10.1080/08912969909386584. URL http://www.tandfonline.com/doi/abs/10.1080/08912969909386584.

[141] Pieter T. Visscher, R. Pamela Reid, Brad M. Bebout, Shelley E. Hoeft, Ian G. Macintyre, and John A. Thompson. Formation of lithified micritic laminae in modern marine stromatolites (Bahamas); the role of sulfur cycling. American Mineralogist, 83(11-12 Part 2):1482–1493, December 1998. URL http: //ammin.geoscienceworld.org/cgi/content/abstract/83/11-12_Part_2/1482.

[142] Pieter T. Visscher, R. Pamela Reid, and Brad M. Bebout. Microscale observations of sulfate reduction: Correlation of microbial activity with lithified micritic laminae in modern marine stromatolites. Geology, 28(10):919 –922, October 2000. doi: 10.1130/0091-7613(2000)28h919:MOOSRCi2.0.CO;2. URL http://geology.gsapubs.org/content/28/10/919.abstract.

[143] H W Paerl, T F Steppe, and R P Reid. Bacterially mediated precipitation in marine stromatolites. Environmental Microbiology, 3(2):123–130, February 2001. ISSN 1462-2912. URL http://www.ncbi.nlm.nih.gov/pubmed/11321542. PMID: 11321542.

[144] Stanley M. Awramik and Robert Riding. Role of algal eukaryotes in subtidal columnar stromatolite formation. Proceedings of the National Academy of Sciences of the United States of America, 85(5):1327 –1329, March 1988. URL http://www.pnas.org/content/85/5/1327.abstract.

[145] W. M. Berelson, F. A. Corsetti, C. Pepe-Ranney, D. E. Hammond, W. Beaumont, and J. R. Spear. Hot spring siliceous stromatolites from yellowstone national park: assessing growth rate and laminae formation. Geobiology, 9(5):411424, 2011. ISSN 1472-4669. doi: 10.1111/j.1472-4669.2011.00288.x. URL http:// onlinelibrary.wiley.com/doi/10.1111/j.1472-4669.2011.00288.x/abstract.

[146] S. Golubic and J. W. Focke. Phormidium hendersonii howe; identity and significance of a modern stromatolite building microorganism. Journal of Sedimentary Research, 48(3):751–764, September 1978. doi: hpi10.1306/212F7559-2B24-11D7-8648000102C1865Dh/pi. URL http://jsedres.sepmonline.org/cgi/content/abstract/48/3/751.

147 [147] LEE SEONGJOO and STJEPKO GOLUBIC. Multitrichomous cyanobacterial microfossils from the mesoproterozoic gaoyuzhuang formation, china: Paleoecological and taxonomic implications. Lethaia, 31(3):169–184, September 1998. ISSN 1502-3931. doi: 10.1111/j.1502-3931.1998.tb00505.x. URL http://onlinelibrary.wiley.com/doi/10.1111/j.1502-3931.1998.tb00505.x/ abstract.

[148] Zhang Zhongying. Solar cyclicity in the precambrian microfossil record. Palaeontology, 29:101–111, 1986.

[149] Brian Jones, Robin W. Renaut, and Michael R. Rosen. Microbial biofacies in hot-spring sinters; a model based on ohaaki pool, north island, new zealand. JOURNAL OF SEDIMENTARY RESEARCH, 68(3):413–434, May 1998. URL http://jsedres.sepmonline.org/cgi/content/abstract/68/3/413.

[150] John R. Spear, Jeffrey J. Walker, Thomas M. McCollom, and Norman R. Pace. Hydrogen and bioenergetics in the yellowstone geothermal ecosystem. Proceedings of the National Academy of Sciences of the United States of America, 102(7):2555–2560, February 2005. doi: 10.1073/pnas.0409574102. URL http://www.pnas.org/content/102/7/2555.abstract.

[151] A Wilmotte, G Van der Auwera, and R De Wachter. Structure of the 16 s ribosomal RNA of the thermophilic cyanobacterium chlorogloeopsis HTF (’Mastigocladus laminosus HTF’) strain PCC7518, and phylogenetic analysis. FEBS Letters, 317 (1-2):96–100, February 1993. ISSN 0014-5793. URL http://www.ncbi.nlm.nih.gov/pubmed/8428640. PMID: 8428640.

[152] Karin Finsinger, Ingeborg Scholz, Aurelio Serrano, Saylen Morales, Lorena Uribe-Lorio, Marielos Mora, Ana Sittenfeld, Jrgen Weckesser, and Wolfgang R Hess. Characterization of true-branching cyanobacteria from geothermal sites and hot springs of costa rica. Environmental Microbiology, 10(2):460–473, February 2008. ISSN 1462-2920. doi: 10.1111/j.1462-2920.2007.01467.x. URL http://www.ncbi.nlm.nih.gov/pubmed/18093164. PMID: 18093164.

[153] Dennis A Benson, Ilene Karsch-Mizrachi, David J Lipman, James Ostell, and David L Wheeler. GenBank. Nucleic Acids Research, 36(Database issue): D25–30, January 2008. ISSN 1362-4962. doi: 10.1093/nar/gkm929. URL http://www.ncbi.nlm.nih.gov/pubmed/18073190. PMID: 18073190.

148 [154] SeN Turner, Kathleen M. Pryer, Vivian P. W. Miao, and Jeffrey D. Palmer. Investigating deep phylogenetic relationships among cyanobacteria and plastids by small subunit rRNA sequence analysis. The Journal of Eukaryotic Microbiology, 46(4):327–338, July 1999. ISSN 1066-5234. doi: 10.1111/j.1550-7408.1999.tb04612.x. URL http: //onlinelibrary.wiley.com/doi/10.1111/j.1550-7408.1999.tb04612.x/pdf.

[155] Daiske Honda, Akira Yokota, and Junta Sugiyama. Detection of seven major evolutionary lineages in cyanobacteria based on the 16S rRNA gene sequence analysis with new sequences of five marine synechococcus strains. Journal of Molecular Evolution, 48(6):723–739, June 1999. doi: 10.1007/PL00006517. URL http://dx.doi.org/10.1007/PL00006517.

[156] James J Elser, John H. Schampel, FERRAN GarciaPichel, Brian D Wade, Valeria Souza, Luis Eguiarte, Ana Escalante, and Jack D Farmer. Effects of phosphorus enrichment and grazing snails on modern stromatolitic microbial communities. Freshwater Biology, 50(11):1808–1825, November 2005. ISSN 1365-2427. doi: 10.1111/j.1365-2427.2005.01451.x. URL http://onlinelibrary.wiley.com/doi/ 10.1111/j.1365-2427.2005.01451.x/abstract.

[157] Falicia Goh, Michelle A Allen, Stefan Leuko, Tomohiro Kawaguchi, Alan W Decho, Brendan P Burns, and Brett A Neilan. Determining the specific microbial populations and their spatial distribution within the stromatolite ecosystem of shark bay. The ISME journal, 3(4):383–396, April 2009. ISSN 1751-7370. doi: 10.1038/ismej.2008.114. PMID: 19092864.

[158] John Bunge. ESTIMATING THE NUMBER OF SPECIES WITH CATCHALL. In Pacific Symposium on Biocomputing Proceedings 2011, Fairmont Orchid, Hawaii, 2011. URL http://psb.stanford.edu/psb-online/proceedings/psb11/#Microbiome.

[159] Scott R. Miller, Michael D. Purugganan, and Stephanie E. Curtis. Molecular population genetics and phenotypic diversification of two populations of the thermophilic cyanobacterium mastigocladus laminosus. Applied and Environmental Microbiology, 72(4):2793–2800, April 2006. ISSN 0099-2240. doi: 10.1128/AEM.72.4.2793-2800.2006. PMID: 16597984 PMCID: 1449082.

[160] Richard W Castenholz. The thermophilic cyanophytes of iceland and the upper temperature limit1. Journal of Phycology, 5(4):360–368, December 1969. ISSN 1529-8817. doi: 10.1111/j.1529-8817.1969.tb02626.x. URL http://onlinelibrary. wiley.com/doi/10.1111/j.1529-8817.1969.tb02626.x/abstract.

[161] RW Castenholz. The biogeography of hot spring algae through enrichment cultures. Mitt. Int. Ver. Limnol, 21:296–315, 1978.

149 [162] Akiko Tomitani, Andrew H. Knoll, Colleen M. Cavanaugh, and Terufumi Ohno. The evolutionary diversification of cyanobacteria: Molecularphylogenetic and paleontological perspectives. 103(14):5442–5447, April 2006. doi: 10.1073/pnas.0600999103. URL http://www.pnas.org/content/103/14/5442.abstract.

[163] Lucas J Stal, Hans W Paerl, Brad Bebout, and Marlies Villbrandt. Heterocystous versus non-heterocystous cyanobacteria in microbial mats. In Lucas J Stal and Paul Caumette, editors, Microbial Mats: Sturcture, Development and Environmental Significance, volume 35 of Ecological Sciences. Springer-Verlag, New York, 1993.

[164] P Fay. Oxygen relations of nitrogen fixation in cyanobacteria. Microbiological Reviews, 56(2):340–373, June 1992. ISSN 0146-0749. PMID: 1620069 PMCID: 372871.

[165] Scott R. Miller, Carin Williams, Aaron L. Strong, and Darla Carvey. Ecological specialization in a spatially structured population of the thermophilic cyanobacterium mastigocladus laminosus. Appl. Environ. Microbiol., 75(3): 729–734, February 2009. doi: hpi10.1128/AEM.01901-08h/pi. URL http://aem.asm.org/cgi/content/abstract/75/3/729.

[166] Scott R. Miller and Brad M. Bebout. Variation in sulfide tolerance of photosystem II in phylogenetically diverse cyanobacteria from sulfidic habitats. Appl. Environ. Microbiol., 70(2):736–744, February 2004. doi: hpi10.1128/AEM.70.2.736-744.2004h/pi. URL http://aem.asm.org/cgi/content/abstract/70/2/736.

[167] Wolfgang Ludwig, Oliver Strunk, Sabine Klugbauer, Norbert Klugbauer, Michael Weizenegger, Judith Neumaier, Marianne Bachleitner, and Karl Heinz Schleifer. Bacterial phylogeny based on comparative sequence analysis (review). Electrophoresis, 19(4):554–568, April 1998. ISSN 0173-0835. doi: 10.1002/elps.1150190416. URL http://onlinelibrary.wiley.com/doi/10.1002/elps.1150190416/pdf.

[168] Chizuru Takashima and Akihiro Kano. Microbial processes forming daily lamination in a stromatolitic travertine. Sedimentary Geology, 208(3-4):114–119, August 2008. ISSN 0037-0738. doi: 10.1016/j.sedgeo.2008.06.001. URL http://www.sciencedirect.com/science/article/B6V6X-4SSG4XV-1/2/ eb2ae878c8bcd375e107aa48233b98bf.

150 [169] Tomoyo Okumura, Fumito Shiraishi, Takeshi Naganuma, Kise Yukimura, Chiduru Takashima, Shin Nishida, Hiroko Koike, and Akihiro Kano. Microbial contribution to lamina formation in an aragonite travertine. In Joachim Reitner, Nadia-Valerie Queric, and Mike Reich, editors, Geobiology of stromatolithes : International Kalkowsky-Symposium Gttingen, October 4-11.2008 ; abstract volume and field guide to excursions, pages 105–106. Universittsverlag Gttingen, October 2008. ISBN 9783940344526.

[170] Takashima Chizuru and Kano Akihiro. Depositional processes of travertine developed at shionoha hot spring, nara prefecture, japan. Journal of the Geological Society of Japan, 111(12):751–764, 2005. ISSN 0016-7630. URL http://sciencelinks.jp/j-east/article/200602/000020060206A0034252.php.

[171] Jason W Sahl, Nathaniel Fairfield, J Kirk Harris, David Wettergreen, William C Stone, and John R Spear. Novel microbial diversity retrieved by autonomous robotic exploration of the world’s deepest vertical phreatic sinkhole. Astrobiology, 10(2):201–213, March 2010. ISSN 1557-8070. doi: 10.1089/ast.2009.0378. URL http://www.ncbi.nlm.nih.gov/pubmed/20298146. PMID: 20298146.

[172] R I Amann, W Ludwig, and K H Schleifer. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiological Reviews, 59(1):143–169, March 1995. ISSN 0146-0749. URL http://www.ncbi.nlm.nih.gov/pubmed/7535888. PMID: 7535888.

[173] Sun-Hee Hong, John Bunge, Sun-Ok Jeon, and Slava S Epstein. Predicting microbial species richness. Proceedings of the National Academy of Sciences of the United States of America, 103(1):117–122, January 2006. ISSN 0027-8424. doi: 10.1073/pnas.0507245102. URL http://www.ncbi.nlm.nih.gov/pubmed/16368757. PMID: 16368757.

[174] E. Nawrocki. Structural RNA Homology Search and Alignment Using Covariance Models. PhD Thesis: Washington University School of Mediciine, 2009.

[175] Jamie Cannone, Sankar Subramanian, Murray Schnare, James Collett, Lisa D’Souza, Yushi Du, Brian Feng, Nan Lin, Lakshmi Madabusi, Kirsten Muller, Nupur Pande, Zhidi Shang, Nan Yu, and Robin Gutell. The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs: correction. BMC Bioinformatics, 3(1):15, 2002. ISSN 1471-2105. doi: 10.1186/1471-2105-3-15. URL http://www.biomedcentral.com/1471-2105/3/15.

151 [176] A. Stamatakis, T. Ludwig, and H. Meier. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics, 21(4): 456–463, February 2005. doi: 10.1093/bioinformatics/bti191. URL http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/4/456.

[177] Alexandros Stamatakis, Paul Hoover, and Jacques Rougemont. A rapid bootstrap algorithm for the RAxML web servers. Syst Biol, 57(5):758–771, October 2008. doi: 10.1080/10635150802429642. URL http://sysbio.oxfordjournals.org/cgi/content/abstract/57/5/758.

[178] Nicholas D Pattengale, Masoud Alipour, Olaf R P Bininda-Emonds, Bernard M E Moret, and Alexandros Stamatakis. How many bootstrap replicates are necessary? Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, 17(3):337–354, March 2010. ISSN 1557-8666. doi: 10.1089/cmb.2009.0179. URL http://www.ncbi.nlm.nih.gov/pubmed/20377449. PMID: 20377449.

[179] Maria Anisimova and Olivier Gascuel. Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative. Systematic Biology, 55(4): 539–552, August 2006. ISSN 1063-5157. doi: 10.1080/10635150600755453. URL http://www.ncbi.nlm.nih.gov/pubmed/16785212. PMID: 16785212.

[180] Sandra Smit, Kristian Rother, Jaap Heringa, and Rob Knight. From knotted to nested RNA structures: A variety of computational methods for pseudoknot removal. RNA, 14(3):410 –416, March 2008. doi: 10.1261/rna.881308. URL http://rnajournal.cshlp.org/content/14/3/410.abstract.

[181] Morgan N. Price, Paramvir S. Dehal, and Adam P. Arkin. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol, 26(7):1641–1650, July 2009. doi: 10.1093/molbev/msp077. URL http://mbe.oxfordjournals.org/cgi/content/abstract/26/7/1641.

[182] JD Hunter. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007. URL http://dx.doi.org/10.1109/MCSE.2007.55.

[183] Devaki Bhaya, Arthur R Grossman, Anne-Soisig Steunou, Natalia Khuri, Frederick M Cohan, Natsuko Hamamura, Melanie C Melendrez, Mary M Bateson, David M Ward, and John F Heidelberg. Population level functional diversity in a microbial community revealed by comparative genomic and metagenomic analyses. ISME J, 1(8):703–713, October 2007. ISSN 1751-7362. URL http://dx.doi.org/10.1038/ismej.2007.46.

152 [184] Niels Birger Ramsing, Mike J. Ferris, and David M. Ward. Highly ordered vertical structure of synechococcus populations within the one-millimeter-thick photic zone of a hot spring cyanobacterial mat. Applied and Environmental Microbiology, 66 (3):1038–1049, March 2000. ISSN 0099-2240. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC91940/. PMID: 10698769 PMCID: PMC91940.

[185] Trinity L Hamilton, Eric S Boyd, and John W Peters. Environmental constraints underpin the distribution and phylogenetic diversity of nifH in the yellowstone geothermal complex. Microbial ecology, 61(4):860–870, May 2011. ISSN 1432-184X. doi: 10.1007/s00248-011-9824-9. PMID: 21365232.

[186] Kimberly A Ross, Leah M Feazel, Charles E Robertson, Babu Z Fathepure, Katherine E Wright, Rebecca M Turk-Macleod, Mallory M Chan, Nicole L Held, John R Spear, and Norman R Pace. Phototrophic phylotypes dominate mesothermal microbial mats associated with hot springs in yellowstone national park. Microbial ecology, 64(1):162–170, July 2012. ISSN 1432-184X. doi: 10.1007/s00248-012-0012-3. PMID: 22327269.

[187] Eric S Boyd, Trinity L Hamilton, John R Spear, Matthew Lavin, and John W Peters. FeFe-hydrogenase in yellowstone national park: evidence for dispersal limitation and phylogenetic niche conservatism. The ISME journal, 4(12):1485–1495, December 2010. ISSN 1751-7370. doi: 10.1038/ismej.2010.76. PMID: 20535223.

[188] D’Arcy R Meyer-Dombard, Wesley Swingley, Jason Raymond, Jeff Havig, Everett L Shock, and Roger E Summons. Hydrothermal ecotones and streamer biofilm communities in the lower geyser basin, yellowstone national park. Environmental microbiology, 13(8):2216–2231, August 2011. ISSN 1462-2920. doi: 10.1111/j.1462-2920.2011.02476.x. PMID: 21453405.

[189] Scott R. Miller and Richard W. Castenholz. Evolution of thermotolerance in hot spring cyanobacteria of the genus synechococcus. Applied and Environmental Microbiology, 66(10):4222–4229, October 2000. ISSN 0099-2240, 1098-5336. doi: 10.1128/AEM.66.10.4222-4229.2000. URL http://aem.asm.org/content/66/10/4222.

[190] Tanja Bosak, Biqing Liang, Min Sub Sim, and Alexander P. Petroff. Morphological record of oxygenic photosynthesis in conical stromatolites. Proceedings of the National Academy of Sciences, 106(27):10939–10943, July 2009. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.0900885106. URL http://www.pnas.org/content/106/27/10939.

153 [191] Lou Jost. Entropy and diversity. Oikos, 113(2):363375, 2006. ISSN 1600-0706. doi: 10.1111/j.2006.0030-1299.14714.x. URL http://onlinelibrary.wiley.com/ doi/10.1111/j.2006.0030-1299.14714.x/abstract.

[192] Damian Eads, 2008. URL http://scipy-cluster.googlecode.com/. hcluster: Hierarchical Clustering for SciPy.

154 APPENDIX - COPYRIGHT AGREEMENT AND CO-AUTHOR PERMISSIONS FOR PUBLISHED CONTENT

155 COPYRIGHT TRANSFER AGREEMENT

Date: Contributor name:

Contributor address:

Manuscript number (if known):

Re: Manuscript entitled

(the “Contribution”) for publication in (the “Journal”) published by (“Wiley-Blackwell”).

Dear Contributor(s): Thank you for submitting your Contribution for publication. In order to expedite the editing and publishing process and enable Wiley-Blackwell to disseminate your Contribution to the fullest extent, we need to have this Copyright Transfer Agreement signed and returned as directed in the Journal’s instructions for authors as soon as possible. If the Contribution is not accepted for publication, or if the Contribution is subsequently rejected, this Agreement shall be null and void. Publication cannot proceed without a signed copy of this Agreement.

A. COPYRIGHT 3. Final Published Version. Wiley-Blackwell hereby licenses back to the Contributor the following rights with respect to the final published version of 1. The Contributor assigns to Wiley-Blackwell, during the full term of copy- the Contribution: right and any extensions or renewals, all copyright in and to the Contribution, and all rights therein, including but not limited to the right to publish, repub- a. Copies for colleagues. The personal right of the Contributor only to send lish, transmit, sell, distribute and otherwise use the Contribution in whole or in or transmit individual copies of the final published version in any format to part in electronic and print editions of the Journal and in derivative works colleagues upon their specific request provided no fee is charged, and throughout the world, in all languages and in all media of expression now further-provided that there is no systematic distribution of the Contribu- known or later developed, and to license or permit others to do so. tion, e.g. posting on a listserve, website or automated delivery. 2. Reproduction, posting, transmission or other distribution or use of the final b. Re-use in other publications. The right to re-use the final Contribution or Contribution in whole or in part in any medium by the Contributor as permit- parts thereof for any publication authored or edited by the Contributor ted by this Agreement requires a citation to the Journal and an appropriate (excluding journal articles) where such re-used material constitutes less credit to Wiley-Blackwell as Publisher, and/or the Society if applicable, suitable than half of the total material in such publication. In such case, any modifi- in form and content as follows: (Title of Article, Author, Journal Title and cations should be accurately noted. Volume/Issue, Copyright © [year], copyright owner as specified in the Journal). c. Teaching duties. The right to include the Contribution in teaching or Links to the final article on Wiley-Blackwell’s website are encouraged where training duties at the Contributor’s institution/place of employment includ- appropriate. ing in course packs, e-reserves, presentation at professional conferences, in-house training, or distance learning. The Contribution may not be used B. RETAINED RIGHTS in seminars outside of normal teaching obligations (e.g. commercial semi- Notwithstanding the above, the Contributor or, if applicable, the Contributor’s nars). Electronic posting of the final published version in connection with Employer, retains all proprietary rights other than copyright, such as patent teaching/training at the Contributor’s institution/place of employment is rights, in any process, procedure or article of manufacture described in the permitted subject to the implementation of reasonable access control Contribution. mechanisms, such as user name and password. Posting the final published version on the open Internet is not permitted. C. PERMITTED USES BY CONTRIBUTOR d. Oral presentations. The right to make oral presentations based on the 1. Submitted Version. Wiley-Blackwell licenses back the following rights to Contribution. the Contributor in the version of the Contribution as originally submitted for 4. Article Abstracts, Figures, Tables, Data Sets, Artwork and Selected publication: Text (up to 250 words). a. After publication of the final article, the right to self-archive on the Con- a. Contributors may re-use unmodified abstracts for any non-commercial tributor’s personal website or in the Contributor’s institution’s/employer’s purpose. For on-line uses of the abstracts, Wiley-Blackwell encourages but institutional repository or archive. This right extends to both intranets and does not require linking back to the final published versions. the Internet. The Contributor may not update the submission version or replace it with the published Contribution. The version posted must contain b. Contributors may re-use figures, tables, data sets, artwork, and selected a legend as follows: This is the pre-peer reviewed version of the following text up to 250 words from their Contributions, provided the following article: FULL CITE, which has been published in final form at [Link to final conditions are met: article]. (i) Full and accurate credit must be given to the Contribution. b. The right to transmit, print and share copies with colleagues. (ii) Modifications to the figures, tables and data must be noted. Otherwise, no changes may be made. 2. Accepted Version. Re-use of the accepted and peer-reviewed (but not (iii) The reuse may not be made for direct commercial purposes, or for final) version of the Contribution shall be by separate agreement with Wiley- financial consideration to the Contributor. Blackwell. Wiley-Blackwell has agreements with certain funding agencies (iv) Nothing herein shall permit dual publication in violation of journal governing reuse of this version. The details of those relationships, and other ethical practices. offerings allowing open web use, are set forth at the following website: http://www.wiley.com/go/funderstatement. NIH grantees should check the box at the bottom of this document.

CTA-A 156 D. CONTRIBUTIONS OWNED BY EMPLOYER ment purposes only, if the U.S. Government contract or grant so requires. (U.S. Government, U.K. Government, and other government employees: see notes 1. If the Contribution was written by the Contributor in the course of the at end) Contributor’s employment (as a “work-made-for-hire” in the course of employment), the Contribution is owned by the company/employer which F. COPYRIGHT NOTICE must sign this Agreement (in addition to the Contributor’s signature) in the space provided below. In such case, the company/employer hereby assigns to The Contributor and the company/employer agree that any and all copies of Wiley-Blackwell, during the full term of copyright, all copyright in and to the the final published version of the Contribution or any part thereof distributed Contribution for the full term of copyright throughout the world as specified in or posted by them in print or electronic format as permitted herein will include paragraph A above. the notice of copyright as stipulated in the Journal and a full citation to the Journal as published by Wiley-Blackwell. 2. In addition to the rights specified as retained in paragraph B above and the rights granted back to the Contributor pursuant to paragraph C above, Wiley- Blackwell hereby grants back, without charge, to such company/employer, its G. CONTRIBUTOR’S REPRESENTATIONS subsidiaries and divisions, the right to make copies of and distribute the final The Contributor represents that the Contribution is the Contributor’s original published Contribution internally in print format or electronically on the Com- work, all individuals identified as Contributors actually contributed to the Con- pany’s internal network. Copies so used may not be resold or distributed externally. tribution, and all individuals who contributed are included. If the Contribution However the company/employer may include information and text from the was prepared jointly, the Contributor agrees to inform the co-Contributors of Contribution as part of an information package included with software or the terms of this Agreement and to obtain their signature to this Agreement or other products offered for sale or license or included in patent applications. their written permission to sign on their behalf. The Contribution is submitted Posting of the final published Contribution by the institution on a public access only to this Journal and has not been published before. (If excerpts from copy- website may only be done with Wiley-Blackwell’s written permission, and payment righted works owned by third parties are included, the Contributor will obtain of any applicable fee(s). Also, upon payment of Wiley-Blackwell’s reprint fee, written permission from the copyright owners for all uses as set forth in Wiley- the institution may distribute print copies of the published Contribution externally. Blackwell’s permissions form or in the Journal’s Instructions for Contributors, and show credit to the sources in the Contribution.) The Contributor also E. GOVERNMENT CONTRACTS warrants that the Contribution contains no libelous or unlawful statements, does not infringe upon the rights (including without limitation the copyright, In the case of a Contribution prepared under U.S. Government contract or patent or trademark rights) or the privacy of others, or contain material or grant, the U.S. Government may reproduce, without charge, all or portions of instructions that might cause harm or injury. the Contribution and may authorize others to do so, for official U.S. Govern-

CHECK ONE BOX:

Contributor-owned work

ATTACH ADDITIONAL SIGNATURE PAGES AS NECESSARY Contributor’s signature Date

Type or print name and title

Co-contributor’s signature Date

Type or print name and title

Company/Institution-owned work (made-for-hire in the course of employment) Company or Institution (Employer-for-Hire) Date

Authorized signature of Employer Date

U.S. Government work Note to U.S. Government Employees A contribution prepared by a U.S. federal government employee as part of the employee’s official duties, or which is an official U.S. Government publication, is called a “U.S. Government work,” and is in the public domain in the United States. In such case, the employee may cross out Paragraph A.1 but must sign (in the Contributor’s signature line) and return this Agreement. If the Contribution was not prepared as part of the employee’s duties or is not an official U.S. Government publication, it is not a U.S. Government work.

U.K. Government work Note to U.K. Government Employees (Crown Copyright) The rights in a Contribution prepared by an employee of a U.K. government department, agency or other Crown body as part of his/her official duties, or which is an official government publication, belong to the Crown. U.K. government authors should submit a signed declaration form together with this Agreement. The form can be obtained via http://www.opsi.gov.uk/advice/crown-copyright/copyright-guidance/ publication-of-articles-written-by-ministers-and-civil-servants.htm

Other Government work Note to Non-U.S., Non-U.K. Government Employees If your status as a government employee legally prevents you from signing this Agreement, please contact the editorial office.

NIH Grantees Note to NIH Grantees Pursuant to NIH mandate, Wiley-Blackwell will post the accepted version of Contributions authored by NIH grant-holders to PubMed Central upon acceptance. This accepted version will be made publicly available 12 months after publication. For further information, see www.wiley.com/go/nihmandate.

CTA-A 157 Gmail - (no subject) https://mail.google.com/mail/u/0/?ui=2&ik=3cdf2db6e4...

Chuck Ranney

(no subject)

Merika Treants Thu, Jan 17, 2013 at 4:11 PM To: Chuck Ranney

I authorize that the article titled "Cyanobacterial construction of hot spring siliceous stromatolites in Yellowstone National Park" for which I am a co-author can be used as a chapter in Charles Pepe-Ranney's doctoral dissertation.

Sincerely,

Merika Elyse Treants

158

1 of 1 01/22/2013 11:15 PM Gmail - co-author reprint consent https://mail.google.com/mail/u/0/?ui=2&ik=3cdf2db6e4...

Chuck Ranney

co-author reprint consent

William Berelson Thu, Jan 17, 2013 at 12:15 PM To: Chuck Pepe-Ranney

Hi Chuck (and whomever)

I authorize that the article titled "Cyanobacterial construction of hot spring siliceous stromatolites in Yellowstone National Park" for which I am a co-author can be used as a chapter in Charles Pepe-Ranney's doctoral dissertation.

best,

Will

______Will Berelson Chair, Department of Earth Sciences (ZHS 227) 3651 Trousdale Pky. Los Angeles, CA 90089-0740 213-740-5828; FAX: 213-740-8801

159

1 of 1 01/22/2013 11:17 PM