BIOENERGETIC AND METAGENOMIC ANALYSIS OF MICROBIAL SULFUR CYCLING AT BORUP FIORD PASS GLACIER, CANADIAN HIGH ARCTIC

by KATHERINE ELIZABETH WRIGHT BSc(Hons) University College London, University of London, 1985 MSc University of London, 2003

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirement for the degree of Doctor of Philosophy Department of Molecular, Cellular and Developmental Biology 2012

This thesis entitled:

BIOENERGETIC AND METAGENOMIC ANALYSIS OF MICROBIAL SULFUR CYCLING AT BORUP FIORD PASS GLACIER, CANADIAN HIGH ARCTIC

written by Katherine Elizabeth Wright has been approved for the Department of Molecular, Cellular and Developmental Biology

Norman Pace, Committee Chair

Alexis Templeton

Date

The final copy of this thesis has been examined by the signatories, and we Find that both the content and the form meet acceptable presentation standards Of scholarly work in the above mentioned discipline.

ABSTRACT

Wright, Katherine Elizabeth (Ph.D., Molecular, Cellular and Developmental Biology)

BIOENERGETIC AND METAGENOMIC ANALYSIS OF MICROBIAL SULFUR CYCLING AT BORUP FIORD PASS GLACIER, CANADIAN HIGH ARCTIC

Thesis directed by Associate Professor Alexis S. Templeton

The relative importance of different microbial energy metabolisms in varying environments, and so their environmental impacts, are not well understood. This study combined geochemical analysis, bioenergetic calculations, analysis of environmental small subunit ribosomal RNA and functional genes, and culturing studies, to investigate microbial sulfur cycling on the surface of Borup Fiord Pass Glacier, Canadian High Arctic. The particular focus was to investigate how well the relative amounts of energy available from different redox reactions predicted the microbial utilization of those reactions, as indicated by relative abundance of key functional genes. Bioenergetics accurately predicted that the most abundant energy-related genes would be those used in the oxidation of sulfur species. However, genes for oxygenic or anoxygenic , aerobic and anaerobic oxidation of ammonium were largely or completely absent, even though these all represented energy sources that could in principle sustain life in this environment.

This investigation also found that the deposit on which the metagenome analysis was performed was dominated by Sulfurovum sp. and Sulfuricurvum sp. the first time these

Epsilonproteobacteria have been seen to be abundant in a sub-aerial environment. The data

iii

strongly support the hypothesis that these were the dominant primary producers of this community, using sulfur redox reactions, and in particular the oxidation of S0, to obtain energy. The genes responsible for oxidizing S0 are not fully known, but the disproportionately-high relative abundance of DsrE genes raises the possibility that the DsrE gene in these Epsilonproteobacteria might be involved in mobilizing external S0.

The surface layer of the deposit was dominated by Flavobacterium sp. which may therefore have a previously-unrecognized ability to metabolize sulfur compounds. This organism was isolated, and in culture it oxidized thiosulfate to sulfate, but was not able to conserve energy from this reaction. A Borup Gillisia sp. isolate, also a member of the Flavobacteriaceae family, demonstrated the ability to create unusual S0-biomineralized structures in culture, a previously unknown ability for Flavobacteria.

iv

ACKNOWLEDGEMENTS

I should like to thank Alexis Templeton for guidance and support through my PhD.

I should also like to thank my collaborators:

Charles Williamson (Colorado School of Mines) performed the DNA extractions of samples

BF09-02, BF09-04, BF09-05 and BF09-06a and worked jointly with me on the analysis of SSU rRNA data and preparation of the Epsilonproteobacteria phylogenetic tree. The preparation of the shotgun library, pyrosequencing of the metagenome and SSU rRNA amplicons was carried out by EnGenCore (the University of South Carolina Environmental Genomics Core Facility).

Sanger sequencing of the SSU rRNA amplicons was carried out by SeqWright. Ion chromatography, mass spectrometry and inductively coupled optical emission spectroscopy were carried out by Fred Luiszer (University of Colorado). XRD was performed by the Canadian

Geological Survey. DOC measurements were made by Holly Hughes and Carla-Erba measurements were made by Maegan McKee (all University of Colorado). Jason W. Sahl suggested the modifications to the Dojka et al. DNA extraction protocol. Charles Pepe-Ranney wrote the custom script for primer removal.

I would like to thank John Spear, Ed DeLong, Rob Knight, Justin Kuczynski, Cathy Lozupone,

Joe Jones, the MG-RAST support team and Frank. J. Stewart for helpful discussions.

Funding was provided by the NASA Astrobiology Institute/American Philosophical Society

Lewis & Clarke travel fellowship, and by the David and Lucille Packard Foundation.

v

CONTENTS

Chapter Page

1: Introduction 1

2: Metagenomic evidence for sulfur lithotrophy by Epsilonproteobacteria as the major 16 energy source for primary productivity in a sub-aerial arctic glacial deposit, Borup Fiord Pass

3: Metagenome analysis method development and supplementary analysis 49

4: S0 biomineralization, heterotrophic sulfur oxidation and carbon uptake by 83 Flavobacteriaceae

5: Additional culturing studies 120

6: Conclusions 129

References 145

Appendix 1 164

Appendix 2 167

vi

LIST OF TABLES

Table Page Table 2.1 30 Table 3.1 56 Table 3.2 61 Table 3.3 64 Table 3.4 65 Table 3.5 66 Table 3.6 66-67 Table 3.7 68 Table 4.1 87 Table 5.1 127-128

vii

LIST OF FIGURES

Figure Page Figure 2.1 19 Figure 2.2 31 Figure 2.3 33 Figure 2.4 34 Figure 2.5 36 Figure 2.6 43 Figure 3.1 63 Figure 3.2 69 Figure 3.3 71 Figure 3.4 72 Figure 3.5 72 Figure 3.6 73 Figure 3.7 75 Figure 4.1 93 Figure 4.2 94 Figure 4.3 95 Figure 4.4 95 Figure 4.5 96 Figure 4.6 98 Figure 4.7 99 Figure 4.8 100 Figure 4.9 101 Figure 4.10 102 Figure 4.11 103 Figure 4.12 104 Figure 4.13 105 Figure 4.14 106 Figure 4.15 107 Figure 4.16 108 Figure 4.17 109 Figure 5.1 124 Figure 5.2 124

viii

CHAPTER 1 – INTRODUCTION

Redox chemistry is central to the mechanisms by which microbes gain energy for life.

Microbes may gain energy by reacting organic carbon with oxygen, as we do ourselves, but also utilize a very wide range of inorganic chemical redox reactions (Amend and Shock, 2001). In doing so microbes chemically alter their environment and so affect the global biogeochemical cycling of major elements such as carbon, hydrogen, nitrogen, sulfur and iron (Falkowski 2008).

Phototrophs also use redox chemistry, and so also impact biogeochemical cycles through their energy metabolisms. Oxygenic phototrophs use light energy to extract electrons from water and so produce oxygen as a waste product, which is an excellent electron acceptor and so provides opportunities for other microbes to gain energy from a wide range of redox reactions (Shock et al., 2005). Oxygen also reacts abiotically with a wide range of substances, so altering the surface chemistry of our planet. Anoxygenic phototrophs use hydrogen, or reduced forms of sulfur and iron as electron donors, oxidizing these substances in the process (Bryant 2006).

SULFUR CHEMISTRY AS THE ENERGY BASIS FOR LIFE

Sulfur is one of the most abundant redox-active elements on Earth and provides a versatile energy source for microbes due to its ability to exist in several different redox states, and so to be involved in a large number of energy-releasing reactions (Kelly 1999). The most reduced form of sulfur is sulfide, which has a redox state of -2, and in the most oxidized form of sulfur, which is sulfate, the sulfur has a redox state of +6 (Langmuir 1997). The full oxidation from sulfide to sulfate, or reduction from sulfate to sulfide, therefore involves 8 electron transfers that can potentially be utilized in electron transport chains to produce proton motive force that can be used to generate ATP. No known organism does these reaction in one step (Ehrlich &

1

Newman, 2009). There are many possible intermediates between sulfide and sulfate, including

0 2- 2- 2- elemental sulfur (S ) , polysulfide (Sn ), thiosulfate (S2O3 ), tetrathionate (S4O6 ) and sulfite

2- (SO3 ) (Williamson & Rimstidt, 1992).

Both the oxidation and reduction of sulfur compounds are used by microbes to gain energy for growth. Microbial reduction of sulfate reduction by either organic carbon or hydrogen, using ATP sulfurylase, APS reductase and Dissimilatory Sulfite Reductase (Dsr) genes is ubiquitous in sediments and is believed to be the dominant way by which sulfate is reduced to sulfide on our planet as the abiotic reduction of sulfate requires very high temperatures. Microbes also make use of the full oxidative potential of sulfur, oxidizing sulfide back to sulfate. This is a multi-step reaction and microbes may catalyze the full oxidation from sulfide to sulfate, or only one step (Ehrlich & Newman, 2009). There are also different possible pathways and intermediates, and these are not all understood. The existence of many possible sulfur-oxygen anions, and their highly reactive nature resulting in rapid abiotic reactions, makes it very difficult to elucidate the exact pathways that microbes use. Two pathways that have been well studied, but are still not completely understood, are the sox pathway (sox genes) and the reverse dissimilatory pathway (rDsr genes). The sox proteins oxidize thiosulfate to sulfate and this reaction is known to be used by a wide range of to obtain energy (Friedrich et al.,

2005). These proteins have also been shown to be able to oxidize a wider range of sulfur compounds in vitro, including sulfide, elemental sulfur and sulfite (Rother et al., 2001) but it is not known if living bacteria can use sox proteins in this way. The products of the rDsr genes use a different oxidation pathway, oxidizing sulfide to sulfite, which is then oxidized further to sulfate by APS reductase and ATP sulfurylase, which are components of both the oxidative and reductive sulfur cycle (Dahl et al., 2008, Stewart et al., 2011).

2

Microbes have also been shown to be capable of using disproportionation reactions, where the same sulfur compound is both oxidized and reduced, in order to obtain energy. For

2- example, in the disproportionation of thiosulfate (S2O3 ) energy is released when one of the sulfur atoms is oxidized and the other is reduced, resulting in the production of both sulfide and sulfate (Bak & Cypionka 1987).

INTERACTION OF SULFUR WITH CARBON AND NITROGEN CYCLES

The oxidation and reduction of sulfur impacts significantly on the cycling of other elements, both directly and indirectly. As organic carbons are used to reduce sulfate they are oxidized in the process, ultimately to CO2. This may be used by other organisms which fix CO2 to produce biomass, including by bacteria that gain energy from sulfur redox reactions, such as the Gammaproteobacterial sulfur chemolithoautotrophs like Thiomicrospira (Scott et al., 2006).

Excess CO2 will be released into the atmosphere. Sulfur oxidizers also affect the release of CO2 by affecting the pH of their environment. The oxidation of sulfur to sulfate produces sulfuric acid which can strongly acidify the local environment, for example in sulfuric hot springs or acid mine drainage (Edwards et al., 2000). The solubility of CO2 is directly affected by pH; acidification causes carbonate rocks to dissolve and CO2 to be released into the atmosphere whereas an increase in pH causes it to precipitate (Langmuir 1997). The action of sulfur metabolizing microbes may therefore impact on global atmosphere CO2 levels, and so on global warming.

Microbial sulfur metabolism may also impact directly or indirectly on other elements. In the absence of oxygen, alternative oxidants such as Fe3+ or nitrate can be used to oxidize sulfur species in order to obtain energy (Warren 2008). Nitrate may be reduced all the way to nitrogen

3

gas which is returned to the atmosphere, or alternatively be reduced to ammonium by other microbes (Ehrlich & Newman 2009). Reduced iron may then be reoxidized by other microbes that can gain energy from this reaction (Blöthe & Roden, 2009). In addition to direct microbial action, the sulfide produced by microbial reduction of sulfate will react abiotically to reduce a wide range of oxidized species, including oxidized metals such as Fe3+, and will combine with metals to form sulfide minerals (Langmuir, 1997).

HABITABILITY AND MICROBIAL ENERGY METABOLISMS

There are likely to be many different factors that will influence the type of energy metabolism that microbes utilize in any given environment, including: i) the amount of energy available per electron transferred from each reaction, given the environmental conditions. As the net result of energy transfer via the electron transport chain is to pump protons across the membrane in order to generate proton motive force, and as electrons are effectively used as the proton carrier, the amount of energy available per electron is a relevant factor and allows direct comparison of different reactions, that involve different numbers of electrons (Amend & Shock, 2001); ii) the total amount of energy available in an environment from different sources. This takes into account not just the energy per electron transferred, but the total amount of each substance in the environment (Hentscher et al., 2005, Rogers et al., 2007); iii) the rate of flux of energy sources into an environment. An energy source may be present in only low concentrations, but nevertheless be able to provide significant energy if there is a high flux of it into the system (Hoehler 2004, Shock & Holland, 2007);

4

iv) the proportion of the energy that is conserved from each reaction, which will depend on the efficiency of the molecular machinery and will differ from reaction to reaction (Sievert &

Vetriani 2012); v) the efficiency of the organism in using energy for growth. Specifically, for autotrophic organisms, it will include the amount of energy needed to build biomass. For example, the

Calvin-Benson-Bassham carbon fixation pathway uses 3 ATP molecules to fix each CO2 molecule, whereas the reductive citric acid cycle only uses 1.67 ATP molecules (Canfield et al.,

2005).

In addition, the relative success of each type of energy metabolism, and so the type of microbial metabolism that is likely to be dominant in an environment, and have the greatest environmental impact, will also depend on the efficiency of the organism in adapting to its environment. This will depend partly on the environmental conditions that being faced, including temperature, pH, limitations of essential nutrients such as Fe and P that require the organism to actively scavenge these from the environment, and exposure to toxic substances which the organism needs to be able to withstand. It will also partly depend on how efficiently and effectively the organism’s molecular machinery can deal with those conditions (Hoehler 2007a).

For example, in the work covered in this thesis, subsurface organisms face a major environmental change in being moved in the spring from an anoxic, sulfidic, subsurface environment, to the surface of the ice, where they are subjected to both an oxygenated atmosphere, and UV radiation, both potentially toxic. Some of the subsurface organisms are likely to not be able to cope with the conditions at all, while others may do so with varying degrees of efficiency.

5

We do not yet have a good understanding of how these different factors affect microbial metabolism, or their relative importance in determining how successful, or otherwise, microbes are in adapting to their environment. Nor do we have a good understanding of the overall, global impact that microbes have on the biogeochemical cycling of major elements, and so on the environmental conditions of the planet as a whole, including climate. It is important to understand these factors better in order to understand how microbial biogeochemical cycling is likely to alter as environmental conditions change, and so the further environmental changes that will result because of microbial positive and negative feedback loops. This thesis begins to address that knowledge deficit by investigating the specific question of whether the relative abundance of microbial energy metabolisms correlates with the relative amounts of energy that are available from the respective redox reactions, in elemental sulfur deposits on the surface of

Borup Fiord Pass Glacier, Canadian High Arctic. The questions this thesis investigates are also relevant to the search for life beyond Earth. The energy sources that are available are a key factor in determining the best places to look for life on Mars, Europa or other planetary bodies, and to understanding the types of life that could potentially exist in such places, and so the chemical signatures for which we might search. This has been described as the ‘Follow the Energy’ approach to the search for possible life beyond Earth (Hoehler 2007b).

6

ENVIRONMENTAL BIOENERGETICS

The amount of energy that is available from redox reactions depends on the environmental concentrations of the reactants and products, the pH if the reaction involves proton transfer, and on temperature, and is given by the equation:

ΔGr = ΔG°r + RTlnQ = RTln(Q/K) where ΔGr is the free energy change of the reaction, ΔG°r is the standard free energy change (at the reaction temperature), R is the universal gas constant, T is the temperature in Kelvin, Q is the reaction quotient and K is the equilibrium constant at the reaction temperature (Amend & Shock

2001).

The amount of energy that is available from each redox reaction therefore varies from one site to another, depending on local chemistry and temperature (Amend & Shock, 2001,

Amend et al., 2011; McCollom, 2007; Shock et al., 2005; Shock et al., 2010, Spear et al, 2005).

Hundreds of chemical reactions may potentially provide energy for microbial life (Shock et al.,

2005). As the chemical composition of groundwater is affected by reactions with the rocks through which it has passed, the underlying geology will affect the energy sources available.

Investigation of hydrothermal vents has demonstrated that ultramafic-hosted vent systems which produce hydrogen and methane due to serpentinization reactions supply about twice as much energy as basalt-hosted systems (McCollom 2007). Work in Yellowstone hot springs with different chemical compositions and temperatures has shown that not only do the amounts of energy from each reaction vary, but the relative amounts of energy from different reactions changes, so that different reactions are the most energetically favorable in different pools (Shock et al., 2011). Local chemistry will also affect whether a particular reaction is energy-releasing, or requires an energy input. Anabolism reactions which are exergonic (release energy) in the

7

environmental conditions present at peridotite-hosted (ultramafic) hydrothermal vent systems where hydrogen oxidation was the most significant potential energy source, have been shown to be less exergonic, or endergonic (requiring energy) in basalt-hosted systems where sulfide oxidation was the most abundant source of energy (Amend et al., 2011). It is therefore necessary to calculate the bioenergetics of each site individually in order to understand the amounts of energy that are available from different sources there. Only a few studies have thoroughly addressed the question of the relative amounts of energy available in different environments, and these have been carried out in volcanic environments including hot springs in Yellowstone, and hydrothermal vents (Amend & Shock, 2001; Amend et al., 2011; McCollom, 2007; Rogers et al.,

2007, Shock et al., 2005; Shock et al., 2010, Spear et al, 2005).

LINKING MICROBIAL METABOLISM AND BIOENERGETICS

In order to improve our understanding of which energy metabolisms microbes are using in a particular site, and the relative success of each, it is necessary to gain an overview of microbial metabolism. The only way to do this is to take a molecular approach. By extracting

DNA from the environment it is possible to gain an overall picture of the nature and relative abundance of the microbes present there.

Previous attempts to do this have involved looking at which microbes are present using the small subunit ribosomal RNA (SSU rRNA) gene, and what is known about the nature of their metabolisms from other work (Bond et al., 2005, Macur et al., 2004, Spear et al., 2005, Costa et al., 2009, Gaidos et al., 2009, Vick et al., 2009). More direct approaches have involved using the polymerase chain reaction (PCR) with gene-specific primers to search for specific genes known

8

to code proteins that catalyze energy-releasing reactions (Hall et al., 2008, Chen et al., 2009,

Flores et al., 2012).

The invention of high-throughput pyrosequencing (Margulies et al., 2005) has provided a much better way to investigate this type of question. A shotgun library made from environmental

DNA will contain a sample of all the genes present in an environment (the metagenome), without needing to rely on primers which by their nature are selective and can miss relevant genes if sequences are very diverse, or if primers for a particular gene are not used at all.

Metatranscriptomic studies, sequencing a cDNA shotgun library made from environmental RNA, adds another layer of information by demonstrating which genes are being expressed in the environment (Frias-Lopez et al., 2008).

Genes that are present in low abundance in an environment may still be missed by a shotgun library, and the depth of sequencing in metagenomic and metatranscriptomic studies is therefore critical to consideration of how thoroughly the environment has been sampled (and this issue was explicitly addressed in the methodology of this study). Metagenomic (and metatranscriptomic) studies are increasingly being carried out to investigate the question of how the geochemistry of an environment affects microbial metabolism, including in some cases investigating the different energy sources that are available. In common use, the term metagenome is sometimes applied to studies that only look at the SSU rRNA gene in addition to work that looks at all the genes present in an environment. In this thesis the term metagenome is only used to mean the latter type of investigation, which encompasses the full range of genes present in an environment.

9

METAGENOMIC STUDIES LINKING GEOCHEMISTRY, BIOENERGETICS AND

MICROBIAL METABOLISM

Metagenomic studies are increasingly being used to investigate how geochemistry, including energy sources affects microbial metabolism. Metagenomic studies investigating the links between geochemical conditions and microbial function have investigated a range of different issues. This has included the ways in which microbes obtain essential nutrients in challenging environments. Metagenomic analysis combined with functional screening was used to identify novel pathways for taking up phosphonates as an adaptation to the poor availability of phosphorus in the ocean (Martinez et al., 2010). A metagenomic approach was also used to study iron uptake mechanisms in the ocean, as iron is also often limiting (Hopkinson & Barbeau,

2011). Other studies have investigated how microbes cope with environmental hazards. For example, the genomes of Dead Sea microbes have been shown to have an enrichment in genes relating to ion transport as an adaptation to their high-salt environment (Bodaker et al., 2010).

Metagenomic analysis of microbial populations at increasing depth in the ocean demonstrated an increase in mobile genetic elements shown by increases in prophages and transposes as depth increased, suggesting the need to allow for increased genetic variation to cope with these challenging conditions (Konstantinidis et al., 2009). A further use of metagenomic data includes identifying which microbes are carrying out which functions, by looking at the taxonomic origin of function genes. This has included identifying bacterial, rather than archael, amoA genes in sub-glacial waters, indicating that bacteria are the main nitrifiers in this environment (Boyd et al., 2011). Metagenomic analysis can also be used to compare the overall characteristics of communities in different environments. A comparative study of metagenomes from samples of different types found that the most important overall environmental characteristic in determining

10

the nature of the metagenome was whether the sample came from a water or sediment site, and that this was more important than salinity or concentration of nutrients (Jeffries et al., 2011).

Relatively few studies to date have used metagenomic studies to look explicitly at the question of the bioenergetics of the environment. These environmental metagenomic studies appear to support the idea that, in general, the major energy sources present do predict the dominant types of energy metabolism present, but are also beginning to highlight that this is not universally the case. However, this work is only just beginning to scratch the surface of determining the types of microbial metabolism that exist in different places, and their relative importance, and there is insufficient data to reach any definitive overall conclusions on the relative importance of different energy sources in varied types of environment.

In hydrothermal vent sites, metagenomic analysis has provided support for the hypothesis that the major type of energy metabolism correlates with the major geochemical energy source.

At hydrothermal vent and seep sites sites where serpentinization was taking place, so producing hydrogen, the genetic potential was identified for both hydrogen production and hydrogen oxidation as major energy metabolisms (Brazelton et al., 2012). In contrast, metagenomic analysis demonstrated that sulfur oxidation, possibly linked to nitrate reduction, was the most likely driver of primary productivity at a sulfide chimney in the basalt-hosted Mothra hydrothermal vent site (Xie et al., 2012). These findings are consistent with the results of

McCollom, 2007 and Amend et al., 2011, who found that hydrogen oxidation was the main potential energy source at serpentization sites and that sulfide oxygen was the most important energy source at basalt-hosted sites.

Metagenomic and geochemical analysis has also been used to determine that oxidation of reduced sulfur species is one of, or the, main energy metabolism in highly acidic (pH 0-1)

11

biofilms in sulfidic caves (Jones et al., 2012) and acidic sulfur-rich hot springs (Inskeep et al.,

2010). Such analyses have also been used to investigate energy metabolisms and environmental adaptations to cope with the microaerophilic environment of an acid mine, including the presence of cytochromes with a high affinity for oxygen, a possible electron transport chain to oxidize iron (Tyson et al., 2004).

One recent metagenomic study has also demonstrated that geochemical analysis alone cannot accurately predict all the types of energy metabolism that may exist in an environment.

Canfield et al (2010) used a metagenomic approach, detecting and quantifying genes involved in energy-releasing sulfur redox reactions, to deduce the presence of ‘cryptic’ sulfur cycling in an oxygen minimum zone (OMZ), that would not have been predicted from the geochemistry because no sulfide was detected in the water. They calculated the energy available from both sulfate reduction and sulfide oxidation, and demonstrated that both reactions could provide energy for growth in this environment. They then used isotopic labeling to confirm the presence of sulfur cycling. The findings of this study have not yet been tested in other environments to see if they are more widely applicable. Walsh et al. (2009) also used metagenomic analysis to determine the potential for sulfur oxidation, linked to nitrate respiration, by an uncultivated OMZ microbe, but did not test this experimentally.

More work is needed to determine the extent to which the energy sources present in an environment can predict the types of energy metabolism that are present there, and the limitations of this approach. The work in this thesis investigates this question in a sulfur-rich site,

Borup Fiord Pass Glacier, that is very different from the hydrothermal vents, volcanic hot springs and acid mine drainage sites that have received most attention to date, as it is not volcanic, high temperature or acidic, and is sub-aerial rather than underwater.

12

SULFUR CYCLING AT BORUP FIORD PASS GLACIER

Borup Fiord Pass Glacier in the Canadian High Arctic was chosen as a field site because of its astrobiological relevance as an excellent terrestrial analog site for Europa, which is also relevant to icy sulfur-rich sites on Mars. At Borup, a spring with very high levels of sulfide rises through the glacier and is surrounded by significant deposits of S0, gypsum (calcium sulfate) and calcite (calcium carbonate) (Grasby et al. 2003). The site is unique because it is not hydrothermal, the source of the sulfide is not volcanic activity, and no other site is known where extensive sulfur deposits, not of volcanic origin, are found on an ice surface. This is a therefore an excellent analog site for the surface of Europa, where deposits including sulfate are found on the ice that covers the surface of this moon (McCord 1998), and where occasional upwelling of water to the moon’s surface from the subsurface ocean is analagous to the rising of the spring at

Borup each year.

Previous work at the site has focused on characterisation of the overall features of the site, in particular its geological context (Grasby et al 2003, Grasby et al 2012), and studying the nature of the mineral phases found on the glacier using spectroscopic remote sensing, optical and electron microscopy and X-ray diffraction (Gleeson et al 2010, Grasby 2003, Gleeson et al

2012). Some initial microbiological characterization has been carried out, though it was not a comprehensive analysis, and mixed-cultures of microbes from the site have been shown to be able to produce S0-biomineralized structures, although the microbe(s) responsible for producing these structures, and their mechanism of production, were not determined (Grasby et al 2003,

Gleeson et al 2011).

13

SCOPE OF THE WORK IN THIS THESIS

The specific objectives of the work in this thesis were: i) to undertake a comprehensive, robust and quantitative study on how well the bioenergetics of a sulfur-dominated environment predict the microbial energy metabolisms that are utilized, by combining bioenergetic calculations, SSU rRNA and metagenomic (functional gene) analysis; ii) to improve our understanding of which microbial groups are major players in sulfur cycling at this site, and to determine which microbial sulfur redox reactions are likely to be significant in the glacial sulfur deposit; and iii) to identify the microbe(s) responsible for producing the S0 biomineralized structures that

Gleeson et al., 2011 had observed in culture;

Chapter 2 of this thesis contains the outcome of the bioenergetic calculations, SSU rRNA analysis and quantitative functional gene assessment of the metagenome. It explains the evidence supporting the conclusion that sulfur lithotrophy by Epsilonproteobacteria is the main driver of primary productivity in the glacial S0 deposit, and that this correlates with the results of the bioenergetic calculations. Chapter 2 also explains that the relative quantities of functional genes indicate that some metabolisms predicted to be energetically-favorable - photosynthesis and ammonium oxidation - are unexpectedly missing. The work in Chapter 2 has been submitted for publication in “Frontiers in Extreme Microbiology” and is currently in review. Chapter 3 describes the development of the methods used in Chapter 2, and further analysis that was done to check and support the results. It also includes conclusions on the challenges of metagenomic analysis, the limitations of different approaches, and how metagenomic analysis methods could be improved. Chapter 4 covers the experimental investigation of Flavobacteriaceae isolates from the Borup Fiord Pass Glacier sulfur deposits that may have a previously-unrecognized role in

14

sulfur cycling. It demonstrates that a Gillisia isolate produces novel S0 biomineralized structures unlike previously-known examples of biological S0 precipitation, although it is not known whether this isolate is catalyzing the production of S0 from sulfide, or merely precipitating S0 that is formed by abiotic oxidation. However, either way, it is a newly-discovered capability for

Flavobacteriaceae. A second isolate, representative of the Flavobacterium sp. that dominates the surface of the S0 deposits, demonstrated the ability to oxidize thiosulfate in culture, but did not appear to be able to use this reaction as a source of energy for growth. Chapter 5 touches briefly on other culturing work that was carried out, while Chapter 6 contains concluding thoughts.

15

CHAPTER 2

METAGENOMIC EVIDENCE FOR SULFUR LITHOTROPHY BY

EPSILONPROTEOBACTERIA AS THE MAJOR ENERGY SOURCE FOR PRIMARY

PRODUCTIVITY IN A SUB-AERIAL ARCTIC GLACIAL DEPOSIT,

BORUP FIORD PASS

ABSTRACT

I combined bioenergetic and metagenomic analysis of an elemental sulfur (S0) deposit on the surface of Borup Fiord Pass Glacier, Ellesmere Island, in the Canadian High Arctic to investigate how well the energy available from different redox reactions in an environment

+ predicted microbial metabolism. Although many S, C, Fe, As, Mn and NH4 oxidation reactions were predicted to be energetically feasible in the deposit, aerobic oxidation of S0 was the most abundant chemical energy source. Small subunit ribosomal RNA (SSU rRNA) data showed that the dominant phylotypes were Sulfurovum sp. and Sulfuricurvum sp., both Epsilonproteobacteria known to be capable of sulfur lithotrophy. Sulfur redox genes were abundant in the metagenome, but the sox pathway was strongly preferred over the reverse dissimilatory sulfite reductase pathway. Interestingly, there appeared to be habitable niches that were unoccupied.

+ Photosynthesis and NH4 oxidation should both be energetically favorable, but I found few or no

+ functional genes for oxygenic or anoxygenic photosynthesis, or for NH4 oxidation by either oxygen (nitrification) or nitrite (anammox). The bioenergetic, SSU rRNA and quantitative functional gene data are all consistent with the hypothesis that sulfur-based chemolithoautotrophy by Epsilonproteobacteria (Sulfurovum and Sulfuricurvum) is the main form of primary productivity at this site, instead of photosynthesis, even though light is abundant

16

and photosynthesis is not obviously prohibited by environmental conditions. This is the first time that Sulfurovum and Sulfuricurvum have been shown to dominate a sub-aerial environment, rather than anoxic or sulfidic settings. I also found that Flavobacteria dominate the surface of the sulfur deposits and I hypothesize that this aerobic heterotroph uses enough oxygen to create a microaerophilic environment in the sulfur deposit below, where the Epsilonproteobacteria can flourish.

INTRODUCTION

Microbes use a wide range of redox reactions to obtain energy for growth and therefore have a significant impact on the biogeochemical cycling of elements including carbon, nitrogen and sulfur (Falkowski et al., 2008). Bioenergetic calculations using geochemical analyses of an environment demonstrate that the most energetically-favorable redox reactions vary depending on the local chemistry and temperature (Amend & Shock, 2001; Amend et al., 2011; McCollom,

2007; Shock et al., 2005; Shock et al., 2010). Such calculations can be used to assess the habitability of diverse environments on Earth, or other planetary bodies such as Mars and Europa

(Hoehler, 2007). However, we still have only a very basic understanding of which reactions are utilized by microbes in any given environment. Some studies have investigated this question by environmental analysis of the small subunit ribosomal RNA (SSU rRNA) gene (Macur et al.,

2004; Spear et al., 2005; Costa et al., 2009; Gaidos et al., 2009; Vick et al., 2010) but this analysis is limited by our incomplete knowledge of the types of energy metabolism used by each phylotype. Another option is to use the polymerase chain reaction (PCR) to amplify genes for enzymes known to catalyze energy-releasing redox reactions (Hall et al., 2008; Chen et al.,

2009; Flores et al., 2012). This approach is also limited, as it will only detect genes for which

17

appropriate primers are used. Next-generation sequencing of metagenomes produced from shotgun libraries overcome these limitations by sampling the functional genes present in an environment without the need for primers. Metagenomic and metatranscriptomic studies that investigate the relationship between geochemistry, energy sources and microbial function have already been carried out on a range of different sites including ocean environments (Canfield et al., 2010; Walsh et al., 2009), Yellowstone hot springs (Inskeep et al., 2010), acidic sulfur-rich cave biofilms, (Jones et al., 2012) and hydrothermal vents (Brazelton et al., 2012; Xie et al.,

2011). The discovery by Canfield et al., 2010 of ‘cryptic cycling’ of sulfur, not predicted from the environmental geochemical data, in a metagenomic study of an ocean minimum zone, demonstrated the power of this approach to improve knowledge of microbial ecology.

To date, no studies have been published which combine a quantitative overview of redox reactions that can provide energy in a given environment, with a quantitative overview of the functional genes involved in energy-releasing reactions from the same environment in order to investigate the extent to which bioenergetics predict the reactions that are used by microbes to obtain energy for growth. I undertook such a study at Borup Fiord Pass Glacier in the Canadian

High Arctic (photo in Figure 2.1A, for a map of the area see Grasby et al., 2012). At this unique site, a cold spring, at neutral to alkaline pH, flows up through and over the glacier, and is surrounded by deposits of elemental sulfur (S0), gypsum and carbonates on the surface of the ice that extend downstream for a significant distance (Grasby et al., 2003; Grasby, 2003). This is not a hydrothermal system and it is not yet known whether the S0 is produced biologically, abiotically or a combination of both. Borup Fiord Pass Glacier is an environment that is substantially different to sulfur-rich sites that have been the subject of previous metagenomic or metatranscriptomic studies. It is also an excellent terrestrial analog to consider the habitability of

18

sites where subsurface waters penetrate icy overlayers in sulfur-rich environments on Mars or

Europa (Gleeson et al., 2010; Gleeson et al., 2012; Grasby et al., 2012).

Figure 2.1: Borup Fiord Pass Glacier. Figure 2.1A (top) overview of the field site in 2009. Figure 2.1B (bottom left) the spring source (site BF09-01) and sulfur varnish beside the spring (site BF09-02). Figure 2.1C (bottom right) sulfur deposit BF09-06 from which DNA was extracted for the metagenome.

19

MATERIALS AND METHODS

Site Description

I carried out field work at Borup Fiord Pass Glacier in the Canadian High Arctic (81°N) in July 2009. A sulfide-rich spring rises through the glacier, near the toe, discharges from the surface of the glacier and flows into a supra-glacial meltwater stream. The spring rises in the same general area each year, but the exact site varies. The surface of the ice around the spring, and alongside its course is covered with an elemental sulfur varnish. There is no volcanic or hydrothermal activity in the area that would explain the presence of the sulfur (Grasby et al.,

2003) and evidence suggests that the spring is the result of a glacially-driven groundwater system

(Grasby et al., 2012).

Aqueous and mineral geochemical sampling and analysis

I measured pH of the spring water and sulfur deposits in situ using colorpHast pH paper

(EDH, Germany). I measured air temperature with a thermometer (REI). Steve Grasby measured water temperature and pH using an Orion 5-star multimeter (ThermoFisher, USA). For aqueous geochemical analysis, I filter sterilized spring water (sample BF09-01) with a 0.2 µm filter and then injected it into a sterile, argon-filled vial (to avoid oxidation of the sample), maintained it at approximately 4°C for 4 ½ days as I transported it back to the laboratory, then maintained at 4°C until it was analyzed; ultra-pure nitric acid was added to one of the samples to avoid precipitation of aqueous cations. I collected samples of the sulfur deposits using a sterile spatula and containers. I maintained them either in a freezer at -20°C, or, packed in a cooler with freezer packs to try to keep them frozen as I transported them back to the laboratory where I kept them frozen at -80°C until analyzed. Most of the sulfur was an extremely thin varnish (estimated to be

20

no more than 1mm thick) on the glacier surface (Figure 2.1B), but approximately 300m from the spring source, a much deeper sulfur deposit (approximately 15 cm deep) was found beside the stream (Figure 2.1C); this was the sample site furthest from the spring’s source (sample BF09-

06). Once samples had been returned to the laboratory, I extracted water from the BF09-06 sulfur deposit by thawing part of the sample, allowing the thawed sample to settle for approximately 5-

10 minutes, and then removing the upper, water layer. I filter sterilized this water using a 0.2 µm filter and added ultrapure nitric acid to one of the water samples to preserve the solubility of cations. Both spring water and water extracted from the BF09-06 sulfur deposit were analyzed for sulfide, anions and cations. I measured sulfide using the methylene blue method (Hach,

USA). Fred Luiszer (University of Colorado) measured anions using ion chromatography

(Dionex), cations using mass spectrometry and inductively coupled optical emission spectroscopy. I assayed Fe2+ by the ferrozine method (Stookey, 1970). I assayed ammonium using the phenol hypochlorite method (Weatherburn, 1967). Sulfide was also measured gravimetrically by Steve Grasby, who filter sterilized water at the site spring water, added cadmium acetate immediately to precipitate the sulfide, which was later filtered and weighed at the Geological Survey of Canada. Mineralogy of the sulfur deposit was determined by X-ray diffraction (XRD) by the Geological Survery of Canada. Total carbon content of the BF09-06 deposit was measured by the Carla-Erba combustion method by Maegan McKee (University of

Colorado). Total DOC of the spring water, and the water extracted from the BF09-06 deposit, was measured with a Shimadzu TOC-V CSN Total Organic Carbon Analyzer by Holly Hughes

(University of Colorado).

21

Bioenergetic calculations

I used the results of the geochemical analyses to calculate the energy potentially available from a range of redox reactions that could take place within the BF09-06 sulfur deposit by the equation: ΔGr = ΔG°r + RTlnQ = RTln(Q/K) where ΔGr is the free energy change, ΔG°r is the standard free energy change, R is the universal gas constant, T is the temperature in Kelvin,

Q is the reaction quotient and K is the equilibrium constant. I calculated energies at 0 °C using values for the equilibrium constant (K) for each reaction at 0 °C from The Geochemist’s

Workbench software (v9.0, Aqueous Solutions, University of Illinois, USA) supplemented where necessary with thermodynamic data from Amend and Shock 2001. Concentrations of each ion were taken from the geochemical analysis. Where chemical species were below the detection limit of our instruments I used a range of concentrations to model ionic species (from 10-12 M, as being effectively zero, to the detection limit as a maximum). For As and Mn, only total element concentrations were measured, so I used the Geochemist’s Workbench module Act2 to determine the oxidized and reduced species most likely to be present at the pH of the sulfur depost (pH 6.5). I modeled the maximum energy potentially available from oxidation by calculating the energy available if all the element was in the reduced form, and the minimum energy by using a concentration of 10-12 M for the oxidized species. For atmospheric gases, I calculated the maximum concentration possible at 0 °C using the Henry’s Law constant for each gas at that temperature, assuming that the dissolved gases were in equilibrium with the atmosphere. To model energies from reactions that involve organic carbon I assumed the total

DOC measured was all acetate. To calculate the maximum total energy available from reactions that involve S0 I assumed the total dry weight of BF09-06 was S0, and to calculate the minimum energy I assumed that only 10% of the dry weight was S0. I calculated activities from the

22

concentrations with the standard Debye-Hückel equation for ionic species, and the Setchenow equation for gases (Langmuir, 1997). I measured alkalinity by end-point titration to pH 4.5

(Langmuir, 1997) and Steve Grasby also made alkalinity measurements of additional water samples.

Sampling for DNA extraction

I took samples from sulfur deposits immediately beside the spring source (site BF09-02), moving progressively downstream (sites BF09-04, BF09-05 and BF09-06) with a sterile spatula that had been thoroughly washed in ultra-pure water (Milli-Q, Millipore, USA) prior to sterilization. Sample BF09-06a was taken from the surface layer of the BF09-06 deposit, while sample BF09-06b sampled the whole depth of the BF09-06 deposit. I preserved samples by either (1) immediately immersing in 70% ethanol and maintaining at 4 °C during transport back to the laboratory (BF09-02, BF09-04, BF09-05 and BF09-06a) or (2) freezing on collection, and then maintaining either in a freezer at -20°C, or packed in a cooler with freezer packs to try to keep samples frozen during transport (BF09-06b). All samples were maintained frozen at -80°C in the laboratory until DNA was extracted. I filtered two aliquots of spring water (3 litres each) in situ using a sterile 0.2 µm filter. I preserved the filters in 70% ethanol, maintained them at 4°C during transport and then froze them at -80°C until DNA was extracted.

DNA extraction and purification

I extracted DNA from one of the filters, and the sulfur samples (BF09-02, 04, 05 and

06a) using a phenol-chloroform extraction as previously described (Dojka et al., 1998). I

23

extracted DNA from the other filter using Trizol (Invitrogen, USA) to extract both RNA and

DNA, following manufacturer’s instructions, except that a bead beating step (5 m/s for 45 s) was added at the start to lyse the cells. I extracted DNA for the metagenome from 65.822 g of the

BF09-06b sample using the Powermax Soil DNA isolation kit (MoBio, USA) following the manufacturer’s instructions, except that at the final spin filter stage the extracts were combined to use only 4 filters instead of 6, with an elution volume of 5 ml per filter. I concentrated the eluant using repeated ethanol precipitations and re-suspended the DNA in nuclease-free water

(Sigma, USA). I extracted DNA from a further 5.335 g of the BF09-06b sample using a phenol- chloroform extraction (modified from Dojka et al., 1998). Briefly, 0.2 to 0.9 g of sample was resuspended in buffer A (200 mM Tris [pH8.0], 50 mM EDTA, 200 mM NaCl, 2 mM Na citrate,

10 mM CaCl2). Lysozyme was added to give a final concentration of 1 mg/ml and the sample was incubated at 37°C, inverting tubes to mix every 10 minutes, for 1 hour. Proteinase K (to give 1 mg/ml) and sodium dodecyl sulfate (to give 0.3% wt/vol) were then added, and the sample was incubated at 37°C, inverting tubes to mix every 10 minutes, for a further hour. Tubes were centrifuged at 14,100 g for 5 minutes. The supernatant was extracted first with 1 ml phenol:chloroform: isoamyl alcohol (25:24:1) then with 1 ml chloroform:isoamyl alchohol

(24:1), followed by precipitation with sodium acetate (to give 0.3M final concentration) and

100% cold isopropanol (equal volume to the aqueous phase). After precipitation I washed the

DNA pellet twice with 70% ethanol and once with 100% ethanol, then re-suspended in sterile 10 mM Tris[pH8.0]. I combined the BF09-06b DNA from both the MoBio and phenol-chloroform and extracted DNA of approximately 1.5 kb and longer from a 0.8% agarose gel using the

E.Z.N.A. gel extraction kit (Omega Bio-Tek, USA) following the manufacturer’s instructions,

24

with a final elution volume of 30 µl. I quantified the DNA using picogreen (Invitrogen, USA) and assessed purity using a UV Nanodrop (Thermo Scientific, USA).

Sequencing for SSU rRNA analysis

I amplified DNA from BF09-02, BF09-04, BF09-05, BF09-06a and BF09-06b with modified PCR primers 515F and 927R as previously described (Osburn et al., 2011). The primers also contained a unique barcode for each sequence. I gel-purified PCR amplicons using the E.Z.N.A. gel extraction kit (Omega Bio-Tek, USA) and then normalized using a SequalPrep plate (Invitrogen, USA). Sequencing of the SSU rRNA amplicons was performed on a Roche pyrosequencer (Margulies et al., 2005) with FLX Titanium chemistry (Roche, Mannheim,

Germany) by EnGenCore (the University of South Carolina Environmental Genomics Core

Facility). I also amplified the BF09-06b DNA with the ‘universal’ primers 515F and 1391R

(Lane et al., 1985), gel purified using the E.Z.N.A. gel extraction kit (Omega Bio-Tek, USA), and cloned with the TOPO TA cloning kit for sequencing (Invitrogen, USA). This DNA was sequenced by Sanger sequencing by SeqWright. I used a nested PCR to amplify the DNA from the spring water (BF09-01); first the DNA was amplified with ‘universal’ primers 515F and

1391R, gel purified using the E.Z.N.A. gel extraction kit (Omega Bio-Tek, USA), then further amplified with primers 515F and 927R, and prepared for pyrosequencing as above.

Pyrosequencing of the metagenome

A shotgun library was made from the BF09-06b DNA using the Rapid Library

Preparation Method (Roche, Germany) and sequenced using a full plate on a Roche pyrosequencer with FLX Titanium chemistry (Roche, Germany) by EnGenCore.

25

Analysis of SSU rRNA data

I analysed SSU rRNA pyrosequencing data with QIIME v1.5.0 (Caporaso et al., 2010). working jointly with Chase Williamson. To exclude poor quality data, sequences with minimum average quality score less than 25, ambiguous bases, primer or barcode mismatches, or maximum homopolymer run greater than 6 nucleotides, were discarded. Only sequences between

410 and 470 nucleotides were used in the analysis as sequences that are significantly longer or shorter than the typical length for a sequencing run have been shown to be poor quality (Huse et al., 2007). The remaining sequences were denoised using flowgram clustering (Reeder & Knight,

2010). Primer sequences were removed using a custom script. Sequences were clustered into operational taxonomic units (OTUs) at 97% identity using UClust (Edgar, 2010) and the most abundant sequence in each cluster was chosen as the cluster representative sequence (rep seq) to assign for the OTU. The same OTUs were used across all samples, so that rep seqs could come from any sample. Chimeras were identified by ChimeraSlayer (Haas et al., 2011) using the ‘gold’ 16S NAST-aligned MicrobiomeUtilities database

(http://sourceforge.net/projects/microbiomeutil/files/) as the reference database, and chimeric sequences were discarded. Taxonomic classifications of pyrosequences were assigned with the

RDP classifer (Wang et al., 2007) and the Greengenes taxonomy (DeSantis et al., 2006, OTU reference and utility files gg_otus_4feb2011 from http://greengenes.lbl.gov). For OTUs that represented more than 5% of the sequences for any sample, I determined taxonomic identification at the Genus level by using BLAST (Altschul et al., 1990) to compare the rep seq for that OTU against the NCBI nucleotide non-redundant (nt nr) database. I also determined the taxonomic identification of the Sanger sequences by using BLAST against the NCBI nt nr

26

database. I downloaded reference sequences for Sulfurovum, Sulfuricurvum and Sulfurimonas sp.

SSU rRNA genes (all >800 nt) from NCBI to form the basis of the Epsilonproteobacteria phylogenetic tree. I BLASTed the rep seqs for the most abundant OTUs identified as

Sulfurovum, Sulfuricurvum and Sulfurimonas in the BF09-06b SSU rRNA data against the Borup

Sanger sequences to find the best matches. Good matches were found for Sulfurovum (BF09-

06_M97 at 100% identity to OTU 754 rep seq) and Sulfuricurvum (BF09-06_M102 at 99.46% identity to OTU 686 rep seq) but no match was found in the Borup Sanger data for the

Sulfurimonas rep seq (OTU 742 rep seq). The Borup Sanger sequences BF09-06_M97 and

BF09-06_M102 and the Epsilonprotebacteria reference sequences were aligned using ssu-align

(Narwocki, 2009). I made maximum likelihood trees with the online RAxML Black Box

(Stamatakis, 2006, Stamatakis et al., 2008) using the gamma model of rate heterogeneity, and performing 100 bootstraps. Pyrosequences (rep seqs for OTUs 754, 686 and 742) were added into the RAxML tree with pplacer (Matsen et al., 2010) by Chase Williamson.

Metagenome sequence data analysis

I used the MG-RAST web-based analysis package version 3 (Meyer et al., 2008) to produce a quality-controlled database from the metagenome data by removing reads less than 75 bp in length, that contained 10 or more ambiguous bases in the sequence, or that were artificial duplicates created by the sequencing process (identified as reads which started at the same base and which had exactly the same sequence for the first 50 bases). I analyzed the quality-controlled database to provide functional and phylogenetic information using the MG-RAST analysis pipeline, setting a maximum e-value of 10-10 and a minimum percentage identity of 50% for functional gene annotation against the GenBank nr database. The MG-RAST software gives only

27

approximate quantifications, and this can include counting the same metagenome sequence more than once in a result. Therefore to produce accurate quantifications I downloaded the metagenome sequences that were recorded as hits against genes of interest, and removed duplicate entries (i.e. the same metagenome sequence being recorded twice as a hit against the same gene) manually to produce a quality-controlled abundance for the number of hits against each gene. As the metagenome DNA sequences were not the full gene length, longer genes would automatically get more hits than shorter genes, even if the two genes were present in the environment in the same numbers. I therefore normalized the abundance of each gene against gene length, by dividing the quality-controlled abundance by the gene length in kb, to produce an abundance per kb of gene (as described in Canfield et al., 2010). As genes with more than 1 copy per genome would also receive more hits than single-copy genes, I divided the abundance per kb of gene by the copy number per genome to give a value that could be used to compare directly the normalized relative abundances (NRA) of two genes within the metagenome, as a measure of the relative abundance of organisms possessing each gene within the environment. As the metagenome had been prepared using random shearing to create a shotgun library, the relative abundance within the metagenome data was considered to be a reasonable assessment of the relative abundance of each gene within the environment. As the metagenome DNA was a random sample of the DNA in the environment, in order to determine whether differences in the

NRA seen in the genes were statistically significant, I calculated the standard error (SE) of each

NRA using the equation: SE = (p(1-p)/n)1/2 where p is the proportion of the total results given for a particular NRA, and n is the total number of results (Gardner & Altman, 1986). I calculated error bars for each sample at the 99% confidence level (+/- 2.58 SE). This means that there is a probability of 0.99 that the NRA of a gene in the environment is included within the error bar

28

range, so that NRA values that do not have overlapping error bars are significantly different at the p = 0.01 level. Only genes for metabolic reactions of interest were quantified. DNA sequences will be submitted to GenBank or will be made available in the public section of the

MG-RAST website at: http://metagenomics.anl.gov/.

RESULTS

Geochemistry

Aqueous geochemistry of the spring water and water extracted from the BF09-06 sulfur deposit is given in Table 2.1 (below). The spring water has high levels of sulfide (4 mM measured gravimetrically and 6.3 mM by the methylene blue method) and also contains 13 mM sulfate and 102 uM thiosulfate. No sulfide, thiosulfate or sulfite were detected in the sulfur deposit. Iron, manganese, ammonium and arsenic were also detected in both the spring water and sulfur deposit. Nitrate was detected in the sulfur deposit and the spring water, and nitrite was also detected in the sulfur deposit, although the spring water nitrate and sulfur deposit nitrite detections were below the detection limit to which the ion chromatograph had been calibrated, and therefore these could not be considered an accurate quantifications. XRD showed that the

BF09-06 deposit was composed of S0 with a very small amount (estimated <3%) of gypsum.

Bioenergetics

Many reactions could provide energy for microbial growth in the BF09-06 sulfur deposit.

Comparison of the energy available per electron transferred shows that the most energy per electron transferred is available from the aerobic oxidation of reduced sulfur species (sulfide, S0, thiosulfate or sulfite), aerobic oxidation of organic carbon, or anammox (Figure 2.2A). The

29

maximum possible dissolved oxygen concentration was used for these calculations. As oxygen levels might decrease below the surface of the deposit, energies for oxidation of sulfur species were also calculated for a range of concentrations down to 10-10M. These reactions continued to be highly energetically favorable (-80 to -110 kJ per mole electrons transferred) even at these low oxygen levels. Comparison of the total energy available from different sources in the deposit showed that the energy available from aerobic S0 oxidation was greater than from any other measured source, including from the aerobic oxidation of organic carbon (Figure 2.2B).

BF09-01 BF09-06b DL Temperature -0.3°C 0°C (in contact with ice) to 5°C (air temperature) pH 7.37 6.5 Na (mM) 51.562 1.601 0.001 K (mM) 0.303 0.040 0.005 Mg (mM) 12.876 0.836 0.0003 Ca (mM) 17.306 10.921 0.003 Fe total (µM) 5 1 0.215 Fe2+ (µM) DL DL 5 NH4 (µM) 254 15 Mn (µM) 1 13 0.01 Si (µM) 88 22 1.317 S2- (mM) 4 - 6.3 DL 0.005 SO4 (mM) 13.029 9.381 2 2- S2O3 (µM) 102 DL 2 Cl (mM) 39.072 2.119 F (µM) 54 31 Br (µM) 33 DL 1 NO3 (µM) 3 13 8 NO2 (µM) DL 7 11 PO4 (µM) DL DL 2 Alkalinity (mg/L) 15 163 Total DOC (mg/L) 3.9 11.4 0.06 Total organic carbon (%w of solid) - 0.12 Table 2.1: Geochemical analysis of the spring water (BF09-01) and water extracted from the glacial elemental sulfur deposit (BF09-06b) from which DNA was extracted and sequenced for the metagenome. DL = detection limit.

30

(%)*+,- . % /%*+,- 0 %(. . )/" (%)*+,- . #$1 /% 0 )2345 . (%/ )2345 . (%/ . !$1 /%*+,- 0 %(. . )/" )2345 . #$1 /%*+,- . #$1 (%/ 0 (. . #$1 )%/6 )%/6 . % /%*+,- . (%/ 0 %(. . %)/" )2345 . (%/ . /%*+,- 0 %(. . )/6 )/6 . #$1 /%*+,- 0 )/" )%/6 . (%/ 0 )/" . (%)*+,- (%)*+,- . 7/6 . (%/ 0 )/" . 7(". (%)*+,- . !$'7/6 0 #$"(. . )/" . #$& 7%*+,- . #$& (%/ )2345 . #$817/6 . !$81(%/ 0 #$1(. . )/" . #$817(". )2345 . !$%7/6 . #$"(%/ 0 #$&(. . )/" . #$'7%*+,- 6)2345 . 6(%/ 0 %(%)*+,- . )/6 . %(. ")2345 . "(%/ 0 6(%) . )/" . %(. 9(69// . 6(. . )/" 0 %9/%*+,- . (%)*+,- . %(%/ 9(69// . (. . ")2345 . %(%/ 0 %9/%*+,- . "(%)*+,- 9(69//*+,- . (.*+,- . %/%*+,- 0 %9/%*+,- . %(%/ 7(". . !$1 /%*+,- 0 %(. . 7/% . (%/ 7(". . 7/% 0 7% . %(%/ :3.. . #$%1 /%*+,- . %$1 (%/ 0 :3*/(-6*;- . %(. <;*/(-6*+,- . #$1 /%*+,- 0 (. . (%<;/"*+,- =>.. . (%/ .#$1 /%*+,- 0 =>/%*;- . %(. #$## %#$## "#$## '#$## &#$## !##$## !%#$## !"#$## !"#$% &' () *"+ ,-# " ! ! *(+,-./ ) ( 0(,-./ 1 (*) ) +0' *(+,-./ ) #"2 0( 1 +3456 ) *(0 +3456 ) *(0 ) !"2 0(,-./ 1 (*) ) +0' +3456 ) #"2 0(,-./ ) #"2 *(0 1 *) ) #"2 +(07 +(07 ) ( 0(,-./ ) *(0 1 (*) ) (+0' +3456 ) *(0 ) 0(,-./ 1 (*) ) +07 +07 ) #"2 0(,-./ 1 +0' +(07 ) *(0 1 +0' ) *(+,-./ *(+,-./ ) 807 ) *(0 1 +0' ) 8*') *(+,-./ ) !"&807 1 #"'*) ) +0' ) #"% 8(,-./ ) #"% *(0 +3456 ) #"92807 ) !"92*(0 1 #"2*) ) +0' ) #"928*') +3456 ) !"(807 ) #"'*(0 1 #"%*) ) +0' ) #"&8(,-./ 7+3456 ) 7*(0 1 (*(+,-./ ) +07 ) (*) '+3456 ) '*(0 1 7*(+ ) +0' ) (*) :*7:00 ) 7*) ) +0' 1 (:0(,-./ ) *(+,-./ ) (*(0 :*7:00 ) *) ) '+3456 ) (*(0 1 (:0(,-./ ) '*(+,-./ :*7:00,-./ ) *),-./ ) (0(,-./ 1 (:0(,-./ ) (*(0 8*') ) !"2 0(,-./ 1 (*) ) 80( ) *(0 8*') ) 80( 1 8( ) (*(0 ;4)) ) #"(2 0(,-./ ) ("2 *(0 1 ;4,0*/7,?)) ) *(0 )#"2 0(,-./ 1 >?0(,

!"##$!# !"##$#% !"##$#& !"##$#' !"##$#( !"##$)## !"##$)#( !"##$)#' !"##$)#& !" # $ Figure 2.2: Energy available from different redox reactions that could occur in the BF09-06 ! sulfur deposit. Ranges in the amount of energy reflect the range of uncertainty for substances that could not be detected (up to a maximum of the detection limit used for the assay). Figure 2.2A (top) shows the energy available per electron transferred. Most energy per electron transferred is available from aerobic oxidation of reduced sulfur species, aerobic oxidation of organic carbon or anammox. Figure 2.2B (bottom) shows the total energy available from the same reactions in the BF09-06 sulfur deposit, taking into account the total amount of each reactant present. Most energy is available from the aerobic oxidation of S0 (ringed in red) which provides an order of magnitude more energy than the next best reaction, disproportionation of S0, and several orders of magnitude more energy than the aerobic oxidation of organic carbon, or any other reaction.

31

Identity of micro-organisms in the spring water and sulfur deposits

The SSU rRNA sequence data show that the sulfide spring and glacial sulfur deposits were dominated by a few bacterial phylotypes (Figures 2.3A and B). In the spring water

Burkholderiaceae (Burkholderia sp. and Ralstonia sp.) were strongly dominant. In BF09-02,

BF09-04, BF09-05 and the BF09-06 surface sample (BF09-06a) Flavobacterium sp. were dominant, but in the BF09-06 deposit as a whole (sample BF09-06b) Epsilonproteobacteria

(Sulfurovum sp. and Sulfuricurvum sp.) were strongly dominant. For each of the Flavobacterium,

Sulfurovum and Sulfuricurvum a single species-level OTU (grouped at 97% SSU rRNA identity) comprised most of the sequences (Figure 2.3B). Thiomicrospira (Gammaproteobacteria) and

Thiobacillus (Betaproteobacteria), other Genera known to be capable of sulfur lithotrophy, were present at significantly lower abundances (1.3% and 0.8% of BF09-06b, respectively). No

Eucaryal sequences and only 4 Archaeal sequences were detected in the SSU rRNA pyrosequencing data (n= 88,606 across all samples). There were no Eucaryal or Archaeal sequences in the BF09-06b pyrosequencing data or the Sanger data (n = 54, data not shown). A phylogenetic tree demonstrating how the dominant Epsilonproteobacteria seen at Borup Fiord

Pass Glacier compare to Epsilonproteobacteria sequences found elsewhere (Figure 2.4) shows that the dominant Borup Sulfurovum sp. and Sulfuricurvum sp. are closely related to sequences identified at thermal sulfide springs on Svalbard (Reigstad et al., 2011).

32

100%

AC1 90% Acidobacteria Actinobacteria Armatimonadetes 80% Bacteroidetes Chlorobi 70% Chloroflexi Cyanobacteria Elusimicrobia 60% Firmicutes Gemmatimonadetes 50% Nitrospirae Planctomycetes Alphaproteobacteria 40% Betaproteobacteria Deltaproteobacteria 30% Epsilonproteobacteria Gammaproteobacteria Spirochaetes 20% TM6 Tenericutes 10% Thermi Thermotogae Other 0% BF09.01 BF09.02 BF09.04 BF09.05 BF09.06a BF09.06b

n = 2,335 n = 21,546 n = 27,039 n = 21,262 n = 13,519 n = 2,905

100%

90%

80%

70%

60% OTU 52 Burkholderia OTU 514 Ralstonia 50% OTU 660 Flavobacterium OTU 686 Sulfuricurvum

40% OTU 754 Sulfurovum OTU 806 Polaromonas Other 30%

20%

10%

0% BF09.01 BF09.02 BF09.04 BF09.05 BF09.06a BF09.06b

n = 2,335 n = 21,546 n = 27,039 n = 21,262 n = 13,519 n = 2,905

Figure 2.3: SSU rRNA sequence data. Figure 2.3A (top) shows the SSU rRNA data at the ! phylum level, except that the have been split into classes. BF09-01 is dominated by Betaproteobacteria. BF09-02, BF09-04, BF09-05 and BF09-06a are dominated by Bacteroidetes and BF09-06b is dominated by Epsilonproteobacteria. Phyla which are less than 0.01% in all samples are included in ‘other’. Figure 2.3B (bottom) shows each OTU representing more than 5% of any sample, with all other OTUs included in ‘other’. BF09-01 is dominated by the Betaproteobacteria Burkholderia and Ralstonia. BF09-02, BF09-04, BF09-05 and BF09-06a are dominated by Bacteroidetes member Flavobacterium, while BF09-06b is dominated by the Epsilonproteobacteria Sulfurovum and Sulfuricurvum, and also has significant numbers of Flavobacterium.

33

AB175543

Key 81!AB197160 82! Sulfurovum_NBC37-1 # Environmental sequence from Borup Fiord Pass Glacier 53! * Shorter sequence from 454 sequencing Sulfurovum_lithotrophicum

JF837612 Scale <50! 100! JF837776 Sulfurovum 0.1 substitutions/site 73! HM141114

79! OTU754_repseq_BF09.05_39978# * <50! 100! BF09-06_M97 #

EU101255

GU390872

JF837804 <50! BF09-06_M102 # 55! JQ278752

OTU686_repseq_BF09.04_1633# *

HQ162722 92! 85! JQ079852 Sulfuricurvum 77! HM632015 51! 63! Sulfuricurvum_kujiense

EF417531 92! 62! GU472654

JF789594

AB197158 100! NR_041439

58 ! AB088432_Sulfurimonas_autotrophica

JF808462

Sulfurimonas_denitrificans_DSM1251 51! 53! 100 ! EU403630 62! <50! OTU742_repseq_BF09.04_1302# * 66! JF837799 55! Sulfurimonas AB189362 86! JF837679

AB235320 <50! <50 50 ! !AB518743 69! AB304904 99! AB267457

DQ881175

JF806835

// Thiomicrospira_crunogena 0.07

Figure 2.4: A maximum-likelihood tree representing the phylogenetic relationships of the dominant Epsilonproteobacteria OTUs in the BF09-06 sulfur deposit to Epsilonproteobacteria reported from other sites. 100 bootstraps were performed and bootstrap values are shown beside each node. Bootstraps of less than 50 are not considered robust, and those nodes are therefore considered un-resolved. The closest match to the BF09-06 Sulfurovum and Sulfuricurvum sequences are from thermal sulfidic springs on Svalbard. Thiomicrospira crunogena (Gammproteobacteria) is the outgroup used to root the tree.

34

Sulfur redox genes

The metagenome data contains abundant sulfur-redox genes (Figure 2.5), including genes for: sulfide quinone reductase (sqr) and sulfide dehydrogenase (fcc, soxEF or sud) both involved in oxidizing sulfide to elemental sulfur; sulfur oxidation (sox) proteins (sox ABCDXYZ), known to be involved in the oxidation of thiosulfate, and also shown in vitro to have the capability to oxidize sulfide, elemental sulfur and sulfite (Rother et al., 2001); sulfite oxidoreductase, which oxidizes sulfite to sulfate (sor); sulfur oxygenase reductase (also sor genes) involved in the disproportionation of elemental sulfur to sulfide and sulfite; thiosulfate reductase (phs), responsible for the disproportionation of thiosulfate to sulfide and sulfate; polysulphide reductase

(psr), used to reduce polysulfides to sulfide; dissimilatory sulfite reductase (dsr), known to be involved in the reduction of sulfite to sulfide and vice versa; adenosine 5’-phosphosulfate (APS) reductase (apr) and quinone-interacting membrane-bound oxidoreductase (qmo) involved in the oxidation of sulfite to sulfate, and vice versa; tetrathionate reductase (Ttr), DMSO reductase

(dms), thiocyanate hydrolase (scn) and elemental sulfur reductase (HybA hydrogenase) (Friedrich et al., 2005; Stewart et al., 2011). However, the relative abundance of the genes present in the metagenome varied significantly. The sqr, sulfide dehydrogenase, sox, psr, sulfite oxidase and dsrE genes were present in significantly higher relative abundance (NRA values of 47 to 281) than other sulfur redox genes, including the other dsr genes (NRA values of 1 to 15)

35

Sulfide quinone reductase Sulfide dehydrogenase Polysulphide reductase Sox A Sox B Sox C Sox D Sox X Sox Y Sox Z Sulfite oxidase (sorA) Sulfite oxidase (sorB) Thiosulfate reductase APS reductase alpha APS reductase beta QmoA QmoB QmoC DsrA (alpha subunit) DsrB (beta subunit) DsrC DsrD DsrE DsrF Dsr gamma DsrK DsrL DsrM DsrN DsrP Tetrathionate reductase A Tetrathionate reductase B Tetrathionate reductase C Elemental sulfur reduction (hydrogenase HybA) Elemental sulfur disproportionation (sulfur oxygenase reductase) Thiocyanate hydrolase (scnC) DMSO reductase A DMSO reductase B Photosystem I P700 a apoprotein A1 Photosystem I subunit VII protein Photosystem II D2 protein Photosystem II core light harvesting protein (psbB) Photosystem II oxygen evolving complex protein Photosystem II protein N Chlorophyll synthesis pathway (bchC) Chlorophyll synthase (bchG) Bacteriochlorophyllide reductase subunit (bchX) Bacteriochlorophyllide reductase subunit (bchZ) Bacteriochlorophyll synthase 44.5 kDa chain Photosynthetic reaction center L subunit (pufL) Photosynthetic reaction center M protein (pufM) Photosynthetic complex (LH1) assembly protein LhaA Light-independent protochlorophyllide reductase subunit ChlL Light-independent protochlorophyllide reductase, B subunit Light-independent protochlorophyllide reductase, N subunit Photosystem q(b) protein NiFe-Hydrogenase group 1, small subunit (hydA) NiFe-Hydrogenase group 4 Fe-Fe Hydrogenase (hydA) Fe-Fe Hydrogenase (hydB) Fe-Fe hydrogenase (hoxU) Methane monooxygenase Ammonium monooxygenase (amoA) Hydroxylamine reductase Hydrazine synthase Hydrazine dehydrogenase Managanese oxidase (mnxA,F,G or cum genes) Arsenite oxidase Arsenate (respiratory) reductase (arrA) Respiratory (periplasmic) nitrate reductase (napA) Respiratory (periplasmic) nitrate reductase (napB) Respiratory (membrane) nitrate reductase (narG) Ammonification (membrane) nitrite reductase (nrfA) Ammonification (membrane) nitrite reductase (nrfD) Ammonification (membrane) nitrite reductase (nrfH) Denitrification nitrite reductase (nirK) Nitric oxide reductase subunit B (norB) Nitric oxide reductase subunit C (norC) Nitrous oxide reductase (nos) MoFe nitrogenase alpha subunit (nifD) MoFe nitrogenase beta subunit (nifK) FeFe nitrogenase (anf) Ribulose_bisphosphate carboxylase (RuBisCo)(cbb) ATP citrate lyase alpha subunit (aclA) ATP citrate lyase beta subunit (aclB) RNA polymerase beta subunit (rpoB) !" #!!" $!!" %!!" &!!" '!!" Number of hits per kb of gene per gene copy

Figure 2.5: Normalized relative abundance (NRA) of functional genes in the metagenome. Sulfur redox genes, particularly genes whose products oxidize reduced forms of sulfur, are present in high abundance, as are ATP citrate lyase genes. The RNA polymerase B (rpoB) gene, believed to be present in all organisms as a single copy gene, provides an absolute measure of normalization, and can be used to estimate the proportion of the microbes present which contain the other genes that have been quantified.

36

Other energy metabolism genes

There are almost no photosynthetic genes present in the metagenome (Figure 2.5), or phototrophs within the SSU rRNA data, indicating that the sulfur deposits do not contain significant numbers of bacteria capable of either oxygenic or anoxygenic photosynthesis, relative to the numbers of non-photosynthetic bacteria. Significant numbers of hydrogenases were detected in the metagenome (Figure 2.5), the vast majority of which were NiFe-hydrogenases group 1 (hydA) which are respiratory hydrogenases allowing microbes to use hydrogen as an electron donor in redox reactions (Mulder et al., 2010). In addition, the metagenome contained sequences for the Fe-Fe hydrogenase (hydA,B, hoxU) typically used by microbes to generate H2 by using protons as terminal electron acceptors in their electron transport chains in order to allow respiration to continue in anoxic environments when no other suitable electron acceptor is present (Boyd et al., 2010). The metagenome contained large numbers of the respiratory

(periplasmic) nitrate reductase (nap) genes (NRA > 150), indicating the genetic potential for nitrate respiration, but very few nar genes for membrane-bound respiratory nitrate reductase

(NRA = 2). The metagenome contained genes for nitrite reductase (nir), nitric oxide reductase

(nor) and nitrous oxide reductase (nos), involved in denitrification, but with higher NRA of nor and nos genes (NRA = 16 - 64) than nir genes (NRA = 4). Very few ammonification nitrite reductase (nrf) gene sequences were present (NRA = 2 - 5). No copies of the first gene in the aerobic oxidation of ammonium (amoA) were detected, and only 1 hit against hydrazine oxidoreductase (the second gene involved in ammonium oxidation to nitrite). No hydrazine synthase or hydrazine dehydrogenase genes (indicative of anammox) were found. Almost no arsenite oxidase (aox, aro or arx) genes were found, and no manganese oxidation (mxn or cum)

37

genes. Both reactions are potentially energetically-favorable but as only total As and Mn concentrations were assayed, arsenite or reduced manganese may not have been present.

Carbon and nitrogen fixation

The metagenome contained almost no RuBisCo (cbb) genes, indicative of carbon fixation via the Calvin-Benson cycle (Figure 2.5). RuBisCo is used by many bacteria to fix carbon, including phototrophs, so its absence is consistent with the lack of phototroph SSU rRNA or functional genes. There was a high relative abundance (NRA = 173–244) of ATP citrate lyase

(acl) genes, indicative of carbon fixation via the reductive TCA cycle (Hugler et al., 2005).

Comparison of the NRA range for RNA polymerase B (rpoB), which is believed to be a single- copy gene in every organism (Canfield et al., 2010), with the NRA of acl genes, demonstrated the acl genes are present in about 50-60% of all microbes at the site. Very few nitrogenase (nif, anf or vnf) genes (NRA = 0 – 8) indicate very little ability to fix nitrogen.

Phylogenetic origin of genes

Examination of the phylogeny of the best hits for genes that are abundant within the metagenome showed that the majority of sox, psr, sor, acl nap and rpoB gene sequences were likely to be of Epsilonproteobacterial origin, consistent with the fact that Epsilonproteobacteria dominate the BF09-06b SSU rRNA data.

DISCUSSION

My bioenergetic calculations based on the geochemical analysis of the BF09-06 sulfur deposit, the SSU rRNA data and the data on functional genes found in the metagenome, strongly

38

establish the hypothesis that the main energy source for primary productivity in the BF09-06 deposit is the oxidation of reduced sulfur species. The metagenome data show that the BF09-06 microbes have the genetic capability to oxidize sulfide through multiple oxidation reactions to sulfate, and the bioenergetic calculations (Figures 2.2A & 2.2B) confirm that this is highly energetically favorable at every stage, and that the oxidation of S0 provides the most energy.

I infer that that Sulfurovum sp. and Sulfuricurvum sp. are the major primary producers in this sulfur-driven microbial ecosystem. All the sulfur redox genes that are present in high relative abundance are possessed by the sequenced representatives of these Genera, Sulfurovum NBC37-1

(NCBI accession number: NC_009663, Nakagawa et al., 2007) and Sulfuricurvum kujiense

(NCBI accession number: NC_014762) indicating the genetic capability to oxidize sulfide, thiosulfate and sulfite, and to reduce polysulfide. Cultured representatives of Sulfurovum and

Sulfuricurvum have shown the ability to grow using energy from the oxidation of reduced sulfur species, including sulfide, S0 and thiosulfate (Inagaki et al., 2004; Kodama and Watanabe, 2003

& 2004; Nakagawa et al., 2007; Yamamoto et al., 2010; Yamamoto & Takai, 2011) and to use the reductive TCA cycle to fix carbon (Hugler et al., 2005; Kodama &Watanabe 2004;

Nakagawa et al., 2007). It is therefore likely that the Sulfurovum and Sulfuricurvum sp. present in the BF09-06 sulfur deposit also have these capabilities, although it is not clear which reduced sulfur species are being utilized, or which of their sulfur redox genes are being expressed.

No sulfide was detected in the BF09-06 deposit, but the sulfide assay was not done until several days after sample collection, by which time any sulfide could have oxidized. Thiosulfate and sulfite were not detected either, but these species are often present below detection levels due to rapid cycling (Thamdrup et al., 1994; Zopfi et al., 2004). Both thiosulfate and sulfite will form by abiotic oxidation of sulfide (Zhang & Millero, 1993) and so would be generated if

39

sulfide was present. Polysulfides will form abiotically when sulfide and S0 react, and are the main product of the sulfide-quinone reductase oxidation of sulfide in vitro (Griesbeck et al.,

2002), so would probably also be present in the deposit if sulfide is there. The Borup Sulfurovum and Sulfuricurvum sp. could potentially be involved in cycling sulfur, by aerobic oxidation of sulfide to polysulfide using sqr genes, and then re-reducing the polysulfide to sulfide using psr genes if a suitable reductant is present. Organic carbon could, in theory, be used to reduce polysulfides, but culturing of Sulfurovum has not so far demonstrated this ability (Yamamoto et al., 2010).

S0 is abundant on the glacial surface, and aerobic oxidation of S0 was the most abundant energy source at the time of sampling, though it is important to note that the most important energy source over time may vary depending on the flux of sulfur species and other nutrients into the deposit. However, given the known abilities of cultured Sulfurovum spp. and Sulfuricurvum sp. to oxidize S0 it seems likely that the Borup Sulfurovum sp. and Sulfuricurvum sp. are oxidizing S0 to provide energy for primary productivity.

The genes involved in the oxidation of S0 are still not completely understood. The reverse dsr gene products are known to be able to be utilized to oxidize S0 as well as sulfide (Dahl et al.,

2005; Dahl et al., 2008) but these genes were not detected in significant quantities in the metagenome. The sox gene products have been shown to be able to oxidize S0 in vitro (Rother et al., 2001) although to our knowledge it has not yet been proven that bacteria possessing the sox genes can actually use them in this way in vivo, given the challenges of enabling an intracellular enzyme complex to access an extracellular insoluble substrate like S0. An intriguing possibility for how the periplasmic sox proteins might be able to access external S0 is provided by the high relative abundance of dsrE family genes in the metagenome, and their presence in the genomes

40

of Sulfurovum NBC37-1 and Sulfuricurvum kujiense, which do not possess the other dsr genes.

In the full dsr complex, the dsrE gene product is one of those used to mobilize S0 that is inside the cell (Dahl et al., 2008) and so our data raise the question of whether the dsrE family protein in Epsilonproteobacteria might be used to mobilize external S0 so that sox proteins can oxidize it.

Given that the same amount of energy is available from the oxidation of sulfide to sulfate whichever pathway is taken, it is not clear why the relative abundance of sox genes are so much higher than the relative abundance of reverse dsr genes. Factors which could influence this might include the relative efficiency of enzymes in conserving energy (Sievert & Vetriani, 2012), or could be unrelated to energy and simply be a consequence of better environmental fitness of the

Epsilonproteobacteria compared to organisms which use the reverse dsr genes, for other reasons.

The metagenome data show very low levels of other genes known to be involved in reductive sulfur redox reactions (dsr, apr and qmo genes) or the disproportionation of thiosulfate and S0 (phs and sulfur oxygenase reductase sor genes, respectively) so I conclude that these reactions are unlikely to be significant in this environment, despite having been shown to be significant components of sulfur cycling in marine sediments (Jørgensen & Nelson, 2004). A significant caveat is that few S0 disproportionation genes have been characterized, so they may be under-represented in GenBank compared to other sulfur cycling genes, which would affect the results of my analysis.

Nitrate and nitrite respiration

The metagenome data show very high levels of periplasmic nitrate reductase (nap) genes, illustrating that the microbial community in the sulfur deposits possesses the capability for nitrate respiration. Both the sequenced and cultured representatives of Sulfurovum and

41

Sulfuricurvum are known to be able to respire nitrate (Inagaki et al., 2004; Kodama and

Watanabe, 2004; Nakagawa et al., 2007) suggesting that the Borup strains of these Genera also have this capability. The genes needed for denitrification and nitrate ammonification, the onward reduction of nitrite to nitrogen gas and ammonium, respectively (Zumft, 1997) are present in much lower numbers than the nap genes. Very little nitrate was detected in the deposit, and although a nitrite measurement was taken, it was below the lower limit of the standard curve used for the assay, and so cannot be considered reliable. Nitrate or nitrite could only be used as significant oxidants by the microbial community if they were being replenished, but the genes for nitrification (the aerobic oxidation of ammonium to nitrate via hydroxylamine and nitrite) were at extremely low relative abundance or not detected at all. Atmospheric deposition may be a possible nitrate source (Holtgrieve et al., 2011) and very low levels of nitrate were detected in glacial run-off streams (0.03-0.15 ppm, S.Grasby unpublished data) although this data is from a different year.

Taking into account the bioenergetics, SSU rRNA data and the relative abundance of functional genes in the metagenome, I conclude that several sulfur redox reactions may be significant in this environment (see Figure 2.6 below). DNA evidence only shows the genetic potential of microbes, not which genes are actively being used, so it is impossible to tell which of these reactions are actually being catalyzed by the bacteria in the sulfur deposit. However, my current interpretation of the integrated data is that the Borup Sulfurovum sp. and Sulfuricurvum sp. are oxidizing reduced sulfur species, in particular S0, using oxygen and possibly also nitrate, and that these reactions are being used to provide energy sources for carbon fixation. Their numerical dominance in the SSU rRNA data suggests they are the main primary producers of

42

this site. This is further supported by the fact that the majority of sulfur redox and carbon fixation genes appear to be of Epsilonproteobacterial origin.

"94*4>2*$

"%$ %&$ !"#$ !,9$ 6,B<029$ & !/0172$ @"' $ ,94*,-2*/67(82$

%&$ %&$ !"' ()*+,-$!%"'

@47-(72$ 20267-,)8$ -2*/67(82$

%& .$$ !) $ ! :,0;8/0<=4*2$ -2*/67(82$ !/01*2$3/4),)2$ @" & 52*/67(82$ % $ !/01*2$ *2=;*-,?2)(82$ %&$ ! A%"$ 52*/62*$

Figure 2.6: Reaction pathways which are likely to be significant in the BF09-06 sulfur deposit, based on the relative abundance of functional genes. Solid blue lines indicate reactions catalyzed by enzymes whose genes are present in high relative abundance. The dotted blue line indicates that genes involved in the oxidation of S0 are not known with certainty. The black line is an abiotic reaction.

Sub-aerial Sulfurovum and Sulfuricurvum

This is the first study in which either Sulfurovum spp. or Sulfuricurvum spp. ave been shown to be dominant in a sub-aerial environment. In previous studies, Sulfurovum has been

43

found in sulfidic environments including springs, hydrothermal vents, caves, sinkholes and anoxic/sulfidic sediments (Borin et al., 2009; Han et al., 2009; Handley et al., 2012; Jones et al.,

2010; Macalady et al., 2006; Macalady et al., 2008; Nakagawa et al., 2005, Porter & Engel,

2008; Ramos-Padrón et al., 2011; Reigstad et al., 2011; Sahl et al., 2010) and it has been suggested that they preferentially colonize environments with high levels of sulfide and low levels of oxygen (Macalady et al., 2008; Jones et al., 2010). This accurately describes the spring water which I believe was the source of the organisms that seeded the glacial sulfur deposit, but it is not clear how well this describes the deposit itself as we were not able to determine a sulfide:oxygen ratio. Culturing studies of Sulfurovum demonstrates that it prefers microaerophilic to fully oxygenated growth conditions (Yamamoto et al., 2010). The cultured

Sulfuricurvum has also been shown to prefer anoxic or microaerophilic environments (Kodama

& Watanabe, 2004) and environmental sequences are from anoxic, often sulfidic, sites (Chen et al., 2009; Gaidos et al., 2009; Haaijer et al., 2008; Kodama & Watanabe, 2003; Wagner et al.,

2007; Watanabe et al., 2000). Also of note is that our SSU rRNA data show that Flavobacterium are more abundant in the surface part of the BF09-06 deposit than the Epsilonproteobacteria, and are by far the most abundant in the thin sulfur varnishes (BF09-02, BF09-04 and BF09-05). As

Flavobacterium has been characterized as an aerobic heterotroph we hypothesize that the aerobic metabolism of Flavobacterium at the surface of BF09-06 is reducing the oxygen availability, creating a microaerophilic environment in the lower parts of the deposit, thus creating the conditions in which Sulfurovum and Sulfuricurvum can thrive. The low abundance of Sulfurovum and Sulfuricurvum in the much thinner sulfur varnish deposits support the hypothesis that they cannot thrive in a fully oxygenated environment.

44

Flavobacterium

The role of Flavobacterium in this system is not fully determined. Flavobacterium is a known heterotroph, and there is energy available from organic carbon for growth. However, the relatively high level of Flavobacterium in the surface sulfur deposits raises the question of whether it plays a role in sulfur cycling as well. Microbial heterotrophic sulfur cycling has been shown to be significant in other environments (Mason & Kelly 1988; Sorokin, 1996) and

Flavobacterium has been shown to be able to oxidize thiosulfate (Teske et al., 2000) and dimethyl sulfide (Green et al., 2011) in culture, although it still required organic carbon to grow.

It is not clear why Flavobacterium is so dominant in the thin sulfur varnishes, and this remains a subject of investigation.

Missing metabolisms

A surprising feature of this microbial community is the fact that some energetically- favorable reactions appear not to be utilized. There were also almost no genes present for photosynthesis or ammonium oxidation despite the fact that these are energetically favorable.

Previous metagenomic and SSU rRNA studies using the same methods as in this study have detected abundant phototroph SSU rRNA and functional genes at other sites, so it is clear that the absence of such genes in this study is not because the methods I used could not detect them, but does in fact indicated their relative absence compared to the sulfur redox genes.

Photosynthesis is commonly regarded as the dominant energy source for primary productivity in ‘light’ environments. As our field site is in the arctic, at the time of sampling it experienced 24-hour sunlight, and the presence of plant life in the proglacial area proves that photosynthesis is feasible at this latitude. Other studies have also demonstrated that both

45

oxygenic and anoxygenic phototrophs can grow in permanently cold and arctic conditions, although productivity may be lower than elsewhere (Hughes & Lawley, 2003; Ng et al., 2010;

Roeselers et al., 2007; Schmidt et al., 2010; Simon et al., 2009). While sulfide is known to inhibit oxygenic photosynthesis (Oren et al., 1979; Miller & Bebout, 2004) some cyanobacteria readily utilize sulfide for anoxygenic photosynthesis (Castenholtz & Utkilen, 1984; Cohen et al.,

1986; Jørgensen et al., 1986). The presence of sulfide also does not explain the absence of green and purple sulfur phototrophs which also not only tolerate, but use, sulfide, and dominate other cold sulfidic sites (Douglas & Douglas, 2001; Klepac-Ceraj et al., 2012; Ley et al., 2006). While photosynthesis is known to be inhibited by temperatures above 72 °C (Hamilton et al., 2012) that is clearly not an issue in this environment. The fact that there were a few SSU rRNA genes from phototrophs, and very small numbers of functional genes (only 1 or 2 sequences per gene) indicates that phototrophs are able to reach the site via global water and/or wind distribution systems, and other work has shown evidence of strong local aerial transport of cyanobacteria in the area (Harding et al., 2011). I hypothesize that the relative absence of phototrophs is probably due to a combination of several factors: (1) the arctic setting allows only a short growth season, due to lack of light and extreme cold during winter; (2) the glacial surface appears to be a relatively pristine site, as it is not covered in soil, plant or visible microbial growth, and (3) the spring emerges in a slightly different place each year, seeding the glacial surface with microbes from a dark subsurface environment. The large influx of subsurface organisms (unlikely to contain many phototrophs), together with an abundant energy supply for microbes capable of aerobic oxidation of reduced sulfur, allows for microbial growth that is more rapid than phototroph colonization via random atmospheric transportation. In hydrothermal vent environments Epsilonproteobacteria have been shown to be the most rapid initial colonizers,

46

even though they are not necessarily dominant in the environment overall (López-Garcia et al.,

2003, Nakagawa et al., 2005) and that may help to explain why Epsilonproteobacteria are dominant in our sulfur deposits despite the fact that they are not dominant in the spring water.

The role played by the Burkholderiaceae that dominate the spring water is unknown.

The absence of ammonium oxidation is also intriguing. Ammonium oxidation by oxygen

(nitrification) or nitrite (anammox) is energetically favorable according to our bioenergetic calculations, and nitrification has been shown to be significant in another subglacial site (Boyd et al., 2011). However, neither the bacterial nor archaeal amoA gene responsible for the first step in this process (Boyd et al., 2011) were detected in the metagenome. Genes associated with anammox (hydrazine synthase and hydrazine dehydrogenase, Kartal et al., 2011) were also not detected and the SSU rRNA data for BF09-06 does not contain any Planctomycetes (the only lineage so far known to be able to carry out anammox) although there are some present in the spring water. Both these metabolisms could be present at a very low level, with gene numbers too small to be detected by the depth of our sequencing, but it is surprising not to find either one or the other. The best hypothesis to explain the lack of ammonium oxidation is that the bulk samples required for geochemical analysis and DNA extraction in this low biomass system may have sampled across any stratification in chemical species. This could mean that the ammonium was not present in the same part of the deposit as either nitrite or oxygen, so that neither aerobic nor anaerobic oxidation of ammonium was feasible. However, the deposit was very liquid, and so diffusion of chemical species should be possible. In environments where anammox is known to take place, such as the Black Sea, it is typical to observe a stratified system with oxidation of ammonium by nitrite in anoxic zones, and with nitrite introduced by mixing or diffusion from more oxidized zones nearer the surface, with potential overlap of both anammox and nitrification

47

in suboxic zones (Thamdrup & Daalsgard, 2002; Kuypers et al., 2003; Lam et al., 2007;

Daalsgard et al., 2005).

CONCLUSION

The bioenergetics calculated from the geochemical analysis, SSU rRNA data, metagenomic data and previously published work on the genomes of Epsilonproteobacteria and their capabilities in culture, are all consistent with the hypothesis that the Sulfurovum sp. and

Sulfuricurvum sp. that dominate the BF09-06 sulfur deposit are autotrophs, using sulfur redox energy to fuel primary productivity. Given the numerical dominance of the Sulfurovum and

Sulfuricurvum in my dataset, they appear to be the major primary producers in the deposit, which is the first time they have been observed to be dominant in a sub-aeriel environment. The evidence suggests that sulfur redox energy is the main source of primary productivity in this environment, instead of light, even though light is abundant. It is not clear why energetically- favorable metabolisms such as photosynthesis, nitrification and anammox are not being utilized to any significant degree. My study has shed new light on the way that bioenergetics and metagenomic data can be used in combination to understand the specific ways that microbes are obtaining energy from their environment, and to identify habitable niches that are unoccupied.

Further work would be needed to determine the reasons why these environmental niches are not utilized, and whether this is a temporary or permanent phenomenon.

48

CHAPTER 3

METAGENOME ANALYSIS METHOD DEVELOPMENT AND SUPPLEMENTARY ANALYSIS

INTRODUCTION

In order to produce the results in Chapter 2 it was necessary to determine the best method of analyzing the metagenome data in order to answer the specific scientific questions being asked. Metagenomic and metatranscriptomic studies in their current form did not exist prior to the invention of 454 pyrosequencing in 2005 (Margulies et al., 2005). The studies that were carried out using Sanger sequencing (e.g. Poretsky et al., 2005, Rondon et al., 2000, Rusch et al.,

2007, Tringe et al., 2005) faced significant difficulty in obtaining large enough numbers of sequences to produce meaningful coverage of a range of functional genes. The invention of 454 and subsequent next-generation sequencing technologies has massively increased the amount of sequence data that it is possible to obtain but this creates a new challenge of how best to analyze such large quantities of data. Although the number of metagenomes that have been analyzed now runs into the thousands and is increasing rapidly, metagenomic analysis is still a developing field and a consensus has not been reached on the best way to handle this type and scale of data.

Moreover the expertise to do so is not yet widespread. For example, although the University of

Colorado contains a significant cadre of expert bioinformaticians, at the time this analysis was done, as far as it could be determined, no-one at the University had previously analyzed a metagenome. This chapter sets out the process by which I developed and tested the metagenome analysis method used in Chapter 2, and some supplementary analyses I carried out to check and confirm the results set out in Chapter 2. It also includes my thoughts on how metagenome analysis methods could be improved.

49

CHALLENGES INVOLVED IN ANALYZING A METAGENOME

The challenges to be faced in analyzing a metagenome (or metatranscriptome) can be summarized as: i) quality control of sequences to exclude poor quality reads; ii) identifying the function, and likely phylogenetic origin, of genes represented within the metagenome, iii) collecting this information together in a way which enables the researcher to answer the scientific questions that the study was designed to address.

Essentially the problem is one of scale. For a ‘metagenome’ of 100 sequences the task would be trivial. The National Center for Biotechnology Information (NCBI) website

(http://blast.ncbi.nlm.nih.gov/Blast.cgi) could be used to BLAST (Altschul et al., 1990) the sequences against a protein or nucleotide databases. Quality control would be assured by using top hits that met suitable cut-offs (e.g. bit score). The function of each hit and the phylogenetic origin of the sequence that was the best match could be obtained by manually looking through the hits, and the number of sequences is small enough that a researcher could instantly gain an overview of the information contained in the ‘metagenome’, although of course not much useful information could be obtained from such a small number of sequences. For a metagenome of a million sequences, this approach would take a prohibitively-long period of time, and so automated methods are needed. This requires access to computer programs that will collate the information in an appropriate way to enable to science questions to be answered, or the programming skill to write custom programs, and access to sufficient computational power to handle the data sets.

50

METAGENOME ANALYSIS METHOD DEVELOPMENT

Initial analysis of the metagenome was performed by using gene sequences of known sulfur redox genes as probes against the metagenome data. I downloaded the BLAST 2.2.22

Executables from the NCBI website and installed them on a local computer so that the BLAST algorithms (Altschul et al. 1990) could be used to probe the metagenome data. I converted the metagenome sequence file of 1.2 million sequences into a database that could be searched using the BLAST programs by using the BLAST formatdb function. I downloaded sequences of sulfur redox genes were downloaded from the NCBI (http://www.ncbi.nlm.nih.gov/) or KEGG

(http://www.genome.jp/kegg/, Ogata et al., 1999) databases as either nucleotide (nt) or amino acid (aa) sequences from a range of different organisms. The soxB gene, which has been well characterized in many organisms, was the first gene that I tested using nt and aa sequences from organisms spanning the sequence diversity of this gene as set out in Petri et al., 2001. As the metagenome sequences were nt sequences, the blastn program was used when nt probe sequences were used, and the tblastn program was used when aa probe sequences were used. As expected, given the redundancy of the genetic code which allows silent mutations to occur, the aa sequences produced much more consistent numbers of hits than nt sequences, which varied from only 7 hits using the SoxB nt sequence from Paracoccus to 253 hits using the SoxB gene from Sulfurovum, using a bit score cutoff of 50 (the cutoff used by Canfield et al, 2010).

Consequently aa sequences were used as the probe for all subsequent analyses. I also changed the quality control cutoffs to be more stringent, as I was concerned that a bit score of 50 was not sufficiently robust, and so I subsequently used cutoffs of a maximum e-value of 10-10 and a minimum percentage identity of 50%, as described below.

51

Although the method of BLASTing specific gene sequences against a database made of the metagenome sequences produced significant hits, it was possible that a metagenome sequence might have an even more significant hit against another gene. To test this it was necessary to identify all the metagenome sequences that were significant hits to a reference sequence (eg to a soxB reference sequence), and then BLAST the metagenome sequences against

GenBank or another universal database in order to determine if the top hit was a gene with a different function than the one originally used as reference. As I used a number of different reference sequences for each sulfur redox gene in order to try and capture the sequence diversity of each gene (e.g. 15 different soxB reference sequences were used), it was first necessary to combine all these results and remove metagenome sequences that were included in the results more than once because they were significant hits for more than one reference sequence. I did this manually. I then used the unique list of metagenome sequences that were significant hits to the sulfur redox reference genes as probes to BLAST against the GenBank protein nr database using the blastx program. The blastx program translates the nt sequence into all 6 possible aa sequences, i.e. all three reading frames for both the given sequence and its complementary sequence, and then uses all 6 aa sequences as probes against the protein nr database, retaining the best hits. As the initial BLAST result gave only the identifying number of each metagenome sequence that was a significant hit to the target gene, not the sequence itself, I wrote a short computer program in the Python programming language in order to quickly copy a batch of sequences from the metagenome to avoid the need to find and copy each sequence individually.

This Python script (grab_seqs.py) takes a list of sequence identifying numbers as input, and produces a fasta file with all the sequences as output. It also produces a log file that includes error reporting. I wrote a second script (check_seq_ident_input.py) to check that the input file for

52

grab_seqs.py is in the correct format. The most recent version of grab_seqs.py is in Appendix 1 and the most recent version of check_seq_ident_input.py is in Appendix 2.

BLASTing the unique list of metagenome sequences that had been significant hits for the reference genes against the NCBI protein nr database was done for the hits to each of the DsrA,

DsrB, rDsrA, rDsrB, psrA, soxB, soxC, sqr, sulfide dehydrogenase and thiosulfate reductase reference gene sequences. The next step was to scan all the results to see if the top hit was still the sulfur redox gene, or if there was something else that was an even more significant hit. For the first genes tested, DsrA&B and rDsrA&B, this was a fairly quick process as there were very few hits. For example, there were only 4 metagenome sequences that hit DsrB significantly and when BLASTed against GenBank three of them had DsrB sequences as their top hit while the fourth was DsrA. However, for genes with more hits (e.g. there were 244 unique metagenome sequences that were unique hits to sqr) it was quickly determined it would take a long time to look at each individual result manually, and so, although technically feasible, it would take a very long time to quantify each gene of interest this way, given the large number of genes of interest.

Even more importantly, it was also a consideration that the method of using specific gene sequences as probes could result in something important being missed. Relevant genes could be missed if the gene possessed such a high range of diversity in its aa sequence that the reference aa sequence used wasn’t a significant match. Since the full range of diversity in aa sequence isn’t known for all genes this risk would be impossible to exclude. Therefore I considered it was preferable to use the metagenome sequences as probes and BLAST them against a comprehensive database in order to assign function. Although relevant genes could still potentially be missed if they were so different in sequence to previously-known genes that they

53

did not have any significant hits in the database, the risk would be much lower as the whole database would be available for sequence comparisons, rather than just a reference sequence (or selection of reference sequences).

An unavoidable limitation of any molecular study, separate from the issue of gene sequence diversity, is that we may have not have identified all genes responsible for functions of interest to the study, in which case relevant genes could not be identified by any sequence- comparison method. For example, although many bacteria have been identified which can gain energy by the oxidation of Fe2+ (Blöthe & Roden 2009, Lamers et al., 2012, Kozubal et al.,

2012, Wang et al., 2011) the genes which are responsible for this capability have not been determined. It was therefore impossible to quantify these genes in the Borup metagenome, even though the bioenergetic calculations showed that iron oxidation should be energetically- favorable. There may also be microbial functions that have yet not been identified in either environmental or culturing studies, and so for which not only have genes not been identified, but we would not even know that such genes might exist.

In the absence of access to a computer cluster or server, the first attempt I made to assign functions to the metagenomic sequences by BLASTing the sequences against a universal database involved BLASTing the sequences against the NCBI protein nr database online, again using the tblastn algorithm. This algorithm is a lot more computationally intensive than the blastn algorithm. The NCBI server assigns only a set amount of time to each job submitted to it online, in order to ensure that users with very heavy computational needs don’t take up all the computational time, which would exclude other users. The number of sequences that can be processed in any one job depends on the specific blast algorithm used. While several hundred sequences can be processed using the blastn algorithm, a far smaller number can be processed in

54

one go using the tblastn algorithm, and when jobs time out on the NCBI server all the data is lost

(i.e. partial results are not returned). I determined by trial and error that only 65 sequences could be submitted at one time, so that it was clearly going to take far too long to use this method to assign functions to the entire metagenome. An alternative that was considered was to download the protein nr database and carry out the BLAST on a local computer. Consultation with more experienced bioinformaticians who had carried out similar types of processes, indicated that this should be feasible, but would take a significant amount of computational time (possibly several weeks on a single laptop). Moreover, it still left the requirement for another step to collate the results of the BLAST search, collecting the top hits that resulted from each sequence in a format that would be of manageable size and so which would enable useful information to be retrieved.

This would also require significant computational power, and newly-written software as an ‘off- the-shelf’ program to do this was not available.

I decided instead to use the MG-RAST online metagenome analysis pipeline (Meyer et al., 2007) for the initial quality control of the metagenome sequences, the assignment of function by BLASTing the metagenome sequences against the GenBank protein nr or other universal databases, and the initial collating of top hits resulting from the BLAST. These are the most computationally-intensive parts of the process, and the parts that require the most complicated computer programming ability. Although the issues of lack of access to a computer cluster and limited programming ability could have been solved in time, or by collaboration, it was a much faster process to use the MG-RAST pipeline, and I therefore considered this the preferable solution.

The MG-RAST pipeline provides a freely-available web-based service which produces an automated annotation of metagenome data with user input into the quality controls used. The

55

first step is to remove poor quality reads from the metagenome. Artificial replication of sequences is a known artifact of the 454 sequencing process (Gomez-Alvarez et al., 2009), and so, as set out in Chapter 2, the MG-RAST pipeline was used to remove sequences that had exactly the same nt sequence for the first 50 nt, leaving only one sequence, as these were considered more likely to be artificial than genuine replicates. Poor quality sequences, defined as those with 10 or more ambiguous bases, or sequences that were less than 75 nt long, were also removed. Table 3.1 below shows the number and characteristics of the metagenome data initially, and following these quality-control steps. The dataset resulting from these quality- control steps (the quality-controlled database) was downloaded from the MG-RAST website and was used for all subsequent analyses, not just the remainder of the MG-RAST process.

Initial metagenome database Post quality control database Number of sequences 1,238,751 957,074 Average length 530 +/- 63 bp 537 +/- 37 bp Total bp 657,394,433 bp 514,016,777 bp GC % 41 +/- 9% 41 +/- 9% Table 3.1 Characteristics of the metagenome database before and after initial quality-control to exclude poor-quality sequences and artificial duplicates

Following quality-control the MG-RAST uses gene-finding algorithms to predict gene sequences (‘features’) within the metagenome data. It does this, rather than BLASTing each sequence against a database because some metagenome sequences could contain fragments of 2 genes. For protein-coding sequences, the nucleotide sequences are also translated into amino acid sequences which are used for all subsequent analyses. The protein-coding sequences are clustered at 90% amino acid similarity and a single sequence is used as a representative sequence for the entire cluster to identify the function and likely phylogeny. The representative sequence is compared to a number of databases using a BLAST-like algorithm called BLAT (Kent 2002).

56

The protein databases include the SEED (Overbeek et al., 2005), GenBank nr (Benson et al.,

2011), RefSeq (Pruitt et al., 2009), IMG/G (Markowitz et al., 2008), UniProt (The UniProt

Consortium, 2011), eggNOGG (Muller et al., 2010), KEGG (Kanehisa et al., 2008) and PATRIC databases (Gillespie et al., 2011) as well as the M5NR non-redundant database made by the MG-

RAST team from combining data from all of the above. The top hit from this comparison is retained. If there are two or more top hits with the same score they are all retained, as the MG-

RAST team has no way to determine which is more likely to be correct. MG-RAST also analyzes ribosomal RNA sequences using the Greengenes (DeSantis et al., 2006), SILVA

(Pruesse et al., 2007) and RDP (Cole et al., 2009) databases. The user can download the results of analysis against any of these databases, and can set the quality-controls used to filter results.

The MG-RAST pipeline identified 823,237 predicted protein features in the Borup metagenome and was able to assign a function to 536,866 of these features (65.2%) using a maximum e-value of 10-3 (the default MG-RAST cutoff).

I carefully considered the question of the cutoffs to be used in each BLAST. The Canfield et al. 2010 paper which was in general considered a good model for this study as it also involved quantifying a number of sulfur redox genes, discarded hits with a bitscore of less than 50, based on work done by Shi et al. 2011 suggesting that this was a reasonable score to assess significance. However, the Shi et al. work was done when 454 sequences were shorter than those currently being obtained, and discussion with one of the main authors of that study (Ed DeLong) suggested that perhaps a higher cutoff would now be more appropriate. The MG-RAST analysis allows selection by e-value and percentage identity. Literature study cutoffs included a bit score of 50 (Canfield et al., 2010, Inskeep et al., 2010), or an e value of 10-3 (Dinsdale et al., 2008,

Simon et al., 2009) or 10-5 (Biddle et al., 2011). However, I considered that these cutoffs were

57

too generous. An e-value of 10-5 means that there is a 1 in 105 probability that a hit is random, rather than significant, for the database used. With a database of around 106 sequences (957,000 sequences after the MG-RAST initial quality controls – see below) this would mean that every probe had 10 false positives. In order to ensure that the probability of a false positive hit was

0.01 or less, an e-value of, at most, 10-8 would be needed for a database of 106 sequences. As the goal of the project was to do some accurate relative quantification of genes in the metagenome, a cutoff of 10-10 was used to provide extra stringency in order to minimize the risk of noise in the data.

I also considered the question of the percentage identity cutoff. Proteins with as low a percentage identity as 30% have been shown to have the same function in some cases (Cheng

2006) but that is not always the case (Krugel et al., 1995). As proteins have different levels of conservation there will not be one single value that works for all proteins. A balance would be needed between not being too relaxed, in order to avoid false identifications, but also not being too stringent, which could exclude too much valid data. I perfomed a sensitivity analysis by using an e-value cutoff of 10-10 and percentage identities of 50%, 60%, 70% and 80%. The number of results did not decrease markedly when 60% was used instead of 50%, but at 70% the number of hits was only half that at 50%, and at 80% the number of hits was only 10% of that at

50%. I therefore decided that using 70% or 80% would probably result in too much information being lost, and so either 50% or 60% would be more appropriate. A value of 50% was used as this was already more stringent than any quality control cutoff seen in the literature. An in silico study simulating the proportion of protein-coding sequences that hit a gene of similar function at different e-value cutoffs showed that 500 bp sequences will hit a gene of similar function at an e- value cutoff of 10-10 and 50% identity (Mou et al., 2008).

58

A further issue I considered was the quality of the annotation on GenBank and other databases, and the risk of ‘annotation drift’. This potential problem could arise because most genes are annotated by sequence comparison. Originally there must have been an experimental study which determined the function and sequence of a given gene. Subsequently, genes with closely-related sequences would be annotated as the same function. ‘Annotation drift’ could occur as this process iterates. For example, gene A has its function experimentally determined.

Gene B is subsequently annotated as being the same as gene A because it is 80% identical . Gene

C is subsequently annotated as being the same gene it is 80% identical to gene B, but at this point it might be only 64% identical to gene A. Extending this process could result in mis-annotation of genes. GenBank is the database with the most universal sequence coverage and this is an advantage in terms of being able to identify a diverse range of genes, but in order to maintain its universal nature its content is not audited, and the quality of annotation is varied. Some metagenome studies have addressed this challenge by using a custom database (e.g. Inskeep et al., 2010) and The Seed is an attempt by a group of microbiologists to produce a good quality database that will be a general resource for the community, by aiming to produce high-quality annotation of 1000 microbial genomes (Overbeek et al., 2005). I decided to use GenBank for annotation as I considered the advantages of having the widest-possible range of reference sequences to use to try to identify function outweighed the risks of mis-annotation. This was particularly the case given the fact that the environmental characteristics of the site are different from the types of sulfur-rich systems that have previously been studied, such as hydrothermal vents, acid mine drainage and acidic hot springs, which could mean that the metagenome would contain functional genes that were quite diverse from those previously characterized. In order to minimize the risk of mis-annotation I examined manually the NCBI record for gene sequences

59

hit by large numbers of metagenome sequences. While it is impossible to independently verify that the annotation in NCBI was correct, it was possible to see whether the annotation looked reasonable by checking whether there was a link to a published experimental study, or whether the organism given as the top hit was a close relative of one known to possess those capabilities.

An additional issue that I considered was that environmental sequences could be closely- related to relevant genes, but might non functional, for example if there was an internal stop codon. As all the comparisons were done using aa sequences any stop codons within the metagenome sequence would shorten the ‘feature’ used by MG-RAST to assign function, and this would be reflected in a lower quality score, which would mean that such hits would be less likely to be included in the results given the rigorous quality-control limits that were used.

However, this could not of course be guaranteed. In addition, as the metagenome sequences were not full gene length, if a sequence hit a non-functional gene that had a stop codon outside the region covered by the metagenome sequence, it would be missed. This is an inevitable consequence of using less than full length gene sequences. The alternative of using assembled sequences (contigs) was considered, but this would not have allowed for accurate relative quantification of genes, which was an essential part of the study, so this approach was not used.

Initial analysis using MG-RAST also tested the feasibility of using one of the ‘universal’ functional classification schemes: KEGG orthology (Kanehisa et al., 2008), COG (Sayers et al.,

2009), NOG (Muller et al., 2010) or the SEED subsystems (Overbeek et al., 2005), to select sulfur redox and other energy genes. Table 3.2 below shows the number and percentages of protein features that were recognized by each of these ‘universal’ functional group classification schemes.

60

Classification Scheme Number of protein- Percentage of Percentage of all coding features annotated protein- protein-coding recognized coding features features recognized recognized COG 445,046 83 54 KEGG (KO) 321,561 60 39 NOG 58,316 11 7 The SEED 770,416 144 94 subsystems Table 3.2 Protein-coding features recognized by different ‘universal’ functional classification schemes

Clearly there is some double-counting as the SEED subsystems has apparently recognized >100% of the protein features to which a function had been assigned, and unfortunately the MG-RAST method of calculating the numbers of features recognized by each classification scheme was not explained. However, it is clear from these data that the SEED subsystems classification did the best job of recognizing proteins, and that the COG, KEGG and

NOG systems are nowhere near comprehensive enough for this type of dataset. Unfortunately, the SEED subsystems did not identify even all the major sulfur redox genes in its classification system; the sox, Dsr, sulfite oxidase and sulfide dehydrogenase (fcc) genes are included, but key sulfur-cycle genes such as sulfide-quinone reductase (sqr), polysulphide reductase and thiosulfate reductase do not appear to be included within the classification scheme. I therefore decided that it would be necessary to go through the output of the MG-RAST annotation manually, to identify hits of interest that would require accurate quantification, as described in

Chapter 2. This involved going through the hit result table of 44,371 lines, manually, to find hits against proteins of interest. The metagenome sequences that had hit a protein of interest were then downloaded and quantified as described in Chapter 2.

61

SUPPLEMENTARY ANALYSES

I carried out supplementary analyses to further check and confirm the results of the metagenome analysis which are set out in Chapter 2.

Comparison of ribosomal RNA analysis from different sources

In addition to containing protein-coding sequences the metagenome also contained ribosomal RNA sequences, and I analyzed these using MG-RAST in order to compare the results with those found using pyrosequencing of SSU rRNA amplicons (as set out in Chapter 2). As quality-control cutoffs are better established for ribosomal RNA sequence analysis than for proteins, it was possible to use higher cutoffs. Cutoffs of a maximum e-value of 10-10 and a minimum percentage identity of 95% were used. Taxonomic assignments were made using the

Ribosomal Data Project (RDP, Cole et al., 2009)), Greengenes (DeSantis et al., 2006), Silva small subunit (SSU) and Silva large subunit (LSU) databases (Pruesse et al., 2007).

Each classification captured different numbers of sequences, and differed in the proportion of those sequences to which they assigned a phylogeny using these quality control cutoffs, but all four analyses were consistent with each other, and with the SSU rRNA analysis of the PCR amplications described in Chapter 2. All four analyses showed that the metagenome is dominated by bacterial sequences, with very few eucaryote sequences, and no archaeal sequences detected in the ribosomal RNA (Figure 3.1 below).

62

n=2092 n=2376 n=2909 n=3344 100%

90%

80%

70%

60% Bacteria 50% Eucaryota 40% Unassigned 30% Percentage of rRNA reads reads of rRNA Percentage 20%

10%

0% RDP Greengenes SSU LSU

Database Figure 3.1 Ribosomal RNA reads within the metagenome classified by different ribosomal RNA databases, including unassigned reads

All four analyses showed the numerical dominance of the Epsilonproteobacteria and

Flavobacteria in the metagenome sample (see analysis at Class level in Tables 3.3-3.5 below).

63

Domain Phylum Class Abundance Bacteria Actinobacteria Actinobacteria (class) 20 Bacteria Bacteroidetes Bacteroidia 10 Bacteria Bacteroidetes Cytophagia 26 Bacteria Bacteroidetes Flavobacteria 348 Bacteria Bacteroidetes Sphingobacteria 6 Bacteria Chloroflexi Chloroflexi (class) 1 Bacteria Firmicutes Bacilli 9 Bacteria Firmicutes Clostridia 31 Bacteria Planctomycetes Planctomycetacia 1 Bacteria Proteobacteria Alphaproteobacteria 15 Bacteria Proteobacteria Betaproteobacteria 107 Bacteria Proteobacteria Deltaproteobacteria 17 Bacteria Proteobacteria Epsilonproteobacteria 1280 Bacteria Proteobacteria Gammaproteobacteria 105 Bacteria Spirochaetes Spirochaetes (class) 1 Bacteria Synergistetes Synergistia 1 Bacteria Tenericutes Mollicutes 8 Bacteria Verrucomicrobia Opitutae 1 Bacteria Other Other 95 Eukaryota Bacillariophyta Bacillariophyceae 1 Eukaryota Chlorophyta Trebouxiophyceae 1 Eukaryota Other Bangiophyceae 1 Eukaryota Other Pelagophyceae 1 unassigned unassigned unassigned 6 TOTAL 2092 Table 3.3 Ribosomal RNA classsification at Class level against RDP database

64

Domain Phylum Class Abundance Bacteria Actinobacteria Actinobacteria (class) 19 Bacteria Bacteroidetes Bacteroidia 5 Bacteria Bacteroidetes Cytophagia 17 Bacteria Bacteroidetes Flavobacteria 329 Bacteria Bacteroidetes Sphingobacteria 4 Bacteria Bacteroidetes Other 1 Bacteria Elusimicrobia Elusimicrobia (class) 1 Bacteria Firmicutes Bacilli 3 Bacteria Firmicutes Clostridia 25 Bacteria Firmicutes Erysipelotrichi 7 Bacteria Planctomycetes Planctomycetacia 1 Bacteria Proteobacteria Alphaproteobacteria 6 Bacteria Proteobacteria Betaproteobacteria 47 Bacteria Proteobacteria Deltaproteobacteria 8 Bacteria Proteobacteria Epsilonproteobacteria 805 Bacteria Proteobacteria Gammaproteobacteria 19 Bacteria Spirochaetes Spirochaetes (class) 1 Bacteria Synergistetes Synergistia 1 Bacteria Tenericutes Mollicutes 8 Bacteria Verrucomicrobia Opitutae 1 Eukaryota Streptophyta Jungermanniopsida 1 Eukaryota Other Florideophyceae 1 unassigned unassigned unassigned 1066 TOTAL 2376 Table 3.4 Ribosomal RNA classification at Class level against Greengenes database

65

Domain Phylum Class Abundance Bacteria Actinobacteria Actinobacteria (class) 16 Bacteria Bacteroidetes Bacteroidia 11 Bacteria Bacteroidetes Cytophagia 27 Bacteria Bacteroidetes Flavobacteria 403 Bacteria Bacteroidetes Sphingobacteria 7 Bacteria Chloroflexi Chloroflexi (class) 1 Bacteria Cyanobacteria Other 2 Bacteria Elusimicrobia Elusimicrobia (class) 1 Bacteria Firmicutes Bacilli 6 Bacteria Firmicutes Clostridia 29 Bacteria Firmicutes Erysipelotrichi 7 Bacteria Planctomycetes Planctomycetacia 2 Bacteria Proteobacteria Alphaproteobacteria 16 Bacteria Proteobacteria Betaproteobacteria 61 Bacteria Proteobacteria Deltaproteobacteria 10 Bacteria Proteobacteria Epsilonproteobacteria 847 Bacteria Proteobacteria Gammaproteobacteria 190 Bacteria Tenericutes Mollicutes 11 Bacteria Verrucomicrobia Opitutae 1 Eukaryota Arthropoda Insecta 5 Eukaryota Ascomycota Pezizomycetes 1 Eukaryota Chlorophyta Trebouxiophyceae 2 Eukaryota Chordata Actinopterygii 11 Eukaryota Streptophyta Liliopsida 7 Eukaryota Streptophyta Zygnemophyceae 1 Eukaryota Other Bangiophyceae 1 Eukaryota Other Chrysophyceae 2 Eukaryota Other Pelagophyceae 1 Eukaryota Other Spirotrichea 1 Eukaryota Other Other 4 unassigned unassigned unassigned 1225 TOTAL 2909 Table 3.5 Ribosomal RNA classification at Class level against Silva SSU database

Domain Phylum Class Abundance Bacteria Actinobacteria Actinobacteria (class) 51 Bacteria Bacteroidetes Bacteroidia 11 Bacteria Bacteroidetes Cytophagia 20 Bacteria Bacteroidetes Flavobacteria 510 Bacteria Bacteroidetes Sphingobacteria 60

66

Bacteria Bacteroidetes Other 1 Bacteria Chloroflexi Chloroflexi (class) 3 Bacteria Chloroflexi Dehalococcoidetes 1 Bacteria Firmicutes Bacilli 20 Bacteria Firmicutes Clostridia 20 Bacteria Firmicutes Erysipelotrichi 2 Bacteria Firmicutes Negativicutes 1 Gemmatimonadetes Bacteria Gemmatimonadetes (class) 3 Bacteria Proteobacteria Alphaproteobacteria 38 Bacteria Proteobacteria Betaproteobacteria 170 Bacteria Proteobacteria Deltaproteobacteria 22 Bacteria Proteobacteria Epsilonproteobacteria 2173 Bacteria Proteobacteria Gammaproteobacteria 101 Bacteria Spirochaetes Spirochaetes (class) 11 Bacteria Synergistetes Synergistia 2 Bacteria Tenericutes Mollicutes 31 Bacteria Verrucomicrobia Opitutae 2 Bacteria Verrucomicrobia Other 2 unclassified (derived Bacteria from Bacteria) Other 1 Eukaryota Annelida Polychaeta 1 Eukaryota Arthropoda Branchiopoda 10 Eukaryota Arthropoda Insecta 8 Eukaryota Ascomycota Sordariomycetes 3 Eukaryota Bacillariophyta Other 1 Eukaryota Chlorophyta Chlorophyceae 2 Eukaryota Cnidaria Anthozoa 1 Eukaryota Cnidaria Hydrozoa 13 Eukaryota Phaeophyceae Other 1 Eukaryota Streptophyta Liliopsida 7 Eukaryota Streptophyta Mesostigmatophyceae 1 Eukaryota Streptophyta Other 2 Eukaryota Other Bangiophyceae 1 Eukaryota Other Chrysophyceae 2 Eukaryota Other Hyphochytriomycetes 1 Eukaryota Other Oligohymenophorea 1 Eukaryota Other Raphidophyceae 1 Eukaryota Other Other 3 unassigned unassigned unassigned 29 TOTAL 3344 Table 3.6 Ribosomal RNA classification at Class level against Silva LSU database

67

At Genus level all four analyses were also consistent with each other, and with the amplicon pyrosequencing data, in confirming that Sulfurovum was the most abundant Genus and that Flavobacterium was also highly abundant. Curiously, none of the four analyses showed any

Sulfuricurvum, in contrast to the results from the pyrosequenced SSU rRNA amplicons which showed this Genus as being highly abundant (Chapter 2). However, all four analyses did show various other Epsilonproteobacteria, including Arcobacter, Helicobacter, Sulfurimonas and

Sulfurospirillum, although the relative abundance of these Genera varied depending on which database was used for the analysis (see Table 3.7 below).

RDP Greengenes SSU LSU Sulfurovum 32 48 40 17 Flavobacterium 16 22 21 7 Arcobacter N/A N/A 0.4 8 Helicobacter 21 3 2 13 Sulfospirillum 0.3 0.2 0.5 7 Sulfurimonas 4 4 2 10 Coxiella N/A N/A 6 N/A Other 28 23 27 39 Table 3.7 Percentage of total reads (excluding unassigned reads) for Genera which represent 5% or more of sequences against any of the ribosomal RNA databases

The variation in the relative abundance assigned to each Epsilonproteobacteria Genera, including the absence of Sulfuricurvum sequences, is probably due to the fact that reads are clustered and only one representative sequence is used to assign phylogeny, so that a small variation in the classification system used by each database to distinguish between closely- related Genera could result in the whole cluster being assigned to a different Genus. However, the pattern was clearly that Epsilonproteobacteria were extremely abundant, and that Sulfurovum was the most abundant Epsilonproteobacteria. The total number of reads assigned to

Epsilonproteobacteria was 50% of all assigned reads using the SSU database, 61% of reads assigned using either the RDP or Greengenes and 66% of all reads assigned using the LSU.

68

In addition to the ribosomal RNA reads within the metagenome, and the pyrosequenced amplicons described in Chapter 2, there were also some longer, Sanger sequences from the DNA sample used for the metagenome. As described in Chapter 2, the representative sequence from the Operational Taxonomic Unit (OTU) in the SSU rRNA amplicon pyrosequenced data that

QIIME had identified as Sulfuricurvum was 99% identical to one of the Sanger sequences, which was also identified as Sulfuricurvum when BLASTed against GenBank. The Sanger sequence also clustered with other, reference, Sulfuricurvum sequences in the Epsilonproteobacteria phylogenetic tree (Figure 4 in Chapter 2). Therefore I considered that the identification of the

OTU in the amplicon data as Sulfuricurvum was appropriate. In addition, the Sanger data were also consistent with Sulfurovum, Sulfuricurvum and Flavobacterium being highly numerically dominant in the sequence data (see Figure 3.2 below).

n=54

Sulfurovum Sulfuricurvum Flavobacterium Cyanobacteria Chlorolexi Pseudomonas Prolixibacter Actinobacter Firmicutes

Figure 3.2 Phylogenetic assignment of SSU rRNA Sanger sequences of the metagenome DNA

69

Phylogenetic distribution of protein-coding genes

I also used MG-RAST to look at the taxonomy of the protein-coding sequences which were the best hits for the metagenome sequences. This would give an indication of the most likely phylogenetic origin of the metagenome sequences, although taxonomic origin of hits can only ever be a question of the most probable origin, in any case, when dealing with this kind of data. Without complete genomes it is not possible to be certain about which genes came from which organism, particularly given the existence of horizontal gene transfer; it is only possible to determine the closest match of gene sequence, and so the most likely phylogenetic origin.

However, this analysis was consistent with the ribosomal RNA analysis, showing that the

Epsilonproteobacteria and Flavobacteria sequences were the most abundant top hits (44% and

12% of top hits for all protein coding sequences, respectively). Figure 3.3 below shows the taxonomic origin for all Classes which represented 1% of more of the top hits within the total protein-coding sequences. As with the other MG-RAST data, there is a possibility of double- counting if a metagenome sequence had two or more top hits. Most likely these would be of similar phylogenetic origin, so Classes that are better characterized within GenBank may be over-represented in this data.

70

n = 1,016,469 Actinobacteria Aquiicae Bacteroidia Cytophagia Flavobacteria Sphingobacteria Bacilli Clostridia Alphaproteobacteria Betaproteobacteria Gammaproteobacteria Deltaproteobacteria Epsilonproteobacteria

Figure 3.3 Taxonomic origin of top hits resulting from comparison of metagenome sequences to GenBank with a maximum e-value of 10-10 and a minimum percentage identity of 50%

Rarefaction and alpha diversity

MG-RAST generated rarefaction curves of the number of observed species level groups detected as sequence numbers sampled increased. The rarefaction curve for protein-coding sequences showed a greater depth of sampling than for SSU rRNA sequences (Figures 3.4, 3.5 and 3.6).

This was also evident in the calculation of alpha diversity. The Shannon richness (calculated as antilog of the Shannon diversity index) at species-level of the metagenome was calculated as

396.565 from all the annotation databases used by MG-RAST, 275.37 from the protein-coding sequences, but only 28.84 or 42.62 from the SSU rRNA sequences contained within the metagenome when annotated against Greengenes or SILVA SSU databases, respectively.

71

Figure 3.4 Rarefaction curve of observed species with increased sampling of ribosomal RNA reads annotated against the Greengenes database

Figure 3.5 Rarefaction curve of observed species with increased sampling of ribosomal RNA reads annotated against the Silva SSU database

72

Figure 3.6 Rarefaction curve of observed species with increased sampling of protein reads annotated against GenBank

Comparative analyses with other metagenomes

The MG-RAST system contains some tools to allow comparative analyses of metagenomes in its database. This should be one of its most useful features. An accurate method to compare metagenomes from different sites, using the same quality control cutoffs and the same method of classifying function, would be extremely valuable in testing out whether the results of one metagenome study are consistent with those from other locations. It can be difficult to do this directly from the literature because of different quality controls and analysis methods, as described above, and so a rigorous analysis would require re-analyzing metagenomes from other studies, which requires suitable computational tools and sufficient computational power. However, the fact that MG-RAST only produces approximate quantifications limits the utility of this feature. There are two reasons given by the MG-RAST documentation for this. The first is that if a read has two top hits with equally good quality scores, then both are included in the final analysis as it is not known which one is the more

73

accurate. This seems reasonable, although it would be helpful (and not too difficult) to also provide the number of unique metagenome sequences for each functional assignment. If a sequence has two or more equally good hits it is most likely because it is hitting the same gene in two different organisms, so that in terms of simply determining which genes are present, without trying to assess the phylogenetic origin of hits, it would be valuable to have the number of unique metagenome sequences. The second reason given by the MG-RAST team for the approximate numbering is that once the function and phylogenetic origin of the best hit to each cluster’s representative sequence has been determined, the total number of sequences assigned to that hit is based on an estimate of the number of sequences in the cluster. There does not seem to be any good reason to use an estimate at this stage, since computers count extremely easily and quickly it should not be difficult to produce exact quantifications. This is a major weakness of the MG-RAST system. The work described in Chapter 2 to quantify accurately the number of metagenome hits against genes of interest demonstrated however that the approximate quantifications produced by MG-RAST are in fact approximately correct, and so can be relied on to give a general idea of relative quantification of genes. Since the degree to which genes receive multiple hits from metagenome sequences will depend on how well represented that gene is in the annotation database (i.e. whether there are lots of very closely related sequences) genes which receives lots of duplicate hits from one metagenome are likely also to receive duplicate hits from other metagenomes, and so broad relative comparisons between the metagenomes may still be valid. As explained above, the functional classification system which captured the largest number of genes in the Borup metagenome was the SEED Subsystems classification. I therefore used this classification system to compare the relative abundance of genes in the Borup

74

metagenome with 6 metagenomes from Yellowstone and hotsprings and 2 metagenomes from hydrothermal vents (Figure 3.7).

!" #" $" %" &" '" (" )" *"

Figure 3.7 Comparison of the relative abundance of gene functional groups within the Borup metagenome, with the relative abundances in metagenomes from other environments. Light green represents highest relative abundance and red represents lowest relative abundance. Key (L to R): 1: Alvinella Pompejena (hydrothermal vent polychaete) epibiont; 2: Borup Fiord Pass Glacier; 3: UBA Acid Mine Drainage; 4: Octopus hot springs; 5: Guerrero Negro Mat 34-49 mm; 6: Guerrero Negro Mat 0-1mm; 7: High Andean Forest Soil; 8: Canadian Arctic Ocean Viruses; 9: Arctic Marine Viruses

The comparison in Figure 3.7 shows that the Borup metagenome clusters most closely with the epibiont of a hydrothermal vent tube worm. However, this very high level analysis was not considered very useful in drawing meaningful conclusions on the questions that this study was designed to answer and given the poor coverage of sulfur redox genes in particular by this

75

classification system it wasn’t considered worthwhile to pursue this method of analysis any further.

CONCLUSION

The supplemenetary analyses carried out all support the conclusions reached in Chapter 2 on the microbial comunity of the Borup Fiord Pass Glacial sulfur deposits. Moreover the process of developing, and testing, different approaches to analyzing the metagenome has enabled some conclusions to be reached on metagenome analysis methods.

There are two types of trade-offs to be made when considering how best to analyze a metagenome (or metatranscriptome, the data analysis issues are exactly the same).

The first trade-off concerns the balance between access to relevant expertise and tools, and intellectual control over the process. To analyze a metagenome effectively requires three things: i) coding (computer programming) ability to write the scripts (programs) needed to analyze such large data sets; ii) significant computational power; iii) scientific knowledge of the genes and biological systems under investigation.

The two types of expertise, scientific and computational, can either be held by a single individual or by collaboration within a team. The ideal situation is that either both types of expertise are either held by a single individual, or that there is enough shared expertise, and strong enough communication within a collaborating team, that the same quality of results can be achieved.

Without extremely good communication, however, the quality of analysis will suffer. The way in which computer programs are written affects the quality of the output. If the individual who

76

writes the script (program) does not possess scientific knowledge of the subject matter, and if the individual with the scientific knowledge doesn’t understand how the script operates, then it is possible that the computer analysis will not be appropriate for the scientific questions being asked. Good communication between collaborators can overcome this, but without that, the analysis may be inadequate and result in inaccurate conclusions being reached.

Access to sufficient computational power is a less significant issue, but not trivial. Any academic institution in the USA or Europe should have the ability to provide this resource to its own investigators, but it may affect researchers in small institutions that are not well funded, particularly in poorer countries outside North America and Europe. Commercial cloud computing resources can provide suitable computational power for anyone, although the cost of this is not yet clear, and could be prohibitive. In this study although access to computational power was an issue, it was an issue that could have been resolved. The reason for making use of the computational resources provided by MG-RAST (Meyer et al., 2008) instead was mostly as a way of accessing the software provided by that system.

For an investigator who does not have the coding skills to write their own software, or a suitable individual with whom to collaborate, or who lacks access to suitable computing power, the MG-RAST system provides an extremely valuable resource, as it provides both the necessary hardware and software to allow an analysis to be performed. However, there are drawbacks.

Using an ‘off-the-shelf’ system like MG-RAST involves some loss of intellectual control for the investigator in determining how the analysis will be carried out. The most significant example of this, and a major weakness of the MG-RAST system, is that the quantifications provided are only approximate. The reasons given for this are not sufficient in my opinion. Moreover the MG-

RAST process is not as transparent as it could be. The documentation does not provide much

77

detail in some cases on how parts of the analysis are carried out, and the scripts used are not made public. A much better model for this type of ‘off-the-shelf’ system is the QIIME software system for analyzing SSU rRNA data (Caporaso et al., 2010). QIIME has accurate quantification, much more complete documentation and all the scripts are publicly available so that it is possible to examine exactly how they work, and to amend them if desired. However, one strength that the MG-RAST system shares with QIIME is that it allows the investigator to download data not just at the end of the analysis, but at various points within it, so that it is possible for the investigator to supplement the analysis. For example, in the current study, data was downloaded from the MG-RAST system after the initial quality-control process to provide a metagenome database (the quality-controlled database of 957,074 sequences) from which poor quality reads and artificial duplicates had been removed, and which was used for further analysis outside of the MG-RAST system. In addition, after the MG-RAST system had assigned functions to the metagenome reads the sequences assigned to genes of interest were downloaded and accurate quantifications were performed manually as set out in Chapter 2. In this way a significantly higher quality of analysis was performed than would have been possible just using

MG-RAST alone. The results of the analysis in Chapter 2 did however show that the MG-RAST approximate quantifications are probably accurate to an order of magnitude, so probably can be relied upon for making very high-level broad brush conclusions.

In the current study it would of course have been possible to collaborate with an individual with greater coding skills in order to produce custom-written software. That approach wasn’t taken because a key objective of this study was to learn how to analyze a metagenome myself. The experience gained in doing so has been invaluable, and has resulted in insights that might not have been obtained otherwise. Over the course of the study I have developed enough

78

coding expertise to understand how to go about writing suitable software for this task, and in future studies I can use this expertise either to interact much more effectively with coding specialists, to use scripts written by others and (with some more development of my coding skills) to write suitable scripts myself. I have already written a couple of simple scripts to aid the analysis process (see Appendices 1 & 2). Whether future work involves writing software, or collaborating with those who do, the skills and experience developed in this study will support better quality work in future. For example, in working through the output list of functional assignments produced by the MG-RAST system manually, it became clear that there is enormous variation in the way in which the same gene is annotated. It required in depth knowledge of the different sulfur redox genes and their function to be able to recognize some of the less obvious assignments. The soxB gene for example may alternatively be described as a sulfate thiol esterase, sulfur oxidation protein or thiosulfate oxidation protein. Knowledge of the relevant literature is needed to recognize possible synonyms in order to check them out in more detail. It is impossible for any individual to have this type of in-depth knowledge of every gene and so coders working on generic software for analyzing metagenomes cannot be expected to do a perfect job of collecting together the results of such an output.

The second trade off that needs to be made in analyzing metagenomes is between the quantity of data to be analyzed and the quality of the analysis. In the present study a high quality analysis was produced by doing several steps manually, but this is very time consuming. It was appropriate for this study because of the objective to obtain very accurate and specific relative quantification of sulfur redox and other energy metabolism genes, and because only a small subset of all the genes in the metagenome were quantified accurately. Genes for functions that were unrelated to the questions being asked in this study were not quantified accurately.

79

It would not be possible to use the method set out in Chapter 2 to quantify every single gene in a metagenome, or series of metagenomes to do a quantitative comparative analysis. Any study which seeks to compare large numbers of metagenomes will probably need to rely entirely on computation as any manual analysis is unlikely to be feasible. This means that key elements that improved the quality of the work done in this study could not be replicated. For example, to do an overview comparison of all the genes in several different metagenomes it would not be possible to do any checking of the annotations in GenBank as was done here. Annotations would all need to be taken at face value. Similarly it would not be possible to go through the results manually as was done here to collect together genes for functions of interest. For an overview comparison it would be necessary to rely on the existing gene classification systems (e.g. COG,

KEGG or SEED subsystems) which this study has shown do not completely capture all relevant genes. Such large scale overview studies (e.g. Booijink et al., 2010, Gomez-Alvarez et al., 2012,

Mackelprang et al., 2011, McNulty et al., 2011) do still provide valuable insights, but their limitations should be understood.

At the present time the numbers of metagenomes being analyzed, and the number of different researchers doing this type of work, is rapidly increasing. Different approaches are being tried and there is value in this experimentation to determine the most appropriate methods.

Moreover, the appropriate methods for different studies will are likely to vary depending on the specific scientific questions being asked. However, metagenome analysis methods could be improved by the following: i) Agreement within the community on a common set of quality-control cutoffs to remove poor quality sequences, and the levels of significance (e-value, percentage identity) that are considered appropriate in deciding whether a sequence is, or is not, the same function as a

80

reference gene. This would make it much easier to draw valid comparisons between the results of different metagenome analyses from different studies. Studies which include work to test out suitable quality control limits (e.g. Wommack et al., 2008, Mou et al., 2008, Mende et al., 2012) are extremely valuable to the community as a whole, and there is ongoing debate about the best analytical methods to use (e.g. Desai et al., 2012, Prakash & Taylor 2012, Scholz et al., 2012,

Thomas et al., 2012). ii) The introduction of statistical significance tests to determine whether differences in abundances of genes and gene families are, in fact, statistically significant or not. This is not normal practice at present in metagenome studies. Wright et al., 2012 (in review, the material covered in Chapter 2) introduces one method for doing this and it is hoped that this will promote the wider use of such tests, and further debate about the best statistical methods to use. iii) The further development of ‘off-the-shelf’ tools such as MG-RAST for researchers who do not have the computational skills, or suitable collaborators, to carry out metagenome analyses entirely on their own. When the ‘off-the-shelf’ software is also combined with a free online resource that provides the necessary computational power, as MG-RAST does, it is even more useful. Despite its limitations, MG-RAST is an enormously valuable resource to the scientific community as a whole. Other ‘off-the-shelf’s systems do exist, but were not investigated in detail in this study, including the Joint Genome Institute IMG/M tools (Markowitz et al., 2011), the

University of California’s CAMERA (Sun et al., 2011) and the J.Craig Venter Institute

Metagenomics Reports (Metarep) tool (Goll et al., 2010). New metagenomic analysis software is being continuously developed (e.g. Mitra et al., 2011, Suzuki et al., 2012, Ziemert et al., 2012) and both the Knight lab QIIME team and the J.Craig Venter Institute have now developed virtual machines that can run on cloud computing platforms (the QIIME Virtual Box which can run on a

81

local machine or the Amazon Cloud http://www.qiime.org/ and the JCVI Cloud Bio-Linux project http://www.jcvi.org/cms/research/projects/jcvi-cloud-biolinux/overview/) to assist researchers who do not have access to suitable computer hardware, which is another extremely valuable development. iv) It would also be helpful if researchers who write custom scripts for analyzing metagenome data (e.g. Canfield et al., 2010) make these public, in the same way that detailed experimental protocols are published, to enable others to utilize them. v) Improvement of the functional classification systems such as COG and the SEED Subsystems to ensure that they are more representative would greatly assist metagenomic analysis, and this objective should not be difficult to achieve. v) Improvement of the quality of annotation of GenBank and other universal databases would also benefit not only metagenomic analysis but other types of investigation as well. However this is difficult to reconcile with the need to keep such databases as a comprehensive up-to-date repository of the huge amount of sequence data currently being generated.

82

CHAPTER 4

S0 BIOMINERALIZATION, HETEROTROPHIC SULFUR OXIDATION AND CARBON UPTAKE BY FLAVOBACTERIACEAE

ABSTRACT

Flavobacteriaceae genera play an important role in the global degradation of complex organic carbon. I isolated two Flavobacteriaceae members from S0 deposits on the surface of

Borup Fiord Pass Glacier, Canadian High Arctic, which may also have a significant sulfur- cycling role. Isolate Flavobacterium sp. BF09-06_kw11.51, a representative of the species-level

Flavobacterium that dominates surface layers of the glacial S0 deposits, oxidized thiosulfate to sulfate but did not conserve energy from this reaction. In contrast, Gillisia sp. BF64A_kw10.4 produced novel S0 biomineralized structures. It was not determined whether the Gillisia sp.

BF64A_kw10.4 isolate oxidized the sulfide to S0 or simply biomineralized S0 that formed by abiotic sulfide oxidation, but in either case this represents a previously-unknown capability for

Flavobacteriaceae, Both isolates were able to assimilate 14C-labeled bicarbonate, although the reason for this capability was not determined.

INTRODUCTION

Flavobacteriaceae genera are widely distributed globally in freshwater and marine environments, soils and sediments, and are known to be capable of degrading a wide range of high molecular-weight polymers, including polysaccharides and proteins, and in particular chitin, so are considered to play an important role in the degradation of complex organic matter

(Gómez-Pereira et al., 2012, McBride et al., 2009, Tamaki et al., 2003).

83

Previous work has demonstrated that Flavobacteriaceae sp. were dominant in thin elemental sulfur (S0) deposits (varnishes) and the surface layer of deeper S0 deposits on the surface of Borup Fiord Pass Glacier, Ellesmere Island, Canadian High Arctic in 2009 (Chapter

2). Analysis of environmental DNA small subunit ribosomal RNA (SSU rRNA) sequences using operational taxonomic units (OTUs) grouped at 97% identity showed that a single

Flavobacterium sp. OTU represented 78% of the SSU rRNA sequences in S0 varnishes on the glacial ice immediately beside the source of a highly-sulfidic stream (Chapter 2). This environmental dominance suggested that Flavobacterium sp., previously characterized as heterotrophic, might have a greater role in sulfur cycling than previously expected.

Only isolated reports of Flavobacteriaceae involvement in sulfur cycling have been published. Teske et al. (2000) found that benthic Flavobacteriaceae isolates were capable of the

2- aerobic oxidization of thiosulfate (S2O3 ) to sulfate, but although this reaction is energetically favorable the Flavobacteriaceae were not able to conserve energy for growth from it, but instead still required organic carbon to grow. Green et al. (2011) demonstrated that a marine

Flavobacterium sp. could oxidize dimethyl sulfide to dimethyl sulfoxide, but again still required organic carbon for growth, although the Flavobacterium did grow faster when dimethyl sulfide was present. In addition, Gleeson et al. (2011) demonstrated that mixed cultures of Marinobacter and Flavobacteriaceae from the Borup Fiord Pass Glacier S0 deposits produced novel structures of biomineralized S0. The current study was therefore designed to test whether the Borup Fiord

Pass Glacier Flavobacteriaceae were catalyzing sulfur oxidation reactions, and if so, whether they were able to obtain energy for growth from such reactions. If they did, this would indicate a previously-unsuspected sulfur lithotrophic metabolism, which could mean that Flavobacteriaceae have a significant impact on sulfur cycling in other environments.

84

MATERIALS AND METHODS

Isolation procedures

I isolated the Marinobacter and Flavobacteria from Gleeson et al., 2011’s BF06_4A sulfide gradient tube culture by making serial dilutions of the culture in sterile EM media (the same artificial seawater media used for the cultures, made as described in Gleeson et al., 2011) and then spreading these dilutions onto agar plates also made with the EM media, supplemented with 4 mM NaS. No organic carbon was included except the agar needed to solidify the plates.

The plates were incubated at either 6°C or room temperature for 7 weeks. Several well- established, isolated, colonies were transferred onto a new EM-NaS plate each. I used some biomass of each transferred clone as the template for a PCR reaction using the ‘universal’ primers for the small subunit ribosomal RNA (SSU rRNA) 515F and 1391R (Lane et al., 1985).

I purified the PCR product was purified using the Purelink PCR purification kit (Invitrogen,

USA) and it was sequenced by SeqWright using Sanger sequencing. Similarly, I isolated the

Flavobacterium from the 2009 Borup sulfur deposits by making serial dilutions of the environmental BF09-06 sulfur deposit in sterile EM media, spreading the dilutions onto EM plates supplemented with 5 mM NaS, which were incubated at 6°C for 5½ weeks, and then transferring and characterizing individual colonies as described above. A second Marinobacter was isolated from cultures designed to enrich for S0 oxidation ability. The S0 oxidation cultures

0 were set up with 0.1 g S (Fisher, USA) in 10 ml of EM media with 10 mM NaHCO3 as the only carbon source and inoculated with Borup sulfur deposits collected in 2007. Cultures were incubated at 6°C in the dark for 1 to 5 months, after which 0.1 ml of culture was transferred to a new culture. This was designed to ensure that successive generations of the cultures would be enriched in any cells that were actively growing, and were therefore likely to be able to use S0 as

85

an energy source, while non-growing cells would gradually be diluted and so they would become a smaller proportion of the total numbers. I spread a serial dilution of the 8th generation culture on agar plates made with EM media plus 100 µM acetate. The plates were incubated at 6°C in the dark for 7 weeks, and then well-separated colonies were transferred onto a new plate each, and their DNA amplified to sequence the SSU rRNA DNA, as described for the other isolates above.

Culturing experiments

I tested the Marinobacter isolates, Marinobacter sp. BF64A_kw10.3 (isolated from the

BF6_4A culture) and Marinobacter sp. BF07-02_kw10.8 (isolated from the S0 oxidation culture) and the Flavobacteria isolates, Gillisia sp. BF64A_kw10.4 (isolated from the BF64A culture) and Flavobacterium sp. BF09-06_kw11.51 (isolated from the 2009 environmental sample) for the ability to produce S0-biomineralized structures using sulfide gradient tube cultures. In round

1 experiments which involved Marinobacter sp. BF64A_kw10.3, Marinobacter sp. BF07-

02_kw10.8 and Gillisia sp. BF64A_kw10.4, the sulfide gradient tubes were made as described in

Gleeson et al., 2011, except that the EM media was modified to a low-salt EM media which used only 10% of the NaCl in the recipe in Gleeson et al., 2011, in order to match more closely the salinity of the 2006 spring water chemistry (also given in Gleeson et al., 2011) and the pH was reduced to pH 6.5 to match the pH of the BF09-06 sulfur deposit (Wright et al., 2012 in review,

Chapter 2). Briefly therefore, the cultures consisted of an agar plug (EM + 1% w/v agar) containing 4 mM NaS with an overlying column of EM containing 0.15% (w/v) agar in order to create a slushy layer that would slow gas diffusion thus allowing a microaerophilic environment to develop due to opposing gradients of sulfide from the plug, and oxygen from the atmosphere.

86

The overlayer was heated and sparged with N2:CO2 (80:20) to de-oxygenate it before it was added to the cultures. In round 2 experiments, which involved the same isolates, the gradient tube culture method was further modified by setting up the cultures without the addition of the

Borup sediment that Gleeson et al., 2011 added to the NaS-agar plug. In addition, gradient tube cultures for Gillisia sp. BF64A_kw10.4 were made up with both 4 mM and 8 mM NaS in the agar plug. In round 3 experiments, sulfide gradient tube cultures were set up for Gillisia sp.

BF64A_kw10.4 and Flavobacterium sp. BF09-06_kw11.51 with 4 mM, 8 mM, 12 mM or 16mM

NaS in the agar plug. For Flavobacterium sp. BF09-06_kw11.51 the media used was nBF09SM, a mineral salts media based on the chemical composition of the BF09-06b environmental sulfur deposit. The composition of nBF09SM is given in Table 4.1 below, and can be compared to the environmental chemistry of deposit BF09-06b given in Table 2.1 of Wright et al., 2012 (in review, Chapter 2). The media nBF09SM was used as the base media for all experiments involving Flavobacterium sp. BF09-06_kw11.51 while the low-salt EM was used as the base media for all experiments involving the other three isolates.

Concentration (mM) KCl 0.6 MgCl2 0.97 CaCl2 8 NH4Cl 10 K2HPO4 0.2 SiO2 0.1 NaF 0.1 MgSO4 0.03 NaHCO3 10 Trace elements As described in for EM in Gleeson et al., 2011 Vitamins As described in for EM in Gleeson et al., 2011 Table 4.1 Composition of media nBF09SM; pH was adjusted to pH 6.5.

In round 4 experiments, I tested isolates Gillisia sp. BF64A_kw10.4 and

Flavobacterium sp. BF09-06_kw11.51 for the ability to oxidize sulfide in cultures set up as

87

above, except that the overlayer did not contain agar and 5 mM NaS was used in the plug.

Cultures were set up in triplicate, and were sampled periodically with samples being filter- sterilized and preserved for analysis of sulfate and thiosulfate, which were analyzed by ion chromatography. I performed cell counts were performed at the same time, using a Petroff-

Hausser cell counter and Live-Dead fluorescent dye (Invitrogen). Three counts were made for each culture, and these were performed blinded (so that I did not know which samples were culture and which were controls at the time of counting) with sterile controls intermixed with culture samples. The controls were (a) sterile negative controls set up in exactly the same way as the sulfide-containing cultures, and (b) ‘S-minus’ controls, which were cultures inoculated into gradient tubes prepared without any sulfide in the plug.

I set up a duplicate set of cultures and controls (also set up in triplicate for each culture/control) in exactly the same way, but with the addition of 14C-labeled bicarbonate (Perkin

Elmer) at 0.925 µCi/ml. To measure radiolabel incorporation, I took 5 ml samples from each culture and filtered them through a 0.2 µM filter; 10 ml of sterile media was flushed through the filter afterwards to wash off any adsorbed radiolabel. Filters were exposed to fuming HCl for 15

14 minutes to ensure that any C-labeled bicarbonate on the filter was converted to CO2 gas, and then left open to the atmosphere for at least an hour to allow the gas to evaporate from the filter, to ensure that only 14C that had been incorporated into the cell was measured (Tuttle & Jannasch,

1977).

0 I also teseted all four isolates for the ability to oxidize S and thiosulfate (S2O3) by setting up cultures of base media (EM or nBF09SM) plus 10 mM NaHCO3 as the only carbon source

0 and either 0.1g S /10ml media or 10 mM Na2S2O3 (Sigma-Aldrich, USA), respectively. Samples

0 of the cultures were analyzed for sulfate (as evidence of the oxidation of S and S2O3) and

88

thiosulfate by Fred Luiszer using ion chromatography (Dionex). For Flavobacterium sp. BF09-

06_kw11.51 and Marinobacter sp. BF07-02_kw10.8 I set up cultures in triplicate. I did repeated sampling for analysis for sulfate and thiosulfate concentrations and cell counting using the same method as described above. For isolates Marinobacter sp. BF64A_kw10.3 and Gillisia sp.

BF64A_kw10.4 I set up single cultures and no cell counting was done.

Finally, I tested isolates Gillisia sp. BF64A_kw10.4 and Flavobacterium sp. BF09-

06_kw11.51 for the ability to carry out heterotrophic thiosulfate oxidation. Cultures were set up containing an agar plug (base media plus 1% w/v agar, no sulfide or thiosulfate) with a liquid overlayer (no agar in the overlayer). The overlayer contained 2.5 mM thiosulfate. The controls included (a) sterile negative controls set up in exactly the same way as the cultures, (b) ‘S-minus’ controls which were cultures set up without thiosulfate in the overlayer, and (c) ‘C-minus’ cultures which contained 2.5 mM thiosulfate but did not have an agar plug. All cultures and controls were set up in triplicate, with sampling for cell counting using the Petroff-Hausser cell counter (as described above except the counts were not done on a blinded basis) and ion chromatography analysis for sulfate and thiosulfate concentrations.

All the experiments described above included sterile negative controls set up in exactly the same way as the cultures. All cultures and controls were incubated at 6°C in the dark for 4-14 weeks. Cultures and their respective controls were always set up on the same day, incubated for the same amount of time, and sampled on the same day.

DNA extractions, sequencing and sequence analysis

To confirm that cultures of Marinobacter sp. BF07-02_kw10.8 and Gillisia sp.

BF64A_kw10.4 were still pure isolate cultures after several months, I extracted DNA from each

89

culture using the DNEasy Blood and Tissue Kit (Qiagen, USA), prepared for pyrosequencing using primers 515F and 927R as described in Wright et al., 2012 (in review, Chapter 2) and sequenced on a Roche pyrosequencer (Margulies et al., 2005) with FLX Titanium chemistry

(Roche, Mannheim, Germany) by EnGenCore. A negative control of the DNA extraction and sequencing preparation process was also sequenced.

I analyzed DNA sequences obtained from the pyrosequencing using QIIME v1.1.0

(Caporaso et al., 2010). To exclude poor quality data, sequences with minimum average quality score less than 25, ambiguous bases, primer or barcode mismatches, or maximum homopolymer run greater than 6 nucleotides, were discarded. Only sequences between 410 and 419 nucleotides were used in the analysis as sequences that are significantly longer or shorter than the typical length for a sequencing run have been shown to be poor quality (Huse et al., 2007). Sequences were clustered into operational taxonomic units (OTUs) at 97% identity using UClust (Edgar,

2010) and the most abundant sequence in each cluster was chosen as the cluster representative sequence to assign taxonomy for the OTU. Taxonomic classifications of pyrosequences were assigned with the RDP classifer (Wang et al., 2007) and the Greengenes taxonomy (DeSantis et al., 2006). I analyzed Sanger sequences by BLASTing (Altschul et al., 1990) against the

GenBank non-redundant nucleotide database.

Functional gene tests

I tested DNA from isolates Marinobacter sp. BF64A_kw10.3, Marinobacter sp. BF07-

02_kw10.8 and Gillisia sp. BF64A_kw10.4 for possession of the soxB gene with the degenerate soxB primers 432F and 1446B (Petri et al., 2001). I also tested Marinobacter sp. BF64A_kw10.3 for the rDsrAB genes with the degenerate rDsr primers rDsr4Rdeg (SSRWARCAIGCNCCRCA)

90

and rDsr1Fdeg (WWNGGNTAYTGGAARG), which were adaptations of the rDsr primers of

Loy et al., 2009.

Flavobacteriaceae phylogenetic tree

I constructed a Flavobacteriaceae phylogenetic tree using reference Flavobacteriaceae

SSU rRNA sequences downloaded from GenBank and the SSU rRNA sequences from Gillisia sp. BF64A_kw10.4 and Flavobacterium sp. BF09-06_kw11.51. Sequences were aligned using ssu-align (Narwocki, 2009). Maximum likelihood trees were made with the online RAxML

Black Box (Stamatakis, 2006, Stamatakis et al., 2008) using the gamma model of rate heterogeneity, and performing 100 bootstraps to get a consensus tree.

Sequence data access

SSU rRNA sequences for the four isolates are in the process of being submitted to

GenBank.

91

RESULTS

Identification of isolates

Marinobacter sp. BF64A_kw10.3 shares 98.5% identity in its SSU rRNA sequence with the Marinobacter sp. enrichment culture clone BF64_C1 (Accession number HM141447) that was one of the only two members of the mixed culture producing S0 biomineralized stuctures described in Gleeson et al., 2011, while Marinobacter sp. BF07-02_kw10.8 shares 99.6% identity with Marinobacter sp. enrichment culture clone BF64_C1. Marinobacter sp.

BF64A_kw10.3 and Marinobacter sp. BF07-02_kw10.8 share 97.8% identity with each other.

Gillisia sp. BF64A_kw10.4 shares 99.27% identity with Flavobacterium sp. enrichment clone BF64_C33 (Accession number HM141489) which was the other member of the S0 biomineralizing consortia described in Gleeson et al., 2011. Although described in that paper as a Flavobacterium, on further analysis of the SSU rRNA data it is clear that it groups more closely with the Gillisia Genus, a close relative to the Flavobacterium Genus as both are members of the Flavobacteriaceae Family (see below).

Flavobacterium sp. BF09-06_kw11.51 has 99.72% identity with the representative sequence of the Flavobacterium OTU that was dominant in the 2009 Borup Fiord Pass glacial sulfur deposits BF09-02, BF09-04 and BF09-05, and the surface layer of the BF09-06 deposit

(see Chapter 2, Figure 2.3B). Flavobacterium sp. BF09-06_kw11.51 and Gillisia sp.

BF64A_kw10.4 share 93.95% identity.

Gillisia sp. BF64A_kw10.4 groups with other Gillisia sequences phylogenetically

(Figure 4.1 below). Gillisia sp. BF64A_kw10.4 shares 98.78% identity with the Gillisia type strain Gillisia limnaea DSM 15749, which was isolated from Lake Fryxell, Antarctica (Von

Trappen et al., 2004) and it is also closely related to other Antarctic Dry Valley sequences from

92

studies at Blood Falls (98-99% identity with Accession numbers DQ677875 & DQ677863,

Mikucki & Priscu 2007) and Lake Vida (99% identity with Accession number DQ521523

Mosier et al., 2007).

Flavobacterium_johnsoniae 0.05 GU000131_Flavobacterium

JQ800072_Flavobacterium <50 <50 AB075231_Flavobacterium_limicola <50 Flavobacterium_psychrophilum <50 BF09-06_kw11_51

NR_042332_Flavobacterium_fryxellicola

EU196339_Gillisia

NR_043125_Gillisia_hiemivivida

89 AF254113 <50 99 AF254114 94 95

FJ889670_Gillisia 54 BF964_kw10_4 63 HM141489_Flavobacterium_BF64A_C33 84 62 DQ677875 <50 35 DQ521523 61 DQ677863_Flavobacterium

NR_042141_Gillisia_limnaea_DSM_15749

DQ486480_Muricauda 67 DQ486485_Arenibacter

!!" Sulfurovum_sp_NBC37-1

0.3

Figure 4.1 Representation of a phylogenetic tree of showing the relationship of the Borup Flavobacteriaceae isolates to other Flavobacteriaceae members. Bootstrap values are shown by nodes. Bootstrap values of less than 50 are not considered reliable, and those nodes are considered un-resolved. The scale bar shows substitutions per site. 93

Flavobacterium sp. BF09-06_kw11.51 is closely related to another isolate from Lake

Fryxell, sharing 98.74% identity with Flavobacterium fryxellicola (Accession number

NR_042332 Von Trappen et al., 2005). Flavobacterium sp. BF09-06_kw11.51 is equally closely related (98.97% identity) to Flavobacterium sequences identified from coldwater sites in Japan

(Flavobacterium limicola Accession number AB075231 Tamaki et al., 2003) the Arctic

(Accession number JQ800072) and the Antarctic (Accession number GU000131 Shivaji et al.,

2011). In addition, Flavobacterium sp. BF09-06_kw11.51 shares 97.13% identity with the salmon and trout pathogen Flavobacterium psychrophilum (Duchaud et al., 2007).

Production of S0 biomineralized structures

S0 biomineralized structures were produced in the Gillisia sp. BF64A_kw10.4 cultures in rounds 1, 2 and 3 of the sulfide gradient tube experiments (Figures 4.2-4.5 below).

B

Figure 4.2 S0 biomineralized structures produced in the cultures of Gillisia sp. BF64A_kw10.4. DIC microscopy x1000 magnification

94

DC

Figure 4.3 S0 biomineralized structures produced in the cultures of Gillisia sp. BF64A_kw10.4. DIC microscopy x400 magnification

Figure 4.4 S0 biomineralized structures produced in the cultures of Gillisia sp. BF64A_kw10.4. DIC microscopy x400 magnification

95

Figure 4.5 S0 biomineralized structures produced in the cultures of Gillisia sp. BF64A_kw10.4. Merged photo of DIC and fluorescence microscopy x400 magnification.

Cells are false-colored green.

0 0 The photos were all taken using DIC microscopy and so the S is light and shiny as S is

0 highly reflective of light under DIC. Figure 4.2 shows a central mass of S surrounded by

0 filaments, some of which are biomineralized with S , similar to the morphology seen in Figure

7A of Gleeson et al (2011). Figures 4.3 and 4.4 show further biomineralized structures and non- n biomineralized filaments, analagous to those shown in Figures 7D and 7E from Gleeson et al

(2011) and Figure 4.3 shows a close-up of a filament that is just in the process of being biomineralized (with S0 at just one end). Figure 4.4 is a merged photo of the same microscope view taken under DIC conditions and fluorescence. Cells were dyed using Live-Dead stain which results in live cells fluorescing bright green and these cells are shown as green (using false-color) on the photo, demonstrating the close association of these cells with the S0-biomineralized structures (compare to Figure 7C of Gleeson et al. 2011).

96

The S0 biomineralized structures were not observed in any of the sulfide gradient tube cultures of Marinobacter sp. BF64A_kw10.3, Marinobacter sp. BF07-02_kw10.8 or

Flavobacterium sp. BF09-06_kw11.51, or in any of the sterile negative controls. S0 was present in the 16mM sulfide gradient tube sterile negative controls in the form of individual spherical- appearing grains, and as amorphous masses (similar to the central S0 mass seen in Figure 4.3) but there were no filaments, or mineralized structures resembling those seen in the Gillisia sp.

BF64A_kw10.4 cultures.

Confirmation of isolate purity

The purity of the Gillisia sp. BF64A_kw10.4 culture producing the S0 biomineralized structures was confirmed by the pyrosequence data. A total of 3954 sequences were obtained from this culture. Of these, 3947 were Gillisia. The other 7 sequences were Uruburuella and

Microbacteriaceae, and the same sequences were also present in the sequences from the procedural negative control demonstrating that they were contaminants of the extraction process and not from within the culture itself.

The purity of the Marinobacter sp. BF07-02_kw10.8 culture was also confirmed by pyrosequencing. A total of 6147 sequences were obtained. Of these, 6112 were Marinobacter and the remaining 32 were all sequences that were also seen in the procedural negative control.

97

Oxidation of sulfide, S0 and thiosulfate

In the sulfide oxidation experiments (round 4 sulfide gradient tube experiments where the overlayer did not contain agar) the Flavobacterium sp. BF09-06_kw11.51 culture demonstrated a significant increase in sulfate compared to both the sterile negative control and the S-minus control (Figure 4.6 below).

0.14 S plus Culture S minus Culture 0.12 Sterile control

0.1

0.08

0.06

0.04 Sulfate concentration (mM) 0.02

0 0 10 20 30 40 50 60 70 80 90 100 Day

Figure 4.6 Sulfate concentrations in sulfide oxidation experiment with Flavobacterium sp. BF09-06_kw11.51. Data points are the mean of measurements from triplicate cultures and error bars are 2x standard deviation

98

The Flavobacterium sp. BF09-06_kw11.51 culture also demonstrated a significant reduction in thiosulfate compared to the sterile negative control (Figure 4.7 below). Thiosulfate levels initially increased in both the thiosulfate-containing (S-plus) culture and the sterile negative controls, but then decreased in the culture whilst thiosulfate levels in the sterile control continued to rise. Thiosulfate was not present in the S-minus control.

0.07 S plus Culture S minus Culture 0.06 Sterile control 0.05

0.04

0.03

0.02

0.01

0 0 10 20 30 40 50 60 70 80 90 100 -0.01 Thiosulfate concentration (mM) -0.02

-0.03 Day

Figure 4.7 Thiosulfate concentrations in sulfide oxidation experiment with

Flavobacterium sp. BF09-06_kw11.51. Data points are the mean of measurements from triplicate cultures and error bars are 2x standard deviation

99

Thiosulfate is known to be produced by abiotic oxidation of sulfide (Zhang and Millero, 1993) and so these results could indicate oxidation of either sulfide or thiosulfate (or both) by

Flavobacterium sp. BF09-06_kw11.51. Cell numbers were not significantly different in the sulfide-containing and non-sulfide-containing (S minus) cultures, demonstrating that

Flavobacterium sp. BF09-06_kw11.51 was not gaining energy for growth from the oxidation of reduced sulfur (Figure 4.8 below).

2.E+07 S containing culture S minus culture 2.E+07 Sterile negative control 2.E+07

1.E+07

1.E+07

1.E+07

Cells/ml 8.E+06

6.E+06

4.E+06

2.E+06

0.E+00 0 10 20 30 40 50 60 70 80 90 100 Day

Figure 4.8 Cell counts for the Flavobacterium sp. BF09-06_kw11.51 sulfide oxidation experiment. Values are the mean of 9 counts (3 counts on each of 3 cultures) and error bars are 2x standard deviation. The detection limit for the counting chamber was 5.E05 cells/ml. The data shown are the counts that were obtained, but a zero detection could mean that cells were present below the level of detection, and cells were known to be present, even though not observed, in the cultures at time point zero as counts were made immediately after inoculation. No cells were observed in any sterile controls at any time.

100

In the heterotrophic thiosulfate oxidation experiment, sulfate levels again increased significantly in the Flavobacterium sp. BF09-06_kw11.51 culture, compared to the sterile and S minus controls, demonstrating that this isolate can oxidize thiosulfate to sulfate (Figure 4.9 below). A significant increase in sulfate was seen in both the culture that contained thiosulfate and an agar plug, and in the culture that contained thiosulfate without an agar plug, but the increase in sulfate was larger in the culture that contained the agar plug.

1.400 Sterile Control 1.200 S2O3 + org C culture Org C only 1.000 S2O3 only 0.800

0.600

0.400

Sulfate concentration (mM) concentration Sulfate 0.200

0.000 0 5 10 15 20 25 30 Day

Figure 4.9 Sulfate concentrations in the Flavobacterium sp. BF09-06_kw11.51

heterotrophic thiosulfate oxidation experiment. Data points are the mean of measurements from triplicate cultures and the error bars are 2x standard deviation.

101

No significant difference was seen in the thiosulfate levels in the heterotrophic thiosulfate

oxidation experiment, but the range of uncertainty in the measurements was large enough to have

masked a reduction of the scale needed to produce the amount of sulfate (Figure 4.10 below).

4.500 Sterile control 4.000 S2O3 + org C culture 3.500 S2O3 only culture Org C only culture 3.000

2.500

2.000

1.500

1.000

0.500 Thiosulfate concentration (mM) concentration Thiosulfate 0.000 0 5 10 15 20 25 30 Day

Figure 4.10 Thiosulfate concentrations in the Flavobacterium sp. BF09-06_kw11.51 heterotrophic thiosulfate oxidation experiment. Data points are the mean of measurements from triplicate cultures and the error bars are 2x standard deviation.

102

There was no significant difference in cell numbers between thiosulfate-containing and S minus cultures (Figure 4.11 below).

S2O3 plus organic C 3.E+07 Organic C only

2.E+07 Sterile negative control

S2O3 only 2.E+07

1.E+07

Cells/ml 8.E+06

3.E+06

-2.E+06 0 5 10 15 20 25 30

-7.E+06 Day

Figure 4.11 Cell counts for Flavobacterium sp. BF09-06_kw11.51 heterotrophic thiosulfate oxidation experiment. Values are the mean of 9 measurements (triplicate counts on triplicate cultures) and error bars are 2x standard deviation. The data shown are the cell numbers as counted, but a count of zero could mean that cells were present below the detection level of this counting method (which is 5E.05 cells/ml). Cells were known to be present on day in the cultures on day 0 as counts were done immediately after innoculation. No cells were observed in sterile controls at any time.

103

In contrast, the data for the sulfide oxidation (round 4 sulfide gradient tube) experiment for

Gillisia sp. BF64A_kw10.4 do not demonstrate any evidence of an ability to oxidize reduced sulfur species to sulfate as there was no increase in sulfate production (Figure 4.12 below) or decrease in thiosulfate (Figure 4.13). There was also no difference in cell numbers between the sulfide-containing and S minus cultures (data not shown).

0.35 S plus Culture 0.3 S minus Culture Sterile control 0.25

0.2

0.15

0.1

Sulfate concentration (mM) 0.05

0 0 20 40 60 80 100 120 140 -0.05 Day

Figure 4.12 Sulfate concentrations in the Gillisia sp. BF64A_kw10.4 sulfide oxidation

experiment. Data points are the mean of measurements from triplicate cultures and the error bars are 2x standard deviation.

104

0.04 S plus Culture S minus Culture Sterile neg control 0.03

0.02

0.01

0 Thiosulfate concentration (mM) 0 20 40 60 80 100 120 140

-0.01 Day

Figure 4.13 Thiosulfate concentrations in the Gillisia sp. BF64A_kw10.4 sulfide oxidation experiment. Data points are the mean of measurements from triplicate cultures and the

error bars are 2x standard deviation.

There was also no evidence that Gillisia sp. BF64A_kw10.4 could oxidize thiosulfate in the heterotrophic thiosulfate experiment. No sulfate production was observed, although background sulfate levels were high enough that changes of the same magnitude as those seen in the

Flavobacterium experiment would not have been noticeable (data not shown). Cell numbers in the Gillisia heterotrophic thiosulfate oxidation experiment were no different between thiosulfate- containing and S minus controls (data not shown).

105

Marinobacter sp. BF07-02_kw10.8 demonstrated the ability to oxidize thiosulfate as shown by a significant increase in sulfate compared to sterile controls (Figure 4.14 below). No reduction in thiosulfate was seen in the same experiment, but the range of uncertainty in the thiosulfate concentrations was great enough to have masked any reduction (data not shown). Cell numbers were the same in thiosulfate and non-thiosulfate containing cultures (data not shown).

0.35 S plus culture 0.3 S minus culture Sterile control

0.25

0.2

0.15

0.1

Sulfate concentration (mM) 0.05

0 0 20 40 60 80 100 120 140 -0.05 Day

Figure 4.14 Sulfate concentrations in the Marinobacter sp. BF07-02_kw10.8 thiosulfate oxidation experiment. Data points are the mean of measurements from triplicate cultures and the error bars are 2x standard deviation.

None of the four isolates demonstrated any ability to oxidize S0 (data not shown).

106

Functional gene PCRs

None of the PCR reactions using the soxB or rDsr primers yielded any PCR product.

However, these PCR reactions also did not have effective positive controls so these results are inconclusive.

Carbon fixation experiments

All three isolates tested, Flavobacterium sp. BF09-06_kw11.51, Gillisia sp.

BF64A_kw10.4 and Marinobacter sp. BF07-02_kw10.8, demonstrated some ability to take up

14C-labeled bicarbonate (Figures 4.15-4.17 below) but that there was no increase in uptake between sulfide-containing and non sulfide-containing cultures in any of the experiments.

1600

S plus culture 1400 S minus culture 1200 Sterile negative control

1000

800 cpm 600

400

200

0 0 10 20 30 40 50 60 70 80 90 100 -200 Days

Figure 4.15 14C-uptake by Flavobacterium sp. BF09-06_kw11.51. Values are the mean of measurements from triplicate cultures and error bars are 2x standard deviation.

107

10000 S plus culture S minus culture 8000 Sterile negative control

6000

4000 cpm

2000

0 0 20 40 60 80 100 120 140

-2000 Days

Figure 4.16 14C-uptake by Gillisia sp. BF64A_kw10.4. Values are the mean of measurements from triplicate cultures and error bars are 2x standard deviation. The increase in cpm was statistically significant for the S minus culture but was not statistically significant for the S plus culture.

108

3000 S plus culture 2500 S minus culture Sterile negative control

2000

1500 cpm 1000

500

0 0 20 40 60 80 100 120 140

-500 Days

Figure 4.17 14C-uptake by Marinobacter sp. BF07-02_kw10.8. Values are the mean of measurements from triplicate cultures and error bars are 2x standard deviation. The increase in cpm was statistically significant for the S minus culture but was not statistically significant for the S plus culture.

DISCUSSION

S0 biomineralization

The results clearly demonstrate that Gillisia sp. BF64A_kw10.4 is capable of producing the S0 biomineralized structures reported in Gleeson et al., 2011 as these structures were repeatedly observed in pure cultures of this isolate in multiple experiments. Although

Marinobacter was numerically dominant in the mixed cultures that Gleeson et al. described, the results in the current study show that its presence is not necessary, as the pyrosequences showed that the Gillisia cultures were pure cultures and did not contain any Marinobacter. It also appears

109

that Marinobacter is not capable of producing the S0 structures itself as they were not observed in any of the Marinobacter isolate cultures.

The S0-biomineralized structures produced by Gillisia sp. BF64A_kw10.4 are a novel form of extracellular S0 precipitation, with a different morphology from the internal deposits of

S0 seen in sulfur oxidizers such as Allochromatium (Frigaard & Dahl 2009) or Beggiatoa (Nelson et al., 1989), but also different from extracellular S0 precipitated by Arcobacter (Sievert et al.,

2007) or Thiobacillus (Hallberg et al., 1996). There is one example reported of environmental S0 structures with a similar morphology in a cold sulfur spring in Ancaster, Canada (Douglas &

Douglas 2000 & 2001). However, this study did not undertake any microbiological work and so did not determine the microbes responsible for these structures.

It is not clear whether Gillisia sp. BF64A_kw10.4 is actively oxidizing sulfide to S0 in culture or whether it is simply catalyzing the precipitation of S0 that has formed by abiotic oxidation of sulfide, as S0 was also observed in some of the sterile negative controls. It has been clear from both the Gleeson et al. 2011 study and this one that the S0-biomineralized structures are not present in abiotic controls, but that does not exclude the possibility that S0 could be formed by abiotic oxidation, with Gillisia sp. BF64A_kw10.4 simply providing a nucleating surface (the filaments) on which S0 formed by abiotic oxidation can precipitate. S0 can form as one of the abiotic oxidation products of sulfide oxidation and the proportion of S0 to other oxidation products increases as the sulfide:oxygen ration increases (Zhang & Millero, 1993). In that context it is relevant to note that in the experiment where varying concentrations of NaS were used in the agar plug the S0 biomineralized structures were first observed in the culture with the highest concentration of NaS (16 mM in the plug) and were only later observed in cultures that had lower NaS concentrations. To determine, conclusively, whether Gillisia sp.

110

BF64A_kw10.4 was catalyzing the oxidation of sulfide to S0 it would be necessary to undertake a quantitative assessment of the total S0 present in cultures compared to sterile controls using a technique such as extraction of S0 with chloroform followed by hplc (Rethmeier et al.,1997,

Kamyshny et al., 2009).

It is also not clear why this Gillisia sp. BF64A_kw10.4 is producing the S0 biomineralized structures. Iron encrusted sheaths and filaments are produced by bacteria such as

Gallionella and Leptothrix that oxidize Fe2+ to Fe3+ at circumneutral pH conditions as a way of ensuring that the cells do not become entombed, as the Fe3+ they produce is extremely insoluble at non-acidic pH (Baskar et al., 2012, Emerson & Revsbech 1994, Hallbeck & Pederson 1990).

Similarly, if Gillisia sp. BF64A_kw10.4 is oxidizing sulfide to S0 it may be producing filaments for the S0 to precipitate on to avoid entombing itself, as S0 is highly insoluble. However, one key difference is that Fe3+ will be actively attracted to cells as the outer surfaces of cells usually carry a net negative charge due to the phospholipids in the membrane, whereas S0 will not be, as it does not carry a charge.

Another possible reasons for the S0 biomineralization is that Gillisia sp. BF64A_kw10.4 may be storing the S0 as a food supply. This is the reason that S0 is precipitated by known sulfur oxidizers such as Allochromatium (Frigaard & Dahl 2009), Beggiatoa (Nelson et al., 1989) or

Thiobacillus (Hallberg et al., 1996). In these cases sulfide is oxidized to S0, which is stored intracellularly (for Allochromatium and Beggiatoa) or extracellularly (for Thiobacillus) and later oxidized further to sulfate. Gillisia sp. BF64A_kw10.4 did not demonstrate the ability to oxidize

S0 in culture. However, the S0 that was used for the substrate in those experiments was industrially-produced precipitated S0 (Fisher, USA). S0 can form rings or chains of atoms and typically forms an S8 ring (orthorhombic α sulfur) and this is the most stable form which

111

typically dominates commercially-prepared S0 (Cotton et al., 1999, Franz et al., 2007). However, other crystalline forms of S0, with different configurations of atoms, are possible, including β sulfur that forms at high temperatures by re-arrangement of the S8 ring, γ sulfur (rosickyite) that exists at low temperatures but is less stable than α sulfur, and polymeric sulfur consisting of chains of atoms (Cotton et al., 1999, Franz et al., 2007, Meyer 1976). The exact crystalline structure of S0 may affect the ability of bacteria to use it as a substrate, as it is not trivial to mobilize an insoluble substrate and the mechanism by which this is done is not understood

(Franz et al., 2007). Allochromatium can oxidize S0 but has been shown not to utilize α sulfur but instead to use only the polymeric S0 that is present as a minor constituent of commercially- available S0 (Franz et al., 2007). Previous work has shown that biologically-formed S0 may have a range of different configurations (Prange et al., 2002). It is possible that the S0 biomineralized structures contains S0 in a different atomic configuration than the commercial S0 used in the S0 oxidation experiments in this study, in which case it may be possible for Gillisia to use the biomineralized S0 as an energy source even though it could not utilize the commercial S0.

Previous work by Gleeson et al (2011) confirmed that the biomineralized S0 was composed only of the element sulfur and not a sulfur-oxygen anion or other sulfur-containing compound, by using Electron Dispersive Spectroscopy coupled with SEM. However this technique only provides elemental composition, not redox state or structural information, and so does not give detailed information on the crystalline form of the S0, or whether the biomineralization contained

2- polysulphides (Sn ). A separate study, not part of this work, is underway try to determine the configuration of the biomineralized S0 (G. Lau, personal communication). A further question is whether Gillisia sp. BF64A_kw10.4 and Flavobacterium sp. BF09-06_kw11.51 may be able to oxidize the S0 from the glacial deposits at Borup Fiord Pass to provide energy for growth. The

112

Borup Fiord Pass glacial S0 deposits have been shown to contain both γ sulfur (rosickyite) and α sulfur (Gleeson et al., 2012) and more detailed work is underway to further characterize its nature (G. Lau personal communication). It is possible that the Borup Fiord Pass glacial S0 deposits contain S0 in a different, more bioavailable, configuration than the commercial S0 used as a test substrate to date. If Flavobacterium sp. BF09-06_kw11.51 is able to use the S0 from the

Borup Fiord Pass glacial deposits as an energy source that may explain why it was so strongly dominant in the SSU rRNA sequences, as described in Chapter 2, but this remains to be tested.

Finally, it is not known whether Gillisia sp. BF64A_kw10.4 produces the S0 biomineralized structures in the Borup Fiord Pass glacial deposits as such structures have not been definitively observed when these deposits were examined under the microscope, although there were some possible detections (Gleeson et al., 2012).

Heterotrophic thiosulfate oxidation

Flavobacterium sp. BF09-06_kw11.51 did not appear to be able to produce the S0 biomineralized structures, as such structures were never observed in these cultures, even though it is closely related to Gillisia sp. BF64A_kw10.4. However, Flavobacterium sp. BF09-

06_kw11.51 did however demonstrate the ability to oxidize thiosulfate to sulfate (Figure 4.9), although it did not appear to be able to gain energy from this reaction as cell numbers were not significantly different when this isolate was grown in a culture containing an organic carbon source (agar) either with, or without thiosulfate, with all other culture conditions, and inoculum numbers, being identical (Figure 4.11). A Flavobacterium isolate has previously been reported as being able to oxidize thiosulfate (Teske et al., 2000) although in that case the isolate also still needed organic carbon in the form of agar to grow. Flavobacterium limicola, which is closely

113

related to Flavobacterium sp. BF09-06_kw11.51 (see Figure 4.1) has been shown to be able to hydrolyze agar in culture (Tamaki et al., 2003). The SSU rRNA sequences from the isolates from the Teske et al. study (Accession numbers AF254113 & AF254114) are also closely related to

Flavobacterium sp. BF09-06_kw11.51 although they seem to be even more closely related to

Gillisia sp. BF64A_kw10.4 (Figure 4.1) which did not demonstrate the ability to oxidize thiosulfate. However there is poor definition (<50% bootstrap) between the Gillisia sequences group and those of other Genera in the Flavobacteriaceae family shown in Figure 4.1, in particular the Flavobacterium group, although Gillisia and Flavobacterium are annotated as separate Genera in both the Greengenes and Silva phylogenetic classifications. In addition, the results of the experiments designed to test whether Gillisia sp. BF64A_kw10.4 was able to oxidize thiosulfate were ambiguous, as high background sulfate would have masked changes in sulfate concentration of the same magnitude as those seen in the Flavobacterium sp. BF09-

06_kw11.51 experiment. It would be necessary to repeat the Gillisia experiment with lower background sulfate in order to resolve this question. Marinobacter sp. BF07-02_kw10.8 also demonstrated the ability to oxidize thiosulfate, but not to obtain energy for growth from this reaction, and this is consistent with the findings of Choi et al. (2009) who found that a

Marinobacter isolate could oxidize thiosulfate but still required organic carbon for growth. Some

Marinobacter have been demonstrated to possess a soxB gene (Perreault et al., 2008) but this was not detected in DNA from Marinobacter sp. BF07-02_kw10.8.

Heterotrophic thiosulfate oxidation has been observed in a range of microbes that are widely distributed in soil and sediments, including Psuedomonas, Bacillus and filamentous fungi

(Ehrlich & Newman 2009, Tuttle et al., 1974, Mason & Kelly, 1988). Consequently it has been suggested that it is a ubiquitous and significant process in sulfidic sediments where organic

114

carbon is present (Jørgensen & Nelson 2004). The ability of Flavobacterium and Marinobacter, which are also widely distributed geographically across different types of environment (Green et al., 2009, Singer et al., 2011), to engage in heterotrophic thiosulfate oxidation strengthens the case for this process being globally significant in the carbon cycle.

Genetic capability for sulfur redox metabolisms

The genome of Flavobacterium johnsoniae contains a polysulphide reductase

(psr) gene (McBride et al., 2009) and the sequence of this gene was used to probe other

Flavobacterium genomes. Flavobacterium psychrophilum genome contains a gene which is 84% identical to the F. johnsoniae gene at the amino acid level and which contains an annotated psr domain (Duchaud et al., 2007), although the F.psychrophilum gene is annotated as a molybdopterin oxidoreductase (the general type of enzyme to which psr belongs). The only

Gillisia genome to have been sequenced to date is that of Gillisia limnaea DSM 15749

(Accession number JH594606) which also has a gene annotated with a psr domain, although the gene is described as a quniol:cytochrome c oxidoreductase. The Gillisia gene shares 85% identity at amino acid level with the F.johnsoniae gene. None of these genes appear to have been tested to determine whether they are functional. The psr gene encodes the three-subunit enzyme

2- that reduces polysulphide (Sn ) to sulfide (Yamamoto et al., 2010). If Gillisia sp.

BF64A_kw10.4 and Flavobacterium sp. BF09-06_kw11.51 possess psr genes, then an interesting possibility is whether Gillisia sp. BF64A_kw10.4 might be using the polysulphide reductase enzyme in the reverse direction to normal, oxidizing sulfide to polysulphide, which then precipitates on the filaments to produce the S0 biomineralized structures that have been observed. To our knowledge the use of the polysulphide reductase in this way has not been

115

observed in a living organism, but there seems no logical reason why this enzyme should not be able to operate in the reverse direction if conditions are suitable. The reverse polysulphide reductase reaction (oxidation of sulfide to polysulphide) has been used to assay this enzyme’s activity in vitro (Yamamoto et al., 2010). Interestingly, in that assay the sulfide oxidation was linked to menaquinone reduction (Yamamoto et al., 2010) and genome analysis has shown that the Flavobacteriaceae family use menaquinone in their electron transport chains (Duchaud et al.,

2007). Gillisia sp. BF64A_kw10.4 shares 98.78% SSU rRNA sequence identity with Gillisia limnaea DSM 15749 while Flavobacterium sp. BF09-06_kw11.51 shares 97.13% SSU rRNA sequence identity with F. psychrophilum and 95.76% sequence identity with F. johnsoniae, so a high priority for future work will be to determine if these isolates also possess psr genes and, if so, whether they are being expressed in cultures where S0 bioimineralization or thiosulfate oxidation is observed. No other genes likely to have sulfur redox activity were detected in the genomes of F.johnsoniae, F.psychrophilum and G. limnaea when the genomes were probed directly using amino acid sequences for the Sox, sulfide-quinone reductase, sulfite oxidase and rDsr enzymes. In addition, the information available on the genetic capabilities of

Flavobacterium and Gillisia does not explain why Gillisia sp. BF64A_kw10.4 produces S0 biomineralized structures but Flavobacterium sp. BF09-06_kw11.51 does not, when cultured under the same conditions.

Carbon uptake

A final intriguing finding was that Gillisia sp. BF64A_kw10.4, Flavobacterium sp.

BF09-06_kw11.51 and Marinobacter sp. BF07-02_kw10.8 all showed the ability to take up bicarbonate in the 14C uptake experiment. Carbon fixation ability has been reported previously

116

for at least one other Marinobacter (Edwards et al., 2003) but has not been previously reported for Flavobacterium or Gillisia and the genomes of F. johnsoniae, F. psychrophilum and G. limnaea have not been reported as containing carbon fixation genes (Duchaud et al., 2007,

McBride et al., 2009). However, recent work has demonstrated that some Flavobacterium sp. are capable of utilizing C1 compounds to provide energy for growth, including formate, methanol, monomethylamine, methanesulfonate (Moosvi et al. 2005, Boden et al., 2008, Green et al., 2011). In addition, isolates from two other Genera in the Flavobacteriaceae family,

Muricauda sp. (Accession number DQ48648) and Arenibacter sp. (Accession number

DQ486485) have been demonstrated to be able to obtain energy from the oxidation of dimethylsulfide, although organic carbon was still needed for growth (Green et al., 2011). The isolates described in Moosvi et al. and Boden et al. studies share 89.96% to 96.54% SSU rRNA sequence identity with Flavobacterium sp. BF09-06_kw11.51, and 85.88% to 93.55% identity with Gillisia sp. BF64A_kw10.4, while the Green et al. isolates are more distantly related.

If Flavobacterium sp. BF09-06_kw11.51 possesses methylotrophic capabilities this may provide a possible alternative explanation for its dominance in the glacial sulfur deposits at

Borup Fiord Pass. The spring water contains methane (8% of headspace gas, S.Grasby personal communication) so if the Flavobacterium sp. in the deposits are capable of methane oxidation this might explain their presence, and the reason why their relative dominance decreases as the distance from the spring source increases, as methane would be less abundant further from the spring source. However the metagenome data do not contain significant numbers of methane monooxygenase (Figure 2.5, Chapter 2), methanol dehydrogenase or methanesulfonate monoxygenase genes (data not shown), which are all involved in different methylotrophic capabilities (Boden et al., 2007, Chauhan et al., 2012). Nor are these genes observed in the

117

sequenced Flavobacterium and Gillisia genomes. Further work would be needed to elucidate the capabilities of Gillisia sp. BF64A_kw10.4 and Flavobacterium sp. BF09-06_kw11.51 to take up and utilize C1 compounds, including inorganic carbon, in order to determine whether they are capable of methylotrophy and/or carbon fixation.

Role of Flavobacterium and Gillisia in other sulfur-rich environments

Flavobacterium sp. BF09-06_kw11.51 and Gillisia sp. BF64A_kw10.4 isolates are closely related to sequences identified from other cold, sulfur-rich sites, at Blood Falls in the

Antarctic (Sequence Accession Numbers DQ677863 and DQ677875, Mikucki et al., 2007) and cold sulfur springs on Axel Heiberg Island in the Canadian High Arctic (Sequence Accession

Number EU196339, Perreault et al., 2008) as shown by the phylogenetic tree in Figure 4.1.

These sites share, with Borup Fiord Pass Glacier, the characteristics of extreme cold, and of microbial populations supported by chemolithotrophic metabolisms as the main source of energy for growth; sulfur chemistry is significant at both of these sites, although Blood Falls is distinctly different from Borup Fiord Pass glacier in that it is also dominated by iron chemistry (Mikucki et al., 2007, Mikucki et al., 2008, Niederberger et al., 2009, Perreault et al., 2007, Perreault et al.,

2008). The authors of those studies did not identify the Flavobacterium and Gillisia present as having a role in sulfur chemistry, but this work suggests that the Flavobacterium and Gillisia present at those sites may have a previously-unsuspected role in the sulfur cycling there.

CONCLUSIONS

This study has demonstrated that Gillisia sp. BF64A_kw10.4 possess the ability to produce S0 biomineralized structures that are different from the types of biogenic sulfur deposits

118

that have previously been characterized, and that it is responsible for the S0 biomineralized structures reported in Gleeson et al., 2011. Sulfur biomineralization has not previously been reported for a member of the Flavobacteria. It is not clear whether Gillisia sp. BF64A_kw10.4 actively oxidizes sulfide to S0, or whether it is capable of utilizing the stored S0 to provide energy for growth. It is also not clear whether this Gillisia sp. is generating these structures in the Borup

Fiord Pass glacial sulfur deposits, but if it is, then this could affect the ability of other bacteria present to utilize S0 as a substrate. Flavobacterium sp. BF09-06_kw11.51 did not exhibit the same ability as Gillisia sp. BF64A_kw10.4 to generate S0 biomineralized structures but did show the ability to oxidize thiosulfate to sulfate, although it did not appear to be able to conserve energy for growth from this reaction, which is consistent with previous studies (Teske et al.,

2000). The genetic capabilities for these activities has not been determined, but Flavobacterium and Gillisia genomes each contain a gene with a polysulphide reductase (psr) domain. Future work will investigate whether Flavobacterium sp. BF09-06_kw11.51 and Gillisia sp.

BF64A_kw10.4 possess psr genes and if so, whether they are being expressed in the culture conditions under which thiosulfate oxidation, and S0 biomineralization, are observed.

Flavobacteria are very widespread in the environment and this work adds to the developing body of evidence (Boden et al., 2008, Moosvi et al., 2005, Green et al., 2009) that they may be far more important in global sulfur cycling than is currently recognized.

119

CHAPTER 5

ADDITIONAL CULTURING STUDIES

INTRODUCTION

One of the key questions about the sulfur deposits at Borup Fiord Pass is whether the S0 remains stable over significant time periods (years rather than months) or whether it becomes oxidized to sulfate, given the fact that this is a thermodynamically favorable reaction in the oxidized surface environment (see Chapter 2). An early objective of this project was therefore to seek to culture Borup microbes capable of the oxidation of S0 or thiosulfate (one possible intermediate in the oxidation of S0 to sulfate) in culture conditions that matched the environment as closely as possible. This was one way to try and determine whether bacteria in the deposits had this capability. Success in culturing such organisms would indicate that Borup microbes had the capability to oxidize S0 or thiosulfate (as appropriate) in environmentally-relevant conditions, although it would not prove that they were necessarily doing so at Borup (further work would be needed to test that) since culture conditions will never match environmental conditions exactly.

For the same reason, failure to culture such organisms would not prove that they did not exist at

Borup. In addition, the oxidation of S0 is one of the steps in the sulfur cycle for which the genes have not been fully identified (as described in Chapter 2) and so a particular objective was to seek to isolate an organism oxidizing S0 , which could be used to investigate the genes involved in this process.

120

PRELIMINARY CULTURING EXPERIMENTS

I set up initial cultures using the same EM media as used in Gleeson et al., 2011, with 10 mM bicarbonate, except that aliquots of media were made at different pHs ranging from pH 6.5 to pH 10, in 0.5 pH units. To buffer at pH 6.5 MES was used, to buffer at pH 7.0 and 7.5 HEPES was used, to buffer at pH 8.0, 8.5 and 9.0 TRIS was used and to buffer at pH 9.5 and 10.0 a mix of NaHCO3 and Na2CO3 was used. The cultures were set up using 10 ml of media and the

0 addition of either 0.1g of S or 10mM thiosulfate (added as Na2 S2O3). In each case, this was the only sulfur-containing substance added. S0 could not be autoclaved at the normal temperature as it melts at either 113°C or 119°C, depending on how rapidly it is heated. When heated rapidly it melts at 113°C but if heated more slowly the cystalline structure re-arranges to a more stable configuration and it melts at the higher temperature of 119°C. Early experimentation by me had shown that in the autoclave it melted at 113°C. I therefore devised a modified autoclave program. Instead of using a sterilization temperature of 125°C for 20 minutes (the standard procedure for sterilization) a sterilization temperature of 110°C for 60 minutes was used for S0.

2- My early experiments also showed that autoclaving S2O3 could result in the partial breakdown

2- 2- of the S2O3 into sulfide. In order to ensure that experiments contained only S2O3 and not also

2- sulfide (to be sure the enrichment was specific for S2O3 oxidation) thiosulfate stock solutions were filter sterilized using a 0.2 µm filter instead of autoclaving. Cultures were loosely capped to ensure that gas transfer (in particular access to atmospheric oxygen and carbon dioxide) was possible. All culture experiments included sterile controls. Cultures were incubated at 60C in the dark for a minimum of 1 month after which transfers were made using 0.1ml of a culture to a

10ml daughter culture set up in exactly the same way as the original culture. As the cultures did not contain any organic carbon, or any energy source other than oxidation of the relevant sulfur

121

substance, the objective was to enrich for microbes that gained energy for growth using either S0 oxidation or thiosulfate oxidation (as appropriate). I inoculated cultures with samples of sulfur taken from the surface of Borup Fiord Pass Glacier by Steve Grasby in 2007, and kept refrigerated until cultures were set up in October 2008; they were therefore designed as BF07 cultures. Transferring the cultures ensured that cells was designed to ensure that cells which could not grow under these conditions were successively diluted out, in order to obtain a culture where sulfur lithotrophy was the energy metabolism used. I assessed cultures using microscope examination to roughly assess cell growth. Cells were stained using the Live-Dead stain

(Invitrogen) and cultures were differentiated using an approximate assesssment of cell growth

(whether abundant cells or very few cells were observed under the microscope). I tested the pH of cultures using pH strips to see if acidification (expected if sulfur species are oxidized through to sulfate) was occurring.

These intial culturing experiments demonstrated that S0 oxidation and thiosulfate cultures would only grow if the pH was below pH 8. The pH needed to be below pH 7 for optimum growth, and pH in these cultures stabilized at pH 5 after several months. A drop in pH was only observed in cultures, not in abiotic controls set up in the same way, which was consistent with an interpretation that cultures were oxidizing S0 oxidation or thiosulfate (depending on the type of culture) to sulfate, so acidifying the culture. It had not been possible to obtain a pH measurement of the sulfur deposits in 2006 or 2007 (although the pH of the spring was measured in 2006).

When field work was done in 2009, I measured the pH of the deposits at pH 6.0 – 7.0. It is therefore quite likely that the sulfur deposits in 2007 were also at about this pH and that would explain why microbes cultured from the deposits prefer neutral to mildly acidic pH.

122

Once mixed cultures were established which appeared to be growing well, samples were taken filter sterilized by me, and analyzed for sulfate and thiosulfate concentrations using ion chromatography by Fred Luiszer. I also did accurate cell counting using a Petroff-Hausser cell counter. I extracted DNA from cultures using the Powersoil DNA extraction kit (MoBio, USA), amplified it using universal small subunit ribosomal RNA primers 515F and 1391R. I then gel purified the PCR product using the Montage gel purification kit (Millipore), re-adenylated it, cloned using the TOPO cloning kit (Invitrogen, USA), amplified using M13 primers, and purified using the Purelink PCR clean-up kit (Invitrogen). DNA was sequenced by SeqWright using Sanger sequencing. However, the DNA extraction for sequencing was not performed at the same time as the cell counts using the Petroff-Hauser cell counter and sampling for sulfate and thiosulfate analysis (cell counting and sampling were done at the same time). I identified DNA sequences by using BLAST against the NCBI nt nr database.

Experiments using accurate cell counting with the Petroff-Hauser cell counter, and measurement of sulfate and thiosulfate concentrations using ion chromatography showed cell growth of mixed cultures linked to production of sulfate in both the S0 oxidation and thiosulfate oxidation cultures (Figures 5.1 and 5.2 below). This was interpreted as cell growth being linked to S0 oxidation, or thiosulfate oxidation, respectively.

123

2 60000000 Sulfate (culture) 1.8 Sulfate (abiotic control) Cells (culture) 50000000 1.6 Cells (abiotic control) 1.4 40000000 1.2

1 30000000 mM Cells/ml 0.8 20000000 0.6

0.4 10000000 0.2

0 0 0 5 10 15 20 25 30 35 40

Days

Figure 5.1 Mixed BF07 S0 oxidation culture demonstrating sulfate production linked to cell growth

124

12 Sulfate (culture) 30000000 Sulfate (abiotic control) Thiosulfate (culture) 10 Thiosulfate (abiotic control) 25000000 Cells (culture) Cells (abiotic control) 8 20000000

6 15000000 mM Cells/ml

4 10000000

2 5000000

0 0 0 7 14 21 28 35 42 Days

Figure 5.2 Mixed BF07 thiosulfate oxidation culture demonstrating thiosulfate oxidation to sulfate linked to cell growth

The mixed culture sequence data showed that the S0 oxidation culture was dominated by

Marinobacter (34 out of 49 sequences = 69%) with Gillisia being the second most dominant group at 20% (10 sequences). The Marinobacter isolated from this culture was the Marinobacter sp. BF07-02_kw10.8 described in Chapter 4. As set out in Chapter 4 it did not demonstrate the ability to oxidize S0 in pure culture, but did demonstrate the ability to oxidize thiosulfate. The

2007 Gillisia was not isolated.

The mixed culture sequence data from the thiosulfate oxidation culture showed that

Thiomicrospira was the dominant bacteria (27 sequences), which is a well-known thiosulfate oxidizer and which was dominant in the environment SSU rRNA data from 2007 (data not shown). The culture also contained Alphaproteobacteria (10 sequences), Marinobacter (2

125

sequences), Actinobacteria (2 sequences), Firmicutes-Clostridiales (1 sequence)and

Bacteroidetes (2 sequences). Both cultures may not have been fully characterized by the amount of sequencing carried out, and as sequencing was not carried out at the same time as the cell counting and chemical analyses, it is possible that the cultures contained other members which were responsible for the S0 observed.

ADDITIONAL ISOLATIONS

A number of additional isolates were obtained by me as work was undertaken to try to isolate the Marinobacter, Flavobacterium and Gillisia that were the subjects of Chapter 4. They were not further characterized or cultured as they were not considered priorities for further investigation. The isolates obtained, their source, and the methods by which they are obtained, are shown in Table 5.1 (below, after conclusions).

CONCLUSIONS

These experiments demonstrated that mixed cultures of Borup microbes from the 2007 deposits were capable of both thiosulfate and S0 oxidation oxidation. In the experiments shown in Figures 5.1 and 5.2 thiosulfate and S0 oxidation oxidation were linked to cell growth in a culture that did not contain any organic carbon, so this is also evidence that these microbes were capable of fixing carbon and using sulfur redox reactions to provide energy for growth.

126

Table 5.1 Borup Fiord isolates

2006 Isolates

Source Name Identification Enrichment culture Isolation plate and (closest hit) appearance BF06-5b kw7 Paenibacillus NaS + sediment gradient K then Lept plates tube BF06-4a kw_10.4 Flavobacterium NaS + sediment gradient 4 mM sulfide (in EM tube base) at room temp BF06-5b kw5 *1 Shewanella? NaS + sediment gradient K then Lept plates (closest match but tube not good identification) BF06-4a kw_10.3 Marinobacter NaS + sediment gradient 4 mM sulfide (in EM tube base) at 6 deg Celcius

2007 Isolates

Source Name Identification Enrichment culture Isolation plate and (closest hit) appearance BF07-02 kw_10.8 Marinobacter S0 oxidation 100 µM acetate (in Acc#EU908283.1 EM base) at RT BF07-02 kw_10.11 Lokatanella S0 oxidation 4 mM sulfide (in EM Acc# FJ362503 base) at RT Note: later determined not to be pure culture BF07-02 kw_10.18 Loktanella S0 oxidation K plate at RT BF07-03 kw4, Microbacteriaceae( S0 oxidation 4 mM sulfide (in EM kw5 *1 & Cryobacteria / base) kw6 Labedella) BF07-02 kw_10.15 Rhizobiales (same S0 oxidation 4mM NaS + 2mM as 11.7) S2O3 then 4 mM NaS BF07-02 kw_10.17 Aurantimonas S0 oxidation K plate at room temp BF07-06 kw_11.42 Thiobacillus None - Field sample 10 mM S2O3 at 6 °C Acc#AY082471 BF07-06 kw_11.36 Thiobacillus None – field sample S2O3 at RT Acc# AY082471.1 May be mixed culture BF07-06 kw_11.63 Alpha-proteo? None – field sample 5 mM sulfide at 6 °C Acc# FJ437982.1 microaerophilic only 83% ident

127

2009 Isolates

Source Name Identification Enrichment culture Isolation plate and (closest hit) appearance BF09-02 kw_10.19 Arthrobacter NaS + S2O3 gradient tube K plate at RT (discarded) BF09-02 kw_11.1 Arthrobacter NaS + S2O3 gradient tube K plate at 6 °C BF09-02 kw_11.3 Flavobacterium NaS + S2O3 gradient tube 4 mM sulfide (in EM base) at 6 °C BF09-02 kw_11.4 Loktanella NaS + S2O3 gradient tube 4 mM sulfide (in EM base) at room temp BF09-06 kw_11.7 Aurantimonas NaS + S2O3 gradient tube 4 mM sulfide (in EM Acc#EF540474.1 base) at room temp BF09-06 kw_11.17 Salinibacterium K plate at 6 °C BF09-06 kw_11.21 Pseudomonas? 4 mM sulfide (in EM 91% ident (contig) base) at room temp BF09-02 kw_10.20 Arthrobacter NaS + S2O3 gradient tube K plate at room temp BF09-02 kw_11.16 Microbacteriaceae NaS + S2O3 gradient tube K plate at room temp BF09-02 kw_11.19 Arthrobacter NaS + S2O3 gradient tube K plate at 6 °C BF09-06 kw_11.2 Pseudomonas NaS + S2O3 gradient tube K plate at 6 °C Acc#GU62544.1 BF09-06 kw_11.24 Planococcus None – field sample K plate at 6 °C Acc# EU196338.1

BF09-06 kw_11.31 Thiobacillus None – field sample S2O3 at RT Acc# AY082471.1 also Frassassi uncultured bacterium Acc# DQ415787 BF09-06 kw_11.32 Thiobacillus None – field sample S2O3 at RT Acc# AY082471.1 BF09-06 kw_11.51 Flavobacterium None – field sample 5 mM sulfide at 6 °C Acc# AB075231.1 BF09-02 kw_11.53 Flavobacterium None-field sample 5 mM sulfide at 6 °C Acc# AB075231.1 microaerophilic 97% ident to11.51 BF09-06 kw_11.59 Arthrobacter None-field sample (bag 5 mM sulfide at 6 °C Acc#EF423346.1 kept frozen) microaerophilic Only89% ident but several Ns BF09-06 kw11.49 Microbacteriaceae None – field sample cream or yellow on 5 (Cryobacterium (metagenome sample) mM sulfide at 6 °C /Labedella) *1 The BF07 kw5 is NOT the same as the BF06 kw5.

128

CHAPTER 6

CONCLUSIONS

INTRODUCTION

This chapter contains my overall conclusions on microbial sulfur cycling at Borup Fiord

Pass Glacier, highlights the novel aspects of this work, and discusses the main limitations of this study and how they could be addressed in future work. It also discusses the wider relevance of this work.

OBJECTIVES OF THIS STUDY

The main objective of this study was to determine how well bioenergetic calculations of the relative amounts of energy available from different redox reactions in the sulfur deposits on

Borup Fiord Pass Glacier predicted the microbial utilization of those reactions, as indicated by relative abundance of key functional genes. A secondary objective was to determine the identity of the Borup microbe(s) responsible for the production of novel S0 biomineralized structures that had previously been observed in mixed cultures originally inoculated with sulfur from the Borup glacial deposits (Gleeson et al., 2011).

SUMMARY OF THE FINDINGS OF THIS STUDY

The microbial community of the elemental sulfur (S0) on the surface of the Borup Fiord

Pass Glacier, Canadian High Arctic, is dominated by members which possess the genetic capability to undertake a range of sulfur redox reactions, and in particular to oxidize reduced sulfur species, starting with sulfide (the most reduced form of sulfur) all the way through to

129

sulfate (the most oxidized form of sulfur). The geochemical data, bioenergetic calculations, small subunit ribosomal RNA (SSU rRNA) data and the quantitative functional gene data from the metagenome all support the conclusion that this community is driven by sulfur lithotrophy as the main energy source for primary productivity. The deep deposit on which the metagenome analysis was performed was dominated by Sulfurovum and Sulfuricurvum, Epsilonproteobacteria known to be capable of sulfur lithoautotrophy. Their numerical dominance in the deposits, combined with the fact that the most abundant sulfur redox and carbon fixation genes in the metagenome appear, by sequence comparison, to be of Epsilonproteobacterial origin, indicate that these Epsilonproteobacteria are most likely the major primary producers and sulfur cyclers.

This is the first time that these Genera have been reported as being dominant in a sub-aerial environment, as they have previously been found in anoxic and sulfidic environments such as hydrothermal vents and sulfidic caves. By far the most abundant energy source available in the glacial deposits is the S0 and as cultured representatives of these Genera have previously been shown in culturing studies to be capable of oxidizing S0 to sulfate it is likely that the Borup

Epsilonproteobacteria are using this reaction to obtain energy for growth. The genes responsible for oxidizing S0 to sulfate are not fully known, but the metagenome shows high levels of abundance of the sox gene cluster, known to be able to oxidize S0 to sulfate in vitro. There is disproportionately-high relative abundance of DsrE genes in the metagenome compared to the other Dsr genes, and as the DsrE gene product is known to be involved in mobilization of internal S0 deposits in bacteria which possess the entire Dsr cluster, this raises the interesting question of whether the DsrE gene in these Epsilonproteobacteria might be involved in mobilizing the external, solid, S0 so that the intracellular sox genes can act upon it.

130

There is a curious absence of both SSU rRNA and functional genes from phototrophs in the metagenome, indicating that photosynthesis is not being significantly utilized by this community despite the presence of abundant light (24-hour daylight during the arctic summer when the samples were collected). There was also an absence of genes for both the aerobic and anaerobic oxidation of ammonium (nitrification and anammox, respectively) despite the fact that both metabolisms should be energetically favorable.

The cultured representatives of Sulfurovum and Sulfuricurvum have been shown to prefer microaerophilic rather than fully oxygenated conditions and it was noteworthy that the surface of

S0 deposits on the glacier are dominated by Flavobacterium. It is possible that the aerobic metabolism of this Flavobacterium is drawing down the oxygen level sufficiently to create a microaerophilic climate in the lower regions of the deposit. This dominant Flavobacterium was isolated, and culturing studies indicated that it was capable of oxidizing thiosulfate to sulfate, but was not able to gain energy for growth from this reaction. It is therefore most likely gaining energy from the oxidation of carbon compounds. One possibility that would explain its dominance near the spring is that the spring water contains methane, and some Flavobacterium sp. have shown the ability to utilize C1 compounds to obtain energy for growth. However, a key gene known to be involved in methane oxidation, methane monooxygenase, was not strongly represented in the metagenome.

A Gillisia sp. isolate, also a member of the Flavobacteriaceae family, was also isolated from cultures originally inoculated with sulfur from the Borup glacial deposits. This isolate showed the ability to create unusual S0-biomineralized structures although it is not clear whether it is actively metabolizing sulfur species or simply catalyzing a novel precipitation process.

Nevertheless, this is a previously-unknown ability for Flavobacteria. It is not clear whether this

131

process is significant in the Borup system as the Gillisia is numerically a minor constituent and the biomineralized structures have not so far been observed in the environmental S0 deposits.

The dominant members of the community in the S0 deposits are very different from that of the stream. This suggests that the change in environment going from an anoxic, sulfide-rich, subsurface to the surface S0 deposits which have ample access (at least at their surface) to atmospheric oxygen allows different microbes to be successful in each environment. The dominant members of the stream microbial community are Burkholderiaceae members

Burkholderia and Ralstonia, but their role in the system was not investigated as part of this study. Both Burkholderia and Ralstonia representatives have shown the ability to oxidize thiosulfate, but while Burkholderia could conserve energy for growth from this reaction,

Ralstonia has not so far shown this ability (Anandham et al., 2008, Cramm 2009). The role that

Burkholderia and Ralstonia are playing within the Borup Glacial spring and sulfur deposits therefore remains unknown.

OVERVIEW CONCLUSIONS ON MICROBIAL SULFUR CYCLING AT BORUP

A major question that remains unanswered is whether the S0 on the surface of Borup

Fiord Pass Glacier is produced by direct biological oxidation, indirect biological action, or abiotic processes. We also still do not know whether the S0 is generated at the site in which it is found, or transported there from elsewhere by the spring water. Finally, we do not know at what time, and under what environmental conditions, the S0 is laid down. There are certainly organisms within the spring water that are capable of oxidizing sulfide to S0, in particular the

Epsilonproteobacteria Sulfurovum and Sulfuricurvum, the Gammaproteobacteria Thiomicrospira and the Betaproteobacteria Thiobacillus. But these are all minor constituents of the spring water.

132

However, the microbial population of the spring water is only a wash out of whatever is underground, and that community composition might be quite different. The composition of the spring water could also be different at the time the S0 is laid down, than it was at the time we sampled. Abiotic oxidation of sulfide to S0 is also a possibility, and the extremely high level of sulfide in the stream increases the probability of this, as high sulfide:oxygen ratio increases the likelihood that S0 will be formed as one of the oxidation products of sulfide (Zhang & Millero,

1993). At the moment, it seems likely that a combination of biological and abiotic processes are involved in the production of the S0, but more work, and in particular additional field work, will be needed to determine the relative contributions of each type of process.

Successful resolution of this question would add to our understanding of the environmental conditions in which microbes do, or do not, oxidize sulfide to S0, which would add to our overall knowledge of microbial ecology. In addition, if the S0 is proven to be largely biogenic, then it could be used to add to our understanding of the differences between biogenic and abiotic S0, and our ability to distinguish between the two when trying to interpret deposits of

S0 that have been preserved through geological time in the rock record on Earth, or to interpret the potential biogenicity of any S0 deposits that may be found on Mars, Europa or other planetary bodies. Detailed studies on the Borup S0, using techniques such as high-energy X-ray spectroscopy to explore the exact molecular structure, are currently underway as part of a parallel project within the laboratory, not within the scope of this thesis, to further explore this possibility.

The final fate of the S0 at Borup is also unknown, but there is rather more information to go on. In both 2007 and 2009, bacteria known to be capable of oxidizing elemental sulfur to sulfate were seen to be strongly dominant: Thiomicrospira in 2007 (data not shown) and the

133

Epsilonproteobacteria in 2009. Moreover cultures set up to enrich for S0 oxidation, using culture conditions that matched the environment, were successful in generating mixed cultures of organisms from the site that oxidized S0 to sulfate, indicating that viable organisms with this capability exist within the deposits. Given the fact that the oxidation of sulfur is energetically- favorable in this environment it would be expected that, eventually, all the S0 would be oxidized to sulfate. The presence of dissolved sulfate and the mineral gypsum (CaSO4) at the site may result from this process, although these could equally well be abiotic. However, although it would be expected that the S0 would be oxidized, this is a slow process in these conditions, and so it could be preserved over long timescales if the microbial process was halted, for example, by the freezing conditions of the arctic winter. The presence of S0 in paleodeposits at the site that may indicate former spring sites (Grasby et al., 2012) do indicate that the S0 can be preserved for many years.

NOVEL FEATURES AND BROADER SIGNIFICANCE OF THIS STUDY

The novel features of this work that have wider relevance are:

1. There are some environmental conditions in which photosynthesis is not a competitive metabolism, over and above the well-accepted limits of lack of light and high temperatures.

Photosynthesis, and in particular oxygenic photosynthesis, is widely regarded as the most energetically-favorable metabolism on Earth, and the most important driver of primary productivity. The finding that it is not competitive in the sulfur deposits on top of Borup Fiord

Pass Glacier, even in conditions of 24-hour daylight, suggest that there may be previously unknown constraints which impact more on photosynthesis than on some other types of

134

metabolism, in this case, sulfur lithotrophy. The suggestion made in this study is that the absence of photosynthesis at this site is due to a combination of several factors, namely that:

i) the surface of the glacier is a relatively pristine site, which does not support macroscopic life and which probably only supports a relatively low concentration of microbial life in comparison with other environments;

ii) it suddenly receives a relatively high load of new organisms from the subsurface, a dark environment, which therefore does not contain phototrophs;

iii) along with the new organisms there is a significant supply of energy, in the form of reduced forms of sulfur (in particular sulfide and S0);

iv) some of the organisms from the subsurface possess flexible metabolic capabilities, in particular the ability to grow chemolithoautotrophically using sulfur oxidation reactions to provide energy, and fixing carbon dioxide from the atmosphere, and so survive the transition from an anoxic to oxygenated environment, and

v) the new arrivals from the subsurface populate the surface S0 deposits (both by growth and continued seeding from the subsurface) faster than photosynthetic organisms can colonize the same area due to growth and seeding from atmospheric transport.

Further investigation would be needed to determine whether this hypothesis is correct. It would be necessary to undertake repeated sampling of the spring and sulfur deposits throughout the course of an entire season, and also to test the microbial composition of the glacial surface away from the spring and sulfur, in order to track the changes in community composition and the total biomass. This would provide the data to determine:

135

i) whether the microbial population is indeed derived from the spring, as it appears from the data to date, or whether it is derived from the microbial population of the glacial surface away from the spring;

ii) how the microbial community in the S0 changes over time, both in terms of total cell numbers and relative proportions of dominant members, to check whether there is indeed a shift in the community composition from the spring water to the S0. Although it appears that this is the case from the data to date, there is only one time point of data from the spring water and so we do not know whether the difference in community composition seen between the spring water and the deposits is really a shift once the microbes have seeded the surface, or whether it is that the spring water microbial community simply changed over time; and

iii) the change in the number of phototrophs over time, so see if the relative proportion increases during the summer, which would indicate that the phototrophs are gradually out- competing the lithotrophs. If no change was seen, it could mean that atmospheric transportation is extremely slow, or alternatively that there was something in the environment that was actively preventing phototrophs from growing successfully.

Either way, a study of this type could provide some extremely interesting new information on the limits of the effectiveness of photosynthesis as a microbial metabolism.

2. The DsrE gene may play a role in mobilizing extracellular S0 and so enable Sox proteins to oxidize this solid energy source The mechanisms by which bacteria oxidize S0 are not fully understood. Bacteria with the rDsr genes are known to use those gene products in S0 oxidation

(Dahl et al., 2008), but those genes were not abundant in the metagenome. The Sox proteins have been shown to be able to oxidize S0 in vitro (Rother et al., 2001), but it has not been

136

demonstrated that living bacteria can use them in this way. The Sox proteins are periplasmic, and it is not clear how S0, which is extremely insoluble, could be transported into the cell to enable the Sox complex to act upon it. The high relative abundance of DsrE-type genes in the metagenome and their presence in Epsilonproteobacterial genomes, in the absence of other Dsr genes, combined with the facts that Epsilonproteobacteria are known to be able to oxidize S0, and that in the full DsrE complex the DsrE gene product is involved in mobilizing internal S0 stores, give rise to the question of whether the DsrE gene product might be used by

Epsilonproteobacteria to mobilize S0 outside the cell. If this hypothesis regarding DsrE is correct, it would fill a significant knowledge gap in how bacteria can utilize S0. However, experimental evidence to test the hypothesis would be needed to confirm whether or not it is correct. Culturing studies involving S0-oxidizing Epsilonproteobacteria would need to test whether the DsrE was being expressed when S0 was being used as an energy source. If the DsrE gene is indeed being expressed, then making a mutant with this gene knocked out would be necessary to test whether or not it was essential to the S0-oxidation process. As the DsrE gene could be an essential gene for other reasons, it would be important also to test the knockout mutant on a different energy source such as sulfide, which Epsilonproteobacteria could oxidize using their sqr or fcc genes.

3. Flavobacteria may be more significant in global sulfur cycling than has previously been suspected. The ability of the Borup Gillisia isolate BF06-kw10.4 to form biomineralized structures is novel, not only for this Genus, but more widely, but its environmental importance is unclear. As far as is known, such structures have only been reported once previously, in a study of another cold sulfur spring (Douglas & Douglas 2000) and the microbes responsible for

137

making the structures were not identified. It is not known whether the Borup Gillisia sp. makes these structures at Borup, but the previous Douglas & Douglas observations certainly suggests that it is possible. The mineral structure of can S0 affect how easy, or difficult, it is for other microbes to utilize it (Franz et al., 2007) and so S0 biomineralization could potentially alter the bioavailability of sulfur to other organisms, and so the way microbial sulfur cycling works in some environments. A useful experiment would be to test how easy or difficult it is for bacteria, which are known to have the capability to oxidize S0, to oxidize the sulfur in these structures.

The ability of the Flavobacterium isolate BF09-kw11.51 to oxidize thiosulfate is not novel, and the dominance of this Flavobacterium in the S0 is still unexplained, but given the global prevalence of Flavobacteria (Green et al., 2009) if they have a previously-unrecognized ability to metabolize sulfur compounds then this could be of significant environmental importance.

4. Sulfurovum and Sulfuricurvum may inhabit wider environmental niches than the highly-sulfidic environments in which they have been reported. Epsilonproteobacteria that are known to undertake sulfur lithotrophy have been observed to be highly abundant in sulfidic environments such as hydrothermal vents and sulfidic caves (Macalady et al., 2006 Nakagawa et al., 2005). The fact that they have now been found to be highly abundant in a sub-aerial setting could indicate an even broader role for Epsilonproteobacteria in global sulfur cycling than has previously been considered. However, this is only a tentative conclusion as it was not possible to get an accurate measurement of the sulfide content of the S0 deposit used for the metagenome.

Even though the site is sub-aerieal, and the surface of the deposit is exposed to the oxygenated atmosphere, it is possible that the interior of the deposit was, in fact, highly sulfidic in which case this environment would correspond more closely to those in which these Genera have

138

previously been observed. The fact that Epsilonproteobacteria were highly abundant in the deposit, but not the spring water, suggest that they may be rapid colonizers of the surface sulfur deposits, and this is consistent with observations that they are rapid colonizers at hydrothermal vent sites, even when Gammaproteobacteria sulfur oxidizers are more dominant overall (López-

Garcia et al., 2003, Nakagawa et al., 2005). One possible reason for this may be because they use different processes to fix CO2. The Epsilonproteobacteria sulfur lithoautotrophs use the reductive citric acid cycle, which only uses 1.67 ATP molecules to fix each CO2 molecule, whereas

Gammaproteobacteria sulfur lithotrophs such as Thiomicrospira use the Calvin-Benson-Bassham cycle which uses 3 ATP molecules to fix each CO2 (Canfield 2005, Hugler et al., 2005, Scott et al., 2006). All other things being equal, this would allow Epsilonprotebacteria to grow more quickly.

MAIN CONSTRAINTS OF THE STUDY, AND POSSIBLE FURTHER WORK

The constraints of the individual aspects of this study have been set out in the relevant sections of the preceding chapters, so this section just focuses on the main constraints. These fall into three distinct categories: i) constraints in the raw data available, due to the inability to return to the field site; ii) constraints in the computational power available to analyze the data; iii) the constraint that is inherent in using DNA data, as DNA represents genetic potential but does not provide information on which aspects of that genetic potential are in use at any given time.

The main impact of the first constraint is the fact that it was not possible to obtain some chemical information, most importantly the sulfide concentration of the S0 deposit, or to take

139

repeat measurements or samples, in order to test the hypotheses produced as a result of the initial data analysis, or in order to obtain information on how the chemical conditions, and microbial community, changed over the course of the season. This could be addressed by future fieldwork but it was not possible to return to the site during the course of this study because it is so remote and so it is very difficult, and expensive, to reach.

Additional information that could be obtained without further fieldwork would be to look at which genes are being expressed in this environment, if it is possible to extract RNA from samples of the sulfur that remain preserved in the laboratory freezer. This would provide valuable further insight into which reactions microbes are, in fact, using, given that they have the genetic capability to carry out several different sulfur redox reactions, by looking at which of the highly-abundant sulfur redox genes are actually being expressed. The metagenome sequence data could be used to design primers which specifically target the genes of interest.

The second constraint was mostly a question of access to hardware, but partly an issue of the bioinformatic skills needed for the task. In this study both of these problems were overcome by using the MG-RAST pipeline for the initial stages of the analytical process, as these were the most demanding in terms of computational power. The issue of computational power could be solved in future by accessing the University of Colorado’s supercomputer (not available at the time this analysis was carried out) and by further work to write suitable programs to enable the analysis to be carried out without needing to use the MG-RAST pipeline.

More generally, access to significant computing power, and bioinformatic skills, is likely to be an issue for many environmental microbiologists are likely to face this issue as the ability to produce huge quantities of sequence data has expanded more rapidly than the ability of scientists to analyze the data. The introduction of cloud computing may provide a relatively low-

140

cost method for laboratories which only do small amounts of such work to access the necessary hardware, but it is not yet clear how much resource, and for what cost, a typical analysis would require. The skill gap is also significant. Whilst collaboration with specialist bioinformatic groups is one option, that is not currently feasible for all. A tremendous service is provided to the community by freely-available resources such as MG-RAST, but there are trade-offs. In order to provide this computational resource for free, the MG-RAST pipeline takes steps to reduce the computational power needed for each metagenome. Perhaps the most significant of these is the fact that sequences are clustered at 90% amino acid similarity, and then only one sequence of the cluster is used for functional identification, rather than every sequence being identified individually. This could reduce the accuracy of the quantification of functions. It is also true that using such pipelines means surrendering some degree of intellectual control. Nevertheless, it is a useful resource. In any future metagenome studies that I did, I might still use the MG-RAST system for initial quality control of the data, but I would then download the NCBI protein nr database, use the BLAST Executables (installed on a local computer) to BLAST the metagenome data against the database to assign function to each sequence individually, and I would then write programs of my own, to collate and quantify the results, supplemented by existing software such as MEGAN4 (Huson et al., 2011) which can be used to group BLAST outputs by functional groups using the SEED (Overbeek et al., 2005) or KEGG (Kanehisa et al., 2008) classification schemes.

In order to maximize the benefit for the sequence data that are being produced, it would be extremely helpful if these ‘universal’ classification schemes were improved to more comprehensively cover the full range of functional genes known. The SEED system (Overbeek et al. 2005) seems to be the most comprehensive, but could still be improved.

141

It would also benefit this field if there was some consensus on the quality control measures used, so that datasets from different studies, and different sites, could be directly compared in order to draw wider conclusions. For example, the critical question of what quality control cutoff should be used for determining whether a certain sequence is, or is not, the same gene as a reference sequence is one which remains unanswered. This is in part because the answer is unlikely to be the same in every case as different genes evolve at different rates, but it may be possible to reach some agreement on a cutoff to be used, just as in SSU rRNA analysis there is a general consensus that 97% identity should be considered a ‘species-level’ OTU, even though it is recognized that this is merely a convention.

The impact of the third constraint is that it is impossible to tell with certainty which of the genes present in the metagenome are being used by the microbes that are found within the sulfur deposit. The underlying assumption that is being made in this work is that genes that are present in high abundance, which must therefore be present in a significant proportion of the microbes present, are more likely to be in use than those which are present in low abundance, which could be derived from microbes that are dormant, or dead. A particular issue arises in the case of the sulfur redox genes possessed by Sulfurovum and Sulfuricurvum as these organisms possess several different sulfur redox genes within the same genome. It is impossible to tell which of these genes they are using, and so exactly which sulfur redox reactions they are catalyzing, from

DNA alone. For example, if they are highly abundant in the metagenome because they are using their sox genes, then as their cell numbers increase, the relative abundance of all their genes will increase in parallel, whether each gene is being used or not. This limitation can be addressed by extracting RNA from the environmental sample, to determine which genes are being expressed.

142

More generally, the use of metatranscriptomic approaches, using RNA extracted from the environment as the template for cDNA, and then making a shotgun library from this cDNA, which is then sequenced in the same way as the metagenome DNA, provides a way to gain a comprehensive overview of all the genes that are being used by an environmental microbial community at any time. The comparison of metagenomic and metatranscriptomic data from the same environmental sample provides invaluable information not only about which genes are in use, but which are not, which adds to our knowledge of the way in which environmental conditions affect microbial activities (Frias-Lopez et al., 2008, Stewart et al, 2011b).

The emerging field of metaproteomics will add yet another layer of information, as RNA expression levels may not necessarily accurately predict protein levels, due to differential rates of

RNA and protein turnover, but so far metaproteomic techniques do not provide the same quantity of information as metagenomic or metatranscriptomic approaches (Ram et al., 2005).

CONCLUSION

The particular focus of this study was to look at how well bioenergetic calculations of the amount of energy available from different reactions in this particular environment predicted the type of microbial energy metabolism that was found there. It is, as far as I know, the most robust and comprehensive investigation of this question that has been published to date. Clearly it does not deal with all possible aspects that affect habitability, and how microbes will react to changing conditions, but it does begin to illuminate some aspects of this, by demonstrating that energetically-favorable niches can remain unfilled. Much more work is needed to determine the reasons for this, and how widely applicable these findings are to other environments. Some questions, such as the possible role of DsrE in mobilizing extracellular S0,

143

will require culturing experiments, while others, such as the relative lack of phototrophs, will require a molecular approach. The work in this thesis demonstrates how different types of investigative approaches, in this case geochemical analysis, bioenergetic calculations, metagenomic analysis and environmentally-relevant culturing experiments, can be used together to deepen our understanding of microbial ecology, and the ways in which it impacts our environment.

Microbial cycling of major elements such as carbon, nitrogen, iron and sulfur can have a significant impact on the environment, both locally, and globally. We do not know the full extent of microbial biogeochemical cycling on our planet and its climate, nor how this will change in responding to any change in environmental conditions. These cycles are inter-connected, and contain multiple interactions, including the possibility of both positive and negative feedback loops. In trying to understand how our planet is going to respond to the warming climate, and in particular in trying to understand the impact of further human action, deliberate or otherwise, it is critically important that we understand better how these microbial biogeochemical cycles work.

Otherwise, any human interventions designed to improve environmental conditions, for example, to address climate change, could end up making the problem even worse. We are only just beginning to scratch the surface of this understanding. It is only in the last few years, with the massive increase in sequence data available via next-generation sequencing technologies, that we have the tools that enable us to investigate the extent of microbial diversity, and its actions and impacts. Work on the human microbiome, the area where most work has been done, continues to show new and unexpected impacts of this microbiome on human wellbeing. It is not unrealistic to assume that the impact of the planetary microbiome as a whole on the wellbeing of Earth, and its inhabitants, is also much greater than is currently realized.

144

REFERENCES

Altschul S.F., Gish W., Miller W., Myers E.W. and Lipman D.J. 1990. Basic Local Alignment Search Tool. J. Mol. Biol. 215: 403-410

Amend J.P. & Shock E.L. (2001) Energetics of overall metabolic reactions of thermophilic and hyperthermophilic Archaea and Bacteria FEMS Microbiol Rev 25: 175-243

Amend J.P., McCollom T.M., Hentscher M. & Bach W. (2011) Catabolic and anabolic energy for chemolithoautotrophs in deep-sea hydrothermal systems hosted in different rock types. Geochimica et Cosmochimica Acta 75: 5736-5748

Anandham R., Indiragandhi P., Madhaiyan M., Ryu K.Y., Jee H.J. & Sa T.M. (2008) Chemolithoautotrophic oxidation of thiosulfate and phylogenetic distribution of sulfur oxidation gene (soxB) in rhizobacteria isolated from crop plants Research in Microbiology 159: 579-589

Baskar S., Baskar R., Thorseth I.H., Øvreas L. & Pedersen R.B. (2012) Microbially Induced Iron Precipitation Associated with a Neutrophilic Spring at Borra Caves, Vishakhapatnam, India Astrobiology 12(4): 327-346

Benson D.A., Karsch-Mizrachi I., Lipman D.J., Ostell J. & Sayers E.W. (2011) GenBank Nucleic Acids Research 39: D32 doi:10.1093/nar/gkq1079

Bertin P.N., Heinrich-Salmeron A., Pelletier E., Goulhen-Chollet F., Arsène-Ploetze F., Gallien S., Lauga B., Casiot C., Calteau A., Vallenet D., Bonnefoy V., Bruneel O., Chane- Woon-Ming B., Cleiss-Arnold J., Duran R., Elbaz-Poulichet F., Fonknechten N., Giloteaux L., Halter D., Koechler S., Marchal M., Mornico D., Schaeffer C., Smith A.A.T., Van Dorsselaer A., Weissenbach J., Médigue C. & Le Paslier D. (2011) Metabolic diversity among main microorganisms inside an arsenic-rich ecosystem revealed by meta- and proteo- genomics ISME J. 5(11): 1735-47

Biddle J.F., Robert White J., Teske A.P. & House C.H. (2011) Metagenomics of the subsurface Brazos-Trinity Basin (IODP site 1320): comparison with other sediment and pyrosequenced metagenomes ISME J. 5: 1038-1047

Blöthe M. & Roden E.E. (2009) Microbial Iron Redox Cycling in a Circumneutral-pH Groundwater Seep Appl Environ Microbiol 75(2): 468-473

Bodaker I., Sharon I., Suzuki M.T., Feingersch R., Shmoish M., Andreischeva E., Sogin M.L., Rosenberg M., Maguire M.E., Belkin S., Oren A. & Béjà O. (2010) Comparative community genomics in the Dead Sea: an increasingly extreme environment ISME J. 4: 399-407

Boden R., Thomas E., Savani P., Kelly D.P. & Wood A.P. (2008) Novel methylotrophic bacteria isolated from the River Thames (London, UK) Environmental Microbiology 10(12): 3225-3236

145

Bond P.L., Smriga S.P. & Banfield J.F. (2000) Phylogeny of Microorganisms Populating a Thick, Subaerial, Predominantly Lithotrophic Biofilm at an Extreme Acid Mine Drainage Site Appl Environ Microbiol 66(9): 3842-3849

Booijink C.C.G.M., Boekhorst J., Zoetendal E.G., Smidt H., Kleerebezem M. & de Vos W.M. (2011) Metatranscriptome Analysis of the Human Fecal Microbiota Reveals Subject- Specific Expression Profiles, with Genes Encoding Proteins Involved in Carbohydrate Metabolism Being Dominantly Expressed Appl Environ Microbiol 76(16): 5533-5540

Borin S., Brusetti L., Daffonchio D., Delaney E. & Baldi F. (2009) Biodiversity of prokaryotic communities in sediments of different sub-basins of the Venice lagoon. Research in Microbiology 160: 307-314

Boyd E., Hamilton T.L., Spear J.R., Lavin M. & Peters J.W. (2010) [FeFe]-hydrogenase in Yellowstone National Park: evidence for dispersal limitation and phylogenetic nice conservatism ISME J. 4: 1485-1495

Boyd E., Lange R.K., Mitchell A.C., Havig J.R., Hamilton T.L., Lafrenière M.J., Shock E.L., Peters J.W. & Skidmore M. (2011) Diversity, Abundance and Potential Activity of Nitrifying and Nitrate-Reducing Microbial Assemblages in a Subglacial Ecosystem Appl Env Microbiol 77(14): 4778-4787

Brazelton W.J., Nelson B and Schrenk M.O. (2012) Metagenomic evidence for H2 production and H2 oxidation by serpentenite-hosted subsurface microbial communities. Frontiers in Extreme Microbiology 2:268: doi: 10.3389/fmicb.2011.00268

Canfield D.E., Thamdrup B. & Kristensen E. (2005) Aquatic Geomicrobiology; Advances in Marine Biology Volume 48. pub Elsevier Academic Press, San Diego, CA, USA.

Canfield D.E., Stewart F.J., Thamdrup B., De Brabandere L., Dalsgaard T., Delong E.F., Revsbech N.P. & Ulloa O. (2010) A Cryptic Sulfur Cycle in Oxygen-Minimum-Zone Waters off the Chilean Coast Science 330: 1375-1378

Caporaso J.G., Kuczynski J., Stombaugh J., Bittinger K., Bushman F.D., Costello E.K., Fierer N., Gonzalez Peña A., Goodrich J.K., Gordon J.I., Huttley G.A., Kelley S.T., Knights D., Koenig J.E., Ley R., Lozupone C.A., McDonald D., Muegge B.D., Pirrung M., Reeder J., Sevinsky J.R., Turnbaugh P.J., Walters W.A., Widmann J., Yatsunenko T., Zaneveld J. & Knight R. (2010) QIIME allows analysis of high-throughput community sequencing data. Nat.Methods 7(5): 335-336

Castenholz R.W. & Utkilen H.C. (1984) Physiology of sulfide tolerance in a thermophilic Oscillatoria Arch Microbiol 138: 299-305

Chauhan A., Pathak A. & Ogram A. (2012) Composition of Methane-Oxidizing Bacterial Communities as a Function of Nutrient Loading in the Florida Everglades Soil Microbiology DOI 10.1007/s00248-012-0058-2

146

Chen Y., Wu L., Boden. R., Hillebrand A., Kumaresan D., Moussard H., Baciu M., Lu Y., & Murrell J.C. (2009) Life without light: microbial diversity and evidence of sulfur- and ammonium-based chemolithotrophy in Movile Cave. ISME J 3: 1093-1104

Cheng Q. (2006) Structural diversity and functional novelty of new biosynthesis genes J Ind Microbiol Biotechnol 33: 552-559

Choi B.-R., Pham V.H., Park S.-J., Kim S.-J., Roh D.-H. & Rhee S.-K. (2009) Characterization of Facultative Sulfur-oxidizing Marinobacter sp. BR13 Isolated from Marine Sediment of Yellow Sea, Korea J Korean Soc Appl Biol Chem 52(4): 309-314

Cohen Y., Jørgensen B.B., Revsbech N.P. & Poplawski R. (1986) Adaptation to of Oxygenic and Anoxygenic Photosynthesis among Cyanobacteria Appl Environ Microbiol 51(2): 398-407

Cole J.R., Wang Q., Cardenas E., Fish J., Chai B., Farris R.J., Kulam-Syed-Mohideen A.S., McGarrell D.M., Marsh T., Garrity G.M. & Tiedje J.M. (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis Nucleic Acids Research 37: D141-D145

Costa K.C., Navarro J.B., Shock E.L., Zhang C.L., Soukup D., Hedlund B.P. (2009) Microbiology and geochemistry of great boiling and mud hot springs in the United States Great Basin. Extremophiles 13: 447-459

Cotton F.A., Wilkinson G., Murillo C.A. & Bochman M. (1999) Advanced Inorganic Chemistry 6th Edition Wiley-Interscience, New York, USA

Cramm R. (2009) Genomic View of Energy Metabolism in Ralstonia eutropha H16 Journal of Molecular Microbiology and Biotechnology 16: 38-52

Dahl C., Engels S., Pott-Sperling A.S., Schulte A., Sander J., Lübbe Y., Deuster O. & Brune D.C. (2005) Novel Genes of the dsr Gene Cluster and Evidence for Close Interaction of Dsr Proteins during Sulfur Oxidation in the Phototrophic Sulfur Bacterium Allochromatium vinosum J Bacteriol 187(4): 1392-1404

Dahl C., Schulte A., Stockdreher Y., Hong C., Grimm F., Sander J., Kim R., K. S.-H. & Shin D.H. (2008) Structural and Molecular Genetic Insight into a Widespread Sulfur Oxidation Pathway. J. Mol. Biol. 384: 1287-1300

Dalsgaard T., Thamdrup B. & Canfield D.E. (2005) Anaerobic ammonium oxidation (anammox) in the marine environment. Research in Microbiology 156: 457-464

Desai N., Antonopoulos D., Gilbert J.A., Glass E.M. & Meyer F. (2012) From genomics to metagenomics Current Opinion in Biotechnology 23: 72-76

147

DeSantis T.Z., Hugenholtz P., Larsen N., Rojas M., Brodie E.L., Keller K., Huber T., Dalevi D., Hu P. & Andersen G.L. (2006) Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB Appl Environ Microbiol 72(7): 5069-5072

Dinsdale E.A., Edwards R.A., Hall D., Angly F., Breitbart M., Brulc J.M., Furlan M., Desnues C., Haynes M., Li L., McDaniel L., Moran M.A., Nelson K.E., Nilsson C., Olson R., Paul J., Rodriguez Brito B., Ruan Y., Swan B.K., Stevens R., Valentine D.L., Vega Thurber R., Wegley L, White B.A. & Rohwer F. (2008) Functional metagenomic profiling of nine biomes Nature 452: 629-632

Dojka M.A., Hugenholtz P., Haack S.K. and Pace N.R. 1998. Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl. Environ. Microbiol. 64: 3869-3877

Douglas S. & Douglas D.D. (2001) Structural and Geomicrobiological Characteristics of a Microbial Community from a Cold Sulfide Spring Geomicrobiology Journal 18: 401-422

Duchaud E., Boussaha M., Loux V., Bernardet J.-F., Michel C., Kerouault B., Mondot S., Nicolas P., Bossy R., Caron C., Bessières P., Gibrat J.-F., Claverol S., Dumetz F., Le Hénaff M. & Benmansour A. (2007) Complete genome sequence of the fish pathogen Flavobacterium psychrophilum Nature Biotechnology 25(7): 763-769

Edgar R.C. (2010) Search and clustering orders of magnitude faster than BLAST Bioinformatics 26(19): 2460-2461

Ehrlich H.L. & Newman D.K. (2009) Geomicrobiology 5th Edition CRC Press, Boca Ratan, FL, USA

Emerson D. & Revsbech N.P. (1994) Investigation of an Iron-Oxidizing Microbial Mat Community Located near Aarhus, Denmark: Field Studies Appl Environ Microbiol 60(11): 4022-4031

Falkowski P.G., Fenchel T. & DeLong E. (2008) The Microbial Engines That Drive Earth’s Biogeochemical Cycles Science 320: 1034-1039

Flores, G.E., Campbell J.H., Kirshtein J.D., Meneghin J., Podar M., Steinberg J.I., Seewald J.S., Kingston Tivey M., Voytek M.A., Yang Z.K. & Reysenbach A.-L. (2012) Microbial community structure of hydrothermal deposits from geochemically different vent fields along the Mid-Atlantic Ridge Environmental microbiology 13(8): 2158-2171

Franz B., Lichtenberg H., Hormes J., Modrow H., Dahl C. & Prange A. (2007) Utilization of solid ‘elemental’ sulfur by the phototrophic purple sulfur bacterium Allochromatium vinosum: a sulfur K-edge X-ray absorption spectroscopy study Microbiology 153: 1268-1274

148

Frias-Lopez J., Shi Y., Tyson G.W., Coleman M.L., Schuster S.C., Chisholm S.W. & DeLong E.F. (2008) Microbial community gene expression in ocean surface waters PNAS 105(10): 3805-3810

Friedrich C.G., Bardischewsky F., Rother D. Quentmeier A. & Fischer J. (2005) Prokaryotic sulfur oxidation. Current Opinion in Microbiology 8: 253-259

Frigaard N.-U. & Dahl C. (2009) Sulfur Metabolism in Phototrophic Sulfur Bacteria Advances in Microbial Physiology 54: 103-200

Gaidos E., Marteinsson V., Thorsteinsson T., Johannesson T., Runarsson A.R., Stefansson A., Glazer B., Lanoil B., Skidmore M., Han S., Miller M., Rusch A., Foo W. (2009) An oligarchic microbial assemblage in the anoxic bottom waters of a volcanic subglacial lake The ISME Journal 3: 486-497

Gardner M.J. & Altman D.G. (1986) Confidence intervals rather than P values: estimation rather than hypothesis testing British Medical Journal 292: 746-750

Gillespie J.J., Wattam A.R., Cammer S.A., Gabbard J.L., Shukla M.P., Dalay O., Driscoll T., Hix D., Mane S.P., Mao C., Nordberg E.K., Scott M., Schulman J.R., Snyder E.E., Sullivan D.E., Wang C., Warren A., Williams K.P., Xue T., Seung Yoo H., Zhang C., Zhang Y., Will R., Kenyon R.W. & Sobral B.W. (2011) PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species Infection and Immunity 79(11): 4286-4298

Gleeson D.F., Pappalardo R.T., Grasby S.E., Anderson M.S., Beauchamp B., Castaño R., Chien S.A., Doggett T., Mandrake L. & Wagstaff K.L. (2010) Characterization of a sulfur- rich Arctic spring site and field analog to Europa using hyperspectral data. Remote Sensing of Environment 114: 1297-1311

Gleeson D.F., Williamson C., Grasby S.E., Pappalardo R.T., Spear J.R. & Templeton A.S. (2011) Low-temperature S0 biomineralization at a supraglacial spring system in the Canadian High Arctic Geobiology 9: 360-375

Gleeson D.F., Pappalardo R.T., Anderson M.S., Grasby S.E., Mielke R.E., Wright K.E. & Templeton A.S. (2012) Biosignature Detection at an Arctic Analog to Europa Astrobiology 12(2): 135-150

Goll J., Rusch D.B., Tanenbaum D.M., Thiagarajan M., Li K., Methé B.A. & Yooseph S. (2010) METAREP: JCVI metagenomics reports – an open source tool for high-performance comparative metagenomics Bioinformatics 26(20): 2631-2632

Gomez-Alvarez V., Teal T.K. & Schmidt T.M. (2009) Systematic artifacts in metagenomes from complex microbial communities ISME J. 3: 1314-1317

149

Gomez-Alvarez V., Revetta R.P., Santo Domingo J.W. (2012) Metagenome analyses of corroded concrete wastewater pipe biofilms reveals a complex microbial system BMC Microbiology 12: 122

Grasby S.E., Allen C.C., Longazo T.G., Lisle J.T., Griffin D.W. & Beauchamp B. (2003) Supraglacial Sulfur Springs and Associated Biological Activity in the Canadian High Arctic – Signs of Life Beneath the Ice. Astrobiology 3(3): 583-596

Grasby S.E. (2003) Naturally precipitating vaterite (µ-CaCO3) spheres: Unusual carbonates formed in an extreme environment Geochimica et Cosmochimica Acta 67(9): 1659-1666

Grasby S.E., Beauchamp B. & Bense V. (2012) Sulfuric Acid Speleogenesis Associated with a Glacially Driven Groundwater System – Paleo-spring “Pipes” at Borup Fiord Pass, Nunavut Astrobiology 12(1): 1-10

Green D.H., Shenoy D.M, Hart M.C. & Hatton A.D. (2011) Coupling of Dimethylsulfide Oxidation to Biomass Production by a Marine Flavobacterium Appl Environ Microbiol 77(9): 3137-3140

Griesbeck C., Schütz M., Schödl T., Bathe S., Nausch L., Mederer N., Vielreicher M. & Hauska G. (2002) Mechanism of Sulfide-Quinone Reductase Investigated Using Site-Directed Mutagenesis and Sulfur Analysis Biochemistry 41: 11552-11565

Haaijer S.C.M., Harhangi H.R., Meijerink B.B., Strous M., Pol A., Smolders A.J.P., Verwegen K., Jetten M.S.M. & Op den Camp H.J.M. (2008) Bacteria associated with iron seeps in a sulfur-rich, neutral pH, freshwater ecosystem. The ISME Journal 2: 1231-1242

Haas B.J., Gevers D., Earl A.M., Feldgarden M., Ward D.V., Giannoukos G., Ciulla D., Tabbaa D., Highlander S.K., Sodergren E., Methé B., DeSantis T.Z., The Human Microbiome Consortium, Petrosino J.F., Knight R. & Birren B.W. (2011) Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons Genome Research 21: 494-504

Hall J.R., Mitchell K.R., Jackson-Weaver O., Kooser A.S., Cron B.R., Crossey L.J. & Takacs-Vesbach C.D. (2008). Molecular Characterization of the Diversity and Distribution of a Thermal Spring Microbial Community by using rRNA and Metabolic Genes. Appl Environ Microbiol 74(15): 4910-4922

Hallbeck L., Pedersen K. (1990) Culture parameters regulating stalk formation and growth rates of Gallionella ferruginea Journal of General Microbiology 136: 1675-1680

Hallberg K.B., Dopson M. & Lindström E.B. (1996) Reduced Sulfur Compound Oxidation by Thiobacillus caldus J Bacteriol 178(1): 6-11

150

Hamilton T.L., Vogl K., Bryant D.A., Boyd E.S. & Peters J.W. (2012) Environmental constraints defining the distribution, composition, and evolution of chlorophototrophs in thermal features of Yellowstone National Park Geobiology 10: 236-249

Han P., Zheng L., Cui Z.S., Guo X.C. & Tian L. (2009). Isolation, identification and diversity analysis of petroleum-degrading bacteria in Shengli Oil Field wetland soil Ying Yong Sheng Tai Xue Bao 20(5): 1202-1208

Handley K.M., Wrighton K.C., Piceno Y.M., Andersen G.L., DeSantis T.Z., Williams K.H., Wilkins M.J., N’Guessan L., Peacock A., Bargar J., Long P.E. & Banfield J.F. (2012) High- density PhyloChip profiling of stimulated aquifer microbial communities reveals a complex response to acetate amendment. FEMS Microbiol Ecol 81(1): 188-204

Harding T., Jungblut A.D., Lovejoy C. & Vincent W.F. (2011) Microbes in High Arctic Snow and Implications for the Cold Biosphere Appl Environ Microbiol 77(10): 3234-3243

Hentscher M. & Bach W. (2102) Geochemically induced shifts in catabolic energy yields explain past ecological changes of diffuse vents in the East Pacific Rise 9 degrees 50’N area Geochemical Transactions 13(2): doi:10.1186/1467-4866-13-2

Hoehler T. (2004) Biological energy requirements as quantitative boundary conditions for life in the subsurface Geobiology 2: 205-215

Hoehler T. (2007a) An Energy Balance Concept for Habitability Astrobiology 7(6): 824-838

Hoehler T. (2007b) A “Follow the Energy” Approach for Astrobiology Astrobiology 7(6): 819- 823

Holtgrieve G.W., Schindler D.E., Hobbs W.O., Leavitt P.R., Ward E.J., Bunting L., Chen G., Finney B.P., Gregory-Eaves I., Holmgren S., Lisac M.J., Lisi P.J., Nydick K., Rogers L.A., Saros J.E., Selbie D.T., Shapley M.D., Walsh P.B. & Wolfe A.P. (2011) A Coherent Signature of Anthropogenic Nitrogen Deposition to Remote Watersheds of the Northern Hemisphere Science 334: 1545-1548

Hopkinson B.M. & Barbeau K.A. (2011) Iron transporters in marine prokaryotic genomes Environmental Microbiology doi:10.1111/j.1462-2920.2011.02539.x

Hughes K.A. & Lawley B. (2003) A novel Antarctic microbial endolithic community within gypsum crusts Environmental Microbiology 5(7): 555-565

Hugler M., Wirsen C.O., Fuchs G., Taylor C.D. & Sievert S.M. (2005) Evidence for Autotrophic CO2 Fixation via the Reductive Tricarboxylic Acid Cycle by Members of the Subdivision of ε Proteobacteria J Bacteriol 187(9): 3020-3027

Huse S.M., Huber J.A., Morrison H.G., Sogin M.L. & Welch D.M. (2007) Accuracy and quality of massively parallel DNA pyrosequencing Genome Biology 8(7): R143

151

Huson D.H., Mitra S., Ruscheweyh H.-J., Weber N. & Schuster S. (2011) Integrative analysis of environmental sequences using MEGAN4 Genome Research 21(9): 1552-1560

Inagaki F., Takai K., Nealson K.H. & Horikoshi K. (2004) Sulfurovum lithotrophicum gen. nov., sp. nov., a novel sulfur-oxidizing chemolithoautotroph within the ε-Proteobacteria isolated from Okinawa Trough hydrothermal sediments Int J Syst Evol Microbiol 54: 1477-1482

Inskeep W.P., Rusch D.B., Jay Z.J., Herrgard M.J., Kozubal M.A., Richardson T.H., Macur R.E., Hamamura N., Jennings R.M., Fouke B.W., Reysenbach A-L., Roberto F., Young M. Schwartz A., Boyd E.S., Badger J.H., Mathur E.J., Ortmann A.C., Bateson M., Geesey G., Frazier M. (2010) Metagenomes from High-Temperature Chemotrophic Systems Reveal Geochemical Controls on Microbial Community Structure and Function PLoS One 5(3): e9773

Jeffries T.C., Seymour J.R., Gilbert J.A., Dinsdale E.A., Newton K., Leterme S.S.C., Roudnew B., Smith R.J., Seuront L. & Mitchell J.G. (2011) Substrate Type Determines Metagenomic Profiles from Diverse Chemical Habitats PLoS One 6(9): e25173

Jones D.S., Tobler D.J., Schaperdoth I., Mainiero M. & Macalady J.L. (2010) Community Structure of Subsurface Biofilms in the Thermal Sulfidic Caves of Acquasanta Terme, Italy Appl Environ Microbiol 76(17): 5902-5910 Jones D.S., Albrecht H.L., Dawson K.S., Schaperdoth I., Freeman K.H., Pi Y., Pearson A. and Macalady J.L. (2012) Community genomic analysis of an extremely acidophilic sulfur- oxidizing biofilm ISME J 6:158-170

Jørgensen B.B., Cohen Y. & Revsbech N.P. (1986) Transition from Anoxygenic to Oxygenic Photosynthesis in a Microcoleus chthonoplastes Cyanobacterial Mat Appl Environ Microbiol 51(2): 408-417

Jørgensen B.B. & Nelson D.C. (2004) Sulfide oxidation in marine sediments: Geochemistry meets microbiology Sulfur Biogeochemistry—Past and Present: Geological Society of America Special Paper 379: 63–81

Kamyshny A., Borkenstein C.G. & Ferdelman T.G. (2009) Protocol for Quantitative Detection of Elemental Sulfur and Polysulfide Zero-Valent Sulfur Distribution in Natural Aquatic Samples Geostandards and Geoanalytical Research 33(3): 415-435

Kanehisa M., Araki M., Goto S., Hattori M., Hirakawa M., Itoh M., Katayama T., Kawashima S., Okuda S., Tokimatsu T. & Yamanishi Y. (2008) KEGG for linking genomes to life and the environment Nucleic Acids Research 36: D480-D484

Kartal B., Maalcke W.J., de Almeida N.M., Cirpus I., Gloerich J., Geerts W., Op den Camp H.J.M., Harhangi H.R., Janssen-Megens E.M., Francoijs K-J., Stunnenberg H.G., Keltjens J.T., Jetten M.S.M. & Strous M. (2011) Molecular mechanism of anaerobic ammonium oxidation Nature 479: 127-130

152

Kent W.J. (2002) BLAT—The BLAST-Like Alignment Tool Genome Research 12: 656-664

Klepac-Ceraj V., Hayes C.A., Gilhooly W.P., Lyons T.W., Kolter R. & Pearson A. (2012) Microbial diversity under extreme : Mahoney Lake, Canada Geobiology 10: 223-235

Kodama Y. & Watanabe K. (2003) Isolation and Characterization of a Sulfur-Oxidizing Chemolithotroph Growing on Crude Oil under Anaerobic Conditions Appl Environ Microbiol 69(1): 107-112

Kodama Y. & Watanabe K. (2004) Sufuricurvum kujiense gen. nov. sp. nov. a facultatively anaerobic, chemolithoautotrophic, sulfur-oxidizing bacterium isolated from an underground crude-oil storage cavity. International Journal of Systematic and Evolutionary Microbiology 54: 2297-2300

Konstantinidis K.T., Braff J., Karl D.M. & DeLong E.F. (2009) Comparative Metagenomic Analysis of a Microbial Community Residing at a Depth of 4,000 Meters at Station ALOHA in the North Pacific Subtropical Gyre Appl Enviro Microbiol 75(16): 5345-5355

Kozubal M.A., Macur R.E., Jay Z.J., Beam J.P., Malfatti S.A., Tringe S.G., Kocar B.D., Borch T. & Inskeep W.P. (2012) Microbial iron cycling in acidic geothermal springs of Yellowstone National Park: integrating molecular surveys, geochemical processes, and isolation of novel Fe-active microorganisms Frontiers in Microbiology 3: 109 doi: 10.3389/fmicb.2012.00109

Krugel H., Krubasik P., Weber K., Saluz H.P. & Sandmann G. (1995) Functional analysis of genes from Streptomyces griseus involved in the synthesis of isorenieratene, a carotenoid with aromatic end groups, revealed a novel type of carotenoid desaturase Biochimica et Biophysica Acta 1439: 57-64

Kuypers M.M.M., Sliekers A.O., Lavik G., Schmid M., Jørgenson B.B., Kuenen J.G., Sinninghe Damste J.S., Strous M & Jetten M.S.M (2003) Anaerobic ammonium oxidation by anammox bacteria in the Black Sea Nature 422: 608-611

Lam P., Jensen M.M., Lavik G., McGinnis D.F., Muller B., Schubert C.J., Amann R., Thamdrup B. & Kuypers M.M.M. (2007) Linking crenarchaeal and bacterial nitrification to anammox in the Black Sea. Proc Nat Acad Sci 104(17): 7104-7109

Lamers L.P.M., van Diggelen J.M.H., Op den Camp H.J.M., Visser E.J.W., Lucassen E.C.H.E.T., Vile M.A., Jetten M.S.M., Smolders A.J.P. & Roelofs J.G.M. (2012) Microbial transformations of nitrogen, sulfur and iron dictate vegetarian composition in wetlands: a review Frontiers in Microbiology 3: 156 doi: 10.3389/fmicb.2012.00156

Lane D.J., Pace B., Olsen G.J., Stahl D.A., Sogin M.L. & Pace N.R. (1985) Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses. Proc Natl Acad Sci 82: 6955–6959

153

Langmuir D. (1997) Aqueous Environmental Geochemistry Prentice Hall, Upper Saddle River, NJ, USA

Ley R.E., Harris J.K., Wilcox J., Spear J.R., Miller S.R., Bebout B.M., Maresca J.A., Bryant D.A., Sogin M.L. & Pace N.R. (2006) Unexpected Diversity and Complexity of the Guerrero Negro Hypersaline Microbial Mat Appl Environ Microbiol 72(5): 3685-3695

López-Garcia P., Duperron S., Philippot P., Foriel J., Susini J. & Moreira D. (2003) Bacterial diversity in hydrothermal sediment and epsilonproteobacterial dominance in experimental microcolonizers at the Mid-Atlantic Ridge. Environmental Microbiology 5(10): 961-976

Loy A., Duller S., Baranyl C., Mussmann M., Ott J., Sharon I., Béjà O., Le Pasller D., Dahl C. & Wagner M. (2009) Reverse dissimilatory sulfite reductase as phylogenetic marker for a subgroup of sulfur-oxidizing prokaryotes Environmental Microbiology 11(2): 289-299

Macalady J.L., Lyon E.H., Koffman B., Albertson L.K., Meyer K., Galdenzi S. & Mariani S. (2006) Dominant Microbial Populations in Limestone-Corroding Stream Biofilms, Frasassi Cave System, Italy Appl and Environ Microbiol 72(8): 5596-5609 Macalady J., Dattagupta S., Schaperdoth I., Jones D.S., Druschel G.K. & Eastman D. (2008) Niche differentiation among sulfur-oxidizing bacterial populations in cave waters. ISME J 2: 590-601

Mackelprang R., Waldrop M.P., DeAngelis K.M., David M.M., Chavarria K.L., Blazewicz S.J., Rubin E.M. & Jansson J.K. (2011) Metagenome analysis of a permafrost microbial community reveals a rapid response to thaw Nature 480: 368-372

Macur R.E., Langner H.W., Kocar B.D. & Inskeep W.P. (2004) Linking geochemical processes with microbial community analysis: successional dynamics in an arsenic-rich, acid- sulphate-chloride geothermal spring. Geobiology 2: 163-177

Madigan M.T. & Martinko J.M. (2006) Brock Biology of Microorganisms 11th Edition Pearson Prentice Hall Upper Saddle River, NJ, USA

Margulies M., Egholm M., Altman W.E., Attiya S., Bader J.S., Bemben L.A., Berka J., Braverman M.S., Chen Y.J., Chen Z., Dewell S.B., Du L., Fierro J.M., Gomes X.V., Godwin B.C., He W., Helgesen S., Ho C.H., Irzyk G.P., Jando S.C., Alenquer M.L., Jarvie T.P., Jirage K.B., Kim J.B., Knight J.R., Lanza J.R., Leamon J.H., Lefkowitz S.M., Lei M., Li J., Lohman K.L., Lu H., Makhijani V.B., McDade K.E., McKenna M.P., Myers E.W., Nickerson E., Nobile J.R., Plant R., Puc B.P., Ronan M.T., Roth G.T., Sarkis G.J., Simons J.F., Simpson J.W., Srinivasan M., Tartaro K.R., Tomasz A., Vogt K.A., Volkmer G.A., Wang S.H., Wang Y., Weiner M.P., Yu P., Begley R.F., Rothberg J.M. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–80

Markowitz V.M., Ivanova N.N., Szeto E., Palaniappan K., Chu K., Dalevi D., Chen I-Min.

154

A., Grechkin Y., Dubchak I., Anderson I., Lykidis A., Mavromatis K., Hugenholtz P. & Kyrpides N.C. (2008) IMG/M: a data metagenome and analysis system for metagenomes Nucleic Acids Research 36: D534-D538

Markowitz V.M., Chen I.-M., A., Chu K., Szeto E., Palaniappan K., Grechkin Y., Ratner A., Jacob B., Pati A., Huntemann M., Liolos K., Pagani I., Anderson I., Mavromatis K., Ivanova N.N. & Kyrpides N.C. (2012) IMG/M: the integrated metagenome data management and comparative analysis system Nucleic Acids Research 40: D123-D129

Martinez A., Tyson G.W. & DeLong E.F. (2010) Widespread known and novel phosphonate utilization pathways in marine bacteria revealed by functional screening and metagenomic analyses Environmental Microbiology 12(1): 222-238

Mason J. & Kelly D.P. (1988) Thiosulfate Oxidation by Obligately Heterotrophic Bacteria Microbial Ecology 15(2): 123-134

Matsen F.A., Kodner R.B. & Armbrust E.V. (2010) pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11: 538

McBride M.J., Xie G., Martens E.C., Lapidus A., Henrissat B., Rhodes R.G., Goltsman E., Wang W., Xu J., Hunnicutt D.W., Staroscik A.M., Hoover T.R., Cheng Y.-Q. & Stein J.L. (2009) Novel Features of the Polysaccharide-Digesting Gliding Bacterium Flavobacterium johnsoniae as Revealed by Genome Sequence Analysis Appl Enviro Microbiol 75(21): 6874- 6875

McCollom T.M. (2007) Geochemical Constraints on Sources of Metabolic Energy for Chemolithoautotrophy in Ultramafic-Hosted Deep-Sea Hydrothermal Systems Astrobiology 7(6): 933-950

McGurran A.E. (2004) Measuring Biological Diversity Blackwell Publishing, Oxford, U.K.

McNulty N.P., Yatsunenko T., Hsiao A., Faith J.J., Muegge B.D., Goodman A.L., Henrissat B., Oozer R., Cools-Portier S., Gobert G., Chervaux C., Knights D., Lozupone C.A., Knight R., Duncan A.E., Bain J.R., Muehlbauer M.J., Newgard C.B., Heath A.C. & Gordon J.I. (2011) The impact of a consortium of fermented milk strains on the gut microbiome of gnotobiotic mice and monozygotic twins 3(106): doi:10.1126/scitranslmed.3002701

Mende D.R., Waller A.S., Sunagawa S., Järvelin A.I., Chan M.M., Arumugam M., Raes J. & Bork P. (2012) Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data PLoS One 7(2): e31386

Meyer B. (1976) Elemental sulfur Chem Rev 76: 367-388

Meyer F., Phaarmann D., D’Souza M., Olson R., Glass E.M., Kubal M., Paczian T.,

155

Rodriguez A., Stevens R., Wilke A., Wilkening J. and Edwards R.A. 2008. The metagenomics-RAST server – a public resource for the automatic phylogenetic and functional annotation of metagenomes. BMC Bioinformatics 9: 386

Mikucki JA & Priscu JC (2007) Bacterial Diversity Associated with Blood Falls, a Subglacial Outflow from the Taylor Glacier, Antarctica Appl Env Microbiol 73(12): 4029

Mikucki J.A., Pearson A., Johnston D.T., Turchyn A.V., Farquhar J., Schrag D.P, Anbar A.D., Priscu J.C. & Lee P.A. (2009) A Contemporary Microbially Maintained Subglacial Ferrous "Ocean". Science 324: 397-400

Miller S.R. & Bebout B.M. (2004) Variation in Sulfide Tolerance of Photosystem II in Phylogenetically Diverse Cyanobacteria from Sulfidic Habitats Appl Environ Microbiol 70(2): 736-744

Mitra S., Rupek P., Richter D.C., Urich T., Gilbert J.A., Meyer F., Wilke A. & Huson D.H. (2011) Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG BMC Bioinformatics 12(Suppl 1): S21

Mosier A.C., Murray A.E. & Fritsen C.H. (2007) Microbiota within the perennial ice cover of Lake Vida, Antarctica FEMS Microbial Ecol 59: 274-288

Moosvi S.A., McDonald I.R., Pearce D.A., Kelly D.P. & Wood A.P. (2005) Molecular detection and isolation from Antarctica of methylotrophic bacteria able to grow with methylated sulfur compounds Systematic and Applied Microbiology 28: 541-554

Mou X., Sun S., Edwards R.A., Hodson R.E. & Moran M.A. (2008) Bacterial carbon processing by generalist species in the coastal ocean Nature 451: 708-713

Mulder D.W., Boyd E.S., Sarma R., Lange R.K., Endrizzi J.A., Broderick J.B. & Peters J.W. (2010) Stepwise [FeFe]-hydrogenase H-cluster assembly revealed in the structure of EFG HydAΔ Nature 465: 248-252

Muller J., Szklarczyk D., Julien P., Letunic I., Roth A., Kuhn M., Powell S., von Mering C., Doerks T., Jensen L.J. & Bork P. (2010) Extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations Nucleic Acids Research 38: D190-D195

Nakagawa S., Takai K., Inagaki F., Hirayama H., Nunoura T., Horikoshi K. & Sako Y. (2005) Distribution, phylogenetic diversity and physiological characteristics of epsilon- Proteobacteria in a deep-sea hydrothermal field Environmental Microbiology 7(10): 1619-1632

Nakagawa S., Takaki Y., Shimamura S., Reysenbach A.-L., Takai K. & Horikoshi K. (2007) Deep-sea vent ε-proteobacterial genomes provide insights into emergence of pathogens. Proc Natl Acad Sci 104(29): 12146-12150

156

Narwocki E. (2009) Structural RNA Homology Search and Alignment Using Covariance Models. PhD Thesis. Washington University School of Medicine.

Nelson D.C., Wirsen C.O. & Jannasch H.W. (1989) Characterisation of large, autotrophic Beggiatoa spp. abundant at hydrothermal vents of the Guaymas Basin. Appl Environ Microbiol 55: 2909-2917

Ng C., DeMaere M.Z., Williams T.J., Lauro F.M., Raftery M., Gibson J.A.E., Andrews- Pfannkoch C., Lewis M., Hoffman J.M., Thomas T. & Cavicchioli R. (2010) Metaproteogenomic analysis of a dominant green sulfur bacterium from Ace Lake, Antarctica ISME J. 4: 1002-1019

Niederberger T.D., Perreault N.N., Lawrence J.R., Nadeau J.L., Mlelke R.E., Greer C.W., Andersen D.T. & Whyte L.G. (2009) Novel sulfur-oxidizing streamers thriving in perennial cold saline springs of the Canadian high Arctic. Environmental Microbiology 11(3): 616-629

Ogata H., Goto S., Sato K., Fujibuchi W., Bono H. & Kanehisa M. (1999) KEGG: Kyoto Encyclopedia of Genes and Genomes Nucleic Acids Research 27(1): 29-34

Oren A., Padan E. & Malkin S. (1979) Sulfide Inhibition of Photosystem II in Cyanobacteria (Blue-Green Algae) and Tobacco Chloroplasts Biochimica et Biophysica Acta 546: 270-279

Osburn M.R., Sessions A.L., Pepe-Ranney, C., Spear, J.R. 2011. Hydrogen-isotopic variability in fatty acids from Yellowstone National Park hot spring microbial communities. Geochimica et Cosmochimica Acta. 75: 4830-4845

Overbeek R., Begley T., Butler R.M., Choudhuri J.V., Chuang H.-Y., Cohoon M., de Crécy-Lagard V., Diaz N., Disz T., Edwards R., Fonstein M., Frank E.D., Gerdes S., Glass E.M., Goesmann A., Hanson A., Iwata-Reuyl D., Jensen R., Jamshidi N., Krause L., Kubal M., Larsen N., Linke B., McHardy A.C., Meyer F., Neuweger H., Olsen G., Olson R., Osterman A., Portnoy V., Pusch G.D., Rodionov D.A., Rückert C., Steiner J., Stevens R., Thiele I., Vassieva O., Ye Y., Zagnitko O. & Vonstein V. (2005) The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes Nucleic Acids Research 33(17): 5691-5702

Perreault N.N., Andersen D.T., Pollard W.H., Greer C.W. & Whyte L.G. (2007) Characterization of the Prokaryotic Diversity in Cold Saline Perennial Springs of the Canadian High Arctic. Appl Environ Microbiol 73(5): 1532-1543

Perreault N.N., Greer C.W., Andersen D.T., Tille S., Lacrampe-Couloume G., Sherwood Lollar B. & Whyte L.G. (2008) Heterotrophic and Autotrophic Microbial Populations in Cold Perennial Springs of the High Arctic. Appl Env Microbiol 74(22): 6898-6907

Petri R., Podgorsek L. & Imhoff J.F. (2001) Phylogeny and distribution of the soxB gene among thiosulfate-oxidizing bacteria FEMS Microbiol Lett. 197: 171-178

157

Poretsky R.S., Bano N., Buchan A., LeCleir G. Kleikemper J., Pickering M., Pate W.M., Moran A.M. & Hollibaugh J.T. (2005) Analysis of Microbial Gene Transcripts in Environmental Samples Appl Environ Microbiol 71(7): 4121-4126

Porter M.L. & Engel A.S. (2008) Diversity of Uncultured Epsilonproteobacteria from Terrestrial Sulfidic Caves and Springs Appl Environ Microbiol 74(15): 4973-4977

Prakash T. & Taylor T.D. (2012) Functional Assignment of metagenomic data: challenges and applications Briefings in Bioinformatics doi:10.1093/bib/bbs033

Prange A., Chauvistré R., Modrow H., Hormes J., Trüper H.G. & Dahl C. (2002) Quantitative speciation of sulfur in bacterial sulfur globules: X-ray absorption spectroscopy reveals at least three different species of sulfur Microbiology 148: 267-276

Pruesse E., Quast C., Knittel K., Fuchs B.M., Ludwig W., Peplies J. & Glöckner F.O. (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB Nucleic Acids Research 35(21): 7188-7196

Pruitt K.D., Tatusova T., Klimke W. & Maglott D.R. (2008) NCBI Reference Sequences: current status, policy and new initiatives Nucleic Acids Research 37: D32-D36

Ram R.J., VerBerkmoes N.C., Thelan M.P., Tyson G.W., Baker B.J., Blake II R.C., Shah M., Hettich R.L. & Banfield J.F. (2005) Community Proteomics of a Natural Microbial Biofilm Science 308: 1915-1920

Ramos-Padrón E., Bordenave S. , Lin S., Bhaskar I.M., Dong X., Sensen C.W., Fourner J., Voordown G. and Gieg L.M. (2011) Carbon and Sulfur Cycling by Microbial Communities in a Gypsum-Treated Oil Sands Tailings Pond Environ. Sci. Technol. 45: 439-446

Reeder J. & Knight R. (2010) Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nat Methods 7: 668-669

Reigstad L.J., Jorgensen S.L., Lauritzen S.-E., Schleper C. & Urich T. (2011) Sulfur- Oxidizing Chemolithotrophic Proteobacteria Dominate the Microbiota in High Arctic Thermal Springs on Svalbard Astrobiology 11(7): 665-678

Rethmeier J., Rabenstein A., Langer M. & Fischer U. (1997) Detection of traces of oxidized and reduced sulfur compounds in small samples by combination of different high-performance liquid chromatography methods Journal of Chromatography A 760: 295-302

Roeselers G., Norris T.B., Castenholz R.W., Rysgaard S., Glud R.N., Kühl M. & Muyzer G. (2007) Diversity of phototrophic bacteria in microbial mats from Arctic hot springs (Greenland) Environmental Microbiology 9(1): 26-38

Rogers K.L., Amend J.P. & Gurrieri S. (2007) Temporal Changes in Fluid Chemistry and Energy Profiles in the Vulcano Island Hydrothermal System Astrobiology 7(6): 905-932

158

Rondon M.R., August P.R., Betterman A.D., Brady S.F., Grossman T.H., Liles M.R., Loiacono K.A., Lynch B.A., MacNeil I.A., Minor C., Tiong C.L., Gilman M., Osburne M.S., Clardy J., Handelsman J. & Goodman R.M. (2000) Cloning the Soil Metagenome: a Strategy for Accessing the Genetic and Functional Diversity of Uncultured Microorganisms Appl Environ Microbiol 66(6): 2541-2547

Rother D., Henrich H.-J., Quentmeier A., Bardischewsky F. & Friedrich C.G. (2001) Novel Genes of the sox Gene Cluster, Mutagenesis of the Flavoprotein SoxF, and Evidence for a General Sulfur-Oxidizing System in Paracoccus pantotrophus GB17 Journal of Bacteriology 183(15): 4499-4508

Rusch D.B., Halpern A.L., Sutton G., Heidelberg K.B., Williamson S., Yooseph S., Wu D., Eisen J.A., Hoffman J.M., Remington K., Beeson K., Tran B., Smith H., Baden-Tillson H., Stewart C., Thorpe J., Freeman J., Andrews-Pfannkoch C., Venter J.E., Li K., Kravitz S., Heidelberg J.F., Utterback T., Rogers Y.-H., Falcón L.I., Souza V., Bonilla-Rosso G., Eguiarte L.e., Karl D.M., Sathyendranath S., Platt T., Bermingham E., Gallardo V., Tamayo-Castillo G., Ferrari M.R., Strausberg R.L., Nealson K., Friedman R., Frazier M. & Venter J.C. (2007) The Sorceror II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific PLoS Biol 5(3): e77 doi:10.1371/journal.pbio.0050077

Sahl J.W., Fairfield N., Harris J.K., Wettergreen D., Stone W.C. & Spear J.R. (2010) Novel Microbial Diversity Retrieved by Autonomous Robotic Exploration of the World’s Deepest Vertical Phreatic Sinkhole Astrobiology 10(2): 201-213

Sayers E.W., Barrett T., Benson D.A., Bryant S.H., Canese K., Chetvernin V., Church D.M., DiCuccio M., Edgar R., Federhen S., Feolo M., Geer L.Y., Helmberg W., Kapustin Y., Landsman D., Lipman D.J., Madden T.L., Maglott D.R., Miller V., Mizrachi I., Ostell J., Pruitt K.D., Schuler G.D., Sequeira E., Sherry S.T., Shumway M., Sirotkin K., Sourorov A., Starchenko G., Tatusova T.A., Wagner L., Yaschenko E. & Ye J. (2009) Database resources of the National Center for Biotechnology Information Nucleic Acids Res 37: D5-15

Schmidt S.K., Lynch R.C., King A.J., Karki D., Robeson M.S., Nagy L., Williams M.W., Mitter M.S. & Freeman K.R. (2010) Phylogeography of microbial phototrophs in the dry valleys of the high Himalayas and Antarctica Proc. R. Soc. B 278: 702-708

Scholz M.B., Lo C.-C. & Chain P.S.G. (2012) Next generation sequencing and bioinformatics bottlenecks: the current state of metagenomic data analysis Current Opinion in Biotechnology 23: 9-15

Scott K.M., Sievert S.M., Abril F.N., Ball L.A., Barrett C.J., Blake R.A., Boller A.J., Chain P.S.G., Clark J.A., Davis C.R., Detter C., Do K.F., Dobrinski K.P., Faza B.I., Fitzpatrick K.A., Freyermuth S.K., Harmer T.L., Hauser L.J., Hügler M., Kerfeld C.A., Klotz M.G., Kong W.W., Land M., Lapidus A., Larimer F.W., Longo D.L., Lucas S., Malfatti S.A., Massey S.E., Martin D.D., McCuddin Z., Meyer F., Moore J.L., Ocampo Jr L.H., Paul J.H., Paulsen I.T., Reep D.K., Ren Q., Ross R.L., Sato P.Y., Thomas P., Tinkham L.E. &

159

Zeruth G.T. (2006) The Genome of Deep-Sea Vent Chemolithoautotroph Thiomicrospira crunogena XCL-2 PLoS Biology 4(12): e383

Shi Y., Tyson G.W., Eppley J.M. & DeLong E.F. (2011) Integrated metatranscriptomic and metagenomic analyses of stratified microbial assemblages ISME J. 5: 999-1013

Shivaji S., Kumari K., Kishore K.H., Pindi P.K., Rao P.S. Srinivas T.N.R., Asthana R. & Ravindra R. (2011) Vertical distribution of bacteria in a lake sediment from Antarctica by culture-independent and culture-dependent approaches Research in Microbiology 162: 191-203

Shock E.L, Holland M., Meyer-Dombard D.R. & Amend J.P. (2005) Geochemical Sources of Energy for Microbial Metabolism in Hydrothermal Ecosystems: Obsidian Pool, Yellowstone National Park. Geothermal Biology and Geochemistry in Yellowstone National Park 95-109 pub Montana State University Thermal Biology Institute, Bozeman, MT, USA

Shock E.L, Holland M., Meyer-Dombard D., Amend J.P., Osburn G.R., Fischer T.P. (2010) Quantifying inorganic sources of geochemical energy in hydrothermal ecosystems, Yellowstone National Park, USA. Geochimica et Cosmochimica Acta 74: 4005-4043

Sievert S.M., Wieringa E.B.A., Wirsen C.O. & Taylor C.D. (2007) Growth and mechanism of filamentous-sulfur formation by Candidatus Arcobacter sulfidicus in opposing oxygen-sulfide gradients Environmental Microbiology 9(1): 271-276

Sievert S.M. & Vetriani C. (2012) Chemoautotrophy at deep-sea vents: Past, present and future Oceanography 25(1): 218-233

Simon C., Wiezer A., Strittmatter A.W. & Daniel R. (2009) Phylogenetic Diversity and Metabolic Potential Revealed in a Glacier Ice Metagenome Appl Env Microbiol 75(23): 7519- 7526

Sorokin D.Y. (1996) Oxidation of sulfide and elemental sulfur to tetrathionate by chemoorganoheterotrophic bacteria Microbiology 65(1): 1-5

Spear J.R., Walker J.J., McCollom T.M. & Pace N.R. (2005) Hydrogen and bioenergetics in the Yellowstone geothermal ecosystem Proc Natl Acad Sci 102(7): 2555-2560

Stamatakis A. (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models Bioinformatics 22(21): 2688-2690

Stamatakis A., Hoover P. & Rougemont J. (2008) A Rapid Bootstrap Algorithm for the RAxML Web Servers Syst. Biol. 57(5): 758-771

Stewart F.J., Dmytrenko O., DeLong E.F. & Cavanaugh C.M. (2011a) Metatranscriptomic analysis of sulfur oxidation genes in the endosymbiont of Solemya velum Frontiers in Microbiology 2: Article Number 134 DOI=10.3389/fmicb.2011.00134

160

Stewart F.J., Ulloa O. & DeLong E.F. (2011) Microbial metatranscriptomics in a permanent marine oxygen minimum zone Environmental Microbiology doi:10.1111/j.1462- 2920.2010.02400.x

Stookey L.L. (1970) Ferrozine – A New Spectrophotometric Reagent for Iron Analytical Chemistry 42(7): 779-781

Sun S., Chen J., Li W., Altintas I., Lin A., Peltier S., Stocks K., Allen E.E., Ellisman M., Grethe J. & Wooley J. (2011) Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource Nucleic Acids Research 39: D546-D551

Suzuki S., Ishida T., Kurokawa K. & Akiyama Y. (2012) GHOSTM: A GPU-Accelerated Homology Search Tool for Metagenomics PLoS ONE 7(5): e36060

Tamaki H., Hanada S., Kamagata Y., Nakamura K., Nomura N., Nakano K. & Matsumura M. (2003) Flavobacterium limicola sp. nov., a psychrophilic, organic-polymer-degrading bacterium isolated from freshwater sediments International Journal of Systematic and Evolutionary Microbiology 53: 519-526

Teske A., Brinkhoff T., Muyzer G., Moser D.P., Rethmeier J. & Jannasch H.W. (2000) Diversity of Thiosulfate-Oxidizing Bacteria from Marine Sediments and Hydrothermal Vents Appl Environ Microbiol 66(8): 3125-3133

Thamdrup B., Finster K., Fossing H, Hansen J.K. & Jørgensen B.B. (1994) Thiosulfate and sulfite distributions in porewater of marine sediments related to manganese, iron and sulfur geochemistry Geochimica et Cosmochimica Acta 58: 67-73

Thamdrup B and Dalsgaard T (2002). Production of N2 through Anaerobic Ammonium Oxidation Coupled to Nitrate Reduction in Marine Sediments. Applied and Environmental Microbiology 68(3): 1312-1318

Thomas T., Gilbert J. & Meyer F. (2012) Metagenomics – a guide from sampling to data analysis Microbial Informatics and Experimentation 2: 3

Tringe S.G., von Mering C., Kobayashi A., Salamov A.A., Chen K., Chang H.W., Podar M., Short J.M., Mathur E.J., Detter J.C., Bork P., Hugenholtz P. & Rubin E.M. (2005) Comparative Metagenomics of Microbial Communities Science 308: 554-557

Tuttle J.H. & Jannasch H.W. (1977) Thiosulfate Stimulation of Microbial Dark Assimilation of Carbon Dioxide in Shallow Marine Waters Microbial Ecology 4: 9-25

Tyson G.W., Chapman J., Hugenholtz P., Allen E.E., Ram R.J., Richardson P.M., Solovyev V.V., Rubin E.M., Rokhsar D.S. & Banfield J.F. (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment Nature 428: 37- 43

161

The UniProt Consortium (2011) Ongoing and future developments at the Universal Protein Resource Nucleic Acids Research 39: D214-D219

Vick T.J., Dodsworth J.A., Costa K.C., Shock E.L and Hedlund B.P. (2009) Microbiology and geochemistry of Little Hot Creek, a hot spring environment in the Long Valley Caldera. Geobiology 8(2): 140-154

Von Trappen S., Vandecandelaere I., Mergaert J. & Swings J. (2004) Gillisia limnaea gen. nov., sp. nov., a new member of the family Flavobacteriaceae isolated from a microbial met in Lake Fryxell, Antarctica International Journal of Systematic and Evolutionary Microbiology 54: 445-448

Von Trappen S., Vandecandelaere I., Mergaert J. & Swings J. (2005) Flavobacterium fryxellicola sp. nov and Flavobacterium psychrolimnae sp. nov., novel psychrophilic bacteria isolated from microbial mats in Antarctic lakes International Journal of Systematic and Evolutionary Microbiology 55: 769-772

Wagner C., Mau M., Schlomann M., Heinicke J. & Koch. U. (2007) Characterization of the bacterial flora in mineral waters in upstreaming fluids of deep igneous rock aquifers. Journal of Geophysical Research 112: G01003

Wang Q., Garrity G.M., Tiedje J.M. & Cole J.R. (2007) Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy Appl Environ Microbiol 73(16): 5261-5267

Wang J., Vollrath S., Behrends T., Bodelier P.L.E., Muyzer G., Meima-Franke M., Den Oudsten F., Van Cappellen P. & Laanbroek H.J. (2011) Distribution and Diversity of Gallionella-Like Neutrophilic Iron Oxidizers in a Tidal Freshwater Marsh Appl Environ Microbiol 77(7): 2337-2344

Walsh D.A., Zaikova E., Howes C.G., Song Y.C., Wright J.J., Tringe S.G., Tortell P.D. & Hallam S.J. (2009) Metagenome of a Versatile Chemolithoautotroph from Expanding Ocean Dead Zones. Science 326: 578-582

Watanabe K., Watanabe K., Kodama Y., Syutsubo K. & Harayama S. (2000) Molecular Characterization of Bacterial Populations in Petroleum-Contaminated Groundwater Discharged from Underground Crude Oil Storage Cavities Appl Env Microbiology 66(11): 4803-4809

Wommack K.E., Bhavsar J. & Ravel J. (2008) Metagenomics: Read Length Matters Appl Environ Microbiol 74(5): 1453-1463

Weatherburn M.W. (1967) Phenol-Hypochlorite Reaction for Determination of Ammonia Analytical Chemistry 39(8): 971-974

Williamson M.A. & Rimstidt J.D. (1992) Correlation between structure and thermodynamic properties of aqueous sulfur species Geochimica et Cosmochimica Acta 56: 3867-3880

162

Xie W., Wang F., Guo L., Chen Z., Sievert S.M., Meng J., Huang G., Li Y., Yan Q., Wu S., Wang X., Chen S., He G., Xiao X. & Xu A. (2011) Comparative metagenomics of microbial communities inhabiting deep-sea hydrothermal vent chimneys with contrasting chemistries. ISME J 5: 414-426

Yamamoto M., Nakagawa S., Shimamura S., Takai K. & Horikoshi K. (2010) Molecular characterization of inorganic sulfur-compound metabolism in the deep-sea epsilonproteobacterium Sulfurovum sp. NBC37-1 Environmental Microbiology 12(5): 1144- 1153

Yamamoto M. & Takai K. (2011) Sulfur metabolisms in epsilon- and gamma-Protebacteria in deep-sea hydrothermal fields Frontiers in Microbiology 2: Article 192 doi: 10.3389/fmicb.2011.00192

Ziemert N., Podell S., Penn K., Badger J.H., Allen E. & Jensen P.R. (2012) The Natural Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity PLoS ONE 7(3): e34064

Zhang J.-Z. & Millero F.J. (1993) The products from the oxidation of H2S in seawater Geochimica et Cosmochimica Acta 57: 1705-1718

Zopfi J., Ferdelman T.G. & Fossing H. (2004) Distribution and fate of sulfur intermediates – sulfite, tetrathionate, thiosulfate, and elemental sulfur – in marine sediments Sulfur Biogeochemistry – Past and Present. Geological Society of America Special Paper 379: 97-116

Zumft WG (1997) Cell Biology and Molecular Basis of Denitrification. Microbiology and Molecular Biology Reviews 61(4): 533-616

163

APPENDIX 1

PYTHON SCRIPT grab_seqs.py

"""Program grab_seqs.py written by [email protected] version 2.7 12 November 2011

Documentation

Purpose of the program is to enable the user to quickly retrieve specific DNA, RNA or amino acid sequences from a much larger fasta file of sequences. The program reads an input file containing the list of unique sequence identifiers (seqID) that the user wants to retrieve. It then searches the larger fasta file of sequences for each identifier in turn, and when it is found, copies the sequence, and its identifier, into a new file. The new file is in fasta format ready to use in BLAST. The larger file is not altered (ie target sequences are copied not removed).

The program can cope with an input file that contains duplicates of the same sequence identifier (eg if using to grab seqs for genome assembly) and it provides a report (a log file) if a sequence is not found, or if it is found more than once (log file is empty if each sequence identifier is only included once in the input sequence identifier list, and only appears once in the target fasta file).

The output fasta file contains sequences in the same order as the list of sequence identifiers in the input file.

If a seqID is included only once in input sequence identifier file, and the sequence is only included once in the target fasta file, then there will be no message in the log file. If a sequence is not found at all, there will be a message saying that. If any seqID is included more than once in the input sequence identifier file (ie is included x times) then there will be x messages for that seqID. If target sequence is only found once in the target fasta file then each message will say the sequence is found x times. If the messages say that the sequence was found more than x times (eg y times) then it appears more than once (ie y/x times) in the target fasta file; that should never happen, each sequence identifier should only be attached to a single sequence in the target fasta file. So if y does not equal x, further investigation is needed to tell if the input fasta file has duplicate entries for that sequence or, worse, if the same sequence identifier is attached to two different sequences.

Program, files with sequencer identifiers and the fasta sequences all need to be in

164

the same directory, and you need to be in this directory in the terminal. Also need to make sure that the directory doesn't already contain files with the name output_seqs.txt or you will end up adding the new seqs to that file, instead of creating new file (so once program has run, be sure to re-name that file with a more descriptive, and unique, name). Similarly need to ensure directory doesn't already contain a file named log_file.txt

Program reports when it has finished in the terminal.

Program is run from the command line with the command: python grab_seqs.py argument1 argument2 leave a space (no comma) between program name and the arguments (and between arguments) argument1 is the file name of the file containing the sequence identifiers (including suffix such as .txt) argument2 is the target file of fasta seqs (including suffix such as .txt)

The sequence identifier input file should have each sequence identifier on a separate line, with no spaces before or after, and no carriage return after the last entry. The sequence identifiers should not have the > sign before them.

IMPORTANT - it makes a huge difference whether the carraige return \r is used (program assumes this) or if the carriage return \n is used (program doesn't work). If the sequence identifiers were in a single column on an Excel spreadsheet and copied to a txt file they should be in the correct format automatically. Can use the script check_seq_ident_input.py to check seq ID input files are in the correct format (it splits seqIDs into a list, using the \r to separate, just like this program does, and then prints the list - so it is possible to see if the list seqs1 has been created correctly ie one list item = one seqID or not). If the return button was used to create a new line this will leave a \n at the end - so amend grab_seqs.py to recognize \n instead of \r

The target fasta file (from which the sequences are going to be copied) MUST have a carriage return at the end of the file, after the last sequence.

""" import sys # sys module allows program to read command line arguments arg1 = sys.argv[1] # imports 1st command line argument (input file of sequence identifiers) and assigns it a variable (string) name arg2 = sys.argv[2] # imports 2nd command line argument (target file of fasta sequences) and assigns it a variable name myfile1 = open(arg1,'r') # opens file for reading myfile2 = open(arg2,'r') myfile3 = open('output_seqs.txt', 'a')

165

# creates new file with the name output_seqs.txt to which output (ie sequences program has copied) will be written # if there is already a file with the name output_seqs.txt the sequences will be added to the end of that file myfile4 = open('log_file.txt','a') # creates a new file (or opens existing file if it exists) with name log_file.txt s1 = myfile1.read() # program reads file into 1 string, with variable name s1 s2 = myfile2.read() seqs1 = s1.split('\r') # creates a list where each sequence identifier is a list entry (function splits fasta string at \r) # As this is a list, if it is printed all entries will be in '' seqs2 = s2.split('>') # creates a list where each sequence in fasta string (split at >) is a list entry, in '' with > removed output1 = [] for seqID in seqs1: for seq in seqs2: if seqID in seq: output1.append(seq) else: pass output1.insert(0,'') output2 = '>'.join(output1) # re-joins list into string, removing '' and replacing > for seqID in seqs1: a = output2.count(seqID) if a == 0: myfile4.write(seqID) myfile4.write(' was not found\n') elif a == 1: pass elif a >= 1: b = str(a) myfile4.write(seqID) myfile4.write(' was copied ') myfile4.write(b) myfile4.write(' times\n') # reports on whether each sequence was found once (no message), not at all, or multiple times myfile3.write(output2) # writes output2 into the file 'output_seqs.txt' myfile1.close() # closes file myfile2.close() myfile3.close() myfile4.close() print 'program has finished running'

166

APPENDIX 2

PYTHON SCRIPT check_seq_ident_input.py

"""check_seq_ident_input.py written by [email protected] version 2.0 12 November 2011

Purpose of the program is the check the format of a file of sequence identifiers which will be used as an input file for grab_seqs.py (as the first argument for that program), ie the file which contains the sequence identifiers of the sequences which one wants to copy from a large fasta file of sequences (the second argument for grab_seqs.py).

The sequence identifier input file should have each sequence identifier on a separate line, with no spaces before or after, and no carriage return after the last entry.

This program will check that format by printing a list of each sequence identifier. If the input file is formatted correctly, each sequence identifier will be a separate list item, there will be no extra spaces or extra characters and no empty list items (will get an empty list item at the end of the printed list if there is a carriage return after the last sequence identifier in the input file)

To run from the command line, command is: python check_seq_ident_input.py name_of_seq_ident_input_file.txt leave a space between the program name and the name of the file to be checked

""" import sys # sys module allows program to read command line arguments arg1 = sys.argv[1] # imports 1st command line argument (input file of sequence identifiers) and assigns it a variable (string) name myfile1 = open(arg1,'r') # opens file for reading s1 = myfile1.read() # program reads file into 1 string, with variable name s1 seqs1 = s1.split('\r') # creates a list where each sequence identifier is a list entry (function splits fasta string at carriage return ie '\r') # As this is a list, if it is printed all entries will be in '' print seqs1

167